Duplicate Content Filter: What it is and how it works

Duplicate Content has become a huge topic of discussion lately, thanks to the new filters that search engines have implemented. This article will help you understand why you might be caught in the filter, and ways to avoid it. We'll also show you how you can determine if your pages have duplicate content, and what to do to fix it.
Search engine spam is any deceitful attempt to deliberately trick the search engine into returning inappropriate, redundant, or poor-quality search results. Many times this behavior is seen in pages that are exact replicas of other pages, created to receive better results in the search engine. Many people assume that creating multiple or similar copies of the same page will either increase their chances of getting listed in search engines or help them get multiple listings, due to the presence of more keywords.
To make a search more relevant to a user, search engines use a filter that removes duplicate content pages from the search results, and the spam along with them. Unfortunately, good, hardworking webmasters have fallen prey to these filters; they unknowingly spam the search engines, even though there are things they can do to avoid being filtered out. To truly understand how to avoid the duplicate content filter, you first need to know how this filter works.
First, we must understand that the term "duplicate content penalty" is actually a misnomer. When we refer to penalties in search engine rankings, we are actually talking about points that are deducted from a page in order to come to an overall relevancy score. But in reality, duplicate content pages are not penalized. Rather they are simply filtered, the way you would use a sieve to remove unwanted particles. Sometimes, "good particles" are accidentally filtered out.
Knowing the difference between the filter and the penalty, you can now understand how a search engine determines what duplicate content is. There are basically four types of duplicate content that are filtered out:
  1. Websites with Identical Pages - Pages that duplicate one another, and entire websites that are identical to another website on the Internet, are considered spam. Affiliate sites with the same look and feel which contain identical content, for example, are especially vulnerable to a duplicate content filter. Another example would be a website with doorway pages. Many times, these doorways are skewed versions of landing pages, yet the landing pages themselves are identical to other landing pages. Generally, doorway pages are intended to spam the search engines in order to manipulate search engine results.
  2. Scraped Content - Scraped content is taking content from a web site and repackaging it to make it look different, but in essence it is nothing more than a duplicate page. With the popularity of blogs on the internet and the syndication of those blogs, scraping is becoming more of a problem for search engines.
  3. E-Commerce Product Descriptions - Many eCommerce sites out there use the manufacturer's descriptions for the products, which hundreds or thousands of other eCommerce stores in the same competitive markets are using too. This duplicate content, while harder to spot, is still considered spam.
  4. Distribution of Articles - If you publish an article, and it gets copied and put all over the Internet, this is good, right? Not necessarily for all the sites that feature the same article. This type of duplicate content can be tricky, because even though Yahoo and MSN determine the source of the original article and deem it most relevant in search results, other search engines like Google may not, according to some experts.
So, how does a search engine's duplicate content filter work? Essentially, when a search engine robot crawls a website, it reads the pages, and stores the information in its database. Then, it compares its findings to other information it has in its database. Depending upon a few factors, such as the overall relevancy score of a website, it then determines which are duplicate content, and then filters out the pages or the websites that qualify as spam. Unfortunately, if your pages are not spam, but have enough similar content, they may still be regarded as spam.
There are several things you can do to avoid the duplicate content filter. First, you must be able to check your pages for duplicate content. Using our Similar Page Checker, you will be able to determine similarity between two pages and make them as unique as possible. By entering the URLs of two pages, this tool will compare those pages, and point out how they are similar so that you can make them unique.
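The Similar Page Checker mentioned above is a web tool whose exact algorithm isn't published, but the basic idea behind comparing two pages for overlap can be sketched in a few lines of Python using the standard library's difflib. This is only a rough illustration of the concept, not the tool's actual method, and the sample page texts are made up:

```python
from difflib import SequenceMatcher

def similarity(text_a: str, text_b: str) -> float:
    """Return a rough similarity ratio (0.0 to 1.0) between two page texts."""
    # Normalize whitespace and case so formatting differences
    # don't inflate or deflate the score.
    a = " ".join(text_a.split()).lower()
    b = " ".join(text_b.split()).lower()
    return SequenceMatcher(None, a, b).ratio()

page_one = "Our store sells handmade oak furniture, crafted locally."
page_two = "Our store sells handmade pine furniture, crafted locally."

score = similarity(page_one, page_two)
print(f"Pages are {score:.0%} similar")
```

Two pages that differ by only a word or two score very high, which is exactly the kind of near-duplicate a filter is built to catch.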
Since you need to know which sites might have copied your site or pages, you will need some help. We recommend using a tool that searches for copies of your page on the Internet: www.copyscape.com. Here, you can put in your web page URL to find replicas of your page on the Internet. This can help you create unique content, or even address the issue of someone "borrowing" your content without your permission.
Let's look at the issue regarding some search engines possibly not considering the source of the original content from distributed articles. Remember, some search engines, like Google, use link popularity to determine the most relevant results. Continue to build your link popularity, while using tools like www.copyscape.com to find how many other sites have the same article, and if allowed by the author, you may be able to alter the article as to make the content unique.
If you use distributed articles for your content, consider how relevant the article is to your overall web page and then to the site as a whole. Sometimes, simply adding your own commentary to the articles can be enough to avoid the duplicate content filter; the Similar Page Checker could help you make your content unique. Further, the more relevant articles you can add to complement the first article, the better. Search engines look at the entire web page and its relationship to the whole site, so as long as you aren't exactly copying someone's pages, you should be fine.
If you have an eCommerce site, you should write original descriptions for your products. This can be hard to do if you have many products, but it really is necessary if you wish to avoid the duplicate content filter. Here's another example why using the Similar Page Checker is a great idea. It can tell you how you can change your descriptions so as to have unique and original content for your site. This also works well for scraped content. Many scraped content sites offer news. With the Similar Page Checker, you can easily determine where the news content is similar, and then change it to make it unique.
Do not rely on an affiliate site that is identical to other sites, and do not create identical doorway pages. These types of pages are not only filtered out immediately as spam, but there is generally no comparison of the page to the site as a whole if another site or page is found to be a duplicate, which can get your entire site in trouble.
The duplicate content filter is sometimes hard on sites that don't intend to spam the search engines. But it is ultimately up to you to help the search engines determine that your site is as unique as possible. By using the tools in this article to eliminate as much duplicate content as you can, you'll help keep your site original and fresh.

Dynamic URLs vs. Static URLs

The Issue at Hand
Websites that utilize databases which can insert content into a webpage by way of a dynamic script like PHP or JavaScript are increasingly popular. This type of site is considered dynamic. Many websites choose dynamic content over static content because if a website has thousands of products or pages, writing or updating each static page by hand is a monumental task.
There are two types of URLs: dynamic and static. A dynamic URL is a page address that results from the search of a database-driven web site or the URL of a web site that runs a script. In contrast to static URLs, in which the contents of the web page stay the same unless the changes are hard-coded into the HTML, dynamic URLs are generated from specific queries to a site's database. The dynamic page is basically only a template in which to display the results of the database query. Instead of changing information in the HTML code, the data is changed in the database.
But there is a risk when using dynamic URLs: search engines don't like them. Those at most risk of losing search engine positioning due to dynamic URLs are e-commerce stores, forums, sites built on content management systems or blog platforms like Mambo or WordPress, and any other database-driven website. Many times the URL that is generated for the content in a dynamic site looks something like this:

   http://www.somesites.com/forums/thread.php?threadid=12345&sort=date

A static URL on the other hand, is a URL that doesn't change, and doesn't have variable strings. It looks like this:

   http://www.somesites.com/forums/the-challenges-of-dynamic-urls.htm
Static URLs typically rank better in search engine results pages, and they are indexed more quickly than dynamic URLs, if dynamic URLs get indexed at all. Static URLs also make it easier for the end-user to understand what the page is about. If a user sees a URL in a search engine query that matches the title and description, they are more likely to click on that URL than one that doesn't make sense to them.
A search engine wants to list only unique pages in its index. Some search engines combat this issue by cutting off URLs after a specific number of variable-string characters (e.g.: ?, &, =).
For example, let's look at three URLs:

   http://www.somesites.com/forums/thread.php?threadid=12345&sort=date
   http://www.somesites.com/forums/thread.php?threadid=67890&sort=date
   http://www.somesites.com/forums/thread.php?threadid=13579&sort=date

All three of these URLs point to three different pages. But if the search engine purges the information after the first offending character, the question mark (?), now all three pages look the same:

   http://www.somesites.com/forums/thread.php
   http://www.somesites.com/forums/thread.php
   http://www.somesites.com/forums/thread.php

Now, you don't have unique pages, and consequently, the duplicate URLs won't be indexed.
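The collapse described above is easy to demonstrate. The short Python sketch below cuts each of the three example URLs at the first question mark, the way a crawler that ignores query strings might, and shows that three distinct pages become one apparent page (a simplified model of crawler behavior, for illustration only):

```python
urls = [
    "http://www.somesites.com/forums/thread.php?threadid=12345&sort=date",
    "http://www.somesites.com/forums/thread.php?threadid=67890&sort=date",
    "http://www.somesites.com/forums/thread.php?threadid=13579&sort=date",
]

# Cut each URL at the first '?', discarding the query string.
truncated = [url.split("?", 1)[0] for url in urls]

# All three distinct pages now collapse into a single apparent page.
print(set(truncated))
```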
Another issue is that dynamic pages generally do not have any keywords in the URL. It is very important to have keyword rich URLs. Highly relevant keywords should appear in the domain name or the page URL. This became clear in a recent study on how the top three search engines, Google, Yahoo, and MSN, rank websites.
The study involved taking hundreds of highly competitive keyword queries, like travel, cars, and computer software, and comparing factors involving the top ten results. The statistics show that of those top ten, Google has 40-50% of those with the keyword either in the URL or the domain; Yahoo shows 60%; and MSN has an astonishing 85%! What that means is that to these search engines, having your keywords in your URL or domain name could mean the difference between a top ten ranking, and a ranking far down in the results pages.
The Solution
So what can you do about this difficult problem? You certainly don't want to have to go back and recode every single dynamic URL into a static URL. This would be too much work for any website owner.
If you are hosted on a Linux server, then you will want to make the most of the Apache mod_rewrite module, which gives you the ability to inconspicuously redirect one URL to another, without the user's (or a search engine's) knowledge. You will need to have this module installed in Apache; for more information, see the Apache mod_rewrite documentation. This module saves you from having to recode your dynamic URLs manually.
How does this module work? When a request comes in to a server for the new static URL, the Apache module redirects the URL internally to the old, dynamic URL, while still looking like the new static URL. The web server compares the URL requested by the client with the search pattern in the individual rules.
For example, when someone requests this URL:
   http://www.somesites.com/forums/the-challenges-of-dynamic-urls.html

The server looks for and compares this static-looking URL to what information is listed in the .htaccess file, such as:

   RewriteEngine on
   RewriteRule ^the-challenges-of-dynamic-urls\.html$ thread.php?threadid=12345 [L]

It then converts the static URL to the old dynamic URL that looks like this, with no one the wiser:
   http://www.somesites.com/forums/thread.php?threadid=12345
You now have a URL that not only will rank better in the search engines, but that your end-users can understand at a glance, while Apache's mod_rewrite handles the conversion for you and the underlying dynamic URL stays in place.
If you are not particularly technical, you may not wish to attempt to figure out the complex mod_rewrite syntax, or you simply may not have the time to embark upon a new learning curve. Therefore, it would be extremely beneficial to have something do it for you. This URL Rewriting Tool can definitely help. What it does is generate the mod_rewrite rules for your .htaccess file that transparently convert one URL to another, such as a static URL to its dynamic counterpart.
With the URL Rewriting Tool, you can opt to rewrite single pages or entire directories. Simply enter the URL into the box, press submit, and copy and paste the generated code into your .htaccess file on the root of your website. You must remember to place any additional rewrite commands in your .htaccess file for each dynamic URL you want Apache to rewrite. Now, you can give out the static URL links on your website without having to alter all of your dynamic URLs manually because you are letting the Mod Rewrite Rule do the conversion for you, without JavaScript, cloaking, or any sneaky tactics.
Another thing you must remember to do is to change all of your links in your website to the static URLs in order to avoid penalties by search engines due to having duplicate URLs. You could even add your dynamic URLs to your Robots Exclusion Standard File (robots.txt) to keep the search engines from spidering the duplicate URLs. Regardless of your methods, after using the URL Rewrite Tool, you should ideally have no links pointing to any of your old dynamic URLs.
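As a sketch of the robots.txt approach just mentioned, a Disallow rule can keep compliant crawlers away from the old dynamic script entirely (the path shown is taken from the forum example earlier in this article; adjust it to your own site):

```
User-agent: *
Disallow: /forums/thread.php
```

Crawlers that honor the Robots Exclusion Standard will then skip every URL beginning with that path, including all its query-string variations, while the rewritten static URLs remain crawlable.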
You have multiple reasons to utilize static URLs in your website whenever possible. When that's not possible, and you need to keep your database-driven content at those old dynamic URLs, you can still give end-users and search engines a static URL to navigate, while your dynamic URLs work on in disguise. When a search engine engineer was asked if this method was considered "cloaking", he responded that it was not, and that in fact search engines prefer you do it this way. The URL Rewrite Tool not only saves you time and energy by converting static URLs transparently to your dynamic ones, but it can also save your rankings in the search engines.

The Importance of Backlinks

If you've read anything about or studied Search Engine Optimization, you've come across the term "backlink" at least once. For those of you new to SEO, you may be wondering what a backlink is, and why they are important. Backlinks have become so important to the scope of Search Engine Optimization, that they have become some of the main building blocks to good SEO. In this article, we will explain to you what a backlink is, why they are important, and what you can do to help gain them while avoiding getting into trouble with the Search Engines.
What are "backlinks"? Backlinks are links that are directed towards your website, also known as inbound links (IBLs). The number of backlinks is an indication of the popularity or importance of a website. Backlinks are important for SEO because some search engines, especially Google, will give more credit to websites that have a good number of quality backlinks, and consider those websites more relevant than others in their results pages for a search query.
When search engines calculate the relevance of a site to a keyword, they consider the number of QUALITY inbound links to that site. So we should not be satisfied with merely getting inbound links; it is the quality of the inbound link that matters.
A search engine considers the content of the sites to determine the QUALITY of a link. When inbound links to your site come from other sites, and those sites have content related to your site, these inbound links are considered more relevant to your site. If inbound links are found on sites with unrelated content, they are considered less relevant. The higher the relevance of inbound links, the greater their quality.
For example, if a webmaster has a website about how to rescue orphaned kittens, and receives a backlink from another website about kittens, that would be more relevant in a search engine's assessment than, say, a link from a site about car racing. The more relevant the site that is linking back to your website, the better the quality of the backlink.
Search engines want websites to have a level playing field, and look for natural links built slowly over time. While it is fairly easy to manipulate links on a web page to try to achieve a higher ranking, it is a lot harder to influence a search engine with external backlinks from other websites. This is also a reason why backlinks factor so highly into a search engine's algorithm. Lately, however, a search engine's criteria for quality inbound links have gotten even tougher, thanks to unscrupulous webmasters trying to achieve these inbound links by deceptive or sneaky techniques, such as hidden links, or automatically generated pages whose sole purpose is to provide inbound links to websites. These pages are called link farms, and they are not only disregarded by search engines, but linking to a link farm could get your site banned entirely.
Another reason to achieve quality backlinks is to entice visitors to come to your website. You can't build a website, and then expect that people will find your website without pointing the way. You will probably have to get the word out there about your site. One way webmasters got the word out used to be through reciprocal linking. Let's talk about reciprocal linking for a moment.
There is much discussion in these last few months about reciprocal linking. In the last Google update, reciprocal links were one of the targets of the search engine's latest filter. Many webmasters had agreed upon reciprocal link exchanges, in order to boost their site's rankings with the sheer number of inbound links. In a link exchange, one webmaster places a link on his website that points to another webmasters website, and vice versa. Many of these links were simply not relevant, and were just discounted. So while the irrelevant inbound link was ignored, the outbound links still got counted, diluting the relevancy score of many websites. This caused a great many websites to drop off the Google map.
We must be careful with our reciprocal links. There is a Google patent in the works that will deal not only with the popularity of the sites being linked to, but also with how trustworthy a site is that you link to from your own website. This means that you could get into trouble with the search engine just for linking to a bad apple. We can begin preparing for this future change in the search engine algorithm by being choosier right now about the sites with which we exchange links. By choosing only relevant sites to link with, sites that don't have tons of outbound links on a page, and sites that don't practice black-hat SEO techniques, we will have a better chance that our reciprocal links won't be discounted.
Many webmasters have more than one website. Sometimes these websites are related, sometimes they are not. You also have to be careful about interlinking multiple websites on the same IP. If you own seven related websites, then a link to each of those websites on a page could hurt you, as it may look to a search engine like you are trying to do something fishy. Many webmasters have tried to manipulate backlinks in this way, and placing too many links to sites with the same IP address is referred to as backlink bombing.
One thing is certain: interlinking sites doesn't help you from a search engine standpoint. The only reason you may want to interlink your sites in the first place might be to provide your visitors with extra resources to visit. In this case, it would probably be okay to provide visitors with a link to another of your websites, but try to keep many instances of linking to the same IP address to a bare minimum. One or two links on a page here and there probably won't hurt you.
There are a few things to consider when beginning your backlink building campaign. It is helpful to keep track of your backlinks, to know which sites are linking back to you, and how the anchor text of the backlink incorporates keywords relating to your site. A tool to help you keep track of your backlinks is the Domain Stats Tool. This tool displays the backlinks of a domain in Google, Yahoo, and MSN. It will also tell you a few other details about your website, like your listing in the Open Directory, or DMOZ (whose backlinks Google regards as highly important), your Alexa traffic rank, and how many pages from your site have been indexed, to name just a few.
Another tool to help you with your link building campaign is the Backlink Builder Tool. It is not enough just to have a large number of inbound links pointing to your site. Rather, you need to have a large number of QUALITY inbound links. This tool searches for websites that have a related theme to your website which are likely to add your link to their website. You specify a particular keyword or keyword phrase, and then the tool seeks out related sites for you. This helps to simplify your backlink building efforts by helping you create quality, relevant backlinks to your site, and making the job easier in the process.
There is another way to gain quality backlinks to your site, in addition to related site themes: anchor text. When a link incorporates a keyword into the text of the hyperlink, we call this quality anchor text. A link's anchor text may be one of the most under-estimated resources a webmaster has. Instead of using words like "click here", which probably won't relate in any way to your website, using the words "Please visit our tips page for how to nurse an orphaned kitten" is a far better way to utilize a hyperlink. A good tool for helping you find your backlinks and what text is being used to link to your site is the Backlink Anchor Text Analysis Tool. If you find that your site is being linked to from another website, but the anchor text is not being utilized properly, you should request that the website change the anchor text to something incorporating relevant keywords. This will also help boost your quality backlinks score.
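The inner workings of the Backlink Anchor Text Analysis Tool aren't published, but the core task, pulling the anchor text out of a page's links, can be sketched with Python's standard html.parser module. The HTML snippet and example.com URLs below are made up for illustration:

```python
from html.parser import HTMLParser

class AnchorTextParser(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> tag currently open
        self._text = []    # text fragments collected inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

html = ('<p><a href="http://example.com/kittens">How to nurse '
        'an orphaned kitten</a> and <a href="http://example.com/x">'
        'click here</a></p>')

parser = AnchorTextParser()
parser.feed(html)
for href, text in parser.links:
    print(href, "->", text)
```

Scanning the output for links whose text is a throwaway phrase like "click here" tells you which backlinks are wasting their anchor text.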
Building quality backlinks is extremely important to Search Engine Optimization, and because of their importance, it should be very high on your priority list in your SEO efforts. We hope you have a better understanding of why you need good quality inbound links to your site, and have a handle on a few helpful tools to gain those links.

The Age of a Domain Name

One of the many factors in Google's search engine algorithm is the age of a domain name. In a small way, the age of a domain gives the appearance of longevity and therefore a higher relevancy score in Google.
Driven by spam sites which pop up and die off quickly, the age of a domain is usually a sign of whether a site is yesterday's news or tomorrow's popular site. We see this in the world of business, for example. While the novelty that may go with a new store in town brings a short burst of initial business, people tend to trust a business that has been around for a long time over one that is brand new. The same is true for websites. Or, as Rob from BlackwoodProductions.com says, "Rent the store (i.e. register the domain) before you open for business".
Two things that are considered in the age of a domain name are:
  • The age of the website
  • The length of time a domain has been registered
The age of the website is built up of how long the content has been actually on the web, how long the site has been in promotion, and even the last time content was updated. The length of time a domain has been registered is measured by not only the actual date the domain was registered, but also how long it is registered for. Some domains only register for a year at a time, while others are registered for two, five, or even ten years.
In the latest Google update that SEOs call the Jagger Update, some of the big changes seen were the importance given to age: the age of incoming links, the age of web content, and the date the domain was registered. There were many things, in reality, that were changed in this last update, but since we're talking about the age of a domain, we'll only deal with those issues specifically. We'll talk more in other articles about other factors you will want to be aware of that Google changed in their evaluation criteria of websites on the Internet.
One of the ways Google minimizes search engine spam is by giving new websites a waiting period of three to four months before assigning them any kind of PageRank. This is referred to as the "sandbox effect", because it has been said that Google wants to see if those sites are serious about staying around on the web. The analogy comes from the idea that Google throws all of the new sites into a sandbox and lets them play together, away from all the adults. Then, when those new sites "grow up", so to speak, they are allowed to be categorized with the "adults", the websites that aren't considered new.
What does this mean to you? For those of you with new websites, you may be disappointed in this news, but don't worry. There are some things you can do while waiting for the sandbox period to expire, such as concentrating on your backlink strategies, promoting your site through Pay-per-click, articles, RSS feeds, or in other ways. Many times, if you spend this sandbox period wisely, you'll be ready for Google when it does finally assign you a PageRank, and you could find yourself starting out with a great PageRank!
Even though the domain's age is a factor, critics believe it carries only a little weight in the algorithm. Since the age of your domain is something you have no control over, it doesn't necessarily mean that your site isn't going to rank well in the Search Engine Results Pages (SERPs). It does mean, however, that you will have to work harder to build up your site's popularity and concentrate on factors that you can control, like inbound links and the type of content you present on your website.
So what happens if you change your domain name? Does this mean you're going to get a low grade with a search engine if you have a new site? No, not necessarily. There are a few things you can do to help ensure that your site won't get lost in the SERPs because of the age of the domain.
1. Make sure you register your domain name for the longest amount of time possible. Many registrars allow you to register a domain name for as long as five years, and some even longer. Registering your domain for a longer period of time gives an indication that your site intends to be around for a long time, and isn't going to just disappear after a few months. This will help boost your score with regards to your domain's age.
2. Consider registering a domain name even before you are sure you're going to need it. We see many domains out there that, even while they are registered, don't have a website to go with them. This could mean that the site is in development, or simply that someone saw the use of that particular domain name and wanted to snatch it up before someone else did. There doesn't seem to be any problem with this method so far, so it certainly can't hurt you to buy a domain name you think could be catchy, even if you end up just selling it later on.
3. Think about purchasing a domain name that was already pre-owned. Not only will this allow you to avoid the "sandbox effect" of a new website in Google, but it also allows you to keep whatever PageRank may have already been attributed to the domain. Be aware that most pre-owned domains with PageRank aren't as cheaply had as a new domain, but it might be well worth it to you to invest a bit more money right at the start.
4. Keep track of your domain's age. One of the ways you can determine the age of a domain is with this handy Domain Age Tool. What it does is allow you to view the approximate age of a website on the Internet, which can be very helpful in determining what kind of edge your competitors might have over you, and even what a site might have looked like when it first started.
To use it, simply type in the URL of your domain and the URLs of your competitors, and click submit. This will give you the age of the domains and other interesting information, like anything that had been cached from the site initially. This could be especially helpful if you are purchasing a pre-owned domain.
Because trustworthy sites look to be the wave of the future, factoring in the age of a domain is a good idea. It is not a full measure of how trustworthy a site is or will be: a site that has been around for years may suddenly go belly-up, and the next big eBay or Yahoo! just might be getting its start. This is why many other factors weigh into a search engine's algorithm, and not just a single factor alone. What we do know is that age has become more important than it had been previously, and there are only good things to be said about having a site that's been around for a while.

Ranking in Country Specific Search Engines

In the world of Search Engine Optimization, location is important. Search engines like to bring relevant results to a user, not only in the area of keywords and sites that give the user exactly what they are looking for, but in the correct language as well. It doesn't do a lot of good for a Russian-speaking individual to continually get websites returned in a search query that are written in Arabic or Chinese. So a search engine has to have some way to return the results the user is looking for in the right language, and its goal is also to get the user as close to home as possible in their search results.
Many people wonder why their websites don't rank well in some search engines, especially if they are trying to get ranked in a search engine based in another country. Perhaps they don't even know their site is in another country. You say that is impossible: how could one not know what country their site is in? It might surprise that individual to find that their website might in fact be hosted in a completely different country, perhaps even on another continent!
Consider that many search engines, including Google, will determine country not only based on the domain name (like .co.uk or .com.au), but also the country of a website's physical location based upon IP address. Search engines are programmed with information that tells them which IP addresses belong to which particular country, as well as which domain suffixes are assigned to which countries.
Let's say, for instance, that you are wishing to rank highly in Google based in the United States. It would not do well, then, for you to have your website hosted in Japan or Australia. You might have to switch your web host to one whose servers reside in the United States.
There is a tool we like to use called the Website to Country Tool. It shows you which country your website is hosted in, which can also help you determine a possible reason why your website may not be ranking as highly as you might like in a particular search engine.
It might be disheartening to learn that your website has been hosted in another country, but it is better to understand why your site might not be ranking as highly as you'd like it to be, especially when there is something you can definitely do about it.





Optimization, Over-Optimization or SEO Overkill?


The fight to top search engines' results knows no limits – neither ethical, nor technical. There are frequent reports of sites that have been temporarily or permanently excluded from Google and the other search engines because of malpractice and the use of “black hat” SEO techniques. The reaction of search engines is easy to understand – with so many tricks and cheats in SEO experts' arsenals, the relevancy of returned results would be seriously compromised, to the point where search engines would start to deliver completely irrelevant and manipulated results. And even if search engines do not discover your scams right away, your competitors might report you.

Keyword Density or Keyword Stuffing?

Sometimes SEO experts go too far in their desire to push their clients' sites to top positions and resort to questionable practices, like keyword stuffing. Keyword stuffing is considered an unethical practice because it amounts to using the keyword in question suspiciously often throughout the text. Bearing in mind that the recommended keyword density is 3 to 7%, anything above this – say, 10% – starts to look very much like keyword stuffing, and it is unlikely to go unnoticed by search engines. A text with 10% keyword density can hardly make sense when read by a human. Some time ago Google implemented the so-called “Florida Update”, which essentially imposed a penalty on pages that are keyword-stuffed and over-optimized in general.
Generally, keyword density in the title, the headings, and the first paragraphs matters more. Needless to say, you should be especially careful not to stuff these areas. Try the Keyword Density Cloud tool to check whether your keyword density is within acceptable limits, especially in the above-mentioned places. If you have a high density percentage for a frequently used keyword, consider replacing some occurrences of the keyword with synonyms. Also, words in bold and/or italic are generally considered important by search engines, but if every occurrence of the target keywords is in bold or italic, this looks unnatural too, and at best it will not push your page up.
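Keyword density itself is nothing mysterious: it is simply the share of all words on a page that are the target keyword. The following Python sketch illustrates the arithmetic (the sample text and the exact tokenization rule are assumptions for the demo, not how any particular search engine or the Keyword Density Cloud tool actually tokenizes pages):

```python
import re

def keyword_density(text, keyword):
    """Percentage of words in `text` that exactly match `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)

sample = ("Cats make wonderful pets. A cat needs food, play and rest, "
          "and every cat owner knows this.")
# 2 exact "cat" matches out of 17 words: well above the 7% guideline
print(round(keyword_density(sample, "cat"), 1))  # 11.8
```

Note that under this simple rule the plural “cats” does not count as a match, which is one reason different density checkers report different numbers for the same page.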

Doorway Pages and Hidden Text

Another common keyword scam is doorway pages. Before Google introduced the PageRank algorithm, doorways were a common practice and there were times when they were not considered illegitimate optimization. A doorway page is a page made especially for the search engines: it has no meaning for humans but is used to get high positions in search results and to trick users into coming to the site. Although keywords are still very important, today keywords alone have less effect in determining the position of a site in search results, so doorway pages no longer bring much traffic – and if you use them, don't be surprised when Google penalizes you.
Very similar to doorway pages is a scam called hidden text. This is text which is invisible to humans (e.g. the text color is the same as the page background) but is included in the HTML source of the page, trying to fool search engines into thinking the page is keyword-rich. Needless to say, both doorway pages and hidden text can hardly be qualified as optimization techniques; they are more manipulation than anything else.

Duplicate Content

It is a basic SEO rule that content is king. But not duplicate content. In Google's terms, duplicate content means text that is the same as the text on a different page on the SAME site (or on a sister site, or on a site so heavily linked to the site in question that the two can be presumed related) – i.e. when you copy and paste the same paragraphs from one page on your site to another, you can expect to see your site's rank drop. Most SEO experts believe that syndicated content is not treated as duplicate content, and there are many examples of this; if syndicated content were duplicate content, the sites of news agencies would have been the first to drop out of search results. Still, it does not hurt to check from time to time whether your site shares duplicate content with another, if only because somebody might be illegally copying your content without your knowledge. The Similar Page Checker tool will help you see if you have grounds to worry about duplicate content.

Links Spam

Links are another major SEO tool and, like the other SEO tools, they can be used or misused. While backlinks are certainly important (for Yahoo! the quantity of backlinks matters most, while for Google it is more important which sites the backlinks come from), getting tons of backlinks from a link farm or a blacklisted site is begging to be penalized. Also, if outbound links (links from your site to other sites) considerably outnumber your inbound links (links from other sites to your site), then you have put too much effort into creating useless links, because this will not improve your ranking. You can use the Domain Stats Tool to see the number of backlinks (inbound links) to your site and the Site Link Analyzer to see how many outbound links you have.
Using keywords in links (the anchor text), domain names, folder names and file names does boost your search engine rankings, but again, precise measure is the boundary between topping the search results and being kicked out of them. Say you are optimizing for the keyword “cat”, a frequently chosen keyword for which, as with all popular keywords and phrases, competition is fierce. You might see no alternative for reaching the top but to get a domain name like http://www.cat-cats-kittens-kitty.com, which is no doubt packed with keywords to the maximum – but it is, first, difficult to remember, and second, if the contents do not correspond to the plenitude of cats in the domain name, you will never top the search results.
Although file and folder names are less important than domain names, now and then (but definitely not all the time) you can include “cat” (and synonyms) in them and in the anchor text of links. This counts well, provided that the anchors are not artificially stuffed (for instance, using “cat_cats_kitten” as the anchor for internal site links is certainly stuffing). While you have no control over third parties that link to you with anchors you don't like, it is up to you to perform periodic checks of what anchor text other sites use to link to you. A handy tool for this task is the Backlink Anchor Text Analysis: you enter the URL and get a listing of the sites that link to you and the anchor text they use.
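The core of such an anchor-text check is easy to approximate with Python's standard html.parser module; this sketch collects (href, anchor text) pairs from a page's HTML, with the sample markup invented for the illustration:

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.in_link = False
        self.href = None
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_data(self, data):
        if self.in_link:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.links.append((self.href, "".join(self.text).strip()))
            self.in_link = False

collector = AnchorCollector()
collector.feed('<p>More at <a href="http://example.com/cats">Persian cats</a>.</p>')
print(collector.links)  # [('http://example.com/cats', 'Persian cats')]
```

Run over the pages that link to you, a listing like this quickly shows whether the anchors pointing at your site look natural or stuffed.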
Finally, to Google and the other search engines it makes no difference whether a site is intentionally over-optimized to cheat them or over-optimization is the result of good intentions. So no matter what your motives are, always keep to reasonable practices and remember not to overstep the line.

See Your Site With the Eyes of a Spider

Making efforts to optimize a site is great, but what counts is how search engines see those efforts. While even the most careful optimization does not guarantee top positions in search results, if your site does not follow basic search engine optimization truths, then it is more than certain that it will not score well with search engines. One way to check in advance how your SEO efforts are seen by search engines is to use a search engine simulator.

Spiders Explained

Basically, all search engine spiders function on the same principle – they crawl the Web and index pages, which are stored in a database; various algorithms are later used to determine the ranking, relevancy, etc. of the collected pages. While the algorithms for calculating ranking and relevancy differ widely among search engines, the way they index sites is more or less uniform, and it is very important that you know what spiders are interested in and what they neglect.
Search engine spiders are robots and they do not read your pages the way a human does. Instead, they tend to see only particular things and are blind to many extras (Flash, JavaScript) that are intended for humans. Since spiders determine whether humans will find your site, it is worth considering what spiders like and what they don't.

Flash, JavaScript, Image Text or Frames?!

Flash, JavaScript and image text are NOT visible to search engines, and frames are a real disaster in terms of SEO ranking. All of them might be great in terms of design and usability, but for search engines they are simply wrong. A costly mistake is to have a Flash intro page (frames or no frames, this will hardly make the situation worse) with the keywords buried in the animation. Check a page with Flash and images (and preferably no text or inbound or outbound hyperlinks) with the Search Engine Spider Simulator tool and you will see that to search engines this page appears almost blank.
Running your site through this simulator will show you more than the fact that Flash and JavaScript are not SEO favorites. In a way, spiders are like text browsers: they don't see anything that is not a piece of text. So an image with text in it means nothing to a spider, which will simply ignore it. A workaround (recommended as an SEO best practice) is to include a meaningful description of the image in the ALT attribute of the <IMG> tag, but be careful not to use too many keywords in it, because you risk penalties for keyword stuffing. The ALT attribute is especially essential when you use images rather than text for links. You can also use ALT text to describe what a Flash movie is about, but again, be careful not to cross the line between optimization and over-optimization.
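A crude spider simulator of this kind can be put together with Python's standard html.parser: drop script and style content, keep visible text, and keep the ALT text of images. The sample page below is invented for the demo; real crawlers are of course far more sophisticated:

```python
from html.parser import HTMLParser

class SpiderView(HTMLParser):
    """Keep roughly what a text-only crawler sees: page text plus image ALT text."""
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.chunks.append(alt)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

page = ('<h1>Cat care</h1><script>showIntro();</script>'
        '<img src="cat.jpg" alt="a sleeping cat">')
view = SpiderView()
view.feed(page)
print(" ".join(view.chunks))  # Cat care a sleeping cat
```

Feed it a page whose content lives entirely in Flash or images without ALT text and the output is empty – exactly the “almost blank” page the article describes.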

Are Your Hyperlinks Spiderable?

The search engine spider simulator can be of great help when trying to figure out whether your hyperlinks lead to the right place. For instance, link exchange websites often put fake links to your site with JavaScript (using mouseover events and similar tricks to make the link look genuine), but this is not a link that search engines will see and follow. Since the spider simulator will not display such links, you'll know that something about the link is wrong.
JavaScript-based menus are another problem: they are not spiderable, so the links in them will be ignored. The solution is to duplicate the menu item links inside a <noscript> tag, where spiders can read them. The <noscript> tag can hold a lot, but please avoid using it for link stuffing or any other kind of SEO manipulation.
If you happen to have tons of hyperlinks on your pages (although it is highly recommended to have fewer than 100 hyperlinks on a page), you might have a hard time checking that they are all OK. For instance, if a linked page returns “403 Forbidden”, “404 Page Not Found” or a similar error that prevents the spider from accessing it, it is certain that this page will not be indexed. It is necessary to mention that a spider simulator does not deal with 403 and 404 errors, because it checks where links lead to, not whether the target of the link is in place, so you need other tools to check that the targets of your hyperlinks are the intended ones.

Looking for Your Keywords

While there are specific tools, like the Keyword Playground or the Website Keyword Suggestions, which deal with keywords in more detail, search engine spider simulators also help you see, with the eyes of a spider, where keywords are located in the text of the page. Why is this important? Because keywords in the first paragraphs of a page weigh more than keywords in the middle or at the end. And even if keywords visually appear to us to be at the top, this may not be the way spiders see them. Consider a standard Web page built with tables. In this case the code that describes the page layout (like navigation links or separate cells with text that is the same sitewide) might come first in the HTML, and what is worse, it can be so long that the actual page-specific content sits screens away from the top of the file. When we look at the page in a browser, everything seems fine – the page-specific content is on top – but since in the HTML code it is just the opposite, the page will not be noticed as keyword-rich.

Are Dynamic Pages Too Dynamic to be Seen At All?

Dynamic pages (especially ones with question marks in the URL) are also an extra that spiders do not love, although many search engines do index dynamic pages as well. Running the spider simulator will give you an idea how well your dynamic pages are accepted by search engines. Useful suggestions how to deal with search engines and dynamic URLs can be found in the Dynamic URLs vs. Static URLs article.

Meta Keywords and Meta Description

Meta keywords and meta description, as the name implies, are to be found in the <META> tags of an HTML page. Once, meta keywords and meta descriptions were the single most important criterion for determining the relevance of a page, but now search engines employ alternative mechanisms for determining relevancy, so you can safely skip listing keywords and a description in meta tags (unless you want to add instructions there for the spider about what to index and what not to; apart from that, meta tags are not very useful anymore).

Optimizing for Yahoo!

Back in the dawn of the Internet, Yahoo! was the most popular search engine. When Google arrived, its indisputably precise search results made it the preferred search engine. However, Google is not the only search engine, and it is estimated that about 20-25% of searches are conducted on Yahoo! Another major player on the market is MSN, which means that SEO professionals cannot afford to optimize only for Google but need to take into account the specifics of the other two engines (Yahoo! and MSN) as well.
Optimizing for three search engines at the same time is not an easy task. There were times when the SEO community was inclined to think that the algorithm of Yahoo! was deliberately just the opposite of the Google algorithm, because pages that ranked high in Google did not do so well in Yahoo! and vice versa. The attempt to optimize a site to appeal to both search engines usually led to being kicked out of the top of both of them.
There is no doubt that the algorithms of the two search engines are different. But since both are constantly changing, neither is made publicly available by its authors, and the details about how each algorithm functions are obtained by speculation based on trial tests for particular keywords, it is not possible to say for certain what exactly is different. What is more, given the frequency with which algorithms change, it is not possible to react to every slight change, even if the algorithms' details were known officially. But knowing some basic differences between the two does help to get better rankings. The Yahoo vs Google tool gives a nice visual representation of the differences in positioning between Yahoo! and Google.

The Yahoo! Algorithm - Differences With Google

Like all search engines, Yahoo! spiders the pages on the Web, indexes them in its database and later performs various mathematical operations to produce the pages with the search results. Yahoo! Slurp (the Yahoo! spider) is the second most active crawler on the Web. Yahoo! Slurp is no different from the other bots, and if your page misses important elements of the SEO mix that make it spiderable, then it hardly matters which algorithm is used, because you will never reach a top position. (You may want to try the Search Engine Spider Simulator and check which of your pages are spiderable.)
Yahoo! Slurp might be even more active than Googlebot because occasionally there are more pages in the Yahoo! index than in Google. Another alleged difference between Yahoo! and Google is the sandbox (putting the sites “on hold” for some time till they appear in search results). Google's sandbox is deeper, so if you have made recent changes to your site, you might have to wait a month or two (shorter for Yahoo! and longer for Google) till these changes are reflected in the search results.
With new major changes in the Google algorithm under way (the so-called “BigDaddy” Infrastructure expected to be fully launched in March-April 2006) it's hard to tell if the same SEO tactics will be hot on Google in two months' time. One of the supposed changes is the decrease in weight of links. If this happens, a major difference between Yahoo! and Google will be eliminated because as of today Google places more importance on factors such as backlinks, while Yahoo! sticks more to onpage factors, like keyword density in the title, the URL, and the headings.
Of all the differences between Yahoo! and Google, the way keywords in the title and in the URL are treated is the most important. If you have the keyword in these two places, you can expect a top 10 place in Yahoo!. But beware – a title and a URL cannot be unlimited, and technically you can place no more than 3 or 4 keywords there. It also matters whether the keyword in the title and the URL is in its basic form or a derivative – e.g. when searching for “cat”, URLs with “catwalk” will also be displayed in Yahoo!, but most likely in the second 100 results, while URLs with “cat” alone are quite near the top.
Since Yahoo! is first a directory for submissions and then a search engine (with Google it's just the opposite), a site, which has the keyword in the category it is listed under, stands a better chance to be in the beginning of the search results. With Google this is not that important. For Yahoo! keywords in filenames also score well, while for Google this is not a factor of exceptional importance.
But the major difference is keyword density. The higher the density, the higher the positioning with Yahoo!. But beware – some of the keyword-rich sites on Yahoo! could easily fall into the keyword-stuffed category for Google, so if you attempt to score well on Yahoo! (with keyword density above 7-8%), you risk being banned by Google!

Yahoo! WebRank

Following Google's example, Yahoo! introduced a Web toolbar that collects anonymous statistics about which sites users browse, in this way deriving an aggregated value (from 0 to 10) of how popular a given site is. The higher the value, the more popular the site and the more valuable the backlinks from it.
Although WebRank and positioning in the search results are not directly correlated, there is a dependency between them – sites with high WebRank tend to position higher than comparable sites with lower WebRank and the WebRanks of the top 20-30 results for a given keyword are most often above 5.00 on average.
The practical value of WebRank as a measure of success is often discussed in SEO communities, and the general opinion is that it is not the most relevant metric. However, one benefit of WebRank is that it alerts Yahoo! Slurp that a new page has appeared, thus inviting it to spider the page if it is not already in the Yahoo! Search index.
When the Yahoo! toolbar was launched in 2004, it had an icon that showed the WebRank of the page currently open in the browser. This feature was later removed, but there are still tools on the Web that let you check the WebRank of a particular page. For instance, this tool lets you check the WebRanks of a whole bunch of pages at a time.

Jumping Over the Google Sandbox

It's never easy for newcomers to enter a market, and there are barriers of different kinds. For newcomers to the world of search engines, the barrier is called a sandbox – your site stays there until it is deemed mature enough to be allowed into the Top Positions club. Although there is no direct confirmation of the existence of a sandbox, Google employees have implied it, and SEO experts have seen in practice that new sites, no matter how well optimized, don't rank high on Google, while on MSN and Yahoo! they catch on quickly. For Google, the stay in the sandbox for new sites on new domains is 6 months on average, although it can vary from less than a month to over 8 months.

Sandbox and Aging Delay

While it might be considered unfair to stop new sites by artificial means like keeping them at the bottom of search results, there is a fair amount of reasoning why search engines, and above all Google, have resorted to such measures. With blackhat practices like bulk buying of links, creation of duplicate content or simply keyword stuffing to get to the coveted top, it is no surprise that Google chose to penalize new sites, which overnight get tons of backlinks, or which are used as a source of backlinks to support an older site (possibly owned by the same company). Needless to say, when such fake sites are indexed and admitted to top positions, this deteriorates search results, so Google had to take measures for ensuring that such practices will not be tolerated. The sandbox effect works like a probation period for new sites and by making the practice of farming fake sites a long-term, rather than a short-term payoff for site owners, it is supposed to decrease its use.
Sandbox and aging delay are similar in meaning and many SEO experts use them interchangeably. Aging delay is more self-explanatory – sites are “delayed” till they come of age. Well, unlike in legislation, with search engines this age is not defined and it differs. There are cases when several sites were launched in the same day, were indexed within a week from each other but the aging delay for each of them expired in different months. As you see, the sandbox is something beyond your control and you cannot avoid it but still there are steps you can undertake to minimize the damage for new sites with new domains.

Minimizing Sandbox Damages

While the Google sandbox is not something you can control, there are certain steps you can take to make the sandbox effect less destructive for your new site. As with many aspects of SEO, there are ethical and unethical tips and tricks, and the unethical tricks can get you additional penalties or a complete ban from Google, so think twice before resorting to them. The unethical approaches will not be discussed in this article because they don't comply with our policy.
Before we delve into more detail about particular techniques to minimize sandbox damage, it is necessary to clarify the general rule: you cannot fight the sandbox. The only thing you can do is adapt to it and patiently wait for time to pass. Any attempts to fool Google – from writing melodramatic letters to Google, to using “sandbox tools” to bypass the filter – can only make your situation worse. Still, there are many initiatives you can take while in the sandbox, for example:
  • Actively gather content and good links – as time passes by, relevant and fresh content and good links will take you to the top. When getting links, have in mind that they need to be from trusted sources – like DMOZ, CNN, Fortune 500 sites, or other reputable places. Also, links from .edu, .gov, and .mil domains might help because these domains are usually exempt from the sandbox filter. Don't get 500 links a month – this will kill your site! Instead, build links slowly and steadily.
  • Plan ahead – contrary to the general practice of launching a site when it is absolutely complete, launch a couple of pages when you have them. This will start the clock, and time will run in parallel with your site development efforts.
  • Buy old or expired domains – the sandbox effect is more serious for new sites on new domains, so if you buy old or expired domains and launch your new site there, you'll experience fewer problems.
  • Host on a well-established host – another solution is to host your new site on a subdomain of a well-established host (however, free hosts are generally not a good idea in terms of SEO ranking). The sandbox effect is not so severe for new subdomains (unless the domain itself is blacklisted). You can also host the main site on a subdomain and host just some content on a separate domain, linked with the main site. You can even use redirects from the subdomained site to the new one, although the effect of this practice is questionable because it can also be viewed as an attempt to fool Google.
  • Concentrate on less popular keywords – the fact that your site is sandboxed does not mean that it is not indexed by Google at all. On the contrary, you could be able to top the search results from the very beginning! Looking like a contradiction with the rest of the article? Not at all! You could top the results for less popular keywords – sure, it is better than nothing. And while you wait to get to the top for the most lucrative keywords, you can discover that even less popular keywords are enough to keep the ball rolling, so you may want to make some optimization for them.
  • Rely more on non-Google ways to increase traffic – it is worth remembering that Google is not the only search engine or marketing tool out there. If you plan your SEO efforts to include other search engines, which either have no sandbox at all or keep sites there for a relatively short period, this will also minimize the damage of the sandbox effect.

What is Robots.txt

It is great when search engines frequently visit your site and index your content, but often there are cases when indexing parts of your online content is not what you want. For instance, if you have two versions of a page (one for viewing in the browser and one for printing), you'd rather have the printing version excluded from crawling; otherwise you risk a duplicate content penalty. Also, if you have sensitive data on your site that you do not want the world to see, you will prefer that search engines do not index these pages (although in this case the only sure way of not indexing sensitive data is to keep it offline on a separate machine). Additionally, if you want to save some bandwidth by excluding images, stylesheets and JavaScript from indexing, you also need a way to tell spiders to keep away from these items.
One way to tell search engines which files and folders on your Web site to avoid is the Robots metatag. But since not all search engines read metatags, the Robots metatag can simply go unnoticed. A better way to inform search engines of your wishes is to use a robots.txt file.

What Is Robots.txt?

Robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines, but generally search engines obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall, or a kind of password protection); putting up a robots.txt file is something like putting a note “Please do not enter” on an unlocked door – you cannot prevent thieves from coming in, but the good guys will not open the door and enter. That is why we say that if you have really sensitive data, it is naïve to rely on robots.txt to protect it from being indexed and displayed in search results.
The location of robots.txt is very important. It must be in the main directory because otherwise user agents (search engines) will not be able to find it – they do not search the whole site for a file named robots.txt. Instead, they look first in the main directory (i.e. http://mydomain.com/robots.txt) and if they don't find it there, they simply assume that this site does not have a robots.txt file and therefore they index everything they find along the way. So, if you don't put robots.txt in the right place, do not be surprised that search engines index your whole site.
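In other words, for any page URL a crawler derives one fixed location to look in. A quick sketch with Python's standard urllib.parse shows the derivation (the domain is a placeholder):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """robots.txt lives at the root of the host, whatever the page's path is."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://mydomain.com/some/deep/folder/page.html"))
# http://mydomain.com/robots.txt
```

A robots.txt placed at http://mydomain.com/some/deep/folder/robots.txt would simply never be consulted.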
The concept and structure of robots.txt were developed more than a decade ago. If you are interested in learning more, visit http://www.robotstxt.org/ or go straight to the Standard for Robot Exclusion, because in this article we will deal only with the most important aspects of a robots.txt file. Next we will continue with the structure of a robots.txt file.

Structure of a Robots.txt File

The structure of a robots.txt file is pretty simple (and not very flexible) – it is a list of user agents and disallowed files and directories. Basically, the syntax is as follows:
User-agent:
Disallow:
“User-agent:” names the search engine crawler a record applies to, and “Disallow:” lists the files and directories to be excluded from indexing. In addition to “User-agent:” and “Disallow:” entries, you can include comment lines – just put the # sign at the beginning of the line:
# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/

The Traps of a Robots.txt File

When you start making complicated files – i.e. you decide to allow different user agents access to different directories – problems can start, if you do not pay special attention to the traps of a robots.txt file. Common mistakes include typos and contradicting directives. Typos are misspelled user-agents, directories, missing colons after User-agent and Disallow, etc. Typos can be tricky to find but in some cases validation tools help.
The more serious problem is with logical errors. For instance:
User-agent: *
Disallow: /temp/
User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/
The above robots.txt is meant to allow all agents to access everything on the site except the /temp directory, while imposing more restrictive terms on Googlebot. The trap is in how records are applied: according to the Standard for Robot Exclusion, a crawler obeys the single record that matches its name, and falls back to the “*” record only if no specific record matches – rules from different records are never merged. So Googlebot obeys only its own record (and it is lucky that /temp/ is repeated there); had /temp/ been listed only under “*”, Googlebot would have been free to crawl it, contrary to what the file's author probably intended. You see, the structure of a robots.txt file is simple, yet serious mistakes can be made easily.
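Rather than reasoning about such cases by hand, you can check how a parser will interpret a record set with Python's standard urllib.robotparser module, which implements the exclusion standard:

```python
from urllib import robotparser

rules = """
User-agent: *
Disallow: /temp/

User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Googlebot is governed by its own record, not the "*" record
print(rp.can_fetch("Googlebot", "/images/logo.gif"))  # False
print(rp.can_fetch("Googlebot", "/temp/x.html"))      # False
# any other crawler falls back to the "*" record
print(rp.can_fetch("SomeBot", "/images/logo.gif"))    # True
print(rp.can_fetch("SomeBot", "/temp/x.html"))        # False
```

The crawler names and paths here are made up for the demo, but the precedence behavior is the same one real robots.txt consumers follow.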

Tools to Generate and Validate a Robots.txt File

Bearing in mind the simple syntax of a robots.txt file, you can always read through it to see if everything is OK, but it is much easier to use a validator, like this one: http://tool.motoricerca.info/robots-checker.phtml. These tools report common mistakes, like missing slashes or colons, which, if undetected, compromise your efforts. For instance, if you have typed:
User agent: *
Disallow: /temp/
this is wrong because there is no hyphen between “User” and “agent” and the syntax is incorrect.
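The most common slips of this kind can be caught with a few lines of code. The checker below is a simplified sketch, not a full implementation of the standard – the list of accepted directives is an assumption for the example:

```python
import re

# Directives this toy checker recognises (a simplification of the standard)
KNOWN_FIELDS = r"(?i)^(user-agent|disallow|allow|sitemap|crawl-delay)\s*:"

def robots_syntax_errors(text):
    """Return (line number, line) pairs that are not blanks, comments,
    or recognised 'Field: value' directives."""
    bad = []
    for number, line in enumerate(text.splitlines(), 1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not re.match(KNOWN_FIELDS, stripped):
            bad.append((number, stripped))
    return bad

# "User agent" (missing hyphen) is flagged; "Disallow: /temp/" passes
print(robots_syntax_errors("User agent: *\nDisallow: /temp/"))
# [(1, 'User agent: *')]
```

A real validator also checks the values and the record structure, but even a line-shape check like this catches the missing-hyphen and missing-colon typos described above.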
In those cases when you have a complex robots.txt file – i.e. you give different instructions to different user agents, or you have a long list of directories and subdirectories to exclude – writing the file manually can be a real pain. But do not worry – there are tools that will generate the file for you. What is more, there are visual tools that let you point and click to select the files and folders to be excluded. But even if you do not feel like buying a graphical tool for robots.txt generation, there are online tools to assist you. For instance, the Server-Side Robots Generator offers a dropdown list of user agents and a text box for you to list the files you don't want indexed. Honestly, it is not much help, unless you want to set specific rules for different search engines, because in any case it is up to you to type the list of directories – but it is better than nothing.

Bad Neighborhood

Have you ever had a perfectly optimized site, with lots of links and content and the right keyword density, that still did not rank high in search engines? Probably every SEO has experienced this. The reasons for this kind of failure can be quite diverse – from the sandbox effect (your site just needs time to mature), to overoptimization, to inappropriate online relations (i.e. the so-called “bad neighborhood” effect).
While there is not much you can do about the sandbox effect but wait, in most other cases it is up to you to counteract the negative effects you are suffering from. You just need to figure out what is stopping you from achieving the rankings you deserve. Careful analysis of your site, and of the sites that link to you, can give you ideas about where to look for the source of trouble and how to deal with it. If it is overoptimization, remove the excessive stuffing; if it is bad neighbors, say “goodbye” to them. We have already dealt with overoptimization as an SEO overkill, and in this article we will have a look at another frequent rankings killer.

Link Wisely, Avoid Bad Neighbors

It is a known fact that one of the most important factors for high rankings, especially with Google, is links. The Web is woven out of links, and both inbound and outbound links are perfectly natural. Generally, the more inbound links you have (i.e. other sites linking to you), the better. Conversely, having many outbound links is not very good, and what is worse, it can be disastrous if you link to the wrong places – i.e. bad neighbors. The concept is hardly difficult to comprehend, as it is so similar to real life: if you choose outlaws and bad guys for friends, you are considered to be one of them.
It might look unfair to be penalized for things you have not done, but for search engines, linking to a site with a bad reputation is close to a crime, and by linking to such a site you can expect to be penalized as well. And in a way it is fair, because search engines do penalize sites that use tricks to manipulate search results; in order to guarantee the integrity of those results, they cannot afford to tolerate unethical practices.
However, search engines do tend to be fair and do not punish you for things that are out of your control. If you have many inbound links from suspicious sites, this will not be regarded as malpractice on your part, because it is generally their webmasters, not you, who placed those links. So inbound links, no matter where they come from, cannot harm you. But if, in addition to the inbound links, you have a considerable number of outbound links to such sites, you are in a sense voting for them. Search engines consider this malpractice, and you will get punished.

Why Do Some Sites Get Labelled as Bad Neighbors?

We have already mentioned in this article some of the practices that give search engines a reason to ban particular sites. But the “sins” are not limited to running a spam domain. Generally, sites get blacklisted because they try to boost their rankings with illegal techniques such as keyword stuffing, duplicate content (or a lack of any original content), hidden text and links, doorway pages, deceptive titles, machine-generated pages, copyright violations, etc. Search engines also tend to dislike meaningless link directories that create the impression of being topically arranged, so if you have a fat links section on your site, double-check what you link to.

Figuring Out Who's Good, Who's Not

Probably the question popping up now is: “But since the Web is so vast and so constantly changing, how can I know who is good and who is bad?” Well, you don't have to know every site on the black list, even if that were possible. The black list itself changes all the time, but it looks like there will always be companies and individuals eager to earn some cash by spamming, disseminating viruses and porn, or simply performing fraudulent activities.
The first check to perform, when you suspect that some of the sites you link to are bad neighbors, is to see whether they are included in the indices of Google and the other search engines. Type “site:siteX.com”, where “siteX.com” is the site you are checking, and see if Google returns any results from it. If it returns nothing, chances are that the site is banned from Google, and you should immediately remove any outbound links to siteX.com.
If you have outbound links to many different sites, such checks can take a lot of time. Fortunately, there are tools that can help you with this task. The CEO of Blackwood Productions has recommended http://www.bad-neighborhood.com/ as one of the reliable tools that report links to and from suspicious sites, as well as sites missing from Google's index.
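Before you can run the “site:” check on each neighbor, you need the list of external domains your pages link to. As a minimal sketch (the function names and sample HTML here are illustrative, not part of any tool mentioned above), a short Python script using only the standard library can collect them:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collects the href target of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def outbound_domains(html, own_domain):
    """Return the set of external domains linked from the page.

    Relative links and links to own_domain are ignored, since only
    outbound links matter for the bad-neighborhood check."""
    parser = LinkCollector()
    parser.feed(html)
    domains = set()
    for link in parser.links:
        host = urlparse(link).netloc
        if host and host != own_domain:
            domains.add(host)
    return domains

# Sample page: one internal link, two outbound links
page = '''<a href="http://siteX.com/page">partner</a>
<a href="/about.html">about us</a>
<a href="http://example.org/">example</a>'''

for domain in sorted(outbound_domains(page, "mysite.com")):
    print(domain)  # check each printed domain with a site: query
```

Each domain this prints is a candidate for a manual “site:domain” query in Google; any domain that returns no results deserves a closer look.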

Optimizing Flash Sites

If there is one really hot potato that divides SEO experts and Web designers, it is Flash. Undoubtedly a great technology for including sound and pictures on a Web site, Flash movies are a real nightmare for SEO experts. The reason is pretty prosaic: search engines cannot index (or at least cannot easily index) the contents of a Flash file, and unless you feed them the text inside a Flash movie, you can simply count that text as lost for boosting your rankings. Of course, there are workarounds, but until search engines start indexing Flash movies as if they were plain text, these workarounds remain a clumsy way to optimize Flash sites – although they are certainly better than nothing.

Why Do Search Engines Dislike Flash Sites?

Search engines dislike Flash Web sites not because of their artistic qualities (or the lack thereof) but because Flash movies are too complex for a spider to understand. Spiders cannot index a Flash movie directly, as they do a plain page of text; they index filenames (and you can find tons of those on the Web), but not the contents inside.
Flash movies come in a proprietary binary format (.swf), and spiders cannot read the insides of a Flash file, at least not without assistance. And even with assistance, do not count on spiders crawling and indexing all your Flash content. This is true for all search engines. There might be differences in how search engines weigh page relevancy, but in their approach to Flash, at least for the time being, search engines are united: they dislike it, though they do index portions of it.

What (Not) to Use Flash For?

Despite the fact that Flash movies are not spider favorites, there are cases when a Flash movie is worth the SEO effort. But as a general rule, keep Flash movies to a minimum. In this case less is definitely better, and search engines are not the only reason. First, Flash movies, especially banners and other kinds of advertisement, distract users, who generally tend to skip them. Second, Flash movies are fat: they consume a lot of bandwidth, and although the dialup days are over for the majority of users, a 1 Mbit or better connection is still not standard for everyone.
Basically, designers should stick to the principle that Flash is good for enhancing a story, not for telling it – i.e. you have some text with the main points of the story (and the keywords you are optimizing for), and then you have the Flash movie to add further detail or a visual representation of the story. In that connection, the greatest SEO sin is to have the whole site made in Flash! That is simply unforgivable – do not even dream of high rankings!
Another “no” is using Flash for navigation. This applies not only to the starting page, where it was once fashionable to splash a gorgeous Flash movie, but to the navigation throughout the site as well. Although it is a more common mistake to use images and/or JavaScript for navigation, Flash banners and movies must not be used to lead users from one page to another. Text links are the only SEO-approved way to build site navigation.
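As a minimal sketch of what SEO-approved navigation looks like (the page names are purely illustrative), plain text links like these are all a spider needs to crawl a site:

```html
<!-- Plain text links: fully crawlable, no Flash or JavaScript required -->
<ul>
  <li><a href="index.html">Home</a></li>
  <li><a href="products.html">Products</a></li>
  <li><a href="about.html">About Us</a></li>
  <li><a href="contact.html">Contact</a></li>
</ul>
```

The list can still be styled attractively with CSS; what matters is that the links themselves are ordinary anchors with real href targets.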

Workarounds for Optimizing Flash Sites

Although a workaround is not a solution, Flash sites still can be optimized. There are several approaches to this:
  • Input metadata
This is a very important approach, although it is often underestimated and misunderstood. Metadata is no longer as important to search engines as it used to be, but Flash development tools make it easy to add metadata to your movies, so there is no excuse for leaving the metadata fields empty.
  • Provide alternative pages
For a good site it is a must to provide HTML-only pages that do not force the user to watch the Flash movie. Preparing these pages requires more work, but the reward is worth it, because search engines as well as users will see the HTML-only pages.
  • Flash Search Engine SDK
This is the life belt – the most advanced tool for extracting text from a Flash movie. One of the handiest applications in the Flash Search Engine SDK is the tool named swf2html. As its name implies, this tool extracts text and links from a Macromedia Flash file and writes the output into a standard HTML document, saving you the tedious job of doing it manually.
However, you still need to look over the extracted contents and correct them if necessary. For example, the order in which the text and links are arranged might need a little restructuring to put the keyword-rich content in the title and headings, or at the beginning of the page.
Also, you need to check that there is no duplicate content among the extracted sentences and paragraphs. The font color of the extracted text is another issue: if it is the same as the background color, you will run into hidden-text territory.
  • SE-Flash.com
Here is a tool that visually shows which parts of your Flash files are visible to search engines and which are not. It is very useful even if you already have the Flash Search Engine SDK installed, because it provides one more check of the accuracy of the extracted text. Besides, it is not certain that Google and the other search engines use the Flash Search Engine SDK to get contents from a Flash file, so this tool might give completely different results from those the SDK produces.
These approaches are just some of the most important ways to optimize Flash sites; there are many others as well. However, not all of them are brilliant and clean – some sit on the boundary of ethical SEO, e.g. creating invisible layers of text that are delivered to spiders instead of the Flash movie itself. Although this technique is not wrong per se – there is no duplicate or fake content – it is very similar to cloaking and doorway pages, and it is better to avoid it.

Your Website: From Google Banned to Google Unbanned

Even if you are not looking for trouble and do not violate any known Google SEO rule, you still might experience the ultimate SEO nightmare: being excluded from Google's index. Although Google is something of a monopolist among search engines, it is not a bully that excludes innocent victims for pure pleasure. Google keeps rigorously to SEO best practices and excludes only sites that misbehave.
If you own and run a blog or website, being listed by Google is a very important step toward being read by as many people as possible. But what if your website gets banned by Google? If this has happened to you, then you know how much it hurts your site: you won't show up in Google's results, and that means less traffic. Getting unbanned from Google is a long, drawn-out process, and sometimes Google won't even tell you why it banned your website in the first place, which doesn't make things any easier.
Some of the ways a site can get banned by Google include having spam on it, stuffing in so many keywords that they clog up your site, making the URLs you own redirect to one another, improperly configuring a robots.txt file, duplicating your own pages and sending people to them over and over, and linking to bad sites such as those with adult content, gambling, or other unauthorized areas. There are many other possible reasons, so it's a good idea to try to get Google to tell you why you were banned; that will make it much simpler to fix the problem. Over-optimization has many faces, and you can have a look at the “Optimization, Over-Optimization or SEO Overkill?” article to get an idea of the practices you should avoid.
Here are the steps you need to follow to get Google to reconsider your site and unban it. Be sure to follow Google's reconsideration request process precisely and correctly if you want your website unbanned and your site back in business, providing whatever products or services it offers:

1 Send a Google Reconsideration Request to Get Unbanned

Getting your website reincluded in Google requires submitting a Google reconsideration request. First, one sign that your site has been banned is that it suddenly has no PageRank. Then, to determine for sure that this is the case, search Google for your site (e.g. for www.yoursite.com, using your own site's name instead of “yoursite”). If you don't see any of your pages there, then it's likely you were banned.
Another way to tell whether you are truly banned is to see if your pages show up in Google's index at all. Or, if you run a news blog, go to Google News (news.google.com); if you don't see your articles there, you will also know you were probably banned from Google and now need to send a reconsideration request.

2 Be Polite to Google

Next, remember that you are sending your reconsideration request to a real person who works for Google, and someone at Google will actually read it. Therefore you want to be polite and go into as much detail as possible; in this situation it is better to give too much information than not enough. Being nice counts here, and if you act like a jerk, it's likely no one will want to help you.

3 Provide Information about the Domain

List things such as whether it is a brand-new domain name, give some background about your website, and also mention the rules you think you may have broken. If there have been spam clicks on your account, gather the evidence and write to them about it. This shows them you are serious about resolving the problem. Put down everything someone would need to know in order to understand who you are, and to jog their memory on why you were banned in the first place. Be sure to do your research so you understand what is going on and can fully explain it to the Google representatives in your request.

4 Explain the Solution to the Past Problem

In your reconsideration request, tell the representative what you have already done to fix the problem that caused the ban. Spell it out in detail and give them the actual page URLs to prove it. It's best to give as much information and data as you can so they understand what you did to solve the issue. For example, if your site linked to bad sites, make sure you have removed every one of those links. Be sure to have removed all spam, or anything else Google doesn't approve of, and prove to Google that you did this by showing them the evidence. Or, if you had invalid clicks, which is one of the common reasons for a ban, show why the clicks were in fact valid. It takes this sort of detailed information to make them understand the situation and help you resolve it. Also, ensure that the changes you have made to your website meet the requirements for reinclusion, and don't do a single thing on your website that might annoy them.

5 Verify the Website

Next, log in to your Google Webmaster account and add and verify your site. Then go to http://www.google.com/webmasters/tools/reconsideration. This is where you submit your reconsideration request to be unbanned. You can also send the information in an email to help@google.com, where Google representatives give support. You may also have to sign up for Google Webmaster Tools once you are logged into your account, if you don't already have it.

6 Provide Proof

It's never a good idea to try to blame Google, or to claim you had no idea what you did wrong. You need real proof for reinclusion, not blame-shifting or playing dumb. Show them proof of the changes you made. And finally, always be considerate and thank them for the time and effort they are taking to look into your request, help you solve the problems, and get you unbanned, so your site can be relisted and keep getting the traffic you need to run your business, blog, or news site.

7 Be Patient

It can take several weeks for a Google representative to get back to you and answer your reinclusion request. They have a lot of other things to handle, and you need to understand that you aren't the only one having issues. While you are waiting, continue to look over your site and make sure all the alleged violations are fixed and good to go.

8 Send a Follow-Up Email for Google Reconsideration

Be sure to send follow-up emails to Google asking how the request is going and whether they know when the situation will be resolved. You probably shouldn't send one every day, as that could make you look like a pest, but do send one periodically until you get an answer you understand and can act on to solve the problem.
All in all, it can be a time-consuming and complicated process to take your site from banned to unbanned, but with the proper preparation and information you should be well on your way to being in Google's good graces again. It is well worth the effort, so just follow these steps and Google should get back to you and fix your situation and your site.
