Free Web tutorials covering HTML, CSS, JavaScript, and DHTML from beginner to advanced. Free downloads and developer resources. Personalized help via email, form, and chat.

<Code_Punk>'s

How Search Engines Work

What Are Search Engines?

We've all used a search engine, like Google or Yahoo!, at one time or another. These sites are just large databases of web pages organized by text strings.

Web pages are scanned by a "spider" and sent to the search engine's database to be "filtered" for key text strings. This tutorial gives a brief overview of the spidering and filtering process. You need this knowledge to optimize your pages for high search engine rankings.

When you type search words, called a "query", into a search engine, it goes through its database of web pages and text strings to find matches for your query. The more closely a page's key strings (as determined by the search engine's filters) match your query, the more "relevance" the page has to that query and the higher it's listed. So the most relevant pages appear near the top of the list of links the search engine presents to the viewer.
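To make "relevance" a little more concrete, here is a toy sketch in Python. It assumes relevance is simply the number of distinct query words found among a page's key strings. Real engines use far more elaborate and unpublished scoring. The `relevance` and `rank` functions and the example URLs are hypothetical.

```python
def relevance(query: str, page_strings: list[str]) -> int:
    """Toy score: how many distinct query words appear in a page's key strings."""
    page_words = {word.lower() for s in page_strings for word in s.split()}
    return len(set(query.lower().split()) & page_words)


def rank(query: str, index: dict[str, list[str]]) -> list[str]:
    """Return URLs ordered from most to least relevant to the query."""
    return sorted(index, key=lambda url: relevance(query, index[url]), reverse=True)


# Hypothetical index: URL -> key strings the engine kept for that page.
index = {
    "https://example.com/about": ["About us"],
    "https://example.com/js": ["Free JavaScript tutorial"],
    "https://example.com/css": ["Free CSS tutorial", "Learn stylesheets"],
}
print(rank("free css tutorial", index))
# ['https://example.com/css', 'https://example.com/js', 'https://example.com/about']
```

The CSS page matches all three query words, so it ranks first. The JavaScript page matches two, and the About page matches none.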

The importance of a good search engine ranking for your page's field (the queries it could match) should be apparent. Viewers try the first links first. Some 90% of all viewers will not go past the first page of links a search engine presents. So, if your page isn't listed in the top 10-50 for your field, you won't get much traffic from search engines.

There is a lot of crap continuously going around about how to make your pages rank highly in search engines. Most of this advice is either outdated or outright bull. The search engines themselves start a lot of these rumors to throw developers off track.

There is no "secret recipe" to place a site at the top of any given search engine query. There is no service that can place you at the top of the list for specific queries.

Think about it: most of these companies offer the same top-ten ranking to 100 similar sites. How can they all fit in the top 10 for a specific query? They can't. It just doesn't add up. These companies are frauds and should be avoided at all costs.

Other companies run search engines and will offer to list you for a fee. The bigger the fee, the higher you'll place. This is okay except for the fact that no one uses those search engines. The big, legitimate search engines don't offer "for pay" listings. Yahoo! thought about it, but I think they gave up on it. It would ruin their engine and their traffic.

Save your money. You can learn to code a page that's optimized to get you a higher than average placement for free. And, "higher than average" is all any web developer can honestly hope for.

Spidering

The spider's goal is to determine what the "real" content of your page is and send back strings of text that represent your site in the filtering process. The spider visits site URLs it gets from at least three sources:

Site Submissions -- These are direct submissions to the search engine, asking it to spider your page. There are also "automated" submitters, but they don't work, because the biggest and best search engines block them.

Links -- Spiders follow links on the pages they're reading. If your link is on one of those pages, your site will be spidered, too. This is another good argument for the power of reciprocal links.

Other Databases -- The big search engines will swap their databases of URLs. Spiders also look for large databases of URLs when spidering a site.
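Here is a minimal sketch of how a spider might merge those three sources into one queue of URLs to visit. It uses a breadth-first crawl. The link graph is a hypothetical in-memory dictionary that stands in for downloading pages and reading their links. The function name and example hosts are made up for illustration.

```python
from collections import deque


def crawl_order(submissions, swapped_urls, links):
    """Return the order in which a toy spider visits URLs (breadth-first).

    submissions:  URLs submitted directly to the search engine
    swapped_urls: URLs received from another engine's database
    links:        dict mapping each URL to the URLs its page links to; a
                  stand-in for downloading the page and reading its links
    """
    frontier = deque()
    seen = set()
    for url in submissions + swapped_urls:
        if url not in seen:
            seen.add(url)
            frontier.append(url)

    order = []
    while frontier:
        url = frontier.popleft()
        order.append(url)
        for target in links.get(url, []):  # follow links found on the page
            if target not in seen:  # never queue the same page twice
                seen.add(target)
                frontier.append(target)
    return order


links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["a.example", "d.example"],
}
print(crawl_order(["a.example"], ["e.example"], links))
# ['a.example', 'e.example', 'b.example', 'c.example', 'd.example']
```

In this example, `d.example` is never submitted. The spider only finds it by following a link from `b.example`, which is the point made above about links.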

Once a spider gets to your site, it starts "reading" your page's source code. Spiders are very busy little critters and can't read whole pages. A spider has a little program in it that acts like a pre-filter that limits it to reading certain portions of your code.

Spiders seem particularly fond of text strings found in the following areas:

The page's <title>

The CONTENT attribute of <meta> tags whose NAME is "description" or "keywords". Don't be fooled by the old line, "Search engines don't read <meta> tags anymore." They sure do. We'll see what they do with this data in the next section.

Header tags, like <h2> and <h3>. Spiders usually snag the text inside the three largest header tags, <h1> through <h3>.

The beginning of each <p> tag's content, usually just a few dozen characters right after the <p> tag.

The ALT attribute in <img> tags.

The logic behind what text a spider collects is pretty straightforward. The search engine wants to determine which strings represent the true nature of your web page's content, and it figures that the strings you put in the places above are good indicators of that content.
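The pre-filter described above can be sketched with Python's standard `html.parser` module. This is an illustration under stated assumptions, not any engine's actual spider. It keeps the title, the CONTENT of description and keywords <meta> tags, text inside <h1> through <h3>, the first 40 characters of each paragraph, and image ALT text. The 40-character cutoff is a hypothetical stand-in for "a few dozen". The sketch also assumes every <p> and header tag is explicitly closed.

```python
from html.parser import HTMLParser


class SpiderPreFilter(HTMLParser):
    """Collect only the strings spiders are said to favor."""

    HEADERS = {"h1", "h2", "h3"}
    P_CHARS = 40  # assumption: "a few dozen characters" after each <p>

    def __init__(self):
        super().__init__()
        self.strings = []
        self._capture = None  # tag whose text is currently being collected
        self._buffer = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("description", "keywords"):
            self.strings.append(attrs.get("content") or "")
        elif tag == "img" and attrs.get("alt"):
            self.strings.append(attrs["alt"])
        elif tag in ("title", "p") or tag in self.HEADERS:
            self._capture, self._buffer = tag, ""

    def handle_data(self, data):
        if self._capture:  # text outside the watched tags is ignored
            self._buffer += data

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = " ".join(self._buffer.split())  # normalize whitespace
            if tag == "p":
                text = text[: self.P_CHARS]  # keep only the paragraph's start
            self.strings.append(text)
            self._capture = None


page = """<html><head><title>Free CSS Tutorial</title>
<meta name="description" content="Learn CSS from scratch.">
<meta name="keywords" content="css, stylesheet, tutorial">
</head><body>
<h2>Selectors</h2>
<p>Selectors tell the browser which elements a rule applies to, and they come in many flavors.</p>
<img src="box.png" alt="CSS box model diagram">
<div>Sidebar text the spider skips</div>
</body></html>"""

spider = SpiderPreFilter()
spider.feed(page)
print(spider.strings)
```

The sidebar text in the <div> never reaches `spider.strings`, because it sits outside every area the pre-filter watches.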

Spiders store these strings as a text file in the search engine's database. These files are then filtered to determine what your site is about and which queries it should rank highly for.

Filtering

Filtering is a search engine's attempt to do two things: first, to match your site with words and phrases; second, to determine whether you are trying to "trick" it.

The filter analyzes the text file returned by the spider. It finds individual words by looking at the characters between spaces. It finds phrases in a similar way, for example by taking the text between a pair of header tags such as <h2> and </h2>. These are the strings the engine identifies as legitimate content.
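Here is a rough sketch of that first filtering step. Words are the characters between spaces, and each header's full text is kept as one phrase. Stripping punctuation and lowercasing are extra assumptions of this sketch, not something described above. The function names are hypothetical.

```python
PUNCTUATION = ".,;:!?\"'()"


def split_words(text: str) -> list[str]:
    """Split text into words: the characters between spaces."""
    words = []
    for token in text.split():  # split() breaks on any run of whitespace
        # Assumption: punctuation and letter case are ignored.
        token = token.strip(PUNCTUATION).lower()
        if token:
            words.append(token)
    return words


def split_phrases(header_texts: list[str]) -> list[str]:
    """Keep each header's whole text as one phrase, with whitespace normalized."""
    return [" ".join(text.lower().split()) for text in header_texts]


print(split_words("Free CSS Tutorial: Learn stylesheets!"))
# ['free', 'css', 'tutorial', 'learn', 'stylesheets']
print(split_phrases(["Selectors  and   Rules"]))
# ['selectors and rules']
```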

If the same words occur in the same order in too many places in the spider's text file, your page will be considered fraudulent and probably not listed by the search engine. The same holds true if the strings in your <meta> tags don't match the other strings the filter has identified as legitimate content.

This is why the best search engines read <meta> tags. The <meta> tags give them a baseline against which to measure the true content and veracity of the page. If the <meta> data and the content strings either don't match up, or match up exactly too often, your page is scrapped.
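The two red flags described above can be sketched as simple checks. This is a toy model with hypothetical thresholds:

- A run of `n` words in the same order that appears more than `max_repeats` times counts as stuffing.
- A page whose <meta> keywords mostly don't appear in its content counts as a mismatch.

No search engine publishes its actual rules, so treat the numbers and function names as placeholders. The sketch expects lowercase content words.

```python
from collections import Counter


def repeated_sequences(words, n=3, max_repeats=3):
    """Return every n-word sequence that appears more than max_repeats times."""
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [" ".join(gram) for gram, count in grams.items() if count > max_repeats]


def keyword_overlap(meta_keywords, words):
    """Fraction of comma-separated <meta> keywords that also occur in the content."""
    keywords = [k.strip().lower() for k in meta_keywords.split(",") if k.strip()]
    if not keywords:
        return 0.0
    content = " " + " ".join(words) + " "  # pad so keywords match whole words only
    return sum(1 for k in keywords if f" {k} " in content) / len(keywords)


def looks_suspicious(meta_keywords, words, min_overlap=0.5):
    """Flag keyword stuffing or <meta> keywords that don't match the content."""
    return bool(repeated_sequences(words)) or keyword_overlap(meta_keywords, words) < min_overlap


honest = "learn css selectors with free css examples".split()
stuffed = ("free css tutorial " * 5).split()

print(looks_suspicious("css, selectors", honest))      # False
print(looks_suspicious("free css tutorial", stuffed))  # True
```

The stuffed page trips the repetition check even though its <meta> keywords match its content perfectly. This mirrors the point that matching "exactly too often" is as bad as not matching at all.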

Abuses of the <meta> tag and copying keywords all over a page are old hat. Not only will they not help your pages' rankings, they may well kill them.

It's important to remember that having no search engine optimization code at all is better than the bad optimization code commonly used by contemporary web developers.

All of the companies claiming they'll boost your search engine rankings use obsolete techniques that will do more harm than good in the majority of cases.

Summary

In short, a search engine is merely a database of web sites that are sorted according to keywords and key phrases.

A search engine sends out a "spider" to look at your pages' source code. The spider sends certain strings of text back to the search engine's database. A "filter" will further analyze these strings to determine your site's relevance to various keywords and phrases.

Abuses of the <meta> tag and other tricks have caused search engines to become very sophisticated in spidering and filtering your source code to determine actual content and nature of your page.

If a search engine thinks your code is trying to trick it, your page will either not be listed or will be listed near the bottom for all relevant queries. So if you don't know what you're doing, it's better to use no search engine trickery at all. Better still, learn how to optimize your page for search engines properly.