Hi everyone. In this post I am going to explain what a search engine is and how it works, covering document processing, query processing, the search-and-match function, and the capability for ranking.
A search engine is an information system designed to help find information stored on a computer system. It helps to minimize the time required to find information and the amount of information that must be searched. Information retrieval is the science of searching for information in documents, searching for metadata, searching for documents themselves, or searching within databases. The most popular form of search engine is the web search engine, which searches for information on the World Wide Web. Other kinds include enterprise search engines, which search an intranet, and personal search engines.
Working of Search Engine
A search engine matches queries against an index that it creates. The index consists of the words present in each document, plus pointers to their locations within the document. A search engine works in four essential steps, which are as follows,
(a) Processing a document.
(b) Processing a query.
(c) Function of search and match.
(d) Capability for ranking.
(a) Processing a Document
The document processor prepares, processes and inputs the documents, pages or sites that users search against. The document processor performs some or all of the following steps,
(i) Normalizes the document stream to a predefined format.
(ii) Breaks the document stream into desired retrievable units.
(iii) Isolates and metatags subdocument pieces.
(iv) Identifies potential indexable elements in documents.
(v) Deletes stop words.
(vi) Extracts index entries.
(vii) Computes weights.
(viii) Creates and updates the main inverted file against which the search engine searches in order to match queries to documents.
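To make the steps above a bit more concrete, here is a minimal sketch of a document processor that normalizes text, breaks it into tokens, deletes stop words and builds a small inverted file. The stop list, the `build_inverted_index` name and the sample documents are my own assumptions for illustration; real engines do much more (metatagging, weighting and so on).

```python
import re
from collections import defaultdict

# Hypothetical stop list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on"}

def build_inverted_index(documents):
    """Map each non-stop word to {doc_id: [positions in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        # Normalize to lowercase and break the stream into word tokens.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for position, token in enumerate(tokens):
            if token in STOP_WORDS:
                continue  # stop words are not indexed
            index[token].setdefault(doc_id, []).append(position)
    return dict(index)

docs = {
    1: "The cat sat on the mat",
    2: "A dog chased the cat",
}
index = build_inverted_index(docs)
print(index["cat"])  # {1: [1], 2: [4]}
```

Each entry holds the word plus pointers to where it occurs, which is the shape of index described earlier in the post.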
(b) Processing a Query
Query processing has seven steps. The more steps and the more documents involved, the more expensive processing becomes in terms of resources and responsiveness.
Steps for query processing are as given below,
1. Tokenize query terms.
2. Recognize query terms and operators (parsing).
4. Delete stop words.
5. Stem words.
6. Create query representation.
7. Expand query term.
8. Compute weights.
The matching step can be carried out after step 2, 4, 5 or 6, or after steps 7 and 8.
Step 1: Tokenize Query Term
As soon as the user enters a query, the search engine tokenizes the query stream, i.e., breaks it down into understandable segments.
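As a rough illustration of tokenizing, the sketch below lowercases the query and splits it into alphanumeric segments. The `tokenize` name and the choice of splitting on anything that is not a letter or digit are assumptions; real engines have their own tokenization rules.

```python
import re

def tokenize(query):
    """Split a raw query string into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())

print(tokenize("Fast  search-engine indexing!"))
# ['fast', 'search', 'engine', 'indexing']
```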
Step 2: Parsing
Since users may employ special operators in their query, including Boolean and adjacency operators, the system first needs to parse the query into query terms and operators.
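Here is a small sketch of what parsing could look like. It assumes, purely for illustration, that Boolean operators are written as uppercase AND, OR and NOT, and that adjacency is expressed with a double-quoted phrase; actual query syntax differs from engine to engine.

```python
import re

# Assumed operator spelling; not taken from any particular engine.
OPERATORS = {"AND", "OR", "NOT"}

def parse_query(query):
    """Return a list of (kind, value) pairs: 'op', 'phrase', or 'term'."""
    parsed = []
    # Match either a double-quoted phrase or a run of non-space characters.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            parsed.append(("phrase", phrase.lower()))
        elif word in OPERATORS:
            parsed.append(("op", word))
        else:
            parsed.append(("term", word.lower()))
    return parsed

print(parse_query('"search engine" AND ranking NOT spam'))
# [('phrase', 'search engine'), ('op', 'AND'), ('term', 'ranking'),
#  ('op', 'NOT'), ('term', 'spam')]
```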
Steps 4 and 5: Stop List and Stemming
Applying the stop list and stemming to the query is similar to the same process in the document processor.
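The sketch below shows the idea with a tiny stop list and a deliberately crude suffix-stripping stemmer. Both are assumptions made for the example; this is not the Porter stemmer or any other published algorithm, and it will mis-stem plenty of real words.

```python
# Hypothetical stop list and suffix list, kept tiny for the example.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order

def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(tokens):
    """Drop stop words, then stem what is left."""
    return [stem(t) for t in tokens if t not in STOP_WORDS]

print(normalize(["searching", "for", "the", "indexed", "documents"]))
# ['search', 'index', 'document']
```

The same `normalize` function would have to be applied to documents and queries alike, otherwise the stemmed query terms would not line up with the index entries.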
Step 6: Creating a Query
Query representation depends on how the system does its matching. Good statistical queries should contain many synonyms and other terms in order to create a full representation.
Steps 7 and 8 are performed by more advanced search engines.
Step 7: Query Expansion
More sophisticated systems may expand the query into all possible synonymous terms, and perhaps even into broader and narrower terms.
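A very small sketch of synonym expansion is shown below. The `SYNONYMS` table is a made-up, hand-built thesaurus just for this example; a real system would rely on a much larger resource and would likely handle broader and narrower terms as well.

```python
# Hypothetical hand-built thesaurus, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(terms):
    """Return the original terms followed by any synonyms not already present."""
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded

print(expand_query(["cheap", "car", "rental"]))
# ['cheap', 'car', 'rental', 'inexpensive', 'automobile', 'vehicle']
```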
Step 8: Compute Weight
The final step in query processing involves computing weights for the terms in the query.
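The post does not say which weighting scheme is used, so the sketch below simply assumes one common choice: the number of times a term appears in the query multiplied by an inverse document frequency, log(N / df). The document frequencies and the `query_weights` name are invented for the example.

```python
import math
from collections import Counter

def query_weights(query_terms, doc_freq, num_docs):
    """Weight each query term by its count in the query times log(N / df)."""
    counts = Counter(query_terms)
    weights = {}
    for term, count in counts.items():
        df = doc_freq.get(term, 0)
        # A term found in no document cannot match anything, so give it 0.
        weights[term] = count * math.log(num_docs / df) if df else 0.0
    return weights

doc_freq = {"search": 50, "engine": 10}  # hypothetical document frequencies
weights = query_weights(["search", "engine"], doc_freq, num_docs=100)
print(weights)
```

With these numbers the rarer term, "engine", gets the larger weight, which is the usual intent of this kind of weighting.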
(c) Functions of Search and Match
Searching the inverted file for documents that meet the query requirements is referred to simply as matching, and is typically a standard binary search. The simpler the document representation and the matching algorithm, the less relevant the results.
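As a sketch of matching, the code below keeps the terms of a toy inverted file in sorted order, finds each query term with a binary search, and keeps only the documents that contain every term. The data, the function names and the "all terms must match" rule are assumptions for the example.

```python
from bisect import bisect_left

# Hypothetical inverted file: terms kept sorted, each with a set of doc ids.
TERMS = ["cat", "dog", "mat", "sat"]
POSTINGS = [{1, 2}, {2}, {1}, {1}]

def lookup(term):
    """Binary-search the sorted term list and return the matching doc ids."""
    i = bisect_left(TERMS, term)
    if i < len(TERMS) and TERMS[i] == term:
        return POSTINGS[i]
    return set()

def match_all(query_terms):
    """Return ids of documents that contain every query term."""
    if not query_terms:
        return set()
    result = set(lookup(query_terms[0]))  # copy so POSTINGS is not modified
    for term in query_terms[1:]:
        result &= lookup(term)
    return result

print(match_all(["cat", "sat"]))   # {1}
print(match_all(["cat", "bird"]))  # set()
```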
(d) Capability for Ranking
After determining which pages match the query requirements, the system computes a score between the query and each page, and ranks the pages based on its scoring algorithm. However the search engine determines the rank, the ranked results list goes to the user, who can then simply click and follow the system's internal pointers to the selected page.
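Scoring algorithms vary from system to system, so the following is only a sketch under one simple assumption: a page's score is the sum, over the query terms, of the term's query weight times how often the term occurs on the page. The page names, counts and weights are invented.

```python
def rank(doc_term_counts, query_weights):
    """Return (doc_id, score) pairs sorted from highest to lowest score."""
    scores = {}
    for doc_id, counts in doc_term_counts.items():
        scores[doc_id] = sum(
            weight * counts.get(term, 0) for term, weight in query_weights.items()
        )
    # Sort by descending score, breaking ties by doc id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

doc_term_counts = {
    "page_a": {"search": 3, "engine": 1},
    "page_b": {"search": 1, "engine": 3},
}
ranked = rank(doc_term_counts, {"search": 1.0, "engine": 2.0})
print(ranked)  # [('page_b', 7.0), ('page_a', 5.0)]
```

The list comes back best match first, which is the order in which the results would be shown to the user.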