Search in Microsoft®
SharePoint® Server 2010 is re-architected with new components to create greater
redundancy within a single farm and to allow scalability in multiple
directions. Each of the components that make up the query architecture and the
crawling architecture can be scaled out separately based on the needs of an
organization.
The query architecture includes query components, index partitions, and
property databases.
About index partitions:
- An index partition
is a logical portion of the entire index. The index is the aggregation of all
index partitions.
- Index partitions
are associated with query components. You deploy a query component that is
associated with a particular index partition to a specific server. In this way,
index partitions are spread across query servers. For example, in a farm with
three index partitions and one query component per partition, each query
component contains one-third of the total index.
- Deploying query
components that are associated with index partitions across different servers
creates a faster query architecture, because the processing power of multiple
query servers is used to respond to queries.
- Index partitions
can be associated with one or more query components. Multiple query components
(mirrors) for a given index partition can be deployed across query servers to
achieve redundancy. Typically, two query components are configured for each
index partition, and these query components reside on different query servers
to achieve redundancy of the index partition.
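To make the mirroring idea concrete, here is a minimal Python sketch of a topology with three index partitions and two query components per partition. The component names, server names, and helper functions are hypothetical and are not part of any SharePoint API; the sketch only checks whether every partition still has a query component after one server fails.

```python
from collections import defaultdict

# Hypothetical topology: (query component name, index partition, server).
QUERY_COMPONENTS = [
    ("qc-0a", 0, "QUERY1"),
    ("qc-0b", 0, "QUERY2"),
    ("qc-1a", 1, "QUERY2"),
    ("qc-1b", 1, "QUERY3"),
    ("qc-2a", 2, "QUERY3"),
    ("qc-2b", 2, "QUERY1"),
]


def servers_by_partition(components):
    """Map each index partition to the set of servers hosting one of its query components."""
    placement = defaultdict(set)
    for _name, partition, server in components:
        placement[partition].add(server)
    return dict(placement)


def partitions_lost(components, failed_server):
    """Return the partitions left with no query component if failed_server goes down."""
    placement = servers_by_partition(components)
    return sorted(p for p, servers in placement.items() if servers <= {failed_server})


if __name__ == "__main__":
    for server in ("QUERY1", "QUERY2", "QUERY3"):
        print(server, "fails -> partitions lost:", partitions_lost(QUERY_COMPONENTS, server))
```

Because each partition's two mirrors sit on different servers, no single server failure removes a partition from the index. Placing both mirrors of a partition on the same server would defeat the purpose.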
The crawl architecture includes several components that can be scaled
out based on crawl volume and performance requirements:
- Crawl component —
multiple crawl components can be deployed to crawl simultaneously. Each crawl
component is associated with a crawl database. Crawl components reside on
application servers. Crawl components produce portions of the index (per index
partition) and propagate them to the servers that are running the query components
associated with the given index partition.
- Crawl database —
Manages crawl operations and stores crawl history. You can assign multiple
crawl components to each crawl database for redundancy. In this case, each
crawl component will crawl different content during a crawl.
- Property database —
Also considered part of the query architecture; stores properties for crawled
data. The number of required property databases depends on the volume of
content that is crawled and the amount of metadata that is associated with the
content.
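The statement that crawl components sharing a crawl database each crawl different content can be pictured with a small Python sketch. The text does not describe how SharePoint actually divides the work, so the round-robin split below is only an assumed stand-in, and the function and component names are hypothetical.

```python
def split_crawl_queue(urls, crawl_components):
    """Deal the URLs tracked by one crawl database out to its crawl components.

    Round-robin is an assumption made for illustration; it only guarantees that
    no URL is handed to more than one component.
    """
    if not crawl_components:
        raise ValueError("a crawl database needs at least one crawl component")
    assignment = {name: [] for name in crawl_components}
    for i, url in enumerate(urls):
        assignment[crawl_components[i % len(crawl_components)]].append(url)
    return assignment


if __name__ == "__main__":
    queue = [f"http://intranet/docs/page{i}.html" for i in range(5)]
    for component, work in split_crawl_queue(queue, ["crawl-1", "crawl-2"]).items():
        print(component, work)
```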
Search Flow
The search steps:
1. A user uploads a document or
creates a new item.
2. The crawl process
runs: when a full crawl starts, the start addresses of the content
source are moved to the crawl queue. iFilters open the files, and the content index is created on the crawl
server. The index is then moved in batches to the query servers, and the relevant data is
written to the crawl database.
The crawler uses protocol handlers and iFilters as follows:
a. The crawler retrieves the start
addresses of content sources and calls the protocol handler based on the URL’s
prefix.
b. The protocol handler connects to the content source
and extracts system-level metadata and access control list information.
c. The protocol handler identifies the
file type of each content item based on the file name extension and calls the
appropriate iFilter associated with that file type.
d. The iFilter extracts content, removing any embedded
formatting, and then retrieves content item metadata.
e. Content is parsed by one or more
language-appropriate word breakers and is added to the content index, also
called the full-text index. Metadata and access control lists are added to the
Search database.
3. Metadata for the crawled content is written to the property
databases.
4. A user enters a keyword query.
5. The query flow: the
WFE (web front end) serving the call uses the associated search service application proxy to
connect to a server running the Query and Site Settings Service, also known as
the Query Processor. It uses WCF for this communication. The Query Processor
connects to the following components to gather results, then merges and
security-trims them and returns them to the WFE:
- Query component (holds the entire index or one partition of the index)
- Property Store DB (holds metadata and properties of indexed content)
- Search Admin DB (holds security descriptors and configuration data)
The WFE then displays the search results to the user.
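The crawl steps (a to e) and the query flow in step 5 can be sketched end to end in a few lines of Python. This is a toy model under stated assumptions, not how SharePoint is implemented: the in-memory content source, the handler and filter tables, the regex word breaker, and the two-partition layout are all hypothetical stand-ins for protocol handlers, iFilters, word breakers, index partitions, the property store, and the security descriptors.

```python
import re
from urllib.parse import urlparse

NUM_PARTITIONS = 2  # assumed number of index partitions

# Hypothetical content source: url -> (raw content, metadata, users allowed to read it).
FAKE_SOURCE = {
    "http://intranet/specs/search.html": (
        "<p>Search <b>architecture</b> notes</p>", {"author": "ana"}, {"ana", "bob"}),
    "file://share/plans/crawl.txt": (
        "Crawl architecture plan", {"author": "bob"}, {"bob"}),
}


def fetch(url):
    """Stand-in protocol handler: returns content, system metadata, and the ACL."""
    return FAKE_SOURCE[url]


# Protocol handlers are chosen by URL prefix, filters by file name extension.
PROTOCOL_HANDLERS = {"http": fetch, "file": fetch}
IFILTERS = {
    "html": lambda raw: re.sub(r"<[^>]+>", " ", raw),  # strip embedded formatting
    "txt": lambda raw: raw,
}


def word_break(text):
    """Very rough stand-in for a language-specific word breaker."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crawl(start_addresses):
    """Build per-partition indexes plus a property store and an ACL table."""
    partitions = [dict() for _ in range(NUM_PARTITIONS)]  # term -> set of urls
    property_store, acls = {}, {}
    for doc_id, url in enumerate(sorted(start_addresses)):
        raw, metadata, acl = PROTOCOL_HANDLERS[urlparse(url).scheme](url)
        text = IFILTERS[url.rsplit(".", 1)[-1]](raw)
        index = partitions[doc_id % NUM_PARTITIONS]  # each document lives in one partition
        for term in word_break(text):
            index.setdefault(term, set()).add(url)
        property_store[url] = metadata
        acls[url] = acl
    return partitions, property_store, acls


def query(term, user, partitions, property_store, acls):
    """Ask every partition, merge the hits, security-trim, and attach properties."""
    hits = set()
    for index in partitions:
        hits |= index.get(term.lower(), set())
    visible = sorted(url for url in hits if user in acls[url])
    return [(url, property_store[url]) for url in visible]


if __name__ == "__main__":
    partitions, property_store, acls = crawl(FAKE_SOURCE)
    for user in ("ana", "bob"):
        print(user, query("architecture", user, partitions, property_store, acls))
```

Both documents contain the term "architecture", but they sit in different partitions, so the query has to consult every partition before merging. Security trimming then hides the file share document from the user who is not on its access control list.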
