This application is a continuation-in-part of U.S. Application No. 12/951,815, entitled "HYBRID-DISTRIBUTION MODEL FOR SEARCH ENGINE INDEXES" (Attorney Docket No. MFCP.157166), filed November 22, 2010, the entire contents of which are incorporated herein by reference.
Detailed Description
The subject matter of the present invention is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described herein, in conjunction with other present or future technologies. Moreover, although the terms "step" and/or "block" may be used herein to connote different elements of the methods employed, the terms should not be interpreted as implying any particular order among or between the various steps disclosed herein unless and except when the order of individual steps is explicitly described.
As described above, embodiments of the present invention provide fragments that are formed from nodes, such that each fragment stores a portion of the forward index and of the reverse index. For example, of the total quantity of documents to be indexed (e.g., one trillion), each fragment may be allocated a certain portion of the documents, such that the fragment is responsible for indexing those documents and performing ranking calculations for them. The portion of the reverse and forward indexes stored on a particular fragment is the complete reverse and forward index for the documents allocated to that fragment. Each fragment comprises multiple nodes, each of which is essentially a machine or computing device with storage capability. A separate portion of the reverse index and of the forward index is assigned to each node in the fragment, so that each node can be used to perform various ranking calculations. Each node therefore stores a subset of the fragment's reverse index and forward index, and is responsible for accessing each of them during the various ranking processes within the fragment. For example, the overall ranking process may include a matching stage, a preliminary ranking stage, and a final ranking stage. The matching/preliminary stage may require that those nodes whose reverse index has indexed a certain atom from the search query be used to identify a first set of documents relevant to the search query. The first set of documents is drawn from the set of documents allocated to the fragment. Then, those nodes whose forward index has indexed a document identification associated with a document in the first set can be used to identify a second set of documents that is even more relevant to the search query. In one embodiment, the second set of documents is a subset of the first set. This overall process can be employed to narrow a set of documents down to those found to be relevant, so that the final ranking process, which is generally more time-consuming and expensive than the preliminary ranking process, ranks fewer documents than it would if it were used to rank every document in the index, relevant or not.
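The relationship between the two indexes described above can be illustrated with a minimal Python sketch. The function name build_indexes and the sample documents are hypothetical and serve only as an illustration; the sketch assumes that atoms are plain words and that a posting list is a simple set of document identifications.

```python
from collections import defaultdict

def build_indexes(docs):
    """Build a reverse and a forward index for one fragment's documents.

    docs maps a document identification to the list of atoms in that document.
    (Illustrative assumption: atoms are plain words.)
    """
    reverse_index = defaultdict(set)  # atom -> ids of documents containing it
    forward_index = {}                # document id -> atoms in that document
    for doc_id, atoms in docs.items():
        forward_index[doc_id] = list(atoms)
        for atom in atoms:
            reverse_index[atom].add(doc_id)
    return dict(reverse_index), forward_index

# Hypothetical documents allocated to a single fragment.
docs = {1: ["hybrid", "index"], 2: ["search", "index"], 3: ["hybrid", "search"]}
reverse_index, forward_index = build_indexes(docs)
print(reverse_index["index"])   # documents that contain the atom "index"
print(forward_index[3])         # atoms of document 3
```

The reverse index answers "which documents contain this atom", which suits the matching stage, while the forward index answers "what does this document contain", which suits the later, per-document ranking stage.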
Accordingly, in one aspect, an embodiment of the present invention is directed to one or more computer storage media storing computer-useable instructions that, when used by a computing device, cause the computing device to perform a method for utilizing a hybrid-distribution system to identify relevant documents based on a search query. The method includes allocating a set of documents to a fragment, indexing the set of documents by atom in a reverse index and by document in a forward index, and storing a different portion of the reverse index and of the forward index on each of a plurality of nodes that form the fragment. The method further includes accessing the reverse index portion stored on each of a first group of nodes to identify a first set of documents relevant to the search query. The method additionally includes accessing, based on document identifications associated with the first set of documents, the forward index portion stored on each of a second group of nodes to limit the number of relevant documents in the first set to a second set of documents.
In another embodiment, an aspect of the invention is directed to one or more computer storage media storing computer-useable instructions that, when used by a computing device, cause the computing device to perform a method for generating a hybrid-distribution system for a multi-process document retrieval system. The method includes receiving an indication of a set of documents allocated to a fragment, the fragment comprising a plurality of nodes. For the fragment, the method also includes indexing the allocated set of documents by atom to generate a reverse index and indexing the allocated set of documents by document to generate a forward index. The method additionally includes assigning a portion of the reverse index and a portion of the forward index to each of the plurality of nodes that form the fragment, such that each of the plurality of nodes stores a different portion of the forward index and a different portion of the reverse index.
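One way the assignment of index portions to nodes might look is sketched below in Python. The text does not state how the portions are chosen, so the partitioning rules here are assumptions made only for illustration: reverse index entries are placed by a CRC32 hash of the atom, and forward index entries by the document identification modulo the node count. The names partition_indexes, node_for_atom and node_for_doc are hypothetical.

```python
import zlib

def node_for_atom(atom, num_nodes):
    # Assumed rule: a stable hash of the atom picks the node.
    return zlib.crc32(atom.encode("utf-8")) % num_nodes

def node_for_doc(doc_id, num_nodes):
    # Assumed rule: the document id modulo the node count picks the node.
    return doc_id % num_nodes

def partition_indexes(reverse_index, forward_index, num_nodes):
    """Give every node a different portion of each index."""
    nodes = [{"reverse": {}, "forward": {}} for _ in range(num_nodes)]
    for atom, doc_ids in reverse_index.items():
        nodes[node_for_atom(atom, num_nodes)]["reverse"][atom] = set(doc_ids)
    for doc_id, atoms in forward_index.items():
        nodes[node_for_doc(doc_id, num_nodes)]["forward"][doc_id] = list(atoms)
    return nodes

# Hypothetical indexes for one fragment.
reverse_index = {"hybrid": {1, 3}, "index": {1, 2}, "search": {2, 3}}
forward_index = {1: ["hybrid", "index"], 2: ["search", "index"], 3: ["hybrid", "search"]}
nodes = partition_indexes(reverse_index, forward_index, num_nodes=2)
for i, node in enumerate(nodes):
    print(i, sorted(node["reverse"]), sorted(node["forward"]))
```

Under these assumptions every atom and every document identification lands on exactly one node, so the portions are disjoint and together reconstruct the fragment's complete indexes.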
Another embodiment of the present invention is directed to one or more computer storage media storing computer-useable instructions that, when used by a computing device, cause the computing device to perform a method for utilizing a hybrid-distribution system to identify relevant documents based on a search query. The method includes receiving a search query, identifying one or more atoms in the search query, and communicating the one or more atoms to a plurality of fragments, each of which has been allocated a set of documents that is indexed both by atom and by document, such that a reverse index and a forward index are generated and stored at each of the plurality of fragments. Each of the plurality of fragments comprises a plurality of nodes, each of which is assigned a portion of the forward index and of the reverse index. Based on the one or more atoms, the method identifies a first group of nodes at a first fragment whose reverse index portions contain at least one of the one or more atoms from the search query. The method further includes accessing the reverse index portion stored at each of the first group of nodes to identify a first set of documents found to be relevant to the one or more atoms and, based on document identifications associated with the first set of documents, identifying a second group of nodes whose forward index portions contain one or more of the document identifications associated with the first set of documents. The method also includes accessing the forward index portion stored at each of the second group of nodes to identify a second set of documents that is a subset of the first set of documents.
In other embodiments of the invention, a segment root is selected from among the nodes that will execute a particular query, such as those nodes whose search index includes a word or atom present in the search query. Initially, a preliminary segment root is selected and temporarily used to collect data from the other nodes. In one embodiment, the preliminary segment root collects and aggregates statistics from the other nodes that will be used to execute the search query. The preliminary segment root uses an algorithm to determine which node is best suited to serve as the final segment root for the particular query. For example, the node that has the most data to transfer may be a good choice for the final segment root, because that large amount of data then need not be transferred to another node for aggregation. In one embodiment, the principal consideration in selecting the final segment root is cost, such as the timeliness and ease of transferring data from a node to the final segment root. For example, when the nodes involved in executing a particular query identify documents relevant to the search query, this data must be transferred to the final segment root, which aggregates this information from many nodes. The final segment root then transfers the query execution data aggregated from the many nodes to another component, which assembles similar data from multiple fragments. As such, the goal is to select the final segment root that makes the overall query execution process more cost-efficient.
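The selection of a final segment root from collected statistics can be sketched as follows in Python. This is only an illustration under stated assumptions, not the algorithm of the invention: the statistics fields result_bytes and load are hypothetical, the transfer cost is assumed to be proportional to the number of bytes the other nodes must send, and ties are broken in favor of the less loaded node.

```python
def select_final_root(stats):
    """Pick the node that minimizes the data the other nodes must send.

    stats maps a node id to a dict with two assumed fields:
      "result_bytes": size of the query results held by that node
      "load": current load of that node (lower is better)
    """
    total = sum(s["result_bytes"] for s in stats.values())

    def cost(node_id):
        s = stats[node_id]
        # Bytes that must cross the network if node_id becomes the root,
        # followed by the node's load as a tie-breaker.
        return (total - s["result_bytes"], s["load"])

    return min(stats, key=cost)

# Hypothetical statistics gathered at the preliminary segment root.
stats = {
    "node1": {"result_bytes": 100, "load": 0.5},
    "node2": {"result_bytes": 900, "load": 0.9},
    "node3": {"result_bytes": 900, "load": 0.2},
}
print(select_final_root(stats))  # node3
```

With a uniform per-byte cost, minimizing the transferred bytes is the same as choosing the node that holds the most result data, which matches the heuristic described above.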
As such, one aspect is directed to one or more computer storage media storing computer-useable instructions that, when used by a computing device, cause the computing device to perform a method for assigning a segment root. The method includes receiving a search query, identifying a group of nodes in a fragment that will be used to resolve the search query, and selecting a preliminary segment root from the group of nodes. The method further includes receiving, at the preliminary segment root, statistics from each node in the identified group of nodes, the statistics indicating each node's capability to act as a final segment root, which is responsible for assembling query execution results from the group of nodes based on the search query. The method additionally includes algorithmically selecting the final segment root from the group of nodes based on the statistics, and communicating the final segment root to the group of nodes so that the nodes know where to send their respective query execution results.
A second aspect is directed to one or more computer storage media storing computer-useable instructions that, when used by a computing device, cause the computing device to perform a method for assigning a segment root. The method includes receiving, at a fragment comprising a plurality of nodes, a search query to be executed, and identifying from the plurality of nodes a group of nodes that will be used to execute the search query. Prior to executing the search query, the method includes selecting a preliminary segment root from the plurality of nodes, the selection being based on one or more of an expected load for each node or a random selection. The method further includes receiving, at the preliminary segment root, statistics from each node in the group of nodes that will be used to execute the search query. The statistics include a current load and cost data associated with sending data across a network. Based on the statistics, a final segment root is selected that will aggregate query execution data from the group of nodes during query execution. The search query is then executed.
A third aspect is directed to one or more computer storage media storing computer-useable instructions that, when used by a computing device, cause the computing device to perform a method for assigning a segment root. The method includes receiving a search query at a corpus root that comprises a plurality of fragments. Each of the plurality of fragments comprises a plurality of nodes. Each node has a portion of a search index stored thereon. The method also includes identifying, in each fragment, a group of nodes that will be used to execute the received search query, and identifying, for each of the plurality of fragments, a preliminary segment root from the group of nodes. Further, statistics are requested from each node in the group of nodes that will be used to execute the received search query. The statistics are received from each node in the group of nodes and indicate each node's availability to act as the final segment root, which collects query execution data from the group of nodes in its respective fragment. The method additionally includes selecting the final segment root for each fragment based on the statistics, and executing the search query.
Having briefly described an overview of embodiments of the present invention, an exemplary operating environment in which embodiments of the present invention may be implemented is described below in order to provide a general context for various aspects of the present invention. Referring initially to Fig. 1 in particular, an exemplary operating environment for implementing embodiments of the present invention is shown and designated generally as computing device 100. Computing device 100 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should computing device 100 be interpreted as having any dependency or requirement relating to any one or combination of the components illustrated.
The invention may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program modules, including routines, programs, objects, components, data structures, etc., refer to code that performs particular tasks or implements particular abstract data types. The invention may be practiced in a variety of system configurations, including handheld devices, consumer electronics, general-purpose computers, more specialized computing devices, etc. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
With reference to Fig. 1, computing device 100 includes a bus 110 that directly or indirectly couples the following devices: memory 112, one or more processors 114, one or more presentation components 116, input/output (I/O) ports 118, input/output components 120, and an illustrative power supply 122. Bus 110 represents what may be one or more buses (such as an address bus, a data bus, or a combination thereof). Although the various blocks of Fig. 1 are shown with lines for the sake of clarity, in reality, delineating the various components is not so clear, and metaphorically, the lines would more accurately be grey and fuzzy. For example, one may consider a presentation component such as a display device to be an I/O component. Also, processors have memory. The inventors recognize that such is the nature of the art, and reiterate that the diagram of Fig. 1 is merely illustrative of an exemplary computing device that can be used in connection with one or more embodiments of the present invention. No distinction is made between such categories as "workstation," "server," "laptop," "handheld device," etc., as all are contemplated within the scope of Fig. 1 and the reference to "computing device."
Computing device 100 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 100 and include both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by computing device 100. Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
Memory 112 includes computer storage media in the form of volatile and/or nonvolatile memory. The memory may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard drives, optical-disc drives, etc. Computing device 100 includes one or more processors that read data from various entities such as memory 112 or I/O components 120. Presentation component(s) 116 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc.
I/O ports 118 allow computing device 100 to be logically coupled to other devices, including I/O components 120, some of which may be built in. Illustrative components include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.
Referring now to Fig. 2, a block diagram is provided illustrating an exemplary system 200 in which embodiments of the present invention may be employed. It should be understood that this and other arrangements described herein are set forth only as examples. Other arrangements and elements (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) can be used in addition to or instead of those shown, and some elements may be omitted altogether. Further, many of the elements described herein are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Various functions described herein as being performed by one or more entities may be carried out by hardware, firmware, and/or software. For instance, various functions may be carried out by a processor executing instructions stored in memory.
Among other components not shown, system 200 includes a user device 202, a fragment 204, and a hybrid-distribution system server 206. Each of the components shown in Fig. 2 may be any type of computing device, such as computing device 100 described with reference to Fig. 1, for example. The components may communicate with each other via a network 208, which may include, without limitation, one or more local area networks (LANs) and/or wide area networks (WANs). Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet. It should be understood that any number of user devices, fragments, and hybrid-distribution system servers may be employed within system 200 within the scope of the present invention. Each may comprise a single device or multiple devices cooperating in a distributed environment. For instance, the fragment may comprise multiple devices arranged in a distributed environment that collectively provide the functionality of fragment 204 described herein. Additionally, other components not shown may also be included within system 200, while components shown in Fig. 2 may be omitted in some embodiments.
User device 202 may be any type of computing device owned and/or operated by an end user that can access network 208. For instance, user device 202 may be a desktop computer, a laptop computer, a tablet computer, a mobile device, or any other device having network access. Generally, an end user may employ user device 202 to, among other things, access electronic documents by submitting a search query to a search engine. For instance, the end user may employ a web browser on user device 202 to access and view electronic documents stored in the system.
Fragment 204 typically comprises multiple nodes, also referred to as blades. In Fig. 2, two nodes are illustrated: node 1, numbered 210, and node 2, numbered 212. Although two nodes are illustrated in the embodiment of Fig. 2, a fragment may comprise many more nodes than two (e.g., 10, 40, 100). Two nodes are illustrated for exemplary purposes only. Each fragment, such as fragment 204, is allocated a set of documents for which it is responsible. As such, a reverse index and a forward index are generated and stored at fragment 204. While the reverse and forward indexes for a particular fragment may be generated at fragment 204 itself, in an alternative embodiment the indexes are generated at some other location or on some other computing device and sent to fragment 204. Further, once the reverse index and the forward index have been generated based on the set of documents allocated to fragment 204, the two indexes are each divided into portions. In one embodiment, the number of portions is equal to the number of nodes associated with the particular fragment. Therefore, where there are 40 nodes in a particular fragment, both indexes are divided 40 ways, such that each node is responsible for a different portion of each of the reverse index and the forward index. As shown, node 1 has a reverse index portion 214 and a forward index portion 216. Node 2 likewise has a reverse index portion 218 and a forward index portion 220. Although shown separately from the nodes, in one embodiment the indexes are stored on the node itself. In any case, each node is responsible for a portion of the reverse index and of the forward index that are built from the set of documents allocated to the fragment.
As mentioned, an index may be indexed, or sharded, by atom or by document. As used herein, sharding refers to the process of indexing a set of documents either by atom or by document. There are advantages and disadvantages to using each approach by itself without the other. For example, when sharding by document, the advantages include isolation of processing between shards, such that only merging of results is needed. Additionally, per-document information is easily aligned with the matching. Further, network traffic is small. In contrast, the disadvantages include the need for every shard to process any particular query. If the reverse index data is placed on disk, at least O(KN) disk seeks are required for a K-atom query on N shards. When sharding by atom, the advantages include reduced computation, such that only K shards are needed to process a K-atom query. If the reverse index data is placed on disk, O(K) disk seeks are required for a K-atom query. In contrast, however, the disadvantages include the need for connected processing, such that all shards storing an atom that participates in the query must cooperate. Network traffic is also significant, and per-document information is not easily managed. Embodiments of the present invention require less management of per-document data than traditional approaches. The reasons include that some scores are precomputed and stored in the indexes (such as the reverse index), and that further refinement and filtering of documents occurs after the matching stage (L0). As such, the disadvantages described above are greatly reduced with respect to the management of per-document data.
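The difference between the two seek bounds can be made concrete with a small Python calculation. The sketch assumes exactly one disk seek per atom lookup on a shard, which is a simplification of the O(KN) and O(K) bounds stated above; the function names are hypothetical.

```python
def seeks_sharded_by_document(k_atoms, n_shards):
    # Every shard must look up every atom of the query.
    return k_atoms * n_shards

def seeks_sharded_by_atom(k_atoms):
    # Only the shard that owns each query atom is touched, once per atom.
    return k_atoms

# A 3-atom query against 100 shards (hypothetical numbers).
print(seeks_sharded_by_document(3, 100))  # 300
print(seeks_sharded_by_atom(3))           # 3
```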
Further, each node in a particular fragment is capable of performing various functions, including ranking functions that allow relevant search results to be identified. In some embodiments, the search engine may employ a staged process to select search results for a search query, such as the staged approach described in the U.S. patent application entitled "MATCHING FUNNEL FOR LARGE DOCUMENT INDEX" (application number not yet assigned) (Attorney Docket No. MFCP.157120). Here, each node may employ multiple stages of an overall ranking process. An exemplary ranking process is described below, but it is just one example of a ranking process that each node may employ. When a search query is received, the overall ranking process may be employed to pare the quantity of matching documents down to a manageable size. The search query is first analyzed to identify atoms. The atoms are then used during the various stages of the overall ranking process. The first of these stages, referred to as the L0 stage (matching stage), queries the search index and identifies an initial set of matching documents that contain the atoms from the search query. This initial process may reduce the number of candidate documents from all documents indexed in the search index to those documents that match the atoms from the search query. For instance, a search engine may search through millions or even trillions of documents to determine those that are most relevant to a particular search query. Once the L0 matching stage is complete, the number of candidate documents is greatly reduced. Many algorithms for locating the most relevant documents, however, are costly and time-consuming. As such, two other stages may be employed: a preliminary ranking stage and a final ranking stage.
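The L0 matching stage can be sketched in a few lines of Python. The sketch assumes conjunctive matching, meaning that a document is retained only if it contains every query atom; the text does not state whether all or only some atoms must match, so this is an illustrative choice. The function name l0_match and the sample index are hypothetical.

```python
def l0_match(reverse_index, query_atoms):
    """Return the ids of documents that contain every query atom.

    reverse_index maps an atom to the set of document ids containing it.
    """
    postings = [reverse_index.get(atom, set()) for atom in query_atoms]
    if not postings:
        return set()
    # Intersect the posting lists of all query atoms.
    return set.intersection(*postings)

# Hypothetical reverse index.
reverse_index = {"hybrid": {1, 3}, "index": {1, 2}, "search": {2, 3}}
print(l0_match(reverse_index, ["hybrid", "index"]))   # {1}
print(l0_match(reverse_index, ["hybrid", "missing"])) # set()
```

Because the stage only intersects posting lists, it is cheap compared with the later ranking stages, which is why it is applied first to the full index.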
The preliminary ranking stage, also termed the L1 stage, employs a simplified scoring function used to compute a preliminary score or ranking for the candidate documents retained from the L0 matching stage described above. The preliminary ranking component 210 is, as such, responsible for providing a preliminary ranking for each of the candidate documents retained from the L0 matching stage. Alternatively, candidate documents may be scored, and as such given an absolute number instead of a ranking. The preliminary ranking stage is simplified when compared to the final ranking stage because it employs only a subset of the ranking features used by the final ranking stage. For instance, one or more, but in some embodiments not all, of the ranking features used in the final ranking stage are employed by the preliminary ranking stage. Additionally, features not employed by the final ranking stage may be employed by the preliminary ranking stage. In embodiments of the present invention, the ranking features used by the preliminary ranking stage do not have atom interdependencies, such as term closeness and term co-occurrence. For example only, the ranking features used in the preliminary ranking stage may include static features and dynamic atom-isolated components. Static features, generally, are those components that only look into features that are query-independent. Examples of static features include page rank, spam ratings of a particular web page, etc. Dynamic atom-isolated components are components that only look at features related to a single atom at a time. Examples may include BM25f, the frequency of a certain atom in a document, the location (context) of the atom in the document (e.g., title, URL, anchor, header, body, traffic, class, attributes), etc.
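A preliminary (L1) scoring function of the kind described above can be sketched in Python as follows. The sketch is illustrative only: it assumes a single static feature per document and uses the raw frequency of each query atom as the only dynamic atom-isolated component, summed without any interaction between atoms. The names l1_rank, term_freq and static_score are hypothetical.

```python
import heapq

def l1_rank(candidates, query_atoms, term_freq, static_score, top_k):
    """Keep the top_k candidates under a simplified, atom-isolated score.

    term_freq maps atom -> {doc_id: frequency of the atom in that document}.
    static_score maps doc_id -> query-independent score.
    """
    def score(doc_id):
        # Each atom contributes on its own; no closeness or co-occurrence.
        dynamic = sum(term_freq.get(atom, {}).get(doc_id, 0) for atom in query_atoms)
        return static_score.get(doc_id, 0.0) + dynamic

    return heapq.nlargest(top_k, candidates, key=score)

# Hypothetical data for three candidates retained by the matching stage.
term_freq = {"hybrid": {1: 3, 2: 1, 3: 1}, "index": {1: 1, 2: 1, 3: 4}}
static_score = {1: 0.5, 2: 2.0, 3: 0.1}
print(l1_rank({1, 2, 3}, ["hybrid", "index"], term_freq, static_score, top_k=2))  # [3, 1]
```

In this example the scores are 4.5, 4.0 and 5.1 for documents 1, 2 and 3, so documents 3 and 1 are passed on to the more expensive final ranking stage.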
Once the number of candidate documents has again been reduced by the preliminary ranking stage, the final ranking stage, also termed the L2 stage, ranks the candidate documents provided to it by the preliminary ranking stage. The algorithm used in conjunction with the final ranking stage is a more expensive operation, with a larger number of ranking features, than the one used in the preliminary ranking stage. The final ranking algorithm, however, is applied to a much smaller number of candidate documents. The final ranking algorithm provides a set of ranked documents, and search results are provided in response to the original search query based on that set of ranked documents. In some embodiments, the final ranking stage as described herein may employ a forward index, such as that described in the U.S. patent application entitled "EFFICIENT FORWARD RANKING IN A SEARCH ENGINE" (application number not yet assigned) (Attorney Docket No. MFCP.157165).
Returning to Fig. 2, the hybrid-distribution system server 206 comprises a document allocation component 222, a query parsing component 224, a query distribution component 226, and a result merging component 228. The document allocation component 222 is generally responsible for allocating documents to the various fragments used in a given ranking system. For exemplary purposes only, if there are 100 million documents to be indexed and 100 fragments available, each fragment may be allocated 1 million documents. Or, on a larger scale, if there are 1 trillion documents to be indexed and 100,000 fragments available, each fragment may be allocated 10 million documents. As illustrated by the above examples, documents may be allocated evenly among the fragments, or they may be allocated differently, such that the fragments are not all responsible for the same number of documents.
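An even allocation of documents to fragments can be sketched in Python as shown below. Round-robin assignment is an assumption chosen for illustration; the text only states that the allocation may be even or uneven. The function name allocate_documents is hypothetical.

```python
def allocate_documents(doc_ids, num_fragments):
    """Distribute document ids across fragments in round-robin order."""
    fragments = [[] for _ in range(num_fragments)]
    for position, doc_id in enumerate(doc_ids):
        fragments[position % num_fragments].append(doc_id)
    return fragments

# Ten hypothetical documents spread over three fragments.
print(allocate_documents(range(10), 3))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]

# The per-fragment counts from the two examples in the text.
print(100_000_000 // 100)   # 1000000 documents per fragment
print(10**12 // 100_000)    # 10000000 documents per fragment
```

When the document count is not a multiple of the fragment count, the first fragments receive one extra document, so the sizes differ by at most one.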
For example, when a search query is received via a user interface on user device 202, the query parsing component 224 operates to reformulate the query. The query is reformulated from its free-text form into a format that facilitates querying the search indexes, such as the reverse index and the forward index, based on how data is indexed in the search indexes. In embodiments, the terms of the search query are parsed and analyzed to identify atoms that may be used to query the search indexes. The atoms may be identified using techniques similar to those used to identify atoms in documents when the documents were indexed in the search indexes. For instance, atoms may be identified based on the statistics of terms and on query distribution information. The query parsing component 224 may provide a set of conjunctions of atoms and cascading variants of these atoms.
An atom or atomic unit, as used herein, can refer to a variety of units of a query or document. These
units can include, for example, words, n-grams, n-tuples, and k-near n-tuples. A word maps down to a
single symbol or word as defined by the particular tokenizer technique being used. In one embodiment, a
word is a single character. In another embodiment, a word is a single word or a group of words. An
n-gram is a sequence of "n" consecutive or nearly consecutive words that can be extracted from a
document. An n-gram is said to be "tight" if it corresponds to a run of consecutive words, and "loose"
if it contains words in the order in which they occur in the document but the words are not necessarily
consecutive. Loose n-grams are typically used to represent a class of equivalent phrases that differ by
insignificant words (for example, "if it rains, I will get wet" and "if it rains, then I will get
wet"). An n-tuple, as used herein, is a set of "n" words that co-occur in a document, regardless of
order. Further, a k-near n-tuple, as used herein, refers to a set of "n" words that co-occur within a
window of "k" words in a document. An atom is therefore generally defined as a generalization of all of
the above. Implementations of embodiments of the invention may use different types of atoms, but as
used herein, an atom generally describes each of the types above.
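By way of illustration only, two of the atom types defined above can be sketched in Python as follows.
This is a minimal sketch rather than the described implementation: the whitespace tokenizer and the
function names tokenize, tight_ngrams, and k_near_ntuples are assumptions introduced for the example.

```python
from itertools import combinations


def tokenize(text):
    """Split text into lowercase words (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def tight_ngrams(words, n):
    """Return every run of n consecutive words (tight n-grams)."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def k_near_ntuples(words, n, k):
    """Return unordered sets of n distinct words that co-occur within a window of k words."""
    found = set()
    for start in range(len(words)):
        window = words[start:start + k]
        # Sorting makes each tuple order-independent, as an n-tuple ignores word order.
        for combo in combinations(sorted(set(window)), n):
            found.add(combo)
    return found
```

A loose n-gram would sit between the two: it keeps document order, as tight_ngrams does, but allows
gaps between the words, as k_near_ntuples does.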
The query distribution component 226 is essentially responsible for receiving a submitted search query
and distributing it among the fragments. In one embodiment, each search query is distributed to every
fragment so that each fragment provides a preliminary set of search results. For example, when a
fragment receives a search query, the fragment or a component within it determines which nodes will be
assigned the task of performing the preliminary ranking function using the reverse index portions
stored on those nodes. In one case, the nodes selected as part of the first group of nodes are those
whose reverse index has indexed one or more of the atoms parsed from the search query, as described
above. Thus, once the search query has been reformatted, one or more atoms are identified and sent to
each fragment. Each node in the first group of nodes returns, based on the preliminary ranking
function, a first group of documents found to be relevant to the search query, as briefly described
above. A second group of nodes is then determined. In one embodiment, each of these nodes stores in its
forward index at least one of the documents in the first group of documents. Each node in the second
group of nodes performs the final ranking function using forward index data and other considerations,
and as a result a second group of documents is identified. In one embodiment, every document in the
second group is also contained in the first group, because the final ranking stage uses the document
identifications associated with the first group of documents.
The result combining block 228 is given the search results from each fragment (for example, document
identifications and snippets) and forms a merged final search result list from those results. There
are various ways of forming the final search result list, including simply removing any duplicate
documents and placing each document in the list in the order determined by the final ranking. In one
embodiment, each fragment contains a component similar to the result combining block 228, so that the
results produced by the individual nodes are merged into a single list at the fragment, and that list
is then sent to the result combining block 228.
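The simple merging approach mentioned above, removing duplicates and ordering by final rank, might look
like the following Python sketch. The function name merge_results and the representation of each result
as a (document identification, score) pair are assumptions made for the example; keeping the highest
score for a duplicated document is likewise only one possible policy.

```python
def merge_results(per_fragment_results):
    """Merge lists of (doc_id, score) pairs from several fragments into one ranked list."""
    best = {}
    for results in per_fragment_results:
        for doc_id, score in results:
            # Drop duplicates by keeping only the best score seen for each document.
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    # Highest score first; ties are broken by document identification for a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

The same function could serve both at the fragment level, merging per-node lists, and at the result
combining block, merging per-fragment lists.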
Turning now to Fig. 3, an exemplary diagram of a mixed distribution system 300 is shown in accordance
with an embodiment of the invention. Fig. 3 illustrates various components, including a main body
manager 310, a main body root 312, and two fragments, fragment 314 and fragment 316. As indicated by
ellipsis 310, more than two fragments can be provided. The main body manager 310 maintains the state of
which processes serve which fragments of the forward index and the reverse index. It also maintains the
temperature and state of each process. This data is used to generate a set of processes for federating
a query across different fragments. The main body root 312 is a top-level root process that also
performs query planning functions. The main body root 312 distributes the query across the required
fragments, collects and merges the results, and can include custom logic. Each fragment has a segment
root, such as segment root 320 and segment root 322. A segment root serves as the federation point for
a query and as the aggregation point for results from the federated processes. Segment root 322 may be
a dynamic process that is reassigned to whichever blade or node is optimal for final query assembly.
As indicated, each segment root includes multiple nodes. Because of space constraints, three nodes are
illustrated for each of segment root 320 and segment root 332. Segment root 320 includes node 322, node
324, and node 326. Ellipsis 328 indicates that more than three nodes are contemplated within the scope
of the invention. Segment root 332 includes node 334, node 336, and node 338. Because any number of
nodes can make up a segment root, ellipsis 340 indicates any additional number of nodes. As described,
each node is a machine or computing device capable of performing multiple calculations, such as ranking
functions. For example, in one embodiment, each node includes an L01 adaptation 322A and an L2 sorting
unit (ranker) 322B, as shown in node 322. Similarly, node 334 includes an L01 adaptation 334A and an L2
sorting unit 334B. These are described in more detail above, but the L0 matching stage and the L1
ranking stage (the preliminary ranking stage) of the overall ranking process can be combined and
referred to as the L01 adaptation. Because each node includes an L01 adaptation and an L2 sorting unit,
each node must also store a portion of the reverse index and of the forward index, since in one
embodiment the L01 adaptation uses the reverse index and the L2 sorting unit uses the forward index. As
mentioned, each node can be assigned a portion of the reverse and forward indexes belonging to its
fragment. The fragment communication bus 330 associated with fragment 314 and the fragment
communication bus 342 associated with fragment 316 allow each node to communicate, for example, with
the segment root when necessary.
Fig. 4 is an exemplary diagram of a mixed distribution system 400 illustrating payload requirements in
accordance with an embodiment of the invention. System 400 depicts a single segment root 410 with
multiple nodes. Here, six nodes are illustrated (the nodes numbered 412, 414, 416, 418, 420, and 422).
Although six nodes are illustrated, embodiments of the invention can be carried out using any number of
nodes. As previously indicated, each node has the functionality to perform various ranking
calculations, including those of the matching stage (L0), the preliminary ranking stage (L1), and the
final ranking stage (L2). Thus node 412, for example, has both an L01 adaptation 412A for the L0 and L1
stages described herein and an L2 sorting unit 412B for the L2 stage described herein. The payloads of
the different stages, however, can differ considerably. To better illustrate this, the payload for the
L01 adaptations is shown in a first pattern, as indicated by numeral 424, and the payload for the L2
sorting units is shown in a second pattern, as indicated by numeral 426.
The set of documents assigned to a particular fragment is indexed, or sharded, both by atom (the
reverse index) and by document (the forward index). These indexes are divided into a number of portions
equal to the number of nodes that make up the fragment. In one embodiment there are forty nodes, so
each of the reverse index and the forward index is divided into forty portions, and one portion of each
is stored at each node. When a search query is submitted to the search engine, the query is sent to
each fragment. It is the fragment's responsibility to identify the first group of nodes, namely those
whose reverse index has indexed one or more of the atoms from the query. With this approach, if the
query is parsed into two atoms, such as "William" and "Shakespeare" from the query "William
Shakespeare", then the maximum number of nodes in the fragment that will be responsible for L01
matching is two. This is shown in Fig. 4, where the L01 adaptations associated with node 412 and node
416 are the only two identified for use in the L01 matching process. Because the documents assigned to
each fragment are indexed by atom, each atom is indexed only once in the reverse index, so that any
particular atom occurs in only one of the reverse index portions assigned to the nodes of the fragment.
In an exemplary scenario, once the first group of nodes has been identified, the atoms from the search
query that match atoms in the reverse index are sent to the appropriate nodes. Those nodes perform a
number of calculations so that a group of documents is identified. In one embodiment, this first group
of documents includes the documents that received the highest rankings in the preliminary ranking
stage.
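The partitioning described above can be illustrated with a short Python sketch. It is offered under
stated assumptions only: the round-robin dealing of atoms and documents to nodes, the whitespace
tokenization, and the names partition_fragment and first_group are hypothetical and are not prescribed
by the embodiments.

```python
def partition_fragment(docs, num_nodes):
    """Split one fragment's reverse and forward indexes into num_nodes portions.

    docs maps a document identification to its text.
    """
    # Build the full reverse index for the fragment: atom -> set of documents.
    reverse = {}
    for doc_id, text in docs.items():
        for atom in set(text.lower().split()):
            reverse.setdefault(atom, set()).add(doc_id)

    nodes = [{"reverse": {}, "forward": {}} for _ in range(num_nodes)]
    # Each atom goes to exactly one node's reverse portion (assumed round-robin).
    for i, atom in enumerate(sorted(reverse)):
        nodes[i % num_nodes]["reverse"][atom] = reverse[atom]
    # Each document goes to exactly one node's forward portion (assumed round-robin).
    for i, doc_id in enumerate(sorted(docs)):
        nodes[i % num_nodes]["forward"][doc_id] = docs[doc_id].lower().split()
    return nodes


def first_group(nodes, query_atoms):
    """Return the indexes of the nodes whose reverse portion holds a query atom."""
    return [i for i, node in enumerate(nodes)
            if any(atom in node["reverse"] for atom in query_atoms)]
```

Because every atom lives in exactly one reverse index portion, a query with two atoms can never select
more than two nodes for the L01 matching process, whatever the number of nodes in the fragment.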
This first group of documents is collected at the segment root 410 from each node in the first group of
nodes (including nodes 412 and 416). The results are combined in any of a number of ways so that the
segment root 410 can then identify the second group of nodes to be used in conjunction with the final
ranking stage. As indicated, every L2 sorting unit is used in the final ranking stage, or L2 stage.
Because each node stores a portion of the forward index for the fragment, there is a good chance that
most or all of the forward index portions will need to be accessed in the final stage of ranking. In
the final ranking stage, each node in the second group of nodes is given the document identifications
that are contained in its forward index, so that the node can rank those documents based at least on
the data found in the forward index. Because most or all of the nodes are used in the final ranking
stage, the payload for the final ranking stage is generally greater than the payload for the
matching/preliminary ranking stage, as shown in system 400 of Fig. 4. The fragment communication bus
428 allows the nodes to communicate with other components, such as the segment root 410.
With reference to Fig. 5, a flow chart illustrates a method 500, in accordance with an embodiment of
the invention, for using a mixed distribution system to identify relevant documents based on a search
query. Initially, a set of documents is assigned to a fragment at step 510. Before or after being
received at the fragment, the set of documents is indexed both by atom, to form a reverse index, and by
document, to form a forward index, as indicated at step 512. Thus the documents indexed in the forward
index are those in the set of documents assigned to the fragment, and the atoms in the reverse index
are parsed from the content of those documents. At step 514, a portion of the reverse index and a
portion of the forward index are stored at each node in the fragment. Generally, a fragment includes
multiple nodes. Each node is a machine or computing device capable of performing ranking calculations
based on the reverse index portion and the forward index portion stored on it. In one embodiment, each
node stores a different or unique portion of the fragment's reverse index and forward index.
Step 516 indicates that the reverse index portion is accessed at each node of a first group of nodes.
Each node in the first group of nodes has been identified as having indexed one of the atoms of the
received search query. A first group of documents is identified at step 518. In one embodiment, these
documents have been ranked using the preliminary ranking function so that the most relevant documents
can be identified. This step can correspond, for example, to the L1 preliminary ranking stage and/or
the L0 matching stage. Based on the document identifications associated with the documents in the first
group of documents, the forward index portion is accessed at each node of a second group of nodes,
shown at step 520. This step can correspond to the L2 final ranking stage. It effectively limits the
number of documents that are relevant to the particular search query. Thus the number of documents is
limited to a second group of documents, shown at step 522. In many or most cases, the number of nodes
in the second group is greater than the number of nodes in the first group, as described in more detail
above. This is because a search query may have only two atoms, so that the L01 matching stage needs at
most two nodes, while thousands of documents are identified as relevant to those two atoms; many more
nodes can therefore be used to perform the final ranking calculations with their respective forward
indexes in order to identify the second group of documents. Further, in embodiments, because the final
ranking function uses the document identifications produced by the preliminary ranking function, the
number of documents in the second group is less than the number of documents in the first group, and
each document in the second group is also contained in the first group.
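The two stages of method 500 can be sketched in Python as follows, purely as an illustration. The
scoring rules used here (number of matched atoms for the preliminary stage, and atom frequency
normalized by document length for the final stage) are stand-ins chosen for the example, not the
ranking functions of the embodiments; the function names and the per-node dictionaries are likewise
assumptions.

```python
from collections import Counter


def preliminary_stage(reverse_parts, atoms, keep):
    """L0/L1 sketch: score documents by how many query atoms they match; keep the top `keep`."""
    hits = Counter()
    used_nodes = set()
    for node_id, part in enumerate(reverse_parts):
        for atom in atoms:
            if atom in part:
                used_nodes.add(node_id)
                hits.update(part[atom])  # one hit per document in the record list
    ranked = sorted(hits, key=lambda d: (-hits[d], d))
    return ranked[:keep], used_nodes


def final_stage(forward_parts, candidates, atoms, keep):
    """L2 sketch: re-score each candidate on the node whose forward index holds it."""
    scores = {}
    used_nodes = set()
    for node_id, part in enumerate(forward_parts):
        for doc_id in candidates:
            if doc_id in part:
                used_nodes.add(node_id)
                words = part[doc_id]
                scores[doc_id] = sum(words.count(a) for a in atoms) / len(words)
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:keep], used_nodes
```

Since final_stage only ever scores the identifications handed to it by preliminary_stage, the second
group of documents is necessarily a subset of the first, while the set of nodes it touches can be
larger than the set used for matching.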
In one embodiment, the overall process can involve receiving a search query. One or more atoms in the
search query are identified, and once each fragment knows the one or more atoms, a first group of nodes
is identified in the fragment, each of which contains at least one of the one or more atoms from the
search query. For example, each node in the first group of nodes sends a first group of documents (for
example, document identifications) to the segment root so that the segment root can combine the results
(for example, by deleting duplicates) and merge them. The second group of nodes then sends a second
group of documents to the segment root. Likewise, the segment root combines and merges these results to
produce a final set of documents that is presented to the user in response to the search query.
Turning to Fig. 6, a flow chart illustrates a method 600, in accordance with an embodiment of the
invention, for generating a mixed distribution system for a multi-process document retrieval system.
At step 610, an indication of a set of documents is received; the set of documents is assigned to the
fragment that receives it. The fragment includes multiple nodes (for example, ten, forty, or fifty).
The set of documents is indexed by atom to generate a reverse index, shown at step 612. At step 614,
the set of documents is indexed by document to generate a forward index. At step 616, a portion of the
reverse index and a portion of the forward index are assigned to each node that makes up the fragment.
In embodiments, each node is assigned a different portion of the reverse and forward indexes, so that a
particular atom is indexed in the reverse index of only one node in the fragment.
In embodiments, an indication of one or more atoms identified from a search query is received at the
fragment. A first group of nodes is identified whose reverse index portions contain at least one of the
one or more atoms. Each of these nodes can perform various ranking functions. A first group of
documents is identified based on the reverse index portions of the first group of nodes. Each node in
the first group of nodes can produce a first group of documents and send it to the segment root, so
that the groups from the various nodes can be combined and merged. In one example, the first group of
documents is produced by the preliminary ranking process of a multistage ranking process, using the
reverse index portions stored on the nodes. Further, a second group of nodes can then be identified
whose forward index portions have indexed one or more of the document identifications corresponding to
the first group of documents. A second group of documents can then be identified based in part on the
data stored in the forward index; features can be computed in real time rather than taken from
precomputed scores. The second group of documents can be identified by the final ranking process of the
multistage ranking process, which uses the forward index. Once the second groups of documents from the
individual nodes in the second group of nodes have been combined and merged, the result is also merged
with the second groups of documents from all other fragments to form a final set of documents, which is
returned to the user as the search results.
Fig. 7 is a flow chart showing a method 700, in accordance with an embodiment of the invention, for
using a mixed distribution system to identify relevant documents based on a search query. Initially, at
step 710, a search query is received. In one embodiment, the query is supplemented or altered, for
example by using a spelling correction tool or stemming. The atoms in the search query are identified
at step 712. At step 714, the atoms are sent to the various fragments. Each fragment is assigned a set
of documents, and that set is indexed both by atom and by document to form the reverse index and the
forward index stored at the fragment. Each fragment includes multiple nodes, and each node is assigned
a portion of the reverse index and of the forward index. At step 716, a first group of nodes is
identified whose reverse index portions contain at least one of the atoms from the search query. At
step 718, the reverse index portion of each node in the first group of nodes is accessed to identify a
first group of relevant documents. Based on the document identification associated with each document
in the first group of documents, a second group of nodes is identified at step 720. Each node in the
second group of nodes stores at least one of the document identifications in its respective forward
index portion, so that the node can perform the ranking process on each such document. At step 722, the
forward index portion at each node in the second group of nodes is accessed to limit the number of
relevant documents. In one embodiment, each document in the second group of documents is also contained
in the first group of documents. Based on the second group of documents, search results are generated
(for example, by compiling the second groups of documents from multiple fragments) and presented to the
user.
Referring now to Fig. 8, a method 800 is provided for assigning a segment root to each fragment of a
main body root. In embodiments such as those described above, the segment root can take the form of a
preliminary segment root and a final segment root. For example, the preliminary segment root can be
selected based on information known at the time (such as the current or expected load of each node in
the fragment), or can even be selected randomly, for example according to a round-robin schedule. The
final segment root is selected based on a number of factors, which are discussed in more detail below.
Generally, a segment root serves as the federation point for a query and as the aggregation point for
results from the federated processes. The process of selecting the preliminary and final segment roots
(such as segment root 322 shown in Fig. 3) may be dynamic. For example, the process of assigning
preliminary and final segment roots can occur for each received query, simultaneously across multiple
fragments. The node selected as the final segment root is the one considered optimal for final query
assembly.
As shown in Fig. 3 herein, each segment root includes multiple nodes. Because of space constraints,
three nodes are illustrated for each of segment root 320 and segment root 332. Segment root 320
includes node 322, node 324, and node 326. Ellipsis 328 indicates that more than three nodes are
contemplated within the scope of the invention. Segment root 332 includes node 334, node 336, and node
338. Because any number of nodes can make up a segment root, ellipsis 340 indicates any additional
number of nodes. As mentioned, each node is a machine or computing device capable of performing
multiple calculations, such as ranking functions. For example, in one embodiment, each node includes an
L01 adaptation 322A and an L2 sorting unit 322B, as shown in node 322. Similarly, node 334 includes an
L01 adaptation 334A and an L2 sorting unit 334B. These are described in more detail above, but the L0
matching stage and the L1 ranking stage (the preliminary ranking stage) of the overall ranking process
can be combined and referred to as the L01 adaptation. Because each node includes an L01 adaptation and
an L2 sorting unit, each node must also store a portion of the reverse index and of the forward index,
since in one embodiment the L01 adaptation uses the reverse index and the L2 sorting unit uses the
forward index. As mentioned, each node can be assigned a portion of the reverse and forward indexes
belonging to its fragment. The fragment communication bus 330 associated with fragment 314 and the
fragment communication bus 342 associated with fragment 316 allow each node to communicate, for
example, with the segment root when necessary.
Returning to Fig. 8, a search query is initially received at step 810. With reference to Fig. 3, the
search query can be received at the main body root 312, which then distributes the search query or
portions of it to each fragment. At step 812, a group of nodes in the fragment is identified based on
whether each node will be used to resolve the search query. As mentioned, each node is assigned a set
of documents from which the reverse index and the forward index are generated. Thus, depending on the
atoms in the search query, some nodes will be used for a particular query and some will not. The group
of nodes identified at step 812 is identified based on a hash of the nodes that will be used to resolve
the particular query, or of the atoms in the query. A hash function takes the words or atoms in the
search query and determines which nodes store a record list (posting list) for each particular word or
atom. This allows the nodes that will be used to resolve the particular search query to be identified.
A record list is simply a list of a word and the documents that contain that word. An algorithm can be
used for the hash function. Exemplary algorithms include MD5 and CRC, but others are certainly
contemplated within the scope of the invention.
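By way of example only, a hash-based lookup of this kind could be sketched in Python with the standard
hashlib and zlib modules. The modulo mapping from hash value to node number, and the function names
node_by_md5, node_by_crc, and nodes_for_query, are assumptions introduced for the sketch; the
embodiments do not specify how a hash value is turned into a node.

```python
import hashlib
import zlib


def node_by_md5(atom, num_nodes):
    """Map an atom to a node number using the first 8 bytes of its MD5 digest."""
    digest = hashlib.md5(atom.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def node_by_crc(atom, num_nodes):
    """Map an atom to a node number using its CRC-32 checksum."""
    return zlib.crc32(atom.encode("utf-8")) % num_nodes


def nodes_for_query(atoms, num_nodes, locate=node_by_md5):
    """Return the sorted node numbers expected to hold record lists for the query atoms."""
    return sorted({locate(atom, num_nodes) for atom in atoms})
```

Either function is deterministic, so any component that knows the number of nodes can compute where an
atom's record list is stored without consulting a directory, provided the same function was used when
the record lists were assigned.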
At step 814, a preliminary segment root is selected from the group of nodes. In one embodiment, the
preliminary segment root is selected randomly, for example in round-robin fashion. In another
embodiment, however, the preliminary segment root is selected based on expected load, so that the node
with the lowest expected load is selected as the preliminary segment root. In one example, this can be
the node with the lowest recorded number of outstanding requests (that is, the current load), so that
the node with the lowest current load of unfinished queries is selected as the preliminary segment
root. Once the preliminary segment root has been selected from the group of nodes, statistics for each
node in the group are received at the preliminary segment root, shown at step 816. In one embodiment,
the preliminary segment root requests that this data be sent, for example by way of a communication bus
that connects each of the nodes. The statistics can indicate each node's capability to serve as the
final segment root. The final segment root is responsible for assembling the query execution results
from the group of nodes for the search query. By way of example only, the statistics can include the
length of each node's record list, the input/output load, a problem signal associated with a particular
node, or the amount of data that would need to be transferred to the final segment root. Generally, the
node selected is the one considered to incur the least cost (for example, time or money) when serving
as the final segment root.
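The two ways of choosing the preliminary segment root mentioned for step 814 can be sketched in Python
as follows. This is an illustrative sketch only; the function names pick_least_loaded and round_robin,
and the use of a plain dictionary of outstanding request counts, are assumptions made for the example.

```python
from itertools import cycle


def pick_least_loaded(outstanding):
    """Pick the node with the fewest outstanding requests.

    outstanding maps a node name to its number of outstanding requests;
    ties are broken by node name so the choice is deterministic.
    """
    return min(outstanding, key=lambda node: (outstanding[node], node))


def round_robin(node_names):
    """Return an endless iterator that hands out the nodes in turn."""
    return cycle(sorted(node_names))
```

For instance, pick_least_loaded would be called once per received query with the load figures known at
that moment, whereas a single round_robin iterator would be kept for the fragment and advanced with
next() for each query.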
For example, if a particular node has an extremely long record list, a large amount of data would need
to be transferred across the network to the final segment root so that the final segment root can
aggregate the query extraction data from all of the nodes. In one embodiment, the particular node that
has the large amount of data to transfer can itself be selected as the final segment root, so that its
data does not need to be sent to another node, a transfer that would incur a high data transfer cost.
As mentioned previously, a node that emits a problem signal may have some problem. Many types of
auxiliary signals can indicate that a node's performance in retrieving data is impaired. Also as
mentioned, input/output load can be considered when selecting the final segment root. For example, this
can cover the length of the node's queue for retrieving data from the hard disk. Further, if there are
three nodes whose record lists include the word "dog", then when a query that also includes the word
"dog" is received, the node with the lowest load can be selected, because it will have more time to
serve as the final segment root. Other factors, including bandwidth, can also be taken into account
when determining the final segment root. In one example, the preliminary segment root actually executes
the algorithm that determines the final segment root.
At step 818, the final segment root is selected from the group of nodes by an algorithm based on the
statistics. In some embodiments the preliminary segment root and the final segment root are the same
node, while in other embodiments they are different nodes. An algorithm can be used to determine which
node is to serve as the final segment root for a particular query. The algorithm takes into account the
statistics described above. At step 820, the group of nodes is notified of the identity of the final
segment root so that the nodes know where to send their respective query execution results. In one
embodiment, the preliminary segment root itself takes on the task of handing its role over to the final
segment root or, if it is itself selected as the final segment root, the preliminary segment root can
inform the other nodes that it is now the final segment root.
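One possible form of the selection algorithm of step 818 is sketched below in Python. It is a
hypothetical cost model, not the algorithm of the embodiments: the statistics fields bytes_to_send,
io_queue, and problem_signal, the weighting parameter io_weight, and the function names are all
assumptions chosen to reflect the factors listed above.

```python
def transfer_cost(candidate, stats):
    """Bytes that every other node would have to send if `candidate` were the final root."""
    return sum(s["bytes_to_send"] for node, s in stats.items() if node != candidate)


def pick_final_root(stats, io_weight=1.0):
    """Pick the node that minimizes network transfer plus a weighted I/O queue penalty.

    Nodes raising a problem signal are skipped unless every node raises one.
    """
    healthy = [node for node, s in stats.items() if not s["problem_signal"]]
    candidates = healthy or list(stats)
    return min(
        candidates,
        key=lambda node: (transfer_cost(node, stats) + io_weight * stats[node]["io_queue"], node),
    )
```

Under this model a node with a very long record list tends to win, because choosing it removes its own
large transfer from the total, which matches the example given above; raising io_weight shifts the
choice toward lightly loaded nodes instead.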
In embodiments, the search query is executed using the previously identified group of nodes. As
described herein, a first group of nodes can be identified as participating in the preliminary ranking
stage (for example, using the reverse index stored on the nodes), and a second group of nodes can be
identified as participating in the final ranking stage (for example, using the forward index stored on
the nodes). The final segment root can collect and aggregate data from both the preliminary and final
ranking stages. For example, the nodes involved in the preliminary ranking stage return lists of
documents that contain certain words or atoms in the search query. The nodes involved in the final
ranking stage return the documents most relevant to the search query, for example as document
identifications. Thus query execution results can refer to results from the preliminary ranking stage,
the final ranking stage, or both. Further, a multistep ranking process need not be used. In examples
where there is a single ranking or search process, a single set of results is collected and aggregated
by the final segment root.
The method described in Fig. 8 can be implemented as a system. For example, various system components
can be used to select the preliminary and final segment roots. For purposes of illustration only, these
components can include a preliminary segment root selection component, a statistics receiving
component, a final segment root selection component, and a query execution component. These components
can communicate with one another over a network such as network 208 shown in Fig. 2. The preliminary
segment root selection component is responsible for gathering data and performing the hash calculation
to determine which nodes of the particular fragment will be used to execute the search query. As
described, each node has a record list and a portion of the search index. The record list includes
atoms and the documents in which those atoms appear. The statistics receiving component can request and
receive, from the nodes, statistics that indicate each node's availability and capability to serve as
the final segment root. The final segment root selection component is responsible for selecting the
final segment root according to the statistics received from the nodes that will be used to execute the
search query. In one embodiment, the preliminary segment root makes this determination using an
algorithm. Finally, the query execution component distributes the search query or portions of it to the
final segment root, which then distributes the search query portions to the appropriate nodes in the
fragment. Each node determines which documents are most relevant to the query or to its portion of the
query and sends that data to the final segment root by way of, for example, a communication bus. As
described, this process occurs at multiple fragments simultaneously.
Turning to Fig. 9, a method 900 for selecting a segment root from multiple nodes is illustrated.
Initially, at step 910, a search query is received at a fragment that includes multiple nodes. As
described, there are multiple fragments (for example, several hundred) that each execute the search
query simultaneously. Each fragment includes multiple nodes, each of which is assigned a portion of the
documents indexed in one or more search indexes. Thus each node stores a portion of the reverse index
and of the forward index, which are indexed by atom and by document, respectively. At step 912, a group
of nodes from the multiple nodes is identified as those that will be used to execute the received
search query. The determination of which nodes to use can be made, for example, with a hash function.
It can depend on the index stored on each node, so that the nodes whose indexes store the particular
words or atoms in the search query are identified as those to be used to resolve the particular search
query. At step 914, before the search query is executed, a preliminary segment root is selected from
the multiple nodes. The selection of the preliminary segment root is based on the expected load of each
node, the current load of each node, random selection, and the like. The information available at the
time the preliminary segment root is selected is used for the selection. The preliminary segment root
serves in that role until the final segment root has been selected.
Statistics are received at the selected preliminary segment root at step 916. Before being received,
the statistics may be requested from each node in the group of nodes that will be used to execute the
search query; they are likewise received at the preliminary segment root. As mentioned previously, the
statistics can include one or more of: the length of each node's record list or other data that would
have to be transferred across the network, the input/output load on the node (such as how long the
queue for retrieving data from the hard disk is), a problem signal associated with the node, cost, and
the like. At step 918, the final segment root is selected based on the received statistics. During
query execution, the final segment root collects and aggregates query execution data from the group of
nodes. In one embodiment, the final segment root receives the search query from an external source such
as a server. The final segment root can also, or alternatively, receive the search query from a main
body root, such as main body root 312 shown in Fig. 3. At step 920, the search query is executed. As
mentioned, there may be one or more stages of query execution, such as a preliminary ranking stage and
a final ranking stage.
Fig. 10 illustrates a method 1000 for selecting a segment root. At a main body root having one or more
fragments, a search query is received at step 1010. Each fragment in the main body root has multiple
nodes, each of which has a portion of the search index stored on it. For example, each node can store a
portion of a reverse index organized by atom and a portion of a forward index organized by document.
This may be the case when multiple ranking or search stages are used to provide search results based on
the search query. At step 1012, a group of nodes that will be used to execute the received search query
is identified in each fragment. Multiple fragments perform the process of selecting preliminary and
final segment roots simultaneously. Further, this process can occur for each search query received at
the main body root. At step 1014, for each fragment in the main body root, a preliminary segment root
is identified from the group of nodes. At step 1016, statistics are requested, for example by the
preliminary segment root, from each node in the group of nodes that will be used to execute the
received search query. The statistics include any data that indicates a node's capability or
availability to be selected as the final segment root. The overall goal is to transfer as little data
as possible over the network, for example from the nodes to the final segment root. The most
cost-efficient node is therefore selected as the final segment root. In some examples the overall goal
may lean more toward load balancing, while in other examples the preliminary segment root may favor
network capacity.
At step 1018, statistics are received at the preliminary segment root from each node in the group of
nodes. As described, the statistics indicate each node's availability to serve as the final segment
root, which collects query execution data from the group of nodes in its respective fragment. In one
embodiment, the statistics and other data are transferred from the nodes to the preliminary or final
segment root using a communication bus. At step 1020, a final segment root is selected for each
fragment based on the statistics. At step 1022, the search query is executed. In some embodiments,
before the search query is executed, a notification identifying the final segment root is sent to the
multiple nodes, or at least to the identified group of nodes, so that the nodes know where to send
their respective data, including query execution data. After the query has been executed, the final
segment root receives query execution data from each node in the group of nodes.
The present invention has been described in relation to particular embodiments, which are intended in
all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent
to those of ordinary skill in the art to which the present invention pertains without departing from
its scope.
From the foregoing, it will be seen that this invention is well adapted to attain all the ends and
objects set forth above, together with other advantages that are obvious and inherent to the system
and method. It will be understood that certain features and subcombinations are of utility and may be
employed without reference to other features and subcombinations. This is contemplated by and is
within the scope of the claims.