CN102693274B

CN102693274B - Dynamic query master agent for query execution

Info

Publication number: CN102693274B
Application number: CN201210079487.8A
Authority: CN
Inventors: K.M. Risvik; M. Hopcroft; K. Kalyanaraman; T. Chilimbi; H. Setiawan; C.W. Anderson
Original assignee: Microsoft Technology Licensing LLC
Current assignee: Microsoft Technology Licensing LLC
Priority date: 2011-03-25
Filing date: 2012-03-23
Publication date: 2017-08-15
Anticipated expiration: 2032-03-23
Also published as: CN102693274A

Abstract

A preliminary segment root and a final segment root are selected for each segment. Each time a search query is received, a group of nodes in each segment that will be used to resolve the search query is identified. A preliminary segment root is selected from the group of nodes. Using an algorithm, the preliminary segment root selects a final segment root based on statistical data from each node in the group, the statistical data indicating each node's capability to act as the final segment root that assembles query execution data. The identity of the final segment root is communicated to the other nodes in the group of nodes.

Description

Dynamic query master agent for query execution

Cross-reference to related applications

This application is a continuation-in-part of U.S. Application No. 12/951,815 (Attorney Docket No. MFCP.157166), entitled "HYBRID-DISTRIBUTION MODEL FOR SEARCH ENGINE INDEXES" and filed November 22, 2010, the entire contents of which are incorporated herein by reference.

Background

The amount of information and content available on the Internet continues to grow very rapidly. Given the vast amount of information, search engines have been developed to facilitate searching for electronic documents. In particular, users may search for information and documents by entering search queries comprising one or more terms that may be of interest to them. After receiving a search query from a user, a search engine identifies relevant documents and/or web pages based on the search query. Because of its utility, web searching, that is, the process of finding relevant web pages and documents for user-issued search queries, has arguably become one of the most popular services on the Internet today.

Further, search engines typically use a one-step process that utilizes a search index to identify relevant documents to return to a user based on a received search query. Search engine ranking functions, however, have become very complex functions that can be both time-consuming and expensive if used for every document that is indexed. Additionally, storing the data needed for these complex formulas can also present problems, especially when the data is stored in reverse indexes, which are typically indexed by words or phrases. Extracting the relevant data needed for the complex formulas is inefficient when that data is stored in reverse indexes.

Summary

This summary is provided to introduce, in a simplified form, a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Embodiments of the present invention relate to using both atom-sharded and document-sharded distribution across the same set of nodes, such that each node or machine stores both a portion of a reverse index (e.g., sharded by atom) and a portion of a forward index (e.g., sharded by document). A segment may be assigned a group of documents for which it is responsible. The group of documents is indexed both by atom and by document, so that there is a reverse index and a forward index associated with that group of documents. Each segment comprises multiple nodes, and each node may be assigned a different portion of both the reverse and forward indexes. Further, each node is responsible for performing multiple ranking calculations using the portions of both the reverse and forward indexes stored thereon. For instance, a preliminary ranking process may utilize the reverse index and a final ranking process may utilize the forward index. Together, these ranking processes form an overall ranking process that is used to identify the most relevant documents based on a received search query.

Other embodiments of the present invention are directed to the selection of a preliminary segment root and a final segment root. Generally, the preliminary segment root is selected based on whatever information is known at the time of its selection, and it is used only temporarily until the final segment root is selected. In embodiments, the preliminary segment root uses an algorithm to select the final segment root based on statistical data received from the various nodes or machines that make up the segment. As will be explained in further detail herein, many segments are used to resolve a search query, each segment comprising multiple nodes or machines. The preliminary segment root is selected only from those nodes whose search indexes contain terms or atoms that are present in the received search query. This group of nodes includes only the nodes that will be used to execute the particular search query. Once more information is available, such as input/output load, current and expected load (including query queues), trouble signals associated with a node, and the like, the final segment root is selected so as to minimize the amount of data transferred across the network, thereby reducing the overall cost of executing the search query.

Brief description of the drawings

The present invention is described in detail below with reference to the accompanying drawings, in which:

Fig. 1 is a block diagram of an exemplary computing environment suitable for use in implementing embodiments of the present invention;

Fig. 2 is a block diagram of an exemplary system in which embodiments of the present invention may be employed;

Fig. 3 is an exemplary diagram of a hybrid-distribution system, in accordance with embodiments of the present invention;

Fig. 4 is an exemplary diagram of a hybrid-distribution system illustrating payload requirements, in accordance with embodiments of the present invention;

Fig. 5 is a flow chart showing a method for utilizing a hybrid-distribution system to identify relevant documents based on a search query, in accordance with embodiments of the present invention;

Fig. 6 is a flow chart showing a method for generating a hybrid-distribution system for a multiprocess document-retrieval system, in accordance with embodiments of the present invention;

Fig. 7 is a flow chart showing a method for utilizing a hybrid-distribution system to identify relevant documents based on a search query, in accordance with embodiments of the present invention; and

Figs. 8-10 are flow charts showing various methods for identifying a segment root from a plurality of nodes, in accordance with embodiments of the present invention.

Detailed description

The subject matter of the present invention is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described herein, in conjunction with other present or future technologies. Moreover, although the terms "step" and/or "block" may be used herein to connote different elements of the methods employed, the terms should not be interpreted as implying any particular order among or between the various steps disclosed herein unless and except when the order of individual steps is explicitly described.

As noted above, embodiments of the present invention provide for the nodes that form a segment to each store a portion of the forward index and of the reverse index for that segment. For instance, out of the total quantity of documents to be indexed (e.g., 1,000,000,000,000), each segment may be assigned some portion of the documents, such that the segment is responsible for indexing those documents and performing ranking calculations for them. The reverse and forward indexes stored at a particular segment are the complete reverse and forward indexes for the documents assigned to that segment. Each segment comprises multiple nodes, which are essentially machines or computing devices with storage capabilities. A separate portion of the reverse index and of the forward index is assigned to each node in the segment, so that each node can be used to perform various ranking calculations. As such, each node has stored thereon a subset of the segment's reverse index and forward index, and is responsible for accessing each in the various ranking processes within the segment. For instance, an overall ranking process may comprise a matching stage, a preliminary ranking stage, and a final ranking stage. The matching/preliminary stage may require that those nodes whose reverse index portions have indexed a certain atom from the search query be used to identify a first set of documents relevant to the search query. The first set of documents is drawn from the group of documents assigned to the segment. Subsequently, those nodes whose forward index portions have indexed the document identifications associated with the documents in the first set may be used to identify a second set of documents that are even more relevant to the search query. In one embodiment, the second set of documents is a subset of the first set of documents. This overall process limits a set of documents to those found to be relevant, so that the final ranking process, which is typically more time-consuming and costly than the preliminary ranking process, ranks fewer documents than it would if it were used to rank every document in the index, whether relevant or not.

Accordingly, in one aspect, an embodiment of the present invention is directed to one or more computer storage media storing computer-useable instructions that, when used by a computing device, cause the computing device to perform a method for utilizing a hybrid-distribution system to identify relevant documents based on a search query. The method includes allocating a group of documents to a segment, the group of documents being indexed by atom in a reverse index and indexed by document in a forward index, and storing a different portion of the reverse index and of the forward index on each of a plurality of nodes that form the segment. Further, the method includes accessing the reverse index portion stored on each of a first set of nodes to identify a first set of documents that are relevant to the search query. The method additionally includes, based on document identifications associated with the first set of documents, accessing the forward index portion stored on each of a second set of nodes to limit the quantity of relevant documents in the first set of documents to a second set of documents.

In another embodiment, an aspect of the invention is directed to one or more computer storage media storing computer-useable instructions that, when used by a computing device, cause the computing device to perform a method for generating a hybrid-distribution system for a multiprocess document-retrieval system. The method includes receiving an indication of a group of documents assigned to a segment, the segment comprising a plurality of nodes. For the segment, the method further includes indexing the assigned group of documents by atom to generate a reverse index and indexing the assigned group of documents by document to generate a forward index. The method additionally includes assigning a portion of the reverse index and a portion of the forward index to each of the plurality of nodes that form the segment, such that each of the plurality of nodes has stored a different portion of the forward index and a different portion of the reverse index.

A further embodiment of the present invention is directed to one or more computer storage media storing computer-useable instructions that, when used by a computing device, cause the computing device to perform a method for utilizing a hybrid-distribution system to identify relevant documents based on a search query. The method includes receiving a search query, identifying one or more atoms in the search query, and communicating the one or more atoms to a plurality of segments, each of which has been assigned a group of documents that is indexed both by atom and by document, such that a reverse index and a forward index are generated and stored at each of the plurality of segments. Each of the plurality of segments comprises a plurality of nodes, and each node is assigned a portion of the forward index and of the reverse index. Based on the one or more atoms, the method identifies a first set of nodes at a first segment whose reverse index portions contain at least one of the one or more atoms from the search query. Further, the method includes accessing the reverse index portion stored at each of the first set of nodes to identify a first set of documents found to be relevant to the one or more atoms and, based on document identifications associated with the first set of documents, identifying a second set of nodes whose forward index portions contain one or more of the document identifications associated with the first set of documents. The method also includes accessing the forward index portion stored at each of the second set of nodes to identify a second set of documents that is a subset of the first set of documents.
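
The two node-selection steps just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the node layout (a dict with "reverse" and "forward" keys), the function name resolve_in_segment, the requirement that a document match every query atom, and the term-frequency score are all hypothetical choices made for the example.

```python
def resolve_in_segment(nodes, query_atoms, top_k):
    """Sketch of the reverse-then-forward lookup inside one segment.

    Each node is assumed (hypothetically) to look like
    {"reverse": {atom: set_of_doc_ids}, "forward": {doc_id: list_of_atoms}}.
    """
    # First set of nodes: reverse-index portion holds at least one query atom.
    first_nodes = [n for n in nodes if any(a in n["reverse"] for a in query_atoms)]

    # First set of documents: here, documents that contain every query atom.
    postings = []
    for atom in query_atoms:
        docs = set()
        for node in first_nodes:
            docs |= node["reverse"].get(atom, set())
        postings.append(docs)
    first_docs = set.intersection(*postings) if postings else set()

    # Second set of nodes: forward-index portion holds a matched document id.
    second_nodes = [n for n in nodes if n["forward"].keys() & first_docs]

    # Second set of documents: a top_k subset, scored from forward-index data.
    scored = []
    for node in second_nodes:
        for doc_id in node["forward"].keys() & first_docs:
            atoms = node["forward"][doc_id]
            hits = sum(atoms.count(a) for a in query_atoms)
            scored.append((-hits / max(len(atoms), 1), doc_id))
    second_docs = [doc_id for _, doc_id in sorted(scored)[:top_k]]
    return first_docs, second_docs
```

Note that a node may appear in the first set, the second set, both, or neither, depending only on which index portions it happens to hold.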

In other embodiments of the present invention, a segment root is selected from the nodes that will be used to execute a particular query, such as those nodes whose search indexes include terms or atoms that are present in the search query. Initially, a preliminary segment root is selected and is used temporarily to collect data from the other nodes. In one embodiment, the preliminary segment root collects and aggregates statistical data from the other nodes that will be used to execute the search query. The preliminary segment root uses an algorithm to determine which node is best suited to act as the final segment root for the particular query. For instance, the node that has the most data to transmit may be a good choice for the final segment root, since that large amount of data would then not need to be transferred to another node for aggregation. In one embodiment, the primary concern in selecting the final segment root is cost, such as the timeliness and ease of transferring data from a node to the final segment root. For instance, when the nodes involved in executing a particular query identify documents relevant to the search query, this data must be transferred to the final segment root, which aggregates the information from many nodes. The final segment root then transfers the aggregated query execution data from the many nodes to another component, which assembles similar data from multiple segments. As such, the goal is to select a final segment root that will make the overall query execution process more cost-efficient.
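
As a rough illustration of this cost reasoning, the sketch below picks the final segment root that minimizes the number of result bytes the other nodes would have to send, plus a penalty for queued work. The patent does not specify the algorithm, so the NodeStats fields, the queue_penalty_bytes weight, and the tie-breaking rule are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical per-node statistics reported to the preliminary root."""
    node_id: str
    result_bytes: int     # estimated size of this node's query results
    queue_length: int     # queries already waiting on this node
    healthy: bool = True  # False if a trouble signal has been raised


def transfer_cost(candidate, stats):
    # If `candidate` is the root, every other node ships its results to it.
    return sum(s.result_bytes for s in stats if s.node_id != candidate.node_id)


def select_final_root(stats, queue_penalty_bytes=1000):
    """Return the node_id with the lowest estimated aggregation cost."""
    eligible = [s for s in stats if s.healthy]
    if not eligible:
        raise ValueError("no healthy node can act as final segment root")

    def total_cost(s):
        # Assumed cost model: network bytes plus a per-queued-query penalty.
        return transfer_cost(s, stats) + queue_penalty_bytes * s.queue_length

    # Ties are broken by node_id so the choice is deterministic.
    return min(eligible, key=lambda s: (total_cost(s), s.node_id)).node_id
```

With idle, healthy nodes this reduces to the heuristic mentioned above: the node holding the most result data wins, because choosing it leaves the least data to move.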

As such, one aspect is directed to one or more computer storage media storing computer-useable instructions that, when used by a computing device, cause the computing device to perform a method for assigning segment roots. The method includes receiving a search query, identifying a group of nodes in a segment that will be used to resolve the search query, and selecting a preliminary segment root from the group of nodes. Further, the method includes receiving, at the preliminary segment root, statistical data from each node in the identified group of nodes, the statistical data indicating each node's capability to act as a final segment root, which is responsible for assembling query execution results from the group of nodes based on the search query. The method additionally includes algorithmically selecting the final segment root from the group of nodes based on the statistical data, and communicating the identity of the final segment root to the group of nodes so that the nodes know where to send their respective query execution results.

A second aspect is directed to one or more computer storage media storing computer-useable instructions that, when used by a computing device, cause the computing device to perform a method for assigning segment roots. The method includes receiving, at a segment comprising a plurality of nodes, a search query to be executed, and identifying, from the plurality of nodes, a group of nodes that will be used to execute the search query. Prior to executing the search query, the method includes selecting a preliminary segment root from the plurality of nodes, the selection being based on one or more of an expected load of each node or a random selection. Further, the method includes receiving, at the preliminary segment root, statistical data from each node in the group of nodes that will be used to execute the search query. The statistical data includes current load and cost data associated with sending data across the network. Based on the statistical data, a final segment root is selected from the group of nodes; the final segment root will aggregate query execution data during execution of the query. The search query is then executed.
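
A minimal sketch of the preliminary selection step follows, assuming only what the paragraph above states: the choice may rest on expected load, on a random pick, or both. The function name, the shape of expected_load (a dict from node identifier to a load estimate), and the optional rng parameter are hypothetical.

```python
import random


def select_preliminary_root(node_ids, expected_load=None, rng=None):
    """Pick a temporary root before any per-query statistics exist.

    If expected-load estimates are supplied, the least-loaded node is chosen;
    otherwise the choice is random, as the text permits.
    """
    if not node_ids:
        raise ValueError("at least one node is required")
    if expected_load:
        # Nodes without an estimate are treated as fully loaded (assumption).
        return min(node_ids, key=lambda n: (expected_load.get(n, float("inf")), n))
    rng = rng or random.Random()
    return rng.choice(list(node_ids))
```

Because the preliminary root is replaced as soon as statistics arrive, a cheap rule such as this one is sufficient; the more careful cost comparison is reserved for the final segment root.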

A third aspect is directed to one or more computer storage media storing computer-useable instructions that, when used by a computing device, cause the computing device to perform a method for assigning segment roots. The method includes receiving a search query at a corpus root that comprises a plurality of segments. Each of the plurality of segments comprises a plurality of nodes. Each node has a portion of a search index stored thereon. The method further includes identifying, in each segment, a group of nodes that will be used to execute the received search query, and identifying, for each of the plurality of segments, a preliminary segment root from the group of nodes. Further, statistical data is requested from each node in the group of nodes that will be used to execute the received search query. The statistical data is received from each node in the group of nodes and indicates each node's availability to act as a final segment root, which collects query execution data from the group of nodes in its respective segment. The method additionally includes selecting the final segment root for each segment based on the statistical data, and executing the search query.

Having briefly described an overview of embodiments of the present invention, an exemplary operating environment in which embodiments of the present invention may be implemented is described below in order to provide a general context for various aspects of the present invention. Referring initially to Fig. 1 in particular, an exemplary operating environment for implementing embodiments of the present invention is shown and designated generally as computing device 100. Computing device 100 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should computing device 100 be interpreted as having any dependency or requirement relating to any one or combination of the components illustrated.

The invention may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program modules, including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks or implements particular abstract data types. The invention may be practiced in a variety of system configurations, including handheld devices, consumer electronics, general-purpose computers, more specialized computing devices, and so on. The invention may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.

With reference to Fig. 1, computing device 100 includes a bus 110 that directly or indirectly couples the following devices: memory 112, one or more processors 114, one or more presentation components 116, input/output (I/O) ports 118, input/output components 120, and an illustrative power supply 122. Bus 110 represents what may be one or more buses (such as an address bus, a data bus, or a combination thereof). Although the various blocks of Fig. 1 are shown with lines for the sake of clarity, in reality, delineating the various components is not so clear, and, metaphorically, the lines would more accurately be grey and fuzzy. For example, one may consider a presentation component such as a display device to be an I/O component. Also, processors have memory. The inventors recognize that such is the nature of the art, and reiterate that the diagram of Fig. 1 is merely illustrative of an exemplary computing device that can be used in connection with one or more embodiments of the present invention. No distinction is made between such categories as "workstation," "server," "laptop," "handheld device," and the like, as all are contemplated within the scope of Fig. 1 and the reference to "computing device."

Computing device 100 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 100 and include both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by computing device 100. Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.

Memory 112 includes computer storage media in the form of volatile and/or nonvolatile memory. The memory may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard drives, optical-disc drives, and the like. Computing device 100 includes one or more processors that read data from various entities such as memory 112 or I/O components 120. Presentation component(s) 116 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, and the like.

I/O ports 118 allow computing device 100 to be logically coupled to other devices, including I/O components 120, some of which may be built in. Illustrative components include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, and the like.

Referring now to Fig. 2, a block diagram is provided illustrating an exemplary system 200 in which embodiments of the present invention may be employed. It should be understood that this and other arrangements described herein are set forth only as examples. Other arrangements and elements (e.g., machines, interfaces, functions, orders, and groupings of functions) can be used in addition to or instead of those shown, and some elements may be omitted altogether. Further, many of the elements described herein are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Various functions described herein as being performed by one or more entities may be carried out by hardware, firmware, and/or software. For instance, various functions may be carried out by a processor executing instructions stored in memory.

Among other components not shown, system 200 includes a user device 202, a segment 204, and a hybrid-distribution system server 206. Each of the components shown in Fig. 2 may be any type of computing device, such as computing device 100 described with reference to Fig. 1. The components may communicate with each other via a network 208, which may include, without limitation, one or more local area networks (LANs) and/or wide area networks (WANs). Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet. It should be understood that any number of user devices, segments, and hybrid-distribution system servers may be employed within system 200 within the scope of the present invention. Each may comprise a single device or multiple devices cooperating in a distributed environment. For instance, the segment may comprise multiple devices arranged in a distributed environment that collectively provide the functionality of segment 204 described herein. Additionally, other components not shown may also be included within system 200, while components shown in Fig. 2 may be omitted in some embodiments.

User device 202 may be any type of computing device owned and/or operated by a user that can access network 208. For instance, user device 202 may be a desktop computer, a laptop computer, a tablet computer, a mobile device, or any other device having network access. Generally, an end user may employ user device 202 to, among other things, access electronic documents by submitting a search query to a search engine. For instance, the end user may employ a web browser on user device 202 to access and view electronic documents stored in the system.

Segment 204 typically comprises multiple nodes, also referred to as blades. In Fig. 2, two nodes are illustrated: node 1, numbered 210, and node 2, numbered 212. Although two nodes are illustrated in the embodiment of Fig. 2, a segment may comprise many more than two nodes (e.g., 10, 40, 100). Two nodes are illustrated for exemplary purposes only. Each segment, such as segment 204, is assigned a group of documents for which it is responsible. As such, a reverse index and a forward index are generated and stored at segment 204. Although the reverse index and forward index for a particular segment may be generated at segment 204, in an alternative embodiment the indexes are generated at some other location or on some other computing device and are then sent to segment 204. Further, once both indexes have been generated from the group of documents assigned to segment 204, the reverse index and the forward index are each divided into portions. In one embodiment, the number of portions is equal to the number of nodes associated with the particular segment. Thus, where there are forty nodes in a particular segment, both indexes are divided 40 ways, such that each node is responsible for a different portion of each of the reverse index and the forward index. As shown, node 1 has a reverse index portion 214 and a forward index portion 216. Node 2 likewise has a reverse index portion 218 and a forward index portion 220. Although shown separately from the nodes, in one embodiment the indexes are stored on the nodes themselves. In any case, each node is responsible for a portion of the reverse index and of the forward index that are built from the group of documents assigned to the segment.
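
The division of both indexes into one portion per node can be sketched as follows. The text above does not say how atoms or documents are mapped to nodes, so the CRC32-based mapping for atoms, the modulo mapping for integer document identifiers, and the dict-based node layout are assumptions made purely for illustration.

```python
import zlib


def node_for_atom(atom, num_nodes):
    # crc32 is stable across runs, unlike the built-in hash() for strings.
    return zlib.crc32(atom.encode("utf-8")) % num_nodes


def node_for_doc(doc_id, num_nodes):
    # Assumes integer document identifiers.
    return doc_id % num_nodes


def partition(reverse, forward, num_nodes):
    """Split a segment's indexes so each node gets one portion of each.

    reverse: {atom: set_of_doc_ids}   (atom-sharded across nodes)
    forward: {doc_id: list_of_atoms}  (document-sharded across nodes)
    """
    nodes = [{"reverse": {}, "forward": {}} for _ in range(num_nodes)]
    for atom, postings in reverse.items():
        nodes[node_for_atom(atom, num_nodes)]["reverse"][atom] = postings
    for doc_id, atoms in forward.items():
        nodes[node_for_doc(doc_id, num_nodes)]["forward"][doc_id] = atoms
    return nodes
```

Under this layout every atom's posting list lives on exactly one node and every document's forward entry lives on exactly one node, but the two placements are independent of each other.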

As mentioned, an index may be indexed, or sharded, by atom or by document. As used herein, sharding refers to the process of indexing a group of documents, whether by atom or by document. Each approach has advantages and disadvantages when used on its own, without the other. For instance, when sharding by document, the advantages include isolation of processing between shards, such that only merging of results is needed. Further, per-document information is easily aligned with the matching, and network traffic is small. The disadvantages, conversely, include the need for every shard to process any particular query. If the reverse index data is placed on disk, at least O(KN) disk seeks are required for a K-atom query on N shards. When sharding by atom, the advantages include reduced computation, such that only K shards are needed to process a K-atom query. If the reverse index data is placed on disk, O(K) disk seeks are required for a K-atom query. The disadvantages, conversely, include the need for connected processing, such that all shards storing atoms that participate in the query need to collaborate. Network traffic is also significant, and per-document information is not easily managed. Embodiments of the present invention require less management of per-document data than traditional approaches. The reasons for this include that some scores are precomputed and stored in the index (e.g., the reverse index), and that further refinement and filtering of documents occurs after the matching stage (L0). As such, the disadvantages described above are greatly reduced with respect to the management of per-document data.
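
To make the seek counts concrete, the small sketch below evaluates the two bounds for a given query. It is a back-of-the-envelope model only: it assumes exactly one disk seek per atom lookup per shard and ignores constant factors, caching, and atoms that are absent from a shard; the function names are hypothetical.

```python
def disk_seeks_document_sharded(k_atoms, n_shards):
    # Every shard must look up every query atom: O(K * N) seeks.
    return k_atoms * n_shards


def disk_seeks_atom_sharded(k_atoms):
    # Each atom's posting list lives on a single shard: O(K) seeks.
    return k_atoms
```

For a 3-atom query over 40 shards, this model gives 120 seeks under document sharding against 3 under atom sharding, which is the gap the hybrid scheme tries to exploit while avoiding the per-document data problems of pure atom sharding.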

Further, each node in a particular segment is capable of performing various functions, including ranking functions that allow relevant search results to be identified. In some embodiments, the search engine may employ a staged process to select search results for a search query, such as the staged approach described in the U.S. patent application (application number not yet assigned) entitled "MATCHING FUNNEL FOR LARGE DOCUMENT INDEX" (Attorney Docket No. MFCP.157120). Here, each node may be capable of employing multiple stages of an overall ranking process. An exemplary ranking process is described below, but it is only one example of the ranking processes that each node may employ. When a search query is received, an overall ranking process may be employed to pare the quantity of matching documents down to a manageable size. When a search query is received, the search query is analyzed to identify atoms. The atoms are then used during the various stages of the overall ranking process. The first of these stages may be referred to as the L0 stage (matching stage), which queries the search index and identifies an initial set of matching documents that contain the atoms from the search query. This initial process may reduce the number of candidate documents from all documents indexed in the search index to those documents matching the atoms from the search query. For instance, a search engine may search through millions or even trillions of documents to determine those that are most relevant to a particular search query. Once the L0 matching stage is complete, the number of candidate documents is greatly reduced. Many algorithms for locating the most relevant documents, however, are costly and time-consuming. As such, two other stages may be employed: a preliminary ranking stage and a final ranking stage.

The preliminary ranking stage, also referred to as the L1 stage, employs a simplified scoring function to compute a preliminary score or ranking for the candidate documents retained from the L0 matching stage described above. The preliminary ranking component 210 is thus responsible for providing a preliminary ranking for each of the candidate documents retained from the L0 matching stage. Alternatively, candidate documents may be scored, and thus given an absolute number rather than a rank. The preliminary ranking stage is simplified when compared with the final ranking stage in that it employs only a subset of the ranking features used by the final ranking stage. For example, one or more, but in some embodiments not all, of the ranking features used in the final ranking stage are employed by the preliminary ranking stage. In addition, features not employed by the final ranking stage may be employed by the preliminary ranking stage. In embodiments of the present invention, the ranking features used by the preliminary ranking stage do not have atom interdependencies, such as term closeness and term co-occurrence. For example, solely for illustrative purposes, the ranking features used in the preliminary ranking stage may include static features and dynamic atom-isolated components. Generally, static features are those components that look only at features that are independent of the query. Examples of static features include page rank, the spam rating of a particular web page, etc. Dynamic atom-isolated components are components that look only at features related to a single atom at a time. Examples may include BM25f, the frequency of a certain atom in a document, the location (context) of the atom in the document (e.g., title, URL, anchor, header, body, traffic, class, attributes), etc.

Once the number of candidate documents has again been reduced by the preliminary ranking stage, the final ranking stage, also referred to as the L2 stage, ranks the candidate documents provided to it by the preliminary ranking stage. The algorithm used in conjunction with the final ranking stage is a more expensive operation, with a larger number of ranking features, than the one used in the preliminary ranking stage. The final ranking algorithm, however, is applied to a much smaller number of candidate documents. The final ranking algorithm provides a set of ranked documents, and search results are provided in response to the original search query based on that set of ranked documents. In some embodiments, the final ranking stage as described herein may employ a forward index, as described in the U.S. patent application entitled "EFFICIENT FORWARD RANKING IN A SEARCH ENGINE" (application number not yet assigned) (Attorney Docket No. MFCP.157165).
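The three stages can be pictured as a funnel. The following is a minimal sketch only: the function and parameter names are hypothetical, L0 is assumed here to keep documents that contain every query atom, and the two scoring callables stand in for whatever cheap and expensive feature sets an implementation would actually use.

```python
from typing import Callable, Dict, List, Set


def staged_rank(
    query_atoms: List[str],
    reverse_index: Dict[str, Set[int]],
    cheap_score: Callable[[int], float],   # stand-in for the L1 scoring function
    full_score: Callable[[int], float],    # stand-in for the L2 scoring function
    l1_keep: int,
    l2_keep: int,
) -> List[int]:
    # L0 (matching): keep documents containing every atom (an assumption).
    postings = [reverse_index.get(atom, set()) for atom in query_atoms]
    candidates = set.intersection(*postings) if postings else set()
    # L1 (preliminary ranking): cheap score over all L0 candidates.
    l1 = sorted(candidates, key=lambda d: (-cheap_score(d), d))[:l1_keep]
    # L2 (final ranking): expensive score over the much smaller L1 output.
    return sorted(l1, key=lambda d: (-full_score(d), d))[:l2_keep]


index = {"william": {1, 2, 3, 4}, "shakespeare": {2, 3, 4, 5}}
cheap = {2: 0.9, 3: 0.5, 4: 0.7}
full = {2: 0.1, 3: 0.9, 4: 0.8}
top = staged_rank(["william", "shakespeare"], index,
                  lambda d: cheap[d], lambda d: full[d], l1_keep=2, l2_keep=1)
```

In this toy run, document 3 has the highest expensive score but is dropped at L1, which illustrates the trade-off of a staged process: the final stage only ever sees what the cheaper stage lets through.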

Returning to Fig. 2, the hybrid-distribution system server 206 includes a document distribution component 222, a query parsing component 224, a query distribution component 226, and a result merging component 228. The document distribution component 222 is generally responsible for distributing documents across the various segments used in a given ranking system. Solely for illustrative purposes, if there are 100 million documents to be indexed and 100 segments available, each segment may be assigned 1 million documents. Or, on a larger scale, if there are 1 trillion documents to be indexed and 100,000 segments available, each segment may be assigned 10 million documents. Documents may be distributed evenly across the segments, as in the examples above, or may be distributed in a different manner such that the segments are not each responsible for the same number of documents.
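The arithmetic of the two examples can be checked with a short sketch. The helper name is hypothetical, and spreading any remainder over the first few segments is an assumption made only so that the function is defined for totals that do not divide evenly:

```python
from typing import List


def documents_per_segment(total_documents: int, num_segments: int) -> List[int]:
    """Even split; any remainder is spread over the first segments (assumption)."""
    base, extra = divmod(total_documents, num_segments)
    return [base + 1 if i < extra else base for i in range(num_segments)]


small = documents_per_segment(100_000_000, 100)            # 1 million each
large = documents_per_segment(1_000_000_000_000, 100_000)  # 10 million each
uneven = documents_per_segment(10, 3)                      # [4, 3, 3]
```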

For example, when a search query is received via a user interface on the user device 202, the query parsing component 224 operates to reformat the query. The query is reformatted from its free-text form into a form suited to querying the search indexes, such as the reverse index and the forward index, based on how data is indexed in those search indexes. In embodiments, the terms of the search query are parsed and analyzed to identify atoms that can be used to query the search index. The atoms may be identified using techniques similar to those used to identify atoms in documents when the documents were indexed in the search index. For example, atoms may be identified based on statistics of terms and query distribution information. The query parsing component 224 may provide a set of conjunctions of atoms and cascading variants of these atoms.

An atom, or atomic unit, as used herein, may refer to a variety of units of a query or a document. These units may include, for example, a term, an n-gram, an n-tuple, or a k-near n-tuple. A term maps down to a single symbol or word as defined by the particular tokenizer technology being used. In one embodiment, a term is a single character. In another embodiment, a term is a single word or a group of words. An n-gram is a sequence of "n" consecutive or nearly consecutive terms that may be extracted from a document. An n-gram is said to be "tight" if it corresponds to a run of consecutive terms, and "loose" if it contains terms in the order in which they appear in the document but the terms are not necessarily consecutive. Loose n-grams are typically used to represent a class of equivalent phrases that differ by insignificant words (e.g., "if it rains I'll get wet" and "if it rains then I'll get wet"). An n-tuple, as used herein, is a set of "n" terms that co-occur in a document, independent of order. Further, a k-near n-tuple, as used herein, refers to a set of "n" terms that co-occur within a window of "k" terms in a document. Thus, an atom is generally defined as a generalization of all of the above. Implementations of embodiments of the present invention may use different types of atoms, but as used herein, the term atom generally describes each of the types above.
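Two of these atom types can be illustrated with a minimal sketch. It assumes whitespace tokenization and treats a k-near n-tuple as an unordered set of n distinct terms found together in some window of k consecutive terms; the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple


def tight_ngrams(terms: List[str], n: int) -> List[Tuple[str, ...]]:
    """Tight n-grams: runs of n consecutive terms."""
    return [tuple(terms[i:i + n]) for i in range(len(terms) - n + 1)]


def k_near_ntuples(terms: List[str], n: int, k: int) -> Set[FrozenSet[str]]:
    """Unordered sets of n distinct terms co-occurring within a window of k terms."""
    found: Set[FrozenSet[str]] = set()
    # At least one window is examined even when the text is shorter than k.
    for start in range(max(len(terms) - k + 1, 1)):
        window = set(terms[start:start + k])
        for combo in combinations(sorted(window), n):
            found.add(frozenset(combo))
    return found


terms = "if it rains i will get wet".split()
bigrams = tight_ngrams(terms, 2)
near_pairs = k_near_ntuples(terms, n=2, k=3)
```

Here `("if", "it")` is a tight 2-gram, while `{"if", "rains"}` is a 3-near 2-tuple because the two terms fall inside one window of three terms even though they are not adjacent.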

The query distribution component 226 is generally responsible for receiving a submitted search query and distributing it among the segments. In one embodiment, each search query is distributed to every segment, such that each segment provides a preliminary set of search results. For example, when a segment receives a search query, the segment, or a component within the segment, determines which nodes will be tasked with performing the preliminary ranking function that uses the reverse index portions stored on the nodes. In one case, the nodes selected as part of the first group of nodes are those whose reverse index has indexed one or more of the atoms parsed from the search query, as described above. As such, when the search query is reformatted, one or more atoms are identified and sent to each segment. Each node of the first group of nodes returns, based on the preliminary ranking function, a first set of documents found to be relevant to the search query, as briefly described above. A second group of nodes is then determined. In one embodiment, each of these nodes stores in its respective forward index at least one of the documents in the first set of documents. Each node of the second group of nodes performs the final ranking function using forward index data and other considerations, and as a result, a second set of documents is identified. In one embodiment, every document in the second set is contained in the first set, because the final ranking stage uses the document identifications associated with the first set of documents.

The result merging component 228 is given the search results (e.g., document identifications and snippets) from each segment and forms a merged, final list of search results from those results. There are various ways to form the final list of search results, including simply removing any duplicate documents and placing each document in the list in the order determined by the final ranking. In one embodiment, a component similar to the result merging component 228 is present on each segment, such that the results produced by each node are merged at the segment into a single list before that list is sent to the result merging component 228.
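The simple merge strategy mentioned above (drop duplicates, then order by final ranking) might look like the following sketch. The `(doc_id, score)` pair format and the rule of keeping the highest score for a duplicated document are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def merge_results(per_segment: List[List[Tuple[str, float]]]) -> List[str]:
    """Merge per-segment (doc_id, final_score) lists into one ranked list."""
    best: Dict[str, float] = {}
    for results in per_segment:
        for doc_id, score in results:
            # Duplicate documents are collapsed, keeping the best score seen.
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    # Highest score first; ties are broken by document id for determinism.
    return sorted(best, key=lambda d: (-best[d], d))


merged = merge_results([
    [("a", 0.9), ("b", 0.4)],
    [("b", 0.6), ("c", 0.7)],
])
```

The same function could serve both at a segment, merging the lists from its nodes, and at the top level, merging the lists from all segments.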

Turning now to Fig. 3, an exemplary diagram of a hybrid-distribution system 300 is shown, in accordance with embodiments of the present invention. Fig. 3 illustrates various components, including a corpus manager 310, a corpus root 312, and two segments, segment 314 and segment 316. As indicated by ellipsis 310, more than two segments may be provided. The corpus manager 310 maintains the state of which processes serve which segments of the forward index and the reverse index. It also maintains the temperature and state of each process. This data is used to generate a set of processes for federating a query across the different segments. The corpus root 312 is a top-level root process that also performs query-planning functions. The corpus root 312 scatters a request across the required segments, collects and merges the results, and may include custom logic. Each segment has a segment root, such as segment root 320 and segment root 322. A segment root serves as a process that federates the query and aggregates results from the federated processes. The segment root 322 may be a dynamic process that is reassigned to the blade or node that is optimal for final query assembly.

As shown, each segment root includes multiple nodes. Due to space constraints, three nodes are illustrated for each of segment root 320 and segment root 332. Segment root 320 includes node 322, node 324, and node 326. Ellipsis 328 indicates that more than three nodes are contemplated to be within the scope of the present invention. Segment root 332 includes node 334, node 336, and node 338. Because any number of nodes may make up a segment root, ellipsis 340 indicates any additional number of nodes. As mentioned, each node is a machine or computing device capable of performing multiple computations, such as ranking functions. For example, in one embodiment, each node includes an L01 matcher 322A and an L2 ranker 322B, as shown in node 322. Similarly, node 334 includes an L01 matcher 334A and an L2 ranker 334B. These are described in more detail above, but the L0 matching stage and the L1 ranking stage (preliminary ranking stage) of the overall ranking process may be combined and referred to collectively as the L01 matcher. Because each node includes an L01 matcher and an L2 ranker, each node must also store a portion of the reverse index and of the forward index, since in one embodiment the L01 matcher uses the reverse index and the L2 ranker uses the forward index. As mentioned, each node may be assigned a portion of the reverse and forward indexes belonging to the segment. A segment communication bus 330 associated with segment 314 and a segment communication bus 342 associated with segment 316 allow each node to communicate, for example, with the segment root when necessary.

Fig. 4 is an exemplary diagram of a hybrid-distribution system 400 illustrating payload requirements, in accordance with embodiments of the present invention. System 400 is an illustration of a single segment root 410 with multiple nodes. Here, six nodes are illustrated, numbered 412, 414, 416, 418, 420, and 422. Although six nodes are illustrated, any number of nodes may be used to carry out embodiments of the present invention. As previously indicated, each node has the functionality to perform various ranking computations, including those of the matching stage (L0), the preliminary ranking stage (L1), and the final ranking stage (L2). As such, node 412, for example, has both an L01 matcher 412A for the L0 and L1 stages described herein and an L2 ranker 412B for the L2 stage described herein. The payload for the different stages, however, may differ considerably. To better illustrate this, the payload for the L01 matchers is shown in a first pattern, as indicated by numeral 424, and the payload for the L2 rankers is shown in a second pattern, as indicated by numeral 426.

The set of documents assigned to a particular segment is indexed, or sharded, both by atom (reverse index) and by document (forward index). These indexes are divided into as many portions as there are nodes making up that segment. In one embodiment, there are forty nodes, so each of the reverse index and the forward index is divided into forty portions, one of which is stored at each of the nodes. When a search query is submitted to the search engine, the query is sent to each segment. It is the responsibility of the segment to identify a first group of nodes whose reverse index portions have indexed one or more of the atoms from the query. Using this approach, if a query is parsed into two atoms, such as "William" and "Shakespeare" from the query "William Shakespeare", the maximum number of nodes in the segment that will be tasked with L01 matching is two. This is shown in Fig. 4, where the L01 matchers associated with node 412 and node 416 are the only two identified for use in the L01 matching process. Because the documents assigned to each segment are indexed by atom, each atom is indexed only once in the reverse index, such that any particular atom appears in only one of the reverse index portions assigned to the nodes of that segment. In an exemplary scenario, once the first group of nodes has been identified, the atoms from the search query that match atoms in the reverse index are sent to the appropriate nodes. Each node performs a number of computations such that a set of documents is identified. In one embodiment, this first set of documents includes those documents that received the highest rankings from the preliminary ranking stage.
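Because each atom is held by exactly one node of a segment, the first group of nodes can never be larger than the number of atoms in the query. A minimal sketch of that lookup follows; the `atom_to_node` table and the pairing of particular atoms with nodes 412, 416, and 420 are hypothetical and chosen only to echo the Fig. 4 example.

```python
from typing import Dict, List, Set


def first_node_group(query_atoms: List[str], atom_to_node: Dict[str, int]) -> Set[int]:
    """Nodes whose reverse index portion holds at least one atom of the query."""
    return {atom_to_node[atom] for atom in query_atoms if atom in atom_to_node}


# Hypothetical placement of atoms on the nodes of one segment.
atom_to_node = {"william": 412, "shakespeare": 416, "hamlet": 420}

group = first_node_group(["william", "shakespeare"], atom_to_node)
```

Atoms that are not indexed in the segment simply contribute no node, so the group may also be smaller than the number of atoms.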

This first set of documents is collected at the segment root 410 from each node of the first group of nodes, including nodes 412 and 416. These results are combined in any of a number of ways such that the segment root 410 can next identify a second group of nodes to be used in conjunction with the final ranking stage. As shown, each of the L2 rankers is used in the final ranking stage, or L2 stage. Because each node stores a portion of the forward index for the segment, there is a good chance that most or all of the forward index portions will need to be accessed in the final stage of ranking. In the final ranking stage, each node of the second group of nodes is given the document identifications that it holds in its forward index portion, so that the node can rank those documents based at least on data found in the forward index. Because most or all of the nodes are used in the final ranking stage, the payload for the final ranking stage is typically larger than the payload for the matching/preliminary ranking stage, as shown in system 400 of Fig. 4. A segment communication bus 428 allows the nodes to communicate with other components, such as the segment root 410.
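The routing step for the final ranking stage can be sketched as grouping document identifications by the node whose forward index portion holds them. The `doc_to_node` table below is a hypothetical placement; the source does not specify how documents are assigned to nodes.

```python
from typing import Dict, List


def second_node_group(first_set: List[int], doc_to_node: Dict[int, int]) -> Dict[int, List[int]]:
    """Map each node to the document ids from the first set that it must rank."""
    work: Dict[int, List[int]] = {}
    for doc_id in first_set:
        node = doc_to_node[doc_id]  # node whose forward index portion holds the doc
        work.setdefault(node, []).append(doc_id)
    return work


# Hypothetical placement of six documents on the nodes of Fig. 4.
doc_to_node = {101: 412, 102: 414, 103: 416, 104: 418, 105: 420, 106: 422}

work = second_node_group([101, 103, 104, 106], doc_to_node)
```

Even though only two nodes took part in L01 matching in the running example, the documents they return may be spread over many forward index portions, which is why the second group is usually the larger one.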

With reference to Fig. 5, a flow chart illustrates a method 500 for utilizing a hybrid-distribution system to identify relevant documents based on a search query, in accordance with embodiments of the present invention. Initially, a set of documents is assigned to a segment at step 510. Either before or after being received at the segment, the set of documents is indexed both by atom, in a reverse index, and by document, in a forward index, as indicated at step 512. As such, the documents indexed in the forward index are the set of documents assigned to the segment, and the atoms in the reverse index are parsed from the content of those documents. At step 514, a portion of the reverse index and a portion of the forward index are stored at each node in the segment. Generally, a segment includes multiple nodes. Each node is a machine or computing device capable of performing ranking computations based on the reverse index portion and the forward index portion stored thereon. In one embodiment, each node stores a different or unique portion of the segment's reverse index and forward index.

Step 516 indicates that the reverse index portion is accessed at each node of a first group of nodes. Each node in the first group of nodes has been identified as having indexed one of the atoms of the received search query. A first set of documents is identified at step 518. In one embodiment, these documents have been ranked using a preliminary ranking function so that the most relevant documents can be identified. This step may correspond, for example, to the L1 preliminary ranking stage and/or the L0 matching stage. Based on the document identifications associated with the documents in the first set of documents, the forward index portion is accessed at each node of a second group of nodes, shown at step 520. This step may correspond to the L2 final ranking stage. This effectively limits the number of documents that are relevant to the particular search query. As such, the number of documents is limited to a second set of documents, shown at step 522. In many or most cases, the number of nodes in the second group is greater than the number of nodes in the first group, as described in more detail above. This is because a search query may have only two atoms, so that the L01 matching stage requires at most two nodes, while thousands of documents may be identified as relevant to those two atoms, so that many more nodes, each using its respective forward index portion, may be employed to perform the final ranking computations that identify the second set of documents. In addition, in embodiments, because the final ranking function uses the document identifications produced by the preliminary ranking function, the number of documents in the second set is less than the number of documents in the first set, such that every document in the second set is also contained in the first set.

In one embodiment, the overall process may involve receiving a search query. One or more atoms in the search query are identified, and once each segment is aware of the one or more atoms, a first group of nodes is identified in the segment, each of whose reverse index portions contains at least one of the one or more atoms from the search query. Each node of the first group of nodes sends a first set of documents (e.g., document identifications) to the segment root so that the segment root can consolidate (e.g., remove duplicates) and merge the results. The second group of nodes then sends a second set of documents to the segment root. Similarly, the segment root consolidates and merges these results to produce the final set of documents presented to the user in response to the search query.

Turning to Fig. 6, a flow chart illustrates a method 600 for generating a hybrid-distribution system for a multi-process document retrieval system, in accordance with embodiments of the present invention. At step 610, an indication of a set of documents is received, the set of documents being assigned to the segment that receives it. The segment includes multiple nodes (e.g., ten, forty, fifty). The set of documents is indexed by atom to generate a reverse index, shown at step 612. At step 614, the set of documents is indexed by document to generate a forward index. At step 616, a portion of the reverse index and a portion of the forward index are assigned to each node making up the segment. In embodiments, each node is assigned a different portion of the reverse and forward indexes, such that a particular atom is indexed in the reverse index portion of only one node in the segment.

In embodiments, an indication of one or more atoms identified from a search query is received at the segment. A first group of nodes is identified whose reverse index portions contain at least one of the one or more atoms. Each of these nodes is capable of performing various ranking functions. A first set of documents is identified based on the reverse index portions of the first group of nodes. Each node in the first group may produce a first set of documents and send it to the segment root so that the various first sets can be consolidated and merged. In one instance, the first set of documents is produced by a preliminary ranking process of a multistage ranking process that uses the reverse index portions stored on the nodes. In addition, a second group of nodes may then be identified whose forward index portions have indexed one or more of the document identifications corresponding to the first set of documents. A second set of documents may then be identified based in part on data stored in the forward index, and features may be computed in real time rather than taken from precomputed scores. The second set of documents may be identified by a final ranking process of the multistage ranking process that uses the forward index. Once the second sets of documents from the nodes of the second group of nodes have been consolidated and merged, the result is also merged with the second sets of documents from all other segments, so that a final set of documents is formed and returned to the user as search results.

Fig. 7 is a flow chart showing a method 700 for utilizing a hybrid-distribution system to identify relevant documents based on a search query, in accordance with embodiments of the present invention. Initially, at step 710, a search query is received. In one embodiment, the query is supplemented or altered, such as by a spelling-correction tool or by stemming. Atoms in the search query are identified at step 712. At step 714, the atoms are sent to the various segments. Each segment is assigned a set of documents that is indexed both by atom and by document to form a reverse index and a forward index stored at that segment. Each segment includes multiple nodes, each of which is assigned a portion of the reverse index and of the forward index. At step 716, a first group of nodes is identified whose reverse index portions contain at least one of the atoms from the search query. At step 718, the reverse index portion of each node in the first group of nodes is accessed to identify a first set of relevant documents. Based on the document identification associated with each document in the first set of documents, a second group of nodes is identified at step 720. Each node in the second group of nodes stores at least one of the document identifications in its respective forward index portion, so that the node can perform a ranking process on each such document. At step 722, the forward index portion at each node of the second group of nodes is accessed to limit the number of relevant documents. In one embodiment, every document in the second set of documents is also contained in the first set of documents. Based on the second set of documents, search results are generated (e.g., by compiling the second sets of documents from multiple segments) and presented to the user.

Referring now to Fig. 8, a method 800 is provided for assigning a segment root to each segment of a corpus root. In embodiments such as those described above, the segment root may take the form of a preliminary segment root and a final segment root. For example, the preliminary segment root is selected based on information known at the time (such as the current or expected load of each node in the segment), or may even be selected randomly, such as according to a round-robin schedule. The final segment root is selected based on a number of factors, discussed in more detail below. Generally, a segment root serves as a process that federates the query and aggregates results from the federated processes. The process of selecting the preliminary and final segment roots (such as segment root 322 shown in Fig. 3) may be a dynamic process. For example, the process of assigning preliminary and final segment roots may occur for each query received, and simultaneously across multiple segments. The node selected as the final segment root is considered optimal for final query assembly.

As shown in Fig. 3 herein, each segment root includes multiple nodes. Due to space constraints, three nodes are illustrated for each of segment root 320 and segment root 332. Segment root 320 includes node 322, node 324, and node 326. Ellipsis 328 indicates that more than three nodes are contemplated to be within the scope of the present invention. Segment root 332 includes node 334, node 336, and node 338. Because any number of nodes may make up a segment root, ellipsis 340 indicates any additional number of nodes. As mentioned, each node is a machine or computing device capable of performing multiple computations, such as ranking functions. For example, in one embodiment, each node includes an L01 matcher 322A and an L2 ranker 322B, as shown in node 322. Similarly, node 334 includes an L01 matcher 334A and an L2 ranker 334B. These are described in more detail above, but the L0 matching stage and the L1 ranking stage (preliminary ranking stage) of the overall ranking process may be combined and referred to collectively as the L01 matcher. Because each node includes an L01 matcher and an L2 ranker, each node must also store a portion of the reverse index and of the forward index, since in one embodiment the L01 matcher uses the reverse index and the L2 ranker uses the forward index. As mentioned, each node may be assigned a portion of the reverse and forward indexes belonging to the segment. A segment communication bus 330 associated with segment 314 and a segment communication bus 342 associated with segment 316 allow each node to communicate, for example, with the segment root when necessary.

Returning to Fig. 8, a search query is initially received at step 810. With reference to Fig. 3, the search query may be received at the corpus root 312, which then distributes the search query, or portions thereof, to each segment. At step 812, a group of nodes in the segment is identified based on whether each node will be used to resolve the search query. As mentioned, each node is assigned a set of documents from which a reverse index and a forward index are generated. As such, depending on the atoms in the search query, some nodes will be used for a particular query and some will not. The group of nodes identified at step 812 is identified based on a hash of the atoms in the query or of the nodes to be used to resolve the particular query. A hash function takes a term or atom in the search query and determines which node stores the posting list for that particular term or atom. This allows the nodes that will be used to resolve the particular search query to be identified. A posting list is simply a list of a term and the documents that contain that term. An algorithm may be used for the hash function. Exemplary algorithms include MD5 and CRC, but others are certainly contemplated to be within the scope of the present invention.
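One way such a hash lookup could work is sketched below using Python's standard `hashlib.md5` and `zlib.crc32`. Reducing the hash modulo the number of nodes is an assumption made for illustration; the text names MD5 and CRC as example algorithms but does not say how a hash value is mapped to a node, and the function names here are hypothetical.

```python
import hashlib
import zlib
from typing import Callable, List


def node_by_md5(atom: str, num_nodes: int) -> int:
    """Map an atom to a node index using the first 8 bytes of its MD5 digest."""
    digest = hashlib.md5(atom.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def node_by_crc(atom: str, num_nodes: int) -> int:
    """Map an atom to a node index using CRC-32."""
    return zlib.crc32(atom.encode("utf-8")) % num_nodes


def nodes_for_query(atoms: List[str], num_nodes: int,
                    hash_fn: Callable[[str, int], int] = node_by_md5) -> List[int]:
    """Node indexes holding the posting lists for the atoms of one query."""
    return sorted({hash_fn(atom, num_nodes) for atom in atoms})


group = nodes_for_query(["william", "shakespeare"], num_nodes=40)
```

Because the mapping is deterministic, every component that knows the hash function and the node count can work out which nodes hold a given atom without consulting a lookup table.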

At step 814, a preliminary segment root is selected from the group of nodes. In one embodiment, the preliminary segment root is selected randomly, such as in round-robin fashion. In another embodiment, however, the preliminary segment root is selected based on expected load, such that the node with the lowest expected load is selected as the preliminary segment root. In one instance, this may be the node with the smallest number of recorded outstanding requests (e.g., current load), such that the node with the lowest current load of outstanding queries is selected as the preliminary segment root. Once the preliminary segment root has been selected from the group of nodes, statistics for each node in the group of nodes are received at the preliminary segment root, shown at step 816. In one embodiment, the preliminary segment root requests that this data be sent, for example by way of a communication bus connecting the nodes. The statistics may indicate each node's capability to act as the final segment root. The final segment root is responsible for assembling query execution results from the group of nodes based on the search query. Solely for exemplary purposes, the statistics may include the length of each node's posting lists, input/output load, distress signals associated with a particular node, or the amount of data that would need to be transferred to the final segment root. Generally, the node selected is the one considered to incur the least cost (e.g., time, money) when acting as the final segment root.
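Two of the preliminary selection policies mentioned here, lowest outstanding load and round-robin, can be sketched as follows. The function names and the tie-breaking rule (lowest node id wins) are assumptions for illustration.

```python
from itertools import cycle
from typing import Dict, Iterable, Iterator


def pick_by_lowest_load(outstanding_requests: Dict[int, int]) -> int:
    """Node with the fewest outstanding requests; ties go to the lowest node id."""
    return min(sorted(outstanding_requests), key=lambda n: outstanding_requests[n])


def round_robin(node_ids: Iterable[int]) -> Iterator[int]:
    """Endless rotation over the node ids, one preliminary root per query."""
    return cycle(sorted(node_ids))


preliminary = pick_by_lowest_load({412: 7, 414: 2, 416: 2})
rotation = round_robin([416, 412, 414])
first_four = [next(rotation) for _ in range(4)]
```

Either policy needs only information that is already on hand when the query arrives, which fits the role of the preliminary root as a cheap, temporary choice.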

For example, if a particular node has an extremely long posting list, a large amount of data would need to be transferred across the network to the final segment root so that the final segment root could aggregate the data fetched for the query from all of the nodes. In one embodiment, the particular node with the large amount of data to transfer may be selected as the final segment root so that its data need not be sent to another node, a transfer that would be costly. As mentioned, a node that sends distress signals may be having problems. Many types of distress signals may indicate that a node's performance in fetching data is impaired. Also as mentioned, input/output load may be taken into account when selecting the final segment root. For example, this may cover the length of the node's queue for fetching data from the hard disk. Further, if there are three nodes whose posting lists include the term "dog", then when a query that also includes the term "dog" is received, the node with the lowest load may be selected, as it will have more time to act as the final segment root. Other factors, including bandwidth, may also be considered when determining the final segment root. In one instance, the preliminary segment root actually executes the algorithm that determines the final segment root.

At step 818, the final segment root is selected from the group of nodes by an algorithm based on the statistics. In some embodiments, the preliminary segment root and the final segment root are the same node, while in other embodiments they are different nodes. An algorithm may be used to determine which node will serve as the final segment root for a particular query. The algorithm takes the statistics described above into account. At step 820, the group of nodes is notified of the identity of the final segment root so that the nodes know where to send their respective query execution results. In one embodiment, the preliminary segment root itself takes on the task of handing its role over to the final segment root, or, if it is itself selected as the final segment root, the preliminary segment root may communicate to the other nodes that it is now the final segment root.
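The selection algorithm itself is not specified, so the following is only a hypothetical cost model built from the statistics named in the text: data to be transferred, input/output queue length, and distress signals. The field names, the weights, and the additive form of the cost are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStats:
    node_id: int
    posting_bytes: int     # data this node would otherwise ship over the network
    io_queue_length: int   # pending disk reads on the node
    distress: bool         # whether the node has raised a distress signal


def pick_final_root(stats: List[NodeStats],
                    io_weight: int = 1000,
                    distress_penalty: int = 10**9) -> int:
    """Return the node id with the lowest estimated cost of acting as final root."""
    total_bytes = sum(s.posting_bytes for s in stats)

    def cost(s: NodeStats) -> int:
        # Bytes every other node would have to send if s became the root.
        transfer = total_bytes - s.posting_bytes
        penalty = distress_penalty if s.distress else 0
        return transfer + io_weight * s.io_queue_length + penalty

    return min(stats, key=lambda s: (cost(s), s.node_id)).node_id


final_root = pick_final_root([
    NodeStats(412, posting_bytes=10, io_queue_length=0, distress=False),
    NodeStats(414, posting_bytes=5000, io_queue_length=1, distress=False),
    NodeStats(416, posting_bytes=200, io_queue_length=0, distress=True),
])
```

In this toy input the node holding the longest posting list wins despite its busier disk queue, because keeping its data local saves more than the queue costs; different weights would shift that balance.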

In embodiments, the search query is executed using the previously identified group of nodes. As described herein, a first group of nodes may be identified to participate in the preliminary ranking stage (e.g., using the reverse index stored on the nodes), and a second group of nodes may be identified to participate in the final ranking stage (e.g., using the forward index stored on the nodes). The final segment root may collect and aggregate data from both the preliminary and the final ranking stages. For example, the nodes involved in the preliminary ranking stage return lists of documents that contain certain terms or atoms of the search query. The nodes involved in the final ranking stage return the documents most relevant to the search query, such as their document identifications. As such, query execution results may refer to results from the preliminary ranking stage, the final ranking stage, or both. Further, a multistage ranking process need not be used. In instances where there is a single ranking or search process, a single set of results is collected and aggregated by the final segment root.

The method described in Fig. 8 may be implemented as a system. For example, various system components may be used to select the preliminary and final segment roots. Solely for illustrative purposes, these components may include a preliminary segment root selection component, a statistics receiving component, a final segment root selection component, and a query execution component. These components may communicate with one another over a network, such as network 208 shown in Fig. 2. The preliminary segment root selection component is responsible for gathering data and performing hash computations to determine which of the nodes in the particular segment will be used to execute the search query. As mentioned, each node has posting lists and a portion of the search index. A posting list includes an atom and the documents in which the atom is found. The statistics receiving component may request and receive from the nodes statistics indicating each node's availability and capability to act as the final segment root. The final segment root selection component is responsible for selecting the final segment root based on the statistics received from the nodes that will be used to execute the search query. In one embodiment, the preliminary segment root makes this determination using an algorithm. Finally, the query execution component distributes the search query, or portions thereof, to the final segment root, which then distributes the search query portions to the appropriate nodes in the segment. Each node determines which documents are most relevant to the query or the portion of the query, and sends that data to the final segment root by way of, for example, a communication bus. As mentioned, this process occurs simultaneously across multiple segments.

Turning to Fig. 9, a method 900 for selecting a segment root from a plurality of nodes is illustrated. Initially, at step 910, a search query is received at a segment that includes a plurality of nodes. As described, there are multiple segments (e.g., several hundred) that each execute the search query simultaneously. Each segment includes a plurality of nodes, each of which is assigned a portion of the documents indexed by one or more search indexes. As such, each node stores a portion of the reverse index and of the forward index, which are indexed by atom and by document, respectively. At step 912, a group of nodes from the plurality of nodes is identified as the nodes that will be used to execute the received search query. A hash function, for example, may be used to determine which nodes to use. This may depend on the index stored on each node, so that the nodes whose index stores particular words or atoms in the search query are identified as the nodes used to resolve that particular search query. At step 914, before the search query is executed, a preliminary segment root is selected from the plurality of nodes. The selection of the preliminary segment root may be based on the expected load of each node, the current load of each node, random selection, or the like. The selection uses whatever information is available at the time the preliminary segment root is selected. The preliminary segment root serves in that role until the final segment root has been selected.
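Steps 912 and 914 can be sketched as follows. This is an illustrative sketch only: the `Node` record, its fields, and the policy names are hypothetical, the group is identified by a simple atom lookup rather than by any particular hash function, and the preliminary segment root is drawn from the identified group (the text also allows drawing it from the full plurality of nodes).

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    atoms: set                 # atoms covered by this node's index portion (assumed layout)
    expected_load: float = 0.0
    current_load: int = 0      # e.g. number of outstanding queries


def identify_node_group(nodes, query_atoms):
    """Step 912: keep the nodes whose index portion covers at least one query atom."""
    wanted = set(query_atoms)
    return [node for node in nodes if node.atoms & wanted]


def select_preliminary_root(group, policy="expected_load", rng=None):
    """Step 914: pick a root using only information available before execution."""
    if not group:
        raise ValueError("no node in this segment can serve the query")
    if policy == "expected_load":
        return min(group, key=lambda n: (n.expected_load, n.node_id))
    if policy == "current_load":
        return min(group, key=lambda n: (n.current_load, n.node_id))
    if policy == "random":
        return (rng or random).choice(group)
    raise ValueError(f"unknown policy: {policy}")
```

The node chosen here only coordinates the selection that follows; it hands over once the final segment root is known.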

At step 916, statistics are received at the selected preliminary segment root. Before being received, the statistics may be requested from each node in the group of nodes that will be used to execute the search query; the responses are likewise received at the preliminary segment root. As described above, the statistics may include one or more of the length of each node's record lists or other data that would have to be transmitted across the network, the input/output load on the node (e.g., how long the queue for fetching data from the hard disk is), problem signals associated with the node, cost, and so on. At step 918, the final segment root is selected based on the received statistics. During query execution, the final segment root collects and aggregates query execution data from the group of nodes. In one embodiment, the final segment root receives the search query from an external source, such as a server. The final segment root may also, or alternatively, receive the search query from a main body root, such as main body root 312 shown in Fig. 3. At step 920, the search query is executed. As described, there may be one or more stages of query execution, such as a preliminary ranking stage and a final ranking stage.
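One way the selection at step 918 could work is sketched below. The text does not specify the algorithm, so the cost function here is an assumption: it charges each candidate for the record-list data that every other node would have to send to it, and adds weighted penalties for input/output queue length and problem signals. The `NodeStats` fields and the weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStats:
    node_id: str
    record_list_bytes: int   # data this node would ship if another node is root
    io_queue_len: int        # pending disk reads on the node
    problem_signals: int     # count of health warnings reported for the node


def select_final_root(stats, io_weight=1000.0, problem_penalty=1e9):
    """Return the id of the node with the lowest estimated cost of acting as root."""
    if not stats:
        raise ValueError("no statistics received")
    total_bytes = sum(s.record_list_bytes for s in stats)

    def cost(s):
        # If s is the root, all other nodes transmit their data across the network.
        network = total_bytes - s.record_list_bytes
        return network + io_weight * s.io_queue_len + problem_penalty * s.problem_signals

    # Ties are broken by node id so the choice is deterministic.
    return min(stats, key=lambda s: (cost(s), s.node_id)).node_id
```

Raising `io_weight` shifts the decision toward lightly loaded nodes, while lowering it favors the node that holds the most data and so minimizes network transfer.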

Fig. 10 illustrates a method 1000 for selecting segment roots. At step 1010, a search query is received at a main body root having one or more segments. Each segment in the main body root has a plurality of nodes, each of which stores a portion of the search index. For example, each node may store a portion of a reverse index organized by atom and a portion of a forward index organized by document. This may be the case when multiple ranking or search stages are used to provide search results based on the search query. At step 1012, a group of nodes that will be used to execute the received search query is identified in each segment. Multiple segments perform the process of selecting preliminary and final segment roots simultaneously. In addition, this process may occur for each search query received at the main body root. At step 1014, for each segment in the main body root, a preliminary segment root is identified from the group of nodes. At step 1016, statistics are requested, for example by the preliminary segment root, from each node in the group of nodes that will be used to execute the received search query. The statistics include any data that indicates a node's ability or availability to be selected as the final segment root. The overall goal is to transmit as little data as possible over the network, for example from the nodes to the final segment root. The most cost-efficient node is therefore selected as the final segment root. In some instances the overall goal may be geared more toward load, while in other instances the preliminary segment root may favor network capacity.
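Because every segment selects its roots independently, the per-segment work can run concurrently. The sketch below is a single-process illustration using a thread pool; in the described system the segments are separate groups of machines. The dictionary layout, the field names, and the simplified rules (lowest load for the preliminary root, largest data volume for the final root) are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def choose_roots_for_segment(segment_nodes, query_atoms):
    """Return (preliminary_root_id, final_root_id) for one segment, or None.

    segment_nodes maps node id -> {"atoms": set, "load": float, "bytes": int}
    (a hypothetical layout).
    """
    group = {nid: n for nid, n in segment_nodes.items() if n["atoms"] & query_atoms}
    if not group:
        return None
    preliminary = min(group, key=lambda nid: (group[nid]["load"], nid))
    # The preliminary root would request statistics here; this sketch reads them
    # directly and picks the node holding the most data, so the least data moves.
    final = min(group, key=lambda nid: (-group[nid]["bytes"], nid))
    return preliminary, final


def choose_roots(segments, query_atoms):
    """Run the selection for all segments concurrently; returns segment id -> roots."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(
            lambda seg: choose_roots_for_segment(seg, query_atoms),
            segments.values(),
        )
        return dict(zip(segments.keys(), results))
```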

At step 1018, statistics are received at the preliminary segment root from each node in the group of nodes. As described, the statistics indicate each node's availability to serve as the final segment root, which collects query execution data from the group of nodes in its respective segment. In one embodiment, a communication bus is used to transmit the statistics and other data from the nodes to the preliminary or final segment root. At step 1020, a final segment root is selected for each segment based on the statistics. At step 1022, the search query is executed. In some embodiments, before the search query is executed, a notification indicating the identity of the final segment root is sent to the plurality of nodes, or at least to the identified group of nodes, so that the nodes know where to send their respective data, including query execution data. After the query has been executed, the final segment root receives query execution data from each node in the group of nodes.
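The notification and collection steps can be sketched as follows. This is a minimal in-memory illustration: the `SegmentNode` class, the shared `inbox` dictionary standing in for the communication bus, and the top-k merge by score are all assumptions, not details from the described embodiments.

```python
import heapq


class SegmentNode:
    def __init__(self, node_id, local_results):
        self.node_id = node_id
        self.local_results = local_results   # list of (doc_id, score) pairs
        self.final_root_id = None

    def on_final_root_notice(self, root_id):
        """Record where query execution data must be sent."""
        self.final_root_id = root_id

    def execute(self, inbox):
        """Send local results to the final root's inbox (stand-in for the bus)."""
        if self.final_root_id is None:
            raise RuntimeError("final segment root not announced yet")
        inbox.setdefault(self.final_root_id, []).extend(self.local_results)


def run_query(group, final_root_id, top_k=3):
    # Notify every node in the group before execution starts.
    for node in group:
        node.on_final_root_notice(final_root_id)
    inbox = {}
    for node in group:
        node.execute(inbox)
    # The final segment root aggregates what it received and keeps the best results.
    return heapq.nlargest(top_k, inbox.get(final_root_id, []), key=lambda r: r[1])
```

A node that has not been notified refuses to execute in this sketch, which mirrors the requirement that each node know the final segment root before sending its results.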

The present invention has been described in relation to particular embodiments, which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present invention pertains without departing from its scope.

From the foregoing, it will be seen that this invention is well adapted to attain all the ends and objects set forth above, together with other advantages that are obvious and inherent to the system and method. It will be understood that certain features and subcombinations are of utility and may be employed without reference to other features and subcombinations. This is contemplated by and is within the scope of the claims.

Claims

1. A method (800) for assigning segment roots, the method comprising:

receiving (810) a search query;

identifying (812) a group of nodes in a segment that will be used to resolve the search query, the segment being assigned a group of documents, the group of documents being indexed by atom in a reverse index and indexed by document in a forward index, wherein respective portions of the reverse index and the forward index are assigned to each node in the group of nodes, and wherein the reverse index is used for a preliminary ranking process and the forward index is used for a final ranking process;

selecting (814) a preliminary segment root from the group of nodes;

receiving (816), at the preliminary segment root, statistics from each node in the identified group of nodes, the statistics indicating each node's ability to serve as a final segment root, the final segment root being responsible for collecting query execution results from the group of nodes based on the search query;

selecting (818), by an algorithm and based on the statistics, the final segment root from the group of nodes; and

notifying (820) the group of nodes of the final segment root so that the nodes know where to send their respective query execution results.

2. The method of claim 1, wherein each node in the group of nodes stores a portion of a search index used to execute the search query.

3. The method of claim 1, wherein the preliminary segment root is selected based on expected load, such that the node with the lowest expected load is selected as the preliminary segment root.

4. The method of claim 1, wherein the preliminary segment root is selected based on current load, such that the node with the lowest current load of outstanding queries is selected as the preliminary segment root.

5. The method of claim 1, wherein the preliminary segment root and the final segment root are selected for each segment in response to the received search query.

6. The method of claim 1, wherein the statistics include one or more of: a length of each node's record list, an input/output load, a problem signal associated with a particular node, or an amount of data that would need to be transferred to the final segment root.

7. A method (900) for assigning segment roots, the method comprising:

receiving (910), at a segment that includes a plurality of nodes, a search query to be executed;

identifying (912), from the plurality of nodes in the segment, a group of nodes that will be used to execute the search query, the segment being assigned a group of documents, the group of documents being indexed by atom in a reverse index and indexed by document in a forward index, wherein respective portions of the reverse index and the forward index are assigned to each node in the group of nodes, and wherein the reverse index is used for a preliminary ranking process and the forward index is used for a final ranking process;

before the search query is executed, selecting (914) a preliminary segment root from the plurality of nodes, the selection being based on one or more of an expected load of each node or random selection;

receiving (916), at the preliminary segment root, statistics from each node in the group of nodes that will be used to execute the search query, wherein the statistics include current load and cost data associated with sending data across a network;

selecting (918), based on the statistics, a final segment root that will aggregate query execution data from the group of nodes during query execution; and

executing (920) the search query.

8. The method of claim 7, wherein each node of the plurality of nodes stores a portion of a search index used to identify relevant documents based on the search query.

9. The method of claim 7, wherein each of the plurality of nodes stores a portion of a reverse index indexed by atom and a portion of a forward index indexed by document.

10. A method (1000) for assigning segment roots, the method comprising:

receiving (1010) a search query at a main body root that includes a plurality of segments, wherein each of the plurality of segments includes a plurality of nodes that each store a portion of a search index, the search index covering a group of documents that is indexed by atom in a reverse index and indexed by document in a forward index, wherein each of the plurality of nodes stores the respective portions of the reverse index and the forward index assigned to that node, and wherein the reverse index is used for a preliminary ranking process and the forward index is used for a final ranking process;

identifying (1012) a group of nodes in each segment that will be used to execute the received search query;

for each of the plurality of segments, identifying (1014) a preliminary segment root from the group of nodes;

requesting (1016) statistics from each node in the group of nodes that will be used to execute the received search query;

receiving (1018) statistics from each node in the group of nodes, the statistics indicating each node's availability to serve as a final segment root, the final segment root collecting query execution data from the group of nodes in its respective segment;

selecting (1020) a final segment root for each segment based on the statistics; and

executing (1022) the search query.

11. An apparatus for assigning segment roots, the apparatus comprising:

a unit for receiving (810) a search query;

a unit for identifying (812) a group of nodes in a segment that will be used to resolve the search query, the segment being assigned a group of documents, the group of documents being indexed by atom in a reverse index and indexed by document in a forward index, wherein respective portions of the reverse index and the forward index are assigned to each node in the group of nodes, and wherein the reverse index is used for a preliminary ranking process and the forward index is used for a final ranking process;

a unit for selecting (814) a preliminary segment root from the group of nodes;

a unit for receiving (816), at the preliminary segment root, statistics from each node in the identified group of nodes, the statistics indicating each node's ability to serve as a final segment root, the final segment root being responsible for collecting query execution results from the group of nodes based on the search query;

a unit for selecting (818), by an algorithm and based on the statistics, the final segment root from the group of nodes; and

a unit for notifying (820) the group of nodes of the final segment root so that the nodes know where to send their respective query execution results.

12. An apparatus for assigning segment roots, the apparatus comprising:

a unit for receiving (910), at a segment that includes a plurality of nodes, a search query to be executed;

a unit for identifying (912), from the plurality of nodes in the segment, a group of nodes that will be used to execute the search query, the segment being assigned a group of documents, the group of documents being indexed by atom in a reverse index and indexed by document in a forward index, wherein respective portions of the reverse index and the forward index are assigned to each node in the group of nodes, and wherein the reverse index is used for a preliminary ranking process and the forward index is used for a final ranking process;

a unit for selecting (914), before the search query is executed, a preliminary segment root from the plurality of nodes, the selection being based on one or more of an expected load of each node or random selection;

a unit for receiving (916), at the preliminary segment root, statistics from each node in the group of nodes that will be used to execute the search query, wherein the statistics include current load and cost data associated with sending data across a network;

a unit for selecting (918), based on the statistics, a final segment root that will aggregate query execution data from the group of nodes during query execution; and

a unit for executing (920) the search query.

13. An apparatus for assigning segment roots, the apparatus comprising:

a unit for receiving (1010) a search query at a main body root that includes a plurality of segments, wherein each of the plurality of segments includes a plurality of nodes that each store a portion of a search index, the search index covering a group of documents that is indexed by atom in a reverse index and indexed by document in a forward index, wherein each of the plurality of nodes stores the respective portions of the reverse index and the forward index assigned to that node, and wherein the reverse index is used for a preliminary ranking process and the forward index is used for a final ranking process;

a unit for identifying (1012) a group of nodes in each segment that will be used to execute the received search query;

a unit for identifying (1014), for each of the plurality of segments, a preliminary segment root from the group of nodes;

a unit for requesting (1016) statistics from each node in the group of nodes that will be used to execute the received search query;

a unit for receiving (1018) statistics from each node in the group of nodes, the statistics indicating each node's availability to serve as a final segment root, the final segment root collecting query execution data from the group of nodes in its respective segment;

a unit for selecting (1020) a final segment root for each segment based on the statistics; and

a unit for executing (1022) the search query.