CN103186650A - Searching method and device - Google Patents



Publication number
CN103186650A
Authority
CN
China
Prior art keywords
search
record
list
etymology
retrieval
Prior art date
Legal status
Granted
Application number
CN2011104611284A
Other languages
Chinese (zh)
Other versions
CN103186650B (en)
Inventor
简勤
郭正平
陈健骥
何丹
赖航
肖巍
郑长松
王全礼
杨俊拯
Current Assignee
China Mobile Group Sichuan Co Ltd
Original Assignee
China Mobile Group Sichuan Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Group Sichuan Co Ltd
Priority to CN201110461128.4A
Publication of CN103186650A
Application granted
Publication of CN103186650B
Legal status: Active
Anticipated expiration



Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a searching method and device. The searching method comprises the following steps: setting key attributes and key-attribute weights for documents of the same class, and calculating a key-attribute score for each document; establishing an inverted index list, wherein each record in a document list of the inverted index list comprises the document ID and the key-attribute score of one document, the document list consists of an ordered list and an unordered list, and the ordered list contains the n records with the largest key-attribute scores, arranged in descending order of score, n being a predetermined value; generating corresponding terms from the search string input by a user; searching the inverted index list with the generated terms; and, according to the search-result range input by the user, preferentially fetching records from the ordered list corresponding to the generated terms, so as to obtain the required search results and the total number of relevant results. The disclosed searching method and device increase search speed and reduce the consumption of system resources.

Description

A searching method and device
Technical field
The present invention relates to the field of data services, and in particular to a searching method and device.
Background
Existing search engines generally rank results solely by text similarity, and the index is generally stored as key-value pairs <KEY, DocList>, where KEY is a keyword and DocList is the list of documents containing KEY. Each element of DocList is a document object that holds the basic information of one document, for example the document ID, the number of times KEY occurs in the document, and the positions at which it occurs. When the user enters a keyword, the engine first looks it up; once the keyword is found, it computes the score of every document in the DocList corresponding to that keyword, sorts all the results by score, reads the number of results the user requested from the sorted results, and returns them to the user.
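For contrast with the invention, the conventional flow just described can be sketched in Python. This is a minimal illustration, not the code of any particular search engine: the score is assumed to be the keyword's number of occurrences, and the names `build_keyword_index` and `conventional_search` are hypothetical. The point to notice is that every query scores and fully sorts the whole DocList before slicing out the results the user asked for.

```python
from collections import defaultdict


def build_keyword_index(docs):
    """docs maps doc_id -> list of tokens; returns KEY -> DocList.

    Each DocList entry is (doc_id, occurrences, positions), mirroring the
    per-document information described in the text.
    """
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        positions = defaultdict(list)
        for pos, token in enumerate(tokens):
            positions[token].append(pos)
        for token, pos_list in positions.items():
            index[token].append((doc_id, len(pos_list), pos_list))
    return index


def conventional_search(index, keyword, start, count):
    """Score every document for the keyword, fully sort, then return a slice.

    Assumption: score = number of occurrences. start is 1-based.
    """
    doc_list = index.get(keyword, [])
    # Full sort on every query: O(m log m) for a DocList of m documents.
    ranked = sorted(doc_list, key=lambda entry: entry[1], reverse=True)
    return ranked[start - 1:start - 1 + count]
```

Because the sort happens inside the query path, its cost is paid again on every search, which is the overhead the following paragraphs criticize.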
For example, the prior art includes a method for ranking integrated search results, in which a ranking algorithm calculates a composite value for each vertical search engine, the vertical search engines are ranked directly by that composite value, and the ranking results of all vertical search engines are aggregated to generate the final search result.
However, this method computes result scores during the search itself and performs a full sort every time, so it consumes considerable CPU and memory, has high search complexity and is slow. It also ignores the value-added attributes of information, so it cannot guarantee that high-value information is ranked near the top. Moreover, because the method has to store the factors on which the scores are based and the factors used to compute the composite score, it also places high demands on storage space.
In addition, the prior art includes a search-engine-based method for ranking search results. In that method, the score of every related resource is computed from preconfigured network-resource weights together with the text weight of the user's keyword in the resource, and a full sort is then performed to generate the search results. Although this method does consider the value-added attributes of information, it must compute the score of every document during the search and fully sort the results, so it too consumes considerable CPU and memory, has high search complexity and is slow. It likewise needs to store the score factors and the composite-score factors, and therefore also places high demands on storage space.
In summary, in prior-art search methods, document scoring and sorting are both performed during the search, over the full result set; the index structure does not take users' search habits into account, so the amount of stored information is large; and every search runs over the full data set, which greatly increases the load on scarce system resources such as memory and CPU. The search complexity of prior-art methods is also generally high, and their response is slow. Furthermore, prior-art methods rank by text relevance and do not consider the key attributes that express the value of information, which leads to problems such as a single ranking criterion and poor user-friendliness.
Summary of the invention
In view of this, the invention provides a searching method and device that improve search speed and reduce the use of system resources.
The technical solution of the present invention is implemented as follows:
A searching method, comprising:
A. setting at least one key attribute and a corresponding key-attribute weight for documents of the same class, and calculating the key-attribute score KFScore of each document from the key attributes and key-attribute weights;
B. indexing all documents to be retrieved with the term Term as the key, to build an inverted index list in which each Term is the key and the value consists of the total number TotalCount of documents containing that Term and the document list DocList of documents containing that Term; each record in the document list contains the document ID and the key-attribute score of one document; the document list consists of an ordered list and an unordered list, the ordered list containing the n records with the largest key-attribute scores, arranged in descending order of score, where n is a predetermined value;
C. generating the corresponding terms from the search string input by the user, searching the inverted index list with the generated terms, and, according to the search-result range input by the user, preferentially fetching records from the ordered list corresponding to the generated terms, so as to obtain the required search results and the total number of relevant results.
The present invention also provides a searching device, comprising an inverted-list generation module, a storage module, a term generation module and a retrieval module.
The inverted-list generation module is configured to index all documents to be retrieved with the term Term as the key, building an inverted index list in which each Term is the key and the value consists of the total number TotalCount of documents containing that Term and the document list DocList of documents containing that Term; each record in the document list contains the document ID and the key-attribute score of one document; the document list consists of an ordered list and an unordered list, the ordered list containing the n records with the largest key-attribute scores, arranged in descending order of score, where n is a predetermined value. The module sends the inverted index list to the storage module.
The storage module is configured to store the inverted index list.
The term generation module is configured to generate the corresponding terms from the search string input by the user, and to send the terms and the search-result range input by the user to the retrieval module.
The retrieval module is configured to search the inverted index list stored in the storage module with the generated terms, to preferentially fetch records from the ordered list corresponding to the generated terms according to the search-result range input by the user, so as to obtain the required search results and the total number of relevant results, and to send the search results and the total number of relevant results to the user.
As the above technical solution shows, because key attributes and key-attribute weights are set for the documents, the KFScore of each document can be calculated, and an inverted index list can be built from these scores in which each document list consists of an ordered list and an unordered list. When the user enters a search string, the corresponding terms are generated and used to search the inverted index list, and records are fetched preferentially from the ordered list corresponding to the generated terms according to the user's search-result range, yielding the required search results and the total number of relevant results. While preserving the relevance (that is, the validity) of the search results, the method weighs the value of information along multiple dimensions so that valuable information ranks near the top. It greatly improves search speed and the response time of the search engine, and reduces the use of system resources such as CPU and memory, saving considerable hardware and software resources.
Description of drawings
Fig. 1 is a flowchart of the searching method of the present invention.
Fig. 2 is a diagram of the information structure of a document in the present invention.
Fig. 3 is a schematic diagram of the structure of the inverted index list in the present invention.
Fig. 4 is a flowchart of the single-Term search method of the present invention.
Fig. 5 is a schematic diagram of the structure of the inverted index list in a specific example of the present invention.
Fig. 6 is a flowchart of the multi-Term search method of the present invention.
Fig. 7 is a flowchart of the multi-layer search method of the present invention.
Fig. 8 is a flowchart of one implementation of step 702 in the present invention.
Fig. 9 is a schematic diagram of the structure of the searching device of the present invention.
Embodiment
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of the searching method of the present invention.
As shown in Fig. 1, the method comprises:
Step 101: set at least one key attribute (KeyField) for documents of the same class, and preset a key-attribute weight for each key attribute.
In the data service field, the set of information describing an entity can be called a document (Document). Fig. 2 is a diagram of the information structure of a document in the present invention. As shown in Fig. 2, in the technical solution of the present invention each document contains a plurality of terms (Term) and a plurality of attributes (Field), and each attribute may consist of one or more Terms. For example, a document may contain attributes such as "title" and "content", and the "title" attribute may consist of one or more Terms. As another example, attribute 1 in Fig. 2 consists of N terms, attribute J consists of M terms, and so on.
In addition, in the technical solution of the present invention, one or more key attributes identifying the important information of a document are set for each document, and documents of the same class have the same key attributes. Table 1 below gives examples of key attributes in the present invention.
Document class | Key attribute 1 | Key attribute 2 | ...
Product documents | Product price | Number of purchases | ...
Paper documents | Number of citations | Number of downloads | ...
Music documents | Number of listens | Number of downloads | ...
... | ... | ... | ...
Table 1
As shown in Table 1, for product documents the product price and/or the number of purchases can be set as key attributes; for paper documents, the number of citations and/or the number of downloads; and for music documents, the number of listens and/or the number of downloads.
After the key attributes are set, a corresponding weight Weight (called the key-attribute weight) is also predetermined for each key attribute according to the actual application.
Step 102: calculate the key-attribute score (KFScore) of each document from the key attributes and key-attribute weights.
Once the key attributes and key-attribute weights have been set, the key-attribute score (KFScore) of each document can be calculated from them.
For example, the KFScore of a document can be calculated by the following formula (1):

$$\mathrm{KFScore} = \mathrm{KeyField}_1 \cdot W_1 + \mathrm{KeyField}_2 \cdot W_2 + \cdots + \mathrm{KeyField}_X \cdot W_X, \qquad W_1 + W_2 + \cdots + W_X = 1 \tag{1}$$

where $\mathrm{KeyField}_1$ is the value of the document's 1st KeyField and $W_1$ is the value of the Weight corresponding to that KeyField; in general, $\mathrm{KeyField}_x$ is the value of the document's x-th KeyField and $W_x$ is the value of the Weight corresponding to it.
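Formula (1) is a weighted sum, so it fits in a few lines of Python. In this sketch the key-attribute values and weights are passed as mappings keyed by attribute name, which is an assumption made for readability; the function name `kf_score` is hypothetical. The check that the weights sum to 1 follows the constraint stated with formula (1).

```python
from typing import Mapping


def kf_score(key_fields: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Formula (1): KFScore = sum of KeyField_x * W_x over the key attributes.

    key_fields holds the document's key-attribute values; weights holds the
    key-attribute weights of its document class, which must sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("key-attribute weights must sum to 1")
    return sum(key_fields[name] * weight for name, weight in weights.items())
```

With the music-document weights used later in this description, a call would look like `kf_score(doc_values, {"ListenCount": 0.3, "BusinessCount": 0.7})`, where `doc_values` is a hypothetical mapping of one document's ListenCount and BusinessCount. Because the score depends only on the document, it can be computed once at indexing time rather than per query.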
Step 103: index all documents to be retrieved with Term as the key (Key), building an inverted index list InvertIndexList in which each Term is the key and the value (Value) consists of the total number (TotalCount) of documents containing that Term and the document list (DocList) of documents containing that Term.
In this step an inverted index list is built. Specifically, the index uses Term as the key and the key-value pair <TotalCount, DocList> as the value; this inverted index list is called InvertIndexList.
Fig. 3 is a schematic diagram of the structure of the inverted index list in the present invention. As shown in Fig. 3, the inverted index list InvertIndexList contains one or more records, each stored as a key-value pair <Key, Value> with two fields: a key (Key) field and a value (Value) field. The Key field stores a Term occurring in the documents, and the Value field stores the key-value pair <TotalCount, DocList> corresponding to that Term, where DocList is the list of documents containing the Term and TotalCount is the total number of documents in DocList.
As shown in Fig. 3, in a specific embodiment of the present invention DocList may be a linked list, and TotalCount is then the total number of elements in that linked list. The linked list contains multiple records, each stored as a key-value pair <KFScore, Did> corresponding to one document that contains the Term, where KFScore is the key-attribute score of that document, calculated by formula (1) in step 102, and Did is the identifier of the document, for example its number.
In addition, in a specific embodiment of the present invention, the records of each DocList in InvertIndexList are partially sorted in advance by KFScore: the n records with the largest KFScore in the DocList are determined first and arranged in order of KFScore to form an ordered list (OrderedList), and the remaining records of the DocList form an unordered list (DisorderedList). Each DocList thus consists of an ordered list and an unordered list, the ordered list containing the n records with the largest key-attribute scores, arranged in descending order of score. Here n is a predetermined value and is the length of OrderedList.
It follows that, in a specific embodiment of the present invention, the records of a DocList need not all be sorted by KFScore; only a partial sort is required. It suffices to determine the n records with the largest KFScore and sort those (the records of the ordered list) by KFScore, while the other records of the DocList (the records of the unordered list) are left unsorted.
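The index construction in step 103 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: Python lists stand in for the linked list, the partial sort is done with the standard library's `heapq.nlargest` (one common way to select the top n, swapped in here because the text does not say how the selection is performed), and `Posting` and `build_inverted_index` are hypothetical names.

```python
import heapq
from collections import defaultdict
from typing import Iterable, NamedTuple

Record = tuple[float, int]  # <KFScore, Did>


class Posting(NamedTuple):
    """Value stored for one Term: <TotalCount, DocList>, with DocList split in two."""
    total_count: int
    ordered: list[Record]    # OrderedList: top-n records, descending KFScore
    unordered: list[Record]  # DisorderedList: all other records, left unsorted


def build_inverted_index(docs: Iterable[tuple[int, float, Iterable[str]]],
                         n: int) -> dict[str, Posting]:
    """docs yields (Did, KFScore, terms of the document); n is the OrderedList length."""
    doc_lists = defaultdict(list)
    for did, score, terms in docs:
        for term in set(terms):  # one record per document per Term
            doc_lists[term].append((score, did))
    index = {}
    for term, records in doc_lists.items():
        # Partial sort: select the n best positions (about O(m log n)), not a full sort.
        top = heapq.nlargest(n, range(len(records)), key=lambda i: records[i][0])
        chosen = set(top)
        ordered = [records[i] for i in top]
        unordered = [r for i, r in enumerate(records) if i not in chosen]
        index[term] = Posting(len(records), ordered, unordered)
    return index
```

Selecting indices rather than records keeps tied scores unambiguous when splitting a DocList into its two parts; the unordered part keeps its original insertion order.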
Step 104: generate the corresponding terms from the search string input by the user, search the inverted index list with the generated terms, and, according to the search-result range input by the user, preferentially fetch records from the ordered list corresponding to the generated terms, so as to obtain the required search results and the total number of relevant results.
In the technical solution of the present invention, once the inverted index list has been built in step 103, it can be searched according to the search string and search-result range input by the user to obtain the required search results and the total number of relevant results.
Specifically, the corresponding terms are first generated from the user's search string.
For example, if the user's search string is "Liu Dehua", it can be converted, according to different transformation rules, into the corresponding term "Liu Dehua" or "liudehua".
If the user's search string is "Liu Dehua Tianyi", it can be converted, according to different transformation rules, into the corresponding term "Liu Dehua Tianyi" or "liudehuatianyi"; alternatively, it can be converted into two corresponding terms: "Liu Dehua" and "Tianyi".
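The transformation rules are left open in the text. Purely as an illustration, the sketch below assumes two hypothetical rules: converting Chinese characters to pinyin through a lookup table, and splitting the string into words by forward maximum matching against a lexicon (a standard dictionary-based segmentation technique, swapped in here). `PINYIN` and `LEXICON` are toy stand-ins that cover only the example string 刘德华天意 ("Liu Dehua Tianyi"); all function names are hypothetical.

```python
# Hypothetical toy resources; a real system would need a full pinyin table and lexicon.
PINYIN = {"刘": "liu", "德": "de", "华": "hua", "天": "tian", "意": "yi"}
LEXICON = {"刘德华", "天意"}


def to_pinyin(text: str) -> str:
    """Replace each known character with its pinyin; keep other characters unchanged."""
    return "".join(PINYIN.get(ch, ch) for ch in text)


def segment(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest lexicon word."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:  # fall back to a single character
                words.append(candidate)
                i += length
                break
    return words


def generate_terms(query: str) -> list[list[str]]:
    """Return alternative term lists: the whole string, its pinyin, and its words."""
    alternatives = [[query], [to_pinyin(query)]]
    words = segment(query, LEXICON)
    if len(words) > 1:
        alternatives.append(words)
    return alternatives
```

For 刘德华天意 this yields the three alternatives named in the text: the whole string, "liudehuatianyi", and the two terms 刘德华 and 天意; for 刘德华 alone it yields the string and "liudehua".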
After the corresponding terms are generated, the inverted index list is searched for them to determine whether it contains them.
If the inverted index list contains the term, the DocList corresponding to the term is read, and records are preferentially fetched from the ordered list of that DocList according to the user's search-result range, so as to obtain the required search results and the total number of relevant results.
The search-result range is the range in which the search results requested by the user lie. It consists of a start position (InitialPositon) and a count (Count), and its maximum is (InitialPositon+Count)-1, where the start position is the position of the first requested result and the count is the number of requested results.
For example, after entering a search string, the user views the results as web pages. Taking Count=30 results per page as an example: when the user wants to view the 1st page of results, InitialPositon=1 and Count=30, so the search-result range is results 1 to 30; when, after viewing the 1st page, the user clicks "Next page" to view the 2nd page, InitialPositon=31 and Count=30, so the range is results 31 to 60; and so on.
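The page arithmetic in this example can be captured in a small helper. The function name `page_to_range` and its 1-based `page` parameter are illustrative assumptions; the relations InitialPositon = (page-1)*Count+1 and maximum = InitialPositon+Count-1 follow from the example and the definition of the range.

```python
def page_to_range(page: int, per_page: int) -> tuple[int, int, int]:
    """Map a 1-based page number to (InitialPositon, Count, maximum of the range)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    initial_position = (page - 1) * per_page + 1
    count = per_page
    return initial_position, count, initial_position + count - 1


print(page_to_range(1, 30))  # (1, 30, 30): results 1 to 30
print(page_to_range(2, 30))  # (31, 30, 60): results 31 to 60
```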
Therefore, in a specific embodiment of the present invention, preferentially fetching records from the ordered list of the DocList according to the user's search-result range, to obtain the required search results, may proceed as follows:
When the maximum of the search-result range is less than or equal to the length n of the ordered list, the range lies entirely within the ordered list, so Count records are read directly, starting from record number InitialPositon of the ordered list, as the search results, without reading the unordered list. Because the records of the ordered list are already arranged by KFScore, the Count records read are search results arranged by KFScore.
When the start position of the search-result range lies in the ordered list (that is, InitialPositon ≤ n) but the maximum of the range exceeds the length n of the ordered list, part of the range lies in the ordered list and part in the unordered list. In this case, (n-InitialPositon+1) records are read starting from record number InitialPositon of the ordered list; the records of the unordered list are then arranged by KFScore, and [Count-(n-InitialPositon+1)] records are read starting from the 1st record of the unordered list. The Count records read are taken as the search results, and they too are arranged by KFScore.
When the start position of the search-result range lies in the unordered list (that is, InitialPositon > n), the whole range lies in the unordered list. In this case the records of the unordered list are arranged by KFScore, and Count records are read, starting from record number (InitialPositon-n) of the sorted unordered list, as the search results. These Count records are likewise arranged by KFScore.
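The three cases reduce to simple index arithmetic. The hypothetical helper `plan_reads` below is a sketch, not taken from the patent: it converts the 1-based positions used in the text into 0-based, half-open Python slice bounds for the ordered list and for the sorted unordered list, and reports whether the unordered list has to be sorted at all.

```python
from typing import Optional

Slice = Optional[tuple[int, int]]  # 0-based, half-open (start, stop); None = list not read


def plan_reads(initial_position: int, count: int, n: int) -> tuple[Slice, Slice, bool]:
    """Return (ordered-list slice, sorted-unordered-list slice, unordered list must be sorted)."""
    last = initial_position + count - 1        # maximum of the search-result range
    if last <= n:                              # case 1: range lies inside the ordered list
        return (initial_position - 1, last), None, False
    if initial_position <= n:                  # case 2: range straddles both lists
        from_ordered = n - initial_position + 1
        return (initial_position - 1, n), (0, count - from_ordered), True
    start = initial_position - n - 1           # case 3: record (InitialPositon-n), 0-based
    return None, (start, start + count), True
```

With n=100 and Count=30, pages starting at 1, 91 and 121 fall into cases 1, 2 and 3 respectively; only cases 2 and 3 touch the unordered list.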
The required search results are obtained in this way.
Users' search habits should also be considered: in general, users care only about the first few dozen highest-ranked results. According to statistical analyses of user search habits by authoritative institutions, more than 90% of users view only the first two pages of results (generally about 20 results). Therefore, even when the volume of searchable data is in the hundreds of millions, if the length n of the ordered list exceeds a certain value (for example, n=100), the search-result range requested by the user falls within the ordered list in most cases. The required number of results, already arranged by KFScore, can then be fetched directly from the ordered list without any further sorting, which requires little memory and few CPU operations, shortens the response time and uses fewer system resources. This greatly improves the response time of the search engine and saves considerable hardware and software resources.
Further, in a specific embodiment of the present invention, when the inverted index list does not contain the term generated from the user's search string, an empty set can be returned to the user as the search result, indicating that no relevant document was found.
In addition, in the technical solution of the present invention, step 104 can be implemented in several ways, that is, with several search methods. These search methods are introduced below in a number of specific embodiments.
Embodiment 1: single-Term search method.
In this embodiment, the single-Term search method is the method used when only one corresponding term is generated from the user's search string. Fig. 4 is a flowchart of the single-Term search method of the present invention. As shown in Fig. 4, the method may comprise the following steps:
Step 401: generate one corresponding term from the user's search string.
In this step, the user's search string can be treated as a single word and used directly as the corresponding term. For example, if the user's search string is "Liu Dehua", the term "Liu Dehua" is generated.
Step 402: search the inverted index list with the generated term and determine whether the list contains it; if so, go to step 403; otherwise, go to step 415.
Step 403: read the document list corresponding to the term from the inverted index list.
Because the value (Value) of each term in the inverted index list contains a document list (DocList), in this step the document list corresponding to the generated term can be read from the inverted index list.
Step 404: parse the search-result range input by the user to obtain the start position and the count.
In this step, the user's search-result range is parsed to obtain the start position InitialPositon and the count Count.
Step 405: determine whether (InitialPositon+Count)-1 > n holds; if so, go to step 406; otherwise, go to step 412.
Because the maximum of the search-result range is (InitialPositon+Count)-1, this step first determines whether that maximum exceeds the length n of the ordered list of the document list, that is, whether (InitialPositon+Count)-1 > n holds.
If the maximum of the range exceeds the length of the ordered list, the range extends beyond the ordered list, and not all the required results can be read from the ordered list alone, so the method continues with step 406.
If the maximum of the range is less than or equal to the length of the ordered list, the range does not extend beyond the ordered list, and all the required results can be read from it alone, so the method goes to step 412 and fetches the required number of results directly from the ordered list.
Step 406: determine whether InitialPositon > n holds; if so, go to step 410; otherwise, go to step 407.
In this step, it is determined whether InitialPositon in the search-result range exceeds the length n of the ordered list.
If InitialPositon exceeds the length of the ordered list, the start of the range lies in the unordered list, so all the results lie in the unordered list and none in the ordered list; the method therefore goes to step 410.
If InitialPositon is less than or equal to the length of the ordered list, the start of the range still lies in the ordered list: some results lie in the ordered list and the rest in the unordered list. The method therefore goes to step 407.
Step 407: read (n-InitialPositon+1) records starting from record number InitialPositon of the ordered list and add them to the search result list (ResultList).
Step 408: arrange the records of the unordered list by KFScore.
Step 409: read [Count-(n-InitialPositon+1)] records from the start of the sorted unordered list, add them to ResultList, and go to step 413.
Step 410: arrange the records of the unordered list by KFScore.
Step 411: read Count records starting from record number (InitialPositon-n) of the sorted unordered list, add them to ResultList, and go to step 413.
Step 412: read Count records starting from record number InitialPositon of the ordered list, add them to ResultList, and go to step 413.
Step 413: take ResultList as the search results, and take the total number of documents TotalCount of the document list read from the inverted index list for the term as the total number of relevant results SumCount.
Step 414: return the search results and the total number of relevant results to the user. The process ends.
Step 415: take the empty set as the search result, set SumCount=0, and return the empty set and the total number of relevant results to the user. The process ends.
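Steps 401 to 415 can be put together in a compact Python sketch. Assumptions, none of which come from the text: the index is a dict from Term to a hypothetical `Posting(total_count, ordered, unordered)` tuple, records are `(KFScore, Did)` pairs, n is taken as the actual length of the ordered list (so a DocList shorter than n still works), and the unordered list is sorted into a temporary copy on each query rather than in place.

```python
from typing import NamedTuple

Record = tuple[float, int]  # (KFScore, Did)


class Posting(NamedTuple):
    total_count: int         # TotalCount
    ordered: list[Record]    # top-n records, descending KFScore
    unordered: list[Record]  # remaining records, unsorted


def single_term_search(index: dict[str, Posting], term: str,
                       initial_position: int, count: int) -> tuple[list[Record], int]:
    """Return (ResultList, SumCount) for one term; positions are 1-based."""
    posting = index.get(term)                              # steps 402-403
    if posting is None:                                    # step 415: term not indexed
        return [], 0
    n = len(posting.ordered)
    last = initial_position + count - 1                    # step 405
    if last <= n:                                          # step 412
        results = posting.ordered[initial_position - 1:last]
    elif initial_position <= n:                            # step 406 -> steps 407-409
        results = posting.ordered[initial_position - 1:]   # n - InitialPositon + 1 records
        rest = sorted(posting.unordered, reverse=True)     # step 408
        results = results + rest[:count - len(results)]    # step 409
    else:                                                  # steps 410-411
        rest = sorted(posting.unordered, reverse=True)
        start = initial_position - n - 1
        results = rest[start:start + count]
    return results, posting.total_count                    # steps 413-414
```

Only the straddling and beyond-the-ordered-list branches pay for a sort, which matches the argument that most requests (the first pages) are served straight from the ordered list. A production version might cache the sorted unordered list or use `heapq.nlargest` to sort only as far as the requested range.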
By above-mentioned step 401~415, can realize single Term search.
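The branching of steps 407 to 415 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: it assumes an in-memory index mapping each term to a hypothetical `DocList` that holds TotalCount, an ordered list of the n highest-KFScore records, and an unsorted remainder. Steps 401 to 406 are not reproduced here, so the branch conditions are inferred from the cases described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    did: int           # document serial number (Did)
    kf_score: float    # key attribute value (KFScore)


@dataclass
class DocList:
    total_count: int           # TotalCount
    ordered: list[Record]      # the n highest-KFScore records, KFScore descending
    unordered: list[Record]    # all remaining records, in no particular order


def single_term_search(index: dict[str, DocList], term: str,
                       initial_position: int, count: int) -> tuple[list[Record], int]:
    """Return (search results, SumCount) for one term; positions are 1-based."""
    doc_list = index.get(term)
    if doc_list is None:                          # step 415: term not in the index
        return [], 0
    n = len(doc_list.ordered)
    last = initial_position + count - 1           # (InitialPositon + Count) - 1
    if last <= n:                                 # step 412: whole range in the ordered list
        results = doc_list.ordered[initial_position - 1:last]
    else:
        # steps 408 / 410: sort the unordered list only when it is actually needed
        tail = sorted(doc_list.unordered, key=lambda r: r.kf_score, reverse=True)
        if initial_position > n:                  # step 411: whole range in the unordered list
            offset = initial_position - n - 1     # 0-based index of record (InitialPositon - n)
            results = tail[offset:offset + count]
        else:                                     # steps 407 + 409: range spans both lists
            head = doc_list.ordered[initial_position - 1:]   # (n - InitialPositon + 1) records
            results = head + tail[:count - len(head)]
    return results, doc_list.total_count          # step 413: SumCount = TotalCount
```

The design point the sketch illustrates is that the unordered list is sorted only in the branches that reach it; a request that fits inside the ordered list touches nothing else.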
The single-Term search method is further illustrated below with a concrete example from the music search domain.
Specific example one:
Suppose there are 10,000,000 music documents to be searched, and the music information of each document includes: song title (SongName), singer name (SingerName), album name (AlbumName), lyrics, the listen count of the song (ListenCount), and business volume (BusinessCount).
In this example, ListenCount and BusinessCount can be set as the key attributes of the music documents, with key attribute weights of 0.3 and 0.7 respectively. The key attribute value (KFScore) of each music document is then calculated as follows:
KFScore=ListenCount*0.3+BusinessCount*0.7
An inverted index InvertIndexList can be built from the above music documents, key attributes, and key attribute values, with the length n of the ordered list in the inverted index set to 100.
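The KFScore formula and the split of one term's document list into an ordered list and an unordered list can be illustrated with the sketch below. It assumes each document is a dict with hypothetical `Did`, `ListenCount`, and `BusinessCount` keys, that Dids are unique, and that a record is a `(Did, KFScore)` pair; it uses `heapq.nlargest` to pick the n highest-scoring records rather than fully sorting the list.

```python
import heapq

N_ORDERED = 100                                          # length n of the ordered list
WEIGHTS = {"ListenCount": 0.3, "BusinessCount": 0.7}     # key attributes and their weights


def kf_score(doc: dict) -> float:
    """KFScore = ListenCount * 0.3 + BusinessCount * 0.7."""
    return sum(doc[attr] * weight for attr, weight in WEIGHTS.items())


def build_doc_list(docs: list[dict], n: int = N_ORDERED) -> dict:
    """Split one term's documents into an ordered list (top n by KFScore) and an unordered list."""
    scored = [(kf_score(d), d["Did"]) for d in docs]
    top = heapq.nlargest(n, scored)                      # n highest KFScores, descending
    top_ids = {did for _, did in top}
    return {
        "TotalCount": len(scored),
        "ordered": [(did, score) for score, did in top],
        "unordered": [(did, score) for score, did in scored if did not in top_ids],
    }
```

Selecting the top n with a heap costs roughly O(N log n) instead of the O(N log N) of a full sort, which matches the idea that only the head of each list needs to be kept in order.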
Fig. 5 is a schematic diagram of the structure of the inverted index in this specific example. As shown in Fig. 5, the value corresponding to the term "Liu Dehua" is <10000, DocList>, so the total number of documents in the DocList for "Liu Dehua" is TotalCount = 10000. The ordered list in this DocList has length n = 100, i.e., it contains 100 records, and the unordered list has length 9900, i.e., it contains 9900 records. Likewise, the value for the term "I and You" is <9870, DocList>, so its DocList has TotalCount = 9870; ...; and the value for the term "Tianyi" is <60, DocList>, so its DocList has TotalCount = 60. Further assume that among these 60 documents, 20 have the singer name "Liu Dehua".
If the search string entered by the user is "Liu Dehua" and the requested search result range is the 31st to 50th results, the single-Term search method can proceed as follows:
Generate the corresponding term "Liu Dehua" from the user's search string, and parse the requested search result range to obtain the starting position InitialPositon = 31 and the count Count = 20;
Look up the generated term "Liu Dehua" in the inverted index shown in Fig. 5 and find the matching entry, whose value is <10000, DocList>. Thus TotalCount = 10000 for the DocList of "Liu Dehua", its ordered list has length n = 100 (100 records), and its unordered list has length 9900 (9900 records).
Because (InitialPositon + Count) - 1 = 31 + 20 - 1 = 50 < n = 100, the search result range lies entirely within the ordered list, so 20 records are read directly, starting from the 31st record of the ordered list, as the search results. The total number of relevant results is SumCount = TotalCount = 10000.
Because the records in the ordered list are already sorted by KFScore, the 20 records read are already in KFScore order.
Embodiment two: multi-Term search method.
In practice, users may enter several keywords to locate the desired content more precisely, and some users also have specific requirements on the results: for example, finding information that contains both keyword A and keyword B, information that contains keyword A or keyword B, or information that contains keyword A but not keyword B. The keywords entered by a user may therefore be combined by three logical operations: AND, OR, and difference (SUB).
To meet these needs, the technical solution of the present invention also provides a multi-Term search method. In this embodiment, the multi-Term search method refers to the search performed when multiple corresponding terms are generated from the user's search string. Fig. 6 is a flowchart of the multi-Term search method of the present invention. As shown in Fig. 6, the method may include the following steps:
Step 600: set Y read ranges (segments) in the document lists of the inverted index.
In specific embodiments of the invention, retrieval uses a segmented-reading strategy so that, as far as possible, a search does not have to operate on the full data set every time. Therefore, in this step, Y read ranges are preset in the document lists of the inverted index, where Y ≥ 2.
For example, when Y = 2, two read ranges are preset in the document lists: the ordered list of a document list can serve as the 1st read range and the entire document list as the 2nd read range.
As another example, when Y = 3, three read ranges are preset: the ordered list as the 1st read range, the unordered list as the 2nd read range, and the entire document list as the 3rd read range.
In specific embodiments of the invention, the read ranges can also be set in other ways according to the actual application; these are not described further here.
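The two example layouts of the read ranges can be expressed with the small helper below. It is an illustrative sketch only: `read_range` and the `layout` labels are hypothetical names, and a document list is assumed to be held as a Python list for its ordered part and another for its unordered part.

```python
def read_range(ordered: list, unordered: list, j: int, layout: tuple[str, ...]) -> list:
    """Return the records of the j-th read range (1-based) of one document list.

    layout names each of the Y read ranges in order, e.g. ("ordered", "all") for
    Y = 2 or ("ordered", "unordered", "all") for Y = 3.
    """
    segments = {
        "ordered": ordered,             # ordered list only
        "unordered": unordered,         # unordered list only
        "all": ordered + unordered,     # entire document list
    }
    return segments[layout[j - 1]]
```

For example, with the Y = 2 layout, `read_range(ordered, unordered, 1, ("ordered", "all"))` returns only the ordered list, and j = 2 returns the whole document list.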
Step 601: generate a term queue and a query logic sequence from the user's search string.
In this step, the user's search string is parsed to generate a term queue TermArray and a query logic sequence SetOperators. TermArray may contain x terms and can be written TermArray{Term_1, Term_2, ..., Term_x}; SetOperators contains (x - 1) query logic symbols and can be written SetOperators{symbol_1, symbol_2, ..., symbol_(x-1)}. The logical relations between the terms in TermArray are given by the corresponding symbols in SetOperators: symbol_1 expresses the relation between Term_1 and Term_2, symbol_2 expresses the relation between Term_3 and the combined result of the preceding Term_1 and Term_2, and so on. Each query logic symbol takes one of the values AND, OR, or SUB.
For example, if the user's search string is "Liu Dehua OR Tianyi", parsing it yields the term queue TermArray{Liu Dehua, Tianyi} and the query logic sequence SetOperators{symbol_1}, where symbol_1 is OR.
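A minimal parser for step 601 is sketched below. It assumes, as a simplification not stated in the source, that operators appear as the upper-case words AND, OR, and SUB surrounded by whitespace, and that everything between operators (including spaces, as in "Liu Dehua") is one term.

```python
import re

# Operators must be separated from terms by whitespace (an assumed query syntax).
OPERATOR_PATTERN = re.compile(r"\s+(AND|OR|SUB)\s+")


def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a search string into TermArray and SetOperators."""
    parts = OPERATOR_PATTERN.split(query.strip())   # capture group keeps the operators
    terms = parts[0::2]        # TermArray: Term_1 ... Term_x
    operators = parts[1::2]    # SetOperators: symbol_1 ... symbol_(x-1)
    return terms, operators
```

Because the pattern has a capturing group, `re.split` returns terms and operators interleaved, so even and odd positions give the two sequences directly.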
Step 602: parse the search result range entered by the user to obtain the starting position and the count.
In this step, parsing the user's search result range yields the starting position InitialPositon and the count Count.
Step 603: determine whether the length of TermArray is greater than 1; if so, go to step 604; otherwise, go to step 616.
In this step, if the length of TermArray is greater than 1, TermArray contains at least two terms and step 604 is executed. Otherwise, TermArray contains only one term, so the single-Term search method is sufficient and the multi-Term method is unnecessary; step 616 is executed.
Step 603 shows that, in specific embodiments of the invention, single-Term search is a special case of multi-Term search.
Step 604: read the first two terms in TermArray, and set a variable i with initial value 2 and a variable j with initial value 1.
In this step, the first two terms, Term_1 and Term_2, are read directly from TermArray.
Two variables are also initialized in this step: i, with initial value 2, and j, with initial value 1. Here i indicates that the term currently read is the i-th term, and j indicates the index of the current read range, i.e., the current read range is the j-th read range.
Step 605: look up each of the two terms in the inverted index, read from the index the records in the j-th read range of the document list corresponding to each term, and store them in the first retrieval result list (ResultList1) and the second retrieval result list (ResultList2) respectively.
Specifically, in this step the two terms are treated as two independent terms and each is looked up in the inverted index. If both terms are stored in the inverted index, the records in the j-th read range of each term's document list are read. The records for the first term (Term_1) are then stored in ResultList1, and the records for the second term (Term_2) are stored in ResultList2.
At this point j = 1, so the records read are those in the 1st read range of each document list. If the 1st read range was set to the ordered list in step 600, the records read here are those in the ordered list of each document list.
If either of the two terms is not stored in the inverted index, the retrieval result list for that term is set to the empty set. For example, if the first term (Term_1) is not stored in the inverted index, ResultList1 is set to the empty set; if the second term (Term_2) is not stored, ResultList2 is set to the empty set.
Step 606: read the (i-1)-th logical symbol, symbol_(i-1), from the query logic sequence SetOperators.
Step 607: check the value of the logical symbol read: if it is AND, go to step 608; if it is OR, go to step 609; if it is SUB, go to step 610.
Step 608: merge ResultList1 and ResultList2 using AND logic, and add the merged result to the search result list (ResultList); go to step 611.
In this step, ResultList1 and ResultList2 are merged with AND logic and the merged result is added to ResultList.
Each record in ResultList1 and ResultList2 has two attributes, the key attribute value (KFScore) and the document identifier (Did), and the purpose of the AND merge is to find the records with the same Did in both retrieval result lists. Because KFScore is computed per document, two records with the same KFScore may belong to the same document, whereas two records with different KFScores can never belong to the same document.
For this reason, in specific embodiments of the invention, merging ResultList1 and ResultList2 with AND logic and adding the merged result to ResultList may include:
comparing the records in ResultList1 and ResultList2 one by one, and adding each pair of records that have both the same KFScore and the same Did in the two lists to ResultList as one search result.
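Because a differing KFScore rules out a match, the AND merge can be done as a merge join over two lists sorted by KFScore in descending order, comparing Dids only inside blocks of equal KFScore. The sketch below is one way to realize the one-by-one comparison; it assumes records are `(KFScore, Did)` tuples and that both inputs are already sorted by KFScore descending (as after step 615, or as in an ordered list).

```python
def merge_and(list1: list[tuple[float, int]], list2: list[tuple[float, int]]) -> list[tuple[float, int]]:
    """AND-merge two lists of (KFScore, Did) records sorted by KFScore descending."""
    result, i, j = [], 0, 0
    while i < len(list1) and j < len(list2):
        s1, s2 = list1[i][0], list2[j][0]
        if s1 > s2:
            i += 1                    # different KFScore: cannot be the same document
        elif s1 < s2:
            j += 1
        else:
            # Collect the block of records sharing this KFScore in each list.
            i_end = i
            while i_end < len(list1) and list1[i_end][0] == s1:
                i_end += 1
            j_end = j
            while j_end < len(list2) and list2[j_end][0] == s1:
                j_end += 1
            dids2 = {did for _, did in list2[j:j_end]}
            # Same KFScore and same Did: one search result.
            result.extend(r for r in list1[i:i_end] if r[1] in dids2)
            i, j = i_end, j_end
    return result
```

Each list is scanned once, and the output stays in KFScore-descending order, so later steps can read results by position without re-sorting.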
Step 609: merge ResultList1 and ResultList2 using OR logic, and add the merged result to the search result list (ResultList); go to step 611.
In this step, ResultList1 and ResultList2 are merged with OR logic and the merged result is added to ResultList.
For example, in specific embodiments of the invention, merging ResultList1 and ResultList2 with OR logic and adding the merged result to ResultList may include:
adding every record in ResultList1 and ResultList2 to ResultList and, whenever two records have both the same KFScore and the same Did, deleting one of them from ResultList.
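A sketch of the OR merge is shown below, assuming `(KFScore, Did)` records and inputs sorted by KFScore descending. Keeping the union in KFScore order is an assumption made here so that results can later be read by position; the source only requires that duplicates be removed.

```python
import heapq


def merge_or(list1: list[tuple[float, int]], list2: list[tuple[float, int]]) -> list[tuple[float, int]]:
    """OR-merge two KFScore-descending lists of (KFScore, Did), dropping duplicates."""
    result, seen = [], set()
    # heapq.merge interleaves already-sorted inputs; reverse=True expects descending input.
    for record in heapq.merge(list1, list2, key=lambda r: r[0], reverse=True):
        if record not in seen:        # same KFScore and same Did: keep only one copy
            seen.add(record)
            result.append(record)
    return result
```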
Step 610: merge ResultList1 and ResultList2 using difference (SUB) logic, and add the merged result to the search result list (ResultList); go to step 611.
In this step, ResultList1 and ResultList2 are merged with SUB logic and the merged result is added to ResultList.
For example, in specific embodiments of the invention, merging ResultList1 and ResultList2 with SUB logic and adding the merged result to ResultList may include:
removing from ResultList1 every record that is also stored in ResultList2, and adding the records remaining in ResultList1 to ResultList.
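The SUB merge can be sketched with a hash set, a swapped-in technique rather than the one-by-one comparison of the source, assuming `(KFScore, Did)` records. Filtering ResultList1 in place keeps its existing KFScore order.

```python
def merge_sub(list1: list[tuple[float, int]], list2: list[tuple[float, int]]) -> list[tuple[float, int]]:
    """Return the records of list1 that are not stored in list2 (ResultList1 SUB ResultList2)."""
    to_remove = set(list2)                        # records match on both KFScore and Did
    return [record for record in list1 if record not in to_remove]
```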
Step 611: determine whether any term in TermArray has not yet been used; if so, go to step 612; otherwise, go to step 613.
Step 612: set i = i + 1 and read the i-th term in TermArray; empty ResultList1 and ResultList2, and copy the records in ResultList into ResultList2; look up the term just read in the inverted index, read the records in the j-th read range of its document list, and store them in ResultList1; return to step 606.
Specifically, in this step i = i + 1 is set first so that the next term (the i-th term) is read from TermArray. ResultList1 and ResultList2 are emptied, and the previous search results (the records in ResultList) are copied into ResultList2, so that in the following steps the previous results are combined, by the next logical operation, with the results for the newly read term.
The inverted index is then searched for the term just read. If the term is stored in the inverted index, the records in the j-th read range of its document list are read and stored in ResultList1, and the process returns to step 606.
If j = 1 at this point, the records read are those in the 1st read range of the document list; if the 1st read range was set to the ordered list in step 600, these are the records in the ordered list. Likewise, if j = 2, the records read are those in the 2nd read range; if the 2nd read range was set to the entire document list in step 600, these are the records of the entire document list. The remaining cases follow in the same way.
If the i-th term is not stored in the inverted index, ResultList1 is set to the empty set and the process returns to step 606.
Step 613: determine whether the termination condition is satisfied; if so, go to step 617; otherwise, go to step 614.
In specific embodiments of the invention, the termination condition may be:
the number of records stored in ResultList is greater than the maximum position of the search result range, (InitialPositon + Count) - 1; or j equals Y.
In the technical solution of the present invention, when the number of records stored in ResultList is greater than (InitialPositon + Count) - 1, enough search results have been retrieved, so the search can stop. When j equals Y, the last read range has been searched, so the search also ends. Either case satisfies the termination condition.
Step 614: empty ResultList1 and ResultList2; read the first two terms in TermArray and set i = 2 and j = j + 1; look up each of the two terms in the inverted index, read the records in the j-th read range of each term's document list, and store them in the first retrieval result list (ResultList1) and the second retrieval result list (ResultList2) respectively.
Because step 613 found that the termination condition is not satisfied, the number of records stored in ResultList is smaller than required. In other words, the previous stage did not retrieve enough search results, so the retrieval of the current (j-th) stage must be performed.
Therefore, in this step, the inverted index is searched again for each of the two terms read. If both terms are stored in the inverted index, the records in the j-th read range of each term's document list are read; the records for the first term (Term_1) are stored in ResultList1 and the records for the second term (Term_2) in ResultList2.
Step 615: sort the records in ResultList1 and in ResultList2 in descending order of KFScore; return to step 606.
Step 616: use the single-Term search method described above to search the inverted index for the term in TermArray, obtaining the search results and the total number of relevant results; go to step 618.
Step 617: read the required records from ResultList as the search results, and take the number of records stored in ResultList as the total number of relevant results (SumCount).
For example, in specific embodiments of the invention, reading the required records from ResultList as the search results may include:
when SumCount ≥ (InitialPositon + Count) - 1, reading Count records, starting from record number InitialPositon in ResultList, as the search results;
when InitialPositon ≤ SumCount < (InitialPositon + Count) - 1, ResultList does not contain enough records, so (SumCount - InitialPositon + 1) records are read, starting from record number InitialPositon in ResultList, as the search results;
when SumCount < InitialPositon, ResultList contains too few records and none of the requested results, so the empty set is taken as the search results.
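The three cases of step 617 can be sketched as one helper, here given the hypothetical name `page_results`; positions are 1-based as in the text, and ResultList is assumed to be a Python list already in KFScore order.

```python
def page_results(result_list: list, initial_position: int, count: int) -> tuple[list, int]:
    """Step 617: return (search results, SumCount) for a 1-based result window."""
    sum_count = len(result_list)
    last = initial_position + count - 1               # (InitialPositon + Count) - 1
    if sum_count >= last:                             # enough records: Count records
        page = result_list[initial_position - 1:last]
    elif sum_count >= initial_position:               # partial: SumCount - InitialPositon + 1 records
        page = result_list[initial_position - 1:]
    else:                                             # window starts past the end: empty set
        page = []
    return page, sum_count
```

Python slicing would in fact clamp the first two cases automatically; they are kept separate only to mirror the conditions listed above.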
Step 618: return the search results and SumCount to the user.
The multi-Term search method above uses a segmented search strategy. For example, when two read ranges are preset, with the ordered list of the document list as the 1st read range and the entire document list as the 2nd, the ordered list is searched first. If the number of retrieval results obtained from the ordered list is at least the required count, the first stage has already retrieved enough search results: the unordered list need not be searched, and the already sorted search results can be returned directly. Only when the ordered list yields fewer results than required, i.e., the first stage did not retrieve enough search results, is the second stage performed, searching all records of the document list.
Therefore, the multi-Term search method avoids, as far as possible, running every search over the full data set (for example, the entire document list), which effectively narrows the search space and reduces the complexity of the search.
For example, in the prior art, when the data to be searched reaches the hundred-million level (web page or news search, for example, generally involves terabyte-scale data), current open-source search engines (for example, Lucene) fully sort the result set of every search (for example, on the order of a million results) and then derive the results for the user from it. With the multi-Term search method above, using the ordered list of the document list as the 1st read range, the first-stage search already finds results that satisfy the user in most cases, and the second stage is not needed. This sharply reduces the amount of data to be searched and removes the need to sort results during retrieval, which greatly improves search response speed (generally by about an order of magnitude compared with Lucene) and significantly reduces memory and CPU usage.
Steps 601 to 618 above implement multi-Term search.
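The overall segmented loop of steps 604 to 618 can be condensed into the sketch below. It rests on several assumptions that the source does not state: the index maps each term to a list of its Y read ranges, records are `(KFScore, Did)` tuples, the merges are simplified with hash sets, the result of each merge replaces ResultList, and the running result is treated as the left operand of each operator (an interpretation of steps 610 and 612). With a single term it simply widens the read range, which stands in for, but is not identical to, the single-Term method of step 616.

```python
Record = tuple[float, int]  # (KFScore, Did)


def read_segment(index: dict[str, list[list[Record]]], term: str, j: int) -> list[Record]:
    """Records of the j-th read range for `term`, sorted by KFScore descending (step 615)."""
    ranges = index.get(term)
    if ranges is None:
        return []                                   # term not indexed: empty set (steps 605/612)
    return sorted(ranges[j - 1], key=lambda r: (-r[0], r[1]))


def merge(op: str, left: list[Record], right: list[Record]) -> list[Record]:
    """Steps 608-610, simplified with hash sets."""
    right_keys = set(right)
    if op == "AND":
        return [r for r in left if r in right_keys]
    if op == "SUB":
        return [r for r in left if r not in right_keys]
    # OR: union without duplicates, kept in KFScore-descending order
    return sorted(set(left) | right_keys, key=lambda r: (-r[0], r[1]))


def multi_term_search(index: dict[str, list[list[Record]]], terms: list[str],
                      operators: list[str], initial_position: int,
                      count: int, y: int) -> tuple[list[Record], int]:
    last = initial_position + count - 1             # (InitialPositon + Count) - 1
    j = 1
    while True:
        result = read_segment(index, terms[0], j)   # steps 604/605 (or 614)
        for i in range(1, len(terms)):              # steps 606-612: fold terms left to right
            result = merge(operators[i - 1], result, read_segment(index, terms[i], j))
        if len(result) > last or j == y:            # step 613: termination condition
            break
        j += 1                                      # step 614: move to the next read range
    sum_count = len(result)                         # step 617
    page = result[initial_position - 1:last] if sum_count >= initial_position else []
    return page, sum_count                          # step 618
```

Note that when the loop stops after the first read range, SumCount counts only what that range produced, which is how the early exit saves work.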
The multi-Term search method is further illustrated below with a concrete example from the music search domain.
Specific example two:
For simplicity, this example reuses the basic settings of specific example one and the inverted index shown in Fig. 5, with the length n of the ordered list set to 100. In addition, two read ranges are set in the document lists of the inverted index (Y = 2): the ordered list of a document list as the 1st read range and the entire document list as the 2nd read range.
If the user's search string is "Liu Dehua AND Tianyi" and the requested search result range is the 1st to 10th results, the multi-Term search method can proceed as follows:
Generate the term queue TermArray{Liu Dehua, Tianyi} and the query logic sequence SetOperators{AND} from the user's search string;
Parse the requested search result range to obtain the starting position InitialPositon = 1 and the count Count = 10;
Look up the generated terms "Liu Dehua" and "Tianyi" in the inverted index shown in Fig. 5 and find the matching entries;
From Fig. 5, the value for the term "Liu Dehua" is <10000, DocList>, so TotalCount = 10000 for its DocList. The ordered list of this DocList is therefore read from the inverted index, and all 100 of its records are stored in ResultList1;
From Fig. 5, the value for the term "Tianyi" is <60, DocList>, so TotalCount = 60 for its DocList. Since 60 < 100, all documents in this DocList are stored in its ordered list. The ordered list is therefore read from the inverted index, and all 60 of its records are stored in ResultList2;
Read the 1st logical symbol from SetOperators; since it is AND, merge ResultList1 and ResultList2 with AND logic and add the merged result to ResultList;
Because 20 of the 60 documents for "Tianyi" have the singer name "Liu Dehua" (assuming these 20 documents are all among the 100 records in the ordered list for "Liu Dehua"), ResultList stores 20 records, and the total number of relevant results is SumCount = 20.
All terms in TermArray have now been used, and (InitialPositon + Count) - 1 = 1 + 10 - 1 = 10 < SumCount = 20, i.e., the search result range lies within ResultList, so the termination condition is satisfied. Count = 10 records are then read directly, starting from record number InitialPositon = 1 in ResultList, as the search results;
Finally, the search results and the total number of relevant results SumCount = 20 are returned to the user.
Embodiment three: hierarchical search method.
In this embodiment, the hierarchical search method divides the whole search procedure into multiple layers of search according to preset rules. The search procedures of different layers are strictly distinguished, and different layers search different fields or use different search rules. In general, every result of a higher layer scores higher than every result of the layer below it, and within each layer results are ordered according to the key attributes of the information. Dividing the search into layers in this way preserves search relevance while ensuring that important information ranks near the top of the results.
Fig. 7 is a flowchart of the multilayer search method of the present invention. As shown in Fig. 7, the method may include the following steps:
Step 701: preset multiple layers of search procedures according to a predefined search order rule, and set the priority and the hierarchical search score range of each layer.
In this step, a search order rule can be predefined according to business characteristics. For example, in a specific embodiment for the music search domain, the following search order rule can be preset: exact search, pinyin search, and word-segmentation search. Under this rule, exact search is performed first; if it does not yield enough search results, pinyin search is performed; and if there are still not enough results, word-segmentation search is performed last, so as to obtain as many of the results the user needs as possible.
Accordingly, multiple layers of search procedures can be set according to this predefined rule. For example, in the music search domain, three layers can be set: the exact layer, the pinyin layer, and the word-segmentation layer, with priorities from high to low in that order.
Further, in specific embodiments of the invention, each layer can be divided into several sub-layers, each with its own priority. For example, the exact layer can be divided into three sub-layers in descending order of priority: the song sub-layer, the singer sub-layer, and the album sub-layer. When the exact layer is executed, the song sub-layer is searched first, then the singer sub-layer, and finally the album sub-layer. The pinyin layer and the word-segmentation layer can likewise be divided into these three sub-layers; details are not repeated here.
In addition, in specific embodiments of the invention, a hierarchical search score range can be preset for each layer to make it easier to sort the retrieval results. For example, with three layers whose priorities from high to low are the first, second, and third layer, the score range of the first layer can be set to [A1, A2], meaning that the hierarchical search score of every retrieval result obtained in the first layer lies between A1 and A2; the score range of the second layer can be set to [B1, B2] and that of the third layer to [C1, C2], where A1 > A2 > B1 > B2 > C1 > C2. Hence every retrieval result obtained in the first layer scores higher than any result obtained in the second or third layer. In this way, the relevance of the retrieval results is preserved while results concerning important information are ranked first.
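One way to keep every layer's scores inside its own range is to scale each result's KFScore into that range, as sketched below. The numeric ranges and the linear scaling are hypothetical choices for illustration; the source fixes only the ordering A1 > A2 > B1 > B2 > C1 > C2. Here each range is stored as `(low, high)`, so layer 1's tuple corresponds to `(A2, A1)`.

```python
# Hypothetical score ranges as (low, high); every layer-1 score exceeds every layer-2 score, etc.
LAYER_RANGES = {1: (200.0, 300.0), 2: (100.0, 199.0), 3: (0.0, 99.0)}


def layered_score(layer: int, kf_score: float, max_kf: float) -> float:
    """Map a result's KFScore linearly into the score range of its layer."""
    low, high = LAYER_RANGES[layer]
    fraction = kf_score / max_kf if max_kf > 0 else 0.0
    fraction = min(max(fraction, 0.0), 1.0)        # clamp so the score never leaves the range
    return low + (high - low) * fraction
```

Because the ranges do not overlap, sorting by this score orders results first by layer and then, within a layer, by KFScore.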
Step 702: when the user enters a search string, perform the search of each layer in descending order of priority.
In this step, when the user enters a search string, the layered search is executed: the highest-priority search is performed first, then the next-highest, and so on, until enough of the results the user needs have been found.
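The layer-by-layer execution of step 702 can be sketched as below. Each layer is represented by a hypothetical callable that returns Dids for the query; skipping a document already found by a higher layer is an assumption added here so that a document keeps its highest-priority position.

```python
from typing import Callable


def hierarchical_search(layers: list[Callable[[str], list[int]]],
                        query: str, needed: int) -> list[int]:
    """Run layers from highest to lowest priority until `needed` results are collected."""
    results: list[int] = []
    seen: set[int] = set()
    for search_layer in layers:                 # highest-priority layer first
        for did in search_layer(query):
            if did not in seen:                 # assumed: keep a document's first (best) layer
                seen.add(did)
                results.append(did)
        if len(results) >= needed:              # enough results: lower layers are skipped
            break
    return results
```

For example, with an exact layer returning `[1, 2]` and a pinyin layer returning `[2, 3, 4]`, a request for three results stops after the pinyin layer and never runs the word-segmentation layer.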
In technical scheme of the present invention, above-mentioned steps 702 can have multiple concrete implementation.To be example with a kind of specific implementation wherein below, technical scheme of the present invention will be introduced.
Fig. 8 is the process flow diagram of a kind of implementation method of step 702 among the present invention.As shown in Figure 8, above-mentioned steps 702 can comprise step as described below:
Step 801: parse the search string entered by the user according to each preset search layer, and generate a term queue and a query logic sequence for each search layer.
Because multiple search layers are preset, and each layer does not necessarily use the same search method, in this step the user's search string is parsed separately for each layer, generating a term queue and a query logic sequence corresponding to each search layer.
Further, in specific embodiments of the invention, if a generated term queue contains only one term, the query logic sequence corresponding to that term queue is the empty set.
For example, suppose the preset search layers are an exact layer, a pinyin layer and a word-segmentation layer, and the user enters the search string "刘德华天意" ("Liu Dehua Tianyi"). Parsing this string yields: for the exact layer, a term queue holding the single term "刘德华天意", with an empty query logic sequence; for the pinyin layer, a term queue holding the single term "liudehuatianyi", with an empty query logic sequence; and for the word-segmentation layer, a term queue holding two terms, "刘德华" (Liu Dehua) and "天意" (Tianyi), whose query logic sequence contains one logical operator: OR.
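The per-layer parsing of step 801 can be sketched as follows. This is a minimal sketch under stated assumptions: the pinyin table, the two-word lexicon and the forward-maximum-matching segmenter are hypothetical stand-ins for a real pinyin converter and word segmenter, and `LayerQuery`, `parse_query` and `seg_operator` are illustrative names that do not come from the text.

```python
from dataclasses import dataclass, field

@dataclass
class LayerQuery:
    layer: str
    terms: list                                    # term queue (TermArray) for this layer
    operators: list = field(default_factory=list)  # query logic sequence; empty for one term

# Hypothetical toy resources standing in for a real pinyin converter and lexicon.
PINYIN = {"刘": "liu", "德": "de", "华": "hua", "天": "tian", "意": "yi"}
LEXICON = {"刘德华", "天意"}
MAX_WORD = 3

def to_pinyin(text):
    return "".join(PINYIN.get(ch, ch) for ch in text)

def segment(text):
    """Forward maximum matching: take the longest lexicon word at each position,
    falling back to a single character when nothing matches."""
    terms, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in LEXICON:
                terms.append(word)
                i += size
                break
    return terms

def parse_query(text, seg_operator="AND"):
    """Build one (term queue, query logic sequence) pair per search layer."""
    seg_terms = segment(text)
    return [
        LayerQuery("exact", [text]),
        LayerQuery("pinyin", [to_pinyin(text)]),
        # n terms are joined by n - 1 operators; a single term gets an empty sequence
        LayerQuery("segmentation", seg_terms, [seg_operator] * (len(seg_terms) - 1)),
    ]
```

A single-term queue gets an empty operator list, as the rule above requires; the operator joining segmented terms is a parameter, so `parse_query("刘德华天意", seg_operator="OR")` reproduces this example.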
Step 802: parse the search result range entered by the user to obtain the start position and the record count.
In this step, the search result range entered by the user is parsed to obtain the start position InitialPositon and the record count Count.
Step 803: according to the preset priorities, take the highest-priority search layer that has not yet been executed as the current search layer.
This step determines which layer is the current search layer: according to the preset priorities, it is the highest-priority layer not yet executed. For example, on the first pass the current search layer is the highest-priority layer; on the second pass it is the layer with the next-highest priority; and so on.
Step 804: retrieve the inverted index according to the term queue and query logic sequence of the current search layer, store the retrieval results in the layer result set, and obtain the current number of layer results.
Because step 801 has already generated a term queue and query logic sequence for every search layer, in this step the inverted index can be retrieved directly with those of the current search layer. The retrieval results are stored in the layer result set (LayerResultList), and at the same time the number of results in that set, i.e., the current layer result count (LayerResultCount), is obtained.
In addition, the current search layer in this step can retrieve the inverted index with the multi-term search method described above; the retrieval process is not repeated here.
Step 805: calculate the layer score of each retrieval result in the layer result set according to its KFScore and the layer score range of the current search layer.
In specific embodiments of the invention, various methods can be used to calculate the layer score of each retrieval result in the layer result set.
For example, each retrieval result can be assigned a layer score within the current layer's score range according to its position when the layer result set is sorted by KFScore in descending order. Other calculation methods are not described here.
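One way to realize this rank-based assignment is to spread scores evenly across the layer's range in descending KFScore order. The sketch assumes ranges are given as `(upper, lower)` pairs, such as the hypothetical (300, 201), (200, 101), (100, 1), and it also checks the A1 > A2 > B1 > B2 > C1 > C2 ordering that keeps every higher-layer result above every lower-layer one. Function names are illustrative.

```python
def ranges_descending(ranges):
    """True if [(A1, A2), (B1, B2), ...] satisfies A1 > A2 > B1 > B2 > ... (disjoint, descending)."""
    bounds = [b for pair in ranges for b in pair]
    return all(x > y for x, y in zip(bounds, bounds[1:]))

def assign_layer_scores(results, score_range):
    """results: (doc_id, kf_score) pairs from one layer; score_range: (upper, lower).
    The best KFScore gets `upper`, the worst gets `lower`, the rest are evenly spaced."""
    upper, lower = score_range
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    step = (upper - lower) / (len(ranked) - 1) if len(ranked) > 1 else 0.0
    return [(doc_id, upper - rank * step) for rank, (doc_id, _) in enumerate(ranked)]
```

Because only the rank is used, the layer score never leaves the layer's range, whatever the raw KFScore values are.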
Step 806: insert the retrieval results of the layer result set into the total retrieval result set in order of layer score, and delete duplicate results; calculate the current total number of retrieval results, and clear the layer result set.
In this step, all retrieval results in the layer result set obtained in step 804 are inserted into the total retrieval result set (ResultList) according to their layer scores, and duplicate results are deleted. The total retrieval result set is initially empty, i.e., it stores no retrieval results. Therefore, if the current search layer is the first layer, adding its results to the total set cannot produce duplicates; the number of duplicate results is 0.
In addition, in specific embodiments of the invention, the current total number of retrieval results (SearchResultCount), i.e., the number of results in the current total retrieval result set, must also be calculated. For example, the current total retrieval result set can be counted directly; alternatively, the previous total can be increased by the current layer result count and reduced by the number of duplicate results, giving the current total.
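The two ways of computing SearchResultCount should agree. A minimal sketch, assuming the total set is a Python dict (which preserves insertion order) from a hypothetical document ID to its layer score: because the layer score ranges are disjoint and descending, appending each layer's results in descending score order keeps the total set sorted, and keeping the first copy of a duplicate retains its higher-layer score. `merge_layer` is an illustrative name.

```python
def merge_layer(total, layer_results):
    """total: dict doc_id -> layer score (ResultList), already in descending score order.
    layer_results: dict doc_id -> layer score for the current layer (LayerResultList).
    Returns (count by direct tally, count by previous total + layer count - duplicates)."""
    previous = len(total)
    duplicates = sum(1 for doc_id in layer_results if doc_id in total)
    for doc_id, score in sorted(layer_results.items(), key=lambda kv: kv[1], reverse=True):
        total.setdefault(doc_id, score)   # an existing entry from a higher layer wins
    incremental = previous + len(layer_results) - duplicates
    layer_results.clear()                 # step 806 also empties the layer result set
    return len(total), incremental
```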
Step 807: determine whether the termination condition is met; if so, execute step 808; otherwise, return to step 803.
In specific embodiments of the invention, the termination condition may be:
the current total number of retrieval results is greater than the maximum of the search result range, or the current search layer is the last layer.
In the technical solution of the present invention, when the current total number of retrieval results exceeds the maximum of the search result range, (InitialPositon+Count)-1, enough search results have been retrieved, so the condition for ending the layered search is met. When the current search layer is the last layer, the search also ends, so the termination condition is likewise met.
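Steps 803 to 807 can be sketched as one loop. This is a hedged illustration, not the patented implementation: the index is assumed to map each term to `(doc_id, kf_score)` pairs, AND, OR and SUB are assumed to mean intersection, union and difference, and layer scores are spread evenly over each layer's range by KFScore rank. The stop test uses "greater than the maximum of the range" as the text states; the last-layer condition is simply the end of the loop. The toy index and score ranges are hypothetical.

```python
def retrieve(index, terms, operators):
    """Step 804: fold posting lists left to right with the layer's query logic sequence."""
    result = dict(index.get(terms[0], []))            # doc_id -> KFScore
    for op, term in zip(operators, terms[1:]):
        other = dict(index.get(term, []))
        if op == "AND":
            result = {d: s for d, s in result.items() if d in other}
        elif op == "OR":
            result = {**result, **other}
        elif op == "SUB":
            result = {d: s for d, s in result.items() if d not in other}
        else:
            raise ValueError(f"unknown operator {op!r}")
    return result

def layered_search(index, layers, initial, count):
    """layers: (terms, operators, (upper, lower)) tuples, highest priority first.
    Returns (doc_id, layer score) pairs in descending score order and their number."""
    max_pos = initial + count - 1                      # maximum of the search result range
    total = {}                                         # ResultList: doc_id -> layer score
    for terms, operators, (upper, lower) in layers:    # step 803: next layer by priority
        hits = retrieve(index, terms, operators)       # step 804: LayerResultList
        ranked = sorted(hits, key=hits.get, reverse=True)
        step = (upper - lower) / (len(ranked) - 1) if len(ranked) > 1 else 0.0
        for rank, doc_id in enumerate(ranked):         # steps 805-806
            total.setdefault(doc_id, upper - rank * step)  # duplicates keep the higher-layer score
        if len(total) > max_pos:                       # step 807: enough results collected
            break
    ordered = sorted(total.items(), key=lambda kv: kv[1], reverse=True)
    return ordered, len(ordered)

# Toy run shaped like the music example: only the segmentation layer finds anything.
index = {"刘德华": [("s1", 0.9), ("s2", 0.7), ("s3", 0.4)], "天意": [("s1", 0.9), ("s3", 0.4)]}
layers = [(["刘德华天意"], [], (300, 201)),
          (["liudehuatianyi"], [], (200, 101)),
          (["刘德华", "天意"], ["AND"], (100, 1))]
results, total_count = layered_search(index, layers, initial=1, count=50)
# results == [("s1", 100.0), ("s3", 1.0)], total_count == 2
```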
Step 808: read the required number of retrieval results from the total retrieval result set, according to the search result range, as the search results; take the current total number of retrieval results as the total number of relevant results (SumCount).
For example, in specific embodiments of the invention, reading the required number of retrieval results from the total retrieval result set according to the search result range comprises:
When SumCount ≥ (InitialPositon+Count)-1, read Count records starting from record InitialPositon of the total retrieval result set as the search results;
When InitialPositon ≤ SumCount < (InitialPositon+Count)-1, the total retrieval result set does not hold enough records; in this case, read (SumCount-InitialPositon+1) records starting from record InitialPositon of the total retrieval result set as the search results;
When SumCount < InitialPositon, the total retrieval result set holds too few records and contains none of the requested search results; in this case, the empty set can be used as the search results.
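The three cases above (which claim 16 applies unchanged to ResultList) amount to reading a 1-based window. A minimal sketch; `read_window` is an illustrative name, and positions are assumed to start at 1. Python slicing would clamp these bounds on its own, but the explicit branches mirror the three cases.

```python
def read_window(results, initial, count):
    """Return records InitialPositon .. InitialPositon+Count-1 (1-based) of `results`."""
    sum_count = len(results)                      # SumCount
    if sum_count >= initial + count - 1:          # enough records: take Count of them
        return results[initial - 1:initial - 1 + count]
    if initial <= sum_count:                      # partial: SumCount - InitialPositon + 1 records
        return results[initial - 1:sum_count]
    return []                                     # start lies beyond the data: empty set
```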
Step 809: return the search results and the total number of search results to the user.
Steps 701 to 702 above implement the multilayer search.
The multilayer search method is described further below with a concrete example from the music search field.
Example 3:
For simplicity, this example reuses the basic settings of Example 1 and builds the inverted index shown in Fig. 5, with the ordered-list length n of the inverted index set to 100.
Suppose the user enters the search string "刘德华天意" and requests search results 1 to 50. The multilayer search then proceeds as follows:
In this example, three search layers are first preset according to the characteristics of the music search business: an exact layer, a pinyin layer and a word-segmentation layer, with priorities from high to low in that order. Each layer is further divided into three sublayers, with priorities from high to low: a song sublayer, a singer sublayer and an album sublayer.
Based on these settings, the layer score ranges are preset: [A1, A2] for the exact layer, [B1, B2] for the pinyin layer and [C1, C2] for the word-segmentation layer, where A1 > A2 > B1 > B2 > C1 > C2.
After these settings are made, the search string "刘德华天意" entered by the user is parsed for each preset search layer, generating the following term queues and query logic sequences:
The term queue for the exact layer contains the single term "刘德华天意"; its query logic sequence is the empty set;
The term queue for the pinyin layer contains the single term "liudehuatianyi"; its query logic sequence is the empty set;
The term queue for the word-segmentation layer contains two terms, "刘德华" and "天意"; its query logic sequence contains one logical operator: AND.
Parsing the search result range entered by the user gives the start position InitialPositon = 1 and the record count Count = 50.
Because the song sublayer of the exact layer has the highest priority, the song sublayer search becomes the current search layer.
In the song sublayer, the inverted index shown in Fig. 5 is retrieved with the term "刘德华天意" using the single-term search method. Because that term is not found in the inverted index, an empty search result is returned; the total retrieval result set is therefore empty, and the current total number of retrieval results is 0.
Because the current total number of retrieval results is less than the maximum of the search result range, and the current search layer is not the last layer, the termination condition is not met and retrieval continues. The highest-priority layer not yet executed is now the singer sublayer, so it becomes the current search layer for the next retrieval.
Likewise, neither the singer sublayer nor the album sublayer finds the term "刘德华天意" in the inverted index, so both return empty search results; the termination condition is still not met, and the search proceeds to the pinyin layer.
The term for the pinyin layer is "liudehuatianyi", and none of the pinyin layer's sublayers finds it in the inverted index either, so each returns an empty search result; the termination condition is still not met, and the search proceeds to the word-segmentation layer.
In the word-segmentation layer, the song sublayer is executed first, then the singer sublayer, and finally the album sublayer.
The word-segmentation layer has two terms, "刘德华" and "天意", and its query logic sequence contains the logical operator AND. Therefore, in each sublayer of the word-segmentation layer, the corresponding terms can be found in the inverted index and corresponding search results are obtained.
If the word-segmentation layer yields more than 50 search results, the search ends: the required number of retrieval results is read from the total retrieval result set according to the search result range, the total number of search results is calculated, and the search results and their total are returned to the user.
If fewer than 50 search results have been obtained when the word-segmentation layer finishes, the termination condition is met anyway, because the word-segmentation layer is the last layer. The total number of search results is then calculated, and the search results and their total are returned to the user.
The technical solution of the present invention also provides a search device. Fig. 9 is a schematic structural diagram of the search device in the present invention. As shown in Fig. 9, the search device of the present invention comprises: an inverted list generation module 901, a storage module 902, a term generation module 903 and a retrieval module 904.
The inverted list generation module 901 is configured to index all documents to be retrieved with the term (Term) as the key, building an inverted index whose key is the Term and whose value is the total number TotalCount of documents containing that Term together with the document list DocList of documents containing it. Each record in the document list contains the document number and the key attribute score of one document. The document list consists of an ordered list and an unordered list; the ordered list contains the n records with the largest key attribute scores, arranged in descending order of key attribute score, where n is a predetermined value. The module sends the inverted index to the storage module 902;
The storage module 902 is configured to store the inverted index;
The term generation module 903 is configured to generate the corresponding terms from the search string entered by the user, and to send the terms and the search result range entered by the user to the retrieval module 904;
The retrieval module 904 is configured to retrieve the inverted index stored in the storage module 902 according to the generated terms, and to obtain records preferentially from the ordered lists corresponding to the generated terms according to the search result range entered by the user, so as to obtain the required search results and the total number of relevant results; it sends the search results and the total number of relevant results to the user.
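The index layout produced by module 901 can be sketched as below. A minimal sketch, assuming each document arrives as a hypothetical `(doc_id, kf_score, terms)` tuple: `heapq.nlargest` picks the n best records of each term for the ordered list, and the remainder is stored unsorted, so the cost of sorting it is only paid when a query reaches past position n. `Posting` and `build_index` are illustrative names.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Posting:
    total_count: int                               # TotalCount
    ordered: list                                  # top-n (doc_id, kf_score), KFScore descending
    unordered: list = field(default_factory=list)  # remaining records, left unsorted

def build_index(docs, n):
    """docs: iterable of (doc_id, kf_score, terms). Returns term -> Posting (the DocList)."""
    lists = defaultdict(list)
    for doc_id, kf_score, terms in docs:
        for term in set(terms):                    # one record per document per term
            lists[term].append((doc_id, kf_score))
    index = {}
    for term, records in lists.items():
        top = heapq.nlargest(n, records, key=lambda r: r[1])   # already sorted descending
        top_ids = {doc_id for doc_id, _ in top}
        rest = [r for r in records if r[0] not in top_ids]
        index[term] = Posting(len(records), top, rest)
    return index
```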
In summary, in the technical solution of the present invention, key attributes and key attribute weights are set for documents of the same class, so the key attribute score of each document can be calculated. The inverted index is then built from these scores, so that each document list in the index consists of an ordered list and an unordered list, with the ordered list holding the n records with the largest key attribute scores in descending order. Because the key attribute scores are computed when the inverted index is built, and the stored records have already been partially sorted, the search result range requested by the user falls within the ordered list in most cases when the inverted index is retrieved for the user's search string. Most search results can therefore be taken directly from the ordered list, already arranged by KFScore, so they need not be sorted again. This greatly reduces full-data scoring and full sorting operations, significantly reduces CPU operations and significantly improves search speed, by an order of magnitude over the prior art, greatly improving the search response time of the search engine.
In addition, the inverted index of the present invention stores the key attribute score of each document rather than the factors needed to compute a score, so fewer fields are stored, memory usage is greatly reduced, and considerable hardware and software resources are saved.
In addition, in the inverted index of the present invention, the records of the ordered list are sorted according to the key attributes and key attribute score of each document, both of which can be set in advance according to the actual application. Text relevance and the relative value of information can therefore both be taken into account, bringing the ranking of search results closer to user needs. The key attributes and key attribute scores can also be set flexibly, so the ranking of search results can be customized while text relevance is preserved.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (25)

1. A search method, characterized in that the method comprises:
A. setting at least one key attribute and a corresponding key attribute weight for documents of the same class, and calculating the key attribute score KFScore of each document from the key attributes and key attribute weights;
B. indexing all documents to be retrieved with the term Term as the key, and building an inverted index whose key is the Term and whose value is the total number TotalCount of documents containing that Term and the document list DocList of documents containing that Term; each record in the document list contains the document number and the key attribute score of one document; the document list consists of an ordered list and an unordered list, the ordered list containing the n records with the largest key attribute scores arranged in descending order of key attribute score, where n is a predetermined value;
C. generating the corresponding terms from the search string entered by the user, retrieving the inverted index according to the generated terms, and obtaining records preferentially from the ordered lists corresponding to the generated terms according to the search result range entered by the user, so as to obtain the required search results and the total number of relevant results.
2. The method according to claim 1, characterized in that the KFScore of a document is calculated as:
$$\mathrm{KFScore} = \mathrm{KeyField}_1 \cdot W_1 + \mathrm{KeyField}_2 \cdot W_2 + \cdots + \mathrm{KeyField}_X \cdot W_X$$
where $W_1 + W_2 + \cdots + W_X = 1$; $\mathrm{KeyField}_x$ is the value of the x-th KeyField of the document, and $W_x$ is the value of the Weight corresponding to the x-th KeyField of the document.
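The formula of claim 2 is a weighted sum whose weights add up to 1. A minimal sketch; the two key attributes in the test (for instance a normalized play count and a normalized download count of a song) are hypothetical, and `kf_score` is an illustrative name.

```python
import math

def kf_score(key_fields, weights):
    """KFScore = sum of KeyField_x * W_x, with one weight per key attribute summing to 1."""
    if len(key_fields) != len(weights):
        raise ValueError("one weight per key attribute")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(value * weight for value, weight in zip(key_fields, weights))
```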
3. The method according to claim 1, characterized in that:
the search result range comprises a start position InitialPositon and a record count Count;
the maximum of the search result range is (InitialPositon+Count)-1.
4. The method according to claim 3, characterized in that obtaining records preferentially from the ordered list corresponding to the generated term according to the search result range entered by the user, so as to obtain the required search results, comprises:
when the maximum of the search result range is less than or equal to the length n of the ordered list corresponding to the generated term, reading Count records starting from record InitialPositon of the ordered list as the search results;
when the start position of the search result range lies in the ordered list corresponding to the generated term and the maximum of the search result range is greater than the length n of the ordered list, reading (n-InitialPositon+1) records starting from record InitialPositon of the ordered list; arranging the records of the unordered list corresponding to the generated term in descending order of KFScore, and reading [Count-(n-InitialPositon+1)] records starting from the first record of the unordered list; and taking the Count records read as the search results;
when the start position of the search result range lies in the unordered list corresponding to the generated term, arranging the records of the unordered list in descending order of KFScore, and then reading Count records starting from record (InitialPositon-n) of the sorted unordered list as the search results.
5. The method according to claim 3, characterized in that step C further comprises:
when the inverted index does not store the term generated from the search string entered by the user, using the empty set as the search results.
6. The method according to claim 1, characterized in that step C comprises:
c1. generating one corresponding term from the search string entered by the user;
c2. retrieving the inverted index according to the generated term and determining whether the inverted index stores the term; if so, executing step c3; otherwise, executing step c15;
c3. reading the document list corresponding to the term from the inverted index;
c4. parsing the search result range entered by the user to obtain the start position InitialPositon and the record count Count;
c5. determining whether (InitialPositon+Count)-1 > n holds; if so, executing step c6; otherwise, executing step c12; where n is the length of the ordered list;
c6. determining whether InitialPositon > n holds; if so, executing step c10; otherwise, executing step c7;
c7. reading (n-InitialPositon+1) records starting from record InitialPositon of the ordered list and adding them to the search result list ResultList;
c8. arranging the records of the unordered list in descending order of KFScore;
c9. reading [Count-(n-InitialPositon+1)] records from the unordered list, adding them to ResultList, and executing step c13;
c10. arranging the records of the unordered list in descending order of KFScore;
c11. reading Count records starting from record (InitialPositon-n) of the sorted unordered list, adding them to ResultList, and executing step c13;
c12. reading Count records starting from record InitialPositon of the ordered list, adding them to ResultList, and executing step c13;
c13. using ResultList as the search results, and using the total number of documents TotalCount of the document list corresponding to the term, as read from the inverted index, as the total number of relevant results;
c14. returning the search results and the total number of relevant results to the user, and ending the process;
c15. using the empty set as the search results, setting the total number of relevant results to 0, returning the empty set and the total to the user, and ending the process.
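The reading rules of claim 4 and the step sequence of claim 6 can be sketched together. Assumptions: the inverted index is a dict from a term to a hypothetical `(total_count, ordered, unordered)` tuple of `(doc_id, kf_score)` records, positions are 1-based, and "in order of KFScore" is taken as descending, matching the ordered list. The unordered list is sorted only when the requested window reaches past the ordered list; `single_term_search` is an illustrative name.

```python
def single_term_search(index, term, initial, count):
    """Returns (search results, total number of relevant results)."""
    if term not in index:                                         # c2 -> c15 (claim 5)
        return [], 0
    total_count, ordered, unordered = index[term]                 # c3
    n = len(ordered)
    if initial + count - 1 > n:                                   # c5: window passes the ordered list
        tail = sorted(unordered, key=lambda r: r[1], reverse=True)  # c8 / c10
        if initial > n:                                           # c6 -> c10, c11
            start = initial - n - 1                               # record (InitialPositon-n), 0-based
            results = tail[start:start + count]
        else:                                                     # c7, c9
            results = ordered[initial - 1:] + tail[:count - (n - initial + 1)]
    else:                                                         # c12: window inside ordered list
        results = ordered[initial - 1:initial - 1 + count]
    return results, total_count                                   # c13, c14
```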
7. The method according to claim 1, characterized in that step C comprises:
C1. setting Y read ranges in the document lists of the inverted index, where Y ≥ 2;
C2. generating a term queue TermArray and a query logic sequence SetOperators from the search string entered by the user;
C3. parsing the search result range entered by the user to obtain the start position InitialPositon and the record count Count;
C4. when the length of TermArray is greater than 1, reading the first two terms of TermArray, and setting a variable i with initial value 2 and a variable j with initial value 1;
C5. retrieving the inverted index according to each of the two terms read, reading from the inverted index the records within the j-th read range of the document list corresponding to each of the two terms, and storing the records read in a first retrieval result list ResultList1 and a second retrieval result list ResultList2, respectively;
C6. reading the (i-1)-th logical operator in SetOperators;
C7. determining the value of the logical operator read: when it is AND, executing step C8; when it is OR, executing step C9; when it is SUB, executing step C10;
C8. merging ResultList1 and ResultList2 with AND logic, adding the merge result to the search result list ResultList, and executing step C11;
C9. merging ResultList1 and ResultList2 with OR logic, adding the merge result to ResultList, and executing step C11;
C10. merging ResultList1 and ResultList2 with difference logic, adding the merge result to ResultList, and executing step C11;
C11. determining whether TermArray still contains an unused term; if so, executing step C12; otherwise, executing step C13;
C12. setting i = i+1 and reading the i-th term of TermArray; clearing ResultList1 and ResultList2, and copying the records of ResultList into ResultList2; retrieving the inverted index according to the term read, reading the records within the j-th read range of the document list corresponding to that term, and storing them in ResultList1; returning to step C6;
C13. determining whether the termination condition is met; if so, executing step C16; otherwise, executing step C14;
C14. clearing ResultList1 and ResultList2; reading the first two terms of TermArray, and setting i = 2 and j = j+1; retrieving the inverted index according to each of the two terms read, reading the records within the j-th read range of the document list corresponding to each, and storing them in ResultList1 and ResultList2, respectively;
C15. arranging the records of ResultList1 and of ResultList2 each in descending order of KFScore; returning to step C6;
C16. using the number of records stored in the total retrieval result set as the total number of relevant results;
C17. returning the search results and the total number of relevant results to the user.
8. The method according to claim 7, characterized in that:
when Y = 2, the ordered list of the document list serves as the 1st read range, and the entire document list serves as the 2nd read range.
9. The method according to claim 7, characterized in that:
when Y = 3, the ordered list of the document list serves as the 1st read range, the unordered list of the document list serves as the 2nd read range, and the entire document list serves as the 3rd read range.
10. The method according to claim 7, characterized in that step C5 further comprises:
if the inverted index does not store either of the two terms, setting the retrieval result list corresponding to the term that is not stored to the empty set.
11. The method according to claim 7, characterized in that merging ResultList1 and ResultList2 with AND logic and adding the merge result to ResultList comprises:
comparing the records of ResultList1 and ResultList2 one by one, and adding each pair of records that have the same KFScore and Did in the two retrieval result lists to ResultList as a single search result.
12. The method according to claim 7, characterized in that merging ResultList1 and ResultList2 with OR logic and adding the merge result to ResultList comprises:
adding every record of ResultList1 and ResultList2 to ResultList, and, if two records have the same KFScore and the same Did, deleting one of them from ResultList.
13. The method according to claim 7, characterized in that merging ResultList1 and ResultList2 with difference logic and adding the merge result to ResultList comprises:
removing from ResultList1 every record stored in ResultList2, and adding the records remaining in ResultList1 after the removal to ResultList.
14. The method according to claim 7, characterized in that step C12 further comprises:
if the inverted index does not store the i-th term read, setting ResultList1 to the empty set and then returning to step C6.
15. The method according to claim 7, characterized in that the termination condition is:
the number of records stored in ResultList is greater than the maximum of the search result range; or j equals Y.
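Claims 7 to 15 can be sketched as one function. This is a hedged sketch under assumptions, not the claimed procedure verbatim: the index maps a term to a hypothetical `(ordered, unordered)` pair of `(Did, KFScore)` records, operators are folded left to right in memory instead of shuttling records between ResultList1 and ResultList2 by hand, records are compared as `(Did, KFScore)` pairs as claims 11 and 12 describe, and results from successive read ranges are assumed to accumulate in ResultList without duplicates. The loop index `j` is 0-based here, whereas the claims count from 1.

```python
def read_range(ordered, unordered, y, j):
    """Claims 8-9: the j-th read range (0-based j) of a document list."""
    ranges = {2: [ordered, ordered + unordered],
              3: [ordered, unordered, ordered + unordered]}
    if y not in ranges:
        raise ValueError("only Y=2 and Y=3 are sketched here")
    return ranges[y][j]

def combine(left, right, op):
    """Claims 11-13: records match when both Did and KFScore are equal."""
    left_keys, right_keys = set(left), set(right)
    if op == "AND":
        return [r for r in left if r in right_keys]
    if op == "OR":
        return left + [r for r in right if r not in left_keys]
    if op == "SUB":
        return [r for r in left if r not in right_keys]
    raise ValueError(f"unknown operator {op!r}")

def segmented_search(index, terms, operators, initial, count, y=2):
    """Returns (ResultList, SumCount)."""
    max_pos = initial + count - 1
    result, seen = [], set()

    def segment(term, j):
        ordered, unordered = index.get(term, ([], []))          # claims 10, 14: missing -> empty
        part = read_range(ordered, unordered, y, j)
        return sorted(part, key=lambda r: r[1], reverse=True)   # C15: descending KFScore

    for j in range(y):                                          # C4 / C14: next read range
        running = segment(terms[0], j)
        for op, term in zip(operators, terms[1:]):              # C6-C12: fold left to right
            running = combine(running, segment(term, j), op)
        for rec in running:                                      # add to ResultList, skip repeats
            if rec not in seen:
                seen.add(rec)
                result.append(rec)
        if len(result) > max_pos:                                # C13 / claim 15
            break
    return result, len(result)                                   # C16: SumCount
```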
16. The method according to claim 7, wherein reading the required records from ResultList as the search results comprises:
When SumCount ≥ (InitialPositon+Count)-1, reading Count records from ResultList, starting at record number InitialPositon, as the search results;
When InitialPositon ≤ SumCount < (InitialPositon+Count)-1, reading (SumCount-InitialPositon+1) records from ResultList, starting at record number InitialPositon, as the search results;
When SumCount < InitialPositon, returning the empty set as the search results.
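The three cases of claim 16 amount to 1-based pagination over ResultList. The sketch below makes two assumptions: InitialPositon ≥ 1, and ResultList already holds every record the requested page needs. `read_page` is a hypothetical name, and its parameters map to InitialPositon, Count and SumCount as noted in the docstring.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def read_page(result_list: Sequence[T], sum_count: int,
              initial_position: int, count: int) -> List[T]:
    """Select search results per the three cases of claim 16.

    initial_position is 1-based (InitialPositon), count is Count and
    sum_count is SumCount.
    """
    start = initial_position - 1  # convert to a 0-based index
    if sum_count >= initial_position + count - 1:
        # Case 1: a full page exists, read Count records.
        return list(result_list[start:start + count])
    if initial_position <= sum_count:
        # Case 2: partial page, read SumCount - InitialPositon + 1 records.
        return list(result_list[start:sum_count])
    # Case 3: the start position lies beyond the last result.
    return []
```

Claim 24 applies the same three cases to the total retrieval result set, so the same function would serve there unchanged.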
17. The method according to claim 1, wherein step C comprises:
Presetting multiple layers of search procedures according to a predefined search-ordering rule, and setting the priority and the hierarchical search score range of each layer's search procedure;
When the user inputs a search string, executing the layer searches in order of priority, from highest to lowest.
18. The method according to claim 17, wherein presetting the multiple layers of search procedures according to the predefined search-ordering rule and setting the priority of each layer's search procedure comprises:
In the music search domain, setting three layers of search procedures: an exact layer search procedure, a spelling layer search procedure and a word-segmentation layer search procedure, with priorities from highest to lowest in that order: exact layer, spelling layer, word-segmentation layer.
19. The method according to claim 18, further comprising:
Dividing each layer's search procedure into a plurality of sublayer search procedures, and setting the priority of each sublayer search procedure.
20. The method according to claim 17, wherein setting the hierarchical search score range of each layer's search procedure comprises:
When three layers of search procedures are set, with priorities from highest to lowest in the order first-layer, second-layer and third-layer search procedure:
The hierarchical search score range of the first-layer search procedure is set to [A1, A2]; that of the second-layer search procedure to [B1, B2]; and that of the third-layer search procedure to [C1, C2]; where A1 > A2 > B1 > B2 > C1 > C2.
21. The method according to claim 17, wherein executing the layer searches in order of priority when the user inputs a search string comprises:
Z1. Parsing the search string input by the user according to each preset layer search procedure, and generating the term queue and query logic sequence corresponding to each layer search procedure;
Z2. Parsing the search result range input by the user to obtain a starting position and a number of records;
Z3. According to the preset priorities, selecting the highest-priority search procedure not yet executed as the current search procedure;
Z4. Retrieving from the inverted index according to the term queue and query logic sequence of the current search procedure, storing the retrieval results in a layered retrieval result set, and obtaining the current total number of layered retrieval results;
Z5. Calculating the hierarchical search score of each retrieval result in the layered retrieval result set from its KFScore and the hierarchical search score range of the current search procedure;
Z6. Adding the retrieval results in the layered retrieval result set to a total retrieval result set in order of hierarchical search score, and deleting duplicate retrieval results; calculating the current total number of retrieval results, and emptying the layered retrieval result set;
Z7. Determining whether the termination condition is satisfied; if so, executing step Z8; otherwise, returning to step Z3;
Z8. Reading the required number of retrieval results from the total retrieval result set as the search results according to the search result range, and taking the current total number of retrieval results as the total number of related results, SumCount;
Z9. Returning the search results and the total number of search results to the user.
22. The method according to claim 21, wherein calculating the hierarchical search score of each retrieval result in the layered retrieval result set from its KFScore and the hierarchical search score range of the current search procedure comprises:
Assigning each retrieval result a corresponding hierarchical search score according to the hierarchical search score range of the current search procedure and the descending KFScore order of the retrieval results in the layered retrieval result set.
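The layered search of claims 17 to 22 can be sketched as a short loop over steps Z3 to Z8. This Python version rests on assumptions that the claims do not state:

- Each layer's term queue and query logic (Z1, Z4) are abstracted into a hypothetical `search_fn` that returns (Did, KFScore) hits.
- Claim 22's score assignment is modelled as an even linear spread over the layer's range, ordered by KFScore rank.
- When a document recurs in a lower layer, the copy from the higher-priority layer is kept.

Because the layer ranges do not overlap (A1 > A2 > B1 > B2 > C1 > C2), appending each layer's scored hits in descending order keeps the total set sorted by hierarchical score.

```python
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[int, int]                          # (Did, KFScore)
SearchFn = Callable[[str], List[Hit]]          # hypothetical per-layer search
Layer = Tuple[SearchFn, Tuple[float, float]]   # (search, (upper, lower) bound)


def layer_scores(hits: List[Hit], upper: float,
                 lower: float) -> List[Tuple[int, float]]:
    """Assumed claim-22 rule: rank hits by KFScore (descending) and spread
    hierarchical scores evenly from upper down to lower."""
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    step = (upper - lower) / (len(ranked) - 1) if len(ranked) > 1 else 0.0
    return [(did, upper - i * step) for i, (did, _) in enumerate(ranked)]


def layered_search(layers: Sequence[Layer], query: str,
                   initial_position: int, count: int):
    """Run layers in priority order (Z3 to Z7), then paginate (Z8)."""
    max_needed = initial_position + count - 1   # maximum of the result range (Z2)
    total: List[Tuple[int, float]] = []         # total retrieval result set
    seen = set()
    for index, (search_fn, (upper, lower)) in enumerate(layers):
        layered = layer_scores(search_fn(query), upper, lower)   # Z4, Z5
        for did, score in layered:                                # Z6
            if did not in seen:   # drop duplicates, keeping the higher-layer copy
                seen.add(did)
                total.append((did, score))
        # Z7: stop once enough results exist or the last layer has run.
        if len(total) > max_needed or index == len(layers) - 1:
            break
    sum_count = len(total)                                        # Z8
    start = initial_position - 1
    page = total[start:start + count] if sum_count >= initial_position else []
    return page, sum_count
```

In a music search, the three layers of claim 18 would be passed as three (search, range) pairs, for example an exact-match layer with range (300, 201) ahead of lower-priority ranges. These numbers are illustrative only.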
23. The method according to claim 21, wherein the termination condition may be:
The current total number of retrieval results is greater than the maximum value of the search result range; or the current search procedure is the last-layer search procedure.
24. The method according to claim 21, wherein reading the required number of retrieval results from the total retrieval result set as the search results according to the search result range comprises:
When SumCount ≥ (InitialPositon+Count)-1, reading Count records from the total retrieval result set, starting at record number InitialPositon, as the search results;
When InitialPositon ≤ SumCount < (InitialPositon+Count)-1, reading (SumCount-InitialPositon+1) records from the total retrieval result set, starting at record number InitialPositon, as the search results;
When SumCount < InitialPositon, returning the empty set as the search results.
25. A search device, comprising: an inverted list generation module, a storage module, a term generation module and a retrieval module;
The inverted list generation module is configured to do the following:
- index all documents to be retrieved with the term (Term) as the key;
- build an inverted index keyed by Term, whose value is the total number TotalCount of documents containing that Term together with the document list DocList of those documents;
- send the inverted index to the storage module.
Each record in the document list contains the document number and the key attribute score of one document. The document list consists of an ordered list and an unordered list. The ordered list contains the n records with the largest key attribute scores, arranged in descending order of key attribute score, where n is a predetermined value;
The storage module is configured to store the inverted index;
The term generation module is configured to generate the corresponding terms from the search string input by the user, and to send the terms and the search result range input by the user to the retrieval module;
The retrieval module is configured to:
- retrieve from the inverted index stored in the storage module according to the generated terms;
- obtain records preferentially from the ordered lists corresponding to those terms, according to the search result range input by the user, so as to obtain the required search results and the total number of related results;
- send the search results and the total number of related results to the user.
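The inverted index of claim 25 can be sketched as follows. This is an illustrative Python model, not the patent's storage format. `build_index` splits each term's document list into a presorted top-n ordered list and an unordered remainder. `fetch` serves a requested range from the ordered list whenever it fits. The names `Posting`, `build_index` and `fetch` are hypothetical. The claims do not say how the unordered list is handled when the requested range extends past n; sorting it on demand is an assumption.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple

Record = Tuple[int, int]  # (document number, key attribute score)


@dataclass
class Posting:
    total_count: int          # TotalCount: documents containing the term
    ordered: List[Record]     # DocList part 1: top-n records, score descending
    unordered: List[Record]   # DocList part 2: remaining records, any order


def build_index(docs: Dict[int, Tuple[List[str], int]], n: int) -> Dict[str, Posting]:
    """docs maps document number -> (terms of the document, key attribute score)."""
    buckets: Dict[str, List[Record]] = {}
    for did, (terms, score) in docs.items():
        for term in set(terms):
            buckets.setdefault(term, []).append((did, score))
    index: Dict[str, Posting] = {}
    for term, records in buckets.items():
        top = heapq.nlargest(n, records, key=lambda r: r[1])
        top_ids = {did for did, _ in top}
        rest = [r for r in records if r[0] not in top_ids]
        index[term] = Posting(len(records), top, rest)
    return index


def fetch(index: Dict[str, Posting], term: str,
          initial_position: int, count: int) -> Tuple[List[Record], int]:
    """Return the 1-based range of records for one term and its TotalCount."""
    posting = index.get(term)
    if posting is None:          # term not stored: empty result
        return [], 0
    start, end = initial_position - 1, initial_position - 1 + count
    if end <= len(posting.ordered):
        # Fast path: the range lies entirely inside the presorted top-n list.
        return posting.ordered[start:end], posting.total_count
    # Assumed slow path: sort the unordered remainder only when it is needed.
    tail = sorted(posting.unordered, key=lambda r: r[1], reverse=True)
    return (posting.ordered + tail)[start:end], posting.total_count
```

If n is chosen at least as large as a typical page end, most queries take the fast path and never sort the full document list. This appears to be the source of the speed and resource savings the abstract describes.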
CN201110461128.4A 2011-12-30 2011-12-30 Searching method and device Active CN103186650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110461128.4A CN103186650B (en) 2011-12-30 2011-12-30 Searching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110461128.4A CN103186650B (en) 2011-12-30 2011-12-30 Searching method and device

Publications (2)

Publication Number Publication Date
CN103186650A (en) 2013-07-03
CN103186650B (en) 2016-05-25

Family

ID=48677819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110461128.4A Active CN103186650B (en) Searching method and device 2011-12-30 2011-12-30

Country Status (1)

Country Link
CN (1) CN103186650B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050210006A1 (en) * 2004-03-18 2005-09-22 Microsoft Corporation Field weighting in text searching
US20080288483A1 (en) * 2007-05-18 2008-11-20 Microsoft Corporation Efficient retrieval algorithm by query term discrimination
CN101460949A (en) * 2006-06-01 2009-06-17 微软公司 Indexing documents for information retrieval based on additional feedback fields
CN101685455A (en) * 2008-09-28 2010-03-31 华为技术有限公司 Method and system of data retrieval
US20110022600A1 (en) * 2009-07-22 2011-01-27 Ecole Polytechnique Federale De Lausanne Epfl Method of data retrieval, and search engine using such a method

Non-Patent Citations (1)

Title
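励子闰 et al.: "基于全文检索引擎的信息检索技术的应用研究" (Application research on information retrieval technology based on a full-text retrieval engine), 《计算机与数字工程》 (Computer & Digital Engineering), vol. 36, no. 9, 31 December 2008 (2008-12-31), pages 81 - 85 *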

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN106649302A (en) * 2015-10-28 2017-05-10 腾讯科技(深圳)有限公司 Search sequencing method and device
CN106649650A (en) * 2016-12-10 2017-05-10 宁波思库网络科技有限公司 Demand information two-way matching method
CN106649650B (en) * 2016-12-10 2020-08-18 宁波财经学院 Bidirectional matching method for demand information
CN106909647A (en) * 2017-02-21 2017-06-30 福建榕基软件股份有限公司 A kind of data retrieval method and device
CN106909647B (en) * 2017-02-21 2020-01-03 福建榕基软件股份有限公司 Data retrieval method and device
CN109388690A (en) * 2017-08-10 2019-02-26 阿里巴巴集团控股有限公司 Text searching method, inverted list generation method and system for text retrieval
WO2019080412A1 (en) * 2017-10-27 2019-05-02 平安科技(深圳)有限公司 Data service method, electronic device and storage medium

Also Published As

Publication number Publication date
CN103186650B (en) 2016-05-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant