CN103186650B - A kind of searching method and device - Google Patents

A kind of searching method and device

Info

Publication number
CN103186650B
Authority
CN
China
Prior art keywords
search
list
etymology
retrieval
search results
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110461128.4A
Other languages
Chinese (zh)
Other versions
CN103186650A (en
Inventor
简勤
郭正平
陈健骥
何丹
赖航
肖巍
郑长松
王全礼
杨俊拯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Sichuan Co Ltd
Original Assignee
China Mobile Group Sichuan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Sichuan Co Ltd
Priority to CN201110461128.4A
Publication of CN103186650A
Application granted
Publication of CN103186650B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a search method and device. The search method comprises: setting key attributes and key attribute weights for documents of the same class, and calculating a key attribute score for each document; building an inverted index list, in which each record of a document list contains the document number and the key attribute score of one document; the document list consists of an ordered list and an unordered list, the ordered list containing the n records with the largest key attribute scores arranged in descending order of key attribute score, where n is a predetermined value; generating a corresponding term from the search string entered by the user, retrieving the inverted index list by the generated term, and obtaining records preferentially from the ordered list corresponding to the generated term according to the search result range entered by the user, so as to obtain the required search results and the total number of relevant results. The invention can improve search speed and reduce the use of system resources.

Description

A kind of searching method and device
Technical field
The present invention relates to the technical field of data services, and in particular to a search method and device.
Background art
Existing search engines generally rank results one-sidedly, by text similarity alone, and the index is generally stored as key-value pairs of the form <KEY, DocList>. Here KEY denotes a keyword and DocList denotes the list of documents containing the keyword KEY. Each element of DocList is a document object that holds the basic information of one document, for example the document's ID and the number of occurrences and the positions of the keyword KEY in that document. After a user enters a keyword, the engine first looks up the keyword; once the keyword is found, it computes the score of every document in the corresponding DocList for that keyword; it then sorts all search results by score, reads the number of results the user requires from the sorted results, and returns them to the user.
For example, the prior art includes a method for ranking integrated search results. In this method, a ranking algorithm computes a composite value for each vertical search engine, the vertical search engines are ranked by that composite value, and the ranked results of all vertical search engines are aggregated to generate the final search results.
However, this ranking method computes result scores during the search itself and performs a full sort on every search. It therefore occupies considerable CPU and memory, has high search complexity and is slow. It also ignores the value-added attributes of the information, so it cannot guarantee that high-value information is ranked near the top. In addition, the method needs to store both the basic scoring factors and the composite-score calculation factors, which places high demands on storage space.
The prior art also includes a search-result ranking method based on a search engine. This method mainly uses preconfigured weights for network resources, together with the text weight of the user's keyword within each resource, to compute a composite score for every relevant resource, and then performs a full sort to generate the search results. Although this method takes the value-added attributes of the information into account, it still has to compute a score for each document during the search and fully sort the results, so it also occupies considerable CPU and memory, has high search complexity and is slow. It likewise needs to store the basic scoring factors and the composite-score calculation factors, which places high demands on storage space.
In summary, prior-art search methods compute and sort document scores during the search, and they sort the full result set. The index structure does not take users' search habits into account, so a large amount of information has to be stored. Every search operates on the full result set, which greatly increases the load on scarce system resources such as memory and CPU. Prior-art search methods also generally have high search complexity and slow search response. Finally, they rank by text relevance and do not consider key attributes that express the value of information, which leads to problems such as a single ranking mode and insufficient user friendliness.
Summary of the invention
In view of this, the invention provides a search method and device that can improve search speed and reduce the use of system resources.
The technical solution of the present invention is implemented as follows:
A search method, comprising:
A. setting at least one key attribute and a corresponding key attribute weight for documents of the same class, and calculating the key attribute score KFScore of each document from the key attributes and key attribute weights;
B. indexing all documents to be retrieved with the term (Term) as the key, to build an inverted index list that is keyed by Term and whose value is the total number TotalCount of documents containing the Term together with the document list DocList of documents containing the Term; each record in the document list contains the document number and the key attribute score of one document; the document list consists of an ordered list and an unordered list, the ordered list containing the n records with the largest key attribute scores, arranged in descending order of key attribute score; where n is a predetermined value;
C. generating a corresponding term from the search string entered by the user, retrieving the inverted index list by the generated term, and obtaining records preferentially from the ordered list corresponding to the generated term according to the search result range entered by the user, so as to obtain the required search results and the total number of relevant results.
The invention also provides a search device, comprising: an inverted list generation module, a storage module, a term generation module and a retrieval module;
The inverted list generation module is configured to index all documents to be retrieved with the term (Term) as the key, building an inverted index list that is keyed by Term and whose value is the total number TotalCount of documents containing the Term together with the document list DocList of documents containing the Term; each record in the document list contains the document number and the key attribute score of one document; the document list consists of an ordered list and an unordered list, the ordered list containing the n records with the largest key attribute scores, arranged in descending order of key attribute score; where n is a predetermined value; and to send the inverted index list to the storage module;
The storage module is configured to store the inverted index list;
The term generation module is configured to generate a corresponding term from the search string entered by the user, and to send the term and the search result range entered by the user to the retrieval module;
The retrieval module is configured to retrieve the inverted index list stored in the storage module by the generated term, to obtain records preferentially from the ordered list corresponding to the generated term according to the search result range entered by the user, so as to obtain the required search results and the total number of relevant results, and to send the search results and the total number of relevant results to the user.
As can be seen from the above technical solution, because key attributes and key attribute weights are set for documents, the KFScore of each document can be calculated, and an inverted index list can be built from the KFScore values such that each document list in it consists of an ordered list and an unordered list. When the user enters a search string, a corresponding term is generated and the inverted index list is retrieved by that term; records are obtained preferentially from the ordered list corresponding to the generated term according to the search result range entered by the user, yielding the required search results and the total number of relevant results. In this way, while the relevance (that is, the validity) of the search results is preserved, the value of information is assessed along multiple dimensions so that valuable information is ranked near the top. Search speed is greatly increased and the search engine's response time is shortened; at the same time, the use of system resources such as CPU and memory is reduced, saving a large amount of hardware and software resources.
Brief description of the drawings
Fig. 1 is a flow chart of the search method of the present invention.
Fig. 2 is a diagram of the information structure of a document in the present invention.
Fig. 3 is a schematic structural diagram of the inverted index list in the present invention.
Fig. 4 is a flow chart of the single-Term search method in the present invention.
Fig. 5 is a schematic structural diagram of the inverted index list in a specific example of the present invention.
Fig. 6 is a flow chart of the multi-Term search method in the present invention.
Fig. 7 is a flow chart of the multi-layer search method in the present invention.
Fig. 8 is a flow chart of one implementation of step 702 in the present invention.
Fig. 9 is a schematic structural diagram of the search device of the present invention.
Detailed description of the invention
To make the object, technical solution and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flow chart of the search method of the present invention.
As shown in Fig. 1, the method comprises:
Step 101: set at least one key attribute (KeyField) for documents of the same class, and preset a key attribute weight for each key attribute.
In the field of data services, the set of information describing one thing can be called a document (Document). Fig. 2 is a diagram of the information structure of a document in the present invention. As shown in Fig. 2, in the technical solution of the present invention each document comprises multiple terms (Term) and multiple attributes (Field), where an attribute may consist of one or more Terms. For example, a document may include attributes such as "title" and "content", and the "title" attribute may consist of one or more Terms. As another example, attribute 1 in Fig. 2 consists of N terms and attribute J consists of M terms.
In addition, in the technical solution of the present invention, one or more key attributes that identify the important information of a document are set for each document, and documents of the same class have the same key attributes. Table 1 below gives examples of key attributes in the present invention.
Document class | Key attribute 1 | Key attribute 2 | ......
Commodity documents | Commodity price | Number of purchases | ......
Paper documents | Number of citations | Number of downloads | ......
Music documents | Number of trial listens | Number of downloads | ......
...... | ...... | ...... | ......
Table 1
As shown in Table 1, for commodity documents, the commodity price and/or the number of purchases can be set as key attributes; for paper documents, the number of citations and/or the number of downloads can be used as key attributes; for music documents, the number of trial listens and/or the number of downloads can be set as key attributes.
After the key attributes are set, a corresponding weight Weight (which may be called the key attribute weight) is predetermined for each key attribute according to the practical application.
Step 102: calculate the key attribute score (KFScore) of each document from the key attributes and key attribute weights.
Once the key attributes and key attribute weights have been set, the key attribute score (KFScore) of each document can be calculated from them.
For example, the KFScore of a document can be calculated by formula (1):
KFScore = KeyField_1 × W_1 + KeyField_2 × W_2 + ...... + KeyField_X × W_X    (1)
where W_1 + W_2 + ...... + W_X = 1.
Here KeyField_1 denotes the value of the 1st KeyField of the document and W_1 denotes the value of the Weight corresponding to the 1st KeyField of the document; KeyField_x denotes the value of the x-th KeyField of the document and W_x denotes the value of the Weight corresponding to the x-th KeyField of the document, and so on.
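Formula (1) can be illustrated with a minimal Python sketch. The function name `kf_score`, the validation checks and the sample values are assumptions made for illustration only; the text specifies just the weighted sum and the requirement that the weights add up to 1.

```python
def kf_score(key_fields, weights):
    """Weighted sum of key attribute values, following formula (1).

    key_fields: values KeyField_1..KeyField_X of one document.
    weights:    weights W_1..W_X, which must sum to 1.
    """
    if len(key_fields) != len(weights):
        raise ValueError("one weight is needed per key attribute")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("key attribute weights must sum to 1")
    return sum(value * weight for value, weight in zip(key_fields, weights))


# Hypothetical document with two key attributes weighted 0.3 and 0.7.
score = kf_score([1200, 300], [0.3, 0.7])
```

Because the weights are fixed per document class, a score of this kind can be computed once at indexing time rather than on every search.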
Step 103: index all documents to be retrieved with Term as the key (Key), building an inverted index list InvertIndexList that is keyed by Term and whose value (Value) is the total number (TotalCount) of documents containing the Term together with the document list (DocList) of documents containing the Term.
In this step an inverted index list is built. Specifically, the list is keyed by Term and its value is the key-value pair <TotalCount, DocList>; this inverted index list may be called InvertIndexList.
Fig. 3 is a schematic structural diagram of the inverted index list in the present invention. As shown in Fig. 3, the InvertIndexList comprises one or more records. Each record can be stored as a key-value pair <Key, Value> and contains two fields: a key (Key) field and a value (Value) field. The Key field stores a Term of the documents, and the Value field stores the key-value pair <TotalCount, DocList> corresponding to the Term in the Key field. DocList is the list of documents containing the Term, and TotalCount is the total number of documents in DocList.
As shown in Fig. 3, in a specific embodiment of the present invention the DocList may be a linked list, and TotalCount is the total number of elements in the linked list. The linked list comprises multiple records, each stored as a key-value pair <KFScore, Did>, and each key-value pair corresponds to one document containing the corresponding Term. KFScore denotes the key attribute score of the document and can be calculated by formula (1) in step 102; Did denotes the identifier of the document, for example the document number.
In addition, in a specific embodiment of the present invention, the records of each DocList in the InvertIndexList can be partially sorted in advance by KFScore: the n records with the largest KFScore in the DocList are determined first and arranged in descending order of KFScore to form an ordered list (OrderedList); the other records in the DocList form an unordered list (DisorderedList). Each DocList therefore consists of an ordered list and an unordered list, the ordered list containing the n records with the largest key attribute scores, arranged in descending order of key attribute score. Here n is a predetermined value, so n is the length of the OrderedList.
It follows that, in a specific embodiment of the present invention, the records in a DocList do not all need to be sorted by KFScore. Only a partial sort is required: the n records with the largest KFScore are determined and sorted by KFScore (these are the records of the ordered list), and the other records in the DocList (the records of the unordered list) need not be sorted.
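The partially sorted structure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `build_inverted_index`, the dictionary layout and the use of `heapq.nlargest` to pick the top n records are choices made here, not details given in the text, which describes the DocList as a linked list and does not say how the n largest records are found. Document identifiers are assumed to be unique.

```python
import heapq
from collections import defaultdict


def build_inverted_index(docs, n):
    """Build {Term: entry} from (did, kf_score, terms) tuples.

    Each entry holds TotalCount plus a DocList split into an ordered
    list (top n records by KFScore, descending) and an unordered list.
    Records are stored as (KFScore, Did) tuples.
    """
    postings = defaultdict(list)
    for did, score, terms in docs:
        for term in set(terms):  # count each document once per term
            postings[term].append((score, did))

    index = {}
    for term, records in postings.items():
        ordered = heapq.nlargest(n, records)  # partial sort: top n only
        top = set(ordered)
        unordered = [r for r in records if r not in top]  # left unsorted
        index[term] = {
            "total_count": len(records),
            "ordered": ordered,
            "unordered": unordered,
        }
    return index


# Hypothetical documents: (Did, KFScore, Terms).
docs = [(1, 5.0, ["a", "b"]), (2, 9.0, ["a"]), (3, 7.0, ["a"]), (4, 1.0, ["a", "a"])]
index = build_inverted_index(docs, n=2)
```

Selecting only the top n records avoids sorting the whole DocList at indexing time, which matches the partial ordering the text describes.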
Step 104: generate a corresponding term from the search string entered by the user, retrieve the inverted index list by the generated term, and obtain records preferentially from the ordered list corresponding to the generated term according to the search result range entered by the user, so as to obtain the required search results and the total number of relevant results.
In the technical solution of the present invention, once the inverted index list has been built in step 103, it can be retrieved according to the search string and the search result range entered by the user to obtain the required search results and the total number of relevant results.
Specifically, a corresponding term is first generated from the search string entered by the user.
For example, when the search string entered by the user is "Liu Dehua", it can be converted under different conversion rules into the corresponding term "Liu Dehua" or "liudehua".
When the search string entered by the user is "Liu Dehua providence", it can be converted under different conversion rules into the corresponding term "Liu Dehua providence" or "liudehuatianyi", or into two corresponding terms: "Liu Dehua" and "providence".
After the corresponding term is generated, the inverted index list is retrieved by this term to judge whether the term is stored in the inverted index list.
When the term is stored in the inverted index list, the DocList corresponding to the term is read, and records are obtained preferentially from the ordered list of the DocList according to the search result range entered by the user, so as to obtain the required search results and the total number of relevant results.
The search result range is the range in which the search results wanted by the user lie. It therefore comprises an initial position (InitialPositon) and a number (Count), and the maximum of the search result range is (InitialPositon + Count) - 1. The initial position is the position of the first search result, and the number is the number of search results.
For example, after entering a search string and searching, the user views the search results as web pages. Taking Count = 30 search results per page as an example, when the user wishes to view page 1 of the search results, InitialPositon = 1 and Count = 30, and the search result range is search results 1 to 30; when the user has viewed page 1, clicks "next page" and wishes to view page 2, InitialPositon = 31 and Count = 30, and the search result range is search results 31 to 60. The same applies to later pages.
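The relationship between a result page and the search result range can be written as a small Python sketch. The function name `page_to_range` and its parameters are hypothetical; only the 1-based positions and the maximum (InitialPositon + Count) - 1 come from the text.

```python
def page_to_range(page, count):
    """Return (initial_position, last_position) for a 1-based page number.

    initial_position plays the role of InitialPositon, and last_position
    is the range maximum (InitialPositon + Count) - 1.
    """
    initial_position = (page - 1) * count + 1
    last_position = initial_position + count - 1
    return initial_position, last_position


first_page = page_to_range(1, 30)   # search results 1 to 30
second_page = page_to_range(2, 30)  # search results 31 to 60
```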
Therefore, in a specific embodiment of the present invention, obtaining records preferentially from the ordered list of the DocList according to the search result range entered by the user, so as to obtain the required search results, may comprise the following cases:
When the maximum of the search result range is less than or equal to the length n of the ordered list, the search result range is contained in the ordered list. Count records can therefore be read directly from the ordered list, starting from the InitialPositon-th record, as the search results, without reading the records in the unordered list. Because the records in the ordered list are already arranged in descending order of KFScore, the Count records read are search results arranged in descending order of KFScore.
When the initial position of the search result range lies in the ordered list (that is, InitialPositon ≤ n) and the maximum of the search result range is greater than the length n of the ordered list, part of the search result range lies in the ordered list and the rest lies in the unordered list. In this case, (n - InitialPositon + 1) records are read from the ordered list, starting from the InitialPositon-th record; the records in the unordered list are then arranged in descending order of KFScore, and [Count - (n - InitialPositon + 1)] records are read from the sorted unordered list, starting from its 1st record. The Count records read in total are taken as the search results, and they are likewise arranged in descending order of KFScore.
When the initial position of the search result range lies in the unordered list (that is, InitialPositon > n), the whole search result range lies in the unordered list. The records in the unordered list are therefore arranged in descending order of KFScore, and Count records are read from the sorted unordered list, starting from its (InitialPositon - n)-th record, as the search results. These Count records are likewise arranged in descending order of KFScore.
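The three cases above can be sketched as one Python function. The name `read_range`, the use of Python lists of (KFScore, Did) tuples and the call to `sorted` are assumptions for illustration; the sketch also takes n to be the actual length of the ordered list passed in.

```python
def read_range(ordered, unordered, initial_position, count):
    """Read `count` records starting at the 1-based `initial_position`.

    ordered:   records already in descending KFScore order (length n).
    unordered: remaining records in arbitrary order.
    """
    n = len(ordered)
    last = initial_position + count - 1  # maximum of the search result range

    # Case 1: the whole range lies inside the ordered list, no sorting needed.
    if last <= n:
        return ordered[initial_position - 1:last]

    # Cases 2 and 3 need the unordered list sorted by KFScore, descending.
    tail = sorted(unordered, reverse=True)

    # Case 2: the range starts in the ordered list and runs into the tail.
    if initial_position <= n:
        head = ordered[initial_position - 1:]  # n - InitialPositon + 1 records
        return head + tail[:count - len(head)]

    # Case 3: the range lies entirely in the unordered list.
    start = initial_position - n - 1  # (InitialPositon - n)-th record, 0-based
    return tail[start:start + count]


ordered = [(9, "a"), (8, "b"), (7, "c")]
unordered = [(2, "e"), (5, "d"), (1, "f")]
inside = read_range(ordered, unordered, 1, 2)
straddling = read_range(ordered, unordered, 2, 3)
beyond = read_range(ordered, unordered, 4, 2)
```

Only the first case avoids sorting altogether, which is why the choice of n relative to typical result ranges matters.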
The required search results can be obtained by the above method.
Users' search habits should also be considered: users generally care only about the few dozen top-ranked search results. According to statistical analyses of users' search habits by authoritative institutions, more than 90% of users view only the first two pages of search results (typically about 20 results). Therefore, even when the searched data is on the order of hundreds of millions of records, if the length n of the ordered list exceeds a certain value (for example, n = 100), the search result range entered by the user will in most cases fall within the ordered list. The required number of search results, already arranged in descending order of KFScore, can then be read directly from the ordered list without sorting the search results again. This requires less memory and fewer CPU operations, responds faster and uses fewer system resources, which greatly shortens the search engine's response time and saves a large amount of hardware and software resources.
Further, in a specific embodiment of the present invention, when the term generated from the search string entered by the user is not stored in the inverted index list, the empty set can be returned to the user as the search results, indicating that no relevant document was found.
In addition, in the technical solution of the present invention, step 104 can be implemented in several ways, that is, with several different search methods. The various search methods are introduced below by way of several specific embodiments.
Embodiment 1: single-Term search method.
In this embodiment, the single-Term search method is the method used when only one corresponding term is generated from the search string entered by the user. Fig. 4 is a flow chart of the single-Term search method in the present invention. As shown in Fig. 4, the single-Term search method may comprise the following steps:
Step 401: generate one corresponding term from the search string entered by the user.
In this step the search string entered by the user can be treated as a single word and used directly as the term corresponding to the search string. For example, when the search string entered by the user is "Liu Dehua", the term generated for this search string is "Liu Dehua".
Step 402: retrieve the inverted index list by the generated term and judge whether the term is stored in the inverted index list; if so, go to step 403; otherwise, go to step 415.
Step 403: read the document list corresponding to the term from the inverted index list.
Because the value (Value) of every term in the inverted index list contains a document list (DocList), in this step the document list corresponding to the generated term can be read from the inverted index list.
Step 404: parse the search result range entered by the user to obtain the initial position and the number.
In this step the initial position InitialPositon and the number Count are obtained by parsing the search result range entered by the user.
Step 405: judge whether (InitialPositon + Count) - 1 > n holds; if so, go to step 406; otherwise, go to step 412.
Because the maximum of the search result range is (InitialPositon + Count) - 1, this step first judges whether that maximum is greater than the length n of the ordered list of the document list, that is, whether (InitialPositon + Count) - 1 > n holds.
When the maximum of the search result range is greater than the length of the ordered list of the document list, the search result range extends beyond the ordered list, and the required number of search results cannot all be read from the ordered list alone, so step 406 is performed.
When the maximum of the search result range is less than or equal to the length of the ordered list of the document list, the search result range does not extend beyond the ordered list, and all required search results can be read from the ordered list alone, so step 412 is performed to obtain the required number of search results directly from the ordered list.
Step 406: judge whether InitialPositon > n holds; if so, go to step 410; otherwise, go to step 407.
This step judges whether the InitialPositon of the search result range is greater than the length n of the ordered list.
When the InitialPositon of the search result range is greater than the length of the ordered list, the initial position lies in the unordered list; all search results lie in the unordered list and none lie in the ordered list, so step 410 is performed.
When the InitialPositon of the search result range is less than or equal to the length of the ordered list, the initial position still lies in the ordered list; some of the search results lie in the ordered list and the rest lie in the unordered list, so step 407 is performed.
Step 407: starting from the InitialPositon-th record of the ordered list, read (n - InitialPositon + 1) records and add them to the search result list (ResultList).
Step 408: arrange the records in the unordered list in descending order of KFScore.
Step 409: starting from the 1st record of the sorted unordered list, read [Count - (n - InitialPositon + 1)] records and add them to ResultList; go to step 413.
Step 410: arrange the records in the unordered list in descending order of KFScore.
Step 411: starting from the (InitialPositon - n)-th record of the sorted unordered list, read Count records and add them to ResultList; go to step 413.
Step 412: starting from the InitialPositon-th record of the ordered list, read Count records and add them to ResultList; go to step 413.
Step 413: take ResultList as the search results, and take the total number of documents TotalCount of the document list corresponding to the term, read from the inverted index list, as the total number of relevant results SumCount.
Step 414: return the search results and the total number of relevant results to the user. The flow ends.
Step 415: take the empty set as the search results, set SumCount = 0, and return the empty set and the total number of relevant results to the user. The flow ends.
Steps 401 to 415 above implement the single-Term search.
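The flow of steps 402 to 415 can be traced in a short Python sketch whose comments map each branch to its step number. The function name `single_term_search` and the dictionary layout of the index entries are assumptions for illustration, and term generation (step 401) and range parsing (step 404) are taken as already done by the caller.

```python
def single_term_search(index, term, initial_position, count):
    """Return (ResultList, SumCount) for one term and a 1-based range."""
    entry = index.get(term)                                   # step 402
    if entry is None:
        return [], 0                                          # step 415
    ordered, unordered = entry["ordered"], entry["unordered"]  # step 403
    n = len(ordered)
    result_list = []
    if initial_position + count - 1 > n:                      # step 405
        ranked_rest = sorted(unordered, reverse=True)         # steps 408 / 410
        if initial_position > n:                              # step 406
            start = initial_position - n - 1
            result_list += ranked_rest[start:start + count]   # step 411
        else:
            result_list += ordered[initial_position - 1:]     # step 407
            result_list += ranked_rest[:count - len(result_list)]  # step 409
    else:
        first = initial_position - 1
        result_list += ordered[first:first + count]           # step 412
    return result_list, entry["total_count"]                  # steps 413-414


# Hypothetical index entry: records are (KFScore, Did) tuples, n = 2.
index = {"t": {"total_count": 5,
               "ordered": [(9, 1), (8, 2)],
               "unordered": [(3, 4), (6, 3), (1, 5)]}}
hit = single_term_search(index, "t", 2, 3)
miss = single_term_search(index, "missing", 1, 10)
```

The total number of relevant results comes straight from the stored TotalCount, so it does not require reading or counting the document list at query time.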
Be example by the instantiation taking music searching field below, above-mentioned single Term searching method is enteredRow is further introduced.
Instantiation one:
Suppose that music document to be retrieved in music searching field has 10,000,000, the music of each documentInformation comprises: song title (SongName), singer's name (SingerName), album name (AlbumName),The audition amount (ListenCount) of the lyrics, song and portfolio (BusinessCount).
In example of the present invention, can described ListenCount and BusinessCount be set to above-mentionedThe determinant attribute of music document, determinant attribute weight is set to respectively 0.3 and 0.7, each music documentDeterminant attribute score value (KFScore) can calculate by formula as described below:
KFScore=ListenCount*0.3+BusinessCount*0.7
An inverted index InvertIndexList can be built from the music documents to be retrieved, the key attributes, and the key attribute scores, and the length n of the ordered list in the inverted index can be set to 100.
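As a rough sketch of how such an index might be built, the Python below computes KFScore with the weights 0.3 and 0.7 and splits each term's document list into an ordered list (the n highest-scoring records, descending) and an unordered list (the rest). The dict layout, the `terms` field, and the names `kf_score` and `build_index` are assumptions made for illustration; the three sample documents are invented.

```python
import heapq
from collections import defaultdict


def kf_score(listen_count, business_count):
    """KFScore = ListenCount * 0.3 + BusinessCount * 0.7."""
    return listen_count * 0.3 + business_count * 0.7


def build_index(docs, n):
    """Map each term to (TotalCount, ordered list, unordered list)."""
    postings = defaultdict(list)
    for did, doc in docs.items():
        score = kf_score(doc["ListenCount"], doc["BusinessCount"])
        for term in set(doc["terms"]):
            postings[term].append((score, did))       # record = (KFScore, Did)
    index = {}
    for term, records in postings.items():
        ordered = heapq.nlargest(n, records)           # top n, highest KFScore first
        top = set(ordered)
        unordered = [rec for rec in records if rec not in top]
        index[term] = (len(records), ordered, unordered)
    return index


# Invented sample documents; real documents would carry SongName, SingerName, etc.
docs = {
    1: {"terms": ["Liu Dehua"], "ListenCount": 100, "BusinessCount": 0},
    2: {"terms": ["Liu Dehua", "providence"], "ListenCount": 0, "BusinessCount": 100},
    3: {"terms": ["Liu Dehua"], "ListenCount": 10, "BusinessCount": 10},
}
index = build_index(docs, n=2)
```

Because the ordered list is sorted once at indexing time, a query whose result range falls inside it needs no sorting at query time.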
Fig. 5 is a schematic diagram of the structure of the inverted index in this concrete example. As shown in Fig. 5, the value corresponding to the term "Liu Dehua" is <10000, DocList>, so the total document count in the DocList corresponding to "Liu Dehua" is TotalCount = 10000. The ordered list in this DocList has length n = 100, that is, it contains 100 records; the unordered list has length 9900, that is, it contains 9900 records. Likewise, the value corresponding to the term "I and you" is <9870, DocList>, so the total document count in the corresponding DocList is TotalCount = 9870; ...; the value corresponding to the term "providence" is <60, DocList>, so the total document count in the corresponding DocList is TotalCount = 60. It can further be assumed that in 20 of these 60 documents the singer name is "Liu Dehua".
When the search string entered by the user is "Liu Dehua" and the search result range entered by the user is the 31st to 50th search results, the single-Term search method can be applied as follows:
Generate the corresponding term "Liu Dehua" from the search string entered by the user; parse the search result range entered by the user to obtain the start position InitialPositon = 31 and the number Count = 20;
Retrieve the inverted index shown in Fig. 5 with the generated term "Liu Dehua" and find the matching term in the inverted index. The value corresponding to this term is <10000, DocList>, so the total document count in the DocList corresponding to "Liu Dehua" is TotalCount = 10000; the ordered list in this DocList has length n = 100 and contains 100 records, and the unordered list has length 9900 and contains 9900 records.
Since (InitialPositon + Count) - 1 = 31 + 20 - 1 = 50 < n = 100, the search result range lies within the ordered list, so 20 records are read directly, starting from the 31st record of the ordered list, as the search results; meanwhile, the total number of relevant results is SumCount = TotalCount = 10000.
Because the records in the ordered list are already sorted by KFScore, the 20 records read are search results sorted by KFScore.
Embodiment two: multi-Term search method.
In practical scenarios, a user may enter several keywords in order to locate the desired content more precisely, and some users may also have specific requirements on the results: for example, finding information that contains both keyword A and keyword B, information that contains keyword A or keyword B, or information that contains keyword A but not keyword B. The keywords entered by the user may therefore be related by three logical operations: AND, OR, and difference (SUB).
To meet these needs, the technical solution of the invention also proposes a multi-Term search method. In this embodiment, the multi-Term search method is the method used when multiple terms are generated from the search string entered by the user. Fig. 6 is a flow chart of the multi-Term search method of the invention. As shown in Fig. 6, the multi-Term search method may include the following steps:
Step 600, set Y read ranges (segments) in the document list of the inverted index.
In specific embodiments of the invention, retrieval uses a segmented-read strategy so as to avoid, as far as possible, operating on the full data set for every retrieval. In this step, Y read ranges are therefore set in advance in the document list of the inverted index, where Y >= 2.
For example, when Y = 2, two read ranges are set in advance in the document list of the inverted index. The ordered list of the document list can then serve as the 1st read range, and the whole document list as the 2nd read range.
As another example, when Y = 3, three read ranges are set in advance in the document list of the inverted index. The ordered list of the document list can then serve as the 1st read range, the unordered list of the document list as the 2nd read range, and the whole document list as the 3rd read range.
In specific embodiments of the invention, the read ranges can also be set in other ways according to the practical application; the specific setting methods are not described here.
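The two layouts described for Y = 2 and Y = 3 can be sketched as a small lookup from segment number to records. The function name `read_range` and the tuple records are hypothetical, and only the two layouts named in the text are covered.

```python
def read_range(ordered, unordered, j, y):
    """Return the records of the j-th read range (1-based) for Y = y."""
    whole = ordered + unordered                    # the whole document list
    if y == 2:
        segments = [ordered, whole]
    elif y == 3:
        segments = [ordered, unordered, whole]
    else:
        raise ValueError("only the Y = 2 and Y = 3 layouts from the text are sketched")
    return segments[j - 1]


# Records are (KFScore, Did) tuples; the values are invented.
ordered = [(9, 1), (8, 2)]
unordered = [(3, 5), (4, 6)]
first_segment = read_range(ordered, unordered, j=1, y=2)
```

In both layouts the last segment is the whole document list, so a search that reaches segment Y has seen every record.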
Step 601, generate a term queue and a query logic sequence from the search string entered by the user.
In this step, the search string entered by the user is parsed to generate a term queue TermArray and a query logic sequence SetOperators. TermArray may contain x terms and can be written TermArray{Term_1, Term_2, ..., Term_x}. SetOperators contains (x - 1) query logic symbols and can be written SetOperators{symbol_1, symbol_2, ..., symbol_{x-1}}. The logical relations between the terms in the term queue are expressed by the corresponding query logic symbols in the query logic sequence. For example, symbol_1 expresses the logical relation between Term_1 and Term_2, symbol_2 expresses the logical relation between Term_3 and the preceding Term_1 and Term_2, and so on. Each query logic symbol takes the value AND, OR, or SUB.
For example, when the search string entered by the user is "Liu Dehua OR providence", parsing it produces the term queue TermArray{Liu Dehua, providence} and a query logic sequence SetOperators{symbol_1}, where symbol_1 has the value OR.
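One possible way to produce TermArray and SetOperators is to split the search string on the operator keywords. The sketch below assumes that operators appear as the upper-case words AND, OR, and SUB surrounded by whitespace, which the text does not specify; `parse_query` is a hypothetical name.

```python
import re


def parse_query(query):
    """Split a search string into (TermArray, SetOperators)."""
    # The capturing group makes re.split keep the operators in the output,
    # so terms land at even positions and operators at odd positions.
    parts = re.split(r"\s+(AND|OR|SUB)\s+", query.strip())
    term_array = parts[0::2]
    set_operators = parts[1::2]
    return term_array, set_operators


term_array, set_operators = parse_query("Liu Dehua OR providence")
```

A string with x terms yields x - 1 operators, and a string without any operator yields a single term and an empty operator sequence.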
Step 602, parse the search result range entered by the user to obtain the start position and the number.
In this step, the start position InitialPositon and the number Count can be obtained by parsing the search result range entered by the user.
Step 603, determine whether the length of TermArray is greater than 1; if so, go to step 604; otherwise, go to step 616.
In this step, it is first determined whether the length of TermArray is greater than 1. If so, TermArray contains at least two terms, and execution continues with step 604. If not, TermArray contains only one term, so the single-Term search method suffices and the multi-Term search method is not needed; step 616 is therefore executed.
As step 603 shows, in specific embodiments of the invention the single-Term search is a special case of the multi-Term search.
Step 604, read the first two terms in TermArray, and set a variable i with initial value 2 and a variable j with initial value 1.
In this step, the first two terms, namely Term_1 and Term_2, are read directly from TermArray.
In addition, two variables can be set in this step: a variable i with initial value 2 and a variable j with initial value 1. Here i indicates that the term currently read is the i-th term, and j indicates the segment number of the current read range, that is, the current read range is the j-th read range.
Step 605, retrieve the inverted index separately with each of the two terms read, read from the inverted index the records within the j-th read range of the document list corresponding to each term, and store the records read in a first retrieval result list (ResultList1) and a second retrieval result list (ResultList2) respectively.
Specifically, in this step the two terms read can be treated as two independent terms, and the inverted index is retrieved with each of them separately. If the two terms are stored in the inverted index, the records within the j-th read range of the document list corresponding to each term are read from the inverted index. The records within the j-th read range of the document list corresponding to the first term read (Term_1) are then stored in ResultList1, and those corresponding to the second term read (Term_2) are stored in ResultList2.
In this step j = 1, so the records read are those in the 1st read range of the document list. If the 1st read range was set to the ordered list in step 600, the records read are those in the ordered list of the document list.
Furthermore, if either of the two terms is not stored in the inverted index, the retrieval result list corresponding to that term can be set to the empty set. For example, if the first term (Term_1) is not stored in the inverted index, its retrieval result list ResultList1 is set to the empty set; if the second term (Term_2) is not stored in the inverted index, its retrieval result list ResultList2 is set to the empty set.
Step 606, read the (i-1)-th logical symbol symbol_{i-1} in the query logic sequence SetOperators.
Step 607, check the value of the logical symbol read: when it is AND, go to step 608; when it is OR, go to step 609; when it is SUB, go to step 610.
Step 608, merge ResultList1 and ResultList2 using AND logic, and add the merged result to the search result list (ResultList); go to step 611.
In this step, ResultList1 and ResultList2 are merged using AND logic, and the merged result is added to ResultList.
Every record in ResultList1 and ResultList2 contains two attributes: the key attribute score (KFScore) and the document identifier (Did). The purpose of the AND merge is to find the records with the same Did in the two retrieval result lists. By the nature of KFScore, if two records have the same KFScore they may belong to the same document, whereas if their KFScores differ they cannot belong to the same document.
For this reason, in specific embodiments of the invention, merging ResultList1 and ResultList2 using AND logic and adding the merged result to ResultList may include:
Comparing the records in ResultList1 and ResultList2 one by one, and adding each pair of records whose KFScore and Did are both identical in the two retrieval result lists to the search result list (ResultList) as one search result.
Step 609, merge ResultList1 and ResultList2 using OR logic, and add the merged result to the search result list (ResultList); go to step 611.
In this step, ResultList1 and ResultList2 are merged using OR logic, and the merged result is added to ResultList.
For example, in specific embodiments of the invention, merging ResultList1 and ResultList2 using OR logic and adding the merged result to ResultList may include:
Adding every record in ResultList1 and ResultList2 to the search result list (ResultList); if two records have identical KFScore and identical Did, one of them is deleted from ResultList.
Step 610, merge ResultList1 and ResultList2 using difference (SUB) logic, and add the merged result to the search result list (ResultList); go to step 611.
In this step, ResultList1 and ResultList2 are merged using difference logic, and the merged result is added to ResultList.
For example, in specific embodiments of the invention, merging ResultList1 and ResultList2 using difference logic and adding the merged result to ResultList may include:
Removing from ResultList1 every record stored in ResultList2, and adding the records remaining in ResultList1 after the removal to the search result list (ResultList).
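The three merge rules of steps 608 to 610 can be sketched as follows. Records are assumed to be hashable (KFScore, Did) tuples, so two records match exactly when both attributes are identical; a set lookup replaces the record-by-record comparison described in the text. The function names are hypothetical.

```python
def merge_and(result_list1, result_list2):
    """Step 608: keep records whose KFScore and Did appear in both lists."""
    in_list2 = set(result_list2)
    return [rec for rec in result_list1 if rec in in_list2]


def merge_or(result_list1, result_list2):
    """Step 609: keep every record from both lists, dropping duplicates."""
    merged, seen = [], set()
    for rec in result_list1 + result_list2:
        if rec not in seen:
            seen.add(rec)
            merged.append(rec)
    return merged


def merge_sub(result_list1, result_list2):
    """Step 610: remove from ResultList1 every record stored in ResultList2."""
    in_list2 = set(result_list2)
    return [rec for rec in result_list1 if rec not in in_list2]


# Invented records: (KFScore, Did).
list1 = [(9, 1), (8, 2), (7, 3)]
list2 = [(8, 2), (5, 4)]
```

Comparing KFScore together with Did costs nothing extra here, since both are part of the tuple; in a sorted-list implementation the KFScore comparison could serve as a cheap first filter before Did is checked.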
Step 611, determine whether any Term in TermArray has not yet been used; if so, go to step 612; otherwise, go to step 613.
Step 612, set i = i + 1 and read the i-th term in TermArray; clear ResultList1 and ResultList2, and copy the records in ResultList into ResultList2; retrieve the inverted index with the term read, read from the inverted index the records within the j-th read range of the document list corresponding to that term, and store the records read in ResultList1; return to step 606.
Specifically, in this step i = i + 1 is set first, so that the next term (the i-th term) is read from TermArray. At the same time, ResultList1 and ResultList2 are cleared and the previous search results (the records in ResultList) are copied into ResultList2, so that in the subsequent search the previous search results and the search results of the next term read can be combined by a logical operation.
The inverted index is then retrieved with the term read. If the term is stored in the inverted index, the records within the j-th read range of the document list corresponding to the term are read from the inverted index and stored in ResultList1, and execution returns to step 606.
In this step, if j = 1, the records read are those in the 1st read range of the document list; if the 1st read range was set to the ordered list in step 600, these are the records in the ordered list of the document list. Likewise, if j = 2, the records read are those in the 2nd read range of the document list; if the 2nd read range was set to the whole document list in step 600, these are the records of the whole document list. The same applies to further values of j and is not repeated here.
Furthermore, if the i-th term read is not stored in the inverted index, ResultList1 can be set to the empty set, after which execution returns to step 606.
Step 613, determine whether the termination condition is met; if so, go to step 617; otherwise, go to step 614.
In specific embodiments of the invention, the termination condition may be:
The number of records stored in ResultList is greater than the upper bound of the search result range, (InitialPositon + Count) - 1; or j equals Y.
In the technical solution of the invention, when the number of records stored in ResultList is greater than the upper bound of the search result range, (InitialPositon + Count) - 1, a sufficient number of search results has been retrieved, so the condition for ending the search is met. When j equals Y, the search of the last segment has already been carried out, so the search can also end and the termination condition is likewise met.
Step 614, clear ResultList1 and ResultList2; read the first two terms in TermArray, and set i = 2 and j = j + 1; retrieve the inverted index separately with each of the two terms read, read from the inverted index the records within the j-th read range of the document list corresponding to each term, and store them in the first retrieval result list (ResultList1) and the second retrieval result list (ResultList2) respectively.
Because the result of step 613 is that the termination condition is not met, the number of records stored in ResultList is less than the required number; that is, the retrieval of the previous stage did not find enough search results, so the retrieval of the current stage (the j-th stage) has to be carried out.
In this step, the inverted index is therefore retrieved again with each of the two terms read. If the two terms are stored in the inverted index, the records within the j-th read range of the document list corresponding to each term are read from the inverted index. The records within the j-th read range of the document list corresponding to the first term read (Term_1) are then stored in ResultList1, and those corresponding to the second term read (Term_2) are stored in ResultList2.
Step 615, sort the records in ResultList1 and the records in ResultList2 by KFScore; return to step 606.
Step 616, use the single-Term search method described above to retrieve the inverted index with the term in TermArray, obtaining the search results and the total number of relevant results; go to step 618.
Step 617, read the required records from ResultList as the search results, and take the number of records stored in ResultList as the total number of relevant results (SumCount).
For example, in specific embodiments of the invention, reading the required records from ResultList as the search results includes:
When SumCount >= (InitialPositon + Count) - 1, read Count records from ResultList, starting from the InitialPositon-th record, as the search results;
When InitialPositon <= SumCount < (InitialPositon + Count) - 1, the number of records in ResultList is insufficient; in this case, read (SumCount - InitialPositon + 1) records from ResultList, starting from the InitialPositon-th record, as the search results;
When SumCount < InitialPositon, there are too few records in ResultList and the required search results do not exist; in this case, the empty set can be taken as the search results.
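The three cases of step 617 translate directly into list slicing. The sketch below spells out each case separately for readability; `select_results` is a hypothetical name, and positions are 1-based as in the text.

```python
def select_results(result_list, initial_position, count):
    """Return (search results, SumCount) according to the three cases of step 617."""
    sum_count = len(result_list)
    last = (initial_position + count) - 1
    if sum_count >= last:
        # Enough records: read Count records from the InitialPositon-th record.
        page = result_list[initial_position - 1:last]
    elif sum_count >= initial_position:
        # Too few records: read the (SumCount - InitialPositon + 1) that exist.
        page = result_list[initial_position - 1:]
    else:
        # The range starts beyond the last record: empty result.
        page = []
    return page, sum_count


records = list(range(1, 21))          # 20 placeholder records
page, sum_count = select_results(records, 15, 10)
```

In the usage lines, 20 records are available and positions 15 to 24 are requested, so the second case applies and 20 - 15 + 1 = 6 records are read.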
Step 618, return the search results and SumCount to the user.
The multi-Term search method above uses a segmented search strategy. For example, when two read ranges are set in advance, with the ordered list of the document list as the 1st read range and the whole document list as the 2nd read range, the ordered list of the document list is searched first. If the number of retrieval results obtained from the ordered list is greater than or equal to the required number, the first-stage retrieval has found enough search results; the unordered list then need not be searched, and the already sorted search results can be returned directly. Only when the number of retrieval results in the ordered list is less than the required number, that is, when the first-stage retrieval has not found enough search results, is the second-stage search carried out over all records in the document list.
With the multi-Term search method above, retrieval over the full data set (for example, over the whole document list) can therefore be avoided as far as possible, which effectively narrows the search space and reduces the complexity of the search.
For example, in the prior art, if the volume of data searched is on the order of hundreds of millions (web page or news search, for instance, is generally at the T level), current open-source search engines (for example, Lucene) fully sort the result set of every search (on the order of millions of records, say) and then derive the result set for the user's search from it. With the multi-Term search method above and the ordered list of the document list as the 1st read range, in most cases search results that meet the user's needs are found by the end of the first-stage search and no second-stage search is needed. The amount of data to be searched therefore drops sharply and the search results need not be sorted during retrieval, which greatly improves search response speed (generally about an order of magnitude faster than Lucene) and greatly reduces memory and CPU usage.
Through steps 601 to 618 above, a multi-Term search can be carried out.
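The segmented loop can be sketched end to end as follows. Several simplifications are assumptions rather than details from the text: Y is fixed at 2 (ordered list, then whole document list), records are (KFScore, Did) tuples, operators are applied left to right with the accumulated result as the left operand, and the merged result is sorted in descending KFScore order once per segment instead of sorting ResultList1 and ResultList2 separately. Single-term queries (step 616) are outside this sketch. All function names and data are hypothetical.

```python
Y = 2  # number of read ranges assumed in this sketch


def read_range(index, term, j):
    """Records of the j-th read range; an absent term yields an empty list."""
    entry = index.get(term)
    if entry is None:
        return []
    ordered, unordered = entry
    return list(ordered) if j == 1 else ordered + unordered


def merge(op, left, right):
    """Apply AND, OR or SUB to two record lists."""
    right_set = set(right)
    if op == "AND":
        return [rec for rec in left if rec in right_set]
    if op == "OR":
        left_set = set(left)
        return left + [rec for rec in right if rec not in left_set]
    if op == "SUB":
        return [rec for rec in left if rec not in right_set]
    raise ValueError(op)


def multi_term_search(index, term_array, set_operators, initial_position, count):
    """Return (search results, SumCount) using segment-by-segment retrieval."""
    last = (initial_position + count) - 1
    result = []
    for j in range(1, Y + 1):
        result = read_range(index, term_array[0], j)
        for term, op in zip(term_array[1:], set_operators):
            result = merge(op, result, read_range(index, term, j))
        result.sort(reverse=True)                 # highest KFScore first
        if len(result) > last or j == Y:          # termination condition of step 613
            break
    return result[initial_position - 1:last], len(result)


# Invented index: term -> (ordered list, unordered list), with n = 2.
index = {
    "a": ([(9, 1), (8, 2)], [(3, 5), (4, 6)]),
    "b": ([(9, 1), (8, 2)], [(4, 6)]),
}
page, sum_count = multi_term_search(index, ["a", "b"], ["AND"], 1, 1)
```

In the usage lines, the first segment already yields two matching records for a request of one, so the second segment is never read.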
The multi-Term search method above is described further below, using a concrete example from the music search domain.
Concrete example two:
For simplicity, this concrete example keeps the basic settings of concrete example one: the inverted index shown in Fig. 5 is built, and the length n of the ordered list in the inverted index is set to 100. In addition, two read ranges are set in the document list of the inverted index (Y = 2), with the ordered list of the document list as the 1st read range and the whole document list as the 2nd read range.
When the search string entered by the user is "Liu Dehua AND providence" and the search result range entered by the user is the 1st to 10th search results, the multi-Term search method can be applied as follows:
Generate a term queue TermArray{Liu Dehua, providence} and a query logic sequence SetOperators{AND} from the search string entered by the user;
Parse the search result range entered by the user to obtain the start position InitialPositon = 1 and the number Count = 10;
Retrieve the inverted index shown in Fig. 5 with each of the generated terms "Liu Dehua" and "providence", and find the terms matching them;
According to the inverted index shown in Fig. 5, the value corresponding to the term "Liu Dehua" is <10000, DocList>, so the total document count in the DocList corresponding to "Liu Dehua" is TotalCount = 10000. The ordered list in the DocList corresponding to "Liu Dehua" is therefore read from the inverted index, and all 100 records in that ordered list are stored in ResultList1;
According to the inverted index shown in Fig. 5, the value corresponding to the term "providence" is <60, DocList>, so the total document count in the DocList corresponding to "providence" is TotalCount = 60. Since 60 < 100, all documents in the DocList corresponding to "providence" are stored in the ordered list. The ordered list in the DocList corresponding to "providence" is therefore read from the inverted index, and all 60 records in that ordered list are stored in ResultList2;
Read the 1st logical symbol from the query logic sequence SetOperators; since it is AND, merge ResultList1 and ResultList2 using AND logic and add the merged result to ResultList;
Among the 60 documents corresponding to "providence", 20 have the singer name "Liu Dehua"; ResultList therefore stores 20 records, and the total number of relevant results is SumCount = 20.
Since all Terms in TermArray have been used and (InitialPositon + Count) - 1 = 1 + 10 - 1 = 10 < SumCount = 20, the search result range is contained in the ordered list and the termination condition is met; Count = 10 records are then read directly from ResultList, starting from the InitialPositon = 1st record, as the search results;
Finally, the search results and the total number of relevant results SumCount = 20 are returned to the user.
Embodiment three: hierarchical search method.
In this embodiment, the hierarchical search method divides the whole search process into multiple layers of search processes according to predefined rules. The search processes of the different layers are strictly distinguished, and the fields searched or the search rules used differ between layers. In general, the scores of an upper layer are all higher than those of the layer below it, and within each layer the results are sorted by the key attributes of the information. In this way the hierarchical search method both preserves the relevance of the search and ensures that important information ranks near the top of the results.
Fig. 7 is a flow chart of the multilayer search method of the invention. As shown in Fig. 7, the multilayer search method may include the following steps:
Step 701, set up a multilayer search process in advance according to predefined search ordering rules, and set the priority of each layer's search process and the hierarchical search score range of each layer's search process.
In this step, the search ordering rules can be preset according to the characteristics of the service. For example, in specific embodiments of the invention, the following search ordering rules can be preset for the music search domain: exact search, full-pinyin search, and word-segmentation search. According to these rules, in the music search domain an exact search is carried out first; if it does not yield enough search results, a full-pinyin search is carried out; if there are still not enough search results, a word-segmentation search is carried out last, so as to obtain as far as possible a sufficient number of the search results the user needs.
A multilayer search process can therefore be set up according to the predefined search ordering rules. For example, in the music search domain the following three layers of search processes can be set: the exact-layer search process, the full-pinyin-layer search process, and the word-segmentation-layer search process, with priorities from high to low in that order.
Furthermore, in specific embodiments of the invention, each layer's search process can be subdivided into several sublayer search processes, each with its own priority. For example, the exact-layer search process can be divided into three sublayer search processes arranged by priority: the song sublayer search process, the singer sublayer search process, and the album sublayer search process. When the exact-layer search process is carried out, the song sublayer search process runs first, then the singer sublayer search process, and finally the album sublayer search process, which completes the exact-layer search process. The full-pinyin-layer and word-segmentation-layer search processes can likewise be divided into these three sublayer search processes; this is not repeated here.
In addition, in specific embodiments of the invention, a hierarchical search score range can be set in advance for each layer's search process to facilitate the sorting of the retrieval results obtained later. For example, when three layers of search processes are set, with priorities from high to low being the first-layer, second-layer, and third-layer search processes, the hierarchical search score range of the first-layer search process can be set to [A1, A2], meaning that the hierarchical search score of every retrieval result obtained in the first-layer search process lies between A1 and A2; the range of the second-layer search process can be set to [B1, B2], and that of the third-layer search process to [C1, C2], where A1 > A2 > B1 > B2 > C1 > C2. The search score of any retrieval result obtained in the first-layer search process is therefore higher than that of any retrieval result obtained in the second-layer or third-layer search process. In this way the relevance of the retrieval results is preserved while results concerning important information are ranked nearer the top.
Step 702, when the user enters a search string, carry out the search of each layer in order of priority.
In this step, when the user enters a search string, the layered search processes are carried out: following the order of priority, the search with the highest priority is carried out first, then the search with the next-highest priority, and so on, until a sufficient number of the search results the user needs has been found.
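The priority-ordered fallback can be sketched as a loop over layers that stops once enough results have been collected. Everything in the sketch is an illustrative assumption: each layer is modelled as a function from the search string to a list of document identifiers, a document found by an earlier layer is not repeated by a later one, and "enough" is taken to mean at least (InitialPositon + Count) - 1 results. The three toy lookup tables stand in for exact, full-pinyin, and word-segmentation search.

```python
def layered_search(layers, query, initial_position, count):
    """Run the layers in priority order until enough results are collected."""
    needed = (initial_position + count) - 1
    results, seen = [], set()
    for _name, search_fn in layers:               # highest priority first
        for did in search_fn(query):
            if did not in seen:                   # keep the higher-priority hit only
                seen.add(did)
                results.append(did)
        if len(results) >= needed:
            break                                 # lower-priority layers are skipped
    return results[initial_position - 1:needed]


# Toy stand-ins for the three layers; the document identifiers are invented.
exact_hits = {"liu": [1, 2]}
pinyin_hits = {"liu": [2, 3]}
segment_hits = {"liu": [4, 5, 6]}
layers = [
    ("exact", lambda q: exact_hits.get(q, [])),
    ("full-pinyin", lambda q: pinyin_hits.get(q, [])),
    ("word-segmentation", lambda q: segment_hits.get(q, [])),
]
top_two = layered_search(layers, "liu", 1, 2)
```

Because results are appended layer by layer, every result from a higher-priority layer precedes every result from a lower one, which mirrors the non-overlapping score ranges described for the layers.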
In the technical solution of the invention, step 702 can be implemented in several specific ways. One of them is taken as an example below to describe the technical solution of the invention.
Fig. 8 is a flow chart of one implementation of step 702 in the invention. As shown in Fig. 8, step 702 may include the following steps:
Step 801, parse the search string entered by the user according to each of the preset layers of search processes, generating a term queue and a query logic sequence for each layer's search process.
In the technical solution of the invention, a multilayer search process has been set up in advance, and the search methods used by the layers are not necessarily the same. In this step the search string entered by the user can therefore be parsed to generate a term queue and a query logic sequence for each layer's search process.
Furthermore, in specific embodiments of the invention, if a generated term queue contains only one term, the query logic sequence corresponding to that term queue is the empty set.
For example, suppose the preset multilayer search process consists of the exact-layer search process, the full-pinyin-layer search process, and the word-segmentation-layer search process, and the search string entered by the user is "Liu Dehua providence". After the search string is parsed, the term queue corresponding to the exact-layer search process contains a single term, "Liu Dehua providence", and its query logic sequence is the empty set; the term queue corresponding to the full-pinyin-layer search process contains a single term, "liudehuatianyi", and its query logic sequence is the empty set; the term queue corresponding to the word-segmentation-layer search process contains two terms, "Liu Dehua" and "providence", and its query logic sequence contains one logical symbol: OR.
Step 802: parse the search result range input by the user to obtain the start position and the number of records.
In this step, the search result range input by the user is parsed to obtain the start position InitialPositon and the number of records Count.
Step 803: according to the preset priorities, take the highest-priority search layer that has not yet been executed as the current search process.
This step determines which search layer is the current search process: according to the preset priorities, it is the highest-priority layer that has not yet been executed. For example, on the first pass the current search process is the layer with the highest priority; on the second pass it is the layer with the second-highest priority; and so on.
Step 804: search the inverted index according to the term queue and query logic sequence corresponding to the current search process, store the retrieval results in the layer result set, and obtain the current layer result count.
Because step 801 has already generated the term queue and query logic sequence for each search layer, this step can search the inverted index directly with the term queue and query logic sequence of the current search process and store the retrieval results in the layer result set (LayerResultList). The number of retrieval results in that set, i.e. the current layer result count (LayerResultCount), is obtained at the same time.
In addition, the current search process in this step may use the multi-Term search mode described above to search the inverted index; the detailed retrieval procedure is not repeated here.
Step 805: according to the KFScore of each retrieval result and the layer search score range of the current search process, calculate the layer search score of each retrieval result in the layer result set.
In specific embodiments of the present invention, the layer search score of each retrieval result in the layer result set may be calculated by a variety of methods.
For example, each retrieval result may be assigned a layer search score according to the layer search score range of the current search process and the position of the result when the layer result set is ordered by KFScore from largest to smallest. Other calculation methods are not described here.
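One possible reading of the rank-based method in step 805 is sketched below. The text does not fix a formula, so the even spacing between the two ends of the range is an assumption made purely for illustration, and `assign_layer_scores` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def assign_layer_scores(results: List[Tuple[str, float]],
                        high: float, low: float) -> Dict[str, float]:
    """Map (doc_id, KFScore) results onto layer scores within [high, low].

    Results are ranked by descending KFScore. The best one receives `high`,
    the worst receives `low`, and the rest are evenly spaced in between
    (an assumed scheme; the source only says the rank order is used).
    """
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    n = len(ranked)
    if n <= 1:
        return {doc_id: high for doc_id, _ in ranked}
    step = (high - low) / (n - 1)
    return {doc_id: high - i * step for i, (doc_id, _) in enumerate(ranked)}


scores = assign_layer_scores([("d1", 0.2), ("d2", 0.9), ("d3", 0.5)], 100.0, 90.0)
```

Because every layer has its own range and the ranges do not overlap, any result from a higher-priority layer keeps a higher layer score than any result from a lower-priority layer.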
Step 806: add the retrieval results in the layer result set to the total result set according to their layer search scores, and delete duplicate retrieval results; calculate the current total result count, and empty the layer result set.
In this step, all retrieval results in the layer result set obtained in step 804 are inserted into the total result set (ResultList) according to their layer search scores, and duplicate retrieval results are deleted. The initial value of the total result set is the empty set, i.e. initially it stores no retrieval results. Therefore, if the current search process is the first search layer, adding the layer result set to the total result set cannot produce duplicates, and the number of duplicate retrieval results is 0.
In addition, in specific embodiments of the present invention, the current total result count (SearchResultCount), i.e. the number of retrieval results in the current total result set, also needs to be calculated. For example, the current total result set may be counted directly; alternatively, the current layer result count may be added to the previously calculated total result count and the number of duplicate retrieval results subtracted, giving the current total result count.
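The merge and the two ways of counting described for step 806 can be sketched as follows. `merge_layer` is a hypothetical name, and keeping the copy that is already in the total result set when a duplicate appears is an assumption; the text only says that duplicates are deleted.

```python
from typing import Dict, List, Tuple

Scored = Tuple[str, float]  # (doc_id, layer search score)


def merge_layer(total: List[Scored],
                layer: Dict[str, float]) -> Tuple[List[Scored], int]:
    """Merge one layer's results into the total result set.

    Returns the merged list, ordered by descending layer score, and the
    number of duplicates that were dropped. A document already present in
    `total` keeps its existing (higher-priority) entry.
    """
    seen = {doc_id for doc_id, _ in total}
    fresh = [(doc_id, score) for doc_id, score in layer.items()
             if doc_id not in seen]
    duplicates = len(layer) - len(fresh)
    merged = sorted(total + fresh, key=lambda r: r[1], reverse=True)
    return merged, duplicates


previous = [("a", 100.0), ("b", 95.0)]
layer_results = {"b": 80.0, "c": 78.0}
merged, duplicates = merge_layer(previous, layer_results)
# Both counting methods from the text agree:
# len(merged) == len(previous) + len(layer_results) - duplicates
```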
Step 807: judge whether the termination condition is met; if so, perform step 808; otherwise, return to step 803.
In specific embodiments of the present invention, the termination condition may be:
the current total result count is greater than the maximum of the search result range, or the current search process is the last search layer.
In the technical solution of the present invention, when the current total result count is greater than the search result range (InitialPositon+Count), a sufficient number of search results has been retrieved, so the condition for ending the search is met. When the current search process is the last search layer, the search can also be ended, so the termination condition is likewise met.
Step 808: read the required number of retrieval results from the total result set according to the search result range and take them as the search results; take the current total result count as the total number of related results (SumCount).
For example, in specific embodiments of the present invention, reading the required number of retrieval results from the total result set according to the search result range as the search results comprises:
when SumCount ≥ (InitialPositon+Count)-1, starting from the InitialPositon-th record in the total result set, read Count records as the search results;
when InitialPositon ≤ SumCount < (InitialPositon+Count)-1, the total result set does not contain enough records; in this case, starting from the InitialPositon-th record in the total result set, read (SumCount-InitialPositon+1) records as the search results;
when SumCount < InitialPositon, the total result set contains too few records and the required search results do not exist; in this case, the empty set may be taken as the search results.
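The three cases of step 808 can be written out explicitly as below. `read_page` is a hypothetical name; the positions are 1-based as in the text, so they are shifted by one for Python's 0-based slicing.

```python
from typing import List, TypeVar

T = TypeVar("T")


def read_page(result_list: List[T], initial_position: int, count: int) -> List[T]:
    """Read the requested range from the total result set (1-based positions)."""
    sum_count = len(result_list)
    start = initial_position - 1
    if sum_count < initial_position:
        # Too few records: the required results do not exist.
        return []
    if sum_count >= initial_position + count - 1:
        # Enough records: read exactly `count` of them.
        return result_list[start:start + count]
    # Not enough records: read the remaining
    # (sum_count - initial_position + 1) records.
    return result_list[start:]


records = list(range(1, 11))  # stand-in for 10 retrieval results
```

For the ten stand-in records, `read_page(records, 8, 5)` falls into the second case and returns the three remaining records, i.e. 10 - 8 + 1.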
Step 809: return the search results and the total number of search results to the user.
Multilayer search can be realized through steps 701 to 702 above.
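Steps 803 to 808 can be put together as the loop sketched below. It is a simplified illustration rather than the patented implementation: `layered_search` is a hypothetical name, each layer is modelled as a callable that returns a mapping from document id to KFScore, the even spacing of layer scores is an assumption, and on duplicates the entry from the earlier layer is kept.

```python
from typing import Callable, Dict, List, Tuple

Layer = Tuple[Callable[[], Dict[str, float]], Tuple[float, float]]


def layered_search(layers: List[Layer], initial_position: int,
                   count: int) -> Tuple[List[str], int]:
    """Run search layers in priority order until enough results exist.

    `layers` is ordered from highest to lowest priority; each entry is
    (search_fn, (high, low)), where search_fn returns {doc_id: KFScore}.
    """
    total: Dict[str, float] = {}                     # doc_id -> layer score
    for search_fn, (high, low) in layers:            # step 803
        hits = search_fn()                           # step 804
        ranked = sorted(hits, key=hits.get, reverse=True)
        step = (high - low) / max(len(ranked) - 1, 1)
        for rank, doc_id in enumerate(ranked):       # step 805
            # step 806: setdefault skips documents found by an earlier layer
            total.setdefault(doc_id, high - rank * step)
        if len(total) > initial_position + count:    # step 807
            break
    result_list = sorted(total, key=total.get, reverse=True)
    start = initial_position - 1                     # step 808
    return result_list[start:start + count], len(total)
```

The last-layer part of the termination condition needs no explicit check here, because the `for` loop simply ends once every layer has run.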
The multilayer searching method is further described below using a specific example from the music search field.
Specific example three:
For simplicity, this example still uses the basic settings of specific example one and the inverted index shown in Fig. 5; the length n of the ordered list in the inverted index is set to 100.
Suppose the search string input by the user is "Liu Dehua providence" and the search result range input by the user is the 1st to 50th search results. The search can then be performed with the multilayer searching method as follows:
In this example, three search layers are first set in advance according to the service characteristics of the music search field: the exact layer, the spelling layer and the word-segmentation layer. Their priorities, from high to low, are: exact layer, spelling layer, word-segmentation layer. Each of these layers is then divided into three sublayer search processes, in descending order of priority: the song sublayer, the singer sublayer and the album sublayer.
According to the above settings, the layer search score range of each search layer is set in advance: [A1, A2] for the exact layer, [B1, B2] for the spelling layer and [C1, C2] for the word-segmentation layer, where A1 > A2 > B1 > B2 > C1 > C2.
After the above settings are complete, the search string "Liu Dehua providence" input by the user is parsed according to each preset search layer, and a term queue and a query logic sequence are generated for each search layer, as follows:
the term queue for the exact layer contains a single term, "Liu Dehua providence", and its query logic sequence is the empty set;
the term queue for the spelling layer contains a single term, "liudehuatianyi", and its query logic sequence is the empty set;
the term queue for the word-segmentation layer contains two terms, "Liu Dehua" and "providence", and its query logic sequence contains one logical operator: AND.
The search result range input by the user is parsed, giving the start position InitialPositon=1 and the number of records Count=50.
Because the song sublayer of the exact layer has the highest priority, the song sublayer search process is taken as the current search process.
In the song sublayer search process, the inverted index shown in Fig. 5 is searched in single-Term search mode with the term "Liu Dehua providence". Because no corresponding term "Liu Dehua providence" is found in the inverted index, empty search results are returned; the total result set is therefore the empty set and the current total result count is 0.
Because the current total result count is less than the search result range and the current search process is not the last search layer, the termination condition is not met and retrieval continues. The highest-priority search process not yet executed is now the singer sublayer search process, which is therefore taken as the current search process for the subsequent retrieval.
Similarly, the singer sublayer and album sublayer search processes also fail to find the term "Liu Dehua providence" in the inverted index and return empty search results, so the termination condition is still not met and the search proceeds to the spelling layer.
Because the term in the spelling layer is "liudehuatianyi", none of the sublayer search processes of the spelling layer finds a corresponding term "liudehuatianyi" in the inverted index, and each returns empty search results. The termination condition is still not met, and the search proceeds to the word-segmentation layer.
In the word-segmentation layer, the song sublayer search process is executed first, then the singer sublayer search process, and finally the album sublayer search process.
Because the word-segmentation layer has two terms, "Liu Dehua" and "providence", and the query logic sequence corresponding to this term queue contains one logical operator, AND, each sublayer search process of the word-segmentation layer can find the corresponding terms in the inverted index and obtain corresponding search results.
When the number of search results obtained in the word-segmentation layer is greater than 50, the search process ends; the required number of retrieval results is read from the total result set according to the search result range as the search results, the total number of search results is calculated, and the search results and the total number of search results are then returned to the user.
If the number of search results is still less than 50 when the word-segmentation layer finishes, the termination condition is also met because the word-segmentation layer is the last search layer; the total number of search results is then calculated, and the search results and the total number of search results are returned to the user.
The technical solution of the present invention also provides a search device. Fig. 9 is a structural schematic diagram of the search device in the present invention. As shown in Fig. 9, the search device comprises: an inverted index generation module 901, a memory module 902, a term generation module 903 and a retrieval module 904.
The inverted index generation module 901 is configured to index all documents to be retrieved with the term (Term) as the keyword, building an inverted index whose key is the Term and whose value is the total number TotalCount of documents containing that Term together with the document list DocList containing that Term. Each record in the document list includes the document number and key attribute score of one document. The document list consists of an ordered list and an unordered list; the ordered list contains the n records with the largest key attribute scores, arranged in descending order of key attribute score, where n is a predetermined value. The module sends the inverted index to the memory module 902.
The memory module 902 is configured to store the inverted index.
The term generation module 903 is configured to generate the corresponding term from the search string input by the user, and to send the term and the search result range input by the user to the retrieval module 904.
The retrieval module 904 is configured to search the inverted index stored in the memory module 902 according to the generated term and, according to the search result range input by the user, to obtain records preferentially from the ordered list corresponding to the generated term, so as to obtain the required search results and the total number of related results; it then sends the search results and the total number of related results to the user.
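The index structure that module 901 produces can be sketched as follows. This is a minimal in-memory illustration, not the device itself: `build_inverted_index` and the dictionary keys `ordered` and `unordered` are hypothetical names, and the input format (document id mapped to its terms and KFScore) is assumed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, float]  # (document number Did, KFScore)


def build_inverted_index(docs: Dict[str, Tuple[Iterable[str], float]],
                         n: int) -> Dict[str, dict]:
    """Build {Term: {TotalCount, ordered, unordered}} from scored documents.

    `ordered` holds the n records with the largest KFScore, in descending
    order; `unordered` holds all remaining records.
    """
    postings: Dict[str, List[Record]] = defaultdict(list)
    for doc_id, (terms, kf_score) in docs.items():
        for term in set(terms):          # count each document once per term
            postings[term].append((doc_id, kf_score))
    index = {}
    for term, records in postings.items():
        # Sorting everything once is the simplest way to pick the top n;
        # the unordered part is not required to stay sorted.
        records.sort(key=lambda r: r[1], reverse=True)
        index[term] = {
            "TotalCount": len(records),
            "ordered": records[:n],
            "unordered": records[n:],
        }
    return index


index = build_inverted_index(
    {"d1": (["a", "b"], 0.5), "d2": (["a"], 0.9), "d3": (["a"], 0.1)}, n=2)
```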
In summary, in the technical solution of the present invention, key attributes and key attribute weights are set for documents of the same class, so the key attribute score of each document can be calculated. The inverted index is then built from these key attribute scores, so that each document list in the inverted index consists of an ordered list and an unordered list, and the ordered list contains the n records with the largest key attribute scores, arranged in descending order of key attribute score. The key attribute scores are computed when the inverted index is built, and the records stored in the inverted index are partially sorted in advance. Consequently, when the inverted index is searched according to the search string input by the user, the search result range input by the user falls within the ordered list in most cases, and the required number of search results can be read directly from the ordered list. Because these results are already arranged in order of KFScore, they do not need to be sorted again. This greatly reduces full-data score calculation and full-data sorting, greatly reduces the number of CPU operations, and significantly improves search speed: search speed improves by an order of magnitude compared with the prior art, and the search response time of the search engine is greatly shortened.
In addition, the inverted index of the present invention stores the key attribute score of each document and does not need to store the factors required to calculate the score. This reduces the number of stored fields and greatly reduces the memory occupied, saving a large amount of hardware and software resources.
In addition, in the inverted index of the present invention, the records in the ordered list are sorted by the key attributes and key attribute scores of the documents, both of which can be set in advance according to the practical application. Text relevance and the relative value of information can therefore be fully taken into account, so that the ordering of the search results is closer to user needs; at the same time, key attributes and key attribute scores can be set flexibly, so that the ordering of search results can be customized on the basis of text relevance.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims (25)

1. A searching method, characterized in that the method comprises:
A. setting at least one key attribute and a corresponding key attribute weight for documents of the same class, and calculating the key attribute score KFScore of each document according to the key attributes and key attribute weights;
B. indexing all documents to be retrieved with the term (Term) as the keyword, building an inverted index whose key is the Term and whose value is the total number TotalCount of documents containing that Term together with the document list DocList containing that Term; each record in the document list includes the document number and key attribute score of one document; the document list consists of an ordered list and an unordered list, the ordered list containing the n records with the largest key attribute scores, arranged in descending order of key attribute score; where n is a predetermined value;
C. generating the corresponding term according to the search string input by the user, searching the inverted index according to the generated term, and, according to the search result range input by the user, obtaining records preferentially from the ordered list corresponding to the generated term, so as to obtain the required search results and the total number of related results.
2. The method according to claim 1, characterized in that the formula for calculating the KFScore of a document is:
$KFScore = KeyField_1 \times W_1 + KeyField_2 \times W_2 + \dots + KeyField_X \times W_X$
where $W_1 + W_2 + \dots + W_X = 1$; $KeyField_x$ denotes the value of the x-th key attribute KeyField of the document, and $W_x$ denotes the value of the key attribute weight Weight corresponding to the x-th KeyField of the document.
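The weighted sum of claim 2 is small enough to show directly. In the sketch below, `kf_score` is a hypothetical name and the attribute values are made-up numbers chosen only to illustrate the arithmetic.

```python
from typing import Sequence


def kf_score(key_fields: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of key attribute values; the weights must sum to 1."""
    if len(key_fields) != len(weights):
        raise ValueError("one weight is required per key attribute")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("key attribute weights must sum to 1")
    return sum(value * weight for value, weight in zip(key_fields, weights))


# Two illustrative key attributes with weights 0.75 and 0.25:
# 80 * 0.75 + 60 * 0.25 = 60 + 15 = 75
example_score = kf_score([80.0, 60.0], [0.75, 0.25])
```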
3. The method according to claim 1, characterized in that
the search result range comprises: a start position InitialPositon and a number of records Count;
the maximum of the search result range is: (InitialPositon+Count)-1.
4. The method according to claim 3, characterized in that obtaining records preferentially from the ordered list corresponding to the generated term according to the search result range input by the user, so as to obtain the required search results, comprises:
when the maximum of the search result range is less than or equal to the length n of the ordered list corresponding to the generated term, starting from the InitialPositon-th record in the ordered list, reading Count records as the search results;
when the start position of the search result range lies in the ordered list corresponding to the generated term and the maximum of the search result range is greater than the length n of the ordered list, starting from the InitialPositon-th record in the ordered list, reading (n-InitialPositon+1) records; arranging the records in the unordered list corresponding to the generated term in order of KFScore, and, starting from the 1st record in the unordered list, reading [Count-(n-InitialPositon+1)] records; taking the Count records read as the search results;
when the start position of the search result range lies in the unordered list corresponding to the generated term, arranging the records in the unordered list in order of KFScore, then, starting from the (InitialPositon-n)-th record in the sorted unordered list, reading Count records as the search results.
5. The method according to claim 3, characterized in that step C further comprises:
when the term generated according to the search string input by the user is not stored in the inverted index, taking the empty set as the search results.
6. The method according to claim 1, characterized in that step C comprises:
c1. generating one corresponding term according to the search string input by the user;
c2. searching the inverted index according to the generated term, and judging whether the term is stored in the inverted index; if so, performing step c3; otherwise, performing step c15;
c3. reading from the inverted index the document list corresponding to the term;
c4. parsing the search result range input by the user to obtain the start position InitialPositon and the number of records Count;
c5. judging whether (InitialPositon+Count)-1 > n holds; if so, performing step c6; otherwise, performing step c12; where n is the length of the ordered list;
c6. judging whether InitialPositon > n holds; if so, performing step c10; otherwise, performing step c7;
c7. starting from the InitialPositon-th record in the ordered list, reading (n-InitialPositon+1) records and adding them to the search result list ResultList;
c8. arranging the records in the unordered list in order of KFScore;
c9. reading [Count-(n-InitialPositon+1)] records from the unordered list and adding them to ResultList; performing step c13;
c10. arranging the records in the unordered list in order of KFScore;
c11. starting from the (InitialPositon-n)-th record in the sorted unordered list, reading Count records and adding them to ResultList; performing step c13;
c12. starting from the InitialPositon-th record in the ordered list, reading Count records and adding them to ResultList; performing step c13;
c13. taking ResultList as the search results, and taking the total number of documents TotalCount of the document list corresponding to the term, read from the inverted index, as the total number of related results;
c14. returning the search results and the total number of related results to the user, and ending the flow;
c15. taking the empty set as the search results, setting the total number of related results to 0, returning the empty set and the total number of related results to the user, and ending the flow.
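Steps c1 to c15 can be sketched as a single function. The index layout used here (a dictionary per term with `TotalCount`, `ordered` and `unordered` entries) and the name `single_term_search` are assumptions for illustration, and n is taken to be the length of the stored ordered list.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, float]  # (Did, KFScore)


def single_term_search(index: Dict[str, dict], term: str,
                       initial_position: int, count: int
                       ) -> Tuple[List[Record], int]:
    """Return (search results, total number of related results) for one term."""
    entry = index.get(term)
    if entry is None:                                  # c2 -> c15
        return [], 0
    ordered, unordered = entry["ordered"], entry["unordered"]
    n = len(ordered)
    last = initial_position + count - 1                # maximum of the range
    if last <= n:                                      # c5 -> c12
        results = ordered[initial_position - 1:last]
    else:
        # Only now is the unordered list sorted (c8 / c10).
        tail = sorted(unordered, key=lambda r: r[1], reverse=True)
        if initial_position <= n:                      # c6 -> c7, c9
            head = ordered[initial_position - 1:]
            results = head + tail[:count - len(head)]
        else:                                          # c6 -> c11
            start = initial_position - n - 1
            results = tail[start:start + count]
    return results, entry["TotalCount"]                # c13


toy_index = {"t": {"TotalCount": 5,
                   "ordered": [("d1", 9.0), ("d2", 8.0), ("d3", 7.0)],
                   "unordered": [("d5", 1.0), ("d4", 3.0)]}}
```

The point of the design shows up in the first branch: a request that falls entirely inside the ordered list is answered with a slice and no sorting at all.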
7. The method according to claim 1, characterized in that step C comprises:
C1. setting Y read-range segments in the document lists of the inverted index, where Y ≥ 2;
C2. generating a term queue TermArray and a query logic sequence SetOperators according to the search string input by the user;
C3. parsing the search result range input by the user to obtain the start position InitialPositon and the number of records Count;
C4. when the length of TermArray is greater than 1, reading the first two terms in TermArray, and setting a variable i with initial value 2 and a variable j with initial value 1;
C5. searching the inverted index with each of the two terms read, reading from the inverted index the records within the j-th read-range segment of the document lists corresponding to the two terms, and storing the records read in a first retrieval result list ResultList1 and a second retrieval result list ResultList2 respectively;
C6. reading the (i-1)-th logical operator in SetOperators;
C7. judging the value of the logical operator read: when the value is AND, performing step C8; when the value is OR, performing step C9; when the value is SUB, performing step C10;
C8. merging ResultList1 and ResultList2 using AND logic, and adding the merged result to the search result list ResultList; performing step C11;
C9. merging ResultList1 and ResultList2 using OR logic, and adding the merged result to ResultList; performing step C11;
C10. merging ResultList1 and ResultList2 using difference logic, and adding the merged result to ResultList; performing step C11;
C11. judging whether any term in TermArray has not yet been used; if so, performing step C12; otherwise, performing step C13;
C12. setting i=i+1 and reading the i-th term in TermArray; emptying ResultList1 and ResultList2, and copying the records in ResultList into ResultList2; searching the inverted index with the term read, reading from the inverted index the records within the j-th read-range segment of the document list corresponding to that term, and storing the records read in ResultList1; returning to step C6;
C13. judging whether the termination condition is met; if so, performing step C16; otherwise, performing step C14;
C14. emptying ResultList1 and ResultList2; reading the first two terms in TermArray, and setting i=2 and j=j+1; searching the inverted index with each of the two terms read, reading from the inverted index the records within the j-th read-range segment of the document lists corresponding to the two terms, and storing them in ResultList1 and ResultList2 respectively;
C15. arranging the records in ResultList1 and ResultList2 respectively in order of KFScore; returning to step C6;
C16. taking the number of records stored in ResultList as the total number of related results;
C17. returning the search results and the total number of related results to the user.
8. The method according to claim 7, characterized in that
when Y=2, the ordered list of the document list is taken as the 1st read-range segment and the whole document list as the 2nd read-range segment.
9. The method according to claim 7, characterized in that
when Y=3, the ordered list of the document list is taken as the 1st read-range segment, the unordered list of the document list as the 2nd read-range segment, and the whole document list as the 3rd read-range segment.
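The read-range segments of claims 8 and 9 can be pictured as a short list of views over one document list. The helper below is a hypothetical illustration only; it assumes the ordered and unordered parts are available as separate Python lists and covers just the two values of Y that these claims define.

```python
from typing import List, TypeVar

T = TypeVar("T")


def read_ranges(ordered: List[T], unordered: List[T], y: int) -> List[List[T]]:
    """Return the Y read-range segments of one document list, in read order.

    Segment j (1-based) is the part of the list read on the j-th pass
    of the multi-term search.
    """
    whole = ordered + unordered
    if y == 2:
        return [ordered, whole]
    if y == 3:
        return [ordered, unordered, whole]
    raise ValueError("only Y=2 and Y=3 are described in claims 8 and 9")
```

In both configurations the first pass touches only the pre-sorted ordered list, so the cheaper read is always tried before the whole list.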
10. The method according to claim 7, characterized in that step C5 further comprises:
if either of the two terms is not stored in the inverted index, setting the retrieval result list corresponding to the term that is not stored to the empty set.
11. The method according to claim 7, characterized in that merging ResultList1 and ResultList2 using AND logic and adding the merged result to ResultList comprises:
comparing the records in ResultList1 and ResultList2 one by one, and adding two records from the two retrieval result lists whose KFScore and document identifier Did are both identical to ResultList as a single search result.
12. The method according to claim 7, characterized in that merging ResultList1 and ResultList2 using OR logic and adding the merged result to ResultList comprises:
adding every record in ResultList1 and ResultList2 to ResultList; if two records have identical KFScore and Did, deleting one of them from ResultList.
13. The method according to claim 7, characterized in that merging ResultList1 and ResultList2 using difference logic and adding the merged result to ResultList comprises:
removing from ResultList1 every record stored in ResultList2, and adding the records remaining in ResultList1 after the removal to ResultList.
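The three merge operations of claims 11 to 13 can be sketched as below. `merge_lists` is a hypothetical name, records are modelled as (Did, KFScore) tuples so that two records are identical exactly when both fields match, and set lookups replace the one-by-one comparison the claims describe.

```python
from typing import List, Tuple

Record = Tuple[str, float]  # (Did, KFScore)


def merge_lists(operator: str, result_list1: List[Record],
                result_list2: List[Record]) -> List[Record]:
    """Merge two retrieval result lists with AND, OR or SUB logic."""
    in_list2 = set(result_list2)
    if operator == "AND":
        # Keep records present in both lists, once each.
        return [r for r in result_list1 if r in in_list2]
    if operator == "OR":
        # Keep every record, dropping the second copy of any duplicate.
        seen, merged = set(), []
        for record in result_list1 + result_list2:
            if record not in seen:
                seen.add(record)
                merged.append(record)
        return merged
    if operator == "SUB":
        # Remove from ResultList1 every record stored in ResultList2.
        return [r for r in result_list1 if r not in in_list2]
    raise ValueError("unknown logical operator: " + operator)


list1 = [("d1", 9.0), ("d2", 8.0)]
list2 = [("d2", 8.0), ("d3", 7.0)]
```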
14. The method according to claim 7, characterized in that step C12 further comprises:
if the i-th term read is not stored in the inverted index, setting ResultList1 to the empty set, then returning to step C6.
15. The method according to claim 7, characterized in that the termination condition is:
the number of records stored in ResultList is greater than the maximum of the search result range; or j equals Y.
16. The method according to claim 7, characterized in that reading the required records from ResultList as the search results comprises:
when SumCount ≥ (InitialPositon+Count)-1, starting from the InitialPositon-th record in ResultList, reading Count records as the search results;
when InitialPositon ≤ SumCount < (InitialPositon+Count)-1, starting from the InitialPositon-th record in ResultList, reading (SumCount-InitialPositon+1) records as the search results;
when SumCount < InitialPositon, taking the empty set as the search results.
17. The method according to claim 1, characterized in that step C comprises:
setting multiple search layers in advance according to predefined search ordering rules, and setting the priority and the layer search score range of each search layer;
when the user inputs a search string, executing the search layers in order of priority.
18. The method according to claim 17, characterized in that setting multiple search layers in advance according to predefined search ordering rules and setting the priority of each search layer comprises:
in the music search field, setting the following three search layers: the exact layer, the spelling layer and the word-segmentation layer; the priorities of the search layers, from high to low, being: exact layer, spelling layer, word-segmentation layer.
19. The method according to claim 18, characterized in that the method further comprises:
dividing each search layer into multiple sublayer search processes, and setting the priority of each sublayer search process.
20. The method according to claim 17, characterized in that setting the layer search score range of each search layer comprises:
when three search layers are set and the priorities of the search layers, from high to low, are: first search layer, second search layer, third search layer,
setting the layer search score range of the first search layer to [A1, A2], the layer search score range of the second search layer to [B1, B2], and the layer search score range of the third search layer to [C1, C2]; where A1 > A2 > B1 > B2 > C1 > C2.
21. The method according to claim 17, characterized in that performing the search of each layer in order of priority when the user inputs a search string comprises:
Z1, parsing the search string input by the user according to each preset layer of the search procedure, and generating, for each layer of the search procedure, a corresponding etymology queue and query logic sequence;
Z2, parsing the search result range input by the user to obtain an initial position and a count;
Z3, according to the predefined priorities, determining the search procedure that has not yet been executed and has the highest priority as the current search procedure;
Z4, retrieving from the index inverted list according to the etymology queue and query logic sequence corresponding to the current search procedure, storing the retrieval results in a layered retrieval result set, and obtaining the total number of retrieval results of the current layer;
Z5, calculating the hierarchical search score of each retrieval result in the layered retrieval result set according to the KFScore of each retrieval result and the hierarchical search score range of the current search procedure;
Z6, adding the retrieval results in the layered retrieval result set to a total retrieval result set in order of hierarchical search score, and deleting duplicate retrieval results; calculating the current total number of retrieval results, and emptying the layered retrieval result set;
Z7, judging whether a termination condition is met; if so, performing step Z8; otherwise, returning to step Z3;
Z8, reading the required number of retrieval results from the total retrieval result set according to the search result range as the search results; taking the current total number of retrieval results as the correlated result sum SumCount;
Z9, returning the search results and the total number of search results to the user.
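Steps Z3 to Z9 can be sketched as a single loop, shown below as an illustration rather than the patented implementation. It makes several assumptions: the parsing steps Z1 and Z2 are taken as already done, `retrieve` is a hypothetical callback standing in for the inverted-list lookup of step Z4, scores are spread linearly across each layer's range by rank, and the loop stops once the total exceeds the end of the requested range or the layers run out.

```python
def layered_search(layers, retrieve, initial_position, count):
    """layers: [(name, high, low)] in priority order.
    retrieve(name) -> [(doc_id, kf_score)] for that layer (hypothetical callback).
    Returns (list of doc_ids for the requested range, SumCount).
    """
    total = []                      # total retrieval result set
    seen = set()                    # doc_ids already present, for de-duplication
    range_end = initial_position + count - 1
    for name, high, low in layers:  # Z3: next unexecuted layer by priority
        # Z4: layered retrieval result set, sorted by KFScore descending
        hits = sorted(retrieve(name), key=lambda hit: hit[1], reverse=True)
        span = max(len(hits) - 1, 1)
        for rank, (doc_id, _kf_score) in enumerate(hits):
            # Z5: assumed linear mapping of rank into [high, low]
            score = high - (high - low) * rank / span
            if doc_id not in seen:  # Z6: merge and drop duplicates
                seen.add(doc_id)
                total.append((doc_id, score))
        total.sort(key=lambda item: item[1], reverse=True)
        if len(total) > range_end:  # Z7: enough results, stop early
            break
    sum_count = len(total)          # Z8: SumCount
    start = initial_position - 1
    page = total[start:start + count]
    return [doc_id for doc_id, _ in page], sum_count  # Z9


# Hypothetical data: doc "d2" matches both the exact and the spelling layer.
LAYERS = [("exact", 300, 201), ("spelling", 200, 101), ("participle", 100, 1)]
HITS = {
    "exact": [("d1", 0.9), ("d2", 0.5)],
    "spelling": [("d2", 0.8), ("d3", 0.4)],
    "participle": [("d4", 0.7)],
}
FIRST_PAGE = layered_search(LAYERS, lambda name: HITS[name], 1, 2)
LATER_PAGE = layered_search(LAYERS, lambda name: HITS[name], 3, 5)
```

In the first call the participle layer is never searched, because two layers already yield more results than the requested range needs; the returned sum therefore counts only the results gathered so far.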
22. The method according to claim 21, characterized in that calculating the hierarchical search score of each retrieval result in the layered retrieval result set according to the KFScore of each retrieval result and the hierarchical search score range of the current search procedure comprises:
Setting a corresponding hierarchical search score for each retrieval result according to the hierarchical search score range of the current search procedure and the position of each retrieval result in the layered retrieval result set when ordered by KFScore from largest to smallest.
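One way to read this step is that a result's rank by KFScore, not the KFScore value itself, determines where it falls inside the layer's range. The sketch below assumes an even linear spacing from the upper bound to the lower bound; the claim does not fix the mapping, so the spacing rule and the function name are assumptions.

```python
def assign_layer_scores(results, high, low):
    """results: [(doc_id, kf_score)]; returns [(doc_id, layer_score)] by rank.

    The best KFScore receives `high`, the worst receives `low`, and the rest
    are spaced evenly in between (an assumed mapping).
    """
    ranked = sorted(results, key=lambda item: item[1], reverse=True)
    if len(ranked) <= 1:
        return [(doc_id, float(high)) for doc_id, _ in ranked]
    step = (high - low) / (len(ranked) - 1)
    return [(doc_id, high - position * step)
            for position, (doc_id, _) in enumerate(ranked)]


# Hypothetical KFScores for three documents in a layer with range [300, 200].
SCORED = assign_layer_scores([("a", 5), ("b", 9), ("c", 7)], 300, 200)
```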
23. The method according to claim 21, characterized in that the termination condition is:
The current total number of retrieval results is greater than the maximum value of the search result range; or the current search procedure is the last layer of the search procedure.
24. The method according to claim 21, characterized in that reading the required number of retrieval results from the total retrieval result set according to the search result range as the search results comprises:
When SumCount ≥ (InitialPositon + Count) - 1, reading Count records as the search results, starting from the InitialPositon-th record in the total retrieval result set;
When InitialPositon ≤ SumCount < (InitialPositon + Count) - 1, reading (SumCount - InitialPositon + 1) records as the search results, starting from the InitialPositon-th record in the total retrieval result set;
When SumCount < InitialPositon, taking the empty set as the search results.
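The three cases amount to reading a 1-based window from the total retrieval result set and truncating it at the end of the set. A short sketch, with `initial_position`, `count` and `sum_count` standing in for the claim's InitialPositon, Count and SumCount (the function name is hypothetical):

```python
def read_page(total_results, initial_position, count):
    """Return the records of a 1-based range from the total retrieval result set."""
    sum_count = len(total_results)
    if sum_count < initial_position:
        return []                                # third case: empty set
    if sum_count >= initial_position + count - 1:
        wanted = count                           # first case: full page
    else:
        wanted = sum_count - initial_position + 1  # second case: partial page
    start = initial_position - 1                 # convert to a 0-based index
    return total_results[start:start + wanted]


RESULTS = ["a", "b", "c", "d", "e"]
```

Python slicing would already truncate at the end of the list; the explicit branches are kept only to mirror the three cases in the claim.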
25. A search device, characterized in that the search device comprises: an inverted list generation module, a storage module, an etymology generation module, and a retrieval module;
The inverted list generation module is configured to set at least one determinant attribute and a corresponding determinant attribute weight for documents of the same class, and to calculate the determinant attribute score KFScore of each document according to the determinant attributes and the determinant attribute weights; to index all documents to be retrieved with the etymology Term as the keyword, establishing an index inverted list whose index key is the Term and whose value is the total number TotalCount of documents containing the Term together with the document list DocList of documents containing the Term; each record in the document list includes the document code and the determinant attribute score of one document; the document list consists of an ordered list and an unordered list, the ordered list comprising the n records with the largest determinant attribute scores, arranged in descending order of determinant attribute score; where n is a predetermined value; and to send the index inverted list to the storage module;
The storage module is configured to store the index inverted list;
The etymology generation module is configured to generate the corresponding etymology according to the search string input by the user, and to send the etymology and the search result range input by the user to the retrieval module;
The retrieval module is configured to retrieve from the index inverted list stored in the storage module according to the generated etymology, and to obtain records preferentially from the ordered list corresponding to the generated etymology according to the search result range input by the user, so as to obtain the required search results and the correlated result sum; and to send the search results and the correlated result sum to the user.
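The index structure described for the inverted list generation module, and the retrieval module's preference for the ordered list, can be sketched as follows. This is an illustration under assumptions: KFScore is taken to be a weighted sum of attribute values (the claim only says it is calculated from the attributes and their weights), the attribute names `plays` and `rating` are hypothetical, and the dictionary layout is one possible encoding of TotalCount and DocList.

```python
import heapq
from collections import defaultdict


def kf_score(attributes, weights):
    """Assumed form of KFScore: weighted sum of determinant attribute values."""
    return sum(weights[name] * value for name, value in attributes.items())


def build_inverted_index(docs, weights, n):
    """docs: {doc_id: (terms, attributes)}. Returns {term: entry}."""
    postings = defaultdict(list)
    for doc_id, (terms, attributes) in docs.items():
        score = kf_score(attributes, weights)
        for term in set(terms):
            postings[term].append((doc_id, score))
    index = {}
    for term, records in postings.items():
        # Ordered list: the n records with the largest KFScore, descending.
        ordered = heapq.nlargest(n, records, key=lambda record: record[1])
        top_ids = {doc_id for doc_id, _ in ordered}
        unordered = [record for record in records if record[0] not in top_ids]
        index[term] = {"TotalCount": len(records),
                       "ordered": ordered, "unordered": unordered}
    return index


def fetch_records(entry, needed):
    """Serve from the ordered list when it suffices; sort the rest only if not."""
    if needed <= len(entry["ordered"]):
        return entry["ordered"][:needed]
    rest = sorted(entry["unordered"], key=lambda record: record[1], reverse=True)
    return (entry["ordered"] + rest)[:needed]


# Hypothetical documents and weights.
DOCS = {
    1: (["love", "song"], {"plays": 10, "rating": 4}),
    2: (["love"], {"plays": 50, "rating": 3}),
    3: (["love"], {"plays": 5, "rating": 5}),
}
INDEX = build_inverted_index(DOCS, {"plays": 1.0, "rating": 2.0}, n=2)
```

Requests that fall within the first n records never touch the unordered list, which is where the claimed saving in sorting work would come from.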
CN201110461128.4A 2011-12-30 2011-12-30 A kind of searching method and device Active CN103186650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110461128.4A CN103186650B (en) 2011-12-30 2011-12-30 A kind of searching method and device

Publications (2)

Publication Number Publication Date
CN103186650A CN103186650A (en) 2013-07-03
CN103186650B CN103186650B (en) 2016-05-25

Family

ID=48677819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110461128.4A Active CN103186650B (en) 2011-12-30 2011-12-30 A kind of searching method and device

Country Status (1)

Country Link
CN (1) CN103186650B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649302A (en) * 2015-10-28 2017-05-10 腾讯科技(深圳)有限公司 Search sequencing method and device
CN106649650B (en) * 2016-12-10 2020-08-18 宁波财经学院 Bidirectional matching method for demand information
CN106909647B (en) * 2017-02-21 2020-01-03 福建榕基软件股份有限公司 Data retrieval method and device
CN109388690A (en) * 2017-08-10 2019-02-26 阿里巴巴集团控股有限公司 Text searching method, inverted list generation method and system for text retrieval
CN107729523A (en) * 2017-10-27 2018-02-23 平安科技(深圳)有限公司 Data service method, electronic installation and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101460949A (en) * 2006-06-01 2009-06-17 微软公司 Indexing documents for information retrieval based on additional feedback fields

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7584221B2 (en) * 2004-03-18 2009-09-01 Microsoft Corporation Field weighting in text searching
US7822752B2 (en) * 2007-05-18 2010-10-26 Microsoft Corporation Efficient retrieval algorithm by query term discrimination
CN101685455B (en) * 2008-09-28 2012-02-01 华为技术有限公司 Method and system of data retrieval
US20110022600A1 (en) * 2009-07-22 2011-01-27 Ecole Polytechnique Federale De Lausanne Epfl Method of data retrieval, and search engine using such a method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant