CN102054007B - Searching method and searching device - Google Patents

Publication number
CN102054007B
Authority
CN
China
Prior art keywords
data item
search condition
group character
search
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009102371861A
Other languages
Chinese (zh)
Other versions
CN102054007A (en
Inventor
童征宇
李晓蕊
刘志云
赵东岩
徐剑波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Peking University
Founder Apabi Technology Ltd
Original Assignee
Peking University
Peking University Founder Group Co Ltd
Beijing Founder Apabi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, Peking University Founder Group Co Ltd, Beijing Founder Apabi Technology Co Ltd
Priority to CN2009102371861A
Publication of CN102054007A
Application granted
Publication of CN102054007B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a searching method and a searching device. The method comprises: grouping the documents in an index library according to the data item values of preset data items, and executing the following steps when search conditions submitted by a user are acquired: determining a first search condition for searching and a second search condition for filtering according to the search attribute information of the data items in the search conditions; searching the index library according to the first search condition to obtain a preliminary search result; looking up the data item values corresponding to the search terms on the data items contained in the second search condition, and generating group-based filters, wherein a group-based filter only allows, or only disallows, documents of specified groups to pass; and filtering the hit documents in the preliminary search result through each filter in turn to obtain the final search result. By converting part of the search conditions into filter conditions, the method reduces the work of searching and merging, saves system resources, and increases processing speed.

Description

A search method and a search device
Technical field
The present invention relates to the field of information processing, and in particular to a search method and a search device that are based on full-text search technology and use group-based filtering to increase search speed.
Background art
In the prior art, a full-text search system allows a user to specify several search conditions at the same time. Each search condition is searched separately as a branch to obtain one set of search results, and the result sets obtained by the branches are then merged to obtain the final search result that satisfies all of the search conditions. The system resource consumption of a search therefore comprises two parts: searching each branch, and merging the branch results into the final result.
At present, a search for several different values of the same data item is generally split into several search conditions. In addition, a complex search condition (such as a search over a specified range of search terms, or a search by a specified search-term prefix) is expanded into a group of ordinary search conditions at search time. As a result, the number of search conditions in a single search may reach hundreds or even thousands. System resource consumption grows with the number of split-out search conditions, which further aggravates the performance problems of full-text search.
To address the system performance problems of multi-condition search, one can consider improving the search performance of each branch, improving the processing of complex search expressions, and so on. However, the performance gain from such improvements is very limited.
Patent application No. 200610083172.5 discloses a data integration service system and method, in which the query condition entered by the user is converted into a numerical range, the range is matched against the data ranges provided by pre-stored data sources, and a query request is sent to the corresponding data source to obtain the query result. This method directly hard-codes part of the search request as a filter function, and the search results are computed and verified one by one by the filter function. The filtering in this method is computationally expensive and processes a large amount of data. In addition, the method cannot flexibly support dynamic handling of the user's search requests during the search.
Moreover, the existing search methods based on filter conditions all operate on the whole document set: the filter condition is bound to all documents in the index database. A filter based on all documents is slow to create, large in data volume, and consumes a large amount of memory. In practice, it is often necessary to filter on some feature of the documents; when the feature has many values, the search condition tends to become so long that it causes performance and transmission problems, and a document-based filter can hardly meet this filtering requirement. Serious performance problems therefore remain when filters based on all documents are used in a multi-user, highly concurrent full-text search system.
Summary of the invention
Embodiments of the invention provide a search method and a search device to solve the system performance problems of multi-condition search in the prior art, such as high system resource overhead and slow processing.
A search method comprises: grouping the documents in an index database according to the data item values of preset data items, and performing the following steps when a search condition submitted by a user is obtained:
determining, according to the search attribute information of the data items in the search condition, a first search condition used for searching and a second search condition used for filtering;
searching the index database with the first search condition to obtain a preliminary search result; and looking up the data item values corresponding to the search terms on the data items contained in the second search condition, and generating group-based filters, wherein a group-based filter only allows, or only disallows, documents of specified groups to pass;
filtering the hit documents in the preliminary search result through each of the filters in turn to obtain the final search result.
A search device comprises:
a grouping module, configured to group the documents in the index database according to the data item values of preset data items;
a separation module, configured to obtain the search condition submitted by the user and to determine, according to the search attribute information of the data items in the search condition, a first search condition used for searching and a second search condition used for filtering;
a search module, configured to search the index database with the first search condition determined by the separation module to obtain a preliminary search result;
a generation module, configured to look up the data item values corresponding to the search terms on the data items contained in the second search condition determined by the separation module, and to generate group-based filters, wherein a group-based filter only allows, or only disallows, documents of specified groups to pass;
a filtering module, configured to filter the hit documents in the preliminary search result through each of the group-based filters generated by the generation module in turn to obtain the final search result.
With the search method and search device provided by the embodiments of the invention, the documents in the index database are grouped according to the data item values of preset data items. When a search is required, a first search condition used for searching and a second search condition used for filtering are determined according to the search attribute information of the data items in the search condition submitted by the user. The search corresponding to the second search condition is then converted into a filtering process: the index database is searched with the first search condition to obtain a preliminary search result; the data item values corresponding to the search terms on the data items contained in the second search condition are looked up, and group-based filters are generated that only allow, or only disallow, documents of specified groups to pass; and the hit documents in the preliminary search result are filtered through each filter in turn to obtain the final search result. By converting part of the search conditions into filter conditions, the method reduces the complexity of searching and of merging search results, thereby saving system resources and increasing processing speed.
Description of drawings
Fig. 1 is a schematic diagram of the correspondence between group identifiers and document identifiers in an embodiment of the invention;
Fig. 2 is a schematic diagram of the membership relation between data item values and documents in an embodiment of the invention;
Fig. 3 is a flowchart of the search method in an embodiment of the invention;
Fig. 4 is a schematic diagram of the principle of the search method in an embodiment of the invention;
Fig. 5 is a schematic structural diagram of the search device in an embodiment of the invention.
Embodiment
In a full-text search system, a search condition submitted by a user consists of a data item (Field) and a search term on that data item. According to the search attribute information of the data item itself, that is, its word-segmentation property, full-text search comprises two search modes:
The first is a search performed on a data item that is indexed after word segmentation.
This search mode requires that the data item value of a hit document on the indexed data item contain the search term. The relationship between a hit document and the search condition can be expressed by a relevance score, which is a floating-point number in [0, 1].
The second is a search performed on a data item that is indexed directly, without word segmentation.
This search mode requires that the data item value of a hit document on the indexed data item be exactly equal to the search term, or fall within the range specified by the search condition. The relevance can only be 0 or 1, with no intermediate value. This kind of search therefore divides the documents in the index database into two disjoint sets: the documents that satisfy the search condition and the documents that do not. A filter is equivalent to this kind of search: after filtering, a document either satisfies or does not satisfy the search condition. Therefore, for a data item whose search attribute information is "indexed directly without word segmentation", a search condition containing that data item can be converted into a filter based on that data item, namely the group-based filter described below.
The search method provided by the embodiments of the invention combines searching with filtering to search the index database.
First, the documents in the index database are grouped according to preset data items (Field), and for each data item value of a preset data item a correspondence is established with the document identifiers of the documents containing that value. Specifically, this comprises the correspondence between data item values and group identifiers, and the correspondence between each group identifier and the document identifiers of the documents in the index database that contain the data item value.
The preset data items in the index database are obtained, a group identifier (GroupID) is assigned to each data item value, and the correspondence between each data item value and its group identifier is stored. The correspondence between data item values and group identifiers may also be known in advance. For example, the data items may include newspaper name, publication date and so on, and each data item may have several different data item values; the data item "newspaper name", for instance, includes data item values such as People's Daily and Jurisprudence Daily.
All documents in the index database are then assigned to groups according to their data item values, that is, all documents containing the same data item value are placed in one group. Documents are thus assigned to different groups by establishing the correspondence between group identifiers and document identifiers. A document may belong to one or more groups, depending on the data item values it contains. For example, the documents containing the data item value "People's Daily" are placed in one group, whose group identifier is GroupID 1; the documents containing the data item value "Jurisprudence Daily" are placed in another group, whose group identifier is GroupID 2; and so on. Fig. 1 shows the correspondence between document identifiers and group identifiers, in which each GroupID corresponds to several document identifiers (DocID).
Through the correspondence between data item values and group identifiers and the correspondence between group identifiers and document identifiers, the correspondence between each data item value and the document identifiers of the documents in the index database is established. Both correspondences are saved in a grouping file. In other words, the grouping file provides the ability to look up a group identifier by data item value, and also the ability to quickly look up group identifiers by document identifier. Fig. 2 shows the membership relation between data item values and documents, in which the group of each data item value (Field value) contains several documents (Doc).
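The grouping file described above can be pictured as three lookup tables. The following is a minimal in-memory sketch, not the patented storage format: the class name GroupingFile, its method names, and the choice to keep one instance per preset data item and to number GroupIDs from 1 in order of first appearance are all assumptions made for illustration.

```python
from collections import defaultdict


class GroupingFile:
    """Hypothetical in-memory stand-in for the grouping file of ONE preset data item."""

    def __init__(self):
        self.value_to_group = {}                # data item value -> GroupID
        self.group_to_docs = defaultdict(set)   # GroupID -> {DocID, ...}
        self.doc_to_groups = defaultdict(set)   # DocID -> {GroupID, ...}

    def add_document(self, doc_id, values):
        """Place a document in the group of every data item value it contains."""
        for value in values:
            # Assumption: a new GroupID is assigned the first time a value is seen.
            group_id = self.value_to_group.setdefault(value, len(self.value_to_group) + 1)
            self.group_to_docs[group_id].add(doc_id)
            self.doc_to_groups[doc_id].add(group_id)

    def group_of_value(self, value):
        """Look up the GroupID by data item value (None if the value is unknown)."""
        return self.value_to_group.get(value)

    def groups_of_doc(self, doc_id):
        """Look up the GroupIDs by DocID (empty set if the document is unknown)."""
        return set(self.doc_to_groups.get(doc_id, ()))


papername = GroupingFile()
papername.add_document(101, ["People's Daily"])
papername.add_document(102, ["Jurisprudence Daily"])
papername.add_document(103, ["People's Daily"])
```

With these three documents, "People's Daily" receives GroupID 1 and "Jurisprudence Daily" receives GroupID 2, matching the example in the text. Keeping the DocID-to-GroupID table alongside the other two is what makes the later filtering step a pure lookup.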
After a search condition submitted by the user is obtained, the search is carried out in the index database according to the flow shown in Fig. 3, whose principle is shown in Fig. 4. The steps are as follows:
Step S1: separate, from the search conditions submitted by the user, the first search condition used for searching and the second search condition used for filtering.
Specifically, the first search condition and the second search condition are determined according to the search attribute information of the data items in the search conditions submitted by the user. According to the search attribute information preset for each data item in the index database, some or all of the search conditions whose data items are searched by means of an index created directly without word segmentation are taken as second search conditions, and the remaining search conditions are taken as first search conditions.
After the user submits search keywords, each resulting search condition consists of a data item and a search term on that data item. The search attribute information of each data item is preset in the index database, that is, whether the data item is searched through an index created after word segmentation or through an index created directly without word segmentation. The search attribute information of the data item in each search condition can therefore be looked up and used to distinguish the search conditions, yielding the first and second search conditions.
The user may submit several search keywords, so several search conditions may result; the distinction above is made for the case of multiple search conditions and prepares for the subsequent full-text search that combines searching with filtering. There may be more than one first search condition and more than one second search condition. Once they have been distinguished, the search corresponding to each second search condition can be converted into a filtering process: the group-based filters generated below replace part of the search conditions and thereby simplify the search.
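Step S1 amounts to partitioning the search conditions by a per-field attribute. The sketch below is illustrative only: the Condition type, the SEGMENTED table and its field names are hypothetical, and it converts every eligible condition into a second search condition, whereas the text allows converting only some of them.

```python
from typing import NamedTuple


class Condition(NamedTuple):
    field: str   # data item (Field)
    term: str    # search term on that data item


# Hypothetical search attribute information: True means the data item is
# indexed after word segmentation, False means it is indexed directly.
SEGMENTED = {"title": True, "author": True, "papername": False, "date": False}


def split_conditions(conditions, segmented=SEGMENTED):
    """Return (first, second): conditions to search with and conditions to filter with."""
    first, second = [], []
    for cond in conditions:
        # Assumption: a data item with no recorded attribute is searched normally.
        if segmented.get(cond.field, True):
            first.append(cond)
        else:
            second.append(cond)
    return first, second


first, second = split_conditions([
    Condition("title", "olympics"),
    Condition("papername", "People's Daily"),
    Condition("date", "20080808"),
])
```

Here the title condition stays a first search condition, while the newspaper name and date conditions become second search conditions.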
Step S2: search the index database with the determined first search condition to obtain the preliminary search result.
The index database is searched according to the first search condition determined above, giving the preliminary search result.
When there is only one first search condition, the search directly yields the preliminary search result. When there are several first search conditions, each is searched separately and the results are merged to obtain the preliminary search result: after each condition has produced its own hit documents, the documents hit by all of the first search conditions are the documents of the merged preliminary search result.
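The merge in step S2 keeps only the documents hit by every first search condition, which is a set intersection. A minimal sketch, assuming each branch returns a collection of document identifiers (the function name merge_hits is hypothetical, and relevance scores are ignored here):

```python
def merge_hits(hit_sets):
    """Intersect the per-condition hit sets to form the preliminary search result."""
    hit_sets = sorted(hit_sets, key=len)  # start from the smallest set to keep work low
    if not hit_sets:
        return set()
    result = set(hit_sets[0])
    for hits in hit_sets[1:]:
        result.intersection_update(hits)
        if not result:
            break  # no document can satisfy all conditions any more
    return result


preliminary = merge_hits([{1, 2, 3, 4}, {2, 3, 4, 9}, {3, 2, 7}])
```

Every second search condition that is turned into a filter removes one such branch and one such merge, which is where the method saves work.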
Step S3: look up the data item values corresponding to the search terms on the data items contained in the second search condition, and generate group-based filters. A group-based filter only allows, or only disallows, documents of specified groups to pass. Specifically:
First, the data item values in the index database that correspond to the search terms on the data items contained in the second search condition are looked up, and filtering information corresponding to the second search condition is generated. The filtering information comprises the data item to be filtered, the corresponding filter type, and the format and value range of the filter values on that data item.
The format of the filter values on the data item to be filtered may be one or more of: data item values, and the group identifiers corresponding to data item values.
Then, either a previously cached group-based filter is found directly from the filtering information, or the effective group identifiers are determined from the filter type, the format and the value range in the filtering information, and a filter is generated that only allows, or only disallows, the documents corresponding to the effective group identifiers to pass. Allowing or disallowing only the documents of the effective group identifiers is what implements allowing or disallowing only the documents of specified groups.
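The step from filtering information to a group-based filter can be sketched as follows. The FilterInfo and GroupFilter types and the build_filter function are hypothetical; the format names "value" and "index" and the operations "include" and "exclude" are borrowed from the XML example later in the text, and silently skipping values that have no group is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterInfo:
    field: str        # data item to be filtered
    fmt: str          # "value": data item values, "index": group identifiers
    operation: str    # filter type: "include" or "exclude"
    values: tuple     # data item values or GroupIDs, depending on fmt


@dataclass(frozen=True)
class GroupFilter:
    field: str
    operation: str
    effective_groups: frozenset


def build_filter(info, value_to_group):
    """Determine the effective GroupIDs and wrap them in a group-based filter.

    value_to_group maps each data item to its {data item value: GroupID} table.
    """
    if info.fmt == "value":
        table = value_to_group.get(info.field, {})
        # Assumption: values that belong to no group are ignored.
        groups = {table[v] for v in info.values if v in table}
    elif info.fmt == "index":
        groups = set(info.values)  # GroupIDs are already given
    else:
        raise ValueError(f"unsupported filter value format: {info.fmt!r}")
    return GroupFilter(info.field, info.operation, frozenset(groups))


tables = {"date": {"20080808": 7, "20090808": 8}}
date_filter = build_filter(
    FilterInfo("date", "value", "include", ("20080808", "20090808")), tables
)
```

Because both types are frozen and hashable, a FilterInfo can double as the key under which a finished filter is cached.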
Step S4: filter the hit documents in the preliminary search result through each of the generated group-based filters in turn to obtain the final search result.
First, the group identifiers of the hit documents are determined from the document identifiers of the hit documents in the preliminary search result.
For each document in the preliminary search result, the correspondence between group identifiers and document identifiers stored in the grouping file is queried by document identifier to determine the document's group identifiers. A document that belongs to more than one group has more than one group identifier.
Then, the determined group identifiers are filtered through each filter in turn. Each filter only allows, or only disallows, its effective group identifiers, and the documents that pass every filter form the final search result.
The documents of the preliminary search result are fed in turn into the generated group-based filters. For a filter that defines the effective group identifiers allowed to pass, a document passes if at least one of its group identifiers is such an effective group identifier, and otherwise does not pass. For a filter that defines the effective group identifiers not allowed to pass, a document does not pass if all of its group identifiers are such effective group identifiers, and otherwise passes. In other words, a group-based filter filters by looking up the document's group identifiers and returns whether or not they are present.
When there are several filters, the output of one filter is fed into the next. After all filters have been applied, the final search result is obtained. That is, the preliminary search result passes through the filters one after another; a document that fails any one filter is rejected, and the documents that pass all filters form the final search result.
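The pass rules of step S4 can be sketched as below. The GroupFilter type and function names are hypothetical. Two assumptions go beyond the text: a document's group identifiers are looked up per data item, so that a filter on one data item only sees the groups of that data item, and a document with no group on the filtered data item fails an include filter but passes an exclude filter.

```python
from typing import NamedTuple


class GroupFilter(NamedTuple):
    field: str                   # data item the filter is based on
    operation: str               # "include" or "exclude"
    effective_groups: frozenset  # effective GroupIDs


def passes(doc_groups, flt):
    """Decide whether one document passes one filter.

    doc_groups maps each data item to the set of the document's GroupIDs.
    """
    groups = doc_groups.get(flt.field, set())
    if flt.operation == "include":
        # Passes if at least one of its groups is allowed.
        return any(g in flt.effective_groups for g in groups)
    if flt.operation == "exclude":
        # Blocked only if it has groups and all of them are disallowed.
        return not (groups and all(g in flt.effective_groups for g in groups))
    raise ValueError(f"unknown filter type: {flt.operation!r}")


def apply_filters(preliminary_hits, doc_to_groups, filters):
    """Feed the output of each filter into the next one."""
    result = list(preliminary_hits)
    for flt in filters:
        result = [d for d in result if passes(doc_to_groups.get(d, {}), flt)]
    return result


doc_to_groups = {
    101: {"papername": {1}, "date": {7}},
    102: {"papername": {3}, "date": {7}},
    103: {"papername": {1}, "date": {8}},
    104: {"papername": {1}, "date": {9}},
}
filters = [
    GroupFilter("papername", "exclude", frozenset({2, 3, 4, 5})),
    GroupFilter("date", "include", frozenset({7, 8})),
]
final = apply_filters([101, 102, 103, 104], doc_to_groups, filters)
```

In this example document 102 is rejected by the newspaper name filter and document 104 by the date filter, leaving 101 and 103 as the final search result. Each check is a set membership test on a handful of GroupIDs, with no further access to the index.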
In step S3 above, looking up the data item values corresponding to the search terms on the data items contained in the second search condition and generating group-based filters may proceed in either of the following two cases:
Case one: the format of the filter values contained in the generated filtering information is data item values.
In this case, looking up the data item values in the index database that correspond to the search terms on the data items contained in the second search condition, and generating the filtering information corresponding to the second search condition, specifically comprises:
A search term on a data item contained in the second search condition corresponds directly to a data item value on that data item in the index database; the two are identical. The data item values in the index database can therefore be looked up by the search term to obtain the data item value that matches the search term.
Preferably, several data item values, or value ranges of data item values, that belong to the same data item are merged beforehand to reduce the number of filters generated and thus further save system resources.
Once the data item values matching the search terms have been found, filtering information can be generated directly whose filter value format is data item values and whose value range is the specified data item values or range of data item values.
The filtering information is generally generated at the front end. After it is sent to the back end, the back end can check whether a cached filter exists for the filtering information and, if so, use it directly; otherwise it determines the effective group identifiers from the filtering information and generates a group-based filter.
In this case, determining the effective group identifiers from the filtering information and generating the filter specifically comprises:
According to the specified data item values or range of data item values contained in the filtering information, the correspondence between data item values and group identifiers is looked up to obtain the effective group identifiers. That is, the correspondence stored beforehand in the grouping file is looked up to obtain the group identifiers of the specified data item values, or of the data item values that fall within the specified range.
A group-based filter is then generated from the determined effective group identifiers and the filter type of the data item to which they correspond.
The filter types include filtering that includes data item values and filtering that excludes data item values. Correspondingly, the generated filter is either one that only allows the group identifiers of the specified data item values (the effective group identifiers) to pass, or one that only disallows them. Inverting a filter of one type yields a filter of the other type.
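For case one with a range of data item values, the effective group identifiers are those of all values that fall inside the range. One way to sketch this lookup, assuming the values of a data item are kept in sorted order next to their GroupIDs (the function name and the two parallel lists are hypothetical, not the grouping file layout of the patent):

```python
from bisect import bisect_left, bisect_right


def groups_in_range(sorted_values, group_ids, low, high):
    """Return the GroupIDs of all data item values v with low <= v <= high.

    sorted_values: the data item values of one data item, in ascending order.
    group_ids:     the GroupID of each value, in the same order.
    """
    start = bisect_left(sorted_values, low)
    stop = bisect_right(sorted_values, high)
    return set(group_ids[start:stop])


dates = ["20080807", "20080808", "20080809", "20090808"]
date_groups = [4, 5, 6, 7]
august_2008 = groups_in_range(dates, date_groups, "20080808", "20080831")
```

A range condition that would otherwise expand into one search condition per date is thus reduced to a single set of effective GroupIDs.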
Case two: the format of the filter values contained in the generated filtering information is group identifiers.
In this case, looking up the data item values in the index database that correspond to the search terms on the data items contained in the second search condition, and generating the filtering information corresponding to the second search condition, specifically comprises:
According to the search terms on the data items contained in the second search condition, the data item values in the index database are looked up to obtain the data item values that match the search terms.
Preferably, several data item values, or value ranges of data item values, that belong to the same data item are merged beforehand.
According to the data item values found, the correspondence between data item values and group identifiers is looked up to determine the corresponding group identifiers, and filtering information is generated whose filter value format is group identifiers and whose value range is the specified group identifiers or range of group identifiers. That is, the correspondence stored beforehand in the grouping file is looked up to obtain the group identifiers of the specified data item values, or of the data item values that fall within the specified range.
Likewise, the back end can check whether a cached filter exists for the filtering information and, if so, use it directly; otherwise it determines the effective group identifiers from the filtering information and generates a group-based filter.
In this case, determining the effective group identifiers from the filtering information and generating the filter specifically comprises:
The group identifiers or range of group identifiers contained in the filtering information are read directly and taken as the effective group identifiers.
A group-based filter is then generated from the effective group identifiers and the filter type of the data item to which they correspond.
Preferably, the filters generated in case one and case two are stored or cached so that they can be used directly when needed again, avoiding repeated creation. Cached filters can be updated and replaced according to a least recently used (LRU) cache policy.
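The LRU caching of filters mentioned above can be sketched with an ordered dictionary. The FilterCache class, its capacity parameter and the idea of keying the cache by a hashable form of the filtering information are assumptions for illustration; the text only states that an LRU policy is used.

```python
from collections import OrderedDict


class FilterCache:
    """Hypothetical LRU cache that maps filtering information to a built filter."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()  # least recently used entry comes first

    def get_or_create(self, key, factory):
        """Return the cached filter for key, building it with factory() on a miss."""
        if key in self._entries:
            self._entries.move_to_end(key)     # mark as most recently used
            return self._entries[key]
        flt = factory()
        self._entries[key] = flt
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used filter
        return flt


cache = FilterCache(capacity=2)
key = ("papername", "index", "exclude", (2, 3, 4, 5))
flt = cache.get_or_create(key, lambda: frozenset({2, 3, 4, 5}))
```

A second call with the same key returns the stored filter without invoking the factory again.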
Generally, the filtering information is generated at the front end of the search system, while the filter is generated from the filtering information at the back end. In case one, the front end does only simple processing before sending the information to the back end; in case two, the front end does more of the processing, which relatively reduces the processing load on the back end.
The generated filtering information may contain filtering information in only one of the formats of case one and case two, or a combination of both. For example:
When searching, the user specifies several search keywords involving several data items, such as newspaper name (papername), publication date (date), article title, author and publication region, so that several search conditions are generated. The search attribute information of the data items for publication date and newspaper name is "searched on an index created directly without word segmentation", so these two search conditions can be used as second search conditions and the corresponding searches converted into filters.
The corresponding data item values in the index database are then found according to the search terms on the two data items newspaper name and publication date. An example of the generated filtering information in XML format follows; it contains the data item (field), the format and value range of the filter values (data item values and their range, or group identifiers and their range), and the filter type.
<Filters>
<Filter field="papername" format="index" operation="exclude">2-5,9-20</Filter>
<Filter field="date" format="value" operation="include">20080808</Filter>
<Filter field="date" format="value" operation="include">20090808</Filter>
</Filters>
In this example, the data items are newspaper name (papername) and publication date (date). The filter value formats are data item values (value), here 20080808 and 20090808, and group identifiers (index), here the ranges 2-5 and 9-20. The filter types are include and exclude. The filter generated from this filtering information is a filter based on the two groupings newspaper name and publication date; alternatively, two filters can be generated, one based on newspaper name and one based on publication date, and applied in turn.
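On the receiving side, filtering information of this shape can be read with a standard XML parser. The sketch below is an assumption about how a back end might do it, not part of the patent: the function names are hypothetical, the attribute quotes are written as plain double quotes, and entries that share the same data item, format and filter type are merged so that one filter is built per combination.

```python
import xml.etree.ElementTree as ET

FILTER_XML = """
<Filters>
  <Filter field="papername" format="index" operation="exclude">2-5,9-20</Filter>
  <Filter field="date" format="value" operation="include">20080808</Filter>
  <Filter field="date" format="value" operation="include">20090808</Filter>
</Filters>
"""


def expand_group_ranges(text):
    """Turn a list such as '2-5,9-20' into a set of GroupIDs (ranges are inclusive)."""
    groups = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            groups.update(range(int(low), int(high) + 1))
        elif part:
            groups.add(int(part))
    return groups


def parse_filters(xml_text):
    """Return {(field, format, operation): set of filter values}."""
    merged = {}
    for node in ET.fromstring(xml_text.strip()).iter("Filter"):
        key = (node.get("field"), node.get("format"), node.get("operation"))
        text = (node.text or "").strip()
        # "index" entries hold GroupID ranges; "value" entries hold one data item value.
        values = expand_group_ranges(text) if key[1] == "index" else {text}
        merged.setdefault(key, set()).update(values)
    return merged


parsed = parse_filters(FILTER_XML)
```

For the example above this yields one exclude entry on papername covering GroupIDs 2 to 5 and 9 to 20, and one include entry on date holding both date values.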
Preferably, if the user enters search keywords by typing them directly, the processing described for the two cases above applies. If instead the user enters search keywords by selection, that is, the system presents several search keyword options and the user only needs to pick the desired ones, then each option that is a data item value of a preset data item can be bound in advance to its corresponding group identifier. Once the user selects such a search term (data item value), its group identifier is obtained directly, without looking up the correspondence between data item values and group identifiers again.
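The difference between the two input modes can be illustrated with a small sketch. The option list, the lookup table and the function names below are all hypothetical; the text only states that a preset option can carry its group identifier with it.

```python
# Hypothetical lookup table consulted when the user types a keyword.
VALUE_TO_GROUP = {
    ("papername", "Daily News"): 1,
    ("papername", "Evening Post"): 2,
}

# Hypothetical option list: every selectable value is already bound to its group identifier.
PRESET_OPTIONS = [
    {"field": "papername", "value": "Daily News", "group_id": 1},
    {"field": "papername", "value": "Evening Post", "group_id": 2},
]

def group_id_from_option(option):
    """Selection mode: the group identifier travels with the option, no lookup needed."""
    return option["group_id"]

def group_id_from_typed(field, term):
    """Direct-input mode: look the term up; returns None if it matches no data item value."""
    return VALUE_TO_GROUP.get((field, term))
```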
Based on the search method of the present invention described above, a search device can be constructed. As shown in Figure 4, it comprises a grouping module 10, a separation module 20, a retrieval module 30, a generation module 40 and a filtering module 50.
The grouping module 10 is configured to group the documents in the index database according to the data item values of a preset data item.
Specifically, the grouping module 10 establishes the correspondence between data item values and group identifiers, and the correspondence between each group identifier and the document identifiers of the documents that contain the data item value, thereby grouping the documents in the index database.
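The two correspondences maintained by the grouping module can be sketched as two dictionaries. This is an illustration under assumptions: the function `build_groups`, the numbering of group identifiers from 1, and the sample documents are not taken from the patent.

```python
from collections import defaultdict

def build_groups(documents, field):
    """Group documents by the value of one preset data item.

    documents: dict mapping document identifier -> dict of data item values.
    Returns (value -> group identifier, group identifier -> set of document identifiers).
    """
    value_to_group = {}
    group_to_docs = defaultdict(set)
    for doc_id, fields in sorted(documents.items()):
        value = fields[field]
        # Assign the next free group identifier the first time a value is seen.
        group_id = value_to_group.setdefault(value, len(value_to_group) + 1)
        group_to_docs[group_id].add(doc_id)
    return value_to_group, dict(group_to_docs)

docs = {
    101: {"papername": "Daily News"},
    102: {"papername": "Evening Post"},
    103: {"papername": "Daily News"},
}
value_to_group, group_to_docs = build_groups(docs, "papername")
```

With these tables, a filter only needs to reason about a handful of group identifiers instead of every document identifier.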
The separation module 20 is configured to obtain the search condition submitted by the user and, according to the search attribute information of the data items in the search condition, determine a first search condition used for retrieval and a second search condition used for filtering.
Specifically, according to the search attribute information preset for each data item in the index database, the separation module 20 takes as the second search condition some or all of the search conditions whose data items have search attribute information indicating retrieval on an index created directly without word segmentation, and takes the remaining search conditions as the first search condition.
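A minimal sketch of this separation step follows. The attribute table `FIELD_ATTRIBUTES` and its labels are hypothetical, and the sketch moves all eligible conditions to the filtering side, although the text allows moving only some of them.

```python
# Hypothetical search attribute information configured per data item.
FIELD_ATTRIBUTES = {
    "papername": "untokenized",  # indexed directly, without word segmentation
    "date": "untokenized",
    "title": "tokenized",
    "author": "tokenized",
}

def split_conditions(conditions):
    """Split (field, term) pairs into first (retrieval) and second (filtering) conditions."""
    first, second = [], []
    for field, term in conditions:
        if FIELD_ATTRIBUTES.get(field) == "untokenized":
            second.append((field, term))
        else:
            first.append((field, term))
    return first, second

first, second = split_conditions([
    ("title", "olympics"),
    ("date", "20080808"),
    ("papername", "Daily News"),
])
```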
The retrieval module 30 is configured to search the index database with the first search condition determined by the separation module 20 and obtain a preliminary search result.
Specifically, when there is more than one first search condition, the retrieval module 30 searches the index database with each first search condition separately and merges the results obtained for each first search condition into the preliminary search result.
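The retrieve-then-merge step might look like the sketch below. The text does not say which Boolean operator combines the conditions, so the use of intersection (AND) here is an assumption, as are the toy `index` structure and the function names.

```python
def search(index, field, term):
    """Return the set of document identifiers hit by one search condition."""
    return set(index.get((field, term), ()))

def preliminary_result(index, first_conditions):
    """Search with each first search condition and merge the hits.

    Assumption: the conditions are combined with AND, so merging is set intersection.
    """
    hit_sets = [search(index, field, term) for field, term in first_conditions]
    if not hit_sets:
        return set()
    return set.intersection(*hit_sets)

# Hypothetical inverted index: (data item, term) -> document identifiers.
index = {
    ("title", "olympics"): [101, 102, 103],
    ("author", "li"): [102, 103, 104],
}
prelim = preliminary_result(index, [("title", "olympics"), ("author", "li")])
```

Every additional retrieval condition adds one more hit set to merge, which is the cost the method reduces by turning some conditions into filters.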
The generation module 40 is configured to look up the data item values corresponding to the search terms on the data items contained in the second search condition determined by the separation module 20, and to generate group-based filters. A group-based filter only allows, or only disallows, documents of the set groups to pass.
Preferably, the generation module 40 comprises an information processing unit 401 and a determination and generation unit 402.
The information processing unit 401 is configured to look up, in the index database, the data item values corresponding to the search terms on the data items contained in the second search condition determined by the separation module 20, and to generate filtering information corresponding to the second search condition. The filtering information comprises the data item to be filtered, the corresponding filter type, and the format and value range of the filter value on that data item.
Preferably, the information processing unit 401 may further comprise a lookup subunit 4011 and a processing subunit 4012.
The lookup subunit 4011 is configured to query the index database according to the search terms on the data items contained in the second search condition determined by the separation module 20, and obtain the data item values that match the search terms.
The processing subunit 4012 is configured either to directly generate filtering information whose filter value format is data item value and whose value range is a set data item value or a range of data item values; or to look up, for the data item value that matches the search term, the correspondence between data item values and group identifiers, determine the corresponding group identifier, and generate filtering information whose filter value format is group identifier and whose value range is a set group identifier or a range of group identifiers.
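The two output formats of the processing subunit can be sketched with one function. The dictionary layout and the name `make_filter_info` are hypothetical; the keys mirror the field, format and operation attributes of the XML example earlier in the description.

```python
def make_filter_info(field, term, operation, value_to_group=None):
    """Build filtering information for one second search condition.

    Without a mapping, the term is kept as a data item value ("value" format).
    With a mapping, it is translated to its group identifier ("index" format).
    """
    if value_to_group is None:
        return {"field": field, "format": "value",
                "operation": operation, "values": [term]}
    group_id = value_to_group.get(term)
    # A term that matches no data item value yields an empty value list.
    values = [] if group_id is None else [group_id]
    return {"field": field, "format": "index",
            "operation": operation, "values": values}

by_value = make_filter_info("date", "20080808", "include")
by_group = make_filter_info("papername", "Daily News", "exclude", {"Daily News": 3})
```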
The determination and generation unit 402 is configured to determine the corresponding effective group identifiers according to the filter type corresponding to the data item to be filtered in the filtering information generated by the information processing unit 401 and the format and value range of the filter value, and to generate a group-based filter that only allows, or only disallows, the documents corresponding to the effective group identifiers to pass; or to directly find, according to the filtering information generated by the information processing unit 401, a corresponding group-based filter that was cached in advance.
Preferably, the determination and generation unit 402 may further comprise a determination subunit 4021 and a generation subunit 4022.
The determination subunit 4021 is configured to look up the correspondence between data item values and group identifiers according to the set data item value or range of data item values contained in the filtering information generated by the information processing unit 401, and obtain the effective group identifiers; or to directly take the group identifier or range of group identifiers contained in the filtering information as the effective group identifiers.
The generation subunit 4022 is configured to generate the group-based filter according to the effective group identifiers and the filter type corresponding to the data item.
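A possible shape for these two subunits is sketched below. Representing a filter as a predicate over group identifiers is a design choice made for this illustration, and the function names and sample data are hypothetical.

```python
def effective_group_ids(filter_info, value_to_group):
    """Resolve filtering information to the set of effective group identifiers."""
    if filter_info["format"] == "index":
        # The values already are group identifiers.
        return set(filter_info["values"])
    # "value" format: translate each data item value through the correspondence table.
    return {value_to_group[v] for v in filter_info["values"] if v in value_to_group}

def make_group_filter(group_ids, operation):
    """Return a predicate that tells whether a group identifier passes the filter."""
    group_ids = frozenset(group_ids)
    if operation == "include":
        return lambda gid: gid in group_ids
    if operation == "exclude":
        return lambda gid: gid not in group_ids
    raise ValueError(f"unknown filter type: {operation}")

info = {"field": "date", "format": "value", "operation": "include",
        "values": ["20080808", "20090808"]}
ids = effective_group_ids(info, {"20080808": 1, "20090101": 2, "20090808": 3})
date_filter = make_group_filter(ids, info["operation"])
```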
The filtering module 50 is configured to filter the hit documents of the preliminary search result obtained by the retrieval module 30 through each group-based filter generated by the generation module 40 in turn, and obtain the final search result.
Preferably, the filtering module 50 comprises an identifier determination unit 501 and a filtering unit 502.
The identifier determination unit 501 is configured to determine the group identifiers corresponding to the hit documents according to the document identifiers of the hit documents in the preliminary search result obtained by the retrieval module 30.
The filtering unit 502 is configured to filter the group identifiers determined by the identifier determination unit 501 through each group-based filter generated by the generation module 40 in turn. Each group-based filter only allows, or only disallows, the effective group identifiers, and the documents that pass every filter form the final search result.
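The filtering step can be sketched as a loop over the hit documents. The `doc_to_group` table, the predicate representation of filters and the sample identifiers are assumptions made for this illustration.

```python
def apply_filters(hit_doc_ids, doc_to_group, filters):
    """Keep the hit documents whose group identifiers pass every filter.

    doc_to_group: data item -> {document identifier: group identifier}.
    filters: list of (data item, predicate over group identifiers).
    """
    passed = []
    for doc_id in hit_doc_ids:
        if all(allows(doc_to_group[field][doc_id]) for field, allows in filters):
            passed.append(doc_id)
    return passed

doc_to_group = {
    "papername": {101: 1, 102: 2, 103: 1},
    "date": {101: 7, 102: 7, 103: 8},
}
filters = [
    ("papername", lambda gid: gid not in {2}),  # exclude group 2
    ("date", lambda gid: gid in {7}),           # include only group 7
]
final = apply_filters([101, 102, 103], doc_to_group, filters)
```

Because `all` stops at the first failing filter, a document rejected by an early filter is never checked against the later ones.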
For example, Figure 5 is a schematic diagram comparing the principle of the search method of the present application with that of the prior-art search method.
As can be seen from Figure 5, for a Boolean search with four search conditions, the original approach retrieves with each of the four search conditions (search conditions 1, 2, 3 and 4) to obtain the documents hit by each condition (for example, the documents shown below each search condition in the figure), and then merges the documents hit by each condition to obtain the final result. In the present application, search conditions 3 and 4 are instead converted into filters: after retrieval with search conditions 1 and 2, the two results are merged into the preliminary search result, which is then filtered through filter 1 and filter 2 in turn to obtain the final result. Using filters reduces the complexity of merging results and improves search speed.
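The comparison in Figure 5 can be mimicked with a toy example that runs both approaches on the same data. Everything here is hypothetical: the documents, the AND semantics of the merge, and the simplification that the filters test data item values directly rather than group identifiers.

```python
# Hypothetical document collection: document identifier -> data item values.
DOCS = {
    1: {"title": "olympics", "author": "wang", "papername": "Daily News", "date": "20080808"},
    2: {"title": "olympics", "author": "li", "papername": "Daily News", "date": "20080808"},
    3: {"title": "olympics", "author": "li", "papername": "Evening Post", "date": "20080808"},
    4: {"title": "olympics", "author": "li", "papername": "Daily News", "date": "20090808"},
}

def retrieve(field, term):
    """Stand-in for an index lookup: documents whose data item equals the term."""
    return {doc_id for doc_id, fields in DOCS.items() if fields[field] == term}

def all_retrieval(conditions):
    """Prior approach: retrieve with every condition, then merge all hit sets (AND assumed)."""
    return set.intersection(*(retrieve(f, t) for f, t in conditions))

def retrieval_then_filter(search_conditions, filter_conditions):
    """Approach of the application: retrieve with some conditions, filter with the rest."""
    prelim = set.intersection(*(retrieve(f, t) for f, t in search_conditions))
    return {d for d in prelim
            if all(DOCS[d][f] == t for f, t in filter_conditions)}

conditions = [("title", "olympics"), ("author", "li"),
              ("papername", "Daily News"), ("date", "20080808")]
old_result = all_retrieval(conditions)
new_result = retrieval_then_filter(conditions[:2], conditions[2:])
```

Both approaches return the same documents; the second one merges two hit sets instead of four and checks the remaining conditions only against the preliminary hits.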
The search method and device provided by the embodiments of the present invention group the documents in the index database according to the data item values of preset data items, so that group-based filters can be generated at search time. A group-based filter only allows, or only disallows, documents of the set groups to pass. Compared with a document-based filter, it is more efficient to create, handles less data, and filters faster.
When a full-text search is required, the first and second search conditions are determined according to the search attribute information of the data items in the search condition submitted by the user, and the retrieval corresponding to the second search condition is carried out as a filtering process instead. The preliminary search result is filtered by the group-based filters, which conveniently work on group identifiers, to obtain the final search result. Reducing the number of search conditions used for retrieval lowers the complexity of retrieval and of merging search results, which saves system resources and increases processing speed. This overcomes the poor performance of document-based filters in text retrieval systems and improves the overall performance of the search system.
In addition, complex filters can be generated, and commonly used filters can be generated and cached in advance, which reduces the work of generating and creating filters. Retrieval and filter generation can run concurrently, and the search terms on the same data item can be merged to generate a single filter. These measures further optimize system performance, reduce the number of search conditions, and improve retrieval performance.
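Two of these optimizations, caching filters and merging the terms of one data item into a single filter, can be sketched as follows. The patent does not describe a cache implementation; using Python's `functools.lru_cache` and the helper names below are assumptions made for this illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def cached_filter(field, operation, group_ids):
    """Return a filter predicate, reusing a previously built one when possible.

    group_ids must be a frozenset so that the arguments are hashable.
    """
    if operation == "include":
        return lambda gid: gid in group_ids
    return lambda gid: gid not in group_ids

def merge_same_field(filter_infos):
    """Merge the values of filtering information that shares data item and filter type."""
    merged = {}
    for info in filter_infos:
        key = (info["field"], info["operation"])
        merged.setdefault(key, set()).update(info["values"])
    return merged

f1 = cached_filter("papername", "include", frozenset({1, 2}))
f2 = cached_filter("papername", "include", frozenset({1, 2}))
merged = merge_same_field([
    {"field": "date", "operation": "include", "values": ["20080808"]},
    {"field": "date", "operation": "include", "values": ["20090808"]},
])
```

The second call to `cached_filter` returns the object built by the first call, so a frequently used filter is created only once.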
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited to it. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention, or any application to other similar devices, shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the protection scope of the claims.

Claims (10)

1. A search method, characterized by comprising: establishing a correspondence between each data item value in a preset data item and a group identifier, and establishing a correspondence between the group identifier and the document identifiers of the documents in an index database that contain that data item value; and, when a search condition submitted by a user is obtained, performing the following steps:
determining, according to the search attribute information of the data items in the search condition, a first search condition used for retrieval and a second search condition used for filtering;
searching the index database with the first search condition to obtain a preliminary search result; looking up the data item values corresponding to the search terms on the data items contained in the second search condition, and generating filtering information corresponding to the second search condition, the filtering information comprising: the data item to be filtered, the corresponding filter type, and the format and value range of the filter value on the data item to be filtered; determining the corresponding effective group identifiers according to the filter type corresponding to the data item to be filtered in the filtering information and the format and value range of the filter value, and generating a group-based filter that only allows, or only disallows, the documents corresponding to the effective group identifiers to pass; or directly finding, according to the filtering information, a corresponding group-based filter cached in advance; wherein the group-based filter only allows, or only disallows, documents of the set groups to pass;
determining, according to the document identifiers of the hit documents in the preliminary search result, the group identifiers corresponding to the hit documents; and filtering the determined group identifiers through each filter in turn, each filter only allowing or only disallowing the effective group identifiers, so that the documents that pass every filter form the final search result.
2. The method of claim 1, characterized in that looking up, in the index database, the data item values corresponding to the search terms on the data items contained in the second search condition and generating the filtering information corresponding to the second search condition specifically comprises:
querying the index database according to the search terms on the data items contained in the second search condition, and obtaining the data item values that match the search terms;
directly generating filtering information whose filter value format is data item value and whose value range is a set data item value or a range of data item values; or looking up, for the data item value that matches the search term, the correspondence between data item values and group identifiers, determining the corresponding group identifier, and generating filtering information whose filter value format is group identifier and whose value range is a set group identifier or a range of group identifiers.
3. The method of claim 2, characterized in that determining the corresponding effective group identifiers according to the filtering information and generating the filter specifically comprises:
looking up the correspondence between data item values and group identifiers according to the set data item value or range of data item values contained in the filtering information, obtaining the effective group identifiers, and generating the filter according to the effective group identifiers and the filter type; or directly taking the group identifier or range of group identifiers contained in the filtering information as the effective group identifiers, and generating the filter according to the effective group identifiers and the filter type.
4. The method of claim 1, characterized in that determining, according to the search attribute information of the data items in the search condition, the first search condition used for retrieval and the second search condition used for filtering specifically comprises:
according to the search attribute information preset for each data item in the index database, taking as the second search condition some or all of the search conditions whose data items have search attribute information indicating retrieval on an index created directly without word segmentation, and taking the remaining search conditions as the first search condition.
5. The method of claim 1, characterized in that, when there is more than one first search condition, the index database is searched with each first search condition separately, and the results obtained for each first search condition are merged to obtain the preliminary search result.
6. A search device, characterized by comprising:
a grouping module, configured to establish a correspondence between data item values and group identifiers, and a correspondence between each group identifier and the document identifiers of the documents that contain the data item value;
a separation module, configured to obtain the search condition submitted by a user and, according to the search attribute information of the data items in the search condition, determine a first search condition used for retrieval and a second search condition used for filtering;
a retrieval module, configured to search the index database with the first search condition determined by the separation module and obtain a preliminary search result;
a generation module, specifically comprising an information processing unit and a determination and generation unit;
the information processing unit being configured to look up, in the index database, the data item values corresponding to the search terms on the data items contained in the second search condition determined by the separation module, and to generate filtering information corresponding to the second search condition, the filtering information comprising: the data item to be filtered, the corresponding filter type, and the format and value range of the filter value on the data item to be filtered;
the determination and generation unit being configured to determine the corresponding effective group identifiers according to the filter type corresponding to the data item to be filtered in the filtering information generated by the information processing unit and the format and value range of the filter value, and to generate a group-based filter that only allows, or only disallows, the documents corresponding to the effective group identifiers to pass; or to directly find, according to the filtering information, a corresponding group-based filter cached in advance; wherein the group-based filter only allows, or only disallows, documents of the set groups to pass;
a filtering module, specifically comprising an identifier determination unit and a filtering unit;
the identifier determination unit being configured to determine, according to the document identifiers of the hit documents in the preliminary search result, the group identifiers corresponding to the hit documents;
the filtering unit being configured to filter the determined group identifiers through each filter in turn, each filter only allowing or only disallowing the effective group identifiers, and to obtain the documents that pass every filter.
7. The device of claim 6, characterized in that the information processing unit specifically comprises:
a lookup subunit, configured to query the index database according to the search terms on the data items contained in the second search condition determined by the separation module, and obtain the data item values that match the search terms;
a processing subunit, configured to directly generate filtering information whose filter value format is data item value and whose value range is a set data item value or a range of data item values; or to look up, for the data item value that matches the search term, the correspondence between data item values and group identifiers, determine the corresponding group identifier, and generate filtering information whose filter value format is group identifier and whose value range is a set group identifier or a range of group identifiers.
8. The device of claim 7, characterized in that the determination and generation unit specifically comprises:
a determination subunit, configured to look up the correspondence between data item values and group identifiers according to the set data item value or range of data item values contained in the filtering information generated by the information processing unit, and obtain the effective group identifiers; or to directly take the group identifier or range of group identifiers contained in the filtering information as the effective group identifiers;
a generation subunit, configured to generate the filter according to the effective group identifiers and the filter type.
9. The device of claim 6, characterized in that the separation module is specifically configured to: according to the search attribute information preset for each data item in the index database, take as the second search condition some or all of the search conditions whose data items have search attribute information indicating retrieval on an index created directly without word segmentation, and take the remaining search conditions as the first search condition.
10. The device of claim 6, characterized in that the retrieval module is specifically configured to: when there is more than one first search condition, search the index database with each first search condition separately, and merge the results obtained for each first search condition to obtain the preliminary search result.
CN2009102371861A 2009-11-10 2009-11-10 Searching method and searching device Expired - Fee Related CN102054007B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009102371861A CN102054007B (en) 2009-11-10 2009-11-10 Searching method and searching device

Publications (2)

Publication Number Publication Date
CN102054007A CN102054007A (en) 2011-05-11
CN102054007B true CN102054007B (en) 2012-10-31

Family

ID=43958341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009102371861A Expired - Fee Related CN102054007B (en) 2009-11-10 2009-11-10 Searching method and searching device

Country Status (1)

Country Link
CN (1) CN102054007B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1429247A2 (en) * 2002-12-09 2004-06-16 Microsoft Corporation Managed file system filter model and architecture
CN101046811A (en) * 2006-06-07 2007-10-03 华为技术有限公司 Data integral service system and method
CN101281524A (en) * 2007-09-24 2008-10-08 北大方正集团有限公司 Method and apparatus for acquiring material

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈昊华等.管理信息系统通用查询器的设计与实现.《计算机与现代化》.2003,(第12期),87-89. *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220622

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: FOUNDER APABI TECHNOLOGY Ltd.

Patentee after: Peking University

Address before: 100871, Beijing, Haidian District Cheng Fu Road 298, founder building, 9 floor

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: FOUNDER APABI TECHNOLOGY Ltd.

Patentee before: Peking University

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20121031