CN100412864C - Full-text retrieval system and method - Google Patents

Full-text retrieval system and method

Info

Publication number
CN100412864C
Authority
CN
China
Legal status
Active
Application number
CNB2005101080095A
Other languages
Chinese (zh)
Other versions
CN1755691A (en)
Inventor
高知尾胜彦
笹气光一
加藤阳二
Current Assignee
Toshiba Corp
Toshiba Digital Solutions Corp
Original Assignee
Toshiba Corp
Toshiba Solutions Corp
Priority date
Filing date
Publication date
Application filed by Toshiba Corp and Toshiba Solutions Corp
Publication of CN1755691A
Application granted
Publication of CN100412864C
Active (current legal status)
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A first search unit (13) performs retrieval based on an N-gram index (14) through a primary search according to a search condition sentence and a secondary search on the results of the primary search. A morphemic analysis unit (15) performs morphemic analysis on the search condition sentence. A second search unit (16) performs a morpheme search based on a morpheme index (17) according to the analysis results of the morphemic analysis unit (15). An approximation decision unit (183) judges the degree of approximation between a first hit count, the number of hits in the primary search based on the N-gram index (14), and a second hit count, the number of hits in the morpheme search based on the morpheme index. When the first hit count is approximate to the second hit count, a full-text search execution control unit (18) controls the first search unit (13) so that the secondary search based on the N-gram index is omitted, and adopts the results of the primary search or the results of the morpheme search as the search results.

Description

Full-text retrieval system and method
Technical field
The present invention relates to a full-text retrieval system and method suitable for quickly retrieving, from a huge volume of electronic document information, the documents that meet a specified search condition by using full-text search technology.
Background art
Many search systems have been developed for retrieving, from huge volumes of electronic document information, the documents that meet a specified search condition. Representative document retrieval methods used in such systems include methods based on an N-gram (N-character string) index and methods based on a morpheme index. The N-gram index method is used for full-text search, whereas the morpheme index method is used for natural language search (conceptual search). These methods are outlined below.
<Search method based on an N-gram index>
The character string making up a document is divided into strings of length N (grams), shifting the character position by one character at a time. As a result, every string occurring in the document is registered in the index as a consecutive string (gram) of length N. The value of N can be determined in advance. At search time, the search character string (term) given as the search condition is likewise divided into a group of length-N strings (grams). Retrieval is then performed by obtaining from the index information on where identical strings occur, using the steps below.
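The gram division and index registration just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the index maps each gram to the set of document IDs containing it (a real index would also record positions), and the helper names `split_into_grams` and `build_ngram_index`, as well as the handling of strings shorter than N, are hypothetical.

```python
from collections import defaultdict


def split_into_grams(text: str, n: int) -> list[str]:
    """Divide text into overlapping length-n strings, shifting one character at a time."""
    if len(text) < n:
        # Assumption: a string shorter than n is kept whole as a single gram.
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_ngram_index(docs: dict[str, str], n: int) -> dict[str, set[str]]:
    """Register every gram of every document: gram -> IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in split_into_grams(text, n):
            index[gram].add(doc_id)
    return dict(index)


docs = {"d1": "text retrieval", "d2": "retrieval method"}
index = build_ngram_index(docs, n=2)
print(split_into_grams("text", 2))  # ['te', 'ex', 'xt']
print(sorted(index["xt"]))          # ['d1']
print(sorted(index["re"]))          # ['d1', 'd2']
```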
Retrieval (N-gram retrieval) first performs a primary search based on the N-gram index. The primary search selects candidate documents solely on whether they contain (hit) each of the length-N strings into which the term was divided. A secondary search then follows: by checking the adjacency of the grams, it extracts from the selected candidates only the documents that actually contain the term. Through this two-stage retrieval, primary search followed by secondary search based on the N-gram index, full-text search without omissions is achieved.
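The two-stage N-gram retrieval can be sketched as follows, again as a hypothetical illustration rather than the patent's code. The primary search intersects the posting sets of the term's grams, so it ignores order and adjacency; the secondary search here simply checks that the term occurs verbatim in each candidate's stored text, which is one simple way to confirm that the grams are adjacent. Document `d2` shows the kind of false hit that the secondary search removes.

```python
def grams(text: str, n: int) -> set[str]:
    """Set of overlapping length-n strings in text (whole text if shorter than n)."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(docs: dict[str, str], n: int) -> dict[str, set[str]]:
    """gram -> IDs of the documents that contain it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for g in grams(text, n):
            index.setdefault(g, set()).add(doc_id)
    return index


def primary_search(index: dict[str, set[str]], term: str, n: int) -> set[str]:
    """Candidates: documents containing every gram of the term, in any order or position."""
    postings = [index.get(g, set()) for g in grams(term, n)]
    return set.intersection(*postings) if postings else set()


def secondary_search(docs: dict[str, str], candidates: set[str], term: str) -> set[str]:
    """Keep only candidates in which the grams are adjacent, i.e. the term occurs verbatim."""
    return {d for d in candidates if term in docs[d]}


docs = {"d1": "xabcx", "d2": "ab bc", "d3": "abd"}
index = build_index(docs, n=2)
candidates = primary_search(index, "abc", n=2)
print(sorted(candidates))                                 # ['d1', 'd2']
print(sorted(secondary_search(docs, candidates, "abc")))  # ['d1']
```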
It is known that increasing N improves the precision of the primary search. However, a larger N enlarges the index, and retrieval may then take a long time. Conversely, a smaller N increases retrieval noise (false hits) and lowers precision. Because the secondary search processes every document hit by the primary search, it becomes less efficient as the number of hits grows, regardless of how many of those hits are actually noise.
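The effect of N on retrieval noise can be seen on a toy corpus. This hypothetical sketch counts, for N = 1 to 4, how many documents the primary search returns for the term `abcd` and how many of them the secondary search would discard. It does not model index size, which depends on the real corpus.

```python
def grams(text: str, n: int) -> set[str]:
    """Set of overlapping length-n strings in text (whole text if shorter than n)."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def noise_by_n(docs: dict[str, str], term: str, max_n: int) -> dict[int, tuple[int, int]]:
    """For each N, return (number of primary-search candidates, number of false hits)."""
    result = {}
    for n in range(1, max_n + 1):
        term_grams = grams(term, n)
        # Primary search: equivalent to intersecting the posting lists of the term's grams.
        candidates = {d for d, text in docs.items() if term_grams <= grams(text, n)}
        # The secondary search would keep only the documents that really contain the term.
        false_hits = {d for d in candidates if term not in docs[d]}
        result[n] = (len(candidates), len(false_hits))
    return result


docs = {"d1": "abcd", "d2": "dcba", "d3": "ab cd bc", "d4": "abc bcd"}
for n, (cands, noise) in noise_by_n(docs, "abcd", 4).items():
    print(f"N={n}: {cands} candidates, {noise} false hits")
```

On this corpus the candidates fall from 4 to 1 and the false hits from 3 to 0 as N grows from 1 to 4, which is the precision gain described above; only `d1` actually contains the term.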
<Search method based on a morpheme index>
Each document is analyzed to extract the morphemes (words) to be indexed, morphemes being the smallest linguistic units that carry meaning. Document information is assigned to each extracted morpheme, and the morphemes with their document information are registered in the index. At search time, the term is likewise divided into morphemes, and retrieval is performed by obtaining from the index the document information for matching morphemes, using the steps below.
Retrieval based on a morpheme index (morpheme retrieval) needs only a small index and is fast, because, unlike N-grams, morphemes do not overlap one another. However, when the morphemes of a searched document do not match those of the term, omissions (missed documents) occur.
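Morpheme indexing and the omission problem can be sketched as follows. The whitespace-splitting `analyze` function is a hypothetical stand-in for a real morphological analyzer (Japanese and Chinese text has no spaces, so a real system needs a dictionary-based analyzer), and each morpheme is registered with document information consisting of a document ID and a morpheme position; both choices are assumptions for illustration.

```python
def analyze(text: str) -> list[str]:
    """Hypothetical stand-in for a morphological analyzer: lowercase whitespace split."""
    return text.lower().split()


def build_morpheme_index(docs: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each morpheme to its document information: (document ID, morpheme position)."""
    index: dict[str, list[tuple[str, int]]] = {}
    for doc_id, text in docs.items():
        for pos, morpheme in enumerate(analyze(text)):
            index.setdefault(morpheme, []).append((doc_id, pos))
    return index


def morpheme_search(index: dict[str, list[tuple[str, int]]], query: str) -> set[str]:
    """Documents containing every morpheme of the query."""
    postings = [{doc_id for doc_id, _ in index.get(m, [])} for m in analyze(query)]
    return set.intersection(*postings) if postings else set()


docs = {"d1": "searchengine tuning", "d2": "search engine design"}
index = build_morpheme_index(docs)
print(sorted(morpheme_search(index, "search engine")))      # ['d2']
# d1 contains "search" as a substring, but its morpheme is "searchengine",
# so the morpheme search misses it (an omission).
print(sorted(morpheme_search(index, "search")))             # ['d2']
print(sorted(d for d, t in docs.items() if "search" in t))  # ['d1', 'd2']
```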
Thus, retrieval based on an N-gram index has no omissions and a fast primary search, but the secondary search that removes noise is slow. Retrieval based on a morpheme index is fast but may miss documents. In other words, the N-gram index method used in full-text search and the morpheme index method used in natural language search each have their own strengths and weaknesses.
To address this, Japanese Patent Application KOKAI Publication No. 2001-092831 describes a document retrieval technique (hereinafter, the first prior art) that combines the strengths of full-text search and natural language search to achieve retrieval with few omissions. The first prior art is characterized in that it performs both full-text search and natural language search and combines the results of the two. It uses natural language search to narrow down the results of the full-text search. Conversely, it can also use full-text search to narrow down the results obtained by (coarse) natural language search; in that case, documents containing the specified text are retrieved from the natural language search results.
The first prior art thus treats full-text search and natural language search as independent retrieval processes and combines their results: the results obtained by either one are narrowed down by the other. It therefore must always perform both. Full-text search, however, is slower than natural language search. When the N-gram index method is applied to full-text search, that search takes the full N-gram retrieval time (primary search time + secondary search time). In other words, the first prior art has no mechanism for eliminating this drawback by speeding up full-text search itself, so it performs poorly when the number of hits is large.
On the other hand, Japanese Patent Application KOKAI Publication No. 2003-308335 describes a document retrieval technique (hereinafter, the second prior art) that, depending on the type of query given as the search condition, uses either full-text search based on an N-gram index or retrieval based on a morpheme index. The second prior art first evaluates (judges) whether the query is of the keyword type or the "natural language type (conceptual search)". A keyword-type query triggers full-text search, and a natural-language-type query triggers retrieval based on the morpheme index.
When the second prior art judges a query to be of the keyword type, retrieval takes the full N-gram retrieval time (primary search execution time + secondary search execution time). Like the first prior art, therefore, the second prior art does nothing to speed up full-text search itself.
As described above, the first prior art always performs full-text search, and the second prior art performs full-text search based on the N-gram index whenever the query is of the keyword type. Full-text search based on the N-gram index takes a great deal of time, yet neither the first nor the second prior art has a mechanism for speeding up full-text search itself.
Summary of the invention
An object of the present invention is to perform full-text search quickly while still ensuring a certain level of retrieval precision.
According to one embodiment of the present invention, a full-text retrieval system is provided that comprises: a first search unit that performs retrieval based on an N-gram index through a primary search according to a search condition sentence and a secondary search on the results of the primary search; a morphemic analysis unit that performs morphemic analysis on the search condition sentence; and a second search unit that performs morpheme retrieval based on a morpheme index according to the analysis result obtained by the morphemic analysis unit. The system further comprises: an approximation decision unit that judges the degree of approximation between a first hit count, which is the number of hits in the primary search based on the N-gram index, and a second hit count, which is the number of hits in the morpheme retrieval based on the morpheme index; and a full-text search execution control unit that, when the approximation decision unit judges the first and second hit counts to be approximate, controls the first search unit so as to omit the secondary search based on the N-gram index, and adopts the result of the primary search or the result of the morpheme retrieval as the search result.
Description of drawings
Fig. 1 is a block diagram showing the configuration of a full-text retrieval system according to an embodiment of the present invention.
Fig. 2 is a flowchart showing the procedure of fast search processing in the embodiment.
Fig. 3 is a diagram showing an example of a search interface screen.
Fig. 4 is a diagram showing an example of a search result screen.
Fig. 5 is a flowchart showing the procedure of fast search processing in a first modification of the embodiment.
Fig. 6 is a flowchart showing the procedure of fast search processing in a second modification of the embodiment.
Embodiment
An embodiment of the present invention is described below with reference to the drawings. Fig. 1 is a block diagram showing the configuration of a full-text retrieval system according to the embodiment. In response to search requests from users, this system performs retrieval based on an N-gram index (full-text search) and retrieval based on a morpheme index (natural language search). When certain conditions are met, however, the system of Fig. 1 can omit part of the full-text search, namely the secondary search based on the N-gram index.
The full-text retrieval system of Fig. 1 comprises a user interface 11, a search execution/response server 12, an N-gram search engine 13, an N-gram index database 14, a morphemic analysis unit 15, a morpheme search engine 16, a morpheme index database 17, and a full-text search execution control unit 18.
User interface 11 provides the interface functions of receiving search requests from users and presenting search results to them. In this embodiment, user interface 11 forms part of the full-text retrieval system. However, it need not be a component of the system; for example, it may be provided in a client terminal connected to the system of Fig. 1 via a communication line (such as a network).
Search execution/response server 12 passes the search condition expressed by a search request received through user interface 11 to N-gram search engine 13 and morphemic analysis unit 15. Here, a character string (search character string), that is, a search condition sentence, is assumed to be used as the search condition. The search results obtained through server 12, N-gram search engine 13, and morphemic analysis unit 15 are presented to the user through user interface 11.
N-gram search engine 13 performs full-text search using the N-gram index stored in N-gram index database 14. It comprises a primary search execution unit 131 and a secondary search execution unit 132. Primary search execution unit 131 performs the primary search based on the N-gram index using the group of length-N strings obtained from the search condition sentence. This group is obtained by dividing the search condition sentence into strings (grams) of length N while shifting the character position one character at a time. Secondary search execution unit 132 performs the secondary search based on the N-gram index, that is, a search on the results of the primary search. The N-gram index stored in database 14 manages every string occurring in all searchable documents as consecutive strings (grams) of a predetermined length N. For each length-N string, the index registers positional information indicating where, and in which documents, that string occurs.
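Because the index described above stores positional information, the secondary search can check adjacency from the index itself. The following is a minimal sketch under that assumption; the nested-dictionary layout and the function names are hypothetical, and the patent does not state how secondary search execution unit 132 performs the adjacency check.

```python
from collections import defaultdict


def build_positional_index(docs: dict[str, str], n: int):
    """gram -> {document ID -> start positions of that gram in the document}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos in range(len(text) - n + 1):
            index[text[pos:pos + n]][doc_id].append(pos)
    return index


def search(index, term: str, n: int) -> tuple[set[str], set[str]]:
    """Return (primary candidates, secondary hits); assumes len(term) >= n."""
    term_grams = [term[i:i + n] for i in range(len(term) - n + 1)]
    postings = [index.get(g, {}) for g in term_grams]  # .get does not insert missing keys
    if not postings:
        return set(), set()
    # Primary search: documents containing every gram of the term.
    candidates = set(postings[0]).intersection(*postings[1:])
    # Secondary search: gram i must start exactly i characters after gram 0.
    hits = set()
    for doc_id in candidates:
        starts = set(postings[0][doc_id])
        for offset, posting in enumerate(postings[1:], start=1):
            starts &= {p - offset for p in posting[doc_id]}
        if starts:
            hits.add(doc_id)
    return candidates, hits


docs = {"d1": "xabcx", "d2": "ab bc"}
index = build_positional_index(docs, n=2)
candidates, hits = search(index, "abc", n=2)
print(sorted(candidates), sorted(hits))  # ['d1', 'd2'] ['d1']
```

With this layout the secondary search reads only the position lists of the candidate documents rather than their full text.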
Morphemic analysis unit 15 performs morphemic analysis on the search condition (search condition sentence). Morpheme search engine 16 performs morpheme retrieval using the morpheme index stored in morpheme index database 17, according to the analysis result obtained by morphemic analysis unit 15. The morpheme index registers, for each morpheme extracted from the searchable documents, document information that includes positional information indicating where, and in which documents, that morpheme occurs.
To execute full-text search using the N-gram index quickly, full-text search execution control unit 18 controls N-gram search engine 13 and morpheme search engine 16 according to the contents of settings file 19. Settings file 19 holds, set in advance, information such as the conditions that control unit 18 uses to control the execution of full-text search. Settings file 19 can be provided on a medium such as a CD-ROM or memory card, or it can be downloaded to the system of Fig. 1 over a network.
Full-text search execution control unit 18 comprises a morphemic analysis result decision unit 181, a primary search hit-count decision unit 182, and a hit-count approximation decision unit 183. Morphemic analysis result decision unit 181 decides, from the result of the morphemic analysis of the search condition sentence by morphemic analysis unit 15, whether to perform retrieval based on the morpheme index (morpheme retrieval) or the secondary search based on the N-gram index. Primary search hit-count decision unit 182 decides, from the result of the primary search based on the N-gram index, whether to perform the secondary search based on the N-gram index. Hit-count approximation decision unit 183 decides, from the results of the N-gram primary search and of the morpheme retrieval, whether to perform the secondary search based on the N-gram index.
The procedure for full-text search in the fast search mode (fast search processing) performed by the system of Fig. 1 is described below with reference to the flowchart of Fig. 2. In this embodiment, a standard search mode is also provided in addition to the fast search mode, and, as described later, the user can select which of the two modes to use. The fast search mode is characterized in that the secondary search based on the N-gram index can be omitted when certain conditions are met. The standard search mode, by contrast, always proceeds through the secondary search based on the N-gram index.
Suppose that a user who wishes to perform a full-text search carries out an input operation on a client terminal, which sends a search request specifying full-text search to the system of Fig. 1. User interface 11 receives this request and extracts the search condition it expresses. User interface 11 sends the extracted search condition to search execution/response server 12, and also notifies server 12 of the search type indicated by the request (for example, full-text search). When full-text search is specified, server 12 sends the search condition received from user interface 11 to N-gram search engine 13 so that the full-text search is performed.
Primary search execution unit 131 in N-gram search engine 13 receives the search condition sent from server 12. In this embodiment, the search condition is a search condition sentence (search character string). Unit 131 performs a known primary search on this sentence using the N-gram index stored in N-gram index database 14 (step S1). It keeps the primary search result inside N-gram search engine 13, and sends the number of hits (hit count) N1 of the primary search, together with the search condition sentence, to full-text search execution control unit 18.
Primary search hit-count decision unit 182 in control unit 18 compares hit count N1 received from primary search execution unit 131 with a reference hit count (hit-count threshold) K (step S2). Threshold K is set in settings file 19 and, as described later, can be changed (adjusted) by user operation. If N1 is less than or equal to K, control unit 18 requests N-gram search engine 13 to perform the secondary search.
If N1 exceeds K, control unit 18 keeps N1 internally and sends the search condition sentence to morphemic analysis unit 15. On receiving the sentence from control unit 18, morphemic analysis unit 15 performs morphemic analysis on it (step S3) and returns the analysis result to control unit 18.
Morphemic analysis result decision unit 181 in control unit 18 evaluates the analysis result obtained by morphemic analysis unit 15 (step S4). That is, it judges whether the search condition sentence can be divided into words usable for retrieval based on the morpheme index (morpheme retrieval). Words usable for morpheme retrieval are words that carry meaning on their own (for example, content words such as nouns, verbs, and adjectives). If the sentence cannot be divided into such words, control unit 18 requests N-gram search engine 13 to perform the secondary search.
If the search condition sentence can be divided into words usable for morpheme retrieval, morphemic analysis result decision unit 181 sends the analysis result obtained by morphemic analysis unit 15 to morpheme search engine 16. On receiving it, morpheme search engine 16 performs a known morpheme retrieval using that result and morpheme index database 17 (step S5). Morpheme search engine 16 keeps the morpheme retrieval result internally and sends the number of hits (hit count) N2 of the morpheme retrieval to control unit 18.
Hit-count approximation decision unit 183 in control unit 18 judges whether the hit count of the primary search (first hit count) N1 and the hit count of the morpheme retrieval (second hit count) N2 are approximate (step S6). N1 is the number of hits in the primary search performed by primary search execution unit 131 in N-gram search engine 13 and, as described above, is held inside control unit 18. N2 is the number of hits in the morpheme retrieval by morpheme search engine 16 and is sent from that engine. In step S6, unit 183 judges whether the degree of approximation (%) between N1 and N2 is within an approximation ratio (degree-of-approximation threshold) P (%). Approximation ratio P, the reference value for this judgment, is set in settings file 19 and, as described later, can be adjusted by user operation. In this embodiment, the degree of approximation between N1 and N2 is expressed as |N1 - N2| × 100 / N1 (%) or |N1 - N2| × 100 / N2 (%); that is, as the ratio of the absolute difference between N1 and N2 to N1 or to N2. The smaller this value, the more closely N1 and N2 approximate each other.
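The degree of approximation used in step S6 can be written as a small function. This is a sketch: the function names and the `base` switch between the N1 and N2 denominators are illustrative, and returning infinity for a zero denominator is an assumption the patent does not address (in the flow of Fig. 2, step S6 is reached only when N1 > K, so N1 is positive whenever K ≥ 0).

```python
import math


def degree_of_approximation(n1: int, n2: int, base: str = "n1") -> float:
    """|N1 - N2| * 100 / N1 (or / N2 when base == "n2"), in percent; smaller is closer."""
    denominator = n1 if base == "n1" else n2
    if denominator == 0:
        # Assumption: a zero denominator is treated as "not approximate".
        return math.inf
    return abs(n1 - n2) * 100.0 / denominator


def is_approximate(n1: int, n2: int, p: float, base: str = "n1") -> bool:
    """Step S6: N1 and N2 are approximate when the degree is within P percent."""
    return degree_of_approximation(n1, n2, base) <= p


print(round(degree_of_approximation(1200, 1100), 2))  # 8.33
print(is_approximate(1200, 1100, p=10))               # True
print(is_approximate(1200, 900, p=10))                # False (25% > 10%)
```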
If the degree of approximation between N1 and N2 exceeds P, hit-count approximation decision unit 183 judges that N1 and N2 are not approximate. In this case, control unit 18 requests N-gram search engine 13 to perform the secondary search.
If the degree of approximation between N1 and N2 is within P, hit-count approximation decision unit 183 judges that N1 and N2 are approximate. In this case, control unit 18 does not request N-gram search engine 13 to perform the secondary search, which is equivalent to control unit 18 controlling N-gram search engine 13 so as to omit the secondary search based on the N-gram index. Control unit 18 then determines whether the N-gram retrieval or the morpheme retrieval takes priority. The condition for this determination (adoption condition) is set in settings file 19 and, as described later, can be adjusted by user operation.
If the N-gram retrieval takes priority, control unit 18 requests N-gram search engine 13 to return the primary search result to search execution/response server 12. If the morpheme retrieval takes priority, control unit 18 requests morpheme search engine 16 to return the morpheme retrieval result to server 12. In other words, control unit 18 causes either the result of the primary search obtained by N-gram search engine 13 (its primary search execution unit 131) or the result of the morpheme retrieval obtained by morpheme search engine 16 to be returned from that engine to server 12 (step S7). The primary search result is held inside N-gram search engine 13, and the morpheme retrieval result is held inside morpheme search engine 16.
When search execution/response server 12 receives the primary search result or the morpheme retrieval result from N-gram search engine 13 or morpheme search engine 16, it reports that result to the user (the search application) via user interface 11. Information indicating which search produced the result is attached to it.
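Steps S1 to S8 of the fast search processing can be summarized in one control function. This is a hypothetical sketch, not Toshiba's implementation: the `Settings` fields stand in for settings file 19, the search engines and the morphemic analysis unit are passed in as plain functions, the N1 denominator is chosen for the degree of approximation, and the returned tag plays the role of the attached information indicating which search produced the result. The standard search mode is modeled as always running the secondary search.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Settings:
    """Hypothetical contents of settings file 19 (names are illustrative)."""
    threshold_k: int = 1000        # hit-count threshold K (assumed >= 0)
    approx_ratio_p: float = 10.0   # approximation ratio P, in percent
    prefer: str = "ngram"          # adoption condition: "ngram" or "morpheme"
    fast_mode: bool = True         # False selects the standard search mode


def full_text_search(
    sentence: str,
    primary: Callable[[str], set],
    secondary: Callable[[str, set], set],
    analyze: Callable[[str], list],
    is_searchable: Callable[[list], bool],
    morpheme_search: Callable[[list], set],
    settings: Settings,
) -> tuple[set, str]:
    """Return (result, tag naming the search that produced it)."""
    primary_hits = primary(sentence)                          # S1
    n1 = len(primary_hits)
    if not settings.fast_mode or n1 <= settings.threshold_k:  # standard mode, or S2 (b1)
        return secondary(sentence, primary_hits), "ngram-secondary"  # S8
    morphemes = analyze(sentence)                             # S3
    if not is_searchable(morphemes):                          # S4 (b2)
        return secondary(sentence, primary_hits), "ngram-secondary"  # S8
    morpheme_hits = morpheme_search(morphemes)                # S5
    n2 = len(morpheme_hits)
    # S6: here n1 > K >= 0, so dividing by n1 is safe.
    if abs(n1 - n2) * 100.0 / n1 > settings.approx_ratio_p:   # b3
        return secondary(sentence, primary_hits), "ngram-secondary"  # S8
    # S7: a1, a2 and a3 all hold, so the secondary search is omitted.
    if settings.prefer == "ngram":
        return primary_hits, "ngram-primary"
    return morpheme_hits, "morpheme"


# Toy stubs standing in for N-gram search engine 13, morphemic analysis unit 15
# and morpheme search engine 16 (all hypothetical).
stubs = dict(
    primary=lambda s: set(range(12)),
    secondary=lambda s, cands: {d for d in cands if d % 2 == 0},
    analyze=lambda s: s.split(),
    is_searchable=lambda ms: all(len(m) > 1 for m in ms),
    morpheme_search=lambda ms: set(range(11)),
)
print(full_text_search("full text", settings=Settings(threshold_k=10), **stubs)[1])
# ngram-primary  (N1=12 > K=10, degree = 1 * 100 / 12, about 8.3%, within 10%)
print(full_text_search("full text", settings=Settings(threshold_k=20), **stubs)[1])
# ngram-secondary  (N1=12 <= K=20)
```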
Conditions a1, a2, and a3 are defined as follows.
a1: The hit count N1 of the primary search based on the N-gram index exceeds hit-count threshold K (N1 > K).
a2: The search condition sentence can be divided into words usable for morpheme retrieval.
a3: The hit count N1 of the primary search based on the N-gram index and the hit count N2 of the morpheme retrieval are approximate.
As the above description shows, in this embodiment the secondary search based on the N-gram index can be omitted when conditions a1, a2, and a3 all hold, that is, when the decisions in steps S2, S4, and S6 are all "Yes". In that case, either the primary search result or the morpheme retrieval result can be adopted as the search result for the request.
Condition a3 is characterized by using the hit count N1 of the primary search based on the N-gram index and the hit count N2 of the retrieval based on the morpheme index as the evaluation values for judging whether the secondary search can be omitted. Of the three conditions a1, a2, and a3, as long as a3 is satisfied, that is, as long as N1 and N2 are approximate, a certain level of retrieval precision can be ensured even if the secondary search based on the N-gram index is omitted.
Therefore, omitting the secondary search based on the N-gram index is acceptable whenever at least condition a3 holds. In that case, too, full-text search is sped up by omitting the secondary search while the loss of retrieval precision is kept small. When condition a1 does not hold, however, that is, when the hit count of the primary search based on the N-gram index does not exceed threshold K, performing the secondary search based on the N-gram index has little impact on performance. Omitting the secondary search therefore offers little benefit when a1 does not hold.
Condition a2 holds when the search condition sentence can be divided into words usable for morpheme retrieval. When a2 holds, the morphemic analysis result and the morphemes contained in the morpheme index can be expected, in most cases, to divide words in the same way. Morpheme retrieval based on the morpheme index using this analysis result therefore yields a result (hit count N2) that is reasonably accurate (reliable) as an evaluation value. This means that when a2 holds, the judgment of whether condition a3, which involves N2, holds (the judgment in step S6) is also reasonably accurate. Conversely, when condition a2 does not hold, that judgment becomes less reliable. It is therefore preferable, as in this embodiment, to omit the secondary search based on the N-gram index only when conditions a1, a2, and a3 all hold.
In addition, in the present embodiment, quadratic search performance element 132 in the N-gram search engine 13 is only carried out the occasion that control gear 18 requires to carry out quadratic search in full-text search, and the result based on the primary retrieval of N-gram index is carried out quadratic search (step S8).Herein, condition b1, b2 and b3 are defined as follows.
b1: The hit count N1 of the primary N-gram search is less than or equal to the hit-count threshold K.
b2: The search query cannot be divided into words that can be used for morpheme search.
b3: The hit count N1 of the primary N-gram search and the hit count N2 of the morpheme search are not approximate.
When at least one of conditions b1, b2 and b3 holds, that is, when at least one of the determinations at steps S2, S4 and S6 is "No", the full-text search execution control unit 18 requests the N-gram search engine 13 to perform the secondary search. When condition b1 holds, performing the secondary N-gram search to ensure sufficiently high retrieval precision has very little adverse effect on search speed (search execution time). When condition b2 or b3 holds, on the other hand, neither the morpheme index search (morpheme search) alone nor the primary N-gram search alone can be relied on to provide a certain level of retrieval precision. In this case, the present embodiment performs the secondary N-gram search to ensure sufficiently high retrieval precision, even though search speed decreases.
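The rule above can be expressed as a single predicate. The following is a minimal Python sketch, not the patent's implementation; the function and parameter names are hypothetical. `needs_secondary_search` returns `True` when at least one of b1, b2 and b3 holds, which (by De Morgan's law) is exactly the case in which a1, a2 and a3 do not all hold.

```python
def needs_secondary_search(n1: int, k: int, divisible: bool, approximate: bool) -> bool:
    """Return True if the secondary N-gram search must be run (hypothetical helper).

    b1: n1 <= k          -> the primary N-gram hit count is small
    b2: not divisible    -> the query cannot be split into morpheme-searchable words
    b3: not approximate  -> hit counts N1 and N2 are not approximate
    """
    b1 = n1 <= k
    b2 = not divisible
    b3 = not approximate
    return b1 or b2 or b3


def may_omit_secondary_search(n1: int, k: int, divisible: bool, approximate: bool) -> bool:
    """Conditions a1, a2 and a3 all hold: the logical complement of the predicate above."""
    return n1 > k and divisible and approximate


if __name__ == "__main__":
    # Large N1, divisible query, approximate counts: the secondary search is skipped.
    print(needs_secondary_search(n1=5000, k=1000, divisible=True, approximate=True))
    # Small N1 (condition b1): the secondary search is run.
    print(needs_secondary_search(n1=200, k=1000, divisible=True, approximate=True))
```

In the flowchart of Fig. 2 the checks are evaluated in order (steps S2, S4, S6), so when b1 holds the later checks need not be reached; the predicate only captures the combined outcome.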
In this embodiment, the user interface 11 also has a first search interface and a second search interface (not shown). The first search interface lets the user select the precision of the full-text search, which corresponds to the search mode; in other words, it lets the user choose whether the quick search mode or the standard search mode is applied to the full-text search. The second search interface lets the user specify the tuning parameters used when quick search is executed. The user interface 11 presents the user with a search interface screen that implements these first and second search interfaces.
Fig. 3 shows an example of the search interface screen used to request a full-text search. This screen is one of the search execution screens. Besides a search condition field 31 and an instruction button 32, it includes a search precision selection area 33 and a tuning parameter area 34. The search condition field 31 is used to specify (enter), through user input, a search condition such as a keyword (search string). The instruction button 32 is used by the user to instruct the full-text retrieval system of Fig. 1 to execute the search.
The search precision selection area 33 contains a "Quick" selection button 331 and a "Standard" selection button 332. The "Quick" selection button 331 is used by the user to instruct the full-text retrieval system of Fig. 1 to use the quick search mode, and the "Standard" selection button 332 is used to instruct it to use the standard search mode.
The tuning parameter area 34 contains a hit-count field 341, an approximation ratio field 342 and an adoption condition field 343. The hit-count field 341 is used to specify, through user input, the hit-count threshold K (the reference hit count). The approximation ratio field 342 is used to specify the approximation ratio P (the approximation-degree threshold), and the adoption condition field 343 is used to specify the adoption condition. The hit-count threshold K, the approximation ratio P and the adoption condition are each called a tuning parameter.
When the search interface screen is first displayed (the initial search interface screen), fields 341, 342 and 343 show the default values of the hit-count threshold K, the approximation ratio P and the adoption condition. These default values are set (stored) in advance in the settings file 19. If the user specifies tuning parameters (the hit-count threshold K, the approximation ratio P and the adoption condition) in fields 341, 342 and 343, the specified values take priority; if the user does not, the default values stored in the settings file 19 are used.
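The fallback rule for the tuning parameters can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the field names of `TuningParams` and the function `resolve_params` are hypothetical, and the `defaults` object stands in for values read from the settings file 19, whose format the text does not describe. A field the user leaves empty is modelled as `None`.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TuningParams:
    hit_threshold: int     # K, hit-count field 341
    approx_ratio: float    # P, approximation ratio field 342
    min_word_count: int    # adoption condition, adoption condition field 343


def resolve_params(defaults: TuningParams,
                   hit_threshold: Optional[int] = None,
                   approx_ratio: Optional[float] = None,
                   min_word_count: Optional[int] = None) -> TuningParams:
    """Values the user entered take priority; empty fields (None) keep the defaults."""
    entered = {
        "hit_threshold": hit_threshold,
        "approx_ratio": approx_ratio,
        "min_word_count": min_word_count,
    }
    overrides = {name: value for name, value in entered.items() if value is not None}
    return replace(defaults, **overrides)


if __name__ == "__main__":
    # Illustrative defaults only; real values would come from settings file 19.
    defaults = TuningParams(hit_threshold=1000, approx_ratio=0.2, min_word_count=1)
    print(resolve_params(defaults, approx_ratio=0.1))
```

Testing for `None` rather than truthiness keeps a deliberately entered `0` from being silently replaced by the default.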
The following describes the cases in which the standard search mode, the quick search mode, the hit-count threshold K, the approximation ratio P and the adoption condition are specified with the "Standard" selection button 332, the "Quick" selection button 331, the hit-count field 341, the approximation ratio field 342 and the adoption condition field 343, respectively.
<Standard search mode>
When the user selects the "Standard" selection button 332 and instructs a search, full-text search in the standard search mode (standard search processing) is executed. Here the N-gram index search (both the primary and the secondary search) is executed, so the search result is complete but search speed is lower.
<Quick search mode>
When the user instead selects the "Quick" selection button 331 and instructs a search, full-text search in the quick search mode (quick search processing) is executed according to the flowchart of Fig. 2 described above. As a result, when the result of the primary N-gram search and the result of the morpheme index search are approximate, a fast search can be performed while a certain level of retrieval precision is maintained.
In this way, the "Standard" selection button 332 and the "Quick" selection button 331 on the search interface screen let the user specify either standard search or quick search, and thus reflect whether the user gives priority to retrieval precision or to search speed.
<Hit-count threshold K>
The user specifies the hit-count threshold K in the hit-count field 341. In this case, step S2 determines whether the hit count N1 of the primary N-gram search exceeds the specified threshold K. If N1 exceeds K, the full-text retrieval system proceeds on the basis that one of the conditions for omitting the secondary N-gram search (condition a1) holds. If, on the other hand, N1 does not exceed K, the secondary search is performed on the result of the primary N-gram search. The reason is as follows. When the hit count N1 of the primary N-gram search is small, executing the secondary search has very little adverse effect on the search speed of the full-text retrieval system. The secondary search is therefore executed in this case, and it yields a complete, high-precision search result.
Because the hit-count field 341 on the search interface screen lets the user specify the hit-count threshold K (the reference hit count), the user can tune the quick search processing to the operating environment.
<Approximation ratio P>
The user can specify the approximation ratio P in the approximation ratio field 342. In this case, step S6 determines whether the degree of approximation between the hit count N1 of the primary N-gram search and the hit count N2 of the morpheme index search is less than the specified ratio P, in other words, whether N1 and N2 are approximate. If N1 and N2 are approximate, the full-text retrieval system proceeds on the basis that one of the conditions for omitting the secondary N-gram search (condition a3) holds. If, on the other hand, the degree of approximation exceeds P, that is, if N1 and N2 are not approximate, the secondary search is performed on the result of the primary N-gram search (step S8). In other words, when the result of the primary N-gram search differs greatly from the result of the morpheme index search (morpheme search), the retrieval precision of that primary search and of the morpheme search can be considered poor. In this case, the secondary N-gram search is performed to ensure sufficiently high retrieval precision, even though search speed decreases.
Because the approximation ratio field 342 on the search interface screen lets the user specify the approximation ratio P (the reference degree of approximation), the quick search processing can be tuned to the characteristics of the search query or of the document set being searched.
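This section does not spell out how the degree of approximation between N1 and N2 is computed; it only states that N1 and N2 count as approximate when the degree is below P. The sketch below assumes, purely for illustration, that the degree is the relative difference |N1 - N2| / max(N1, N2), so that 0 means identical hit counts. Both function names are hypothetical.

```python
def approximation_degree(n1: int, n2: int) -> float:
    """Assumed measure: relative difference of the two hit counts (0.0 = identical)."""
    largest = max(n1, n2)
    if largest == 0:
        return 0.0  # neither search found anything: treat the counts as identical
    return abs(n1 - n2) / largest


def is_approximate(n1: int, n2: int, p: float) -> bool:
    """Step S6 (sketch): N1 and N2 are approximate when the degree is below P."""
    return approximation_degree(n1, n2) < p
```

With P = 0.1, for instance, N1 = 1000 and N2 = 950 give a degree of 0.05 and count as approximate, while N1 = 1000 and N2 = 600 give 0.4 and lead to the secondary search.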
<Adoption condition>
When the hit counts N1 and N2 are approximate, an appropriate search result can be obtained by adopting either the result of the primary N-gram search or the result of the morpheme index search. However, when the number of words obtained by morphological analysis of the search query (keyword) is at most a certain word count (the minimum word count), the morpheme index search is more likely to give the more precise result, for the following reason. First, call the number of words into which morphological analysis divides the search query the segmented word count. When the segmented word count is small (for example, one word), the morpheme search can be expected to miss almost nothing. Therefore, when the segmented word count is small, the result of the morpheme search can be considered more precise than the result of the primary N-gram search.
Accordingly, this embodiment introduces a minimum word count as the reference for the segmented word count. The minimum word count expresses the condition (adoption condition) for deciding whether the result of the primary N-gram search or the result of the morpheme index search is adopted; in other words, it determines whether N-gram search or morpheme search takes priority. The user can specify the adoption condition (minimum word count) in the adoption condition field 343.
At step S7, the full-text search execution control unit 18 decides, from the minimum word count given as the adoption condition and the segmented word count, whether to adopt the result of the primary N-gram search or the result of the morpheme search as the search result. If the segmented word count is less than or equal to the minimum word count, the control unit 18 judges that the morpheme search result is more precise than the primary N-gram result; it then gives priority to morpheme search and adopts the morpheme search result as the result returned for the search request. If the segmented word count exceeds the minimum word count, the control unit 18 gives priority to N-gram search and adopts the result of the primary N-gram search as the result returned for the search request.
Because the adoption condition field 343 on the search interface screen lets the user specify the minimum word count as the adoption condition, the quick search processing can be tuned to the search query.
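The adoption rule of step S7 reduces to one comparison. A minimal sketch with hypothetical names (`Adopted`, `choose_result`); `segmented_word_count` is the number of words produced by morphological analysis of the query, and `min_word_count` is the value taken from the adoption condition field 343.

```python
from enum import Enum


class Adopted(Enum):
    MORPHEME = "morpheme search result"
    NGRAM_PRIMARY = "primary N-gram search result"


def choose_result(segmented_word_count: int, min_word_count: int) -> Adopted:
    """Step S7 (sketch): short queries favor the morpheme search, longer ones the N-gram search."""
    if segmented_word_count <= min_word_count:
        return Adopted.MORPHEME
    return Adopted.NGRAM_PRIMARY
```

This function is only reached after conditions a1, a2 and a3 have all been confirmed; otherwise the secondary N-gram search result is used instead.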
Fig. 4 shows an example of a search result screen on which the search execution/response server 12 reports the search result to the user through the user interface 11. This screen is also one of the search execution screens. In addition to the search condition field 31, the instruction button 32, the search precision selection area 33 and the tuning parameter area 34, which are the same as on the search interface screen of Fig. 3, it includes a search precision area 41 and a search result area 42.
As described above, in quick search processing the search result reported to the user is obtained according to the flowchart of Fig. 2 from the results of both the N-gram index search and the morpheme index search, together with the tuning parameters. This search result is reported to the user in the search result area 42 of the search result screen shown in Fig. 4, and it is one of the following three:
(a) the result of the N-gram index search (primary and secondary search)
(b) the result of the morpheme index search
(c) the result of the N-gram index search (primary search only)
In standard search processing, on the other hand, result (a) is always adopted as the search result displayed in the search result area 42.
In quick search processing, which of the results (a), (b) and (c) was adopted can be shown in the search precision area 41, for example by a term that abstractly expresses the "search precision" corresponding to that result. For instance, the terms "Appropriate", "Somewhat rough" and "Rough" can be used for results (a), (b) and (c) respectively.
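Putting the steps of Fig. 2 together, quick search processing can be sketched as below. This is a hypothetical Python outline, not the patent's actual implementation: the four engine callables and their signatures are assumptions, the analyzer is assumed to return `None` when the query cannot be divided into morpheme-searchable words, and the degree of approximation uses an assumed relative-difference measure because this section does not define one. The returned `kind` identifies which of the results (a), (b) or (c) was adopted, and `label` gives the corresponding term for the search precision area 41.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical engine interfaces; the patent text does not define these signatures.
PrimarySearch = Callable[[str], List[int]]               # step S1: query -> candidate doc ids
SecondarySearch = Callable[[str, List[int]], List[int]]  # step S8: query, candidates -> verified ids
Analyzer = Callable[[str], Optional[List[str]]]          # steps S3/S4: words, or None if not divisible
MorphemeSearch = Callable[[List[str]], List[int]]        # step S5: words -> doc ids

# Terms for the search precision area 41, keyed by results (a), (b), (c).
LABELS = {"a": "Appropriate", "b": "Somewhat rough", "c": "Rough"}


@dataclass
class QuickResult:
    docs: List[int]
    kind: str  # "a": primary + secondary N-gram, "b": morpheme, "c": primary N-gram only

    @property
    def label(self) -> str:
        return LABELS[self.kind]


def quick_search(query: str,
                 primary: PrimarySearch,
                 secondary: SecondarySearch,
                 analyze: Analyzer,
                 morpheme: MorphemeSearch,
                 k: int, p: float, min_words: int) -> QuickResult:
    """Sketch of the Fig. 2 flow: skip step S8 only when a1, a2 and a3 all hold."""
    candidates = primary(query)                           # S1: primary N-gram search
    n1 = len(candidates)
    if n1 > k:                                            # S2: condition a1
        words = analyze(query)                            # S3: morphological analysis
        if words is not None:                             # S4: condition a2
            morph_docs = morpheme(words)                  # S5: morpheme search
            n2 = len(morph_docs)
            largest = max(n1, n2)
            degree = abs(n1 - n2) / largest if largest else 0.0  # assumed measure
            if degree < p:                                # S6: condition a3
                if len(words) <= min_words:               # S7: adoption condition
                    return QuickResult(morph_docs, "b")
                return QuickResult(candidates, "c")
    # At least one of b1, b2, b3 holds: step S8 on the primary result.
    return QuickResult(secondary(query, candidates), "a")
```

A toy call with lambdas, for example `quick_search("search", lambda q: list(range(2000)), lambda q, c: c[:1500], lambda q: q.split(), lambda ws: list(range(1900)), k=1000, p=0.1, min_words=1)`, takes the morpheme branch and returns kind `"b"`.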
<First variation>
A first variation of the above embodiment is described below with reference to the flowchart of Fig. 5. In Fig. 5, processing steps equivalent to those in the flowchart of Fig. 2, which shows the quick search processing, are given the same reference signs.
The first variation is characterized in that the processing of steps S1 and S2 (the primary N-gram search) and the processing of steps S3 to S5 (the morpheme index search) are executed in the reverse order from the flowchart of Fig. 2. Here, when the search query cannot be divided into words that can be used for morpheme search (step S4), the processing equivalent to step S1, namely the primary N-gram search (step S11), is executed, followed by the secondary N-gram search (step S8).
Moreover, even if the search query can be divided into words that can be used for morpheme search, the morphological analysis result has low precision when the number of words (the segmented word count) exceeds, for example, a reference word count larger than the minimum word count. In this case, the result of the morpheme index search cannot be relied on to have a certain level of precision. Therefore, a determination of whether the segmented word count is less than or equal to the reference word count can, for example, be added to step S4. If this determination finds that the segmented word count exceeds the reference word count, the primary and secondary N-gram searches (steps S11 and S8) can be executed. Thus, for a search query long enough that the secondary N-gram search would be executed anyway, the comparison of the hit count N1 (step S2) is not performed, which shortens search time. When the segmented word count is less than or equal to the reference word count and conditions a1, a2 and a3 are satisfied, the secondary N-gram search can be omitted.
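The early exit added to step S4 in the first variation can be sketched as a single check. All names are hypothetical: `words` is the analyzer output (`None` when the query cannot be divided into morpheme-searchable words), and `reference_word_count` is assumed to be larger than the minimum word count, as the text requires.

```python
from typing import List, Optional


def skip_to_full_ngram_search(words: Optional[List[str]], reference_word_count: int) -> bool:
    """First variation, extended step S4 (sketch).

    Returns True when the flow should go straight to the primary and secondary
    N-gram searches (steps S11 and S8) without evaluating N1 > K (step S2):
    either the query is not divisible into morpheme-searchable words, or it
    splits into more words than the reference word count, in which case the
    morphological analysis result is considered too unreliable.
    """
    return words is None or len(words) > reference_word_count
```

When the function returns `False`, the remaining checks (a1 and a3, then the adoption condition) proceed as in the main embodiment.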
<Second variation>
A second variation of the above embodiment is described below with reference to the flowchart of Fig. 6. In Fig. 6, processing steps equivalent to those in the flowchart of Fig. 2 are given the same reference signs.
The second variation is characterized in that the processing of steps S1 and S2 (the primary N-gram search) and the processing of steps S3 to S5 (the morpheme index search) are executed in parallel. That is, in the second variation, the search by the primary search execution unit 131 of the N-gram search engine and the search by the morpheme search engine 16 are executed in parallel. Executing both searches in parallel makes the search even faster.
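One way to run the two searches concurrently is a two-worker thread pool from Python's standard `concurrent.futures` module. This is a sketch under assumptions: the two callables stand in for the primary search execution unit 131 and for the morphological analysis plus morpheme search engine 16 (steps S3 to S5), and their signatures are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def run_searches_in_parallel(
        query: str,
        primary_ngram: Callable[[str], List[int]],
        morpheme_search: Callable[[str], List[int]]) -> Tuple[List[int], List[int]]:
    """Submit both searches at once and wait for both results.

    primary_ngram stands in for step S1 (unit 131); morpheme_search stands in
    for steps S3 to S5 (analysis plus engine 16). Both are hypothetical callables.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        ngram_future = pool.submit(primary_ngram, query)
        morpheme_future = pool.submit(morpheme_search, query)
        return ngram_future.result(), morpheme_future.result()
```

Threads only shorten wall-clock time when the engines spend their time outside the Python interpreter lock, for example while waiting on separate search server processes; the returned pair then feeds the checks of steps S2, S4 and S6.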
Other features and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various changes can be made without departing from the spirit and scope of the general inventive concept defined by the appended claims and their equivalents.

Claims (13)

1. A full-text retrieval system comprising:
a first search unit that performs a search based on an N-gram index by means of a primary search according to a search query and a secondary search on the result of the primary search; a morphological analysis unit that performs morphological analysis on the search query; and a second search unit that performs a morpheme search based on a morpheme index according to the morphological analysis result obtained by the morphological analysis unit;
the full-text retrieval system being characterized by comprising:
an approximation-degree determination unit that determines whether a first hit count, which is the hit count of the primary search based on the N-gram index, and a second hit count, which is the hit count of the morpheme search based on the morpheme index, are approximate, by comparing a degree of approximation representing how close the first and second hit counts are with an approximation-degree threshold;
a primary-search result count determination unit that determines whether the first hit count is large or small by comparing the first hit count with a reference hit count; and
a full-text search execution control unit that, when the primary-search result count determination unit determines that the first hit count is large and the approximation-degree determination unit determines that the first and second hit counts are approximate, controls the first search unit so as to omit the secondary search based on the N-gram index and adopts the result of the primary search or the result of the morpheme search as the search result,
wherein the full-text search execution control unit, in either case where the primary-search result count determination unit determines that the first hit count is small or where the approximation-degree determination unit determines that the first and second hit counts are not approximate, controls the first search unit so as to perform the secondary search based on the N-gram index and adopts the result of the secondary search as the search result.
2. The full-text retrieval system according to claim 1, further comprising:
a morphological analysis result determination unit that determines, based on the morphological analysis result obtained by the morphological analysis unit, whether the search query can be divided into words that can be used for morpheme search;
wherein the full-text search execution control unit controls the second search unit so as to perform the morpheme search based on the morpheme index when the morphological analysis result determination unit determines that the search query can be divided into words that can be used for morpheme search.
3. The full-text retrieval system according to claim 1, wherein:
the full-text search execution control unit determines whether to adopt the result of the primary search or the result of the morpheme search as the search result according to a segmented word count, which is the number of words after division indicated by the analysis result of the morphological analysis unit.
4. The full-text retrieval system according to claim 3, wherein:
the full-text search execution control unit adopts the result of the primary search as the search result when the segmented word count exceeds a minimum word count serving as a reference, and adopts the result of the morpheme search as the search result when the segmented word count is less than or equal to the minimum word count.
5. The full-text retrieval system according to claim 4, further comprising:
a user interface through which a user can specify the minimum word count.
6. The full-text retrieval system according to claim 4, wherein:
when the segmented word count exceeds a reference word count larger than the minimum word count, the full-text search execution control unit controls the first search unit so as to perform the secondary search based on the N-gram index, and adopts the result of the secondary search as the search result.
7. The full-text retrieval system according to claim 1, further comprising:
a user interface through which a user can specify the reference hit count.
8. The full-text retrieval system according to claim 1, further comprising:
a user interface through which a user can specify the approximation-degree threshold.
9. The full-text retrieval system according to claim 1, further comprising:
a user interface through which a user can specify either standard search or quick search, wherein the standard search is always performed through to the secondary search based on the N-gram index, and the quick search may omit the secondary search based on the N-gram index depending on the determination result obtained by the approximation-degree determination unit.
10. The full-text retrieval system according to claim 1, wherein:
the full-text search execution control unit controls the first search unit and the second search unit respectively so that the primary search based on the N-gram index and the morpheme search based on the morpheme index are executed in parallel.
11. A full-text search method applied to a system, the system comprising:
a first search unit that performs a search based on an N-gram index by means of a primary search according to a search query and a secondary search on the result of the primary search; a morphological analysis unit that performs morphological analysis on the search query; and a second search unit that performs a morpheme search based on a morpheme index according to the morphological analysis result obtained by the morphological analysis unit;
the full-text search method comprising:
a step of determining whether a first hit count, which is the hit count of the primary search based on the N-gram index, and a second hit count, which is the hit count of the morpheme search based on the morpheme index, are approximate, by comparing a degree of approximation representing how close the first and second hit counts are with an approximation-degree threshold;
a step of determining whether the first hit count is large or small by comparing the first hit count with a reference hit count; a step of, when the first hit count is determined to be large and the first and second hit counts are determined to be approximate, omitting the secondary search based on the N-gram index by the first search unit and adopting the result of the primary search or the result of the morpheme search as the search result; and
a step of, in either case where the first hit count is determined to be small or where the first and second hit counts are determined not to be approximate, causing the first search unit to perform the secondary search based on the N-gram index and adopting the result of the secondary search as the search result.
12. The full-text search method according to claim 11, further comprising:
a step of determining, based on the morphological analysis result obtained by the morphological analysis unit, whether the search query can be divided into words that can be used for morpheme search; and
a step of, when the search query is determined to be divisible into words that can be used for morpheme search, causing the second search unit to perform the morpheme search based on the morpheme index.
13. The full-text search method according to claim 11, wherein:
in the step of adopting the result of the primary search or the result of the morpheme search as the search result, which of the two results to adopt is determined according to a segmented word count, which is the number of words after division indicated by the analysis result of the morphological analysis unit.
CNB2005101080095A 2004-09-29 2005-09-29 Full-text retrieval system and method Active CN100412864C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2004284673A JP4037859B2 (en) 2004-09-29 2004-09-29 Full-text search system and method
JP284673/2004 2004-09-29

Publications (2)

Publication Number Publication Date
CN1755691A CN1755691A (en) 2006-04-05
CN100412864C true CN100412864C (en) 2008-08-20

Family

ID=36239173

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005101080095A Active CN100412864C (en) 2004-09-29 2005-09-29 Full-text retrieval system and method

Country Status (2)

Country Link
JP (1) JP4037859B2 (en)
CN (1) CN100412864C (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100424704C (en) * 2006-09-30 2008-10-08 华中科技大学 Full text search system based on ciphertext
JP5224851B2 (en) 2008-02-27 2013-07-03 インターナショナル・ビジネス・マシーンズ・コーポレーション Search engine, search system, search method and program
CN101350835B (en) * 2008-09-19 2011-12-28 华为终端有限公司 Method and device for selecting user
JP4796108B2 (en) * 2008-09-26 2011-10-19 株式会社東芝 Structured document retrieval apparatus, method and program
JP5178813B2 (en) * 2010-12-16 2013-04-10 ヤフー株式会社 Search system and method
JP7389437B2 (en) 2019-10-29 2023-11-30 国立研究開発法人国立循環器病研究センター Cerebral infarction treatment support system
CN115803730A (en) 2021-06-30 2023-03-14 株式会社英弗麦斯 Search device, search method, and recording medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10307835A (en) * 1997-05-08 1998-11-17 Canon Inc Information processor and its method
JP2000207404A (en) * 1999-01-11 2000-07-28 Sumitomo Metal Ind Ltd Method and device for retrieving document and record medium
CN1281191A (en) * 1999-07-19 2001-01-24 松下电器产业株式会社 Information retrieval method and information retrieval device

Also Published As

Publication number Publication date
CN1755691A (en) 2006-04-05
JP2006099427A (en) 2006-04-13
JP4037859B2 (en) 2008-01-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant