JP2005339150A

JP2005339150A - Document retrieval device

Info

Publication number: JP2005339150A
Application number: JP2004156399A
Authority: JP
Inventors: Takaaki Nakamura; 隆顕中村; Mitsunori Kori; 光則郡
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2004-05-26
Filing date: 2004-05-26
Publication date: 2005-12-08

Abstract

PROBLEM TO BE SOLVED: To provide a document retrieving device capable of shortening the retrieval time with respect to various retrieval conditions.
SOLUTION: When a retrieval instruction for the neighborhood conditions of a plurality of keyword groups is issued, a neighborhood condition deciding part 108 of a retrieval processing executing part 105 acquires the appearing positions of the plurality of keyword groups in a document from a keyword group collating part 107, and decides whether the acquired appearing positions satisfy the instructed predetermined neighborhood conditions. The acquisition processing of the appearing positions and the neighborhood condition decision processing are executed alternately, and when the neighborhood condition deciding part 108 decides that the neighborhood conditions are true, the decision result is output as a retrieval result at that point in time.
COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a document search apparatus that outputs documents matching a predetermined search condition on keyword sets.

Conventional document search apparatuses include, for example, the one shown in Patent Document 1, which reduces wasteful data transfer of the appearance position information of keywords in documents and speeds up compound condition determination. Here, a compound condition means a neighborhood condition, a range condition, an attribute condition, a context condition, or a logical combination of these.

Patent Document 1: JP-A-4-293161

However, although the conventional document search apparatus described in Patent Document 1 makes the data transfer of keyword appearance position information more efficient and speeds up compound condition determination, it can only test the appearance positions of two or more keywords exhaustively when, for example, a neighborhood condition is determined. Consequently, there was a problem that the amount of calculation required for the determination becomes enormous as the number of keyword appearance positions in a document increases.

The present invention has been made to solve the above problem, and its object is to obtain a document search apparatus that can shorten the search time for a variety of search conditions.

The document search apparatus according to the present invention includes a search processing execution unit that alternately executes a process of acquiring the appearance positions of a plurality of keyword sets in a document, in order of appearance position for each keyword set, and a neighborhood condition determination process of determining whether the appearance positions of the plurality of keyword sets satisfy a predetermined neighborhood condition, and that outputs the determination result as a search result at the point when the neighborhood condition is determined to be true.

The document search apparatus of the present invention alternately executes the appearance position acquisition process in the keyword set matching unit and the neighborhood condition determination process in the neighborhood condition determination unit, and outputs the determination result as a search result at the point when the neighborhood condition is determined to be true. A search result can therefore be produced as soon as the neighborhood condition is satisfied, so the search time can be shortened even when the number of keywords or the number of keyword appearance positions in a document is large.

Embodiment 1.
FIG. 1 is a block diagram showing a document search apparatus according to Embodiment 1 of the present invention.
This document search apparatus outputs, from a database storing one or more documents, the documents that satisfy a neighborhood condition specified in a search condition.

In the figure, the document search apparatus comprises a search condition input unit 103, a search condition analysis unit 104, a search processing execution unit 105, a search result output unit 106, a keyword set matching unit 107, a neighborhood condition determination unit 108, a keyword matching unit 109, a database 110, and a document index 111.

The search condition input unit 103 is a functional unit that accepts input of a search condition 101 from a search user. The search condition 101 specifies information on two or more keyword sets to be searched for, the neighborhood condition between those keyword sets, and so on. The search condition analysis unit 104 is a functional unit that analyzes the search condition received from the search condition input unit 103 and generates a search execution plan, for example a predetermined neighborhood condition search. The search processing execution unit 105 is a functional unit that executes search processing according to the search execution plan output by the search condition analysis unit 104. The search result output unit 106 is a functional unit that outputs the search result 102 obtained by the search processing execution unit 105 to the search user.

The search processing execution unit 105 includes the keyword set matching unit 107 and the neighborhood condition determination unit 108. The keyword set matching unit 107 has a function of acquiring the appearance positions, in a document, of the keywords contained in a keyword set specified in the search condition, by repeatedly calling the keyword matching unit 109. The keyword matching unit 109 has a function of acquiring the appearance positions of a keyword in a document while referring to the document index 111 stored in the database 110. The neighborhood condition determination unit 108 has a function of determining the neighborhood condition specified in the search condition, based on the appearance position information of two or more keyword sets in the document output by the keyword set matching unit 107. That is, the search processing execution unit 105 is configured to alternately execute the appearance position acquisition process in the keyword set matching unit 107 and the neighborhood condition determination process in the neighborhood condition determination unit 108, and to output the determination result as a search result at the point when the neighborhood condition is determined to be true.

Note that the document search apparatus described above is realized by a computer, and the search condition analysis unit 104 and the search processing execution unit 105 each consist of software corresponding to their functions and hardware, such as a CPU and memory, for executing that software.

A keyword set is a set containing one or more keywords. A keyword set containing no keywords has no appearance position in a document in the first place, and is therefore not considered here.

When the same character sequence as a keyword k exists in a document D, the keyword k is said to appear in the document D. When the keyword k appears in the document D, the appearance position of k is expressed as the number of characters from the first character of D to the first character of k. Since the keyword k appears zero or more times in the document D, k can also be regarded as the set of its appearance positions in D.

FIG. 2 is an explanatory diagram showing the relationship between a document and keywords.
The set of appearance positions of a keyword k in a document D is denoted k^D. In the example of FIG. 2, when the keyword k = "キーワード" ("keyword"), the appearance positions of k in the document 201 are k^D = {1, 18, 35}, as indicated by 202 in the figure. The number of characters of the keyword k is called the keyword length and is denoted LEN(k). The process of acquiring one appearance position of the keyword k in the document D is denoted STR^D(k), and the position of the last character of k is denoted END^D(k). From these definitions, STR^D(k) and END^D(k) are related as follows.

END^D(k) = STR^D(k) + LEN(k) - 1
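The position notation can be illustrated with a minimal Python sketch. The function names `occurrences` and `end_position` and the sample string `doc` are hypothetical (the text of document 201 in FIG. 2 is not reproduced here), and the sketch assumes 1-based positions with END^D(k) = STR^D(k) + LEN(k) - 1, which is what the definitions above imply.

```python
def occurrences(document: str, keyword: str) -> list[int]:
    """Return k^D: the 1-based start positions (STR) of keyword in document."""
    positions = []
    index = document.find(keyword)
    while index != -1:
        positions.append(index + 1)  # convert 0-based index to 1-based position
        index = document.find(keyword, index + 1)  # also finds overlapping matches
    return positions


def end_position(start: int, keyword: str) -> int:
    """Position of the last character of the keyword: END = STR + LEN - 1."""
    return start + len(keyword) - 1


# Hypothetical sample document, not the text shown in FIG. 2.
doc = "キーワードABC出現DEFキーワード"

print(occurrences(doc, "キーワード"))      # [1, 14]
print(occurrences(doc, "出現"))            # [9]
print(end_position(1, "キーワード"))       # 5
```

A direct scan is used here only for brevity; the description allows an index-based lookup as well.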

When there are two keywords k_1 and k_2 (where STR^D(k_1) < STR^D(k_2)), the distance between them is expressed as the number of characters between the last character of k_1 and the first character of k_2, and is denoted DIST^D(k_1, k_2). From this definition, DIST^D(k_1, k_2) is given as follows.

DIST^D(k_1, k_2) = STR^D(k_2) - END^D(k_1) - 1

In the example of the document 201 in FIG. 2, the distance between the first "キーワード" ("keyword", position 1, length 5) and "出現" ("appearance", position 9) is 9 - 5 - 1 = 3.
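As a small check of the distance definition, the following Python sketch computes the number of characters strictly between two keyword occurrences. The function name `dist` and its argument layout are assumptions made for illustration; the formula is the one implied by the definition (characters between the last character of k_1 and the first character of k_2).

```python
def dist(start1: int, length1: int, start2: int) -> int:
    """Characters strictly between keyword 1 and keyword 2.

    Assumes start1 < start2, as the definition of DIST requires.
    """
    end1 = start1 + length1 - 1  # END of the first keyword
    return start2 - end1 - 1


# FIG. 2 example: "キーワード" starts at 1 with length 5, "出現" starts at 9.
print(dist(1, 5, 9))  # 3
```

Under this formula, two adjacent keywords have distance 0, and overlapping occurrences yield a negative value; the description does not say how overlaps should be treated.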

When a keyword set is denoted S, S = {k_1, k_2, ..., k_n} (n ≥ 1). The appearance positions of the keyword set S in the document D are the appearance positions of the keywords k_i (i = 1 to n) in S. Letting S^D be the set of appearance positions of S in the document, S^D = k_1^D ∪ k_2^D ∪ ... ∪ k_n^D. In the example of FIG. 2, when S = {"キーワード" ("keyword"), "出現" ("appearance")}, S^D = {1, 9, 18, 35, 43}, as indicated by 203 in FIG. 2. The smallest of the appearance positions of a keyword set in the document is called the first appearance position of S. The first appearance position in the document after the most recently acquired appearance position of the keyword set is called the next appearance position of the keyword set. In the example of FIG. 2, the first appearance position of S is 1, and its next appearance position is 9.
The distance between two keyword sets is defined by the distance between the keywords contained in the two keyword sets.
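One way to obtain the "first" and "next" appearance positions of a keyword set lazily is to merge the per-keyword position streams in ascending order. The sketch below is an assumption about how this could be done, not the patent's implementation; `keyword_positions`, `keyword_set_positions`, and the sample string are hypothetical names and data.

```python
import heapq
from typing import Iterator


def keyword_positions(document: str, keyword: str) -> Iterator[int]:
    """Yield the 1-based appearance positions of one keyword, in ascending order."""
    index = document.find(keyword)
    while index != -1:
        yield index + 1
        index = document.find(keyword, index + 1)


def _tagged(document: str, keyword: str) -> Iterator[tuple[int, str]]:
    """Pair each position with the keyword found there."""
    for position in keyword_positions(document, keyword):
        yield position, keyword


def keyword_set_positions(document: str, keywords: list[str]) -> Iterator[tuple[int, str]]:
    """Yield (position, keyword) for the whole set S, smallest position first.

    heapq.merge consumes the sorted per-keyword streams lazily, so each call
    to next() returns the next appearance position of the keyword set.
    """
    return heapq.merge(*(_tagged(document, kw) for kw in keywords))


# Hypothetical sample document, not the text shown in FIG. 2.
doc = "キーワードABC出現DEFキーワード"
stream = keyword_set_positions(doc, ["キーワード", "出現"])
print(next(stream))  # (1, 'キーワード')  first appearance position of S
print(next(stream))  # (9, '出現')        next appearance position of S
```

If two keywords of the set start at the same position, this sketch yields both entries; S^D as a set would contain that position once.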

A neighborhood condition is a search condition whose truth value is determined by the distances between two or more keyword sets.

FIG. 3 is a flowchart of the search processing in the document search apparatus of the present invention.
First, when the search condition 101 is input to the search condition input unit 103 in step ST301, the search condition analysis unit 104 generates a search execution plan in step ST302. Next, the search processing execution unit 105 determines, one document at a time, whether each document stored in the database 110 satisfies the neighborhood condition.

First, in step ST303, it is determined whether one document satisfies the neighborhood condition. If it does (YES), the document is added to the search result (step ST304). The search result may hold only the document identifier, or may hold additional information as well. After the document has been added to the search result, the neighborhood condition determination of step ST303 is performed on the next document. If the document does not satisfy the neighborhood condition in step ST303 (NO), the document is not added to the search result, and the neighborhood condition determination is performed on the next document (step ST303). When the neighborhood condition has been determined once for every document in this way, the search result output unit 106 outputs the search result 102 in step ST305.
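The per-document loop of FIG. 3 can be sketched in Python as follows. The function name `search`, the dictionary-based document store, and the stand-in predicate are assumptions for illustration; the real determination of step ST303 is the neighborhood condition described in the rest of the text.

```python
from typing import Callable


def search(documents: dict[str, str], satisfies: Callable[[str], bool]) -> list[str]:
    """Return the identifiers of all documents for which the condition holds."""
    results = []
    for doc_id, text in documents.items():
        if satisfies(text):          # step ST303: one determination per document
            results.append(doc_id)   # step ST304: keep the identifier only
    return results                   # step ST305: output the search result


# Hypothetical data and a stand-in condition (simple co-occurrence).
docs = {"d1": "digital camera review", "d2": "film camera", "d3": "digital clock"}
hits = search(docs, lambda text: "digital" in text and "camera" in text)
print(hits)  # ['d1']
```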

FIG. 4 is a flowchart of the processing of the search processing execution unit 105 in FIG. 1 (step ST303 in FIG. 3).
Here, the neighborhood condition of keyword sets S_1 to S_n is to be determined. First, the neighborhood condition determination unit 108 acquires, from the keyword set matching unit 107, the first appearance position of each of the keyword sets S_1 to S_n in the document under determination, together with the keywords k_1 to k_n at those positions (step ST401). The information acquired here may instead be the appearance position of the keyword set and the keyword length.

Each time an appearance position of a keyword set is requested, the keyword set matching unit 107 calls the keyword matching unit 109 and outputs, one at a time in order from the beginning of the document, the appearance position of the keyword set together with the keyword at that position or its keyword length.

Next, in step ST402, the neighborhood condition determination unit 108 determines whether the acquired positions of the keyword sets satisfy the neighborhood condition. If the document under determination satisfies the neighborhood condition in step ST402 (YES), the process proceeds to step ST406, outputs "matched", and ends the determination. If the appearance positions of the keyword sets do not satisfy the neighborhood condition in step ST402 (NO), the process proceeds to step ST403 and acquires the next appearance position in the document of the keyword set S_i (i = 1 to n) that did not satisfy the neighborhood condition.

In step ST404, it is determined whether the next appearance position of the keyword set was acquired in step ST403. If it was (YES), the process returns to step ST402 and determines whether the current appearance positions satisfy the neighborhood condition. If the next appearance position could not be acquired in step ST404 (NO), the document under determination contains no appearance positions that satisfy the neighborhood condition, so "not matched" is output in step ST405 and the process ends.

The keyword matching unit 109 outputs the appearance positions of a keyword in a document while referring to the document index 111 recorded in the database 110. The document index 111 recorded in the database 110 may be an index that records character strings and their appearance positions in documents, such as an n-gram index or a suffix array. Alternatively, the keyword matching unit 109 may acquire the appearance positions of a keyword by scanning the document under determination directly. In other words, any means of implementation may be used as long as it can acquire the appearance positions of a keyword in the document under determination.

As described above, Embodiment 1 includes the search processing execution unit 105, which has the keyword set matching unit 107 that acquires the appearance positions of a plurality of keyword sets in a document in order of appearance position for each keyword set, and the neighborhood condition determination unit 108 that determines whether the appearance positions acquired by the keyword set matching unit 107 satisfy a predetermined neighborhood condition. The search processing execution unit 105 alternately executes the appearance position acquisition process in the keyword set matching unit 107 and the neighborhood condition determination process in the neighborhood condition determination unit 108, and outputs the determination result as a search result at the point when the neighborhood condition is determined to be true. Therefore, even when the number of keywords or the number of keyword appearance positions in a document increases, the amount of calculation required for the determination can be kept down, which contributes to shortening the search time in such cases.

For example, even when determining the neighborhood condition between the keyword sets {"デジタル", "digital", "ディジタル"} and {"カメラ", "camera", "写真機"}, which are obtained by expanding the keywords "デジタル" ("digital") and "カメラ" ("camera") into synonyms and spelling variants, the present embodiment can evaluate these keywords directly as keyword sets, and the search time can therefore be shortened.

Embodiment 2.
In Embodiment 2, an order-specified in-neighborhood condition on keyword sets is determined.

Since the configuration of Embodiment 2 in the drawings is the same as that of Embodiment 1 shown in FIG. 1, it is described with reference to FIG. 1. The document search apparatus of Embodiment 2 is configured so that the neighborhood condition determination unit 108 of the document search apparatus of Embodiment 1 can determine an order-specified in-neighborhood condition on keyword sets. The overall flow of the search processing is the same as that shown in FIG. 3, and its description is omitted here.

Here, the order-specified in-neighborhood condition on keyword sets is explained. It is a neighborhood condition that determines whether the keyword sets appear in the specified order and the distance between each pair of consecutive keyword sets is no more than a specified distance. Suppose that keyword sets S_1 to S_n are specified in this order, and that a distance d between keyword sets is specified. The condition is then true when keywords k_1 to k_n, with each k_i belonging to S_i, appear in the document in this order and satisfy DIST^D(k_i, k_i+1) ≤ d for every i = 1 to n-1.

FIG. 5 is a flowchart of the processing of the search processing execution unit 105 in FIG. 1 (step ST303 in FIG. 3) in Embodiment 2.
When the search condition analysis unit 104 gives the search processing execution unit 105 the keyword sets S_1 to S_n, the distance d between keyword sets, and an instruction to determine the order-specified in-neighborhood condition, the neighborhood condition determination unit 108 acquires from the keyword set matching unit 107, in step ST501, the first appearance position of each of the keyword sets S_1 to S_n in the document under determination, together with the keywords k_1 to k_n at those positions. Next, in step ST502, the neighborhood condition determination unit 108 determines whether the acquired appearance positions of the keyword sets satisfy the order-specified in-neighborhood condition. That is, it determines whether DIST^D(k_i, k_i+1) ≤ d holds for every i = 1 to n-1.

If, in step ST502, the acquired appearance positions of the keyword sets satisfy the order-specified in-neighborhood condition for the document under determination (YES), the process proceeds to step ST506, outputs "matched", and ends. If the condition is not satisfied in step ST502 (NO), the process proceeds to step ST503. In step ST503, for the pair k_i, k_i+1 with the smallest i among those that did not satisfy DIST^D(k_i, k_i+1) ≤ d, it is determined which of the two keywords appears earlier in the document.

If, in step ST503, k_i appears earlier than k_i+1 in the document under determination (YES), the next appearance position of S_i and the keyword k_i at that position are acquired in step ST504. If k_i+1 appears earlier than k_i (NO), the process proceeds to step ST505 and acquires the next appearance position of S_i+1 and the keyword k_i+1 at that position. Once the next appearance position of the keyword set and the keyword at that position have been acquired in step ST504 or step ST505, the process returns to step ST502 and determines whether the current positions of the keyword sets satisfy the order-specified in-neighborhood condition.

Although omitted from the flowchart of FIG. 5, if an appearance position of a keyword set cannot be acquired in step ST501, step ST504, or step ST505, the document under determination contains no appearance positions that satisfy the order-specified in-neighborhood condition, so "not matched" is output and the process ends.
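The steps of FIG. 5 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: each keyword set is given as an already sorted stream of `(start, length)` pairs (both the representation and the names `Occurrence` and `ordered_within` are hypothetical), ties in start position are sent to the ST505 branch, and the distance uses DIST = STR(k_i+1) - END(k_i) - 1.

```python
from typing import Iterable

Occurrence = tuple[int, int]  # (1-based start position, keyword length)


def ordered_within(streams: list[Iterable[Occurrence]], d: int) -> bool:
    """Return True if S_1..S_n appear in order with consecutive distances <= d."""
    iterators = [iter(stream) for stream in streams]

    # ST501: first appearance position of every keyword set.
    current = []
    for iterator in iterators:
        first = next(iterator, None)
        if first is None:
            return False  # a set never appears: "not matched"
        current.append(first)

    while True:
        # ST502: find the smallest i whose pair (k_i, k_i+1) fails the condition.
        failing = None
        for i in range(len(current) - 1):
            (start1, length1), (start2, _) = current[i], current[i + 1]
            distance = start2 - (start1 + length1 - 1) - 1
            if not (start1 < start2 and distance <= d):
                failing = i
                break
        if failing is None:
            return True  # ST506: "matched"

        # ST503: advance whichever keyword of the failing pair appears earlier.
        if current[failing][0] < current[failing + 1][0]:
            advance = failing        # ST504: next appearance of S_i
        else:
            advance = failing + 1    # ST505: next appearance of S_i+1
        following = next(iterators[advance], None)
        if following is None:
            return False  # no further appearance: "not matched"
        current[advance] = following


# Hypothetical positions for three keyword sets.
s1 = [(5, 4), (27, 4), (50, 4)]
s2 = [(2, 3), (33, 3), (60, 3)]
s3 = [(10, 2), (38, 2), (70, 2)]
print(ordered_within([s1, s2, s3], d=5))  # True
print(ordered_within([s1, s2, s3], d=1))  # False
```

With d = 5 the loop advances S_2, then S_1, then S_3, and matches on the fourth check at (27, 33, 38). Each iteration consumes one appearance position, so the number of iterations is bounded by the total number of positions across all sets.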

FIG. 6 shows the path taken when the order-specified in-neighborhood condition on keyword sets is determined by the processing flow of Embodiment 2 shown in FIG. 5.
Suppose that keyword sets S_1, S_2, and S_3 are given in this order as the search condition, and that the specified distance is d. Let the appearance positions of each keyword set in the document D under determination be S_1^D = {p_1, p_2, p_3}, S_2^D = {p_4, p_5, p_6}, and S_3^D = {p_7, p_8, p_9}, and let the relationship between these positions be as shown in FIG. 6. Each of the keyword sets S_1, S_2, and S_3 contains a plurality of keywords; for example, S_1 is {"デジタル", "digital", "ディジタル"} and S_2 is {"カメラ", "camera", "写真機"}.

When such a search condition is given, the determination proceeds as follows. Steps (1) to (4) below correspond to (1) to (4) in FIG. 6.

(1) In step ST501 of FIG. 5, the first appearance positions p_1, p_4, and p_7 of the keyword sets S_1 to S_3 in the document D are acquired. Next, step ST502 determines whether these appearance positions satisfy the order-specified in-neighborhood condition. In the case of FIG. 6, the keyword set S_2 appears before S_1, so the condition is not satisfied, and the process proceeds to step ST503. Step ST503 compares the appearance positions of S_1 and S_2 in the document D, and since S_2 appears earlier, the next appearance position p_5 of S_2 is acquired in step ST505.

(2) Next, step ST502 determines the order-specified in-neighborhood condition for the appearance positions p_1 and p_7 of S_1 and S_3 and the position p_5 acquired in step ST505. This time, the distance between S_1 and S_2 is greater than d, so the condition is not satisfied, and the process proceeds to step ST503. Since S_1 appears before S_2 in the document D in step ST503, the next appearance position p_2 of S_1 is acquired in step ST504.

(3) In step ST502, the order-specified in-neighborhood condition is determined for the appearance positions p_2, p_5, and p_7. This time, S_3 appears before S_2 in the document D, so the condition is not satisfied, and the next appearance position p_8 of S_3 is acquired in step ST505.

(4) In step ST502, the order-specified in-neighborhood condition is determined for the appearance positions p_2, p_5, and p_8 (the source text lists p_7 here, but p_8 was acquired for S_3 in the preceding step). Since the condition is satisfied, "matched" is output in step ST506.

The number of appearance positions of a keyword in a document can be regarded as proportional to the number of characters N in the document. That is, the number of appearance positions of a keyword in a document can be written in order notation as O(N) (order N). Let M be the number of keywords contained in a keyword set. Now consider determining an order-specified in-neighborhood condition among K keyword sets.

In the conventional method described in Patent Document 1, the order-specified in-neighborhood condition is determined by exhaustively testing the keyword appearance positions. The amount of calculation required to determine the condition among K keywords was therefore O(N^K). In addition, a determination between keyword sets required the keyword sets to be expanded and the condition to be determined for each combination of keywords individually, so the number of combinations is M^K. Accordingly, the amount of calculation required to determine the order-specified in-neighborhood condition among K keyword sets is O((MN)^K).
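For contrast, an exhaustive evaluation of the kind attributed to the conventional method can be sketched as follows. This is only an illustration of testing every combination, not the actual procedure of Patent Document 1; the name `ordered_within_bruteforce` and the nested-list input layout are assumptions.

```python
from itertools import product

Occurrence = tuple[int, int]  # (1-based start position, keyword length)


def ordered_within_bruteforce(sets: list[list[list[Occurrence]]], d: int) -> bool:
    """sets[i][j] lists the occurrences of keyword j of keyword set i."""
    # Expand the sets: choose one keyword per set (up to M^K choices) ...
    for chosen_keywords in product(*sets):
        # ... then one occurrence per chosen keyword (up to N^K choices).
        for combo in product(*chosen_keywords):
            if all(
                start1 < start2 and start2 - (start1 + length1 - 1) - 1 <= d
                for (start1, length1), (start2, _) in zip(combo, combo[1:])
            ):
                return True
    return False


# Hypothetical data: two keyword sets with two keywords each.
set1 = [[(5, 4), (50, 4)], [(27, 4)]]
set2 = [[(2, 3), (33, 3)], [(60, 3)]]
print(ordered_within_bruteforce([set1, set2], d=5))  # True
print(ordered_within_bruteforce([set1, set2], d=1))  # False
```

The two nested products are what make the cost grow as (MN)^K in the worst case.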

In contrast, consider the determination of the order-specified in-neighborhood condition in Embodiment 2. The number of appearance positions of a keyword set in a document can be regarded as proportional to the total number of characters N in the document and the number of keywords M in the keyword set, that is, O(MN). With the determination method of the present embodiment, the order-specified in-neighborhood condition on keyword sets can be determined in at most as many iterations as the total number of appearance positions of all the keyword sets. Accordingly, the amount of calculation required to determine the order-specified in-neighborhood condition among K keyword sets is O(KMN).

The document search apparatus of Embodiment 2 can thus evaluate the ordered in-neighborhood condition between keyword sets faster than the conventional method.

The method shown here acquires the appearance positions of the keyword sets in order from the beginning of the document to evaluate the ordered in-neighborhood condition, but the same effect is obtained when the condition is evaluated in order from the end of the document toward the beginning.

As described above, in Embodiment 2 the neighborhood condition evaluated by the neighborhood condition determination unit 108 is the ordered in-neighborhood condition, under which, when all keyword sets appear in the document, the distance between consecutive keyword sets is at most the specified distance. This reduces the amount of calculation needed to evaluate the ordered in-neighborhood condition compared with the conventional method and shortens the search time of such search processing.

Embodiment 3.
Embodiment 3 evaluates the unordered in-neighborhood condition for keyword sets.

The configuration of Embodiment 3 as drawn is the same as that of Embodiment 1 shown in FIG. 1, so FIG. 1 is used for the description. The document search apparatus of Embodiment 3 is the apparatus of Embodiment 1 with the neighborhood condition determination unit 108 configured to evaluate the unordered in-neighborhood condition for keyword sets. The overall flow of the search processing is equivalent to that shown in FIG. 3, so its description is omitted here.

The unordered in-neighborhood condition for keyword sets is explained first. It is a neighborhood condition that determines whether, when all keyword sets appear in the document, the distance between consecutive keyword sets is at most the specified distance. Given keyword sets S_1 to S_n and a distance d between keyword sets, the unordered in-neighborhood condition is evaluated as follows.

FIG. 7 is a flowchart of the processing of the search processing execution unit 105 in FIG. 1 (step ST303 in FIG. 3) in Embodiment 3 when there are two keyword sets.
When keyword sets S_1 and S_2 and the distance d between keyword sets are given to the search processing execution unit 105, the neighborhood condition determination unit 108 acquires from the keyword set collating unit 107, in step ST701, the first appearance positions of S_1 and S_2 in the document under evaluation together with the keywords k_1 and k_2 at those positions. Next, in step ST702, the neighborhood condition determination unit 108 determines whether the acquired appearance positions satisfy the unordered in-neighborhood condition, that is, whether DIST^D(k_1, k_2) ≤ d or DIST^D(k_2, k_1) ≤ d holds.

If, in step ST702, the acquired appearance positions satisfy the unordered in-neighborhood condition for the document under evaluation (YES), the process moves to step ST706, outputs "matched", and ends. If the condition is not satisfied in step ST702 (NO), the process moves to step ST703, which determines which of the keywords k_1 and k_2 appears earlier in the document. If k_1 appears before k_2 in the document under evaluation (YES), step ST704 acquires the next appearance position of S_1 and the keyword k_1 at that position.

If, in step ST703, k_2 appears before k_1 in the document under evaluation (NO), step ST705 acquires the next appearance position of S_2 and the keyword k_2 at that position. Once step ST704 or step ST705 has acquired the next appearance position of a keyword set and the keyword at that position, the process returns to step ST702 and determines whether the acquired positions satisfy the unordered in-neighborhood condition.

Although omitted from the flowchart of FIG. 7, if an appearance position of a keyword set cannot be acquired in step ST701, step ST704, or step ST705, the document under evaluation contains no appearance positions that satisfy the unordered in-neighborhood condition, so "not matched" is output and the process ends.
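The two-set flow of FIG. 7 can be sketched as a two-pointer scan. This is a minimal illustration rather than the patented implementation: the function name, the representation of each keyword set as an ascending list of integer positions, and the reading of DIST^D as the absolute difference of positions are all assumptions made for the example.

```python
from typing import Sequence


def within_unordered_two(s1: Sequence[int], s2: Sequence[int], d: int) -> bool:
    """Return True if some occurrence of S_1 and some occurrence of S_2 are within d.

    s1 and s2 are assumed to be ascending occurrence positions of each keyword set.
    """
    i, j = 0, 0
    # ST701: start from the first occurrence of each set. Leaving the loop
    # means an occurrence could not be fetched, i.e. "not matched".
    while i < len(s1) and j < len(s2):
        p1, p2 = s1[i], s2[j]
        # ST702: assumed DIST is the absolute difference of the two positions.
        if abs(p1 - p2) <= d:
            return True  # ST706: "matched"
        # ST703: advance whichever set currently appears earlier.
        if p1 < p2:
            i += 1  # ST704: next occurrence of S_1
        else:
            j += 1  # ST705: next occurrence of S_2
    return False


# Hypothetical positions chosen so the scan takes the same path as the
# FIG. 8 walk-through: advance S_2, then advance S_1, then match.
print(within_unordered_two([20, 42], [5, 40, 90], 3))
```

Each iteration either reports a match or discards one occurrence for good, so the number of iterations is bounded by the total number of occurrences of the two sets.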

FIG. 8 shows the path taken when the unordered in-neighborhood condition for two keyword sets is evaluated by the processing flow of FIG. 7 in Embodiment 3.
Assume that keyword sets S_1 and S_2 and a distance d are given as the search condition. Let the appearance positions of each keyword set in the document D under evaluation be S_1^D = {p_1, p_2} and S_2^D = {p_3, p_4, p_5}, positioned relative to one another as shown in FIG. 8. Given this search condition, the evaluation proceeds as follows. Steps (1) to (3) below correspond to (1) to (3) in FIG. 8.

(1) Step ST701 of FIG. 7 acquires the first appearance positions p_1 and p_3 of S_1 and S_2 in document D. Step ST702 then determines whether these positions satisfy the unordered in-neighborhood condition. In the case of FIG. 8, the distance between keyword sets S_2 and S_1 is greater than d, so the condition is not satisfied and the process moves to step ST703. Step ST703 compares the appearance positions of S_1 and S_2 in document D; because S_2 appears first, step ST705 acquires the next appearance position p_4 of keyword set S_2.

(2) Step ST702 now evaluates the unordered in-neighborhood condition for the appearance position p_1 of S_1 and the position p_4 acquired in step ST705. The distance between S_1 and S_2 is again greater than d, so the condition is not satisfied and the process moves to step ST703. Because S_1 now appears in document D before S_2, step ST704 acquires the next appearance position p_2 of S_1.

(3) Step ST702 evaluates the unordered in-neighborhood condition for the appearance positions p_2 and p_4. The condition is satisfied, so the process moves to step ST706 and outputs "matched".

Next, the search processing for three or more keyword sets is described.
FIG. 9 is a flowchart of the processing of the search processing execution unit 105 (step ST303 in FIG. 3) for three or more keyword sets in Embodiment 3.
When keyword sets S_1 to S_n and the distance d between keyword sets are given, the search processing execution unit 105 chooses one ordering of the keyword sets S_1 to S_n in step ST901. The first ordering is arbitrary. Next, in step ST902, the neighborhood condition determination unit 108 evaluates the ordered in-neighborhood condition for the ordering chosen in step ST901. This evaluation is carried out in the same way as in Embodiment 2.

If, in step ST902, the document under evaluation satisfies the ordered in-neighborhood condition for the ordering chosen in step ST901 (YES), the process moves to step ST905 and outputs "matched". If the condition is not satisfied in step ST902 (NO), step ST903 determines whether any ordering of the keyword sets has not yet been evaluated in step ST902. If such an ordering remains (YES), step ST901 chooses one of the orderings not yet evaluated and step ST902 evaluates the ordered in-neighborhood condition for it. If step ST903 finds that the ordered in-neighborhood condition has been evaluated for every ordering of the keyword sets (NO), step ST904 outputs "not matched" and the process ends.

During this evaluation of the unordered in-neighborhood condition, the neighborhood condition determination unit 108 may store the appearance positions acquired from the keyword set collating unit 107 in a memory (not shown). The keyword set collating unit 107 then does not need to be called again when an appearance position that has already been acquired is needed a second time.
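The loop of FIG. 9 together with the position cache can be sketched as follows. The ordered check used here is a simple binary-search sweep written for this example, not the Embodiment 2 procedure itself; the function names, the list-of-positions input format, and the reading of DIST^D as the difference between a later and an earlier position are assumptions.

```python
from bisect import bisect_left
from itertools import permutations
from typing import List, Sequence


def within_ordered(sets: Sequence[Sequence[int]], d: int) -> bool:
    """Ordered check: positions p_1 < p_2 < ... exist with every gap <= d.

    Each element of `sets` is assumed to be an ascending list of positions.
    """
    reachable: List[int] = list(sets[0])  # positions that can end a valid chain so far
    for positions in sets[1:]:
        extended = []
        for q in positions:
            k = bisect_left(reachable, q)  # reachable[:k] are strictly before q
            # The closest earlier chain end is the best candidate for q.
            if k > 0 and q - reachable[k - 1] <= d:
                extended.append(q)
        reachable = extended
        if not reachable:
            return False
    return bool(reachable)


def within_unordered(sets: Sequence[Sequence[int]], d: int) -> bool:
    """Try every ordering of the keyword sets (ST901 to ST903)."""
    cache = [sorted(s) for s in sets]  # positions fetched once and reused
    for order in permutations(range(len(cache))):
        if within_ordered([cache[i] for i in order], d):  # ST902
            return True  # ST905: "matched"
    return False  # ST904: "not matched"


# Hypothetical positions: only the ordering S_3, S_1, S_2 (5 -> 10 -> 12) matches.
print(within_unordered([[10, 40], [12], [5, 44]], 5))
```

Keeping the positions in `cache` plays the role of the memory mentioned above: each ordering re-reads the stored lists instead of querying the matcher again.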

Let N be the number of characters in the document and M the number of keywords contained in a keyword set. First consider evaluating the unordered in-neighborhood condition between two keyword sets. With the conventional method of Patent Document 1, which evaluates the condition by brute force over keyword appearance positions, the cost for two keyword sets is O((MN)^2). With the determination of Embodiment 3, the cost of evaluating the unordered in-neighborhood condition between two keyword sets is O(2MN).

Next consider evaluating the unordered in-neighborhood condition across K keyword sets. Here, let N denote the number of appearance positions of a single keyword set in the document. With the conventional method of Patent Document 1, the cost of evaluating the unordered in-neighborhood condition across K keyword sets is O(N^K).

The ordered in-neighborhood determination of Embodiment 2 costs O(KN). For the unordered in-neighborhood condition across K keyword sets, the ordered in-neighborhood condition is evaluated while the ordering of the keyword sets is varied. There are K! (K factorial) orderings of K keyword sets, so the cost of evaluating the unordered in-neighborhood condition across K keyword sets is O(K!·KN). This is smaller than the cost of the conventional method if K is smaller than N.
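The two bounds can be compared numerically with a short sketch. It treats the O-expressions as exact step counts, which they are not, and the helper names and sample values are assumptions made for illustration.

```python
from math import factorial


def brute_force_steps(n: int, k: int) -> int:
    """N^K: one combination per choice of a position from each of K sets."""
    return n ** k


def permutation_steps(n: int, k: int) -> int:
    """K! * K * N: one O(KN) ordered pass for each ordering of the K sets."""
    return factorial(k) * k * n


# Hypothetical sizes: 1000 occurrences per keyword set, 2 to 5 keyword sets.
for n, k in [(1000, 2), (1000, 3), (1000, 5)]:
    print(n, k, brute_force_steps(n, k), permutation_steps(n, k))
```

With N = 1000 and K = 3 the counts are 10^9 against 18,000. For very small N the raw products can go the other way (K = 2, N = 3 gives 9 against 12), so the comparison is meaningful for documents with many occurrences, which is the case the text has in mind.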

As described above, in Embodiment 3 the neighborhood condition evaluated by the neighborhood condition determination unit 108 is the unordered in-neighborhood condition, under which, when all keyword sets appear in the document, the distance between consecutive keyword sets is at most the specified distance. This reduces the amount of calculation needed to evaluate the unordered in-neighborhood condition compared with the conventional method and shortens the search time of such search processing.

Embodiment 4.
Embodiment 4 evaluates the ordered equal-neighborhood condition for keyword sets.

The configuration of Embodiment 4 as drawn is the same as that of Embodiment 1 shown in FIG. 1, so FIG. 1 is used for the description. The document search apparatus of Embodiment 4 is the apparatus of Embodiment 1 with the neighborhood condition determination unit 108 configured to evaluate the ordered equal-neighborhood condition for keyword sets. The overall flow of the search processing is equivalent to that shown in FIG. 3, so its description is omitted here.

The ordered equal-neighborhood condition for keyword sets is explained first. It is a neighborhood condition that determines whether the keyword sets appear in the specified order and the distance between consecutive keyword sets is exactly the specified distance. Assume that keyword sets S_1 to S_n are specified in this order, together with a distance d between keyword sets. The condition is then evaluated as follows.

FIG. 10 is a flowchart of the processing of the search processing execution unit 105 (step ST303 in FIG. 3) in Embodiment 4.
When keyword sets S_1 to S_n and the distance d between keyword sets are given to the search processing execution unit 105, the neighborhood condition determination unit 108 acquires from the keyword set collating unit 107, in step ST1001, the first appearance positions of S_1 to S_n in the document under evaluation together with the keywords k_1 to k_n at those positions. Next, in step ST1002, the neighborhood condition determination unit 108 determines whether the acquired appearance positions satisfy the ordered equal-neighborhood condition, that is, whether DIST^D(k_i, k_{i+1}) = d holds for every i = 1 to n-1.

If, in step ST1002, the acquired appearance positions satisfy the ordered equal-neighborhood condition for the document under evaluation (YES), the process moves to step ST1003, outputs "matched", and ends. If they do not satisfy the condition (NO), the process moves to step ST1004.

Step ST1004 takes, among the pairs k_i, k_{i+1} that failed DIST^D(k_i, k_{i+1}) = d, the pair with the smallest i and determines which of the two keyword sets appears earlier in the document. If S_i appears earlier (YES), step ST1005 determines whether DIST^D(k_i, k_{i+1}) is less than d. If it is less than d (YES), step ST1007 acquires the next appearance position of keyword set S_{i+1} and the keyword k_{i+1} at that position. If it is greater than d (NO), step ST1006 acquires the next appearance position of keyword set S_i in the document and the keyword k_i at that position.

If, in step ST1004, S_{i+1} appears earlier in the document (NO), step ST1007 acquires the next appearance position of keyword set S_{i+1} and the keyword k_{i+1} at that position. Once step ST1006 or step ST1007 has acquired the next appearance position of a keyword set and the keyword at that position, the process returns to step ST1002 and determines whether the acquired positions satisfy the ordered equal-neighborhood condition.

Although omitted from the flowchart of FIG. 10, if an appearance position of a keyword set cannot be acquired in step ST1001, step ST1006, or step ST1007, the document under evaluation contains no appearance positions that satisfy the ordered equal-neighborhood condition, so "not matched" is output and the process ends.
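The flow of FIG. 10 can be sketched with one cursor per keyword set. This is an illustrative reading of the flowchart, not the patented code: the function name, the ascending integer position lists, the reading of DIST^D(k_i, k_{i+1}) as the position of k_{i+1} minus the position of k_i, and the requirement d >= 1 are assumptions.

```python
from typing import Sequence


def equal_ordered(sets: Sequence[Sequence[int]], d: int) -> bool:
    """Return True if positions p_1, ..., p_n exist with p_{i+1} - p_i == d for all i.

    Each element of `sets` is assumed to be an ascending list of positions, d >= 1.
    """
    # ST1001: start at the first occurrence of every set.
    if any(len(s) == 0 for s in sets):
        return False
    cursor = [0] * len(sets)
    while True:
        pos = [s[c] for s, c in zip(sets, cursor)]
        # ST1002: look for the smallest i whose pair does not have gap d.
        bad = next((i for i in range(len(pos) - 1) if pos[i + 1] - pos[i] != d), None)
        if bad is None:
            return True  # ST1003: "matched"
        gap = pos[bad + 1] - pos[bad]
        if gap > d:
            # ST1004 YES, ST1005 NO: S_i is first but too far back -> ST1006.
            move = bad
        else:
            # ST1004 NO (S_{i+1} is first), or ST1005 YES (gap < d) -> ST1007.
            move = bad + 1
        cursor[move] += 1
        if cursor[move] >= len(sets[move]):
            return False  # next occurrence unavailable: "not matched"


# Hypothetical positions laid out to follow the five steps of the FIG. 11 example.
print(equal_ordered([[3, 15, 40], [1, 20, 50], [10, 22, 25]], 5))
```

Every iteration moves exactly one cursor forward, so the loop runs at most as many times as there are occurrences in total, in line with the O(KMN) bound given for this embodiment.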

FIG. 11 shows the path taken when the ordered equal-neighborhood condition for keyword sets is evaluated by the processing flow of FIG. 10 in Embodiment 4.
Assume that keyword sets S_1, S_2, S_3 are given in this order as the search condition and that the specified distance is d. Let the appearance positions of each keyword set in the document D under evaluation be S_1^D = {p_1, p_2, p_3}, S_2^D = {p_4, p_5, p_6}, and S_3^D = {p_7, p_8, p_9}, positioned relative to one another as shown in FIG. 11. Given this search condition, the evaluation proceeds as follows. Steps (1) to (5) below correspond to (1) to (5) in FIG. 11.

(1) Step ST1001 of FIG. 10 acquires the first appearance positions p_1, p_4, p_7 of S_1 to S_3 in document D. Step ST1002 then determines whether these positions satisfy the ordered equal-neighborhood condition. In the case of FIG. 11, keyword set S_2 appears before S_1, so the condition is not satisfied and the process moves to step ST1004. Step ST1004 compares the appearance positions of S_1 and S_2 in document D; because S_2 appears first, step ST1007 acquires the next appearance position p_5 of keyword set S_2.

(2) Step ST1002 evaluates the ordered equal-neighborhood condition for the appearance positions p_1, p_5, p_7. This time the distance between S_1 and S_2 is greater than d, so the condition is not satisfied and the process moves to step ST1004. Because S_1 appears in document D before S_2, step ST1005 determines whether the distance between S_1 and S_2 is less than d. In the case of FIG. 11 the distance is greater than d, so step ST1006 acquires the next appearance position p_2 of S_1.

(3) Step ST1002 evaluates the ordered equal-neighborhood condition for the appearance positions p_2, p_5, p_7. This time S_3 appears in document D before S_2, so the condition is not satisfied and step ST1007 acquires the next appearance position p_8 of S_3.

(4) Step ST1002 evaluates the ordered equal-neighborhood condition for the appearance positions p_2, p_5, p_8. The distance between S_2 and S_3 is less than d, so the condition is not satisfied and the process moves to step ST1004. Because S_2 appears in the document before S_3, step ST1005 determines whether the distance between S_2 and S_3 is less than d. It is less than d, so step ST1007 acquires the next appearance position p_9 of S_3.

(5) Step ST1002 evaluates the ordered equal-neighborhood condition for the appearance positions p_2, p_5, p_9. The condition is satisfied, so step ST1003 outputs "matched" and the process ends.

Let N be the number of characters in the document and M the number of keywords contained in a keyword set. With the conventional method described in Patent Document 1, the cost of evaluating the ordered equal-neighborhood condition across K keyword sets is O((MN)^K). With the determination of Embodiment 4, the cost of evaluating the ordered equal-neighborhood condition across K keyword sets is O(KMN), as in Embodiment 2.

The document search apparatus of Embodiment 4 can thus evaluate the ordered equal-neighborhood condition between keyword sets faster than the conventional method.

The method shown here acquires the appearance positions of the keyword sets in order from the beginning of the document to evaluate the ordered equal-neighborhood condition, but the same effect is obtained when the condition is evaluated in order from the end of the document toward the beginning.

As described above, in Embodiment 4 the neighborhood condition evaluated by the neighborhood condition determination unit 108 is the ordered equal-neighborhood condition, under which the keyword sets appear in the specified order and the distance between consecutive keyword sets equals the specified distance. This reduces the amount of calculation needed to evaluate the ordered equal-neighborhood condition compared with the conventional method and shortens the search time of such search processing.

Embodiment 5.
Embodiment 5 evaluates the unordered equal-neighborhood condition for keyword sets.

The configuration of Embodiment 5 as drawn is the same as that of Embodiment 1 shown in FIG. 1, so FIG. 1 is used for the description. The document search apparatus of Embodiment 5 is the apparatus of Embodiment 1 with the neighborhood condition determination unit 108 configured to evaluate the unordered equal-neighborhood condition for keyword sets. The overall flow of the search processing is equivalent to that shown in FIG. 3, so its description is omitted here.

The unordered equal-neighborhood condition for keyword sets is explained first. It is a neighborhood condition that determines whether, when all keyword sets appear in the document, the distance between consecutive keyword sets is exactly the specified distance. Given keyword sets S_1 to S_n and a distance d between keyword sets, the unordered equal-neighborhood condition is evaluated as follows.

The processing of the search processing execution unit 105 (step ST303 in FIG. 3) in Embodiment 5 is equivalent to the flow of FIG. 9 with step ST902 configured to evaluate the ordered equal-neighborhood condition between keyword sets, so its description is omitted here.
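A compact sketch of this combination is given below. For brevity the ordered check is a simple set-lookup variant written for this example rather than the cursor procedure of FIG. 10; the function names, the integer position lists, and the reading of DIST^D as a position difference are assumptions.

```python
from itertools import permutations
from typing import Sequence


def equal_ordered_simple(sets: Sequence[Sequence[int]], d: int) -> bool:
    """Ordered check: some p in the first set has p + i*d in the (i+1)-th set for all i."""
    lookups = [set(s) for s in sets[1:]]
    return any(
        all(p + (i + 1) * d in lookup for i, lookup in enumerate(lookups))
        for p in sets[0]
    )


def equal_unordered(sets: Sequence[Sequence[int]], d: int) -> bool:
    """FIG. 9 loop with the ordered equal-distance check plugged into ST902."""
    return any(
        equal_ordered_simple([sets[i] for i in order], d)
        for order in permutations(range(len(sets)))
    )


# Hypothetical positions: only the ordering S_2, S_3, S_1 (10 -> 15 -> 20) matches.
print(equal_unordered([[20], [10, 99], [15]], 5))
```

Only the check inside the loop differs from the unordered in-neighborhood case; the enumeration of orderings is unchanged.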

Let N be the number of appearance positions of a single keyword set in the document. With the conventional method described in Patent Document 1, the cost of evaluating the unordered equal-neighborhood condition across K keyword sets is O(N^K). With this embodiment, the cost of evaluating the unordered equal-neighborhood condition across K keyword sets is O(K!·KN), as in Embodiment 3. This is smaller than the cost of the conventional method if K is smaller than N.

As described above, in Embodiment 5 the neighborhood condition evaluated by the neighborhood condition determination unit 108 is the unordered equal-neighborhood condition, under which, when all keyword sets appear in the document, the distance between consecutive keyword sets equals the specified distance. This reduces the amount of calculation needed to evaluate the unordered equal-neighborhood condition compared with the conventional method and shortens the search time of such search processing.

Embodiment 6.
In the sixth embodiment, the order-specified out-of-neighborhood condition on keyword sets is determined.

Since the configuration of the sixth embodiment on the drawing is the same as that of the first embodiment shown in FIG. 1, the description refers to FIG. 1. The document search apparatus of the sixth embodiment is the document search apparatus of the first embodiment configured so that the neighborhood condition determination unit 108 can determine the order-specified out-of-neighborhood condition on keyword sets. The overall flow of the search processing is the same as that shown in FIG. 3, so its description is omitted here.

Here, the order-specified out-of-neighborhood condition on keyword sets will be described. It is a neighborhood condition that determines whether the keyword sets appear in the specified order and the distance between adjacent keyword sets is equal to or greater than the specified distance. Suppose that keyword sets S_1 to S_n are specified in this order and that a distance d between the keyword sets is also specified. The condition is then determined as follows.

FIG. 12 is a flowchart of the processing of the search processing execution unit 105 (step ST303 in FIG. 3) in the sixth embodiment.
When the search processing execution unit 105 is given keyword sets S_1 to S_n and the inter-set distance d, the neighborhood condition determination unit 108 acquires, in step ST1201, the first appearance position of keyword set S_1 in the document to be determined and the keyword k_1 at that position from the keyword set collating unit 107. The following processing is then repeated for i = 1 to n-1.

In step ST1202, the first appearance position of keyword set S_{i+1} in the document to be determined and the keyword k_{i+1} at that position are acquired. Next, in step ST1203, it is determined whether the acquired appearance positions of keyword sets S_i and S_{i+1} satisfy the order-specified out-of-neighborhood condition, that is, whether DIST^D(k_i, k_{i+1}) ≥ d holds.

If the condition is not satisfied for the document to be determined in step ST1203 (NO), the next appearance position of S_{i+1} is acquired in step ST1205, and the order-specified out-of-neighborhood condition between the keyword sets is determined again in step ST1203. If the condition is satisfied in step ST1203 (YES), the process returns to step ST1202 and, for the next i, acquires the first appearance position of keyword set S_{i+1} and the keyword k_{i+1} at that position. This processing is repeated for i = 1 to n-1, and when the order-specified out-of-neighborhood condition is satisfied for all i, "matched" is output in step ST1204 and the processing ends.

Although omitted from the flowchart of FIG. 12, if the appearance position of a keyword set cannot be acquired in step ST1201 or step ST1205, the document to be determined contains no appearance position that satisfies the order-specified out-of-neighborhood condition, so "not matched" is output and the processing ends.
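
The loop of FIG. 12 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the apparatus: the function names, the representation of an occurrence as a (start, end) pair of character offsets, and in particular the definition of `dist` as the signed gap from the end of one keyword to the start of the next are all assumptions made here, because DIST^D is defined elsewhere in the document. Each inner list is assumed to be sorted in order of appearance.

```python
from typing import List, Optional, Sequence, Tuple

Occurrence = Tuple[int, int]  # (start, end) character offsets; an assumed representation


def dist(a: Occurrence, b: Occurrence) -> int:
    """Assumed stand-in for DIST^D: signed gap from the end of a to the start of b.

    The value is negative when b starts before a ends, so an occurrence that
    precedes a can never satisfy dist >= d for d >= 0.
    """
    return b[0] - a[1]


def ordered_out_of_neighborhood(
    occurrences: Sequence[Sequence[Occurrence]], d: int
) -> Optional[List[Occurrence]]:
    """Return one occurrence per keyword set satisfying the condition, or None."""
    if not occurrences or not occurrences[0]:
        return None  # S_1 has no occurrence: "not matched"
    chosen = [occurrences[0][0]]  # ST1201: first occurrence of S_1
    for i in range(len(occurrences) - 1):
        match = None
        # ST1202 fetches the first occurrence of S_{i+1}; ST1205 fetches the next ones.
        for candidate in occurrences[i + 1]:
            if dist(chosen[-1], candidate) >= d:  # ST1203
                match = candidate
                break
        if match is None:
            return None  # occurrences exhausted: "not matched"
        chosen.append(match)
    return chosen  # ST1204: "matched"
```

Because each keyword set is scanned forward only, the inner loop touches every occurrence of S_{i+1} at most once per i, which is in line with the linear behavior claimed for this embodiment.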

FIG. 13 shows the determination path of the order-specified out-of-neighborhood condition on keyword sets according to the processing flow of the sixth embodiment shown in FIG. 12.
Suppose that keyword sets S_1, S_2, S_3 are given in this order as the search condition and that the specified distance is d. Let the appearance positions of the keyword sets in the document D to be determined be S_1^D = {p_1, p_2, p_3}, S_2^D = {p_4, p_5, p_6}, and S_3^D = {p_7, p_8}, with the positional relationship shown in FIG. 13. Given this search condition, the determination proceeds as follows. Steps (1) to (5) below correspond to (1) to (5) in FIG. 13.

(1) In step ST1201 of FIG. 12, the first appearance position p_1 of S_1 in document D is acquired. Next, in step ST1202, the first appearance position p_4 of S_2 in the document is acquired. In step ST1203, it is determined whether these appearance positions satisfy the order-specified out-of-neighborhood condition. In the case of FIG. 13, keyword set S_2 appears before S_1, so the condition is not satisfied. The process therefore proceeds to step ST1205, where the next appearance position p_5 of keyword set S_2 is acquired.

(2) In step ST1203, the order-specified out-of-neighborhood condition is determined for appearance positions p_1 and p_5. This time the distance between S_1 and S_2 is smaller than d, so the condition is not satisfied. The process therefore proceeds to step ST1205, where the next appearance position p_6 of keyword set S_2 is acquired.

(3) In step ST1203, the order-specified out-of-neighborhood condition is determined for appearance positions p_1 and p_6. This time the condition is satisfied, so the first appearance position p_7 of keyword set S_3 in the document is acquired in step ST1202.

(4) In step ST1203, the order-specified out-of-neighborhood condition is determined for appearance positions p_6 and p_7. Keyword set S_3 appears before S_2, so the condition is not satisfied. The next appearance position p_8 of keyword set S_3 is therefore acquired in step ST1205.

(5) In step ST1203, the order-specified out-of-neighborhood condition is determined for appearance positions p_6 and p_8. The condition is satisfied and all keyword sets have now been determined, so "matched" is output in step ST1204 and the processing ends.

Let N be the number of characters in the document and M the number of keywords contained in a keyword set. In the conventional method, which determines the order-specified out-of-neighborhood condition by exhaustively combining keyword appearance positions, the amount of calculation required for K keyword sets is O((MN)^K). In the present embodiment, by contrast, it is O(KMN), as in the second embodiment.

The method shown here acquires the appearance positions of the keyword sets in order from the beginning of the document to determine the order-specified out-of-neighborhood condition, but the same effect is obtained when the condition is determined in order from the end of the document toward the beginning.

As described above, according to the sixth embodiment, the neighborhood condition used in the determination processing of the neighborhood condition determination unit 108 is the order-specified out-of-neighborhood condition, under which the keyword sets appear in the specified order and the distance between adjacent keyword sets is equal to or greater than the specified distance. Compared with the conventional method, this reduces the amount of calculation required to determine the order-specified out-of-neighborhood condition and shortens the search time of such search processing.

Embodiment 7.
In the seventh embodiment, the order-free out-of-neighborhood condition on keyword sets is determined.

Since the configuration of the seventh embodiment on the drawing is the same as that of the first embodiment shown in FIG. 1, the description refers to FIG. 1. The document search apparatus of the seventh embodiment is the document search apparatus of the first embodiment configured so that the neighborhood condition determination unit 108 can determine the order-free out-of-neighborhood condition on keyword sets. The overall flow of the search processing is the same as that shown in FIG. 3, so its description is omitted here.

Here, the order-free out-of-neighborhood condition on keyword sets will be described. It is a neighborhood condition that determines whether, when all keyword sets appear in the document, the distance between adjacent keyword sets is equal to or greater than the specified distance. Given keyword sets S_1 to S_n and an inter-set distance d, the order-free out-of-neighborhood condition is determined as follows.

The flow of processing of the search processing execution unit 105 of FIG. 1 (step ST303 in FIG. 3) in the seventh embodiment is equivalent to the flow of FIG. 9 with step ST902 configured to determine the order-free out-of-neighborhood condition between keyword sets, so its description is omitted here.

Let N be the number of appearance positions of one keyword set in a document. In the conventional method described in Patent Document 1, which determines the order-free out-of-neighborhood condition by exhaustively combining keyword appearance positions, the amount of calculation required for K keyword sets is O(N^K). In the present embodiment, by contrast, the amount of calculation required to determine the order-free out-of-neighborhood condition among K keyword sets is O(K!·K·N), as in the third embodiment. If K is smaller than N, this is less than the amount of calculation of the conventional method.

As described above, according to the seventh embodiment, the neighborhood condition used in the determination processing of the neighborhood condition determination unit 108 is the order-free out-of-neighborhood condition, under which, when all keyword sets appear in the document, the distance between adjacent keyword sets is equal to or greater than the specified distance. Compared with the conventional method, this reduces the amount of calculation required to determine the order-free out-of-neighborhood condition and shortens the search time of such search processing.

Embodiment 8.
In the eighth embodiment, the context condition on keyword sets is determined.

FIG. 14 is a configuration diagram showing the document search apparatus of the eighth embodiment.
The illustrated document search apparatus replaces the neighborhood condition determination unit 108 of the search processing execution unit 105 shown in FIG. 1 with a context condition determination unit 112. The rest of the configuration is the same as in FIG. 1, so corresponding parts are given the same reference numerals and their description is omitted. The context condition determination unit 112 is provided in the search processing execution unit 105a and, given a plurality of keyword sets, determines whether all of the keyword sets appear in the same structural unit of a document.

The overall flow of the search processing is equivalent to the flow of FIG. 3 with the determination of whether the neighborhood condition is satisfied in step ST303 replaced by the determination of the context condition, so its description is omitted here.

Here, the context condition on keyword sets will be described. Given two or more keyword sets, the context condition determines whether all of the keyword sets appear in the same structural unit of a document. A structural unit of a document is an element that makes up the document, such as a sentence, paragraph, chapter, section, or page. Suppose that keyword sets S_1 to S_n are specified. The condition is then determined as follows. Here, a structural unit of document D is denoted comp^D, its start position STR^D(comp^D), and its end position END^D(comp^D). The start and end positions of a structural unit are each expressed as a number of characters from the beginning of the document.

FIG. 15 is a flowchart of the processing of the search processing execution unit 105a in the eighth embodiment.
When the search processing execution unit 105a is given keyword sets S_1 to S_n, it first acquires, in step ST1501, the start and end positions of all structural units of the document to be determined. Next, in step ST1502, the context condition determination unit 112 acquires the first appearance positions of keyword sets S_1 to S_n in the document to be determined and the keywords k_1 to k_n at those positions from the keyword set collating unit 107.

In step ST1503, it is determined whether the acquired appearance positions of the keyword sets satisfy the context condition, that is, whether there is a structural unit comp^D_j (j = 1 to m) of the document such that STR^D(comp^D_j) ≤ STR^D(k_i) and END^D(k_i) ≤ END^D(comp^D_j) hold for all i = 1 to n. If the acquired appearance positions satisfy the context condition for the document to be determined (YES), the process proceeds to step ST1507, outputs "matched", and ends. If they do not (NO), it is determined in step ST1504 whether the keyword k_i whose end position is rearmost in the document straddles two or more structural units, that is, whether STR^D(k_i) ≤ END^D(comp^D_j) ≤ END^D(k_i) holds for some j.

If the keyword straddles structural units in step ST1504 (YES), the next appearance position of S_i is acquired in step ST1505, and the process proceeds to step ST1506. If k_i does not straddle two or more structural units in step ST1504 (NO), the process proceeds directly to step ST1506. In step ST1506, the next appearance position is acquired for every keyword set that is not contained in the same structural unit as the keyword k_i whose end position is rearmost in the document. After the next appearance positions are acquired in step ST1506, the process returns to step ST1503 and determines whether the acquired appearance positions satisfy the context condition.

Although omitted from the flowchart of FIG. 15, if the appearance position of a keyword set cannot be acquired in step ST1502, step ST1505, or step ST1506, the document to be determined contains no appearance position that satisfies the context condition, so "not matched" is output and the processing ends.
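
The flow of FIG. 15 can be sketched in Python as follows. This is an illustrative reading rather than the implementation of the apparatus. The names `unit_of` and `context_condition`, the use of inclusive (start, end) character offsets, and the assumption that structural units are disjoint and that every list is sorted in order of appearance are all introduced here. Recomputing the rearmost keyword after step ST1505 is likewise an assumption, since the text does not state it.

```python
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, both inclusive (assumed)


def unit_of(units: Sequence[Span], occ: Span) -> Optional[int]:
    """Index of the unit that fully contains occ, or None if occ straddles units."""
    for j, (start, end) in enumerate(units):
        if start <= occ[0] and occ[1] <= end:
            return j
    return None


def context_condition(
    units: Sequence[Span], occurrences: Sequence[Sequence[Span]]
) -> Optional[List[Span]]:
    """Return one occurrence per keyword set, all inside one unit, or None."""
    n = len(occurrences)
    if n == 0 or any(len(o) == 0 for o in occurrences):
        return None
    cursor = [0] * n  # ST1502: start at the first occurrence of every set

    def current(i: int) -> Span:
        return occurrences[i][cursor[i]]

    def advance(i: int) -> bool:
        cursor[i] += 1
        return cursor[i] < len(occurrences[i])

    while True:
        where = [unit_of(units, current(i)) for i in range(n)]
        if where[0] is not None and all(w == where[0] for w in where):  # ST1503
            return [current(i) for i in range(n)]  # ST1507: "matched"
        last = max(range(n), key=lambda i: current(i)[1])  # rearmost end position
        if where[last] is None:  # ST1504: rearmost keyword straddles units
            if not advance(last):  # ST1505
                return None
            where[last] = unit_of(units, current(last))
            last = max(range(n), key=lambda i: current(i)[1])  # assumed recomputation
        target = where[last]
        if target is None:
            continue  # still straddling: handled on the next pass
        for i in range(n):  # ST1506: advance sets outside the rearmost unit
            if where[i] != target and not advance(i):
                return None  # occurrences exhausted: "not matched"
```

Each pass either returns or moves at least one cursor forward, so the loop ends after at most as many passes as there are occurrences in total.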

Step ST1501 acquires the start and end positions of the structural units of the document. This information may be extracted automatically from the document and recorded in the database 110 when the document is registered in the database 110, or it may be obtained by scanning the document to be determined at search time. In either case, when the structural unit is a sentence, the units can be extracted automatically by taking the position of the character immediately after a period as the start position and the position of the next period as the end position. When the structural unit is a paragraph, the line feed character is used as the unit delimiter instead of the period.
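
The delimiter rule above can be illustrated with a short Python sketch. The function name `split_units`, the zero-based inclusive offsets, and the treatment of trailing text without a closing delimiter as a final unit are assumptions made for this example.

```python
from typing import List, Tuple


def split_units(text: str, delimiter: str = "。") -> List[Tuple[int, int]]:
    """Return (start, end) offsets of units; each unit ends at its delimiter."""
    units = []
    start = 0  # the first unit starts at the beginning of the document
    for pos, ch in enumerate(text):
        if ch == delimiter:
            units.append((start, pos))  # end position is the delimiter itself
            start = pos + 1  # next unit starts right after the delimiter
    if start < len(text):
        # Assumed handling: text after the last delimiter forms a final unit.
        units.append((start, len(text) - 1))
    return units
```

Passing `delimiter="\n"` gives paragraph units in the same way that the default gives sentence units.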

FIG. 16 shows the determination path of the context condition on keyword sets according to the processing flow of the eighth embodiment shown in FIG. 15.
Suppose that keyword sets S_1, S_2, S_3 are given as the search condition. Let the appearance positions of the keyword sets in the document D to be determined be S_1^D = {p_1, p_2, p_3}, S_2^D = {p_4, p_5, p_6}, and S_3^D = {p_7, p_8, p_9}, with the positional relationship shown in FIG. 16. Given this search condition, the determination proceeds as follows. Steps (1) to (3) below correspond to (1) to (3) in FIG. 16.

(1) In step ST1501 of FIG. 15, the start and end positions of all structural units of document D are acquired. Next, in step ST1502, the first appearance positions p_1, p_4, and p_7 of S_1 to S_3 in document D are acquired. In step ST1503, it is determined whether the acquired appearance positions satisfy the context condition. In the case of FIG. 16, keyword sets S_1 and S_3 are contained in structural unit 1602 and S_2 in structural unit 1601, so the context condition is not satisfied. The process therefore goes through step ST1504 to step ST1506. In step ST1506, since p_1 is the rearmost position in the document, the next appearance position p_5 is acquired for S_2, which is not contained in the structural unit 1602 that contains keyword set S_1.

(2) In step ST1503, the context condition is determined for appearance positions p_1, p_5, and p_7. This time S_1 and S_3 are contained in structural unit 1602 and S_2 in structural unit 1604, so the context condition is not satisfied. In step ST1506, since p_5 is the rearmost position in the document, the next appearance positions p_2 and p_8 are acquired for keyword sets S_1 and S_3, which are not contained in structural unit 1604.

(3) In step ST1503, the context condition is determined for appearance positions p_2, p_5, and p_8. This time S_1, S_2, and S_3 are all contained in the same structural unit 1604, so the context condition is satisfied. "Matched" is therefore output in step ST1507 and the processing ends.

Let N be the number of characters in the document and M the number of keywords contained in a keyword set. In the conventional method described in Patent Document 1, the amount of calculation required to determine the context condition among K keyword sets is O((MN)^K). In the present embodiment, by contrast, it is O(KMN), as in the second embodiment.

The method shown here acquires the appearance positions of the keyword sets in order from the beginning of the document to determine the context condition, but the same effect is obtained when the condition is determined in order from the end of the document toward the beginning.

As described above, according to the eighth embodiment, the apparatus includes the search processing execution unit 105a, which has the keyword set collating unit 107 that acquires the appearance positions of a plurality of keyword sets in a document in order of appearance for each keyword set, and the context condition determination unit 112 that determines whether the keyword sets acquired by the keyword set collating unit 107 satisfy the context condition, namely that they appear in the same structural unit of the document. The search processing execution unit 105a alternates the acquisition of appearance positions by the keyword set collating unit 107 with the context condition determination by the context condition determination unit 112, and outputs the determination result as the search result as soon as the context condition is determined to be true. Compared with the conventional method, this reduces the amount of calculation required to determine the context condition and shortens the search time of such search processing.

Embodiment 9.
In the ninth embodiment, the range condition on keyword sets is determined.

FIG. 17 is a configuration diagram showing the document search apparatus of the ninth embodiment.
The illustrated document search apparatus replaces the neighborhood condition determination unit 108 of the search processing execution unit 105 shown in FIG. 1 with a range condition determination unit 113. The rest of the configuration is the same as in FIG. 1, so corresponding parts are given the same reference numerals and their description is omitted. The range condition determination unit 113 is provided in the search processing execution unit 105b and, given a plurality of keyword sets and one document range, determines whether all of the keyword sets appear in that document range.

The overall flow of the search processing is equivalent to the flow of FIG. 3 with the determination of whether the neighborhood condition is satisfied in step ST303 replaced by the determination of the range condition, so its description is omitted here.

Here, the range condition on keyword sets will be described. Given two or more keyword sets and one document range, the range condition determines whether all of the keyword sets appear in that document range. A document range is a range that forms a coherent part of the document, such as the abstract, preface, postscript, or body. Suppose that keyword sets S_1 to S_n and a document range are specified. The range condition is then determined as follows. Here, the range in document D is denoted range^D, its start position STR^D(range^D), and its end position END^D(range^D).

FIG. 18 is a flowchart of the processing of the search processing execution unit 105b in the ninth embodiment.
When the search processing execution unit 105b is given keyword sets S_1 to S_n and a document range range^D, it first acquires, in step ST1801, the start and end positions of the range in the document to be determined. Next, in step ST1802, the range condition determination unit 113 acquires the first appearance positions of keyword sets S_1 to S_n in the document to be determined and the keywords k_1 to k_n at those positions from the keyword set collating unit 107. In step ST1803, it is determined whether the acquired appearance positions of the keyword sets satisfy the range condition, that is, whether STR^D(range^D) ≤ STR^D(k_i) and END^D(k_i) ≤ END^D(range^D) hold for all i = 1 to n.

If the acquired appearance positions of the keyword sets satisfy the range condition for the document to be determined in step ST1803 (YES), the process proceeds to step ST1805, outputs "matched", and ends. If they do not (NO), the next appearance position is acquired in step ST1804 for every keyword set that does not satisfy the range condition. After the next appearance positions are acquired in step ST1804, the process returns to step ST1803 and determines whether the acquired appearance positions satisfy the range condition.

Although omitted from the flowchart of FIG. 18, if the appearance position of a keyword set cannot be acquired in step ST1802 or step ST1804, the document to be determined contains no appearance position that satisfies the range condition, so "not matched" is output and the processing ends. Likewise, if step ST1803 finds a keyword for which END^D(range^D) < END^D(k_i), "not matched" is output and the processing ends.
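
The flow of FIG. 18, including the two early exits just described, can be sketched in Python as follows. This is a minimal illustration under assumptions introduced here: the function name `range_condition`, inclusive (start, end) character offsets, and occurrence lists sorted in order of appearance.

```python
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, both inclusive (assumed)


def range_condition(
    doc_range: Span, occurrences: Sequence[Sequence[Span]]
) -> Optional[List[Span]]:
    """Return one occurrence per keyword set inside doc_range, or None."""
    range_start, range_end = doc_range  # ST1801
    n = len(occurrences)
    if n == 0 or any(len(o) == 0 for o in occurrences):
        return None
    cursor = [0] * n  # ST1802: first occurrence of every keyword set
    while True:
        pending = []  # keyword sets that do not yet satisfy the range condition
        for i in range(n):
            start, end = occurrences[i][cursor[i]]
            if end > range_end:
                # END(range) < END(k_i): later occurrences lie even further back.
                return None
            if start < range_start:
                pending.append(i)
        if not pending:  # ST1803 holds for all sets
            return [occurrences[i][cursor[i]] for i in range(n)]  # ST1805
        for i in pending:  # ST1804: advance only the sets outside the range
            cursor[i] += 1
            if cursor[i] >= len(occurrences[i]):
                return None  # occurrences exhausted: "not matched"
```

Keyword sets whose current occurrence already lies inside the range keep their cursor, which mirrors the way step ST1804 advances only the sets that fail the condition.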

Step ST1801 of FIG. 18 acquires the start and end positions of the document range. This information may be extracted automatically from the document and recorded in the database 110 when the document is registered in the database 110, or it may be obtained by scanning the document to be determined at search time. In either case, the rules for extracting document ranges must be decided in advance.

FIG. 19 shows the determination path for the range condition of keyword sets according to the processing flow of the ninth embodiment shown in FIG. 18.
Assume that keyword sets S_1, S_2, and S_3 are given as the search condition, and that their appearance positions in the document D under determination are S_1^D = {p_1, p_2, p_3}, S_2^D = {p_4, p_5, p_6}, and S_3^D = {p_7, p_8, p_9}, related as shown in FIG. 19. Given this search condition, the determination proceeds as follows. Steps (1) to (3) below correspond to (1) to (3) in FIG. 19.

(1) In step ST1801 of FIG. 18, the start and end positions of the designated range of document D are acquired. Next, in step ST1802, the first appearance positions p_1, p_4, and p_7 of S_1 to S_3 in document D are acquired. In step ST1803, it is determined whether the acquired appearance positions satisfy the range condition. In the case of FIG. 19, none of the keyword sets falls within the designated range, so the range condition is not satisfied. Therefore, in step ST1804, the next appearance positions p_2, p_5, and p_8 of S_1, S_2, and S_3 are acquired.

(2) In step ST1803, the range condition is determined for the appearance positions p_2, p_5, and p_8. This time S_1 and S_2 fall outside the designated document range, so the range condition is not satisfied. In step ST1804, the next appearance positions p_3 and p_6 of S_1 and S_2 are acquired.

(3) In step ST1803, the range condition is determined for the appearance positions p_3, p_6, and p_8. This time S_1, S_2, and S_3 all fall within the designated document range, so the range condition is satisfied. Therefore, "matched" is output in step ST1805, and the process ends.
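The loop walked through in steps (1) to (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each keyword set is represented by an iterator that yields its appearance positions in ascending order, each appearance is reduced to a single integer position (keyword length is ignored), and the function name `range_condition` and the numeric positions in the usage note are hypothetical.

```python
from typing import Iterator, List, Sequence


def range_condition(position_streams: Sequence[Iterator[int]],
                    range_start: int, range_end: int) -> bool:
    """Return True if every keyword set has a position inside the range."""
    # ST1802: acquire the first appearance position of every keyword set.
    current: List[int] = []
    for stream in position_streams:
        pos = next(stream, None)
        if pos is None:          # a keyword set never appears: "not matched"
            return False
        current.append(pos)

    while True:
        # ST1803: a position past the end of the range can never match,
        # because later positions of that set are even further away.
        if any(pos > range_end for pos in current):
            return False
        outside = [i for i, pos in enumerate(current) if pos < range_start]
        if not outside:
            return True          # ST1805: "matched"
        # ST1804: advance only the keyword sets that are still outside.
        for i in outside:
            pos = next(position_streams[i], None)
            if pos is None:      # no further appearance: "not matched"
                return False
            current[i] = pos
```

With a hypothetical range of 50 to 100 and position lists [5, 30, 60], [10, 40, 70], and [20, 55, 120], the call advances all three sets in the first round, only the first two in the second round, and reports a match in the third round, mirroring the path of FIG. 19.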

Let N be the number of characters in the document and M the number of keywords in a keyword set. With the conventional method described in Patent Document 1, determining the range condition among K keyword sets requires O(M^K·K·N) computation. In contrast, the range condition determination of the present embodiment requires O(KMN), as in the second embodiment.

Thus, the document search apparatus of the ninth embodiment can determine the range condition among keyword sets faster than the conventional method.

Although the method shown here acquires the appearance positions of the keyword sets in order from the beginning of the document to determine the range condition, the same effect is obtained when the condition is determined in order from the end of the document toward the beginning.

As described above, the ninth embodiment provides a search processing execution unit having a keyword set matching unit 107, which acquires the appearance positions of a plurality of keyword sets in a document in order of appearance for each keyword set, and a range condition determination unit 113, which determines whether the plurality of keyword sets acquired by the keyword set matching unit 107 satisfy a range condition indicating that they appear within a specific document range. The search processing execution unit alternately executes the appearance position acquisition process of the keyword set matching unit 107 and the range condition determination process of the range condition determination unit 113, and outputs the determination result as the search result as soon as the range condition is determined to be true. Compared with the conventional approach, this reduces the amount of computation required to determine the range condition and shortens the search time for such a search.

Embodiment 10.
In the tenth embodiment, a compound condition of keyword sets is determined.

FIG. 20 is a configuration diagram showing the document search apparatus of the tenth embodiment.
The illustrated document search apparatus is obtained by replacing the neighborhood condition determination unit 108 of the search processing execution unit 105 shown in FIG. 1 with a compound condition determination unit 114. The rest of the configuration is the same as in FIG. 1, so corresponding parts are given the same reference numerals and their description is omitted. The compound condition determination unit 114 is provided in the search processing execution unit 105c. When two or more keyword sets are given, it determines whether all of the keyword sets satisfy the neighborhood condition of any of the second to seventh embodiments, the context condition of the eighth embodiment, the range condition of the ninth embodiment, or a logical condition that combines these by logical operations.

The overall flow of the search process is the same as that of FIG. 3 with the neighborhood condition determination in step ST303 replaced by the compound condition determination, so its description is omitted here.

The compound condition of keyword sets is now described. When two or more keyword sets are given, the compound condition determines whether all of the keyword sets satisfy a neighborhood condition, a context condition, a range condition, or a logical condition that combines these by logical operations (AND, OR, NOT, and so on). When the compound condition includes a range condition, one document range is also given as part of the search condition.

FIG. 21 is a flowchart of the processing of the search processing execution unit 105c in the tenth embodiment.
Assume that the search condition is a compound condition that joins an ordered in-neighborhood condition, a context condition, and a range condition of the keyword sets by logical AND. When the keyword sets, the distance between keyword sets, and one document range are designated, the search processing execution unit 105c has the keyword set matching unit 107 acquire, in step ST2101, all the structural units of the document under determination and the start and end positions of the designated document range. Next, in step ST2102, the keyword set matching unit 107 acquires the first appearance position of every keyword set. Then, in step ST2103, the compound condition determination unit 114 determines whether the acquired appearance positions of the keyword sets satisfy the range condition.

If the range condition is satisfied in step ST2103 (YES), the process proceeds to step ST2104, which determines whether the acquired appearance positions of the keyword sets satisfy the context condition. If the context condition is satisfied in step ST2104 (YES), step ST2105 determines whether the acquired appearance positions satisfy the neighborhood condition. If the neighborhood condition is satisfied in step ST2105, "matched" is output in step ST2106, and the process ends.

If any of the determinations in steps ST2103, ST2104, and ST2105 fails (NO), step ST2107 acquires the next appearance position of a keyword set. Which keyword sets are advanced in step ST2107 depends on the condition that failed. If the range condition failed, the appearance positions are acquired as in the ninth embodiment. If the context condition failed, they are acquired as in the eighth embodiment. If the neighborhood condition failed, they are acquired as in whichever of the second to seventh embodiments applies. After the appearance positions are acquired in step ST2107, the process returns to step ST2103 and determines the range condition again.

Although omitted from the flowchart of FIG. 21, if the appearance position of a keyword set cannot be acquired in step ST2102 or step ST2107, the document under determination contains no appearance position that satisfies the compound condition, so "not matched" is output and the process ends. Likewise, if in step ST2103 the appearance position of a keyword set lies after the document range, "not matched" is output and the process ends.
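The AND-joined loop of FIG. 21 can be illustrated with the Python sketch below. It is only a sketch under stated assumptions: conditions are modelled as callables that return `None` when satisfied, an empty list when no match is possible any more, or the indices of the keyword sets to advance. The names `composite_and`, `in_range`, and `within_distance` are hypothetical, and `within_distance` is a simplified unordered stand-in for the neighborhood conditions of the second to seventh embodiments, which this section does not spell out. The context condition is omitted for the same reason.

```python
from typing import Callable, Iterator, List, Optional, Sequence

# None -> satisfied; [] -> can never match; [i, ...] -> sets to advance.
Condition = Callable[[List[int]], Optional[List[int]]]


def in_range(start: int, end: int) -> Condition:
    """Range condition in the manner of the ninth embodiment."""
    def cond(cur: List[int]) -> Optional[List[int]]:
        if any(p > end for p in cur):
            return []                       # a set is already past the range
        behind = [i for i, p in enumerate(cur) if p < start]
        return behind or None
    return cond


def within_distance(d: int) -> Condition:
    """Stand-in neighborhood check: all sets lie within d of each other."""
    def cond(cur: List[int]) -> Optional[List[int]]:
        if max(cur) - min(cur) <= d:
            return None
        return [cur.index(min(cur))]        # advance the earliest set
    return cond


def composite_and(streams: Sequence[Iterator[int]],
                  conditions: Sequence[Condition]) -> bool:
    current: List[int] = []
    for s in streams:                       # ST2102: first positions
        pos = next(s, None)
        if pos is None:
            return False
        current.append(pos)
    while True:
        for cond in conditions:             # ST2103 -> ST2104 -> ST2105
            to_advance = cond(current)
            if to_advance is None:
                continue                    # this condition holds
            if not to_advance:
                return False                # "not matched"
            for i in to_advance:            # ST2107: fetch next positions
                pos = next(streams[i], None)
                if pos is None:
                    return False
                current[i] = pos
            break                           # re-check from the first condition
        else:
            return True                     # ST2106: "matched"
```

Listing the range condition first follows the ordering described for AND-joined conditions; the loop always restarts from the first condition after any keyword set is advanced.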

FIG. 22 shows the determination path for the compound condition of keyword sets according to the processing flow of the tenth embodiment shown in FIG. 21.
Assume that keyword sets S_1, S_2, and S_3 are given as the search condition, and that their appearance positions in the document D under determination are S_1^D = {p_1, p_2, p_3}, S_2^D = {p_4, p_5, p_6, p_7}, and S_3^D = {p_8, p_9, p_10}, related as shown in FIG. 22. The specific operation for finding positions of S_1, S_2, and S_3 that satisfy the compound condition is the operation that satisfies all of the range condition of the ninth embodiment, the context condition of the eighth embodiment, and the neighborhood condition of any of the second to seventh embodiments, so its description is omitted here.

A region of a document that satisfies the range condition is more localized than one that satisfies a neighborhood condition or a context condition. For a compound condition joined by logical AND, determining the range condition first therefore narrows down the candidate region sooner. For a compound condition joined by logical OR, determining first the condition that is satisfied by the widest region yields the determination result sooner.

Although the method shown here acquires the appearance positions of the keyword sets in order from the beginning of the document to determine the compound condition, the same effect is obtained when the condition is determined in order from the end of the document toward the beginning.

As described above, the tenth embodiment provides a search processing execution unit 105c having a keyword set matching unit 107, which acquires the appearance positions of a plurality of keyword sets in a document in order of appearance for each keyword set, and a compound condition determination unit 114, which determines whether the plurality of keyword sets acquired by the keyword set matching unit 107 satisfy a compound condition combining a predetermined neighborhood condition, a context condition indicating that they appear in the same structural unit of the document, a range condition indicating that they appear within a specific document range, and logical conditions on these conditions. The search processing execution unit 105c alternately executes the appearance position acquisition process of the keyword set matching unit 107 and the compound condition determination process of the compound condition determination unit 114, and outputs the determination result as the search result as soon as the compound condition is determined to be true. Compared with the conventional approach, this reduces the amount of computation required to determine the compound condition and shortens the search time for such a search.

Embodiment 11.
In the eleventh embodiment, the keyword set matching unit 107 of the first to tenth embodiments acquires the appearance positions of a keyword set in a document at high speed.

The configuration of the eleventh embodiment as drawn is the same as that of any of the first to tenth embodiments, so it is not illustrated, and the description of components other than the keyword set matching unit 107 is omitted. The keyword set matching unit 107 of the eleventh embodiment acquires the next appearance position only for the keyword located at the appearance position it most recently output for the keyword set, compares that position with the stored appearance positions of the other keywords, and outputs the smallest as the next appearance position of the keyword set.

Next, the operation of the keyword set matching unit 107 in the eleventh embodiment is described.
When the first appearance position of a keyword set in a document is requested, the keyword set matching unit 107 acquires from the keyword matching unit 109 the first appearance position of every keyword in the keyword set. Each time an appearance position of a keyword is requested, the keyword matching unit 109 outputs the appearance positions of that keyword one at a time, in order from the beginning of the document. In the example of FIG. 2, for the keyword set {keyword, appearance}, it outputs 1, the first appearance position of "keyword", and 9, the first appearance position of "appearance".

If no appearance position can be acquired for any keyword in the keyword set, the keyword set matching unit 107 outputs "no hit" and ends. If an appearance position is acquired for one or more keywords in the keyword set, it outputs the one that appears earliest in the document and ends. In the example of FIG. 2, the first appearance positions of the keywords are [1, 9], so 1 is output as the appearance position of the keyword set. If necessary, the keyword at that appearance position and its length may be output as well. The keyword set matching unit 107 then stores internally the keyword at the appearance position it output ("keyword" in the example of FIG. 2) and the appearance positions acquired for all keywords ([1, 9] in the example of FIG. 2).

FIG. 23 shows the flow of the process for acquiring the appearance position of a keyword set on the second and subsequent requests.
On the second or a subsequent request for the appearance position of the keyword set, the keyword set matching unit 107, in step ST2301, acquires from the keyword matching unit 109 the next appearance position of the keyword located at the previously output appearance position. In step ST2302, it determines whether an appearance position of that keyword was acquired.

If the position was acquired (YES in step ST2302), step ST2303 outputs the position that appears earliest in the document from among the keyword appearance position acquired in step ST2301 and the keyword appearance positions acquired in earlier keyword set matching processes. In the example of FIG. 2, the keyword at the previously output appearance position is "keyword", so 18, the next appearance position of "keyword", is acquired in step ST2302. Then, in step ST2303, this is compared with 9, the appearance position of the other keyword "appearance", and 9, being the smallest, is output as the next appearance position of the keyword set {keyword, appearance}.

If the appearance position could not be acquired in step ST2302 (NO), the process proceeds to step ST2304, which determines whether any keyword appearance positions acquired in earlier keyword set matching processes remain. If such information exists (YES), the process proceeds to step ST2303. If it does not (NO), step ST2305 outputs "no hit" and the process ends. In the example of FIG. 2, when the next appearance position is requested after 35 has been output as the appearance position of the keyword set, no further appearance position of "keyword", which is located at position 35, can be acquired, so 43, the appearance position of the other keyword "appearance", is output as the next appearance position of the keyword set. After that, no candidate for the next appearance position remains in step ST2304, so "no hit" is output.
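The first-request behaviour and the flow of FIG. 23 can be sketched together in Python as follows. This is a minimal illustration, not the patented implementation: the class name `KeywordSetMatcher` is hypothetical, each keyword is assumed to be backed by an iterator yielding its positions in ascending order (standing in for the keyword matching unit 109), and `None` stands in for "no hit".

```python
from typing import Dict, Iterator, Optional


class KeywordSetMatcher:
    """Lazily merges per-keyword position streams, one fetch per request."""

    def __init__(self, keyword_streams: Dict[str, Iterator[int]]) -> None:
        self._streams = keyword_streams
        self._pending: Dict[str, int] = {}   # acquired but not yet output
        self._last: Optional[str] = None     # keyword output last time
        self._started = False

    def next_position(self) -> Optional[int]:
        if not self._started:
            # First request: fetch the first position of every keyword.
            self._started = True
            for kw, stream in self._streams.items():
                pos = next(stream, None)
                if pos is not None:
                    self._pending[kw] = pos
        elif self._last is not None:
            # ST2301: fetch only for the keyword that was output last time.
            pos = next(self._streams[self._last], None)
            if pos is not None:              # ST2302: YES
                self._pending[self._last] = pos
        if not self._pending:                # ST2304: NO -> ST2305
            self._last = None
            return None                      # "no hit"
        # ST2303: output the earliest stored position and remember its keyword.
        kw = min(self._pending, key=self._pending.get)
        self._last = kw
        return self._pending.pop(kw)
```

Assuming, for illustration only, that "keyword" appears at positions 1, 18, and 35 and "appearance" at positions 9 and 43 (values consistent with those quoted from FIG. 2, though the full lists are an assumption), successive calls return 1, 9, 18, 35, 43, and then `None`. Each call after the first performs at most one fetch from the underlying keyword streams.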

The processing of the keyword matching unit 109 has not been described in detail. It may be implemented in any way, provided that, in response to a request for the appearance position of a keyword, it can return the character offset from the beginning of the document at which the keyword appears. For example, pairs of characters or character strings and their appearance positions in the document may be recorded in a storage device as an index, or the document may be scanned directly when an appearance position is requested.
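The two options mentioned above, direct scanning and a prebuilt index, might look like the following Python sketch. The function names are hypothetical, positions are counted from 1 to match the FIG. 2 example, and the index here is simply a dictionary built by scanning once at registration time, which is only one of many possible index layouts.

```python
from typing import Dict, Iterator, List


def scan_positions(document: str, keyword: str) -> Iterator[int]:
    """Direct scan: lazily yield 1-based positions of keyword in document."""
    start = 0
    while True:
        hit = document.find(keyword, start)
        if hit < 0:
            return
        yield hit + 1            # convert 0-based offset to 1-based position
        start = hit + 1          # continue after this hit (overlaps allowed)


def build_index(document: str, keywords: List[str]) -> Dict[str, List[int]]:
    """Index option: record every position once, e.g. at registration time."""
    return {kw: list(scan_positions(document, kw)) for kw in keywords}


def indexed_positions(index: Dict[str, List[int]],
                      keyword: str) -> Iterator[int]:
    """Serve positions from the prebuilt index instead of scanning."""
    return iter(index.get(keyword, []))
```

Both variants expose the same iterator interface, so a caller that requests one position at a time does not need to know which one is in use.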

With the keyword set matching unit 107 configured in this way, each request for the appearance position of a keyword set can be answered by acquiring the appearance position of only one keyword. The appearance positions of the keyword set are thus obtained with the minimum necessary keyword matching, without acquiring every appearance position of every keyword in the set.

Although the method shown here acquires the appearance positions of the keyword set in order from the beginning of the document, the appearance positions may instead be acquired in order from the end of the document, depending on the condition determination method of the first to tenth embodiments. In that case, the keyword matching unit 109 is configured to output appearance positions one at a time in order from the end of the document, and step ST2303 outputs, from among the keyword appearance positions, the one that appears latest in the document.

As described above, in the eleventh embodiment the keyword set matching unit 107 acquires the next appearance position only for the keyword located at the appearance position it most recently output for the keyword set, compares that position with the appearance positions of the other keywords, and outputs the smallest as the next appearance position of the keyword set. The appearance positions of a keyword set in a document can therefore be acquired at high speed.

FIG. 1 is a configuration diagram showing a document search apparatus according to Embodiment 1 of the present invention.
FIG. 2 is an explanatory diagram showing the relationship between a document and keywords.
FIG. 3 is a flowchart of the search process in the document search apparatus of the first embodiment.
FIG. 4 is a flowchart of the processing of the search processing execution unit in the document search apparatus of the first embodiment.
FIG. 5 is a flowchart of the processing of the search processing execution unit in the second embodiment.
FIG. 6 is an explanatory diagram showing the determination path for the ordered in-neighborhood condition of keyword sets in the second embodiment.
FIG. 7 is a flowchart of the processing of the search processing execution unit in the third embodiment when there are two keyword sets.
FIG. 8 is an explanatory diagram showing the determination path for the unordered in-neighborhood condition of two keyword sets in the third embodiment.
FIG. 9 is a flowchart of the processing of the search processing execution unit in the third embodiment for three or more keyword sets.
FIG. 10 is a flowchart of the processing of the search processing execution unit in the fourth embodiment.
FIG. 11 is an explanatory diagram showing the determination path for the ordered equal-neighborhood condition of keyword sets in the fourth embodiment.
FIG. 12 is a flowchart of the processing of the search processing execution unit in the sixth embodiment.
FIG. 13 is an explanatory diagram showing the determination path for the ordered out-of-neighborhood condition of keyword sets in the sixth embodiment.
FIG. 14 is a configuration diagram showing the document search apparatus of the eighth embodiment.
FIG. 15 is a flowchart of the processing of the search processing execution unit in the eighth embodiment.
FIG. 16 is an explanatory diagram showing the determination path for the context condition of keyword sets in the eighth embodiment.
FIG. 17 is a configuration diagram showing the document search apparatus of the ninth embodiment.
FIG. 18 is a flowchart of the processing of the search processing execution unit in the ninth embodiment.
FIG. 19 is an explanatory diagram showing the determination path for the range condition of keyword sets in the ninth embodiment.
FIG. 20 is a configuration diagram showing the document search apparatus of the tenth embodiment.
FIG. 21 is a flowchart of the processing of the search processing execution unit in the tenth embodiment.
FIG. 22 is an explanatory diagram showing the determination path for the compound condition of keyword sets in the tenth embodiment.
FIG. 23 shows the flow of the process for acquiring the appearance position of a keyword set on the second and subsequent requests in the eleventh embodiment.

Explanation of symbols

101: search condition; 105, 105a, 105b, 105c: search processing execution unit; 107: keyword set matching unit; 108: neighborhood condition determination unit; 110: database; 112: context condition determination unit; 113: range condition determination unit; 114: compound condition determination unit.

Claims

A document search apparatus comprising a search processing execution unit that has: a keyword set matching unit that acquires the appearance positions of a plurality of keyword sets in a document in order of appearance for each keyword set; and a neighborhood condition determination unit that determines whether the appearance positions of the plurality of keyword sets acquired by the keyword set matching unit satisfy a predetermined neighborhood condition,
wherein the search processing execution unit alternately executes the appearance position acquisition process of the keyword set matching unit and the neighborhood condition determination process of the neighborhood condition determination unit, and outputs the determination result as a search result as soon as the neighborhood condition is determined to be true.

The document search apparatus according to claim 1, wherein the neighborhood condition is an ordered in-neighborhood condition in which the keyword sets appear in a specified order and the distance between consecutive keyword sets is equal to or less than a specified distance.

The document search apparatus according to claim 1, wherein the neighborhood condition is an unordered in-neighborhood condition in which all the keyword sets appear in the document and the distance between consecutive keyword sets is equal to or less than a specified distance.

The document search apparatus according to claim 1, wherein the neighborhood condition is an ordered equal-neighborhood condition in which the keyword sets appear in a specified order and the distance between consecutive keyword sets is equal to a specified distance.

The document search apparatus according to claim 1, wherein the neighborhood condition is an unordered equal-neighborhood condition in which all the keyword sets appear in the document and the distance between consecutive keyword sets is equal to a specified distance.

The document search apparatus according to claim 1, wherein the neighborhood condition is an ordered out-of-neighborhood condition in which the keyword sets appear in a specified order and the distance between consecutive keyword sets is equal to or greater than a specified distance.

The document search apparatus according to claim 1, wherein the neighborhood condition is an unordered out-of-neighborhood condition in which all the keyword sets appear in the document and the distance between consecutive keyword sets is equal to or greater than a specified distance.

A document search apparatus comprising a search processing execution unit that has: a keyword set matching unit that acquires the appearance positions of a plurality of keyword sets in a document in order of appearance for each keyword set; and a context condition determination unit that determines whether the plurality of keyword sets acquired by the keyword set matching unit satisfy a context condition indicating that they appear in the same structural unit of the document,
wherein the search processing execution unit alternately executes the appearance position acquisition process of the keyword set matching unit and the context condition determination process of the context condition determination unit, and outputs the determination result as a search result as soon as the context condition is determined to be true.

A document search apparatus comprising a search processing execution unit that includes: a keyword set matching unit that acquires the appearance positions of a plurality of keyword sets in a document in order of appearance position; and a range condition determination unit that determines whether the appearance positions acquired by the keyword set matching unit satisfy a range condition indicating that the plurality of keyword sets appear in a specific document range,
wherein the search processing execution unit alternately executes the appearance position acquisition processing in the keyword set matching unit and the range condition determination processing in the range condition determination unit and, when the range condition is determined to be true, outputs the determination result as a search result.
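
A range condition can be checked with the same acquire-then-determine alternation. The sketch below is illustrative only. The function name `range_condition_search` and the half-open range `[range_start, range_end)` are assumptions, and the early stop once a position passes the end of the range is a consequence of the ascending order rather than something the claim recites.

```python
def range_condition_search(hits, range_start, range_end, num_sets):
    """Return True once every keyword set has appeared inside the range.

    hits: iterable of (position, set_id) in ascending position order.
    The range is treated as half-open: range_start <= position < range_end.
    """
    seen = set()
    for position, set_id in hits:      # acquisition step
        if position >= range_end:
            # Ascending order: no later position can fall inside the range.
            return False
        if position >= range_start:
            seen.add(set_id)
            if len(seen) == num_sets:  # determination step
                return True
    return False
```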

A document search apparatus comprising a search processing execution unit that includes: a keyword set matching unit that acquires the appearance positions of a plurality of keyword sets in a document in order of appearance position; and a composite condition determination unit that determines whether the appearance positions acquired by the keyword set matching unit satisfy a composite condition combining a predetermined neighborhood condition, a context condition indicating appearance in a structural unit of the document, a range condition indicating appearance in a specific document range, and logical conditions over these conditions,
wherein the search processing execution unit alternately executes the appearance position acquisition processing in the keyword set matching unit and the composite condition determination processing in the composite condition determination unit and, when the composite condition is determined to be true, outputs the determination result as a search result.
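
One way to picture a composite condition is as a tree of predicates joined by logical operators and re-evaluated after each acquired position. The sketch below is an assumption-laden illustration, not the disclosed implementation. All names (`all_of`, `any_of`, `sets_within`, `sets_in_range`, `composite_search`) are hypothetical, and only AND and OR are shown.

```python
def all_of(*conds):
    """Logical AND over sub-conditions."""
    return lambda hits: all(c(hits) for c in conds)

def any_of(*conds):
    """Logical OR over sub-conditions."""
    return lambda hits: any(c(hits) for c in conds)

def sets_within(distance, a, b):
    """Neighborhood: some hit of set a lies within `distance` of a hit of set b."""
    def cond(hits):
        return any(sa == a and sb == b and abs(pa - pb) <= distance
                   for pa, sa in hits for pb, sb in hits)
    return cond

def sets_in_range(start, end, *set_ids):
    """Range: every listed set has a hit with start <= position < end."""
    def cond(hits):
        inside = {s for p, s in hits if start <= p < end}
        return inside.issuperset(set_ids)
    return cond

def composite_search(hit_stream, condition):
    """Acquire one hit, re-evaluate the composite condition, stop when true."""
    acquired = []
    for hit in hit_stream:             # acquisition step
        acquired.append(hit)
        if condition(acquired):        # determination step
            return True, len(acquired)
    return False, len(acquired)
```

Re-evaluating every predicate over all acquired hits keeps the sketch short; an actual determination unit would presumably update its state incrementally.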

The document search apparatus according to claim 1, wherein the keyword set matching unit newly acquires only the next appearance position of the keyword whose appearance position was output immediately before as that of the keyword set, compares the acquired appearance position with the appearance positions of the other keywords, and outputs the smallest of them as the next appearance position of the keyword set.
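
The matching behavior recited here resembles a lazy merge of per-keyword position lists in which only the keyword just output is advanced. A minimal sketch, assuming each keyword's positions are available as an ascending sequence (the generator name `keyword_set_positions` is hypothetical):

```python
def keyword_set_positions(position_lists):
    """Yield the appearance positions of a keyword set in ascending order.

    position_lists: for each keyword in the set, its ascending positions.
    """
    iterators = [iter(p) for p in position_lists]
    # Current candidate (next unread position) for each keyword.
    heads = [next(it, None) for it in iterators]
    last = None  # index of the keyword whose position was output last
    while True:
        if last is not None:
            # Acquire a new position only for the keyword just output.
            heads[last] = next(iterators[last], None)
        candidates = [(pos, i) for i, pos in enumerate(heads) if pos is not None]
        if not candidates:
            return
        # Compare against the other keywords' positions and take the smallest.
        pos, last = min(candidates)
        yield pos
```

Because the generator is lazy, a caller that stops after the first satisfying position never reads the rest of any keyword's position list.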