JP2012003355A - Retrieval device, method, and program

Info

Publication number: JP2012003355A
Application number: JP2010135605A
Authority: JP
Inventors: Akihiro Miyata (宮田 章裕); Takashi Fujimura (藤村 考)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-06-14
Filing date: 2010-06-14
Publication date: 2012-01-05
Anticipated expiration: 2030-06-14
Also published as: JP5514002B2

Abstract

PROBLEM TO BE SOLVED: To reduce the size of a retrieval index without lowering retrieval accuracy when creating a retrieval index that serves retrieval requests for uniquely identifying a specific position in a specific document. SOLUTION: Positions at which index keys are to be extracted are determined, over the whole or a partial area of an input document, based on features of the character codes representing the characters. An index key consisting of a combination of one or more characters at each such position is extracted, associated with the position in the document where it appears, and output to an index DB. In addition, a partial area of a document is accepted as a search query; positions at which query keys are to be extracted are determined from the search query based on the features of the character codes, and query keys each consisting of a combination of one or more characters are extracted. The index DB is searched based on the query keys, and the search result is output.

Description

The present invention relates to a search apparatus, method, and program, and in particular to a search apparatus, method, and program that create an index of documents and of each position within those documents. The index serves search requests that use a captured image of a partial area of a document whose page breaks and line breaks are fixed as the search query and that retrieve the document in which the area appears and its position within that document.

More specifically, the invention relates to a search apparatus, method, and program applied when the goal is to identify the position uniquely, rather than to exhaustively retrieve every document and in-document position that might contain the area.

There are many situations in which it is necessary to uniquely identify, from a partial area of a document, which document contains that area, or at which position in which document it appears.

For example, someone holding a magazine clipping may want to find the magazine it was cut from and read the rest of the article. In this case, it is necessary to uniquely identify which magazine the clipping came from.

This case can be viewed as a search system that takes a partial area of a document as the query and looks up, in a very large collection of documents, the name of the document containing the area, or the document name together with the position within the document.

To build a system that serves search requests for retrieving information from a document collection, the collection must be analyzed in advance to create a search index.

For example, as shown in FIG. 25, one method extracts each string of N consecutive characters appearing in a document and uses it as a search index key. The search index value is the name of the document containing the string, or the document name together with the position in the document where the string appears.
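The conventional scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each page is treated as a single plain string, and the function name `build_ngram_index`, the `stride` parameter, and the toy page contents are hypothetical and do not come from the patent.

```python
from collections import defaultdict

def build_ngram_index(pages, n=2, stride=1):
    """Map every n-character key to the set of (document, page) pairs where it occurs."""
    index = defaultdict(set)
    for (doc, page), text in pages.items():
        # Slide a window of n characters over the text, advancing `stride` characters each time.
        for i in range(0, len(text) - n + 1, stride):
            index[text[i:i + n]].add((doc, page))
    return dict(index)

# Toy data, for illustration only.
pages = {("Document 1", 2): "abcdefgh", ("Document 5", 43): "cdxyefzz"}
index = build_ngram_index(pages, n=2, stride=1)
print(sorted(index["cd"]))
```

With `stride=1` every character position contributes a key, which is why the index grows roughly in proportion to the total text length.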

The N-gram method is recognized as useful in a wide range of settings, and many extensions of it continue to be proposed. In addition to the ordinary N-gram method, a method that varies the value of N according to the situation has also been implemented (see, for example, Non-Patent Document 1).

Non-Patent Document 1: 「Unicodeを用いたN-gram索引の一実現方式とその評価」 ("A realization method of N-gram index using Unicode and its evaluation"), Information Processing Society of Japan (IPSJ) SIG report, 2000-NL-136-17, pp. 135-142.

However, when a partial area of a document is used as the query against a search index created by the above method, it is difficult to reduce the index size without lowering the search accuracy.

For example, as shown in FIG. 26, consider a case in which a partial area of page 2 of "Document 1" is photographed, the captured partial image is converted into partial text by OCR (optical character recognition), search keys are extracted from the partial text, and the search index is queried with those keys. Both at index creation and at query time, keys are extracted as strings of N consecutive characters as described above, with N = 2. Then, as shown in FIG. 27, the query results for the individual search keys are tallied, and the document name and in-document position with the largest number of hits are taken as the original document and position.
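The tallying step described above amounts to a simple vote count. The following sketch assumes the index is an in-memory dictionary from keys to sets of (document, page) pairs; the function name `tally` and the sample data are hypothetical.

```python
from collections import Counter

def tally(index, query_keys):
    """Count, per (document, page), how many query keys hit it and return the top location."""
    votes = Counter()
    for key in query_keys:
        # A key that is absent from the index simply contributes no votes.
        for location in index.get(key, ()):
            votes[location] += 1
    return votes.most_common(1)[0][0] if votes else None

# Toy index, for illustration only.
index = {
    "ab": {("Document 1", 2)},
    "cd": {("Document 1", 2), ("Document 5", 43)},
    "xy": {("Document 5", 43)},
}
best = tally(index, ["ab", "cd", "zz"])
print(best)
```

Here page 2 of "Document 1" receives two votes and page 43 of "Document 5" receives one, so the former is returned.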

First, consider creating the search index by extracting keys while shifting one character at a time in the reading direction, as shown in FIG. 25. In this case, as shown in FIG. 28, every search key extracted from the partial text by shifting one character at a time from its upper-left corner returns a result set that includes the correct result (here, page 2 of "Document 1"). Tallying the query results and taking the document and position with the most hits therefore yields the correct original document and position (page 2 of "Document 1"). However, because keys are extracted at every one-character shift, the search index becomes enormous, which slows down queries and increases the hard disk capacity needed to store the index.

On the other hand, consider creating the search index by extracting keys while shifting two characters at a time in the reading direction, as shown in FIG. 29, in order to reduce the amount of index data. In this case, as shown in FIG. 30, not every search key extracted from the partial text by shifting one character at a time from its upper-left corner returns a result set that includes the correct result (page 2 of "Document 1"). Specifically, of the keys used in the query, "アッ", "プで", "を電", "気信", "を通", and "じて" were never indexed for page 2 of "Document 1", so the results for these keys do not include the correct result. Consequently, the document and position with the most hits after tallying (here, page 43 of "Document 5") may not be the correct original document and position.

In this situation, a correct result may still be obtained if keys are extracted from the partial text by shifting two characters at a time in the reading direction from its upper-left corner, as shown in FIG. 31. However, such extraction does not always produce a correct query as in FIG. 31. When the input is a photograph of a partial area of the original document, it is difficult to prescribe which area will be captured, and a shift of just one character in the captured area makes a correct search impossible. That is, if keys are extracted from the upper-left corner of a partial text shifted by one character from that of FIG. 31, as shown in FIG. 32, the query keys "アッ", "プで", "を電", "気信", "を通", and "じて" were never indexed for page 2 of "Document 1", so none of the search keys returns a result set that includes the correct result (page 2 of "Document 1").

Thus, when keys are extracted by shifting two characters at a time to reduce the amount of index data (FIG. 29), the search can fail depending on how keys are extracted from the partial text (FIGS. 30 and 32), so search accuracy is degraded. Although the example here shifts by two characters, the problem is essentially the same when shifting by M characters (M > 2).
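The alignment problem summarized above can be reproduced with a toy example. The sketch below assumes a single-line page of ASCII text; the helper `keys` and the sample strings are hypothetical and stand in for the Japanese text of the figures.

```python
def keys(text, n=2, stride=1):
    """Extract n-character keys from text, advancing `stride` characters each time."""
    return [text[i:i + n] for i in range(0, len(text) - n + 1, stride)]

page = "abcdefghij"
indexed = set(keys(page, stride=2))          # index built with a two-character shift

aligned_query = keys(page[2:8], stride=2)    # capture starts on an indexed boundary
shifted_query = keys(page[3:9], stride=2)    # capture starts one character later

hits_aligned = [k for k in aligned_query if k in indexed]
hits_shifted = [k for k in shifted_query if k in indexed]
print(len(hits_aligned), len(hits_shifted))
```

The aligned capture matches every key, while the capture shifted by one character matches none, which mirrors the contrast between FIG. 31 and FIG. 32.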

The present invention has been made in view of the above points. Its object is to provide a search apparatus, method, and program capable of reducing the size of a search index, without lowering search accuracy, when creating a search index for serving search requests that uniquely retrieve a specific position in a specific document from a document collection.

To solve the above problems, the present invention (claim 1) is a search apparatus that creates a search index and performs searches in order to serve search requests that use a partial area of a document whose page breaks and line breaks are fixed as a search query and that retrieve the document in which the area appears and the position within that document, the apparatus comprising:
document input means for accepting input of a document to be indexed;
index key extraction position determining means for determining, from the whole or a partial area of the document, positions at which index keys are to be extracted, based on features of the character codes representing the characters;
index key extracting means for extracting, from the whole or a partial area of the document, an index key consisting of a combination of one or more characters at each such position; and
index output means for associating the index key with the position in the document at which the index key appears and outputting them to index storage means.

Further, in the present invention (claim 2), the index key extraction position determining means of claim 1 includes
means for determining the positions at which index keys, each consisting of a combination of one or more characters, are to be extracted from the whole or a partial area of the document, based on the relationship between the character code of a character and the character codes of the characters in its vicinity.

Further, the present invention (claim 3) further comprises:
query input means for accepting a partial area of a document as a search query;
query key extraction position determining means for determining, from the search query, positions at which query keys are to be extracted, based on features of the character codes representing the characters;
query key extracting means for extracting, from the search query, a query key consisting of a combination of one or more characters; and
search means for searching the index storage means based on the query key and outputting the search result.

Further, in the present invention (claim 4), the query key extraction position determining means of claim 3 includes
means for determining the positions at which query keys, each consisting of a combination of one or more characters, are to be extracted from the search query, based on the relationship between the character code of a character and the character codes of the characters in its vicinity.

Further, the present invention (claim 5) is a search method for creating a search index and performing searches in order to serve search requests that use a partial area of a document whose page breaks and line breaks are fixed as a search query and that retrieve the document in which the area appears and the position within that document, the method comprising:
a document input step in which input means accepts input of a document to be indexed;
an index key extraction position determining step in which index key extraction position determining means determines, from the whole or a partial area of the document, positions at which index keys are to be extracted, based on features of the character codes representing the characters;
an index key extracting step in which index key extracting means extracts, from the whole or a partial area of the document, an index key consisting of a combination of one or more characters at each such position; and
an index output step in which index output means associates the index key with the position in the document at which the index key appears and outputs them to index storage means.

Further, in the present invention (claim 6), in the index key extraction position determining step of claim 5,
the positions at which index keys, each consisting of a combination of one or more characters, are to be extracted from the whole or a partial area of the document are determined based on the relationship between the character code of a character and the character codes of the characters in its vicinity.

Further, the present invention (claim 7) further performs:
a query input step in which query input means accepts a partial area of a document as a search query;
a query key extraction position determining step in which query key extraction position determining means determines, from the search query, positions at which query keys are to be extracted, based on features of the character codes representing the characters;
a query key extracting step in which query key extracting means extracts, from the search query, a query key consisting of a combination of one or more characters; and
a search step in which search means searches the index storage means based on the query key and outputs the search result.

Further, in the present invention (claim 8), in the query key extraction position determining step of claim 7,
the positions at which query keys, each consisting of a combination of one or more characters, are to be extracted from the search query are determined based on the relationship between the character code of a character and the character codes of the characters in its vicinity.

Further, the present invention (claim 9) is a program for causing a computer to function as each of the means constituting the search apparatus according to any one of claims 1 to 4.

As described above, according to the present invention, key extraction positions are determined using character code patterns, which are information independent of both the query position and the language. This makes it possible to reduce the size of the search index, without lowering search accuracy, when creating a search index for serving search requests that uniquely retrieve a specific position in a specific document from a document collection.

In particular, when key extraction positions are determined by character code patterns, search keys can be extracted from positions at which search index keys were created, regardless of the query position (in the first embodiment, the photographed position within the book). The search can therefore be executed accurately even though index keys have not been created exhaustively throughout the book.

Moreover, when key extraction positions are determined by character code patterns, the method is effective without regard to differences in the linguistic characteristics of individual languages.

FIG. 1 is a block diagram of the server unit in the first embodiment of the present invention.
FIG. 2 is an example of a document in the first embodiment.
FIG. 3 is a flowchart of the search index creation processing in the first embodiment.
FIG. 4 is an example of the association data in the first embodiment.
FIG. 5 is a diagram (part 1) showing the key extraction position determination processing in the first embodiment.
FIG. 6 is an example of the key extraction position data in the first embodiment.
FIG. 7 is an example of the extracted keys in the first embodiment.
FIG. 8 is an example of the search index DB in the first embodiment.
FIG. 9 is a flowchart of the search query processing in the first embodiment.
FIG. 10 is an example of a photographed partial area in the first embodiment.
FIG. 11 is a diagram (part 2) showing the key extraction position determination processing in the first embodiment.
FIG. 12 is a diagram showing the key extraction positions determined in the search query creation processing in the first embodiment.
FIG. 13 is an example of the keys extracted in the search query processing in the first embodiment.
FIG. 14 is an example of tallying the search query results in the first embodiment.
FIG. 15 is an example of the content DB in the first embodiment.
FIG. 16 is a display example of the Web browser in the first embodiment.
FIG. 17 is a flowchart of the search index creation processing in the second embodiment.
FIG. 18 is a diagram showing the key extraction position determination processing at search index creation in the second embodiment.
FIG. 19 is a flowchart of the search query processing in the second embodiment.
FIG. 20 is a diagram showing the key extraction position determination processing at search query creation in the second embodiment.
FIG. 21 is a flowchart of the search index creation processing in the third embodiment.
FIG. 22 is a diagram showing the key extraction position determination processing at search index creation in the third embodiment.
FIG. 23 is a flowchart of the search query processing in the third embodiment.
FIG. 24 is a diagram showing the key extraction position determination processing at search query time in the third embodiment.
FIG. 25 is an example of search index creation according to the prior art.
FIG. 26 is an example of a search query according to the prior art.
FIG. 27 is a diagram showing the key extraction method at search query time according to the prior art.
FIG. 28 is an example of key extraction while shifting one character at a time in the reading direction according to the prior art.
FIG. 29 is an example of key extraction while shifting two characters at a time in the reading direction according to the prior art.
FIG. 30 is an example of key extraction while shifting one character at a time in the reading direction from the upper-left corner of a partial text according to the prior art.
FIG. 31 is an example (part 1) of key extraction while shifting two characters at a time in the reading direction from the upper-left corner of a partial text according to the prior art.
FIG. 32 is an example (part 2) of key extraction while shifting two characters at a time in the reading direction from the upper-left corner of a partial text according to the prior art.

Embodiments of the present invention are described below with reference to the drawings.

[First Embodiment]
FIG. 1 shows the configuration of the server in the first embodiment of the present invention.

The server unit 300 shown in the figure is used as the search apparatus of the present invention.

The server unit 300 can be implemented on equipment such as a PC server and comprises a data input unit 301, a key extraction position determination unit 302, a key extraction unit 303, a search index output unit 304, a search index DB 305, a server-side data transmission/reception unit 306, a search query unit 307, and a content DB 308.

The client unit 400 in the figure can be implemented on a camera-equipped mobile phone or the like and comprises a document photographing unit 401, a client-side data transmission/reception unit 402, and a content display unit 403.

The document reading device 200 in the figure is an external device such as an ordinary scanner, connected to the data input unit 301 and the server-side data transmission/reception unit 306 of the server unit 300. It takes a paper document on which characters are printed as input, scans it, and outputs an electronic image file. The optical character recognition device 201 is an external device such as ordinary OCR software. It takes an image file containing characters as input and outputs those characters as electronic text data.

The document 100 in the figure is one page of a paper book containing text such as that shown in FIG. 2. Although FIG. 2 is an example of a page consisting only of text, a page may also contain non-character information such as figures and tables. A document may consist of part of a page or of a plurality of pages.

The processing performed by the above configuration is described below.

The processing of the present invention is divided into search index creation processing and search query creation processing.

<Search index creation processing>
FIG. 3 is a flowchart of the search index creation processing in the first embodiment of the present invention.

Step 1) Input step at index creation: The data input unit 301 inputs the document to be indexed as text data.

Step 10) The document reading device 200 accepts the document 100 as input and outputs the document 100 converted into an image file.

Step 11) The optical character recognition device 201 accepts the output of step 10 as input and outputs the image file converted into text data. The text data also retains the line break positions of the text appearing in the image file.

Step 12) The data input unit 301 accepts the document name of the document 100, the position in the document, and the output of step 11 as input, associates them as shown in FIG. 4, and outputs them to the key extraction position determination unit 302. Here, the document name is the book title and the position in the document is the page.

Step 2) Key extraction step at index creation: The key extraction position determination unit 302 and the key extraction unit 303 extract keys from the text data according to fixed rules.

Step 13) The key extraction position determination unit 302 accepts the output of the data input unit 301 from step 12 as input and determines the key extraction positions. The key extraction positions are determined using character code patterns. Here, as shown in FIG. 5, each character of the original text data (FIG. 5(a)) is converted into its Unicode code (FIG. 5(b)), and every character A satisfying "code of character A < code of the character appearing immediately to the right of character A" is taken as a key extraction position (FIG. 5(c)). Each key extraction position is expressed as a coordinate pair (number of characters counted rightward from the upper-left corner, number of characters counted downward from the upper-left corner), and this position information is added to the data of FIG. 4 and output as shown in FIG. 6.
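The position rule of step 13 can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: coordinates are taken to be 1-based, the right-hand neighbour is not looked up across a line break, and the function name `key_positions` and the sample lines are hypothetical.

```python
def key_positions(lines):
    """Return (x, y) pairs where a character's code point is below its right neighbour's.

    x counts characters rightward and y counts lines downward, both starting at 1
    (an assumption; the patent text does not state the origin of the count).
    """
    positions = []
    for y, line in enumerate(lines, start=1):
        for x in range(1, len(line)):
            # line[x - 1] is the x-th character; line[x] is the character to its right.
            if ord(line[x - 1]) < ord(line[x]):
                positions.append((x, y))
    return positions

sample = ["acb", "bdd"]
print(key_positions(sample))
```

Because the rule compares only a character with its immediate neighbour, the decision at each position does not depend on where the text starts.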

Step 14) The key extraction unit 303 accepts the output of step 13 as input and extracts keys according to a fixed rule. Here, the character at each key extraction position is concatenated with the character immediately to its right, and the resulting two-character string is output as a key, as shown in FIG. 7. As FIG. 7 shows, each extracted key is output to the search index output unit 304 together with the name of the document from which it was extracted and the position (page) within that document.

Step 3) Key output step at index creation: The extracted keys are output as the search index.

Step 15) The search index output unit 304 accepts the output of step 14 as input and stores it in the search index DB 305 in the format of FIG. 7. Here, it is assumed that steps 10 to 15 have been repeated for a plurality of documents and that the search index DB 305 holds data such as that shown in FIG. 8.
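Steps 13 to 15 taken together can be approximated by the sketch below. It assumes each page is a single line of text and uses an in-memory dictionary in place of the search index DB 305; the names `extract_keys` and `build_index` and the toy pages are hypothetical.

```python
from collections import defaultdict

def extract_keys(text):
    """Two-character keys taken only where code(char) < code(right neighbour)."""
    return [text[i:i + 2] for i in range(len(text) - 1) if ord(text[i]) < ord(text[i + 1])]

def build_index(pages):
    """Associate every extracted key with the (document, page) pairs it came from."""
    index = defaultdict(set)
    for (doc, page), text in pages.items():
        for key in extract_keys(text):
            index[key].add((doc, page))
    return dict(index)

# Toy pages, for illustration only.
pages = {("Document 1", 2): "badcfe", ("Document 5", 43): "acegdb"}
index = build_index(pages)
print(sorted(index))
```

In this toy input, "badcfe" contains five adjacent character pairs but only two satisfy the rule, so fewer keys are stored than with extraction at every position.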

＜検索問い合わせ作成処理＞
図９は、本発明の第１の実施の形態における検索問い合わせ作成処理のフローチャートである。 <Search query creation process>
FIG. 9 is a flowchart of search query creation processing according to the first embodiment of this invention.

Step 4) Input step at query time: the document to be queried is input as text data.

Step 16) The document photographing unit 401 of the client unit 400 photographs a partial area of the document 100 and outputs it as an image file, as shown in FIG. 10.

Step 17) The client-side data transmission/reception unit 402 receives the image file of the partial area output in step 16 as input and sends it, unchanged, to the server unit 300 over a network or the like.

Step 18) The server-side data transmission/reception unit 306 receives the image file output from the client unit 400 in step 17 as input, converts it into text data using the optical character recognition device 201, and outputs the text data. The text data preserves the line break positions of the text shown in the image file.

Step 5) Key extraction step at query time: keys are extracted from the text data according to a fixed rule.

Step 19) The key extraction position determination unit 302 receives the text data output in step 18 as input and determines the key extraction positions. The key extraction positions are determined by the same method as in step 13. That is, as shown in FIG. 11, each character is converted into its Unicode code (FIG. 11(b)), and every character A satisfying "code of character A < code of the character immediately to the right of character A" is taken as a key extraction position (FIG. 11(c)). Each key extraction position is expressed in the coordinate format (number of characters counted rightward from the upper-left corner, number of characters counted downward from the upper-left corner) and output as shown in FIG. 12.

Step 20) The key extraction unit 303 receives the output of step 19 as input and extracts keys by the same method as in step 14. That is, the two-character string formed by concatenating the character at a key extraction position with the character immediately to its right is output as a key, as shown in FIG. 13.

Step 6) Query step at query time: a query is made using the extracted keys.

Step 21) The search query unit 307 receives the keys output in step 20 as input and queries the search index DB 305 for the document name and the position in the document that correspond to each key.

Step 22) As shown in FIG. 14, the query results for the individual search keys are tallied, and the source document name and the position in the source document with the largest number of hits are identified. Because the key extraction positions are determined by the same method at index creation (step 13) and at query time (step 19), every search key extracted at query time from page 2 of "Document 1" is also contained in the search index in association with page 2 of "Document 1". Tallying the query results and taking the result with the most hits therefore always yields the correct search result (in this case, page 2 of "Document 1").
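The tallying of step 22 amounts to a vote count over (document name, page) pairs. The following minimal sketch assumes the index is a dictionary mapping each key to a list of `(doc_name, page)` tuples; the function name `best_match` and that data layout are illustrative assumptions, not the format of FIG. 14.

```python
from collections import Counter


def best_match(index, query_keys):
    """Return the (doc_name, page) pair hit by the most query keys.

    `index` maps a two-character key to the list of (doc_name, page)
    pairs in which the key appears. Returns None when no key matches.
    """
    votes = Counter()
    for key in query_keys:
        # Every place where this key appears receives one vote.
        votes.update(index.get(key, []))
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

For example, if the key `"ac"` appears on page 2 of "Document 1" and on page 5 of "Document 2", while `"bd"` appears only on page 2 of "Document 1", querying with both keys gives page 2 of "Document 1" two votes and the other page one vote. How ties are resolved is not stated in the source; this sketch returns whichever tied pair was encountered first.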

Step 23) The search query unit 307 queries the content DB 308 and obtains and outputs the content (here, http://content_1_2.html) corresponding to the source document name and the position in the source document identified in step 22. Here, it is assumed that the data shown in FIG. 15 has been stored in the content DB 308 in advance.

Step 7) Result output step at query time: the query result is displayed.

Step 24) The server-side data transmission/reception unit 306 receives the output of step 23 as input and sends it to the client unit 400 over the network.

Step 25) The client-side data transmission/reception unit 402 receives the output of step 24 as input and outputs it.

Step 26) The content display unit 403 receives the output of step 25 as input and displays it as content. Here, the content display unit 403 is assumed to be the display of a mobile phone, and the content at http://content_1_2.html is displayed in a Web browser as shown in FIG. 16.

[Second Embodiment]
This embodiment differs from the first embodiment only in the determination of key extraction positions at search index creation (step 13) and at query time (step 19). All other processing is the same as in the first embodiment.

FIG. 17 is a flowchart of the search index creation process in the second embodiment of the present invention. Only step 1013, which replaces step 13, and step 1019, which replaces step 19, are described here.

Step 1013) The key extraction position determination unit 302 receives the output of step 12 as input and determines the key extraction positions. The key extraction positions are determined using a pattern of character codes. Here, as shown in FIG. 18, each character is converted into its Unicode code (FIG. 18(b)), and every character A satisfying "code of character A × 2 < code of the character immediately to the right of character A" is identified as a key extraction position (FIG. 18(c)). The subsequent output is the same as in step 13.
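The stricter condition of step 1013 can be sketched by adding a multiplier to the comparison. The function name `key_positions_scaled`, the `factor` parameter, and the 1-based coordinates are assumptions for illustration; with `factor=1` the sketch reduces to the rule of the first embodiment.

```python
def key_positions_scaled(lines, factor=2):
    """Return 1-based (x, y) positions where code(A) * factor < code(right neighbor).

    factor=2 corresponds to the rule of step 1013; the last character of
    each line has no right neighbor and is never selected.
    """
    positions = []
    for y, line in enumerate(lines, start=1):
        for x, (a, b) in enumerate(zip(line, line[1:]), start=1):
            if ord(a) * factor < ord(b):
                positions.append((x, y))
    return positions
```

On the single line `"!ab"`, the pair `!` (33) and `a` (97) satisfies 66 < 97, but the pair `a` (97) and `b` (98) does not satisfy 194 < 98, so only `(1, 1)` is selected. The first embodiment's rule would select both positions, which shows that the doubled condition yields fewer keys for the same text.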

FIG. 19 is a flowchart of the search query process in the second embodiment of the present invention.

Step 1019) The key extraction position determination unit 302 receives the output of step 18 as input and determines the key extraction positions. The key extraction positions are determined by the same method as in step 1013. That is, as shown in FIG. 20, each character is converted into its Unicode code (FIG. 20(b)), and every character A satisfying "code of character A × 2 < code of the character immediately to the right of character A" is identified as a key extraction position (FIG. 20(c)). The subsequent output is the same as in step 19.

[Third Embodiment]
This embodiment differs from the first embodiment only in the determination of key extraction positions at search index creation (step 13) and at query time (step 19). All other processing is the same as in the first embodiment.

FIG. 21 is a flowchart of the search index creation process in the third embodiment of the present invention.

Only step 2013, which replaces step 13, is described here.

Step 2013) The key extraction position determination unit 302 receives the output of step 12 as input and determines the key extraction positions. The key extraction positions are determined using a pattern of character codes. Here, as shown in FIG. 22, each character is converted into its Unicode code (FIG. 22(b)), and every character A whose code is odd is identified as a key extraction position (FIG. 22(c)). The subsequent output is the same as in step 13.
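Unlike the first two embodiments, the rule of step 2013 looks only at the character itself. A minimal sketch follows; the function name `odd_code_positions` and the 1-based coordinates are illustrative assumptions. The source does not say how an odd-coded character at the end of a line, which has no right neighbor for the two-character key, is handled, so this sketch simply reports every odd-coded position.

```python
def odd_code_positions(lines):
    """Return 1-based (x, y) positions of characters with an odd Unicode code point."""
    positions = []
    for y, line in enumerate(lines, start=1):
        for x, ch in enumerate(line, start=1):
            if ord(ch) % 2 == 1:
                positions.append((x, y))
    return positions
```

For the page `["abc", "bd"]`, the characters `a` (97) and `c` (99) have odd codes while `b` (98) and `d` (100) do not, so the sketch returns `[(1, 1), (3, 1)]`.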

FIG. 23 is a flowchart of the search query process in the third embodiment of the present invention. Only step 2019, which replaces step 19, is described here.

Step 2019) The key extraction position determination unit 302 receives the output of step 18 as input and determines the key extraction positions. The key extraction positions are determined by the same method as in step 2013. That is, as shown in FIG. 24, each character is converted into its Unicode code (FIG. 24(b)), and every character A whose code is odd is identified as a key extraction position (FIG. 24(c)). The subsequent output is the same as in step 19.

In the present invention, the operation of the server unit 300 in the first to third embodiments described above can be implemented as a program, and the program can be installed and executed on a computer used as the server unit or distributed via a network.

The program so constructed can also be stored on a hard disk or on a portable storage medium such as a flexible disk or a CD-ROM, and then installed on a computer or distributed.

The present invention is not limited to the above and can be modified and applied in various ways within the scope of the claims.

100 Document
200 Document reading device
201 Optical character recognition device
300 Server unit
301 Data input unit
302 Key extraction position determination unit
303 Key extraction unit
304 Search index output unit
305 Search index DB
306 Server-side data transmission/reception unit
307 Search query unit
308 Content DB
400 Client unit
401 Document photographing unit
402 Client-side data transmission/reception unit
403 Content display unit

Claims

A search device that creates a search index for responding to a search request which uses, as a search query, a partial area in a document whose page break or line break positions are fixed, and which obtains the document in which the area appears and the position in that document, the search device comprising:
document input means for accepting input of a document to be indexed;
index key extraction position determining means for determining, based on a characteristic of the character codes representing characters, a position from which an index key is to be extracted from the whole or a partial area of the document;
index key extracting means for extracting, from the whole or a partial area of the document, an index key composed of a combination of one or more characters located at the position; and
index output means for associating the index key with the position in the document at which the index key appears, and outputting the result to index storage means.

The search device according to claim 1, wherein the index key extraction position determining means includes means for determining the position from which an index key composed of a combination of one or more characters is to be extracted from the whole or a partial area of the document, based on the relationship between the character code of a character and the character codes of characters in its vicinity.

The search device according to claim 1, further comprising:
query input means for accepting a partial area in a document as a search query;
query key extraction position determining means for determining, based on a characteristic of the character codes representing characters, a position from which a query key is to be extracted from the search query;
query key extracting means for extracting, from the search query, a query key composed of a combination of one or more characters; and
search means for searching the index storage means based on the query key and outputting a search result.

The search device according to claim 3, wherein the query key extraction position determining means includes means for determining the position from which a query key composed of a combination of one or more characters is to be extracted from the search query, based on the relationship between the character code of a character and the character codes of characters in its vicinity.

A search method for creating a search index for responding to a search request which uses, as a search query, a partial area in a document whose page break or line break positions are fixed, and which obtains the document in which the area appears and the position in that document, the search method comprising:
a document input step in which document input means accepts input of a document to be indexed;
an index key extraction position determination step in which index key extraction position determining means determines, based on a characteristic of the character codes representing characters, a position from which an index key is to be extracted from the whole or a partial area of the document;
an index key extraction step in which index key extracting means extracts, from the whole or a partial area of the document, an index key composed of a combination of one or more characters located at the position; and
an index output step in which index output means associates the index key with the position in the document at which the index key appears, and outputs the result to index storage means.

The search method according to claim 5, wherein, in the index key extraction position determination step, the position from which an index key composed of a combination of one or more characters is to be extracted from the whole or a partial area of the document is determined based on the relationship between the character code of a character and the character codes of characters in its vicinity.

The search method according to claim 5, further comprising:
a query input step in which query input means accepts a partial area in a document as a search query;
a query key extraction position determination step in which query key extraction position determining means determines, based on a characteristic of the character codes representing characters, a position from which a query key is to be extracted from the search query;
a query key extraction step in which query key extracting means extracts, from the search query, a query key composed of a combination of one or more characters; and
a search step in which search means searches the index storage means based on the query key and outputs a search result.

The search method according to claim 7, wherein, in the query key extraction position determination step, the position from which a query key composed of a combination of one or more characters is to be extracted from the search query is determined based on the relationship between the character code of a character and the character codes of characters in its vicinity.

A program for causing a computer to function as each of the means constituting the search device according to any one of claims 1 to 4.