JP2000259847A

JP2000259847A - Information retrieval method and device and recording medium

Info

Publication number: JP2000259847A
Application number: JP11057973A
Authority: JP
Inventors: Shigeki Ouchi; 茂樹大内
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1999-03-05
Filing date: 1999-03-05
Publication date: 2000-09-22

Abstract

PROBLEM TO BE SOLVED: To improve searchability and visibility by using the confidence of character recognition: a recognition result that the user can understand is displayed as text using character codes, and a result with low confidence is displayed as a partial image. SOLUTION: Logical elements are extracted (3) from input document images (1 and 2), each extracted logical element is checked to identify whether it is a character string area (4), and the identified character string area is subjected to character recognition (5). The area is then displayed as text when the confidence of the recognition result is equal to or greater than a threshold, and as a partial image when it is less than the threshold.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to an information retrieval method, an information retrieval device, and a recording medium storing an information retrieval program which, in order to make retrieval more convenient, extract and display areas of a document from which the logical structure of the document content can be accurately grasped, working on a database of document image data input from an image input device such as a facsimile or an image scanner.

[0002]

2. Description of the Related Art Conventionally, in document image retrieval, for convenience at retrieval time, an operator has manually added logical structure information such as title information and keyword information to a document image input from an image input device, or, for fixed-form documents, a specific position in the document has been cut out as the title, keywords, and the like.

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION However, the addition of information by an operator described above has the problem that the amount of work grows as the volume of documents grows, and automatic extraction of a specific position in a document is limited to fixed-form documents.

[0004] To solve this, methods have been proposed, as described in JP-A-9-134406 and JP-A-5-274471, that automatically extract a title using the layout features (logical elements) of a non-fixed-form document. With either method, however, if the extracted result is displayed as an image it cannot be used for electronic search, and if the character recognition result is displayed, recognition errors may cause characters that the user cannot understand, that is, characters that cannot exist in natural language, to be displayed.

[0005] SUMMARY OF THE INVENTION The present invention has been made to solve the problems of the conventional logical element extraction and display methods described above. An object of the present invention is to provide an information retrieval method, a device, and a recording medium storing an information retrieval program that improve searchability and visibility by using the confidence of character recognition: a recognition result that the user can understand is displayed as text using character codes, and a result with low confidence is displayed as a partial image.

[0006]

MEANS FOR SOLVING THE PROBLEMS To achieve the above object, the invention according to claim 1 is an information retrieval method for extracting a logical element from a document image and displaying it, characterized in that character recognition is performed on a character area of the extracted logical element, the character area is displayed as character codes when the confidence of the recognition result is equal to or greater than a predetermined threshold, and the character area is displayed as a partial image when the confidence is less than the threshold.

[0007] The invention according to claim 2 is characterized in that it is determined whether the character area of the extracted logical element is an area corresponding to a document title, and the character area is displayed as character codes or as a partial image based on the confidence of the character recognition result for the character area determined to be the document title.

[0008] The invention according to claim 3 is characterized in that the area corresponding to the document title is determined based on the result of comparison with a pre-registered fixed-form document.

[0009] The invention according to claim 4 is characterized by comprising means for inputting a document image, means for extracting a predetermined logical element from the document image, means for performing recognition processing on a character area of the extracted logical element, and means for displaying the character area as character codes or as a partial image based on the confidence of the recognition result.

[0010] The invention according to claim 5 is a computer-readable recording medium storing a program for causing a computer to realize: a function of inputting a document image; a function of storing the document image; a function of extracting a predetermined logical element from the stored document image; a function of performing recognition processing on a character area of the extracted logical element; a function of storing the confidence, character codes, and character coordinates obtained as the recognition result; a function of determining whether the confidence of the recognition result for the character area is equal to or greater than a predetermined threshold; a function of displaying the character codes of the character area when the confidence is determined to be equal to or greater than the threshold; and a function of displaying a partial image of the character area, by referring to the document image based on the character coordinates of the character area, when the confidence is determined not to be equal to or greater than the threshold.

[0011]

DESCRIPTION OF THE PREFERRED EMBODIMENTS One embodiment of the present invention will be described in detail below with reference to the drawings. (Embodiment 1) FIG. 1 shows the configuration of an information retrieval device according to Embodiment 1 of the present invention. In the figure, 1 is image input means for inputting a document image, 2 is document image storage means for storing the input document image, 3 is logical element extraction means for extracting logical elements from the document image, 4 is character string area identification means for identifying that an extracted logical element is a character string area, 5 is character recognition means for performing recognition processing on the character string area, 6 is recognition result storage means for storing recognition results and the like, and 7 is display means for displaying an extracted logical element as character codes or as a partial image.

[0012] FIG. 2 is a processing flowchart of Embodiment 1 of the present invention. A document image is input by the image input means 1, such as a scanner, and stored in the document image storage means (database) 2 (step 101). After performing preprocessing such as skew correction on the document image, the logical element extraction means 3 extracts logical elements (title, author, and so on), which are layout features of the document image (step 102). A known technique may be used for such logical element extraction means.

[0013] Next, the character string area identification means 4 identifies whether each extracted logical element is a character string area (step 103). The character recognition means 5 performs recognition processing on the identified character string area, and the character codes obtained as the result of character recognition are saved in the recognition result storage means 6 together with the recognition confidence (similarity) and the character coordinates (step 104).

[0014] Then, the confidence of the logical element extracted as described above is referenced. When the confidence is equal to or greater than a predetermined threshold (step 105), the character recognition result saved in the recognition result storage means 6 is displayed on the display means 7 as text (character codes) (step 106). When it is not, the character coordinate values saved in the recognition result storage means 6 are used to read the partial image from the document image storage means 2, and the partial image is displayed on the display means 7 (step 107).
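Steps 105 to 107 can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the names `RecognizedChar`, `region_box`, `crop`, and `render` are hypothetical, the page is modeled as a plain list of pixel rows, and taking the lowest character confidence as the confidence of the whole area is an assumption, since the description does not say how per-character values are combined.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


@dataclass
class RecognizedChar:
    code: str          # character code produced by recognition
    confidence: float  # recognition confidence (similarity), assumed 0.0 to 1.0
    box: Box           # character coordinates on the page


def region_box(chars: List[RecognizedChar]) -> Box:
    """Smallest rectangle that encloses every character box."""
    return (
        min(c.box[0] for c in chars),
        min(c.box[1] for c in chars),
        max(c.box[2] for c in chars),
        max(c.box[3] for c in chars),
    )


def crop(page: List[List[int]], box: Box) -> List[List[int]]:
    """Read the partial image for `box` from a page stored as rows of pixels."""
    left, top, right, bottom = box
    return [row[left:right] for row in page[top:bottom]]


def render(
    chars: List[RecognizedChar], page: List[List[int]], threshold: float
) -> Union[str, List[List[int]]]:
    # Assumption: the area is only as trustworthy as its weakest character.
    confidence = min(c.confidence for c in chars)
    if confidence >= threshold:               # step 105
        return "".join(c.code for c in chars)  # step 106: show as text
    return crop(page, region_box(chars))      # step 107: show as partial image
```

Returning either a string or a pixel block keeps the caller's display logic simple; a stricter or looser aggregate (for example the mean confidence) would shift how often the image fallback is used.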

[0015] In the embodiment described above, character recognition is performed on the extracted logical elements. Alternatively, character recognition may be performed on the document image first, after which the logical elements are extracted and displayed as text or as partial images based on their confidence.

[0016] (Embodiment 2) Embodiment 2 extracts and displays a document title as the logical element. FIG. 3 shows the configuration of Embodiment 2. In the figure, 21 is image input means, 22 is document image storage means, 23 is character string area extraction means, 24 is character recognition means, 25 is font identification means, 26 is natural language analysis means, 27 is attribute extraction means, 28 is point calculation means, 29 is recognition result storage means, 30 is title extraction means, and 31 is display means. FIG. 4 is a processing flowchart of Embodiment 2.

[0017] A document image is input by the image input means 21 and accumulated in the document image storage means (database) 22 (step 201). The character string area extraction means 23 extracts character string areas as rectangles (step 202). The character recognition means 24 performs recognition processing on the extracted character string areas, the font identification means 25 identifies the font of each recognized character, and the natural language analysis means 26 performs language analysis on the recognition result. In addition, the attribute extraction means 27 extracts attributes such as underlines from the document image (step 203).

[0018] Next, the point calculation means 28 assigns title-likeness points to the coordinate values, size, and other properties of each character string rectangle. For example, the character size of a title is generally larger than that of the body text and other text, so a high title-likeness point value is given when the character size is large. For each line of the document image, the point calculation means 28 calculates an overall title-likeness score (for example, the sum of weighted points) from the points of these attributes (step 204).
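The weighted sum mentioned as an example in step 204 might look like the sketch below. The attribute names, the weights in `WEIGHTS`, and the scaling in `char_size_points` are all hypothetical; the description gives only the general idea that larger characters earn more title-likeness points and that the per-attribute points are combined, for example, as a weighted sum.

```python
from typing import Dict

# Hypothetical weights per attribute; the source does not give concrete values.
WEIGHTS: Dict[str, float] = {"char_size": 2.0, "position": 1.0, "underline": 0.5}


def char_size_points(line_height: float, body_height: float) -> float:
    """Points in [0, 1] that grow as the line's characters exceed body-text size.

    Assumed scaling: body size gives 0 points, twice body size or more gives 1.
    """
    return max(0.0, min(1.0, line_height / body_height - 1.0))


def title_score(points: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Overall title-likeness score of one line: sum of weight * points."""
    return sum(weights.get(name, 0.0) * value for name, value in points.items())
```

For instance, a line whose characters are twice the body size, sits fairly high on the page, and is underlined could be scored with `title_score({"char_size": char_size_points(24, 12), "position": 0.5, "underline": 1.0})`.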

[0019] The title extraction means 30 outputs the line with the highest score among the lines as the title candidate line (step 205). Then, the confidence of the character string in the title candidate line, saved in the recognition result storage means 29, is referenced. When the confidence is equal to or greater than a predetermined threshold, the title is displayed on the display means 31 as character codes (step 207). Otherwise, the coordinate values saved in the recognition result storage means 29 are used to read the partial image from the document image storage means 22, and the partial image is displayed on the display means 31 (step 208).
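The selection of the title candidate line and the choice of display form can be sketched as follows. The `Line` record and the function names are hypothetical, and the sketch assumes that a single confidence value per line is already available; it returns the display mode together with either the text or the coordinates from which the partial image would be read.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


@dataclass
class Line:
    text: str          # recognized character codes of the line
    score: float       # overall title-likeness score (step 204)
    confidence: float  # recognition confidence of the line's character string
    box: Box           # coordinates of the line on the page


def pick_title(lines: List[Line]) -> Line:
    """Step 205: the line with the highest score is the title candidate."""
    return max(lines, key=lambda line: line.score)


def title_display(lines: List[Line], threshold: float) -> Tuple[str, Union[str, Box]]:
    title = pick_title(lines)
    if title.confidence >= threshold:
        return ("text", title.text)   # step 207: display as character codes
    return ("image", title.box)       # step 208: display the partial image
```

Note that `max` returns the first line when scores tie; the description does not say how ties should be broken.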

[0020] (Embodiment 3) Embodiment 3 extracts a title using registered fixed-form documents. FIG. 5 shows the configuration of Embodiment 3. In the figure, 43 is document template registration means in which fixed-form documents are registered, 44 is document image comparison means for comparing the input document with the fixed-form documents, 45 is template matching means for extracting the title by template matching when the comparison finds a match, 46 is title extraction means for extracting the title by the method of Embodiment 2 when there is no match, and 47 is display means for displaying the extracted title.

[0021] FIG. 6 is a processing flowchart of Embodiment 3. A document image is input from the image input means 41 and accumulated in the document image storage means 42 (step 301). The document image comparison means 44 compares the input document image with a plurality of registered fixed-form documents (step 302). When the comparison finds that the layout features match, the template matching means 45 extracts the title from the document image by matching the input document image against the matched fixed-form document (step 303). When there is no match, the title extraction means 46 (configured in the same way as in FIG. 3) extracts the title by calculating title-likeness points, as in Embodiment 2 described above (step 304).
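The control flow of steps 302 to 304 (try the registered forms first, fall back to scoring otherwise) can be sketched as below. Everything concrete here is an assumption: the description does not define the layout features or the comparison, so the sketch models a layout as a list of normalized block rectangles in reading order, treats two layouts as matching when every coordinate agrees within a tolerance, and simplifies step 303 to returning the title position recorded for the matched form.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # normalized (left, top, right, bottom)


@dataclass
class Template:
    name: str
    blocks: List[Box]  # layout features of one registered fixed-form document
    title_box: Box     # where the title sits on that form


def layouts_match(a: List[Box], b: List[Box], tol: float = 0.02) -> bool:
    """Hypothetical comparison: same block count, every coordinate within tol.

    Both lists are assumed to be in the same (reading) order.
    """
    if len(a) != len(b):
        return False
    return all(
        abs(x - y) <= tol
        for box_a, box_b in zip(a, b)
        for x, y in zip(box_a, box_b)
    )


def find_title_box(
    blocks: List[Box],
    templates: List[Template],
    fallback: Callable[[List[Box]], Box],
) -> Box:
    for template in templates:              # step 302: compare with each form
        if layouts_match(blocks, template.blocks):
            return template.title_box       # step 303: title from the template
    return fallback(blocks)                 # step 304: Embodiment 2 scoring
```

Passing the fallback as a function keeps the template path independent of how the title-likeness scoring of Embodiment 2 is implemented.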

[0022] (Embodiment 4) Embodiment 4 realizes the present invention in software. FIG. 7 shows the configuration of Embodiment 4. A program that realizes the information retrieval function of the present invention, among other things, is recorded on a computer-readable recording medium such as a CD-ROM. Document images are captured from a scanner or the like and stored on a hard disk or the like. When the program is started, the document image data is read, logical elements such as the document title are extracted, recognition processing is executed, and the document title is displayed on a display or the like as character codes or as a partial image according to the recognition result.

[0023]

EFFECTS OF THE INVENTION As described above, according to the present invention, when document images are searched using a logical element as the key, the logical element is displayed as either a partial image or character codes (text), which makes it possible to combine the visibility that is convenient for visual search with the searchability that is convenient for electronic search. This improves operability when searching a large number of document images that have been input from an image input device such as an image scanner or a facsimile and accumulated as images on a storage medium or the like of an information processing device.

[Brief description of the drawings]

FIG. 1 shows the configuration of Embodiment 1 of the present invention.

FIG. 2 is a processing flowchart of Embodiment 1 of the present invention.

FIG. 3 shows the configuration of Embodiment 2 of the present invention.

FIG. 4 is a processing flowchart of Embodiment 2 of the present invention.

FIG. 5 shows the configuration of Embodiment 3 of the present invention.

FIG. 6 is a processing flowchart of Embodiment 3 of the present invention.

FIG. 7 shows the configuration of Embodiment 4 of the present invention.

[Explanation of symbols]

1 Image input means
2 Document image storage means
3 Logical element extraction means
4 Character string area identification means
5 Character recognition means
6 Recognition result storage means
7 Display means

Continued from the front page

F-terms (reference): 5B050 BA10 BA16 BA20 CA04 CA07 DA03 EA03 EA06 EA18 FA02 GA08; 5B064 AA01 BA01 CA08 DA02 DA17 DC20 EA08 EA18 EA28 FA03 FA11 FA13; 5B075 ND07 PP02 PP10 PQ02 PR06 QM08

Claims

1. An information retrieval method for extracting a logical element from a document image and displaying it, wherein character recognition is performed on a character area of the extracted logical element, the character area is displayed as character codes when the confidence of the recognition result is equal to or greater than a predetermined threshold, and the character area is displayed as a partial image when the confidence is less than the threshold.

2. The information retrieval method according to claim 1, wherein it is determined whether the character area of the extracted logical element is an area corresponding to a document title, and the character area is displayed as character codes or as a partial image based on the confidence of the character recognition result for the character area determined to be the document title.

3. The information retrieval method according to claim 2, wherein the area corresponding to the document title is determined based on the result of comparison with a pre-registered fixed-form document.

4. An information retrieval device comprising: means for inputting a document image; means for extracting a predetermined logical element from the document image; means for performing recognition processing on a character area of the extracted logical element; and means for displaying the character area as character codes or as a partial image based on the confidence of the recognition result.

5. A computer-readable recording medium storing a program for causing a computer to realize: a function of inputting a document image; a function of storing the document image; a function of extracting a predetermined logical element from the stored document image; a function of performing recognition processing on a character area of the extracted logical element; a function of storing the confidence, character codes, and character coordinates obtained as the recognition result; a function of determining whether the confidence of the recognition result for the character area is equal to or greater than a predetermined threshold; a function of displaying the character codes of the character area when the confidence is determined to be equal to or greater than the threshold; and a function of displaying a partial image of the character area, by referring to the document image based on the character coordinates of the character area, when the confidence is determined not to be equal to or greater than the threshold.