JP4802502B2

JP4802502B2 - Word recognition device and word recognition method

Info

Publication number: JP4802502B2
Application number: JP2005013475A
Authority: JP
Inventors: 昌史古賀; 達也亀山; 竜治嶺; 寿一髙橋
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2005-01-21
Filing date: 2005-01-21
Publication date: 2011-10-26
Anticipated expiration: 2025-01-21
Also published as: CN100530217C; CN1808466A; JP2006202068A

Description

The present invention belongs to the technical field of word input means that use character string recognition.

Devices that read characters printed or handwritten on paper have long been known as OCR. Their main applications are form processing, mail sorting, and converting documents to text. A typical OCR reads characters in four steps:

1. Image input: the page is photoelectrically converted with a scanner and captured into a computer.
2. Character segmentation: the region to be read is estimated and individual characters are cut out.
3. Character identification: each character is identified.
4. Post-processing: the identified characters are interpreted as a character string using language information and the like.

When such OCR recognizes Japanese, the means for storing language information (the language dictionary) needs a particularly large amount of storage. In addition, a page may contain several text lines of which only some are to be read. In that case, the device automatically determines which lines to read according to rules predetermined for the application.

In general, it is difficult to determine at the character segmentation stage which partial images correspond to actual characters. For this reason, a widely used method cuts out characters under various hypotheses and settles on the segmentation in post-processing.
In addition, when characters have similar shapes, the character identification process alone may be unable to determine the character type. In such cases, character identification outputs multiple candidate characters.

In recent years, there have been attempts to read characters in documents, signboards, signs, and the like using the camera of a portable device, such as a mobile phone or PDA (personal digital assistant), as the image input means. Recognition targets on these devices include phone numbers, e-mail addresses, URLs, and English words. The recognition results are used for services such as placing calls, sending e-mail, web access, and word translation. Such applications assume that users freely read the documents, signboards, and signs around them with the portable device and receive a service. Ease of operation and short waiting times are therefore required.

For the recognition targets of the conventional technology, identifying the character string to be read in the image was relatively easy. For example, a telephone number is usually preceded by a string such as "Tel.". Its total number of digits and its use of parentheses and hyphens also follow regular patterns. E-mail addresses and URLs are similarly regular: they begin with "http:", contain "@" in the middle, or end with ".com", ".jp", and so on. Using such regularities, conventional techniques can detect the string to be recognized automatically. English words are separated by spaces, so the word to be recognized is easy to identify from a rough position.

For example, Masahiro Yamazaki et al., "Development of an electronic dictionary function for mobile phones using the OCR function" (Proceedings of the 2004 IEICE General Conference, D-12-35) (Non-patent Document 1), describes such an application. The operator aligns a mark at the center of the screen with the English word to be read. The English word near the mark is then read, and its translation is displayed.
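The regularity-based detection described above can be sketched with regular expressions. This is a minimal illustration only. The patterns in `PATTERNS` and the `detect_targets` helper are hypothetical simplifications written for this example. They are not the method of Non-patent Document 1 or of any product.

```python
import re

# Hypothetical, simplified patterns for the regularities mentioned above.
# Real phone-number, e-mail, and URL grammars are considerably richer.
PATTERNS = {
    # "Tel." prefix followed by digit groups separated by parentheses/hyphens.
    "phone": re.compile(r"Tel\.?\s*(\(?\d{2,4}\)?[-\s]?\d{2,4}-\d{3,4})"),
    # Local part, "@", and a dotted domain such as example.co.jp.
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # Anything starting with "http:" or "https:" up to the next whitespace.
    "url": re.compile(r"https?:\S+"),
}


def detect_targets(line: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs for every pattern match in one text line."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(line):
            # For phone numbers keep only the number, not the "Tel." prefix.
            hits.append((kind, m.group(1) if kind == "phone" else m.group(0)))
    return hits
```

Because each kind of target has a recognizable shape, no operator input beyond a rough position is needed. That assumption breaks down for Japanese words.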

However, it is difficult to identify the character string to be read when recognizing a word in a language written without spaces between words, such as Japanese or Chinese. The reason is that such text is printed and written with no spaces between words. For example, to read the word 「営繕」 ("repair") in the text line 「臨時営繕費用請求」 ("claim for temporary repair costs"), the operator may align the mark with the center of 「営繕」. Even so, it is difficult to determine automatically which range the operator expects to be read. An alternative is to specify the reading area with a rectangle, but this significantly increases the amount of operation and reduces the convenience of the device.
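To make the ambiguity concrete, the hypothetical helper below enumerates every substring of a recognized line that covers a single designated character position. It assumes words of at most four characters, an illustrative limit that does not come from the source. With the mark on 「営」 in 「臨時営繕費用請求」, nine spans remain, and the position alone does not say which one the operator meant.

```python
def spans_covering(text: str, index: int, max_len: int = 4) -> list[str]:
    """List every substring of `text`, at most `max_len` characters long,
    that contains the character at position `index`."""
    spans = []
    # A covering span must start no earlier than index - max_len + 1 ...
    for start in range(max(0, index - max_len + 1), index + 1):
        # ... and must end after `index`, within max_len and the text length.
        for end in range(index + 1, min(len(text), start + max_len) + 1):
            spans.append(text[start:end])
    return spans


line = "臨時営繕費用請求"  # "claim for temporary repair costs"
candidates = spans_covering(line, line.index("営"))
```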

A portable device with such a character recognition function also faces a dictionary storage problem in post-processing. Conventional methods generally detect plausible words while applying constraints from the word dictionary, because the character segmentation and character identification results are ambiguous. When general Japanese words, current-affairs terms, and the like are recognition targets, the number of words becomes enormous and is difficult to store on a portable device. One conceivable solution is to store the dictionary on an external computer, such as a server, and connect it to the portable device through a communication function. However, this kind of post-processing must access the word dictionary frequently, so placing the dictionary externally lengthens processing time.

Patent Document 1: Japanese Patent Laid-Open No. 11-085909

Non-patent Document 1: Masahiro Yamazaki et al., "Development of an electronic dictionary function for mobile phones using the OCR function," Proceedings of the 2004 IEICE General Conference, D-12-35.
Non-patent Document 2: H. Bunke and P.S.P. Wang, "Handbook of Character Recognition and Document Image Analysis," World Scientific, 1997.
Non-patent Document 3: Katsumi Marukawa et al., "Error Correction Algorithm for Handwritten Kanji Address Recognition," IPSJ Transactions, Vol. 35, No. 6, June 1994, pp. 1101-1110.

The first problem to be solved by the present invention is to let the user specify, with a simple operation, a word to be read in a document written without spaces between words, as in Japanese or Chinese. As described above, Japanese has no spaces between words, so it is difficult to determine the extent of a word automatically when only a single position is given. The present invention solves this problem: the word to be read can be specified with the same operation used to recognize English words and the like.
The second problem to be solved by the present invention is to reduce how often post-processing accesses the word dictionary. Words can then be read in a practical processing time even when the word dictionary resides on a server.

As a first means of solving the above problems, the present invention provides a means that selects, from the set of candidate words obtained by word matching, the candidate closest to the position designated by the operator. Here, word matching is a process that uses the character identification results to detect sequences of partial images that plausibly form words stored in advance in a dictionary. The dictionary stores one or more words in advance. If several partial image sequences plausibly form words, they are all output as candidate words. Proximity between the designated position and a candidate word can be measured, for example, as the distance between the designated position and the centroid of the candidate word's bounding rectangle. As a result, the word near the designated position can be read even when there are no spaces between words.

As a second means of solving the above problems, the present invention provides a character string output means. After character identification, this means outputs plausible candidate character strings without using word information. It judges plausibility from information such as the confidence obtained from character identification and the uniformity of spacing between partial images. When there are several plausible character strings, they are output as candidate character strings.

The character string output means optimizes plausibility through iterative processing. A widely used conventional method works as follows:

1. The positional relationships of partial images are represented as a network (a character segmentation network).
2. The confidence of each partial image as a character is computed.
3. The path on the network that maximizes the sum of confidences is found.

However, this method cannot optimize the uniformity of spacing between partial images. Spacing uniformity depends on the path as a whole rather than on a sum of per-image scores. The present method therefore changes the segmentation repeatedly, a little at a time, to optimize plausibility as a character string.

The character segmentation method, word matching method, and data formats described above make it possible to automatically cut out the word near the operator's designated position. This works even for character strings in languages written without spaces between words, such as Japanese or Chinese. The amount of operation the operator needs to have a word recognized is greatly reduced, and the convenience of the device improves.

Even when the word dictionary is on a remote server, frequent network access is unnecessary, so processing speed improves. The transfer to the server is cheap for three reasons:

- The character strings can be transferred to the server in a single batch, which shortens transfer time.
- The strings to be transferred have already been narrowly selected by character identification confidence, the variance of character spacing, and the like, which shortens transfer time further.
- The ambiguity of character segmentation has already been resolved at this point, so word matching on the server can be simple.

FIG. 1 shows an embodiment of the present invention. This embodiment is implemented on two computers, 100 and 101. Its components are as follows:

- The image input means 102 photoelectrically converts an image of characters and captures it into the computer.
- The position designation means 103 identifies the position, input by the operator, of the word to be read. Here, the position is given as an X coordinate and a Y coordinate on the image.
- The character cutout means 104 cuts out partial images that appear to correspond to individual characters.
- The character identification means 105 identifies which character each cut-out partial image is and outputs the result together with a confidence. It refers to a means that stores the shape of each character (the character identification dictionary 109).
- The character string output means 106 outputs plausible character strings based on information such as the confidence obtained from character identification and the uniformity of spacing between partial images. When there are several plausible strings, they are output as candidate character strings.
- The word matching means 107 compares the candidate character strings with the words stored in advance in the word dictionary 110 and detects matches.
- The word selection means 108 selects a word close to the designated position based on the outputs of the word matching means 107 and the position designation means 103, and outputs it as the word recognition result.
- Finally, the recognition result display means 111 displays the word recognition result.

Computer 1 (100) is a portable information terminal, such as a camera-equipped mobile phone or a camera-equipped PDA. Computer 2 (101) can communicate with computer 1 directly or indirectly, wirelessly or by wire. It is, for example, a server connected to a cellular network.

On computer 1:

- The recognition result display means 111 is a display unit of computer 1.
- The image input means 102 is implemented by an image input device such as a camera.
- The position designation means 103, character cutout means 104, character identification means 105, and character string output means 106 are implemented by the processing unit of computer 1 executing programs stored in its storage unit.
- The character identification dictionary 109 is stored in the storage unit of computer 1.

On computer 2:

- The word matching means 107 and the word selection means 108 are implemented by the processing unit of computer 2 executing programs stored in its storage unit.
- The word dictionary 110 is stored in the storage unit of computer 2.

Computers 1 and 2 have communication functions. They use them to exchange the designated word position, the output of the character string output means, the word recognition results, and so on.

FIG. 10 shows an example of the external appearance (front and back) of computer 1 (100). Placing the camera of the image input means 102 on the side opposite the display unit 111 is convenient, for example, when the user captures an image while viewing it. Input buttons 112 are provided on the display unit 111 side for operating the displayed content and for instructing image input.

FIG. 2 schematically shows an input image and the position designation operation. Window 201 displays the input image on the display unit 111. The input image is assumed to contain the word the operator wants read. Reference numeral 202 denotes a mark for designating a position. The operator captures an image with this mark aligned on the word to be recognized. The position in the input image that corresponds to the mark is then designated as the position to be identified by the position designation means 103. In this example, to have the string 「経済」 ("economy") read, the image is captured with the mark placed near it.

As the example in FIG. 2 shows, the input image may contain many characters made of separate left and right components (hen and tsukuri radicals). The boundaries between characters are then difficult to determine uniquely, so at this stage characters are cut out under various hypotheses. FIG. 3 schematically shows an example of the output of the character cutout means 104. Here, the segmentation result takes the network form described in Japanese Patent Laid-Open No. 11-085909 (Patent Document 1). The network vertices, drawn as circles in the figure, are candidate boundaries between characters, and the number in each circle is that boundary candidate's identifier. Each polyline is a cut-out partial image. In this representation, each way of segmenting the characters corresponds to a path through the network.
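A minimal sketch of the segmentation network follows. It assumes that boundary candidates are numbered left to right and that each partial image is an edge `(i, j)` between two boundary candidates. The function `enumerate_segmentations` is hypothetical. It lists every path, that is, every segmentation hypothesis. In the example, edges `(0, 1)` and `(1, 2)` could be the left and right components of one character, and `(0, 2)` the whole character.

```python
from collections import defaultdict
from typing import Iterator


def enumerate_segmentations(edges: list[tuple[int, int]], start: int,
                            end: int) -> Iterator[list[int]]:
    """Yield every path of boundary ids from `start` to `end`.

    Each edge (i, j) is one candidate partial image spanning boundary i to
    boundary j, so each path is one way of segmenting the line.
    """
    graph = defaultdict(list)
    for i, j in edges:
        graph[i].append(j)

    def walk(node: int, path: list[int]) -> Iterator[list[int]]:
        if node == end:
            yield path
            return
        for nxt in graph[node]:
            yield from walk(nxt, path + [nxt])

    yield from walk(start, [start])


# Boundaries 0..4: (0, 1)+(1, 2) versus (0, 2) is "two halves" versus "one character".
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
paths = list(enumerate_segmentations(edges, 0, 4))
```

Exhaustive enumeration grows exponentially with line length. It is shown here only to illustrate what a path means. A practical system would search the network rather than list every path.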

The character identification means 105 uses a technique such as those described in H. Bunke and P.S.P. Wang, "Handbook of Character Recognition and Document Image Analysis" (World Scientific, 1997) (Non-patent Document 2). When characters have similar shapes, character identification alone may be unable to determine the character type. In such cases, the character identification means 105 outputs multiple candidate characters, each paired with its confidence.

FIG. 4 schematically shows an example of the output of the character string output means 106. The segmentation cannot be fixed at this stage, so character strings are output under various segmentation hypotheses. The figure shows six character strings, which illustrates the case in which six candidate strings are output. The candidate strings are ordered by descending value of the string confidence defined below.

String confidence = a × (mean of the top-ranked confidences in the character identification results) − b × (variance of the intervals between character center coordinates)
(a and b are positive constants)
This formula ranks first the segmentations whose pieces are as plausible as possible as characters and whose character pitch, viewed as a string, is as regular as possible.
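The string confidence formula can be written as a small function. This sketch makes three assumptions:

- The character centers are one-dimensional coordinates along the text line.
- "Variance" means the population variance of the gaps between consecutive centers.
- The defaults `a=1.0` and `b=0.01` are arbitrary placeholders. The source only says that a and b are positive constants.

```python
from statistics import mean, pvariance


def string_confidence(top1_confidences: list[float], centers: list[float],
                      a: float = 1.0, b: float = 0.01) -> float:
    """a * (mean top-1 confidence) - b * (variance of gaps between centers).

    a and b are positive constants; the defaults here are placeholders.
    """
    gaps = [c2 - c1 for c1, c2 in zip(centers, centers[1:])]
    spacing_variance = pvariance(gaps) if gaps else 0.0  # population variance (assumption)
    return a * mean(top1_confidences) - b * spacing_variance


even = string_confidence([0.9, 0.8, 0.7], [10, 30, 50])    # gaps 20, 20
uneven = string_confidence([0.9, 0.8, 0.7], [10, 25, 50])  # gaps 15, 25
```

With identical character confidences, the evenly spaced segmentation scores higher. The pitch term therefore breaks ties that confidence alone cannot.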

FIG. 5 shows an example of the processing procedure in the character string output means 106. First, in step 501, the path that maximizes the sum of character identification confidences is searched for on the network. An ordinary path search algorithm such as Dijkstra's algorithm can do this. Next, in step 502, the string confidence of the path obtained in step 501 is computed and assigned to the variables a and b. These variables are distinct from the constants a and b in the formula above.
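Step 501 can be sketched as follows. Boundary candidates are ordered left to right, so the network is a directed acyclic graph. This sketch therefore swaps Dijkstra's algorithm for a simple dynamic program over edges sorted by their left boundary, which finds the maximum-sum path directly. The function name `best_path` and the `edge_conf` dictionary layout are assumptions for illustration.

```python
from typing import Optional


def best_path(num_boundaries: int,
              edge_conf: dict[tuple[int, int], float]) -> tuple[list[int], float]:
    """Maximum-sum path from boundary 0 to boundary num_boundaries - 1.

    edge_conf[(i, j)] is the top-1 identification confidence of the partial
    image between boundary candidates i < j.
    """
    score = [float("-inf")] * num_boundaries
    prev: list[Optional[int]] = [None] * num_boundaries
    score[0] = 0.0
    # Sorting by (i, j) processes every edge into i before any edge out of i,
    # so score[i] is final when its outgoing edges are relaxed.
    for (i, j), conf in sorted(edge_conf.items()):
        if score[i] + conf > score[j]:  # -inf sources never win
            score[j] = score[i] + conf
            prev[j] = i
    last = num_boundaries - 1
    if score[last] == float("-inf"):
        raise ValueError("no path reaches the last boundary")
    path = [last]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1], score[last]


conf = {(0, 1): 0.3, (1, 2): 0.4, (0, 2): 0.9, (2, 3): 0.8}
path, total = best_path(4, conf)
```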

Next, loop 503 repeats the following processing. First, loop 504 computes, for every boundary candidate, the string confidence obtained when that boundary candidate is reversed, and assigns it to variable c. If the value of c is greater than that of b, the value of c is assigned to b.

Within loop 503, reversing a boundary candidate means the following. If boundary candidate i is on the path, the partial image whose ends are the boundary candidates immediately before and after i is selected, and the path is modified so that it no longer includes i. If boundary candidate i is not on the path, the path is modified to include i. In FIG. 6(B), boundary candidate 3 of FIG. 6(A) is reversed; in FIG. 6(C), boundary candidate 5 is reversed.

Next, step 505 determines whether the value of variable a is less than that of b. If so, the value of b is assigned to a, the path is updated to the best reversal found in loop 504, and loop 503 continues. Otherwise, loop 503 ends. The sequence of partial images for the character string is fixed according to the path at that point and output as a character string. This processing outputs only the single best segmentation. The top n segmentations can also be output in the same way: keep the top n paths at all times and repeatedly modify them, a little at a time.
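The loop of FIG. 5 can be sketched as best-improvement hill climbing over single boundary reversals. This is a minimal sketch under the following assumptions:

- `xs` gives the x-coordinate of each boundary candidate.
- `edge_conf` gives the top-1 confidence of each partial image.
- Each character's center is the midpoint of its piece.
- The variance is the population variance.

`path_score`, `reverse`, and `refine` are hypothetical names. A reversal that would need a partial image missing from the network is skipped. The top-n extension is not shown.

```python
from statistics import mean, pvariance
from typing import Optional


def path_score(path, xs, edge_conf, a=1.0, b=0.01) -> Optional[float]:
    """String confidence of a path (sorted boundary ids), or None if a piece is missing."""
    pieces = list(zip(path, path[1:]))
    if any(p not in edge_conf for p in pieces):
        return None
    confs = [edge_conf[p] for p in pieces]
    centers = [(xs[i] + xs[j]) / 2 for i, j in pieces]  # center of each partial image
    gaps = [c2 - c1 for c1, c2 in zip(centers, centers[1:])]
    return a * mean(confs) - b * (pvariance(gaps) if gaps else 0.0)


def reverse(path, i):
    """Toggle boundary candidate i: drop it if on the path, otherwise insert it."""
    return [k for k in path if k != i] if i in path else sorted(path + [i])


def refine(path, xs, edge_conf, a=1.0, b=0.01):
    """Best-improvement hill climbing over single boundary reversals (cf. FIG. 5)."""
    current = path_score(path, xs, edge_conf, a, b)      # FIG. 5 variable "a"
    if current is None:
        raise ValueError("initial path uses a piece that is not in the network")
    while True:                                          # loop 503
        best, best_candidate = current, None             # FIG. 5 variable "b"
        for i in range(1, len(xs) - 1):                  # loop 504: interior boundaries
            candidate = reverse(path, i)
            c = path_score(candidate, xs, edge_conf, a, b)  # FIG. 5 variable "c"
            if c is not None and c > best:
                best, best_candidate = c, candidate
        if best_candidate is None:                       # step 505: no improvement
            return path, current
        path, current = best_candidate, best


xs = [0, 10, 20, 30, 40]
edge_conf = {(0, 1): 0.5, (1, 2): 0.5, (0, 2): 0.9, (2, 3): 0.5, (3, 4): 0.5, (2, 4): 0.9}
final_path, final_score = refine([0, 1, 2, 3, 4], xs, edge_conf)
```

In this example, a sum-maximizing search prefers the four half-width pieces (sum 2.0 versus 1.8). The string confidence averages the confidences and penalizes irregular pitch, so the climb moves to the two whole characters, [0, 2, 4].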

The character string output means outputs the concatenation of the top-ranked candidate characters from the character identification results for the partial images in the obtained sequence. In another embodiment, it may instead output a structure that stores multiple candidate characters for each partial image (a lattice), described later.
The word matching means 107 uses an ordinary string comparison method. When a lattice is used as input, it uses a method such as that of Katsumi Marukawa et al., "Error Correction Algorithm for Handwritten Kanji Address Recognition" (IPSJ Transactions, Vol. 35, No. 6, June 1994, pp. 1101-1110) (Non-patent Document 3).
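A minimal sketch of both matching modes follows, assuming the dictionary is a plain list of strings. Both function names are hypothetical.

- `match_words` is ordinary substring search over top-1 candidate strings.
- `match_lattice` is a deliberately simple stand-in. It accepts a word if each of its characters appears among the candidates at consecutive lattice positions. It is not the error-correction algorithm of Non-patent Document 3.

```python
def match_words(candidate_strings: list[str],
                dictionary: list[str]) -> list[tuple[str, int, int]]:
    """Return (word, string_index, start) for every dictionary word found in a candidate string."""
    hits = []
    for s_idx, s in enumerate(candidate_strings):
        for word in dictionary:
            start = s.find(word)
            while start != -1:  # report every occurrence, not just the first
                hits.append((word, s_idx, start))
                start = s.find(word, start + 1)
    return hits


def match_lattice(lattice: list[set[str]], word: str) -> list[int]:
    """Start positions where each character of `word` is among the candidates
    at consecutive lattice positions (simple stand-in, not Non-patent Document 3)."""
    n = len(word)
    return [p for p in range(len(lattice) - n + 1)
            if all(word[k] in lattice[p + k] for k in range(n))]
```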

FIG. 7 schematically shows the output of the word selection means 108 as displayed by the recognition result display means 111. Reference numeral 201 is the screen used for position designation. Window 701 displays the word recognition results, with more plausible words shown nearer the top. A word's plausibility is measured by the distance between the centroid of the recognized word's bounding rectangle in the image and the reading position designated by the operator; the shorter the distance, the more plausible the word. Alternatively, the display may show the word candidates whose bounding rectangles contain the designated reading position. The recognition result display means 111 also shows a cursor 702 so that the operator can choose the desired candidate. The operator moves the cursor up and down with buttons or the like and selects the desired word from the listed candidates. The X coordinate of each character in window 701 is aligned with the X coordinate of the corresponding character in the input image shown in window 201.
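The ranking and the optional containment filter can be sketched as follows. The sketch assumes that each candidate word carries its bounding rectangle as `(left, top, right, bottom)` image coordinates. `rank_candidates` is a hypothetical name. Squared distance is used because it orders candidates the same way as Euclidean distance.

```python
Box = tuple[float, float, float, float]  # (left, top, right, bottom)


def rank_candidates(candidates: list[tuple[str, Box]], point: tuple[float, float],
                    require_contains: bool = False) -> list[str]:
    """Order candidate words by distance from their bounding-box centroid to `point`.

    If require_contains is True, keep only words whose box contains the point.
    """
    px, py = point

    def contains(box: Box) -> bool:
        left, top, right, bottom = box
        return left <= px <= right and top <= py <= bottom

    def dist2(box: Box) -> float:
        left, top, right, bottom = box
        cx, cy = (left + right) / 2, (top + bottom) / 2
        return (cx - px) ** 2 + (cy - py) ** 2  # squared distance: same ordering

    pool = [c for c in candidates if contains(c[1])] if require_contains else candidates
    return [word for word, box in sorted(pool, key=lambda c: dist2(c[1]))]


cands = [("臨時", (0, 0, 40, 20)), ("営繕", (40, 0, 80, 20)),
         ("営繕費用", (40, 0, 120, 20)), ("費用", (80, 0, 120, 20))]
ranked = rank_candidates(cands, (55, 10))
```

The nested candidates 「営繕」 and 「営繕費用」 both contain the designated point. The candidate list and cursor 702 exist to let the operator resolve exactly this situation.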

FIG. 8 shows the data format used when the character string output means 106 outputs a lattice. Each row of the table (each record) corresponds to one character in the string. The fields are:

- BL and BR: the identifiers of the left and right boundaries in the network.
- L, T, R, and B: the coordinates of the left, top, right, and bottom edges of the cut-out partial image.
- N: the number of candidate characters output.
- C[1] to C[N]: the character codes of the candidate characters obtained from character identification.
- Lk[1] to Lk[N]: the confidence of each candidate.

Because the character identification results are stored together with the coordinates of the partial images, the word selection means 108 can select candidate words according to the designated position.
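Below is a sketch of one FIG. 8 record as a Python dataclass that keeps the field names from the figure. It assumes candidates are stored best first. It also derives `N` from the length of `C` rather than storing it separately, which keeps the two from disagreeing. `top1_string` builds the concatenation of top-ranked candidates that the character string output means produces when a lattice is not used. Both names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LatticeRecord:
    """One row of the FIG. 8 table: a cut-out partial image and its candidates."""
    BL: int    # identifier of the left boundary in the network
    BR: int    # identifier of the right boundary in the network
    L: float   # left edge of the partial image
    T: float   # top edge
    R: float   # right edge
    B: float   # bottom edge
    C: list[str] = field(default_factory=list)     # candidate characters, best first
    Lk: list[float] = field(default_factory=list)  # confidence of each candidate

    @property
    def N(self) -> int:
        """Number of candidate characters (derived instead of stored)."""
        return len(self.C)


def top1_string(records: list[LatticeRecord]) -> str:
    """Concatenate the top-ranked candidate of each record."""
    return "".join(r.C[0] for r in records)


recs = [LatticeRecord(0, 2, 0, 0, 20, 20, ["営", "當"], [0.9, 0.4]),
        LatticeRecord(2, 4, 20, 0, 40, 20, ["繕"], [0.8])]
```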

FIG. 9 shows the data format of the output of the word matching means 107. The fields are:

- LEN: the number of characters in the word.
- L, T, R, and B: the coordinates of the left, top, right, and bottom edges of the word.
- C[i]: the character code of the i-th character of the word.
- P[i]: a pointer to the record in the table of FIG. 8 that corresponds to the i-th character.

This storage format makes the display shown in FIG. 7 possible.
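Below is a sketch of a FIG. 9 record. It makes two assumptions: the pointer P[i] is represented as an index into the list of FIG. 8 records, and the word's rectangle is the union of its characters' rectangles. The source does not say how L, T, R, and B are computed. `WordMatch` and `make_word_match` are hypothetical names.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (L, T, R, B) of one lattice record


@dataclass
class WordMatch:
    """One FIG. 9 record describing a matched word."""
    LEN: int        # number of characters in the word
    L: float        # left edge of the word
    T: float        # top edge
    R: float        # right edge
    B: float        # bottom edge
    C: list[str]    # C[i]: i-th character of the word
    P: list[int]    # P[i]: index of the FIG. 8 record for the i-th character


def make_word_match(word: str, start: int, char_boxes: list[Box]) -> WordMatch:
    """Build a record for `word` matched at lattice position `start`.

    Assumes the word rectangle is the union of its characters' rectangles.
    """
    idx = list(range(start, start + len(word)))
    boxes = [char_boxes[k] for k in idx]
    return WordMatch(
        LEN=len(word),
        L=min(bx[0] for bx in boxes),
        T=min(bx[1] for bx in boxes),
        R=max(bx[2] for bx in boxes),
        B=max(bx[3] for bx in boxes),
        C=list(word),
        P=idx,
    )


char_boxes = [(0, 0, 20, 20), (20, 0, 40, 20), (40, 2, 60, 22), (60, 0, 80, 20)]
w = make_word_match("営繕", 2, char_boxes)
```

Following `P` back to the lattice records recovers each character's X coordinate. That is what lets window 701 align the characters with the input image.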

FIG. 1: Configuration of an embodiment of the present invention.
FIG. 2: Position designation screen.
FIG. 3: Network of character segmentation results.
FIG. 4: Character string output results.
FIG. 5: Processing procedure for character string output.
FIG. 6: Example of boundary reversal.
FIG. 7: Display example of word selection results.
FIG. 8: Data format of the character string output.
FIG. 9: Data format of the word matching results.
FIG. 10: Configuration example of computer 1.

Explanation of symbols

100: first computer
101: second computer
102: image input means
103: position designation
104: character cutout
105: character identification
106: character string output
107: word matching
108: word selection
109: character identification dictionary
110: word dictionary
111: recognition result display
201: input image display window
202: cursor for position designation
701: word recognition result display window
702: word selection cursor

Claims

A word recognition device comprising: image input means for photoelectrically converting an image into a digital image; character cutout means for cutting out sequences of candidate character partial images based on candidate boundaries between characters in the digital image; character identification means for identifying each partial image obtained by the character cutout means as a character and outputting pairs of the identification result and its character identification confidence; character string generation means that has a path search algorithm, uses it to select the path with the largest sum of the character identification confidences, and then repeatedly searches for a set of boundaries giving an optimal partial image sequence by correcting the path so that a single boundary candidate i is reversed, that is, made or not made a boundary, so as to optimize a character string confidence that includes the variance of the center intervals of the characters in the resulting string; word storage means for storing words; character string word matching means for detecting, in the character strings generated by the character string generation means, a partial character string that matches a word stored in the word storage means; position specifying means for specifying the position of the word to be read in the image by the position of a mark input in the image; and word selection means for selecting a character string word matching result close to the position specified by the position specifying means.

The word recognition device according to claim 1, wherein the character string generation means repeatedly searches for the set of boundaries giving an optimal partial image sequence by correcting the path so that each of the boundary candidates obtained from the digital image is reversed, that is, made or not made a boundary.