JPH08329191A - Character string segmenting method

Info

Publication number: JPH08329191A
Application number: JP7134408A
Authority: JP
Inventors: Takuma Akagi; 琢磨赤木; Hidekata Mototani; 秀堅本谷; Shigeyoshi Shimotsuji; 成佳下辻
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1995-05-31
Filing date: 1995-05-31
Publication date: 1996-12-13

Abstract

PURPOSE: To provide a character string segmenting method capable of accurately and efficiently extracting character strings even when multiple character strings are densely clustered. CONSTITUTION: In the character string segmenting method, character candidates are extracted from an input image (24); extracted character candidates, for example two at a time, are combined to create partial character strings that can serve as constituent elements of a character string (26); character string candidates are created only by combining the created partial character strings (28); and a character string is determined from the created character string candidates (30). Preferably, a character string candidate is created only when the partial character strings to be combined, or a partial character string and an already created character string candidate, share exactly one character candidate.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method for segmenting character string candidates from an image.

[0002]

[Prior Art] In apparatuses that recognize images written on paper, such as drawing recognition devices or text readers, the process of segmenting character strings is important. Conventional character string segmentation methods usually follow the procedure below.

[0003]
(1) All character candidates are extracted.
(2) Whether extracted character candidates can be merged with each other is checked by character recognition or the like, and they are merged if possible.
(3) Whether another character candidate can be merged into a character string candidate consisting of n character candidates (n being 2 or more) is checked; if so, it is merged to generate a character string candidate consisting of n + 1 character candidates.
(4) When no mergeable character candidates remain, the process ends.

[0004] Repeating the two merging steps above for each character candidate and each character string candidate generates a variety of character string candidates. In this procedure, when a given character candidate can be merged into more than one character string candidate, multiple character string candidates that share the same character candidate may be generated.

[0005] In this case, it becomes necessary to decide which character string candidate the character candidate should be merged into, that is, which character string candidate to adopt. To decide, for example, each of the character string candidates sharing the same character candidate is evaluated for its plausibility as a character string against predefined evaluation criteria, and the most plausible candidate is adopted according to the result.

[0006] There may also be multiple interpretations of how the characters making up a character string are combined. In such cases, one approach is to evaluate the character string candidates for all interpretations, for example by the technique described in Japanese Patent Laid-Open No. 5-28305, and to select the candidate corresponding to the most reasonable interpretation according to the evaluation values.

[0007] However, conventional methods segment characters on the assumption that character strings run only vertically or horizontally, so they cannot segment character strings where strings running in arbitrary directions are densely clustered. Moreover, if a conventional method is extended to handle character strings in all directions, the number of character string candidates to be sifted grows explosively, which is not practical.

[0008]

[Problems to Be Solved by the Invention] With conventional character string segmentation methods, it has been difficult to segment character strings correctly from images in which character strings with various directions are densely clustered. In addition, because every combination of character candidates is conventionally examined as a character string candidate, processing has been time-consuming.

[0009] The present invention has been made in view of the above circumstances, and its object is to provide a character string segmentation method that can extract character strings accurately and efficiently even when character strings are densely clustered.

[0010]

[Means for Solving the Problems] The character string segmentation method according to the present invention is characterized by extracting character candidates from an input image, combining the extracted character candidates to create partial character strings that can serve as constituent elements of a character string, creating character string candidates only by combining the created partial character strings, and determining a character string from the created character string candidates.

[0011] Preferably, the character string is determined on the basis of reliabilities obtained when the partial character strings and the character string candidates were created. Also preferably, the reliability of a partial character string is obtained from the similarity to a character that results from applying recognition processing to the character candidates making up that partial character string, and the reliability of a character string candidate is obtained from those similarities of the partial character strings making up that character string candidate.

[0012] Also preferably, a partial character string is created by combining two of the extracted character candidates. Also preferably, when character candidates are combined to create a partial character string, the partial character string consisting of those character candidates is adopted if the distance between the character candidates is shorter than a predetermined threshold, the character candidates are approximately the same size, and the similarity to a character obtained by recognition processing of the partial character string created from them satisfies a predetermined threshold.

[0013] Also preferably, character string candidates are created by combining partial character strings with each other, or a partial character string with an already created character string candidate, to build progressively longer character string candidates, and each created candidate is checked for the features of a character string to decide whether to adopt it as a character string candidate.

[0014] Also preferably, a character string candidate is created only when the partial character strings to be combined, or the partial character string and the already created character string candidate to be combined, share a character candidate.

[0015] Also preferably, a character string candidate is created when the overlap consists of exactly one character candidate. Also preferably, when checking whether a character string candidate has the features of a character string, it is checked whether the character candidates making up the candidate have uniform heights and whether they are aligned on a straight line, and the character string candidate is created only when the results satisfy predetermined conditions.

[0016] Also preferably, in determining the character string from the character string candidates, the candidates are sorted in order of reliability separately for each character count, and the character string is determined on the basis of the result.
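The per-length sorting described above could be sketched as follows. This is only an illustration under stated assumptions: the function name `sort_by_length_then_reliability` and the `(members, reliability)` tuple layout are hypothetical and do not appear in the publication, and how the sorted lists are then used to fix the final strings is left open here.

```python
from collections import defaultdict

def sort_by_length_then_reliability(candidates):
    """Group string candidates by character count and sort each group by reliability.

    candidates: iterable of (members, reliability), where `members` is a tuple of
    character candidate numbers (hypothetical layout, chosen for illustration).
    Returns {character_count: [(members, reliability), ...]}, best first.
    """
    groups = defaultdict(list)
    for members, reliability in candidates:
        groups[len(members)].append((members, reliability))
    # Highest reliability first within each character count.
    return {
        count: sorted(rows, key=lambda row: row[1], reverse=True)
        for count, rows in sorted(groups.items())
    }
```

Keeping one ranked list per character count lets a later decision stage compare, for example, the best four-character candidate against the best shorter ones without re-scanning the whole table.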

[0017]

[Operation] In the present invention, character candidates are extracted from the input image on the basis of, for example, size and shape. The extracted character candidates are then combined as appropriate on the basis of, for example, distance, size, and recognition results, to create partial character strings of, for example, two characters. Preferably, the reliability of each partial character string is obtained at this point.

[0018] Character string candidates are then created from the created partial character strings. Preferably, those containing the same character candidate are merged with each other. Also preferably, the reliability of each character string candidate is obtained.

[0019] Finally, a character string is determined from the character string candidates (which may include partial character strings). The determination is based on, for example, the reliability, optionally together with the position, direction, length, and so on of the character string.

[0020] According to the present invention, character candidates extracted from the input image that are judged to be mergeable are first combined into partial character strings; these partial character strings are then combined, subject to validity checks, to build character string candidates; and finally the character strings are fixed from among the candidates according to predetermined evaluation criteria. As a result, character strings can be segmented correctly even when many character candidates are clustered together and the strings cannot be determined from outline features alone, and they can be segmented far more efficiently than with the conventional method of examining every combination of character candidates as a character string candidate.

[0021] Furthermore, because the character strings are determined after all highly likely character string candidates have been enumerated, it is easy to find the optimal combination of character strings. Also, since all highly likely candidates are available at determination time, candidate combinations of multiple character strings are easy to form, and such combinations can readily be supplied to more advanced determination processing, such as semantic processing, for cases that cannot be resolved by shape and recognition alone.

[0022] In addition, the recognition-based similarity that expresses the reliability of a character string is computed only when partial character strings are created; the reliability of a character string candidate is derived from the reliabilities of the partial character strings making it up, which shortens processing time. Thus, according to the present invention, character strings can be extracted accurately and efficiently even when they are densely clustered.

[0023]

[Embodiment] An embodiment is described below with reference to the drawings. FIG. 1 shows the configuration of a character segmentation apparatus according to one embodiment of the present invention. As shown in FIG. 1, the apparatus of this embodiment is suited for use in apparatuses that recognize images and the like written on paper, such as drawing recognition devices or text readers, and comprises an image input unit 22, a character candidate extraction unit 24, a partial character string creation unit 26, a character string candidate creation unit 28, and a character string determination unit 30.

[0024] A character candidate is a pattern extracted from the input image as a character, that is, a candidate for a character making up a character string. A character string candidate is a candidate for a character string consisting of two or more character candidates. For convenience, a character string candidate made up of two character candidates is called a partial character string, and one made up of three character candidates is called a quasi-character string candidate.

[0025] Reference numeral 32 in the figure denotes a character string candidate table. It is described in detail later; in outline, it is a table that records, for each character string candidate, the character candidates making it up and the reliability of that character string candidate.

[0026] The image input unit 22 is for inputting an image. When an image written on paper or the like is to be read optically, the unit is built around a photoelectric conversion device such as an image scanner; when the image has already been digitized and stored in an electronic file or the like, it is built around a file reader or the like.

[0027] The character candidate extraction unit 24 extracts characters from the input image and passes them to the partial character string creation unit 26 as character candidates. The partial character string creation unit 26 creates partial character strings from the character candidates.

[0028] The character string candidate creation unit 28 merges pairs of partial character strings to create quasi-character string candidates and then, as far as possible, merges further partial character strings into character string candidates consisting of three or more character candidates, creating progressively longer character string candidates.

[0029] The character string determination unit 30 determines, from all the created character string candidates, which ones to adopt as character strings. Next, the character string segmentation processing of this embodiment (the character string candidate table creation processing) is described.

[0030] In outline, after an image is input, the character string segmentation processing of this embodiment performs, in order, the processing by the character candidate extraction unit 24, the partial character string creation unit 26, the character string candidate creation unit 28, and the character string determination unit 30.

[0031] FIG. 2 is a flowchart showing the flow of the main part of the character string segmentation processing of this embodiment. The processing is described in more detail below with reference to FIG. 2.

[0032] (Processing by the character candidate extraction unit 24) First, the character candidate extraction unit 24 extracts the characters contained in the image input from the image input unit 22 as character candidates. Information such as the center position and size of each character candidate is attached to the extracted candidate.

[0033] To extract characters from the image, a method such as the one disclosed in Japanese Patent Publication No. 63-63951, which distinguishes character regions from graphic regions, can be used.

[0034] Once the character candidates have been extracted, their positions are searched across the whole image, and character candidates that lie close to each other in the image are collected into groups. The subsequent processing is performed within each group created in this way.
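The publication does not say how "close" candidates are collected into groups. As one possible reading, the sketch below treats two candidates as linked when their centers are within a distance `max_gap` and takes connected components with a small union-find; both the linking rule and the names `group_candidates` and `max_gap` are assumptions made for illustration.

```python
import math

def group_candidates(centers, max_gap):
    """Group character candidates whose centers are chained by gaps <= max_gap.

    centers: list of (x, y) center positions, indexed by candidate.
    Returns a sorted list of groups, each a sorted list of candidate indices.
    """
    parent = list(range(len(centers)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) <= max_gap:
                parent[find(i)] = find(j)  # union the two components

    groups = {}
    for i in range(len(centers)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Grouping first keeps the later pairwise steps local: only pairs inside the same group ever need to be compared.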

[0035] The character candidates in a group are given serial numbers starting from 0. There is no particular rule for assigning the numbers. FIG. 3(a) shows an example of an input image; all the patterns in the figure are assumed to belong to the same group. FIG. 3(b) shows an example in which serial numbers have been assigned to the character candidates obtained from (a), giving them character candidate numbers 0 through 16.

[0036] As mentioned above, in this embodiment a character string candidate table 32 is provided for each group to represent the character string candidates obtained by merging character candidates within that group.

[0037] The character string candidate table 32 has, for example, a matrix structure in which character string candidates are arranged vertically (one per row) and character candidate numbers horizontally (one per column). In the table, the character candidates making up a character string candidate are indicated by setting the bits at the positions of their character candidate numbers to 1 and all other bits to 0. Each character string candidate also has a field that stores its "reliability", which expresses how string-like the candidate is from the standpoint of character recognition. FIG. 4 shows an example of the character string candidate table 32.
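One row of such a table can be pictured as a bit vector plus a reliability field. The following is a minimal sketch, assuming each row's bits are packed into a Python integer; the names `TableRow`, `make_row`, `members_of`, and `as_bits` are hypothetical and not taken from the publication.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TableRow:
    mask: int                            # bit i is 1 if character candidate i is a member
    reliability: Optional[float] = None  # None when no reliability has been computed

def make_row(members, reliability=None):
    """Build a row whose bits are set at the given character candidate numbers."""
    mask = 0
    for i in members:
        mask |= 1 << i
    return TableRow(mask, reliability)

def members_of(row, n_candidates):
    """List the character candidate numbers whose bit is 1."""
    return [i for i in range(n_candidates) if (row.mask >> i) & 1]

def as_bits(row, n_candidates):
    """Render the row as a string of 0/1, column 0 first, like one table line."""
    return "".join("1" if (row.mask >> i) & 1 else "0" for i in range(n_candidates))
```

With 17 character candidates, `as_bits(make_row([1]), 17)` gives a line with a single 1 in column 1, which is the shape described for a one-character entry.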

[0038] At this stage, character string candidates each consisting of a single character candidate of the group are registered in the character string candidate table 32 (see the section for character count 1 in FIG. 4). A group of n character candidates yields n such character string candidates. No reliability as a character string candidate is computed for single character candidates.

[0039] For the group in the example of FIG. 3, there are character candidates numbered 0 through 16, so the character string candidate table 32 is 17 columns wide, as shown in FIG. 4. The row for the character string candidate consisting of the character candidate "A" in FIG. 3 has a 1 only in column 1 and 0 in all other columns, as shown in FIG. 4.

[0040] <Processing by the partial character string creation unit 26> Next comes the processing by the partial character string creation unit 26. Two character candidates are selected from the same group and the distance between them is computed. The distance is checked against a threshold to see whether the two candidates are not too far apart. It is also checked whether the two character candidates are of matching size.

[0041] If the two character candidates are close together and of matching size (step S603), they are subjected to recognition processing (step S604). Known techniques such as pattern matching or the multiple similarity method can be used for recognition.

[0042] In this embodiment, recognition takes the direction of the character string formed by the two character candidates (for example, the direction connecting their centers) as the horizontal direction of the characters, and is performed for both the upward and downward orientations. As the result, the similarity of each character candidate to a character is output on a scale of, for example, up to 1000 points.

[0043] In this embodiment, the partial character string formed by the two character candidates is registered in the character string candidate table 32 only when, for at least one of the two orientations, the similarities of both character candidates exceed a threshold (step S605).

[0044] The reliability of the partial character string is then computed from the similarities of the character candidates it contains, for example as the average of the two characters' similarities in the orientation that exceeded the threshold.

[0045] However, in the part of the character string candidate table 32 for partial character strings (character count 2), the reliability is not registered; instead, the similarities of the two characters making up the partial character string are registered as they are. This is so that the reliability of longer character string candidates of three or more characters can be computed.

[0046] If the similarities exceed the threshold in both orientations, the higher of the two averages is taken as the reliability. This processing is performed for every pair of character candidates in the group (steps S601, S602).
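The pairwise procedure of steps S601 to S605 can be sketched as below. This is an illustration under several assumptions: the recognizer is passed in as a hypothetical callback `recognize(i, j, orientation)` returning the two similarities on the 0 to 1000 scale, "matching size" is modeled as a size ratio limit, and the threshold values are placeholders. The sketch keeps both the two raw similarities and their average so that either can be used later.

```python
import math
from itertools import combinations

def build_partial_strings(cands, recognize, max_dist=40.0,
                          max_size_ratio=1.5, sim_threshold=700):
    """Create two-character partial strings from the candidates of one group.

    cands: list of dicts with 'center' (x, y) and 'size' (hypothetical layout).
    recognize(i, j, orientation) -> (sim_i, sim_j): hypothetical recognizer callback.
    Returns {(i, j): {'similarities': (sim_i, sim_j), 'reliability': mean}}.
    """
    partials = {}
    for i, j in combinations(range(len(cands)), 2):   # every pair in the group
        a, b = cands[i], cands[j]
        # Distance check: the pair must be closer than the threshold.
        if math.dist(a["center"], b["center"]) >= max_dist:
            continue
        # Size check: the larger candidate may not exceed the smaller by too much.
        big, small = max(a["size"], b["size"]), min(a["size"], b["size"])
        if big > max_size_ratio * small:
            continue
        best = None
        for orientation in ("up", "down"):            # recognize in both orientations
            sims = recognize(i, j, orientation)
            if min(sims) > sim_threshold:             # both characters must pass
                mean = sum(sims) / 2
                if best is None or mean > best[0]:    # keep the higher average
                    best = (mean, sims)
        if best is not None:
            partials[(i, j)] = {"similarities": best[1], "reliability": best[0]}
    return partials
```

Because the distance and size checks run before recognition, the comparatively expensive recognizer is only called for pairs that are already geometrically plausible.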

[0047] In this way, the character candidate and partial character string portions of the character string candidate table 32 are completed.

<Processing by the character string candidate creation unit 28> Next comes the processing by the character string candidate creation unit 28, which combines partial character strings to create character string candidates of three or more characters. First, two partial character strings that share a character candidate are combined to create a three-character quasi-character string candidate (steps S606, S609 to S612).

[0048] First, two partial character strings are selected, and it is checked whether they overlap by exactly one character candidate. If they do, they are merged into a quasi-character string candidate (step S610), which therefore consists of three character candidates.
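If each candidate is held as an integer bitmask over character candidate numbers, the one-candidate overlap test and the merge reduce to a bitwise AND and OR. The sketch below assumes that representation; the function names are hypothetical.

```python
def overlap_count(mask_a, mask_b):
    """Number of character candidates shared by two string candidates."""
    return bin(mask_a & mask_b).count("1")

def merge_if_one_overlap(mask_a, mask_b):
    """Return the merged bitmask if exactly one character candidate is shared, else None."""
    if overlap_count(mask_a, mask_b) != 1:
        return None
    return mask_a | mask_b
```

For two partial character strings over candidates {3, 5} and {5, 7}, the shared candidate is 5 and the merge covers {3, 5, 7}, a three-character quasi-character string candidate.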

[0049] Next, it is checked whether this quasi-character string candidate has the features of a character string, that is, whether it satisfies the required geometric properties (step S611). One possible condition is that, when the candidate string is enclosed in its convex hull, the longest side and the second-longest side are approximately equal in length, parallel, and separated by a certain distance (roughly the character height), and that the centers of all character candidates in the quasi-character string candidate lie on a straight line within a predetermined tolerance.
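Part of this check can be sketched directly. The code below covers only the alignment of centers and a uniform-height test; the convex hull condition is not implemented. Taking the reference line through the two most distant centers is an assumption chosen so that the test works for strings in any direction, and the names and tolerance parameters are hypothetical.

```python
import math
from itertools import combinations

def centers_nearly_collinear(centers, tolerance):
    """True if every center lies within `tolerance` of the line through
    the two centers that are farthest apart."""
    if len(centers) < 3:
        return True  # one or two points are always collinear
    p, q = max(combinations(centers, 2), key=lambda pq: math.dist(*pq))
    length = math.dist(p, q)
    if length == 0:
        return True  # all centers coincide
    for x, y in centers:
        # Perpendicular distance to line pq via the 2D cross product.
        cross = (q[0] - p[0]) * (y - p[1]) - (q[1] - p[1]) * (x - p[0])
        if abs(cross) / length > tolerance:
            return False
    return True

def heights_uniform(heights, max_ratio):
    """True if the tallest candidate is at most `max_ratio` times the shortest."""
    return max(heights) <= max_ratio * min(heights)
```

Because the reference line is derived from the centers themselves, the same test applies unchanged to horizontal, vertical, and oblique strings.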

[0050] If the quasi-character string candidate has the features of a character string, it is registered in the character string candidate table 32 as a (three-character) character string candidate (step S612). Its reliability is computed from the similarities of the character string candidates and partial character strings that make it up, for example as the average or weighted average of the similarities of its constituent character candidates.

[0051] These similarities are the ones obtained when the partial character strings were created; no new recognition processing is performed. The above three-character candidate creation processing (quasi-character string candidate creation processing) is carried out for every combination of partial character strings.

[0052] At this point, the character candidate (one character), partial character string (two characters), and quasi-character string candidate (three characters) portions of the character string candidate table 32 are complete. Next, character string candidates of four or more characters are created (steps S607 to S612).

[0053] Creating character string candidates of four or more characters is basically the same as creating three-character candidates. However, the items combined are not two partial character strings but a partial character string and a character string candidate one character shorter than the candidate being created. That is, a four-character candidate is created by combining a partial character string with a three-character candidate.

[0054] Here too, a quasi-character string candidate is created only when the candidates being combined overlap by exactly one character. As with three-character candidates, the string-likeness of each such quasi-character string candidate is checked, and if the conditions are satisfied it is registered in the character string candidate table 32 as a character string candidate. Its reliability is likewise obtained as, for example, the average of the similarities of the character candidates making up the character string candidate.

[0055] For example, merging the partial character string "53" shown in FIG. 5(a) with the quasi character string candidate "347" yields the four-character candidate "5347" shown in FIG. 5(b). Then, as shown in FIG. 6, the corresponding bits are set and the reliability is calculated.
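
The merge of "53" and "347" can be sketched with integer bitmasks, on the assumption that "the corresponding bit is set" means one bit per character candidate. The function names, the numbering of the four characters, and the similarity values are all hypothetical; the averaging follows the "average of the similarities" mentioned in the text, which the source gives only as one example.

```python
def to_mask(ids):
    """Pack character-candidate numbers into an integer bitmask."""
    mask = 0
    for i in ids:
        mask |= 1 << i
    return mask


def merge_if_one_overlap(a, b):
    """Return the union mask when a and b share exactly one character, else None."""
    if bin(a & b).count("1") != 1:
        return None
    return a | b


def mean_similarity(mask, similarity):
    """Average the per-character similarities of the characters set in mask."""
    ids = [i for i in range(mask.bit_length()) if (mask >> i) & 1]
    return sum(similarity[i] for i in ids) / len(ids)


# Hypothetical numbering: "5" -> 0, "3" -> 1, "4" -> 2, "7" -> 3.
partial_53 = to_mask([0, 1])
quasi_347 = to_mask([1, 2, 3])
merged_5347 = merge_if_one_overlap(partial_53, quasi_347)

# Made-up similarities; in the patent they come from the recognition
# step that ran when the partial character strings were created.
similarity = {0: 0.8, 1: 0.9, 2: 0.7, 3: 0.6}
print(bin(merged_5347), mean_similarity(merged_5347, similarity))
```

Because the similarities are looked up rather than recomputed, no recognition pass is needed when the longer candidate is formed.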

[0056] Once four-character candidates have been registered for all combinations, candidates of five characters, six characters, and so on are created in order by the same method for as long as possible (steps S607 to S612).

[0057] In this way, character string candidates within the same group are created in order from the shortest, and the process repeats until no combinable character string candidates and partial character strings remain, at which point the character string candidate table 32 is complete (step S613).
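
The length-by-length construction described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: candidates are modeled as sets of character-candidate numbers, `build_candidate_table` and `looks_like_string` are hypothetical names, and the string-likeness check is left as a pluggable predicate that accepts everything by default. The input pairs are inferred from the candidates listed in the worked example for FIG. 3 and may not match FIG. 7 exactly.

```python
def build_candidate_table(partials, looks_like_string=lambda ids: True):
    """Grow candidates one character at a time from two-character partial strings.

    partials: iterable of pairs of character-candidate numbers.
    Returns a dict mapping length -> set of frozensets of numbers.
    """
    partials = {frozenset(p) for p in partials}
    table = {2: set(partials)}
    previous, length = partials, 3
    while True:
        created = set()  # a set, so the same string is never registered twice
        for candidate in previous:
            for partial in partials:
                # Combine only when exactly one character candidate is shared.
                if len(candidate & partial) == 1:
                    merged = candidate | partial
                    if looks_like_string(merged):
                        created.add(merged)
        if not created:  # nothing left to combine: the table is complete
            return table
        table[length] = created
        previous, length = created, length + 1


# Pairs assumed from the FIG. 3 walkthrough (an inference, not FIG. 7 itself).
pairs = [(1, 2), (2, 3), (3, 4), (6, 7), (7, 8),
         (9, 10), (10, 11), (13, 15), (15, 16)]
table = build_candidate_table(pairs)
for n in sorted(table):
    print(n, sorted(sorted(c) for c in table[n]))
```

Each pass combines a candidate of length n-1 with a two-character partial string, so the first pass (n = 3) pairs partial strings with each other, as in the three-character case.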

[0058] As shown in FIG. 4, the character string candidate table 32 has one-character candidates at the top, with longer character strings registered further down.

<Processing of the character string determination unit 30>

When the character string candidate table 32 for a group is complete, processing moves to the character string determination unit 30, which determines character strings using this table.

[0059] Each character string candidate registered in the character string candidate table 32 built as described above carries a reliability derived from the similarities obtained by recognition processing. The best combination of character strings can be found using this reliability together with the position, direction, length, and so on of each string. When determining character strings, partial character strings are also treated as character string candidates.

[0060] In this embodiment, the character string candidates of the same length in the character string candidate table 32 are first sorted so that candidates with low reliability come toward the top and candidates with high reliability come toward the bottom.

[0061] If we assume that "the longer a candidate is, the more likely it is to be a character string" and that "among candidates of the same length, the higher the reliability, the more likely it is to be a character string", then in this table the lower a candidate appears, the more likely it is to be a character string, and the higher it appears, the less likely.

[0062] The character string candidate table 32 is therefore searched in order from the bottom to determine the character strings. Partial character strings are treated as two-character candidates. The one-character entries of the table have no reliability, so they are not sorted.

[0063] The character string determination unit 30 has a buffer (not shown) that holds the serial numbers of the character candidates in the group; this is called the character candidate buffer. First, the bottom candidate in the character string candidate table 32 is determined to be a character string, and the character candidates it contains are removed from the character candidate buffer. The table is then searched upward from the bottom; whenever a candidate is found that consists only of character candidates still registered in the buffer, it is determined to be a character string and its character candidates are removed from the buffer.

[0064] This process continues until the character candidate buffer is empty or the search has reached the two-character partial character strings. This completes character string determination within the group.

[0065] Performing the above segmentation process for every group completes character string segmentation for the whole image. The process is described more concretely below, taking the example image of FIG. 3 as input.

[0066] First, suppose the character candidate extraction unit 24 has gathered the character candidates shown in FIG. 3(a) into one group. In FIG. 3, the arrows are also assumed to be character candidates. FIG. 7 shows the partial character strings that the partial character string creation unit 26 obtains from FIG. 3. In FIG. 7, for example, character candidate No. 0 "←" (arrow) and No. 1 "A" are close together and similar in size, but when the pair is submitted to recognition there is no dictionary entry matching No. 0 "←", so the similarity is low and the pair is not registered in the character string candidate table 32. The pairs No. 4 "7" and No. 5 "→" (arrow), No. 11 "6" and No. 12 "6", No. 13 "D" and No. 14 "H", and so on are likewise judged not combinable. No. 7 "1" and No. 9 "B" are rejected before recognition because their heights from the center line do not match.

[0067] Processing then moves to the character string candidate creation unit 28. First, as a three-character candidate, the pair of partial character strings with character candidate numbers (1, 2) and (2, 3) yields a character string candidate whose elements are (1, 2, 3).

[0068] Similarly, character string candidates with elements (2, 3, 4), (6, 7, 8), (9, 10, 11), and (13, 15, 16) are created. Next, as a four-character candidate, the candidate (1, 2, 3, 4) can be formed either from (1, 2, 3) and (3, 4) or from (1, 2) and (2, 3, 4). Because the same character string is not created twice, only one candidate (1, 2, 3, 4) is registered.

[0069] For FIG. 3 no candidates of five or more characters are created, so the character string candidate table 32 is as shown in FIG. 4. Once the table is complete, processing moves to the character string determination unit 30. For simplicity, assume that sorting the table by reliability leaves it unchanged from the state shown in FIG. 4.

[0070] Initially, character candidates No. 0 through No. 16 are registered in the character candidate buffer. First, the bottom candidate in the character string candidate table 32, (1, 2, 3, 4), is determined to be a character string, and character candidates No. 1, 2, 3, and 4 are removed from the character candidate buffer.

[0071] Searching the character string candidate table 32 upward from the bottom for a candidate made up only of character candidates still registered in the character candidate buffer, the next character string determined is (13, 15, 16).

[0072] Similarly, the character strings (9, 10, 11) and (6, 7, 8) are determined. The candidates (1, 2, 3) and (2, 3, 4) are not selected because they contain character candidates that have already been removed from the character candidate buffer.

[0073] The character strings of the group in FIG. 3 are therefore determined to be the following four: "A007" (1, 2, 3, 4), "C12" (6, 7, 8), "B86" (9, 10, 11), and "D53" (13, 15, 16).

[0074] This embodiment provides the following advantages. Character strings can be segmented correctly even when many character candidates are crowded together and the strings cannot be determined from geometric features alone. The processing is also far more efficient than the conventional approach of examining every combination of character candidates as a character string candidate.

[0075] In addition, because character strings are determined after all likely character string candidates have been listed, it is easy to find the best combination of character strings. Because all likely candidates are available at determination time, alternative combinations of character strings are easy to form and can readily be presented to more advanced determination processing, such as semantic processing, that cannot decide on shape and recognition alone.

[0076] More specifically, when the partial character string creation unit 26 forms partial character strings, it does not create one if the two characters differ in size or if recognizing the two characters along the direction of the partial character string gives a low similarity. As a result, even for an image in which character strings running in various directions are crowded together (see FIG. 3), which conventional methods had difficulty segmenting correctly, only the partial character strings shown in FIG. 7 are created.
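
The pairing test applied when partial character strings are formed can be sketched as below. Everything concrete here is an assumption made for illustration: the `CharCandidate` fields, the threshold values, and the `recognize_pair` callback standing in for the recognition step are hypothetical, and the center-line height check mentioned in the worked example is omitted. The sketch only shows the ordering, with inexpensive geometric tests first and recognition last.

```python
import math
from dataclasses import dataclass


@dataclass
class CharCandidate:
    x: float      # center x of the character candidate
    y: float      # center y of the character candidate
    size: float   # representative size, e.g. bounding-box height


def can_form_partial_string(a, b, recognize_pair,
                            max_gap=2.0, max_size_ratio=1.3,
                            min_similarity=0.5):
    """Decide whether two character candidates become a partial string."""
    # 1. The two candidates must be close together (relative to their size).
    gap = math.hypot(b.x - a.x, b.y - a.y)
    if gap > max_gap * max(a.size, b.size):
        return False
    # 2. Their sizes must be roughly equal.
    if max(a.size, b.size) / min(a.size, b.size) > max_size_ratio:
        return False
    # 3. Only now run recognition, along the direction joining the pair.
    direction = math.atan2(b.y - a.y, b.x - a.x)
    return recognize_pair(a, b, direction) >= min_similarity


calls = []

def fake_recognizer(a, b, direction):
    """Stand-in recognizer that records each call and returns a fixed score."""
    calls.append((a, b))
    return 0.9

left = CharCandidate(0, 0, 10)
print(can_form_partial_string(left, CharCandidate(12, 0, 10), fake_recognizer))
print(can_form_partial_string(left, CharCandidate(12, 0, 20), fake_recognizer))
```

Pairs rejected on geometry never reach the recognizer, which is consistent with the low processing load the description attributes to this stage.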

[0077] Because the character string candidate creation unit 28 builds character string candidates from these partial character strings, no candidate is created whose characters differ in direction or are uneven in size.

[0078] In the partial character string creation unit 26, character recognition is performed after the partial character string has been formed, which keeps the processing load low. Furthermore, the recognition-based similarity that expresses the reliability of a string is computed only in the partial character string creation unit 26, and the reliability of a string created by the character string candidate creation unit 28 is derived from the reliabilities of its constituent partial character strings. Recognition therefore does not have to be run each time a character string candidate is created, which shortens processing time.

[0079] <Modification> In the processing by the partial character string creation unit 26 described above, when two character candidates are submitted to recognition, the direction of the string formed by the two candidates is taken as the character direction and recognition is performed in two orientations, up and down.

[0080] To allow for character strings that are arranged vertically, recognition may also be performed in the left and right directions in addition to the up and down directions, with the reliability obtained from the resulting four similarities.

[0081] Recognition may further be performed in the four 45-degree diagonal directions in addition to up, down, left, and right; the set of directions can be chosen to suit the images being processed. The present invention is not limited to the embodiments described above and can be modified in various ways without departing from its gist.

【００８２】[0082]

[Effects of the Invention] According to the present invention, character candidates extracted from the input image that are judged combinable are first combined into partial character strings; these partial character strings are then combined, subject to validity checks, into character string candidates; and finally character strings are chosen from the candidates according to, for example, a predetermined evaluation criterion. As a result, character strings can be segmented correctly even when many character candidates are crowded together and the strings cannot be determined from geometric features alone, and far more efficiently than with the conventional method of examining every combination of character candidates as a character string candidate.

[0083] In addition, because character strings are determined after all likely character string candidates have been listed, it is easy to find the best combination of character strings. The recognition-based similarity that expresses the reliability of a string is computed only when partial character strings are created, and the reliability of a character string candidate is derived from the reliabilities of its constituent partial character strings, which shortens processing time. Thus, according to the present invention, character strings can be extracted accurately and efficiently even where character strings are densely concentrated.

[Brief description of drawings]

FIG. 1 is a diagram showing the configuration of a character string segmentation device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the flow of the character string segmentation process of the embodiment.

FIG. 3 is a diagram showing an example of an image subject to the character string segmentation process.

FIG. 4 is a diagram showing an example of the character string candidate table.

FIG. 5 is a diagram for explaining the creation of a four-character character string candidate.

FIG. 6 is a diagram for explaining the creation of a four-character character string candidate.

FIG. 7 is a diagram showing an example of the partial character strings created by the partial character string creation unit of the embodiment.

[Explanation of symbols]

22: image input unit, 24: character candidate extraction unit, 26: partial character string creation unit, 28: character string candidate creation unit, 30: character string determination unit, 32: character string candidate table

Claims

[Claims]

1. A character string segmentation method comprising: extracting character candidates from an input image; combining the extracted character candidates to create partial character strings that can be constituent elements of a character string; creating character string candidates only by combining the created partial character strings; and determining a character string from the created character string candidates.

2. The character string segmentation method according to claim 1, wherein the character string is determined based on reliabilities obtained when the partial character strings are created and when the character string candidates are created.

3. The character string segmentation method according to claim 2, wherein the reliability of a partial character string is obtained from the character similarity obtained by recognizing the character candidates forming the partial character string, and the reliability of a character string candidate is obtained from the similarities of the partial character strings forming the character string candidate.

4. The character string segmentation method according to claim 1, wherein each partial character string is created by combining two of the extracted character candidates.

5. The character string segmentation method according to claim 1, wherein, when a partial character string is created by combining character candidates, the partial character string consisting of those character candidates is adopted if the distance between the character candidates is shorter than a predetermined threshold, the character candidates are substantially equal in size, and the character similarity obtained by recognizing the partial character string formed from the character candidates satisfies a predetermined threshold.

6. The character string segmentation method according to claim 1, wherein, when the character string candidates are created, longer character string candidates are created by combining partial character strings with each other or by combining a partial character string with an already created character string candidate, and each created candidate is checked for whether it has the characteristics of a character string in order to decide whether to adopt it as a character string candidate.

7. The character string segmentation method according to claim 6, wherein a character string candidate is created only when the partial character strings being combined, or the partial character string and the already created character string candidate being combined, share an overlapping character candidate.

8. The character string segmentation method according to claim 7, wherein a character string candidate is created when the overlap consists of only one character candidate.

9. The character string segmentation method according to claim 6, wherein checking whether a character string candidate has the characteristics of a character string comprises checking whether the heights of the character candidates forming the candidate are uniform and whether the character candidates are arranged in a straight line, and the character string candidate is created only when the results satisfy predetermined conditions.

10. The character string segmentation method according to claim 2, wherein, when a character string is determined from the character string candidates, the candidates are sorted by reliability within each character count, and the character string is determined based on the result.