JPH0535920A

JPH0535920A - Word recognition device

Info

Publication number: JPH0535920A
Application number: JP3192154A
Authority: JP
Inventors: Hiroyoshi Toda; 浩義戸田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1991-07-31
Filing date: 1991-07-31
Publication date: 1993-02-12
Anticipated expiration: 2016-04-09
Also published as: JP3154752B2

Abstract

PURPOSE: To improve recognition speed in the recognition of speech, characters, and the like by deciding a word using a word score determined from the similarity, the existence rate, and the shape value among three consecutive characters. CONSTITUTION: This device is provided with a word image storage means 101 that stores a word image, and a CPU. The CPU segments characters from the word image while varying the segmentation position, recognizes the segmented characters, calculates the similarity, the existence rate, and the shape value among an arbitrary number of consecutive characters for the recognized character candidates, calculates a word score based on these values, and decides an appropriate word from the calculated word score.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a word recognition device that recognizes speech, characters, and the like in word units.

[0002]

[Prior Art] In a conventional speech/character recognition device, when a word is recognized, recognition result candidates are expanded from the character candidates obtained from combinations of cut-out candidates in the word candidate area to create new character strings one after another, and a language dictionary search is performed on each character string to determine the word (see JP-A-59-078400 and JP-A-63-216188).

[0003]

[Problems to Be Solved by the Invention] However, in the above determination method, (i) when the number of cut-out candidates increases because characters touch or are split, the expansion process generates a large number of character strings and the processing speed drops. In addition, (ii) it is difficult to select as the correct answer a word that is not registered in the language dictionary (a proper noun, a numerical value, etc.).

[0004] In the present invention, instead of considering every combination of character string candidates, characters are determined (or limited to the top candidates) starting from the beginning of the word, and only characters that can connect to the already determined characters are considered. This solves problem (i). Further, problem (ii) is solved by evaluating the existence rate only for three consecutive characters of the character string.

[0005] That is, the present invention provides a word recognition device that, in the recognition of speech, characters, and the like, determines a word from a word candidate area by scanning the cut-out candidates of the word candidate area in order from the left end to the right end and successively determining (or limiting to the top candidates) the optimum character at each cut-out candidate position, using a word score obtained from the three-character similarity, existence rate, and shape value of three consecutive characters.

[0006]

[Means for Solving the Problems] FIG. 1 is a block diagram showing the configuration of the present invention. As shown in the figure, the present invention is a word recognition device comprising: word image storage means 101 for storing an image, read by an image reader, that is to be recognized as a word; cut-out means 102 for cutting out, from the image stored in the word image storage means 101, images to be recognized as characters while varying the cut-out position; character recognition means 103 for recognizing the plural images cut out by the cut-out means 102 as character candidates; calculation means 104 for calculating, for the character candidates recognized by the character recognition means 103, the similarity, existence rate, and shape value among an arbitrary number of consecutive characters, and for calculating a word score based on those values; and word determination means 105 for determining an appropriate word from the word score calculated by the calculation means 104.

[0007] It is convenient to use a microcomputer consisting of a CPU, ROM, RAM, and I/O ports as the cut-out means 102, the character recognition means 103, the calculation means 104, and the word determination means 105 of the present invention. The RAM in that microcomputer is normally used as the word image storage means 101.

[0008]

[Operation] According to the present invention, characters are cut out from the word image stored in the word image storage means 101 while varying the cut-out position, and are treated as character candidates. For these character candidates, the similarity, existence rate, and shape value among n consecutive characters (where n is a natural number) are calculated, and an appropriate word is determined based on the word score obtained from those values.

[0009] Therefore, because the optimum character is determined successively from the cut-out character candidates, the processing time can be shortened when the number of cut-out character candidates grows because characters touch or are split. In addition, the recognition rate for words that are not registered in the language dictionary improves.

[0010]

[Embodiment] The present invention is described in detail below based on the embodiment shown in the drawings. The invention is not limited by this embodiment.

[0011] FIG. 2 is a block diagram showing the configuration of one embodiment of the word recognition device according to the present invention. The invention is described below taking an OCR (optical character reader) for printed alphanumeric characters as an example. In the figure, 1 is a word image buffer memory that stores the image (word rectangular area), read by an image reader, that is to be recognized as a word; 2 is a character candidate buffer memory that stores character candidates; 3 is a word candidate buffer memory that stores word candidates; 4 is a word selection buffer memory that stores selected words; and 5 is a working memory. The word image buffer memory 1, the character candidate buffer memory 2, the word candidate buffer memory 3, the word selection buffer memory 4, and the working memory 5 are each composed of RAM.

[0012] 6 is a CPU, and 7 is a word score calculation unit comprising a processor. The word score calculation unit 7 has a three-character similarity calculation unit 7a, a three-character existence rate calculation unit 7b, and a three-character shape value calculation unit 7c. 8 is a control unit comprising a processor that controls the entire system.

[0013] The CPU 6 cuts out, from the image stored in the word image buffer memory 1, images to be recognized as characters while varying the cut-out position, and stores them in the character candidate buffer memory 2.

[0014] Next, the plural cut-out images are recognized as character candidates (cut-out candidates). For the recognized character candidates, the similarity, existence rate, and shape value among three consecutive characters are calculated by the three-character similarity calculation unit 7a, the three-character existence rate calculation unit 7b, and the three-character shape value calculation unit 7c, respectively. Based on these calculated values, the word score calculation unit 7 calculates a word score, and the result is taken as a word candidate. An appropriate word is then determined from the calculated word scores.

[0015] Here, the three-character similarity is the average of the individual similarities of three consecutive characters.

[0016] The three-character existence rate is the probability that a combination of three consecutive characters appears in English text. For example, combinations such as "the" and "abl" are considered to have a higher three-character existence rate than combinations such as "zzz" and "#$%".
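The description does not say how the existence rate is obtained. As a minimal sketch, assuming it is estimated as the relative frequency of each three-character sequence inside the words of a reference text, it could be tabulated as follows. The function name `trigram_existence_rates` and the sample sentence are hypothetical and are not part of the original disclosure.

```python
from collections import Counter


def trigram_existence_rates(corpus: str) -> dict[str, float]:
    """Relative frequency of each 3-character sequence found inside words.

    Assumption: the existence rate is approximated by counting trigrams
    in a reference text; the patent does not specify the estimation method.
    """
    counts: Counter[str] = Counter()
    for word in corpus.lower().split():
        for i in range(len(word) - 2):
            counts[word[i:i + 3]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {trigram: n / total for trigram, n in counts.items()}


# Hypothetical sample text, used only to illustrate the idea.
sample = "the table is able to be there when the cable is stable"
rates = trigram_existence_rates(sample)

# Trigrams that never occur can simply be treated as having rate 0.
rate_the = rates.get("the", 0.0)
rate_zzz = rates.get("zzz", 0.0)
```

With a table like this, a common combination such as "the" receives a higher rate than an unseen one such as "zzz", which is the property the paragraph above relies on.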

[0017] The three-character shape value is a value representing the plausibility of the relative shapes of three consecutive characters. For example, if the three characters have almost the same height, the combination "ace" is considered to have a higher three-character shape value than the combination "aCe".

[0018] The word score is a value, obtained from the three-character similarity, existence rate, and shape value, that represents the plausibility as a word of the string up to the n-th character (where n is a natural number). When n is 1 or 2, the similarity, existence rate, and shape value of one character or of two characters, respectively, are used in place of the three-character values. The word score is obtained by the following formulas.

S(1) = Wr·R + Wp·P + Wf·F
S(n) = {S(n-1) + Wr·R + Wp·P + Wf·F} / 2

S(n): word score up to the n-th character (where n is a natural number)
R, P, F: three-character similarity, existence rate, and shape value
Wr, Wp, Wf: weights of the three-character similarity, existence rate, and shape value
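The recurrence for the word score can be sketched in a few lines. This is an illustration under stated assumptions, not the embodiment itself: the weight values below are arbitrary placeholders (the description gives no numbers), and the helper names `three_char_similarity`, `weighted_term`, and `word_score` are hypothetical.

```python
def three_char_similarity(similarities: list[float]) -> float:
    """Average similarity of the last three characters (fewer when n is 1 or 2)."""
    window = similarities[-3:]
    return sum(window) / len(window)


def weighted_term(r: float, p: float, f: float,
                  wr: float = 0.5, wp: float = 0.25, wf: float = 0.25) -> float:
    """Wr*R + Wp*P + Wf*F. The default weights are assumed, not from the source."""
    return wr * r + wp * p + wf * f


def word_score(rpf_per_position: list[tuple[float, float, float]]) -> float:
    """S(1) = term(1); S(n) = (S(n-1) + term(n)) / 2 for n >= 2."""
    if not rpf_per_position:
        raise ValueError("at least one character position is required")
    score = weighted_term(*rpf_per_position[0])
    for r, p, f in rpf_per_position[1:]:
        score = (score + weighted_term(r, p, f)) / 2
    return score


# Two positions: term(1) = 0.75 and term(2) = 0.5, so S(2) = (0.75 + 0.5) / 2.
s1 = word_score([(1.0, 0.5, 0.5)])
s2 = word_score([(1.0, 0.5, 0.5), (0.5, 0.5, 0.5)])
```

Because each step halves the previous score, the newest term carries weight 1/2 and older terms decay geometrically, so the score reacts quickly to a poorly matching recent character.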

[0019] A cut-out candidate is a candidate for a break between characters in the word rectangular area. The right end and the left end of the word rectangular area are also counted as cut-out candidates. Because every location that could be a break between characters is obtained as a cut-out candidate, the area is normally divided into more pieces than the actual number of characters. For example, the character "O" may be cut out as "()", and the character "m" may be cut out as "rn".

[0020] FIG. 3 is an explanatory diagram showing the stored contents of the character candidate buffer memory 2. A character candidate is a candidate for a character rectangular area, formed from a combination of the above cut-out candidates. A character candidate holds information such as the cut-out candidate at the left end of the character rectangular area (hereinafter, the leading line), the cut-out candidate at the right end (hereinafter, the trailing line), and the recognition result candidates obtained by character recognition. All character candidates of the word rectangular area are stored in the character candidate buffer memory 2. For example, when there are four cut-out candidates numbered 1 to 4, character candidates for the following combinations of cut-out candidates are stored in the character candidate buffer memory 2: 1-2, 1-3, 1-4, 2-3, 2-4, 3-4.
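The enumeration in the example above is every ordered pair of cut-out candidates (leading line, trailing line) with the leading line to the left of the trailing line. A short sketch, using the hypothetical helper name `character_candidate_spans`, reproduces the list for four cut-out candidates:

```python
from itertools import combinations


def character_candidate_spans(num_cut_candidates: int) -> list[tuple[int, int]]:
    """All (leading line, trailing line) pairs with leading < trailing.

    Cut-out candidates are numbered from 1, left to right, as in the example.
    """
    cut_lines = range(1, num_cut_candidates + 1)
    return list(combinations(cut_lines, 2))


spans = character_candidate_spans(4)
# spans == [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

For k cut-out candidates this yields k(k-1)/2 character candidates, which is why the number of candidates grows quickly when touching or split characters add extra cut-out positions.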

[0021] FIG. 4 is an explanatory diagram showing the stored contents of the word candidate buffer memory 3 and the word selection buffer memory 4. The word candidate buffer memory 3 stores, in order, the characters that have been determined from the beginning of the word rectangular area. The trailing line of the last character of a word candidate is called the trailing line of that word candidate. There is not necessarily only one word candidate: every word candidate whose word score exceeds a certain threshold is stored in the word candidate buffer memory 3. To each word candidate, a character whose leading line is the same as the word candidate's trailing line is appended, a new word score is calculated, and the result is moved to the word selection buffer memory 4. After all word candidates in the word candidate buffer memory 3 have been evaluated once, the top candidates with high word scores are returned from the word selection buffer memory 4 to the word candidate buffer memory 3.

[0022] Next, the operation of the present invention is described with reference to the word determination flowchart of FIG. 5. In FIG. 5, the starting state is one in which the word rectangular area, the cut-out candidates, and the character candidates have already been obtained. Among the cut-out candidates of the word rectangular area, the cut-out candidate being processed at a given moment is called the line of interest. The line of interest is scanned in order from the left end to the right end of the word rectangular area, and the word candidates remaining in the word candidate buffer memory 3 when the scan ends are the candidates for the correct answer.

[0023] First, the word candidate buffer memory 3 is cleared, and one word candidate consisting only of a blank, whose trailing line is the cut-out candidate at the left end of the word rectangular area, is created and stored in the word candidate buffer memory 3. The cut-out candidate at the left end of the word rectangular area is also set as the line of interest (step 11). Then the word selection buffer memory 4 is cleared (step 12). Next, one word candidate whose trailing line is the same as the line of interest is selected from the word candidate buffer memory 3. That is, the word candidate is taken out and deleted from the word candidate buffer memory 3 (step 13).

[0024] Then, one character candidate whose leading line is the same as the line of interest is selected from the character candidate buffer memory 2 (step 14). Further, one of the recognition result candidates obtained by character recognition is selected from the currently selected character candidate (step 15).

[0025] Next, the character obtained above is appended to the end of the currently selected word candidate, and the result is stored in the word selection buffer memory 4 as a new word candidate. At this time, the word score of that word candidate is calculated from its last three characters using the three-character similarity, existence rate, and shape value (step 16). If the currently selected character candidate has another recognition result candidate obtained by character recognition, the process returns to step 15 (step 17). If the character candidate buffer memory 2 contains another character candidate whose leading line is the same as the line of interest, the process returns to step 14 (step 18). Furthermore, if the word candidate buffer memory 3 contains another word candidate whose trailing line is the same as the line of interest, the process returns to step 13 (step 19).

[0026] Then, the word candidate with the highest word score (or the top candidates) in the word selection buffer memory 4 is returned to the word candidate buffer memory 3. At this time, among the trailing lines of the word candidates in the word candidate buffer memory 3, the one located furthest to the left in the word rectangular area is set as the new line of interest (step 20). If the new line of interest is not the right end of the word rectangular area, characters still follow, so the process returns to step 12 (step 21). Finally, the word candidate with the highest word score in the word candidate buffer memory 3 is determined as the correct answer (step 22).
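One possible reading of steps 11 to 22 is a left-to-right search that keeps only the best few word candidates at each stage. The sketch below follows that reading under several assumptions: the names `decide_word`, `step_term`, and `beam_width` are hypothetical; the score threshold mentioned for the word candidate buffer is omitted; and the demo scores each step with the average similarity only, leaving out the existence rate and shape value for brevity.

```python
def decide_word(char_candidates, left, right, step_term, beam_width=3):
    """char_candidates maps (leading line, trailing line) to a list of
    (character, similarity) recognition results.
    step_term receives the last (up to three) characters of a word candidate
    and returns the value added to the score recurrence at that position.
    """
    # Step 11: one blank word candidate ending at the left edge.
    word_candidates = [{"chars": [], "tail": left, "score": None}]
    focus = left  # line of interest
    while focus != right:  # step 21: repeat until the right edge is reached
        selection = []  # step 12: clear the word selection buffer
        # Steps 13 and 19: take out every word candidate ending at the focus.
        current = [w for w in word_candidates if w["tail"] == focus]
        word_candidates = [w for w in word_candidates if w["tail"] != focus]
        for word in current:
            # Steps 14 and 18: character candidates starting at the focus.
            for (head, tail), results in char_candidates.items():
                if head != focus:
                    continue
                # Steps 15 and 17: each recognition result of that candidate.
                for char, similarity in results:
                    chars = word["chars"] + [(char, similarity)]
                    term = step_term(chars[-3:])
                    # Step 16: S(1) = term, S(n) = (S(n-1) + term) / 2.
                    if word["score"] is None:
                        score = term
                    else:
                        score = (word["score"] + term) / 2
                    selection.append({"chars": chars, "tail": tail, "score": score})
        # Step 20: return the top candidates and move the line of interest.
        selection.sort(key=lambda w: w["score"], reverse=True)
        word_candidates.extend(selection[:beam_width])
        if not word_candidates:
            return None  # no character candidate continues from this line
        focus = min(w["tail"] for w in word_candidates)
    # Step 22: the highest-scoring remaining candidate is the answer.
    best = max(word_candidates, key=lambda w: w["score"])
    return "".join(char for char, _ in best["chars"]), best["score"]


def mean_similarity(chars):
    """Demo step term: average similarity of the given characters only."""
    return sum(similarity for _, similarity in chars) / len(chars)


# An image of "m" with a spurious cut-out candidate (line 2) in the middle.
demo_candidates = {
    (1, 2): [("r", 0.8)],
    (2, 3): [("n", 0.8)],
    (1, 3): [("m", 0.9)],
}
result = decide_word(demo_candidates, left=1, right=3, step_term=mean_similarity)
```

In this example the single character "m" spanning lines 1 to 3 and the split reading "rn" both reach the right edge, and the higher score selects "m". Because only candidates ending at the line of interest are extended, the number of strings examined stays bounded by `beam_width` per stage rather than growing with every combination of cut-out candidates.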

[0027] In this way, the character candidates cut out from the word image are scanned in order from the left end to the right end, and the word is determined while successively determining the optimum character at each position of the cut-out character candidates from the three-character similarity, existence rate, and shape value of three consecutive characters. As a result, even characters that are difficult to judge because they touch or are split can be recognized accurately and quickly.

[0028]

[Effects of the Invention] According to the present invention, when character candidates are cut out from a word image, the cut-out position is varied and the word is determined while successively determining the optimum character. Consequently, the processing time is shortened when the number of cut-out character candidates grows because characters touch or are split. In addition, the recognition rate for words that are not registered in the language dictionary improves.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the present invention.

FIG. 2 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 3 is an explanatory diagram showing the stored contents of the character candidate buffer memory.

FIG. 4 is an explanatory diagram showing the stored contents of the word candidate buffer memory and the word selection buffer memory.

FIG. 5 is a flowchart showing the operation of the embodiment.

[Explanation of symbols]

1 Word image buffer memory
2 Character candidate buffer memory
3 Word candidate buffer memory
4 Word selection buffer memory
5 Working memory
6 CPU
6a Three-character similarity calculation unit
6b Three-character existence rate calculation unit
6c Three-character shape value calculation unit
7 Word score calculation unit
8 Control unit

Claims

Claims: 1. A word recognition device comprising: word image storage means for storing an image, read by an image reader, that is to be recognized as a word; cut-out means for cutting out, from the image stored in the word image storage means, images to be recognized as characters while varying the cut-out position; character recognition means for recognizing the plural images cut out by the cut-out means as character candidates; calculation means for calculating, for the character candidates recognized by the character recognition means, the similarity, existence rate, and shape value among an arbitrary number of consecutive characters, and for calculating a word score based on those values; and word determination means for determining an appropriate word from the word score calculated by the calculation means.