JPH0535920A - Word recognition device - Google Patents

Word recognition device

Info

Publication number
JPH0535920A
Authority
JP
Japan
Prior art keywords
word
character
candidate
characters
candidates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP3192154A
Other languages
Japanese (ja)
Other versions
JP3154752B2 (en)
Inventor
Hiroyoshi Toda
浩義 戸田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sharp Corp
Original Assignee
Sharp Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sharp Corp
Priority to JP19215491A
Publication of JPH0535920A
Application granted
Publication of JP3154752B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

PURPOSE: To improve recognition speed in the recognition of speech, characters, and the like by deciding a word using a word score determined from the similarity, the existence rate, and the shape value of three consecutive characters. CONSTITUTION: The device is provided with a word image storage means 101 that stores a word image, and a CPU. The CPU segments characters from the word image while varying the segmentation position, recognizes the segmented characters, calculates the similarity, the existence rate, and the shape value among an arbitrary number of consecutive characters for the recognized character candidates, calculates a word score from those values, and decides an appropriate word from the calculated word score.

Description

Detailed Description of the Invention

[0001] Field of Industrial Application: The present invention relates to a word recognition device that recognizes speech, characters, and the like in units of words.

[0002] Prior Art: In conventional speech and character recognition devices, a word is recognized by expanding the recognition result candidates of the character candidates obtained from combinations of cut-out candidates in the word candidate area, creating new character strings one after another, and searching a language dictionary for each character string to decide the word (see JP-A-59-078400 and JP-A-63-216188).

[0003] Problems to Be Solved by the Invention: However, in the above decision method, (i) when the number of cut-out candidates grows because characters touch or are split, the expansion process creates a large number of character strings and processing speed drops. In addition, (ii) it is difficult to select as the correct answer a word that is not registered in the language dictionary (a proper noun, a numerical value, and so on).
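To make problem (i) concrete, the following minimal sketch counts how many character strings an exhaustive expansion would produce. It rests on simplifying assumptions that are not in the source: every pair of cut-out candidates yields a character candidate, and every character candidate has the same number of recognition results. The function name `count_expanded_strings` is hypothetical.

```python
def count_expanded_strings(num_cuts: int, results_per_candidate: int) -> int:
    """Count the strings produced when every segmentation is fully expanded.

    ways[j] is the number of distinct character strings that end at cut j
    (cuts are indexed 0 .. num_cuts - 1, with 0 the left edge of the word).
    """
    ways = [0] * num_cuts
    ways[0] = 1  # the empty string ends at the left edge
    for end in range(1, num_cuts):
        # Any earlier cut can start the last character, which then has
        # results_per_candidate possible recognition results.
        ways[end] = sum(ways[start] * results_per_candidate
                        for start in range(end))
    return ways[-1]


for cuts in (4, 6, 8, 10):
    print(cuts, count_expanded_strings(cuts, 3))
```

Under these assumptions the count equals m(1 + m)^(k - 2) for k cut-out candidates and m results per candidate, so it grows exponentially with the number of cuts.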

[0004] In the present invention, instead of considering every combination of character string candidates, characters are decided (or limited to the top candidates) from the beginning of the word, and only the characters that can follow the already decided characters are considered, which solves problem (i). Further, only the existence rate of three consecutive characters in the character string is evaluated, which solves problem (ii).

[0005] That is, the present invention provides a word recognition device that, in the recognition of speech, characters, and the like, decides a word from a word candidate area by scanning the cut-out candidates of the area in order from the left end to the right end and sequentially deciding the optimum character (or limiting it to the top candidates) at each cut-out candidate position, using a word score obtained from the three-character similarity, existence rate, and shape value of three consecutive characters.

[0006] Means for Solving the Problems: FIG. 1 is a block diagram showing the configuration of the present invention. As shown in the figure, the present invention is a word recognition device comprising: word image storage means 101 for storing an image, read by an image reader, that is to be recognized as a word; cut-out means 102 for cutting out, at various cut-out positions, images to be recognized as characters from the image stored in the word image storage means 101; character recognition means 103 for recognizing the plural images cut out by the cut-out means 102 as character candidates; calculation means 104 for calculating the similarity, existence rate, and shape value among an arbitrary number of consecutive characters for the character candidates recognized by the character recognition means 103, and for calculating a word score based on those values; and word decision means 105 for deciding a valid word from the word score calculated by the calculation means 104.

[0007] It is convenient to use a microcomputer consisting of a CPU, ROM, RAM, and I/O ports as the cut-out means 102, character recognition means 103, calculation means 104, and word decision means 105 of the present invention, and the RAM in that microcomputer is normally used as the word image storage means 101.

[0008] Operation: According to the present invention, characters are cut out from the word image stored in the word image storage means 101 at various cut-out positions and taken as character candidates; the similarity, existence rate, and shape value among n consecutive characters (where n is a natural number) are calculated for the character candidates; and a valid word is decided based on the word score obtained from those values.

[0009] Because the optimum character is decided sequentially from the cut-out character candidates, processing time is shortened when touching or split characters produce many character candidates. In addition, the recognition rate for words not registered in the language dictionary improves.

[0010] Embodiment: The present invention is described in detail below based on the embodiment shown in the drawings. The invention is not limited to this embodiment.

[0011] FIG. 2 is a block diagram showing the configuration of one embodiment of the word recognition device according to the present invention. The invention is described below taking a printed alphanumeric OCR (optical character reader) as an example. In the figure, 1 is a word image buffer memory that stores the image (word rectangular area), read by an image reader, that is to be recognized as a word; 2 is a character candidate buffer memory that stores character candidates; 3 is a word candidate buffer memory that stores word candidates; 4 is a word selection buffer memory that stores selected words; and 5 is a working memory. The word image buffer memory 1, character candidate buffer memory 2, word candidate buffer memory 3, word selection buffer memory 4, and working memory 5 are each composed of RAM.

[0012] 6 is a CPU, and 7 is a word score calculation unit consisting of a processor. The word score calculation unit 7 has a three-character similarity calculation unit 7a, a three-character existence rate calculation unit 7b, and a three-character shape value calculation unit 7c. 8 is a control unit consisting of a processor that controls the entire system.

[0013] The CPU 6 cuts out, at various cut-out positions, images to be recognized as characters from the image stored in the word image buffer memory 1 and stores them in the character candidate buffer memory 2.

[0014] Next, the plural cut-out images are recognized as character candidates (cut-out candidates). For the recognized character candidates, the similarity, existence rate, and shape value among three consecutive characters are calculated by the three-character similarity calculation unit 7a, the three-character existence rate calculation unit 7b, and the three-character shape value calculation unit 7c, respectively. Based on these values, the word score calculation unit 7 calculates a word score and produces a word candidate. A valid word is then decided from the calculated word scores.

[0015] Here, the three-character similarity is the average of the individual similarities of three consecutive characters.
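As a minimal sketch of this definition, the function below averages the recognition similarities of the most recent characters. The name `window_similarity` and the 0-to-1 similarity scale are assumptions made for illustration. Averaging over fewer than three values at the start of a word follows the description's later note that one-character and two-character values are used for the first and second characters.

```python
def window_similarity(similarities: list[float]) -> float:
    """Average the similarities of the last three characters.

    With only one or two characters decided so far, the average is taken
    over the characters that exist.
    """
    window = similarities[-3:]
    return sum(window) / len(window)


print(window_similarity([0.8]))                  # first character only
print(window_similarity([0.2, 0.9, 0.6, 0.3]))   # uses the last three values
```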

[0016] The three-character existence rate is the probability that a combination of three consecutive characters appears in English text. For example, combinations such as the and abl are considered to have a higher three-character existence rate than combinations such as zzz and #$%.
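The source does not say how the existence rate is obtained. One plausible approach, sketched here as an assumption, is to estimate it as the relative frequency of each three-character sequence in a sample of English text. The names `build_trigram_rates` and `existence_rate`, the tiny sample text, and the `floor` value for unseen combinations are all hypothetical.

```python
from collections import Counter


def build_trigram_rates(text: str) -> dict[str, float]:
    """Relative frequency of every three-character window in the text."""
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(counts.values())
    return {trigram: count / total for trigram, count in counts.items()}


def existence_rate(rates: dict[str, float], trigram: str,
                   floor: float = 1e-6) -> float:
    """Look up a trigram, falling back to a small floor when it is unseen."""
    return rates.get(trigram, floor)


# A real table would be built from a large corpus; this sample is a toy.
sample = "the table is able to hold the cable"
rates = build_trigram_rates(sample)
print(existence_rate(rates, "the") > existence_rate(rates, "zzz"))
```

Because only local three-character statistics are consulted, a proper noun or number that is absent from a word dictionary can still receive a reasonable score. Note that this sketch counts windows containing spaces as trigrams too.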

[0017] The three-character shape value is a value representing the plausibility of the relative shapes of three consecutive characters. For example, if the three characters have almost the same height, the combination ace is considered to have a higher three-character shape value than the combination aCe.
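The source gives no formula for the shape value, so the following is only an illustrative sketch of one way such a value could behave. It assumes each character class has a nominal height (tall characters 1.0, others 0.7, descenders ignored) and compares the expected relative heights with the observed bounding-box heights. The `TALL` set, `nominal_height`, and `shape_value` are hypothetical.

```python
import string

# Assumed set of characters that reach cap height; everything else is
# treated as x-height. Descenders are ignored in this sketch.
TALL = set(string.ascii_uppercase + string.digits + "bdfhklt")


def nominal_height(ch: str) -> float:
    return 1.0 if ch in TALL else 0.7


def shape_value(chars: str, heights: list[float]) -> float:
    """Return 1.0 when observed relative heights match the expected ones."""
    expected = [nominal_height(c) for c in chars]
    e_max, h_max = max(expected), max(heights)
    diffs = [abs(e / e_max - h / h_max) for e, h in zip(expected, heights)]
    return 1.0 - sum(diffs) / len(diffs)


# Three boxes of equal height: "ace" fits better than "aCe".
print(shape_value("ace", [10, 10, 10]), shape_value("aCe", [10, 10, 10]))
```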

[0018] The word score is a value, obtained from the three-character similarity, existence rate, and shape value, that represents the plausibility as a word of the string up to the n-th character (where n is a natural number). When n is 1 or 2, the similarity, existence rate, and shape value of one character or of two characters, respectively, are used in place of the three-character values. The word score is given by the following formulas.

\[ S(1) = W_r R + W_p P + W_f F \]

\[ S(n) = \frac{S(n-1) + W_r R + W_p P + W_f F}{2} \qquad (n \ge 2) \]

Here S(n) is the word score up to the n-th character (n is a natural number); R, P, and F are the three-character similarity, existence rate, and shape value; and W_r, W_p, and W_f are the weights of the three-character similarity, existence rate, and shape value.
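The recurrence can be sketched as a single update function. The default weights of 1.0 and the use of `None` to mark the first character are assumptions for illustration; the source does not state weight values.

```python
from typing import Optional


def word_score(prev_score: Optional[float], r: float, p: float, f: float,
               wr: float = 1.0, wp: float = 1.0, wf: float = 1.0) -> float:
    """Update the word score with the values for the newest character.

    prev_score is S(n-1), or None for the first character (n = 1).
    r, p, f are the similarity, existence rate, and shape value.
    """
    term = wr * r + wp * p + wf * f
    if prev_score is None:
        return term                    # S(1)
    return (prev_score + term) / 2     # S(n) for n >= 2


s1 = word_score(None, r=0.9, p=0.5, f=0.7)
s2 = word_score(s1, r=0.3, p=0.1, f=0.2)
print(s1, s2)
```

Unrolling the recurrence shows that the newest term carries weight 1/2, the one before it 1/4, and so on, so recent characters influence the score more than early ones.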

[0019] A cut-out candidate is a candidate for a boundary between characters in the word rectangular area. The right end and left end of the word rectangular area are also counted as cut-out candidates. Every place that could be a boundary between characters is taken as a cut-out candidate, so the area is normally divided into more pieces than the actual number of characters. For example, the character O may be cut out as ( ), and the character m may be cut out as rn.

[0020] FIG. 3 is an explanatory diagram showing the contents of the character candidate buffer memory 2. A character candidate is a candidate for a character rectangular area formed from a combination of the above cut-out candidates. A character candidate holds information such as the cut-out candidate at the left end of the character rectangular area (hereinafter, the leading line), the cut-out candidate at the right end (hereinafter, the tail line), and the recognition result candidates obtained by character recognition. All character candidates of the word rectangular area are stored in the character candidate buffer memory 2. For example, when there are four cut-out candidates numbered 1 to 4, the character candidates for the following combinations of cut-out candidates are stored in the character candidate buffer memory 2: 1-2, 1-3, 1-4, 2-3, 2-4, 3-4.
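The enumeration in this example is every ordered pair of distinct cut positions, which a short sketch can reproduce. The function name `character_candidate_spans` is hypothetical, and a real system might discard spans that are implausibly wide.

```python
from itertools import combinations


def character_candidate_spans(num_cuts: int) -> list[tuple[int, int]]:
    """All (left cut, right cut) pairs for cuts numbered 1 .. num_cuts."""
    return list(combinations(range(1, num_cuts + 1), 2))


print(character_candidate_spans(4))
```

For k cut positions this yields k(k - 1)/2 spans, so the number of character candidates grows only quadratically even though the number of complete segmentations grows exponentially.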

[0021] FIG. 4 is an explanatory diagram showing the contents of the word candidate buffer memory 3 and the word selection buffer memory 4. The word candidate buffer memory 3 stores, in order, the characters decided from the beginning of the word rectangular area. The tail line of the last character of a word candidate is called the tail line of that word candidate. There is not necessarily only one word candidate: every word candidate whose word score exceeds a certain threshold is stored in the word candidate buffer memory 3. To each word candidate, a character whose leading line is the same as the word candidate's tail line is appended, a new word score is calculated, and the result is moved to the word selection buffer memory 4. After all the word candidates in the word candidate buffer memory 3 have been evaluated, the top candidates with high word scores are returned from the word selection buffer memory 4 to the word candidate buffer memory 3.

[0022] Next, the operation of the present invention is described with reference to the word decision flowchart of FIG. 5. In FIG. 5, the start state is the state in which the word rectangular area, the cut-out candidates, and the character candidates have already been obtained. Among the cut-out candidates of the word rectangular area, the one being processed at a given moment is called the line of interest. The line of interest is moved in order from the left end to the right end of the word rectangular area, and the candidates remaining in the word candidate buffer memory 3 when the scan ends are the correct-answer candidates.

[0023] First, the word candidate buffer memory 3 is cleared, and one word candidate consisting only of a blank, whose tail line is the cut-out candidate at the left end of the word rectangular area, is created and stored in the word candidate buffer memory 3. The cut-out candidate at the left end of the word rectangular area is also set as the line of interest (step 11). Then the word selection buffer memory 4 is cleared (step 12). Next, one word candidate whose tail line is the same as the line of interest is selected from the word candidate buffer memory 3; that is, the word candidate is taken out and deleted from the word candidate buffer memory 3 (step 13).

[0024] Then one character candidate whose leading line is the same as the line of interest is selected from the character candidate buffer memory 2 (step 14). One of the recognition result candidates obtained by character recognition is then selected from the currently selected character candidate (step 15).

[0025] Next, the character obtained above is appended to the end of the currently selected word candidate, and the result is stored in the word selection buffer memory 4 as a new word candidate. At this time, the word score of the word candidate is calculated using the three-character similarity, existence rate, and shape value of its last three characters (step 16). If the currently selected character candidate has another recognition result candidate obtained by character recognition, the process returns to step 15 (step 17). If the character candidate buffer memory 2 contains another character candidate whose leading line is the same as the line of interest, the process returns to step 14 (step 18). Further, if the word candidate buffer memory 3 contains another word candidate whose tail line is the same as the line of interest, the process returns to step 13 (step 19).

[0026] Then the word candidate with the highest word score (or the top candidates) is returned from the word selection buffer memory 4 to the word candidate buffer memory 3. At this time, among the tail lines of the word candidates in the word candidate buffer memory 3, the one located furthest to the left in the word rectangular area becomes the new line of interest (step 20). If the new line of interest is not the right end of the word rectangular area, characters still follow, so the process returns to step 12 (step 21). Otherwise, the word candidate with the highest word score in the word candidate buffer memory 3 is decided as the correct answer (step 22).
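Steps 11 to 22 can be sketched as a small beam-style search. This is an illustrative reading of the flowchart description, not the implementation in the patent: `beam_width` stands in for "the top candidates", the `term` callable stands in for the weighted sum of similarity, existence rate, and shape value, and the toy data and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WordCandidate:
    chars: str                # characters decided so far
    sims: tuple               # recognition similarity of each decided character
    tail: int                 # cut position where the candidate currently ends
    score: Optional[float]    # None only for the initial empty candidate


def decide_word(char_candidates: dict, left: int, right: int,
                term: Callable[[str, tuple], float], beam_width: int = 2):
    """char_candidates maps (lead, tail) -> [(character, similarity), ...]."""
    words = [WordCandidate("", (), left, None)]                # step 11
    attention = left
    while attention != right:                                  # step 21
        selection = []                                         # step 12
        current = [w for w in words if w.tail == attention]    # step 13
        words = [w for w in words if w.tail != attention]
        for w in current:
            for (lead, tail), results in char_candidates.items():
                if lead != attention:                          # step 14
                    continue
                for ch, sim in results:                        # step 15
                    chars, sims = w.chars + ch, w.sims + (sim,)
                    t = term(chars[-3:], sims[-3:])            # step 16
                    score = t if w.score is None else (w.score + t) / 2
                    selection.append(WordCandidate(chars, sims, tail, score))
        selection.sort(key=lambda c: c.score, reverse=True)
        words.extend(selection[:beam_width])                   # step 20
        if not words:
            return None  # dead end: no character continues any candidate
        attention = min(w.tail for w in words)
    return max(words, key=lambda c: c.score)                   # step 22


# Toy data: cut 2 falls inside an "m", so "m" competes with "r" + "n".
TOY_RATES = {"m": 0.3, "r": 0.3, "me": 0.4, "rn": 0.1, "rne": 0.01, "mc": 0.01}
TOY_CANDIDATES = {
    (1, 2): [("r", 0.8)],
    (2, 3): [("n", 0.8)],
    (1, 3): [("m", 0.9)],
    (3, 4): [("e", 0.9), ("c", 0.7)],
}


def toy_term(chars: str, sims: tuple) -> float:
    # Mean similarity plus existence rate; the shape value is omitted here.
    return sum(sims) / len(sims) + TOY_RATES.get(chars, 0.0)


best = decide_word(TOY_CANDIDATES, left=1, right=4, term=toy_term)
print(best.chars, round(best.score, 3))
```

In the toy run, the candidate ending at cut 2 ("r") is extended to "rn" before both "m" and "rn" are extended from cut 3, which mirrors how the line of interest always moves to the leftmost tail line among the surviving candidates.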

[0027] In this way, the character candidates cut out from the word image are scanned in order from the left end to the right end, and the word is decided while the optimum character at each position of the cut-out character candidates is decided sequentially from the three-character similarity, existence rate, and shape value of three consecutive characters. As a result, even characters that are hard to judge because they touch or are split can be recognized accurately and quickly.

[0028] Effects of the Invention: According to the present invention, when character candidates are cut out from a word image, the cut-out position is varied and the word is decided while the optimum character is decided sequentially, so processing time is shortened when touching or split characters increase the number of character candidates to be cut out. In addition, the recognition rate for words not registered in the language dictionary improves.

Brief Description of the Drawings

FIG. 1 is a block diagram showing the configuration of the present invention.

FIG. 2 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 3 is an explanatory diagram showing the contents of the character candidate buffer memory.

FIG. 4 is an explanatory diagram showing the contents of the word candidate buffer memory and the word selection buffer memory.

FIG. 5 is a flowchart showing the operation of the embodiment.

Explanation of Symbols

1 Word image buffer memory
2 Character candidate buffer memory
3 Word candidate buffer memory
4 Word selection buffer memory
5 Working memory
6 CPU
6a Three-character similarity calculation unit
6b Three-character existence rate calculation unit
6c Three-character shape value calculation unit
7 Word score calculation unit
8 Control unit

Claims (1)

Claim 1. A word recognition device comprising:
word image storage means for storing an image, read by an image reader, that is to be recognized as a word;
cut-out means for cutting out, at various cut-out positions, images to be recognized as characters from the image stored in the word image storage means;
character recognition means for recognizing the plural images cut out by the cut-out means as character candidates;
calculation means for calculating the similarity, existence rate, and shape value among an arbitrary number of consecutive characters for the character candidates recognized by the character recognition means, and for calculating a word score based on those values; and
word decision means for deciding a valid word from the word score calculated by the calculation means.
JP19215491A 1991-07-31 1991-07-31 Word recognition device Expired - Lifetime JP3154752B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP19215491A JP3154752B2 (en) 1991-07-31 1991-07-31 Word recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP19215491A JP3154752B2 (en) 1991-07-31 1991-07-31 Word recognition device

Publications (2)

Publication Number Publication Date
JPH0535920A true JPH0535920A (en) 1993-02-12
JP3154752B2 JP3154752B2 (en) 2001-04-09

Family

ID=16286599

Family Applications (1)

Application Number Title Priority Date Filing Date
JP19215491A Expired - Lifetime JP3154752B2 (en) 1991-07-31 1991-07-31 Word recognition device

Country Status (1)

Country Link
JP (1) JP3154752B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001283157A (en) * 2000-01-28 2001-10-12 Toshiba Corp Method and program for recognizing word

Also Published As

Publication number Publication date
JP3154752B2 (en) 2001-04-09

Similar Documents

Publication Publication Date Title
US6944344B2 (en) Document search and retrieval apparatus, recording medium and program
JPH0789363B2 (en) Character recognition device
JPH0634256B2 (en) Contact character cutting method
JPH0682403B2 (en) Optical character reader
JPH08263478A (en) Single/linked chinese character document converting device
JP3154752B2 (en) Word recognition device
JP3309174B2 (en) Character recognition method and device
US20020184022A1 (en) Proofreading assistance techniques for a voice recognition system
JP3975825B2 (en) Character recognition error correction method, apparatus and program
JPH06215184A (en) Labeling device for extracted area
JPH0528324A (en) English character recognition device
JP2002063197A (en) Retrieving device, recording medium and program
US5689583A (en) Character recognition apparatus using a keyword
JP4263928B2 (en) Character recognition device, character recognition method, character recognition program, and recording medium
JP2890241B2 (en) Optical character recognition device
JPH07141472A (en) Character string recognizing device
JPH0728935A (en) Document image processor
JP2538543B2 (en) Character information recognition device
JP2895115B2 (en) Character extraction method
JPH07271921A (en) Character recognizing device and method thereof
JP3151866B2 (en) English character recognition method
JPH0944604A (en) Character recognizing processing method
JPH06195521A (en) Character recognizing method
JP2004139428A (en) Character recognition device
JPH08202830A (en) Character recognition system

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080202

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090202

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100202

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110202

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120202

Year of fee payment: 11

EXPY Cancellation because of completion of term