JPH03154985A - Maximum likelihood word recognizing system - Google Patents

Maximum likelihood word recognizing system

Info

Publication number
JPH03154985A
Authority
JP
Japan
Prior art keywords
character
word
words
candidate
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP1292226A
Other languages
Japanese (ja)
Inventor
Naotaka Daikoumei
大光明 直孝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP1292226A
Publication of JPH03154985A


Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To increase processing speed by adding the scores of the characters in a character candidate string to the corresponding words, narrowing the words down to a smaller number based on the number of additions, and then identifying the matching and non-matching character portions to rematch the words. CONSTITUTION: An evaluation value addition processing unit 20 adds the evaluation value assigned to each matched character to the corresponding entry of an evaluation value table 12 for each candidate word, counts the number of evaluation value additions for each word, and stores the count in an addition count aggregation table 15. A sort processing unit 3 sorts the candidate words output by the evaluation value addition processing unit 20 in descending order of evaluation value, and a determination processing unit 4 outputs the candidate word with the largest evaluation value as the matching output. The number of candidate words subject to rematching is thereby reduced, which increases processing speed.

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of Industrial Application) The present invention relates to a maximum likelihood word recognition method that searches for candidate words from an input character string and recognizes the most appropriate, highest-likelihood word among them. More specifically, it relates to a maximum likelihood word recognition method that uses a candidate word narrowing method: from the N candidate words obtained as an intermediate result of word matching, it extracts the n candidate words to be subjected to rematching, in which character segmentation and character matching are redone for the unmatched character portions.

(Prior Art) As shown in FIG. 3, the conventional maximum likelihood word recognition method has a search processing unit 1 that performs associative matching between the input character string and the word dictionary 11 in memory unit 37, searches for candidate words in which the character at each character position of the input character string exists, and extracts all such candidate words. An evaluation value addition processing unit 2 then adds, for each candidate word obtained by search processing unit 1, the evaluation value assigned to the matched character to the evaluation value table 12. Next, sort processing unit 3 sorts the candidate words output from evaluation value addition processing unit 2 in descending order of evaluation value and obtains the top N candidate words. Search processing unit 1, evaluation value addition processing unit 2, and sort processing unit 3 together constitute word matching processing unit 16.

For each of the N candidate words obtained by sort processing unit 3, determination processing unit 4 determines whether all of the word's characters are contained in the input character string.

If this determination finds that not all of the characters are contained, rematching processing unit 10 performs character segmentation and character matching again on the unmatched portion of the word and carries out word recognition again. Rematching processing unit 10 consists of unmatched character position search processing unit 6, narrowing unit 13 based on character match rate, character re-segmentation unit 7, character re-matching unit 8, word matching processing unit 5, and determination processing unit 9.

When determination processing unit 4 finds that all constituent characters of a candidate word exist in the candidate character string (input character string), that candidate word is output as the matching recognition result.

In arbitrary-pitch character strings, the region of a single character cannot always be identified accurately: the region of one character may be split into two character regions or, conversely, the regions of two characters may be merged into one.

Therefore, when regions are misidentified, the match between the character candidate string narrowed down from the input character string and a candidate word cannot be separated into matching and non-matching portions by simply comparing character codes one character at a time from the head of each. Such a simple comparison often misjudges the matching and non-matching portions. For this reason, the matching and non-matching portions are determined by a method similar to DP matching, which finds the optimal correspondence while comparing each character candidate with the characters before and after it. The processing cost of this mismatch determination is on the order of N × l × m, where N is the number of candidate words, l is the candidate word length, and m is the candidate character string length.

The ratio of the number of characters in the matching portion obtained by this determination to the number of characters in the word is called the character match rate. The words to be rematched are narrowed down by selecting the n words (n < N) with the highest character match rates.
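The conventional narrowing step described above can be sketched as follows. The patent only says the determination is "similar to DP matching"; this sketch assumes a longest-common-subsequence alignment as a stand-in, and the function names `matched_length`, `character_match_rate`, and `select_for_rematching` are hypothetical. The nested loop over word length l and candidate string length m, repeated for N words, is where the N × l × m cost comes from.

```python
def matched_length(word: str, candidates: str) -> int:
    """Number of matched characters under an order-preserving alignment.

    Assumption: longest common subsequence stands in for the patent's
    unspecified DP-matching-like method.
    """
    l, m = len(word), len(candidates)
    # dp[i][j] = matched length between word[:i] and candidates[:j]
    dp = [[0] * (m + 1) for _ in range(l + 1)]
    for i in range(1, l + 1):
        for j in range(1, m + 1):
            if word[i - 1] == candidates[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[l][m]


def character_match_rate(word: str, candidates: str) -> float:
    """Matched characters divided by the number of characters in the word."""
    return matched_length(word, candidates) / len(word)


def select_for_rematching(words, candidates, n):
    """Keep the n words with the highest character match rate (n < N)."""
    return sorted(words,
                  key=lambda w: character_match_rate(w, candidates),
                  reverse=True)[:n]
```

Every one of the N candidate words goes through the quadratic alignment before any of them is discarded, which is the cost the invention sets out to avoid.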

(Problems to Be Solved by the Invention) In the conventional method described above, the matching and non-matching portions are determined for all N candidates, so the processing load is heavy and the method cannot be made fast.

The present invention was made in view of the above. Its object is to provide a maximum likelihood word recognition method that is faster because it reduces the number of candidate words subject to rematching and thereby reduces the amount of rematching processing.

[Structure of the Invention] (Means for Solving the Problems) To achieve the above object, the maximum likelihood word recognition method of the present invention searches a word dictionary for a word that matches a Japanese character string written at an arbitrary pitch, and has the following means. Likelihood search means: for every character in the character candidate string obtained by character segmentation and character recognition of the input character string, it searches for words containing that character, adds the score of the character to the score of each such word, and judges words with higher resulting scores to be more similar. Rematching means: for a predetermined number of high-likelihood words, it checks whether all constituent characters are contained in the character candidate string and judges a word whose characters are all contained to be a word matching the input character candidate string; when no match is found, it searches the predetermined number of words for the matching and non-matching portions between the character candidate string and each word, performs character region identification and character recognition again on the non-matching portions, and determines the maximum likelihood word. The gist of the invention is that the method comprises addition count storage means that stores, for each word, the number of additions made when character scores are added as word scores, and rematching means that performs rematching only for those words, fewer in number than the predetermined number, whose pseudo character match rate, obtained by dividing the addition count by the number of constituent characters of the word, is at or above a predetermined value.

(Operation) In the maximum likelihood word recognition method of the present invention, the scores of the characters in the character candidate string are added to the corresponding words and the number of additions is stored. The words are narrowed down to a smaller number based on this addition count, and only then are the matching and non-matching character portions identified and the words rematched.

(Embodiment) The present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a maximum likelihood word recognition method according to one embodiment of the present invention. The method shown consists of word matching processing unit 31, determination processing unit 4, rematching processing unit 33, and memory unit 35. Word matching processing unit 31 consists of search processing unit 1, evaluation value addition processing unit 20, and sort processing unit 3. Rematching processing unit 33 consists of narrowing unit 14 based on pseudo character match rate, unmatched character position search processing unit 6, character re-segmentation unit 7, character re-matching unit 8, word matching processing unit 5, and determination processing unit 9. Memory unit 35 consists of word dictionary 11, evaluation value table 12, and addition count aggregation table 15.

In the method of this embodiment, components identical to those of the method of FIG. 3 described above carry the same reference numerals. The embodiment differs from the method of FIG. 3 in two respects. First, evaluation value addition processing unit 20 counts the number of additions, and an addition count aggregation table 15 is provided to store that count. Second, a narrowing unit 14 based on pseudo character match rate is provided, so that the N candidate words are narrowed down to the n candidate words most likely to be correct before the unmatched character position search is performed, which reduces the processing load.

As shown in FIG. 2, narrowing unit 14 based on pseudo character match rate consists of pseudo character match rate calculation unit 26, grouping and ranking unit 27, and candidate word acceptance determination unit 28.

Next, the operation is described.

The input character string is first compared with each word of word dictionary 11 in search processing unit 1. Candidate words in which the character at each character position of the input character string exists are searched for, and all such candidate words are extracted. For each extracted candidate word, evaluation value addition processing unit 20 adds the evaluation value assigned to the matched character to the corresponding entry of evaluation value table 12. At the same time, it counts the number of evaluation value additions for each word and stores that count in addition count aggregation table 15. Storing the addition count adds only one increment and one store operation per word to the work of the evaluation value addition processing unit, so the extra processing is negligible.
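The scoring pass with its addition counter can be sketched as below. This is a minimal illustration, not the patent's implementation: the inverted index, the integer scores, and the name `accumulate_scores` are assumptions, and character positions are ignored here for brevity even though the patent restricts matches to a range around each position.

```python
from collections import defaultdict


def accumulate_scores(candidate_string, dictionary):
    """Return (scores, counts) for every word touched by a candidate character.

    candidate_string: list of (character, evaluation value) pairs.
    dictionary: iterable of words.
    """
    # Inverted index from character to the words that contain it
    # (an assumed data structure; the patent does not specify one).
    index = defaultdict(set)
    for word in dictionary:
        for ch in set(word):
            index[ch].add(word)

    scores = defaultdict(int)  # plays the role of evaluation value table 12
    counts = defaultdict(int)  # plays the role of addition count table 15
    for ch, value in candidate_string:
        for word in index.get(ch, ()):
            scores[word] += value
            counts[word] += 1  # the only extra work: one increment per addition
    return dict(scores), dict(counts)
```

For example, `accumulate_scores([("東", 3), ("京", 2)], ["東京", "京都", "大阪"])` gives 「東京」 a score of 5 with two additions and 「京都」 a score of 2 with one addition, while 「大阪」 is never touched.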

Next, sort processing unit 3 sorts the candidate words output from evaluation value addition processing unit 20 in descending order of evaluation value and obtains the top N candidate words. Determination processing unit 4 then checks, for each of the N candidate words obtained by sort processing unit 3, how its constituent characters match the input character string (the candidate characters). If there are candidate words whose constituent characters all exist in the candidate character string, the one with the highest evaluation value is output as the matching result.
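A minimal sketch of the sort and determination steps follows, assuming the scores are held in a plain dictionary and the candidate characters in a set. The names `top_n`, `fully_contained`, and `decide` are hypothetical.

```python
def top_n(scores, n):
    """Words sorted by descending evaluation value, cut to the top n."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def fully_contained(word, candidate_chars):
    """True when every constituent character of the word was observed."""
    return all(ch in candidate_chars for ch in word)


def decide(scores, candidate_chars, n):
    """Return the best fully contained word, or None if rematching is needed."""
    # Iterating in descending score order means the first hit is the
    # fully contained candidate with the highest evaluation value.
    for word in top_n(scores, n):
        if fully_contained(word, candidate_chars):
            return word
    return None
```

A return value of `None` corresponds to the case where the N candidates are handed on for rematching.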

If no candidate word has all of its constituent characters in the candidate character string, the N candidate words are passed to rematching processing unit 33. To determine which of the N candidate words are to be rematched, rematching processing unit 33 first runs narrowing unit 14 based on pseudo character match rate. It then runs unmatched character position search processing unit 6, character re-segmentation unit 7, character re-matching unit 8, word matching processing unit 5, and determination processing unit 9 in sequence and outputs the matching result.

As shown in FIG. 2, narrowing unit 14 based on pseudo character match rate executes the processing of pseudo character match rate calculation unit 26, grouping and ranking unit 27, and candidate word acceptance determination unit 28.

Pseudo character match rate calculation unit 26 calculates the pseudo character match rate by dividing the evaluation value addition count stored by evaluation value addition processing unit 20 by the word length, that is, the number of constituent characters of the word. It is called a "pseudo" character match rate for the following reason. To compensate for shifts in character position caused by character segmentation errors, evaluation value addition processing unit 20 examines a range of positions before and after each character position and adds to every word that contains the character within that range. Consequently, for 「東京」 (Tokyo) and 「京都」 (Kyoto), for example, the character 「京」 in the second position causes one addition for both words, so the count may not represent the exact match state. The conventional match/mismatch determination takes character order into account, so this inaccuracy does not arise there.
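The 「東京」/「京都」 example can be reproduced with a small sketch. The window of one position on either side is an assumption made for illustration (the source text does not give a legible width), and the names `addition_counts` and `pseudo_match_rates` are hypothetical.

```python
def addition_counts(candidate_chars, dictionary, window=1):
    """Count, per word, how many candidate characters hit the word.

    A candidate character at position pos counts for a word when the word
    contains it anywhere from pos - window to pos + window (assumed width).
    """
    counts = {}
    for pos, ch in enumerate(candidate_chars):
        lo, hi = max(0, pos - window), pos + window + 1
        for word in dictionary:
            if ch in word[lo:hi]:
                counts[word] = counts.get(word, 0) + 1
    return counts


def pseudo_match_rates(candidate_chars, dictionary, window=1):
    """Addition count divided by the number of characters in the word."""
    counts = addition_counts(candidate_chars, dictionary, window)
    return {word: count / len(word) for word, count in counts.items()}
```

With the candidate string 「東京」, 「東京」 receives two additions (rate 1.0) and 「京都」 receives one from 「京」 (rate 0.5). With the reversed string 「京東」, 「東京」 still reaches 1.0 even though the character order is wrong, which is why the rate is only a pseudo match rate: it needs no alignment, but it does not check order.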

Grouping and ranking unit 27 sorts the candidate words in descending order of pseudo character match rate, so that candidate words with a high pseudo character match rate, as obtained by pseudo character match rate calculation unit 26, become the targets of rematching. However, some candidate words differ only slightly in pseudo character match rate. Simply using the rank by pseudo character match rate as the evaluation measure, and applying a threshold to it to separate rematching targets from excluded words, would mean deciding acceptance among similar words, which are inherently hard to judge correct or incorrect, solely on a slight difference in pseudo character match rate. That is unreasonable as a decision method.

For this reason, grouping and ranking unit 27 treats similar candidate words with close pseudo character match rates as a single group and assigns a rank to each group.

Candidate word acceptance determination unit 28, the final stage of narrowing unit 14 based on pseudo character match rate, extracts as rematching targets those grouped and ranked candidate words that rank above a preset threshold on the evaluation measure (the group rank).
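Grouping, ranking, and acceptance can be sketched as follows. The patent states only that words whose rates fall within a predetermined step size form one group; measuring the step from the highest rate in the current group is an assumption of this sketch, as are the names `group_and_rank` and `select_candidates`.

```python
def group_and_rank(rates, step):
    """Return (word, rate, group rank) triples in descending rate order.

    A new group starts when a word's rate falls more than `step` below the
    top rate of the current group (assumed grouping rule).
    """
    ordered = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
    ranked = []
    rank = 0
    group_top = None
    for word, rate in ordered:
        if group_top is None or group_top - rate > step:
            rank += 1
            group_top = rate
        ranked.append((word, rate, rank))
    return ranked


def select_candidates(rates, step, max_rank):
    """Words whose group rank is at or above the threshold rank."""
    return [word for word, _, rank in group_and_rank(rates, step)
            if rank <= max_rank]
```

For rates `{"a": 1.0, "b": 0.95, "c": 0.7, "d": 0.68, "e": 0.3}` with a step of 0.1 and a threshold rank of 2, the groups are {a, b}, {c, d}, and {e}, and the first two groups are kept. Words c and d are accepted or rejected together, so the narrow gap between them never decides the outcome.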

With narrowing unit 14 based on pseudo character match rate as described above, the N candidate words for which determination processing unit 4 judged rematching necessary can be narrowed down to the n candidate words most likely to be correct. Apart from the value used to compute the match rate, the processing of narrowing unit 14 is identical to that of the conventional narrowing unit 13 based on character match rate.

[Effects of the Invention] As described above, according to the present invention, the scores of the characters in the character candidate string are added to the corresponding words, the number of additions is stored, the words are narrowed down to a smaller number based on that count, and only then are the matching and non-matching character portions identified and the words rematched. The processing load is reduced by the amount of this narrowing, and processing speed can be increased.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a maximum likelihood word recognition method according to one embodiment of the present invention. FIG. 2 is a block diagram showing the detailed configuration of the narrowing unit based on pseudo character match rate used in the method of FIG. 1. FIG. 3 is a block diagram showing the configuration of a conventional maximum likelihood word recognition method.

  • 1: search processing unit
  • 3: sort processing unit
  • 4: determination processing unit
  • 6: unmatched character position search processing unit
  • 11: word dictionary
  • 12: evaluation value table
  • 14: narrowing unit based on pseudo character match rate
  • 15: addition count aggregation table
  • 20: evaluation value addition processing unit
  • 31: word matching processing unit

Claims (2)

[Claims]
(1) A maximum likelihood word recognition method that searches a word dictionary for a word matching a Japanese character string written at an arbitrary pitch, the method having: likelihood search means that, for every character in the character candidate string obtained by character segmentation and character recognition of an input character string, searches for words containing that character, adds the score of the character to the score of each such word, and judges words with higher resulting scores to be more similar; and rematching means that checks, for a predetermined number of high-likelihood words, whether all constituent characters are contained in the character candidate string, judges a word whose characters are all contained to be a word matching the input character candidate string, and, when no match is found, searches the predetermined number of words for the matching and non-matching portions between the character candidate string and each word, performs character region identification and character recognition again on the non-matching portions, and determines the maximum likelihood word; the method being characterized by comprising: addition count storage means that stores, for each word, the number of additions made when character scores are added as word scores; and rematching means that performs rematching only for those words, fewer in number than the predetermined number, whose pseudo character match rate, obtained by dividing the addition count by the number of constituent characters of the word, is at or above a predetermined value.
(2) The maximum likelihood word recognition method according to claim (1), characterized in that the rematching means sorts the words by the pseudo character match rate, treats words whose pseudo character match rates fall within a predetermined step size as one group, assigns ranks to the groups in descending order of pseudo character match rate, and performs rematching for those words, fewer in number than the predetermined number, whose rank is at or above a predetermined rank.
JP1292226A 1989-11-13 1989-11-13 Maximum likelihood word recognizing system Pending JPH03154985A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1292226A JPH03154985A (en) 1989-11-13 1989-11-13 Maximum likelihood word recognizing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1292226A JPH03154985A (en) 1989-11-13 1989-11-13 Maximum likelihood word recognizing system

Publications (1)

Publication Number Publication Date
JPH03154985A true JPH03154985A (en) 1991-07-02

Family

ID=17779129

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1292226A Pending JPH03154985A (en) 1989-11-13 1989-11-13 Maximum likelihood word recognizing system

Country Status (1)

Country Link
JP (1) JPH03154985A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6738519B1 (en) 1999-06-11 2004-05-18 Nec Corporation Character recognition apparatus
WO2011036830A1 (en) * 2009-09-24 2011-03-31 日本電気株式会社 Word recognition device, method, non-transitory computer readable medium storing program and shipped item classification device
JP5621777B2 (en) * 2009-09-24 2014-11-12 日本電気株式会社 Non-transitory computer-readable medium storing word recognition device, method and program, and shipment sorting device
US9101961B2 (en) 2009-09-24 2015-08-11 Nec Corporation Word recognition apparatus, word recognition method, non-transitory computer readable medium storing word recognition program, and delivery item sorting apparatus
CN101840500A (en) * 2010-06-01 2010-09-22 福建新大陆电脑股份有限公司 Device based on confidence for code word decoding and method
