JP3339879B2

JP3339879B2 - Character recognition device

Info

Publication number: JP3339879B2
Application number: JP15984292A
Authority: JP
Inventors: 一弘萱嶋; 寿男丹羽; 英嗣前川; 泰治〆木
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1992-06-18
Filing date: 1992-06-18
Publication date: 2002-10-28
Anticipated expiration: 2017-10-28
Also published as: JPH064716A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention: The present invention relates to a character recognition device that reads and recognizes the characters of text such as documents.

[0002]

2. Description of the Related Art: In recent years, with the development of databases, demand has grown for character recognition devices that are both fast and highly accurate.

[0003] A conventional character recognition device is described, for example, in IPSJ Transactions (Transactions of the Information Processing Society of Japan), Vol. 30, No. 11, pp. 1394-1401. FIG. 4 shows this conventional device. Its character correction unit comprises a word search unit 3, a word dictionary 4, a phrase search unit 6, a grammar dictionary 7, a phrase evaluation value calculation unit 9, and so on, and it receives n candidate characters per character from the character recognition unit 1. The word search unit 3 uses the word dictionary 4 to obtain a set of candidate words from the set of candidate characters. The phrase search unit 6 uses the grammar dictionary 7 to select candidate phrases (bunsetsu) from the set of candidate words, yielding a set of candidate phrases. For each candidate phrase, the phrase evaluation value calculation unit 9 combines the evaluation value from the character recognition unit 1, the word frequency, the character length, and similar factors into a phrase evaluation value that indicates how likely the phrase is to be correct. The phrase selection unit 10 selects the phrase judged most likely to be correct on the basis of this value, producing a corrected character string 11.
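The word search step can be pictured as a search over a lattice of candidate characters. The sketch below is a minimal illustration under stated assumptions, not the device's implementation: it assumes the candidates arrive as one list per character position and the word dictionary is a Python set, and it enumerates combinations by brute force (a trie over the dictionary would prune this in practice). The function name `find_candidate_words`, the `max_len` limit, and the toy data are hypothetical.

```python
from itertools import product


def find_candidate_words(candidates, dictionary, max_len=4):
    """Return (start, end, word) for every dictionary word spelled by the lattice.

    candidates[j] holds the n candidate characters for position j of the
    recognized string; dictionary is a set of words (hypothetical format).
    """
    found = []
    m = len(candidates)
    for start in range(m):
        for length in range(1, max_len + 1):
            end = start + length
            if end > m:
                break
            # Try every combination of candidates over positions start..end-1.
            for chars in product(*candidates[start:end]):
                word = "".join(chars)
                if word in dictionary:
                    found.append((start, end, word))
    return found


# Toy lattice: two candidate characters per position (hypothetical data).
lattice = [["認", "詔"], ["識", "織"], ["率", "卒"]]
words = {"認識", "認識率", "率", "識別"}
print(find_candidate_words(lattice, words))
# [(0, 2, '認識'), (0, 3, '認識率'), (2, 3, '率')]
```

「識別」 is never found because 「別」 is not among any position's candidates, which is the limitation described in [0006]: a word whose correct character lies outside the n candidates cannot be recovered by this search alone.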

[0004] As described above, using a word dictionary and a grammar dictionary allows characters that the character recognition unit 1 alone cannot reliably determine to be corrected with knowledge of words and grammar.

[0005]

Problems to be Solved by the Invention: However, in the above device, the knowledge used to correct the recognized characters output from the character recognition unit 1 is knowledge of the words and grammar of general text. Yet many documents have their own distinctive characteristics; a patent document, for example, contains many words specific to patents. The writing style and the frequency of the words used thus usually differ with a document's content, but this document-specific information was not used to correct recognized characters.

[0006] Furthermore, the conventional method corrects a character by choosing the correct one from the n candidate characters output by the character recognition unit 1, so if the correct character is not among those n candidates, correction is impossible.

[0007] Consequently, because the information specific to each document is not used, the character recognition rate remains low.

[0008] SUMMARY OF THE INVENTION: In view of these conventional problems, an object of the present invention is to provide a character recognition device that achieves a higher character recognition rate.

[0009]

Means for Solving the Problems: The present invention is a character recognition device comprising: candidate character recognition means that reads the character string of a text to be recognized and obtains a candidate character group; word and phrase search means that obtains a candidate word group from a word dictionary and candidate character strings created from the candidate character group, and obtains a candidate phrase group from that candidate word group and a grammar dictionary; evaluation value calculation means that calculates, for each phrase of at least the candidate phrase group, an evaluation value taking into account at least lexical and grammatical correctness; phrase selection means that selects phrases from the candidate phrase group according to the calculation result and outputs a selected character string formed from the selected phrases; keyword extraction means that extracts keywords from the output selected character string according to a predetermined criterion; partial match search means that performs a partial match search between the candidate character group and the keywords; and candidate word addition means that adds the partially matched keywords to the candidate word group as candidate words. The selected character string output by the phrase selection means consists only of words contained in the candidate word group. The predetermined criterion is, for a candidate word, the sum over all phrases of the text to be recognized that contain that word of the larger of 0 and the difference between the phrase's evaluation value and the word's general frequency. The read character string is recognized using this candidate word group.

[0010]

[0011]

Operation: In the present invention, the candidate character recognition means reads the character string of the text to be recognized and obtains a candidate character group. The word and phrase search means obtains a candidate word group from the word dictionary and the candidate character strings created from the candidate character group, and obtains a candidate phrase group from that candidate word group and the grammar dictionary. The evaluation value calculation means calculates, for each phrase of at least the candidate phrase group, an evaluation value that takes into account at least lexical and grammatical correctness. According to the result, the phrase selection means selects phrases from the candidate phrase group and outputs the selected character string formed from them. The keyword extraction means extracts keywords from that selected character string according to a predetermined criterion, the partial match search means performs a partial match search between the candidate character group and the keywords, and the candidate word addition means adds the partially matched keywords to the candidate word group as candidate words. The read character string is then recognized using this candidate word group. The selected character string output by the phrase selection means consists only of words contained in the candidate word group, and the predetermined criterion is, for each candidate word, the sum over all phrases of the text to be recognized that contain it of the larger of 0 and the difference between the phrase evaluation value and the word's general frequency.

[0012]

[0013]

DESCRIPTION OF THE PREFERRED EMBODIMENTS: The present invention is described below with reference to the drawings, which show an embodiment of it.

[0014] FIG. 1 is a block diagram of a character recognition device according to one embodiment of the present invention. The character recognition unit 1 performs character recognition on character images and, for a character string of length m, outputs a candidate character set 2 (candidate character group) holding n candidates per character, from the first candidate character to the n-th.

[0015] The word search unit 3 searches the word dictionary 4 and, from the combinations of characters in the candidate character set 2 (candidate character strings), selects the candidate words that match words in the word dictionary 4, creating a candidate word set 5 (candidate word group). The phrase search unit 6 refers to the grammar dictionary 7 to select candidate phrases, that is, combinations of words that can form a phrase, creating a candidate phrase set 8 (candidate phrase group). The phrase evaluation value calculation unit 9 computes an evaluation value (Equation 1) for each phrase found by the phrase search unit 6, rating its lexical and grammatical correctness on the basis of such factors as the length and frequency of the words in the phrase.

[Equation 1]
$$E = aE_a + bE_b + cE_c + dE_d$$
where E_a is the output of the character recognition unit 1, E_b is the phrase length, E_c is the occurrence frequency of the independent words contained in the phrase, E_d is the word length, and a, b, c, and d are constants.

The phrase selection unit 10 selects the phrases with the largest evaluation values from among the phrase candidates and generates the corrected character string 11 (selected character string). The evaluation value of the i-th phrase in the corrected character string 11 is denoted E_i (i = 1, ..., B), where B is the number of phrases in the entire document.
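Equation 1 and the selection step can be illustrated with a short Python sketch. The constants a, b, c, d and all feature values below are hypothetical, since the source does not give them, and `select_phrase` (a hypothetical name) only chooses among alternatives for a single span; it does not model how the selected phrases tile the whole string.

```python
from dataclasses import dataclass


@dataclass
class PhraseFeatures:
    e_a: float  # E_a: evaluation value from character recognition unit 1
    e_b: float  # E_b: phrase length
    e_c: float  # E_c: occurrence frequency of independent words in the phrase
    e_d: float  # E_d: word length


def phrase_score(f, a=1.0, b=1.0, c=1.0, d=1.0):
    """Equation 1: E = a*E_a + b*E_b + c*E_c + d*E_d."""
    return a * f.e_a + b * f.e_b + c * f.e_c + d * f.e_d


def select_phrase(alternatives, **weights):
    """Pick the (text, features) alternative with the largest E."""
    return max(alternatives, key=lambda alt: phrase_score(alt[1], **weights))


# Hypothetical alternatives for one span: the second has the better
# recognition score but contains no frequent independent word.
alts = [
    ("現場学習", PhraseFeatures(e_a=0.6, e_b=4, e_c=0.3, e_d=2)),
    ("現場字習", PhraseFeatures(e_a=0.7, e_b=4, e_c=0.0, e_d=1)),
]
print(select_phrase(alts, a=1.0, b=0.1, c=2.0, d=0.5)[0])  # 現場学習
```

With these weights the first alternative scores 2.6 against 1.6, so the lexical terms outweigh the slightly higher recognition score E_a of the second.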

[0016] The keyword extraction unit 12 extracts the keywords of the document being recognized from the corrected character string 11 output by the phrase selection unit 10, creating a keyword set 13. For this it uses the evaluation values of the phrases in the corrected character string and general frequency information for words. For example, the keyword likelihood K_w of a word w (how readily w becomes a keyword) can be obtained by Equation 2: for every phrase in the document that contains the word w, take the difference between the phrase's evaluation value and the general frequency of w, and sum the larger of that difference and 0.

[0017]

[Equation 2]
$$K_w = \sum_{i \,:\, w \in S_i} \max\left(E_i - F_w,\ 0\right)$$

[0018] Here S_i denotes the character string contained in phrase i, and F_w denotes the general frequency of the word w. Keywords are extracted by ranking words in descending order of the keyword likelihood K_w calculated by Equation 2. Table 1 shows the keywords extracted by this embodiment from a document related to neural networks.
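Equation 2 and the ranking step translate directly into a few lines of Python. This is a sketch under assumptions: phrases are given as (S_i, E_i) pairs, F_w is assumed to be on the same scale as E_i, a word missing from the frequency table is assumed to have F_w = 0, and "w is contained in S_i" is taken to mean substring containment. The function names and the toy numbers are hypothetical; `top_k=10` follows the "about 10 keywords" guidance in [0031].

```python
def keyness(word, phrases, general_freq):
    """Equation 2: K_w = sum over phrases i with w in S_i of max(E_i - F_w, 0).

    phrases: list of (S_i, E_i) pairs for the phrases of corrected string 11.
    general_freq: maps a word to F_w (assumed 0.0 when absent).
    """
    f_w = general_freq.get(word, 0.0)
    return sum(max(e_i - f_w, 0.0) for s_i, e_i in phrases if word in s_i)


def extract_keywords(vocabulary, phrases, general_freq, top_k=10):
    """Rank words by K_w in descending order and keep the top_k."""
    ranked = sorted(vocabulary,
                    key=lambda w: keyness(w, phrases, general_freq),
                    reverse=True)
    return ranked[:top_k]


phrases = [("学習機能を", 5.0), ("学習した", 4.0), ("ことが", 3.0), ("ことは", 3.5)]
general_freq = {"学習": 1.0, "こと": 6.0}
print(keyness("学習", phrases, general_freq))  # 7.0
print(keyness("こと", phrases, general_freq))  # 0.0
print(extract_keywords(["こと", "学習"], phrases, general_freq, top_k=1))  # ['学習']
```

The max(..., 0) clamp is what keeps common function words out: 「こと」 appears as often as 「学習」 here, but its large F_w drives every E_i - F_w term to zero.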

[0019]

[Table 1]

[0020] Table 1 lists words in descending order of keyword likelihood K_w. As the table shows, the keywords contained in the document can be extracted. If only word frequency information were used, frequent words such as 「こと」 (koto, "thing, fact") and 「もの」 (mono, "thing") would be extracted. Because extraction is based on K_w instead, and the large general frequency F_w of such words makes E_i - F_w small or negative, these frequent words are not extracted as keywords; only words that represent the content of the document are, so the keywords obtained characterize the text well.

[0021] The keyword partial match search unit 14 performs a partial match search between the obtained keywords and the set of candidate character strings. For example, if 「認識」 ("recognition") is extracted as a keyword, strings in the corrected character string of the form 「認＊」 or 「＊識」, where ＊ stands for any character, are extracted as partial matches.
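A partial match search over the candidate lattice can be sketched as follows. The source does not say how many characters must agree, so the `min_hits` threshold is a hypothetical parameter, as are the function name and the toy lattice. The sketch uses only the first candidate per position, which [0029] allows in order to reduce computation, and it skips full matches because ordinary word search already finds those.

```python
def partial_match_spans(candidates, keyword, min_hits=1):
    """Find spans where `keyword` partially matches the candidate lattice.

    A position counts as a hit when the keyword's character at that offset is
    among the position's candidate characters. Spans where every position hits
    are skipped, since ordinary word search already finds exact matches.
    min_hits is a hypothetical threshold; the source does not specify one.
    """
    k = len(keyword)
    spans = []
    for start in range(len(candidates) - k + 1):
        hits = sum(ch in candidates[start + j] for j, ch in enumerate(keyword))
        if min_hits <= hits < k:
            spans.append((start, start + k))
    return spans


# Hypothetical top-1 lattice in which the second 「認識」 was misread as 「認織」.
top1 = [["文"], ["字"], ["認"], ["識"], ["の"], ["認"], ["織"]]
print(partial_match_spans(top1, "認識"))  # [(5, 7)]
```

For longer keywords a threshold of one matching character would be far too permissive; a ratio such as "at least half of the characters" is one plausible alternative.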

[0022] The candidate word addition unit 15 adds each partially matched keyword to the candidate word set 5 as a candidate word for that candidate character string. In the example above, the partial matches 「認＊」 and 「＊識」 are added to the candidate word set 5 as 「認識」. This allows characters that the character recognition unit 1 did not output to enter the candidates. Because the keywords used represent the characteristics of the text, this processing follows those characteristics.
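Adding a partially matched keyword as a new candidate word can be sketched as below. The `CandidateWord` record, its field names, and the function name are hypothetical; the choice to give the added word the lowest recognition score E_a follows [0030], and the actual value of that lowest score (`lowest_e_a`) depends on the recognizer's scale, so it is left as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateWord:
    start: int    # first position in the recognized string
    end: int      # one past the last position
    text: str
    e_a: float    # character-recognition evaluation value E_a


def add_keyword_candidates(word_set, spans, keyword, lowest_e_a):
    """Return a new set with `keyword` added as a candidate word for each span.

    The added word gets the lowest recognition score (lowest_e_a), because
    character recognition unit 1 never proposed its missing characters.
    """
    updated = set(word_set)
    for start, end in spans:
        updated.add(CandidateWord(start, end, keyword, lowest_e_a))
    return updated


word_set = {CandidateWord(0, 2, "文字", 0.9)}
updated = add_keyword_candidates(word_set, [(5, 7)], "認識", lowest_e_a=0.0)
print(sorted(w.text for w in updated))
```

Returning a new set rather than mutating the input keeps the first-pass candidate word set available for comparison with the second pass.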

[0023] In addition, if a candidate phrase contains a keyword, the keyword weighting unit 16 increases its evaluation value according to Equation 3. Keywords that represent the characteristics of the text thus raise the evaluation of candidate phrases, enabling corrections that exploit those characteristics.

[0024]

[Equation 3]

[0025] The character recognition unit 1 above constitutes the candidate character recognition means; the word search unit 3 and the phrase search unit 6 constitute the word and phrase search means; the phrase evaluation value calculation unit 9 constitutes the evaluation value calculation means; the phrase selection unit 10 constitutes the phrase selection means; the keyword extraction unit 12 constitutes the keyword extraction means; the keyword weighting unit 16 constitutes keyword weighting means; the keyword partial match search unit 14 constitutes the partial match search means; and the candidate word addition unit 15 constitutes the candidate word addition means. All parts other than the character recognition unit 1 together constitute the character correction unit.

[0026] Next, the operation of the above embodiment is described.

[0027] First, the document to be recognized is processed by the character recognition unit 1 to obtain a candidate character set 2 containing, for example, the top 10 candidates per character. Next, the word search unit 3 selects combinations of candidate characters that match words in the word dictionary 4, and the phrase search unit 6 refers to the grammar dictionary 7 to select combinations of words that can form phrases. The phrase evaluation value calculation unit 9 then calculates the evaluation values of the phrases found by the phrase search unit 6. On the basis of these values, the phrase selection unit 10 selects the correct phrases from among the phrase candidates and outputs the first-pass corrected character string 11.

[0028] Next, the keyword extraction unit 12 calculates the keyword likelihood K_w and extracts the keywords.

[0029] The keyword partial match search unit 14 performs a partial match search between the obtained keywords and the candidate character set 2. To reduce computation, the candidate character set 2 used here may be restricted to the top-ranked characters from the character recognition unit 1, for example only the first candidate. This causes no problem if the character recognition unit 1 has a high recognition rate.

[0030] Next, the candidate word addition unit 15 adds the keywords obtained by partial matching to the candidate word set 5 as candidate words. For a character corrected in this way, it is appropriate here to assign the lowest evaluation value as its character recognition evaluation value E_a.

[0031] About the top 10 keywords should be used; for long texts, however, it is better to use more.

[0032] The phrase search unit 6 and the phrase evaluation value calculation unit 9 then select candidate phrases again, this time using the candidate word set 5 with the keywords added as candidate words, and compute phrase evaluation values for the newly selected candidate phrases.

[0033] Next, the keyword weighting unit 16 corrects the phrase evaluation values according to Equation 3, reflecting the characteristics of the text.

[0034] Finally, the phrase selection unit 10 selects the phrases with large evaluation values from among the phrase candidates and outputs the final, second-pass corrected character string 11.

[0035] FIGS. 2 and 3 show part of the results of character recognition performed with this embodiment.

[0036] FIG. 2 shows the effect of the keyword weighting unit 16. In FIG. 2, seven candidate phrases were generated for the character string 「現場学習機能を」 ("on-site learning function" followed by an object particle). In the first-pass result the phrase 「学習」 ("learning") was not selected, but because the keyword information contained the word 「学習」, the evaluation value of that phrase rose in the second pass and the characters were corrected properly. Thus both passes use evaluation values to select phrases that are lexically and grammatically correct, and the second pass additionally uses keyword information to select a character string that fits the content of the document.

[0037] FIG. 3 shows the result of applying this embodiment to a text about the post-processing effect of character recognition (「文字認識の後処理効果」) and illustrates the effect of the keyword partial match search unit 14 and the candidate word addition unit 15. In the output of the character recognition unit 1, the character 「識」 of the second occurrence of 「認識」 ("recognition") is not among the candidate characters, so the conventional method cannot correct that word. Because the present method can extract keywords based on the characteristics of the text, as shown in FIG. 3, the keyword partial match search brought 「認識」 into the candidates, and the correct text was obtained after the second-pass correction.

[0038] In the above embodiment, keywords are extracted from the character string once processed by the character correction unit, and the extracted keyword information is used to process the string again in the character correction unit (two-pass recognition). Alternatively, keywords may be extracted again from that result and the processing in the character correction unit repeated one or more further times (n-pass recognition). In this case more accurate keyword information is obtained and the recognition rate improves.
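The n-pass scheme is a simple loop around the character correction unit. In this sketch `correct` and `extract_keywords` are hypothetical callables standing in for one full correction pass and for keyword extraction unit 12; the toy stand-ins mimic the situation in [0037], where a correctly read first occurrence of 「認識」 supplies the keyword that fixes a misread second occurrence.

```python
from typing import Callable, List, Sequence


def n_pass_recognize(
    correct: Callable[[Sequence[str]], str],
    extract_keywords: Callable[[str], List[str]],
    passes: int = 2,
) -> str:
    """Run the character correction unit `passes` times.

    correct(keywords) stands for one pass of the correction unit (word search,
    phrase search, evaluation, selection) using the given keywords; the first
    pass uses none. extract_keywords(text) stands for keyword extraction.
    Both callables are hypothetical placeholders.
    """
    keywords: List[str] = []
    result = correct(keywords)
    for _ in range(passes - 1):
        keywords = extract_keywords(result)
        result = correct(keywords)
    return result


# Toy stand-ins: the misread 「認織」 is fixed once 「認識」 is a keyword.
def toy_correct(keywords):
    return "認識と認識" if "認識" in keywords else "認識と認織"


def toy_extract(text):
    return ["認識"] if "認識" in text else []


print(n_pass_recognize(toy_correct, toy_extract, passes=1))  # 認識と認織
print(n_pass_recognize(toy_correct, toy_extract, passes=2))  # 認識と認識
```

passes=2 reproduces the two-pass recognition of the embodiment; larger values give n-pass recognition.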

[0039] Also, in the above embodiment, characters obtained from outside the candidates (that is, characters obtained by adding a keyword to the candidate word set as a candidate word) are given the lowest value as the evaluation value E_a of the character recognition unit 1. Alternatively, the extracted character may be sent back to the character recognition unit 1 so that the evaluation value E_a is recalculated for that character, which allows a more accurate evaluation.

[0040] Also, in the above embodiment, keywords are extracted using phrase evaluation values and general word frequency information, but the invention is not limited to this. Keywords may instead be extracted using the number of times a candidate character string that can become a candidate word appears in the text to be recognized, together with a dictionary of that candidate word's frequency of occurrence in general text. Other extraction methods may also be used.
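The alternative in [0040] names two inputs, occurrence counts in the target text and a general-text frequency dictionary, but not how they are combined. The sketch below assumes one plausible combination, the ratio of in-document relative frequency to general relative frequency with a small `floor` to avoid division by zero; the function name, the ratio, the floor, and the toy numbers are all assumptions.

```python
from collections import Counter


def keywords_by_frequency_ratio(doc_words, general_freq, top_k=10, floor=1e-6):
    """Rank words by in-document relative frequency over general frequency.

    doc_words: candidate words occurring in the text to be recognized.
    general_freq: relative frequency of each word in general text.
    The ratio and the `floor` smoothing constant are assumptions; the source
    names the two inputs but not how they are combined.
    """
    counts = Counter(doc_words)
    total = sum(counts.values())

    def score(word):
        return (counts[word] / total) / max(general_freq.get(word, 0.0), floor)

    return sorted(counts, key=score, reverse=True)[:top_k]


doc = ["こと"] * 5 + ["学習"] * 3 + ["機能"]
freq = {"こと": 0.05, "学習": 0.0001, "機能": 0.001}
print(keywords_by_frequency_ratio(doc, freq, top_k=2))  # ['学習', '機能']
```

Unlike Equation 2, this variant ignores phrase evaluation values, so misrecognized words count as much as reliable ones; that is one reason the embodiment's criterion may be preferable.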

[0041] Also, in the above embodiment, characters are recognized using both the keyword partial match search and the keyword weighting, but the invention is not limited to this; characters may be recognized using only one of the two.

[0042]

Effects of the Invention: As is clear from the above, the present invention has the advantage of increasing the character recognition rate.

[Brief description of the drawings]

FIG. 1 is a block diagram of a character recognition device according to one embodiment of the present invention.

FIG. 2 is an output diagram of experimental results showing the effect of keyword weighting in the embodiment.

FIG. 3 is an output diagram of experimental results showing the effect of the keyword partial match search unit and the candidate word addition unit in the embodiment.

FIG. 4 is a block diagram of a conventional character recognition device.

[Explanation of symbols]

1 Character recognition unit
3 Word search unit
4 Word dictionary
6 Phrase search unit
7 Grammar dictionary
9 Phrase evaluation value calculation unit
10 Phrase selection unit
12 Keyword extraction unit
14 Keyword partial match search unit
15 Candidate word addition unit
16 Keyword weighting unit

Continuation of front page
(72) Inventor: 前川英嗣, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor: 〆木泰治, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(56) References cited: 「文字認識後処理方法と後処理による効果の分析」 (Post-processing methods for character recognition and an analysis of their effects), IEICE Technical Report PRL191-135, Vol. 91, No. 478, pp. 71-78 (1992-02-12)
(58) Field searched (Int.Cl.⁷, DB name): G06K 9/72

Claims

(57) [Claims]

1. A character recognition device comprising: candidate character recognition means for reading a character string of a text to be recognized to obtain a candidate character group; word and phrase search means for obtaining a candidate word group from a word dictionary and candidate character strings created from the candidate character group, and for obtaining a candidate phrase group from the candidate word group and a grammar dictionary; evaluation value calculation means for calculating, for each phrase of at least the candidate phrase group, an evaluation value that takes into account at least lexical and grammatical correctness; phrase selection means for selecting phrases from the candidate phrase group according to the calculation result and outputting a selected character string formed from the selected phrases; keyword extraction means for extracting keywords from the output selected character string according to a predetermined criterion; partial match search means for performing a partial match search between the candidate character group and the keywords; and candidate word addition means for adding the partially matched keywords to the candidate word group as candidate words; wherein the selected character string output from the phrase selection means consists only of words contained in the candidate word group; the predetermined criterion is, for each candidate word, the sum over all phrases of the text to be recognized that contain the candidate word of the larger of 0 and the difference between the evaluation value of the phrase and the general frequency of the candidate word; and the read character string is recognized using the candidate word group.