JPH064716A - Character recognizing device

Info

Publication number: JPH064716A
Application number: JP4159842A
Authority: JP
Inventors: Kazuhiro Kayashima (萱嶋 一弘); Toshio Niwa (丹羽 寿男); Hidetsugu Maekawa (前川 英嗣); Taiji Shimeki (〆木 泰治)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1992-06-18
Filing date: 1992-06-18
Publication date: 1994-01-14
Anticipated expiration: 2017-10-28
Also published as: JP3339879B2

Abstract

PURPOSE: To improve the character recognition rate by extracting keywords and correcting characters based on an evaluation criterion matched to the contents of the document. CONSTITUTION: An input document image is processed once by a character recognizing part 1, a word retrieving part 3, a clause retrieving part 6, a clause evaluation value calculating part 9, and a clause selecting part 10 to output a corrected character string 11. Keywords are extracted from the corrected character string 11 by a keyword extracting part 12. A keyword partial coincidence retrieval part 14 and a candidate word adding part 15 use the keywords to restore characters that were not among the candidates, and a keyword weight adding part 16 adds a weight to the evaluation value. The corrected character string 11 is then updated to recognize the input characters.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device for reading and recognizing the characters of text in documents and the like.

[0002]

[Prior Art] In recent years, with the development of databases, demand has grown for character recognition devices that are fast and have a high recognition rate.

[0003] A conventional character recognition device is described, for example, in IPSJ Journal (情報処理学会論文誌), Vol. 30, No. 11, pp. 1394-1401. FIG. 4 shows a conventional character recognition device. The character correction unit of the device comprises a word search unit 3, a word dictionary 4, a phrase search unit 6, a grammar dictionary 7, a phrase evaluation value calculation unit 9, and so on, and receives as input n candidate characters per character from the character recognition unit 1. The word search unit 3 uses the word dictionary 4 to obtain a set of candidate words from the set of candidate characters. The phrase search unit 6 uses the grammar dictionary 7 to select candidate phrases from the set of candidate words and thereby obtains a set of candidate phrases. For each candidate phrase, the phrase evaluation value calculation unit 9 evaluates the evaluation value from the character recognition unit 1, the word frequency, the character length, and the like, and derives a phrase evaluation value indicating how likely the phrase is to be correct. The phrase selection unit 10 selects the phrase judged most likely to be correct on the basis of the phrase evaluation value and obtains the corrected character string 11.

[0004] By using a word dictionary and a grammar dictionary in this way, characters that are difficult for the character recognition unit 1 to judge on its own can be corrected using knowledge of words and grammar.

[0005]

[Problems to Be Solved by the Invention] However, in the character recognition device described above, the knowledge used to correct the recognized characters output from the character recognition unit 1 is knowledge of words and grammar for general text. Yet many documents have characteristics of their own. A patent document, for example, contains many words specific to patents. The writing style and the frequencies of the words used thus normally differ with the content of the document, but such document-specific information was not used to correct recognized characters.

[0006] Furthermore, in the conventional method, correction selects the correct character from the n candidate characters output by the character recognition unit 1, so correction was impossible if the correct character was not among the n candidates.

[0007] Consequently, because the document-specific information is not used, there is the problem that the character recognition rate is low.

[0008] In view of these problems of the prior art, an object of the present invention is to provide a character recognition device that can raise the character recognition rate.

[0009]

[Means for Solving the Problems] The invention of claim 1 is a character recognition device comprising: candidate character recognition means for reading the character string of a text to be recognized to obtain a candidate character group; word and phrase search means for obtaining a candidate word group from a word dictionary and candidate character strings created from the candidate character group, and for obtaining a candidate phrase group from the candidate word group and a grammar dictionary; evaluation value calculation means for calculating, for each phrase of at least the candidate phrase group, an evaluation value that takes at least lexical and grammatical correctness into account; phrase selection means for selecting phrases from the candidate phrase group according to the calculation result and outputting a selected character string formed from the selected phrases; keyword extraction means for extracting keywords from the output selected character string on the basis of a predetermined criterion; and keyword weighting means for weighting the evaluation value of each phrase that contains a keyword, wherein the read character string is recognized using the weighting.

[0010] The invention of claim 2 is a character recognition device comprising: candidate character recognition means for reading the character string of a text to be recognized to obtain a candidate character group; word and phrase search means for obtaining a candidate word group from a word dictionary and candidate character strings created from the candidate character group, and for obtaining a candidate phrase group from the candidate word group and a grammar dictionary; evaluation value calculation means for calculating, for each phrase of at least the candidate phrase group, an evaluation value that takes at least lexical and grammatical correctness into account; phrase selection means for selecting phrases from the candidate phrase group according to the calculation result and outputting a selected character string formed from the selected phrases; keyword extraction means for extracting keywords from the output selected character string on the basis of a predetermined criterion; partial match search means for performing a partial match search between the candidate character group and the keywords; and candidate word addition means for adding each partially matched keyword to the candidate word group as a candidate word, wherein the read character string is recognized using that candidate word group.

[0011]

[Operation] In the invention of claim 1, the candidate character recognition means reads the character string of the text to be recognized and obtains a candidate character group; the word and phrase search means obtains a candidate word group from the word dictionary and the candidate character strings created from the candidate character group, and obtains a candidate phrase group from the candidate word group and the grammar dictionary; the evaluation value calculation means calculates, for each phrase of at least the candidate phrase group, an evaluation value that takes at least lexical and grammatical correctness into account; the phrase selection means selects phrases from the candidate phrase group according to the calculation result and outputs a selected character string formed from the selected phrases; the keyword extraction means extracts keywords from the output selected character string on the basis of a predetermined criterion; and the keyword weighting means weights the evaluation value of each phrase that contains a keyword. The read character string is recognized using that weighting.

[0012] In the invention of claim 2, the candidate character recognition means reads the character string of the text to be recognized and obtains a candidate character group; the word and phrase search means obtains a candidate word group from the word dictionary and the candidate character strings created from the candidate character group, and obtains a candidate phrase group from the candidate word group and the grammar dictionary; the evaluation value calculation means calculates, for each phrase of at least the candidate phrase group, an evaluation value that takes at least lexical and grammatical correctness into account; the phrase selection means selects phrases from the candidate phrase group according to the calculation result and outputs a selected character string formed from the selected phrases; the keyword extraction means extracts keywords from the output selected character string on the basis of a predetermined criterion; the partial match search means performs a partial match search between the candidate character group and the keywords; and the candidate word addition means adds each partially matched keyword to the candidate word group as a candidate word. The read character string is recognized using that candidate word group.

[0013]

[Embodiments] The present invention is described below with reference to the drawings showing an embodiment.

[0014] FIG. 1 is a block diagram of a character recognition device according to one embodiment of the present invention. The character recognition unit 1 performs character recognition on a character image and outputs a candidate character set 2 (candidate character group): for a character string of length m, it holds n candidate characters per character, from the first candidate to the n-th candidate.

[0015] The word search unit 3 searches the word dictionary 4 and, from the combinations of the candidate character set 2 (candidate character strings), selects candidate words that match words in the word dictionary 4, creating a candidate word set 5 (candidate word group). The phrase search unit 6 refers to the grammar dictionary 7 to select candidate phrases, that is, combinations of words that can form a phrase, creating a candidate phrase set 8 (candidate phrase group). The phrase evaluation value calculation unit 9 calculates an evaluation value (Equation 1) of the lexical and grammatical correctness of each phrase found by the phrase search unit 6, using criteria such as the length and frequency of the words in the phrase.

[Equation 1]

$$E = aE_a + bE_b + cE_c + dE_d$$

where $E_a$ is the output of the character recognition unit 1, $E_b$ is the length of the phrase, $E_c$ is the frequency of occurrence of the independent words contained in the phrase, $E_d$ is the length of the word, and $a$, $b$, $c$, $d$ are constants.

The phrase selection unit 10 selects, from among the candidate phrases, the phrases with the largest evaluation values and generates a corrected character string 11 (selected character string). The evaluation values of the phrases in this corrected character string 11 are denoted $E_i$ ($i = 1, \dots, B$), where $B$ is the number of phrases in the entire document.
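The word search and the linear evaluation value of (Equation 1) can be illustrated with a minimal Python sketch. The data layout (a list of `(character, score)` pairs per position), the function names `search_words` and `evaluation_value`, the `max_len` limit, the use of a mean score, and the default weights are all assumptions made for illustration; the source does not specify them.

```python
from itertools import product


def search_words(candidates, dictionary, max_len=4):
    """Find dictionary words that can be spelled from the candidate characters.

    candidates: one list per character position, each holding
                (character, recognition score) pairs (hypothetical layout).
    dictionary: set of known words (stands in for the word dictionary 4).
    Returns (start position, word, mean recognition score) tuples.
    """
    found = []
    m = len(candidates)
    for start in range(m):
        for length in range(1, min(max_len, m - start) + 1):
            # Try every combination of candidates at consecutive positions.
            for combo in product(*candidates[start:start + length]):
                word = "".join(ch for ch, _ in combo)
                if word in dictionary:
                    score = sum(s for _, s in combo) / length
                    found.append((start, word, score))
    return found


def evaluation_value(e_a, e_b, e_c, e_d, a=1.0, b=1.0, c=1.0, d=1.0):
    """Equation 1: E = a*E_a + b*E_b + c*E_c + d*E_d.

    The default weights are placeholders, not values from the source.
    """
    return a * e_a + b * e_b + c * e_c + d * e_d


# Two positions with two candidates each; the scores are made up.
candidates = [[("認", 0.9), ("詔", 0.5)], [("識", 0.8), ("織", 0.6)]]
words = search_words(candidates, {"認識", "織"})
```

Enumerating every combination grows exponentially with word length, so a real device would presumably prune the search with the dictionary; the brute-force loop is used here only to keep the sketch short.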

[0016] The keyword extraction unit 12 extracts the keywords of the document being recognized from the corrected character string 11 output by the phrase selection unit 10 and creates a keyword set 13. To do so, it uses the evaluation values of the phrases in the corrected character string and general frequency information for words. For example, the keyword likelihood $K_w$ of a word $w$ can be obtained by (Equation 2): for every phrase in the document that contains the word $w$, the difference between the phrase's evaluation value and the general frequency of $w$ is computed, and the larger of that difference and 0 is summed over those phrases.

[0017]

[Equation 2]
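The formula itself did not survive in this text. Going only by the verbal description in paragraph [0016] and the symbol definitions in paragraph [0018], one plausible reconstruction (an assumption, not the original typeset formula) is:

$$K_w = \sum_{i \,:\, w \in S_i} \max\left(E_i - F_w,\ 0\right)$$

Here the sum runs over the phrases $i$ ($1 \le i \le B$) of the corrected character string whose string $S_i$ contains the word $w$, $E_i$ is the evaluation value of phrase $i$, and $F_w$ is the general frequency of $w$.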

[0018] Here, $S_i$ denotes the character string contained in phrase $i$, and $F_w$ denotes the general frequency of the word $w$. Keywords are extracted by taking words in descending order of the keyword likelihood $K_w$ calculated by (Equation 2). Table 1 shows the keywords extracted by this embodiment from a document on neural networks.
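The keyword-likelihood computation described in paragraph [0016] can be sketched in Python as follows. This is a hedged illustration: the input layout (each phrase as a list of words plus its evaluation value), the names `keyword_scores` and `top_keywords`, and the treatment of unknown words as having frequency 0 are assumptions, and the general frequency is assumed to be on a scale comparable to the evaluation values.

```python
def keyword_scores(phrases, general_freq):
    """Sum max(E_i - F_w, 0) over the phrases that contain each word w.

    phrases:      list of (words in phrase i, evaluation value E_i) pairs.
    general_freq: dict mapping word w to its general frequency F_w.
    """
    scores = {}
    for words, e_i in phrases:
        for w in set(words):  # count each word once per phrase
            f_w = general_freq.get(w, 0.0)  # assumption: unknown words get 0
            scores[w] = scores.get(w, 0.0) + max(e_i - f_w, 0.0)
    return scores


def top_keywords(scores, k=10):
    """Return the k words with the largest keyword likelihood."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [w for w, _ in ranked[:k]]


# Made-up values: a frequent function word is suppressed by its large F_w.
phrases = [(["学習", "こと"], 5.0), (["学習"], 4.0), (["こと"], 3.0)]
freq = {"学習": 1.0, "こと": 6.0}
keywords = top_keywords(keyword_scores(phrases, freq), k=1)
```

Clamping each term at zero means a word with a high general frequency contributes nothing, which matches the text's observation that common words are not picked up as keywords.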

[0019]

[Table 1]

[0020] Table 1 lists words in descending order of keyword likelihood $K_w$. As the table shows, the keywords contained in the document can be extracted. If only word frequency information were used, frequently occurring words such as 「こと」 and 「もの」 (both roughly "thing") would be extracted. Because extraction is instead based on the keyword likelihood $K_w$, such frequent words are not extracted as keywords; only words that represent the content of the document are extracted, and the resulting keywords characterize the text well.

[0021] The keyword partial match search unit 14 performs a partial match search between the obtained keywords and the set of candidate character strings. For example, if 「認識」 ("recognition") is extracted as a keyword, strings in the corrected character string of the form 「認＊」 or 「＊識」, where ＊ stands for any character, are extracted as partial matches.

[0022] The candidate word addition unit 15 adds each partially matched keyword to the candidate word set 5 as a candidate word for the corresponding candidate character string. In the example above, the partial matches 「認＊」 and 「＊識」 are added to the candidate word set 5 as 「認識」. In this way, characters that were not output by the character recognition unit 1 can be brought in as candidate characters. Because keywords that reflect the characteristics of the text are used, processing that follows those characteristics becomes possible.
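The partial match search and candidate word addition can be sketched as below. This is a simplified illustration that matches keywords against a single string (for instance the first-ranked characters) rather than the full candidate set; the names `partial_matches` and `add_keyword_candidates` and the `min_hits` threshold are hypothetical.

```python
def partial_matches(text, keyword, min_hits=1):
    """Return start positions where `keyword` partially matches `text`.

    A position counts as a partial match when at least `min_hits` characters
    agree but not all of them (an exact match needs no correction).
    """
    n = len(keyword)
    hits = []
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        agree = sum(1 for a, b in zip(window, keyword) if a == b)
        if min_hits <= agree < n:
            hits.append(start)
    return hits


def add_keyword_candidates(text, keywords, candidate_words):
    """Append (start position, keyword) for every partial match found."""
    for kw in keywords:
        for start in partial_matches(text, kw):
            candidate_words.append((start, kw))
    return candidate_words


# 「認織」 agrees with the keyword 「認識」 in its first character only.
added = add_keyword_candidates("文字認織の効果", ["認識"], [])
```

With short keywords a one-character agreement is a weak signal, so a practical system would likely rely on the later evaluation and selection steps to reject spurious additions.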

[0023] The keyword weight addition unit 16 increases the evaluation value of a candidate phrase according to (Equation 3) if the phrase contains a keyword. This raises the evaluation of candidate phrases by means of keywords that represent the characteristics of the text, making correction based on those characteristics possible.

[0024]

[Equation 3]
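The formula of (Equation 3) is not reproduced in this text, so its exact form is unknown. Purely as an illustration of the idea in paragraph [0023], the sketch below raises the evaluation value of any candidate phrase that contains a keyword by a multiplicative factor. The factor `boost`, the multiplicative form, and the function names are assumptions and should not be read as the patent's formula.

```python
def apply_keyword_weight(candidates, keywords, boost=1.2):
    """Raise the evaluation value of phrases that contain a keyword.

    candidates: list of (words in phrase, evaluation value E) pairs.
    boost:      hypothetical factor (> 1); the real Equation 3 may differ.
    """
    kw = set(keywords)
    return [
        (words, e * boost if kw.intersection(words) else e)
        for words, e in candidates
    ]


def select_phrase(candidates):
    """Pick the candidate phrase with the largest evaluation value."""
    return max(candidates, key=lambda c: c[1])


# Made-up values: the keyword lifts the correct phrase above its rival.
candidates = [(["学習"], 4.0), (["字習"], 4.5)]
best = select_phrase(apply_keyword_weight(candidates, ["学習"], boost=2.0))
```

Any monotone increase (additive or multiplicative) would have the same qualitative effect of letting document-specific vocabulary win close decisions.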

[0025] In the above, the character recognition unit 1 constitutes the candidate character recognition means; the word search unit 3 and the phrase search unit 6 constitute the word and phrase search means; the phrase evaluation value calculation unit 9 constitutes the evaluation value calculation means; the phrase selection unit 10 constitutes the phrase selection means; the keyword extraction unit 12 constitutes the keyword extraction means; the keyword weight addition unit 16 constitutes the keyword weighting means; the keyword partial match search unit 14 constitutes the partial match search means; and the candidate word addition unit 15 constitutes the candidate word addition means. The parts other than the character recognition unit 1 constitute the character correction unit.

[0026] Next, the operation of the above embodiment is described.

[0027] First, the document to be recognized is processed by the character recognition unit 1 to obtain a candidate character set 2 containing, for example, up to the 10th-ranked candidates. Next, the word search unit 3 selects the combinations of candidate characters that match words in the word dictionary 4, and the phrase search unit 6 refers to the grammar dictionary 7 to select the combinations of words that can form a phrase. The phrase evaluation value calculation unit 9 then calculates the evaluation values of the phrases found by the phrase search unit 6. Using the calculated evaluation values as the criterion, the phrase selection unit 10 selects the correct phrases from among the candidate phrases and outputs the first-pass corrected character string 11.

[0028] Next, the keyword extraction unit 12 calculates the keyword likelihood $K_w$ and extracts keywords.

[0029] The keyword partial match search unit 14 performs a partial match search between the obtained keywords and the candidate character set 2. To reduce the amount of computation, only the top-ranked characters from the character recognition unit 1, for example only the first-ranked candidates, may be used as the candidate character set 2. This causes no problem as long as the character recognition unit 1 has a high recognition rate.

[0030] Next, the candidate word addition unit 15 adds the keywords obtained by partial matching to the candidate word set 5 as candidate words. For the character recognition evaluation value $E_a$ of a character corrected in this way, it is appropriate here to assign the lowest-ranked evaluation value.

[0031] It is advisable to use roughly the top 10 keywords. For a long text, however, it is better to use more keywords.

[0032] The phrase search unit 6 and the phrase evaluation value calculation unit 9 then run again: candidate phrases are selected using the candidate word set 5 to which the keywords have been added as candidate words, and phrase evaluation values are obtained for the newly selected candidate phrases.

[0033] Next, the keyword weight addition unit 16 modifies the phrase evaluation values according to (Equation 3), in line with the characteristics of the text.

[0034] Finally, the phrase selection unit 10 selects, from among the candidate phrases, the phrases with the largest evaluation values and outputs the second-pass, final corrected character string 11.

[0035] FIGS. 2 and 3 show part of the results of character recognition by this embodiment.

[0036] FIG. 2 shows the effect of the keyword weight addition unit 16. In FIG. 2, seven candidate phrases were found for the character string 「現場学習機能を」 ("on-site learning function"). In the first-pass recognition result, the phrase 「学習」 ("learning") was not selected. Because the keyword information contains the word 「学習」, however, the evaluation value of the phrase 「学習」 rises in the second-pass recognition result, and the characters were corrected properly. Thus both the first and second passes use the evaluation values to select phrases that are lexically and grammatically correct, and the second pass additionally uses keyword information, which made it possible to select a character string suited to the content of the document.

[0037] FIG. 3 shows the result of applying this embodiment to a text on the subject of 「文字認識の後処理効果」 ("post-processing effect of character recognition") and is used to illustrate the effect of the keyword partial match search unit 14 and the candidate word addition unit 15. In the output of the character recognition unit 1, the character 「識」 of the second 「認識」 is not among the candidate characters, so the word cannot be corrected by the conventional method. In the present method, keywords based on the characteristics of the text can be extracted as shown in FIG. 3, and the keyword partial match search made it possible to add 「認識」 to the candidates. As a result, the correct text was obtained after the second-pass correction.

[0038] In the embodiment above, two-pass recognition was performed: keywords are extracted from the character string processed once by the character correction unit, and the character correction unit then processes the input again using the extracted keyword information. Alternatively, n-pass recognition may be performed, in which keywords are extracted from the result of that recognition and processing by the character correction unit is repeated one or more additional times. In that case more accurate keyword information is obtained and the recognition rate improves.
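The n-pass variant amounts to a loop around the correction and keyword extraction steps. The Python sketch below shows only that control flow; `correct` and `extract_keywords` are hypothetical callables standing in for the character correction unit and the keyword extraction unit, and the early exit when the output stops changing is an added assumption, not something the source states.

```python
def n_pass_recognition(correct, extract_keywords, passes=2):
    """Run the correction step `passes` times, feeding keywords back in.

    correct(keywords) -> corrected string (hypothetical correction step).
    extract_keywords(text) -> list of keywords (hypothetical extractor).
    """
    text = correct([])  # first pass: no keyword information yet
    for _ in range(passes - 1):
        new_text = correct(extract_keywords(text))
        if new_text == text:  # assumption: stop once the output is stable
            break
        text = new_text
    return text


# Toy stand-ins: the correction succeeds only once 「認識」 is a keyword.
def toy_correct(keywords):
    return "文字認識" if "認識" in keywords else "文字認織"


def toy_extract(text):
    return ["認識"]


result = n_pass_recognition(toy_correct, toy_extract, passes=2)
```

With `passes=2` this reduces to the two-pass recognition of the embodiment; larger values give the n-pass recognition mentioned in the text.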

[0039] In the embodiment above, a character obtained from outside the candidates (that is, a character obtained by adding a keyword to the candidate word set as a candidate word) was given the lowest-ranked value as its evaluation value $E_a$ from the character recognition unit 1. Alternatively, the extracted character may be sent to the character recognition unit 1 and the evaluation value $E_a$ recalculated for that character. This allows a more accurate evaluation.

[0040] In the embodiment above, keywords were extracted using the evaluation values of phrases and general word frequency information, but the invention is not limited to this. Keywords may instead be extracted using the number of times a candidate character string that can form a candidate word appears in the text to be recognized, together with a dictionary of the frequency with which that candidate word appears in general text. Other extraction methods may also be used.

[0041] In the embodiment above, characters were recognized using both the keyword partial match search and the weight addition, but the invention is not limited to this; characters may be recognized using only one of the two.

[0042]

[Effect of the Invention] As is clear from the above, the present invention has the advantage that it can raise the character recognition rate.

[Brief description of drawings]

[FIG. 1] A block diagram of a character recognition device according to one embodiment of the present invention.

[FIG. 2] An output diagram of experimental results showing the effect of keyword weight addition in the embodiment.

[FIG. 3] An output diagram of experimental results showing the effect of the keyword partial match search and the candidate word addition unit in the embodiment.

[FIG. 4] A block diagram of a conventional character recognition device.

[Explanation of symbols]

1: character recognition unit
3: word search unit
4: word dictionary
6: phrase search unit
7: grammar dictionary
9: phrase evaluation value calculation unit
10: phrase selection unit
12: keyword extraction unit
14: keyword partial match search unit
15: candidate word addition unit
16: keyword weight addition unit

Continuation of front page

(72) Inventor: Taiji Shimeki, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture

Claims

[Claims]

1. A character recognition device comprising: candidate character recognition means for reading the character string of a text to be recognized to obtain a candidate character group; word and phrase search means for obtaining a candidate word group from a word dictionary and candidate character strings created from the candidate character group, and for obtaining a candidate phrase group from the candidate word group and a grammar dictionary; evaluation value calculation means for calculating, for each phrase of at least the candidate phrase group, an evaluation value that takes at least lexical and grammatical correctness into account; phrase selection means for selecting phrases from the candidate phrase group according to the calculation result and outputting a selected character string formed from the selected phrases; keyword extraction means for extracting keywords from the output selected character string on the basis of a predetermined criterion; and keyword weighting means for weighting the evaluation value of each phrase that contains a keyword, wherein the read character string is recognized using the weighting.

2. A character recognition device comprising: candidate character recognition means for reading the character string of a text to be recognized to obtain a candidate character group; word and phrase search means for obtaining a candidate word group from a word dictionary and candidate character strings created from the candidate character group, and for obtaining a candidate phrase group from the candidate word group and a grammar dictionary; evaluation value calculation means for calculating, for each phrase of at least the candidate phrase group, an evaluation value that takes at least lexical and grammatical correctness into account; phrase selection means for selecting phrases from the candidate phrase group according to the calculation result and outputting a selected character string formed from the selected phrases; keyword extraction means for extracting keywords from the output selected character string on the basis of a predetermined criterion; partial match search means for performing a partial match search between the candidate character group and the keywords; and candidate word addition means for adding each partially matched keyword to the candidate word group as a candidate word, wherein the read character string is recognized using that candidate word group.

3. The character recognition device according to claim 1, wherein the predetermined criterion is based on the number of appearances of the candidate word in the text to be recognized and the appearance frequency of that candidate word in general text.