JP6060134B2

JP6060134B2 - Information processing apparatus and information processing method

Info

Publication number: JP6060134B2
Application number: JP2014231133A
Authority: JP
Inventors: 佐藤　広行; 広行佐藤; 康裕森田
Original assignee: Primagest Inc
Current assignee: Primagest Inc
Priority date: 2014-11-13
Filing date: 2014-11-13
Publication date: 2017-01-11
Anticipated expiration: 2034-11-13
Also published as: JP2016095662A

Description

The present invention relates to an information processing apparatus and an information processing method capable of efficiently correcting character string recognition results obtained by character recognition processing from acquired image information. For example, it relates to an information processing apparatus and an information processing method that make it possible to correct a recognized character string by comparing it with character strings registered in advance.

Although digital information technology is now widespread, paper documents are still widely used as a medium for conveying information. When information written on a paper document is to be converted into digital data with high accuracy, the characters have been entered manually from a keyboard or the like. In particular, when a large volume is entered manually, double-entry processing is performed, in which different operators each enter the same characters from the same document. If the entered results are identical (that is, they match), the input is regarded as correct and the entry result is adopted; if the results differ, one of the inputs is regarded as erroneous and the entered characters are checked.
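The double-entry check described above can be sketched as follows. This is only an illustration: the function name `double_entry_check` and the whitespace trimming are assumptions, not something the patent specifies.

```python
from typing import Optional, Tuple


def double_entry_check(entry_a: str, entry_b: str) -> Tuple[bool, Optional[str]]:
    """Compare two independent manual entries of the same field.

    Returns (True, value) when both operators entered the same string,
    and (False, None) when the entries differ and need to be checked.
    Trimming surrounding whitespace is an assumption made for this sketch.
    """
    a = entry_a.strip()
    b = entry_b.strip()
    if a == b:
        return True, a
    return False, None
```

A mismatch does not say which operator was wrong, which is why the differing field has to be checked again.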

As an alternative to manual entry, a method of recognizing the character data of the character string portion of a paper document with an OCR (optical character reader) device and converting it into data is also widely adopted. Improvements in scanner resolution and advances in image analysis technology have made high recognition accuracy attainable, but misrecognition remains unavoidable and recognition accuracy is still a problem, so large-volume character recognition by an OCR device has not become a replacement for manual entry.

For example, medical fee claims are increasingly digitized but are still sometimes provided on paper. The injury and disease names and similar items printed on them must be converted into data, and high accuracy is required. For this reason, manual entry is performed twice; if the two entry results match, the data is adopted, and if they do not match, the entry is performed again and the matching data is adopted. The labor involved is enormous.

To improve on this, a method has been proposed (Patent Document 1) in which character data misrecognized by an OCR device is checked against a word dictionary of correct data and against grammar rules to generate candidate character strings that are valid as Japanese text, an evaluation value is calculated for each candidate from the word length, the word frequency, and the like, and the candidate with the highest evaluation value is taken as the corrected result.

A method has also been proposed (Patent Document 2) in which a similar-word dictionary collecting the characters that an OCR device tends to misrecognize is prepared. When misrecognition is determined, the similar-word dictionary is consulted to find the same misread character and replace it with the correct character. When the same misread character is not found, or when the string still contains easily misrecognized characters after replacement, the character is replaced with a mark that makes the possibility of misrecognition easy to notice.

Patent Document 1: Japanese Patent Laid-Open No. 07-028956
Patent Document 2: JP 2011-150436 A

However, in order to replace one of the two manual entries with OCR processing, character recognition by a conventional OCR device must achieve a correct answer rate above 90%. Improvements in the reading resolution of OCR devices and advances in image analysis technology have made high recognition accuracy attainable, but conventionally the correct answer rate has often fallen below 90%.
For this reason, there has been a demand for a method that uses character recognition by an OCR device but infers more probable characters from the misrecognized characters, thereby providing a correct answer rate above 90%.

That is, OCR processing determines from the image the range in which a single character is drawn and then recognizes that character. Errors can therefore arise both in determining the range of a character and in recognizing the character itself, and the causes of error are diverse. Raising the correct answer rate requires a method of inferring the correct characters in the face of such diverse misrecognitions.

In Patent Document 1, character data misrecognized by the OCR device is checked against a word dictionary of correct data. In OCR misrecognition, however, several characters are often misrecognized at once, as when "右膝関節捻挫" (right knee joint sprain) is read as "右額鸚挫Ｗ", and a simple lookup in a word dictionary of correct data has difficulty finding candidate character strings. Moreover, judging validity as Japanese text is meaningless for injury and disease names, which are merely words.

Patent Document 2 speaks of "identifying characters that are easily misrecognized" and gives as an example "口" (the kanji kuchi) and "ロ" (the katakana ro). OCR, however, may misrecognize "右膝関節捻挫" as "右額鸚挫Ｗ". That is, character segmentation fails and two characters are recognized as one ("関節" as "鸚"), or noise behind a character is recognized as a character ("挫" as "挫Ｗ"). Character-by-character replacement alone therefore has difficulty raising the correct answer rate.

The present invention has been made to solve the above problems and to provide an information processing apparatus and an information processing method capable of improving the correct answer rate of document character strings read by an OCR device. As one means of achieving this object, it has, for example, the following configuration.

That is, the apparatus comprises: display means for displaying image information; image acquisition means for acquiring a document to be processed as image information; recognition means for extracting a required character string from the image information acquired by the image acquisition means and performing character recognition on it; correct character string acquisition means for displaying the required character string in the acquired image information on the display means and acquiring, as a correct character string, a character string entered after that display has been checked; a classifier in which the recognized character string recognized by the recognition means and the corresponding correct character string acquired by the correct character string acquisition means are registered in advance in association with each other; and inference means for inferring the correct character string of a newly recognized character string by referring to the character string pairs registered in the classifier.

Further, for example, the apparatus comprises: a registration master in which correct character strings are registered in advance; determination means for determining whether a character string recognized by the recognition means is a misrecognized character string according to whether it is registered in the registration master; misrecognized character string extraction means for extracting the recognized character strings that the determination means has determined to be misrecognized; and inference means for inferring, by referring to the character string pairs registered in the classifier, the correct character string from those newly recognized character strings that were determined to be misrecognized.
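A minimal sketch of the determination and extraction steps, assuming the registration master can be modeled as an in-memory set of exact strings. The function names and the sample master entries other than "右膝関節捻挫" are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Set


def is_misrecognized(recognized: str, master: Set[str]) -> bool:
    """A recognized string counts as misrecognized when the master does not contain it."""
    return recognized not in master


def extract_misrecognized(recognized_strings: Iterable[str], master: Set[str]) -> List[str]:
    """Keep only the strings that need correct-string inference."""
    return [s for s in recognized_strings if is_misrecognized(s, master)]


# Hypothetical registration master of correct injury names.
REGISTRATION_MASTER = {"右膝関節捻挫", "左足関節捻挫", "腰部挫傷"}
```

Only the strings returned by `extract_misrecognized` would be passed on to the classifier; strings found in the master are accepted as read.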

Also, for example, the apparatus further comprises a registration master in which correct character strings are registered in advance, and reading means for determining whether a character string similar to the recognized character string is registered in the registration master and reading out the similar character string. When the reading means extracts no similar character string, registration in the classifier of the recognized character string recognized by the recognition means is cancelled.
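The patent text here does not say how similarity is measured. The sketch below assumes, purely for illustration, the ratio-based matching of Python's standard `difflib.get_close_matches`; the function names and the cutoff value 0.6 are likewise assumptions.

```python
import difflib
from typing import List, Sequence


def find_similar(recognized: str, master: Sequence[str], cutoff: float = 0.6) -> List[str]:
    """Read out master entries similar to the recognized string (best match first)."""
    return difflib.get_close_matches(recognized, master, n=3, cutoff=cutoff)


def should_register(recognized: str, master: Sequence[str], cutoff: float = 0.6) -> bool:
    """Registration in the classifier is cancelled when no similar master entry exists."""
    return len(find_similar(recognized, master, cutoff)) > 0
```

With a stricter or looser cutoff, more or fewer recognized strings would be kept for registration, so the value would need tuning against real OCR output.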

Further, for example, the apparatus comprises: a registration master in which correct character strings are registered in advance; determination means for determining whether a character string similar to the recognized character string is registered in the registration master; registration cancelling means for cancelling registration in the classifier of the recognized character string recognized by the recognition means when no similar character string is registered in the registration master; and reading means for reading out the similar character string when the determination means determines that a character string similar to a newly recognized character string is registered in the registration master. The correct character string for the newly recognized character string is inferred by referring to the recognized character strings and correct character strings registered in association with each other in the classifier.

Also, for example, the recognition means performs character recognition with each of a plurality of character recognition methods, and each recognized character string so obtained is registered in advance in the classifier in association with the corresponding correct character string acquired by the correct character string acquisition means.

Further, for example, the apparatus comprises inferred character string display means for displaying the character string inferred by the inference means on the display means, so that the recognized character string can be corrected with reference to the displayed inferred character string.

Also, for example, the inferred character string display means displays the plurality of character strings inferred by the inference means in descending order of their probability of being correct, and displays inferred character strings until the sum of those probabilities reaches a fixed value.
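The display rule above (descending probability, stop once the cumulative probability reaches a fixed value) can be sketched as follows. The function name, the `(string, probability)` tuple layout, and the threshold used in the usage note are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def candidates_to_display(
    candidates: Sequence[Tuple[str, float]], threshold: float
) -> List[Tuple[str, float]]:
    """Select inferred strings in descending probability until their sum reaches threshold."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    shown: List[Tuple[str, float]] = []
    total = 0.0
    for text, prob in ranked:
        shown.append((text, prob))
        total += prob
        if total >= threshold:
            break  # the cumulative probability has reached the fixed value
    return shown
```

For example, with candidates of probability 0.5, 0.25, 0.125 and 0.125 and a threshold of 0.75, only the first two candidates are shown. A confident inference thus produces a short list, while an uncertain one shows more alternatives.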

Also, for example, the apparatus comprises separation means for separating the recognized character string recognized by the recognition means into individual characters, and the character group obtained by the separation means is registered in the classifier, as the recognized character string, in association with the correct character string.

According to the present invention, it is possible to provide an information processing apparatus and an information processing method capable of improving the correct answer rate of document character strings read by, for example, an OCR device, for example to 90% or higher.

A block diagram showing the basic configuration of an information processing system according to an embodiment of the present invention.
A functional diagram for explaining the outline of the character string recognition processing of the embodiment.
A diagram showing an example of the judo therapy treatment expense payment application form processed in this embodiment.
A diagram showing examples of injury names printed on the judo therapy treatment expense payment application form used in this embodiment.

A diagram showing examples of character strings obtained by reading the injury names printed on the application form as image information and performing character recognition.
A diagram showing an example in which the recognized character strings are decomposed into individual characters.
A diagram showing examples of correct character strings identified by checking the read image information of the application form.

A diagram showing an example in which the recognized character strings, decomposed into individual characters, are displayed side by side with the corresponding correct character strings.
A flowchart showing the outline of the processing of the second embodiment.
A diagram showing examples of character strings determined to be misread in the second embodiment.
A diagram showing examples of the character strings determined to be misread in the second embodiment, decomposed into individual characters.

A diagram showing an example in which the character strings determined to be misread in the second embodiment, decomposed into individual characters, are paired with the correct character strings for comparison.
A flowchart showing the outline of the misread correction method of the third embodiment.

A diagram showing examples of recognized character strings of the second embodiment displayed for comparison with examples of character strings similar to them.
A diagram showing examples of entered correct character strings in the third embodiment.
A diagram showing, for the third embodiment, the character strings determined to be misread among the recognized character strings, decomposed into individual characters, together with the extracted character strings similar to the misread strings.
A diagram showing an example in which the misread character strings of the third embodiment, decomposed into individual characters, are displayed paired with the character strings determined to be correct.
A flowchart for explaining the operation of the misread correction method of the fourth embodiment.
A diagram showing an example of the entry screen of the fourth embodiment.
A diagram showing an example of the entry screen in each embodiment.
A diagram for explaining an example of the effect of the misread correction method in each embodiment.

Description of Symbols
100 Central processing unit
110 Display device
120 Input device
130 Image reading device
150 Communication device
160 Misread correction processing unit
162 Classifier
163 Master
210 Recognition result database
220 Correct character string database
230 Application form database
240 Recognition area extraction unit
500 Customer terminal

An embodiment of the present invention will now be described in detail with reference to the drawings.
In the setting addressed by this embodiment, high accuracy is required when injury and disease names and similar items printed in medical fee claims and the like are converted into data. Conventionally, therefore, manual entry was performed twice; if the entry results matched, the data was adopted, and if they did not match, the entry was performed again and the matching data was adopted. The labor involved was enormous.

As an alternative to manual entry, character data can be recognized and converted into data by OCR (optical character reading), but conventionally the correct answer rate has often fallen below 90%. With the OCR misread correction method of this embodiment, however, the correct data can be inferred for characters misread by OCR at a correct answer rate of 90% or higher, so that one of the two manual entries can be replaced by OCR and the labor of data conversion can be reduced.

[First Embodiment]
FIG. 1 is a block diagram showing the basic configuration of an information processing system according to an embodiment of the present invention. In FIG. 1, reference numeral 100 denotes a central processing unit that performs overall control of the components of the system. The central processing unit 100, for example, controls the image reading device 130 and registers the read image data of a paper document (for example, a form) in the application form data file 230, controls the recognition area extraction unit 240 to identify the recognition area and perform character recognition on the characters written in the identified area, and also temporarily registers the recognized character strings in the recognition result database 210.

Reference numeral 110 denotes a display device that displays various kinds of information described later. Reference numeral 120 denotes an input device such as a keyboard, with which the recognition result can be corrected as needed after checking the recognition result and the read image information displayed on the display device 110. Reference numeral 130 denotes an image reading device that reads paper documents such as forms and can, as needed, recognize characters in specific areas. It reads forms, paper insurance application forms, and the like at high speed, converts them into image information, and recognizes the character information in the specific areas and converts it into character data.

Reference numeral 150 denotes a communication device that can communicate with other devices and systems via a communication medium. For example, various kinds of data communication are possible with the clinic terminal device 500 of a treatment clinic. Although only one clinic terminal is shown in FIG. 1, there is no limit on the number of clinic terminals that can be communicated with; in practice, communication with several hundred or more terminals is possible. Communication from a clinic is not limited to digital data communication and also includes, for example, sending documents by facsimile.

Reference numeral 210 denotes a recognition result database that temporarily registers the recognition results read and recognized by the image reading device 130. The results are registered so that they can be read out in association with the application form data file 230.

Reference numeral 160 denotes a misread correction processing unit, described in detail later, which includes a classifier 162, a master 163, and the like. It corrects recognition results using, for example, the classifier 162, referring as needed to the correct character strings registered in the correct character string database 220. Reference numeral 220 denotes a correct character string database in which character strings that should appear on the application form, such as injury and disease names, are registered in advance as correct character strings.

Reference numeral 230 denotes an application form database, in which the image information of application forms and the character-coded data of their contents are registered.
The general function of this embodiment with the above configuration is described with reference to FIG. 2. In this embodiment, before actual character string recognition is performed, the correct character string 145, which is the correct form of a character string to be recognized, and the recognized character string obtained by recognition processing on the paper document are registered in the classifier 162 in association with each other.

In the misread correction method (S101) of FIG. 2, the classifier generation process S102 on the left side is performed first. To begin, a first scan process S111 is executed, which reads the paper document to be processed and generates first image information 141. Specifically, the paper document bearing the character strings to be recognized is set in the image reading device 130. Then, on an instruction from the input device 120, the paper document set in the image reading device 130 is scanned under the control of the central processing unit 100, and its image information is acquired and registered in the application form data file 230.

The process then moves to the first character string extraction process of step S112, in which the pre-designated area of the character string to be recognized is extracted from the read document (application form) to obtain a first extracted character string 142. In the first character string recognition process of the next step S113, character recognition is performed on the first extracted character string 142 extracted in step S112, yielding a first recognized character string 143.
In the first character string decomposition process of the next step S114, the first recognized character string 143 recognized in step S113 is decomposed into individual characters, yielding a first recognized character group 144.

Since the paper document image information read in the first scan process (S111) is displayed on, for example, the display device 110, the first entry process of step S115 is performed: while checking the read image of the area corresponding to the extracted character string 142 of step S112, the operator enters the correct character string 145 shown in that area from, for example, the input device 120. This process need be performed only once, so once all the character strings that are written on the application form have been entered, the entry process is no longer performed.
The correct character string 145 entered in this way and the recognized character group 144 for the same area are associated with each other and registered in the classifier 162.
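The decomposition (S114) and registration steps can be sketched as below. The passage does not specify the internal structure of the classifier 162, so the `PairRegistry` class and its per-character counting are assumptions made only to show how a decomposed character group might be stored against its correct character string.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List


def decompose(recognized: str) -> List[str]:
    """S114: split a recognized character string into a group of single characters."""
    return list(recognized)


class PairRegistry:
    """Hypothetical store of (recognized character group, correct string) pairs."""

    def __init__(self) -> None:
        # correct string -> how often each recognized character was seen with it
        self.char_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        # correct string -> number of registered pairs
        self.pair_counts: Counter = Counter()

    def register(self, recognized: str, correct: str) -> List[str]:
        """Associate the decomposed recognized string with its correct string."""
        chars = decompose(recognized)
        self.char_counts[correct].update(chars)
        self.pair_counts[correct] += 1
        return chars
```

Registering the misread "右額鸚挫Ｗ" against the correct "右膝関節捻挫" records that characters such as "鸚" and "Ｗ" have been observed for that injury name, which is the kind of evidence a later inference step can draw on.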

Once the pairs of correct character string 145 and first recognized character group 144 have been registered in the classifier 162 for all character strings to be processed, preparation for actual recognition processing is complete, and the correct-answer inference process of step S103 on the right side is executed.

First, in step S131, a second scan process is executed: the paper document actually to be processed is set in the image reading device 130 and read, the read image information is registered in the application form data file 230, and second image information 151 is obtained. Next, in step S132, a second character string extraction process cuts out the image of the area of the paper document that requires character recognition and extracts the character string to be recognized (second extracted character string 152). In the following step S133, character recognition is performed on the information containing the extracted character string 152 to generate a second recognized character string 153. In the second decomposition process of the next step S134, the second recognized character string 153 recognized in step S133 is decomposed into individual characters, yielding a second recognized character group 154.

Next, in step S135, the correct-answer inference process is performed: referring to the correct character strings 145 and first recognized character groups 144 registered earlier in the classifier 162, an inferred character string 155 presumed to be the correct character string is inferred. Alternatively, a plurality of inferred character strings are output together with their probabilities of being correct (165).
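The passage does not state which classification algorithm the classifier 162 uses. As one possible reading, the sketch below assumes a naive Bayes classifier over the bag of recognized characters with add-alpha smoothing. The class name `CharBagClassifier`, the smoothing constant, and the sample strings in the usage note are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set, Tuple


class CharBagClassifier:
    """Assumed naive Bayes model: correct strings are labels, recognized characters are features."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # add-alpha smoothing constant (an assumption)
        self.char_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        self.label_counts: Counter = Counter()
        self.vocab: Set[str] = set()

    def register(self, recognized: str, correct: str) -> None:
        """Register a (recognized character group, correct string) pair."""
        chars = list(recognized)
        self.char_counts[correct].update(chars)
        self.label_counts[correct] += 1
        self.vocab.update(chars)

    def infer(self, recognized: str) -> List[Tuple[str, float]]:
        """Return (correct string, probability) pairs, most probable first."""
        if not self.label_counts:
            return []
        total = sum(self.label_counts.values())
        slots = len(self.vocab) + 1  # one extra slot for characters never seen
        log_scores = {}
        for label, n in self.label_counts.items():
            counts = self.char_counts[label]
            denom = sum(counts.values()) + self.alpha * slots
            score = math.log(n / total)
            for ch in recognized:
                score += math.log((counts[ch] + self.alpha) / denom)
            log_scores[label] = score
        # Normalize in a numerically stable way so the probabilities sum to 1.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(weights.values())
        return sorted(((k, w / z) for k, w in weights.items()),
                      key=lambda kv: kv[1], reverse=True)
```

After registering pairs such as ("右額鸚挫Ｗ", "右膝関節捻挫") and ("左足闘節捻挫", "左足関節捻挫"), calling `infer("右額鸚挫")` ranks "右膝関節捻挫" first, because the misread characters "額" and "鸚" were previously seen only with that correct string. A bag-of-characters model ignores character order, which makes it tolerant of merged or spurious characters but is only one of several designs that would fit the description.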

The details of the embodiment described above are explained below. The following description takes as an example the processing of a judo therapy treatment expense payment application form as the paper document to be processed. FIG. 3 shows an example of the judo therapy treatment expense payment application form processed in this embodiment.

The character string to be extracted is the injury name indicated by 402 in FIG. 3; in this example, the character string shown in the injury name area is recognized and converted into digital data. In the judo therapy treatment expense application form 401 shown in FIG. 3, for example, the injury name field 402 is printed by a printer.
FIG. 4 shows examples of printed injury names. The injury names are printed, for example, as indicated by 501 to 503 in FIG. 4. The character string may be handwritten rather than printed. With handwriting, however, the recognition accuracy drops depending on the writer, so a printed character string is preferable.

In actual processing, a plurality of application forms to be processed are read and scanned by the image reading device 130. In the classifier generation process 102, the plurality of application forms 401 are captured as image information in the first scan S111, yielding as many first images 141 as the number of sheets read.

Next, in the first character string extraction process of step S112, the area containing the character string to be extracted, for example the injury name field 402, is extracted from each of the read images (first images 141) to obtain a first extracted character string 142. The character recognition process (OCR process) of step S113 is then executed to obtain a first recognized character string 143. Suppose that the injury name 402 of the application form 401 is printed as shown in FIG. 4 and that the first extracted character string 142 is recognized as shown in FIG. 5.

In this state, the first recognized character string decomposition process S114 decomposes the first recognized character string 143 into individual characters to create a first recognized character group 144. For example, the recognized character string examples 601, 602, and 603 of FIG. 5 are decomposed character by character into the first recognized character group examples 701, 702, and 703 shown in FIG. 6.
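As an illustration only, the character-by-character decomposition of step S114 could be sketched in Python as follows. The function name `decompose` and the optional `width` padding are assumptions introduced for this sketch; they do not appear in the specification.

```python
from typing import List, Optional


def decompose(recognized: str, width: Optional[int] = None) -> List[str]:
    """Split a recognized character string into single characters.

    `width` is a hypothetical option: when given, the result is padded with
    empty strings so that every group has the same number of positions.
    """
    chars = list(recognized)
    if width is not None:
        chars += [""] * max(0, width - len(chars))
    return chars


# A misrecognized string used as an example in the specification.
group = decompose("右額鸚挫Ｗ")
padded = decompose("右額鸚挫Ｗ", width=8)
```

Each list position can then serve as one attribute of the recognized character group.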

In the first entry process, the central processing unit 100 registers the image information read in the first scan process in the application form data file 230 and displays, on the display device 110, the portion of the first image 141 corresponding to the first recognized character string 143. The operator checks this display and is prompted to enter the correct character string 145 in the first entry process S115.

For example, highlighting the character string portion to be entered lets the operator visually identify what to enter. The operator then types in the correct character strings 145 one after another while visually checking the highlighted portion of the read image.
Examples are the correct character strings 801, 802, and 803 in FIG. 7. This entry may also be performed twice and the results cross-checked to improve the accuracy of the data.

The correct character string 145 obtained in this way and the first recognized character group 144 are associated with each other and registered in the classifier 162. The classifier generation process S116 of this embodiment creates a pair 146 of a first recognized character group 144 and a correct character string 145; specific examples are the pairs 901, 902, and 903 shown in FIG. 8. The classifier generation process S116 collects a large number of such pairs 146 and, once the required number has been collected, generates the classifier 162 by, for example, a machine learning algorithm.

Various machine learning algorithms exist, such as naive Bayes and support vector machines, and any of them can be adopted in this embodiment. The details are described here using naive Bayes as an example. Given a certain attribute, naive Bayes determines, as a probability, which of two classes the attribute is more likely to belong to. In this example, an attribute is a recognized character group and a class is a correct character string. One classifier is created for each pair of classes. In this embodiment there are about 400 correct character strings (injury names), so the number of classifiers generated equals the number of pairwise combinations, for example 400 × 399 / 2 = 79,800.
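The classifier count quoted above is simply the number of unordered pairs of classes. A minimal check in Python, using placeholder class names (the real master holds injury names, which are not listed here), might look like this:

```python
from itertools import combinations
from math import comb

# Placeholder class labels; these names are hypothetical stand-ins
# for the roughly 400 injury names.
injury_names = [f"injury_{i:03d}" for i in range(400)]

# One pairwise classifier is needed per unordered pair of classes.
class_pairs = list(combinations(injury_names, 2))
pair_count = len(class_pairs)
```

Here `pair_count` equals `comb(400, 2)`, the figure of 79,800 given in the text.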

As one example of the classifier 162, consider generating a classifier that compares the pair 右膝関節捻挫 (right knee joint sprain) 902 and 左膝関節捻挫 (left knee joint sprain) 903. From all pairs of a correct character string 145 and a first recognized character group 144, the pairs whose correct character string is either right knee joint sprain 902 or left knee joint sprain 903 are taken out, and probabilities are calculated from the number of occurrences of the characters in the first recognized character groups.

For example, suppose the total number of such pairs is 100 and 鸚, the third character of 右額鸚挫Ｗ 906, occurs once. Then, in the classifier for right knee joint sprain 902 versus left knee joint sprain 903, when 鸚 appears as the third-character attribute, the probability of right knee joint sprain 902 is 1/100 and the probability of left knee joint sprain 903 is 0.

In this way, the occurrence probability is calculated for each attribute, and the product of these probabilities is the total probability. If any attribute has an occurrence probability of zero, however, the total becomes zero. To avoid this, 1 is added to both the numerator and the denominator of the occurrence probability calculation, and the logarithm is taken so that the multiplication becomes an addition.
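The smoothing and logarithm step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the attribute is taken to be a (position, character) pair, the smoothed probability is assumed to be (count + 1) / (total + 1) in line with the 1/100 example, and the names `train_pair` and `log_score` and the three training samples are hypothetical.

```python
import math
from collections import Counter


def train_pair(samples, class_a, class_b):
    """Count (position, character) attributes for the two classes of one classifier."""
    counts = {class_a: Counter(), class_b: Counter()}
    total = 0
    for chars, correct in samples:
        if correct in counts:  # keep only pairs belonging to either class
            total += 1
            for position, ch in enumerate(chars):
                counts[correct][(position, ch)] += 1
    return counts, total


def log_score(counts, total, label, chars):
    """Sum log((count + 1) / (total + 1)) over all attributes (assumed formula)."""
    return sum(
        math.log((counts[label][(position, ch)] + 1) / (total + 1))
        for position, ch in enumerate(chars)
    )


# Hypothetical training pairs: (recognized character group, correct string).
samples = [
    (list("右膝関節捻挫"), "右膝関節捻挫"),
    (list("右額鸚挫Ｗ"), "右膝関節捻挫"),
    (list("左膝関節捻挫"), "左膝関節捻挫"),
]
counts, total = train_pair(samples, "右膝関節捻挫", "左膝関節捻挫")
query = list("右額鸚挫Ｗ")
right = log_score(counts, total, "右膝関節捻挫", query)
left = log_score(counts, total, "左膝関節捻挫", query)
```

Because every term is a finite logarithm, an attribute never seen for a class lowers that class's score without forcing it to zero, and the class with the larger sum is selected.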

When a new recognized character group is given, each of the classifiers (79,800 in this example) selects the class under which the new recognized character group is more likely to occur, that is, the inferred character string that appears to be correct. The inferred character string selected most often is inferred to be the correct one.
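The majority selection over all pairwise classifiers can be sketched as below. The function `toy_decide` is a stand-in introduced only so that the example runs; it counts position-wise matches and is not the trained classifier of the embodiment. The names `vote` and `toy_decide` and the three-class list are assumptions.

```python
from collections import Counter
from itertools import combinations


def vote(classes, decide, attributes):
    """Run every pairwise classifier and rank the classes by number of wins."""
    wins = Counter()
    for a, b in combinations(classes, 2):
        wins[decide(a, b, attributes)] += 1
    return wins.most_common()


def toy_decide(a, b, attributes):
    """Hypothetical pairwise decision: prefer the class sharing more characters
    at the same positions as the recognized character group."""
    def overlap(label):
        return sum(1 for x, y in zip(label, attributes) if x == y)
    return a if overlap(a) >= overlap(b) else b


classes = ["右膝関節捻挫", "左膝関節捻挫", "背部挫傷"]
ranking = vote(classes, toy_decide, list("右額鸚挫Ｗ"))
best = ranking[0][0]
```

With the roughly 400 classes of the embodiment, the same loop would visit all 79,800 pairs.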

A method other than decomposing the recognized character string into individual characters is the one called Bag of Words. In Bag of Words, a set of all unique OCR-recognized characters is created first. For each character in that set, a table entry is set to True if the character occurs in the recognized character group and False otherwise, and this table is used as the attributes. If Bag of Words is used instead of per-character decomposition in the classifier generation process of this embodiment, Bag of Words is also used in the correct-answer inference process described later.
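A minimal sketch of this Bag of Words representation follows. The function names `build_vocabulary` and `to_bag` and the sample strings are assumptions made for illustration.

```python
def build_vocabulary(groups):
    """Collect every unique OCR-recognized character, in a fixed (sorted) order."""
    return sorted({ch for group in groups for ch in group})


def to_bag(group, vocabulary):
    """Return one True/False entry per vocabulary character."""
    present = set(group)
    return [ch in present for ch in vocabulary]


# Hypothetical recognized character groups.
groups = [list("右膝関節捻挫"), list("右額鸚挫Ｗ")]
vocabulary = build_vocabulary(groups)
bag = to_bag(list("右額鸚挫Ｗ"), vocabulary)
```

Unlike the per-character decomposition, this representation discards character positions, which is why the same representation has to be used both when generating the classifier and when inferring.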

When the classifier is set in the machine learning algorithm described above and a recognized character group is supplied, the algorithm infers the correct character string and returns its probability. To improve the accuracy of the classifier, many pairs of first recognized character groups and correct character strings must be collected. Because the application forms 401 are tallied monthly, at least one month of application forms 401 is read, and the classifier 162 is generated from the images of that month's forms.

Furthermore, for misreading correction of the following month's forms, the classifier generation process S102 also uses the current month's images, recognized character strings, and correct character strings. Since these already exist, the first scan S111, first entry S115, first character string extraction S112, and first character string recognition S113 are unnecessary for them. Scanning therefore does not have to be performed twice, and operation can continue smoothly while accuracy improves.

Machine learning algorithms include support vector machines (SVM) and neural networks, but the algorithm is not limited to these as long as it can generate a classifier.

Next, the correct-answer inference processing unit 103 is described in detail. A new batch of application forms to be processed is set in the image reading device 130, and the images are read in the second scan process S131 to obtain a plurality of second images 151.
The second character string extraction process S132, which extracts the second extracted character string 152 from each second image 151, is the same as the first character string extraction process of step S112, so its detailed description is omitted.
Each second extracted character string 152 is recognized in the second character string recognition process S133, which is the same as the first character string recognition process of step S113. The resulting second recognized character string 153 is divided into a second recognized character group 154 by the second character string decomposition process S134, which is the same as the first character string decomposition process.

In the correct-answer inference process of step S135, the classifier 162 is set in the machine learning algorithm described earlier and the second recognized character group 154 is supplied to it, yielding the inferred character string 155.
Suppose the second recognized character group 154 is 右, 額, 鸚, 挫, Ｗ (1102). When these characters are fed as attributes to the 79,800 classifiers, the classifier for right knee joint sprain 902 versus left knee joint sprain 903 returns a higher probability for right knee joint sprain 902, which is therefore selected as a candidate. When all classifiers are tried in the same way, right knee joint sprain 902 is selected as a candidate most often and is chosen as the inferred character string 155.

Alternatively, the correct-answer inference process S135 can instruct the machine learning algorithm to return, for a single second recognized character group 154, a plurality of inferred character strings together with their probabilities 165.
A concrete example of the learning effect, and of how the plurality of inferred character strings and probabilities 165 obtained for a recognized character group 154 can be used in entry, is given below.
A printed 右膝関節捻挫 (right knee joint sprain) may be misrecognized as 告関節捻挫, and a printed 左膝関節捻挫 (left knee joint sprain) may also be misrecognized as 告関節捻挫. In this case the classifier 162 gives right knee joint sprain the higher probability, but left knee joint sprain also receives an appreciable probability.
When the inference result is used for entry, right knee joint sprain is displayed as the first candidate; displaying left knee joint sprain as the second candidate as well makes entry more efficient.
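Presenting a first and a second candidate could be sketched as follows. The specification does not state how the probabilities 165 are computed; this sketch assumes that each candidate has a log score and normalizes the scores with a softmax. The function name `ranked_candidates` and the score values are hypothetical.

```python
import math


def ranked_candidates(log_scores, top_n=2):
    """Turn per-class log scores into normalized probabilities and keep the top_n."""
    peak = max(log_scores.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(score - peak) for label, score in log_scores.items()}
    norm = sum(weights.values())
    ranked = sorted(
        ((label, weight / norm) for label, weight in weights.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical log scores for the misrecognized string 告関節捻挫.
candidates = ranked_candidates(
    {"右膝関節捻挫": -2.0, "左膝関節捻挫": -3.0, "背部挫傷": -9.0}
)
```

The entry screen would then offer the first element as the default and the second as an alternative the operator can pick directly.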

[Second Embodiment]
The above description covered an example in which, for instance, standard paper documents are scanned in advance, the classifier 162 is generated from all recognized character strings, and the paper documents to be processed are then scanned and the correct character strings inferred using that classifier 162.

The present invention is not limited to this example, however. A master 163 in which all injury names are registered may be prepared in advance; each recognized character string is compared against it to check whether the same injury name is registered, and a string that is not registered is judged to be a misreading. The classifier 162 may then be generated only from pairs of misread character groups and correct character strings, with only misread character strings used at inference time as well. High accuracy can be expected in this case.

The second embodiment of the present invention, configured in this way, is described below.
FIG. 9 is a flowchart outlining the processing of the second embodiment. In FIG. 9, processes that are the same as those in the flowchart of FIG. 2 are given the same step numbers, and their detailed description is omitted.

The first scan processing step S111, first character string extraction processing step S112, first character string recognition step S113, first character string decomposition step S114, first entry processing step S115, second scan processing step S131, second character string extraction processing step S132, second character string recognition step S133, and second character string decomposition step S134 are the same as in the first embodiment shown in FIG. 2.

All injury names, about 400 in number, are registered in advance in the master 163 used in the second embodiment. When the first character string recognition process of step S113 yields first recognized character strings 143, for example those shown in FIG. 5, processing proceeds to the first master comparison process of step S213, in which each first recognized character string 143 is checked in turn against the master 163 to determine whether it is registered there.

Any first recognized character string that is not registered is selected as a first misread character string 243. For example, when the character strings of FIG. 5 are recognized, 右額鸚挫Ｗ (1002), the misread character string example of FIG. 10, is not among the injury names registered in the master 163 and is therefore judged to be a misread character string.

If a first extracted character string 142 contains a character string registered in the master 163, it may be judged to be a correct character string without requiring entry in the first entry processing step S115.
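The master comparison described above amounts to a membership test. A minimal sketch, assuming the master 163 can be held as a set of strings (the three entries and the name `split_by_master` are hypothetical):

```python
# Hypothetical excerpt of the master 163; the real master holds about 400 names.
MASTER = {"右膝関節捻挫", "左膝関節捻挫", "背部挫傷"}


def split_by_master(recognized_strings, master):
    """Separate recognized strings into registered ones and misread ones."""
    registered, misread = [], []
    for text in recognized_strings:
        (registered if text in master else misread).append(text)
    return registered, misread


registered, misread = split_by_master(["右膝関節捻挫", "右額鸚挫Ｗ"], MASTER)
```

Only the strings in `misread` would go on to decomposition and classifier generation; those in `registered` can be accepted as they are.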

Following the first master comparison process S213, the first misread character string decomposition process S114 decomposes the first misread character string 243 into individual characters to create a first misread character group 144. For example, 右額鸚挫Ｗ (1001), the misread character string example shown in FIG. 10, is decomposed character by character into the first misread character group example 1102 shown in FIG. 11.

Next, the classifier generation processing step S216 creates a pair 246 of a first misread character group 144 and a correct character string 145. In the second embodiment, for example, the pair 246 is the pair indicated by 1202 in FIG. 12.

In the second embodiment as well, the classifier generation S216 collects a large number of pairs 246 of first misread character groups and correct character strings and then creates the classifier 222 by a machine learning algorithm.

Next, the correct-answer inference processing step S203 shown on the right side of FIG. 9 is described. A new batch of application forms is read by the same process as the second scan processing step S131 of FIG. 2 to obtain a predetermined number of second images 151.
Next, the second character string extraction process of step S132, the same as in FIG. 2, is applied to the second images 151 to obtain second extracted character strings 152.
Next, the second character string recognition process of step S133, the same as in FIG. 2, is applied to the second extracted character strings 152 to obtain second recognized character strings 153.

In the second master comparison process of the following step S233, each second recognized character string 153 is checked against the master 163, and any second recognized character string 153 that is not registered in the master 163 is taken out as a second misread character string 253.

Next, the second misread character string decomposition process of step S134 decomposes the second misread character string 253 into individual characters to create a second misread character group 154.
Next, in the correct-answer inference process of step S235, the classifier 162 is set in the machine learning algorithm described earlier and the second misread character group 154 is supplied to it to obtain an inferred character string 255.

As in the first embodiment, the correct-answer inference process S235 may also be configured to instruct the machine learning algorithm to return, for a single second misread character group 253, a plurality of inferred character strings together with their probabilities 265.

According to the second embodiment, recognized character strings with a high accuracy rate can be obtained, and because the injury names are registered in the master 163, the strings that need to be checked for correctness can easily be narrowed down.

[Third Embodiment]
Next, a third embodiment of the present invention is described. The character string misreading correction method of the third embodiment is outlined with reference to FIG. 13. In FIG. 13, processes that are the same as those shown in FIG. 2 and FIG. 9 are given the same step numbers, and their detailed description is omitted.
The third embodiment shown in FIG. 13 improves on the misreading correction method shown in FIG. 2. The first scan process S111, first character string extraction process S112, first character string recognition process S113, first entry process S115, first character string decomposition S114, second scan process S131, second character string extraction process S132, second character string recognition process S133, and second character string decomposition S134 are the same as the processes shown in FIG. 2.
In the third embodiment, as in the second, all injury names are registered in the master 163.

In FIG. 13, the first similar character string search process of step S313 retrieves from the master 163 a plurality of similar character strings 343 corresponding to the first recognized character string 144. Take as an example the case where the first recognized character string 144 is 曹部挫傷 (1301) shown in FIG. 14.

In this case, suppose the first similar character string search process S313 finds and extracts, as the plurality of similar character strings 343, 背部挫傷 (back contusion) 1311, 臀部挫傷 (buttock contusion) 1312, 殿部挫傷 (buttock contusion, written with a different character) 1313, and 腰部挫傷 (lumbar contusion) 1314 shown in FIG. 16. The Levenshtein edit distance, for example, is used as the measure of similarity between character strings.
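A similar character string search based on the Levenshtein edit distance can be sketched as below. The threshold `max_distance=1`, the function names, and the six-entry master excerpt are assumptions made for illustration; the specification does not state a threshold.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def similar_strings(recognized, master, max_distance=1):
    """Return master entries within max_distance edits of the recognized string."""
    return [name for name in master if levenshtein(recognized, name) <= max_distance]


# Hypothetical excerpt of the master 163.
master = ["背部挫傷", "臀部挫傷", "殿部挫傷", "腰部挫傷", "大腿部捻挫", "右膝関節捻挫"]
found = similar_strings("曹部挫傷", master)
none_found = similar_strings("２３・９・５", master)
```

In this sketch the misrecognized 曹部挫傷 is one substitution away from each of the four contusion names, while a misread date matches nothing.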

If the correct character string 145 obtained from the first entry process S115 is 背部挫傷 (1401) shown in FIG. 15, the first interruption determination process S317 finds that this correct character string matches 背部挫傷 (1311) among the similar character strings 343, and processing proceeds to the classifier generation process S318.

If, on the other hand, the correct character string 145 obtained from the first entry process S115 is 大腿部捻挫 (thigh sprain) 1411 shown in FIG. 15, the first interruption determination process S319 finds no match for it among the similar character strings 343, so processing is interrupted without proceeding to the classifier generation process S318.

When the first recognized character string 144 is 曹部挫傷 (1301) and its correct character string 145 is 大腿部捻挫 (1411), the similarity between them is low. Generating a classifier from such data only adds noise and computation without improving accuracy, so the classifier generation process of the misreading correction is interrupted.

Alternatively, the first character string extraction process S112 may fail to locate the injury name on the application form 401 and read the date of injury to its right instead, so that the first extracted character string 142 is “２３・９・５” (1302) of FIG. 14. In that case the first similar character string search process S313 finds no similar character string and cannot return one, so the first interruption determination process S319 interrupts registration in the classifier without proceeding to the classifier generation process S318.

When the first extracted character string 142 is something like “２３・９・５” (1302), its similarity to the correct character string 145 is low whatever that string may be. Generating a classifier from such data only adds noise and computation without improving accuracy, so the classifier generation process is interrupted in this case as well.
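The interruption determination described in the preceding paragraphs reduces to one check: a training pair is used only when the operator-entered correct character string is among the similar character strings found for the recognized string. A minimal sketch, with the function name `should_generate` as an assumption:

```python
from typing import List


def should_generate(correct: str, similar: List[str]) -> bool:
    """True when the pair may go on to classifier generation.

    An empty `similar` list (nothing found in the master) always
    yields False, which corresponds to interrupting the process.
    """
    return correct in similar


similar = ["背部挫傷", "臀部挫傷", "殿部挫傷", "腰部挫傷"]
keep = should_generate("背部挫傷", similar)      # correct string is among the similar ones
skip = should_generate("大腿部捻挫", similar)    # correct string is unrelated
no_match = should_generate("背部挫傷", [])       # e.g. a date was read instead
```

Filtering at this point keeps unrelated pairs out of the training data before any classifier is built.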

Next, if the first extracted character string 142 is, for example, 曹部挫傷 (1301) shown in FIG. 14, the first character string decomposition process S114 decomposes it into 曹，部，挫，傷，，，，，，， (1501) as shown in FIG. 16, with the trailing positions left empty, to generate the first recognized character group 144.

Next, the classifier generation process S318 creates a pair 146 of the first recognized character group 144 and its associated correct character string 145, for example 曹，部，挫，傷，，，，，，，背部挫傷 (1601) as shown in FIG. 17. A large number of such pairs are collected, and the classifier 162 is created by a machine learning algorithm.

A machine learning classifier, typified by the support vector machine (SVM), is made up of many classifiers, each of which decides between one correct character string and another. The data item 曹，部，挫，傷，，，，，，，背部挫傷 (1601) is therefore also used to generate the classifier for 背部挫傷 (back contusion) vs. 大腿部挫傷 (thigh contusion).

However, taking into account the similar character strings 343 of 曹部挫傷 (1301), namely 背部挫傷 1311, 臀部挫傷 1312, 殿部挫傷 1313, and 腰部挫傷 1314, the data item 曹，部，挫，傷，，，，，，，背部挫傷 (1601) can instead be used only to generate the classifiers 162 for 背部挫傷 vs. 臀部挫傷, 背部挫傷 vs. 殿部挫傷, and 背部挫傷 vs. 腰部挫傷. This reduces the computation needed to generate the classifier 162 and can also improve accuracy.
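The restriction described above can be sketched as selecting, for each training pair, only the pairwise classifiers that pit the correct character string against its similar character strings. The function name `classifier_pairs_for_sample` is an assumption for this sketch.

```python
from typing import List, Tuple


def classifier_pairs_for_sample(correct: str, similar: List[str]) -> List[Tuple[str, str]]:
    """List the pairwise classifiers that one training pair contributes to."""
    return [(correct, other) for other in similar if other != correct]


similar = ["背部挫傷", "臀部挫傷", "殿部挫傷", "腰部挫傷"]
pairs = classifier_pairs_for_sample("背部挫傷", similar)
```

For this example the training pair feeds three classifiers instead of the 399 it would otherwise touch with about 400 classes.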

Next, the correct-answer inference process S303 shown on the right side of FIG. 13 is described. A new batch of application forms is scanned in the second scan process S131 to obtain a plurality of second images 151.
Next, the second character string extraction process S132 obtains second extracted character strings 152 from the second images 151.
Next, the second character string recognition process S133 obtains second recognized character strings 153 from the second extracted character strings 152.

In the second similar character string search S333, a plurality of similar character strings 353 are retrieved from the master 163 for the second recognized character string 153.
If no character string similar to the second recognized character string 153 is registered in the master 163, so that the search finds nothing, the second interruption determination process S339 interrupts the correct analogy process.
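The search-and-interrupt step can be sketched as follows. This is only an illustration: the patent does not specify the similarity measure, so the sketch assumes the ratio computed by Python's standard `difflib` module, and the function name, the master contents, and the 0.7 cutoff are all hypothetical.

```python
import difflib
from typing import Optional

# Hypothetical contents of the master (registered correct character strings).
MASTER = ["背部挫傷", "臀部挫傷", "殿部挫傷", "腰部挫傷", "大腿部挫傷", "右膝関節捻挫"]

def search_similar(recognized: str, master: list[str],
                   cutoff: float = 0.7) -> Optional[list[str]]:
    """Return registered strings similar to `recognized`.

    Returns None when nothing similar is registered, which the caller
    treats as the signal to interrupt the correct analogy process.
    """
    hits = difflib.get_close_matches(recognized, master,
                                     n=len(master), cutoff=cutoff)
    return hits or None

similar = search_similar("曹部挫傷", MASTER)   # four contusion names that differ by one character
no_match = search_similar("ＸＹＺ", MASTER)    # nothing similar: interrupt
```

With this assumed cutoff, 大腿部挫傷 is excluded; lowering the cutoff would widen the search.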

In the second character string decomposition process S134, the second recognized character string 153 is decomposed to generate a second recognized character group 154.
Next, in the correct analogy process S235, the classifier 162 is loaded into the machine learning algorithm described earlier, and the second recognized character group 154 is given to that algorithm to generate an analogy character string 355.
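The decomposition into a character group can be sketched as below. The fixed width of ten character slots is an assumption inferred from the example record "曹, 部, 挫, 傷, , , , , , , 背部挫傷" in the description; the function names are hypothetical and the actual feature encoding used by the machine learning algorithm is not specified in the text.

```python
def decompose(recognized: str, width: int = 10) -> list[str]:
    """Split a recognized string into one character per slot, padding with ''."""
    chars = list(recognized[:width])
    return chars + [""] * (width - len(chars))

def training_record(recognized: str, correct: str, width: int = 10) -> list[str]:
    """Character group followed by the correct string (the label), as used for training."""
    return decompose(recognized, width) + [correct]

group = decompose("曹部挫傷")                      # character group used for inference
record = training_record("曹部挫傷", "背部挫傷")    # labelled record used for classifier generation
line = ",".join(record)
```

At inference time only the character group (without the label) would be passed to the trained model.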

Alternatively, the correct analogy process S235 may be configured to instruct the machine learning algorithm to return, for one second recognized character group 154, a plurality of analogy character strings and their probabilities 365.
In the classifier generation process S318, processing is interrupted when no similar character string is found, and the data of that recognized character string is not used to generate the classifier 162. For the same reason, the correct analogy process S338 is likewise interrupted when no similar character string is found. However, because the analogy may still turn out to be correct, the correct analogy process may instead be continued without interruption.

In addition, by instructing the machine learning algorithm to make its analogy with reference only to the classifiers 162 that target the plurality of similar character strings 353, the range of candidates is narrowed and the computation time required for the analogy is greatly reduced.

For example, suppose the similar character strings retrieved in the second similar character string search process S333 are 背部挫傷 (back contusion) 1311, 臀部挫傷 (buttock contusion) 1312, 殿部挫傷 (gluteal contusion) 1313, and 腰部挫傷 (lumbar contusion) 1314. Computing the probabilities with only the classifiers for their pairwise combinations, namely 背部挫傷 vs 臀部挫傷, 背部挫傷 vs 殿部挫傷, 背部挫傷 vs 腰部挫傷, 臀部挫傷 vs 殿部挫傷, 臀部挫傷 vs 腰部挫傷, and 殿部挫傷 vs 腰部挫傷, greatly reduces the amount of computation.
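As a small illustration of why the restriction helps, the pairwise classifiers needed for the four similar character strings in this example can be enumerated with the standard library. The variable names are hypothetical; only the four strings come from the description.

```python
from itertools import combinations

# The four similar character strings from the example (1311 to 1314).
similar_strings = ["背部挫傷", "臀部挫傷", "殿部挫傷", "腰部挫傷"]

# One binary classifier per unordered pair: n * (n - 1) / 2 classifiers.
pairs = list(combinations(similar_strings, 2))

for a, b in pairs:
    print(f"{a} vs {b}")
```

Four candidates give six pairs, whereas a master with N registered strings would give N(N-1)/2 pairs if every classifier were consulted.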

Furthermore, accuracy can be improved by factoring the degree of similarity into the probability calculation. For example, if the second similar character string search S333 is run with a looser similarity threshold, 大腿部挫傷 (thigh contusion) is also returned as a similar character string for 曹部挫傷 1301. 背部挫傷 differs from 曹部挫傷 by one character, whereas 大腿部挫傷 differs by two. Reflecting this difference in similarity in the probability calculation improves accuracy.
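One way to make the "one character versus two characters" difference concrete is the Levenshtein edit distance. The sketch below is an assumption on two counts: the description does not name the similarity measure, and the weighting rule `p / (1 + distance)` followed by renormalization is a hypothetical choice used only to show how similarity could enter the probability calculation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def reweight(recognized: str, probs: dict[str, float]) -> dict[str, float]:
    """Down-weight candidates that are farther from the recognized string (hypothetical rule)."""
    weighted = {s: p / (1 + edit_distance(recognized, s)) for s, p in probs.items()}
    total = sum(weighted.values())
    return {s: w / total for s, w in weighted.items()}

d_back = edit_distance("曹部挫傷", "背部挫傷")     # one character differs
d_thigh = edit_distance("曹部挫傷", "大腿部挫傷")  # two characters differ
adjusted = reweight("曹部挫傷", {"背部挫傷": 0.5, "大腿部挫傷": 0.5})
```

With equal classifier probabilities, the closer candidate 背部挫傷 ends up with the larger adjusted probability.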

Furthermore, replacing obviously misread characters in the second recognized character string with the correct characters before the second similar character string search S333 is also effective in improving accuracy. For example, if the second recognized character string is 治下腿部挫傷Ｗ, replacing 治下腿部 with 右下腿部 (right lower leg) before running the second similar character string search S333 raises the probability that the similar character strings 353 include the correct character string, which improves accuracy.
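A minimal sketch of this pre-correction step, assuming the known misreadings are kept in a simple lookup table. The table structure and function name are hypothetical; the single entry is the example given in the description.

```python
# Hypothetical table of obvious misreadings and their corrections.
KNOWN_MISREADS = {"治下腿部": "右下腿部"}

def pre_correct(recognized: str, table: dict[str, str] = KNOWN_MISREADS) -> str:
    """Replace every known misread substring before the similar-string search."""
    for wrong, right in table.items():
        recognized = recognized.replace(wrong, right)
    return recognized

corrected = pre_correct("治下腿部挫傷Ｗ")
```

The corrected string still contains the stray trailing character, which the subsequent similar character string search is expected to absorb.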

[Fourth Embodiment]
Next, a fourth embodiment of the present invention will be described with reference also to the flowchart of FIG. 18. The first to third embodiments described above each provide one type of classifier generation process and one type of correct analogy process. However, the present invention is not limited to those examples, and a plurality of types may be provided. Alternatively, each example may be executed twice or more.

In the fourth embodiment, two different types of image processing are performed, each comprising image reading and character recognition; they are referred to as character recognition process A and character recognition process B. The fourth embodiment also provides two classifier generation units and two correct analogy units.

The classifier generation processing unit S1702 executes the first character string extraction/recognition process S1711 and the classifier generation process S1712. The correct analogy process S1703 executes the second character string extraction/recognition process S1713 and the correct analogy process S1714, and generates the analogy character string 1715, or a plurality of analogy character strings together with their probabilities 1716.

Meanwhile, the classifier generation processing unit S1704 executes the third character string extraction/recognition process S1721 and the classifier generation process S1722. The correct analogy process S1705 executes the fourth character string extraction/recognition process S1723 and the correct analogy process S1724, and generates the analogy character string 1725, or a plurality of analogy character strings together with their probabilities 1726.

In the fourth embodiment, for example, one of the two classifier generation processes may be controlled to perform the classifier generation process of the first embodiment, while the classifier generation process 1704 adopts the processing of the second embodiment.

For example, different character string extraction and recognition processes have different character recognition capabilities, so the contents registered in the classifier 1621 and the classifier 1622 generated from them differ. Consequently, the correct analogy process S1714 and the correct analogy process S1724 select the analogy character string 1715 and the analogy character string 1725 from the misread character group on different grounds.

In the fourth embodiment, the comparison process S1731 takes in the analogy character strings 1715 and 1725 generated by the different recognition methods and compares them. If they match, it outputs the analogy character string 1732; if they do not, it outputs a result indicating no match. Even when they do not match, the probability that one of the two is correct increases, which is still an improvement.

Alternatively, even when the two do not match, the processing of the third embodiment may be applied: the correct analogy 1714 and the correct analogy 1724 each instruct the machine learning algorithm to return a plurality of analogy character strings and their probabilities (1716 and 1726), and the analogy character string with the higher probability is selected, which raises the likelihood of obtaining the correct answer.
Although this embodiment uses two types of character recognition process, three or more types of character recognition method may also be applied.
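The comparison step and the probability-based fallback can be sketched as follows. The function name, the dictionary representation of "analogy character strings and their probabilities", and the tie-breaking rule are assumptions made for illustration; the description only states that matching results are output and that, on a mismatch, the candidate with the higher probability may be selected.

```python
def reconcile(cands_a: dict[str, float],
              cands_b: dict[str, float]) -> tuple[str, bool]:
    """Compare the top candidates from recognition processes A and B.

    Returns (chosen string, matched flag). On a mismatch, the candidate
    with the higher probability is chosen (ties go to process A).
    """
    best_a = max(cands_a, key=cands_a.get)
    best_b = max(cands_b, key=cands_b.get)
    if best_a == best_b:
        return best_a, True
    if cands_a[best_a] >= cands_b[best_b]:
        return best_a, False
    return best_b, False

# Hypothetical outputs of the two correct analogy processes.
agree = reconcile({"背部挫傷": 0.7, "腰部挫傷": 0.3},
                  {"背部挫傷": 0.6, "臀部挫傷": 0.4})
differ = reconcile({"背部挫傷": 0.55, "腰部挫傷": 0.45},
                   {"腰部挫傷": 0.8, "背部挫傷": 0.2})
```

A result with the matched flag set to False could still be routed to an operator for confirmation rather than accepted automatically.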

[Fifth Embodiment]
A fifth embodiment of the present invention will be described with reference to FIG. 19. The description focuses on the differences from the embodiments above.
The fifth embodiment is characterized by how the analogy character string 155 obtained in the above embodiments is used. For example, when an operator needs to make an entry while referring to the injury name in the second image 151, the central processing unit 100 displays the entry screen 1801 shown in FIG. 19 on the display device 110.

The entry screen 1801 has an area 1802 that displays the second image 151. Within it, the injury name to be entered is marked with a highlight 1803, and an enlarged view 1804 of that injury name is displayed at the upper right. An entry area 1806 for the injury name is provided on the right side.

The analogy character string 155 is displayed below the entry area 1806 as an entry candidate 1807. If the enlarged injury name 1804 and the entry candidate 1807 are the same, the operator only has to press the Enter key: the entry candidate 1807 is entered into the injury name entry area 1806 and the cursor moves automatically to the next entry area. This greatly shortens the time required for entry.

[Sixth Embodiment]
A sixth embodiment of the present invention will be described with reference to FIG. 20. The description focuses on the differences from the embodiments above.

The sixth embodiment is characterized by how the plurality of analogy character strings and their probabilities 165 obtained in the above embodiments are used. For example, when an operator needs to make an entry while referring to the injury name in the second image 151, the central processing unit 100 displays the entry screen 1801 shown in FIG. 20 on the display device 110.

In the sixth embodiment, the feature lies in making effective use of, for example, the analogy character strings 165 and their probabilities obtained in the first embodiment. For example, the entry screen 1801 is displayed when the operator makes an entry while checking the correct character string against the injury name in the second image 151. The entry screen 1801 has an area 1802 that displays the second image 151. Within it, the injury name to be entered is marked with a highlight 1803, an enlarged view 1804 of that injury name is displayed at the upper right, and an entry area 1806 for the injury name is provided.

Specifically, the analogy character strings are displayed in descending order of probability, as in the following example, so that the correct character string is easy to identify. Suppose a plurality of analogy character strings and their probabilities have been obtained: the first is 右膝関節捻挫 (right knee joint sprain) with a probability of 50%, the second is 右肘関節捻挫 (right elbow joint sprain) with 30%, and the third is 右肩関節捻挫 (right shoulder joint sprain) with 20%. Display stops once the cumulative probability reaches 80%.

右膝関節捻挫 has the highest probability at 50%, and 右肘関節捻挫 the second highest at 30%. Their sum is 80%, so display stops there. That is, 右膝関節捻挫 (right knee joint sprain) 1901 and 右肘関節捻挫 (right elbow joint sprain) 1902 are displayed.
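The cumulative cutoff in this example can be sketched as below. The function name and the use of integer percentages are choices made for this illustration; the three candidates, their probabilities, and the 80% threshold are taken from the example.

```python
def candidates_to_display(cands: list[tuple[str, int]],
                          threshold: int = 80) -> list[tuple[str, int]]:
    """Pick candidates in descending order of probability (in percent)
    until their cumulative probability reaches the threshold."""
    shown = []
    total = 0
    for text, pct in sorted(cands, key=lambda c: c[1], reverse=True):
        shown.append((text, pct))
        total += pct
        if total >= threshold:
            break
    return shown

shown = candidates_to_display([
    ("右肩関節捻挫", 20),
    ("右膝関節捻挫", 50),
    ("右肘関節捻挫", 30),
])
```

Here `shown` holds the knee and elbow candidates, and the 20% shoulder candidate is not displayed.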

From the displayed 右膝関節捻挫 1901 and 右肘関節捻挫 1902, the operator selects the one that matches the enlarged injury name 1804 and presses the Enter key. The selected analogy character string is entered into the injury name entry area 1806 and the cursor moves to the next entry area. This shortens the time required for entry and increases the probability that the correct analogy character string appears among the candidates.
According to the sixth embodiment, the correct character string can be identified reliably with a very simple operation.

The misread correction method according to the present invention has been described above. Needless to say, it is not limited to the embodiments described and may be modified in various ways within the scope of the present invention.

An example of industrial use of the present invention will be described with reference to FIG. 21. For example, high accuracy is required when converting injury and disease names printed on medical fee claim forms and the like into data. For that reason, in the conventional process 2001, before the misread correction method of the present invention is applied, a first entry process 2012 and a second entry process 2013 are performed manually by different operators; a differing-data extraction 2014 compares the two entry results and extracts the data that do not match; and a third entry is then made for the mismatched data.

In the process 2002 after the misread correction method of the present invention is applied, the first entry 2012 is abolished, and the data output by the misread correction method of each embodiment is used in its place for the differing-data extraction. This greatly reduces the number of items that must be entered. There are many kinds of data, such as the names of prescribed drugs, whose conversion requires this level of accuracy, and double entry is widely practiced, so the present invention has a wide range of application.

Claims

A display means for displaying image information;
Image acquisition means for acquiring a processing target document as image information;
Correct character string acquisition means for displaying the necessary character string in the image information acquired by the image acquisition means on the display means and acquiring the input character string as a correct character string by confirming the display;
Recognizing means for recognizing characters by extracting necessary character strings in the image information obtained by the image obtaining means;
A classifier that pre-registers the character recognition result recognized by the recognizing unit in association with the corresponding correct character string acquired by the correct character string acquiring unit;
Analogizing means for analogizing a correct character string for a newly recognized recognition result with reference to the recognition results and correct character strings registered in association in the classifier,
A misread correction method characterized in that the recognition result registered in the classifier is a recognized character group corresponding to the necessary character string in the image information recognized by the recognizing means and is registered in association with the correct character string.

Further comprising a registration master in which correct character strings are registered in advance;
Determining means for determining whether or not the necessary character string recognition result recognized by the recognition means is a misrecognized character string based on whether or not the result is registered in the registration master;
Misrecognized character string extraction means for extracting a necessary character string recognition result that the determining means has determined to be a misrecognized character string;
The misread correction method according to claim 1, wherein the analogizing means analogizes a correct character string from the character string determined to be misrecognized.

Further comprising a registration master in which correct character strings are registered in advance;
Read means for determining whether or not a character string similar to the recognition result is registered in the registration master and reading out a similar character string;
The misreading correction method according to claim 1, wherein when a similar character string is not registered by the reading unit, registration of the recognition result recognized by the recognizing unit to the classifier is stopped.

Further comprising a registration master in which correct character strings are registered in advance;
Determination means for determining whether or not a character string similar to the recognition result is registered in the registration master;
A registration cancellation unit for canceling registration of the recognition result recognized by the recognition unit in the classifier when a character string similar to the recognition result is not registered in the registration master;
Reading means for reading out a similar character string when a character string similar to a character string newly recognized by the determination means is registered in the registration master;
2. The misread correction method according to claim 1, wherein the correct character string for the newly recognized character recognition result is inferred with reference to a recognition result and a correct character string registered in association with the classifier.

5. The method of correcting misreading according to claim 1, wherein the analogizing means analogizes a plurality of analogical character strings with probability.

The recognition means performs character recognition using a plurality of types of character recognition methods,
6. The misread correction method according to claim 1, wherein the classifier registers in advance, in association with each other, the recognition results obtained by the character recognition and the corresponding correct character strings acquired by the correct character string acquisition means.

An analogy string display means for displaying the analogy string estimated by the analogy means on the display means;
The misread correction method according to claim 1, wherein the recognition character string can be corrected with reference to the analogy character string displayed by the display unit.

The misread correction method according to claim 7, wherein the analogy string display means displays a plurality of analogy strings estimated by the analogy means in descending order of probability of being correct, and displays analogy strings until the sum of those probabilities reaches a certain value.