JPH04336684A

JPH04336684A - Method and device for document processing

Info

Publication number: JPH04336684A
Application number: JP3109226A
Authority: JP
Inventors: Yoji Furuya; 陽二古谷; Sadahiro Tanaka; 貞浩田中
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1991-05-14
Filing date: 1991-05-14
Publication date: 1992-11-24

Abstract

PURPOSE: To output text obtained through character recognition as synthesized speech by operating a character recognition device and a speech synthesizer from a single host computer. CONSTITUTION: Of the series of character recognition operations, the recognition of a single character image, which carries the heaviest load, is made independent under separate control and in a separate housing, while the remaining processing is performed on the main machine. In the figure, the part to the left of the dotted line shows the personal computer main body and the part to the right shows the equipment connected to it. A CPU 1 controls the entire system and makes decisions according to various programs stored in an external storage device 9. In particular, a character recognition device 21 and a speech synthesizer 22 connected to the personal computer can both be operated by the one host computer. Thus, by adding a commercially available speech output device and software to the recognition result text, sound is produced from a speaker and the character recognition result can be confirmed by voice.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to a document processing method and device that read and recognize character information on a document with an optical character reader (OCR) and output the text obtained as the recognition result as speech.

[0002]

[Prior Art] Conventionally, some character recognition systems performed the entire sequence of processing (reading the image on the original sheet, searching for lines, cutting out characters one at a time, recognizing the characters, evaluating and correcting the recognition results, and displaying them) on a single workstation, personal computer, or dedicated character recognition device.

[0003] Speech synthesis devices that take text as input, synthesize speech from it, and output the speech have also existed.

[0004]

[Problems to be Solved by the Invention] However, because the conventional examples above perform all of the character recognition work on a single machine, they have the following drawbacks.
(1) A heavy load is placed on the machine itself, so a high-performance, and therefore expensive, machine is required.
(2) Even with a high-performance machine, recognition speed is low; products that manage about 10 to 20 characters per second are typical.
(3) The machine itself becomes in effect a dedicated character recognition machine, so there is little freedom in the individual steps of character recognition processing. For instance, the user is in practice unable to improve the routine that reads the printed character image from the sheet or the routine that cuts out single-character images.
(4) As in (3), because the machine is in effect a dedicated character recognition machine, the user is in practice unable to, for example, add a speech output device for the recognition results.

[0005] Furthermore, in the conventional examples above, the program is fixed so that the host computer is used only for character recognition processing. The recognition result can therefore only be shown as-is on a display or the like, and conversion of the data is impossible.

[0006]

[Means for Solving the Problems (and Operation)] According to the present invention, within the series of character recognition operations, the most heavily loaded part, namely the recognition of a single character image, is made independent under separate control and in a separate housing, while the other processing is performed on the main machine as before. As a result of this separation: (1) the main machine can be a low-priced personal computer or workstation; (2) a recognition speed of 60 to 70 characters per second, two to three times the conventional speed, can be secured; (3) the processing on the main machine side, such as the program for reading the type image on the sheet, the program for cutting out single-character images, the program for evaluating and correcting recognition results, and the text display program, gains considerable freedom, since the user can combine commercially available software or write parts of it; and (4) a commercially available speech output device and software can be added to the recognition result text so that sound is produced from a speaker and the character recognition result can be confirmed by voice.

[0007]

[Embodiments] (Embodiment 1) A first embodiment is described below with reference to the drawings.

[0008] FIG. 1 is a system configuration diagram of this embodiment. The part to the left of the dotted line shows the main body of a personal computer, and the part to the right shows the various devices connected to it. Reference numeral 1 denotes a central processing unit (hereinafter, CPU), which controls the entire system and makes decisions according to the various programs stored in an external storage device 9. 2 is a CRT display for displaying data and text, 3 is a keyboard with various keys for entering character strings and the like, 4 is a mouse serving as a pointing device, and 5 is a memory. The memory 5 consists of a read-only memory (hereinafter, ROM) 6, a random access memory (hereinafter, RAM) 7, and a video RAM (hereinafter, VRAM) 8. The ROM 6 holds part of the operating system (hereinafter, OS), the software that performs basic input/output control, as well as character fonts for display. The RAM 7 is where the various programs held in the external storage device 9 are loaded and executed. For example, the image buffer used by the image reading program 10-2, the character recognition buffer used by the character cut-out program 10-3, the character recognition command program 10-4, and the post-processing program 10-5, and the speech synthesis buffer used by the speech synthesis program 10-6 are provided in the RAM 7. The VRAM 8 stores the display data to be shown on the CRT display 2.

[0009] The external storage device 9 holds various programs 10 for the processing shown in the flowcharts of FIGS. 6 to 22 described later, data storage files 11 and 12, dictionary files 13 to 14, and the like. In more detail, 10 is the character recognition and speech output software used in this embodiment; under the control of a main program 10-1, described later with the flowchart of FIG. 6, its various programs are started according to the user's instructions. 10-2 is a program that uses the scanner 20 to capture the image on a sheet into the image buffer 7-1 in the RAM 7, and is described later with the flowchart of FIG. 8. The character cut-out program 10-3 cuts the image of a single character out of the image in the image buffer 7-1, and is described later with the flowchart of FIG. 10. 10-4 is the character recognition command program, which sends the image information of one cut-out character to the character recognition device 21 and has it execute recognition of that character; it is described later with the flowchart of FIG. 9. 10-5 is the post-processing program, which evaluates the text after character recognition and, for example, when there are several character candidates, makes the candidate judged more likely to be correct the first candidate; it is described later with the flowchart of FIG. 11. 10-6 is the speech synthesis program, which converts the text completed by character recognition from kanji to kana to obtain its reading, adds accent and intonation information, sends the result to the speech synthesizer 22, and has it spoken through the speaker 23; it is described later with the flowchart of FIG. 12. In doing so, the speech synthesis program 10-6 uses the kanji-kana conversion dictionary file 13, the accent dictionary file 14, and the intonation rule file 15 to convert kanji into readings and to add accent and intonation information. 10-7 is the save program, which saves the image data in the image buffer 7-1 to the image file 11, or saves the recognition result text data in the character recognition buffer 7-2 to the text file 12; it is described later with the flowchart of FIG. 13. Conversely, it also reads the data in the image file 11 or the text file 12 into the image buffer 7-1 or the character recognition buffer 7-2. In addition, it prints the image data in the image buffer 7-1 and the text data in the character recognition buffer 7-2 on paper using the printer 24.

[0010] FIG. 2 is a system configuration diagram of the character recognition device 21. The device is connected to the personal computer main body through a connection section 34, but the device as a whole is controlled by a CPU 30. 31 is a memory consisting of a ROM 32 and a RAM 33. The ROM 32 holds a character recognition program 32-1 and the "line segment set data of all characters" 32-2 used by the character recognition routine. During recognition, the RAM 33 holds the image data 33-1 for one character image and the recognition result data 33-2. The procedure is as follows. When a recognition start command arrives from the personal computer main body, the character recognition program 32-1 first takes in the image of one character and sets it in the RAM 33 as image data 33-1. After various processing, it compares the result with the "line segment set data of all characters" 32-2, arranges the most probable character candidates with their scores, and sets them as recognition result data 33-2. It then returns the recognition result data 33-2 to the personal computer main body. The details are described later with the flowchart of FIG. 7.

[0011] FIG. 3(a) illustrates the format of the image data 33-1. It shows a glyph that looks like the character "力" set in a 16 × 16 dot image.

[0012] FIG. 3(b) shows an example of the recognition result data obtained after character recognition of the image data 33-1. The leftmost column is the candidate rank, running from 1st to 8th; the second column gives the code of each candidate character as a JIS kuten (row-cell) code; and the last column gives each candidate's score. In this embodiment the highest score is 10 and the lowest is 1. The score is determined by comparison with the "line segment set data of all characters". The eight highest-scoring characters, in descending order of score, are set as the recognition result data 33-2. In the example of FIG. 3(b), the first candidate is the katakana "カ" (ka), the second is the kanji "力" (chikara, "power"), the third is the small katakana "ヵ", and the eighth is the kanji "刀" (katana, "sword").
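The three-column layout of the recognition result data can be illustrated with a short sketch. It is only an assumption about how such a record might be modeled: the names `Candidate`, `to_kuten`, `make_result`, and `format_row` and the scores used are hypothetical. The kuten pair is derived from Python's standard `euc_jp` codec, in which each byte of a JIS X 0208 character is 0xA0 plus the row or cell number.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


def to_kuten(ch: str) -> Tuple[int, int]:
    """Return the JIS X 0208 (row, cell) pair of one character."""
    raw = ch.encode("euc_jp")
    # JIS X 0208 characters are two bytes in EUC-JP, both 0xA1 or above.
    if len(raw) != 2 or raw[0] < 0xA1:
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    return raw[0] - 0xA0, raw[1] - 0xA0


@dataclass
class Candidate:
    rank: int                # 1 (best) to 8
    kuten: Tuple[int, int]   # JIS row-cell code of the candidate
    score: int               # 1 (lowest) to 10 (highest)


def make_result(ranked: Iterable[Tuple[str, int]]) -> List[Candidate]:
    """Build result rows from (character, score) pairs already in rank order."""
    return [Candidate(rank, to_kuten(ch), score)
            for rank, (ch, score) in enumerate(ranked, start=1)]


def format_row(c: Candidate) -> str:
    """Render one row as: rank, kuten code, score."""
    return f"{c.rank}  {c.kuten[0]:02d}-{c.kuten[1]:02d}  {c.score}"


if __name__ == "__main__":
    # Hypothetical scores; only the candidate order follows the text.
    for row in make_result([("カ", 9), ("力", 9), ("ヵ", 7)]):
        print(format_row(row))
```

Deriving the code from a codec rather than typing it by hand keeps the example free of hand-copied table values.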

[0013] FIG. 4 shows an example of the screen that the character recognition and speech output software 10 displays on the CRT display 2. 40 is the screen frame, 41 is the title, and 42 is the end mark. When the user moves the mouse 4 to place the mouse cursor 55 on the end mark 42 and clicks the button of the mouse 4, the character recognition and speech output software 10 terminates and the display returns to the OS prompt. 43 is the menu bar, in which the menus "Read" (読み込み), "Recognize" (認識), "Speech" (音声), and "Save" (保存) are lined up. A menu is selected in the same way as the end mark 42: place the mouse cursor 55 on the desired menu and click the button of the mouse 4. Clicking the button sends the command indicated by the mouse cursor at that moment to the CPU. The "Read" menu starts the image reading program 10-2 and captures the image on the sheet into the personal computer through the scanner 20. At the same time, the captured image is shown in the read-image display 44. The "Recognize" menu starts the character cut-out program 10-3, the character recognition command program 10-4, and the post-processing program 10-5 to begin character recognition. The user must first define the target area for character recognition on the read image 44 on the CRT 2. To do so, the user places the mouse cursor 55 on the upper-left corner of the target area, presses the mouse button, moves the cursor while holding the button down, and releases the button at the lower-right corner. A frame 46 marking the recognition target area is then displayed. When the user then selects the "Recognize" menu, the recognition result appears in the result display 45, and the character recognition operation is complete.

[0014] The "Speech" menu has the recognition result text 45 spoken aloud using the speech synthesizer 22 and the speaker 23. The user first uses the mouse cursor 55 to define the target area 47 for speech output, in the same way as the target area 46 for character recognition was defined. When the user then selects the "Speech" menu, the speech synthesis program is started and the text within the output target area 47 is spoken automatically from the speaker 23.

[0015] Selecting the "Save" menu starts the save program 10-7, and a menu of finer-grained choices appears. From it the user can choose to save the read image 44 to a file, save the result text 45 to a file, read an image from the image file 11 and set it as the displayed read image 44, read text data from the text file 12 and set it as the displayed result text, print the image, or print the result text.

[0016] 49 and 48 are the next-page and previous-page buttons used when the image data is large and spans several pages. 50 is the page number display. Likewise, 52 and 51 are the next-page and previous-page buttons used when the result text data is large and spans several pages, and 53 is its page number display.

[0017] 56 is the character cursor. The user can also edit the result text with the keyboard 3, for example deleting the character at the position of the character cursor 56, inserting a character string, or moving text. If an error occurs in character recognition, the text can be corrected in this way.

[0018] FIG. 5 shows an example of the format of the recognition result text in the character recognition buffer 7-2. Character codes are written in order from the beginning. For the character "カ", however, there were candidates with the same recognition score, so the katakana "カ" and the kanji "力" (chikara) are stored side by side. To mark in the text that two candidates are present, they are enclosed between a start mark code and an end mark code and separated by a delimiter mark code, with the character codes of the katakana "カ" and the kanji "力" embedded in between. What is shown on the display as the result text 45 is, in this case, the katakana "カ", which appears first and is therefore the first candidate.
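The buffer format described for FIG. 5 can be sketched as follows. The text does not give the values of the start, delimiter, and end mark codes, so the three control characters below are placeholders chosen purely for illustration, and the function names `parse_buffer` and `display_text` are likewise hypothetical.

```python
from typing import List

# Hypothetical mark codes; the actual values are not given in the text.
START, SEP, END = "\x02", "\x1f", "\x03"


def parse_buffer(text: str) -> List[List[str]]:
    """Split buffer text into one candidate list per character position."""
    positions: List[List[str]] = []
    i = 0
    while i < len(text):
        if text[i] == START:
            j = text.index(END, i)                    # matching end mark
            positions.append(text[i + 1:j].split(SEP))
            i = j + 1
        else:
            positions.append([text[i]])               # single, unmarked character
            i += 1
    return positions


def display_text(text: str) -> str:
    """Return what would be shown: the first candidate at every position."""
    return "".join(cands[0] for cands in parse_buffer(text))


if __name__ == "__main__":
    buffer_text = "ア" + START + "カ" + SEP + "力" + END + "イ"
    print(parse_buffer(buffer_text))
    print(display_text(buffer_text))
```

Keeping the alternatives inline lets later stages, such as post-processing or printing, decide which candidate to use without a second data structure.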

[0019] Next, the flow of processing in this embodiment is described using flowcharts.

[0020] FIG. 6 is a flowchart of the operation of the main program 10-1. In essence, it shows that each processing program is started by the user's menu selection. The operation was outlined with FIG. 4, so a detailed description is omitted here. For the "Read" menu selection in step S2, the "Recognize" menu selection in step S6, the "Speech" menu selection in step S8, and the "Save" menu selection in step S12, the main program only starts the corresponding processing program, whereas the area designation in step S4, the editing in step S10, and the page change in step S14 are handled by the main program 10-1 itself.

[0021] FIG. 7 is a flowchart of the operation of the character recognition program 32-1 in the ROM 32 of the character recognition device 21. The program starts when the character recognition device 21 is powered on and keeps running until it is powered off.

[0022] First, step S20 checks whether the power has been turned off; if so, the program ends. If not (the power is on), step S21 checks whether a command to start character recognition has arrived from the personal computer main body; if not, control returns to just before step S20, forming a loop. If the answer in step S21 is YES, the program proceeds to step S22, takes in the image of one character from the personal computer main body, and sets it in the RAM 33 as image data 33-1. In step S23 it extracts from the image data 33-1 the contour, that is, the outline, of the black area corresponding to the printed glyph. In step S24 it creates line segments, that is, vectors, along the outline, and in step S25 it obtains the set of those line segments. In step S26 it compares this line segment set with the line segment set data 32-2 of all characters in the ROM 32 in terms of how well the vectors match, and in step S27 it builds the recognition result data 33-2 from the first through eighth candidates in descending order of match score. In step S28 it transfers the recognition result data 33-2 to the personal computer main body, completing the sequence. Control then returns to step S20, and the program loops, waiting for the next recognition start command from the personal computer main body.
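Steps S26 and S27 (compare against every reference character, then keep the eight best) can be sketched as below. The scoring function here is a deliberately simple stand-in, a set-overlap ratio mapped onto the 1 to 10 range; it is not the matching method used by the device, and the segment representation and all names are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Segment = Tuple[int, int, int, int]   # (x0, y0, x1, y1), a hypothetical encoding


def match_score(segments: Set[Segment], reference: Set[Segment]) -> int:
    """Toy similarity: share of segments in common, mapped onto 1..10."""
    union = segments | reference
    overlap = len(segments & reference) / len(union) if union else 0.0
    return 1 + round(9 * overlap)


def rank_candidates(segments: Set[Segment],
                    references: Dict[str, Set[Segment]],
                    limit: int = 8) -> List[Tuple[str, int]]:
    """Score the input against every reference character (S26) and
    return the top `limit` candidates in descending score order (S27)."""
    scored = [(ch, match_score(segments, ref)) for ch, ref in references.items()]
    scored.sort(key=lambda item: item[1], reverse=True)   # stable for ties
    return scored[:limit]


if __name__ == "__main__":
    refs = {
        "A": {(0, 0, 1, 1), (1, 1, 2, 2)},
        "B": {(5, 5, 6, 6)},
    }
    print(rank_candidates({(0, 0, 1, 1), (1, 1, 2, 2)}, refs))
```

Because the sort is stable, candidates with equal scores keep the order in which the reference data lists them, which is one simple way to make ties deterministic.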

[0023] This character recognition method is a known technique described in various information processing publications as the "pseudo-Bayes discriminant function method", so further details are omitted.

[0024] FIG. 8 is a flowchart of the operation of the image reading program 10-2. When started by the main program 10-1, it reads the image on the sheet from the scanner 20 in step S30, sets the image data in the image buffer 7-1 in the RAM 7 in step S31, and displays the image on the CRT display 2 as shown at 44 in FIG. 4 in step S32. The program then returns, and control passes back to the main program 10-1.

[0025] FIG. 9 is a flowchart of the operation of the character recognition command program 10-4. When started by the main program 10-1, it first checks whether the target area for character recognition has been defined. If not, it returns immediately. If so, it proceeds to step S41 and starts the character cut-out program 10-3. When the character cut-out program 10-3 has finished, it starts the post-processing program 10-5 in step S42, waits for that to finish, and then returns control to the main program 10-1.

[0026] FIG. 10 is a flowchart of the operation of the character cut-out program 10-3. When started by the character recognition command program 10-4, it cuts out the first single-character image in the recognition target area in step S50, and in step S51 sends that image to the character recognition device 21 and commands it to start recognition. In step S52 it receives the recognition result from the character recognition device 21, and in step S53 it appends the character code of the first candidate in the recognition result data to the character recognition buffer 7-2. In step S54, if there was a candidate with the same score, it also appends the character code of the second candidate to the character recognition buffer 7-2. In that case the character codes of the first and second candidates are stored in the format described with FIG. 5, embedded between the start mark, delimiter mark, and end mark. In step S55 it cuts out the next single-character image in the recognition target area, and in step S56 it checks whether there was a next image. If YES, control returns to just before step S51 and the above processing is repeated. If NO, the program returns, and control passes back to the character recognition command program 10-4.
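The loop of steps S50 to S56 can be sketched as follows. This is a minimal illustration under stated assumptions: the `recognize` callable stands in for the round trip to the character recognition device 21, the character images are opaque values, and the three mark codes are placeholders because the text does not give their values.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical mark codes; the actual values are not given in the text.
START, SEP, END = "\x02", "\x1f", "\x03"

# A recognizer returns (character, score) pairs, best candidate first.
Recognizer = Callable[[object], List[Tuple[str, int]]]


def recognize_region(char_images: Iterable[object], recognize: Recognizer) -> str:
    """Build the recognition buffer text for a sequence of character images."""
    buffer: List[str] = []
    for image in char_images:                      # S50, S55, S56: next image
        results = recognize(image)                 # S51, S52: send and receive
        first_char, first_score = results[0]
        tied = len(results) > 1 and results[1][1] == first_score
        if tied:                                   # S54: keep the second candidate too
            buffer.append(START + first_char + SEP + results[1][0] + END)
        else:                                      # S53: first candidate only
            buffer.append(first_char)
    return "".join(buffer)


if __name__ == "__main__":
    # A fake recognizer keyed by image label, for demonstration only.
    fake = {
        "img1": [("ア", 10), ("マ", 4)],
        "img2": [("カ", 9), ("力", 9), ("ヵ", 7)],
    }
    print(repr(recognize_region(["img1", "img2"], fake.__getitem__)))
```

Passing the recognizer in as a function mirrors the separation described in the text: the host-side loop does not need to know how the device produces its candidates.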

[0027] FIG. 11 is a flowchart of the operation of the post-processing program 10-5. When started by the character recognition command program 10-4, it searches the character recognition buffer in step S60 for a position where a first and a second candidate are stored side by side, and step S61 judges whether such a position was found. If NO, the program returns. If YES, it proceeds to step S62: if the position is a single character with katakana immediately before and after it, and one of the candidates is a katakana character, that candidate is made the first candidate. Next, in step S63, if the position is a single character with hiragana immediately before and after it, and one of the candidates is a hiragana character, that candidate is made the first candidate. Next, in step S64, if the position is a single character with kanji and hiragana before and after it, and one of the candidates is a kanji, that candidate is made the first candidate. In step S65 the character recognition buffer is searched again for the next position where a first and a second candidate are stored side by side, and the result is checked in step S66. If YES, control returns to just before step S62 and steps S62, S63, S64, and S65 are repeated. If NO in step S66, the program proceeds to step S67, displays the text with the revised candidates on the screen, and returns.
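The context rules of steps S62 to S64 can be sketched as below. Several points are assumptions made for illustration: the buffer is represented as one candidate list per character position, the script of a neighbor is judged from its first candidate, script classes are decided by Unicode block ranges, and the S64 condition is read as kanji on one side and hiragana on the other. All function names are hypothetical.

```python
from typing import List


def script_of(ch: str) -> str:
    """Classify a character by Unicode block (a simplification)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


def promote(candidates: List[str], wanted: str) -> None:
    """Move the first candidate of the wanted script to the front, if any."""
    for i, ch in enumerate(candidates):
        if script_of(ch) == wanted:
            candidates.insert(0, candidates.pop(i))
            return


def postprocess(positions: List[List[str]]) -> List[List[str]]:
    """Reorder tied candidates according to the neighboring characters."""
    out = [list(p) for p in positions]            # leave the input untouched
    for i in range(1, len(out) - 1):
        if len(out[i]) < 2:
            continue                              # no tie at this position
        before, after = script_of(out[i - 1][0]), script_of(out[i + 1][0])
        if before == after == "katakana":         # S62
            promote(out[i], "katakana")
        elif before == after == "hiragana":       # S63
            promote(out[i], "hiragana")
        elif {before, after} == {"kanji", "hiragana"}:   # S64 (assumed reading)
            promote(out[i], "kanji")
    return out


if __name__ == "__main__":
    print(postprocess([["ア"], ["力", "カ"], ["イ"]]))
    print(postprocess([["協"], ["カ", "力"], ["し"]]))
```

Positions at the very start or end of the text have only one neighbor and are left unchanged in this sketch; the text does not say how such cases are handled.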

[0028] FIG. 12 is a flowchart illustrating the operation of the speech synthesis program 10-6. This program is started from the main program 10-1. First, in step S70, it checks whether the speech output region has been fixed; this is the frame 47 described with reference to FIG. 4. If NO, the program returns. If YES, in step S71, the first sentence is taken from the fixed region into the speech synthesis buffer 7-3. Then, in step S72, kanji-kana conversion is performed, turning the original mixed kanji-kana sentence into a sentence consisting only of kana readings divided into clauses. This uses the kanji-kana conversion dictionary file 13, which holds the correspondence between kanji words and their kana readings. Clause boundaries are determined by the two-clause longest-match method applied to the readings, so that the reading of each clause is as long as possible. Next, in step S73, accent information is added to each word of the sentence, using the accent dictionary file 14, which holds the correspondence between words and accents. Next, in step S74, the intonation of the sentence is adjusted; for example, if the sentence is a question, information such as raising the pitch at the end of the sentence is added. This uses the intonation rule file 15, which holds rules relating sentence structure to intonation. In step S75, it is checked whether the speech synthesizer 22 is currently speaking; if so, the program waits until it finishes. The process then moves to step S76, where the reading, accent, and intonation information obtained in steps S71, S72, and S73 is sent to the speech synthesizer 22 together with a command to speak. In step S77, the next sentence is taken out and set in the speech synthesis buffer 7-3. In step S78, it is checked whether there was a next sentence; if YES, the process returns to immediately before step S72 and repeats steps S72 through S77. If NO, the speech synthesis program 10-6 returns.
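The flow of FIG. 12 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `RecordingSynthesizer` stands in for the speech synthesizer 22, the two dictionaries stand in for files 13 and 14, and a plain greedy longest match replaces the two-clause longest-match method, which the text does not specify in detail. The intonation rule is reduced to the single question-mark example given above.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RecordingSynthesizer:
    """Hypothetical stand-in for speech synthesizer 22; it only records requests."""
    spoken: list = field(default_factory=list)

    def is_speaking(self) -> bool:
        return False  # a real device would report whether it is still busy

    def speak(self, reading, accents, intonation):
        self.spoken.append((reading, accents, intonation))


def split_words(sentence, kana_dict):
    """Greedy longest match against the dictionary; unknown characters stand alone."""
    words, i = [], 0
    longest = max((len(w) for w in kana_dict), default=1)
    while i < len(sentence):
        for n in range(min(longest, len(sentence) - i), 0, -1):
            piece = sentence[i:i + n]
            if n == 1 or piece in kana_dict:
                words.append(piece)
                i += n
                break
    return words


def speak_region(sentences, synth, kana_dict, accent_dict):
    if not sentences:                                    # S70: no region fixed
        return
    for sentence in sentences:                           # S71, S77, S78
        words = split_words(sentence, kana_dict)         # S72: segment
        reading = "".join(kana_dict.get(w, w) for w in words)   # S72: kana reading
        accents = [accent_dict.get(w, 0) for w in words]        # S73: accent per word
        # S74: toy intonation rule, questions rise at the end
        intonation = "rising" if sentence.endswith(("?", "？")) else "falling"
        while synth.is_speaking():                       # S75: wait for the device
            time.sleep(0.01)
        synth.speak(reading, accents, intonation)        # S76: command speech


synth = RecordingSynthesizer()
speak_region(["文字認識です？"], synth,
             kana_dict={"文字": "もじ", "認識": "にんしき"},
             accent_dict={"文字": 1})
```

After the call, `synth.spoken` holds one entry with the reading, the per-word accent values, and the intonation label for the sentence.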

[0029] FIG. 13 is a flowchart for explaining the operation of the save program 10-7. When the save program 10-7 is started from the main program 10-1, a small window listing the various save menu items for the user to choose from is first opened. FIG. 13 describes the processing performed when the user selects an item from that menu. If the user selects "Exit" in step S80, the small menu window is closed and the save program ends. If the user selects "Save text" in step S81, the recognition result data 45 in the character recognition buffer 7-2 is saved to the text file 12 in step S82. If the user selects "Save image" in step S83, the read image data 44 in the image buffer 7-1 is saved to the image file 11 in step S84. If the user selects "Load text" in step S85, the result data in the text file 12 is read and set in the character recognition buffer 7-2 in step S86. If the user selects "Load image" in step S87, the data in the image file 11 is read and set in the image buffer 7-1 in step S88. If the user selects "Print text" in step S89, the result data 45 in the character recognition buffer 7-2 is printed on the printer 24 in step S90. Where a first candidate and a second candidate are juxtaposed for some character, only the first candidate is printed. If the user selects "Print image" in step S91, the data in the image buffer 7-1 is printed on the printer 24. After each of steps S82, S84, S86, S88, S90, and S92 finishes, the process returns to immediately before step S80, forming a loop that waits for the user's next menu selection.
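The menu loop of FIG. 13 amounts to a dispatch table that is consulted until "Exit" is chosen. The following Python sketch shows that shape under stated assumptions: the `state` dictionary, its keys, and the menu strings are hypothetical stand-ins for the buffers 7-1 and 7-2, the files 11 and 12, and the printer 24, and a character slot is modelled as a list of candidates with the first candidate at index 0.

```python
def first_candidates(text_buffer):
    """Keep only the first candidate of each character slot (as when printing in S90)."""
    return "".join(slot[0] for slot in text_buffer)


def run_save_menu(selections, state):
    """Apply menu selections in order until "exit" is chosen (S80)."""
    handlers = {
        "save text": lambda: state.update(text_file=state["text_buffer"]),     # S82
        "save image": lambda: state.update(image_file=state["image_buffer"]),  # S84
        "load text": lambda: state.update(text_buffer=state["text_file"]),     # S86
        "load image": lambda: state.update(image_buffer=state["image_file"]),  # S88
        "print text": lambda: state["printed"].append(
            first_candidates(state["text_buffer"])),                           # S90
        "print image": lambda: state["printed"].append(state["image_buffer"]), # S92
    }
    for choice in selections:
        if choice == "exit":      # S80: close the menu window and finish
            break
        handlers[choice]()        # then loop back and wait for the next selection
    return state


state = {
    "text_buffer": [["刈", "メ"], ["る"]],  # first slot has two juxtaposed candidates
    "image_buffer": b"\x00\x01",
    "text_file": None,
    "image_file": None,
    "printed": [],
}
run_save_menu(["save text", "print text", "exit", "save image"], state)
```

Selections after "exit" are ignored, so in this run the image file stays empty while the text is saved and printed with first candidates only.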

[0030] With this embodiment, as shown in FIG. 4, the image before character recognition and the resulting text after character recognition are displayed side by side on one screen, which makes it easy for the user to check the recognition result.

[0031] Because the character recognition result can also be output as speech, the recognition result is easy to check in this respect as well. Moreover, any erroneous part of the recognition result can be corrected directly using the keyboard.

[0032] Multi-page processing is also provided, so large amounts of data can be handled.

[0033] (Embodiment 2) Next, an embodiment is described in which character recognition derives a plurality of candidate characters with scores, and the character cut-out region is changed when the score is low.

[0034] FIG. 14 is an explanatory diagram of the character cut-out processing. Part (a) of FIG. 14 is explained using the hiragana character "ほ" (ho). In Embodiment 1, the character cut-out program 10-3 was responsible for cutting the image of one character out of the page image. If, for example, only the left half of "ほ" was cut out by mistake, Embodiment 1 had no choice but to cut out the right half of "ほ" as the next one-character image and carry on with character recognition. As a result, "1" and "ま" (ma) were output as the recognition result. In Embodiment 2, the score is checked at the point when the recognition result for the left half of "ほ" is output. The score here is the one shown in part (b) of FIG. 3 for Embodiment 1. If this score is, for example, less than 8 points, the character image is assumed to have been cut out incorrectly; the right-half image is then added to the left half of "ほ", and character recognition is performed again on the combined image as a single character. The correct single character "ほ" is then obtained as the recognition result.

[0035] Part (b) of FIG. 14 is a corresponding example for the kanji "刈" (kari), analogous to "ほ" in part (a). Recognizing the left half of "刈" gave a low score, so the right half of "刈" was merged with it, character recognition was performed again with "刈" as a single character image, and the single character "刈" was obtained.

[0036] FIGS. 15 and 16 are flowcharts illustrating the operation of the character cut-out program in this case. As in Embodiment 1, the character cut-out program 10-3 is started by the character recognition command program 10-4. In this embodiment, once started, the program first cuts out, in step S100, the image of the first single character in the region subject to character recognition. Next, in step S101, it sends the one-character image to the character recognition device 21 and commands it to start character recognition, and in step S102 it receives the recognition result. In step S103, the score of the recognition result is checked, and if it meets the prescribed level the process jumps to step S110. The "prescribed level" here is, for example, "YES if the score is 8 or more out of 10". In step S110, the character code of the first-ranked candidate is added to the character recognition buffer 7-2, and if there is a candidate with the same score, the character code of the second-ranked candidate is also added to the character recognition buffer 7-2. Then, in step S112, the next one-character image is cut out from the recognition target region, and in step S113 it is checked whether there was a next one-character image; if not, the character cut-out program returns. If there was, the process returns to immediately before step S101, forming a loop, and repeats steps S101 through S112. Steps S100, S101, S102, S110, S111, S112, and S113 are exactly the same as in Embodiment 1; the part specific to this embodiment is steps S103 through S109.

[0037] If the score is low and the answer in step S103 is NO, the process moves to step S104, where the next one-character image is added to the one-character image just analyzed to form a new one-character image. In step S105, this new one-character image is sent to the character recognition device 21 with a command to start character recognition, and in step S106 the recognition result is received. In step S107, the score is compared with that of the previous recognition result; if it is higher, the process proceeds directly to step S110 and onward. If it is not higher, in step S108 the portion added to the one-character image is withdrawn and redefined as an unanalyzed image, and in step S109 the previously received recognition result is adopted as the official recognition result. The process then continues from step S110.

[0038] Steps S108 and S109 provide for the case where the merged one-character image is not in fact a single correct character.
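The score-driven merging of steps S103 through S109 can be sketched in Python as follows. This is only an illustration under assumptions: image slices are modelled as strings so that merging is simple concatenation, `recognize` is a hypothetical callable returning a (character, score) pair in place of the character recognition device 21, and the handling of same-score second candidates in step S110 is left out.

```python
THRESHOLD = 8  # the "8 or more out of 10" example from the text


def recognize_with_merge(pieces, recognize):
    """pieces: one-character image slices; recognize(image) -> (character, score)."""
    results = []
    i = 0
    while i < len(pieces):                               # S100 / S112 / S113
        image = pieces[i]
        char, score = recognize(image)                   # S101-S102
        consumed = 1
        if score < THRESHOLD and i + 1 < len(pieces):    # S103: NO, score too low
            merged = image + pieces[i + 1]               # S104: add the next slice
            m_char, m_score = recognize(merged)          # S105-S106
            if m_score > score:                          # S107: merged result is better
                char, score, consumed = m_char, m_score, 2
            # otherwise S108-S109: the added slice stays unanalysed and the
            # earlier result is kept as the official one
        results.append(char)                             # S110 (first candidate only)
        i += consumed
    return "".join(results)


# Toy recognizer: "L" and "R" play the left and right halves of "ほ".
TABLE = {"A": ("あ", 9), "L": ("1", 3), "R": ("ま", 4), "LR": ("ほ", 9)}


def toy_recognize(image):
    return TABLE.get(image, ("?", 0))


merged_case = recognize_with_merge(["A", "L", "R"], toy_recognize)
fallback_case = recognize_with_merge(["L", "A"], toy_recognize)
```

In the first run the two halves are merged and read as one character; in the second the merged image scores no better, so the earlier low-score result is kept and the following slice is recognized on its own.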

[0039] In both Embodiment 1 and Embodiment 2, the data format in the character recognition buffer is the one shown in FIG. 5; that is, each character candidate was always exactly one character. In contrast, FIG. 17 shows an example format that, to account for different ways of cutting out characters, holds both results: the result of recognizing the left-half image and the right-half image of a character separately, and the result of recognizing the merged left and right halves. In this case, the candidate analyzed as "メ" (me) and "リ" (ri) and the candidate analyzed as "刈" (kari) are juxtaposed. Likewise, the candidate "女" and "子" and the candidate "女子" are juxtaposed. In other words, a character candidate is not necessarily one character; it may be two. This is the difference from Embodiments 1 and 2.
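One way to picture a buffer whose candidates may be one or two characters long is a list of slots, each holding alternative strings. The Python sketch below is an assumed in-memory representation for illustration only; the actual layout of FIG. 17 is not given in the text, and the order of the alternatives within a slot is arbitrary here.

```python
from itertools import product

# Each slot lists the juxtaposed alternatives for one stretch of the image.
# An alternative may be a single character or two characters.
candidate_buffer = [["刈", "メリ"], ["る"]]


def possible_readings(slots):
    """Enumerate every text obtainable by picking one alternative per slot."""
    return ["".join(choice) for choice in product(*slots)]


all_readings = possible_readings(candidate_buffer)
```

Keeping both segmentations in the buffer lets later processing, such as the candidate swapping of Embodiment 3, choose between them.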

[0040] (Embodiment 3) Embodiment 1 described using kanji-kana conversion, at speech output time, to turn the mixed kanji-kana sentences of the recognition result text into reading-only sentences. Here, an example is described in which it is used to swap character candidates in recognition result text that contains character candidates.

[0041] FIG. 18 is a flowchart for explaining a character candidate swapping program that uses this method. This program runs inserted immediately after step S42, "start post-processing program", in the character recognition command program described with reference to FIG. 9 of Embodiment 1.

[0042] Once started, the program first performs kanji-kana conversion in step S120 to obtain a reading-only text, and in step S121 divides it into clauses by the two-clause longest-match method applied to the readings. In step S122, it searches for a part where single-kanji clauses occur consecutively; if step S123 finds none, the program returns. If there is one, step S124 checks whether that part has character candidates from character recognition. If NO, the process jumps to step S129, searches for the next part where single-kanji clauses occur consecutively, and returns to immediately before step S123, forming a loop. If YES in step S124, the second candidate of the recognized character is made the first candidate in step S125, and in step S126 the several clauses before and after it are divided into clauses again by the two-clause longest-match method. Step S127 checks whether the run of single-kanji clauses has disappeared; if YES, the process jumps to step S129. If NO, step S128 undoes the swap, restoring the original first candidate, and the process moves to step S129 and continues as described above.

[0043] In outline, a part where single-kanji clauses occur consecutively is presumed to be a place where character recognition has failed; the character candidates are swapped and the text is analyzed again, and if the run of single kanji disappears, the new candidate is formally adopted.
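The heuristic of FIG. 18 can be sketched in Python under several simplifying assumptions. A plain greedy longest match against a small word set (`lexicon`, hypothetical) replaces kanji-kana conversion with two-clause longest-match segmentation; each slot holds at most two single-character candidates, so slot index equals character index; and the whole sentence is re-segmented after a swap rather than only the neighbouring clauses.

```python
def is_kanji(ch):
    return "\u4e00" <= ch <= "\u9fff"   # CJK Unified Ideographs block only


def segment(text, lexicon):
    """Greedy longest match; anything not in the lexicon is a one-character clause."""
    clauses, i = [], 0
    longest = max((len(w) for w in lexicon), default=1)
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                clauses.append(piece)
                i += n
                break
    return clauses


def suspect_positions(clauses):
    """Character indices lying in a run of two or more single-kanji clauses."""
    single = [len(c) == 1 and is_kanji(c) for c in clauses]
    positions, pos = set(), 0
    for k, clause in enumerate(clauses):
        prev_single = k > 0 and single[k - 1]
        next_single = k + 1 < len(clauses) and single[k + 1]
        if single[k] and (prev_single or next_single):
            positions.add(pos)
        pos += len(clause)
    return positions


def swap_candidates(slots, lexicon):
    slots = [list(s) for s in slots]   # work on a copy

    def text():
        return "".join(s[0] for s in slots)

    for i, cands in enumerate(slots):
        if len(cands) < 2:                                        # S124: no candidate
            continue
        if i not in suspect_positions(segment(text(), lexicon)):  # S122-S123
            continue
        cands[0], cands[1] = cands[1], cands[0]                   # S125: promote 2nd
        if i in suspect_positions(segment(text(), lexicon)):      # S126-S127
            cands[0], cands[1] = cands[1], cands[0]               # S128: undo the swap
    return text()


fixed = swap_candidates([["犬", "大"], ["学"], ["に"], ["行"], ["く"]], {"大学"})
unchanged = swap_candidates([["犬", "太"], ["学"]], {"大学"})
```

In the first call the swap turns two isolated kanji into a dictionary word, so it is kept; in the second the alternative does not help and the original first candidate is restored.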

[0044] (Embodiment 4) This embodiment describes another example of post-processing applied to the text resulting from character recognition, similar to Embodiment 3.

[0045] In this embodiment, the subject-predicate and object-predicate relationships in the text are examined, and where a combination is judged to be wrong, that part is underlined on the display. This can be regarded as a simple form of semantic processing of Japanese text, and it is a common technique used in commercially available word processors from various manufacturers to choose among homophones in kana-kanji conversion. It is usually advertised under names such as "AI conversion" or "example-based conversion".

[0046] FIG. 19 is a flowchart for explaining this semantic processing program. The program runs inserted immediately before the return of the character recognition command program 10-4 described with reference to FIG. 9 of Embodiment 1. First, in step S130, subject-predicate links between clauses are searched for, and in step S131, any part that violates the subject-predicate combination rules is underlined in the image display 44 and the result display 45. Then, in step S132, object-predicate links between clauses are searched for, and in step S133, any part that violates the object-predicate combination rules is underlined in the image display 44 and the result display 45, after which the program returns.

[0047] FIG. 20 shows an example of a sentence judged to be invalid by the semantic processing program of FIG. 19.

[0048] (Embodiment 5) This embodiment describes an example in which, when a user listening to the speech output judges that a sentence is wrong and presses the ESC key on the keyboard, the corresponding sentence shown on the display is underlined. FIGS. 21 and 22 show the speech synthesis program 10-6 as modified for this embodiment. The other parts may be the same as in Embodiment 1. First, step S140 checks whether the speech output region has been fixed; if NO, the program returns. If YES, step S141 takes the first sentence of the speech output region into the speech synthesis buffer 7-3, step S142 obtains the kana reading by kanji-kana conversion and divides it into words, step S143 adds accent information to each word using the accent dictionary, and step S144 adjusts the intonation of the whole sentence according to the intonation rules. Step S145 then checks whether the speech synthesizer 22 is speaking; once speech has finished, the process moves to step S146, where the reading, accent, and intonation information is sent to the speech synthesizer 22 with a command to speak. At that point, step S147 checks whether the user has pressed the ESC key on the keyboard; if YES, step S151 underlines the sentence being spoken in the image display 44 and the result display 45. Then, step S152 takes the next sentence into the speech synthesis buffer 7-3, and step S153 checks whether there was a next sentence; if NO, the program returns. If YES, the process returns to immediately before step S142 and repeats step S142 onward. If NO in step S147, step S148 checks whether the speech synthesizer 22 is speaking; if YES, the process returns to immediately before step S147 and repeats the ESC key check. If NO in step S148, the program waits 2 seconds in step S149 and checks for ESC key input from the user in step S150. If YES here, the underlining of step S151 is performed and processing continues as described above. If NO in step S150, the process moves to step S152, sets the next sentence, and continues as described above.

[0049] The 2-second wait in step S149 serves two purposes: it leaves at least 2 seconds between the utterance of one sentence and the next, and, if the user judges the sentence to be wrong and presses the ESC key during that interval, it allows the underline indicating an error to be shown on the sentence just spoken. Of course, if the user can tell that the sentence is wrong while it is still being spoken, the user may press the ESC key immediately; step S147 checks for this.
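The ESC-key handling of steps S146 through S151 can be sketched as a polling loop followed by a grace period. In the Python sketch below, `synth`, `esc_pressed`, and `wait` are hypothetical injected stand-ins for the speech synthesizer 22, the keyboard check, and the 2-second timer, so the control flow can be exercised without real hardware; underlining is modelled by collecting the flagged sentences.

```python
import time


def review_by_ear(sentences, synth, esc_pressed, wait=time.sleep, grace_seconds=2):
    """Return the sentences the listener flagged with ESC."""
    flagged = []
    for sentence in sentences:                        # S141 / S152 / S153
        synth.speak(sentence)                         # S146: command speech
        pressed = esc_pressed()                       # S147: ESC during speech?
        while not pressed and synth.is_speaking():    # S148: still speaking
            pressed = esc_pressed()                   # back to S147
        if not pressed:
            wait(grace_seconds)                       # S149: pause after the sentence
            pressed = esc_pressed()                   # S150: ESC during the pause?
        if pressed:
            flagged.append(sentence)                  # S151: underline this sentence
    return flagged


class SilentSynth:
    """Test double: finishes speaking instantly."""

    def speak(self, sentence):
        pass

    def is_speaking(self):
        return False


# Simulated key checks: the listener presses ESC in the pause after sentence one.
presses = iter([False, True, False, False])
flagged = review_by_ear(["first sentence", "second sentence"], SilentSynth(),
                        esc_pressed=lambda: next(presses), wait=lambda seconds: None)
```

Because the key is checked both during speech and after the pause, a late reaction still marks the sentence that was just spoken rather than the next one.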

[0050]

[Effects of the Invention] As described above, of the series of character recognition operations, the most heavily loaded part, the character recognition processing of a single character image, is made independent under separate control and in a separate housing, while the other processing is performed on the main machine as before. This separation has the following effects:
(1) The main machine can be a low-cost personal computer or workstation.
(2) A recognition speed of 60 to 70 characters per second, two to three times the conventional speed, can be secured.
(3) For the processing on the main machine side, such as the program for reading the printed-character image on paper, the program for cutting out one-character images, the program for evaluating and correcting recognition results, and the program for displaying text, the user gains considerable freedom, for example by combining commercially available software or writing part of it in-house.
(4) By adding a commercially available speech output device and software to the recognition result text and having it spoken through a loudspeaker, the character recognition result can be confirmed by ear.

[Brief explanation of drawings]

FIG. 1 is a system configuration diagram of Embodiment 1.

FIG. 2 is a system configuration diagram of the character recognition device.

FIG. 3 is a diagram illustrating the format of image data and recognition result data.

FIG. 4 is a diagram illustrating an example of the screen display.

FIG. 5 is a diagram illustrating the format of the result text.

FIG. 6 is a flowchart of the main program.

FIG. 7 is a flowchart of the character recognition program.

FIG. 8 is a flowchart of the image reading program.

FIG. 9 is a flowchart of the character recognition command program.

FIG. 10 is a flowchart of the character cut-out program.

FIG. 11 is a flowchart of the post-processing program.

FIG. 12 is a flowchart of the speech synthesis program.

FIG. 13 is a flowchart of the save program.

FIG. 14 is an explanatory diagram of the character cut-out processing of Embodiment 2.

FIG. 15 is the first part of the flowchart of the character cut-out program of Embodiment 2.

FIG. 16 is the second part of the flowchart of the character cut-out program of Embodiment 2.

FIG. 17 is a diagram illustrating a second example of the result text format.

FIG. 18 is a flowchart of the character candidate swapping program of Embodiment 3.

FIG. 19 is a flowchart of the semantic processing program of Embodiment 4.

FIG. 20 is a diagram illustrating a sentence judged to be invalid by semantic processing.

FIG. 21 is the first part of the flowchart of the speech synthesis program of Embodiment 5.

FIG. 22 is the second part of the flowchart of the speech synthesis program of Embodiment 5.

Claims

[Claims]

1. A document processing apparatus comprising: input means for inputting image information; cutting means for cutting out a character region from the image information; recognition means for recognizing the image information of the cut-out region and deriving it as character information; display means for displaying the character information that is the recognition result; and speech output means for outputting the character information as speech.

2. The document processing apparatus according to claim 1, further comprising editing means for editing the displayed character information.

3. The document processing apparatus according to claim 1, wherein the cutting means derives, from the cut-out image information, a plurality of pieces of character information together with a degree representing certainty.

4. The document processing apparatus according to claim 1, further comprising control means for performing control so as to change the region to be cut out when the degree representing the certainty of the derived plurality of pieces of character information is low.

5. The document processing apparatus according to claim 1, further comprising: storage means for storing the derived plurality of pieces of character information; means for performing kanji-kana conversion on the character information stored in the storage means; and selection means for selecting character information from which a phrase can be obtained.

6. The document processing apparatus according to claim 1, further comprising: instruction means for instructing a change of the recognition result; and display control means for, when instructed by the instruction means, adding identification information to the character information corresponding to the information output by the speech output means and controlling its display on the display means.

7. A document processing method comprising: inputting image information; cutting out a character region from the image information; recognizing the image information of the cut-out region and deriving it as character information; displaying the character information that is the recognition result; and outputting the character information as speech.

8. The document processing method according to claim 7, further comprising editing the displayed character information.

9. The document processing method according to claim 7, further comprising deriving, from the cut-out image information, a plurality of pieces of character information together with a degree representing certainty.

10. The document processing method according to claim 7, further comprising performing control so as to change the region to be cut out when the degree representing the certainty of the derived plurality of pieces of character information is low.