JPH06290298A

JPH06290298A - Correcting method for erroneously written character

Info

Publication number: JPH06290298A
Application number: JP5076589A
Authority: JP
Inventors: Hironori Miyamoto; 博紀宮本
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1993-04-02
Filing date: 1993-04-02
Publication date: 1994-10-18

Abstract

PURPOSE: To correct erroneously written characters using correction patterns prepared by both the operator and co-workers, by automatically correcting them with co-occurrence relations generated on the operator's own computer and correction patterns generated on other computers. CONSTITUTION: For data read from a word lattice file 303, a text correction support program 301 performs both automatic correction of erroneous characters, using the co-occurrence relations read from a co-occurrence relation file 305 and an intermediate file 309, and manual correction by the user, and outputs the corrected result to a text file 307. When the user selects the character-recognized text to be corrected, erroneous characters in it are corrected automatically using previously stored word co-occurrence relations. If erroneous characters still exist in that text, correct-character candidates for all words are displayed, input of the word containing the correct character and of its co-occurring word is accepted, and a new co-occurrence relation is generated from the entered word and co-occurring word and stored.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method for correcting erroneous characters in text written in natural language, and in particular to the correction of erroneous characters contained in character recognition results.

[0002]

[Prior Art] Japanese Patent Laid-Open No. 4-323787 is an example of conventional technology for correcting character recognition results. It is a patent for a method of correcting misread characters. That invention collates co-occurring words, obtained by referring to a co-occurrence table in which the words that co-occur with a given word are arranged in order of frequency, against candidate words that have a word length of two or more and contain candidate omissions, or against candidate words for single-character words. This corrects misread characters efficiently and improves the misreading correction rate.

[0003] Japanese Patent Laid-Open No. 4-248640 is an example of conventional technology for document processing in a distributed computing environment. It is a patent for a distributed processing apparatus for document logical structures. That invention relates to a distributed processing apparatus for document logic. It updates, references, adds, and deletes entities based on a structured medium that holds a document's logical structure and attribute information independently of any specific machine model, and it aims to enable co-writing and co-creation on a per-document basis.

[0004]

[Problems to Be Solved by the Invention] The present invention provides means by which several people jointly correct erroneous characters contained in character recognition results. In particular, it provides means by which a single character-recognized document is divided and several people share the work of correcting its erroneous characters.

[0005]

[Means for Solving the Problems] To correct erroneous characters contained in a character recognition result, the following method is used. When the user selects the character-recognized text to be corrected, erroneous characters in it are corrected automatically using previously stored word co-occurrence relations. If erroneous characters still exist in that text, correct-character candidates for all words are displayed, input of a word containing the correct character and of its co-occurring word is accepted, and a new co-occurrence relation is generated from the entered word and co-occurring word and stored.

[0006] Further, so that several people can correct erroneous characters jointly, erroneous characters are corrected automatically using both the previously stored co-occurrence relations and word co-occurrence relations sent from other computers, and new co-occurrence relations generated on the local computer are stored on the local computer and distributed to the other computers. Further, so that the same document can be divided and corrected by several people sharing the work, erroneous characters are corrected automatically using word co-occurrence relations generated from the same text as the character-recognized text to be corrected, and the name of the computer holding that text and its file name are attached to each newly generated co-occurrence relation.

[0007] Alternatively, erroneous characters contained in a character recognition result are corrected as follows. When the user selects the character-recognized text to be corrected, erroneous characters in it are corrected automatically using previously stored correct/incorrect relations. If erroneous characters still exist in that text, correct-character candidates for all words are displayed, input of a word containing the correct character is accepted, and a new correct/incorrect relation is generated from the entered word and stored. Further, so that several people can correct erroneous characters jointly, erroneous characters in the text are corrected automatically using both the previously stored correct/incorrect relations and word correct/incorrect relations sent from other computers, and new correct/incorrect relations generated on the local computer are stored on the local computer and distributed to the other computers. Further, so that the same document can be divided and corrected by several people sharing the work, erroneous characters in the text are corrected using word correct/incorrect relations generated from the same text as the character-recognized text to be corrected, and the name of the computer holding that text and its file name are attached to each newly generated correct/incorrect relation.

[0008]

[Operation] When the user selects character-recognized text, erroneous characters in the text are corrected by automatic correction using previously stored word co-occurrence relations. In addition, if erroneous characters exist in the character-recognized text, correct-character candidates for all words are displayed, input of a word containing the correct character and of its co-occurring word is accepted, and a new co-occurrence relation is generated from them and stored. In this way, co-occurrence relations that have not appeared before are generated and stored.

[0009] By automatically correcting erroneous characters using both the previously stored co-occurrence relations and word co-occurrence relations sent from other computers, erroneous characters in the text are corrected using past co-occurrence relations that arose on the local computer and on other computers. By storing new co-occurrence relations generated on the local computer and distributing them to other computers, co-occurrence relations that have not appeared before are stored on both the local computer and the other computers. By automatically correcting erroneous characters using word co-occurrence relations generated from the same text as the text to be corrected, erroneous characters in the document are corrected using, among the past co-occurrence relations that arose on the local computer and other computers, those generated from the document currently being corrected.

[0010] By attaching the computer name and file name of the file holding the text to be corrected to each newly generated co-occurrence relation, co-occurrence relations that have not appeared before for the text currently being corrected are stored on the local computer and other computers. When the user selects character-recognized text, erroneous characters in the text are corrected by automatic correction using previously stored word correct/incorrect relations.

[0011] In addition, if erroneous characters exist in the character-recognized text, correct-character candidates for all words are displayed, input of a word containing the correct character is accepted, and a new correct/incorrect relation is generated from the entered word and stored. In this way, correct/incorrect relations that have not appeared before are generated and stored. By automatically correcting erroneous characters using both the previously stored correct/incorrect relations and word correct/incorrect relations sent from other computers, erroneous characters in the text are corrected using past correct/incorrect relations that arose on the local computer and on other computers.

[0012] By storing new correct/incorrect relations generated on the local computer and distributing them to other computers, correct/incorrect relations that have not appeared before are stored on both the local computer and the other computers. By automatically correcting erroneous characters using word correct/incorrect relations generated from the same text as the text to be corrected, erroneous characters in the document are corrected using, among the past correct/incorrect relations that arose on the local computer and other computers, those generated from the document currently being corrected.

[0013] By attaching the computer name and file name of the file holding the text to be corrected to each newly generated correct/incorrect relation, correct/incorrect relations that have not appeared before for the text currently being corrected are stored on the local computer and other computers.

[0014]

[Embodiment] An embodiment of the present invention is described with reference to the drawings. Here, a correction support system for Japanese text that uses the present invention is described. FIG. 1 is the system flow of the text correction support program 301. Before this system flow is described, the hardware configuration and the system configuration are described with reference to FIGS. 2 and 3.

[0015] FIG. 2 shows the hardware configuration, in which a plurality of computers are connected by a communication line 213. Instructions and data can be exchanged between these computers. Each computer comprises a main storage device 201 (221), a CPU 203 (223), a CRT 205 (225), a secondary storage device 207 (227), a keyboard 209 (229), and a mouse 211 (231).

[0016] FIG. 3 shows the system configuration. For data read from the word lattice file 303, the text correction support program 301 performs automatic correction of erroneous characters using the co-occurrence relations read from the co-occurrence relation file 305 and the intermediate file 309, as well as manual correction of erroneous characters by the user, and outputs the correction result to the text file 307. The text correction support program 301 and the communication program 311 reside in the main storage device 201 (221). On a computer that implements virtual memory, they reside in virtual memory.

[0017] The word lattice file 303, the co-occurrence relation file 305, the text file 307, and the intermediate file 309 reside in the secondary storage device 207 (227).

[0018] The text correction support program 301 and the communication program 311 run as separate processes. The two programs can exchange instructions and data through interprocess communication. The word lattice file 303 stores the word lattices of character recognition results. A word lattice is obtained by segmenting a character recognition result into words in several different ways and assigning priorities to the segmentation results. The word lattice file 303 is described with reference to FIG. 4.

[0019] FIG. 4 shows the logical structure of the word lattice file 303. Data 401 (410) is a block identifier, which identifies a character recognition unit. Data 403 (413) is a data identifier. Data 403 consists of a host identifier 403a, which identifies the computer, and a file identifier 403b. Data 404 (405, 406, 407, 408, 409) is word information. Word information consists of a word heading 404a, part-of-speech information 404b, and a priority 404c.

[0020] A block identifier has a pointer 402 to the data identifier, pointers (414, 415, 416) to word information, and a pointer 417 to the next block identifier.

[0021] Word information has a pointer (418, 419, 420, 421, 422, 423, 424) to the next word. The co-occurrence relation file 305 stores co-occurrence relations, which are the correction patterns for erroneous characters. The co-occurrence relation file 305 is described with reference to FIG. 5.
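As a rough illustration of the FIG. 4 word lattice layout described above, the sketch below models one character recognition unit in Python. The class and field names are hypothetical, and the patent's pointer chains are replaced by Python lists. The grouping of word information into per-position candidate lists, with priority 1 as the highest, is an assumption and is not stated in the text. The English words are placeholders for Japanese recognition results.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordInfo:
    heading: str         # word heading (404a)
    part_of_speech: str  # part-of-speech information (404b)
    priority: int        # priority (404c); assumed: 1 is the highest


@dataclass
class DataIdentifier:
    host_id: str  # host identifier (403a)
    file_id: str  # file identifier (403b)


@dataclass
class Block:
    block_id: int            # block identifier (401)
    data_id: DataIdentifier  # data identifier (403)
    # Assumed layout: one inner list per word position, holding the
    # alternative candidates produced by the different segmentations.
    positions: List[List[WordInfo]] = field(default_factory=list)

    def first_candidates(self) -> List[WordInfo]:
        """Return the highest-priority word at each position."""
        return [min(cands, key=lambda w: w.priority) for cands in self.positions]


block = Block(
    block_id=1,
    data_id=DataIdentifier(host_id="hostA", file_id="doc1.lat"),
    positions=[
        [WordInfo("cat", "noun", 2), WordInfo("cot", "noun", 1)],
        [WordInfo("sleeps", "verb", 1)],
    ],
)
print([w.heading for w in block.first_candidates()])
```

Keeping the host and file identifiers on every block is what later lets a correction pattern be tied to the document it came from.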

[0022] FIG. 5 shows the logical structure of the co-occurrence relation file 305. Data 501 is the heading of a co-occurrence relation. It consists of a headword 501a, a host identifier 501b, and a file identifier 501c. The headword 501a is a word that contains an erroneous character. Data 503 is a correction word. The correction word 503 is a word that contains the correct character for the erroneous character in the headword 501a. Data 505 is co-occurrence data. Co-occurrence data consists of a co-occurring word heading 505a, part-of-speech information 505b, and position information 505c. The co-occurring word heading 505a is a word that co-occurs with the headword 501a.

[0023] The heading of a co-occurrence relation has a pointer 502 to a correction word and a pointer 507 to the next co-occurrence relation. The correction word 503 has a pointer 504 to co-occurrence data and a pointer 508 to the next correction word. The co-occurrence data 505 has a pointer 506 to the next co-occurrence data. The text file 307 stores the results of correcting erroneous characters. The text file 307 is described with reference to FIG. 6.
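The FIG. 5 co-occurrence relation structure described above could be modeled along the following lines. This is a minimal sketch with hypothetical class names, again using Python lists in place of the patent's pointers. The text does not define how position information 505c is encoded, so treating it as an offset relative to the headword is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CooccurrenceData:
    heading: str         # co-occurring word heading (505a)
    part_of_speech: str  # part-of-speech information (505b)
    position: int        # position information (505c); assumed relative offset


@dataclass
class CorrectionWord:
    word: str  # correction word (503), containing the correct character
    cooccurrences: List[CooccurrenceData] = field(default_factory=list)


@dataclass
class CooccurrenceRelation:
    headword: str  # headword containing the erroneous character (501a)
    host_id: str   # host identifier (501b)
    file_id: str   # file identifier (501c)
    corrections: List[CorrectionWord] = field(default_factory=list)


def contexts(rel: CooccurrenceRelation) -> List[Tuple[str, str, int]]:
    """Flatten a relation into (correction word, co-occurring word, offset) triples."""
    return [
        (c.word, d.heading, d.position)
        for c in rel.corrections
        for d in c.cooccurrences
    ]


relation = CooccurrenceRelation(
    headword="cot",
    host_id="hostA",
    file_id="doc1.lat",
    corrections=[CorrectionWord("cat", [CooccurrenceData("sleeps", "verb", 1)])],
)
print(contexts(relation))
```

One headword may carry several correction words, each with its own context, which is why the nesting has two levels.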

[0024] FIG. 6 shows the logical structure of the text file. Field 601 is the block identifier field, which stores the block identifier 401 of the word lattice file. Field 602 is the host identifier field, which stores the host identifier 403a of the word lattice file. Field 603 is the file identifier field, which stores the file identifier 403b of the word lattice file. Field 604 is the text field, which stores the word string corrected using the co-occurrence relations. The intermediate file 309 is used to exchange co-occurrence relations between the text correction support program 301 and the communication program 311. The intermediate file 309 is described with reference to FIG. 7.

[0025] FIG. 7 shows the logical structure of the intermediate file 309. The intermediate file 309 stores records 710, each having fields 701, 702, 703, and 704. Field 701 is the input/output flag field, which stores a flag distinguishing data to be distributed to external computers from data obtained from external computers. Field 702 is the host identifier field, which stores the value of the host identifier 403a in the word lattice.

[0026] Field 703 is the file identifier field, which stores the value of the file identifier 403b in the word lattice. Field 704 is the data field, which stores a co-occurrence relation. The co-occurrence relation stored in the data field is the logical structure of the co-occurrence relation file 305 with the host identifier 501b and the file identifier 501c removed.
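To make the FIG. 7 record layout concrete, the following sketch builds an intermediate-file record from a co-occurrence relation. The names `Direction`, `IntermediateRecord`, and `to_record` are hypothetical, the relation is represented as a plain dictionary for brevity, and it is assumed that the relation's own host and file identifiers equal the lattice values 403a and 403b that fields 702 and 703 are said to hold.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class Direction(Enum):
    OUT = "OUT"  # to be distributed to other computers
    IN = "IN"    # obtained from another computer


@dataclass
class IntermediateRecord:
    flag: Direction       # input/output flag field (701)
    host_id: str          # host identifier field (702)
    file_id: str          # file identifier field (703)
    data: Dict[str, Any]  # data field (704): relation without 501b and 501c


def to_record(relation: Dict[str, Any], flag: Direction) -> IntermediateRecord:
    """Move the host and file identifiers out of the relation into fields 702 and 703."""
    data = {k: v for k, v in relation.items() if k not in ("host_id", "file_id")}
    return IntermediateRecord(flag, relation["host_id"], relation["file_id"], data)


record = to_record(
    {"headword": "cot", "host_id": "hostA", "file_id": "doc1.lat", "corrections": ["cat"]},
    Direction.OUT,
)
print(record.flag.value, record.host_id, sorted(record.data))
```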

[0027] The communication program 311 distributes data on the local computer to other computers and takes data from other computers into the local computer. The processing of the communication program 311 is described with reference to FIG. 8.

[0028] FIG. 8 is the processing flow of the communication program 311. Process 801 waits for message input from the local computer and from other computers. Process 803 determines which computer the input obtained in process 801 came from. If the input came from the local computer, the co-occurrence relations whose input/output flag is "OUT" are distributed from the intermediate file 309 to the other computers. If the input came from another computer, the distributed co-occurrence relation is given the input/output flag "IN" and stored in the intermediate file 309. Process 805 determines whether the intermediate file 309 contains any co-occurrence relation whose input/output flag is "OUT". Records with the "OUT" flag are created by the co-occurrence relation distribution process 1407 of the text correction support program 301. If such a co-occurrence relation exists, it is distributed to the other computers and then deleted from the intermediate file 309. If none exists, control returns to process 801.

[0029] Process 807 distributes co-occurrence relations to the other computers. Process 809 deletes from the intermediate file 309 the co-occurrence relations that have been distributed to the other computers. Process 811 attaches the input/output flag "IN" to a distributed co-occurrence relation and stores it in the intermediate file 309.
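The FIG. 8 flow can be sketched as a single message handler. This is an illustration under stated assumptions, not the patent's implementation: the intermediate file is an in-memory list of dictionaries, the message source is a hypothetical string (`"local"` or anything else), and `send` stands in for whatever transport the communication line 213 provides.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def handle_message(
    source: str,
    payload: List[Record],
    intermediate: List[Record],
    send: Callable[[Record], None],
) -> None:
    """Handle one message after process 801 has received it."""
    if source == "local":  # process 803: input came from the local computer
        outgoing = [r for r in intermediate if r["flag"] == "OUT"]  # process 805
        for rec in outgoing:
            send(rec)  # process 807: distribute to other computers
        # Process 809: drop the distributed records from the intermediate file.
        intermediate[:] = [r for r in intermediate if r["flag"] != "OUT"]
    else:  # input came from another computer
        for rec in payload:  # process 811: store with the "IN" flag
            intermediate.append({**rec, "flag": "IN"})


sent: List[Record] = []
store: List[Record] = [
    {"flag": "OUT", "headword": "cot"},
    {"flag": "IN", "headword": "dag"},
]
handle_message("local", [], store, sent.append)
handle_message("other", [{"headword": "hause"}], store, sent.append)
print(len(sent), [r["flag"] for r in store])
```

When no "OUT" record exists, the handler does nothing and the caller simply goes back to waiting, which corresponds to the return to process 801.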

[0030] FIG. 1 is the system flow of the text correction support program 301. The text correction support program 301 reads the word lattices to be corrected from the word lattice file 303, corrects word strings automatically using co-occurrence relations, supports manual correction of word strings, and outputs the correction result to the text file 307. Process 101 displays the screen. The screen displayed by process 101 is shown in FIG. 9.

[0031] FIG. 9 is the screen that displays the word strings read from the word lattice file 303. Display frame 903 is the frame in which word strings are displayed. It consists of character recognition unit display frames 903a, 903b, 903c, 903d, and 903e. To select the character recognition unit to be corrected, the user points at one of the character recognition unit display frames 903a, 903b, 903c, 903d, or 903e with the mouse 211 (231).

[0032] Symbols 905, 907, 909, 911, 913, 914, 915, and 916 are the commands of the text correction support program. Symbol 905 is the program end command. Symbol 907 is the command that reads word lattices from the word lattice file 303. Symbol 909 is the command that stores the corrected word string in the text file 307. Symbol 911 is the command that starts manual text correction. Symbols 913, 914, 915, and 916 are the commands that scroll the word strings displayed in frame 903. These commands are invoked by pointing at them with the mouse 211 (231).

[0033] Process 103 waits for input from the user. Process 105 evaluates the input obtained in process 103. If process 103 obtained the program end command 905, process 107 is executed. If it obtained the word lattice read command 907, process 109 is executed. If it obtained the command 909 that stores the corrected word string, process 111 is executed. If it obtained a scroll command 913, 914, 915, or 916, process 117 is executed. If it obtained a character recognition unit selection, process 113 is executed. If it obtained the text correction start command 911, process 115 is executed. Process 107 erases the screen displayed by process 101. Process 109 reads word lattices from the word lattice file 303. Process 109 is described with reference to FIG. 10.
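The branching performed by process 105 amounts to a lookup from the user's input to the process that handles it. The table below is a small sketch of that mapping. The reference numerals come from the text, while `UNIT_SELECTED` and `next_process` are hypothetical names introduced only for illustration.

```python
from typing import Optional, Union

# Command reference numerals (FIG. 9) mapped to process numbers (FIG. 1).
DISPATCH = {
    905: 107,  # end program: erase the screen
    907: 109,  # read word lattices
    909: 111,  # store the corrected word string
    911: 115,  # start manual text correction
    913: 117,  # scroll
    914: 117,  # scroll
    915: 117,  # scroll
    916: 117,  # scroll
}

UNIT_SELECTED = "unit"  # hypothetical token for a recognition unit selection


def next_process(user_input: Union[int, str]) -> Optional[int]:
    """Process 105: choose the process for the input obtained in process 103."""
    if user_input == UNIT_SELECTED:
        return 113  # automatic correction using co-occurrence relations
    return DISPATCH.get(user_input)  # None for input the text does not cover


print(next_process(907), next_process(UNIT_SELECTED), next_process(914))
```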

[0034] FIG. 10 is the processing flow for reading word lattices. Process 1001 obtains from the user the file name of the word lattice file 303, the starting position of the word lattices, and the number of word lattices. Process 1003 opens the file specified in process 1001. Process 1005 determines whether process 1003 succeeded. If process 1003 succeeded, the word lattice file is read. If process 1003 failed, the processing of FIG. 10 ends. Process 1007 skips over word lattices up to the starting position obtained in process 1001. Process 1009 reads the number of word lattices obtained in process 1001 and displays the word information with the highest priority in display frame 903. Process 1011 closes the file opened in process 1003. Process 111 stores the correction result in the text file 307. Process 111 is described with reference to FIG. 11. FIG. 11 is the processing flow for storing the correction result in the text file.
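The word lattice reading steps of FIG. 10 (open, check, skip, read, close) map onto a short function. The patent does not give the on-disk format of the word lattice file, so this sketch assumes, purely for illustration, one JSON-encoded lattice per line. The name `read_lattices` is hypothetical, and the display step of process 1009 is left out.

```python
import json
from itertools import islice
from typing import List, Optional


def read_lattices(path: str, start: int, count: int) -> Optional[List[dict]]:
    """Return `count` lattices beginning at index `start`, or None if the open fails."""
    try:
        f = open(path, encoding="utf-8")  # process 1003: open the file
    except OSError:
        return None  # process 1005: open failed, so the flow ends
    with f:  # process 1011: the file is closed when this block exits
        # islice skips the first `start` lines (process 1007) and then
        # yields at most `count` lines (process 1009).
        return [json.loads(line) for line in islice(f, start, start + count)]
```

A caller would pass the file name, starting position, and count obtained from the user in process 1001, for example `read_lattices("doc1.lat", 10, 5)` with a hypothetical file name.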

[0035] Process 1101 obtains the file name of the text file 307 from the user. Process 1103 opens the file specified in process 1101. Process 1105 determines whether process 1103 succeeded. If process 1103 succeeded, the correction result is output to the text file. If process 1103 failed, the processing of FIG. 11 ends. Process 1107 outputs the block identifier, the host identifier, the file identifier, and the word information with the highest priority to the file opened in process 1103. Process 1109 closes the file opened in process 1103. Process 113 corrects erroneous characters in the character recognition unit selected in process 103, using co-occurrence relations. Process 113 is described with reference to FIG. 12.
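The FIG. 11 steps for storing the correction result can be sketched in the same style. The record format is not specified in the text, so one tab-separated line per character recognition unit is an assumption, as is joining the first-candidate words without spaces (as would suit Japanese text). The name `write_corrected` is hypothetical.

```python
from typing import Iterable, List, Tuple

# (block identifier, host identifier, file identifier, first-candidate words)
Unit = Tuple[int, str, str, List[str]]


def write_corrected(path: str, units: Iterable[Unit]) -> bool:
    """Write one record per recognition unit; return False if the open fails."""
    try:
        f = open(path, "w", encoding="utf-8")  # process 1103: open the file
    except OSError:
        return False  # process 1105: open failed, so the flow ends
    with f:  # process 1109: the file is closed when this block exits
        for block_id, host_id, file_id, words in units:  # process 1107
            fields = [str(block_id), host_id, file_id, "".join(words)]
            f.write("\t".join(fields) + "\n")
    return True
```

The four columns correspond to fields 601 to 604 of the text file described for FIG. 6.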

[0036] FIG. 12 is the processing flow for automatically correcting erroneous characters using co-occurrence relations. Process 1201 initializes the counter that counts the word information items with the highest priority (hereinafter called first candidates). Process 1203 obtains the number of words to be collated. The collation targets are the first candidates. Process 1205 determines whether the correction processing has finished. If the word counter exceeds the number of words to be collated, the processing of FIG. 12 ends. If the word counter has not reached that number, word correction using co-occurrence relations is performed. Process 1207 collates the first candidate of the i-th word against the co-occurrence relation file 305. The collation searches for a co-occurrence relation whose headword 501a equals the first candidate, whose host identifier 501b equals the host identifier 403a, and whose file identifier 501c equals the file identifier 403b. It then matches the co-occurring word heading 505a, part-of-speech information 505b, and position information 505c of the co-occurrence data 505 belonging to the retrieved relation against all the first candidates.

[0037] Process 1209 determines whether process 1207 succeeded or failed. If process 1207 succeeded, the first candidate of the i-th word is replaced with the correction word contained in the co-occurrence relation. If process 1207 failed, matching is performed using the co-occurrence relations in the intermediate file 309. Process 1211 assigns the correction word contained in the co-occurrence relation matched in process 1207 to the first candidate of the i-th word. Process 1213 matches the first candidate of the i-th word against the co-occurrence relations of those records in the intermediate file 309 whose input/output flag field 701 is "IN". The matching searches for a co-occurrence relation for which the headword 501a contained in the data field 704 of the intermediate file 309 equals the first candidate, the host identifier field 702 equals host identifier 403a, and the file identifier field 703 equals file identifier 403b. It then matches the co-occurring-word heading 505a, part-of-speech information 505b, and position information 505c of the co-occurrence data 505 held by the retrieved relation against all first candidates.

[0038] Process 1215 stores in the co-occurrence relation file 305 the co-occurrence relations of those records in the intermediate file 309 whose input/output flag field 701 is "IN". The stored records are deleted from the intermediate file 309. Process 1217 determines whether process 1213 succeeded or failed. If process 1213 succeeded, process 1211 is performed. If process 1213 failed, co-occurrence matching proceeds to the next first candidate. Process 1219 counts the number of first candidates that have been matched. Process 115 is manual correction of the word string by the user. Process 115 is described with reference to FIG. 14.
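The loop of FIG. 12 might be sketched as below. It is a simplified illustration, not the patented implementation. It assumes that position information 505c is a relative word offset, omits part-of-speech information 505b, and merges the "IN" relations into the local store once at the end rather than at the exact point where process 1215 sits in the flow. The names `Cooccurrence`, `find_match`, and `auto_correct` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cooccurrence:
    headword: str    # misrecognized word (501a)
    host_id: str     # host identifier (501b)
    file_id: str     # file identifier (501c)
    correction: str  # correction word
    co_word: str     # co-occurring word heading (505a)
    offset: int      # assumed reading of position information 505c


def find_match(rels: List[Cooccurrence], words: List[str], i: int,
               host_id: str, file_id: str) -> Optional[Cooccurrence]:
    """Return the first relation matching the i-th first candidate, else None."""
    for r in rels:
        j = i + r.offset
        if (r.headword == words[i] and r.host_id == host_id
                and r.file_id == file_id
                and 0 <= j < len(words) and words[j] == r.co_word):
            return r
    return None


def auto_correct(words: List[str], host_id: str, file_id: str,
                 local: List[Cooccurrence],
                 incoming: List[Cooccurrence]) -> List[str]:
    """Return corrected first candidates; move incoming relations into local."""
    result = list(words)
    i = 0                                    # process 1201: initialize counter
    n = len(result)                          # process 1203: number of words
    while i < n:                             # process 1205: termination check
        hit = find_match(local, result, i, host_id, file_id)         # 1207, 1209
        if hit is None:
            hit = find_match(incoming, result, i, host_id, file_id)  # 1213, 1217
        if hit is not None:
            result[i] = hit.correction       # process 1211: apply correction word
        i += 1                               # process 1219: count matched candidates
    local.extend(incoming)                   # process 1215: keep received relations
    incoming.clear()                         # and drop them from the intermediate store
    return result
```

Checking the local relations before the received ones follows the order of processes 1207 and 1213 in the text.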

[0039] FIG. 14 shows the processing flow of manual correction by the user. Process 1401 displays the screen for manual correction. The screen displayed in process 1401 is described with reference to FIG. 13.

[0040] FIG. 13 shows the screen for manual correction. Symbols 1303, 1304, 1305, 1306, and 1307 are commands for manual correction. Symbol 1303 is the end command. Symbols 1304, 1305, 1306, and 1307 are scroll commands for the word display frame 1302. The display frame 1302 is the frame in which word information is displayed. Symbols 1310, 1311, 1312, 1313, and 1314 are first-candidate word information. Symbols 1315, 1316, 1317, 1318, and 1319 are word information with a lower priority than the first-candidate word information (hereinafter called nth candidates). The first candidate for nth candidates 1315 and 1316 is symbol 1310, the first candidate for nth candidate 1317 is symbol 1312, and the first candidate for nth candidates 1318 and 1319 is symbol 1313.

[0041] Process 1403 is word correction by the user. Process 1403 is described with reference to FIG. 15. FIG. 15 shows the processing flow of word correction by the user. Process 1501 waits for input from the user. Process 1503 evaluates the input obtained in process 1501. If a scroll command was input in process 1501, the display frame 1302 is scrolled. If the symbol of an nth candidate was selected in process 1501, the typographical error is corrected and a co-occurrence relation is created. If the end command was input in process 1501, the processing of FIG. 15 ends. Process 1505 scrolls the display frame 1302. The display frame 1302 scrolls up when symbol 1304 is selected, down when symbol 1305 is selected, left when symbol 1306 is selected, and right when symbol 1307 is selected.

[0042] Process 1507 creates a co-occurrence relation whose headword is the first candidate corresponding to the selected nth candidate. Process 1509 assigns the selected nth candidate to the correction word of the co-occurrence relation created in process 1507. Process 1511 displays the selected nth candidate in the display frame 1302 as the first candidate. Process 1513 waits for input of a co-occurring word. The user inputs a co-occurring word by pointing at word information with the mouse 221 (231). Process 1515 evaluates the input obtained in process 1513. If symbol 1303 was input in process 1513, the flow returns to process 1501. If a scroll command was input in process 1513, the display frame 1302 is scrolled. If word information was input in process 1513, the input word information becomes the co-occurrence data of the co-occurrence relation created in process 1507.

[0043] Process 1517 stores the word information input in process 1513 as co-occurrence data of the co-occurrence relation created in process 1507. Process 1405 stores the co-occurrence relation created in process 1403 in the co-occurrence relation file 305. Process 1407 distributes the co-occurrence relation created in process 1403 to other computers. In process 1407, a record is stored in the intermediate file 309 with "OUT" in the input/output flag field 701, the host identifier 403a of the co-occurrence relation in the host identifier field 702, the file identifier 403b of the co-occurrence relation in the file identifier field 703, and the created co-occurrence relation in the data field 704, and a message is sent to the communication program 311.
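The creation of a co-occurrence relation during manual correction (processes 1507, 1509, and 1517) and the "OUT" record queued by process 1407 might be sketched as follows. The class and function names and the dictionary layout of the record are assumptions made for illustration; the patent describes only the four fields, not their encoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Relation:
    headword: str                       # former first candidate (process 1507)
    host_id: str                        # host identifier 403a
    file_id: str                        # file identifier 403b
    correction: Optional[str] = None    # selected nth candidate (process 1509)
    co_words: List[str] = field(default_factory=list)  # co-occurrence data (1517)


def build_relation(first: str, chosen: str, co_words: List[str],
                   host_id: str, file_id: str) -> Relation:
    """Create the relation for one manual correction."""
    rel = Relation(first, host_id, file_id)   # process 1507: headword is the first candidate
    rel.correction = chosen                   # process 1509: chosen candidate is the correction
    rel.co_words.extend(co_words)             # process 1517: store co-occurring words
    return rel


def make_out_record(rel: Relation) -> Dict[str, object]:
    """Build the intermediate-file record that process 1407 queues for distribution."""
    return {
        "flag": "OUT",                        # field 701
        "host_id": rel.host_id,               # field 702
        "file_id": rel.file_id,               # field 703
        "data": {                             # field 704
            "headword": rel.headword,
            "correction": rel.correction,
            "co_words": list(rel.co_words),
        },
    }
```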

[0044] Process 1409 removes the screen displayed in process 1401. Process 117 scrolls the display frame 903. The display frame 903 scrolls up when symbol 913 is selected, down when symbol 914 is selected, left when symbol 915 is selected, and right when symbol 916 is selected.

[0045] The effect of this embodiment is that, because the co-occurrence relations created on the local computer are distributed to other computers, the local computer's co-occurrence relations also exist on those other computers. Consequently, even if the local computer is stopped, the other computers can still correct typographical errors using the local computer's co-occurrence relations.

[0046] As another embodiment, a method of correcting typographical errors that uses the correctness relations of FIG. 17 instead of co-occurrence relations is described. The correctness relation is described with reference to FIG. 17. Correctness relations are stored in the correctness relation file 1605. Data 1701 is the heading of a correctness relation. The heading consists of a headword 1701a, a host identifier 1701b, and a file identifier 1701c. The headword 1701a is a word that contains an erroneous character. Data 1703 is a correction word. The correction word 1703 is a word that contains the correct character for the erroneous character in the headword 1701a. The heading of a correctness relation has a pointer 1702 to the correction word and a pointer 1705 to the next correctness relation. The correction word 1703 has a pointer 1704 to the next correction word.
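The pointer structure of FIG. 17 corresponds to two singly linked lists: one chaining the correctness relations and one chaining the correction words of each relation. A minimal sketch is given below; the class names and the lookup helper `corrections_for` are hypothetical, and only the fields named in the text are modeled.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class CorrectionWord:                                  # data 1703
    word: str
    next: Optional["CorrectionWord"] = None            # pointer 1704


@dataclass
class CorrectnessRelation:                             # data 1701
    headword: str                                      # 1701a: word containing the error
    host_id: str                                       # 1701b
    file_id: str                                       # 1701c
    corrections: Optional[CorrectionWord] = None       # pointer 1702
    next: Optional["CorrectnessRelation"] = None       # pointer 1705


def corrections_for(head: Optional[CorrectnessRelation], headword: str,
                    host_id: str, file_id: str) -> Iterator[str]:
    """Yield every correction word stored for the given heading."""
    rel = head
    while rel is not None:                             # follow pointer 1705
        if (rel.headword, rel.host_id, rel.file_id) == (headword, host_id, file_id):
            cw = rel.corrections
            while cw is not None:                      # follow pointer 1704
                yield cw.word
                cw = cw.next
        rel = rel.next
```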

[0047] The system configuration is shown in FIG. 16. For the data read from the word lattice file 303, the text correction support program 1601 performs automatic correction of typographical errors using the correctness relations read from the correctness relation file 1605 and the intermediate file 1609, as well as manual correction of typographical errors by the user, and outputs the correction result to the text file 307. The text correction support program 1601 and the communication program 1611 reside in the main storage device 201 (221). On a computer that implements virtual memory, they reside in virtual memory.

[0048] The word lattice file 303, the correctness relation file 1605, the text file 307, and the intermediate file 1609 reside on the secondary storage device 207 (227). The text correction support program 1601 and the communication program 1611 run as separate processes. The two programs can exchange commands and data through interprocess communication.

[0049] FIG. 18 shows the logical structure of the intermediate file 1609. The intermediate file 1609 stores records 1810, each having fields 1801, 1802, 1803, and 1804. Field 1801 is the input/output flag field. It stores a flag that distinguishes data to be distributed to external computers from data acquired from external computers. Field 1802 is the host identifier field. It stores the value of the host identifier 403a in the word lattice. Field 1803 is the file identifier field. It stores the value of the file identifier 403b in the word lattice. Field 1804 is the data field. It stores a correctness relation. The correctness relation stored in the data field has the logical structure of the correctness relation file 1605 with the host identifier 1701b and the file identifier 1701c removed.
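One possible in-memory form of a record 1810 is sketched below. The names `Direction`, `IntermediateRecord`, and `matches` are assumptions, and the data field is flattened into a headword plus a tuple of correction words for brevity, which the patent does not prescribe.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Direction(Enum):
    OUT = "OUT"   # data to be distributed to external computers
    IN = "IN"     # data acquired from an external computer


@dataclass(frozen=True)
class IntermediateRecord:
    flag: Direction                 # field 1801: input/output flag
    host_id: str                    # field 1802: host identifier 403a
    file_id: str                    # field 1803: file identifier 403b
    headword: str                   # field 1804: relation without host/file identifiers
    corrections: Tuple[str, ...]    # field 1804, continued


def matches(rec: IntermediateRecord, first_candidate: str,
            host_id: str, file_id: str) -> bool:
    """True if a received record applies to this first candidate and source file."""
    return (rec.flag is Direction.IN
            and rec.headword == first_candidate
            and rec.host_id == host_id
            and rec.file_id == file_id)
```

Because the host and file identifiers live in their own fields, the data field does not need to repeat them, which is consistent with the last sentence of the paragraph above.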

[0050] FIG. 19 shows the processing flow of the communication program 1611. Process 1901 waits for message input from the local computer and from other computers. Process 1903 determines which computer the input obtained in process 1901 came from. If the input came from the local computer, the correctness relations whose input/output flag is "OUT" are distributed from the intermediate file 1609 to other computers. If the input came from another computer, the distributed correctness relation is given the input/output flag "IN" and stored in the intermediate file 1609. Process 1905 determines whether the intermediate file 1609 contains a correctness relation whose input/output flag is "OUT". Records with the "OUT" flag are created by the correctness relation distribution process 1407 of the text correction support program 1601. If such a correctness relation exists, it is distributed to other computers and then deleted from the intermediate file 1609. If no such correctness relation exists, the flow returns to process 1901.

[0051] Process 1907 distributes a correctness relation to other computers. Process 1909 deletes from the intermediate file 1609 the correctness relation that was distributed to other computers. Process 1911 gives a distributed correctness relation the input/output flag "IN" and stores it in the intermediate file 1609. The system flow of the text correction support program 1601 is described with reference to FIG. 20. Process 2001 corrects typographical errors in the character recognition unit selected in process 103 by using correctness relations. Process 2001 is described with reference to FIG. 21.
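The message handling of FIG. 19 might be sketched as a single dispatch function. This is an illustration only: the message layout (`"source"` and `"record"` keys), the list standing in for the intermediate file 1609, and the `send` callback standing in for the network transport are all assumptions, since the patent does not describe the wire format.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def handle_message(msg: Dict[str, Any], intermediate: List[Record],
                   send: Callable[[Record], None]) -> None:
    """Handle one message obtained by the wait of process 1901."""
    if msg["source"] == "local":                       # process 1903: who sent it?
        outgoing = [r for r in intermediate if r["flag"] == "OUT"]  # process 1905
        for rec in outgoing:
            send(rec)                                  # process 1907: distribute
            intermediate.remove(rec)                   # process 1909: delete after sending
    else:
        rec = dict(msg["record"])                      # copy the received relation
        rec["flag"] = "IN"                             # process 1911: mark as received
        intermediate.append(rec)
```

A caller would invoke `handle_message` inside a loop that blocks on the message queue, which corresponds to returning to process 1901 after each message.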

[0052] FIG. 21 shows the processing flow for automatically correcting typographical errors using correctness relations. Process 2101 initializes the counter that counts the first candidates. Process 2103 acquires the number of words to be matched; the matching targets are the first candidates. Process 2105 decides whether the correction processing is finished. When the word counter exceeds the number of words to be matched, the processing of FIG. 21 ends. When the word counter has not yet reached the number of words to be matched, word correction using correctness relations is performed. Process 2107 matches the first candidate of the i-th word against the correctness relation file 1605. The matching searches for a correctness relation in the correctness relation file 1605 whose headword 1701a equals the first candidate, whose host identifier 1701b equals host identifier 403a, and whose file identifier 1701c equals file identifier 403b.

[0053] Process 2109 determines whether process 2107 succeeded or failed. If process 2107 succeeded, the first candidate of the i-th word is replaced with the correction word contained in the correctness relation. If process 2107 failed, matching is performed using the correctness relations in the intermediate file 1609. Process 2111 assigns the correction word contained in the correctness relation matched in process 2107 to the first candidate of the i-th word. Process 2113 matches the first candidate of the i-th word against the correctness relations of those records in the intermediate file 1609 whose input/output flag field 1801 is "IN". The matching searches for a correctness relation for which the headword 1701a contained in the data field 1804 of the intermediate file 1609 equals the first candidate, the host identifier field 1802 equals host identifier 403a, and the file identifier field 1803 equals file identifier 403b.

[0054] Process 2115 stores in the correctness relation file 1605 the correctness relations of those records in the intermediate file 1609 whose input/output flag field 1801 is "IN". The stored records are deleted from the intermediate file 1609. Process 2117 determines whether process 2113 succeeded or failed. If process 2113 succeeded, process 2111 is performed. If process 2113 failed, correctness relation matching proceeds to the next first candidate. Process 2119 counts the number of first candidates that have been matched. Process 2002 is manual correction of the word string by the user. Process 2002 is described with reference to FIG. 22. Process 2203 is word correction by the user. Process 2203 is described with reference to FIG. 23.
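Because a correctness relation is looked up by headword, host identifier, and file identifier alone, the loop of FIG. 21 reduces to a keyed lookup. The sketch below models both stores as dictionaries with a single correction word per key, which is a simplification of the linked structure in FIG. 17, and it merges the "IN" relations once at the end rather than at the exact position of process 2115 in the flow. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]   # (headword, host identifier, file identifier)


def auto_correct(words: List[str], host_id: str, file_id: str,
                 local: Dict[Key, str], incoming: Dict[Key, str]) -> List[str]:
    """Return corrected first candidates; move incoming relations into local."""
    result = list(words)
    i = 0                                     # process 2101: initialize counter
    n = len(result)                           # process 2103: number of words
    while i < n:                              # process 2105: termination check
        key = (result[i], host_id, file_id)
        fix = local.get(key)                  # processes 2107 and 2109
        if fix is None:
            fix = incoming.get(key)           # processes 2113 and 2117
        if fix is not None:
            result[i] = fix                   # process 2111: apply correction word
        i += 1                                # process 2119: count matched candidates
    for k, v in incoming.items():             # process 2115: keep received relations,
        local.setdefault(k, v)                # without overwriting local entries
    incoming.clear()
    return result
```

No neighboring words are consulted here, which illustrates why this embodiment needs simpler processing than the co-occurrence variant.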

[0055] FIG. 23 shows the processing flow of word correction by the user. Process 2301 waits for input from the user. Process 2303 evaluates the input obtained in process 2301. If a scroll command was input in process 2301, the display frame 1302 is scrolled. If the symbol of an nth candidate was selected in process 2301, the typographical error is corrected and a correctness relation is generated. If the end command was input in process 2301, the processing of FIG. 23 ends. Process 2307 creates a correctness relation whose headword is the first candidate corresponding to the selected nth candidate. Process 2309 assigns the selected nth candidate to the correction word of the correctness relation created in process 2307.

[0056] Process 2311 displays the selected nth candidate in the display frame 1302 as the first candidate. Process 2317 stores the correctness relation created in process 2307. Process 2205 stores the correctness relation created in process 2203 in the correctness relation file 1605. Process 2207 distributes the correctness relation created in process 2203 to other computers. In process 2207, a record is stored in the intermediate file 1609 with "OUT" in the input/output flag field 1801, the host identifier 403a of the correctness relation in the host identifier field 1802, the file identifier 403b of the correctness relation in the file identifier field 1803, and the created correctness relation in the data field 1804, and a message is sent to the communication program 1611. The effect of this embodiment is that typographical errors can be corrected with simpler processing than in the embodiment that uses co-occurrence relations.

[0057]

[Effect of the Invention] According to the present invention, typographical errors are corrected automatically using co-occurrence relations generated on the local computer and correction patterns generated on other computers. Because the correction patterns created by the user and by collaborators can both be used, a character recognition result with few typographical errors is obtained.

[Brief description of drawings]

FIG. 1 is a diagram illustrating the system flow.

FIG. 2 is a diagram illustrating the hardware configuration.

FIG. 3 is a diagram illustrating the system configuration.

FIG. 4 is a diagram illustrating the logical structure of the word lattice file.

FIG. 5 is a diagram illustrating the logical structure of the co-occurrence relation file.

FIG. 6 is a diagram illustrating the logical structure of the text file.

FIG. 7 is a diagram illustrating the logical structure of the intermediate file.

FIG. 8 is a diagram illustrating the processing flow of the communication program.

FIG. 9 is a diagram illustrating the text display screen.

FIG. 10 is a diagram illustrating the processing flow for reading a word lattice from the word lattice file.

FIG. 11 is a diagram illustrating the processing flow for storing the correction result in the text file.

FIG. 12 is a diagram illustrating the processing flow for automatically correcting typographical errors using co-occurrence relations.

FIG. 13 is a diagram illustrating the screen for manual correction.

FIG. 14 is a diagram illustrating the processing flow of manual correction by the user.

FIG. 15 is a diagram illustrating the processing flow of word correction by the user.

FIG. 16 is a diagram illustrating the system configuration of the other embodiment.

FIG. 17 is a diagram illustrating the correctness relation of the other embodiment.

FIG. 18 is a diagram illustrating the logical structure of the intermediate file of the other embodiment.

FIG. 19 is a diagram illustrating the communication program of the other embodiment.

FIG. 20 is a diagram illustrating the system flow of the other embodiment.

FIG. 21 is a diagram illustrating the automatic typographical error correction of the other embodiment.

FIG. 22 is a diagram illustrating the processing flow of manual typographical error correction in the other embodiment.

FIG. 23 is a diagram illustrating the processing flow of manual word correction in the other embodiment.

[Explanation of symbols]

113 ... automatic word string correction process. 115 ... manual word string correction process. 301 ... text correction support program. 303 ... word lattice file. 305 ... co-occurrence relation file. 307 ... text file. 309 ... intermediate file. 311 ... communication program.

Claims

[Claims]

1. A method for correcting typographical errors contained in a character recognition result, characterized in that: when a user selects a character-recognized text to be corrected, typographical errors are automatically corrected using co-occurrence relations of words stored in advance; when a typographical error exists in the character-recognized text to be corrected, correct-character candidates for all words are displayed, and input of a word containing the correct character and of its co-occurring word is accepted; and a new co-occurrence relation is generated and stored from the word containing the correct character and the co-occurring word input by the user.

2. The correction method according to claim 1, characterized in that typographical errors are automatically corrected using the co-occurrence relations stored in advance and co-occurrence relations of words sent from another computer, and a new co-occurrence relation generated on the local computer is stored on the local computer and distributed to other computers.

3. The correction method according to claim 2, characterized in that typographical errors are automatically corrected using co-occurrence relations of words generated from the same text as the character-recognized text to be corrected, and the name of the computer holding the character-recognized text to be corrected and its file name are added to each newly generated co-occurrence relation.

4. A method for correcting typographical errors contained in a character recognition result, characterized in that: when a user selects a character-recognized text to be corrected, typographical errors are automatically corrected using correctness relations stored in advance; when a typographical error exists in the character-recognized text to be corrected, correct-character candidates for all words are displayed, and input of a word containing the correct character is accepted; and a new correctness relation is generated and stored.

5. The correction method according to claim 4, characterized in that typographical errors are automatically corrected using the correctness relations stored in advance and correctness relations of words sent from another computer, and a new correctness relation generated on the local computer is stored on the local computer and distributed to other computers.

6. The correction method according to claim 5, characterized in that typographical errors are automatically corrected using correctness relations of words generated from the same text as the character-recognized text to be corrected, and the name of the computer holding the character-recognized text to be corrected and its file name are added to each newly generated correctness relation.