JPH11167576A - Document proofreading device

Info

Publication number: JPH11167576A
Application number: JP9333902A
Authority: JP
Inventors: Jun Ibuki; 潤伊吹
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1997-12-04
Filing date: 1997-12-04
Publication date: 1999-06-22
Anticipated expiration: 2017-12-04
Also published as: JP3936453B2

Abstract

PROBLEM TO BE SOLVED: To detect errors in a text by collating extracted fact data with each record in a fact database, and to correct the detected inconsistent data and the corresponding expression in the text. SOLUTION: A data extraction unit 1 analyzes the descriptions of fact data in a text, extracts them in a form that can be registered in a fact database 4, and sends them to a consistency verification unit 2. The consistency verification unit 2 retrieves from the database 4 the data concerning the same fact as the extracted fact data and checks whether the retrieved fact data and the fact data extracted from the text contradict each other. When the consistency verification unit 2 detects contradictory data, an error processing unit 3 corrects the extracted fact data based on the data stored in the database 4 so that the two agree. The database 4 stores fact data representing facts.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a document proofreading device, and more particularly to one that detects and corrects errors in a text document by checking whether the statements of fact in the document are accurate against fact data in an existing database.

[0002]

[Prior Art] Conventional techniques for pointing out errors in texts such as newspaper articles include the following. Some use the result of morphological analysis, which segments the character string of the text into words and matches them against a dictionary, to extract portions that appear erroneous, such as unregistered words, and present them as they are; an example is the unregistered word "Fusain" written where "Hussein" is correct. Others limit the types of error to some extent, for example to homophone errors or to variations in katakana spelling that arise when transcribing loanwords (such as the two Japanese spellings of "interface"), and handle them through to correction.

[0003]

[Problem to Be Solved by the Invention] These devices basically target non-word errors of the kind on which morphological analysis fails, that is, cases where a string is not recognized as a word as a result of erroneous morphological analysis, as well as homophone errors and the like. They cannot point out that a statement in the text contradicts the facts.

[0004] Much of the work actually performed by the proofreading department of a newspaper consists of judging the consistency of data such as numbers and names by checking them against common sense and various kinds of knowledge. Existing proofreading support systems have not yet been able to handle the detection of factual errors, which account for a large share of errors.

[0005] Accordingly, an object of the present invention is to provide a document proofreading device capable of detecting such factual errors in a text.

[0006]

[Means for Solving the Problem] FIG. 1 shows the configuration of the present invention. In FIG. 1, 1 is a data extraction unit, 2 is a consistency verification unit, 3 is an error processing unit, and 4 is a fact data database.

[0007] The above object of the present invention is achieved by the following inventions described in the claims. (1) The document proofreading device according to claim 1 comprises: a fact database that stores data on specific matters; a data extraction unit that extracts fact data from input text; a consistency verification unit that collates the extracted fact data with each record in the fact database and detects inconsistencies; and an error processing unit that corrects the inconsistent data and the corresponding expression in the text.

[0008] (2) In the document proofreading device according to claim 2, the consistency verification unit of claim 1 evaluates in advance the possibility of error for each field of the data in the fact database. When data extracted from the text does not completely match the data in the fact database and a plurality of possibly corresponding records exist in the fact database, it evaluates the cost of changing field values based on the error possibility, selects the change that achieves correspondence with data in the fact database at the lowest cost, and thereby determines the content of the error.

[0009] (3) The document proofreading device according to claim 3 is a device in which the data extraction unit extracts data relating to a change of fact, the consistency verification unit checks consistency against the state before the change, and, for data whose corresponding record has been retrieved and whose consistency has been verified, the corresponding data is revised to the state after the change. When handling fact data that includes dates, after data relating to a change of fact is extracted from the text, the consistency verification unit verifies whether the state before the change could have existed at the time the extracted data was written, and the error processing unit further sets the end date of the old fact and the occurrence date of the new fact.

[0010] (4) The document proofreading device according to claim 4 is a device that checks the mutual consistency of the fact data stated in texts by checking each item of fact data extracted from the text for consistency with existing data in the fact database and sequentially registering data that presents no problem in the fact database. When a text corpus is the target, the texts are first classified, a fact database specific to each class is built from the texts in that class, and the consistency verification unit performs the consistency check within that database.

[0011] (5) The document proofreading device according to claim 5 is a device that checks the mutual consistency of the fact data stated in texts by checking each item of fact data extracted from the text for consistency with existing data in the fact database and sequentially registering data that presents no problem in the fact database. The consistency verification unit does not process the entire text at once; instead, it refers to the document structure of the text, extracts the portion corresponding to a specific document structure, and judges consistency within that portion.

[0012] The present invention provides the following effects. (1) Fact data is extracted from the input text, the extracted fact data is collated with each record in the fact database, inconsistencies are detected, and the expression of the inconsistent data in the text is corrected. Erroneous data in the original text that differs from the facts can therefore be accurately displayed and proofread.

[0013] (2) When an inconsistency is detected between data extracted from the text and data in the fact database, the reliability of both is evaluated and the less reliable one is judged to be the error, so accurate proofreading can be performed.

[0014] (3) When fact data is registered in the fact database, the occurrence frequency of the fact data and the reliability of the information source are checked. For inconsistent portions of the data, the reliability of each item of fact data is judged on this basis to determine the erroneous data, so accurate proofreading can be performed.

[0015] (4) The possibility of error is evaluated for each field of the data in the fact database. When data in the text does not match the data in the fact database and a plurality of possibly corresponding records exist in the fact database, the cost of changing field values is evaluated from the error possibility, and the change that achieves correspondence with data in the fact database at the lowest cost is selected to determine the content of the error, so accurate proofreading can be performed.

[0016] (5) When text data relating to a change of fact is extracted, consistency is checked against the state before the change, and for data found to be consistent, the corresponding data in the fact database is revised to the state after the change. The contents of the fact database can therefore be kept accurate, and accurate proofreading can be performed.

[0017] (6) Because the fact database records the occurrence date, end date and the like of each fact, when a fact is extracted from the text, whether the fact data was correct as of the date it was written can be judged accurately and easily.

[0018] (7) When no data whose key, including time, matches the fact data extracted from the text exists in the fact database, but the key other than time of the extracted fact data does exist in the fact database, the fact data in the text is corrected so as to express the temporal relationship, so accurate proofreading can be performed.

[0019] (8) For fact data that changes regularly with the date, such as age, the value as of the date the data was written is calculated from the data in the fact database and checked for consistency, so such data can be proofread accurately.

[0020] (9) Rules for judging whether names match are defined in advance, so that names that do not match exactly, such as abbreviations and nicknames, can still be treated as matching during proofreading. (10) In newspapers and similar writing, a name or title is written in full at first and then progressively abbreviated. Therefore, when the same record is extracted from the text in an order such as "U.S. Secretary of State A" → "Secretary A" → "Secretary of State A", the second expression, "Secretary A", can be proofread to "Secretary of State A".

[0021] (11) When specifying a key does not determine a unique value and a plurality of values exist, each value is checked for a match, and an inconsistency is judged only when no matching value exists, so accurate proofreading can be performed.

[0022] (12) The fact database is built from the fact data extracted in advance from a reference set of texts, so the fact database can be made accurate and accurate proofreading can be performed.

[0023] (13) By checking each item of fact data extracted from the text for consistency with the existing data in the fact database and sequentially registering data that presents no problem, contradictions within the target set of texts can be detected.

[0024] (14) A plurality of fact databases are provided, one per type, and information on the type of fact database is extracted from the text to select the fact database to be consulted, so accurate proofreading can be performed.

[0025] (15) Each field carries information on the types of error likely to occur in it, and the cost of a change is evaluated according to whether the content of the field-value change corresponds to one of the listed error types. Information on error-prone portions can thus be exploited, and accurate proofreading can be performed.

[0026] (16) The tendency of errors that have actually occurred is analyzed and used to evaluate the error types that are likely to occur, so error detection suited to the individual text can be performed, and accurate proofreading can be performed.

[0027] (17) A table of changes in fact data that occur as a consequence of a given event is provided, so that when a specific event occurs, the consistency of other data in the fact database can be checked, and accurate proofreading can be performed.

[0028] (18) For data that includes dates, when a change of fact is extracted from the text, it is checked whether the state before the change could have existed at the time the extracted data was written, and the end date of the old fact and the occurrence date of the new fact are additionally set in the fact database, so accurate proofreading can be performed.

[0029] (19) When fact data whose end date or occurrence date is unknown exists, the reliability of that fact data on a specified date is evaluated from the frequency with which the data changes and the difference between the specified date and the occurrence or end date, so the validity of the data can be judged accurately.

[0030] (20) Texts are first classified by a predetermined method, a fact database specific to each class is built from the texts in that class, consistency is checked, and data that presents no problem is sequentially registered. A fact data database can thus be built for each class, fine-grained checking becomes possible, and accurate proofreading can be performed.

[0031] (21) By referring to tags that indicate the classification attached to the document structure of the text, the portion corresponding to a specific document structure, for example the society page or the sports page, is extracted and its consistency is judged, and data that presents no problem is sequentially registered. A fact data database for a specific classification can thus be built quickly, its contents can be enriched, fine-grained checking becomes possible, and accurate proofreading can be performed.

[0032]

[Embodiments of the Invention] (1) First embodiment of the present invention. A first embodiment of the present invention will be described with reference to FIGS. 1, 2 and 3. In FIG. 1, 1 is a data extraction unit, 2 is a consistency verification unit, 3 is an error processing unit, and 4 is a fact data database; FIG. 2 is a diagram explaining its operation.

[0033] The data extraction unit 1 analyzes the descriptions of fact data in the text, extracts them in a form that can be registered in the fact data database 4, and sends them to the consistency verification unit 2. The consistency verification unit 2 retrieves from the fact data database 4 the data concerning the same fact as the extracted fact data, and checks whether the retrieved fact data and the data extracted from the text contradict each other.

[0034] When the consistency verification unit 2 detects contradictory data, the error processing unit 3 corrects the data extracted from the text based on the data in the fact data database 4 so that the two agree. The fact data database 4 stores fact data representing a large number of facts.

[0035] The operation of the embodiment shown in FIG. 1 will be described with reference to FIG. 2. When the original text "King Hussein of Iraq launches attack on Kurdish guerrillas" is input to the data extraction unit 1 shown in FIG. 1, as shown in FIG. 2, the data extraction unit 1 extracts "Iraq" as the organization name, "King" as the post, and "Hussein" as the personal name, and on this basis the consistency verification unit 2 refers to the fact data database 4. The fact data database 4 stores records such as "Iraq, President, Hussein" and "America, President, Clinton" as organization name, post and personal name. As a result of the lookup by the consistency verification unit 2, "Iraq, King, Hussein" and "Iraq, President, Hussein" are detected as inconsistent data, and the extracted data "King Hussein of Iraq" is displayed on the error processing unit 3 as an error.
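The lookup described in this paragraph can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Fact` record, the `FACT_DB` list and the `find_conflicts` helper are hypothetical names, and the rule that two records concern "the same fact" when organization and personal name agree is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    organization: str
    post: str
    person: str


# Hypothetical stand-in for the fact data database 4.
FACT_DB = [
    Fact("Iraq", "President", "Hussein"),
    Fact("America", "President", "Clinton"),
]


def find_conflicts(extracted: Fact, db: list[Fact]) -> list[Fact]:
    """Return records that share organization and person with `extracted`
    but disagree on the post (assumed matching rule, see lead-in)."""
    return [
        rec
        for rec in db
        if rec.organization == extracted.organization
        and rec.person == extracted.person
        and rec.post != extracted.post
    ]


# Data extracted from "King Hussein of Iraq launches attack on Kurdish guerrillas".
extracted = Fact("Iraq", "King", "Hussein")
conflicts = find_conflicts(extracted, FACT_DB)
```

Here `conflicts` holds the database record that contradicts the extracted data, which is what an error display would show next to the original expression.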

[0036] FIG. 3 shows FIG. 1 in detail. In FIG. 3, the same reference numerals as in FIG. 1 denote the same parts. As shown in FIG. 3, the error processing unit 3 comprises an error part display unit 3-1 and a processing unit 3-2. The consistency verification unit 2 collates the data extracted from the text with the fact data database and, taking the fact database as the reference, judges extracted data that contradicts it to be erroneous. The error part display unit 3-1 then displays the expression in the text together with the corresponding fact data from the fact data database 4 and presents them to the user. When the user confirms this and inputs a correction confirmation signal, for example from a keyboard, the processing unit 3-2 corrects the erroneous portion of the text using the fact data.

[0037] However, when no data corresponding to the data extracted from the text is held in the fact data database 4, only the extracted data is displayed on the error part display unit 3-1. When the user confirms this and inputs an update confirmation signal, for example from a keyboard, the data update unit 5 newly registers the extracted data held in the consistency verification unit 2 in the fact data database 4.

[0038] (2) Second embodiment of the present invention. A second embodiment of the present invention will be described with reference to FIGS. 4 and 5. FIG. 4 is a diagram of the second embodiment, and FIG. 5 is a diagram explaining its operation. In the figures, the same symbols as in the other figures denote the same parts; 6 is a reliability evaluation unit, 7 is an error part determination unit, 8 is a data update unit, and 9 is an error part display unit.

[0039] In addition to the fact data, the fact data database 4 holds data indicating reliability, such as the confidence of the fact (for example, 0.9 for newspaper M and 0.5 for newspaper N) and the number of occurrences of the same fact.

[0040] When the consistency verification unit 2 finds mutually contradictory data between the data extracted from the text and the fact data database 4, the reliability evaluation unit 6 computes a reliability for the extracted data and for the corresponding data retrieved from the fact data database 4, based on the reliability data described above, attaches the quantitative evaluation result to each, and sends them to the error part determination unit 7.

[0041] The error part determination unit 7 compares the evaluation result attached to the extracted data with the evaluation result attached to the data obtained by looking up the fact data database 4 with that extracted data, and determines which of the two is in error.

[0042] When the error part determination unit 7 judges that the error lies in the data from the fact data database 4, the data update unit 8 receives the extracted data passed to it and corrects the data in the fact data database 4 accordingly.

[0043] Conversely, when the data extracted from the text is judged to be erroneous, the extracted data is sent to the error part display unit 9 and presented to the user. The data retrieved from the fact data database 4 can be displayed at the same time, and the text can be corrected to it.

[0044] For example, as shown in FIG. 5, suppose the data extracted from the text has "A Securities" as the organization name, "President" as the post, "C" as the personal name, "Newspaper Y" as the information source, and "1" as the number of occurrences. On this basis, the consistency verification unit 2 shown in FIG. 4 refers to the fact data database 4. As shown in FIG. 5, it reads out, as reference data, the organization name "A Securities", the post "President", the personal name "B", the information source "Newspaper X", the number of occurrences so far "2", and the confidence "0.7", together with the confidence "0.1" of Newspaper Y. Here, "0.7" has been registered in the fact data database 4 in advance as the confidence of articles in Newspaper X, and "0.1" as the confidence of articles in Newspaper Y.

[0045] As a result, the consistency verification unit 2 outputs the extracted data shown in FIG. 5, with the confidence and so on attached, to the reliability evaluation unit 6, and likewise outputs the reference data from the fact data database 4, with the confidence and so on attached, to the reliability evaluation unit 6. The reliability evaluation unit 6 multiplies the number of occurrences by the confidence to evaluate the reliability of each and sends the results to the error part determination unit 7. The error part determination unit 7 looks at these products, judges, for example, the one with the larger value to be the more reliable, and thereby determines the erroneous part.

[0046] When, as a result of evaluating reliability quantitatively in this way in terms of confidence, occurrence probability and the like, the data in the fact data database 4 is judged to be erroneous, the error part determination unit 7 sends the extracted data to the data update unit 8, which corrects the data in the fact data database 4 accordingly. Conversely, when the extracted data is judged to be erroneous, the error part determination unit 7 sends the extracted data to the error part display unit 9, which displays it to the user. The correct data retrieved from the fact data database 4 is displayed as well, and proofreading is performed on that basis.

[0047] In the example shown in FIG. 5, the reliability of the reference data "A Securities, President, B" is 2 × 0.7 and the reliability of the extracted data "A Securities, President, C" is 1 × 0.1. The extracted data and the reference data from the fact data database 4 are therefore displayed to the user on the error part display unit 9, and proofreading is performed on that basis.
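The arithmetic of the FIG. 5 example (number of occurrences multiplied by source confidence, larger product wins) can be written out as a short Python sketch. The function name `reliability` and the handling of ties are assumptions for illustration; the source only says that the larger value is judged more reliable.

```python
def reliability(occurrences: int, source_confidence: float) -> float:
    """Reliability score as described in the text: occurrences x confidence."""
    return occurrences * source_confidence


# Reference data from the database: "A Securities, President, B",
# seen 2 times, source Newspaper X with confidence 0.7.
db_score = reliability(2, 0.7)

# Data extracted from the text: "A Securities, President, C",
# seen 1 time, source Newspaper Y with confidence 0.1.
text_score = reliability(1, 0.1)

# Assumption: on a tie the database record is treated as the suspect side.
erroneous_side = "extracted" if text_score < db_score else "database"
```

With these inputs the extracted data has the lower score, so it is the side that would be shown to the user as the probable error.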

[0048] For this purpose, in this embodiment, when fact data is registered in the fact data database 4, background data such as the occurrence frequency of the fact data, the information source, and the confidence in that source is registered as well. For example, the confidence of each newspaper is registered first, and fact data is then registered sequentially in the fact data database 4 with reference to it.

[0049] Another operation of the second embodiment (part 2) will be described with reference to FIG. 6. In the fact data database 4, the possibility of error is evaluated for each field and a corresponding numerical value is attached. When the data extracted from the text does not completely match the reference data in the fact data database 4 and a plurality of possibly corresponding records exist, the cost of changing field values is evaluated based on this error possibility, and the change that achieves correspondence with data in the fact database at the lowest cost is selected and judged to be the content of the error.

[0050] For example, as shown in FIG. 6, comparing the country-name field with the post field, the error probability of the post, which is easily mistaken, is set to, for example, "2", and that of the country name, which is less easily mistaken, to the larger value "3". That is, fields that are more error-prone are given lower values.

[0051] Now, as shown in FIG. 6, suppose the original text is "President Hussein of Jordan visits Egypt". The data extraction unit (omitted in FIG. 4) extracts "Jordan" as the country name, "President" as the post, and "Hussein" as the personal name, and on this basis the consistency verification unit 2 shown in FIG. 4 refers to the fact data database 4.

[0052] From the fact data database 4, the records "Jordan, King, Hussein" and "Iraq, President, Hussein" (country name, post, personal name) are extracted as possible matches.

【００５３】このとき、図６に示す如く、誤り確率と
して国名が「３」、役職が「２」のため、整合性検証部
２では最も低コストで事実データデータベース４中のデ
ータと対応する変更ができる（即ち役職の変更は
「２」、国名の変更は「３」）、役職の変更を選び、
に示す如く、「ヨルダンのフセイン大統領」を「ヨルダ
ンのフセイン国王」とその役職の方を変えることにより
完全一致するデータを選択する。At this time, as shown in FIG. 6, the error probability values are "3" for the country name and "2" for the title, so changing the title costs "2" and changing the country name costs "3". The consistency verification unit 2 therefore chooses the change of title, which is the lowest-cost change that matches data in the fact data database 4, and selects the fully matching data by changing "President Hussein of Jordan" to "King Hussein of Jordan".
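The lowest-cost selection described above can be sketched as follows. This is only an illustration under stated assumptions: the field names, the `FIELD_COST` table, and both helper functions are hypothetical and do not appear in the specification; only the values "2" and "3" and the two candidate records come from the example of FIG. 6.

```python
# Hypothetical per-field change costs: a lower value marks a field
# that is easier to get wrong (values taken from the FIG. 6 example).
FIELD_COST = {"country": 3, "title": 2}


def change_cost(extracted, record, costs=FIELD_COST):
    """Sum the cost of every field that must change to turn `extracted` into `record`."""
    total = 0
    for field, value in extracted.items():
        if record.get(field) != value:
            # Assumption: a field without a listed cost may not be changed.
            total += costs.get(field, float("inf"))
    return total


def cheapest_correction(extracted, candidates, costs=FIELD_COST):
    """Return the candidate record that can be reached at the lowest change cost."""
    return min(candidates, key=lambda rec: change_cost(extracted, rec, costs))


extracted = {"country": "Jordan", "title": "President", "name": "Hussein"}
candidates = [
    {"country": "Jordan", "title": "King", "name": "Hussein"},
    {"country": "Iraq", "title": "President", "name": "Hussein"},
]
best = cheapest_correction(extracted, candidates)
```

With these costs the Jordanian record is chosen, because changing only the title is cheaper than changing only the country name.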

【００５４】（３）本発明の第３の実施の形態本発明の第３の実施の形態を図７及び図８により説明す
る。図７は、例えば首相の辞任等の事実の変更に関する
データを抽出し、変更前の状態に対しての整合性のチェ
ックを行い、また対応データを変更後の状態に修正する
ものである。(3) Third Embodiment of the Present Invention A third embodiment of the present invention will be described with reference to FIGS. FIG. 7 is for extracting data relating to a change in fact such as, for example, resignation of the Prime Minister, checking consistency with the state before the change, and correcting the corresponding data to the state after the change.

【００５５】図７において、２は整合性検証部、４は事
実データデータベース、８はデータ更新部、９は誤り部
分表示部、１０は事実の変更データ抽出部である。事実
の変更データ抽出部１０は、入力されたテキストから事
実データのうち、例えば死亡とか辞任とか事実の変更に
関するデータを抽出するものである。In FIG. 7, 2 is a consistency verification unit, 4 is a fact data database, 8 is a data update unit, 9 is an error part display unit, and 10 is a fact change data extraction unit. The fact change data extraction unit 10 extracts, from the fact data in the input text, data relating to a change in fact, such as a death or a resignation.

【００５６】いま、事実の変更データ抽出部１０に、例
えば図８に示す如き、「英メージャー首相辞任」と
いうテキストが入力されたとき、事実の変更データ抽出
部１０は、事実データを抽出するとともに、この「辞
任」をキーにして事実データの中から事実の変更データ
を抽出する。Now, when the text "British Prime Minister Major resigns" shown in FIG. 8 is input to the fact change data extraction unit 10, the unit extracts the fact data and, using "resigns" as a key, extracts the fact change data from the fact data.

【００５７】ところで、図８のに示す如く、「辞任」
ということは、その前提条件として対象人物が当該の職
務についていることが必要であり、また「辞任」にとも
なって当該職務のレコードの削除が必要となる。As shown in FIG. 8, a "resignation" presupposes that the person in question currently holds the post concerned, and the resignation also requires deleting the record for that post.

【００５８】このため図８に示す如く、事実データデ
ータベース中のデータの変更が必要となる。即ち事実デ
ータデータベース中に記載された、国名、役職、個人名
が「英首相メージャー」というデータから、に示
す如く、「メージャー」を削除した「英首相・・・」
というデータに修正するものである。For this reason, as shown in FIG. 8, the data in the fact data database must be changed. That is, the record in the fact data database whose country name, title, and personal name are "UK, Prime Minister, Major" is corrected to "UK, Prime Minister, ...", with "Major" deleted.

【００５９】このため、前記事実の変更データ抽出部１
０により、例えば「英メージャー首相辞任」という事
実の変更に関するデータを抽出したとき、事実の変更デ
ータであることを付加して整合性検証部２に送る。For this reason, when the fact change data extraction unit 10 extracts data relating to a change in fact, such as "British Prime Minister Major resigns", it marks the data as fact change data and sends it to the consistency verification unit 2.

【００６０】整合性検証部２では事実の変更前のデータ
「英メージャー首相」にもとづき事実データデータベ
ース４を参照してこの事実データデータベース４のデー
タとの整合性のチェックを行う。そして整合性のチェッ
クに合格した事実の変更データはデータ更新部８に伝達
され、事実データデータベース４中の「英首相メー
ジャー」というデータの個人名が削除され、図８に示
す如きデータに修正される。The consistency verification unit 2 refers to the fact data database 4 based on the pre-change data "British Prime Minister Major" and checks its consistency with the data in the fact data database 4. Fact change data that passes the consistency check is passed to the data update unit 8, which deletes the personal name from the record "UK, Prime Minister, Major" in the fact data database 4, correcting it to the data shown in FIG. 8.

【００６１】しかし事実の変更前のデータによる前記整
合性のチェックにおいて整合性がとれなかった、不合格
のデータは誤りデータとして誤り部分表示部９に送出さ
れ、ユーザに表示されて校正されるものとなる。However, data that fails this consistency check against the pre-change data is sent to the error part display unit 9 as error data, displayed to the user, and proofread.
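The handling of a fact change in the third embodiment (check the precondition against the pre-change state, then update the record) can be sketched as follows. The dictionary layout, the `event` field, and the function name are assumptions made for illustration, not structures defined in the specification.

```python
def apply_fact_change(db, change):
    """Check the precondition of a change event, then update the record.

    `db` maps (country, title) to the current holder's name, or None if vacant.
    Returns True when the update was applied, and False when the text
    conflicts with the database and should be shown to the user instead.
    """
    key = (change["country"], change["title"])
    if change["event"] == "resignation":
        # Precondition: the person must currently hold the post.
        if db.get(key) != change["name"]:
            return False
        db[key] = None  # the holder's name is deleted from the record
        return True
    raise ValueError("unsupported event: " + change["event"])


db = {("UK", "Prime Minister"): "Major"}
resignation = {
    "event": "resignation",
    "country": "UK",
    "title": "Prime Minister",
    "name": "Major",
}
first = apply_fact_change(db, resignation)
second = apply_fact_change(db, resignation)  # precondition no longer holds
```

The second call fails because the post is already vacant, which corresponds to the case that is routed to the error part display unit 9.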

【００６２】（４）本発明の第４の実施の形態本発明の第４の実施の形態を図９及び図１０に基づき説
明する。図９においては、事実データデータベース４の
記載に、生起期日、終了期日、記載期日の３つの欄が設
けられており、それに基づいてクリントンが１９９４年
に米国大統領に就任というデータがあるとき、例えば１
９９２年のクリントンに関する記載が正確か否かを校正
するものである。(4) Fourth Embodiment of the Present Invention A fourth embodiment of the present invention will be described with reference to FIGS. 9 and 10. In FIG. 9, each entry in the fact data database 4 has three columns: occurrence date, end date, and description date. Based on these, when there is data indicating that Clinton took office as President of the United States in 1994, the device checks whether a description of Clinton dated, for example, 1992 is accurate.

【００６３】整合性検証部１２には期日整合判別部１２
−１が設けられ、原テキストから抽出された期日付き抽
出データが正確か否かをチェックするものである。例え
ば図９に示す事実データデータベース４に、図１０の
で示す如く、国名、肩書、名前、生起期日、終了期日と
して「米国、大統領、クリントン、１９９４、不明」と
いう事実データが記載されているとき、図１０ので示
す如く、「クリントン米大統領は１９９２年にベルリン
を訪問」という原テキストがデータ抽出部１に入力され
ると、データ抽出部１は、図１０ので示す如く、抽出
された国名として「米国」、肩書として「大統領」、名
前として「クリントン」、生起期日として「１９９
２」、終了期日として「不明」を抽出する。The consistency verification unit 12 includes a date matching discrimination unit 12-1, which checks whether dated data extracted from the original text is accurate. For example, suppose the fact data database 4 shown in FIG. 9 contains, as shown in FIG. 10, the fact data "US, President, Clinton, 1994, unknown" as the country name, title, name, occurrence date, and end date. When the original text "US President Clinton visited Berlin in 1992" shown in FIG. 10 is input to the data extraction unit 1, the data extraction unit 1 extracts, as shown in FIG. 10, "US" as the country name, "President" as the title, "Clinton" as the name, "1992" as the occurrence date, and "unknown" as the end date.

【００６４】そしてこれらの事実データが整合性検証部
１２に伝達され、整合性検証部１２では、国名「米
国」、肩書「大統領」、名前「クリントン」により事実
データデータベース４を参照する。そして図１０のに
示す如き事実データを得る。The fact data is transmitted to the consistency verification unit 12, and the consistency verification unit 12 refers to the fact data database 4 by the country name “USA”, title “President”, and name “Clinton”. Then, fact data as shown in FIG. 10 is obtained.

【００６５】このとき期日整合判別部１２−１にはテキ
ストからの抽出データより生起期日として「１９９２」
という数字が保持されており、これが事実データデータ
ベース４から参照された生起期日「１９９４」と比較し
てそれよりも小さい数字つまり古いものであることが判
断される。従ってクリントンが１９９２年に米国大統領
ということは誤りであることが判るので、図１０のに
示す如く、整合性検証部１２では整合性チェックの結果
を×つまり、抽出データ「米国大統領クリントン
１９９２年」は誤りとして誤り処理部３に通知され、こ
れが「米国大統領クリントン１９９４年」と共に
ユーザに表示されて正確な校正が行われる。At this time, the date matching discrimination unit 12-1 holds "1992" as the occurrence date from the data extracted from the text, and determines that this is smaller, that is, earlier, than the occurrence date "1994" retrieved from the fact data database 4. The statement that Clinton was President of the United States in 1992 is therefore found to be an error. As shown in FIG. 10, the consistency verification unit 12 marks the result of the consistency check as ×; that is, the extracted data "US, President, Clinton, 1992" is reported to the error processing unit 3 as an error and displayed to the user together with "US, President, Clinton, 1994" so that it can be proofread accurately.

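The date comparison of the fourth embodiment can be sketched as follows. The function and field names are hypothetical, and the upper-bound check against an end date is an added assumption; the passage above only compares the occurrence dates.

```python
def date_consistent(text_year, start_year, end_year=None):
    """Return True if `text_year` falls within the recorded validity period.

    An `end_year` of None means the end date is unknown.
    """
    if text_year < start_year:
        return False
    # Assumption: a known end date also bounds the period from above.
    if end_year is not None and text_year > end_year:
        return False
    return True


record = {"country": "US", "title": "President", "name": "Clinton",
          "start": 1994, "end": None}
result_1992 = date_consistent(1992, record["start"], record["end"])
result_1995 = date_consistent(1995, record["start"], record["end"])
```

Here the 1992 description is rejected because it predates the recorded occurrence date of 1994.
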
【００６６】（５）本発明の第５の実施の形態本発明の第５の実施の形態を図１１及び図１２に基づき
説明する。図１１においては抽出した事実データと時間
を含めてキーの一致するデータが事実データデータベー
ス中に存在しないが時間以外のキーが一致するデータが
存在したとき、テキスト中の事実データに対して時間的
な前後関係を表すように修正するものである。例えば
「Ａ証券会長Ｂ１９９７．８」というテキストが
あり、事実データデータベース中に「Ａ証券会長Ｂ
終了期日１９９７．５」というデータが存在したと
き、テキストを「Ａ証券前会長Ｂ」と修正するもの
である。(5) Fifth Embodiment of the Present Invention A fifth embodiment of the present invention will be described with reference to FIGS. 11 and 12. In FIG. 11, when the fact data database contains no data whose keys, including time, match the extracted fact data, but does contain data whose keys other than time match, the fact data in the text is corrected so as to express the temporal relationship. For example, when the text contains "A Securities, Chairman, B, 1997.8" and the fact data database contains "A Securities, Chairman, B, end date 1997.5", the text is corrected to "A Securities, former Chairman, B".

【００６７】図１１においては、事実データデータベー
ス４に生起期日、終了期日の項が設けられている。例え
ば図１２に示す如く、事実データデータベース４中に
組織体名、肩書、名前、生起期日、終了期日として「Ａ
証券会長Ｂ終了期日１９９７．５」というデータ
が記入されている。In FIG. 11, the fact data database 4 has fields for the occurrence date and the end date. For example, as shown in FIG. 12, the fact data database 4 contains the data "A Securities, Chairman, B, end date 1997.5" as the organization name, title, name, occurrence date, and end date.

【００６８】いま図１１のデータ抽出部１に、図１２の
に示す如き「１９９７年８月Ａ証券のＢ社長は検察
からの事情聴取を受けた」というテキストが入力される
と、データ抽出部１は、図１２のに示す如く、組織体
名としてＡ証券、肩書として会長、名前としてＢ、記載
期日として１９９７．８を事実データとして抽出する。Now, when the text "In August 1997, President B of A Securities was questioned by prosecutors" shown in FIG. 12 is input to the data extraction unit 1 of FIG. 11, the data extraction unit 1 extracts as fact data, as shown in FIG. 12, A Securities as the organization name, Chairman as the title, B as the name, and 1997.8 as the description date.

【００６９】これにより整合性検証部１２が事実データ
データベース４を照合し、図１２に示す如く、組織体
名として「Ａ」、肩書として「会長」、名前として
「Ｂ」、終了期日として「１９９７．５」を得る。As a result, the consistency verification unit 12 checks the fact data database 4, and as shown in FIG. 12, the organization name is "A", the title is "Chairman", the name is "B", and the end date is "1997". .5 ".

【００７０】期日整合判別部１２−１は、これら図１２
のとを比較し、にＡ証券Ｂ社長に関して既に終了
期日のデータがあることを認識する。またにＡ証券Ｂ
社長に関する抽出データがあるが、テキストからの抽出
データの記載期日が前記の終了期日と一致せず、記載
期日が終了期日よりも後であるため一致するデータとは
みなされない。The date matching discrimination unit 12-1 compares the two sets of data in FIG. 12 and recognizes that the database record for President B of A Securities already has an end date. Extracted data for President B of A Securities also exists, but the description date of the data extracted from the text does not match that end date; because the description date is later than the end date, the record is not regarded as matching data.

【００７１】このように、抽出データに対して一致し得
るデータが他にない場合、整合性検証部１２は記載期日
のような期日指定を無視して一致するデータを事実デー
タデータベース４中に探す。そして図１２のとのよ
うに「Ａ証券会長Ｂ」というデータとして一致するもの
を参照する。As described above, when there is no other data that can match the extracted data, the consistency verification unit 12 ignores the date specification such as the description date and searches the fact data database 4 for matching data. . Then, as shown in FIG. 12, the matching data is referred to as "A Securities Chairman B".

【００７２】このとき、期日整合判別部１２−１では、
図１２の終了期日１９９７．５との記載期日１９９
７．８をチェックして１９９７．８が後であることを認
識し、これを一致データとともに誤り処理部３に通知す
る。これにより誤り部分表示部３−１には先ず「Ａ証券
会長Ｂ」が表示されて、そのあとで誤り処理部３により
「会長」が「前会長」と修正され、図１２のに示す如
く、元のテキストの記述を現職でないことを示すものに
修正される。At this time, the date matching discrimination unit 12-1 compares the end date 1997.5 with the description date 1997.8 in FIG. 12, recognizes that 1997.8 is later, and reports this to the error processing unit 3 together with the matching data. As a result, "A Securities, Chairman, B" is first displayed on the error part display unit 3-1, after which the error processing unit 3 corrects "Chairman" to "former Chairman". As shown in FIG. 12, the description in the original text is thus corrected to indicate that the person no longer holds the post.
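The correction from "Chairman" to "former Chairman" can be sketched as follows. Representing dates as `(year, month)` tuples and the names `correct_title`, `end`, and `date` are assumptions for illustration only.

```python
def correct_title(extracted, record):
    """Prefix the title with 'former' when the text postdates the record's end date.

    Dates are (year, month) tuples, so they compare chronologically.
    """
    end = record.get("end")
    if end is not None and extracted["date"] > end:
        return "former " + extracted["title"]
    return extracted["title"]


record = {"org": "A Securities", "title": "Chairman", "name": "B", "end": (1997, 5)}
extracted = {"org": "A Securities", "title": "Chairman", "name": "B", "date": (1997, 8)}
fixed = correct_title(extracted, record)
```

A description dated before the end date would be left unchanged by this sketch.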

【００７３】（６）本発明の第６の実施の形態本発明の第６の実施の形態を１３及び図１４に基づき説
明する。第６の実施の形態においては、例えば年令のよ
うに規則的に変化する事実データについて、抽出データ
の記載期日における値を事実データデータベース中のデ
ータに基づいて計算して整合性を判断するものである。(6) Sixth Embodiment of the Present Invention A sixth embodiment of the present invention will be described with reference to FIG. 13 and FIG. In the sixth embodiment, for fact data that changes regularly, such as an age, the value on the date of description of the extracted data is calculated based on the data in the fact data database to determine consistency. It is.

【００７４】図１３において、整合性検証部１３には期
日演算判別部１３−１が設けられ、年令の如く、期日に
よって変化するデータに関して特定期日における値を計
算するものであり、計算用のアルゴリズムが記述されて
いる。In FIG. 13, the consistency verification unit 13 is provided with a date calculation discriminating unit 13-1 for calculating a value at a specific date with respect to data that changes with the date, such as age. The algorithm is described.

【００７５】図１３において、図１４に示す如く、事
実データデータベース４中に組織体名、肩書、名前、年
令、記載期日として「ＡＢＣ会長ＥＦ５０才１
９９４」というデータが記入されている。In FIG. 13, as shown in FIG. 14, the fact data database 4 contains the data "ABC, Chairman, EF, 50 years old, 1994" as the organization name, title, name, age, and description date.

【００７６】いま図１３のデータ抽出部１に、図１４の
に示す如き「１９９７．６．１ＡＢＣのＥＦ会長（５
０歳）は」というテキストが入力されると、データ抽出
部１は、図１４のに示す如く、組織体名としてＡＢ
Ｃ、肩書として会長、名前としてＥＦ、年令として５
０、記載期日として１９９７を事実データとして抽出す
る。Now, when the text "1997.6.1 ABC Chairman EF (50 years old) ..." shown in FIG. 14 is input to the data extraction unit 1 of FIG. 13, the data extraction unit 1 extracts as fact data, as shown in FIG. 14, ABC as the organization name, Chairman as the title, EF as the name, 50 as the age, and 1997 as the description date.

【００７７】整合性検証部１３は、この事実データに基
づき、事実データデータベース４を参照し、図１４に
示す組織体名としてＡＢＣ、肩書として会長、名前とし
てＥＦ、年令として５０、記載期日として１９９４が読
出される。そしてこれが期日演算判別部１３−１に送出
される。Based on this fact data, the consistency verification unit 13 refers to the fact data database 4 and reads out the record shown in FIG. 14: ABC as the organization name, Chairman as the title, EF as the name, 50 as the age, and 1994 as the description date. This record is sent to the date calculation discrimination unit 13-1.

【００７８】期日演算判別部１３−１には、別にデータ
抽出部１より伝達された記載期日１９９７が伝達されて
いるので、前記１９９４、５０及び１９９７にもとづき
５０＋（１９９７−１９９４）を演算して年令５３を演
算する。そしてこの５３が誤り処理部３の処理部３−２
に伝達される。The date calculation discrimination unit 13-1 has separately received the description date 1997 from the data extraction unit 1, so from 1994, 50, and 1997 it computes 50 + (1997 - 1994) to obtain the age 53. This value 53 is passed to the processing unit 3-2 of the error processing unit 3.

【００７９】このとき誤り処理部３には、この年令を含
むテキストの一部「ＡＢＣのＥＦ会長（５０歳）は」が
誤り部分表示部３−１に表示されており、この数字が処
理部３−２により修正されて「ＡＢＣのＥＦ会長（５３
歳）は」と校正されることになる。At this time, in the error processing unit 3, the part of the text containing the age, "ABC Chairman EF (50 years old) ...", is displayed on the error part display unit 3-1. The processing unit 3-2 corrects the number, so that the text is proofread to "ABC Chairman EF (53 years old) ...".

【００８０】このようにして、テキストから抽出された
人物に、例えば年令に関するデータがあり、事実データ
データベース中にも同一人物の年令についての記述があ
るような場合、事実データデータベース中のデータを抽
出データの記載期日における年令を計算し、整合性を調
べる。図１４の例では、計算したものと一致しなかった
ので、この事実データデータベース中の値に基づき計算
した値に修正している。In this way, when a person extracted from the text has, for example, age data, and the fact data database also contains a description of the same person's age, the age on the description date of the extracted data is calculated from the data in the fact data database and checked for consistency. In the example of FIG. 14, the extracted age does not match the calculated value, so it is corrected to the value calculated from the data in the fact data database.
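The age calculation of the sixth embodiment, 50 + (1997 - 1994) = 53, can be sketched as follows. The function names are hypothetical, and the sketch works at year granularity only, as in the example of FIG. 14, so it ignores whether the birthday has passed.

```python
def expected_age(recorded_age, recorded_year, text_year):
    """Age on the text's description date, derived from the database record."""
    return recorded_age + (text_year - recorded_year)


def check_age(text_age, text_year, recorded_age, recorded_year):
    """Return (is_consistent, expected_age) for an age stated in the text."""
    expected = expected_age(recorded_age, recorded_year, text_year)
    return text_age == expected, expected


consistent, expected = check_age(
    text_age=50, text_year=1997, recorded_age=50, recorded_year=1994
)
```

When the check fails, `expected` is the value that would replace the age in the text.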

【００８１】（７）本発明の第７の実施の形態本発明の第７の実施の形態を図１５及び図１６に基づき
説明する。第７の実施の形態においては、例えば「橋本
龍太郎」を「橋龍」という略称で表現することがある
が、このような場合でもテキスト中の「橋龍」を正しい
ものとして判断するものである。(7) Seventh Embodiment of the Present Invention A seventh embodiment of the present invention will be described with reference to FIGS. 15 and 16. "Ryutaro Hashimoto" (橋本龍太郎), for example, is sometimes written with the abbreviated name "Hashiryu" (橋龍). In the seventh embodiment, "Hashiryu" in the text is judged to be correct even in such a case.

【００８２】図１５において、１４は略称テーブルであ
り、例えば個人名についてその略称と標準名称があらか
じめ登録されている。図１５の例では「橋龍←→橋本龍
太郎」、「クリントン←→ビル・クリントン」等が登録
されている。事実データデータベース４に登録する際に
は標準的な表現に一旦変換して登録する。例えば「日本
首相橋龍」を「日本首相橋本龍太郎」と登録す
る。In FIG. 15, reference numeral 14 denotes an abbreviation table in which, for example, abbreviated names and standard names of persons are registered in advance. In the example of FIG. 15, "Hashiryu ↔ Ryutaro Hashimoto", "Clinton ↔ Bill Clinton", and so on are registered. When data is registered in the fact data database 4, it is first converted into the standard form. For example, "Japan, Prime Minister, Hashiryu" is registered as "Japan, Prime Minister, Ryutaro Hashimoto".

【００８３】整合性検証部１５には略称チェック部１５
−１が設けられ、整合性検証部１５において、抽出デー
タを事実データデータベース４の各フィールド値を検索
したとき、名前が一致しなかった場合、この名前により
略称テーブル１４をアクセスするものである。例えば前
記検索において、「日本」「首相」という項で一致して
も名前の項で一致しなかったとき、抽出データの名前
「橋龍」で略称テーブル１４をアクセスして「橋龍」に
対しての標準名称として「橋本龍太郎」を得ることによ
り、その一致が得られる。The consistency verification unit 15 includes an abbreviation check unit 15-1. When the consistency verification unit 15 searches the field values of the fact data database 4 with the extracted data and the name does not match, the abbreviation check unit 15-1 looks up the name in the abbreviation table 14. For example, if the search matches on "Japan" and "Prime Minister" but not on the name, the abbreviation table 14 is accessed with the extracted name "Hashiryu", and the standard name "Ryutaro Hashimoto" is obtained for it, which yields a match.

【００８４】いま図１５に示すデータ抽出部１に、図１
６に示す如き「日本首相橋龍さん」という原テキ
ストが入力されると、データ抽出部１は、図１６に示
す如く、国名として日本、肩書として首相、個人名とし
て橋龍が事実データとして抽出する。Now, when the original text "Japan's Prime Minister Hashiryu-san" shown in FIG. 16 is input to the data extraction unit 1 shown in FIG. 15, the data extraction unit 1 extracts as fact data, as shown in FIG. 16, Japan as the country name, Prime Minister as the title, and Hashiryu as the personal name.

【００８５】整合性検証部１５は、この事実データに基
づき、事実データデータベース４を参照し、図１６に
示す如く、国名として日本、肩書として首相、個人名と
して橋本龍太郎という参照データを得る。しかしこの参
照データは、国名及び肩書というフィールドでは一致す
るものの、名前のところでは一致しない。Based on the fact data, the consistency verification unit 15 refers to the fact data database 4 and obtains reference data of Japan as the country name, Prime Minister as the title, and Ryutaro Hashimoto as the personal name, as shown in FIG. However, this reference data matches in the country and title fields but not in the name.

【００８６】この場合、略称チェック部１５−１が不一
致の名前「橋龍」により略称テーブル１４をアクセスし
て「橋龍」に対応する標準名称「橋本龍太郎」を読み出
す。そしてこれに基づき、整合性検証部１３が参照デー
タと再度比較することによりその一致をみるので、原テ
キストの正確性が認識される。In this case, the abbreviation checking unit 15-1 accesses the abbreviation table 14 using the mismatched name “Hashiryu” and reads out the standard name “Hashimoto Ryutaro” corresponding to “Hashiryu”. Then, based on this, the matching is checked by the consistency verification unit 13 again comparing with the reference data, so that the accuracy of the original text is recognized.
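The fallback to the abbreviation table can be sketched as follows. The table contents come from the example of FIG. 15; the dictionary representation and the function names are assumptions.

```python
# Abbreviation -> standard name, using the entries given for FIG. 15.
ABBREVIATIONS = {
    "Hashiryu": "Ryutaro Hashimoto",
    "Clinton": "Bill Clinton",
}


def normalize_name(name, table=ABBREVIATIONS):
    """Return the standard name for an abbreviation, or the name unchanged."""
    return table.get(name, name)


def names_match(extracted_name, reference_name, table=ABBREVIATIONS):
    """Compare directly first; consult the abbreviation table only on a mismatch."""
    if extracted_name == reference_name:
        return True
    return normalize_name(extracted_name, table) == reference_name


matched = names_match("Hashiryu", "Ryutaro Hashimoto")
```

Because the database stores only standard names, the sketch normalizes the extracted side alone.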

【００８７】このようにして名前の一致性判定に関する
規則を定めて略称のように、正確に一致しない場合で
も、一致し得るものとできる条件を定めることができ
る。（８）本発明の第８の実施の形態本発明の第８の実施の形態を図１７及び図１８に基づき
説明する。例えば新聞記事の表現では、同一事実の説明
の場合、最初は肩書などを省略せずに全部書き、順次少
しずつ省略表現することが行われることがあるが、第８
の実施の形態はこのような場合に対応するものである。In this way, rules for judging whether names match can be defined, so that conditions under which names that do not match exactly, such as abbreviations, are still treated as matching can be specified. (8) Eighth Embodiment of the Present Invention An eighth embodiment of the present invention will be described with reference to FIGS. 17 and 18. In newspaper articles, for example, when the same fact is mentioned repeatedly, the first mention is often written in full without omitting the title and so on, and later mentions are progressively abbreviated. The eighth embodiment handles such cases.

【００８８】第８の実施の形態では、図１７に示す如
く、同一事実の表現を出現順で示した、出現順リスト１
６を設け、整合性検証部１７には、この出現順リスト１
６を作成したり、この出現順リスト１６を検索してその
省略状態をチェックする出現順リスト作成チェック部１
７−１が設けられている。In the eighth embodiment, as shown in FIG. 17, an appearance order list 16 is provided, which holds the expressions of the same fact in their order of appearance. The consistency verification unit 17 includes an appearance order list creation check unit 17-1, which creates the appearance order list 16 and searches it to check the state of abbreviation.

【００８９】いま、図１８に示す如く、「リーガン米
国務庁長官・・・リーガン長官・・・リーガン国務庁長
官・・・」という原テキストがデータ抽出部１に入力さ
れると、データ抽出部１は同に示す如く、「リーガン
米国務庁長官」、「リーガン長官」、「リーガン国務庁
長官」を順次抽出し、これらを順次出現順リスト作成チ
ェック部１７−１に送出する。Now, as shown in FIG. 18, when the original text "US Secretary of State Regan ... Secretary Regan ... Secretary of State Regan ..." is input to the data extraction unit 1, the data extraction unit 1 extracts "US Secretary of State Regan", "Secretary Regan", and "Secretary of State Regan" in order, as shown in FIG. 18, and sends them in order to the appearance order list creation check unit 17-1.

【００９０】出現順リスト作成チェック部１７−１は、
この抽出された事実データに基づき、リーガンに関する
同一事実について、図１７に示す如き、出現順リスト１
６を作成する。Based on the extracted fact data, the appearance order list creation check unit 17-1 creates the appearance order list 16 shown in FIG. 17 for the same fact concerning Regan.

【００９１】この出現順リスト１６を作成したのち、出
現順リスト作成チェック部１７−１は、その記載状態
が、前記省略表現に適合しているか否かをチェックす
る。先ず、図１８のａに示す如く、出現順リスト１６
からＮｏ．１の表現とＮｏ．２の表現を比較する。これ
によりＮｏ．１の「リーガン米国務庁長官」よりＮｏ．
２の「リーガン長官」の表現の方が省略されていること
が判別されるので、Ｎｏ．１の表現を合格とする。After creating the appearance order list 16, the appearance order list creation check unit 17-1 checks whether the entries conform to the abbreviation convention described above. First, as shown in FIG. 18, it compares expression No. 1 with expression No. 2 in the appearance order list 16. Because No. 2, "Secretary Regan", is found to be more abbreviated than No. 1, "US Secretary of State Regan", expression No. 1 passes.

【００９２】次に出現順リスト１６からＮｏ．２の表現
とＮｏ．３の表現を比較する。これによりＮｏ．２の
「リーガン長官」よりＮｏ．３の「リーガン国務庁長
官」の方が、例えば長くて省略されていないことが判別
されるので、図１８のｂに示す如く、Ｎｏ．２の表現
を不合格と判定する。Next, expression No. 2 and expression No. 3 in the appearance order list 16 are compared. Because No. 3, "Secretary of State Regan", is found to be, for example, longer and less abbreviated than No. 2, "Secretary Regan", expression No. 2 is judged to fail, as shown in FIG. 18.

【００９３】勿論Ｎｏ．２、Ｎｏ．３の表現がその前の
ものと同じ場合も合格とする。このように、一般には詳
しく肩書を明記した後に省略するので、後方の表現が前
方の表現よりも省略された形のとき、あるいは同一の場
合を合格とする。これにより同一のデータの出現の順番
と隣接する肩書などの要素に関する制限を記述する規則
に基づき、リスト中の要素の整合性のチェックを行うこ
とができる。Of course, expressions No. 2 and No. 3 also pass when they are identical to the preceding expression. Because a title is generally stated in full first and abbreviated afterward, an expression passes when the expression that follows it is more abbreviated than, or identical to, it. The consistency of the elements in the list can thus be checked using rules that describe constraints on the order in which the same data appears and on adjacent elements such as titles.
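The pairwise check over the appearance order list can be sketched as follows. Using string length as the measure of abbreviation is an assumption; the specification only mentions length as one example criterion, and the function name is hypothetical.

```python
def check_appearance_order(expressions):
    """Compare each expression with the one that follows it.

    Returns one boolean per adjacent pair: True when the later expression
    is the same length or shorter (equally or more abbreviated), and
    False when the later expression is longer.
    """
    return [
        len(later) <= len(earlier)
        for earlier, later in zip(expressions, expressions[1:])
    ]


mentions = [
    "US Secretary of State Regan",  # No. 1
    "Secretary Regan",              # No. 2
    "Secretary of State Regan",     # No. 3
]
results = check_appearance_order(mentions)
```

The first entry of `results` is the verdict for No. 1 and the second is the verdict for No. 2, matching the pass and fail judgments described above.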

【００９４】（９）本発明の第９の実施の形態本発明の第９の実施の形態を図１９及び図２０に基づき
説明する。例えば会社の常務の如く、同一肩書に複数の
人物が存在するような場合、会社名と肩書が特定されて
も複数の人物が存在するので、キー項目に対して値が一
つに決定できない。このためデータベース中にも複数の
レコードが存在するため、これらの全部と整合性を求め
ることが必要となる。(9) Ninth Embodiment of the Present Invention A ninth embodiment of the present invention will be described with reference to FIGS. 19 and 20. When several people hold the same title, as with the managing directors of a company, specifying the company name and the title does not identify a single person, so the value for the key items cannot be determined uniquely. The database therefore contains several records for the same key, and consistency must be checked against all of them.

【００９５】このため、事実データデータベース４には
例えば組織体名が「Ａ社」であり、肩書が「常務」であ
るようなレコードについては、全員の名前を、図１９に
おいて、同一キー項目部４−０に示す如く、登録してお
く。For this reason, for records in the fact data database 4 whose organization name is, for example, "Company A" and whose title is "Managing Director", the names of all such persons are registered, as shown in the same-key item section 4-0 in FIG. 19.

【００９６】また整合性検証部１８には、同一キー項目
全チェック部１８−１を設け、例えば会社名と常務のよ
うな特定肩書のようにキーを指定してもユニークに値が
定まらず、複数の値が存在するとき、この同一キー項目
全チェック部１８−１が同一キーに関して登録されてい
る全部のレコードについて整合性をチェックし、一致す
る値の存在しないときに不整合と判定する。The consistency verification unit 18 includes a same-key item full check unit 18-1. When specifying a key, such as a company name together with a particular title like managing director, does not determine a unique value and several values exist, the same-key item full check unit 18-1 checks consistency against all records registered for that key and judges the data inconsistent only when no matching value exists.

【００９７】例えば図１９に示す事実データデータベー
ス４中に、図２０ので示す如く、組織体名「Ａ社」の
肩書「常務」として名前「ＡＢ」、「ＣＤ」が登録され
ているとき、図１９に示すデータ抽出部１に、図２０の
で示す如き原テキスト「Ａ社のＸＹ常務」が入力され
ると、データ抽出部１は図２０ので示す如く、組織体
名として「Ａ社」、肩書として「常務」、名前として
「ＸＹ」を事実データとして抽出する。For example, suppose that the fact data database 4 shown in FIG. 19 contains, as shown in FIG. 20, the names "AB" and "CD" under the title "Managing Director" of the organization "Company A". When the original text "Managing Director XY of Company A" shown in FIG. 20 is input to the data extraction unit 1 shown in FIG. 19, the data extraction unit 1 extracts as fact data, as shown in FIG. 20, "Company A" as the organization name, "Managing Director" as the title, and "XY" as the name.

【００９８】これにより同一キー項目全チェック部１８
−１が、「Ａ社」、「常務」をキー項目として同一キー
項目部４−０を参照し、これらのキー項目と一致する複
数の登録されたレコードを全部チェックして前記「Ａ
社」、「常務」、「ＸＹ」と一致するもの有無をチェッ
クする。The same-key item full check unit 18-1 then refers to the same-key item section 4-0 using "Company A" and "Managing Director" as key items, and checks all of the registered records that match these key items for one that matches "Company A", "Managing Director", "XY".

【００９９】これにより一つでも一致するものがあれば
問題はないとするが、図２０に示す場合には、一致する
ものが存在しないので、そのに示す如く、チェック結
果は整合性なしと判断され、誤り処理部３に表示される
ことになる。If even one record matches, there is no problem. In the case shown in FIG. 20, however, no record matches, so the check result is judged to be inconsistent, as shown in FIG. 20, and is displayed by the error processing unit 3.
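The check against every record under the same key can be sketched as follows. Grouping names into a set per `(organization, title)` key is an assumed representation of the same-key item section 4-0, and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Group registered names by their (organization, title) key."""
    index = defaultdict(set)
    for org, title, name in records:
        index[(org, title)].add(name)
    return index


def is_consistent(index, org, title, name):
    """Consistent if any record registered under the same key has this name."""
    return name in index.get((org, title), set())


index = build_index([
    ("Company A", "Managing Director", "AB"),
    ("Company A", "Managing Director", "CD"),
])
xy_ok = is_consistent(index, "Company A", "Managing Director", "XY")
ab_ok = is_consistent(index, "Company A", "Managing Director", "AB")
```

Only `XY`, which is absent from every record under the key, is reported as inconsistent.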

【０１００】（１０）本発明の第１０の実施の形態本発明の第１０の実施の形態を図２１に基づき説明す
る。第１０の実施の形態は、本発明の文書校正装置に使
用する事実データデータベース４を構築する手法に関す
るものである。(10) Tenth Embodiment of the Present Invention A tenth embodiment of the present invention will be described with reference to FIG. The tenth embodiment relates to a method for constructing a fact data database 4 used in the document proofreading apparatus of the present invention.

【０１０１】データ抽出部１は抽出した事実データを整
合性検証部２に送出するか、データ更新部１９に送出す
る。いずれかを選択するのかを更新信号により制御す
る。例えば更新信号が「１」のとき、基準テキストから
抽出された事実データがデータ更新部１９に送出され、
更新信号が「０」のとき、チェック対象テキストから抽
出された事実データが整合性検証部２に送出される。The data extracting unit 1 sends the extracted fact data to the consistency verifying unit 2 or sends it to the data updating unit 19. Which one is selected is controlled by the update signal. For example, when the update signal is “1”, the fact data extracted from the reference text is sent to the data update unit 19,
When the update signal is “0”, fact data extracted from the check target text is sent to the consistency verification unit 2.

【０１０２】基準テキストは、事実データデータベース
４を構築するためのものであって、その記述内容は、予
め厳重なチェックを受けた正確な内容のテキスト群で構
成されている。The reference text is for constructing the fact data database 4, and the description content is composed of a text group of accurate content which has been strictly checked in advance.

【０１０３】データ更新部１９は、基準テキストに基づ
きデータ抽出部１が抽出した事実データを事実データデ
ータベース４に登録するものである。図２１において、
データ抽出部１に先ず基準テキストを入力する。このと
き更新信号を例えば「１」にしておく。これによりデー
タ抽出部１は基準テキストから抽出した事実データをデ
ータ更新部１９に送出する。そしてデータ更新部１９は
この事実データに基づき事実データデータベースを順次
更新し、事実データデータベースを構築する。The data update unit 19 registers in the fact data database 4 the fact data that the data extraction unit 1 extracts from the reference text. In FIG. 21, the reference text is first input to the data extraction unit 1, with the update signal set to, for example, "1". The data extraction unit 1 then sends the fact data extracted from the reference text to the data update unit 19, and the data update unit 19 successively updates the fact data database based on this fact data, thereby building the fact data database.

【０１０４】このようにして事実データデータベースを
修正した後に、更新信号を「０」にして、チェック対象
テキストをデータ抽出部１に入力する。データ抽出部１
により抽出された事実データは、今度は整合性検証部２
により、先程更新された事実データデータベース４を参
照しながら、誤りの検出処理を受ける。After the fact data database has been updated in this way, the update signal is set to "0" and the text to be checked is input to the data extraction unit 1. The fact data extracted by the data extraction unit 1 is then subjected to error detection by the consistency verification unit 2, which refers to the fact data database 4 that has just been updated.
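The routing by update signal can be sketched as follows. The record layout, the `route` function, and the simple equality check standing in for the consistency verification unit 2 are all assumptions made for illustration.

```python
def route(fact, update_signal, database, errors):
    """Send a fact to the updater (signal 1) or to the verifier (signal 0)."""
    key = (fact["org"], fact["title"])
    if update_signal == 1:
        # Reference text: register the fact as trusted data.
        database[key] = fact["name"]
    else:
        # Text to be checked: report a conflict with the trusted data.
        known = database.get(key)
        if known is not None and known != fact["name"]:
            errors.append((fact, known))


database, errors = {}, []
route({"org": "ABC", "title": "Chairman", "name": "EF"}, 1, database, errors)
route({"org": "ABC", "title": "Chairman", "name": "GH"}, 0, database, errors)
```

After these two calls the database holds the fact from the reference text, and the conflicting fact from the checked text is recorded in `errors`.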

【０１０５】このようにして基準テキストにより事実デ
ータデータベース４を随時更新して正確なものとするの
で、チェック対象テキストを正確に校正することができ
る。（１１）本発明の第１１の実施の形態本発明の第１１の実施の形態を図２２により説明する。
第１１の実施の形態では、すでに登録しているデータと
の整合性をチェックして矛盾のないものについては登録
するものである。As described above, the fact data database 4 is updated with the reference text as needed and kept accurate, so the text to be checked can be proofread accurately. (11) Eleventh Embodiment of the Present Invention An eleventh embodiment of the present invention will be described with reference to FIG. 22. In the eleventh embodiment, extracted data is checked for consistency with the data already registered, and data that does not conflict is registered.

【０１０６】例えば「Ａ社常務ＡＢ氏、Ａ社常務
ＣＤ氏、Ａ社常務ＥＦ氏・・・」というテキスト
が入力されてデータ抽出部１により組織体名、肩書、名
前がそれぞれ「Ａ社常務ＡＢ」、「Ａ社常務Ｃ
Ｄ」、「Ａ社常務ＥＦ」・・・という事実データが
抽出され、順次整合性検証部２０に送出される。For example, when the text "Mr. AB, Managing Director of Company A; Mr. CD, Managing Director of Company A; Mr. EF, Managing Director of Company A; ..." is input, the data extraction unit 1 extracts the fact data "Company A, Managing Director, AB", "Company A, Managing Director, CD", "Company A, Managing Director, EF", and so on, as organization name, title, and name, and sends them in order to the consistency verification unit 20.

【０１０７】これにより整合性検証部２０では、先ず組
織体名、肩書、名前が「Ａ社常務ＡＢ」により事実デ
ータデータベース４を参照する。これにより事実データ
データベース４より「Ａ社社長ＡＢ」というデータ
が参照されたとき、矛盾検出部２０−１はこれをチェッ
クしてＡ社のＡＢが、常務と社長の異なる肩書をもつこ
とは矛盾するので整合性なしと判断し、これを誤り部分
表示部３−１で表示させる。The consistency verification unit 20 first refers to the fact data database 4 using the organization name, title, and name "Company A, Managing Director, AB". If the data "Company A, President, AB" is retrieved from the fact data database 4, the contradiction detection unit 20-1 checks it and, because it is contradictory for AB of Company A to hold the two different titles of managing director and president, judges the data inconsistent and has this displayed on the error part display unit 3-1.

【０１０８】しかし「Ａ社常務ＣＤ」により事実デ
ータデータベース４を参照したとき、「Ａ社常務Ｘ
Ｙ」というデータが参照されても、矛盾検出部２０−１
はこれをチェックして常務に異なる名前の者が複数存在
しても矛盾しないので整合性ありと判断し、この「Ａ社
常務ＣＤ」をデータ更新部５に送出し、事実データ
データベース４をこれにより更新する。同様に「Ａ社
常務ＥＦ」というデータもデータ更新部５に送出さ
れ、事実データデータベース４を更新する。従ってその
後にテキスト抽出部１より「Ａ社取締役ＥＦ」とい
う事実データが抽出されて整合性検証部２０で事実デー
タデータベース４から「Ａ社常務ＥＦ」というデー
タが参照されたとき、矛盾検出部２０−１が抽出された
事実データと参照されたデータとが矛盾するものと判断
し、これを誤り部分表示部３−１に表示する。However, when the fact data database 4 is referenced with "Company A, Managing Director, CD" and the data "Company A, Managing Director, XY" is retrieved, the contradiction detection unit 20-1 checks it and judges the data consistent, because it is not contradictory for several people with different names to be managing directors. It sends "Company A, Managing Director, CD" to the data update unit 5, which updates the fact data database 4 accordingly. Likewise, the data "Company A, Managing Director, EF" is sent to the data update unit 5 and the fact data database 4 is updated. Consequently, if the text extraction unit 1 later extracts the fact data "Company A, Director, EF" and the consistency verification unit 20 retrieves "Company A, Managing Director, EF" from the fact data database 4, the contradiction detection unit 20-1 judges that the extracted fact data contradicts the retrieved data and displays this on the error part display unit 3-1.

【０１０９】このようにして、テキストから抽出された
各事実データを、事実データデータベース４中の既存の
データとの整合性をチェックして、問題がないデータに
ついては順次登録することにより、テキスト中に記載さ
れた事実データ同士の整合性をチェックすることがで
き、テキスト中の事実データ相互の不整合部分を抽出す
ることが可能となる。In this way, each fact data extracted from the text is checked for consistency with the existing data in the fact data database 4, and if there is no problem, the data is sequentially registered. Can be checked for consistency between fact data described in the above, and it is possible to extract inconsistent portions between fact data in the text.
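The check-then-register flow of the eleventh embodiment can be sketched as follows. The contradiction rule used here (the same person in the same organization may not hold two different titles, while several people may share one title) is inferred from the examples above; the set-of-tuples database and the function name are assumptions.

```python
def register_if_consistent(db, fact):
    """Register `fact` unless it contradicts an existing record.

    `db` is a set of (organization, title, name) tuples.
    Returns True if the fact was accepted and registered.
    """
    org, title, name = fact
    for r_org, r_title, r_name in db:
        # Same person, same organization, different title: contradiction.
        if r_org == org and r_name == name and r_title != title:
            return False
    db.add(fact)
    return True


db = {
    ("Company A", "President", "AB"),
    ("Company A", "Managing Director", "XY"),
}
facts = [
    ("Company A", "Managing Director", "AB"),  # conflicts with President AB
    ("Company A", "Managing Director", "CD"),  # accepted
    ("Company A", "Managing Director", "EF"),  # accepted
    ("Company A", "Director", "EF"),           # conflicts with the EF just added
]
results = [register_if_consistent(db, fact) for fact in facts]
```

Because accepted facts are registered immediately, the last fact is checked against a record that came from the same text, which is how inconsistencies within one text are found.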

【０１１０】（１２）本発明の第１２の実施の形態本発明の第１２の実施の形態を図２３により説明する。
第１２の実施の形態では事実データデータベース４を複
数の種別、例えば政治分野とか、経済分野とか、スポー
ツ分野とかに毎にデータベース４−１、４−２、４−３
・・・を分け、テキストの種別に応じてそれに対応する
データベースを参照するものである。(12) Twelfth Embodiment of the Present Invention A twelfth embodiment of the present invention will be described with reference to FIG. 23. In the twelfth embodiment, the fact data database 4 is divided into databases 4-1, 4-2, 4-3, ... for a plurality of categories, for example politics, economics, and sports, and the database corresponding to the category of the text is referenced.

【０１１１】またデータ抽出部２１には種別検出部２１
−１が設けられ、テキストの種別を抽出された事実デー
タに基づき断然する。例えば「首相」という語が検出さ
れたときテキスト種別を「政治」と判断し、「野球」と
いう語が検出されたときはテキストの種別を「スポー
ツ」と判断し、これに応じてデータベースを識別するＤ
Ｂ識別信号を出力する。The data extraction unit 21 includes a type detection unit 21-1, which determines the category of the text from the extracted fact data. For example, when the word "Prime Minister" is detected, the text category is judged to be "politics"; when the word "baseball" is detected, it is judged to be "sports". A DB identification signal identifying the corresponding database is output accordingly.

[0112] The reference DB control unit 22 selects, from the fact data database 4, the one of the databases 4-1, 4-2, 4-3, ... that corresponds to the DB identification signal.

[0113] In FIG. 23, when a text such as "Yokozuna Wakanohana has won" is input to the data extraction unit 21, the data extraction unit 21 extracts the fact data "Yokozuna Wakanohana victory". The type detection unit 21-1 identifies the type of the text as "sports" from the word "yokozuna" and outputs the sports DB identification signal for that type to the reference DB control unit 22.

[0114] The reference DB control unit 22 then performs control so as to select the sports database 4-3. The connection is controlled so that the consistency verification unit 2 can refer to the sports database 4-3.
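
The keyword-based type detection and database selection described above can be sketched as follows. The keyword table and the function names are illustrative assumptions; the source states only that words such as "prime minister", "baseball", and "yokozuna" determine the type, and that the sports database is 4-3.

```python
from typing import Optional

# Hypothetical keyword-to-type table standing in for the type detection unit 21-1.
TYPE_KEYWORDS = {
    "prime minister": "politics",
    "baseball": "sports",
    "yokozuna": "sports",
}

# Type-to-database mapping standing in for the reference DB control unit 22.
DATABASES = {"politics": "4-1", "economy": "4-2", "sports": "4-3"}


def detect_type(text: str) -> Optional[str]:
    """Return the text type for the first known keyword found, else None."""
    lowered = text.lower()
    for keyword, text_type in TYPE_KEYWORDS.items():
        if keyword in lowered:
            return text_type
    return None


def select_database(text: str) -> Optional[str]:
    """Return the identifier of the database to consult for this text."""
    text_type = detect_type(text)
    return DATABASES.get(text_type) if text_type is not None else None


selected = select_database("Yokozuna Wakanohana has won")
```

A real system would need a fallback for texts that match no keyword or several conflicting ones; the source does not say how those cases are handled, so this sketch simply returns `None` or the first match.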

[0115] In this state, the consistency verification unit 2 refers to the sports database 4-3 based on the fact data "Yokozuna Wakanohana victory", reads out "Yokozuna Wakanohana victory", judges that there is no consistency, and displays this on the error portion display unit 3-1.

[0116] In this case, since the fact data database 4 is organized by type, consistency can be verified quickly and accurately.

(13) Thirteenth Embodiment of the Present Invention

A thirteenth embodiment of the present invention will be described with reference to FIGS. 24 and 25. In the thirteenth embodiment, portions that are likely to be mistaken are anticipated in advance so that an accurate check can be performed. For example, in lesser-known regions such as the Middle East, the name of a country may be confused with those of its neighbors. Therefore, for Jordan, a Middle Eastern country not very familiar in Japan, the neighboring Lebanon and Iraq are recorded as easily confused country names, and an error probability is entered in the confusion possibility table, for example as the numerical value 1, with more easily mistaken items given lower values.

[0117] Similarly, immediately after a person's title changes through promotion, resignation, or the like, a writer who lacks the latest data is likely to give that person the old title. For example, Mr. Blair, the newly appointed British Prime Minister, is likely to be referred to by his post as Labour Party leader, so here too an error probability is entered in the confusion possibility table, for example as the numerical value 1.5, with more easily mistaken items given lower values.

[0118] As shown in FIG. 24, the consistency verification unit 23 is provided with a cost changing unit 23-1, which, when multiple candidate reference data exist, finds the one with the lowest error-probability value. As described above, the confusion possibility table 24 anticipates in advance the portions that are likely to be mistaken and expresses the error probability as a numerical value that is smaller the more easily the mistake is made. The confusion possibility table 24 holds information on the kinds of error for each field. In FIG. 24, confusion possibility table 24-1 lists unfamiliar small and medium-sized countries and countries in lesser-known regions that are easily confused with their neighbors, and confusion possibility table 24-2 lists cases where a current title is likely to be confused with the previous title because of a recent promotion, resignation, or the like.

[0119] For example, as shown in FIG. 25, the original text "President Hussein of Jordan visits Egypt" is input to the data extraction unit 1 shown in FIG. 24, and, as shown in FIG. 25, "Jordan" is extracted as the country name, "President" as the title, and "Hussein" as the personal name. Based on this extracted data, the consistency verification unit 23 shown in FIG. 24 refers to the fact data database 4.

[0120] From the fact data database 4, "Jordan, King, Hussein" and "Iraq, President, Hussein" are then extracted as candidates that may match, each given as country name, title, and personal name.

[0121] At this time, as shown in FIG. 25, the values "3" for the country name and "2" for the title are also extracted as error probabilities. From the "Jordan" present in the extracted data, the consistency verification unit 23 recognizes that the confusion possibility table (country names) 24-1 should be consulted. The cost changing unit 23-1 determines from this table 24-1 that the error-probability value for mistaking Jordan for Iraq or Lebanon is "1", and judges the item with the lowest value to be the most easily mistaken. In this case, as shown in FIG. 25, it recognizes that correcting "Jordan" in the text to "Iraq" makes the data match the data in the fact data database 4.
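
The lowest-cost selection in the Jordan example can be sketched as follows. The values 3 (country name), 2 (title), and 1 (Jordan confused with Iraq or Lebanon) come from the description; the cost for the name field, the dictionary layout, and the function names are assumptions made only for illustration.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (country, title, name)
FIELDS = ("country", "title", "name")

# Per-field change costs: country 3 and title 2 follow the description;
# the name cost is a hypothetical placeholder.
DEFAULT_COST: Dict[str, float] = {"country": 3.0, "title": 2.0, "name": 4.0}

# Confusion possibility table: (field, value in text, value in database) -> lowered cost.
CONFUSION: Dict[Tuple[str, str, str], float] = {
    ("country", "Jordan", "Iraq"): 1.0,
    ("country", "Jordan", "Lebanon"): 1.0,
}


def change_cost(extracted: Record, candidate: Record) -> float:
    """Sum the cost of every field that must change to reach the candidate."""
    total = 0.0
    for field, seen, stored in zip(FIELDS, extracted, candidate):
        if seen != stored:
            total += CONFUSION.get((field, seen, stored), DEFAULT_COST[field])
    return total


def best_correction(extracted: Record, candidates: List[Record]) -> Record:
    """Pick the database record reachable at the lowest change cost."""
    return min(candidates, key=lambda cand: change_cost(extracted, cand))


extracted = ("Jordan", "President", "Hussein")
candidates = [("Jordan", "King", "Hussein"), ("Iraq", "President", "Hussein")]
choice = best_correction(extracted, candidates)
```

Changing the title to "King" costs 2, while changing the country to "Iraq" costs only 1 because of the confusion table entry, so the country name is judged to be the mistaken field.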

[0122] (14) Fourteenth Embodiment of the Present Invention

A fourteenth embodiment of the present invention will be described with reference to FIG. 26. In the fourteenth embodiment, error detection and correction are first run on a training text and the results are recorded. The tendencies of the errors that actually occurred are analyzed, the evaluation of the kinds of error that occur easily is changed, and error detection and correction best suited to the individual text are then performed.

[0123] In this embodiment, the training text is first input to the data extraction unit 1, which extracts fact data and passes it to the consistency verification unit 26. The fact data database 4 is referred to, and mismatched portions, that is, provisional error portions, are detected by the provisional error detection unit 26-1. Provisional errors are detected over the entire training text, yielding a set of provisional errors.

[0124] The error tendency analysis unit 26-2 analyzes this set of provisional errors to find what tendencies the errors show. As a result, a tendency is identified, for example that errors occur easily in country names or that errors occur easily in personal names.

[0125] When this tendency is passed to it, the parameter adjustment unit 26-3 adjusts the parameters so as to raise the ability to detect the errors that occur easily. For example, it reduces the error-probability value shown in FIG. 6 for that category, thereby raising the error detection ability.

[0126] After this, the error detection unit 26-4 repeats error detection once more and checks whether errors are correctly detected in the error-prone portions, as adjusted. A highly accurate error detection result is obtained in this way.

[0127] The above description concerns the training text. For ordinary text, the fact data extracted by the data extraction unit 1 is passed to the error detection unit 26-4, and the fact data database 4 is referred to using the adjusted parameters.
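
The training pass described above can be sketched as follows. This is only an illustration: representing each provisional error by the name of the field in which the mismatch occurred, and lowering the cost of the most frequent field by a fixed `factor`, are assumptions; the source says only that the error-probability value for the error-prone category is made smaller.

```python
from collections import Counter
from typing import Dict, List


def adjust_costs(costs: Dict[str, float],
                 provisional_errors: List[str],
                 factor: float = 0.5) -> Dict[str, float]:
    """Return a copy of `costs` in which the field that produced the most
    provisional errors on the training text has its value lowered.

    `provisional_errors` holds one field name per mismatch detected on the
    training text; `factor` is a hypothetical tuning constant.
    """
    adjusted = dict(costs)
    if not provisional_errors:
        return adjusted  # nothing observed, nothing to tune
    field, _count = Counter(provisional_errors).most_common(1)[0]
    adjusted[field] = costs[field] * factor
    return adjusted


initial_costs = {"country": 3.0, "title": 2.0}
training_errors = ["country", "title", "country"]  # errors cluster in country names
tuned_costs = adjust_costs(initial_costs, training_errors)
```

The tuned values would then be used both for the second detection pass over the training text and for ordinary texts.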

[0128] (15) Fifteenth Embodiment of the Present Invention

A fifteenth embodiment of the present invention will be described with reference to FIGS. 27 and 28. For example, when a company goes bankrupt its officers resign, so those officers must be deleted from the fact data database. The fifteenth embodiment provides a table of the fact data changes that follow from a given event. When a particular event occurs, other data in the fact data database 4 is updated accordingly so that the data remains consistent.

[0129] As shown in FIG. 27, a dependent event table 27 is prepared, which indicates the changes to fact data that follow from a given event. For example, it indicates that officers are to be deleted for the event of bankruptcy. As an example of an event, as shown in FIG. 28, officers lose their positions when a company goes bankrupt, so the officer data must be deleted. Also as shown in FIG. 28, when a dignitary is assassinated, that person is removed from every post he or she held, so that person's data must likewise be corrected.

[0130] In FIG. 27, text is input to the data extraction unit 1. The data extraction unit 1 extracts fact data, which is passed to the consistency verification unit 26. When the consistency verification unit 26 detects that the fact data contains a search item of the dependent event table 27, such as bankruptcy or assassination (for example, "Company A bankrupt"), it has the dependent event search unit 26-1 search the dependent event table 27 for the action that follows from the event of bankruptcy. It thereby recognizes that officers are to be deleted.

[0131] The consistency verification unit 26 then sends the change data "delete Company A officers" to the data update unit 5. Based on this, the data update unit 5 deletes all data on the officers of Company A from the fact data database 4. In this way, the data in the fact data database 4 can be kept in step with events.
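
The dependent event table and the resulting update can be sketched as follows. The record layout, the action names, and the modelling of "correcting the person's data" as removing that person's post records are assumptions for illustration; the source specifies only the two event-to-action pairs.

```python
from typing import Dict, List

Record = Dict[str, str]  # hypothetical layout: {"org", "person", "role"}

# Dependent event table 27: event -> action to apply to the database.
DEPENDENT_EVENTS = {
    "bankruptcy": "delete_officers",
    "assassination": "remove_posts",
}


def apply_event(database: List[Record], event: str, subject: str) -> List[Record]:
    """Return a new database with the changes that follow from `event`.

    `subject` is the company for a bankruptcy and the person for an
    assassination.
    """
    action = DEPENDENT_EVENTS.get(event)
    if action == "delete_officers":
        return [r for r in database
                if not (r["org"] == subject and r["role"] == "officer")]
    if action == "remove_posts":
        return [r for r in database if r["person"] != subject]
    return list(database)  # no dependent change is defined for this event


database = [
    {"org": "Company A", "person": "EF", "role": "officer"},
    {"org": "Company A", "person": "GH", "role": "officer"},
    {"org": "Company B", "person": "IJ", "role": "officer"},
]
after_bankruptcy = apply_event(database, "bankruptcy", "Company A")
```

Keeping the event-to-action mapping in a table, as the description does, means new kinds of dependent change can be added without touching the extraction or verification steps.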

[0132] (16) Sixteenth Embodiment of the Present Invention

A sixteenth embodiment of the present invention will be described with reference to FIGS. 29 and 30. For example, when a new US President is elected, the end date for the former President, the start date for the new President, and so on are set accordingly. In the sixteenth embodiment, after data on a change of fact is extracted from the text, it is verified whether the state before the change could have existed at the time described by the extracted data, and the end date of the old fact and the occurrence date of the new fact are then set.

[0133] To this end, as shown in FIG. 29, the consistency verification unit 28 is provided with a pre-change state detection unit 28-1, which verifies whether the state before the change could have existed at the time described by the extracted data. The error processing unit 29 is provided with an update data creation unit 29-1, which sets the end date of the old fact and the occurrence date of the new fact.

[0134] Suppose, as shown in FIG. 30, that the fact data database 4 holds the data "US, President, Bush, 1990, unknown" for country name, title, name, occurrence date, and end date, respectively, and that the text "President Clinton was first elected in the 1992 presidential election" is input to the data extraction unit 1, as shown in FIG. 30. The fact data extracted from it, "Clinton, President, 1992, presidential election, first elected", is sent to the consistency verification unit 28.

[0135] At this point, the fact data database 4 records only that Bush became US President in 1990. From the fact data extracted from the text, "Clinton, 1992, presidential election, first elected", it is determined that Clinton was first elected President in 1992. For Clinton to become President for the first time in 1992, someone else must have been President before then. The pre-change state detection unit 28-1 of the consistency verification unit 28 recognizes from the data "US, President, Bush, 1990, end date unknown" in FIG. 30 that Bush is the former President.

[0136] Meanwhile, because Clinton takes office as President, Bush's term ends in 1992, and the error processing unit 29 recognizes this. The error processing unit 29 also recognizes that new data based on Clinton's taking office must be added to the fact data database 4.

[0137] The update data creation unit 29-1 of the error processing unit 29 creates the update data shown in FIG. 30. This update data is sent to the data update unit 5 and registered in the fact data database 4, so that the data shown in FIG. 30 is registered.
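
The creation of update data in the Bush and Clinton example can be sketched as follows. The `Tenure` record, the function `apply_change`, and the rule used to recognize the predecessor (same country and title, open end date, start no later than the change year) are assumptions for illustration. The country "US" is passed in explicitly here, although the description does not say how it is obtained from the extracted data.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Tenure:
    country: str
    title: str
    name: str
    start: Optional[int]  # occurrence date (year), None if unknown
    end: Optional[int]    # end date (year), None if unknown


def apply_change(db: List[Tenure], country: str, title: str,
                 new_name: str, year: int) -> List[Tenure]:
    """Close the predecessor's open-ended record and add the successor."""
    updated: List[Tenure] = []
    for rec in db:
        is_predecessor = (
            rec.country == country and rec.title == title
            and rec.name != new_name and rec.end is None
            and (rec.start is None or rec.start <= year)
        )
        # Set the end date of the old fact; leave other records untouched.
        updated.append(replace(rec, end=year) if is_predecessor else rec)
    # Register the new fact with its occurrence date.
    updated.append(Tenure(country, title, new_name, year, None))
    return updated


db = [Tenure("US", "President", "Bush", 1990, None)]
result = apply_change(db, "US", "President", "Clinton", 1992)
```

The start year 1990 for Bush is the value used in the description's example and is reproduced here unchanged.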

[0138] (17) Seventeenth Embodiment of the Present Invention

A seventeenth embodiment of the present invention will be described with reference to FIGS. 31 and 32. FIG. 31 is a diagram of the seventeenth embodiment, and FIG. 32 is a diagram explaining its operation. When the consistency verification unit searches the fact data database and finds fact data whose end date or occurrence date is unknown, the reliability of that data at the specified date is evaluated from the frequency with which the data changes and the difference between the specified date and the occurrence or end date. The consistency check is performed only on fact data whose reliability at the given date falls within a certain threshold.

[0139] For example, for a politically unstable country where the president changes frequently within short periods, the threshold is set to, for example, two years, and data that differs from the specified date by two years or more is not checked.

[0140] FIG. 32 illustrates how the reliability of data with only an incomplete date specification is evaluated, even for a politically stable country. In the example of FIG. 32, the fact data database 4 records only that Clinton was US President in 1997.

[0141] Suppose the text "US President Bush visited Berlin in 1991" is input to the data extraction unit 1 shown in FIG. 31, as shown in FIG. 32, and the fact data "US, President, Bush, 1991" is extracted from it for country name, title, name, and occurrence date, respectively. The extracted fact data actually also includes the visit to Berlin, but this part is omitted here because it has no bearing on the date specification.

[0142] The data on President Clinton in the fact data database 4 shown in FIG. 32 specifies neither an occurrence date nor an end date, so the possibility that Clinton was President in 1991 cannot be ruled out completely. Therefore, when the data confirming the office of President is for 1997, as here, the probability that the same person held the same office six years earlier is evaluated quantitatively.

[0143] Considering that a US President may serve at most two terms (four years per term), that possibility is very small. The threshold Th₁ is therefore set to six years. When the difference is at or above this threshold, the possibility that the extracted fact data about Bush and the stored data contradict each other can be assumed to be very low, as shown in FIG. 32, even without checking the extracted fact data, and the data is treated as having passed consistency verification.

[0144] In the above case, when the consistency verification unit 30 refers to the fact data database 4 using the fact data extracted by the data extraction unit 1 from the text shown in FIG. 32, it detects the presence of the data shown in FIG. 32. The threshold period determination unit 30-1 then recognizes that the text data differs by the threshold Th₁ of six years or more, stops the check, and outputs the text as consistent.

[0145] The threshold can be set as appropriate for the target. For example, for matters concerning the president of a politically unstable country, the threshold Th₀ is set to, for example, two years. Data differing by two years or more is not checked and is treated as passing consistency verification.
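
The threshold rule described above can be sketched as follows. The thresholds of six years (Th₁) and two years (Th₀) come from the description; the dictionary keys and the function name `should_check` are illustrative assumptions.

```python
# Thresholds in years: Th1 = 6 for a politically stable country,
# Th0 = 2 for a politically unstable one.
THRESHOLD_YEARS = {"stable": 6, "unstable": 2}


def should_check(record_year: int, text_year: int,
                 stability: str = "stable") -> bool:
    """Return True if the stored record is close enough in time to the
    date in the text for a consistency check to be meaningful.

    When the difference reaches the threshold, the check is skipped and
    the text is treated as having passed verification.
    """
    return abs(record_year - text_year) < THRESHOLD_YEARS[stability]


# Stored: Clinton is US President in 1997. Text: President Bush in 1991.
check_bush_text = should_check(1997, 1991)
```

For the example in the description, the difference is exactly six years, which reaches Th₁, so the check is skipped and the statement about Bush is not flagged.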

[0146] (18) Eighteenth Embodiment of the Present Invention

An eighteenth embodiment of the present invention will be described with reference to FIG. 33. In the eighteenth embodiment, texts are first sorted into specific categories, a database specific to each category is built from the texts in that category, and consistency is checked within each category.

[0147] As shown in FIG. 33, the eighteenth embodiment comprises the fact data database 4, a data extraction unit 32, a data update unit 33, a consistency verification unit 34, and so on. The fact data database 4 consists of category-specific databases 4-1, 4-2, ..., 4-n for categories such as the political field, the economic field, and the sports field mentioned above. Category-specific database 4-1 stores data classified into the political field, category-specific database 4-2 stores data classified into the economic field, and category-specific database 4-n stores data classified into the sports field.

[0148] Text 31 is a text corpus containing texts of multiple categories such as the political field, the economic field, and the sports field. A text corpus is a collection of multiple texts and need not consist of multiple categories.

[0149] The data extraction unit 32 extracts fact data from text and has a partial text extraction unit 32-1 that sorts the extracted fact data into a number of predetermined categories.

[0150] The data update unit 33 stores the per-category fact data passed from the data extraction unit 32 in the category-specific database of the same category, thereby building a database specific to each category.

[0151] The consistency verification unit 34 checks the consistency of fact data extracted from text 31 against the data stored in the fact data database 4. It includes a per-category check unit 34-1 which, based on the classification made by the partial text extraction unit 32-1, checks consistency against the data stored in the category-specific database of the same category.

[0152] For example, a text corpus consisting of one month of newspaper front-page articles is input to the data extraction unit 32 as text 31. The data extraction unit 32 extracts fact data from it, and the partial text extraction unit 32-1, using the extracted words as keys, sorts the fact data into the predetermined categories, for example the political field or the economic field. The result is sent to the data update unit 33 and the consistency verification unit 34.

[0153] In the consistency verification unit 34, the per-category check unit 34-1 checks consistency by referring to the category-specific database that corresponds to the category assigned to the fact data. If referring to that database (for example, category-specific database 4-1 for the political field) shows no contradiction and the same data is not already stored, it notifies the data update unit 33 to store the data. The data update unit 33 then stores the fact data in the category-specific database for that category.

[0154] In this way, a database specific to each category can be built. An accurate database can therefore be built for each category, and correct proofreading can be performed.
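
The per-category database construction described above can be sketched as follows. The fact layout `(subject, attribute, value)`, the attribute-to-category table, and the sample values are hypothetical; the source says only that extracted words serve as keys for assigning a category.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str]  # hypothetical layout: (subject, attribute, value)

# Hypothetical key words used by the partial text extraction unit 32-1.
CATEGORY_BY_ATTRIBUTE = {
    "prime minister": "politics",
    "company president": "economy",
    "tournament winner": "sports",
}

CategoryDB = Dict[Tuple[str, str], str]


def build_category_databases(
        facts: List[Fact]) -> Tuple[Dict[str, CategoryDB], List[Fact]]:
    """Sort facts by category, check each against its own category
    database, and register those that are new and do not conflict."""
    databases: Dict[str, CategoryDB] = {}
    conflicts: List[Fact] = []
    for subject, attribute, value in facts:
        category = CATEGORY_BY_ATTRIBUTE.get(attribute)
        if category is None:
            continue  # no category assigned; handling is not specified in the source
        db = databases.setdefault(category, {})
        key = (subject, attribute)
        if key not in db:
            db[key] = value                               # new and consistent: store
        elif db[key] != value:
            conflicts.append((subject, attribute, value))  # contradiction: report
    return databases, conflicts


corpus_facts = [
    ("Country X", "prime minister", "P1"),
    ("Company A", "company president", "GH"),
    ("Country X", "prime minister", "P2"),
]
databases, conflicts = build_category_databases(corpus_facts)
```

Each fact is compared only with data of its own category, which is what keeps the check narrow and the resulting databases category-specific.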

[0155] (19) Nineteenth Embodiment of the Present Invention

A nineteenth embodiment of the present invention will be described with reference to FIG. 34. In the nineteenth embodiment, instead of processing the whole text at once, the portions corresponding to a particular kind of document structure are extracted by referring to the document structure of the text, that is, the classification indicated by tags attached to the document title or body. Consistency is judged within those portions, and data found to be consistent is stored sequentially in the database for that kind, that is, that category.

[0156] As shown in FIG. 34, the nineteenth embodiment comprises the fact data database 4, the data update unit 33, a data extraction unit 35, the consistency verification unit 34, and so on. The fact data database 4, data update unit 33, and consistency verification unit 34 are the same as those shown in FIG. 33.

[0157] The data extraction unit 35 extracts fact data from text. It has a specific category extraction unit 35-1, which extracts the fact data of a category designated in advance by referring to the document-structure tags attached to the text to indicate its category. For example, when the political field is designated, only fact data from texts about politics is extracted.

[0158] Suppose the political field is designated to the specific category extraction unit 35-1 and text is input to the data extraction unit 35. The specific category extraction unit 35-1 then refers to the document-structure tags of the text, extracts fact data only from political texts, attaches the designated category to it, and sends it to the data update unit 33 and the consistency verification unit 34.

[0159] The consistency verification unit 34 checks consistency by referring to the category-specific database in the fact data database 4 that corresponds to that category. If there is no contradiction with the data stored in that database and the same data is not already present, it notifies the data update unit 33 to store the fact data. The data update unit 33 then stores the fact data in the category-specific database for that category.

[0160] In this way, the database specific to a designated category can be built quickly, so when the contents of the database for a particular category are insufficient, they can be enriched.
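
Tag-based selection of one category can be sketched as follows. The description does not specify the tag format, so the `<category>...</category>` markup used here, like the function name `extract_sections`, is purely a hypothetical stand-in for the document-structure tags.

```python
import re
from typing import List

# Hypothetical markup: each section is wrapped in a tag naming its category.
SECTION_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", flags=re.DOTALL)


def extract_sections(document: str, category: str) -> List[str]:
    """Return the bodies of all sections tagged with the designated category."""
    return [body.strip()
            for tag, body in SECTION_PATTERN.findall(document)
            if tag == category]


document = (
    "<politics>The prime minister visited the capital.</politics>"
    "<sports>The yokozuna won the tournament.</sports>"
)
political_sections = extract_sections(document, "politics")
```

Only the selected sections would then be passed on to fact extraction and to the consistency check against the database for that category.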

【０１６１】[0161]

[Effects of the Invention] The present invention provides the following effects.

(1) Fact data is extracted from the input text, the extracted fact data is collated with each record in the fact database, mismatches are detected, and the expression of the mismatched data in the text is corrected. Erroneous, inconsistent data in the original text that differs from the facts can therefore be displayed accurately and proofread.

[0162] (2) For the data in the fact database, the possibility of error is evaluated for each field. When data in the text does not match data in the fact database and the fact database contains multiple items that might correspond, the cost of changing each field value is evaluated from that possibility of error, and the change that matches data in the fact database at the lowest cost is selected to determine the content of the error. Accurate proofreading can therefore be performed.

[0163] (3) For data that includes dates, when a change of fact is extracted from the text, it is checked whether the state before the change could have existed at the time described by the extracted data, and the end date of the old fact and the occurrence date of the new fact are set in the fact database. Accurate proofreading can therefore be performed.

[0164] (4) Texts are first classified by a predetermined method, a fact database specific to each category is built from the texts in that category, consistency is checked, and data with no problem is registered sequentially. A fact database can therefore be built for each category, fine-grained checks become possible, and accurate proofreading can be performed.

[0165] (5) By referring to the tags that indicate the classification attached to the document structure of the text, the portions corresponding to a particular document structure, for example the society page or the sports page, are extracted, consistency is judged, and data with no problem is registered sequentially. The fact data database for a particular category can therefore be built quickly and its contents enriched, fine-grained checks become possible, and accurate proofreading can be performed.

[Brief description of the drawings]

FIG. 1 shows an embodiment of the present invention.

FIG. 2 is an operation explanatory diagram of an embodiment of the present invention.

FIG. 3 is a detailed view of the first embodiment of the present invention.

FIG. 4 shows the second embodiment of the present invention.

FIG. 5 is an operation explanatory diagram (part 1) of the second embodiment of the present invention.

FIG. 6 is an operation explanatory diagram (part 2) of the second embodiment of the present invention.

FIG. 7 shows the third embodiment of the present invention.

FIG. 8 is an operation explanatory diagram of the third embodiment of the present invention.

FIG. 9 shows the fourth embodiment of the present invention.

FIG. 10 is an operation explanatory diagram of the fourth embodiment of the present invention.

FIG. 11 shows the fifth embodiment of the present invention.

FIG. 12 is an operation explanatory diagram of the fifth embodiment of the present invention.

FIG. 13 shows the sixth embodiment of the present invention.

FIG. 14 is an operation explanatory diagram of the sixth embodiment of the present invention.

FIG. 15 shows the seventh embodiment of the present invention.

FIG. 16 is an operation explanatory diagram of the seventh embodiment of the present invention.

FIG. 17 shows the eighth embodiment of the present invention.

FIG. 18 is an operation explanatory diagram of the eighth embodiment of the present invention.

FIG. 19 shows the ninth embodiment of the present invention.

FIG. 20 is an operation explanatory diagram of the ninth embodiment of the present invention.

FIG. 21 shows the tenth embodiment of the present invention.

FIG. 22 shows the eleventh embodiment of the present invention.

FIG. 23 shows the twelfth embodiment of the present invention.

FIG. 24 shows the thirteenth embodiment of the present invention.

FIG. 25 is an operation explanatory diagram of the thirteenth embodiment of the present invention.

FIG. 26 shows the fourteenth embodiment of the present invention.

FIG. 27 shows the fifteenth embodiment of the present invention.

FIG. 28 is an operation explanatory diagram of the fifteenth embodiment of the present invention.

FIG. 29 shows the sixteenth embodiment of the present invention.

FIG. 30 is an operation explanatory diagram of the sixteenth embodiment of the present invention.

FIG. 31 shows the seventeenth embodiment of the present invention.

FIG. 32 is an operation explanatory diagram of the seventeenth embodiment of the present invention.

FIG. 33 shows the eighteenth embodiment of the present invention.

FIG. 34 shows the nineteenth embodiment of the present invention.

[Explanation of symbols]

1 Data extraction unit
2 Consistency verification unit
3 Error processing unit
3-1 Error portion display unit
3-2 Processing unit
4 Fact database
5 Data update unit
6 Reliability evaluation unit
7 Error portion determination unit

Claims

[Claims]

1. A document proofreading device comprising: a fact database that stores data relating to specific matters; a data extraction unit that extracts fact data from input text; a consistency verification unit that collates the extracted fact data with each record in the fact database and detects mismatches; and an error processing unit that corrects the mismatched data and the corresponding expression in the text.

2. The document proofreading device according to claim 1, wherein the consistency verification unit evaluates the possibility of error for each field of the data in the fact database and, when the data extracted from the text does not completely match the data in the fact database and a plurality of data in the database may correspond to it, evaluates the cost of changing field values based on the possibility of error and determines the content of the error by selecting the change that brings the extracted data into correspondence with data in the database at the lowest cost.

3. A document proofreading device in which the data extraction unit extracts data relating to a change of fact, the consistency verification unit checks consistency with respect to the pre-change state, and, for data whose consistency has been verified, the corresponding data is corrected to the post-change state, wherein, when fact data including dates is handled, after data relating to a change of fact is extracted from the text, the consistency verification unit verifies whether the pre-change state could exist at the time the extracted data was written, and the error processing unit further sets the end date of the old fact and the occurrence date of the new fact.

4. A document proofreading device that checks each item of fact data extracted from text for consistency with existing data in the fact database and sequentially registers problem-free data in the fact database, thereby checking the consistency among the fact data described in the text, wherein, when a text corpus is the target, the texts are first classified, a fact database specific to each class is built from the texts in that class, and the consistency verification unit checks consistency within that database.

5. A document proofreading device that checks each item of fact data extracted from text for consistency with existing data in the fact database and sequentially registers problem-free data in the fact database, thereby checking the consistency among the fact data described in the text, wherein the consistency verification unit does not process the entire text at once but, by referring to the document structure of the text, extracts the portion corresponding to a specific document structure and judges consistency within the extracted portion.