JPH08153101A

JPH08153101A - Proofreading method for Japanese sentences

Info

Publication number: JPH08153101A
Application number: JP6294097A
Authority: JP
Inventors: Kazuyuki Yasui; 和之安井
Original assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Current assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Priority date: 1994-11-29
Filing date: 1994-11-29
Publication date: 1996-06-11

Abstract

PURPOSE: To support proofreading by automatically checking for errors in the various forms of chapter titles. CONSTITUTION: A document is subjected to morphological analysis (S1). When the proofreading item specified for the document is the chapter title (S2), the chapter title portions written in the document are recognized and extracted from the morphological analysis data (S3), and the chapter title patterns are stored in a tree structure (S4). The chapter title data in the tree structure is then checked for errors in chapter title notation (S5); when there is an error, the chapter title concerned is indicated to the operator and corrected by the operator's correction input (S6 and S7). For chapter title recognition (S3), basic components used to write chapter titles are prepared, and each chapter title appearing in the document is recognized as a chapter title pattern that is a combination of these basic components.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a Japanese language processing system, and more particularly to a method for proofreading the chapter titles of a document.

[0002]

[Prior Art] Computer-based Japanese language processing, such as word processors, machine translation, document databases, and hypertext, has been put into practical use.

[0003] Natural language analysis for this purpose begins with morphological analysis, which divides the text to be analyzed into morphemes (the smallest units of word formation) and identifies the properties of each morpheme. This is followed by syntactic analysis, which analyzes the text according to the syntactic rules of the natural language, and then by semantic analysis and contextual analysis, which remove ambiguity and vagueness.

[0004] In such Japanese language processing, document proofreading support systems provide functions for extracting and correcting typographical errors, omitted characters, misused terms, and unregistered words.

[0005] However, none of these systems provides a proofreading function for chapter titles. This is because chapter titles can be written in so many different ways that it is impossible to support all of them.

[0006]

[Problems to be Solved by the Invention] In conventional Japanese language processing systems, the document proofreading support system has no function for proofreading chapter titles. Chapter titles must therefore be proofread by hand, which is time-consuming and carries the risk that mistakes in chapter titles will be overlooked.

[0007] An object of the present invention is to provide a proofreading method that supports proofreading by automatically checking for errors in various kinds of chapter titles.

[0008]

[Means for Solving the Problems] To solve the above problems, the present invention is characterized in that each chapter title portion written in a document is recognized and extracted, the patterns of the extracted chapter titles are stored in a tree structure, the chapter title data in the tree structure is checked for errors in chapter title notation, any chapter title containing an error is indicated to the operator, and that chapter title is corrected by the operator's correction input.

[0009] The present invention is further characterized in that, for chapter title recognition, basic components used to write chapter titles are prepared, and each chapter title appearing in the document is recognized as a chapter title pattern that is a combination of these basic components.

[0010]

[Operation] Extracting the chapter title portions and organizing the title patterns into a tree structure allows notation errors to be checked automatically.

[0011] In addition, recognizing each chapter title as a combination of basic components achieves reliable recognition with a small amount of data.

[0012]

[Embodiment] FIG. 1 is a processing procedure diagram showing an embodiment of the present invention.

[0013] (S1) The original text to be processed is subjected to morphological analysis.

[0014] (S2) The morpheme data resulting from the morphological analysis is read in, and the operator specifies a proofreading item (chapter titles, typographical errors, omitted characters, misused terms, and so on).

[0015] (S3) If the specified proofreading item is chapter titles, the morpheme data is used to recognize the chapter titles in the document, working from the beginning of the document. In this recognition, only the chapter titles are extracted from a document that also contains body text; FIG. 2 shows an example with only the document structure and chapter titles, in which the body text is indicated by "...".

[0016] (S4) The data of the recognized chapter title patterns is stored in a tree structure. As shown in FIG. 3 for the example document of FIG. 2, this tree structure is a data structure that branches in the order of chapter, section, subsection, and so on.
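As an illustration only, the tree of step S4 could be sketched in Python as follows. The class name `TitleNode`, the function `build_tree`, and the representation of each title as a numeric path such as `(2, 1)` are assumptions made for this sketch; the source does not specify a concrete data layout.

```python
from dataclasses import dataclass, field


@dataclass
class TitleNode:
    number: int                      # position at this level (0 for the root)
    title: str = ""
    children: list = field(default_factory=list)


def build_tree(titles):
    """Build a chapter/section/subsection tree from (path, title) pairs.

    `path` is a tuple such as (2, 1) for section 1 of chapter 2 (assumed format).
    Titles are appended in document order, so gaps and duplicates are kept
    for a later check rather than silently repaired.
    """
    root = TitleNode(number=0, title="<document>")
    for path, title in titles:
        node = root
        for num in path[:-1]:
            # Descend into the most recent ancestor carrying this number.
            parent = next(
                (c for c in reversed(node.children) if c.number == num), None
            )
            if parent is None:
                # The ancestor title was never seen; keep a placeholder node.
                parent = TitleNode(number=num)
                node.children.append(parent)
            node = parent
        node.children.append(TitleNode(number=path[-1], title=title))
    return root


tree = build_tree([
    ((1,), "Overview"),
    ((1, 1), "Background"),
    ((1, 2), "Scope"),
    ((2,), "Method"),
    ((4,), "Results"),   # chapter 3 is deliberately missing
])
print([c.number for c in tree.children])  # chapter numbers in document order
```

Keeping the nodes in document order, without renumbering, is a deliberate choice in this sketch: the tree then still contains the irregularities that a subsequent check is meant to find.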

[0017] (S5) The chapter title data in the tree structure is checked for errors in how the chapter titles are written. This check looks for gaps and duplicates in the sequence of chapter title numbers, as well as differences in notation.

[0018] For example, the tree structure of FIG. 3 contains a jump from Chapter 2 to Chapter 4 (Chapter 3 is missing) and an inconsistency between the 〇 and ● notations, and the check detects these.
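The three kinds of check named for step S5 (gaps, duplicates, and notation differences) can be sketched for one level of the tree as below. This is a minimal sketch, not the method of the source: the function name `check_siblings`, the `(number, marker)` input format, and the rule that the most frequent marker is treated as the intended one are all assumptions.

```python
from collections import Counter


def check_siblings(siblings):
    """Check titles that share a parent, given as (number, marker) in document order.

    Returns a list of problem descriptions (empty if nothing was found).
    """
    problems = []
    numbers = [n for n, _ in siblings]

    # Duplicates: the same number appears more than once at this level.
    seen = set()
    for n in numbers:
        if n in seen:
            problems.append(f"duplicate number {n}")
        seen.add(n)

    # Gaps: the sequence jumps over one or more numbers.
    for prev, cur in zip(numbers, numbers[1:]):
        if cur > prev + 1:
            missing = ", ".join(str(m) for m in range(prev + 1, cur))
            problems.append(f"jump from {prev} to {cur} (missing {missing})")

    # Notation: flag markers that differ from the most common one (assumed rule).
    counts = Counter(m for _, m in siblings)
    if len(counts) > 1:
        majority = counts.most_common(1)[0][0]
        for n, m in siblings:
            if m != majority:
                problems.append(f"number {n} uses marker {m!r} instead of {majority!r}")
    return problems


for problem in check_siblings([(1, "○"), (2, "○"), (4, "●")]):
    print(problem)
```

In a full implementation the same check would presumably be applied to the children of every node in the tree, so that section and subsection numbering is verified as well as chapter numbering.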

[0019] (S6) The result of the chapter title check, that is, whether or not there is an error, is extracted together with the erroneous portions.

[0020] (S7) If there is an error, the erroneous portion is shown on the display device and corrected by the operator's correction input.
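Steps S6 and S7 amount to presenting each detected error to the operator and applying whatever correction is entered. The sketch below assumes a hypothetical `apply_corrections` function in which the operator is modeled as a callable `ask`; in an interactive program `ask` might wrap a display and keyboard prompt, which the source does not describe in detail.

```python
def apply_corrections(lines, errors, ask):
    """Apply operator corrections to a copy of the document lines.

    lines:  list of document lines
    errors: list of (line_index, message) pairs produced by the check
    ask:    callable(message, current_text) returning the corrected text,
            or None to leave the line unchanged (hypothetical interface)
    """
    corrected = list(lines)
    for index, message in errors:
        replacement = ask(message, corrected[index])
        if replacement is not None:
            corrected[index] = replacement
    return corrected


# A scripted "operator" that renumbers the flagged title, for demonstration.
document = ["1. Overview", "2. Method", "4. Results"]
found = [(2, "jump from 2 to 4 (missing 3)")]
fixed = apply_corrections(document, found, lambda msg, text: text.replace("4.", "3.", 1))
print(fixed)
```

The original list is left untouched so that the uncorrected text remains available if the operator rejects a change.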

[0021] According to this embodiment, therefore, mistakes in how chapter titles are written can be recognized and extracted automatically, and because the mistaken portions are pointed out, the operator can proofread them easily by making corrections.

[0022] Next, the method of recognizing chapter titles in step (S3) is described. Chapter titles can be written in many different ways, but for recognition only the basic components of these notations are prepared, and a chapter title pattern is recognized for each chapter title in the actual text.

[0023] For example, chapter titles are written using basic components such as those shown in the table below, and these basic components are prepared in advance.

[0024]

[Table 1]

[0025] In actual recognition, a title such as " ■３．１１統計データとその活用" ("■3.11 Statistical Data and Its Use") is recognized, by reference to the basic components, as a chapter title pattern made up of the following combination of basic components: "two tabs at the start, followed by one square, then a digit string, a period, a digit string, a blank, and finally a character string."
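One way to picture this recognition is as tokenizing a line into a sequence of basic components. The following Python sketch is illustrative only: Table 1 is not reproduced in the source, so the component set and the names `COMPONENTS`, `TOKEN_RE`, and `title_pattern` are assumptions, and the tabs and the blank in the sample line are reconstructed from the verbal description of the pattern.

```python
import re

# Assumed basic components (the actual Table 1 is not available in the source).
COMPONENTS = [
    ("TAB", r"\t"),
    ("SPACE", r"[ \u3000]"),          # ASCII or full-width blank
    ("SQUARE", r"[■□]"),
    ("CIRCLE", r"[●○〇◯]"),
    ("NUMBER", r"\d+"),               # \d also matches full-width digits
    ("PERIOD", r"[.．]"),
    ("TEXT", r"[^\t \u3000■□●○〇◯\d.．]+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in COMPONENTS))


def title_pattern(line):
    """Return the sequence of basic-component names that make up a line."""
    return [m.lastgroup for m in TOKEN_RE.finditer(line)]


# Sample line built from the description: two tabs, a square, "3.11", a blank, text.
sample = "\t\t■３．１１ 統計データとその活用"
print(title_pattern(sample))
```

Because each line is reduced to a short list of component names, two titles written in the same style yield the same pattern regardless of their numbers or wording, which is what makes the patterns comparable later on.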

[0026] A line in which characters run from the very beginning, or in which a character string follows a single blank, is recognized as ordinary text rather than a chapter title and is excluded from chapter title recognition.
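Read literally, this exclusion rule can be expressed as a single pattern test. The sketch below assumes that a "blank" is an ASCII or full-width space and that a "character" is anything other than whitespace, a digit, or one of a few bullet symbols; the name `is_ordinary_sentence` and the symbol set in `MARKERS` are hypothetical.

```python
import re

MARKERS = "■□●○〇◯"   # assumed bullet symbols, not taken from the source
# At most one leading blank, then a character that is not whitespace,
# a digit, or a bullet symbol.
ORDINARY = re.compile(rf"^[ \u3000]?[^\s\d{MARKERS}]")


def is_ordinary_sentence(line):
    """Return True if the line should be excluded from chapter title recognition."""
    return bool(ORDINARY.match(line))


for line in ["本発明は日本語処理システムに係る。",
             "　本発明は日本語処理システムに係る。",
             "\t\t■３．１１ 統計データとその活用",
             "1. Overview"]:
    print(repr(line), is_ordinary_sentence(line))
```

Under this literal reading, a title that begins with a word rather than a number or symbol would also be excluded, so a practical component table would presumably need to cover such prefixes before this test is applied.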

[0027] This chapter title recognition is carried out in the same way for the chapter titles that appear later in the document, and its results are used as the title patterns in step S4 and also for the chapter title check in step S5.

[0028] Therefore, the chapter title recognition method of this embodiment can recognize chapter titles written in a wide variety of patterns simply by preparing the basic components. Because the method identifies patterns, its results can be used to build and check the tree structure of title patterns.

[0029] In the embodiment, morphological analysis data is not necessarily required when a means is used that can separate the chapter title portions from the rest of the text directly from the document (for example, a means that separates them based on the presence of digits and the number of blanks).
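A direct separation of this kind might look like the following Python sketch, which works on raw text without any morphological analysis. The heuristic in `CANDIDATE` (optional indentation, an optional bullet symbol, a dotted number, then at least one blank before the title text) is an assumption chosen for illustration, as is the function name `extract_title_lines`; the source only names digits and blank counts as possible cues.

```python
import re

# Assumed heuristic: indentation, optional bullet, a number such as "3" or "3.11",
# an optional trailing period, one or more blanks, then the title text.
CANDIDATE = re.compile(
    r"^[ \t\u3000]*[■□●○〇◯]?\d+(?:[.．]\d+)*[.．]?[ \u3000]+\S"
)


def extract_title_lines(text):
    """Return (line_number, line) pairs for lines that look like chapter titles."""
    return [
        (i, line)
        for i, line in enumerate(text.splitlines(), start=1)
        if CANDIDATE.match(line)
    ]


doc = (
    "1. Overview\n"
    "This document has 3 parts.\n"
    "  1.1 Scope\n"
    "Body text.\n"
    "2. Method\n"
)
for number, line in extract_title_lines(doc):
    print(number, line)
```

A heuristic this simple will also accept body lines that happen to start with a number followed by a blank, so it is best seen as a coarse pre-filter whose results still go through the pattern check.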

[0030]

[Effects of the Invention] As described above, according to the present invention, each chapter title portion written in a document is recognized and extracted, the patterns of these chapter titles are stored in a tree structure, the chapter title data in the tree structure is checked for errors in chapter title notation, any chapter title containing an error is indicated to the operator, and that chapter title is corrected by the operator's correction input. As a result, errors in chapter title notation can be checked automatically and corrected, which reduces the burden on the operator and prevents errors from being overlooked.

[0031] Furthermore, because the present invention recognizes chapter titles by preparing basic components used to write chapter titles and recognizing each chapter title appearing in the document as a chapter title pattern that is a combination of these basic components, chapter titles with many variations can be recognized automatically and reliably using a small amount of data.

[Brief description of drawings]

FIG. 1 is a processing procedure diagram showing an embodiment of the present invention.

FIG. 2 is an example of chapter titles in a document structure.

FIG. 3 is an example of a tree structure of chapter titles.

Claims

[Claims]

1. A method for proofreading Japanese sentences, characterized in that each chapter title portion written in a document is recognized and extracted, the patterns of the extracted chapter titles are stored in a tree structure, the chapter title data in the tree structure is checked for errors in chapter title notation, any chapter title containing an error is indicated to the operator, and that chapter title is corrected by the operator's correction input.

2. The method for proofreading Japanese sentences according to claim 1, characterized in that, for the chapter title recognition, basic components used to write chapter titles are prepared, and each chapter title appearing in the document is recognized as a chapter title pattern that is a combination of these basic components.