JPH1011443A

JPH1011443A - Document code check system

Info

Publication number: JPH1011443A
Application number: JP8182752A
Authority: JP
Inventors: Yasuhiko Nakane; 保彦中根; Takashi Takayanagi; 隆高柳; Toshito Hara; 敏人原
Original assignee: Advantest Corp
Current assignee: Advantest Corp
Priority date: 1996-06-24
Filing date: 1996-06-24
Publication date: 1998-01-16

Abstract

PROBLEM TO BE SOLVED: To provide accurate revision support for writing errors, and a function for automatically generating an index list from the extracted codes, by receiving the intermediate codes and classification results for an entire document, sorting the code names in ascending order, and displaying the results one line per code. SOLUTION: A target code selection unit 80 receives intermediate codes from an intermediate code extraction unit 60 and, through a five-stage selection process, classifies them into four types: noise codes, suspected noise codes, pseudo codes, and other target codes. A code tally display unit 10 receives the extraction and classification results for the intermediate codes and term parts of the entire document, sorts them in ascending order of code name, and normally displays one line per code showing the extracted term part, the code name, the per-region occurrence count of the code, and the reference term. If an item requiring attention is recognized in the display mode, that part is shown in the corresponding highlighted form.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a code inspection system, mainly for text files, that makes it easier to inspect and revise codes and that automatically generates an index of the extracted codes.

[0002]

2. Description of the Related Art Various methods for checking codes in a text file already exist. For example, for the reference codes assigned to a drawing, the codes described in the text and the terms corresponding to them are extracted and presented as a KWIC list. A first example is the "document creation device" of Japanese Patent Application Laid-Open No. 3-132866, which provides a KWIC (keyword in context) display function: when a target code is entered as a keyword, every occurrence of that code is shown on a KWIC screen, enabling online editing and correction. A second example is the "text inspection apparatus" of Japanese Patent Application Laid-Open No. 4-296970, which also provides a KWIC display function: it extracts target numeral strings, uses a word dictionary to treat the immediately preceding character string as a candidate drawing reference word, checks grammatical consistency to remove unnecessary candidates, and then sorts the results and outputs them in KWIC format. KWIC, also called a headword with context, displays the target keyword (headword) together with the text before and after it, sorted, to make online editing easier.
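The KWIC presentation used by these prior-art devices can be sketched in a few lines. This is a minimal illustration under assumed names (`kwic`, `format_kwic`) and a fixed character window; it is not the implementation of either cited publication, and it uses plain substring matching, so a search for "12" would also hit "120".

```python
def kwic(text, keyword, width=10):
    """Return (left context, keyword, right context) for every occurrence
    of `keyword`, sorted by the context that follows the keyword."""
    rows = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        rows.append((left, keyword, right))
        start = text.find(keyword, start + 1)
    # Sorting on the right-hand context groups similar usages together.
    rows.sort(key=lambda row: row[2])
    return rows


def format_kwic(rows, width=10):
    # Right-align the left context so all keywords line up in one column.
    return [f"{left:>{width}} [{kw}] {right}" for left, kw, right in rows]
```

For example, `kwic("amp 12 out; amp 12 in", "12", width=4)` returns the two occurrences of "12" with their surrounding four characters, ordered by what follows them.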

[0003] Here, a code is, for example, a symbol that identifies a specific part of a drawing (a drawing reference sign), or a symbol appended immediately after a term, such as a defined word used in the document or a technical term of a particular field, so that the term is easier to refer to. These symbols vary by field, industry, and document file type, but mostly consist of numerals, Latin letters, Greek letters, subscripts, and combinations of these.

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION In a typically prepared document, a target code (together with the target term immediately preceding it, with which it forms a pair) may appear, for example, several hundred times, but the number of mistakes (input errors) among these occurrences is far smaller.

[0005] A text, however, also contains many noise codes (useless codes other than the target codes), so extraction yields a mixed group of codes that includes them. Depending on the type of text, noise codes generally appear far more often than erroneous entries do. FIG. 12 shows examples of noise codes: unit values, parts of technical terms, figure numbers, and frequency values. Because such noise codes are present, the relatively simple noise-removal means of the related art recognize many useless extracted codes as target codes. Since the main purpose of a code inspection system is to help the reviser find writing errors accurately, removing noise codes as far as possible and extracting target codes accurately is one of the most important issues.

[0006] Various techniques exist for removing such noise and improving the recognition rate. From the above standpoint, however, the related art has the following problems. First, because extracted codes including noise codes are output, the output display is cluttered and very distracting to the reviser. Second, when output is in KWIC format, each extracted code is displayed together with its surrounding text, but most of these are correct codes and correct terms; displaying entries that clearly need no revision is tiresome and undesirable for the reviser. Third, as shown in FIG. 13, texts often describe several codes in succession, and the related art could not extract the proper term for such descriptions, which was a practical drawback. Fourth, as shown in FIG. 14, there is a need to append the extracted target codes and target terms to the end of the document as an index; because of the obstacles above and the variety of description forms, automatically generating a practical index has been difficult and has not been put to practical use.

[0007] Accordingly, the object of the present invention is a revision support system that removes noise codes as far as possible, recognizes suspect codes, and recognizes writing errors, thereby significantly improving the target code recognition rate; that highlights codes judged questionable while suppressing unnecessary display; and that lists all codes by field, thereby providing accurate revision support for writing errors and, further, automatically generating an index list from the extracted codes.

[0008]

MEANS FOR SOLVING THE PROBLEMS First, to solve the above problems, the present invention is configured as follows. A text file input unit 20 receives a text file and, according to the setting conditions from an extraction condition setting unit 40, outputs document data in a predetermined order. An intermediate code extraction unit 60 receives the document data from the text file input unit 20 and, according to a code matching condition 62 from the extraction condition setting unit 40, extracts matching code candidates as intermediate codes 102mid, extracts the term part 101 corresponding to each intermediate code, and counts the occurrences of each intermediate code per region. A target code selection unit 80 receives the intermediate codes 102mid from the intermediate code extraction unit 60 and, through classification by a basic filter processing unit 81, a field-common filter processing unit 82, a user dictionary processing unit 84, a code-explicit term processing unit 86, and a same-term comparison/determination processing unit 88, classifies each intermediate code as a noise code 106nz, a suspect code 107 (including noise suspect codes 116nz), or otherwise a target code 104obj, and outputs the result. A code tally display unit 10 receives the intermediate codes 102mid for the entire document obtained above and the classification results, sorts them in ascending order of code name, and displays one line per code. Each line shows at least the extracted term part 101, the name of the intermediate code 102mid, the per-region occurrence count 400 of the code, and the reference term 302 of the index list (the "Description of Reference Numerals" is handled the same way). If a display line contains an item requiring attention, that part is displayed in a highlighted form, and the codes of all display targets are listed. As a result, when inspecting the codes written in a document file and their corresponding terms, useless noise codes are removed accurately, suspect codes are recognized, the target code recognition rate is significantly improved, parts that may contain writing errors are highlighted, and all codes are listed, realizing an accurate revision support system for writing errors.
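A minimal sketch of the one-line-per-code listing produced by the code tally display unit 10 is shown below. The input tuple layout, the natural sort order for code names (so that 2 precedes 10), and the attention rule (a leading `*` when the code was flagged, was used with more than one term, or disagrees with a reference term) are all assumptions for illustration, not the patent's specification.

```python
import re
from collections import Counter, defaultdict


def code_key(code):
    """Sort key: numeric part first, then any suffix (2 < 10 < 10a).
    This natural ordering is an assumption; the text only says 'ascending'."""
    m = re.match(r"(\d+)(.*)", code)
    return (0, int(m.group(1)), m.group(2)) if m else (1, 0, code)


def tally_lines(occurrences, regions, reference=None, flagged=frozenset()):
    """Build one display line per code.

    occurrences: iterable of (code, term, region) for the extracted codes.
    regions:     region names in the column order to display.
    reference:   optional {code: term} from a reference list (assumed input).
    flagged:     codes marked for attention by the classifier, e.g. suspect codes.
    """
    reference = reference or {}
    terms = defaultdict(Counter)   # code -> how often each term was used with it
    counts = defaultdict(Counter)  # code -> occurrences per region
    for code, term, region in occurrences:
        terms[code][term] += 1
        counts[code][region] += 1

    lines = []
    for code in sorted(terms, key=code_key):
        term = terms[code].most_common(1)[0][0]  # most frequent term for this code
        per_region = " ".join(str(counts[code][r]) for r in regions)
        ref = reference.get(code, "-")
        # '*' marks the line: flagged code, mixed terms, or reference mismatch.
        attention = (code in flagged or len(terms[code]) > 1
                     or ref not in ("-", term))
        mark = "*" if attention else " "
        lines.append(f"{mark} {term} {code} | {per_region} | {ref}")
    return lines
```

With `occ = [("10", "amplifier", "claims"), ("2", "mixer", "desc"), ("10", "amplifier", "desc"), ("10", "amplifer", "desc")]`, `tally_lines(occ, ["claims", "desc"])` lists code 2 first and marks code 10 because the misspelled term "amplifer" also occurs with it.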

[0009] Second, to solve the above problems, the present invention may be configured as follows. A text file input unit 20 receives a text file and, according to the setting conditions from the extraction condition setting unit 40, outputs document data in a predetermined order. An intermediate code extraction unit 60 receives the document data from the text file input unit 20 and, according to the code matching condition 62 from the extraction condition setting unit 40, extracts matching figure numbers 102zu and counts the occurrences of each figure number per region. A figure number list display means receives the figure numbers 102zu and their per-region occurrence counts, sorts the figure numbers in ascending order, and displays one line per figure number showing the occurrence count for each region (optionally with similar regions merged and compressed). If there is an index column for figure numbers, the reference figure number is displayed after the counts, followed by a judgment result display part 606 that highlights whether there is an erroneous entry, a missing description, or an excessive description, based on whether and how often the figure number occurs in the whole text. As a result, when inspecting the figure numbers written in a document file, a revision support system is realized that lets the reviser judge accurately whether the occurrence of each figure number in each region is valid.
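The figure number list can be sketched as follows. The verdict rules here (a cited figure that does not exist is a possible erroneous entry, an existing figure absent from an expected region is a missing description, and a figure described more than once where one entry is expected is an excessive description) are hypothetical readings of the three judgments named above; the function name and argument layout are likewise assumptions.

```python
def figure_report(counts, regions, reference, once_regions=()):
    """One row per figure number: (figure, per-region counts, reference figure, verdict).

    counts:       {figure_no: {region: occurrences}} from the extraction step.
    regions:      regions in which every existing figure is expected to appear.
    reference:    set of figure numbers that actually exist (e.g. an index column).
    once_regions: regions where each figure should be described exactly once.
    The verdict rules below are illustrative assumptions.
    """
    rows = []
    for fig in sorted(set(counts) | set(reference)):
        per = counts.get(fig, {})
        cells = [per.get(r, 0) for r in regions]
        if fig not in reference:
            verdict = "ERROR?"   # cited in the text, but no such figure exists
        elif 0 in cells:
            verdict = "MISSING"  # figure exists but is absent from some region
        elif any(per.get(r, 0) > 1 for r in once_regions):
            verdict = "EXCESS"   # described more than once where one entry is expected
        else:
            verdict = "ok"
        rows.append((fig, cells, fig if fig in reference else "-", verdict))
    return rows
```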

[0010] Third, to solve the above problems, the present invention may be configured as follows. A text file input unit 20 receives a text file and, according to the setting conditions from the extraction condition setting unit 40, outputs document data in a predetermined order. An intermediate code extraction unit 60 receives the document data from the text file input unit 20 and, according to the code matching condition 62 from the extraction condition setting unit 40, extracts matching code candidates as intermediate codes 102mid, extracts the term part 101 corresponding to each intermediate code, and counts the occurrences of each intermediate code per region. A target code selection unit 80 receives the intermediate codes 102mid from the intermediate code extraction unit 60 and, through classification by the basic filter processing unit 81, the field-common filter processing unit 82, the user dictionary processing unit 84, the code-explicit term processing unit 86, and the same-term comparison/determination processing unit 88, classifies each intermediate code as a noise code 106nz, a suspect code 107 (including noise suspect codes 116nz), or otherwise a target code 104obj, and outputs the result. An automatic index generation means receives the term part 101 corresponding to each extracted intermediate code 102mid, deletes useless words at the head of the term part 101, generates and outputs the result as the term for that intermediate code 102mid, and, if desired, merges identical terms before output. As a result, an automatic index generation function is realized from the codes written in a document file that requires an index list and from their corresponding terms.
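The two steps of the automatic index generation means (trimming useless leading words from each term part, then merging identical terms) might be sketched like this. The prefix list is entirely hypothetical: a few Japanese particles and demonstratives such as 前記 ("said") plus English determiners. A real list would have to be tuned, since stripping a single-character particle can damage a term that genuinely begins with that character.

```python
from collections import defaultdict

# Hypothetical leading words that carry no meaning in an index entry.
PREFIXES = ("前記", "上記", "該", "と", "は", "を", "が", "the ", "said ", "a ", "an ")


def clean_term(term_part):
    """Repeatedly strip known useless words from the head of a term part."""
    term = term_part.strip()
    changed = True
    while changed:
        changed = False
        for prefix in PREFIXES:
            if term.startswith(prefix) and len(term) > len(prefix):
                term = term[len(prefix):].lstrip()
                changed = True
    return term


def build_index(pairs):
    """pairs: iterable of (term_part, code).
    Identical cleaned terms are merged into one entry; entries are sorted
    by term and each entry's codes keep their first-seen order."""
    index = defaultdict(list)
    for term_part, code in pairs:
        term = clean_term(term_part)
        if code not in index[term]:
            index[term].append(code)
    return sorted(index.items())
```

For example, the raw term part "と分野共通フィルタ処理部" ("and field-common filter processing unit") is trimmed to "分野共通フィルタ処理部", and "the amplifier 11", "said amplifier 11", and "amplifier 12" collapse into one entry for "amplifier" with codes 11 and 12.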

[0011] In addition to the first configuration, a detailed code display means may be provided. On receiving a detail display instruction for a designated intermediate code 102mid, it displays, for each designated intermediate code 102mid, its region information 26area and line number information 24gyou, displays the text from the start of the corresponding line, highlights the designated intermediate code 102mid and the preceding term part 101 or term target 201yogo, and displays the context following the intermediate code for a specified number of characters (for example, 20 characters). Also in addition to the first configuration, a prohibited phrase dictionary (the prohibited phrase dictionary 271dic, honorific-tone phrases, imperative-tone phrases, and the like) registering phrases that must not be used in a particular type or region of text file may be provided. The relevant region or the entire text is searched, and when a phrase in the prohibited phrase dictionary is detected, a display means shows the region information 26area and line number information 24gyou, displays the text from the start of that document line, and highlights the matching phrase. This makes it even easier to judge, during revision, the use of prohibited phrases specific to a document file type. Further, in addition to the first configuration, a KWIC editing means may be provided that receives the intermediate codes 102mid and the term targets 201yogo, displays each in KWIC form together with its preceding and following context with the term target highlighted, and allows online editing.
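The prohibited-phrase search described above can be sketched as a scan over (region, line number, text) records. Here the dictionary is a plain list and the highlight is rendered as `[[...]]`; both are assumptions, as is the `target_regions` parameter used to restrict the search to the relevant region instead of the whole text.

```python
def find_prohibited(lines, phrases, target_regions=None):
    """Search document lines for prohibited phrases.

    lines:          iterable of (region, line_no, text) records.
    phrases:        the prohibited-phrase dictionary (a plain list here).
    target_regions: regions to inspect; None means the entire text.
    Returns (region, line_no, phrase, text with the phrase marked as [[...]]).
    """
    hits = []
    for region, line_no, text in lines:
        if target_regions is not None and region not in target_regions:
            continue
        for phrase in phrases:
            if phrase in text:
                hits.append((region, line_no, phrase,
                             text.replace(phrase, f"[[{phrase}]]")))
    return hits
```

An imperative-tone ending such as してください ("please do") in a claim would be reported with its region and line number and shown as 回路を接続[[してください]]。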

[0012]

DESCRIPTION OF THE PREFERRED EMBODIMENTS Embodiments of the present invention are described in detail below with reference to examples.

[0013]

EXAMPLES First, the classification of codes in the extraction process is defined with reference to the relation diagram of FIG. 10. Extraction target codes 100all are all character strings in the document that could be codes. Intermediate codes 102mid are the first-stage extracted codes, taken from the extraction target codes 100all according to predetermined conditions described later. Target codes 104obj are the code strings the author intended in the document file, that is, the codes that should be extracted. Noise codes 106nz are codes among the intermediate codes 102mid that should be removed by the filter dictionaries described later. Suspect codes 107 are questionable codes for which it is difficult to determine whether they are noise codes 106nz or target codes 104obj. Within this classification there are also erroneous codes 108miss and erroneous terms 109 caused by the author's writing mistakes; these are scattered across the target codes 104obj, the noise codes 106nz, the suspect codes 107, and the area outside the intermediate codes. A code inspection system should let the reviser find the erroneous parts of interest among these accurately and easily, and make corrections.

[0014] Erroneous codes 108miss take three forms. In the first, the terms corresponding to a target code that occurs several times are written in error; the mistake may lie either on the term side or on the code side. The second is an error in the target code itself. The third is an omission of the term corresponding to the target code. A given code may appear only once or several times. It is desirable to indicate, as precisely as possible, which kind of error each written code contains. On the other hand, because code notation is allowed to vary freely by field and author, it is practically impossible, apart from some cases, to detect and point out erroneous codes 108miss directly. A code revision support system is therefore required to provide as many classification means as possible, excluding noise codes 106nz and classifying suspect codes 107, erroneous codes 108miss, and erroneous terms 109, and to display them in an appropriate form so that the reviser can easily make an overall judgment.
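For the first error form, a simple consistency check groups the extracted (code, term) pairs by code and flags any term that differs from the majority term for that code. This is a hypothetical sketch in the spirit of a same-term comparison, not the processing of unit 88 as the patent defines it; the optional `aliases` map stands in for parenthesised synonyms (see the multiple-term recognition of unit 60d), so that a declared synonym is not reported as an error. When two terms tie, the first one seen is treated as the majority.

```python
from collections import Counter, defaultdict


def inconsistent_pairs(pairs, aliases=None):
    """Flag (code, term, majority_term) where a term disagrees with the
    majority term for its code.

    pairs:   iterable of (code, term).
    aliases: optional {synonym: canonical_term}; synonyms are not flagged.
    """
    aliases = aliases or {}
    by_code = defaultdict(Counter)
    for code, term in pairs:
        by_code[code][aliases.get(term, term)] += 1

    flagged = []
    for code, terms in sorted(by_code.items()):
        if len(terms) > 1:
            majority, _ = terms.most_common(1)[0]
            for term in terms:
                if term != majority:
                    flagged.append((code, term, majority))
    return flagged
```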

[0015] A patent specification file is used below as a concrete example of a text file. (Embodiment 1) As shown in FIG. 1, the processing configuration of the code inspection system of Embodiment 1 consists of a text file input unit 20, setting switches from a condition setting unit 40, an intermediate code extraction unit 60, a target code selection unit 80, a definition code extraction unit 90, and a code tally display unit 10.

[0016] First, the text file input unit 20 shown in FIG. 1 is described. The text file input unit 20 reads the file to be inspected line by line and outputs it in the order specified by the read setting condition 22 in the condition setting unit 40. The three documents (abstract, specification, and request) are sometimes concatenated in an arbitrary order into a single file, so the read setting condition 22 specifies the order in which they are read. Normally they are read and output in the order request, specification, abstract, and any document that is absent is skipped. It may also be desired to use the codes in the "Description of Reference Numerals" section of the specification as a reference for inspection; in that case, the condition may be set so that this section is output first. This is handled by the read order management unit 21 shown in FIG. 2(a). As shown in FIG. 2(a), the text file input unit 20 consists of the read order management unit 21, the read setting condition 22, a line number management unit 24, a region management unit 26, a heading order management unit 28, a heading reference table 29, and a comment deletion unit 32. The line number management unit 24 shown in FIG. 2(a) supports two modes: counting lines by line-feed characters, or counting by logical lines and logical pages. For a patent specification, the logical line (72 bytes) and the logical page (29 lines) are prescribed, so the document data read in line-feed units are counted under these line and page conditions, and the line number information 24gyou is output.
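Counting logical lines of 72 bytes and logical pages of 29 lines can be sketched as below. The sketch assumes the byte width of each character is measured in Shift_JIS (so full-width characters count as 2 bytes), that a line wraps before any character that would push it past 72 bytes, and that an empty physical line still occupies one logical line; these are assumptions, not rules stated in the text.

```python
def logical_positions(physical_lines, line_bytes=72, page_lines=29,
                      encoding="shift_jis"):
    """Yield (page, line, text) for every logical line, both numbered from 1."""
    count = 0  # logical lines emitted so far
    for phys in physical_lines:
        pieces, chunk, size = [], "", 0
        for ch in phys:
            width = len(ch.encode(encoding))  # 1 for ASCII, 2 for full-width
            if size + width > line_bytes:     # wrap before overflowing the line
                pieces.append(chunk)
                chunk, size = "", 0
            chunk += ch
            size += width
        pieces.append(chunk)  # an empty physical line is still one logical line
        for piece in pieces:
            page, line = divmod(count, page_lines)
            yield page + 1, line + 1, piece
            count += 1
```

A 100-character ASCII line becomes two logical lines of 72 and 28 characters, while 40 full-width characters (80 bytes) wrap after the 36th character.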

[0017] The region management unit 26 shown in FIG. 2(a) may treat chapter markers as regions, or may regard a specific heading symbol as marking a region. In a patent specification, regions are clearly delimited by headings in black lenticular brackets 【 】, so regions are managed in units of these bracketed heading names (for example, "Document Name", "Claims", "Brief Description of the Drawings"), and the region information 26area is output.
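A rough sketch of region tracking by lenticular-bracket headings follows. It assumes a heading stands at the start of a line and that bracketed paragraph numbers such as 【０００１】 (all digits) do not start a new region; other bracketed labels would need a similar exclusion rule in practice.

```python
import re

# A lenticular-bracket heading at the start of a line, e.g. 【特許請求の範囲】 ("Claims").
HEADING_RE = re.compile(r"^【([^】]+)】")


def tag_regions(lines, initial="(none)"):
    """Pair every line with the name of the region it belongs to."""
    region = initial
    tagged = []
    for line in lines:
        m = HEADING_RE.match(line)
        # str.isdigit() is also True for full-width digits such as ０００１.
        if m and not m.group(1).isdigit():
            region = m.group(1)
        tagged.append((region, line))
    return tagged
```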

[0018] The heading order management unit 28 shown in FIG. 2(a) checks that headings occur in the valid order for the file type. Headings here include, for example, chapter numbers assigned in ascending numerical order, as in "Chapter 1", "1.1 Structure of the Manual", "1.2 Handling Precautions", "1.3 Performance Specifications", or, in a patent specification, explicit lenticular-bracket headings whose order of appearance is prescribed by the filing format; validity is judged against this standard. To do so, the heading order management unit 28 is provided with a heading reference table 29 (or a means of generating ascending numbers) that serves as the reference for the headings input in sequence and for the file type (patent specification, instruction manual, etc.). Based on the relationship between consecutive headings, it determines whether a heading (or ascending number) appears out of order; if so, it outputs heading information 28titl with a warning tag attached along with the heading name, causing a warning to be displayed as shown in FIGS. 34(a) to 34(c).
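Checking heading order against a reference table might look like the following sketch. The `reference` list stands in for the heading reference table 29; the rule that headings may be skipped but never go backwards (and that a repeated heading counts as out of order) is an assumption, and the example headings are illustrative English labels rather than the actual filing-format table.

```python
def check_heading_order(headings, reference):
    """Return (heading, reason) warnings for unknown or out-of-order headings.

    reference: headings in their prescribed order for this file type.
    """
    rank = {name: i for i, name in enumerate(reference)}
    warnings = []
    last = -1  # rank of the last heading accepted as in order
    for name in headings:
        r = rank.get(name)
        if r is None:
            warnings.append((name, "unknown heading"))
        elif r <= last:
            warnings.append((name, "out of order"))
        else:
            last = r
    return warnings
```

For numbered chapters such as "1.1", "1.2", "1.3", the same idea applies with a generated ascending sequence in place of a fixed table.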

[0019] The comment deletion unit 32 shown in FIG. 2(a) handles content that is not part of the actual text: special internal comments that word-processing software lets the writer embed as proofreading notes but that are not output in the final product (document), and information other than characters, such as image data, line-drawing data, figure data, and link data to external figure files. File formats include a plain text format and word-processor-specific binary formats. Binary formats either embed figures and image data in the text as a single file, or keep them in a separate file and store link data pointing to it. When such non-text data is detected, it is deleted and not passed to the subsequent inspection. Some non-image data, however, stores figure numbers and the like in character-code form that can be extracted; in that case, figure number information 32zu may be output and used as a comparison reference when judging errors in the extracted codes, as described later.

[0020] Next, the intermediate code extraction unit 60 shown in FIG. 1 is described. The intermediate code extraction unit 60 mainly extracts code candidates that satisfy the code matching condition 62, together with their corresponding terms. As shown in FIG. 2(b), it consists of an intermediate code simple extraction unit 60b, a continuous code recognition processing unit 60c, a multiple-term recognition/extraction processing unit 60d, and an intermediate code extraction exclusion processing unit 60e.

[0021] The intermediate code simple extraction unit 60b performs a simple mechanical extraction based on the form of code words: it receives the code matching condition 62 shown in FIG. 3 and mechanically extracts and outputs the intermediate codes 102mid in the text that satisfy that condition. It also outputs the corresponding term part 101 located before each code. The code matching condition 62 eliminates unnecessary extraction and has the advantage of making the subsequent filter processing easier.
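A simple mechanical extraction of this kind might be sketched as follows. Because FIG. 3 is not reproduced here, the pattern `\d+[a-z]*` (digits with an optional lowercase suffix, such as "81" or "102mid") is only a hypothetical stand-in for the code matching condition 62, and the set of characters that ends a term part is also an assumption. Full-width digits are normalized to ASCII with NFKC first.

```python
import re
import unicodedata

# Hypothetical stand-in for code matching condition 62.
CODE_RE = re.compile(r"\d+[a-z]*")
# Characters assumed to mark the left boundary of a term part.
BOUNDARY_RE = re.compile(r"[\d\s、。，．,.()（）「」]")


def simple_extract(text, max_term=20):
    """Return (term_part, code) pairs for one line of text."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width ８１ -> 81
    pairs = []
    for m in CODE_RE.finditer(text):
        window = text[max(0, m.start() - max_term):m.start()]
        # Keep only what follows the last boundary character before the code.
        term = BOUNDARY_RE.split(window)[-1]
        pairs.append((term, m.group()))
    return pairs
```

On 基本フィルタ処理部８１と分野共通フィルタ処理部８２, this yields ("基本フィルタ処理部", "81") and ("と分野共通フィルタ処理部", "82"). The leading particle と ("and") in the second term part is exactly the kind of useless head word that the automatic index generation means later removes.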

[0022] As shown in the document example of FIG. 8(a), codes are sometimes written in succession, as a first code, a second code, and a third code following one term. For this case, the continuous code recognition processing unit 60c associates the same term part 101same immediately before the first code with each code in the sequence, treating it as each code's term part 101, and outputs the result. The continuous code recognition dictionary 60dic in FIG. 8(b) is an example of a dictionary for recognizing the various ways continuous codes are written, and FIG. 8(c) shows example text containing continuous codes that this dictionary can recognize. This has the advantage that continuous code descriptions can also be inspected for writing errors.
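Sharing one term across a run of codes can be sketched with a regular expression built from a separator list. The list below is a hypothetical stand-in for the continuous code recognition dictionary 60dic (its actual contents in FIG. 8(b) are not reproduced here), and the code pattern is the same assumed `\d+[a-z]*` form as elsewhere. Only runs of two or more codes are matched; single codes are left to simple extraction.

```python
import re
import unicodedata

# Hypothetical separators that may join codes written in succession.
SEPARATORS = ["、", ",", "及び", "および", "and", "or"]
SEP = "|".join(re.escape(s) for s in SEPARATORS)
CODE = r"\d+[a-z]*"
# A term (letters only) followed by two or more codes joined by separators.
RUN_RE = re.compile(rf"(?P<term>[^\W\d_]+)\s*(?P<run>{CODE}(?:\s*(?:{SEP})\s*{CODE})+)")


def expand_runs(text):
    """Give every code in a run such as 'amplifiers 11, 12 and 13' the shared term."""
    text = unicodedata.normalize("NFKC", text)  # full-width digits -> ASCII
    pairs = []
    for m in RUN_RE.finditer(text):
        for code in re.findall(CODE, m.group("run")):
            pairs.append((m.group("term"), code))
    return pairs
```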

【００２３】複数用語認識抽出処理部６０ｄは、括弧付
けした定義用語を複数用語（第２／第３用語）として取
り出す。文書の記述形態の中には、中間符号１０２mid
手前にある用語対象２０１yogo（これを第１用語２０１
yogoとする）が符号本来に対応する用語であるが、これ
に加えて括弧を付けて同義語を定義して他で符号と共に
記述使用される。これに対応して複数用語を認識するも
のである。この複数用語としては、図１１（ａ）の文書
例に示すように中間符号１０２mid手前にある第１用語
２０１yogoとこの間に括弧を付けて第２用語２０２yogo
を定義宣言したり、あるいは符号直後に括弧を付けて第
２用語２０２yogoを定義宣言する文章形態がある。これ
ら記述方法は一般的に良く使用される。この第２用語２
０２yogoや第３用語２０３yogoは第１用語２０１yogoに
対する同義語あるいは定義語、簡略語あるいは英語スペ
ルが記載される。この場合の記述文章は執筆者にとって
正常な符号と用語の記述であり、誤記と判定させないこ
とが望まれる。そこで複数用語認識抽出処理部では、第
１用語２０１yogoと共に括弧内に定義記載されている第
２用語２０２yogo、あるいは第３用語２０３yogoを取り
出し、複数用語としての対応付けをする用語識別マーク
２１２mrk、２１３mrkを付けて後処理時の誤記判定しな
いようにする。この定義宣言文は、通常第２用語２０２
yogoの１つを定義使用される場合が多いが、複数箇所で
記載可能なので多数第Ｎ用語２０Ｎyogoが登録される場
合もあり各々個別の識別を付ける。この複数用語認識手
段によって同一中間符号１０２midに対して複数用語が
出現しても誤記と判断する誤りを無くする利点が得ら
れ、無用なエラー表示出力を無くする効果が得られる。The multiple-term recognition extraction processing unit 60d extracts defined terms given in parentheses as multiple terms (second and third terms). In some documents, the term object 201yogo in front of the intermediate code 102mid (called the first term 201yogo) is the term that properly corresponds to the code, but a synonym is additionally defined in parentheses and then used together with the code elsewhere. The unit recognizes multiple terms to handle this. As shown in the document example of FIG. 11(a), a second term 202yogo may be defined and declared in parentheses between the first term 201yogo and the intermediate code 102mid, or in parentheses immediately after the code. Both writing styles are common. The second term 202yogo and the third term 203yogo give a synonym, definition, abbreviation, or English spelling of the first term 201yogo. Such text is a normal description of a code and its terms from the author's point of view and should not be judged as an error. The multiple-term recognition extraction processing unit therefore extracts the second term 202yogo or third term 203yogo defined in parentheses together with the first term 201yogo and attaches the term identification marks 212mrk and 213mrk, which associate them as multiple terms, so that they are not judged as errors in later processing. Such a definition usually declares just one second term 202yogo, but because definitions can be written in several places, many N-th terms 20Nyogo may be registered, each with its own identifier. This multiple-term recognition means has the advantage of avoiding false error judgments when several terms appear for the same intermediate code 102mid, and thus eliminates unnecessary error display output.
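The two parenthesized forms described above can be recognized with a pair of regular expressions. The sketch below is a hypothetical illustration: the term pattern (a kanji/katakana run), the full-width or ASCII parentheses, and the use of list position in place of the marks 212mrk/213mrk are all assumptions.

```python
import re

CODE = r"[A-Za-z]{0,2}[0-9]{1,4}[A-Za-z]?"
TERM = r"[\u4e00-\u9fff\u30a0-\u30ff]+"       # kanji/katakana run (assumption)
PAREN = r"[（(]([^）)]+)[）)]"                  # contents of one parenthesis pair

# Form 1: first term, (second term), code   e.g. 基準抵抗（リファレンス抵抗）R12
BEFORE_RE = re.compile(rf"({TERM}){PAREN}({CODE})")
# Form 2: first term, code, (second term)   e.g. 出力端子T3（OUT）
AFTER_RE = re.compile(rf"({TERM})({CODE}){PAREN}")

def extract_multiple_terms(text):
    """Map each code to [first term, second term, ...]; the list index stands
    in for the term identification marks (212mrk, 213mrk, ...)."""
    terms = {}

    def add(code, first, extra):
        entry = terms.setdefault(code, [first])
        if extra not in entry:          # the same definition may recur
            entry.append(extra)

    for m in BEFORE_RE.finditer(text):
        first, extra, code = m.groups()
        add(code, first, extra)
    for m in AFTER_RE.finditer(text):
        first, code, extra = m.groups()
        add(code, first, extra)
    return terms
```

Any term recorded for a code this way can then be accepted later without raising a writing-error warning.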

【００２４】ところで前記第２用語、第３用語の記載と
して上述したように用語のみを直接括弧内に記述する記
述形態と、図１１（ｂ）に示すように定義宣言文章形態
で用語を記述する記述形態とがある。この定義宣言文章
形態の場合には無用な宣言語句部分を除外して用語を抽
出する必要がある。この為、図１１（ｃ）に示す２種類
の宣言用語フィルタ辞書１２７dicを設けて前側接頭語
句の指示代名詞相当部分に対しては前フィルタ辞書で削
除し、後側接尾語句の宣言述語相当部分に対しては後フ
ィルタ辞書で削除して、残った部分を第２用語と見なし
て抽出処理する。無論この中に中間符号１０２mid記述
部分がある場合はこの符号部分も削除した用語にする。As noted above, the second and third terms may be written either as bare terms directly inside the parentheses or, as shown in FIG. 11(b), in the form of a definition declaration sentence. In the declaration form, the unnecessary declaration phrases must be removed before the term can be extracted. For this purpose, the two kinds of declaration term filter dictionaries 127dic shown in FIG. 11(c) are provided: the front filter dictionary deletes the leading phrase that corresponds to a demonstrative pronoun, the rear filter dictionary deletes the trailing phrase that corresponds to a declarative predicate, and the remaining part is regarded as the second term and extracted. If the remaining part contains a written intermediate code 102mid, that code portion is also deleted from the term.
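The front/rear filtering can be sketched as prefix and suffix stripping. The dictionary entries below are hypothetical placeholders (the real ones are in FIG. 11(c)); longer entries are tried first so that, for example, "以下、" wins over "以下".

```python
import re

# Hypothetical entries for the front and rear filter dictionaries 127dic.
FRONT_FILTER = ["以下、", "以下", "これを", "この"]
REAR_FILTER = ["と称する", "と呼ぶ", "という", "とする"]
CODE_RE = re.compile(r"[A-Za-z]{0,2}[0-9]{1,4}[A-Za-z]?")

def strip_declaration(inner: str) -> str:
    """Reduce a parenthesized definition declaration to the bare term."""
    text = inner.strip()
    for prefix in sorted(FRONT_FILTER, key=len, reverse=True):
        if text.startswith(prefix):
            text = text[len(prefix):]
            break
    for suffix in sorted(REAR_FILTER, key=len, reverse=True):
        if text.endswith(suffix):
            text = text[: -len(suffix)]
            break
    # Any intermediate code written inside the declaration is also removed.
    return CODE_RE.sub("", text).strip()
```

For instance, a declaration such as "以下、基準抵抗R12という" reduces to "基準抵抗".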

【００２５】中間符号抽出除外処理部６０ｅは、図２３
に示す明示的除外用語辞書６０dicを使用して中間符号
１０２mid直前にある用語を参照し、辞書と一致するも
のは明示的な除外用語と見なして中間符号１０２midか
ら除外処理する。この明示的除外用語辞書６０dicは、
全分野に適用する除外辞書であり、直前用語基本フィル
タ辞書６０dic1と、直後用語基本フィルタ辞書６０dic2
とで成る。図２３（ａ）に示す直前用語基本フィルタ辞
書６０dic1により中間符号１０２mid直前にある用語が
これに該当する場合は明らかに無用なノイズ符号１０６
nzとして除外する。また図２３（ｂ）に示す直後用語基
本フィルタ辞書６０dic2により中間符号１０２mid直後
にある用語がこれに該当する場合も明らかに無用なノイ
ズ符号１０６nzとして除外する。これにより明らかに無
用なノイズ符号を分類して除外することで以後の分類処
理が容易になる利点が得られ、無用の推敲作業を軽減す
る効果が得られる。無論利用者（執筆者／推敲者等）
は、この明らかに除外すべき明示的除外用語辞書６０di
c内容を所望に適宜編集（追加／変更／削除）して使用
可能である。これらにより、中間符号１０２midの出力
とともに、文章中における中間符号１０２midのバイト
位置情報６１posと、更に文章ファイル入力部２０で得
た位置情報や行番号情報２４gyouや領域情報２６areaや
見出し情報２８titlも各符号と関連付けして中間符号格
納部６８へ保存し、以後の処理に利用する。The intermediate code extraction exclusion processing unit 60e uses the explicit exclusion term dictionary 60dic shown in FIG. 23 to check the term next to each intermediate code 102mid; codes whose neighboring term matches the dictionary are treated as explicit exclusions and removed from the intermediate codes 102mid. This explicit exclusion term dictionary 60dic applies to all fields and consists of the immediately-preceding-term basic filter dictionary 60dic1 and the immediately-following-term basic filter dictionary 60dic2. If the term immediately before an intermediate code 102mid matches the immediately-preceding-term basic filter dictionary 60dic1 shown in FIG. 23(a), the code is excluded as a clearly useless noise code 106nz. Likewise, if the term immediately after an intermediate code 102mid matches the immediately-following-term basic filter dictionary 60dic2 shown in FIG. 23(b), the code is also excluded as a clearly useless noise code 106nz. Classifying and removing clearly useless noise codes in this way makes the subsequent classification easier and reduces unnecessary editing work. Users (authors, editors, and so on) can of course edit (add, change, or delete) the contents of the explicit exclusion term dictionary 60dic as desired. Along with each intermediate code 102mid that is output, the byte position information 61pos of the code in the text and the position information, line number information 24gyou, area information 26area, and heading information 28titl obtained by the text file input unit 20 are associated with the code and stored in the intermediate code storage unit 68 for use in later processing.
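The exclusion step and the bookkeeping of position data can be sketched as below. The dictionary entries are hypothetical (FIG. 23 has the real ones), and a character offset stands in for the byte position 61pos, since byte offsets depend on the file encoding.

```python
import re

CODE_RE = re.compile(r"[A-Za-z]{0,2}[0-9]{1,4}[A-Za-z]?")
# Hypothetical entries for 60dic1 (before a code) and 60dic2 (after a code).
PRE_EXCLUDE = ("図", "表", "第", "約")
POST_EXCLUDE = ("年", "月", "日", "個", "%", "％")

def extract_with_exclusion(text):
    """Split code-shaped words into kept intermediate codes and noise codes,
    recording a character offset and a 1-based line number for each."""
    kept, noise = [], []
    for m in CODE_RE.finditer(text):
        before, after = text[: m.start()], text[m.end():]
        line_no = text.count("\n", 0, m.start()) + 1
        record = {"code": m.group(), "pos": m.start(), "line": line_no}
        if before.endswith(PRE_EXCLUDE) or after.startswith(POST_EXCLUDE):
            noise.append(record)   # clearly useless noise code 106nz
        else:
            kept.append(record)    # remains an intermediate code 102mid
    return kept, noise
```

Keeping the position data with each record is what later allows per-area counting and jumping back to the source line.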

【００２６】図１に示す条件設定部４０は、各種設定条
件の設定テーブルであり、利用者からの設定を受けて初
期条件を任意に変更可能であり、以下に説明する符号適
合条件６２も含んで各処理部に実行制御条件を供給す
る。The condition setting unit 40 shown in FIG. 1 is a table of various setting conditions. Its initial conditions can be changed freely by the user, and it supplies execution control conditions, including the code matching condition 62 described below, to each processing unit.

【００２７】ここで図３の符号適合条件６２について詳
細説明する。符号適合条件６２は、条件設定部４０の設
定スイッチの一部分であり、目的符号として抽出すべき
符号ワード形態に対する設定条件があり個別任意に設定
できる。各項目において全角数値設定とは、全角文字を
抽出するか否かのスイッチ（ＯＮ／ＯＦＦ）であり、半
角数値設定とは半角文字を抽出するか否かのスイッチで
あり、数値前英字数設定とは数値直前にある英文字数が
所定文字数（０〜７）以内のものを抽出対象とし、数値
後英字数設定とは数値直後にある英文字数が所定文字数
（０〜１０）以内のものを抽出対象とし、数値前全角英
字設定とは数値直前の全角英字の場合に抽出するか否か
のスイッチであり、数値後全角英字設定とは数値直後の
全角英字の場合に抽出するか否かのスイッチであり、数
値前半角英字設定とは数値直前の半角英字の場合に抽出
するか否かのスイッチであり、数値後半角英字設定とは
数値直後の半角英字の場合に抽出するか否かのスイッチ
である。更に数値の範囲とは数値文字集合が１〜９９９
９の範囲の任意範囲（複数領域に分割可能）を抽出有効
とし、奇数偶数指定とは符号数値が奇数のものを有効と
するか偶数のものを有効とするか両方共有効とするかの
指定であり、行範囲とはファイルの行番号（あるいは頁
／行番号）が１〜９９９９９（あるいは指定頁の指定
行）の任意範囲（複数領域に分割可能）を抽出有効と
し、見出し領域指定とは指定した一致見出し文字列（章
番号も見出しとみなす）と一致する領域（複数指定可
能）を抽出有効とし、行範囲とはファイルの行番号（あ
るいは頁／行番号）が１〜９９９９９（あるいは指定頁
の指定行）の任意範囲（複数領域に分割可能）を抽出有
効とし、上付数値設定とは上付数値を抽出するか否かの
スイッチであり、下付数値設定とは下付数値を抽出する
か否かのスイッチであり、上付英字設定とは上付英字を
抽出するか否かのスイッチであり、下付英字設定とは下
付英字を抽出するか否かのスイッチであり、英大文字設
定とは英大文字を抽出するか否かのスイッチであり、英
小文字設定とは英小文字を抽出するか否かのスイッチで
ある。これら各項目条件に対応する記述例を図３の右側
の符号例に示す。The code matching condition 62 of FIG. 3 is described in detail here. The code matching condition 62 is part of the setting switches of the condition setting unit 40. It holds the conditions on the form of code word to be extracted as a target code, and each item can be set individually. The full-width numeral setting is an ON/OFF switch for whether full-width characters are extracted, and the half-width numeral setting is a switch for whether half-width characters are extracted. The letters-before-numeral count setting extracts codes with no more than a set number (0 to 7) of alphabetic letters immediately before the numeral, and the letters-after-numeral count setting extracts codes with no more than a set number (0 to 10) of letters immediately after it. The full-width letter before/after numeral settings are switches for whether codes with a full-width letter immediately before or after the numeral are extracted, and the half-width letter before/after numeral settings are the corresponding switches for half-width letters. The numeric range makes any range within 1 to 9999 (which may be split into several ranges) valid for extraction, and the odd/even setting specifies whether codes with odd values, even values, or both are valid. The line range makes any range of file line numbers from 1 to 99999 (or of page/line numbers, that is, specified lines of specified pages), again divisible into several ranges, valid for extraction. The heading area setting makes valid the areas (several may be specified) that match a specified heading string, with chapter numbers also treated as headings. The superscript and subscript numeral settings are switches for whether superscript or subscript numerals are extracted, the superscript and subscript letter settings do the same for letters, and the uppercase and lowercase settings are switches for whether uppercase or lowercase letters are extracted. Description examples for each of these items are shown in the code examples on the right side of FIG. 3.
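A subset of these switches can be modeled as a configuration object plus a predicate. The sketch below is hypothetical: it covers only the numeral width, letter counts, numeric range, parity and letter case items, and leaves out the full-width letter, line range, heading, superscript and subscript items. The field names and defaults are illustrative, not taken from FIG. 3.

```python
import re
from dataclasses import dataclass, field

@dataclass
class CodeMatchCondition:
    """Hypothetical subset of the code matching condition 62 of FIG. 3."""
    full_width_digits: bool = True
    half_width_digits: bool = True
    max_letters_before: int = 2      # source allows 0 to 7
    max_letters_after: int = 1       # source allows 0 to 10
    ranges: list = field(default_factory=lambda: [(1, 9999)])
    parity: str = "both"             # "odd", "even" or "both"
    uppercase: bool = True
    lowercase: bool = True

WORD_RE = re.compile(r"([A-Za-z]*)([0-9]+|[０-９]+)([A-Za-z]*)")

def matches(word: str, c: CodeMatchCondition) -> bool:
    """Return True if a candidate code word satisfies every enabled item."""
    m = WORD_RE.fullmatch(word)
    if not m:
        return False
    before, digits, after = m.groups()
    is_full = "０" <= digits[0] <= "９"
    if (is_full and not c.full_width_digits) or (not is_full and not c.half_width_digits):
        return False
    if len(before) > c.max_letters_before or len(after) > c.max_letters_after:
        return False
    letters = before + after
    if not c.uppercase and any(ch.isupper() for ch in letters):
        return False
    if not c.lowercase and any(ch.islower() for ch in letters):
        return False
    value = int(digits)              # int() also accepts full-width digits
    if not any(lo <= value <= hi for lo, hi in c.ranges):
        return False
    if c.parity == "odd":
        return value % 2 == 1
    if c.parity == "even":
        return value % 2 == 0
    return True
```

Allowing several `(lo, hi)` pairs in `ranges` mirrors the note that the numeric range may be split into several ranges.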

【００２８】次に図１に示す目的符号選別部８０を説明
する。目的符号選別部８０は、上記中間符号抽出部６０
からの中間符号１０２midを受けて、５段階の選別処理
により、中間符号１０２midの中から次の４分類の判定
処理を行う。即ち、ノイズ符号１０６nzとノイズ疑惑符
号１１６nzと疑惑符号１０７とこれ以外の目的符号１０
４objの４分類処理を行う。この処理構成は、図９に示
すように基本フィルタ処理部８１と、分野共通フィルタ
処理部８２と、ユーザ辞書処理部８４と、符号明示用語
処理部８６と、同一用語比較判定処理部８８とで成る。Next, the target code selecting section 80 shown in FIG. 1 will be described. The target code selection unit 80 includes the intermediate code extraction unit 60
In response to the intermediate code 102mid from the intermediate code 102mid, the following four types of determination processing are performed from the intermediate code 102mid by a five-stage selection process. That is, the noise code 106nz, the noise suspect code 116nz, the suspect code 107, and the other object code 10
Four classification processing of 4obj is performed. This processing configuration includes a basic filter processing unit 81, a field common filter processing unit 82, a user dictionary processing unit 84, a code explicit term processing unit 86, and an identical term comparison processing unit 88, as shown in FIG. Become.

【００２９】第１段階の基本フィルタ処理部８１は、図
４に示す基本フィルタ辞書６４dicを使用して分類処理
を行う。図４（ａ）は、中間符号１０２midの直前文字
列に対して分類分け対象にする直前用語基本フィルタ辞
書６４dic1の例であり、これに該当する文書例を図４
（ｂ）に示す。図４（ｃ）は、中間符号１０２midの直
後文字列に対して分類分け対象にする直後用語基本フィ
ルタ辞書６４dic2の例であり、これに該当する文書例を
図４（ｄ）に示す。そして符号直前と直後の一方が一致
した場合は疑惑符号１０７として分類し、符号前後共に
一致した場合はノイズ符号１０６nzとして分類処理をす
る。これらによって中間符号１０２midは、無用と思わ
れるノイズ符号１０６nzが更に除外分類される。無論利
用者は、この基本フィルタ辞書６４dic内容を所望に適
宜編集して使用可能である。ただし特許明細書の請求項
領域例のように、符号前後に「（」と「）」を付与する
特別条件があり、この領域では抽出対象と見なす。The first-stage basic filter processing unit 81 classifies codes using the basic filter dictionary 64dic shown in FIG. 4. FIG. 4(a) shows an example of the immediately-preceding-term basic filter dictionary 64dic1, which is applied to the character string immediately before the intermediate code 102mid, and FIG. 4(b) shows document examples that match it. FIG. 4(c) shows an example of the immediately-following-term basic filter dictionary 64dic2, which is applied to the character string immediately after the intermediate code 102mid, and FIG. 4(d) shows matching document examples. If only one of the strings immediately before and after the code matches, the code is classified as a suspect code 107; if both match, it is classified as a noise code 106nz. In this way, intermediate codes 102mid that appear to be useless are further separated out as noise codes 106nz. The user can of course edit the contents of the basic filter dictionary 64dic as needed. As a special condition, however, codes enclosed in "(" and ")", as in the claims area of a patent specification, are regarded as extraction targets in that area.
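The one-side/both-sides rule of the first stage fits in a few lines. The dictionary entries below are hypothetical placeholders for 64dic1/64dic2, and the claim-area exception is approximated by checking for parentheses directly around the code.

```python
from enum import Enum

class Cls(Enum):
    INTERMEDIATE = "102mid"
    SUSPECT = "107"
    NOISE = "106nz"

# Hypothetical entries for 64dic1 / 64dic2 (FIG. 4 gives the real ones).
BASIC_PRE = ("約", "全", "各")
BASIC_POST = ("個", "本", "回", "倍")

def basic_filter(before: str, after: str) -> Cls:
    """Stage 1: classify one intermediate code by its neighboring text."""
    # Claim-style codes such as "(12)" are kept as extraction targets.
    if before.endswith(("(", "（")) and after.startswith((")", "）")):
        return Cls.INTERMEDIATE
    hit_pre = before.endswith(BASIC_PRE)
    hit_post = after.startswith(BASIC_POST)
    if hit_pre and hit_post:
        return Cls.NOISE
    if hit_pre or hit_post:
        return Cls.SUSPECT
    return Cls.INTERMEDIATE
```

Returning a suspect class rather than discarding one-sided matches keeps borderline codes visible to the editor.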

【００３０】第２段階の共通フィルタ処理部８２では、
複数ある利用分野別の共通フィルタ辞書８３dicの中か
ら所望の辞書を１つあるいは複数選択して使用に供す
る。共通フィルタ辞書８３dicには、産業分野（機械、
化学、電気、電子）や文章ファイルの種類（マニュア
ル、解説書、辞書、明細書、公文書等）に特有な符号名
や特異な付与符号をフィルタ辞書に登録してある。図５
（ａ）に電気分野特有の直前フィルタ語句と直後フィル
タ語句とによる共通フィルタ辞書８３dic例を示す。上
記で説明した基本フィルタ辞書６４dicの場合と同様に
して符号直前と直後を対応するフィルタ辞書で符号前後
の一致比較をし、一方が一致した場合は疑惑符号１０７
として分類し、符号前後共に一致した場合はノイズ符号
１０６nzとして分類処理をする。ここで符号前後の一致
とは、上記第１段階の基本フィルタ辞書６４dicを含め
た前後一致を意味する。例えば基本フィルタ辞書６４di
cで直前用語が一致し、本ユーザ辞書８５dicで直後が一
致した場合、前後一致とみなす。これに該当するノイズ
符号１０６nzと疑惑符号１０７に分類される文書例を図
５（ｂ）に示す。ここの直前フィルタ辞書において「全
平仮名」というフィルタ記述は平仮名の何れかが検出された
場合を意味する。図５（ｃ）には分野共通フィルタ辞書
の他の例として機械分野特有のフィルタ辞書例とこれに
該当する文書例を図５（ｄ）に示す。更に図５（ｅ）に
は特許分野特有のフィルタ辞書例とこれに該当する文書
例を図５（ｆ）に示す。これらは、ファイル種により単
一分野あるいは複数分野別の分野共通フィルタ辞書８３
dicを単一あるいは複数引用して本フィルタ処理を実行
する。無論利用者は、この分野共通フィルタ辞書８３di
c内容を所望に適宜編集して使用可能である。The second-stage common filter processing unit 82 selects one or more desired dictionaries from the field common filter dictionaries 83dic, which are provided for each field of use. The field common filter dictionaries 83dic register code names and characteristic codes that are specific to an industrial field (machinery, chemistry, electrical, electronics) or to a type of text file (manuals, instruction books, dictionaries, specifications, official documents, etc.). FIG. 5(a) shows an example of a field common filter dictionary 83dic for the electrical field, consisting of immediately-preceding and immediately-following filter phrases. As with the basic filter dictionary 64dic described above, the strings immediately before and after the code are compared with the corresponding filter dictionaries: if only one side matches, the code is classified as a suspect code 107, and if both sides match, it is classified as a noise code 106nz. Here, a match on both sides includes matches found by the first-stage basic filter dictionary 64dic. For example, if the preceding term matches in the basic filter dictionary 64dic and the following term matches in this field common filter dictionary 83dic, the code is regarded as matching on both sides. FIG. 5(b) shows document examples classified as noise codes 106nz and suspect codes 107 in this way. In this immediately-preceding filter dictionary, the filter description "all hiragana" means that any hiragana character counts as a match. As further examples of field common filter dictionaries, FIG. 5(c) shows a filter dictionary specific to the mechanical field, with corresponding document examples in FIG. 5(d), and FIG. 5(e) shows a filter dictionary specific to the patent field, with corresponding document examples in FIG. 5(f). Depending on the file type, this filtering is performed by referring to one or more field common filter dictionaries 83dic for a single field or for several fields. The user can of course edit the contents of the field common filter dictionaries 83dic as desired.
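The second stage can be sketched as selecting dictionaries by field and combining their hits with the stage-1 hits. Everything concrete here is hypothetical: the field names, the entries, and the sentinel string used to represent the "all hiragana" filter description.

```python
# Hypothetical field dictionaries 83dic (FIG. 5 shows real examples).
ALL_HIRAGANA = "*HIRAGANA*"   # stands for the "全平仮名" filter description
FIELD_DICTS = {
    "electric": {"pre": ("電圧", "電流", ALL_HIRAGANA),
                 "post": ("ボルト", "アンペア", "ヘルツ")},
    "mechanical": {"pre": ("ねじ",), "post": ("ミリ", "回転")},
}

def pre_hit(before, entries):
    for e in entries:
        if e == ALL_HIRAGANA:
            if before and "\u3041" <= before[-1] <= "\u309f":
                return True
        elif before.endswith(e):
            return True
    return False

def post_hit(after, entries):
    return any(after.startswith(e) for e in entries)

def field_filter(before, after, fields, stage1_pre=False, stage1_post=False):
    """Stage 2: a side counts as matched if stage 1 (64dic) or any selected
    field dictionary matched it."""
    pre = stage1_pre or any(pre_hit(before, FIELD_DICTS[f]["pre"]) for f in fields)
    post = stage1_post or any(post_hit(after, FIELD_DICTS[f]["post"]) for f in fields)
    if pre and post:
        return "106nz"   # noise code
    if pre or post:
        return "107"     # suspect code
    return "102mid"      # left as an unclassified intermediate code
```

Passing a list of fields reflects the option of citing several field dictionaries for one file type.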

【００３１】第３段階のユーザ辞書処理部８４は、推敲
者個人別の随時カスタマイズして使用するユーザフィル
タ辞書８５dicによるフィルタ処理であり、図６（ａ）
の符号直前ユーザフィルタ辞書８５dic1と図６（ｃ）の
符号直後ユーザフィルタ辞書８５dic2とで辞書を構成す
る。これは、原稿執筆者の個性や経験や技術的背景によ
って定義符号文字の決め方が異なったり特徴的なユニー
クな符号記述を採用する場合が多々ある為、より的確に
フィルタ分類する為、利用者自身がファイル種に応じて
個々にフィルタ語句を随時登録して使用に供する個人用
のユーザフィルタ辞書８５dicである。ここでの処理
は、ユーザフィルタ辞書８５dicを利用する点を除けば
上記説明のフィルタ処理と同様であり、中間符号１０２
mid直前と直後の両方が一致した場合はノイズ符号１０
６nzとして識別処理し、一方のみが一致した場合はノイ
ズ疑惑符号１１６nzとして識別処理する。ここで符号前
後の一致とは、上記の第２段階の共通フィルタ辞書８３
dic及び第１段階の基本フィルタ辞書６４dicを含めた前
後一致を意味する。例えば共通フィルタ辞書８３dicで
直前用語が一致し、本ユーザフィルタ辞書８５dicで直
後が一致した場合、前後一致とみなす。これにより第１
段階処理後の疑惑符号１０７や中間符号１０２midの中
から更にノイズ符号１０６nzやノイズ疑惑符号１１６nz
として識別分類される。図６（ａ）は符号直前ユーザフ
ィルタ辞書例に該当する文書例を図６（ｂ）に示す。ま
た図６（ｃ）は符号直後ユーザフィルタ辞書例に該当す
る文書例を図６（ｄ）に示す。ところで、文書記述によ
っては非符号として明示的に除外させたい場合がある。
この場合の例としては、文書中に半角スペース文字ある
いは全角スペース文字をフィルタさせるべき文字列の直
前あるいは直後に意図的に記述しておき、これに対応す
る半角／全角スペース文字をユーザフィルタ辞書に登録
しておいて確実にフィルタ除外処理させる使用方法もあ
る。The third-stage user dictionary processing unit 84 performs filtering with the user filter dictionary 85dic, which each editor customizes and uses as needed. The dictionary consists of the code-preceding user filter dictionary 85dic1 of FIG. 6(a) and the code-following user filter dictionary 85dic2 of FIG. 6(c). The way defining code characters are chosen often differs with the manuscript author's personality, experience, and technical background, and authors frequently adopt distinctive code descriptions of their own. To classify such codes more accurately, the user filter dictionary 85dic is a personal dictionary in which users register filter phrases individually, as needed and according to the file type. Processing here is the same as the filtering described above except that the user filter dictionary 85dic is used: if the strings both immediately before and immediately after the intermediate code 102mid match, the code is identified as a noise code 106nz, and if only one matches, it is identified as a noise-suspect code 116nz. Here, a match on both sides includes matches found by the second-stage common filter dictionary 83dic and the first-stage basic filter dictionary 64dic. For example, if the preceding term matches in the common filter dictionary 83dic and the following term matches in this user filter dictionary 85dic, the code is regarded as matching on both sides. In this way, suspect codes 107 and intermediate codes 102mid remaining after the earlier stages are further identified and classified as noise codes 106nz or noise-suspect codes 116nz. FIG. 6(b) shows document examples matching the code-preceding user filter dictionary example of FIG. 6(a), and FIG. 6(d) shows document examples matching the code-following user filter dictionary example of FIG. 6(c). Depending on the document, the author may want a string to be explicitly excluded as a non-code. One way to do this is to deliberately write a half-width or full-width space immediately before or after the string to be filtered and to register the corresponding half-width or full-width space in the user filter dictionary, so that the string is reliably filtered out.
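The cumulative rule of the third stage, including the space-character trick, might look like this. The dictionary contents and the `earlier_pre`/`earlier_post` flags (carrying hits from the 64dic and 83dic stages) are hypothetical; the sketch assumes that a code with no user-dictionary hit keeps its earlier class.

```python
# Hypothetical personal dictionaries: 85dic1 (before a code), 85dic2 (after).
# Registering a half-width or full-width space lets an author force exclusion
# of a code by deliberately writing such a space next to it.
USER_PRE = ("型番", " ", "\u3000")
USER_POST = ("号機", " ", "\u3000")

def user_filter(before, after, earlier_pre=False, earlier_post=False):
    """Stage 3. earlier_pre/earlier_post carry the hits already found by the
    64dic and 83dic stages, so a match can be split across dictionaries."""
    user_pre = before.endswith(USER_PRE)
    user_post = after.startswith(USER_POST)
    if not (user_pre or user_post):
        return None                      # no user hit: class is left unchanged
    if (user_pre or earlier_pre) and (user_post or earlier_post):
        return "106nz"                   # noise code
    return "116nz"                       # noise-suspect code
```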

【００３２】第４段階の符号明示用語処理部８６は、図
２７（ａ、ｂ）に示す符号明示用語辞書８６dicを利用
して目的符号１０４objを直接特定する処理を行う。一
般に利用分野や執筆者にとって特徴的固有の用語を使用
する場合が多く存在する。この点に着目して図２７
（ａ）に示す符号直前明示用語辞書８６dic1と、図２７
（ｂ）に示す符号直後明示用語辞書８６dic2を設ける。
そして符号が明示辞書に該当する場合は目的符号１０４
objとして直接分類処理する。これに該当する文書例を
図２７（ｃ）に示す。このように明示用語辞書に該当す
る場合は直ちに目的符号１０４objとして分類処理させ
ることで、より一層認識率の向上が計れる利点が得られ
る。これとは逆に、文書中には特定の部品やＩＣ商品や
装置や勧告等の番号／規格名（例えばトランジスタ規格
名：２ＳＣ１７３）を多用する場合があり、これについ
ても明示的に削除したい場合がある。この為に図２６に
示すような規格名フィルタ辞書６１dicを設け、この辞
書に部分一致する符号に対しては無条件にノイズ符号１
０６nzと見なして除外処理する。この辞書において
「？」記号とは任意の単一あるいは複数の数値文字列
（０〜９９９９９）とする。The fourth-stage code explicit term processing unit 86 uses the code explicit term dictionary 86dic shown in FIGS. 27(a) and 27(b) to identify target codes 104obj directly. Terms characteristic of a particular field of use or author are often used. Taking advantage of this, the code-preceding explicit term dictionary 86dic1 shown in FIG. 27(a) and the code-following explicit term dictionary 86dic2 shown in FIG. 27(b) are provided, and a code that matches an explicit dictionary is classified directly as a target code 104obj. FIG. 27(c) shows document examples of this case. Immediately classifying codes that match the explicit term dictionary as target codes 104obj further improves the recognition rate. Conversely, documents often make heavy use of numbers or standard names for specific parts, IC products, devices, recommendations, and the like (for example, the transistor standard name 2SC173), and these may also need to be removed explicitly. For this purpose, the standard name filter dictionary 61dic shown in FIG. 26 is provided, and any code that partially matches this dictionary is unconditionally regarded as a noise code 106nz and excluded. In this dictionary, the "?" symbol stands for any numeric string of one or more digits (0 to 99999).
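One way to read the "?" wildcard is to translate each dictionary entry into a regular expression. The sketch below assumes "?" means 1 to 5 digits and that the whole alphanumeric token around the code is available for the partial match; the dictionary entries and `stage4` are hypothetical. Checking standard names before explicit terms is a design choice made here, since the source calls the standard-name exclusion unconditional.

```python
import re

# Hypothetical entries; FIG. 26 and FIG. 27 show the real dictionaries.
STANDARD_NAMES = ["2SC?", "2SA?", "2SK?"]
EXPLICIT_PRE = ("端子", "抵抗", "スイッチ")
EXPLICIT_POST = ("端子",)

def compile_standard(pattern: str) -> re.Pattern:
    """Turn a 61dic entry into a regex; "?" means 1 to 5 digits (0 to 99999)."""
    parts = [re.escape(p) for p in pattern.split("?")]
    return re.compile("[0-9]{1,5}".join(parts))

STANDARD_RES = [compile_standard(p) for p in STANDARD_NAMES]

def stage4(token, before, after):
    """Stage 4: noise 106nz for standard names, target 104obj for explicit terms."""
    if any(r.search(token) for r in STANDARD_RES):
        return "106nz"
    if before.endswith(EXPLICIT_PRE) or after.startswith(EXPLICIT_POST):
        return "104obj"
    return None  # the class from earlier stages is left unchanged
```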

【００３３】第５段階の同一用語比較判定処理部８８
は、抽出された用語を利用し、この用語から逆に全文に
対してサーチし、検出されたものを誤記２００errと分
類して取り上げる誤記検出網羅手段であり、第１に符号
誤記検出手段と、第２に無符号誤記検出手段がある。こ
こで符号の記述抜けや誤記によって中間符号１０２mid
として網羅検出できない場合がある。正常な場合を図１
９（ａ）の「Ｒ１２」とし、図３に示す符号適合条件６
２に合致した文書例とする。この例では直前の用語対象
２０１yogo「基準抵抗」が図１９（ｂ）の誤記例にも同
一記述されている場合と仮定する。ここで用語対象２０
１yogo （例「基準抵抗」）の取り出しは、図１５
（ａ）に示す用語先頭側の削除語句辞書１５０dic1と先
頭側非削除平仮名辞書１５０dic2を使用して手前側の不
要語句を削除して適切なる用語対象２０１yogoを取り出
す。一方、図１９（ｂ）の文書例で「Ｒ12」は全角／半
角数値の符号適合条件６２設定により中間符号１０２mi
dとして抽出されなかった文書例と仮定する。また他の
文書例としては符号自体の記述抜けの文書例を図１９
（ｃ）に示し、この場合も中間符号１０２midとして抽
出されない。The fifth-stage identical term comparison/determination processing unit 88 is a means for comprehensively detecting writing errors: it takes each extracted term, searches the whole text for that term, and classifies what it finds as writing errors 200err. It comprises, first, a code error detection means and, second, an uncoded error detection means. An omitted or mistyped code may prevent the code from being detected as an intermediate code 102mid at all. As the normal case, take "R12" in FIG. 19(a), a document example that satisfies the code matching condition 62 shown in FIG. 3, and assume that the preceding term object 201yogo, "基準抵抗" (reference resistor), also appears in the error example of FIG. 19(b). The term object 201yogo (for example, "基準抵抗") is taken out by using the term-head deletion phrase dictionary 150dic1 and the head-side non-deleted hiragana dictionary 150dic2 shown in FIG. 15(a) to delete unnecessary leading phrases, leaving an appropriate term object 201yogo. Meanwhile, in the document example of FIG. 19(b), assume that "Ｒ12" was not extracted as an intermediate code 102mid because of the full-width/half-width numeral settings of the code matching condition 62. As another example, FIG. 19(c) shows a document in which the code itself has been omitted; in this case too, nothing is extracted as an intermediate code 102mid.
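The term take-out with 150dic1 and 150dic2 can be sketched under one hypothetical reading: scan backward from the code, stop at punctuation or at any hiragana not listed as "non-deleted", then strip a listed leading phrase. The dictionary contents and this scanning rule are assumptions; FIG. 15(a) defines the actual dictionaries.

```python
# Hypothetical reading of 150dic1 (leading phrases to delete) and
# 150dic2 (hiragana that may remain inside a term).
DELETE_HEAD = ("前記", "上記", "所定の")
KEEP_HIRAGANA = {"の"}
STOP_CHARS = set("、。，．（）() \n\u3000")

def term_before(before: str) -> str:
    """Take out the term object just before a code from the preceding text."""
    i = len(before)
    while i > 0:
        ch = before[i - 1]
        if ch in STOP_CHARS:
            break
        if "\u3041" <= ch <= "\u309f" and ch not in KEEP_HIRAGANA:
            break
        i -= 1
    term = before[i:]
    for head in DELETE_HEAD:
        if term.startswith(head):
            term = term[len(head):]
            break
    return term
```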

【００３４】そこで前記図１９（ｂ）の誤記文書例を抽
出する為に、第１の符号誤記検出手段では、中間符号１
０２midから除外された抽出対象符号１００allの中から
誤記符号を取り出す。この為には図３に示す符号適合条
件６２の制限を一時的に最大限に拡大して符号抽出を行
い、この結果図１９（ｂ）に示す符号が中間符号として
検出される。この新たな中間符号に対して、この直前の
用語が図１９（ａ）の用語対象２０１yogoと一致するか
を調べ、一致する場合は符号誤記８８missと判断して分
類出力し、後述する注目表示を可能にする。この検出
手段を設けることにより符号自体の誤記が検出可能にな
る利点が得られる。To extract the erroneous document example of FIG. 19(b), the first means, code error detection, picks out miswritten codes from among the extraction target codes 100all that were excluded from the intermediate codes 102mid. To do so, the restrictions of the code matching condition 62 shown in FIG. 3 are temporarily relaxed as far as possible and code extraction is repeated, so that the code shown in FIG. 19(b) is detected as an intermediate code. For each such new intermediate code, it is checked whether the term immediately before it matches the term object 201yogo of FIG. 19(a); if it does, the code is judged to be a code error 88miss, classified and output, and made available for the attention display described later. This detection means has the advantage that writing errors in the code itself can be detected.
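The widen-and-compare idea can be sketched with two regular expressions. Here a hypothetical strict pattern stands in for the normal code matching condition 62 and a widened pattern (also accepting full-width letters and digits) for its temporarily relaxed version; the term rule is the same kanji/katakana assumption as before.

```python
import re

# Hypothetical strict condition 62: ASCII letter(s) + half-width digits.
STRICT_RE = re.compile(r"[A-Za-z]{1,2}[0-9]{1,4}")
# The same condition temporarily widened: full-width letters/digits allowed.
RELAXED_RE = re.compile(r"[A-Za-zＡ-Ｚａ-ｚ]{1,2}[0-9０-９]{1,4}")
TERM_RE = re.compile(r"[\u4e00-\u9fff\u30a0-\u30ff]+$")

def preceding_term(text, start):
    m = TERM_RE.search(text[:start])
    return m.group() if m else ""

def find_code_misses(text):
    """Codes found only under the widened condition whose preceding term is
    already known from a normal code are reported as code errors (88miss)."""
    strict = list(STRICT_RE.finditer(text))
    strict_starts = {m.start() for m in strict}
    known_terms = {preceding_term(text, m.start()) for m in strict} - {""}
    misses = []
    for m in RELAXED_RE.finditer(text):
        if m.start() in strict_starts:
            continue                     # already a normal intermediate code
        if preceding_term(text, m.start()) in known_terms:
            misses.append(m.group())
    return misses
```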

【００３５】次に前記図１９（ｃ）の誤記文書例に対応
する為に、第２の無符号誤記検出手段では、他で用語対
象２０１yogoが正しく記載されている場合と仮定し、こ
の用語を取り出す。この為には既に得た全ての用語対象
２０１yogoを使用して全文検索実施し、図１９（ｃ）に
示すように一致用語の直後が無符号のものを見出し、こ
れを無符号誤記８９missと判断して分類出力し、後述す
る注目表示を可能にする。この検出手段を設けること
により符号記述欠落の誤記に対しても検出可能になる利
点が得られ、これらにより一層推敲作業が容易になる利
点が得られる。Next, to handle the erroneous document example of FIG. 19(c), the second means, uncoded error detection, assumes that the term object 201yogo is written correctly elsewhere and takes that term out. A full-text search is performed using every term object 201yogo already obtained; as shown in FIG. 19(c), any occurrence of a matching term that is not immediately followed by a code is found, judged to be an uncoded error 89miss, classified and output, and made available for the attention display described later. This detection means makes it possible to detect errors in which a code description is missing, which makes editing easier still.
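The full-text search for terms that lack a following code can be sketched as below; the code-start pattern is a hypothetical simplification. Note that, as written, a term followed by a parenthesized second term would also be reported, so a practical version would consult the multiple-term information first.

```python
import re

# Hypothetical test for "a code starts here" (half- or full-width).
CODE_START = re.compile(r"[A-Za-zＡ-Ｚａ-ｚ]{0,2}[0-9０-９]")

def find_uncoded(text, terms):
    """Return (term, position) for each term occurrence not followed by a code."""
    misses = []
    for term in terms:
        for m in re.finditer(re.escape(term), text):
            if not CODE_START.match(text, m.end()):
                misses.append((term, m.start()))   # uncoded error 89miss
    return sorted(misses, key=lambda t: t[1])
```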

【００３６】上記各分類手段に加え、上記で分類された
ノイズ符号１０６nzやノイズ疑惑符号１１６nzの発生回
数が複数の場合は、目的符号１０４objの確率が高い。
これを利用して直前の用語部分１０１の一致部分が有る
場合は目的符号１０４objと見なす判定処理が可能であ
る。即ち、この実行選択と同一符号回数Ｍを条件設定部
４０の設定スイッチに設けておき、実行すべき場合は、
各個別符号名毎のノイズ符号１０６nzやノイズ疑惑符号
１１６nzの直前にある用語部分１０１をアンド処理し、
結果の残用語が２文字以上あり、かつその同一符号回数
がＭ回以上の場合はこれを目的符号１０４objとする見
なし処理を行う。これにより目的符号１０４objとして
の識別判定ができ、無用の警告表示を軽減できる利点が
得られることとなる。In addition to the classification means above, when a noise code 106nz or noise-suspect code 116nz classified above occurs more than once, it is likely to be a target code 104obj. Using this, if the term parts 101 immediately before those occurrences share a common part, the code can be judged to be a target code 104obj. Specifically, a switch selecting whether to execute this processing and an identical-code count M are provided among the setting switches of the condition setting unit 40. When the processing is enabled, the term parts 101 immediately before the noise codes 106nz and noise-suspect codes 116nz of each individual code name are ANDed together, and if the remaining common term is two or more characters long and the same code occurs M or more times, the code is regarded as a target code 104obj. This allows such codes to be identified as target codes 104obj and has the advantage of reducing unnecessary warning displays.
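The source does not define the AND operation precisely. The sketch below assumes it means the longest common trailing part of the preceding term parts (terms end right at the code), with the two-character minimum and the count M applied as described; `common_suffix` and `promote_repeated` are hypothetical names.

```python
def common_suffix(words):
    """Longest string that every word ends with (assumed meaning of "AND")."""
    if not words:
        return ""
    shortest = min(words, key=len)
    n = 0
    while n < len(shortest) and all(w[-1 - n] == shortest[-1 - n] for w in words):
        n += 1
    return shortest[len(shortest) - n:]

def promote_repeated(occurrences, m_threshold=3):
    """occurrences: {code name: [term part before each noise/noise-suspect hit]}.
    Returns {code name: shared term} for codes to be treated as 104obj."""
    promoted = {}
    for code, terms in occurrences.items():
        shared = common_suffix(terms)
        if len(shared) >= 2 and len(terms) >= m_threshold:
            promoted[code] = shared
    return promoted
```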

【００３７】また上記各分類手段に加え、上記で分類さ
れたノイズ符号１０６nzやノイズ疑惑符号１１６nzの中
で、後述する抽出用語表示部分１０１dsp部分の末尾側
からの文字列において、連続漢字／片仮名文字数が多い
場合は目的符号１０４objの確率が高く、目的符号１０
４objと見なす判定処理としても良い場合がある。そこ
で、この実行選択の制御と、末尾側連続漢字文字数Ｐあ
るいは末尾側連続片仮名文字数Ｑあるいは末尾側連続漢
字／片仮名混在文字数Ｒの各条件を条件設定部４０の設
定スイッチに設けておき、実行すべき場合は、各連続文
字数条件Ｐ、Ｑ、Ｒに該当する末尾側文字列数であるか
を検査し、該当する場合はこれを目的符号１０４objと
する見なし処理を行う。これにより目的符号１０４obj
としての識別判定ができ、無用の警告表示を更に軽減で
きる利点が得られることとなる。Also in addition to the classification means above, among the noise codes 106nz and noise-suspect codes 116nz classified above, a code whose extracted term display part 101dsp (described later) ends in many consecutive kanji or katakana characters is likely to be a target code 104obj, and in some cases it is appropriate to judge it as one. For this purpose, a switch controlling whether this processing is executed, together with the conditions P (number of consecutive kanji at the end), Q (number of consecutive katakana at the end), and R (number of consecutive mixed kanji/katakana characters at the end), is provided among the setting switches of the condition setting unit 40. When the processing is enabled, the trailing string is checked against each of the consecutive-character conditions P, Q, and R, and if it satisfies one of them, the code is regarded as a target code 104obj. This allows identification as a target code 104obj and has the advantage of further reducing unnecessary warning displays.
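The P/Q/R check reduces to counting a trailing run of characters. The threshold defaults below are hypothetical, as is the reading that "many" means "at least the configured count"; kanji and katakana are identified by their main Unicode blocks.

```python
def is_kanji(ch):
    return "\u4e00" <= ch <= "\u9fff"

def is_katakana(ch):
    return "\u30a0" <= ch <= "\u30ff"   # includes the long-vowel mark "ー"

def trailing_run(s, pred):
    """Number of consecutive characters at the end of s satisfying pred."""
    n = 0
    for ch in reversed(s):
        if not pred(ch):
            break
        n += 1
    return n

def looks_like_target(term_display, p=3, q=4, r=4):
    """Apply conditions P, Q and R to the tail of the term display 101dsp."""
    return (trailing_run(term_display, is_kanji) >= p
            or trailing_run(term_display, is_katakana) >= q
            or trailing_run(term_display, lambda c: is_kanji(c) or is_katakana(c)) >= r)
```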

【００３８】上記各分類手段により分類されたノイズ符
号１０６nzやノイズ疑惑符号１１６nzや疑惑符号１０７
や目的符号１０４objには、後で領域別一覧表示させる
為に領域別毎、かつ同一符号毎に発生回数を計数する。
ただし第２／第３用語等の用語識別マークのついた中間
符号は同一符号であっても個別に発生回数を計数する。For the noise codes 106nz, noise-suspect codes 116nz, suspect codes 107, and target codes 104obj classified by the above means, the number of occurrences is counted per area and per identical code so that they can later be listed by area. However, intermediate codes carrying a term identification mark, such as those for second or third terms, are counted separately even when the code is the same.
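Counting per code and per area, with marked codes kept apart, can be done by keying on the pair (code, term mark). The record layout and the area names are hypothetical.

```python
from collections import Counter, defaultdict

def count_occurrences(records):
    """records: iterable of (code, term_mark, area) tuples. A code carrying a
    term identification mark (e.g. "212mrk") is counted separately from the
    same code without one."""
    table = defaultdict(Counter)
    for code, mark, area in records:
        table[(code, mark)][area] += 1
    return table
```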

【００３９】次に図１に示す定義符号抽出部９０を説明
する。定義符号抽出部９０は、オプションであり、図１
７の例に示す「符号の説明」領域欄が有る場合、ここに
記述されている符号と用語を基準の比較参照用として得
る為に、参照符号３０１とこれに対応する参照用語３０
２を取り出し符号集計表示部１０に供給する。更に、こ
の「符号の説明」領域にある符号名と一致する中間符号
１０２midは明白なる目的符号１０４objとして直接分類
処理し、この符号が疑惑符号やノイズ符号と分類されて
いても優先分類する。他文書ファイルでは、一般に巻末
等にインデックス一覧が付与されている場合があり、こ
のインデックス領域に載っている符号対象を同様の手段
で参照符号３０１と参照用語３０２を取り出す。また、
図２に示すコメント削除部３２が出力する図番号情報３
２zuがある場合は、これから得られる場合は、同様に処
理し、併せて符号集計表示部１０に供給する。また、符
号適合条件６２の設定によっては中間符号１０２midと
して抽出されない場合がある。この為のオプション動作
として、ここで取り出した参照符号３０１を基準にして
中間符号１０２mid中に存在しない場合は、全文章に渡
ってサーチし、見出した同一符号名を中間符号１０２mi
dかつ目的符号１０４objとして直ちに分類処理しても良
い。Next, the definition code extraction unit 90 shown in FIG. 1 is described. The definition code extraction unit 90 is optional. When the document has an "Explanation of Reference Numerals" (符号の説明) area such as the one in the example of FIG. 17, it takes out the reference codes 301 and the corresponding reference terms 302 written there, to serve as a standard for comparison, and supplies them to the code tally display unit 10. In addition, an intermediate code 102mid that matches a code name in the "Explanation of Reference Numerals" area is classified directly as an evident target code 104obj, with priority even if it had been classified as a suspect code or noise code. Other kinds of document files often have an index list at the end of the book or elsewhere, and the reference codes 301 and reference terms 302 are taken out in the same way from the codes listed in that index area. When the figure number information 32zu output by the comment deletion unit 32 shown in FIG. 2 is available, it is processed in the same way and likewise supplied to the code tally display unit 10. Depending on the settings of the code matching condition 62, some codes may not be extracted as intermediate codes 102mid. As an optional operation for this case, if a reference code 301 taken out here does not exist among the intermediate codes 102mid, the entire text may be searched and each occurrence of the same code name immediately classified as both an intermediate code 102mid and a target code 104obj.
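Reading the "Explanation of Reference Numerals" area and applying its priority can be sketched as follows. The layout assumed here (a "【符号の説明】" heading, one "number, space, term" entry per line, the next "【" heading ending the area) is typical of Japanese patent text but is an assumption; NFKC normalization is used to turn full-width digits and spaces into ASCII.

```python
import re
import unicodedata

HEADING = "【符号の説明】"
ENTRY_RE = re.compile(r"^\s*([0-9A-Za-z]+)\s+(\S.*?)\s*$")

def parse_reference_codes(text):
    """Return {reference code 301: reference term 302} from the legend area."""
    start = text.find(HEADING)
    if start < 0:
        return {}
    refs = {}
    for line in text[start + len(HEADING):].splitlines():
        line = unicodedata.normalize("NFKC", line)  # "１０　" becomes "10 "
        if line.startswith("【"):
            break                                   # next section ends the area
        m = ENTRY_RE.match(line)
        if m:
            refs[m.group(1)] = m.group(2)
    return refs

def apply_reference_priority(classes, refs):
    """Any intermediate code whose name is a reference code becomes 104obj."""
    return {code: ("104obj" if code in refs else cls) for code, cls in classes.items()}
```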

[0040] Next, the code tally display unit 10 shown in FIG. 1 will be described. The code tally display unit 10 receives the extraction and classification results for the intermediate codes 102mid and the term portions 101 over the entire document, obtained as described above. It sorts them in ascending order of code name, as in the display output example of FIG. 20, and normally displays each code on a single line. That is, it displays the extracted term portion 101, the code name, the number of occurrences of the code in each area, and the reference term 302 from the "Explanation of Codes" column. If the display contains an item requiring attention, that part is shown in the corresponding highlight form. As shown in FIG. 7, this processing consists of a code count display unit 11, a multiple-occurrence term AND processing unit 12, a reference term comparison warning unit 13, an unused reference code warning unit 14, a suspicious code warning display unit 15, a full-width/half-width same-code identification warning unit 16, and a display line control unit 17.
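The grouping and sorting step of the code tally display unit 10 can be sketched as below. This assumes each extracted occurrence is available as a `(code, term, area)` tuple and that "ascending order of code name" means numeric order first, then any letter suffix; both the tuple layout and the sort key are illustrative assumptions.

```python
import re
from collections import Counter, defaultdict

def code_sort_key(code):
    """Assumed 'ascending code name': numeric part first, then any suffix ("10" < "11" < "102mid")."""
    m = re.match(r"(\d+)(.*)", code)
    return (0, int(m.group(1)), m.group(2)) if m else (1, 0, code)

def tally(occurrences):
    """occurrences: iterable of (code, term, area). Returns one row per (code, term), sorted."""
    rows = defaultdict(Counter)
    for code, term, area in occurrences:
        rows[(code, term)][area] += 1  # per-area occurrence count for this line
    return sorted(rows.items(), key=lambda kv: (code_sort_key(kv[0][0]), kv[0][1]))
```

Keying rows on `(code, term)` rather than on the code alone keeps a second or N-th term for the same code on its own line.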

[0041] As shown in the tally list display example of FIG. 20, the code count display unit 11 counts and displays, for each combination of the same code and the same term subject 201yogo, the number of occurrences 400 over all areas, together with the individual number of occurrences 401 in each area. It also counts and displays the total number of code names 402, the total number of code occurrences 403, and the number of non-code occurrences 404. Because the non-code occurrences 404 may in some cases include writing errors, this count serves as a guide. The non-code items counted here are the noise codes 106nz, which are not shown in the list. The code items, on the other hand, are the suspected-noise codes 116nz, the suspicious codes 107, and the target codes 104obj, and these are listed as shown in FIG. 20.
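The summary counts 400, 402, 403, and 404 can be sketched from a flat list of classified occurrences. The category labels below are hypothetical stand-ins for the four classes named in the text; noise codes 106nz are counted only as non-code occurrences and excluded from the listed totals.

```python
from collections import Counter

def summarize(occurrences):
    """occurrences: (code name, category) pairs, with category one of
    'target' (104obj), 'suspicious' (107), 'noise_suspect' (116nz) or 'noise' (106nz)."""
    listed = [code for code, cat in occurrences if cat != "noise"]
    return {
        "per_code": Counter(listed),              # occurrences 400 per code
        "total_code_names": len(set(listed)),     # 402
        "total_occurrences": len(listed),         # 403
        "non_code_occurrences": sum(cat == "noise" for _, cat in occurrences),  # 404
    }
```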

[0042] Each display line shows one combination of the same code 102mid and the same term subject 201yogo. Naturally, the second term 202yogo through the N-th term 20Nyogo are each displayed on separate lines. In the tally list display example of FIG. 20, the items on a line appear in this order: the number of occurrences 400 for the code, the extracted-term display portion 101dsp, the code name, the number of occurrences in the claims, in the prior art, in the means for solving the problem, and in the embodiments, and finally the reference term 302 from the "Explanation of Codes". When the reference term has multiple definition terms 305 (terms corresponding to the N-th term), as shown in FIG. 15(d), the reference term 302 that matches the extracted-term display portion 101dsp is displayed. In this display example, an area column whose count is 0 shows the symbol "‥" for readability. The example also compresses the area names 406 to four (claims, prior art, solution means, embodiments) and merges every other area into the corresponding one of the four (for example, the "mode for carrying out the invention", "operation", "effect", and "abstract" areas are merged into the "embodiments" area), so that each line fits within the 80-character width of a standard DOS screen.
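One tally line in the order described above can be sketched as follows. The area identifiers and the merge table are hypothetical; the sketch shows the column order, the "‥" placeholder for zero counts, and the folding of extra areas into the four columns.

```python
from collections import Counter

COLUMNS = ["claims", "prior_art", "solution", "embodiment"]
# Hypothetical merge table folding the remaining areas into the four columns.
MERGE = {"mode_for_carrying_out": "embodiment", "operation": "embodiment",
         "effect": "embodiment", "abstract": "embodiment"}

def format_row(term, code, area_counts, ref_term=""):
    """One tally line: total, term, code, four area columns ("‥" for zero), reference term."""
    merged = Counter()
    for area, n in area_counts.items():
        merged[MERGE.get(area, area)] += n
    cells = [str(merged[c]) if merged[c] else "‥" for c in COLUMNS]
    total = sum(merged[c] for c in COLUMNS)
    parts = [f"{total:>3}", term, code] + [f"{cell:>2}" for cell in cells]
    if ref_term:
        parts.append(ref_term)
    return " ".join(parts)
```

The padding counts characters, not screen cells, so full-width text is not aligned to the 80-column limit; a real implementation would need to measure display width.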

[0043] The multiple-occurrence term AND processing unit 12 generates the character string shown in the extracted-term display portion 101dsp. The explicitly declared second to N-th terms are AND-processed separately, because each is displayed on its own line. First, the term portions 101 corresponding to the same code 102mid are AND-processed with one another, and the matching character string that remains is output as the extracted-term display portion 101dsp. If no characters, or only one character, would remain after this AND processing, the offending entry is regarded as an erroneous term 109 or an erroneous code 108miss, split off onto the following line, and both entries are displayed in a highlight form. Highlighting here means giving the item a display form that the reviewer can readily notice, such as one of several colors, reverse video, blinking, underlining, or hatching, chosen according to the error-judgment rank or the degree of suspicion. This highlighting makes erroneous codes 108miss and erroneous terms 109 easy to recognize. The list output also shows at a glance which codes occur in which areas and how often, and this distribution information makes it much easier to judge writing errors in codes.
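The text does not define "AND processing" character by character. A plausible reading, assumed here, is that the term portion is the text immediately in front of the code, so the "matching string that remains" is the longest common suffix of all term portions for that code. Under that assumption:

```python
def and_terms(term_parts):
    """Longest common suffix of the term portions preceding one code (assumed reading)."""
    if not term_parts:
        return ""
    common = term_parts[0]
    for t in term_parts[1:]:
        n = 0
        while n < min(len(common), len(t)) and common[-1 - n] == t[-1 - n]:
            n += 1
        common = common[len(common) - n:]  # n == 0 leaves an empty string
    return common

def check_code_terms(term_parts):
    """Return (display string 101dsp, True if the code/term pair looks miswritten)."""
    common = and_terms(term_parts)
    return common, len(common) <= 1  # empty or single-character remainder -> suspect
```

Identifying which particular entry caused the collapse (to split it onto its own line) would need an extra step, such as grouping the term portions by suffix; that is omitted here.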

[0044] The reference term comparison warning unit 13 treats any mismatch with the reference term 302 as a writing error and highlights it. That is, as shown in FIG. 20, for each display line of an intermediate code 102mid that matches a reference code 301 in the "Explanation of Codes" area obtained by the definition code extraction unit 90, it compares the extracted-term display portion 101dsp with the reference term 302 (or, when multiple definition terms 305 exist, with the reference term having the most matching characters). If the two strings do not match completely, the mismatching part of the displayed reference term 302 is shown in highlight form 521. This pinpoints the erroneous term 109 and makes accurate revision decisions easier. For an intermediate code 102mid with no corresponding reference code 301, the reference term column is simply left blank. This makes it evident that the code is absent from the "Explanation of Codes" column, so the reviewer can see directly that it is a writing error or otherwise not a target code 104obj.
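The comparison can be sketched with Python's `difflib.SequenceMatcher`, which is a swapped-in matching technique, not the method stated in the patent. The sketch picks the reference term with the most matching characters and brackets the parts of it that differ; brackets stand in for highlight form 521.

```python
from difflib import SequenceMatcher

def matched_chars(a, b):
    """Number of characters in matching blocks between a and b."""
    return sum(block.size for block in SequenceMatcher(None, a, b).get_matching_blocks())

def compare_with_reference(extracted, ref_terms):
    """Pick the best reference term; bracket the parts of it that differ from `extracted`."""
    ref = max(ref_terms, key=lambda r: matched_chars(extracted, r))
    if ref == extracted:
        return ref, False
    out = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, extracted, ref).get_opcodes():
        piece = ref[j1:j2]
        # Equal runs are copied; replaced/inserted runs are bracketed, and a pure
        # deletion shows up as "[]" at the position where characters are missing.
        out.append(piece if tag == "equal" else f"[{piece}]")
    return "".join(out), True
```

For example, comparing 「読出しメモリ」 with the reference 「読み出しメモリ」 marks the missing 「み」 as 「読[み]出しメモリ」.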

[0045] When a reference code 301 obtained from the "Explanation of Codes" area does not appear anywhere in the text, the unused reference code warning unit 14 highlights it, since this suggests a writing error or an omission. That is, as shown in FIG. 24, it checks whether any reference code 301 was left unused by the reference term comparison warning unit 13 and, if so, outputs it on a separate line in highlight form 512 as an unnecessary code declaration. The cause may be an obviously missing code name, an erroneous code 108miss, or an error or superfluous entry in the "Explanation of Codes". In every case a writing error is highly likely, so the display points to it accurately.
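Finding unused reference codes is a set difference between the declared codes and the codes actually found in the text. A minimal sketch, with hypothetical function names and a plain-text marker in place of highlight form 512:

```python
import re

def code_sort_key(code):
    """Numeric part first, then any suffix, so "9" sorts before "11"."""
    m = re.match(r"(\d+)(.*)", code)
    return (0, int(m.group(1)), m.group(2)) if m else (1, 0, code)

def unused_reference_codes(reference_codes, codes_in_text):
    """Reference codes 301 declared in 'Explanation of Codes' but never used in the text."""
    return sorted(set(reference_codes) - set(codes_in_text), key=code_sort_key)

def warning_lines(reference_table, codes_in_text):
    """reference_table: {code: reference term}. One warning line per unused declaration."""
    return [f"!! unused code declaration: {code} {reference_table[code]}"
            for code in unused_reference_codes(reference_table, codes_in_text)]
```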

[0046] For codes classified as suspected-noise codes 116nz or suspicious codes 107, the suspicious code warning display unit 15 displays the entire line for the code in the suspicion highlight form 531, as shown in the tally list display of FIG. 20. This makes clear that the code differs from a target code 104obj and needs detailed confirmation. For a suspicious code, it is difficult to decide automatically from the filter conditions or dictionary contents whether it is a writing error, so it is appropriate for the reviewer to judge. Users can normally improve convenience by editing the various filter dictionaries as needed to keep such suspicious codes to a minimum. When a suspicious code line is displayed, the reviewer can judge relatively easily whether it is a writing error from the combined information of the AND-processed display portion 101dsp, the number of occurrences in each area, and the presence or absence of a reference term 302.

[0047] The full-width/half-width same-code identification warning unit 16 handles the case where the same code is written in both full-width and half-width characters. Sometimes the two forms should be recognized as identical; conversely, sometimes one of them should be judged an erroneous code 108miss. The processing therefore depends on a condition setting switch. On receiving the full-width/half-width code identification switch from the condition setting unit 40, the unit behaves as follows. First, when full-width and half-width forms are to be treated as the same code, both are regarded as full-width codes, merged, counted, and displayed on one line. Second, when only full-width codes are valid, half-width codes are regarded as writing errors and displayed in highlight form 541, as shown in FIG. 20. Third, when only half-width codes are valid, full-width codes are regarded as writing errors and displayed in a highlight form. This helps the reviewer judge writing errors.
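The three switch settings can be sketched as follows. The mode names are hypothetical; the full-width conversion relies on the fixed Unicode offset between printable ASCII (U+0021 to U+007E) and the full-width forms (U+FF01 to U+FF5E).

```python
def to_fullwidth(code):
    """Map printable ASCII (U+0021..U+007E) to the full-width forms U+FF01..U+FF5E."""
    return "".join(chr(ord(c) + 0xFEE0) if "!" <= c <= "~" else c for c in code)

def is_halfwidth(code):
    return all(ord(c) < 0x80 for c in code)

def check_width(codes, mode):
    """codes: code names as written. mode: 'merge', 'full_only' or 'half_only'.
    Returns (occurrence counts, codes flagged as erroneous 108miss)."""
    counts, flagged = {}, []
    for code in codes:
        key = to_fullwidth(code) if mode == "merge" else code  # merge: count as full-width
        counts[key] = counts.get(key, 0) + 1
        if mode == "full_only" and is_halfwidth(code):
            flagged.append(code)
        elif mode == "half_only" and not is_halfwidth(code):
            flagged.append(code)
    return counts, flagged
```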

[0048] The display line control unit 17 is used to suppress the display of warning-free lines that need no revision. For example, after checking the codes in the list, the reviewer may want to see only the highlighted, questionable code lines. This display option serves that purpose by controlling whether only highlighted lines are output. When the switch from the condition setting unit 40 selects highlighted lines only, warning-free lines without any highlighted part are skipped, as shown in FIG. 22. This keeps the list from scrolling off the screen when there are many codes, so only the items needing revision are displayed and revision becomes easier. Although the above describes display control based on highlighting in general, the condition setting unit 40 may instead provide a switch that displays only lines with a highlight form specified as desired.

[0049] (Embodiment 2) In Embodiment 1, each filter process immediately classifies a code as a noise code 106nz, a suspected-noise code 116nz, a suspicious code 107, or a target code 104obj. Alternatively, a suspicion-degree judgment based on the weighting scheme described below may be used. A weight counter is provided for each extracted intermediate code 102mid, and whenever a filter process applies, the corresponding weight value (which the user can change freely) is added to that counter. For example, 10 points are added when the basic filter dictionary 64dic of FIG. 4 applies, 5 points when the field-common filter dictionary 83dic of FIG. 5 applies, and 2 points when the user filter dictionary 85dic of FIG. 6 applies. A means may then be provided that takes the counter values obtained at the final processing stage, divides them into several ranks at desired thresholds, and lets the display form for each weight rank be chosen freely. The code tally display unit 10 highlights each code according to this result, which gives the judgment and highlighting more flexibility.
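The weighting scheme can be sketched with the point values given above (10, 5, and 2). The rank thresholds and display labels are hypothetical placeholders for the user-selectable ranks.

```python
# Weights from the text; the user may change them.
WEIGHTS = {"basic": 10, "field_common": 5, "user": 2}
# Hypothetical rank thresholds (highest first) mapped to display forms.
RANKS = [(10, "rank A: strong highlight"), (5, "rank B: highlight"),
         (1, "rank C: light highlight"), (0, "no warning")]

def weigh(hits_per_code):
    """hits_per_code: {code: [names of filter dictionaries that matched]}."""
    result = {}
    for code, hits in hits_per_code.items():
        weight = sum(WEIGHTS[h] for h in hits)  # the per-code weight counter
        rank = next(label for limit, label in RANKS if weight >= limit)
        result[code] = (weight, rank)
    return result
```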

[0050] (Embodiment 3) Starting from an intermediate code 102mid that the reviewer points to with the cursor (or mouse) in the list display, or from an intermediate code 102mid that is a highlight target, editing may be carried out in either of two ways. As a first means, the conventional KWIC editing means shown in FIG. 18 may be used. As a second means, the multi-window function of WINDOWS may be used: while viewing the list display window, the reviewer edits the document file under inspection (the source file) in another window, given the position information and line number information 24gyou of the code.

[0051] (Embodiment 4) As another embodiment, figure numbers 102zu are a special kind of code. Since a figure number 102zu can also be regarded as a code, an application example that treats it as one is described below. Documents generally refer to many drawings in their descriptions, and errors in the figure numbers themselves are not acceptable either, so they too are subject to revision. Figure numbers 102zu also normally run in consecutive ascending order. Using the same technical means as the code list display of the above embodiment, the figure numbers 102zu are extracted, their occurrences in each area are counted, and a figure number list is displayed as shown in FIG. 30. Highlighting, for each figure number, whether it is miswritten, missing, or superfluous gives the reviewer precise revision assistance. The notation of figure numbers 102zu, however, is not uniform; it varies with the industrial field, file type, author, and so on. Figure numbers 102zu in these various notations are therefore matched using the figure number reference table shown in FIG. 31(a) and converted into the unified notation shown in FIG. 31(b) before comparison and counting. In this dictionary, the "?" symbol stands for any one or more full-width or half-width numeric characters (1 to 999), and the letters "a, n" stand for any full-width or half-width letter. As an example of this unifying conversion, the notations 「図４（ａ）」, 「図４ａ」, 「第４図Ａ」, 「４ａ図」, 「図４ー（ａ）」, and 「ｆｉｇ．４ａ」 are all converted to the unified form 「図４（ａ）」 before use.
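The unifying conversion for the six example notations can be sketched with regular expressions. This does not reproduce the reference table of FIG. 31(a); the patterns are an assumed equivalent covering only the listed forms. Each match is reduced to a `(number, letter)` key, which also sorts in the required ascending order, and `format_figure` is a hypothetical presentation helper.

```python
import re
import unicodedata

# Each pattern captures (number 1-999, optional letter); tried in order, first full match wins.
FIG_PATTERNS = [
    re.compile(r"第(\d{1,3})図\(?([a-z]?)\)?"),         # 第4図A
    re.compile(r"図(\d{1,3})[-ー]?\(?([a-z]?)\)?"),      # 図4(a), 図4a, 図4ー(a)
    re.compile(r"(\d{1,3})\(?([a-z]?)\)?図"),            # 4a図
    re.compile(r"fig\.?(\d{1,3})-?\(?([a-z]?)\)?"),      # fig.4a
]

def normalize_figure(ref):
    """Return a (number, letter) key for a figure reference, or None if unrecognized."""
    s = unicodedata.normalize("NFKC", ref).lower().replace(" ", "")  # full-width -> ASCII
    for pat in FIG_PATTERNS:
        m = pat.fullmatch(s)
        if m:
            return int(m.group(1)), m.group(2)
    return None

def format_figure(key):
    n, letter = key
    return f"FIG. {n}({letter})" if letter else f"FIG. {n}"
```

The `第` pattern must be tried before the trailing-`図` pattern; otherwise 「第４図Ａ」 could be misread as figure 4 without its letter.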

[0052] In a patent specification, the "Description of Drawings" column serves as an index of the figure numbers 102zu and is required to list them in consecutive ascending order. Making use of this, the figure numbers in that area are extracted as reference figure numbers 102zuref, as in Embodiment 1, and checked against the figure numbers 102zu extracted from the other areas of the document. Any that are miswritten, missing, or superfluous are displayed in highlight form 551, providing still further revision assistance. The figure number list display example of FIG. 30 is described next. The extracted figure numbers are sorted in ascending order and each is displayed on one line, with the items in this order: the extracted and unified figure number 102zu, the number of occurrences in the prior art, in the means for solving the problem, in the embodiments, and elsewhere, the number of occurrences in the "Description of Drawings", and the judgment result display 606. In this display example too, an area column whose count is 0 shows the symbol "‥" for readability. The area names 406 are merged and compressed as in Embodiment 1; for example, the "prior art" area combines "Description of the Related Art" and "Problems to be Solved by the Invention", and changing a setting switch in the condition setting unit 40 can optionally show them separately. In the judgment result display 606, a figure number 102zuref listed in the "Description of Drawings" that appears in no area, or conversely a figure number 102zu in the document that is not listed in the "Description of Drawings", is shown in the "deletion/addition required" highlight form. Figure numbers with a letter suffix (for example, 「図１（ａ）」) are also extracted as individual figure numbers, and when the document contains no corresponding figure number 102zu, they are shown in the "addition requested" highlight form. Further, when the document explicitly describes a figure listed in the "Description of Drawings" column as "conventional" or "of the present invention", this may be detected and a conventional/invention indication 604 added to the "Description of Drawings" display column. These features make it even easier to decide in which area each figure should appear. FIG. 32 is an example figure number list display showing the result of running the system on this specification itself.
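The judgment result display 606 can be sketched as a comparison of two sets of `(number, letter)` keys, one from the "Description of Drawings" and one from the other areas. The reading of when "addition requested" applies (a lettered sub-figure that is listed but never cited) is an interpretation of the text, and the gap check for the required consecutive numbering is an added illustration.

```python
def check_figures(described, cited):
    """described: (number, letter) keys from 'Description of Drawings';
    cited: keys found in the other areas. Returns {key: judgment}."""
    report = {}
    for fig in sorted(described | cited):
        if fig in described and fig in cited:
            report[fig] = "ok"
        elif fig in described and fig[1]:
            report[fig] = "addition requested"          # lettered sub-figure never cited
        else:
            report[fig] = "deletion/addition required"  # listed but unused, or used but unlisted
    return report

def numbering_gaps(described):
    """Figure numbers missing from the required consecutive 1..max sequence."""
    nums = {n for n, _ in described}
    return [n for n in range(1, max(nums) + 1) if n not in nums] if nums else []
```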

[0053] (Embodiment 5) An automatic index generation means is described next. It produces an index list in the "Explanation of Codes" format from the codes and terms extracted by the means of Embodiment 1. This requires a means for properly deleting the non-term words at the head of each term portion 101. The example head-side deletion phrase dictionary 150dic1 shown in FIG. 15(a) registers the character strings that appear at the head but are not normally regarded as part of a term and are to be deleted (mainly case particles, inflectional endings, conjunctive particles, auxiliary verbs, and the like). Conversely, the example head-side non-deletion hiragana dictionary 150dic2 shown in FIG. 15(b) registers words containing hiragana that must not be deleted by the head-side deletion phrase dictionary 150dic1 of FIG. 15(a); hiragana covered by these entries are not deleted. For example, in the term portion 101 「回路により読み出しメモリ」, the entry 「出し」 is detected, so the 「し」 at that position is not deleted and the final generated term is 「読み出しメモリ」 (read-out memory). Users can add to or modify both dictionaries at any time, which makes it easy to improve the quality of the automatic generation. Using the head-side deletion phrase dictionary 150dic1, the non-term words at the head of the term portions 101 in the code examples shown in FIG. 15(c) are deleted; identical term names are then merged, the codes are sorted in ascending order, and the result is output in the "Explanation of Codes" format. FIG. 15(d) shows an example of this automatically generated output. When N-th terms exist, they are output together as multiple definition terms 305 in parentheses. This example uses the output format corresponding to the "Explanation of Codes" of a patent specification, but other output formats are handled just as easily; for example, the index list appended to the end of a general document may be generated automatically. As this embodiment shows, a practical index list can be generated mechanically from the extracted codes, which relieves the author of most of the work of creating the index list.
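The head-side deletion and index assembly can be sketched as below. The dictionary contents are hypothetical stand-ins for 150dic1 and 150dic2, chosen so that the worked example above reproduces 「読み出しメモリ」. The rule assumed here is: cut the term portion after the last deletion phrase that is not covered by a non-deletion entry, and never delete the whole string.

```python
import re

# Hypothetical dictionary contents; the real 150dic1 / 150dic2 entries are user-editable.
DELETE_PHRASES = ["により", "を", "の", "は", "が", "し", "て"]   # stands in for 150dic1
KEEP_PHRASES = ["出し"]                                         # stands in for 150dic2

def is_protected(term, start, end):
    """True if term[start:end] lies inside an occurrence of a non-deletion phrase."""
    for keep in KEEP_PHRASES:
        pos = term.find(keep)
        while pos != -1:
            if pos <= start and end <= pos + len(keep):
                return True
            pos = term.find(keep, pos + 1)
    return False

def trim_leading(term_part):
    """Cut everything up to the last unprotected deletion phrase (never the whole string)."""
    cut = 0
    for phrase in DELETE_PHRASES:
        pos = term_part.find(phrase)
        while pos != -1:
            end = pos + len(phrase)
            if cut < end < len(term_part) and not is_protected(term_part, pos, end):
                cut = end
            pos = term_part.find(phrase, pos + 1)
    return term_part[cut:]

def code_key(code):
    m = re.match(r"\d+", code)
    return (int(m.group()) if m else 0, code)

def build_index(pairs):
    """pairs: (term portion 101, code). Returns 'Explanation of Codes'-style lines."""
    by_code = {}
    for term_part, code in pairs:
        terms = by_code.setdefault(code, [])
        term = trim_leading(term_part)
        if term not in terms:  # merge identical term names
            terms.append(term)
    lines = []
    for code in sorted(by_code, key=code_key):
        first, *others = by_code[code]  # extra terms shown in parentheses (305)
        lines.append(f"{code} {first}" + (f" ({', '.join(others)})" if others else ""))
    return lines
```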

[0054] (Embodiment 6) In addition to the means of Embodiment 1, a function that displays details for an individual designated intermediate code 102mid may be provided. This detail display function shows, separately for each occurrence, the context before and after the designated code 107, which makes revision easier. Normally the reviewer first looks at the code tally list, then moves the screen cursor (or mouse) to the position of the designated code 107 to be examined and runs the function. FIG. 16 shows the result. In this example display, for each line containing the designated code 107, the area information 26area and the line number information 24gyou are highlighted at the start; the text is then shown from the start of that document line, with the designated code 107 and the preceding term portion 101 each highlighted, and the text after them is limited to a specified number of characters (set here by switch to 20). This shows at a glance in which area the code is used and what the description says, so accurate revision decisions can be made easily. A revision means that shows the context around the designated code by the conventional KWIC method may of course be used instead.
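The per-occurrence detail display can be sketched as follows, assuming the document is available as a list of lines with a parallel list of area names. The 《》 brackets are a hypothetical stand-in for the highlight form, and `max_after=20` mirrors the 20-character setting in the example.

```python
import re

def show_context(lines, areas, code, term, max_after=20):
    """One display line per occurrence of `code`; 《》 stands in for the highlight form."""
    pattern = re.compile(rf"(?<![0-9A-Za-z]){re.escape(code)}(?![0-9A-Za-z])")
    out = []
    for line_no, (line, area) in enumerate(zip(lines, areas), start=1):
        for m in pattern.finditer(line):
            start = m.start()
            # Highlight the term portion only when it sits directly in front of the code.
            t0 = start - len(term) if term and line[:start].endswith(term) else start
            shown_term = f"《{line[t0:start]}》" if t0 < start else ""
            after = line[m.end():m.end() + max_after]  # at most max_after characters
            out.append(f"《{area}:{line_no}》 {line[:t0]}{shown_term}《{code}》{after}")
    return out
```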

[0055] (Embodiment 7) In addition to the means of Embodiment 1, a prohibited-phrase detection function that restricts the writing style according to the type of manuscript file may be provided. Depending on the industrial field, the manuscript file type, and the area, certain phrases must not be used; they are the kind of phrases compiled into a so-called list of "don'ts". In a patent specification, for example, phrases that make the scope of the rights ambiguous are prohibited in the claims area, as in the prohibited-phrase dictionary 271dic shown in FIG. 28(a). The check is therefore run on the claims area only. When a phrase applies, as in the warning display example of FIG. 28(b), the area information 26area and the line number information 24gyou are highlighted, the text is shown from the start of the document line, and the phrase concerned is highlighted. This is an error in the document's wording rather than in a code, but it is still a revision target, so the revision work becomes more convenient. For other file types, an instruction manual may, for example, require the polite 「です・ます」 (desu/masu) style and forbid honorific phrasing, as shown in FIG. 29(a). Imperative phrasing may also be prohibited, as shown in FIG. 29(b), and so may vague phrasing, as shown in FIG. 29(c). In this way, a prohibited-phrase dictionary for the file type of a specialized field is provided to detect errors in the wording. Providing and running dictionaries that restrict the writing style by manuscript file type (besides those above, dictionaries of wasteful set phrases, redundant words, okurigana unification, katakana notation, and so on) makes misuse of prohibited phrases stand out, allows revision with fewer oversights, and further improves convenience.
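An area-restricted scan for prohibited phrases can be sketched as below. The phrases in the usage example (「約」, "about", and 「好ましくは」, "preferably") are hypothetical entries, not the contents of dictionary 271dic, and the 《》 brackets stand in for the highlight form.

```python
def find_prohibited(lines, areas, prohibited, target_area="claims"):
    """Return (line number, phrase, marked line) for each prohibited phrase in target_area."""
    hits = []
    for line_no, (line, area) in enumerate(zip(lines, areas), start=1):
        if area != target_area:
            continue  # e.g. claims-only check for patent specifications
        for phrase in prohibited:
            pos = line.find(phrase)
            while pos != -1:
                marked = f"{line[:pos]}《{phrase}》{line[pos + len(phrase):]}"
                hits.append((line_no, phrase, marked))
                pos = line.find(phrase, pos + len(phrase))
    return hits
```

Swapping in a different dictionary and target area (for example, imperative phrasing in an instruction manual) reuses the same scan.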

[0056] The display output examples in the description of Embodiment 1 are specific cases in which the areas are merged and compressed to fit within 80 half-width (40 full-width) characters across, matching the MS-DOS screen size, and output as in the tally list display example of FIG. 20. Alternatively, fewer areas may be merged (or none at all) so that counts are listed for each individual area; the part that overflows the screen may then be wrapped, or a vertical/horizontal scrolling function may be provided so that any desired part can be viewed. For printer output, more characters per line can generally be printed, so fewer areas may be merged and a list with more detail than the screen display may be printed. On a WINDOWS screen, any number of characters can be displayed vertically and horizontally by changing the font size, so a means for listing more detailed information, as shown in FIG. 21, may be provided if desired.

[0057] The description of Embodiment 1 uses a patent specification as a concrete example. Other documents have an index list corresponding to the "Explanation of Codes", and the same method can obviously be applied to it. In Embodiment 1, noise codes 106nz are excluded from the list display; if desired, a setting switch may be provided in the condition setting unit 40 so that noise codes 106nz are displayed in a noise-code highlight form, in the same way as the warning display for suspicious codes, to assist revision decisions. Embodiment 1 also describes the target code selection unit 80 as using separate dictionaries (the basic filter dictionary 64dic, the common filter dictionary 83dic, and the user filter dictionary 85dic); if desired, similar dictionaries may be combined and processed in a single batch. Embodiment 1 further describes the target code selection unit 80 as including the code-specifying term processing unit 86 and the same-term comparison judgment unit 88; if desired, either or both of these may be omitted, or a setting switch in the condition setting unit 40 may select whether each is executed.

[0058] The first embodiment includes the definition code extraction unit 90. If desired, this unit may be omitted, or an execution selection switch may be provided in the condition setting unit 40 to select whether it is executed.

The fifth embodiment describes automatic generation of the index list, in "description of reference numerals" form, performed in parallel with the code totaling display unit 10. When only the automatically generated output is needed, the code totaling display unit 10 may be omitted if desired, as shown in FIG. 35.

[0059] In the first embodiment, a code object is always assumed to contain a numeric part, as in the example of the code matching condition 62 shown in FIG. 3. If desired, however, the number of numeric digits shown in FIG. 3 may be set to 0 so that purely alphabetic codes (for example "fosc", "Tstart", "Din") are also extracted, with the subsequent processing adapted accordingly. The drawback is that the quality of classification is lowered.
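FIG. 3 is not reproduced here, so the exact contents of the code matching condition 62 are unknown. The following is a minimal sketch that assumes a candidate code is any run of ASCII letters and digits. It accepts a candidate when its digit count lies between a minimum and a maximum. Setting the minimum to 0 admits purely alphabetic tokens, as [0059] describes. The token pattern, the parameter names, and the length limit are all hypothetical.

```python
import re

# Hypothetical stand-in for the code matching condition 62: a run of ASCII
# letters and digits is a candidate token. Full-width characters are ignored.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def matches_code_condition(token, min_digits=1, max_digits=4, max_len=10):
    """Accept a token whose digit count lies in [min_digits, max_digits].

    With min_digits=0, purely alphabetic tokens such as "fosc" also pass.
    """
    digits = sum(ch.isdigit() for ch in token)
    return min_digits <= digits <= max_digits and len(token) <= max_len


def extract_intermediate_codes(text, **condition):
    """Return candidate codes in document order."""
    return [m.group() for m in TOKEN.finditer(text)
            if matches_code_condition(m.group(), **condition)]


sample = "端子12と中間符号102midと発振周波数fosc"
print(extract_intermediate_codes(sample))                # ['12', '102mid']
print(extract_intermediate_codes(sample, min_digits=0))  # ['12', '102mid', 'fosc']
```

Once the minimum is 0, ordinary English words embedded in Japanese text also become candidates. This is one concrete way the classification quality drops.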

【００６０】[0060]

[Effects of the Invention]

As described above, the present invention provides the following effects.

The code matching condition 62 shown in FIG. 3 is an extraction condition table that excludes useless noise codes in a simple, mechanical way. Many obviously useless noise codes are thus excluded at extraction time, which makes the subsequent filtering easier.

Intermediate codes 102mid are taken from the document. Each code itself and the words before and after it are examined against several resources:
- multiple dictionaries that exclude codes from the target codes;
- multiple dictionaries that identify target codes explicitly;
- a dictionary that identifies erroneous codes;
- a dictionary that recognizes continuous codes.

On this basis, the intermediate codes 102mid are classified into target codes 104obj, suspicious codes 107, noise codes 106nz, code errors 88miss, non-code errors 89miss, and erroneous codes 108miss. The number of occurrences of each is counted per area. If the document has an index section (for example a "description of reference numerals" section), its reference codes and reference terms are extracted.

Each code is then shown in a per-area list display, with highlighting, so it is obvious at a glance whether a code is erroneous, suspicious, or in need of revision. As a result, revision work can be carried out on the basis of accurate judgments.
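The classification above can be illustrated with a much simplified sketch. It covers only three of the six categories: target, noise, and suspicious. It looks only at the single word immediately before and after each code, and the dictionaries are small hypothetical sets. The order in which the dictionaries take precedence is an assumption; the specification does not state it here.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    code: str
    before: str  # word immediately preceding the code
    after: str   # word immediately following the code
    area: str    # e.g. "description" or "claims"


def classify(occ, explicit_before, noise_after, suspect_after):
    """Assign one of three categories; the precedence order is an assumption."""
    if occ.before in explicit_before:
        return "target"      # an explicit marker outranks every filter
    if occ.after in noise_after:
        return "noise"       # e.g. a unit or counter word after a number
    if occ.after in suspect_after:
        return "suspicious"  # could be a quantity or a reference numeral
    return "target"


def count_by_area(occurrences, **dictionaries):
    """Count occurrences per (code, category) and per area."""
    table = defaultdict(Counter)
    for occ in occurrences:
        table[(occ.code, classify(occ, **dictionaries))][occ.area] += 1
    return table


dicts = dict(explicit_before={"符号"}, noise_after={"年", "mm"}, suspect_after={"個"})
occs = [
    Occurrence("12", "端子", "と", "description"),
    Occurrence("12", "端子", "は", "claims"),
    Occurrence("1996", "", "年", "description"),
    Occurrence("3", "", "個", "claims"),
]
for key, per_area in sorted(count_by_area(occs, **dicts).items()):
    print(key, dict(per_area))
```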

[0061] FIG. 25 shows a result of applying the present invention: part of the output of running the code check on this specification itself. For each code, the output automatically collates the areas in which it occurs, its number of occurrences, and its "description of reference numerals" entry, and presents them so that they can be taken in at a glance. This clearly makes revision decisions easy and effective.

Compared with the conventional KWIC approach, this display form has several further advantages:
- it dramatically reduces the amount of output;
- it greatly reduces the confusion that unnecessary redundant display causes the reviser;
- by highlighting the parts that appear to be erroneous, it makes revision easier still.

[0062] FIG. 32 shows part of the result of running the figure number check on this specification itself. It lists the occurrence areas and numbers of occurrences of each figure number in sorted order. Here, too, the occurrence areas and the determination result display part 606 make it possible to confirm at a glance whether each figure number is valid. This shows that the figure number check is practically useful.

FIG. 33 shows part of the "description of reference numerals" section automatically generated by running the system on this specification. The section was produced from all the extracted codes and terms, in the format of a patent specification's "description of reference numerals". This shows that the system has a sufficiently practical automatic index generation function. The author only needs to touch up part of the output, so the work of writing the index section is greatly reduced.

[Brief description of the drawings]

FIG. 1 is a processing configuration diagram of the code checking system according to the first embodiment of the present invention.

FIG. 2 shows, according to the present invention, (a) the processing configuration of the text file input unit 20 and (b) the processing configuration of the intermediate code extraction unit 60.

FIG. 3 shows an example of a word limiter condition setting table and an example of character strings according to the present invention.

FIG. 4 shows, according to the present invention:
(a) a basic filter dictionary for the immediately preceding term;
(b) an example document to which the immediately preceding term applies;
(c) a basic filter dictionary for the immediately following term;
(d) an example document to which the immediately following term applies.

FIG. 5 shows, according to the present invention:
(a) an example of a filter dictionary specific to the electrical field;
(b) an example document for it;
(c) an example of a filter dictionary specific to the mechanical field;
(d) an example document for it;
(e) an example of a filter dictionary specific to the patent field;
(f) an example document for it.

FIG. 6 shows, according to the present invention:
(a) an example of a user filter dictionary for the term immediately before a code;
(b) an example of a document matching that dictionary;
(c) an example of a user filter dictionary for the term immediately after a code;
(d) an example of a document matching the user filter dictionary for the term immediately before a code.

FIG. 7 is a processing configuration diagram of the code totaling display unit according to the present invention.

FIG. 8 shows:
(a) an example of a document in a continuous code description form;
(b) an example of a continuous code description word dictionary of the present invention;
(c) an example of a document in which continuous codes are described using continuous code description words.

FIG. 9 is a processing configuration diagram of the target code selection unit according to the present invention.

FIG. 10 is an explanatory diagram of the classification definitions of the codes in the code extraction process.

FIG. 11 shows:
(a) an explanatory diagram of how a second term and a third term are written;
(b) an example in which a second term is described in the form of a definition declaration sentence;
(c) an example of a multiple-term filter dictionary for the definition declaration sentence form, according to the present invention.

FIG. 12 is an example of noise codes.

FIG. 13 is an example of a form in which a plurality of codes are described consecutively.

FIG. 14 is an example of an index document used for lookup.

FIG. 15 shows, according to the present invention:
(a) an example of a dictionary of phrases to be deleted from the head of a term;
(b) an example of a dictionary of hiragana that are not deleted at the head;
(c) an example of term parts and codes in a document;
(d) an example of output automatically generated in the "description of reference numerals" format.

FIG. 16 is an example of reference output of the context before and after the description of a designated code, according to the present invention.

FIG. 17 is an example of the contents of the "description of reference numerals" section of a patent specification.

FIG. 18 is an example of an editing screen using the KWIC editing means.

FIG. 19 shows:
(a) an explanatory diagram of an intermediate code in a document and the corresponding term part and term object;
(b) an example document containing a non-intermediate code;
(c) an example document containing only a term, with no intermediate code.

FIG. 20 is a diagram explaining the totaled list display output according to the present invention.

FIG. 21 is an example, according to the present invention, in which a large amount of detailed information is displayed on one line.

FIG. 22 is an example of a display mode, according to the present invention, in which only the code lines containing a highlighted portion are displayed.

FIG. 23 is an example of an explicit exclusion term dictionary according to the present invention.

FIG. 24 is an example of the highlighted display form for an unnecessary code declaration according to the present invention.

FIG. 25 shows part of the result of running the present invention on this specification itself.

FIG. 26 is an example of a standard name filter dictionary according to the present invention.

FIG. 27 shows, according to the present invention:
(a) an example of an explicit term dictionary for the term immediately before a code;
(b) an example of an explicit term dictionary for the term immediately after a code;
(c) an example of a document matching these explicit terms.

FIG. 28 shows, according to the present invention, (a) an example of a prohibited phrase dictionary and (b) example sentences containing prohibited phrases.

FIG. 29 shows, according to the present invention:
(a) an example of a dictionary of honorific phrases;
(b) an example of a dictionary of imperative phrases;
(c) an example of a dictionary of vague phrasing.

FIG. 30 is a diagram explaining the figure number list display mode according to the present invention.

FIG. 31 shows, according to the present invention, (a) an example of a figure number reference table and (b) an example of standardized figure numbers into which figure numbers are uniformly converted.

FIG. 32 shows part of an example figure number list display obtained by running the present invention on this specification itself.

FIG. 33 shows part of an example "description of reference numerals" section automatically generated by running the present invention on this specification itself.

FIG. 34 is an example of a warning display for an error in heading order according to the present invention.

FIG. 35 is a processing configuration diagram of a code checking system that implements the automatic index generation means according to the present invention.

[Explanation of symbols]

10 Code totaling display unit
11 Code occurrence counting and display unit
12 Occurrence term part AND processing unit
13 Reference term comparison warning processing unit
14 Warning processing unit
15 Suspicious code warning display processing unit
16 Full-width/half-width same-code identification warning processing unit
17 Display line display control unit
20 Text file input unit
21 Reading order management unit
28 Heading order management unit
24 Line number management unit
24gyou Line number information
26 Area management unit
32 Comment deletion unit
40 Extraction condition setting unit
60 Intermediate code extraction unit
60dic Continuous code recognition dictionary
60b Intermediate code simple extraction unit
60c Continuous code recognition processing unit
60d Term recognition extraction processing unit
60e Intermediate code extraction exclusion processing unit
61dic Standard name filter dictionary
64dic Basic filter dictionary
68 Intermediate code storage unit
80 Target code selection unit
81 Basic filter processing unit
82 Field common filter processing unit
83dic Common filter dictionary
84 User dictionary processing unit
85dic User dictionary
86 Code explicit term processing unit
86dic Code explicit term dictionary
88 Term comparison determination processing unit
90 Definition code extraction unit
127dic Declaration term filter dictionary
150dic Deleted phrase dictionary
271dic Prohibited phrase dictionary

Claims

[Claims]

1. A document code inspection system for inspecting codes described in a document file and the terms corresponding to the codes, comprising:
a text file input unit (20) that receives the document file, receives setting conditions from an extraction condition setting unit (40), and outputs document data in a predetermined order;
an intermediate code extraction unit (60) that receives the document data from the text file input unit (20) and a code matching condition (62) from the extraction condition setting unit (40), extracts code objects conforming to the condition as intermediate codes (102mid) together with the term parts (101) corresponding to the intermediate codes, and counts the number of occurrences of each intermediate code for each area;
a target code selection unit (80) that receives the intermediate codes (102mid) from the intermediate code extraction unit (60) and classifies them into noise codes (106nz), suspicious codes (107), and other target codes (104obj), using classification processing by a basic filter processing unit (81), a field common filter processing unit (82), a user dictionary processing unit (84), a code explicit term processing unit (86), and a same-term comparison/determination processing unit (88), and outputs the result; and
a code totaling display unit (10) that receives the intermediate codes (102mid) and code information obtained over the entire document, sorts the code names in ascending order, and displays one line per individual code showing at least the extracted term part (101), the name of the intermediate code (102mid), and the per-area code occurrence count (400); any portion of a display line that is to be highlighted is displayed in a highlighted display form, and a list of the codes of all display targets is displayed.
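The sorting and one-line-per-code display of claim 1 can be sketched as follows, as an illustration only. The sketch assumes "ascending order" means ordering by the leading number of each code, so that 12 precedes 102mid, rather than plain string order. A leading `*` stands in for the highlighted display form. The function names, column widths, and data layout are hypothetical.

```python
import re


def code_sort_key(code):
    """Order codes by their leading number, so "12" sorts before "102mid"."""
    m = re.match(r"(\d+)(.*)", code)
    return (0, int(m.group(1)), m.group(2)) if m else (1, 0, code)


def render_totals(entries, areas, flagged=frozenset()):
    """One line per code: marker, term part, code name, then per-area counts.

    entries maps code -> (term, {area: count}). A leading '*' stands in for
    the highlighted display form; real output might use color or reverse video.
    """
    lines = []
    for code in sorted(entries, key=code_sort_key):
        term, counts = entries[code]
        cells = " ".join(f"{counts.get(area, 0):>3}" for area in areas)
        mark = "*" if code in flagged else " "
        lines.append(f"{mark}{term:<12} {code:<8}{cells}")
    return lines


entries = {
    "102mid": ("intermediate code", {"description": 5}),
    "12": ("terminal", {"description": 2, "claims": 1}),
    "60": ("extraction unit", {}),
}
for line in render_totals(entries, ["description", "claims"], flagged={"60"}):
    print(line)
```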

2. A document code inspection system for inspecting figure numbers described in a document file, comprising:
a text file input unit (20) that receives the document file, receives setting conditions from an extraction condition setting unit (40), and outputs document data in a predetermined order;
an intermediate code extraction unit (60) that receives the document data from the text file input unit (20) and a code matching condition (62) from the extraction condition setting unit (40), extracts figure number (102zu) objects conforming to the condition, and counts the number of occurrences of each figure number for each area; and
figure number list display means that receives the figure numbers and their occurrence counts, sorts the figure numbers in ascending order, and displays each figure number on one line together with its number of occurrences for each area.
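The figure number list of claim 2 can be sketched in the same spirit. The sketch assumes figure references are written either as "FIG. n" or as 図 followed by a number. It relies on two facts: Python's `\d` matches full-width digits in `str` patterns, and `int()` converts them. The pattern and the function name are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed figure-number syntax: "FIG. 3" or "図３". Python's \d also matches
# full-width digits, and int() converts them, so "図２０" yields 20.
FIG_PATTERN = re.compile(r"(?:FIG\.\s*|図)(\d+)")


def figure_number_table(areas):
    """areas: list of (area_name, text). Rows are sorted by figure number."""
    counts = defaultdict(Counter)
    for area, text in areas:
        for m in FIG_PATTERN.finditer(text):
            counts[int(m.group(1))][area] += 1
    return [(fig, dict(counts[fig])) for fig in sorted(counts)]


doc = [("description", "FIG. 3 and FIG. 20; see FIG. 3"), ("drawings", "図３ 図２０")]
for fig, per_area in figure_number_table(doc):
    print(f"FIG. {fig:>3}", per_area)
```

A figure number whose row has no count in the drawings area is then easy to spot, which corresponds to the validity check described for FIG. 32.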

3. A document code inspection system for inspecting codes described in a document file and the terms corresponding to the codes, comprising:
a text file input unit (20) that receives the document file, receives setting conditions from an extraction condition setting unit (40), and outputs document data in a predetermined order;
an intermediate code extraction unit (60) that receives the document data from the text file input unit (20) and a code matching condition (62) from the extraction condition setting unit (40), extracts code objects conforming to the condition as intermediate codes (102mid) together with the term parts (101) corresponding to the intermediate codes, and counts the number of occurrences of each code for each area;
a target code selection unit (80) that receives the intermediate codes (102mid) from the intermediate code extraction unit (60) and classifies them into noise codes (106nz), suspicious codes (107), and other target codes (104obj), using classification processing by a basic filter processing unit (81), a field common filter processing unit (82), a user dictionary processing unit (84), a code explicit term processing unit (86), and a same-term comparison/determination processing unit (88), and outputs the result; and
means that receives the term part (101) corresponding to each intermediate code (102mid), deletes unnecessary phrases at the head of the term part (101), and automatically generates and outputs an index using the result as the term for the intermediate code (102mid).
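The index generation of claim 3 can be sketched as follows. The sketch repeatedly strips phrases found in a deleted phrase dictionary from the head of each term part, keeps the first cleaned term seen for each code, and lists the codes in numeric order. The sample dictionary entries 前記 and 上記 ("the aforementioned") are hypothetical examples. The non-deleted hiragana dictionary of FIG. 15(b) is not modelled.

```python
import re


def strip_leading_phrases(term, delete_phrases):
    """Repeatedly remove dictionary phrases from the head of a term part."""
    phrases = sorted((p for p in delete_phrases if p), key=len, reverse=True)
    stripped = True
    while stripped:
        stripped = False
        for phrase in phrases:
            if term.startswith(phrase):
                term = term[len(phrase):]
                stripped = True
                break
    return term


def generate_index(pairs, delete_phrases):
    """pairs: (term_part, code) in document order. The first term seen wins."""
    index = {}
    for term, code in pairs:
        index.setdefault(code, strip_leading_phrases(term, delete_phrases))

    def key(code):
        # Numeric-aware order so that "20" precedes "102mid".
        m = re.match(r"(\d+)(.*)", code)
        return (0, int(m.group(1)), m.group(2)) if m else (1, 0, code)

    return [f"{code} {index[code]}" for code in sorted(index, key=key)]


pairs = [("前記文章ファイル入力部", "20"), ("上記中間符号抽出部", "60"), ("文章ファイル入力部", "20")]
for entry in generate_index(pairs, {"前記", "上記"}):
    print(entry)
```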

4. The document code inspection system according to claim 1, further comprising detailed code display means that, upon receiving a detailed display instruction for a designated intermediate code (102mid), displays the area information (26area) and line number information (24gyou) for each occurrence of the designated intermediate code (102mid). It then displays the text from the head of the corresponding line, including the designated intermediate code (102mid) and the preceding term part (101) or term object (201yogo), followed by the context after the intermediate code up to a designated number of characters.

5. The document code inspection system according to claim 1, further comprising:
a prohibited phrase dictionary in which special prohibited phrases are registered according to the type and area of the document file; and
display means that, when a phrase registered in the prohibited phrase dictionary is detected, displays the area information (26area) and line number information (24gyou), displays the text from the beginning of the document line, and displays the phrase that matches the dictionary content.
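The prohibited-phrase check of claim 5 can be sketched as a substring search over numbered lines. The sketch assumes line numbers run through the whole document. The matching phrase is wrapped in 【】 as a stand-in for highlighting. The sample dictionary entry 等 ("etc."), a phrase often avoided in claims, is a hypothetical example rather than the contents of FIG. 28.

```python
def find_prohibited(doc_lines, prohibited):
    """Report every prohibited phrase with its area and 1-based line number.

    doc_lines: list of (area, text) in document order. The matching phrase
    is wrapped in 【】 as a stand-in for the highlighted display form.
    """
    hits = []
    for line_no, (area, text) in enumerate(doc_lines, start=1):
        for phrase in sorted(prohibited):
            if phrase in text:
                marked = text.replace(phrase, f"【{phrase}】")
                hits.append((area, line_no, phrase, marked))
    return hits


doc = [("description", "端子12を設ける。"), ("claims", "入力部等を備える")]
for area, line_no, phrase, marked in find_prohibited(doc, {"等"}):
    print(f"[{area}] {line_no}: {marked}")
```

Because the dictionary is chosen "according to the type and area of the document file", a fuller version would probably select a different `prohibited` set for each area.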

6. The document code inspection system according to claim 1, further comprising KWIC editing means that receives the intermediate code (102mid) and the term object (201yogo), combines them with the preceding and following context in keyword-in-context (KWIC) form, displays the object on the screen in a highlighted form, and allows it to be edited online.
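The display half of the KWIC editing means in claim 6 can be sketched as follows; online editing is not covered. The sketch finds each occurrence of a keyword, which in the claim would be the intermediate code or term object. It then right-aligns a fixed-width left context so that all keywords line up in one column. The function names and the bracket highlighting are hypothetical.

```python
def kwic(text, keyword, width=10):
    """Return (left context, keyword, right context) for every occurrence."""
    if not keyword:
        return []  # an empty keyword would otherwise loop forever
    rows = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        rows.append((text[max(0, start - width):start], keyword, text[end:end + width]))
        start = text.find(keyword, end)
    return rows


def render_kwic(rows, width=10):
    """Right-align the left context so that the keywords line up in one column."""
    return [f"{left:>{width}} [{kw}] {right}" for left, kw, right in rows]


text = "unit 60 reads; unit 60 writes"
for line in render_kwic(kwic(text, "60", width=5), width=5):
    print(line)
```

Every occurrence produces a full context line here. This is the redundancy that the per-code totaled display described in [0061] is meant to reduce.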