JP2023072321A

JP2023072321A - Document proofreading support device, document proofreading support method, and document proofreading support program

Info

Publication number: JP2023072321A
Application number: JP2021184785A
Authority: JP
Inventors: 駿介花岡; Shunsuke Hanaoka; 一則和久井; Kazunori Wakui; 知弘米田; Tomohiro Yoneda
Original assignee: Hitachi Industry and Control Solutions Co Ltd
Current assignee: Hitachi Industry and Control Solutions Co Ltd
Priority date: 2021-11-12
Filing date: 2021-11-12
Publication date: 2023-05-24

Abstract

PROBLEM TO BE SOLVED: To provide a document proofreading support device, method, and program capable of detecting a location in a document where a semantic error potentially exists. SOLUTION: The document proofreading support device comprises three units. A document analysis unit determines whether a sentence included in a document is an unknown sentence that may contain a semantic error, according to whether the sentence matches a predetermined rule. An unknown sentence processing unit estimates which of a plurality of components, defined according to the type of the document, the unknown sentence approximates. A display processing unit displays the unknown sentence and the component it approximates. SELECTED DRAWING: Figure 1

Description

The present invention relates to a document proofreading support device, a document proofreading support method, and a document proofreading support program.

Companies create large numbers of documents of many kinds every day. However, document creators often lack sufficient time to review their documents. Moreover, business documents must use wording that conveys specialized content accurately. In recent years, it has become common for computers to perform such reviews.

The document proofreading support device of Patent Document 1 extracts from a document the inappropriate descriptions that do not conform to predetermined rules and calculates the expected correction time required to correct them. It then outputs the inappropriate descriptions and their expected correction time. The device compares the correction time specified by the user with the expected correction time. If the expected correction time is shorter, the device targets all inappropriate descriptions for correction. If the expected correction time is longer, the device targets only the inappropriate descriptions with high importance.

Patent Document 1: JP 2017-41164 A

Manuscripts before proofreading may contain semantic errors as well as grammatical or formal errors such as typos, omitted characters, and inconsistent notation. The rules of Patent Document 1 detect only grammatical or formal errors. A separate measure was therefore needed to inform the user where a semantic error potentially exists.
Accordingly, an object of the present application is to detect locations in a document where a semantic error potentially exists.

A document proofreading support device according to the present invention includes three units. The first is a document analysis unit that determines whether a sentence contained in a document is an unknown sentence that may contain a semantic error, based on whether the sentence matches a predetermined rule. The second is an unknown sentence processing unit that estimates which of a plurality of components, defined according to the type of the document, the unknown sentence approximates. The third is a display processing unit that displays the unknown sentence and the component it approximates.
Other means are described in the description of embodiments.

According to the present invention, locations in a document where a semantic error potentially exists can be detected.

FIG. 1 is a diagram explaining the configuration of the document proofreading support device.
FIG. 2 shows an example of a document.
FIG. 3 shows an example of rule information.
FIG. 4 shows an example of matching sentence information.
FIG. 5 shows an example of unknown sentence information.
FIG. 6 shows an example of per-rule proofreading time information.
FIG. 7 shows an example of per-matching-sentence proofreading time information.
FIG. 8 shows an example of document type/component information.
FIG. 9 shows an example of a sentence space.
FIG. 10 shows an example of distance information.
FIG. 11 shows an example of per-component importance information.
FIG. 12 shows an example of score information.
FIG. 13 shows an example of score/proofreading time conversion information.
FIG. 14 shows an example of per-unknown-sentence proofreading time information.
FIG. 15 is a flowchart of the document analysis procedure.
FIG. 16 is a flowchart of the matching sentence processing procedure.
FIG. 17 is a flowchart of the unknown sentence processing procedure.
FIG. 18 shows an example of a proofreading time display screen.

Hereinafter, a mode for carrying out the present invention (referred to as "the present embodiment") is described in detail with reference to the drawings. The present embodiment is an example of extracting grammatical or formal errors and semantic errors from business documents. For purposes such as printing and bookbinding, a document creator's manuscript has its typographical errors or expressions corrected to produce a final manuscript. This work is generally called "proofreading". The present embodiment is also used for purposes other than printing and bookbinding. In those cases as well, the present embodiment uses the term "proofreading" to include "correction".

(Terms)
A document is an electronic file containing character strings. It is a manuscript before proofreading.
A sentence is one unit of the continuous character string in a document, delimited by the full stop "。". In the present embodiment, "text" (文章) and "sentence" (センテンス) are synonymous.
A rule is a specific criterion for detecting grammatical or formal errors in a sentence.

A matching sentence is a sentence containing a portion that matches a rule. A matching sentence contains a grammatical or formal error.
An unknown sentence is a sentence containing no portion that matches any rule. An unknown sentence may contain a semantic error. It is called "unknown" because whether it contains a semantic error is unknown. In that sense, an unknown sentence can be said to contain a potential proofreading location.
A matching sentence may also contain semantic errors. In the present embodiment, a matching sentence becomes an unknown sentence once it has been proofread and no longer contains grammatical or formal errors.

A document type is a category of documents, such as "estimate", "patent specification", "report", "specification", "minutes", or "approval document".
A component is an item that a document of a given type normally contains, and components are defined for each document type. For example, the components of an estimate are "process", "work cost", "travel cost", and "work content".

(Configuration of the Document Proofreading Support Device)
FIG. 1 is a diagram explaining the configuration of the document proofreading support device 1. The document proofreading support device 1 is a general-purpose computer. It includes a central control unit 11, an input device 12 such as a mouse and keyboard, an output device 13 such as a display, a main storage device 14, and an auxiliary storage device 15. These components are interconnected by a bus.

The auxiliary storage device 15 stores the following (details below):
document 31;
rule information 32;
matching sentence information 33;
unknown sentence information 34;
per-rule proofreading time information 35;
per-matching-sentence proofreading time information 36;
document type/component information 37;
distance information 38;
per-component importance information 39;
score information 40;
score/proofreading time conversion information 41;
per-unknown-sentence proofreading time information 42;
component estimation model 43.

Some of these items are created by the user and loaded into the auxiliary storage device 15 by the document proofreading support device 1. They are the document 31, the rule information 32, the per-rule proofreading time information 35, the document type/component information 37, the per-component importance information 39, the score/proofreading time conversion information 41, and the component estimation model 43. The document proofreading support device 1 creates the remaining items during processing. They are the matching sentence information 33, the unknown sentence information 34, the per-matching-sentence proofreading time information 36, the distance information 38, the score information 40, and the per-unknown-sentence proofreading time information 42.

The document analysis unit 21, matching sentence processing unit 22, unknown sentence processing unit 23, and display processing unit 24 in the main storage device 14 are programs. The central control unit 11 reads these programs from the auxiliary storage device 15 and loads them into the main storage device 14, which implements the function of each program (details below). The auxiliary storage device 15 may be configured independently of the document proofreading support device 1 (for example, in the cloud).

(Document)
FIG. 2 shows an example of the document 31. The document 31 in FIG. 2 is a development-related report from a manufacturer. It includes sentences SE01 and SE02. Two rules match sentence SE01 (reference numerals 51 and 52), so sentence SE01 is a matching sentence. No rule matches sentence SE02 (reference numeral 53), so sentence SE02 is an unknown sentence. The rules at reference numerals 51 to 53 are shown for explanation only and do not appear in the document 31 itself.

(Rule Information)
FIG. 3 shows an example of the rule information 32. The rule information 32 stores a rule ID (column 101), a rule (column 102), and an importance (column 103) in association with one another.
The rule ID (column 101) is an identifier that uniquely identifies a rule.
The rule (column 102) is a rule as defined above.
The importance (column 103) is a relative weight among the rules. The user sets the importance within the range 0 < importance ≤ 1.

(Matching Sentence Information)
FIG. 4 shows an example of the matching sentence information 33. The matching sentence information 33 stores a sentence ID (column 111), a matching sentence (column 112), a rule ID (column 113), and an importance (column 114) in association with one another.
The sentence ID (column 111) is an identifier that uniquely identifies a sentence. Here it identifies a matching sentence.
The matching sentence (column 112) is a matching sentence as defined above.
The rule ID (column 113) is the same as the rule ID in FIG. 3.
The importance (column 114) is the same as the importance in FIG. 3.
The matching sentence information 33 in FIG. 4 includes two records for sentence SE01. They correspond to rules 51 and 52 in FIG. 2.

(Unknown Sentence Information)
FIG. 5 shows an example of the unknown sentence information 34. The unknown sentence information 34 stores a sentence ID (column 121) and an unknown sentence (column 122) in association with each other.
The sentence ID (column 121) is an identifier that uniquely identifies a sentence. Here it identifies an unknown sentence.
The unknown sentence (column 122) is an unknown sentence as defined above.
The unknown sentence information 34 in FIG. 5 includes one record for sentence SE02. It corresponds to reference numeral 53 (no matching rule) in FIG. 2.
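The document analysis described above can be sketched as follows. It splits the document at "。", checks each sentence against every rule, and records the matching and unknown sentences. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes, hypothetically, that each rule can be written as a regular expression. The `Rule` class, the `split_sentences` and `analyze` functions, and the sample rule in the usage note are invented for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rule:
    rule_id: str
    pattern: str       # hypothetical: the rule written as a regular expression
    importance: float  # 0 < importance <= 1, as in the rule information


def split_sentences(document: str) -> List[str]:
    """Split a document into sentences delimited by the full stop '。'."""
    pieces = re.findall(r"[^。]+。?", document)
    return [p.strip() for p in pieces if p.strip()]


def analyze(document: str, rules: List[Rule]):
    """Return (matching sentence records, unknown sentence records)."""
    matching: List[Tuple[str, str, str, float]] = []  # sentence ID, sentence, rule ID, importance
    unknown: List[Tuple[str, str]] = []               # sentence ID, sentence
    for i, sentence in enumerate(split_sentences(document), start=1):
        sentence_id = f"SE{i:02d}"
        hits = [r for r in rules if re.search(r.pattern, sentence)]
        # one record per matched rule, like the two SE01 records in FIG. 4
        for r in hits:
            matching.append((sentence_id, sentence, r.rule_id, r.importance))
        if not hits:
            unknown.append((sentence_id, sentence))
    return matching, unknown
```

Take a hypothetical rule `Rule("R01", "をを", 0.8)` that flags a doubled particle. Applied to `"試験をを実施した。結果は良好である。"`, the sketch yields one matching record for SE01 and one unknown record for SE02.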

(Per-Rule Proofreading Time Information)
FIG. 6 shows an example of the per-rule proofreading time information 35. The per-rule proofreading time information 35 stores a rule ID (column 131) and a proofreading time (column 132) in association with each other.
The rule ID (column 131) is the same as the rule ID in FIG. 3.
The proofreading time (column 132) is the time required to proofread an error that matches the rule. The user sets the proofreading time in seconds based on past cases.

(Per-Matching-Sentence Proofreading Time Information)
FIG. 7 shows an example of the per-matching-sentence proofreading time information 36. The per-matching-sentence proofreading time information 36 stores a sentence ID (column 141) and a proofreading time (column 142) in association with each other.
The sentence ID (column 141) is the same as the sentence ID in FIG. 4.
The proofreading time (column 142) has the same meaning as the proofreading time in FIG. 6. Here, however, the proofreading times in FIG. 6 are totaled for each matching sentence. For example, the proofreading time "390" for sentence SE01 is the sum of "360" for rule R02 and "30" for rule R04 in FIG. 6.
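The per-sentence totals in FIG. 7 are a group-by sum over the matching sentence records. The sketch below assumes the records are (sentence ID, sentence, rule ID, importance) tuples and the per-rule proofreading times are a dictionary keyed by rule ID. The function name is hypothetical. The importance values in the usage line are placeholders, while the 360 and 30 seconds are the values the text gives for rules R02 and R04.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def proofreading_time_by_sentence(
    matching: Iterable[Tuple[str, str, str, float]],
    time_by_rule: Dict[str, int],
) -> Dict[str, int]:
    """Sum the per-rule proofreading times (seconds) for each matching sentence."""
    totals: Dict[str, int] = defaultdict(int)
    for sentence_id, _sentence, rule_id, _importance in matching:
        totals[sentence_id] += time_by_rule[rule_id]
    return dict(totals)


records = [("SE01", "...", "R02", 0.5), ("SE01", "...", "R04", 0.5)]  # placeholder importances
print(proofreading_time_by_sentence(records, {"R02": 360, "R04": 30}))  # {'SE01': 390}
```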

(Document Type/Component Information)
FIG. 8 shows an example of the document type/component information 37. The document type/component information 37 stores component 1 (column 152) to component 4 (column 155) in association with a document type (column 151).
The document type (column 151) is a document type as defined above. The user sets a plurality of document types according to his or her business.
Component 1 (column 152) to component 4 (column 155) are components as defined above. The user sets an arbitrary plurality of components for each document type. "KPI" stands for "key performance indicator".

FIG. 9 shows an example of a sentence space. The document proofreading support device 1 performs its processing using the sentence space, and that processing presupposes a sentence vector, which is defined as follows.

(Sentence Vector)
The document proofreading support device 1 converts each sentence, as a character string, into one sentence vector. The number of dimensions (elements) of the sentence vector equals the number of words in the word dictionary of the sentence's language. Each element of the sentence vector is, for example, the number of times the corresponding word appears in the sentence. Suppose the word dictionary consists of words a, b, c, d, and e. Suppose also that in a sentence, word a appears once, word b zero times, word c twice, word d zero times, and word e once. The sentence vector is then (1, 0, 2, 0, 1). This is a very simple example. The document proofreading support device 1 may use any method to create a more elaborate sentence vector that represents the semantic features of the sentence more accurately.
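The word-count sentence vector described above is a bag-of-words vector. The sketch below reproduces the worked example. It assumes the sentence has already been split into words. For Japanese text that would normally require a morphological analyzer, which is not shown here. The function name `sentence_vector` is hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def sentence_vector(words: Sequence[str], dictionary: Sequence[str]) -> Tuple[int, ...]:
    """Bag-of-words vector: element i counts occurrences of dictionary[i]."""
    counts = Counter(words)  # missing words count as 0
    return tuple(counts[w] for w in dictionary)


# The worked example from the text: a once, b zero times, c twice, d zero times, e once
print(sentence_vector(["c", "a", "e", "c"], ["a", "b", "c", "d", "e"]))  # (1, 0, 2, 0, 1)
```

The text allows richer vectors that capture meaning more accurately. This count vector is only the simplest case it describes.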

(Sentence Space)
The document proofreading support device 1 can plot sentence vectors as points in a sentence space 44. The number of dimensions of the sentence space 44 equals the number of dimensions of the sentence vector, and each axis represents the number of occurrences of a particular word. For each document type, the device collects a plurality of sample documents (training data) whose sentences are all known to be grammatically, formally, and semantically correct. It converts every sentence of each sample document into a sentence vector and plots it as "●" in the sentence space 44.

As a result, the document proofreading support device 1 creates a sentence space 44 for each document type. Each "●" in FIG. 9 corresponds to one sentence. The device classifies these points into clusters using a technique such as the k-means method. Experience shows that the resulting clusters 61a to 61d often correspond one-to-one to the components of that document type. The sentence space 44 is thus a space in which sentences serving as training data are classified into a plurality of clusters.
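The text names the k-means method only as an example clustering technique. The sketch below is a textbook Lloyd iteration in NumPy, not the patent's own implementation. The function names, the random initialization, and the iteration limit are assumptions for illustration. Each row of `points` stands for the sentence vector of one sample-document sentence.

```python
import numpy as np


def _nearest(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the closest center (Euclidean) for every row of X."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)


def kmeans(points, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means: returns (centers, labels)."""
    X = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k distinct sample points
    for _ in range(n_iter):
        labels = _nearest(X, centers)
        # move each center to the mean of its members (keep it if the cluster emptied)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, _nearest(X, centers)


# toy 2-D "sentence vectors" forming two well-separated groups
centers, labels = kmeans([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], k=2)
```

Choosing k equal to the number of components of the document type mirrors the empirical one-to-one correspondence the text describes. The text does not say how k is chosen.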

The document proofreading support device 1 converts an unknown sentence into a sentence vector and plots it as "○" in the sentence space 44. Some points "○" then fall into one of the clusters 61a to 61d, while others fall into none of them. The sentence at point 62a is classified into cluster 61a, so it describes the component "process" of the document type "estimate". The sentence at point 62b is not classified into any of the clusters 61a to 61d. It cannot be said to describe any component of the estimate. It is therefore highly likely to contain a semantic error, for example promotional wording unsuitable for an estimate. If at least one "○" falls into the cluster of every component of a document type, the document can be said to cover all required items. If even one cluster contains no "○", the document can be said to lack that component (item).
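The two checks in the paragraph above can be sketched as follows. The first assigns each unknown sentence to a cluster or to none. The second finds components that no unknown sentence covers. This sketch assumes, as one possible reading, that a point belongs to a cluster when its Euclidean distance to the cluster center is at most the cluster radius. The distance-information section below gives the radius as an example threshold. The function names, component names, and coordinates are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Cluster = Tuple[Sequence[float], float]  # (center, radius)


def assign(vector: Sequence[float], clusters: Dict[str, Cluster]) -> Optional[str]:
    """Return the component whose cluster contains the vector (nearest if several), else None."""
    best, best_d = None, math.inf
    for component, (center, radius) in clusters.items():
        d = math.dist(vector, center)
        if d <= radius and d < best_d:
            best, best_d = component, d
    return best


def missing_components(vectors: Iterable[Sequence[float]], clusters: Dict[str, Cluster]) -> List[str]:
    """Components whose cluster received no unknown-sentence vector."""
    covered = {assign(v, clusters) for v in vectors}
    return [c for c in clusters if c not in covered]


clusters = {"process": ((0.0, 0.0), 1.0), "work cost": ((5.0, 0.0), 1.0)}
print(assign((9.0, 9.0), clusters))  # None: fits no component, a candidate semantic error
```

A result of `None` corresponds to point 62b. A non-empty result from `missing_components` corresponds to a document that lacks a component.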

(Component Estimation Model)
The component estimation model 43 is a function that takes the sentence vector of a sentence in a document of a certain type. It outputs the distance in the sentence space 44 between that sentence vector ("○") and each component (the center of each cluster) of that document type. A component estimation model 43 exists for each document type. The model may also convert the unknown sentence into a sentence vector. At any time, the document proofreading support device 1 may use the latest training data to update the positions and sizes of the clusters 61a to 61d in the sentence space 44 and store them in the auxiliary storage device 15.
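The component estimation model can be pictured as an object that holds one document type's cluster centers and returns a distance per component. This is a minimal sketch. The class name, the optional `vectorize` callback (standing in for the sentence-to-vector conversion the model may perform), and the use of Euclidean distance are all assumptions. The centers in the usage line are toy values.

```python
import math
from typing import Callable, Dict, Optional, Sequence


class ComponentEstimationModel:
    """Hypothetical stand-in for the component estimation model of one document type."""

    def __init__(
        self,
        centers: Dict[str, Sequence[float]],
        vectorize: Optional[Callable[[str], Sequence[float]]] = None,
    ) -> None:
        self.centers = centers      # component name -> cluster center
        self.vectorize = vectorize  # optional sentence -> sentence vector conversion

    def distances(self, vector: Sequence[float]) -> Dict[str, float]:
        """Euclidean distance from the vector to every component's cluster center."""
        return {name: math.dist(vector, c) for name, c in self.centers.items()}

    def distances_for_sentence(self, sentence: str) -> Dict[str, float]:
        if self.vectorize is None:
            raise ValueError("this model was built without a vectorizer")
        return self.distances(self.vectorize(sentence))


# one model per document type
models = {"estimate": ComponentEstimationModel({"process": (0.0, 0.0), "work cost": (3.0, 4.0)})}
print(models["estimate"].distances((0.0, 0.0)))  # {'process': 0.0, 'work cost': 5.0}
```

Keeping the models in a dictionary keyed by document type reflects the statement that a separate model exists for each document type.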

(Distance Information)
FIG. 10 shows an example of the distance information 38. The distance information 38 stores the following in association with one another: a sentence ID (column 161), an unknown sentence (column 162), a process distance (column 163), a work cost distance (column 164), a travel cost distance (column 165), and a work content distance (column 166).
The sentence ID (column 161) is the same as the sentence ID in FIG. 5.
The unknown sentence (column 162) is the same as the unknown sentence in FIG. 5.

The process distance (column 163) is the distance in the sentence space 44 (FIG. 9) between the unknown sentence (shown as "○") and the center of cluster 61a. It may be a Euclidean distance, a Mahalanobis distance, or another distance. Suppose this distance exceeds a predetermined threshold, for example the radius of cluster 61a. The unknown sentence is then highly likely not to describe the component "process", at least. The same applies to the following distances.
The work cost distance (column 164) is the distance in the sentence space 44 between the unknown sentence and the center of cluster 61b.
The travel cost distance (column 165) is the distance in the sentence space 44 between the unknown sentence and the center of cluster 61c.
The work content distance (column 166) is the distance in the sentence space 44 between the unknown sentence and the center of cluster 61d.
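The two distances the text names can be computed as below, together with the threshold test. The Mahalanobis version estimates the cluster mean and covariance from the cluster's member vectors. It uses a pseudo-inverse because covariance matrices of sparse word-count vectors are often singular. That choice is an assumption of this sketch, not something the text specifies. The function names are hypothetical.

```python
import numpy as np


def euclidean(x, center) -> float:
    """Straight-line distance between a sentence vector and a cluster center."""
    return float(np.linalg.norm(np.asarray(x, float) - np.asarray(center, float)))


def mahalanobis(x, cluster_points) -> float:
    """Distance from x to a cluster, scaled by the cluster's own spread."""
    P = np.asarray(cluster_points, float)
    mu = P.mean(axis=0)
    cov = np.cov(P, rowvar=False)  # rows are observations
    inv = np.linalg.pinv(cov)      # pseudo-inverse in case cov is singular
    diff = np.asarray(x, float) - mu
    return float(np.sqrt(diff @ inv @ diff))


def likely_not_about(distance: float, threshold: float) -> bool:
    """True when the distance exceeds the threshold (e.g. the cluster radius)."""
    return distance > threshold


pts = [[-1, 0], [1, 0], [0, -1], [0, 1]]  # toy cluster members
print(euclidean((3, 4), (0, 0)))          # 5.0
```

For the toy cluster, each coordinate has a sample variance of 2/3, so the Mahalanobis distance of (2, 0) is √6 ≈ 2.449. The Euclidean distance is 2.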

The distance information 38 in FIG. 10 is for the document type "estimate". Distance information 38 for another document type, such as "patent specification", would use different columns. The process distance, work cost distance, travel cost distance, and work content distance would become the problem distance, solution distance, claim distance, and prior art distance, respectively.

(Per-Component Importance Information)
FIG. 11 shows an example of the per-component importance information 39. The per-component importance information 39 stores a component (column 171) and an importance (column 172) in association with each other.
The component (column 171) is a component as defined above.
The importance (column 172) is a relative weight among the components. The user sets the importance within the range 0 < importance ≤ 1. Alternatively, the document proofreading support device 1 may set the importance automatically. It would base the importance on the number of characters or keywords in the sentences belonging to each component of the sample documents.
The per-component importance information 39 exists for each document type.
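The text does not say how character counts become importances. One hypothetical scheme is sketched below. It divides each component's total character count in the sample documents by the largest total, which maps the values into (0, 1]. A small floor keeps every importance strictly positive. The function name and the `floor` parameter are inventions for this example.

```python
from typing import Dict, List


def importance_from_char_counts(
    sample_sentences: Dict[str, List[str]], floor: float = 0.01
) -> Dict[str, float]:
    """Scale each component's character count into (0, 1] relative to the largest one."""
    counts = {c: sum(len(s) for s in sents) for c, sents in sample_sentences.items()}
    top = max(counts.values())
    if top == 0:
        raise ValueError("sample sentences contain no characters")
    # the floor keeps every importance strictly positive, as 0 < importance <= 1 requires
    return {c: max(n / top, floor) for c, n in counts.items()}


print(importance_from_char_counts({"process": ["abcd"], "travel cost": ["ab"]}))
# {'process': 1.0, 'travel cost': 0.5}
```

Counting keywords instead of characters would only change how `counts` is computed.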

(Score Information)
FIG. 12 shows an example of the score information 40. The score information 40 stores the following in association with one another: a sentence ID (column 181), an unknown sentence (column 182), a process score (column 183), a work cost score (column 184), a travel cost score (column 185), and a work content score (column 186).
The sentence ID (column 181) is the same as the sentence ID in FIG. 5.
The unknown sentence (column 182) is the same as the unknown sentence in FIG. 5.

The process score (column 183) is the process distance in FIG. 10 multiplied by the importance in FIG. 11 for the process.
The work cost score (column 184) is the work cost distance in FIG. 10 multiplied by the importance in FIG. 11 for the work cost.
The travel cost score (column 185) is the travel cost distance in FIG. 10 multiplied by the importance in FIG. 11 for the travel cost.
The work content score (column 186) is the work content distance in FIG. 10 multiplied by the importance in FIG. 11 for the work content.
Score information 40 also exists for each document type. Multiplying the distance by the importance is only one example of how to obtain the score. The score may instead be calculated using addition, exponentiation, or the like. The only requirement is that the score grows as the distance grows and as the importance grows.
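The score calculation above, with the combining operation left pluggable, can be sketched as follows. Multiplication is the default, as in the text. Addition is shown as one alternative that also increases with both inputs. The function name and the sample distances and importances are illustrative only, not values from FIGS. 10 and 11.

```python
from typing import Callable, Dict


def component_scores(
    distances: Dict[str, float],
    importance: Dict[str, float],
    combine: Callable[[float, float], float] = lambda d, w: d * w,
) -> Dict[str, float]:
    """Score per component; the default combines distance and importance by multiplication."""
    return {c: combine(d, importance[c]) for c, d in distances.items()}


dist = {"process": 2.0, "work cost": 4.0}
imp = {"process": 0.5, "work cost": 1.0}
print(component_scores(dist, imp))                      # {'process': 1.0, 'work cost': 4.0}
print(component_scores(dist, imp, lambda d, w: d + w))  # {'process': 2.5, 'work cost': 5.0}
```

Any `combine` works as long as it grows with both distance and importance, which is the only requirement the text states.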

(Score/Proofreading Time Conversion Information)
FIG. 13 shows an example of the score/proofreading time conversion information 41. The score/proofreading time conversion information 41 stores a score (column 191) and a proofreading time (column 192) in association with each other.
The score (column 191) is, for example, the "process score" described above. More generally, it is the distance between a sentence vector and the center of a component's cluster, combined (by multiplication or the like) with that component's importance.
The proofreading time (column 192) is the time required to proofread the erroneous portion of an unknown sentence that corresponds to the score. The user sets the proofreading time in seconds based on past cases. The document proofreading support device 1 may update the proofreading time based on the time the user actually spent proofreading.
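The text does not specify how a score that falls between rows of the conversion table is converted. The sketch below assumes, hypothetically, that each row gives the upper bound of a score band. Scores above the last bound use the last row's time. The lookup uses binary search from Python's `bisect` module. The function name and table values are illustrative only.

```python
import bisect
from typing import List, Tuple


def score_to_seconds(score: float, table: List[Tuple[float, int]]) -> int:
    """Look up a proofreading time (seconds) for a score.

    Hypothetical band semantics: table rows are (upper score bound, seconds),
    sorted by bound; scores above the last bound use the last row's time.
    """
    bounds = [bound for bound, _ in table]
    i = bisect.bisect_left(bounds, score)  # first row whose bound >= score
    return table[min(i, len(table) - 1)][1]


table = [(1.0, 60), (2.0, 180), (3.0, 600)]  # illustrative values only
print([score_to_seconds(s, table) for s in (0.5, 1.0, 1.5, 5.0)])  # [60, 60, 180, 600]
```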

(Per-Unknown-Sentence Proofreading Time Information)
FIG. 14 shows an example of the per-unknown-sentence proofreading time information 42. The per-unknown-sentence proofreading time information 42 stores the following in association with one another:
sentence ID (column 201);
unknown sentence (column 202);
process score (column 203a) and process proofreading time (column 203b);
work cost score (column 204a) and work cost proofreading time (column 204b);
travel cost score (column 205a) and travel cost proofreading time (column 205b);
work content score (column 206a) and work content proofreading time (column 206b).

The sentence ID (column 201) is the same as the sentence ID in FIG. 5.
The unknown sentence (column 202) is the same as the unknown sentence in FIG. 5.
The process score (column 203a) is the same as the process score in FIG. 12.
The process proofreading time (column 203b) is the proofreading time obtained by converting the process score using the score/proofreading time conversion information 41 (FIG. 13).
The work cost score (column 204a) is the same as the work cost score in FIG. 12.
The work cost proofreading time (column 204b) is the proofreading time obtained by converting the work cost score using the score/proofreading time conversion information 41.

The travel expense score (column 205a) is the same as the travel expense score in FIG. 12.
The travel expense proofreading time (column 205b) is the proofreading time obtained by converting the travel expense score with the score/proofreading time conversion information 41.
The work content score (column 206a) is the same as the work content score in FIG. 12.
The work content proofreading time (column 206b) is the proofreading time obtained by converting the work content score with the score/proofreading time conversion information 41.

Because the unknown-sentence proofreading time information 42 stores the proofreading time as well as the score, the user can see how long it would take to proofread a given unknown sentence into a sentence describing each component.
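To make the table structure of FIG. 14 concrete, the following is a minimal Python sketch of one row, assuming the four quotation components used in this example. The class name `UnknownSentenceRecord` and the method `fastest_component` are hypothetical illustrations, not part of the source; the method simply picks the component with the shortest proofreading time, which is the information the user reads from this table.

```python
from dataclasses import dataclass, field


@dataclass
class UnknownSentenceRecord:
    """One row of the unknown-sentence proofreading time table (cf. FIG. 14).

    Hypothetical structure: per-component scores (columns 203a-206a) and
    proofreading times in seconds (columns 203b-206b) are kept in dicts
    keyed by component name.
    """

    sentence_id: str
    sentence: str
    scores: dict = field(default_factory=dict)
    proofreading_seconds: dict = field(default_factory=dict)

    def fastest_component(self) -> str:
        """Return the component whose proofreading time is shortest."""
        return min(self.proofreading_seconds, key=self.proofreading_seconds.get)


record = UnknownSentenceRecord(
    sentence_id="SE03",
    sentence="hypothetical sentence text",
    scores={"process": 1.2, "work cost": 3.4, "travel expense": 2.0, "work content": 2.8},
    proofreading_seconds={"process": 60, "work cost": 180, "travel expense": 120, "work content": 150},
)
print(record.fastest_component())  # process
```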

(Processing procedure)
The processing procedures of this embodiment are described below. There are three: the document analysis procedure, the matching sentence procedure, and the unknown sentence procedure.

(Document analysis processing procedure)
FIG. 15 is a flowchart of the document analysis processing procedure.
In step S301, the document analysis unit 21 of the document proofreading support apparatus 1 acquires a document. Specifically, the document analysis unit 21 acquires the document 31 from an external source via the input device 12, or from the auxiliary storage device 15.

In step S302, the document analysis unit 21 acquires character strings. Specifically, the document analysis unit 21 extracts the character strings from the document 31.
In step S303, the document analysis unit 21 divides the character strings into sentences. Specifically, the document analysis unit 21 splits the character strings into sentences, using the Japanese full stop "。" as the delimiter. At this point, the document analysis unit 21 may also perform morphological analysis (part-of-speech decomposition) and dependency analysis between words.
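Step S303 splits on the Japanese full stop. A minimal Python sketch of that split, assuming plain text input and leaving out the optional morphological and dependency analysis, could look like this; `split_sentences` is a hypothetical name. A trailing fragment without a closing "。" is kept as its own sentence rather than dropped.

```python
def split_sentences(text: str, delimiter: str = "。") -> list:
    """Split text into sentences at each delimiter, keeping the delimiter."""
    sentences = []
    current = []
    for char in text:
        current.append(char)
        if char == delimiter:
            sentence = "".join(current).strip()
            if sentence:
                sentences.append(sentence)
            current = []
    # Keep any text after the last delimiter (e.g. a sentence missing its "。").
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("作業期間は3日です。旅費は別途請求します。"))
```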

In step S304, the document analysis unit 21 checks the sentence against the rules. Specifically, first, the document analysis unit 21 takes any one of the unprocessed sentences.
Second, the document analysis unit 21 checks the sentence against each rule in the rule information 32 (FIG. 3) and identifies every rule that the sentence matches.
Third, the document analysis unit 21 counts the rules identified in the "second" substep of step S304. The count is 0, 1, 2, 3, and so on.
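Steps S304 and S305 amount to counting matched rules and branching on a zero count. The sketch below assumes, purely for illustration, that each rule in the rule information 32 can be expressed as a regular expression; the rule IDs and patterns are hypothetical, since the contents of FIG. 3 are not reproduced in this section.

```python
import re

# Hypothetical rules; the actual contents of the rule information 32 (FIG. 3) are not shown here.
RULES = {
    "R01": re.compile(r"のの"),          # duplicated particle
    "R02": re.compile(r"見れる|来れる"),  # "ra-nuki" verb form
}


def matching_rules(sentence: str, rules=RULES) -> list:
    """Step S304: return the IDs of all rules the sentence matches."""
    return [rule_id for rule_id, pattern in rules.items() if pattern.search(sentence)]


def is_unknown(sentence: str, rules=RULES) -> bool:
    """Step S305: a sentence that matches zero rules is an unknown sentence."""
    return len(matching_rules(sentence, rules)) == 0


print(matching_rules("資料のの内容は明日見れる。"))  # ['R01', 'R02']
print(is_unknown("作業期間は3日です。"))            # True
```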

In step S305, the document analysis unit 21 determines whether the sentence matches any rule. Specifically, if the count obtained in the "third" substep of step S304 is 0 (step S305 "NO"), the document analysis unit 21 proceeds to step S307; otherwise (step S305 "YES"), it proceeds to step S306.

In step S306, the document analysis unit 21 registers the sentence in the matching sentence information 33 (FIG. 4). Specifically, the document analysis unit 21 creates a record for the sentence being processed in the matching sentence information 33.

In step S307, the document analysis unit 21 registers the sentence in the unknown sentence information 34 (FIG. 5). Specifically, the document analysis unit 21 creates a record for the sentence being processed in the unknown sentence information 34. In other words, when a sentence included in the document 31 matches none of the predetermined rules in step S305, the document analysis unit 21 determines in step S307 that the sentence is an unknown sentence that may contain a semantic error.

The document analysis unit 21 repeats the processing from step S304 onward for each unprocessed sentence and ends the document analysis processing procedure after step S306 or S307 for the last sentence. When the procedure ends, every sentence included in the document 31 acquired in step S301 has been sorted into, and stored in, either the matching sentence information 33 (FIG. 4) or the unknown sentence information 34 (FIG. 5).
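The loop over steps S304 to S307 can be sketched as a single pass that sorts sentences into the two tables. In this hypothetical Python sketch, the rule matcher is passed in as a function returning the matched rule IDs, and the `SE01`-style IDs merely imitate sentence IDs such as SE03 used in FIG. 18; neither the function name nor the ID format comes from the source.

```python
from typing import Callable, Dict, List, Tuple


def classify_sentences(
    sentences: List[str], match: Callable[[str], List[str]]
) -> Tuple[Dict[str, Tuple[str, List[str]]], Dict[str, str]]:
    """Sort sentences into matching (step S306) and unknown (step S307) tables."""
    matched: Dict[str, Tuple[str, List[str]]] = {}
    unknown: Dict[str, str] = {}
    for index, sentence in enumerate(sentences, start=1):
        sentence_id = f"SE{index:02d}"  # hypothetical ID format
        rule_ids = match(sentence)
        if rule_ids:  # count >= 1: step S305 "YES"
            matched[sentence_id] = (sentence, rule_ids)
        else:  # count == 0: step S305 "NO"
            unknown[sentence_id] = sentence
    return matched, unknown


matched, unknown = classify_sentences(
    ["明日見れる。", "工程は3日です。"],
    match=lambda s: ["R02"] if "見れる" in s else [],
)
print(matched)  # {'SE01': ('明日見れる。', ['R02'])}
print(unknown)  # {'SE02': '工程は3日です。'}
```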

(Matching sentence processing procedure)
FIG. 16 is a flowchart of the matching sentence processing procedure.
In step S321, the matching sentence processing unit 22 of the document proofreading support apparatus 1 acquires a matching sentence. Specifically, the matching sentence processing unit 22 takes any unprocessed matching sentence from the matching sentence information 33 (FIG. 4).

In step S322, the matching sentence processing unit 22 acquires proofreading times based on the rules. Specifically, the matching sentence processing unit 22 obtains, from the rule-by-rule proofreading time information 35 (FIG. 6), the proofreading time of every rule that the sentence acquired in step S321 matches.

In step S323, the matching sentence processing unit 22 totals the proofreading time for each sentence. Specifically, the matching sentence processing unit 22 sums the proofreading times acquired in step S322.

In step S324, the matching sentence processing unit 22 registers the total in the matching-sentence proofreading time information 36 (FIG. 7). Specifically, the matching sentence processing unit 22 creates a record for the sentence being processed in the matching-sentence proofreading time information 36.
The matching sentence processing unit 22 repeats steps S321 to S324 for each unprocessed matching sentence. When no unprocessed matching sentences remain, the matching sentence processing procedure ends.
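Steps S322 to S324 are a lookup followed by a sum. A minimal sketch, assuming the rule-by-rule proofreading time information 35 can be treated as a mapping from rule ID to seconds; the rule IDs, times, and function names below are hypothetical, not the values of FIG. 6.

```python
# Hypothetical rule-by-rule proofreading times in seconds (cf. FIG. 6).
RULE_SECONDS = {"R01": 30, "R02": 45}


def matched_sentence_seconds(rule_ids, rule_seconds=RULE_SECONDS) -> int:
    """Steps S322-S323: look up each matched rule's time and total them."""
    return sum(rule_seconds[rule_id] for rule_id in rule_ids)


def matched_time_table(matched, rule_seconds=RULE_SECONDS) -> dict:
    """Step S324: one total per matching sentence ID.

    `matched` maps sentence ID -> (sentence, list of matched rule IDs).
    """
    return {
        sentence_id: matched_sentence_seconds(rule_ids, rule_seconds)
        for sentence_id, (_sentence, rule_ids) in matched.items()
    }


print(matched_time_table({"SE21": ("資料のの内容は明日見れる。", ["R01", "R02"])}))  # {'SE21': 75}
```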

(Unknown sentence processing procedure)
FIG. 17 is a flowchart of the unknown sentence processing procedure.
In step S341, the unknown sentence processing unit 23 of the document proofreading support apparatus 1 acquires an unknown sentence. Specifically, the unknown sentence processing unit 23 takes any unprocessed unknown sentence from the unknown sentence information 34 (FIG. 5).

In step S342, the unknown sentence processing unit 23 accepts the document type. Specifically, first, the unknown sentence processing unit 23 displays the document 31 acquired in step S301 on the output device 13.
Second, the unknown sentence processing unit 23 accepts the document type entered by the user via the input device 12. The user looks at the document 31 and decides which document type to enter. For convenience of explanation, assume here that "quotation" has been entered. Instead of waiting for user input, the unknown sentence processing unit 23 may determine the document type automatically, for example, based on the title of the document 31.

In step S343, the unknown sentence processing unit 23 creates a sentence vector. Specifically, the unknown sentence processing unit 23 converts the sentence acquired in step S341 into a sentence vector by the method described above.

In step S344, the unknown sentence processing unit 23 creates the sentence space 44. Specifically, first, the unknown sentence processing unit 23 creates the sentence space 44 of FIG. 9 and forms a plurality of clusters using sample quotation documents as learning data (●). Each cluster corresponds to one of components 1 to 4 in the document type/component information 37 (FIG. 8). A cluster here may be the smallest sphere enclosing all the ● classified into it, or a sphere centered on the centroid of all those ● whose radius is the distance from the centroid to the farthest ●. The unknown sentence processing unit 23 may complete this processing in advance at any time.
Second, the unknown sentence processing unit 23 plots the sentence vector (○) created in step S343 in the sentence space 44.
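Of the two cluster shapes mentioned, the centroid-based sphere is the simpler to compute. The sketch below builds it from sample sentence vectors (the ● points), one sphere per component; the smallest-enclosing-sphere variant is not shown. Plain tuples of floats stand in for sentence vectors, and the component names, coordinates, and function names are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of equal-length vectors."""
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))


def build_clusters(samples_by_component):
    """Map each component to (center, radius).

    Center = centroid of its sample vectors; radius = distance from the
    centroid to the farthest sample, so every sample lies inside the sphere.
    """
    clusters = {}
    for component, points in samples_by_component.items():
        center = centroid(points)
        radius = max(math.dist(center, p) for p in points)
        clusters[component] = (center, radius)
    return clusters


clusters = build_clusters(
    {"process": [(0.0, 0.0), (2.0, 0.0)], "travel expense": [(5.0, 5.0), (5.0, 7.0)]}
)
print(clusters)  # {'process': ((1.0, 0.0), 1.0), 'travel expense': ((5.0, 6.0), 1.0)}
```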

In step S345, the unknown sentence processing unit 23 determines whether the unknown sentence describes a component. Specifically, first, the unknown sentence processing unit 23 checks whether the ○ plotted in the "second" substep of step S344 lies inside any cluster.
Second, if the ○ lies inside some cluster (step S345 "YES"), the unknown sentence processing unit 23 proceeds to step S346; otherwise (step S345 "NO"), it proceeds to step S347.
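The test in step S345 reduces to a point-in-sphere check against each cluster. A minimal sketch, assuming clusters are stored as (center, radius) pairs as in a centroid-based construction; the function name and example cluster values are hypothetical. A result other than `None` corresponds to step S346 (score and proofreading time set to 0), and `None` corresponds to proceeding to step S347.

```python
import math


def containing_cluster(vector, clusters):
    """Step S345: return the first component whose sphere contains the vector, else None.

    `clusters` maps component name -> (center, radius).
    """
    for component, (center, radius) in clusters.items():
        if math.dist(vector, center) <= radius:
            return component
    return None


clusters = {"process": ((0.0, 0.0), 1.0), "travel expense": ((5.0, 0.0), 1.0)}
print(containing_cluster((0.5, 0.0), clusters))  # process
print(containing_cluster((3.0, 0.0), clusters))  # None
```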

In step S346, the unknown sentence processing unit 23 sets the score and proofreading time to 0. Specifically, the unknown sentence processing unit 23 treats the score and proofreading time of the unknown sentence acquired in step S341 as 0. In this case, the unknown sentence processing unit 23 judges that the unknown sentence needs no proofreading, because it describes one of the components normally included in a quotation.

In step S347, the unknown sentence processing unit 23 calculates distances. Specifically, first, the unknown sentence processing unit 23 inputs the sentence vector created in step S343 into the component estimation model 43 for quotations. The component estimation model 43 then outputs the distance in the sentence space 44 between the unknown sentence (○) and the center of each cluster, and the unknown sentence processing unit 23 receives these distances.
Second, the unknown sentence processing unit 23 creates a record in the distance information 38 (FIG. 10) based on the distances received in the "first" substep of step S347. Thus, in step S347, the unknown sentence processing unit 23 estimates which of the plurality of components defined according to the type of the document 31 the unknown sentence approximates.
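In the source, the distances come from the component estimation model 43. The sketch below substitutes plain Euclidean distance to each cluster center, which is only a stand-in for that model; the function names, component names, and coordinates are hypothetical. The component with the smallest distance is the one later reported as the closest component.

```python
import math


def component_distances(vector, centers):
    """Step S347 (stand-in): Euclidean distance from the sentence vector to each cluster center."""
    return {component: math.dist(vector, center) for component, center in centers.items()}


def closest_component(distances):
    """Component with the shortest distance."""
    return min(distances, key=distances.get)


distances = component_distances((1.0, 0.0), {"process": (0.0, 0.0), "work cost": (4.0, 0.0)})
print(distances)                     # {'process': 1.0, 'work cost': 3.0}
print(closest_component(distances))  # process
```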

In step S348, the unknown sentence processing unit 23 calculates scores. Specifically, first, the unknown sentence processing unit 23 multiplies the process distance in the record created in the "second" substep of step S347 by the importance for the process in FIG. 11 to obtain the process score. The unknown sentence processing unit 23 calculates the work cost score, travel expense score, and work content score in the same way.
Second, the unknown sentence processing unit 23 creates a record in the score information 40 (FIG. 12) based on the scores calculated in the "first" substep of step S348.
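Step S348 multiplies each distance by the importance of the corresponding component (FIG. 11). A minimal sketch with hypothetical importance values and a hypothetical function name; because both factors enter as a product, a component that is far away or highly important receives a high score.

```python
def component_scores(distances, importance):
    """Step S348: score = distance to the cluster center x importance of the component."""
    return {component: distances[component] * importance[component] for component in distances}


scores = component_scores(
    {"process": 2.0, "travel expense": 1.5},
    {"process": 3.0, "travel expense": 2.0},  # hypothetical importance values
)
print(scores)  # {'process': 6.0, 'travel expense': 3.0}
```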

In step S349, the unknown sentence processing unit 23 calculates proofreading times. Specifically, the unknown sentence processing unit 23 applies the score/proofreading time conversion information 41 of FIG. 13 to the process score in the record created in the "second" substep of step S348 to obtain the process proofreading time. The unknown sentence processing unit 23 calculates the work cost, travel expense, and work content proofreading times in the same way.
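This section does not spell out the exact form of the score/proofreading time conversion information 41, only that it pairs a score (column 191) with a time in seconds (column 192). One plausible reading is a step table; the sketch below assumes each row gives an upper score bound and uses the first row whose bound is at least the score. The bounds, times, and function name are hypothetical.

```python
import bisect

# Hypothetical conversion table (cf. FIG. 13): ascending upper score bounds and times in seconds.
SCORE_BOUNDS = [1.0, 3.0, 6.0, 10.0]
SECONDS = [30, 60, 120, 300]


def score_to_seconds(score, bounds=SCORE_BOUNDS, seconds=SECONDS):
    """Step S349: time of the first row whose bound is >= score.

    Scores above the last bound fall back to the last row.
    """
    index = bisect.bisect_left(bounds, score)
    return seconds[min(index, len(seconds) - 1)]


print([score_to_seconds(s) for s in (0.5, 3.0, 4.5, 50.0)])  # [30, 60, 120, 300]
```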

In step S350, the unknown sentence processing unit 23 registers the results in the unknown-sentence proofreading time information 42 (FIG. 14). Specifically, the unknown sentence processing unit 23 creates a record in the unknown-sentence proofreading time information 42 based on the scores and proofreading times obtained in steps S346, S348, and S349.

In step S351, the display processing unit 24 of the document proofreading support apparatus 1 displays the proofreading times. Specifically, the display processing unit 24 uses the records created in steps S324 and S350 to display the proofreading time display screen 71 (FIG. 18) on the output device 13. The unknown sentence processing procedure then ends.

FIG. 18 is an example of the proofreading time display screen 71. The matching sentence field 72 shows the proofreading time and importance for the matching sentences of the document 31; as a rule, these are shown for each rule that a matching sentence matches. The unknown sentence field 73 shows the score and proofreading time for the unknown sentences of the document 31; as a rule, these are shown for each unknown sentence and each component. Suppose the user enters check marks in the selection boxes of some records in the matching sentence field 72 and the unknown sentence field 73. The display processing unit 24 then displays the document 31 in the document field 74 and highlights (for example, underlines) the selected sentences. The document 31 shown here differs from the document 31 of FIG. 2.

In the document field 74, sentence SE03 is an unknown sentence. The display processing unit 24 attaches a balloon 75 to sentence SE03. The balloon 75 reads "Closest component: process", which indicates that, among the distances between sentence SE03 and the components, the "process distance" is the shortest.

In this case, for example, the following can be assumed.
・The document creator intended to write sentence SE03 about the process, but a slight lapse of attention very likely caused sentence SE03 to contain a semantic error.
・It is also quite possible that the document creator wrote sentence SE03 about a matter unrelated to any component. Such a sentence can still be revised into a description of some component. In that case, because the unknown sentence SE03 is closest to the process, revising it into a sentence about the process takes the shortest proofreading time, unless the importance of the process is extremely large.

The display processing unit 24 attaches a balloon 76 to sentence SE21. The balloon 76 lists the two rules that sentence SE21 matches. The display processing unit 24 accepts from the user, or obtains from the user's schedule information or the like, the time that the user (the document creator or the person in charge of proofreading) can spend proofreading the document 31, and displays it as the available time 77. The display processing unit 24 displays, as the predicted proofreading time 78, the time required to proofread all sentences in the document 31, or only the sentences whose check marks have been entered (the sum of the proofreading times described above). The display processing unit 24 may store in the auxiliary storage device 15 the result of the user's proofreading of sentences in the document field 74.
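The predicted proofreading time 78 is described as a sum of individual proofreading times, which the user can compare with the available time 77. A minimal sketch, assuming each checked row contributes one time in seconds; the function names and values are hypothetical.

```python
def predicted_seconds(selected_times):
    """Predicted proofreading time 78: sum of the times of the selected rows."""
    return sum(selected_times)


def fits_available_time(selected_times, available_seconds):
    """True if the predicted time does not exceed the available time 77."""
    return predicted_seconds(selected_times) <= available_seconds


print(predicted_seconds([75, 60, 0]))      # 135
print(fits_available_time([75, 60], 120))  # False
```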

The display processing unit 24 may display, anywhere on the proofreading time display screen 71, the unknown sentences determined in step S345 not to require proofreading.

(Effects of this embodiment)
The document proofreading support apparatus of this embodiment has the following effects.
(1) The document proofreading support apparatus can display an unknown sentence that may contain a semantic error, together with the document component that the unknown sentence approximates.
(2) The document proofreading support apparatus can quantify how closely an unknown sentence approximates each component of the document as a distance in the sentence space.
(3) The document proofreading support apparatus can accurately estimate the component of an unknown sentence by converting the unknown sentence into a sentence vector.

(4) The document proofreading support apparatus can update the positions and sizes of the clusters by updating the learning data.
(5) The document proofreading support apparatus can accurately identify unknown sentences that do not need proofreading.
(6) The document proofreading support apparatus can reflect the importance of each component in the distance.
(7) The document proofreading support apparatus can display the time required to proofread an unknown sentence.

The present invention is not limited to the embodiment described above and includes various modifications. For example, the embodiment above has been described in detail to explain the present invention clearly, and the invention is not necessarily limited to one having all of the described configurations. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. Furthermore, for part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

1 Document proofreading support apparatus
11 Central control device
12 Input device
13 Output device
14 Main storage device
15 Auxiliary storage device
21 Document analysis unit
22 Matching sentence processing unit
23 Unknown sentence processing unit
24 Display processing unit
31 Document
32 Rule information
33 Matching sentence information
34 Unknown sentence information
35 Rule-by-rule proofreading time information
36 Matching-sentence proofreading time information
37 Document type/component information
38 Distance information
39 Component importance information
40 Score information
41 Score/proofreading time conversion information
42 Unknown-sentence proofreading time information
43 Component estimation model
44 Sentence space
71 Proofreading time display screen

Claims

1. A document proofreading support device comprising:
a document analysis unit that determines whether a sentence contained in a document is an unknown sentence that may contain a semantic error, based on whether the sentence matches a predetermined rule;
an unknown sentence processing unit that estimates which of a plurality of components defined according to the type of the document the unknown sentence approximates; and
a display processing unit that displays the unknown sentence and the component that the unknown sentence approximates.

2. The document proofreading support device according to claim 1, wherein the unknown sentence processing unit uses a component estimation model that receives as input an unknown sentence that does not match the predetermined rule and outputs distances between the unknown sentence and each of the plurality of components.

3. The document proofreading support device according to claim 2, wherein the unknown sentence processing unit converts the unknown sentence into a sentence vector, inputs the converted sentence vector into the component estimation model, and obtains the distances from the component estimation model.

4. The document proofreading support device according to claim 3, wherein the component estimation model calculates the distance between the converted sentence vector and each of a plurality of clusters in a space in which sentences serving as learning data are classified into the plurality of clusters.

5. The document proofreading support device according to claim 4, wherein the unknown sentence processing unit determines that the unknown sentence does not need to be proofread when the unknown sentence is classified into any one of the plurality of clusters.

6. The document proofreading support device according to claim 5, wherein the unknown sentence processing unit calculates a score for each component based on the distance and an importance defined for each component, and the display processing unit displays the calculated score in association with the unknown sentence.

7. The document proofreading support device according to claim 6, wherein the unknown sentence processing unit converts the score into a time required for proofreading, and the display processing unit displays the converted time in association with the unknown sentence.

8. A document proofreading support method performed by a document proofreading support device, the method comprising:
determining, by a document analysis unit of the document proofreading support device, whether a sentence contained in a document is an unknown sentence that may contain a semantic error, depending on whether the sentence matches a predetermined rule;
estimating, by an unknown sentence processing unit of the document proofreading support device, which of a plurality of components defined according to the type of the document the unknown sentence approximates; and
displaying, by a display processing unit of the document proofreading support device, the unknown sentence and the component that the unknown sentence approximates.

9. A document proofreading support program for causing a computer to function as:
a document analysis unit that determines whether a sentence contained in a document is an unknown sentence that may contain a semantic error, based on whether the sentence matches a predetermined rule;
an unknown sentence processing unit that estimates which of a plurality of components defined according to the type of the document the unknown sentence approximates; and
a display processing unit that displays the unknown sentence and the component that the unknown sentence approximates.