JP5312531B2 - Text association system and text correspondence program

Info

Publication number: JP5312531B2
Application number: JP2011159799A
Authority: JP
Inventors: 禎崇古宮; 孝道秋間
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2011-07-21
Filing date: 2011-07-21
Publication date: 2013-10-09
Anticipated expiration: 2028-04-15
Also published as: JP2011233164A

Abstract

PROBLEM TO BE SOLVED: To provide a sentence association system and a sentence association program that associate sentences related to business operations across a plurality of sentences.

SOLUTION: A sentence association system 100 associates each actual business sentence contained in an actual business sentence DB 22 with a basic sentence contained in a basic sentence DB 21. To do so, the system calculates a sentence similarity that measures the degree to which words contained in the basic sentence match words contained in the actual business sentence. Each actual business sentence is then associated with the basic sentence that gives it the maximum sentence similarity. Sentences whose association has been completed are accumulated as part of the basic sentences, which allows the system to learn from them.

COPYRIGHT: (C)2012, JPO&INPIT

Description

The present invention relates to a system for associating sentences with one another, and more particularly to a system that handles sentences related to business operations.

Various design methods are known for optimizing the content and procedures of business operations and the configuration of information systems in organizations such as local governments and companies. For example, EA (Enterprise Architecture) organizes the current state (AsIs) and the ideal state (ToBe) of business operations and systems, and then implements, step by step, improvement measures aimed at realizing the ideal state, with the goal of standardizing business operations and making them more efficient.
For local government operations, for example, the Ministry of Internal Affairs and Communications, as the coordinating body, has created a local government EA reference model as a basis for unifying operations, and presents it to local governments with guidance to improve their operations. Local governments carry out EA analysis using this reference model as a template, with the aim of unifying their business flows. The format of this reference model is disclosed on the Internet in Non-Patent Document 1.

To examine how to optimize the current state with such a method, the correspondence between the current state and the ideal state must be clearly understood. For this reason, in the analysis of business operations in EA, for example, the current operations and the ideal operations are each described in sentences, and the sentences are then associated with one another.

One example of a technique applicable to such association between sentences is the search system described in Patent Document 1. This search system associates an arbitrarily input example document with one of the documents registered in a database.
An example of a technique that classifies data representing general knowledge, rather than sentences, and learns from the classification results is the knowledge processing system described in Patent Document 2.

Patent Document 1: JP 2003-281186 A
Patent Document 2: JP-A-5-143342

Non-Patent Document 1: Ministry of Internal Affairs and Communications, "Local Government EA Business/System Renewal Guide", [online], 2006, Ministry of Internal Affairs and Communications, [retrieved March 27, 2008], Internet <URL:http://www.soumu.go.jp/denshijiti/system_tebiki/hyouki/gyomu/2a-4-yokenteigi.html>

However, the conventional techniques cannot be used to build a system that associates business-related sentences across a plurality of sentences.
For example, Patent Document 1 does not state that a plurality of example documents can be handled.
The technique of Patent Document 2 requires input data in a suitable format, as described in its paragraph [0054] and elsewhere. That is, converting natural-language sentences that describe business operations into a format free of ambiguity requires work by an administrator with advanced knowledge. It is therefore difficult to apply the technique of Patent Document 2 to the processing of sentences that describe business operations.

The present invention has been made to solve these problems, and its object is to provide a sentence association system and a sentence association program that associate business-related sentences across a plurality of sentences.

To solve the above problems, a sentence association system according to the present invention is a sentence association system that associates sentences with one another, and comprises: storage means for storing a plurality of basic sentences that serve as the basis of the association and a plurality of actual business sentences that are to be associated with the basic sentences; and computing means for associating each actual business sentence with one of the basic sentences. Each actual business sentence includes a business number for identifying that actual business sentence and an actual business character string representing the processing content of the work corresponding to that actual business sentence. Each basic sentence includes a number for identifying that basic sentence and a basic character string representing the processing content of the work corresponding to that basic sentence. The computing means identifies the words contained in each basic character string and each actual business character string. The computing means performs the association, using the business number and the number for identifying the basic sentence, based on the degree to which basic words contained in the basic character string match actual business words contained in the actual business character string. The computing means creates output data, and the output data indicates that a basic sentence and the actual business sentences associated with it are to be output side by side, left and right.

The computing means may perform the association based on a weight defined for each word.
The computing means may perform morphological analysis on each actual business character string and each basic character string to obtain the actual business words and the basic words; calculate, for each combination of an actual business character string and a basic character string, the number of times each actual business word matches a basic word or a synonym of it; multiply, for each combination, the number of matches by the weight defined for the matched basic word to obtain the degree of matching of each actual business word; calculate, for each combination, the sum of the degrees of matching of all actual business words and, based on that sum, calculate the degree of matching of the combination as a sentence similarity; and associate each actual business character string either with the basic sentence that gives the largest sentence similarity or with every basic sentence whose sentence similarity is equal to or greater than a threshold.
The computing means may set the weight of a basic word to 1 if the basic word is a noun and to 0 otherwise, and the storage means may store a dictionary file that defines synonyms for a plurality of basic words.
The computing means may perform the association based on a character string representing the name of information that is an input or output of a business operation and on the clause number of a law related to the business operation.
The storage means may store one basic sentence and all the actual business sentences associated with it in the association, accumulated into a single basic sentence.
The actual business sentences may represent the content of local government operations.
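The scoring and association steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the sentences are assumed to be already split into words (morphological analysis is not shown), the names `BaseWord`, `sentence_similarity`, and `associate` are hypothetical, and the total of the per-word degrees of matching is used directly as the sentence similarity, since the text only says the similarity is calculated "based on" that total.

```python
from dataclasses import dataclass, field


@dataclass
class BaseWord:
    """One word of a basic sentence (hypothetical structure)."""
    text: str
    weight: int                                   # e.g. 1 for nouns, 0 otherwise
    synonyms: list = field(default_factory=list)  # synonyms from the dictionary


def sentence_similarity(business_words, base_words):
    """Sum over all actual business words of (number of matches x weight)."""
    total = 0
    for word in business_words:
        for base in base_words:
            # A match is either the basic word itself or one of its synonyms.
            if word == base.text or word in base.synonyms:
                total += base.weight
    return total


def associate(business, bases, threshold=None):
    """Map each actual business sentence ID to a list of basic sentence IDs.

    With threshold=None, the single basic sentence with the largest
    similarity is chosen (ties go to the first one encountered, which is
    an assumption). Otherwise every basic sentence whose similarity is at
    or above the threshold is returned.
    """
    result = {}
    for biz_id, biz_words in business.items():
        scores = {base_id: sentence_similarity(biz_words, words)
                  for base_id, words in bases.items()}
        if threshold is None:
            result[biz_id] = [max(scores, key=scores.get)]
        else:
            result[biz_id] = [b for b, s in scores.items() if s >= threshold]
    return result


# Illustrative data only; the words are not taken from the figures.
bases = {
    (1, 1, 0): [BaseWord("tax", 1), BaseWord("notice", 1, ["notification"]),
                BaseWord("send", 0)],
    (1, 2, 0): [BaseWord("payment", 1), BaseWord("receipt", 1)],
}
business = {
    (1, 1, 0): ["tax", "notification", "print"],
    (1, 2, 0): ["payment", "receipt", "tax"],
}
print(associate(business, bases))
print(associate(business, bases, threshold=1))
```

Passing a `threshold` switches from the "largest similarity" rule to the "all basic sentences at or above the threshold" rule, so one actual business sentence can be associated with several basic sentences.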

A sentence association program according to the present invention causes a computer to function as the sentence association system described above.

The sentence association system according to the present invention is a sentence association system that associates sentences with one another, and comprises storage means for storing a plurality of basic sentences that serve as the basis of the association and a plurality of actual business sentences that are to be associated with the basic sentences, and computing means for associating each actual business sentence with one of the basic sentences. Each actual business sentence includes a business number for identifying that actual business sentence and an actual business character string representing the processing content of the corresponding work, and each basic sentence includes a number for identifying that basic sentence and a basic character string representing the processing content of the corresponding work. The computing means identifies the words contained in each basic character string and each actual business character string, performs the association using the business number and the number for identifying the basic sentence, based on the degree to which the basic words match the actual business words, and creates output data indicating that a basic sentence and the actual business sentences associated with it are to be output side by side, left and right. With this configuration, actual business sentences whose association with a basic sentence has been completed can be accumulated and stored as part of that basic sentence, the system can learn from them, and sentences related to the operations of an organization can be associated across a plurality of sentences.
In addition, because the sentence association program according to the present invention causes a computer to function as the sentence association system described above, sentences related to the operations of an organization can be associated across a plurality of sentences.

FIG. 1 is a diagram showing the configuration of a sentence association system according to the present invention.
FIG. 2 is a diagram showing an example of the configuration of the basic sentence DB of FIG. 1.
FIG. 3 is a diagram showing an example of the table from which the basic sentence DB of FIG. 2 is derived.
FIG. 4 is a diagram showing an example of the configuration of the actual business sentence DB of FIG. 1.
FIG. 5 is a diagram showing an example of the table from which the actual business sentence DB of FIG. 4 is derived.
FIG. 6 is a diagram showing an example of the configuration of the basic word DB of FIG. 1.
FIG. 7 is a diagram showing an example of the configuration of the basic word extension DB of FIG. 1.
FIG. 8 is a diagram showing an example of the configuration of the actual business word DB of FIG. 1.
FIGS. 9 to 12 are diagrams each showing an example of the configuration of the matching detail DB of FIG. 1.
FIG. 13 is a diagram showing an example of the configuration of the aggregation DB of FIG. 1.
FIG. 14 is a diagram showing an example of the configuration of the association result DB of FIG. 1.
FIG. 15 is a diagram schematically showing the correspondence of FIG. 14.
FIG. 16 is a flowchart showing the flow of processing when the sentence association system of FIG. 1 creates the basic word DB and the basic word extension DB.
FIG. 17 is a flowchart showing the flow of processing when the sentence association system of FIG. 1 associates basic sentences with actual business sentences.
FIG. 18 is a diagram showing an example of the output of association results by the sentence association system of FIG. 1.
FIG. 19 is a diagram showing an example of the configuration of the basic sentence DB updated as a result of the accumulation in FIG. 1.
FIG. 20 is a diagram explaining an overview of the operation of the sentence association system of FIG. 1.
FIG. 21 is a diagram showing an example of the configuration of the actual business law DB according to Embodiment 2 of the present invention.
FIG. 22 is a diagram showing an example of the configuration of the basic input DB according to Embodiment 2 of the present invention.

Embodiments of the present invention are described below with reference to the accompanying drawings.
Embodiment 1.
FIG. 1 shows the configuration of a sentence association system 100 according to the present invention. The sentence association system 100 associates sentences with one another based on a plurality of sentences input from outside.
The sentence association system 100 is configured as a well-known computer and includes computing means 10 that performs computation and storage means 20 that stores information. Although not shown, the sentence association system 100 also has an input unit that receives data input from outside and an output unit that outputs data to the outside. The computing means 10 includes a CPU (central processing unit), the storage means 20 includes a memory and an HDD (hard disk drive), the input unit includes a keyboard and a mouse, and the output unit includes a display and a printer. The sentence association system 100 further includes a network interface (not shown) that serves as both an input device and an output device for a communication network.
Also not shown, the storage means 20 stores a sentence association program that defines the operation of the sentence association system 100. By executing this program, the sentence association system 100, which is a computer, implements the functions described in this specification.

The storage means 20 stores, as DBs (databases), a basic sentence DB 21, an actual business sentence DB 22, a basic word DB 25, a basic word extension DB 26, an actual business word DB 27, a matching detail DB 28, an aggregation DB 29, and an association result DB 30. These DBs are built, for example, as RDBs (relational databases), but may be built in other forms.

FIG. 2 shows an example of the configuration of the basic sentence DB 21. Each row in FIG. 2 corresponds to one basic sentence. These basic sentences describe the business operations of an organization. They also serve as the basis when the sentence association system 100 associates sentences with one another. That is, the sentence association system 100 associates sentences by determining which of the basic sentences each other sentence corresponds to.
The basic sentence DB 21 stores, for each basic sentence, the information represented by the item names L1, L2, L3, SA1, SA2, SA3, NAIYOU, INPUT, OUTPUT, REF, METHOD, J1, J2, K1, K2, G1, and G2.

L1, L2, and L3 are items representing the EA number, which is the information used to identify the basic sentence. For example, L1 represents the major classification, L2 the middle classification, and L3 the minor classification. As an example, the basic sentence in the top row of FIG. 2 is identified by the combination of values L1=1, L2=1, L3=0. In the following description, a basic sentence is identified by writing its three EA numbers in order, as in "basic sentence (1-1-0)".
SA1, SA2, and SA3 are items representing the work outline, that is, the main points of the basic sentence. They represent, for example, the major, middle, and minor items corresponding to the EA number.

NAIYOU is an item representing the content of the work corresponding to the basic sentence. This item contains, for example, a character string describing the processing content of a local government operation.
INPUT and OUTPUT are items representing information that identifies the documents serving as the input and output of the work. They contain, for example, character strings giving the names of the documents. In the example of FIG. 2, basic sentence (1-1-0) represents work in which a person in charge at the local government creates, for example, a document named "Summary Table" based on a document named "Taxable Person Information".

REF is an item representing other information related to the basic sentence.
METHOD is an item representing how the work is carried out: whether it includes manual work, whether it includes processing by a computer system, and whether it includes outsourcing.
J1, J2, K1, K2, G1, and G2 are items representing the clause number of the law related to the work (related law clause number) and correspond, in this order, to the article number, article sub-number, paragraph number, paragraph sub-number, item number, and item sub-number. For example, for Article 317-6, paragraph 1, J1=317, J2=6, and K1=1. In this example, all the basic sentences in the basic sentence DB 21 relate to the same law (for example, the Local Tax Act), so the name of the law is not stored, but an item representing the name of the law may be added.
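One way to picture a row of the basic sentence DB 21 is as a small record type. The following is a minimal sketch: the class names `ClauseNumber` and `BasicSentence`, the lower-case field names, and the use of `None` for an absent clause component are assumptions made for illustration, and only a subset of the items is modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClauseNumber:
    """Related law clause number; fields mirror items J1, J2, K1, K2, G1, G2."""
    j1: Optional[int] = None  # article number
    j2: Optional[int] = None  # article sub-number
    k1: Optional[int] = None  # paragraph number
    k2: Optional[int] = None  # paragraph sub-number
    g1: Optional[int] = None  # item number
    g2: Optional[int] = None  # item sub-number


@dataclass
class BasicSentence:
    """A subset of the items stored for one basic sentence (hypothetical type)."""
    l1: int                   # EA number, major classification
    l2: int                   # EA number, middle classification
    l3: int                   # EA number, minor classification
    naiyou: str               # work content (NAIYOU)
    input_doc: str = ""       # name of the input document (INPUT)
    output_doc: str = ""      # name of the output document (OUTPUT)
    clause: ClauseNumber = ClauseNumber()

    def label(self) -> str:
        """Return the '(L1-L2-L3)' notation used in the description."""
        return f"({self.l1}-{self.l2}-{self.l3})"


# Article 317-6, paragraph 1 -> J1=317, J2=6, K1=1
clause = ClauseNumber(j1=317, j2=6, k1=1)

# The NAIYOU text and the pairing with this clause are illustrative only,
# not taken from FIG. 2.
sentence = BasicSentence(1, 1, 0, naiyou="(work content text)",
                         input_doc="Taxable Person Information",
                         output_doc="Summary Table", clause=clause)
print(sentence.label(), sentence.clause)
```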

The basic sentence DB 21 is created, for example, from the table shown in FIG. 3. FIG. 3 is an example of the local government EA reference model defined by the Ministry of Internal Affairs and Communications. The format of this reference model is disclosed in Non-Patent Document 1 mentioned above. The actual data created according to this model, that is, the contents of the table shown in FIG. 3, can be obtained by a local government by contacting the Ministry individually.
After this data is entered into the DB, the basic sentence DB 21 can be created by assigning each of the item names of FIG. 2 to a predetermined column. The item names are assigned, for example, by a specialist in charge, but the sentence association system 100 may assign them automatically according to predetermined rules.

FIG. 4 shows an example of the configuration of the actual business sentence DB 22. Each row in FIG. 4 corresponds to one actual business sentence. These actual business sentences describe the business operations of an organization. They are also the sentences that the sentence association system 100 associates with the basic sentences. That is, the sentence association system 100 associates each actual business sentence with one of the basic sentences described above.

The actual business sentence DB 22 stores, for each actual business sentence, the information represented by the item names L1, L2, L3, SA1, SA2, SA3, NAIYOU, J1, J2, K1, K2, G1, and G2. As in the basic sentence DB 21 described above, these items represent, for an actual business sentence, the information used to identify it (the business number), its work outline, and its work content. In the following, as with basic sentences, an actual business sentence is identified by writing its three business numbers in order, as in "actual business sentence (1-1-0)".
In addition to the items shown in FIG. 4, the actual business sentence DB 22 may include the same INPUT, OUTPUT, REF, and METHOD items as the basic sentence DB 21.

The actual business sentence DB 22 is created, for example, from the table shown in FIG. 5. FIG. 5 is a table describing actual work in a local government. After this data is entered into the DB, the actual business sentence DB 22 can be created by assigning each of the item names of FIG. 4 to a predetermined column. The item names are assigned, for example, by a specialist in charge, but the sentence association system 100 may assign them automatically according to predetermined rules.

FIG. 6 shows an example of the configuration of the basic word DB 25. The basic word DB 25 represents information related to each word contained in the basic sentences (hereinafter "basic word"). The basic word DB 25 is created from the words contained in the text stored in a predetermined item of the basic sentence DB 21, for example NAIYOU. The example of FIG. 6 shows only the portion corresponding to the work content of basic sentence (1-1-0) and basic sentence (1-2-0) of FIG. 2.

The basic word DB 25 stores, for each basic word, the information represented by the item names L1, L2, L3, FUBAN, TANGO, and HINSI.
L1, L2, and L3 are the same as in the basic sentence DB 21. FUBAN is an item representing the word number, a number assigned to distinguish the basic words contained in the same basic sentence (that is, basic words whose EA numbers L1, L2, and L3 all match) from one another. TANGO is an item representing the character string of the basic word. HINSI is an item representing the part of speech of the basic word.

FIG. 7 shows an example of the configuration of the basic word extension DB 26. The basic word extension DB 26 associates additional information with each of the basic words contained in the basic word DB 25. As in FIG. 6, the example of FIG. 7 shows only the portion corresponding to the work content of basic sentence (1-1-0) and basic sentence (1-2-0) of FIG. 2.
Like the basic word DB 25, the basic word extension DB 26 stores, for each basic word, the information represented by the item names L1, L2, L3, FUBAN, TANGO, and HINSI. In addition, it stores, in association with each basic word, the information represented by the item names OMOMI and DOUGIGO. OMOMI is an item representing the weight that the basic word carries when the sentence association system 100 decides the association of sentences. DOUGIGO is an item representing a word (or a list of words) with the same or a similar meaning as the basic word.
In the example of FIG. 7, a non-zero weight (item name OMOMI) is set only for words whose part of speech (item name HINSI) is a noun, and the weight of every other word is 0.
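The step from basic word records to extension records can be sketched as below. This is an illustration under assumptions, not the described system's code: the function name `expand_base_words`, the dictionary-based record layout, the English part-of-speech label `"noun"`, and the `weight_overrides` parameter (a hypothetical way to give some nouns a weight other than 1) are all introduced here.

```python
def expand_base_words(base_words, synonym_dict, weight_overrides=None):
    """Add OMOMI (weight) and DOUGIGO (synonyms) to basic word records.

    base_words:       list of dicts with at least TANGO and HINSI keys
    synonym_dict:     maps a word to a list of its synonyms
    weight_overrides: optional map from a noun to a non-default weight
    """
    weight_overrides = weight_overrides or {}
    expanded = []
    for rec in base_words:
        if rec["HINSI"] == "noun":
            # Nouns get weight 1 unless a different weight is specified.
            weight = weight_overrides.get(rec["TANGO"], 1)
        else:
            # All other parts of speech are ignored in the matching.
            weight = 0
        expanded.append({
            **rec,
            "OMOMI": weight,
            "DOUGIGO": list(synonym_dict.get(rec["TANGO"], [])),
        })
    return expanded


# Illustrative records; the words and synonyms are not taken from FIG. 7.
records = [
    {"L1": 1, "L2": 1, "L3": 0, "FUBAN": 1, "TANGO": "tax", "HINSI": "noun"},
    {"L1": 1, "L2": 1, "L3": 0, "FUBAN": 2, "TANGO": "of", "HINSI": "particle"},
]
for row in expand_base_words(records, {"tax": ["levy"]}):
    print(row)
```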

FIG. 8 shows an example of the configuration of the actual business word DB 27. The actual business word DB 27 represents information related to each word contained in the actual business sentences (hereinafter "actual business word"). The actual business word DB 27 is created from the words contained in the text stored in a predetermined item of the actual business sentence DB 22, for example NAIYOU. In this example, this item name "NAIYOU" is the same as the item name "NAIYOU" from which the basic word DB 25 described above is created. The example of FIG. 8 shows only the portion corresponding to the work content of actual business sentence (1-1-0) and actual business sentence (1-2-0) of FIG. 4.
Like the basic word DB 25 described above, the actual business word DB 27 associates with each actual business word the information represented by the item names L1, L2, L3, FUBAN, TANGO, and HINSI.

図９〜図１２は、突合詳細ＤＢ２８の構成の例を示す。突合詳細ＤＢ２８は、実業務単語のそれぞれを基軸単語のそれぞれと突合し、これらが一致するかどうかを判定した結果と、その結果に基づいて算出される文章類似度とを表すものである。
この突合の結果は、文章の組み合わせを単位として記憶される。例として、
‐図９は実業務文章（１−１−０）を基軸文章（１−１−０）と突合した結果であり、
‐図１０は実業務文章（１−１−０）を基軸文章（１−２−０）と突合した結果であり、
‐図１１は実業務文章（１−２−０）を基軸文章（１−１−０）と突合した結果であり、
‐図１２は実業務文章（１−２−０）を基軸文章（１−２−０）と突合した結果である。 9 to 12 show examples of the configuration of the matching details DB 28. The collation detail DB 28 represents the result of determining whether or not the actual business words match each of the basic words and whether or not they match, and the sentence similarity calculated based on the result.
The result of this match is stored in units of sentence combinations. As an example,
-FIG. 9 shows the result of matching the actual business sentence (1-1-0) with the basic sentence (1-1-0),
-FIG. 10 shows the result of matching the actual business sentence (1-1-0) with the basic sentence (1-2-0),
-FIG. 11 shows the result of matching the actual business sentence (1-2-0) with the basic sentence (1-1-0),
-FIG. 12 shows the result of matching the actual business sentence (1-2-0) with the basic sentence (1-2-0).

突合詳細ＤＢ２８は、各実業務単語に基づいて作成され、上述の実業務単語ＤＢ２７と同様に、項目名Ｌ１，Ｌ２，Ｌ３，ＦＵＢＡＮ，ＴＡＮＧＯで表される情報を記憶する。さらに、突合詳細ＤＢ２８は、各実業務単語について、項目名ＩＴＴＩＤＯ，ＩＴＴＩＤＯ２，ＧＡＴＴＩで表される情報を関連付けて記憶する。
ＩＴＴＩＤＯおよびＩＴＴＩＤＯ２は、基軸文章に含まれる単語と、実業務文章に含まれる単語とが一致する度合いを表す項目である。このうち、ＩＴＴＩＤＯは、その実業務単語と一致する基軸単語の重みＯＭＯＭＩの合計を表す。ＩＴＴＩＤＯ２は、その実業務単語が基軸単語の同義語と一致する場合、すなわち基軸単語拡張ＤＢ２６の項目ＤＯＵＧＩＧＯに含まれる単語と一致する場合の、その基軸単語の重みＯＭＯＭＩの合計を表す。ＧＡＴＴＩは、その実業務単語と合致した基軸単語、すなわちＩＴＴＩＤＯおよびＩＴＴＩＤＯ２に関連する基軸単語の単語附番ＦＵＢＡＮ（複数ある場合はそのリスト）を表す。 The matching detail DB 28 is created based on each actual business word, and stores information represented by the item names L1, L2, L3, FUBAN, and TANGO in the same manner as the actual business word DB 27 described above. Furthermore, the matching detail DB 28 stores information represented by item names ITTIDO, ITTIDO2, GATTI in association with each actual business word.
ITTIDO and ITTIDO2 are items representing the degree of matching between words included in the base sentence and words included in the actual business sentence. Of these, ITTIDO represents the sum of the weights OMOMI of the base word that matches the actual business word. ITTIDO2 represents the total of the weights OMOMI of the base word when the actual business word matches the synonym of the base word, that is, matches the word included in the item DOUGIGO of the base word extension DB 26. GATTI represents a basic word that matches the actual business word, that is, a word number FUBAN of the basic word related to ITTIDO and ITTIDO2 (or a list thereof when there are a plurality of basic words).

図９において、実業務文章（１−１−０）の単語附番３に対応する単語、すなわちＬ１＝１，Ｌ２＝１，Ｌ３＝０，ＦＵＢＡＮ＝３である実業務単語は「義務者」であるが、単語「義務者」は突合対象の基軸文章（１−１−０）中には一度だけ出現している（図７のＬ１＝１，Ｌ２＝１，Ｌ３＝０，ＦＵＢＡＮ＝２８）。また、その重みＯＭＯＭＩは２である。よって、この実業務単語の一致度ＩＴＴＩＤＯは２であり、合致番号ＧＡＴＴＩは２８となる。さらに、単語「義務者」は、突合対象の基軸文章（１−１−０）の同義語としては出現しないので、同義語に対する一致度ＩＴＴＩＤＯ２は０となる。 In FIG. 9, the word corresponding to the word number 3 of the actual business sentence (1-1-0), that is, the actual business word with L1 = 1, L2 = 1, L3 = 0, and FUBAN = 3 is “obligor”. However, the word “obligor” appears only once in the basic sentence (1-1-0) to be matched (L1 = 1, L2 = 1, L3 = 0, FUBAN = 28 in FIG. 7). ). The weight OMOMI is 2. Therefore, the coincidence degree ITTIDO of this actual business word is 2, and the coincidence number GATTI is 28. Furthermore, since the word “obligor” does not appear as a synonym of the base sentence (1-1-0) to be matched, the matching degree ITTIDO2 for the synonym is zero.

また、実業務文章（１−１−０）の単語附番１７に対応する実業務単語は「送付」であるが、この単語は突合対象の基軸文章（１−１−０）中には出現せず、ＩＴＴＩＤＯは０となる。ただし、基軸文章（１−１−０）の単語附番２２および３３の基軸単語「発送」には同義語「送付」が関連付けられており、これらと一致する。また、これらの重みはそれぞれ５である。よって、この単語のＩＴＴＩＤＯ２は１０であり、合致番号ＧＡＴＴＩは「２２，３３」となる。
このようにして定義される一致度の総合計、すなわちＩＴＴＩＤＯの合計とＩＴＴＩＤＯ２の合計との和が、実業務文章と基軸文章とが一致する度合いを表す文章類似度となる。 The actual business word corresponding to the word number 17 of the actual business sentence (1-1-0) is “send”. This word does not appear in the basic sentence (1-1-0) to be matched, so ITTIDO is 0. However, the synonym “send” is associated with the base word “shipping” at word numbers 22 and 33 of the base sentence (1-1-0), and the actual business word matches these synonyms. Each of these base words has a weight of 5. Therefore, ITTIDO2 of this word is 10, and the match number GATTI is “22, 33”.
The total sum of coincidences defined in this way, that is, the sum of the sum of ITTIDO and the sum of ITTIDO2, becomes the sentence similarity representing the degree of matching between the actual business sentence and the base sentence.
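
The per-word scoring described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `BaseWord` class and the function names are hypothetical, the data is only the fragment of the basic sentence (1-1-0) quoted in the text, and skipping zero-weight words when building GATTI is an assumption the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseWord:
    fuban: int            # FUBAN: word number within the basic sentence
    tango: str            # TANGO: the word itself
    omomi: int            # OMOMI: weight assigned to the word
    dougigo: tuple = ()   # DOUGIGO: synonyms registered for the word


def match_word(business_word, base_words):
    """Return (ITTIDO, ITTIDO2, GATTI) for one actual business word."""
    ittido, ittido2, gatti = 0, 0, []
    for base in base_words:
        if base.omomi == 0:
            continue  # assumption: zero-weight words are not recorded as matches
        if base.tango == business_word:
            ittido += base.omomi       # direct match with the basic word
            gatti.append(base.fuban)
        elif business_word in base.dougigo:
            ittido2 += base.omomi      # match through a registered synonym
            gatti.append(base.fuban)
    return ittido, ittido2, gatti


def sentence_similarity(business_words, base_words):
    """Sum ITTIDO and ITTIDO2 over every word of the actual business sentence."""
    return sum(sum(match_word(w, base_words)[:2]) for w in business_words)


# Fragment of the basic sentence (1-1-0), taken from the description of FIG. 7.
base_1_1_0 = [
    BaseWord(22, "発送", 5, ("送付",)),
    BaseWord(28, "義務者", 2),
    BaseWord(33, "発送", 5, ("送付",)),
]

print(match_word("義務者", base_1_1_0))  # (2, 0, [28])
print(match_word("送付", base_1_1_0))    # (0, 10, [22, 33])
```

The two printed tuples reproduce the values given in the text for word numbers 3 and 17 of the actual business sentence (1-1-0).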

さらに、図１０において、実業務文章（１−１−０）の単語附番３の「義務者」は突合対象の基軸文章（１−２−０）中に一度だけ出現している（図７のＬ１＝１，Ｌ２＝２，Ｌ３＝０，ＦＵＢＡＮ＝１３）。また、その重みＯＭＯＭＩは２である。よって、この実業務単語の一致度ＩＴＴＩＤＯは２であり、合致番号ＧＡＴＴＩは１３となる。さらに、単語「義務者」は、突合対象の基軸文章（１−２−０）の同義語としては出現しないので、同義語に対する一致度ＩＴＴＩＤＯ２は０となる。 Further, in FIG. 10, “obligor” at the word number 3 of the actual business sentence (1-1-0) appears only once in the basic sentence (1-2-0) to be matched (L1 = 1, L2 = 2, L3 = 0, FUBAN = 13 in FIG. 7). Its weight OMOMI is 2. Therefore, the coincidence degree ITTIDO of this actual business word is 2, and the coincidence number GATTI is 13. Furthermore, since the word “obligor” does not appear as a synonym in the base sentence (1-2-0) to be matched, the matching degree ITTIDO2 for the synonym is zero.

また、実業務文章（１−１−０）の単語附番６の「給与」は突合対象の基軸文章（１−１−０）中に一度だけ出現している（図７のＬ１＝１，Ｌ２＝２，Ｌ３＝０，ＦＵＢＡＮ＝１）。また、その重みＯＭＯＭＩは２である。よって、この実業務単語の一致度ＩＴＴＩＤＯは２であり、合致番号ＧＡＴＴＩは１となる。さらに、単語「給与」は、突合対象の基軸文章（１−２−０）の同義語としては出現しないので、同義語に対する一致度ＩＴＴＩＤＯ２は０となる。
このようにして、一致度の総合計、すなわち文章類似度は２＋２＝４となる。 Further, “salary” at the word number 6 of the actual business sentence (1-1-0) appears only once in the basic sentence (1-2-0) to be matched (L1 = 1, L2 = 2, L3 = 0, FUBAN = 1 in FIG. 7). Its weight OMOMI is 2. Therefore, the coincidence degree ITTIDO of this actual business word is 2, and the coincidence number GATTI is 1. Furthermore, since the word “salary” does not appear as a synonym in the basic sentence (1-2-0) to be matched, the matching degree ITTIDO2 for the synonym is zero.
In this way, the total degree of coincidence, that is, the sentence similarity is 2 + 2 = 4.

同様にして、基軸文章（１−１−０）と実業務文章（１−２−０）とが突合され、その結果として図１１の内容が作成される。また、基軸文章（１−２−０）と実業務文章（１−２−０）とが突合され、その結果として図１２の内容が作成される。 Similarly, the basic sentence (1-1-0) and the actual business sentence (1-2-0) are collated, and as a result, the contents of FIG. 11 are created. Further, the basic sentence (1-2-0) and the actual business sentence (1-2-0) are collated, and as a result, the contents of FIG. 12 are created.

図１３は、集計ＤＢ２９の構成の例を示す。集計ＤＢ２９は、各基軸文章と各実業務文章との文章類似度を集計したものである。たとえば、基軸文章（１−２−０）と実業務文章（１−１−０）との組み合わせに対しては「４」が記憶されているが、これは図１０の文章類似度が４であることに対応する。 FIG. 13 shows an example of the configuration of the tabulation DB 29. The tabulation DB 29 tabulates the sentence similarity between each basic sentence and each actual business sentence. For example, “4” is stored for the combination of the basic sentence (1-2-0) and the actual business sentence (1-1-0), which corresponds to the sentence similarity of 4 in FIG. 10.

図１４は、対応付け結果ＤＢ３０の構成の例を示す。対応付け結果ＤＢ３０は、各基軸文章に対して、実業務文章のいずれが対応付けられるかを表す。図１５は、図１４の対応関係を概略的に示す。これらの対応関係は、後述するように、図１３の集計ＤＢ２９に基づいて決定される。各実業務文章について、最も大きい文章類似度の値（すなわち、各列における最大値）を与える基軸文章に対して、その実業務文章が対応付けられている。
この例では、たとえば基軸文章（１−１−０）に対しては複数の実業務文章（１−１−０）および（１−２−０）が対応付けられており、また基軸文章（１−５−０）に対してはいずれの実業務文章も対応付けられていない。さらに、基軸文章（１−３−０）および基軸文章（１−４−０）のように、基軸文章ＤＢ２１における前後関係と、それぞれ対応する業務文章の実業務文章ＤＢ２２における前後関係とが逆転する対応付けも含まれる。 FIG. 14 shows an example of the configuration of the association result DB 30. The association result DB 30 represents which actual business sentence is associated with each basic sentence. FIG. 15 schematically shows the correspondence of FIG. These correspondences are determined based on the tabulation DB 29 in FIG. 13 as will be described later. For each actual business sentence, the actual business sentence is associated with a basic sentence that gives the largest sentence similarity value (that is, the maximum value in each column).
In this example, a plurality of actual business sentences, (1-1-0) and (1-2-0), are associated with the basic sentence (1-1-0), whereas no actual business sentence is associated with the basic sentence (1-5-0). The result also includes associations, such as those for the basic sentence (1-3-0) and the basic sentence (1-4-0), in which the order of the basic sentences in the basic sentence DB 21 is the reverse of the order of the corresponding actual business sentences in the actual business sentence DB 22.

以上のように構成される文章対応付けシステム１００の動作を、図１６および図１７に示すフローチャートを用いて説明する。
図１６は、文章対応付けシステム１００が基軸文章ＤＢ２１に基づいて基軸単語ＤＢ２５および基軸単語拡張ＤＢ２６を作成する際の処理の流れを表す。この処理は、たとえば基軸文章ＤＢ２１が作成または変更されるたびに実行される。
まず、文章対応付けシステム１００の演算手段１０は基軸文章ＤＢ２１を読み込み、これによって基軸文章をすべて入力する（ステップＳ１）。 The operation of the sentence association system 100 configured as described above will be described using the flowcharts shown in FIGS. 16 and 17.
FIG. 16 shows a flow of processing when the sentence association system 100 creates the basic word DB 25 and the basic word expansion DB 26 based on the basic sentence DB 21. This process is executed every time the basic sentence DB 21 is created or changed, for example.
First, the computing means 10 of the text association system 100 reads the base text DB 21 and inputs all the base text (step S1).

次に、演算手段１０は各基軸文章の項目「ＮＡＩＹＯＵ」に含まれる文字列に対して形態素解析を行い、その結果に基づいて基軸単語ＤＢ２５を作成する（ステップＳ２）。この際、演算手段１０は、各基軸文章における単語の出現順序に基づいて単語附番を採番する。なお形態素解析とは、日本語等の自然言語による文を単語に分解し、各単語の品詞を特定する処理のことである。形態素解析を行う技術は公知であるので、詳細な説明は省略する。 Next, the computing means 10 performs a morphological analysis on the character string included in the item “NAIYOU” of each basic sentence, and creates a basic word DB 25 based on the result (step S2). At this time, the computing means 10 assigns word numbering based on the appearance order of words in each basic sentence. Note that the morphological analysis is a process of decomposing a sentence in a natural language such as Japanese into words and specifying the part of speech of each word. Since a technique for performing morphological analysis is known, detailed description thereof is omitted.

次に、演算手段１０は、基軸単語ＤＢ２５に含まれる各基軸単語について、重みおよび同義語に関する情報の入力を要求して受け付け、この入力に基づいて基軸単語拡張ＤＢ２６を作成する（ステップＳ３）。この入力は、たとえば文章対応付けシステム１００の管理者によってなされる。
ここで、管理者は、突合の妥当性を向上させるため、より重要な単語にはより大きな重みを付加しておく。たとえば、管理者は、図７に示すように、基軸文章（１−１−０）の単語附番２２の単語「発送」に対して重み５を付与する。また、「て」、「に」、「を」、「は」等の、組織の業務とは直接関係がない単語が対応付けに影響を与えないようにするために、文章間の対応付けに利用すべき品詞（たとえば名詞）以外については、重みを０にする。このようにして管理者は、形態素解析で出力された品詞のうち、実業務文章との突合および対応付けの際に利用する品詞を指定するパラメータを入力することができる。
また、管理者は、突合の妥当性を向上させるため、重要な単語には同義語を関連付ける。たとえば、管理者は、図７に示すように、基軸文章（１−１−０）の単語附番２２の単語「発送」に対して同義語「送付」を関連付ける。 Next, the computing means 10 requests and accepts input of information relating to weights and synonyms for each of the basic words included in the basic word DB 25, and creates the basic word expansion DB 26 based on this input (step S3). This input is made, for example, by an administrator of the text matching system 100.
Here, the administrator assigns a greater weight to more important words in order to improve the validity of the matching. For example, as shown in FIG. 7, the administrator assigns a weight of 5 to the word “shipping” at the word number 22 of the basic sentence (1-1-0). In addition, to prevent words that are not directly related to the organization's business, such as the Japanese particles “te”, “ni”, “wo”, and “wa”, from affecting the association, the weight is set to 0 for every part of speech other than those to be used for association between sentences (for example, nouns). In this way, the administrator can input a parameter that designates which of the parts of speech output by the morphological analysis are used in matching and association with the actual business sentences.
The administrator also associates synonyms with important words to improve the validity of the matching. For example, as shown in FIG. 7, the administrator associates the synonym “send” with the word “shipping” at the word number 22 of the basic sentence (1-1-0).

図１７は、文章対応付けシステム１００が基軸文章と実業務文章とを対応付ける際の処理の流れを表す。この処理は、たとえば実業務文章ＤＢ２２が作成されるたびに実行される。
まず、文章対応付けシステム１００の演算手段１０は実業務文章ＤＢ２２を読み込み、これによって実業務文章をすべて入力する（ステップＳ１１）。次に、演算手段１０は、各実業務文章の項目「ＮＡＩＹＯＵ」に含まれる文字列に対して形態素解析を行い、その結果に基づいて実業務単語ＤＢ２７を作成する（ステップＳ１２）。この処理は図１６のステップＳ２と同様にしてなされる。 FIG. 17 shows the flow of processing when the text matching system 100 matches the basic text and the actual business text. This process is executed each time the actual business sentence DB 22 is created, for example.
First, the calculation means 10 of the text association system 100 reads the actual business text DB 22 and inputs all the actual business text (step S11). Next, the computing means 10 performs morphological analysis on the character string included in the item “NAIYOU” of each actual business sentence, and creates the actual business word DB 27 based on the result (step S12). This process is performed in the same manner as step S2 in FIG.

次に、演算手段１０は、基軸単語拡張ＤＢ２６および実業務単語ＤＢ２７を参照し、各基軸文章と各実業務文章とを突合する突合処理を行い、その結果に基づいて突合詳細ＤＢ２８を作成する（ステップＳ１３）。ここで、演算手段１０は、同一のＥＡ番号を有する基軸単語のグループを１つの基軸文章に対応するものとして扱い、同一の業務番号を有する実業務単語のグループを１つの実業務文章に対応するものとして扱い、１つの基軸文章と１つの実業務文章との組み合わせを単位として突合処理を行う。
演算手段１０は、すべての基軸文章とすべての実業務文章との組み合わせに対して、この突合処理を繰り返す。この際、組み合わせのそれぞれについて、図９〜図１２に示すように、単語自体の一致度であるＩＴＴＩＤＯの合計と、同義語に対する一致度であるＩＴＴＩＤＯ２の合計とを算出する。また、これらの合計の和として、その組み合わせに対する文章類似度を算出する。
図７の基軸単語拡張ＤＢ２６の例では名詞のみに０でない重みが与えられているので、演算手段１０は、基軸文章に含まれる名詞と、実業務文章に含まれる名詞とが一致する度合いとして、文章類似度を算出することになる。 Next, the computing means 10 refers to the basic word expansion DB 26 and the actual business word DB 27, performs a matching process that matches each basic sentence against each actual business sentence, and creates the matching detail DB 28 based on the results (step S13). Here, the computing means 10 treats a group of basic words having the same EA number as corresponding to one basic sentence, treats a group of actual business words having the same business number as corresponding to one actual business sentence, and performs the matching process in units of a combination of one basic sentence and one actual business sentence.
The computing means 10 repeats this matching process for every combination of a basic sentence and an actual business sentence. At this time, as shown in FIGS. 9 to 12, for each combination it calculates the sum of ITTIDO, the degree of coincidence of the words themselves, and the sum of ITTIDO2, the degree of coincidence with synonyms. It then calculates the sentence similarity for the combination as the total of these two sums.
In the example of the basic word expansion DB 26 of FIG. 7, only nouns are given non-zero weights, so the computing means 10 calculates the sentence similarity as the degree to which the nouns included in the basic sentence match the nouns included in the actual business sentence.

次に、演算手段１０は、突合詳細ＤＢ２８の結果を集計し、集計ＤＢ２９を作成する（ステップＳ１４）。たとえば、図１０に示される、基軸文章（１−２−０）と基軸文章（１−１−０）と文章類似度は４であるので、集計ＤＢ２９においてこれに対応するフィールド、すなわちＥＡ番号「Ｌ１＝１，Ｌ２＝２，Ｌ３＝０」の行、業務番号「Ｌ１＝１，Ｌ２＝１，Ｌ３＝０」の列のフィールドの値は４となる。
このように、すべての突合結果について、その結果（文章類似度）を集計して、集計ＤＢ２９を作成する。 Next, the computing means 10 tabulates the results of the matching detail DB 28 and creates the tabulation DB 29 (step S14). For example, since the sentence similarity between the basic sentence (1-2-0) and the actual business sentence (1-1-0) shown in FIG. 10 is 4, the corresponding field in the tabulation DB 29, that is, the field in the row of the EA number “L1 = 1, L2 = 2, L3 = 0” and the column of the business number “L1 = 1, L2 = 1, L3 = 0”, has the value 4.
In this way, for all the matching results, the results (sentence similarity) are totaled to create the tabulation DB 29.
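
Steps S13 and S14 can be sketched as a loop over every combination of sentences. The sketch below is an illustration under assumptions: the row layouts, the function names, and the simplified `weighted_overlap` scorer (which omits synonyms) are hypothetical. The rows for the basic sentence (1-2-0) follow the values quoted for FIG. 10, while the rows for the basic sentence (1-1-0) are only a fragment, so its score is partial.

```python
from collections import defaultdict


def group_by_sentence(rows):
    """Group (L1, L2, L3, FUBAN, TANGO, ...) rows into one word list per sentence."""
    sentences = defaultdict(list)
    for l1, l2, l3, *rest in rows:
        sentences[(l1, l2, l3)].append(tuple(rest))
    return dict(sentences)


def weighted_overlap(business_words, base_words):
    """Simplified scorer: add the weight of each basic word found in the business sentence."""
    present = {tango for _fuban, tango in business_words}
    return sum(omomi for _fuban, tango, omomi in base_words if tango in present)


def tabulate(base_rows, business_rows, similarity):
    """Return {EA number: {business number: sentence similarity}} for all combinations."""
    base = group_by_sentence(base_rows)
    business = group_by_sentence(business_rows)
    return {
        ea: {biz: similarity(biz_words, base_words)
             for biz, biz_words in business.items()}
        for ea, base_words in base.items()
    }


# Basic word rows: (L1, L2, L3, FUBAN, TANGO, OMOMI)
base_rows = [
    (1, 1, 0, 22, "発送", 5),
    (1, 1, 0, 28, "義務者", 2),
    (1, 2, 0, 1, "給与", 2),
    (1, 2, 0, 13, "義務者", 2),
]
# Actual business word rows: (L1, L2, L3, FUBAN, TANGO)
business_rows = [
    (1, 1, 0, 3, "義務者"),
    (1, 1, 0, 6, "給与"),
]

table = tabulate(base_rows, business_rows, weighted_overlap)
print(table[(1, 2, 0)][(1, 1, 0)])  # 4, the value stored for this pair in the tabulation DB
```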

次に、演算手段１０は、集計ＤＢ２９に基づいて文章の対応付けを行い、対応付け結果ＤＢ３０を作成する（ステップＳ１５）。ここで、演算手段１０は、各実業務文章について、最も大きい文章類似度の値（すなわち、集計ＤＢ２９の各列における最大値）を求め、その最大値を与える基軸文章に対して、その実業務文章を対応付ける。
たとえば図１３において、破線で囲んだ値が各列の最大値であるとすると、実業務文章（１−１−０）および実業務文章（１−２−０）はともに基軸文章（１−１−０）に対応付けられ、実業務文章（１−３−０）は基軸文章（１−２−０）に対応付けられることになる。このようにして、演算手段１０は文章類似度に基づいて対応付けを行い、これによって対応付け結果ＤＢ３０を作成する。
なお、この対応付けは、文章類似度の最大値に基づいてなされるのではなく、文章類似度が所定の閾値以上かどうかに基づいて行われてもよい。すなわち、ある実業務文章を、文章類似度が閾値以上となる基軸文章すべてに対応付けるものであってもよく、また、文章類似度が閾値以上となる基軸文章が存在しない場合には、いずれの基軸文章にも対応付けないものであってもよい。 Next, the computing means 10 associates sentences based on the tabulation DB 29 and creates the association result DB 30 (step S15). Here, for each actual business sentence, the computing means 10 obtains the largest sentence similarity value (that is, the maximum value in the corresponding column of the tabulation DB 29) and associates the actual business sentence with the basic sentence that gives this maximum value.
For example, in FIG. 13, where the values enclosed by broken lines are the maximum values of the respective columns, the actual business sentence (1-1-0) and the actual business sentence (1-2-0) are both associated with the basic sentence (1-1-0), and the actual business sentence (1-3-0) is associated with the basic sentence (1-2-0). In this way, the computing means 10 performs association based on the sentence similarity and thereby creates the association result DB 30.
Note that this association need not be based on the maximum value of the sentence similarity; it may instead be based on whether the sentence similarity is equal to or higher than a predetermined threshold. That is, an actual business sentence may be associated with every basic sentence whose sentence similarity is equal to or higher than the threshold, and if no basic sentence reaches the threshold, the actual business sentence may be left unassociated with any basic sentence.
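
The two association rules of step S15 can be sketched as follows. The function names are hypothetical and the similarity values are made up for illustration, except the value 4 for the basic sentence (1-2-0) and the actual business sentence (1-1-0), which the text gives. How ties are broken is not stated in the text; this sketch keeps the first basic sentence encountered.

```python
def business_ids(table):
    """Collect every business number that appears in the tabulation table."""
    return sorted({biz for row in table.values() for biz in row})


def associate_by_max(table):
    """Map each business number to the EA number giving the highest similarity."""
    return {
        biz: max(table, key=lambda ea: table[ea].get(biz, 0))
        for biz in business_ids(table)
    }


def associate_by_threshold(table, threshold):
    """Map each business number to every EA number whose similarity reaches the threshold."""
    return {
        biz: [ea for ea in table if table[ea].get(biz, 0) >= threshold]
        for biz in business_ids(table)
    }


# table[EA number][business number] = sentence similarity (illustrative values)
table = {
    "1-1-0": {"1-1-0": 12, "1-2-0": 9, "1-3-0": 3},
    "1-2-0": {"1-1-0": 4, "1-2-0": 6, "1-3-0": 8},
}

print(associate_by_max(table))
# {'1-1-0': '1-1-0', '1-2-0': '1-1-0', '1-3-0': '1-2-0'}
print(associate_by_threshold(table, 5))
# {'1-1-0': ['1-1-0'], '1-2-0': ['1-1-0', '1-2-0'], '1-3-0': ['1-2-0']}
```

With the threshold rule, one actual business sentence can map to several basic sentences, or to none when the threshold is set above every value in its column.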

次に、演算手段１０は、対応付け結果に基づいて出力用のデータを作成し、このデータに基づいて文章対応付けシステム１００の出力部を制御する。この制御に応じて、出力部は対応付け結果を文章対応付けシステム１００の外部に対して出力する（ステップＳ１６）。この出力は、たとえばプリンタ等の印刷装置による印刷処理として実行されるが、ディスプレイ等の表示装置による表示処理として実行されてもよい。 Next, the computing means 10 creates output data based on the association result, and controls the output unit of the sentence association system 100 based on this data. In response to this control, the output unit outputs the association result to the outside of the sentence association system 100 (step S16). This output is executed as a printing process by a printing apparatus such as a printer, but may be executed as a display process by a display apparatus such as a display.

図１８は、対応付け結果の出力の例（ＥＡ資料対応表）を示す。この例では、対応関係が表形式で表され、左側には基軸文章が、右側には対応する実業務文章が表示される。このような表を出力させるために、演算手段１０は、対応付け結果ＤＢ３０に記録されるＥＡ番号と業務番号との対応関係に基づき、まず、基軸文章ＤＢ２１に記録される基軸文章をＥＡ番号により抽出し、実業務文章ＤＢ２２に記録される実業務文章を業務番号により抽出し、次に、抽出した基軸文書と実業務文書とを、ＥＡ番号と業務番号との対応関係に基づいて関連付けてデータ化する。あるいは、演算手段１０は、入力部からＥＡ番号の指定を受け付けて、対応付け結果ＤＢ３０に記録される指定されたＥＡ番号と業務番号との対応関係に基づき、同様にして基軸文書と実業文書とを関連付けてデータ化してもよい。（図１８の例では、ＥＡ番号についてＬ１＝１となる基軸文書、すなわち作業概要の大が個人住民税となる基軸文書と、業務文書との対応付けがなされている。）ここで、演算手段１０は、基軸文章と、その基軸文章に対応付けられた実業務文章とを、左右に並列して出力することを示すものとして、出力用のデータを作成する。
なお、出力用のデータの形式は、たとえば関係型データベースアプリケーションが使用するファイル形式であるが、これは他の形式であってもよく、たとえばＣＳＶ形式、ＨＴＭＬ形式、ＸＭＬ形式、ＰＤＦ形式等であってもよい。 FIG. 18 shows an example of the output of the association result (an EA material correspondence table). In this example, the correspondence is presented in tabular form, with the basic sentences on the left and the corresponding actual business sentences on the right. To output such a table, the computing means 10 first uses the correspondence between EA numbers and business numbers recorded in the association result DB 30 to extract the basic sentences recorded in the basic sentence DB 21 by EA number and the actual business sentences recorded in the actual business sentence DB 22 by business number. It then links the extracted basic sentences and actual business sentences according to that correspondence and converts them into data. Alternatively, the computing means 10 may accept the designation of an EA number from the input unit and, based on the correspondence between the designated EA number and the business numbers recorded in the association result DB 30, link the basic sentences and the actual business sentences in the same way and convert them into data. (In the example of FIG. 18, the association covers the basic sentences whose EA number has L1 = 1, that is, those whose major work category is the personal resident tax, and the actual business sentences.) Here, the computing means 10 creates the output data so as to indicate that each basic sentence and the actual business sentences associated with it are to be output side by side, left and right.
The output data format is, for example, a file format used by a relational database application, but it may be another format, such as CSV, HTML, XML, or PDF.
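
As one illustration of the CSV option, the correspondence table could be produced as sketched below. The column headers, function names, and placeholder texts are hypothetical. Writing a basic sentence with empty right-hand cells when nothing is associated with it is an assumption, not something the text specifies.

```python
import csv
import io


def correspondence_rows(association, base_texts, business_texts):
    """Yield (EA number, basic sentence, business number, business sentence) rows."""
    for ea in sorted(base_texts):
        matched = sorted(biz for biz, target in association.items() if target == ea)
        if not matched:
            yield (ea, base_texts[ea], "", "")  # basic sentence with no counterpart
        for biz in matched:
            yield (ea, base_texts[ea], biz, business_texts[biz])


def to_csv(rows):
    """Serialize the rows with the basic sentence on the left and the business sentence on the right."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["EA_NO", "EA_NAIYOU", "GYOUMU_NO", "GYOUMU_NAIYOU"])
    writer.writerows(rows)
    return buffer.getvalue()


# association[business number] = EA number (placeholder data)
association = {"1-1-0": "1-1-0", "1-2-0": "1-1-0", "1-3-0": "1-2-0"}
base_texts = {"1-1-0": "base A", "1-2-0": "base B", "1-5-0": "base E"}
business_texts = {"1-1-0": "work a", "1-2-0": "work b", "1-3-0": "work c"}

output = to_csv(correspondence_rows(association, base_texts, business_texts))
print(output)
```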

次に、演算手段１０は、文章間の対応関係に基づいて、基軸文章ＤＢ２１に各実業務文章を追加して集積する（ステップＳ１７）。このステップはいわゆる「学習」に相当する。
ここで、演算手段１０は、各実業務文章を、その実業務文章が対応付けられた基軸文章に追加して、基軸文章ＤＢ２１に格納する。すなわち、文章対応付けシステム１００の記憶手段２０は、各基軸文章と、対応付けにおいてその基軸文章に対応付けられたすべての実業務文章とを、それぞれの基軸文章に集積して記憶することになる。 Next, the computing means 10 adds each actual business sentence to the basic sentence DB 21 and accumulates it based on the correspondence between the sentences (step S17). This step corresponds to so-called “learning”.
Here, the computing means 10 adds each actual business sentence to the basic sentence with which it has been associated and stores the result in the basic sentence DB 21. That is, the storage means 20 of the sentence association system 100 stores each basic sentence together with all the actual business sentences associated with it, accumulated in that basic sentence.
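
The accumulation in step S17 can be sketched as follows. This is a simplified illustration: the function name, the dictionary layout, and the placeholder texts are hypothetical, and only the work content item (NAIYOU) is accumulated here.

```python
def accumulate(base_db, business_db, association):
    """Return a copy of base_db with each associated business text appended."""
    updated = {ea: list(texts) for ea, texts in base_db.items()}
    for biz in sorted(association):
        updated[association[biz]].append(business_db[biz])
    return updated


# base_db[EA number] = list of NAIYOU texts accumulated so far (placeholder data)
base_db = {"1-1-0": ["base A"], "1-2-0": ["base B"], "1-5-0": ["base E"]}
business_db = {"1-1-0": "work a", "1-2-0": "work b", "1-3-0": "work c"}
association = {"1-1-0": "1-1-0", "1-2-0": "1-1-0", "1-3-0": "1-2-0"}

learned = accumulate(base_db, business_db, association)
print(learned["1-1-0"])  # ['base A', 'work a', 'work b']
```

A basic sentence with no associated actual business sentence, such as (1-5-0) in this data, is left unchanged.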

図１９は、この集積の結果として更新された基軸文章ＤＢ２１の構成の例を示す。実業務文章ＤＢ２２に含まれる情報のうち、作業内容を表す項目、すなわち項目ＮＡＩＹＯＵと、作業に関係する法令の箇条番号を表す項目、すなわち項目Ｊ１，Ｊ２，Ｋ１，Ｋ２，Ｇ１，Ｇ２とにおいて、新たな情報が基軸文章ＤＢ２１に追加されている。ここで、図１９において、図２と比較して新たに追加された部分を破線で囲んで示す。
なお、これ以外の項目において新たな情報が追加されてもよい。また、情報の追加に応じて、基軸文章ＤＢ２１に新たな項目名が追加されてもよい。 FIG. 19 shows an example of the configuration of the basic sentence DB 21 updated as a result of this accumulation. Of the information included in the actual business text DB 22, the item representing the work content, that is, the item NAIYOU, and the item representing the item number of the law related to the work, that is, the items J1, J2, K1, K2, G1, G2, New information is added to the base sentence DB 21. Here, in FIG. 19, the part newly added compared with FIG. 2 is shown enclosed with a broken line.
Note that new information may be added in other items. Further, a new item name may be added to the base sentence DB 21 in accordance with the addition of information.

次に、演算手段１０は、ステップＳ１７で基軸文章ＤＢ２１に追加された文字列について、ステップＳ２（図１６）と同様の処理を行い、追加された文章に含まれる単語を基軸単語拡張ＤＢ２６に追加する（ステップＳ１８）。すなわち、追加された文章に基づいて基軸単語拡張ＤＢ２６を更新する。
さらに、演算手段１０は、基軸単語拡張ＤＢ２６に追加された単語について、ステップＳ３（図１６）と同様の処理を行い、重みおよび同義語を追加して基軸単語拡張ＤＢ２６を更新する（ステップＳ１９）。
このようにして、図１７の処理が実行されるたびに、文章対応付けシステム１００は実業務文章ＤＢ２２から新たな実業務文章を学習して取り込み、基軸単語拡張ＤＢ２６に新たな基軸単語を追加する。これによって基軸単語拡張ＤＢ２６における基軸文章と実際の業務との関係は現実をよりよく反映するものとなり、次回の対応付け処理においてより精度の高い結果を出すことができる。 Next, the computing means 10 performs the same process as in step S2 (FIG. 16) on the character string added to the base sentence DB 21 in step S17, and adds the word included in the added sentence to the base word extension DB 26. (Step S18). That is, the basic word expansion DB 26 is updated based on the added sentence.
Further, the computing means 10 performs the same processing as step S3 (FIG. 16) for the word added to the base word extension DB 26, and adds the weights and synonyms to update the base word extension DB 26 (step S19). .
In this way, each time the processing of FIG. 17 is executed, the sentence association system 100 learns and takes in new actual business sentences from the actual business sentence DB 22 and adds new basic words to the basic word expansion DB 26. As a result, the relationship between the basic sentences and the actual business captured in the basic word expansion DB 26 reflects reality more closely, and the next association process can produce more accurate results.

以上のように説明される、実施の形態１に係る文章対応付けシステム１００の動作の概要を、図２０を用いてまとめると以下のようになる。
まず、文章対応付けシステム１００の管理者は、総務省のＥＡ資料（図３）をＲＤＢ化し、基軸文章ＤＢ２１を作成する（この処理は文章対応付けシステム１００によって自動的になされてもよい）。文章対応付けシステム１００は、形態素解析によって基軸単語ＤＢ２５を作成する（ステップＳ１，Ｓ２）。さらに、重みおよび同義語の入力を受け付け、基軸単語拡張ＤＢ２６を作成する（ステップＳ３）。 The outline of the operation of the text association system 100 according to Embodiment 1 described as described above is summarized as follows using FIG.
First, the administrator of the sentence association system 100 converts the EA material (FIG. 3) of the Ministry of Internal Affairs and Communications into an RDB and creates a basic sentence DB 21 (this process may be automatically performed by the sentence association system 100). The sentence association system 100 creates the base word DB 25 by morphological analysis (steps S1 and S2). Further, the input of weights and synonyms is received, and the basic word expansion DB 26 is created (step S3).

次に、文章対応付けシステム１００の管理者は、自治体の実業務説明文書（図５）をＲＤＢ化し、実業務文章ＤＢ２２を作成する（この処理は文章対応付けシステム１００によって自動的になされてもよい）。文章対応付けシステム１００は、形態素解析によって実業務単語ＤＢ２７を作成する（ステップＳ１１，Ｓ１２）。
文章対応付けシステム１００は、基軸文章と実業務文章との組み合わせのそれぞれについて、突合を行い文章類似度を算出し（ステップＳ１３）、文章類似度を集計し（ステップＳ１４）、文章類似度の大きさから突合の結果を判断して対応付けを行うとともに対応付け結果ＤＢを格納する（ステップＳ１５）。さらに対応付けの結果をＥＡ資料対応表（図１８）として出力し（ステップＳ１６）、実業務文章をそれぞれ対応付けられた基軸文章に集積して学習する（ステップＳ１７）。そして集積された単語について基軸単語ＤＢ２５および基軸単語拡張ＤＢ２６を更新する（ステップＳ１８，Ｓ１９）。 Next, the administrator of the text association system 100 converts the actual business explanation document (FIG. 5) of the local government into an RDB and creates the actual business text DB 22 (even if this processing is automatically performed by the text association system 100). Good). The text association system 100 creates the actual business word DB 27 by morphological analysis (steps S11 and S12).
For each combination of a basic sentence and an actual business sentence, the sentence association system 100 performs matching and calculates the sentence similarity (step S13), tabulates the sentence similarities (step S14), and judges the matching result from the magnitude of the sentence similarity to perform the association and store the association result DB (step S15). Further, the result of the association is output as an EA material correspondence table (FIG. 18) (step S16), and the actual business sentences are accumulated in their associated basic sentences and learned (step S17). Then, the basic word DB 25 and the basic word expansion DB 26 are updated for the accumulated words (steps S18 and S19).

このように、この発明の実施の形態１に係る文章対応付けシステム１００は、文章間の対応付けを行う、文章対応付けシステムであって、組織の業務の内容を表す複数の実業務文章と、対応付けの基軸となる複数の基軸文章とを記憶する、記憶手段と、実業務文章のそれぞれについて、基軸文章のいずれかへの対応付けを行う、演算手段とを備え、組織の業務に関連する文章の対応付けを、複数の基軸文章と、複数の実業務文章との間で実行することができる。また、基軸文章との対応付けが完了した実業務文章を、基軸文章の一部として集積し、これによって学習することができる。このため、文章対応付けシステム１００は、組織の業務に関連する文章の対応付けを精度よく実行することができる。また、実施の形態１に係る文章対応付けプログラムは、コンピュータを文章対応付けシステム１００として機能させるので、組織の業務に関連する文章の対応付けを、複数の文章間で実行することができる。 As described above, the sentence association system 100 according to Embodiment 1 of the present invention is a sentence association system that associates sentences, and includes a plurality of actual business sentences representing the contents of business of an organization, A storage means for storing a plurality of basic texts that serve as a basis for correspondence, and a computing means for associating each of the actual business texts with any of the basic texts, and relating to the business of the organization Sentence association can be executed between a plurality of basic sentences and a plurality of actual business sentences. In addition, the actual business sentences that have been associated with the basic text can be accumulated as a part of the basic text and can be learned thereby. For this reason, the text matching system 100 can accurately execute text matching related to the work of the organization. In addition, the sentence association program according to the first embodiment causes the computer to function as the sentence association system 100, so that the association of sentences related to the work of the organization can be executed between a plurality of sentences.

学習によって得られる効果の例を、以下に説明する。
図２の基軸文章ＤＢ２１において、基軸文章（１−２−０）の作業内容を表す項目（ＮＡＩＹＯＵ）には、「給報」という用語が含まれている。この用語は「給与支払報告書」の略語であるが、コンピュータによる一般的な形態素解析では、このような略語は必ずしも適切には処理されない。たとえば図６の基軸単語ＤＢ２５では、この用語は、ＥＡ番号（１−２−０）の単語附番４の「給」という動詞と、単語附番５の「報」という接尾辞の組み合わせとして誤って解析されている。
また、図４の実業務文章ＤＢ２２において、実業務文章（１−４−０）の作業内容を表す項目（ＮＡＩＹＯＵ）には、「給与支払報告書」という用語が含まれている。この用語は複合語として一つの文書を示すものであり、一まとまりの用語として扱うべきであるが、コンピュータによる一般的な形態素解析では、このような複合語は必ずしも適切には処理されない。たとえば図８の実業務単語ＤＢ２７では、この用語は、業務番号（１−４−０）の単語附番５〜７の「給与」「支払」「報告書」という３つの名詞に分割されている。
このように、基軸文章（１−２−０）および実業務文章（１−４−０）は、実質的にはいずれも「給与支払報告書」という同一の文書に関する処理を含むものであり、これが文章類似度の算出において考慮されるべきであるにもかかわらず、それぞれ形態素解析において適切な処理がなされず、結果としてこの用語は突合の際に一致しないものとなる。このように、学習されない状態では、文章類似度の値は必ずしも最適なものとはならない。 Examples of effects obtained by learning will be described below.
In the basic sentence DB 21 of FIG. 2, the item (NAIYOU) representing the work content of the basic sentence (1-2-0) includes the term “給報”. This term is an abbreviation for “給与支払報告書” (salary payment report), but general morphological analysis by a computer does not always handle such abbreviations properly. For example, in the basic word DB 25 of FIG. 6, this term is wrongly analyzed as a combination of the verb “給” at the word number 4 of the EA number (1-2-0) and the suffix “報” at the word number 5.
Further, in the actual business text DB 22 of FIG. 4, the item (NAIYOU) representing the work content of the actual business text (1-4-0) includes the term “salary payment report”. This compound word denotes a single document and should be treated as one term. However, general morphological analysis by a computer does not always handle such compound words properly. For example, in the actual business word DB 27 in FIG. 8, this term is divided into three nouns, “salary”, “payment”, and “report”, at the word numbers 5 to 7 of the business number (1-4-0).
As described above, the basic text (1-2-0) and the actual business text (1-4-0) substantially include processing related to the same document as “salary payment report”. Even though this should be taken into account in the calculation of sentence similarity, appropriate processing is not performed in each morphological analysis, and as a result, this term does not match at the time of matching. Thus, in a state where learning is not performed, the value of the sentence similarity is not necessarily optimal.

ところが、実施の形態１に係る文章対応付けシステム１００は、基軸文章との対応付けが完了した実業務文章を、基軸文章の一部として集積し、これによって学習を行う。上述の例では、基軸文章（１−２−０）および実業務文章（１−４−０）は、「給与支払報告書」という用語以外の単語による一致の度合いが大きいため（または、実施の形態２において後述する、関係法令箇条番号を表す項目、入力情報を表す項目、出力情報を表す項目等いずれかの一致の度合いが大きいため）、結果として対応付けられ、図１９の基軸文章（１−２−０）に示すように１つの基軸文章として集積される。これによって、集積された後の基軸文章は、作業内容を表す項目（ＮＡＩＹＯＵ）に、「給報」という略語（または「給」という動詞および「報」という接尾辞）と、「給与」「支払」「報告書」という３つの名詞とを両方とも含む。したがって、新たな実業務文章として、「給報」という略語を使用した文章が入力された場合であっても、「給与支払報告書」という複合語を使用した文章が入力された場合であっても、少なくともいずれか一方が一致することになり、文章類似度がより適切に算出される。
このように、文章対応付けシステム１００は、学習を行うことによって、また学習を繰り返すことによって、文章類似度をより適切に算出することができる。 However, the sentence association system 100 according to Embodiment 1 accumulates the actual business sentences whose association with a basic sentence has been completed as part of that basic sentence, and thereby learns. In the above example, the basic sentence (1-2-0) and the actual business sentence (1-4-0) match to a high degree on words other than the term “salary payment report” (or match to a high degree on one of the items described later in Embodiment 2, such as the items representing related law clause numbers, input information, or output information). As a result, they are associated with each other and accumulated as one basic sentence, as shown for the basic sentence (1-2-0) in FIG. 19. After accumulation, the item representing the work content (NAIYOU) of the basic sentence contains both the abbreviation “給報” for “salary payment report” (or the verb “給” and the suffix “報”) and the three nouns “salary”, “payment”, and “report”. Therefore, whether a new actual business sentence uses the abbreviation or the compound word “salary payment report”, at least one of the two forms matches, and the sentence similarity is calculated more appropriately.
As described above, the sentence association system 100 can more appropriately calculate the sentence similarity by performing learning and repeating the learning.

In Embodiment 1 described above, the weight of a word is determined based on its part of speech: the weight of a noun is a non-zero value, and the weight of any other part of speech is 0. As a variation, the weight of a word may be determined by some other method that is not based on the part of speech.
In addition, the sentence association system 100 requires input from an administrator when creating or updating the basic word expansion DB 26 in step S3 and step S19. As a variation, the sentence association system 100 may create or update the basic word expansion DB 26 automatically.
In this case, for example, the sentence association system 100 may assign a weight of "1" to nouns and assign no weight to other parts of speech. It may also assign no synonyms at all. Since no external input is then required, the work procedure can be simplified.
Alternatively, the storage means 20 of the sentence association system 100 may store in advance a dictionary file that defines the weight and the synonyms to be assigned to each word. In this case, the sentence association system 100 can automatically create or update the basic word expansion DB 26 based on this dictionary file.
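As an illustration of the automatic variation, the following Python sketch builds expansion rows from words that have already been tagged with a part of speech. The row keys "word" and "pos", the in-memory dictionary format, and the function name are assumptions; only the item names OMOMI (weight) and DOUGIGO (synonyms) and the rule "noun gets weight 1" come from the text.

```python
def build_expansion_rows(tagged_words, dictionary=None):
    """Create rows for a basic word expansion DB without administrator input.

    tagged_words: list of (word, part_of_speech) pairs, assumed to come
                  from a morphological analysis step not shown here.
    dictionary:   optional mapping word -> {"OMOMI": weight, "DOUGIGO": [...]},
                  a hypothetical in-memory form of the dictionary file.
    """
    dictionary = dictionary or {}
    rows = []
    for word, pos in tagged_words:
        entry = dictionary.get(word, {})
        # Default rule: nouns get weight 1, every other part of speech gets 0.
        default_weight = 1 if pos == "noun" else 0
        rows.append({
            "word": word,
            "OMOMI": entry.get("OMOMI", default_weight),
            "DOUGIGO": list(entry.get("DOUGIGO", [])),  # no synonyms by default
        })
    return rows


rows = build_expansion_rows(
    [("salary", "noun"), ("submit", "verb")],
    dictionary={"salary": {"OMOMI": 2, "DOUGIGO": ["wages"]}},
)
```

When no dictionary is supplied, the function falls back to the simplest variation in the text: weight 1 for nouns and no synonyms.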

The basic sentences and the actual business sentences may relate to the business of an organization other than a local government, for example the business of a company. They also need not relate directly to the business of any organization: the sentence association system 100 can be used for any sentences, as long as the purpose is to establish associations among a plurality of sentences.

In Embodiment 1 described above, all the DBs are stored in the storage means 20 of the sentence association system 100, which is a single computer. As a variation, the sentence association system may be configured from a plurality of computers, and the DBs may be distributed among those computers. For example, the morphological analysis processing in step S2 and step S12 and the matching processing in step S13 may be executed by different computing means of different computers.

Embodiment 2.
Embodiment 2 modifies Embodiment 1 so that sentences are associated using not only the item NAIYOU of the basic sentence DB 21 and the actual business sentence DB 22 but also the information contained in other items. The differences from Embodiment 1 are described below.

In Embodiment 1, a pair consisting of the basic word expansion DB 26 and the actual business word DB 27 is created based on the words included in the item (NAIYOU) representing the work content. In addition to these, the computing means of the sentence association system according to Embodiment 2 (not shown) also creates a pair of DBs with the same configuration for the items (J1, J2, K1, K2, G1, G2) representing related law clause numbers, and the storage means stores them. That is, the storage means stores a basic law DB and an actual business law DB.
FIG. 21 shows an example of the configuration of the actual business law DB. The actual business law DB represents the law clause number related to each actual business sentence. In this example, Article 317-6, paragraph 1 is associated with the actual business sentence (1-1-0). The basic law DB has the same configuration.

Similarly, a pair of DBs is created for the item (INPUT) representing input information, and a pair of DBs is created for the item (OUTPUT) representing output information. That is, the storage means stores a basic input DB, an actual business input DB, a basic output DB, and an actual business output DB. Note that, unlike in Embodiment 1, the actual business sentence DB 22 also includes the items INPUT and OUTPUT, like the basic sentence DB 21.
FIG. 22 shows an example of the configuration of the basic input DB. The basic input DB is created based on character strings extracted according to a predetermined rule from the item INPUT of the basic sentence DB 21 of FIG. 2. For example, from the item (INPUT) representing the input information of the basic sentence (1-2-0), the character strings "salary payment report", "resident tax return", and "power of attorney", each enclosed between the symbol "・" and the information representing a line feed, are extracted, and the three rows with the EA number (1-2-0) in FIG. 22 are created from them. The actual business input DB, the basic output DB, and the actual business output DB have the same configuration.
Further, just as the basic word expansion DB is created for the basic word DB in Embodiment 1, in Embodiment 2 a basic input expansion DB is created for the basic input DB, and a basic output expansion DB is created for the basic output DB. The basic input expansion DB stores, for each character string of the basic input DB, the information represented by the item names OMOMI and DOUGIGO in association with that character string. OMOMI is an item representing the weight the character string carries when the sentence association system determines the association of sentences. DOUGIGO is an item representing a character string (or a list of such character strings) having the same or a similar meaning as the character string. The basic output expansion DB has the same configuration. In step S3, the computing means according to Embodiment 2 additionally accepts and sets input of information on the weights and synonyms of the basic input expansion DB and the basic output expansion DB. Alternatively, the computing means may set all weights of the basic input expansion DB and the basic output expansion DB to a predetermined value and leave all synonyms unset.
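The extraction rule for the item INPUT (a string enclosed between the symbol "・" and a line feed) can be sketched in Python as follows. The function names and the row keys "EA" and "string" are hypothetical, and the sketch also accepts a final entry that ends at the end of the field rather than at a line feed, which is an assumption beyond the stated rule.

```python
import re


def extract_items(field):
    """Return the strings that follow a "・" marker up to the end of the line."""
    items = re.findall(r"・([^\n]*)", field)
    return [item.strip() for item in items if item.strip()]


def build_input_rows(ea_number, field):
    """Create one row per extracted string, keyed by the EA number."""
    return [{"EA": ea_number, "string": s} for s in extract_items(field)]


# Contents of the item INPUT for basic sentence (1-2-0), in English for readability.
field = "・salary payment report\n・resident tax return\n・power of attorney\n"
rows = build_input_rows("1-2-0", field)
# rows holds three entries, all with EA number "1-2-0".
```

The same function could be applied to the item OUTPUT, since the text states that the output DBs have the same configuration.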

When creating the matching detail DB 28 in step S13, the computing means according to Embodiment 2 performs not only the matching of the item (NAIYOU) representing the work content, that is, the matching between the basic word expansion DB 26 and the actual business word DB 27 performed in Embodiment 1, but also the matching of the items (J1, J2, K1, K2, G1, G2) representing related law clause numbers, the item (INPUT) representing input, and the item (OUTPUT) representing output.

For example, the matching of the items representing related law clause numbers is performed using the basic law DB and the actual business law DB. In this matching, it is determined, for each combination of a basic sentence and an actual business sentence, whether the related law clause numbers match completely. For example, if the basic sentence is associated with Article 317-6, paragraph 2 and the actual business sentence is associated with Article 317-6, paragraph 1, the related law clause numbers match only partially, so the basic sentence and the actual business sentence are determined not to be similar, and the sentence similarity of the combination is 0. If the related law clause numbers match completely, the basic sentence and the actual business sentence are determined to be similar, and the sentence similarity of the combination is a predetermined non-zero value.
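The all-or-nothing rule for law clause numbers can be written as a short Python sketch. Representing a clause number as a tuple of its components and using 1.0 as the predetermined non-zero value are assumptions made for illustration; the text does not fix either.

```python
PREDETERMINED_SCORE = 1.0  # assumed value; the text only says "non-zero"


def clause_similarity(basic_clause, actual_clause, score=PREDETERMINED_SCORE):
    """Return the predetermined score only when every component matches."""
    return score if tuple(basic_clause) == tuple(actual_clause) else 0.0


# Article 317-6, paragraph 2 versus Article 317-6, paragraph 1: partial match only.
partial = clause_similarity((317, 6, 2), (317, 6, 1))

# Identical clause numbers: complete match.
complete = clause_similarity((317, 6, 1), (317, 6, 1))
```

A partial match is deliberately scored the same as no match, which mirrors the example in the text.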

The matching of the item (INPUT) representing input information is performed using the basic input expansion DB and the actual business input DB. Just as the degree of coincidence of words is determined for each combination of sentences in Embodiment 1, in Embodiment 2 the degree of coincidence of character strings is determined for each combination of sentences. Then, as in Embodiment 1, the sentence similarity is calculated from the grand total of the degrees of coincidence. The same applies to the matching of the item (OUTPUT) representing output information.
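A minimal Python sketch of this string matching is given below, following the calculation recited in the claims (the number of matches multiplied by the weight of the matched basic entry, summed over the combination). The row layout and function name are hypothetical, and using the total directly as the sentence similarity is an assumption, since the text only says the similarity is calculated from the total.

```python
def input_similarity(basic_rows, actual_strings):
    """Weighted count of actual strings that equal a basic string or a synonym.

    basic_rows:     rows of a basic input expansion DB for one basic sentence,
                    each {"string": ..., "OMOMI": weight, "DOUGIGO": [synonyms]}.
    actual_strings: strings of the actual business input DB for one sentence.
    """
    total = 0
    for row in basic_rows:
        accepted = {row["string"], *row.get("DOUGIGO", [])}
        matches = sum(1 for s in actual_strings if s in accepted)
        total += matches * row["OMOMI"]  # degree of coincidence for this entry
    return total


basic_rows = [
    {"string": "salary payment report", "OMOMI": 1, "DOUGIGO": ["payment report"]},
    {"string": "power of attorney", "OMOMI": 2, "DOUGIGO": []},
]
actual_strings = ["payment report", "power of attorney", "tax notice"]
similarity = input_similarity(basic_rows, actual_strings)
```

Here one string matches through a synonym with weight 1 and one matches directly with weight 2, so the total for the combination is 3.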

In this way, the matching of the item (NAIYOU) representing the work content, the matching of the items (J1, J2, K1, K2, G1, G2) representing related law clause numbers, the matching of the item (INPUT) representing input, and the matching of the item (OUTPUT) representing output are performed, and a sentence similarity between sentences is calculated in each. These sentence similarities are summed for each combination of sentences and then used for the final association, in the same way as in step S15 (FIG. 17) of Embodiment 1.
Thus, the computing means according to Embodiment 2 performs the association based not only on the character string representing the processing content of the business but also on the character strings representing the names of the information that is input to or output from the business and on the clause numbers of the laws related to the business.
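The summation across items and the final selection can be sketched in Python as follows. The nested dictionary layout, the item label "LAW", and the function name are assumptions; the sketch picks the basic sentence with the largest summed similarity for each actual business sentence, and ties are resolved in favor of the pair encountered first.

```python
from collections import defaultdict


def associate(per_item_scores):
    """Sum per-item similarities per sentence pair, then pick the best basic sentence.

    per_item_scores: {item_name: {(actual_id, basic_id): similarity}}
    Returns {actual_id: (basic_id, total_similarity)}.
    """
    totals = defaultdict(float)
    for scores in per_item_scores.values():
        for pair, similarity in scores.items():
            totals[pair] += similarity

    best = {}
    for (actual_id, basic_id), total in totals.items():
        if actual_id not in best or total > best[actual_id][1]:
            best[actual_id] = (basic_id, total)
    return best


per_item_scores = {
    "NAIYOU": {("1-4-0", "1-2-0"): 2.0, ("1-4-0", "1-1-0"): 3.0},
    "LAW": {("1-4-0", "1-2-0"): 5.0},
    "INPUT": {("1-4-0", "1-2-0"): 1.0},
}
result = associate(per_item_scores)
```

In this example the work content alone would favor basic sentence (1-1-0), but the law clause and input items tip the summed similarity toward (1-2-0).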

As described above, Embodiment 2 performs the association based on various items containing various kinds of information, so the accuracy of the association can be further improved.
In addition, in Embodiment 2 the accuracy of the association can be further improved by varying the weight of each item according to the nature of the organization or the nature of the business. For example, for the business of an organization closely tied to the law, such as a local government, accuracy can be further improved by assigning the words (that is, the clause numbers) included in the items (J1, J2, K1, K2, G1, G2) representing related law clause numbers a larger weight than the words included in the other items.
Note that a clause number is simply a numerical value and lacks the ambiguity that written expressions generally contain, so accuracy may improve further.
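One simple way to picture per-item weighting is a weighted sum over the per-item similarities of a single sentence pair, sketched below in Python. Applying one multiplier per item (rather than per word within the item), the item label "LAW", and the concrete weight values are all assumptions chosen for illustration.

```python
def weighted_similarity(item_scores, item_weights):
    """Combine per-item similarities for one sentence pair.

    item_scores:  {item_name: similarity}
    item_weights: {item_name: multiplier}; items not listed default to 1.0.
    """
    return sum(
        item_weights.get(item, 1.0) * score
        for item, score in item_scores.items()
    )


item_scores = {"NAIYOU": 3, "LAW": 1, "INPUT": 2, "OUTPUT": 0}

# For a local government, emphasize the law clause item (hypothetical weight).
municipal_weights = {"LAW": 5.0}
combined = weighted_similarity(item_scores, municipal_weights)
```

With these values the law clause match contributes 5 of the combined total of 10, so a clause match outweighs any single other item.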

In Embodiment 2, the association may also be performed based on still other items. Examples include an item representing the name of a law related to the business, an item representing the name of an external organization related to the business (such as "tax office"), an item representing the name of an internal department related to the business (such as "residents division" or "tax division"), and an item representing the time or month in which the business is carried out (such as "April"). In particular, when the tax business of a local government is assumed, accuracy may improve further if the association is performed based on an item representing the name of the tax related to the business.
Embodiment 2 can also be modified in the same ways as Embodiment 1.

10 computing means, 20 storage means, 21 basic sentence DB, 22 actual business sentence DB, 25 basic word DB, 26 basic word expansion DB, 27 actual business word DB, 28 matching detail DB, 29 aggregation DB, 30 result DB, 100 sentence association system.

Claims

A sentence association system for associating sentences with one another, comprising:
storage means for storing a plurality of basic sentences serving as the basis of the association and a plurality of actual business sentences to be associated with the basic sentences; and
computing means for associating each of the actual business sentences with one of the basic sentences, wherein
each of the actual business sentences includes a business number identifying the actual business sentence and an actual business character string representing the processing content of the work corresponding to the actual business sentence,
each of the basic sentences includes a number identifying the basic sentence and a basic character string representing the processing content of the work corresponding to the basic sentence,
the computing means identifies the words included in each of the basic character strings and the actual business character strings,
the computing means performs the association between the number identifying the basic sentence and the business number based on the degree of coincidence between the basic words included in the basic character string and the actual business words included in the actual business character string,
the computing means creates output data, and
the output data indicates that each basic sentence and the actual business sentence associated with it are to be output side by side, left and right.

The sentence association system according to claim 1, wherein the computing means performs the association based on a weight defined for each of the basic words.

The sentence association system according to claim 2, wherein
the computing means performs morphological analysis on each of the actual business character strings and the basic character strings to obtain the actual business words and the basic words,
the computing means counts, for each combination of an actual business character string and a basic character string, the number of times an actual business word matches a basic word or a synonym thereof,
the computing means multiplies, for each of the combinations, the number of matches by the weight defined for the matched basic word to calculate the degree of coincidence of each actual business word,
the computing means calculates, for each of the combinations, the sum of the degrees of coincidence of all actual business words and, based on the sum, calculates the degree of coincidence of the combination as a sentence similarity, and
the computing means associates each of the actual business character strings with the basic sentence that gives the largest sentence similarity, or with all basic sentences whose sentence similarity is equal to or greater than a threshold.

The sentence association system according to claim 2 or 3, wherein the computing means sets the weight of a basic word to 1 when the basic word is a noun and to 0 otherwise, and
the storage means stores a dictionary file that defines synonyms for a plurality of basic words.

The sentence association system according to any one of claims 1 to 4, wherein the computing means performs the association based on
a character string representing the name of information that is input to or output from the business, and
the clause numbers of laws and regulations related to the business.

The sentence association system according to claim 1, wherein the storage means accumulates, in one basic sentence, that basic sentence and all actual business sentences associated with that basic sentence in the association.

The sentence association system according to any one of claims 1 to 6, wherein the actual business sentences represent the content of the business of a local government.

A sentence association program for causing a computer to function as the sentence association system according to any one of claims 1 to 7.