JP7155758B2

JP7155758B2 - Information processing device, information processing method and program

Info

Publication number: JP7155758B2
Application number: JP2018158653A
Authority: JP
Inventors: 直之伊藤; 聡田端; 錬松山; 遥前田; 和久大野
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2018-08-27
Filing date: 2018-08-27
Publication date: 2022-10-19
Anticipated expiration: 2038-08-27
Also published as: JP2020035019A

Description

The present invention relates to an information processing device, an information processing method, and a program.

There are sentence generation technologies in which a computer automatically generates text. For example, Patent Document 1 discloses an explanatory-text generation method and the like that uses a recurrent neural network (hereinafter "RNN") to generate explanatory text summarizing a document input to a computer.

Patent Document 1: JP 2018-28866 A

However, the invention of Patent Document 1 merely summarizes a document; it does not automatically generate, from text specified by a user, a sentence that contains that text.

In one aspect, an object is to provide an information processing device and the like capable of appropriately generating a sentence that includes text specified by a user.

In one aspect, an information processing device includes: a storage unit that stores a language model which has learned, for sentences composed of elements each being a character or character string of a predetermined unit, the element that appears next after each element, in the order of the sentence; a reception unit that receives a designation input of elements to be included in a sentence to be generated; a generation unit that inputs one of the designated elements into the language model to obtain the element that appears next, repeats a process of inputting the designated element and the obtained elements into the language model to obtain the further next element, and generates a sentence containing the designated elements and each obtained element; and an output unit that outputs the generated sentence. The generation unit either (a) inputs the element at the head of a learning source sentence used to train the language model into the language model, obtains the element that appears next, and generates a head phrase containing the head element and the obtained next element, or (b) inputs the element at the head of the learning source sentence into the language model to obtain the element that appears next, repeats a process of inputting the head element and the obtained elements into the language model to obtain the further next element, and generates a head phrase containing the head element and each obtained element. The generation unit then generates a sentence containing the generated head phrase and the designated elements.
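The repeated "input the elements so far, obtain the next element" step described above can be sketched as a simple loop. This is only an illustration under stated assumptions: the `predict` interface, the `max_len` cutoff, and the toy lookup table are hypothetical stand-ins for a trained language model, and how the head phrase and the designated elements are joined into one sentence is not shown.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical interface (not from the specification): given the elements so
# far, return the next element, or None when no continuation is predicted.
NextElement = Callable[[Sequence[str]], Optional[str]]


def extend(prefix: Sequence[str], predict: NextElement, max_len: int = 20) -> List[str]:
    """Repeatedly feed the growing sequence back into the model."""
    elements = list(prefix)
    while len(elements) < max_len:
        nxt = predict(elements)
        if nxt is None:
            break
        elements.append(nxt)
    return elements


# Toy stand-in for a trained language model: it looks only at the last element.
TOY_TABLE = {"甲": "及び", "及び": "乙", "発明": "を", "を": "共有", "共有": "する"}


def toy_predict(elements: Sequence[str]) -> Optional[str]:
    return TOY_TABLE.get(elements[-1])


# Head phrase: start from an element found at the head of a learning source sentence.
head_phrase = extend(["甲"], toy_predict)
# Continuation: start from an element designated by the user.
continuation = extend(["発明"], toy_predict)
```

A real model would condition on the whole sequence rather than the last element; the loop structure is the part this sketch is meant to show.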

In one aspect, a sentence that includes text specified by a user can be generated appropriately.

FIG. 1 is a schematic diagram showing a configuration example of the sentence generation system.
FIG. 2 is a block diagram showing a configuration example of the server.
FIG. 3 is an explanatory diagram showing an example of the record layouts of the keyword DB and the replacement dictionary.
FIG. 4 is an explanatory diagram relating to the sentence learning process.
FIG. 5 is an explanatory diagram relating to the language model.
FIG. 6 is an explanatory diagram relating to the sentence generation process.
FIG. 7 is an image of a screen displayed by the terminal.
FIG. 8 is a flowchart showing an example of the procedure of the sentence learning process.
FIG. 9 is a flowchart showing an example of the procedure of the sentence generation process.
FIG. 10 is an explanatory diagram showing an overview of Embodiment 2.
FIG. 11 is a flowchart showing an example of the procedure of the sentence generation process according to Embodiment 2.
FIG. 12 is an explanatory diagram showing an overview of Embodiment 3.
FIG. 13 is a flowchart showing an example of the procedure of the sentence generation process according to Embodiment 3.
FIG. 14 is an explanatory diagram showing an overview of Embodiment 4.
FIG. 15 is a flowchart showing an example of the procedure of the sentence generation process according to Embodiment 4.
FIG. 16 is an explanatory diagram relating to the sentence learning process according to Embodiment 5.
FIG. 17 is an explanatory diagram relating to the sentence generation process according to Embodiment 5.
FIG. 18 is a flowchart showing an example of the procedure of the sentence learning process according to Embodiment 5.
FIG. 19 is a flowchart showing an example of the procedure of the sentence generation process according to Embodiment 5.
FIG. 20 is a functional block diagram showing the operation of the server of the embodiments described above.

Hereinafter, the present invention will be described in detail with reference to the drawings showing its embodiments.
(Embodiment 1)
FIG. 1 is a schematic diagram showing a configuration example of the sentence generation system. This embodiment describes a sentence generation system that automatically generates contract text based on words and phrases designated by a user. The sentence generation system includes an information processing device 1 and terminals 2, 2, 2, and so on. The devices are communicatively connected via a network N such as the Internet.

The information processing device 1 is capable of various kinds of information processing and of sending and receiving information, and is, for example, a server device or a personal computer. In this embodiment the information processing device 1 is assumed to be a server device and is referred to below simply as the server 1. The server 1 performs machine learning on a group of training contracts to learn the words and phrases that appear in them, and generates a language model that can estimate the phrase appearing at any position in a contract from the phrases that precede it. A language model models the probability with which natural-language text is generated; examples include N-gram models and hidden Markov models. As described later, in this embodiment the server 1 builds an RNN as the language model from the group of training contracts and uses that RNN to generate contract text.

The terminal 2 is a terminal device used by each user who drafts a contract, such as a personal computer or a multifunction terminal. The terminal 2 accepts a designation input specifying words and phrases to include in the contract text to be drafted, and requests the server 1 to generate a sentence containing the designated words and phrases (hereinafter "designated phrases" where appropriate). The terminal 2 receives the generated sentence from the server 1 and presents it to the user.

FIG. 2 is a block diagram showing a configuration example of the server 1. The server 1 includes a control unit 11, a main storage unit 12, a communication unit 13, and an auxiliary storage unit 14.
The control unit 11 has one or more arithmetic processing devices such as a CPU (Central Processing Unit), MPU (Micro-Processing Unit), or GPU (Graphics Processing Unit), and performs the various information processing and control processing of the server 1 by reading and executing the program P stored in the auxiliary storage unit 14. The main storage unit 12 is a temporary storage area such as SRAM (Static Random Access Memory), DRAM (Dynamic Random Access Memory), or flash memory, and temporarily stores the data the control unit 11 needs for its arithmetic processing. The communication unit 13 includes processing circuitry for communication and exchanges information with external devices.

The auxiliary storage unit 14 is a large-capacity memory, a hard disk, or the like, and stores the program P and other data the control unit 11 needs to execute processing. The auxiliary storage unit 14 also stores a language model 141, a keyword DB 142, and a replacement dictionary 143. As described above, the language model 141 is language model data generated from the group of training contracts, and is model data for each clause contained in the various contracts. As described later, in this embodiment the server 1 generates a language model 141 for each category of text contained in a contract (the contract type and the clause within the contract) and stores it in the auxiliary storage unit 14.

The keyword DB 142 is a database storing keywords that appear frequently in each clause of the various contracts. For example, when the server 1 performs machine learning to generate the language models 141 from the group of training contracts, it extracts keywords for each clause of the contracts according to their frequency of appearance and stores them in the keyword DB 142.

The replacement dictionary 143 is a table that stores related words and phrases, such as synonyms and near-synonyms, in association with one another. As described later, when generating a sentence from phrases designated by the user, the server 1 refers to the replacement dictionary 143, replaces the designated phrases, and then generates the sentence.

The auxiliary storage unit 14 may be an external storage device connected to the server 1. The server 1 may also be a multicomputer made up of a plurality of computers, or a virtual machine constructed virtually in software.

In this embodiment the server 1 is not limited to the above configuration and may also include, for example, a reading unit that reads information stored in a portable storage medium, an input unit that accepts operation input, and a display unit that displays images.

FIG. 3 is an explanatory diagram showing an example of the record layouts of the keyword DB 142 and the replacement dictionary 143. The keyword DB 142 includes a type column, a clause column, and a keyword column. The type column stores the type of the contract from which the keywords were extracted. The clause column stores, in association with the contract type, each clause in the contract from which the keywords were extracted. The keyword column stores, in association with the contract type and clause, one or more keywords extracted from each clause of each type of contract.

The replacement dictionary 143 includes a target word column and a replacement word column. The target word column stores target words, that is, words among the user-designated phrases that are to be replaced using the replacement dictionary 143. The replacement word column stores, in association with each target word, the phrase that replaces it.
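The two record layouts can be pictured as in-memory structures, as in the following minimal sketch. The class name, field names, and helper `keywords_for` are hypothetical, and the sample rows are placeholders chosen from words that appear in this description, not the actual contents of FIG. 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KeywordRecord:
    """One row of the keyword DB (hypothetical field names)."""
    contract_type: str                                   # type column
    clause: str                                          # clause column
    keywords: List[str] = field(default_factory=list)    # keyword column


# Placeholder rows for illustration only.
keyword_db: List[KeywordRecord] = [
    KeywordRecord("秘密保持契約", "知的財産権", ["発明", "共有"]),
]

# Replacement dictionary: target word column -> replacement word column.
replacement_dictionary: Dict[str, str] = {"文書": "書面", "許諾": "承諾"}


def keywords_for(db: List[KeywordRecord], contract_type: str, clause: str) -> List[str]:
    """Return the keywords stored for one category (contract type and clause)."""
    for record in db:
        if record.contract_type == contract_type and record.clause == clause:
            return list(record.keywords)
    return []
```

Keying the lookup on the pair of contract type and clause mirrors the way the description treats that pair as a single category.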

FIG. 4 is an explanatory diagram relating to the sentence learning process. FIG. 4 conceptually illustrates how the server 1 performs machine learning on the text of training contracts and generates a plurality of language models 141, 141, 141, and so on. An overview of the processing executed by the server 1 follows.

The server 1 acquires a group of training contracts from outside and performs machine learning to learn the words and phrases that appear in the text of each contract (the learning source sentences). For example, the server 1 trains on existing, manually drafted contracts. The server 1 first structures the documents: for each contract it determines the contract type and the clauses the contract contains, and classifies the text of the contract by category (type and clause). The server 1 then learns the text of the structured contracts category by category and generates a separate language model 141 for each.

The contract type is a broad classification of the contract's content, for example a confidentiality agreement, an outsourcing agreement, or a joint research agreement. These are only examples, and the contract types are not particularly limited. As an example, the left side of FIG. 4 conceptually shows a contract for a confidentiality agreement. As indicated by the thick-line frames in FIG. 4, a typical contract states its content divided into a plurality of clauses. The server 1 learns the text of each clause, clause by clause. In this specification, "sentence" (文章) is not limited to a single grammatical sentence; it may be a passage consisting of several sentences or a fragment shorter than one sentence. As described later, the server 1 learns the words and phrases that appear in each clause of each type of contract, following the order in which they appear in the contract. The server 1 thereby generates a language model 141 for each clause.

First, the server 1 classifies the training contracts by contract type. For example, the server 1 determines the contract type from the document name or the like. For the contract shown in FIG. 4, the server 1 determines from the title "秘密保持契約書" (confidentiality agreement) that it is a contract for a confidentiality agreement.

The server 1 further identifies the portion of text corresponding to each clause in the contract and classifies the text of the contract clause by clause. For example, the server 1 identifies a clause from the subheading (subtitle) that serves as the clause's title. In the example of FIG. 4, the server 1 determines from the subheading "目的" (Purpose) of "第１条" (Article 1) that this portion of text is the clause concerning the purpose of the contract.
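The rule-based structuring described above (contract type from the title, clauses from the subheadings) can be sketched as follows. This is an illustration only: the heading pattern, the rule of dropping a trailing "書" from the title, the function name `structure_contract`, and the sample contract text are all assumptions, and real contracts use more varied layouts.

```python
import re
from typing import Dict, Tuple

# Assumed heading shape such as "第1条（目的）"; not taken from the specification.
CLAUSE_HEADING = re.compile(
    r"^第\s*([0-9０-９一二三四五六七八九十]+)\s*条\s*[（(](.+?)[）)]\s*$"
)


def structure_contract(text: str) -> Tuple[str, Dict[str, str]]:
    """Return (contract type, {clause subheading: clause text})."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title = lines[0]
    # Naive rule: "秘密保持契約書" -> "秘密保持契約".
    contract_type = title[:-1] if title.endswith("書") else title
    clauses: Dict[str, str] = {}
    current = None
    for ln in lines[1:]:
        match = CLAUSE_HEADING.match(ln)
        if match:
            current = match.group(2)      # subheading, e.g. "目的"
            clauses[current] = ""
        elif current is not None:
            clauses[current] += ln        # body text of the current clause
    return contract_type, clauses


# Made-up sample text for illustration.
sample = """秘密保持契約書
第1条（目的）
甲及び乙は、本契約の目的を定める。
第2条（損害賠償）
甲又は乙は、損害を賠償する。
"""

contract_type, clauses = structure_contract(sample)
```

The resulting pair of contract type and clause subheading is what the description calls the category of a piece of text.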

In the above, the server 1 determines the contract category (type and clause) on a rule basis from the contract title, subheadings, and so on, but this embodiment is not limited to that. For example, the server 1 may use the tf-idf method or the like to compute the similarity between the texts of the contracts from the frequencies of the words they contain, and structure the contracts based on that similarity.
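As one way to picture the tf-idf alternative, the sketch below weights each word by tf-idf and compares two texts by cosine similarity. The specification names only "the tf-idf method or the like"; the smoothed idf formula, the use of cosine similarity, the function names, and the pre-segmented sample texts are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute one tf-idf vector per document (documents are lists of words)."""
    n = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up, already segmented clause texts.
docs = [
    ["甲", "乙", "秘密", "情報"],
    ["甲", "乙", "秘密", "保持"],
    ["甲", "乙", "損害", "賠償"],
]
vecs = tfidf_vectors(docs)
sim_confidential = cosine(vecs[0], vecs[1])   # two confidentiality-like texts
sim_damages = cosine(vecs[0], vecs[2])        # confidentiality vs. damages
```

Texts whose similarity exceeds some chosen cutoff could then be grouped into the same category; the cutoff and the grouping method are left open here.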

The server 1 repeats the above processing for each contract, determining the type of each training contract and the clauses written in it. In other words, the server 1 structures the contracts. The server 1 generates each language model 141 from the text that shares a category (type and clause) across the structured contracts. For example, as shown on the right side of FIG. 4, the server 1 generates one language model 141 from the text of the contracts whose type is "confidentiality agreement" and whose clause is "compensation for damages". The server 1 likewise generates the other language models 141 according to the category of the text.

FIG. 5 is an explanatory diagram relating to the language model 141. In this embodiment the server 1 builds an RNN as the language model 141. For example, the server 1 builds an LSTM (Long Short-Term Memory), which is a kind of RNN.

The server 1 inputs the text classified by category as described above into the RNN and learns the words and phrases that appear in the text of each category. Here the server 1 first applies natural language processing such as morphological analysis to the text to be input to the RNN, dividing it into phrases (elements), each of which is a character or character string of a predetermined unit. The unit of division is, for example, a word or a clause-level phrase (bunsetsu), but is not particularly limited. For example, the server 1 stores in advance a dictionary (not shown) containing a plurality of words and phrases, and divides the text according to the entries in that dictionary.
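Dictionary-based division of text into elements can be illustrated with a greedy longest-match segmenter. This is a simplified stand-in, not the morphological analysis the description refers to: the function name `segment`, the longest-match rule, and the single-character fallback are assumptions.

```python
from typing import Iterable, List


def segment(text: str, lexicon: Iterable[str]) -> List[str]:
    """Split text into elements by greedy longest match against a lexicon."""
    # Try longer entries first so that "及び" wins over a shorter overlap.
    words = sorted(set(lexicon), key=len, reverse=True)
    elements: List[str] = []
    i = 0
    while i < len(text):
        for word in words:
            if word and text.startswith(word, i):
                elements.append(word)
                i += len(word)
                break
        else:
            # No dictionary entry matches here: emit a single character.
            elements.append(text[i])
            i += 1
    return elements


elements = segment("甲及び乙が", ["甲", "及び", "乙", "が"])
```

A production system would more likely rely on a morphological analyzer or a learned subword vocabulary; the point here is only that the text becomes an ordered list of elements.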

The server 1 may divide the text in units other than words or clause-level phrases. One such unit is the subword. Unlike ordinary word segmentation, a subword is a unit obtained by dividing text according to how frequently character sequences appear in it. The "word", which is generally used as the smallest constituent unit of text, is the smallest unit of characters or character strings from the standpoint of meaning and grammar, whereas a subword is the smallest unit determined by frequency of use in the text rather than by meaning or grammar. When dividing into subwords, the server 1 can also split technical terms peculiar to contracts, and can therefore divide the documents more suitably. In short, it suffices that the server 1 can divide the contract text into elements that are characters or character strings of a predetermined unit, and the elements used as the unit of division are not limited to words.

The server 1 inputs the divided phrases into the input layer of the RNN and performs machine learning. FIG. 5 conceptually shows the configuration of the RNN. As shown in FIG. 5, the RNN has an input layer, an intermediate layer (hidden layer), and an output layer. The input layer has a plurality of neurons, each of which accepts one of the phrases that appear in order from the beginning to the end of the text. The output layer has a plurality of neurons corresponding to the neurons of the input layer, each of which estimates and outputs the phrase that appears next after the phrase input to the corresponding input neuron. The intermediate layer has a plurality of neurons that compute the output values (phrases) of the output-layer neurons from the input values (phrases) of the input-layer neurons. Each intermediate-layer neuron uses the intermediate layer's result for past input values when computing on the next input value, and thereby estimates the next phrase from the one or more phrases that have appeared so far.

The configuration of the RNN shown in FIG. 5 is only an example, and this embodiment is not limited to it. For example, the intermediate layer is not limited to one layer and may have two or more layers. The numbers of neurons in the input layer and the output layer also need not be equal; for example, there may be fewer outputs than inputs.

In this embodiment the server 1 learns according to an RNN algorithm, but it may instead generate the language model 141 by learning according to another algorithm, such as another form of deep learning, an N-gram model, an SVM (Support Vector Machine), a Bayesian network, or a decision tree.

The server 1 inputs the phrases of a training sentence into the neurons of the input layer in the order in which they appear in the sentence, and obtains output values from the neurons of the output layer. In the example of FIG. 5, the server 1 inputs the phrases "甲" (Party A), "及び" (and), "乙" (Party B), "が" (a subject-marking particle), and so on into the corresponding input-layer neurons in their order in the sentence. The server 1 performs the computation at each output-layer neuron via the intermediate layer, calculates the occurrence probability of the phrase appearing at a given position in the sentence from the phrases that have appeared up to that point, and estimates the phrase that appears next. In the example of FIG. 5, the server 1 calculates the occurrence probability of the second phrase from the first phrase "甲" and makes its estimate. The server 1 likewise calculates the occurrence probability of the third phrase from the first and second phrases "甲" and "及び". The server 1 estimates each subsequent phrase in the same way.

The server 1 compares each estimated phrase with the actual phrase (the correct value) and builds the RNN by adjusting the parameters of the neurons so that the output value from each output-layer neuron approximates the correct value. For example, the server 1 adjusts the weights and other parameters of the neurons so that the phrase estimated to follow "甲" becomes the actual phrase "及び". The server 1 thereby generates a language model 141 that has learned the order of phrases in the training sentences in the forward direction.
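The idea of learning which element follows which, and of turning that into an occurrence probability, can be illustrated with a count-based bigram model. Note that this is a deliberately simpler technique than the RNN (LSTM) of this embodiment: it conditions only on the immediately preceding element and fits counts rather than adjusting neuron weights. The function names and the sample sentences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigram(sentences: List[List[str]]) -> Dict[str, Counter]:
    """Count, for each element, the elements that follow it in sentence order."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for elements in sentences:
        for prev, nxt in zip(elements, elements[1:]):
            counts[prev][nxt] += 1
    return counts


def next_probabilities(counts: Dict[str, Counter], prev: str) -> Dict[str, float]:
    """Relative frequency of each element observed after `prev`."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}


# Made-up, already segmented training sentences.
training = [
    ["甲", "及び", "乙", "が"],
    ["甲", "及び", "乙", "は"],
    ["甲", "は"],
]
model = train_bigram(training)
probs = next_probabilities(model, "甲")
most_likely = max(probs, key=probs.get)
```

In this toy data "及び" follows "甲" in two of three cases, so it is the most likely next element; an RNN would reach a comparable preference through training rather than counting.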

The server 1 performs the above processing on the text of each category and generates an RNN-based language model 141 for each contract category.

During the above learning, the server 1 also extracts keywords that appear with high frequency in the text of each category and stores them in the keyword DB 142. For example, the server 1 calculates the frequency of appearance of each phrase in the text and extracts as keywords the phrases whose frequency is at or above a threshold. The server 1 stores the extracted phrases in the keyword DB 142 in association with the category (type and clause) of the text from which they were extracted. As described later, when the user designates phrases to include in the contract text to be generated, the server 1 presents the extracted keywords to the user and accepts the designation input.
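Threshold-based keyword extraction for one category can be sketched as below. The specification does not say whether the frequency is a raw count or a ratio; this sketch assumes a raw count, and the function name and sample data are hypothetical.

```python
from collections import Counter
from typing import List


def extract_keywords(sentences: List[List[str]], threshold: int) -> List[str]:
    """Return the elements whose raw count in the category meets the threshold."""
    frequency = Counter(element for sentence in sentences for element in sentence)
    return sorted(word for word, count in frequency.items() if count >= threshold)


# Made-up, already segmented text of one category.
category_text = [
    ["発明", "を", "共有"],
    ["発明", "の", "帰属"],
    ["共有", "の", "発明"],
]
keywords = extract_keywords(category_text, threshold=2)
```

In this toy data the particle "の" also passes the threshold, which suggests that a practical system might filter out function words; such filtering is not described in the source and is mentioned here only as a possibility.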

FIG. 6 is an explanatory diagram relating to the sentence generation process. FIG. 7 is an image of a screen displayed by the terminal 2. FIG. 6 conceptually illustrates how contract text is generated from a plurality of phrases designated by the user. FIG. 7 schematically shows the display screen of the terminal 2 during sentence generation. The sentence generation process using the language model 141 is described below with reference to FIGS. 6 and 7.

First, the server 1 accepts from the terminal 2 a designation input specifying the category of contract for which the user wishes to create text and a plurality of phrases to be included in that text. For example, the server 1 accepts the designation input of this information from the terminal 2 via the input screen illustrated in FIG. 7.

The input screen includes, for example, a clause selection field 71, a keyword selection field 72, and a text input field 73. The clause selection field 71 displays, as a pull-down menu, the clauses of the contract type already shown at the top of the input screen, and lets the user select the clause of the text to be created. As shown in FIG. 7, in the clause selection field 71 the terminal 2 displays in a pull-down the clauses belonging to the contract type "confidentiality agreement" shown at the top of the input screen, and accepts a selection of one of them. The terminal 2 thereby accepts a designation of the type and clause of the contract the user wishes to create, that is, the category.

The keyword selection field 72 displays, as a pull-down menu, the keywords of the clause selected in the clause selection field 71, and lets the user select keywords to include in the text to be created. The server 1 reads the keywords of the selected clause from the keyword DB 142 and outputs them to the terminal 2; the terminal 2 displays them in a pull-down in the keyword selection field 72 and accepts a selection of one of them.

The text input field 73 lets the user type any keyword (phrase) to include in the text to be created. The terminal 2 thus accepts not only a selection from the keywords presented in the keyword selection field 72 but also free input of any keyword in the text input field 73.

なお、テキスト入力欄７３によりユーザが任意のテキストをキーワードとして入力した場合、ユーザが自由に入力を行うため、入力されるテキストの内容によっては言語モデル１４１を用いても当該テキストから契約書の文章を生成することが困難となる虞がある。そこでサーバ１は、置換辞書１４３を用いてユーザが入力したテキストを変換し、変換後のテキストをキーワードとして用いてもよい。図７の例の場合、サーバ１は、ユーザが入力したテキスト「文書による許諾」に含まれる語句「文書」及び「許諾」を「書面」及び「承諾」に変換し、変換後のテキストをキーワードとして用いる。これにより、言語モデル１４１による文章生成が困難になる事態を防止する。 When the user enters arbitrary text as a keyword in the text input field 73, the input is unconstrained, so depending on its content it may be difficult to generate contract text from it even with the language model 141. The server 1 may therefore convert the text entered by the user using the replacement dictionary 143 and use the converted text as the keyword. In the example of FIG. 7, the server 1 converts the terms "document" (文書) and "permission" (許諾) contained in the user's input "permission by document" (文書による許諾) into "written form" (書面) and "consent" (承諾), and uses the converted text as the keyword. This prevents a situation in which sentence generation by the language model 141 becomes difficult.
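
The dictionary-based conversion described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `normalize_keyword` and the dictionary contents (the two pairs taken from the FIG. 7 example) are hypothetical, and plain substring replacement stands in for whatever matching the replacement dictionary 143 actually performs.

```python
# Hypothetical replacement dictionary; the two entries mirror the FIG. 7 example.
REPLACEMENT_DICT = {"文書": "書面", "許諾": "承諾"}


def normalize_keyword(text, replacements=REPLACEMENT_DICT):
    """Replace every dictionary term that occurs in the user's free-text keyword."""
    for source, target in replacements.items():
        text = text.replace(source, target)
    return text


normalized = normalize_keyword("文書による許諾")
print(normalized)  # 書面による承諾
```

Text that contains no dictionary term is returned unchanged, so the same function can be applied to every free-text keyword before it is passed to the language model.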

上述の如く、端末２は、ユーザが作成を希望する契約書のカテゴリ、及び当該条項の文章に含める複数の語句（キーワード）の指定入力を受け付ける。生成ボタン７４への操作入力を受け付けた場合、端末２は入力内容をサーバ１に出力し、指定された語句を含む文章の生成を要求する。 As described above, the terminal 2 accepts designation input of the category of the contract that the user desires to create and a plurality of words and phrases (keywords) to be included in the text of the clause. When an operation input to the generation button 74 is accepted, the terminal 2 outputs the input content to the server 1 and requests generation of a sentence including the specified phrase.

図６に戻って、サーバ１は、ユーザが指定したカテゴリの言語モデル１４１を用い、指定された複数の語句を含む文章を生成して端末２に出力する。図６では、ユーザがカテゴリとして種別「秘密保持契約」の条項「知的財産権」を指定し、かつ、文章に含める語句として「発明」及び「共有」が指定された場合を図示している。この場合、サーバ１は契約書の種別が「秘密保持契約」であり、かつ、条項が「知的財産権」である言語モデル１４１を用い、文章を生成する。 Returning to FIG. 6 , the server 1 uses the language model 141 of the category specified by the user to generate a sentence containing a plurality of specified words and phrases, and outputs the sentence to the terminal 2 . FIG. 6 illustrates a case where the user specifies the clause "intellectual property rights" of the type "confidentiality agreement" as a category, and specifies "invention" and "sharing" as words to be included in the text. . In this case, the server 1 generates a sentence using the language model 141 in which the contract type is "confidentiality agreement" and the clause is "intellectual property rights".

サーバ１はまず、指定された複数の語句のうち、いずれかの語句から、最終的に出力する文章の候補（以下では「候補文」と呼ぶ）を複数生成する。図６では、語句「発明」から複数の候補文が生成される様子を図示している。サーバ１は、当該語句をＲＮＮに係る入力層に入力し、当該語句に基づき次の語句の生起確率を算出して、２番目の語句として生起確率が高い一又は複数の語句を出力として得る。次にサーバ１は、２番目の語句を入力層に入力して、１番目の語句「発明」と、２番目の語句とに基づき次の語句の生起確率を算出して、３番目の語句として生起確率が高い一又は複数の語句を出力として得る。 First, the server 1 generates, from one of the specified words, a plurality of candidates for the sentence to be finally output (hereinafter "candidate sentences"). FIG. 6 illustrates a plurality of candidate sentences being generated from the word "invention". The server 1 inputs that word into the input layer of the RNN, calculates the occurrence probability of the next word based on it, and obtains as output one or more words with a high occurrence probability as the second word. Next, the server 1 inputs the second word into the input layer, calculates the occurrence probability of the next word based on the first word "invention" and the second word, and obtains as output one or more words with a high occurrence probability as the third word.

このように、サーバ１は直前までの語句から次の語句の生起確率を算出し、次の語句を推定していく。サーバ１は上述の処理を繰り返し、一の語句から複数の文章を生成する。この場合にサーバ１は、生成した各文章の妥当性を評価するためのスコアを、文章生成の際に算出された各語句の生起確率に基づいて算出し、スコアが高い文章を候補文とする。例えばサーバ１は、文章全体のスコアとして、文章に含まれる各語句の生起確率の平均値を用いる。サーバ１は、生起確率の平均値が閾値以上の文章を候補文とする。 In this way, the server 1 calculates the occurrence probability of the next word/phrase from the previous word/phrase, and estimates the next word/phrase. The server 1 repeats the above process to generate a plurality of sentences from one phrase. In this case, the server 1 calculates a score for evaluating the validity of each generated sentence based on the occurrence probability of each word calculated when generating the sentence, and selects a sentence with a high score as a candidate sentence. . For example, the server 1 uses the average probability of occurrence of each word included in the sentence as the score of the entire sentence. The server 1 regards sentences with an average value of occurrence probability equal to or greater than a threshold as candidate sentences.
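
The generate-then-score loop described above can be sketched as follows. This is a toy illustration, not the patent's implementation: `TOY_MODEL` is an invented next-word table that conditions only on the last word (the RNN in the text conditions on all preceding words), the `<eos>` end marker, the function name `generate`, and the threshold 0.7 are all assumptions.

```python
from statistics import mean

# Toy stand-in for language model 141: last word -> {next word: probability}.
# The probabilities are invented for illustration.
TOY_MODEL = {
    "invention": {"is": 0.6, "shall": 0.4},
    "is": {"shared": 0.7, "owned": 0.3},
    "shall": {"belong": 1.0},
    "shared": {"<eos>": 1.0},
    "owned": {"<eos>": 1.0},
    "belong": {"<eos>": 1.0},
}


def generate(start, model=TOY_MODEL, max_len=10):
    """Return (words, score) for every sentence that starts at `start` and ends.

    The score is the mean of the occurrence probabilities produced while
    generating the sentence, as in the averaging example in the text.
    """
    finished = []
    stack = [([start], [])]  # (words so far, probabilities so far)
    while stack:
        words, probs = stack.pop()
        for nxt, p in model.get(words[-1], {}).items():
            if nxt == "<eos>":
                finished.append((words, mean(probs + [p])))
            elif len(words) < max_len:  # sentences that grow too long are dropped
                stack.append((words + [nxt], probs + [p]))
    return finished


# Keep only sentences whose mean occurrence probability reaches the threshold.
candidates = [(w, s) for w, s in generate("invention") if s >= 0.7]
```

With this table, "invention is owned" averages about 0.63 and is discarded, while the other two sentences are kept as candidate sentences.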

なお、サーバ１は生起確率の平均値をスコアとして用いることで候補文を決定したが、他の基準で候補文を決定してもよい。例えばサーバ１は、生起確率が閾値以下の語句を含む文章、つまり不適切な語句を含む文章を除外することで、候補文を決定してもよい。 Although the server 1 determines the candidate sentences by using the average value of the occurrence probabilities as the score, the candidate sentences may be determined based on other criteria. For example, the server 1 may determine candidate sentences by excluding sentences containing words and phrases whose occurrence probability is equal to or less than a threshold value, that is, sentences containing inappropriate words and phrases.
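
The difference between the two criteria can be shown with a short sketch. The function names and the numeric thresholds are assumptions chosen for illustration; the text only states that a sentence containing a word at or below a threshold may be excluded.

```python
from statistics import mean


def passes_mean(word_probs, threshold):
    """Criterion from the preceding paragraph: mean probability >= threshold."""
    return mean(word_probs) >= threshold


def passes_floor(word_probs, floor):
    """Alternative criterion: reject if any word's probability is <= floor."""
    return all(p > floor for p in word_probs)


# One very unlikely word hidden among likely ones.
probs = [0.9, 0.9, 0.05]
print(passes_mean(probs, 0.6), passes_floor(probs, 0.1))
```

A sentence like this one can pass the averaging criterion while still containing a single implausible word, which is the case the per-word floor is meant to catch.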

また、上記ではユーザが指定した複数の語句のうち、単一の語句から候補文を生成したが、本実施の形態はこれに限定されるものではない。例えばサーバ１は、ユーザが指定した３以上の語句のうち、２以上の語句から複数の候補文を生成し、残る指定語句を含む候補文を選択するようにしてもよい。つまりサーバ１は、ユーザが指定した複数の語句のうち、一部の語句を用いて候補文を生成可能であればよく、当該一部の語句は単一の語句に限定されない。 Also, in the above description, candidate sentences are generated from a single word out of a plurality of words specified by the user, but the present embodiment is not limited to this. For example, the server 1 may generate a plurality of candidate sentences from two or more of the three or more words specified by the user, and select candidate sentences containing the remaining specified words. That is, the server 1 only needs to be able to generate candidate sentences using some of the multiple words specified by the user, and the part of the words is not limited to a single word.

サーバ１は、生成した複数の候補文から、ユーザが指定した複数の語句のうち、文章生成に用いていない他の指定語句を含む候補文を、端末２に出力する文章として選択する。図６の例では、サーバ１は「発明」以外の語句「共有」を含む、点線矩形枠で囲んだ候補文を出力文章として選択する。 The server 1 selects candidate sentences including other specified words and phrases not used for sentence generation from among the plurality of words and phrases specified by the user, as sentences to be output to the terminal 2 from the generated candidate sentences. In the example of FIG. 6, the server 1 selects candidate sentences surrounded by dotted-line rectangular frames that include the word "share" other than "invention" as output sentences.

上述の如く、サーバ１は、生成した複数の候補文から、指定された複数の語句を全て含み、かつ、文章全体のスコアが高い候補文を選択する。なお、選択する候補文は単一であってもよく、複数であってもよい。サーバ１は、選択した候補文を端末２に出力する。端末２は、図７に示すようにサーバ１から出力された候補文を表示し、ユーザに提示する。 As described above, the server 1 selects candidate sentences that include all of the specified words and phrases and that have a high overall score from the generated candidate sentences. A single candidate sentence may be selected, or a plurality of candidate sentences may be selected. The server 1 outputs the selected candidate sentences to the terminal 2 . The terminal 2 displays the candidate sentences output from the server 1 as shown in FIG. 7 and presents them to the user.
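
The selection step can be sketched as a filter followed by a sort. The function name `select_output`, the `top_k` parameter, and the sample candidates are assumptions for illustration; candidates are represented as `(words, score)` pairs.

```python
def select_output(candidates, specified, top_k=1):
    """Keep candidates containing every specified word, best score first."""
    matching = [
        (words, score)
        for words, score in candidates
        if all(term in words for term in specified)
    ]
    matching.sort(key=lambda pair: pair[1], reverse=True)
    return matching[:top_k]


sample = [
    (["invention", "is", "shared"], 0.77),
    (["invention", "shall", "belong"], 0.80),
    (["invention", "is", "jointly", "shared"], 0.70),
]
best = select_output(sample, ["invention", "shared"])
```

Raising `top_k` returns several sentences, which corresponds to the case where more than one candidate sentence is presented to the user.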

図８は、文章学習処理の処理手順の一例を示すフローチャートである。図８に基づき、文章学習処理の処理内容について説明する。
サーバ１の制御部１１は、学習用の契約書群を取得する（ステップＳ１１）。制御部１１は、取得した各契約書の文章を、契約書の種別、契約書内の条項等のカテゴリに応じて構造化する処理を行い、各契約書内の文章をカテゴリ毎に分類する（ステップＳ１２）。 FIG. 8 is a flow chart showing an example of the procedure of sentence learning processing. Based on FIG. 8, the contents of the sentence learning process will be described.
The control unit 11 of the server 1 acquires a contract group for learning (step S11). The control unit 11 performs a process of structuring the obtained text of each contract according to categories such as the type of contract and clauses in the contract, and classifies the text in each contract by category ( step S12).

制御部１１は、カテゴリ毎に分類した各文章を、所定単位の文字又は文字列である複数の語句（要素）に分割する（ステップＳ１３）。例えば制御部１１は、単語、文節等の意味単位で文章を分割してもよく、サブワード等の出現頻度に応じた単位で文章を分割してもよい。 The control unit 11 divides each sentence classified into each category into a plurality of words (elements), which are characters or character strings of a predetermined unit (step S13). For example, the control unit 11 may divide the sentence into semantic units such as words and clauses, or may divide the sentence into units according to the appearance frequency of subwords and the like.

制御部１１は、分割した文章の各語句を、文章における並び順に従って学習する機械学習処理を行い、カテゴリ毎に言語モデル１４１を生成する（ステップＳ１４）。例えば制御部１１は、ＲＮＮ（ＬＳＴＭ）のアルゴリズムに基づく機械学習を行い、文章の先頭から順に出現する一又は複数の語句から、当該一又は複数の語句に続いて出現する語句を推定する言語モデル１４１を生成する。制御部１１は、カテゴリ毎に別々の言語モデル１４１を生成する。制御部１１は、生成した各カテゴリの言語モデル１４１を補助記憶部１４に格納する。 The control unit 11 performs machine learning in which each word of the divided sentences is learned according to its order in the sentence, and generates a language model 141 for each category (step S14). For example, the control unit 11 performs machine learning based on an RNN (LSTM) algorithm and generates a language model 141 that, from one or more words appearing in order from the beginning of a sentence, estimates the word that follows them. The control unit 11 generates a separate language model 141 for each category, and stores the generated language models 141 in the auxiliary storage unit 14.
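
The idea of one model per category can be sketched with simple next-word counts. This deliberately swaps the RNN (LSTM) named in the text for a bigram table so that the example stays self-contained; the function name `train_models`, the `<eos>` marker, and the sample corpus are assumptions.

```python
from collections import Counter, defaultdict


def train_models(corpus):
    """corpus: iterable of (category, tokens), where tokens is a list of words.

    Returns {category: {previous word: {next word: probability}}}.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    for category, tokens in corpus:
        # Pair each word with its successor; the last word is followed by <eos>.
        for prev, nxt in zip(tokens, tokens[1:] + ["<eos>"]):
            counts[category][prev][nxt] += 1
    return {
        category: {
            prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for prev, c in table.items()
        }
        for category, table in counts.items()
    }


corpus = [
    (("NDA", "IP rights"), ["invention", "is", "shared"]),
    (("NDA", "IP rights"), ["invention", "shall", "belong"]),
    (("NDA", "term"), ["term", "is", "one", "year"]),
]
models = train_models(corpus)
```

Because each category has its own table, the word "is" has different continuations in the two categories, which is the effect that per-category models are intended to produce.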

また、制御部１１は、各カテゴリの文章内での出現頻度に基づき、各カテゴリの文章のキーワードを抽出してキーワードＤＢ１４２に記憶する（ステップＳ１５）。制御部１１は、一連の処理を終了する。 Further, the control unit 11 extracts the keywords of the sentences of each category based on the appearance frequency in the sentences of each category and stores them in the keyword DB 142 (step S15). The control unit 11 ends the series of processes.
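
Frequency-based keyword extraction for one category can be sketched as follows. The function name, the `top_n` cutoff, and the optional stop-word set are assumptions; the text states only that keywords are extracted based on appearance frequency.

```python
from collections import Counter


def extract_keywords(tokenized_sentences, top_n=3, stopwords=frozenset()):
    """Return the `top_n` most frequent words, ignoring stop words."""
    counts = Counter(
        word
        for sentence in tokenized_sentences
        for word in sentence
        if word not in stopwords
    )
    return [word for word, _ in counts.most_common(top_n)]


sentences = [
    ["invention", "is", "shared"],
    ["invention", "belongs"],
    ["shared", "invention"],
]
keywords = extract_keywords(sentences, top_n=2, stopwords={"is"})
```

Running this once per category and storing the result would populate a structure playing the role of the keyword DB 142.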

図９は、文章生成処理の処理手順の一例を示すフローチャートである。図９に基づき、文章生成処理の処理内容について説明する。
サーバ１の制御部１１は、図７で例示した入力画面を介して、生成する文章のカテゴリを指定する指定入力を端末２から受け付ける（ステップＳ３１）。上述の如く、文章のカテゴリは契約書の種別、条項等である。制御部１１は、同じく図７で例示した入力画面を介して、生成する文章に含める複数の語句を指定する指定入力を端末２から受け付ける（ステップＳ３２）。例えば制御部１１は、キーワードＤＢ１４２を参照して、ステップＳ３１で指定されたカテゴリの文章のキーワードを複数提示し、ユーザによる選択入力を受け付ける。また、制御部１１はユーザから任意の語句のテキスト入力を受け付けてもよい。また、任意の語句のテキスト入力を受け付けた場合、制御部１１は置換辞書１４３を用いて、入力された語句を置換してもよい。 FIG. 9 is a flow chart showing an example of a processing procedure of sentence generation processing. Based on FIG. 9, the processing contents of the text generation processing will be described.
The control unit 11 of the server 1 receives, from the terminal 2, a designation input that designates the category of the text to be generated via the input screen illustrated in FIG. 7 (step S31). As described above, the text category is the contract type, clause, and the like. The control unit 11 also receives, from the terminal 2, a designation input for designating a plurality of words to be included in the sentence to be generated via the input screen illustrated in FIG. 7 (step S32). For example, the control unit 11 refers to the keyword DB 142, presents a plurality of keywords of sentences of the category specified in step S31, and receives selection input by the user. Further, the control unit 11 may accept text input of arbitrary phrases from the user. Further, when receiving a text input of an arbitrary word/phrase, the control unit 11 may use the replacement dictionary 143 to replace the input word/phrase.

制御部１１は、ステップＳ３１で指定されたカテゴリの言語モデル１４１を用いて、ステップＳ３２で指定された複数の語句のうち、いずれかの語句から候補文を複数生成する（ステップＳ３３）。例えば制御部１１は、指定語句のいずれかをＲＮＮである言語モデル１４１に入力し、指定語句に続く語句を出力として得る。さらに制御部１１は、指定語句と、出力された語句とに基づいて次に続く語句を出力として得る。制御部１１は当該処理を繰り返し、複数の文章を生成する。制御部１１は、文章を生成する際に算出した各語句の生起確率に基づいて各パターンの文章全体のスコア（例えば生起確率の平均値）を算出する。制御部１１は、算出したスコアに応じて候補文を決定する。例えば制御部１１は、スコアが閾値以上の文章を候補文に決定する。 Using the language model 141 of the category specified in step S31, the control unit 11 generates a plurality of candidate sentences from any of the words specified in step S32 (step S33). For example, the control unit 11 inputs one of the specified words into the language model 141, which is the RNN, and obtains the words following the specified words as output. Furthermore, the control unit 11 obtains the following phrase as an output based on the specified phrase and the outputted phrase. The control unit 11 repeats the processing to generate a plurality of sentences. The control unit 11 calculates a score (for example, an average value of occurrence probabilities) of the entire sentence of each pattern based on the occurrence probability of each word/phrase calculated when generating the sentence. The control unit 11 determines candidate sentences according to the calculated scores. For example, the control unit 11 determines sentences with scores equal to or higher than a threshold as candidate sentences.

制御部１１は、生成した複数の候補文のうち、ステップＳ３３で候補文を生成する際に用いた語句以外の他の指定語句を含む候補文を、最終的に出力する文章として選択する（ステップＳ３４）。制御部１１は、選択した文章を端末２に出力し（ステップＳ３５）、一連の処理を終了する。 The control unit 11 selects, from among the plurality of generated candidate sentences, candidate sentences containing designated words other than the words used when generating the candidate sentences in step S33 as sentences to be finally output (step S34). The control unit 11 outputs the selected text to the terminal 2 (step S35), and ends the series of processes.

なお、上記ではユーザが指定した複数の語句を用いて文章を生成したが、単一の指定語句のみから文章を生成してもよい。 In the above description, a sentence is generated using a plurality of words specified by the user, but a sentence may be generated only from a single specified word.

また、上記では生成する文章の一例として契約書を挙げたが、生成する文章は契約書に限定されず、他の文書に係るものであってもよい。 In addition, although the contract is given as an example of the text to be generated above, the text to be generated is not limited to the contract, and may relate to other documents.

また、上記では、サーバ１は学習用契約書の文章に出現する各語句を先頭から末尾に亘り文章通りの正順序で学習して言語モデル１４１を生成したが、本実施の形態はこれに限定されるものではない。例えばサーバ１は、後述する実施の形態５のように文章の先頭及び末尾を入れ替え、文章の末尾から先頭に亘る逆順序で各語句を学習した言語モデル１４１を生成し、逆順序の言語モデル１４１から文章を生成するようにしてもよい。また、例えばサーバ１は、正順序及び逆順序の双方向で学習を行ったｂｉｄｉｒｅｃｔｉｏｎａｌＲＮＮを言語モデル１４１として構築し、当該言語モデル１４１を用いて文章を生成してもよい。このように、サーバ１は文章内の語順を学習した言語モデル１４１から文章を生成可能であればよく、その語順は正順序であるか、又は逆順序であるかを問わず、また、正順序及び逆順序の双方向を学習して言語モデル１４１を構築してもよい。 In the above description, the server 1 generates the language model 141 by learning the words appearing in the text of the learning contracts in forward order, from the beginning of each sentence to its end, but the present embodiment is not limited to this. For example, as in Embodiment 5 described later, the server 1 may swap the beginning and end of each sentence, generate a language model 141 that has learned the words in reverse order from the end of the sentence to its beginning, and generate sentences from the reverse-order language model 141. Also, for example, the server 1 may construct, as the language model 141, a bidirectional RNN trained in both the forward and reverse directions, and generate sentences using that language model 141. In short, it suffices that the server 1 can generate sentences from a language model 141 that has learned the word order within sentences; the word order may be forward or reverse, and the language model 141 may also be constructed by learning both directions.

以上より、本実施の形態１によれば、学習用契約書の文章内に出現する各語句を、その文章の順に学習した言語モデル１４１を用いることで、ユーザが指定した語句を含む文章を適切に生成することができる。 As described above, according to Embodiment 1, a sentence containing the words specified by the user can be generated appropriately by using the language model 141, which has learned the words appearing in the text of the learning contracts in the order in which they appear.

また、本実施の形態１によれば、ユーザが指定した複数の語句から文章を生成することで、より適切な文章を生成することができる。 Moreover, according to the first embodiment, by generating a sentence from a plurality of words specified by the user, it is possible to generate a more appropriate sentence.

また、本実施の形態１によれば、言語モデル１４１から生成した文章のスコアを各語句の生起確率から算出することで、当該文章の妥当性を評価し、適切な文章をユーザに提示することができる。 Further, according to the first embodiment, by calculating the score of a sentence generated from the language model 141 from the probability of occurrence of each word, the validity of the sentence is evaluated, and an appropriate sentence is presented to the user. can be done.

また、本実施の形態１によれば、文章のカテゴリ毎に言語モデル１４１を用意し、ユーザが指定したカテゴリに対応する言語モデル１４１から文章を生成することで、より適切な文章をユーザに提示することができる。 Further, according to Embodiment 1, a language model 141 is prepared for each category of text, and a sentence is generated from the language model 141 corresponding to the category specified by the user, so that a more appropriate sentence can be presented to the user.

また、本実施の形態１によれば、ユーザが任意に入力した語句から文章を生成する際に、置換辞書を用いて入力された語句を置換することで、言語モデル１４１を用いても文章を生成困難となるような事態を防止することができる。 Further, according to Embodiment 1, when a sentence is generated from a word freely entered by the user, replacing the entered word using the replacement dictionary prevents a situation in which generating a sentence is difficult even with the language model 141.

（実施の形態２）
本実施の形態では、言語モデル１４１に基づいて文章の先頭に頻出するフレーズを生成し、当該フレーズから、ユーザが指定した語句を含む文章を生成する形態について述べる。なお、実施の形態１と重複する内容については同一の符号を付して説明を省略する。
図１０は、実施の形態２の概要を示す説明図である。図１０に基づき、本実施の形態でサーバ１が実行する文章生成処理について説明する。 (Embodiment 2)
In this embodiment, a form is described in which a phrase that frequently appears at the beginning of sentences is generated based on the language model 141, and a sentence containing the words specified by the user is generated from that phrase. Content that overlaps with Embodiment 1 is given the same reference numerals, and its description is omitted.
FIG. 10 is an explanatory diagram showing an overview of Embodiment 2. The sentence generation process executed by the server 1 in this embodiment is described with reference to FIG. 10.

本実施の形態に係るサーバ１はまず、言語モデル１４１を用いて、契約書の文章先頭に出現するであろうと推定される複数の語句、すなわち頻出の先頭フレーズを生成する。例えばサーバ１は、言語モデル１４１の生成時に用いた学習用契約書の文章、すなわち学習元文章において、文章先頭に存在する語句から言語モデル１４１を用いて先頭フレーズを生成する。図１０の例では、「甲」が学習元文章の先頭に存在する語句に該当する。サーバ１は言語モデル１４１を参照しながら、学習元文章の先頭語句に続く語句として、一定値以上の生起確率を有する語句を選択する。さらにサーバ１は、選択した語句に続く語句として、一定値以上の生起確率を有する語句を選択する。サーバ１は当該処理を繰り返し、一定値以上の生起確率を有する語句を、先頭語句に続く語句として順々に選択し、文章を生成していく。 The server 1 according to the present embodiment first uses the language model 141 to generate a plurality of words and phrases that are presumed to appear at the beginning of the text of the contract, ie, frequently appearing first phrases. For example, the server 1 uses the language model 141 to generate the first phrase from the words and phrases present at the beginning of the text of the learning contract used when generating the language model 141, that is, the learning source text. In the example of FIG. 10, "ko" corresponds to the word at the beginning of the learning source sentence. While referring to the language model 141, the server 1 selects words and phrases having occurrence probabilities equal to or greater than a certain value as words and phrases following the first word and phrase of the learning source sentence. Furthermore, the server 1 selects words and phrases having occurrence probabilities equal to or greater than a certain value as words and phrases following the selected words and phrases. The server 1 repeats this process, sequentially selects words and phrases having occurrence probabilities equal to or greater than a certain value as words and phrases following the head word, and generates sentences.

サーバ１は、上記のように文章先頭から順に生起確率が一定値以上の語句を選択していき、複数の語句を先頭フレーズとして生成する。例えばサーバ１は、ユーザが指定した語句のいずれか（図１０では「発明」又は「共有」）が出現するまで文章生成を行い、指定語句が出現した時点で、指定語句より前に位置する一又は複数の語句を先頭フレーズとして特定すればよい。あるいはサーバ１は、文章の先頭から所定語数の語句を生成し、当該所定語数の語句から、意味、文法等のまとまりを持ったフレーズ部分を特定してもよい。文章先頭からどこまでの語句を先頭フレーズとして特定するか、その手法は特に問わない。 As described above, the server 1 selects words whose occurrence probability is at least a certain value in order from the beginning of the sentence, and generates the resulting plurality of words as the first phrase. For example, the server 1 may keep generating until one of the words specified by the user ("invention" or "shared" in FIG. 10) appears, and at that point identify the one or more words located before the specified word as the first phrase. Alternatively, the server 1 may generate a predetermined number of words from the beginning of the sentence and identify, within them, a phrase portion that forms a coherent unit of meaning, grammar, or the like. The method used to decide how far from the beginning of the sentence the first phrase extends is not particularly limited.
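
The first variant above (generate from the sentence-initial word until a specified word appears) can be sketched as follows. This is a simplified illustration: it follows only the single most probable continuation at each step rather than every continuation above the fixed value, and the toy table, the function name `first_phrase`, and the default `min_prob` are assumptions.

```python
def first_phrase(model, first_word, specified, min_prob=0.5, max_len=20):
    """Extend `first_word` greedily until the next word is a specified word.

    Returns the words that precede the specified word, or None if generation
    stalls, ends, or drops below `min_prob` before a specified word appears.
    """
    phrase = [first_word]
    while len(phrase) < max_len:
        options = model.get(phrase[-1], {})
        if not options:
            return None
        nxt = max(options, key=options.get)  # most probable continuation
        if options[nxt] < min_prob or nxt == "<eos>":
            return None
        if nxt in specified:
            return phrase
        phrase.append(nxt)
    return None


toy_model = {
    "PartyA": {"and": 0.9, "shall": 0.1},
    "and": {"PartyB": 1.0},
    "PartyB": {"agree": 0.8, "shall": 0.2},
    "agree": {"invention": 0.6, "that": 0.4},
}
phrase = first_phrase(toy_model, "PartyA", {"invention", "shared"})
print(phrase)  # ['PartyA', 'and', 'PartyB', 'agree']
```

The returned phrase can then be followed by one of the specified words and fed back into the model to generate the remainder of the sentence.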

サーバ１は、生成した先頭フレーズから、ユーザが指定した語句を含む文章を生成する。具体的には、サーバ１は、ユーザが指定した複数の語句のうち、いずれかの語句を先頭フレーズの次の語句として配置する。そしてサーバ１は、先頭フレーズ及び指定語句をＲＮＮ（言語モデル１４１）に入力し、指定語句に続く語句を推定して複数の候補文を生成する。 The server 1 generates a sentence including the phrase specified by the user from the generated first phrase. Specifically, the server 1 arranges one of the plural words specified by the user as the word following the first phrase. Then, the server 1 inputs the initial phrase and the designated word/phrase to the RNN (language model 141), estimates the word/phrase following the designated word/phrase, and generates a plurality of candidate sentences.

後の処理は実施の形態１と同様であり、サーバ１は文章生成に用いていない他の指定語句を含む候補文を最終的に出力する文章として選択し、端末２に出力する。 The subsequent processing is the same as in the first embodiment, and the server 1 selects candidate sentences containing other specified words and phrases not used for sentence generation as sentences to be finally output, and outputs them to the terminal 2 .

図１１は、実施の形態２に係る文章生成処理の処理手順の一例を示すフローチャートである。
生成する文章のカテゴリの指定入力を受け付け（ステップＳ３１）、当該文章に含める複数の語句の指定入力を受け付けた後（ステップＳ３２）、サーバ１の制御部１１は、以下の処理を実行する。制御部１１は、ステップＳ３１で指定されたカテゴリの言語モデル１４１に基づき、当該カテゴリの文章の先頭に出現するフレーズ（複数の語句）を生成する（ステップＳ２０１）。制御部１１は、生成したフレーズと、ステップＳ３１で指定された複数の語句のうちいずれかの語句とに基づいて、複数の候補文を生成する（ステップＳ２０２）。具体的には、制御部１１は先頭フレーズの次に指定語句のいずれかを配置してＲＮＮに入力し、後続の語句を推定して文章を生成する。制御部１１は、処理をステップＳ３４に移行する。 FIG. 11 is a flowchart illustrating an example of a processing procedure of sentence generation processing according to the second embodiment.
After accepting input specifying the category of the sentence to be generated (step S31) and accepting input specifying multiple words to be included in the sentence (step S32), the control unit 11 of the server 1 executes the following processes. Based on the language model 141 of the category specified in step S31, the control unit 11 generates a phrase (a plurality of words) appearing at the beginning of sentences in the category (step S201). Control unit 11 generates a plurality of candidate sentences based on the generated phrase and one of the plurality of words specified in step S31 (step S202). Specifically, the control unit 11 arranges one of the designated phrases next to the first phrase, inputs it to the RNN, estimates the subsequent phrases, and generates a sentence. The control unit 11 shifts the process to step S34.

以上より、本実施の形態２によれば、言語モデル１４１に基づいて先頭フレーズを生成し、当該フレーズの次に指定語句を配置して後続の語句を生成していくことで、より好適に文章を生成することができる。 As described above, according to Embodiment 2, a sentence can be generated more suitably by generating the first phrase based on the language model 141, placing a specified word after that phrase, and then generating the subsequent words.

（実施の形態３）
本実施の形態では、ユーザが指定した順序で指定語句が出現する文章を生成する形態について説明する。
図１２は、実施の形態３の概要を示す説明図である。図１２に基づき、本実施の形態に係るサーバ１が実行する文章生成処理について説明する。 (Embodiment 3)
In this embodiment, a form in which a sentence is generated in which specified words appear in the order specified by the user will be described.
FIG. 12 is an explanatory diagram showing an overview of the third embodiment. The text generation process executed by the server 1 according to the present embodiment will be described with reference to FIG. 12 .

本実施の形態でサーバ１は、図７で例示した画面と同様の入力画面において、複数の語句の指定入力を受け付けるだけでなく、当該複数の語句が文章内で出現する順序の指定入力を受け付ける。そしてサーバ１は、指定された順序で各語句が出現する文章を生成する。 In this embodiment, on an input screen similar to the one illustrated in FIG. 7, the server 1 accepts not only a designation of a plurality of words but also a designation of the order in which those words appear in the sentence. The server 1 then generates a sentence in which the words appear in the designated order.

例えば図１２に示すように、「発明」が１番目の語句、「帰属」が２番目の語句、「協議」が３番目の語句として指定された場合を考える。この場合、サーバ１は１番目の語句として指定された「発明」をＲＮＮ（言語モデル１４１）に入力し、後続の語句を推定して複数の候補文を生成する。 For example, as shown in FIG. 12, consider a case where "invention" is specified as the first word, "attribution" as the second word, and "consultation" as the third word. In this case, the server 1 inputs "invention" specified as the first word/phrase to the RNN (language model 141), estimates subsequent words/phrases, and generates a plurality of candidate sentences.

サーバ１は、生成した複数の候補文のうち、文章生成に用いていない他の指定語句がユーザにより指定された順序で出現する文章を選択する。図１２の例では、「発明」以外の指定語句「帰属」及び「協議」を含み、かつ、２番目に指定された語句「帰属」が３番目に指定された語句「協議」よりも先に出現する文章を選択する。サーバ１は、選択した文章を端末２に出力する。 From among the generated candidate sentences, the server 1 selects a sentence in which the other specified words, which were not used for sentence generation, appear in the order specified by the user. In the example of FIG. 12, it selects a sentence that contains the specified words other than "invention", namely "attribution" and "consultation", and in which the second specified word "attribution" appears before the third specified word "consultation". The server 1 outputs the selected sentence to the terminal 2.
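
The order check used for this selection amounts to testing whether the specified words form a subsequence of the candidate sentence. A minimal sketch, with the function name and the sample sentences as assumptions:

```python
def appears_in_order(words, ordered_terms):
    """True if every term occurs in `words` in the given relative order."""
    remaining = iter(words)
    # `term in remaining` advances the iterator past the match, so each
    # later term must be found after the previous one.
    return all(term in remaining for term in ordered_terms)


order = ["invention", "attribution", "consultation"]
ok = ["invention", "attribution", "decided", "by", "consultation"]
bad = ["invention", "consultation", "decides", "attribution"]
print(appears_in_order(ok, order), appears_in_order(bad, order))  # True False
```

A candidate that contains all the specified words but in a different order is rejected, which is the extra condition this embodiment adds to the selection step of Embodiment 1.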

図１３は、実施の形態３に係る文章生成処理の処理手順の一例を示すフローチャートである。
生成する文章のカテゴリの指定入力を受け付けた後（ステップＳ３１）、サーバ１の制御部１１は以下の処理を実行する。制御部１１は、生成する文章に含める複数の語句と、各語句が文章内で出現する順序とを指定する指定入力を受け付ける（ステップＳ３０１）。制御部１１は、指定された複数の語句のうち、いずれかの語句から候補文を複数生成する（ステップＳ３０２）。そして制御部１１は、複数の候補文から、ステップＳ３０１で指定された順序で各指定語句が出現する候補文を、最終的に出力する文章として選択する（ステップＳ３０３）。制御部１１は処理をステップＳ３５に移行する。 FIG. 13 is a flowchart illustrating an example of a processing procedure of sentence generation processing according to the third embodiment.
After receiving the designation input of the category of the text to be generated (step S31), the control section 11 of the server 1 executes the following processing. The control unit 11 receives a designation input that designates a plurality of words and phrases to be included in the sentence to be generated and the order in which the words and phrases appear in the sentence (step S301). The control unit 11 generates a plurality of candidate sentences from one of the specified words (step S302). Then, the control unit 11 selects a candidate sentence in which each specified word/phrase appears in the order specified in step S301 from a plurality of candidate sentences as a sentence to be finally output (step S303). The control unit 11 shifts the process to step S35.

以上より、本実施の形態３によれば、指定された順序で指定語句が出現する文章を生成することで、より正確な文章の生成を行うことができる。 As described above, according to the third embodiment, a more accurate sentence can be generated by generating a sentence in which specified words appear in a specified order.

（実施の形態４）
本実施の形態では、ユーザが指定した複数の語句を全て含む候補文が生成されない場合、候補文に含まれる語句を指定語句に置き換えて、指定語句を全て含む文章を生成する形態について説明する。
図１４は、実施の形態４の概要を示す説明図である。図１４に基づき、本実施の形態に係るサーバ１が実行する文章生成処理について説明する。 (Embodiment 4)
In this embodiment, when a candidate sentence containing all of a plurality of words specified by the user is not generated, the words included in the candidate sentence are replaced with the specified words to generate a sentence including all the specified words.
FIG. 14 is an explanatory diagram showing an overview of the fourth embodiment. The text generation process executed by the server 1 according to this embodiment will be described with reference to FIG. 14 .

本実施の形態に係るサーバ１は、実施の形態１と同様に、ユーザが指定した複数の語句のうち、いずれかの語句に基づいて複数の候補文を生成する。この場合に、ユーザが指定した語句によっては、候補文生成に用いていない他の指定語句を含む文章が候補文として生成されない可能性もある。例えば図１４の上側に模式的に示すように、「発明」及び「共有」が指定された場合に、語句「発明」から複数の候補文が生成されたものの、候補文のいずれにも「共有」が含まれないような可能性もある。 As in Embodiment 1, the server 1 according to this embodiment generates a plurality of candidate sentences based on one of the words specified by the user. In this case, depending on the words the user specified, it is possible that no sentence containing the other specified words, which were not used for candidate generation, is generated as a candidate sentence. For example, as shown schematically in the upper part of FIG. 14, when "invention" and "shared" are specified, a plurality of candidate sentences may be generated from the word "invention" without any of them containing "shared".

この場合にサーバ１は、候補文に含まれる語句を指定語句に置換しながら、文章のスコアに応じて端末２に出力する文章を選択する。例えばサーバ１はまず、上記のようにして生成された複数の候補文のうち、スコアが最も高い候補文を選択する。そしてサーバ１は、選択した候補文に含まれる各語句を残りの指定語句に置換して、複数の候補文を新たに生成する。図１４の例では、スコアが最も高い点線矩形枠で囲んだ文章について、「発明」に続く２番目の語句、３番目の語句、４番目の語句…をそれぞれ残りの指定語句「共有」に置換した候補文を生成する。 In this case, the server 1 selects sentences to be output to the terminal 2 according to the score of the sentences while replacing the words included in the candidate sentences with the specified words. For example, the server 1 first selects the candidate sentence with the highest score from among the plurality of candidate sentences generated as described above. Then, the server 1 replaces each word/phrase included in the selected candidate sentence with the remaining designated word/phrase to generate a plurality of new candidate sentences. In the example of FIG. 14, regarding the sentences surrounded by the dotted rectangular frame with the highest score, the second, third, fourth, and so on following "invention" are replaced with the remaining designated words "shared". generate candidate sentences.

なお、上記では指定語句「発明」以外の全ての語句を総当たりで指定語句「共有」に置換しているが、本実施の形態はこれに限定されるものではない。例えばサーバ１は、指定語句「発明」を除く各語句のうち、残りの指定語句「共有」と同一品詞の語句に絞って指定語句「共有」に置換し、候補文を生成してもよい。すなわちサーバ１は、候補文生成に用いていない他の指定語句（要素）と同一品詞の語句を、当該他の指定語句に変換して候補文を生成する。同一品詞の間で変換を行うことで、より適切な候補文を作成することができると共に、文章生成に伴う処理負荷を軽減することができる。 In the above description, every word other than the specified word "invention" is replaced in turn with the specified word "shared", but the present embodiment is not limited to this. For example, the server 1 may restrict the replacement to those words, other than the specified word "invention", that have the same part of speech as the remaining specified word "shared", and generate candidate sentences by replacing only those. That is, the server 1 generates candidate sentences by converting words having the same part of speech as another specified word (element) not used for candidate generation into that specified word. Converting between words of the same part of speech yields more appropriate candidate sentences and reduces the processing load of sentence generation.
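
Both the exhaustive replacement and the part-of-speech restriction can be sketched with one function. The function name, the hand-written part-of-speech tags, and the sample sentence are assumptions; a real system would obtain the tags from a morphological analyzer and would then rescore each new candidate with the language model.

```python
def substitution_candidates(words, term, keep_index=0, pos_tags=None, term_pos=None):
    """Build one new candidate per position by replacing that word with `term`.

    The word at `keep_index` (the specified word used for generation) is never
    replaced. If `pos_tags` is given, only positions whose tag equals
    `term_pos` are replaced.
    """
    results = []
    for i in range(len(words)):
        if i == keep_index:
            continue
        if pos_tags is not None and pos_tags[i] != term_pos:
            continue
        results.append(words[:i] + [term] + words[i + 1:])
    return results


sentence = ["invention", "is", "owned", "jointly"]
tags = ["noun", "verb", "verb", "adverb"]  # hand-assigned for the example

exhaustive = substitution_candidates(sentence, "shared")
same_pos = substitution_candidates(sentence, "shared", pos_tags=tags, term_pos="verb")
```

Here the exhaustive variant produces three candidates and the restricted variant two, which illustrates how the part-of-speech filter reduces the number of sentences that must be rescored.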

サーバ１は、生成した各候補文の語句をＲＮＮ（言語モデル１４１）に入力して生起確率を算出し、各候補文のスコアを算出する。そしてサーバ１は、スコアが高い候補文を選択する。例えばサーバ１は、スコアが閾値以上の候補文を選択する。このようにしてサーバ１は、複数の指定語句を全て含み、かつ、スコアが閾値以上の候補文を生成する。サーバ１は、生成した候補文を端末２に出力し、ユーザに提示する。 The server 1 inputs the generated word/phrase of each candidate sentence to the RNN (language model 141), calculates the probability of occurrence, and calculates the score of each candidate sentence. Then, the server 1 selects a candidate sentence with a high score. For example, the server 1 selects candidate sentences whose scores are equal to or greater than a threshold. In this way, the server 1 generates candidate sentences that include all of the specified words and phrases and have scores equal to or higher than the threshold. The server 1 outputs the generated candidate sentences to the terminal 2 and presents them to the user.

図１５は、実施の形態４に係る文章生成処理の処理手順の一例を示すフローチャートである。
ステップＳ３２で指定された複数の語句のうち、いずれかの語句を用いて複数の候補文を生成した後（ステップＳ３３）、サーバ１の制御部１１は以下の処理を実行する。制御部１１は、ステップＳ３２で指定された複数の語句のうち、ステップＳ３３で文章を生成する際に用いた語句以外の他の指定語句を含む候補文があるか否かを判定する（ステップＳ４０１）。指定語句を含む候補文があると判定した場合（Ｓ４０１：ＹＥＳ）、制御部１１は処理をステップＳ３４に移行する。 FIG. 15 is a flowchart illustrating an example of a processing procedure of sentence generation processing according to the fourth embodiment.
After generating a plurality of candidate sentences using any one of the plurality of phrases specified in step S32 (step S33), the control section 11 of the server 1 executes the following processing. The control unit 11 determines whether or not there is a candidate sentence including specified words other than the words used when generating the sentence in step S33 among the plurality of words specified in step S32 (step S401). ). If it is determined that there is a candidate sentence containing the specified word (S401: YES), the control section 11 shifts the process to step S34.

指定語句を含む候補文がないと判定した場合（Ｓ４０１：ＮＯ）、制御部１１は、ステップＳ３３で生成した候補文に含まれる各語句を指定語句に置換しながら、新たな候補文を生成する（ステップＳ４０２）。例えば制御部１１は、ステップＳ３３で生成した複数の候補文のうち最もスコアが高い候補文を選択し、当該候補文に含まれる各語句を、文章生成に用いていない指定語句に置換して複数の候補文を新たに生成する。この場合に制御部１１は、例えば候補文に含まれる各語句のうち、文章生成に用いていない指定語句（他の要素）と同一品詞の語句を指定語句に置換することで新たな候補文を生成する。 If it is determined that there is no candidate sentence containing the specified word (S401: NO), the control unit 11 generates new candidate sentences while replacing words in the candidate sentences generated in step S33 with the specified word (step S402). For example, the control unit 11 selects the candidate sentence with the highest score from among those generated in step S33, and newly generates a plurality of candidate sentences by replacing each word in that candidate sentence with a specified word not used for sentence generation. In this case, the control unit 11 generates the new candidate sentences by, for example, replacing only those words in the candidate sentence that have the same part of speech as the specified word (other element) not used for sentence generation.

The control unit 11 uses the language model 141 to calculate a score for each of the newly generated candidate sentences and selects a high-scoring candidate sentence as the sentence to be finally output (step S403). For example, the control unit 11 selects candidate sentences whose scores are equal to or greater than a threshold. The control unit 11 then shifts the process to step S35.
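Steps S402 and S403 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the bigram table `BIGRAM_PROB` stands in for the language model 141 (the embodiments describe an RNN), the `POS` lookup stands in for a part-of-speech analyzer, and the score is taken to be the mean log occurrence probability, which the text does not specify.

```python
import math

# Hypothetical bigram probabilities standing in for language model 141.
BIGRAM_PROB = {
    ("<s>", "the"): 0.9,
    ("the", "buyer"): 0.4,
    ("the", "seller"): 0.4,
    ("buyer", "shall"): 0.8,
    ("shall", "notify"): 0.5,
    ("shall", "pay"): 0.3,
    ("notify", "the"): 0.7,
    ("pay", "the"): 0.6,
}
# Hypothetical part-of-speech lookup (a real system would use an analyzer).
POS = {"the": "DET", "buyer": "NOUN", "seller": "NOUN",
       "shall": "AUX", "notify": "VERB", "pay": "VERB"}
FLOOR = 1e-6  # probability assigned to unseen pairs


def score(tokens):
    """Mean log occurrence probability of each token given the previous one."""
    prev, total = "<s>", 0.0
    for tok in tokens:
        total += math.log(BIGRAM_PROB.get((prev, tok), FLOOR))
        prev = tok
    return total / len(tokens)


def replace_same_pos(candidate, unused_phrase):
    """S402 analogue: one new candidate per position with a matching part of speech."""
    return [
        candidate[:i] + [unused_phrase] + candidate[i + 1:]
        for i, tok in enumerate(candidate)
        if POS.get(tok) == POS.get(unused_phrase) and tok != unused_phrase
    ]


def select(candidates, threshold):
    """S403 analogue: keep candidates scoring at or above the threshold, best first."""
    scored = sorted(((score(c), c) for c in candidates), reverse=True)
    return [c for s, c in scored if s >= threshold]


best = ["the", "buyer", "shall", "notify", "the", "seller"]
new_candidates = replace_same_pos(best, "pay")  # "pay" was specified but unused
chosen = select(new_candidates, threshold=-1.0)
print(chosen)
```

Restricting replacement to same-part-of-speech positions keeps the number of new candidates small, so the scoring step only has to rank sentences that are at least grammatically plausible.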

As described above, according to the fourth embodiment, even when no sentence containing all of the specified phrases is generated, a sentence containing the specified phrases can be appropriately generated by replacing phrases within the sentence.

Further, according to the fourth embodiment, performing the above replacement between phrases of the same part of speech makes it possible to generate a sentence containing the specified phrases more appropriately.

(Embodiment 5)
This embodiment describes a form in which not only a forward-order language model 141 (first language model) but also a reverse-order language model 141 (second language model) is generated from the learning contracts, and sentences are generated by combining the two language models 141.
FIG. 16 is an explanatory diagram of the sentence learning processing according to the fifth embodiment. FIG. 16 conceptually illustrates how the order of the phrases appearing in a learning contract is learned in reverse, starting from the end of the sentence.

As in the first embodiment, the server 1 divides the text of the learning contract into phrases of a predetermined unit and learns the phrases in forward order from the beginning to the end of the sentence to generate the language model 141. In this embodiment, the server 1 additionally rearranges the phrases of the sentence into reverse order and learns them starting from the end of the sentence to generate a reverse-order language model 141.

For example, after completing the forward-order learning, the server 1 reverses each learning sentence from beginning to end so that its last phrase comes first. That is, as shown in FIG. 16, the server 1 rearranges the phrases 「甲」 (Party A), 「及び」 (and), 「乙」 (Party B), 「が」 (subject particle), ..., 「通知」 (notification), 「する」 (do) into the order 「する」, 「通知」, 「に」 (to), 「相手方」 (the other party), ..., 「及び」, 「甲」. The server 1 inputs the rearranged phrases to the input layer of an RNN for reverse-order learning that has the same configuration as the RNN shown in FIG. 3. The server 1 thereby constructs an RNN that, based on one or more phrases appearing in reverse order from the end of the sentence, estimates the phrase that appears immediately before them. In other words, the server 1 constructs an RNN that estimates the phrase at any position in a sentence from the phrases that follow it.
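The reordering step can be sketched as follows. This is only an illustration under stated assumptions: simple next-phrase counts replace the RNN described in the embodiment, the training sentence is a shortened, hypothetical one assembled from the phrases named in the text, and the function names are invented for the example.

```python
from collections import Counter, defaultdict


def train_next_token_counts(sentences):
    """Count which phrase follows which, in the order the phrases are given."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict(counts, token):
    """Most frequent continuation of `token`, or None if it was never continued."""
    following = counts.get(token)
    return following.most_common(1)[0][0] if following else None


# Hypothetical, already-segmented training sentence (segmentation is not shown).
corpus = [["甲", "及び", "乙", "が", "相手方", "に", "通知", "する"]]

# Forward-order model: learn the phrases from the beginning of the sentence.
forward_counts = train_next_token_counts(corpus)
# Reverse-order model: reverse each sentence first, then learn in the same way.
reverse_counts = train_next_token_counts([s[::-1] for s in corpus])

print(predict(forward_counts, "通知"))  # phrase that follows "通知"
print(predict(reverse_counts, "通知"))  # phrase that precedes "通知" in the original
```

Because the same training routine is applied to reversed input, the reverse-order model needs no special architecture; only the data preparation differs.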

FIG. 17 is an explanatory diagram of the sentence generation processing according to the fifth embodiment. When generating a sentence from phrases specified by the user, the server 1 combines the occurrence probabilities of each phrase calculated from both the forward-order and reverse-order language models 141.

For example, the server 1 first generates candidate sentences using the forward-order language model 141. Specifically, as in the first embodiment, the server 1 generates a plurality of candidate sentences starting from one of the specified phrases. The server 1 then selects a candidate sentence that contains the remaining specified phrases. Consider the case where the candidate sentence illustrated in FIG. 17 is selected.

The server 1 reverses the order of the phrases in the candidate sentence, inputs them to the reverse-order language model 141, and calculates the occurrence probability of each phrase. Based on the occurrence probabilities calculated from the reverse-order language model 141, the server 1 then determines whether the candidate sentence is an appropriate sentence. For example, the server 1 compares the occurrence probability of each phrase with a predetermined threshold and determines whether any phrase is at or below the threshold. In this way, the server 1 uses the reverse-order language model 141 to check for inappropriate phrases.

If it determines that there is a phrase whose occurrence probability is at or below the threshold, the server 1, for example, uses the reverse-order language model 141 to replace that phrase with another phrase. That is, the server 1 proofreads the phrase. For example, the server 1 estimates the phrase that should appear at the position of the flagged phrase from the one or more phrases that appear immediately after it (in FIG. 17, the phrases 「相手方」 (the other party), 「と」 (with), 「の」 (of), ... that appear immediately after 「に」 (to)), and replaces the flagged phrase with the estimate. The server 1 thereby corrects the candidate sentence into a more appropriate sentence and outputs the proofread sentence to the terminal 2.
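The check and the proofreading step can be illustrated with a small sketch. It assumes a hypothetical table `REVERSE_PROB` in place of the reverse-order language model 141 and conditions each phrase only on the single phrase that follows it, whereas the embodiment's RNN may use several following phrases. The English tokens and probabilities are invented for the example.

```python
# Hypothetical reverse-order table: (following phrase, phrase) -> occurrence probability.
REVERSE_PROB = {
    ("</s>", "notify"): 0.9,
    ("notify", "shall"): 0.8,
    ("shall", "party"): 0.7,
    ("party", "the"): 0.9,
    ("party", "a"): 0.05,
}
FLOOR = 1e-6  # probability assigned to unseen pairs


def occurrence_probs(tokens):
    """Occurrence probability of each phrase given the phrase that follows it."""
    following = tokens[1:] + ["</s>"]
    return [REVERSE_PROB.get((nxt, tok), FLOOR) for tok, nxt in zip(tokens, following)]


def proofread(tokens, threshold):
    """Replace every phrase at or below the threshold with the most probable alternative."""
    result = list(tokens)
    # Walk from the end so each phrase is judged against already-checked followers.
    for i in range(len(result) - 1, -1, -1):
        nxt = result[i + 1] if i + 1 < len(result) else "</s>"
        if REVERSE_PROB.get((nxt, result[i]), FLOOR) <= threshold:
            options = {tok: p for (f, tok), p in REVERSE_PROB.items() if f == nxt}
            if options:
                result[i] = max(options, key=options.get)
    return result


candidate = ["a", "party", "shall", "notify"]
print(occurrence_probs(candidate))
print(proofread(candidate, threshold=0.1))
```

Walking backwards mirrors the direction in which the reverse-order model was trained, so a replacement near the end of the sentence is already in place when earlier phrases are evaluated.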

FIG. 18 is a flowchart illustrating an example of the processing procedure of the sentence learning processing according to the fifth embodiment.
After dividing the text of the contract into a plurality of phrases (step S13), the control unit 11 of the server 1 executes the following processing. The control unit 11 performs machine learning processing that learns the phrases of the divided sentence in the order in which they appear in the sentence, and generates the forward-order language model 141 (step S501). Next, the control unit 11 reverses the sentence from beginning to end, converting the order of the phrases into reverse order (step S502). The control unit 11 then generates a language model 141 that has learned the phrases in reverse order (step S503). The control unit 11 stores the forward-order language model 141 generated in step S501 and the reverse-order language model 141 generated in step S503 in the auxiliary storage unit 14, and shifts the process to step S15.

FIG. 19 is a flowchart illustrating an example of the processing procedure of the sentence generation processing according to the fifth embodiment.
After receiving the input specifying a plurality of phrases to be included in the sentence to be generated (step S32), the control unit 11 of the server 1 executes the following processing. Based on the forward-order language model 141, the control unit 11 generates a plurality of candidate sentences from one of the phrases specified in step S32 (step S521). From the generated candidate sentences, the control unit 11 selects a candidate sentence that contains the specified phrases not used in step S521 (step S522).

The control unit 11 converts the order of the phrases contained in the candidate sentence selected in step S522 into reverse order (step S523). Based on the reverse-order language model 141, the control unit 11 then calculates the occurrence probability of each phrase contained in the candidate sentence (step S524). The control unit 11 determines the sentence to be finally output according to the occurrence probabilities calculated from the reverse-order language model 141 (step S525). For example, the control unit 11 compares the occurrence probability of each phrase in the candidate sentence, as calculated with the reverse-order language model 141, with a predetermined threshold and determines whether any phrase is at or below the threshold. If it determines that no phrase is at or below the threshold, the control unit 11 adopts the candidate sentence as the output sentence. If it determines that a phrase is at or below the threshold, the control unit 11 may, for example, use the reverse-order language model 141 to replace (proofread) that phrase with another phrase and adopt the candidate sentence after replacement as the sentence to be finally output. The control unit 11 then shifts the process to step S35.
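The flow of steps S521 to S525 can be sketched end to end as follows. The two tables are hypothetical stand-ins for the forward-order and reverse-order language models 141, the exhaustive expansion in `generate_candidates` is an assumed substitute for whatever search the RNN-based generation uses, and this sketch simply rejects a failing candidate rather than proofreading it.

```python
# Hypothetical forward-order model: phrase -> {next phrase: probability}.
FORWARD = {
    "buyer": {"shall": 1.0},
    "shall": {"notify": 0.6, "pay": 0.4},
    "notify": {"seller": 1.0},
    "pay": {"seller": 1.0},
}
# Hypothetical reverse-order model: (following phrase, phrase) -> probability.
REVERSE = {
    ("</s>", "seller"): 0.9,
    ("seller", "notify"): 0.5,
    ("seller", "pay"): 0.5,
    ("notify", "shall"): 0.9,
    ("pay", "shall"): 0.9,
    ("shall", "buyer"): 0.8,
}


def generate_candidates(start, max_len=4):
    """S521 analogue: expand every continuation the forward model offers."""
    done, frontier = [], [[start]]
    while frontier:
        sent = frontier.pop()
        nexts = FORWARD.get(sent[-1], {})
        if not nexts or len(sent) >= max_len:
            done.append(sent)
            continue
        frontier.extend(sent + [n] for n in nexts)
    return done


def passes_reverse_check(sentence, threshold):
    """S523-S525 analogue: reverse the sentence and test each phrase's probability."""
    rev = sentence[::-1]
    following = ["</s>"] + rev[:-1]
    return all(REVERSE.get((f, tok), 0.0) > threshold for f, tok in zip(following, rev))


def generate(specified, threshold=0.1):
    first, rest = specified[0], specified[1:]
    candidates = generate_candidates(first)
    # S522 analogue: keep candidates containing the remaining specified phrases.
    selected = [c for c in candidates if all(w in c for w in rest)]
    return [c for c in selected if passes_reverse_check(c, threshold)]


result = generate(["buyer", "pay"])
print(result)
```

Swapping the roles of `FORWARD` and `REVERSE` would give the alternative in which candidates are generated in reverse order and checked in forward order.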

Alternatively, candidate sentences may be generated from the reverse-order language model 141 and checked with the forward-order language model 141.

As described above, according to the fifth embodiment, generating sentences by combining the forward-order and reverse-order language models 141 makes it possible to present more appropriate sentences to the user.

Further, according to the fifth embodiment, generating candidate sentences from either the forward-order or the reverse-order language model 141 and checking them with the other language model 141 makes it possible to generate sentences more appropriately.

(Embodiment 6)
FIG. 20 is a functional block diagram showing the operation of the server 1 of the embodiments described above. When the control unit 11 executes the program P, the server 1 operates as follows.
The storage unit 201 stores a language model that has learned, for sentences composed of elements each being a character or character string of a predetermined unit, the element appearing next after each element, in the order of the sentence. The receiving unit 202 receives an input specifying the elements to be included in the sentence to be generated. The generation unit 203 refers to the language model and generates a sentence based on the specified elements. The output unit 204 outputs the generated sentence.
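The division of roles among the four functional blocks can be pictured with a small sketch. The class name `SentenceServer`, its method names, the dictionary used as a language model, and the greedy choice of the most probable next element are all assumptions made for illustration; the embodiments do not prescribe this structure.

```python
from dataclasses import dataclass, field


@dataclass
class SentenceServer:
    # Storage unit 201 analogue: hypothetical next-element table used as the language model.
    language_model: dict = field(default_factory=dict)

    def accept(self, raw_input: str) -> str:
        """Receiving unit 202 analogue: receive the element to include."""
        return raw_input.strip()

    def generate(self, element: str, max_len: int = 10) -> list:
        """Generation unit 203 analogue: repeatedly append the most probable next element."""
        sentence = [element]
        while len(sentence) < max_len:
            nexts = self.language_model.get(sentence[-1])
            if not nexts:
                break
            sentence.append(max(nexts, key=nexts.get))
        return sentence

    def output(self, sentence: list) -> str:
        """Output unit 204 analogue: emit the generated sentence."""
        return " ".join(sentence)


server = SentenceServer({
    "buyer": {"shall": 1.0},
    "shall": {"notify": 0.6, "pay": 0.4},
    "notify": {"seller": 1.0},
})
text = server.output(server.generate(server.accept(" buyer ")))
print(text)
```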

The sixth embodiment is as described above. In other respects it is the same as the first to fifth embodiments, so corresponding parts are denoted by the same reference numerals and their detailed description is omitted.

The embodiments disclosed herein are illustrative in all respects and should not be considered restrictive. The scope of the present invention is defined by the claims rather than by the description above, and is intended to include all modifications within the meaning and scope equivalent to the claims.

1 Server (information processing device)
11 Control unit
12 Main storage unit
13 Communication unit
14 Auxiliary storage unit
P Program
141 Language model
142 Keyword DB
143 Replacement dictionary
2 Terminal

Claims

1. An information processing apparatus comprising:
a storage unit that stores a language model that has learned, for sentences composed of elements each being a character or character string of a predetermined unit, the element appearing next after each element, in the order of the sentence;
a receiving unit that receives an input specifying the elements to be included in a sentence to be generated;
a generation unit that inputs any one of the specified elements into the language model to obtain the element appearing next after that element, repeats a process of inputting the specified element and the obtained elements into the language model to obtain the element appearing next, and generates a sentence containing the specified element and each obtained element; and
an output unit that outputs the generated sentence,
wherein the generation unit
inputs an element located at the beginning of a learning source sentence used for training the language model into the language model to obtain the element appearing next after that element, and generates a head phrase containing the head element and the obtained next element, or inputs an element located at the beginning of the learning source sentence into the language model to obtain the element appearing next after that element, repeats a process of inputting the head element and the obtained elements into the language model to obtain the element appearing next, and generates a head phrase containing the head element and each obtained element, and
generates a sentence containing the generated head phrase and the specified element.

2. The information processing apparatus according to claim 1, wherein the language model calculates, based on the input head element, the occurrence probability of the element appearing next and outputs an element for which an occurrence probability equal to or greater than a certain value is calculated, and
the head phrase containing the element output by the language model is generated.

3. An information processing apparatus comprising:
a storage unit that stores a language model that has learned, for sentences composed of elements each being a character or character string of a predetermined unit, the element appearing next after each element, in the order of the sentence;
a receiving unit that receives an input specifying the elements to be included in a sentence to be generated;
a generation unit that inputs any one of the specified elements into the language model to obtain the element appearing next after that element, repeats a process of inputting the specified element and the obtained elements into the language model to obtain the element appearing next, and generates a sentence containing the specified element and each obtained element; and
an output unit that outputs the generated sentence,
wherein the receiving unit receives an input specifying a plurality of the elements,
the generation unit inputs any one of the specified plurality of elements into the language model to obtain a plurality of elements appearing next and, for each of the plurality of next elements, repeats a process of inputting the specified element that was input to the language model and the next element into the language model to obtain the element appearing next, thereby generating a plurality of the sentences each containing the specified element,
the information processing apparatus further comprises a selection unit that selects a sentence from the plurality of sentences by determining whether it contains the other specified elements in addition to the element input to the language model, and
the output unit outputs the selected sentence.

4. The information processing apparatus according to claim 3, wherein the receiving unit receives an input specifying the order in which the plurality of elements appear in the sentence, and
the selection unit selects the sentence in which the plurality of elements appear in the specified order.

5. The information processing apparatus according to claim 3, wherein, when no sentence containing the other element is generated, the generation unit generates a plurality of sentences by sequentially replacing each element contained in the generated sentence with the other element,
the information processing apparatus further comprises a calculation unit that uses the language model to calculate the occurrence probability of each element contained in each of the plurality of generated sentences and, based on the occurrence probability of each element, calculates a score indicating the validity of each of the plurality of sentences, and
the selection unit selects a sentence whose score is equal to or greater than a predetermined value.

6. The information processing apparatus according to claim 5, wherein, among the elements contained in the generated sentence, the generation unit replaces an element having the same part of speech as the other element with the other element.

7. The information processing apparatus according to any one of claims 1 to 6, wherein the generation unit generates the sentence by calculating, based on the language model, the occurrence probability of each element following the specified element,
the information processing apparatus further comprises a calculation unit that calculates a score indicating the validity of the generated sentence based on the occurrence probability of each element, and
the output unit outputs the sentence whose calculated score is equal to or greater than a predetermined value.

8. The information processing apparatus according to any one of claims 1 to 7, wherein the storage unit stores a first language model that has learned the elements in forward order from the beginning to the end of the sentence, and a second language model that has learned the elements after rearranging them from the beginning to the end of the sentence into reverse order,
the generation unit
generates the sentence using the first or second language model,
converts the order of the elements contained in the generated sentence into reverse order, and
calculates the occurrence probability of each element contained in the converted sentence using the second or first language model, and
the output unit outputs the sentence when the calculated occurrence probability of each element is equal to or greater than a predetermined value.

9. The information processing apparatus according to any one of claims 1 to 8, wherein the storage unit stores a plurality of the language models that differ according to the category of the sentence,
the receiving unit receives an input specifying the category of the sentence to be generated, and
the generation unit generates the sentence based on the language model corresponding to the specified category.

10. The information processing apparatus according to any one of claims 1 to 9, wherein the storage unit stores a replacement dictionary for replacing the elements,
the receiving unit receives the input specifying the element by text input,
the information processing apparatus further comprises a replacement unit that refers to the replacement dictionary and replaces the input element, and
the generation unit generates the sentence based on the replaced element.

11. An information processing method in which a computer executes processing comprising:
receiving an input specifying elements, each being a character or character string of a predetermined unit, to be included in a sentence to be generated;
inputting any one of the specified elements into a language model that has learned, in the order of the sentence, the element appearing next after each element in a sentence, to obtain the element appearing next after that element, repeating a process of inputting the specified element and the obtained elements into the language model to obtain the element appearing next, and generating a sentence containing the specified element and each obtained element; and
outputting the generated sentence,
wherein an element located at the beginning of a learning source sentence used for training the language model is input into the language model to obtain the element appearing next after that element, and a head phrase containing the head element and the obtained next element is generated, or an element located at the beginning of the learning source sentence is input into the language model to obtain the element appearing next after that element, a process of inputting the head element and the obtained elements into the language model to obtain the element appearing next is repeated, and a head phrase containing the head element and each obtained element is generated, and
a sentence containing the generated head phrase and the specified element is generated.

12. A program that causes a computer to execute processing comprising:
receiving an input specifying elements, each being a character or character string of a predetermined unit, to be included in a sentence to be generated;
inputting any one of the specified elements into a language model that has learned, in the order of the sentence, the element appearing next after each element in a sentence, to obtain the element appearing next after that element, repeating a process of inputting the specified element and the obtained elements into the language model to obtain the element appearing next, and generating a sentence containing the specified element and each obtained element; and
outputting the generated sentence,
wherein an element located at the beginning of a learning source sentence used for training the language model is input into the language model to obtain the element appearing next after that element, and a head phrase containing the head element and the obtained next element is generated, or an element located at the beginning of the learning source sentence is input into the language model to obtain the element appearing next after that element, a process of inputting the head element and the obtained elements into the language model to obtain the element appearing next is repeated, and a head phrase containing the head element and each obtained element is generated, and
a sentence containing the generated head phrase and the specified element is generated.