JP7200154B2

JP7200154B2 - Program, device and method for inferring response sentences to received sentences

Info

Publication number: JP7200154B2
Application number: JP2020023845A
Authority: JP
Inventors: 彰夫石川; 広海石先
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-02-14
Filing date: 2020-02-14
Publication date: 2023-01-06
Anticipated expiration: 2040-02-14
Also published as: JP2021128637A

Description

The present invention relates to a dialogue generation system that infers a response sentence to a received sentence.

Dialogue generation systems based on deep learning are known (see, for example, Non-Patent Document 1). In this technique, the learned model itself is a black box trained as a sequence-to-sequence conversion model. Using a dialogue corpus that covers varied utterance situations, such as microblogs, movie subtitles, and troubleshooting help-desk Q&A collections, it realizes dialogue without a specific goal (non-task-oriented dialogue), such as chit-chat.

FIG. 1 is a functional configuration diagram of an inference device in the prior art.

As shown in FIG. 1, the prior art consists of a training stage and an operation stage, each structured as an encoder-decoder model. The encoder and the decoder are each based on a long short-term memory (LSTM) that outputs the occurrence probability of the next word. LSTM is an extension of the recurrent neural network (RNN): a learning model for sequential data built from a long-term memory and a short-term memory.
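The patent does not give the LSTM equations. As an illustration only, the sketch below shows one time step of a standard LSTM cell (the common formulation with input, forget, and output gates) in plain Python with list-based vectors; the function and parameter names are hypothetical. The cell state `c` plays the role of the long-term memory and the hidden state `h` the short-term memory mentioned above.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, Tuple[Matrix, Matrix, Vector]]) -> Tuple[Vector, Vector]:
    """One LSTM time step; params maps each gate name to (W, U, b) (hypothetical layout)."""
    def gate(name, act):
        w, u, b = params[name]
        pre = [a + r + bias for a, r, bias in zip(matvec(w, x), matvec(u, h_prev), b)]
        return [act(z) for z in pre]

    i = gate("input", sigmoid)   # how much new content to write
    f = gate("forget", sigmoid)  # how much of the previous cell state to keep
    o = gate("output", sigmoid)  # how much of the cell state to expose
    g = gate("cell", math.tanh)  # candidate content
    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]  # long-term memory
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]                     # short-term (hidden) state
    return h, c
```

With all-zero weights every gate outputs 0.5 and the candidate is 0, so the cell simply halves its previous state; trained weights make these gates input-dependent.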

<Training stage>
The learning database stores learning received sentences and learning response sentences, each consisting of a word sequence, in association with each other.
The encoder learns to generate a context vector from a learning received sentence, and the decoder learns to generate the corresponding learning response sentence from that context vector.
In FIG. 1, for example, the following received/response pairs are associated and learned:
(1) Learning received sentence: "Recently, I started learning English conversation."
Learning response sentence: "Can't you speak English?"
(2) Learning received sentence: "My hobby is mountain climbing."
Learning response sentence: "Which mountain did you climb?"
<Operation stage>
The encoder generates a context vector from the target received sentence, and the decoder generates a response sentence from that context vector.

Non-Patent Document 1: Oriol Vinyals, Quoc V. Le, "A Neural Conversational Model", Proceedings of the 31st International Conference on Machine Learning, vol. 37, 2017.
Non-Patent Document 2: Ryuichiro Higashinaka, Kotaro Funakoshi, Masahiro Araki, Hiroshi Tsukahara, Yuka Kobayashi, Masahiro Mizukami, "Construction of a chat dialogue corpus using text chat and analysis of dialogue breakdown", Natural Language Processing, Vol. 23, No. 1, 2016.
Non-Patent Document 3: 星の本棚 (Star Bookshelf), "Natural Language Processing (NLP)", [online], [retrieved December 29, 2019], Internet <URL: http://yagami12.hatenablog.com/entry/2017/12/30/175113#ID_10-5-1>

The technique described in Non-Patent Document 1 uses only a sequence-to-sequence conversion model, so it does not learn the context of the dialogue. As a result, the dialogue sometimes breaks down.

Dialogue breakdowns are classified into, for example, the following four cases (see, for example, Non-Patent Document 2).
(Case 1) Utterance breakdown
The utterance itself is broken. For example, the syntax may be so broken that the sentence is not valid Japanese in the first place.
(Case 2) Response breakdown
The sentence is correct Japanese, but it fails as a response to the other party's utterance. For example, the system may reply "When was the last time you traveled?" to the received sentence "So, what are your hobbies?"
(Case 3) Context breakdown
The exchange works as a single turn but does not mesh with what has already been said. For example, the system may reply "I hate sweets" right after having replied "I like sweets" 10 seconds earlier.
(Case 4) Environment breakdown
The system makes socially (common-sense) inappropriate remarks. For example, like the AI bot "Tay" (registered trademark) released by Microsoft in the United States, it may suddenly make racist remarks.
It is generally recognized that Case 2 accounts for about 50% of breakdowns, Case 3 for about 30%, Case 1 for slightly over 10%, and Case 4 for only a small number.

The response breakdown of Case 2 is thought to be caused by the encoder's attention mechanism emphasizing inappropriate words.
The utterance breakdown of Case 1 may be caused by insufficient accuracy of the decoder.

In recent neural-network-based natural language processing, the encoder is equipped with an "attention mechanism" so that sentence naturalness is not given excessive priority.
The attention mechanism computes a context vector from "the decoder's internal state when generating the i-th target word" and "the encoder's hidden-layer state for each input word", and the decoder uses this context vector during inference. Therefore, in a model with attention, when outputting the i-th word, the decoder receives as inputs (1) the previously output word, (2) its own internal state, and (3) the context vector computed by the attention mechanism, and uses them to infer the i-th word.
Because the attention mechanism thus designates which words and phrases to emphasize, appropriate natural language processing becomes possible.
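A minimal sketch of how an attention mechanism could compute the context vector for the i-th decoder step, assuming dot-product scoring between the decoder state and each encoder hidden state (one common choice; the patent does not specify the scoring function). The function names are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state: Vector, encoder_states: List[Vector]) -> Tuple[Vector, Vector]:
    """Return (attention weights, context vector) for one decoder step.

    decoder_state: decoder internal state s_i
    encoder_states: hidden states h_1..h_n, one per input word
    """
    scores = [sum(s * h for s, h in zip(decoder_state, h_j)) for h_j in encoder_states]
    weights = softmax(scores)  # how strongly each input word is emphasized
    dim = len(encoder_states[0])
    context = [sum(w * h_j[k] for w, h_j in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context
```

The decoder would then combine the previous output word, its internal state, and `context` to predict the i-th word. If the weights concentrate on the wrong input words, the context vector carries the wrong emphasis, which is the failure mode discussed next.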

However, in an LSTM-based decoder, the attention mechanism may overfit the learning received sentences and learning response sentences. As a result, it may emphasize inappropriate words instead of attending to the context of the received sentence.

As shown in FIG. 1, in the operation stage, for example, a target received sentence is input to the encoder and the decoder outputs the following response sentence:
Target received sentence: "Recently, I started mountain climbing."
Response sentence: "Can't you climb mountains?"
The response is grammatical Japanese, but situations in which someone "cannot climb mountains" are rare, so the response feels unnatural and out of context relative to the received sentence.
This is probably because, in the training stage, the encoder emphasized a phrase such as "I started ~", which raised the priority of the response pattern "Can't you do ~?".
Thus, the example of FIG. 1 suggests both that the encoder's attention mechanism emphasizes inappropriate words (response breakdown, Case 2) and that the decoder lacks accuracy (utterance breakdown, Case 1).

Accordingly, an object of the present invention is to provide a program, a device, and a method for inferring a response sentence to a received sentence without causing response breakdown or utterance breakdown.

According to the present invention, there is provided a program that causes a computer mounted in a device for inferring a response sentence to a received sentence to function so as to:
in a training stage,
train, as an encoder-decoder model, a bilingual encoder that generates a context vector from a first corpus text in a first language, and a bilingual decoder that outputs, from that context vector, a second corpus text in a second language that is a translation of the first corpus text; and
train, as an encoder-decoder model, a neural network so that, when the context vector generated by the bilingual encoder from a learning received sentence is input, the network outputs the context vector generated by the bilingual encoder from the learning response sentence that answers that learning received sentence in a dialogue; and
in an operation stage,
generate a first context vector from a target received sentence with the bilingual encoder;
generate a second context vector from the first context vector with the neural network; and
infer a response sentence from the second context vector with the bilingual decoder.

According to another embodiment of the program of the present invention, it is also preferable that the computer functions such that:
the bilingual encoder has an attention mechanism; and
the first context vector generated by the bilingual encoder and the second context vector generated by the neural network implicitly incorporate the attention mechanism.

According to another embodiment of the program of the present invention, it is also preferable that the computer functions such that:
a plurality of bilingual encoders and a plurality of bilingual decoders are provided according to the number of different languages; and
the bilingual encoders have been trained to receive corpus texts in different languages and generate a single context vector, and/or
the bilingual decoders have been trained to receive that single context vector and output corpus texts in different languages.

According to another embodiment of the program of the present invention, it is also preferable that the computer functions such that:
the bilingual encoder and the bilingual decoder are based on a sequence-to-sequence neural network;
the bilingual encoder consists of an embedding layer and a recurrent layer; and
the bilingual decoder consists of an embedding layer, a recurrent layer, and an output layer.

According to the present invention, there is provided an inference device for inferring a response sentence to a received sentence, which:
in a training stage,
trains, as an encoder-decoder model, a bilingual encoder that generates a context vector from a first corpus text in a first language, and a bilingual decoder that outputs, from that context vector, a second corpus text in a second language that is a translation of the first corpus text; and
trains, as an encoder-decoder model, a neural network so that, when the context vector generated by the bilingual encoder from a learning received sentence is input, the network outputs the context vector generated by the bilingual encoder from the learning response sentence that answers that learning received sentence in a dialogue; and
in an operation stage,
generates a first context vector from a target received sentence with the bilingual encoder;
generates a second context vector from the first context vector with the neural network; and
infers a response sentence from the second context vector with the bilingual decoder.

According to the present invention, there is provided an inference method for a device that infers a response sentence to a received sentence, in which the device executes:
in a training stage,
training, as an encoder-decoder model, a bilingual encoder that generates a context vector from a first corpus text in a first language, and a bilingual decoder that outputs, from that context vector, a second corpus text in a second language that is a translation of the first corpus text; and
training, as an encoder-decoder model, a neural network so that, when the context vector generated by the bilingual encoder from a learning received sentence is input, the network outputs the context vector generated by the bilingual encoder from the learning response sentence that answers that learning received sentence in a dialogue; and
in an operation stage,
generating a first context vector from a target received sentence with the bilingual encoder;
generating a second context vector from the first context vector with the neural network; and
inferring a response sentence from the second context vector with the bilingual decoder.

With the program, device, and method of the present invention, a response sentence to a received sentence can be inferred without causing response breakdown or utterance breakdown.

FIG. 1 is a functional configuration diagram of an inference device in the prior art.
FIG. 2 is a functional configuration diagram of the training stage of the inference device according to the present invention.
FIG. 3 is an explanatory diagram of a first embodiment showing training of the bilingual encoder and the bilingual decoder.
FIG. 4 is an explanatory diagram of a second embodiment showing training of the bilingual encoder and the bilingual decoder.
FIG. 5 is a functional configuration diagram of the operation stage of the inference device according to the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 2 is a functional configuration diagram of the training stage of the inference device according to the present invention.

As shown in FIG. 2, for the training stage, the inference device 1 includes a parallel corpus database 101, a learning database 102, a bilingual encoder 111, a bilingual decoder 112, and a neural network 12. These functional components are realized by executing a program that causes a computer mounted in the device to function. The processing flow through these components can also be understood as the training stage of the device's inference method.
Training is divided into a first training stage and a second training stage. Each training stage is structured as an encoder-decoder model linked through a context vector.

<<First training stage>>
In the first training stage, the bilingual encoder 111 and the bilingual decoder 112 are trained on the parallel corpus database 101.

[Parallel corpus database 101]
The parallel corpus database 101 stores corpus texts that are translations of one another across different languages. It associates a corpus text in the first language, to be input to the bilingual encoder 111, with a corpus text in the second language, to be output by the bilingual decoder 112.
The corpus texts may be, for example, a multilingual parallel corpus of the kind used in neural machine translation. That is, sentences in different languages that express the same meaning are prepared as translations of one another:
Japanese: "私はあなたを愛している。" ("I love you.")
English: "I love you."
German: "Ich liebe dich."
Chinese: "我愛你" ("I love you.")

[Bilingual encoder 111 / Bilingual decoder 112]
The bilingual encoder 111 and the bilingual decoder 112 receive, as a pair, a first-language corpus text and a second-language corpus text that are translations of each other, and are trained as an encoder-decoder model.
The bilingual encoder 111 learns to generate a context vector from the first-language corpus text read from the parallel corpus database 101. The bilingual encoder has an attention mechanism.
The bilingual decoder 112 learns to output the second-language corpus text from that context vector.

As an encoder-decoder model, the bilingual encoder 111 and the bilingual decoder 112 form a neural-network-based "sequence-to-sequence (seq2seq) model" that models the probability of converting one sequence into another (see, for example, Non-Patent Document 3). That is, it models the conditional probability P(Y|X) that a sequence Y is output when a sequence X is input.
The sequence-to-sequence model consists of the bilingual encoder 111, which receives the sequence X and generates a fixed-length "context vector", and the bilingual decoder 112, which outputs the sequence Y from that fixed-length context vector.
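The conditional probability P(Y|X) is usually factorized word by word: with c the context vector produced from X, P(Y|X) is the product over j of P(y_j | y_1, ..., y_{j-1}, c). The sketch below scores a target sequence under that factorization in log space. The callable `next_token_probs` is a hypothetical stand-in for the decoder, an assumption rather than part of the patent.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical decoder interface: (context vector, prefix so far) -> {word: probability}
NextTokenProbs = Callable[[Sequence[float], Sequence[str]], Dict[str, float]]


def sequence_log_prob(context: Sequence[float], target: List[str],
                      next_token_probs: NextTokenProbs) -> float:
    """log P(Y|X) = sum_j log P(y_j | y_<j, c), where c is the encoder's context vector."""
    log_p = 0.0
    prefix = ["<BOS>"]
    for token in target:
        probs = next_token_probs(context, prefix)
        log_p += math.log(probs[token])
        prefix.append(token)  # condition the next step on the words produced so far
    return log_p
```

Training a seq2seq model typically maximizes this quantity over the corpus pairs; working in log space avoids underflow on long sentences.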

The most notable point of the present invention is that it uses a parallel corpus between different languages even though the received sentence and the response sentence are in the same language.
Normally, if the received and response sentences are in the same language, the training and operation stages also use that one language. Naturally, if both are Japanese, a parallel corpus involving other languages is unnecessary. Even if an encoder-decoder model were given a corpus pairing Japanese with Japanese, it would amount to nothing more than an identity transformation. Therefore, when the received and response sentences are in the same language, using parallel corpus texts is not contemplated at all.
In contrast, the present invention deliberately trains with a "parallel corpus" whose input and output sentences are in different languages. This prevents overfitting of the attention mechanism inherent in the context vectors generated by the bilingual encoder. In particular, the more languages the parallel corpus covers, the more the generated context vectors become independent of any individual language model. The finally generated response sentence is therefore expected to be influenced as little as possible by existing language models.

FIG. 3 is an explanatory diagram of a first embodiment showing training of the bilingual encoder and the bilingual decoder.

As shown in FIG. 3, a morpheme sequence based on the first-language corpus text is input to the bilingual encoder 111:
Japanese: "あなた/を/愛し/て/いる/<EOS>" (roughly "you / [object particle] / love / [connective] / [progressive] / <EOS>")
A morpheme sequence based on the second-language corpus text is input to the bilingual decoder 112:
English: "<BOS>/I/love/you/<EOS>"
The first-language and second-language corpus texts are in different languages but have the same meaning.
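The `<BOS>` and `<EOS>` markers suggest the usual teacher-forcing setup, which the patent does not spell out: the decoder is fed the target shifted right by one position and is trained to predict the next morpheme at every step. A minimal sketch under that assumption, with hypothetical helper names:

```python
from typing import Dict, Iterable, List, Sequence, Tuple

BOS, EOS = "<BOS>", "<EOS>"


def make_training_example(src: Sequence[str], tgt: Sequence[str]) -> Tuple[List[str], List[str], List[str]]:
    """Build encoder input, decoder input, and decoder target from one aligned pair."""
    encoder_input = list(src) + [EOS]
    decoder_input = [BOS] + list(tgt)   # what the decoder is fed (shifted right)
    decoder_target = list(tgt) + [EOS]  # what it must predict at each step
    return encoder_input, decoder_input, decoder_target


def build_vocab(sequences: Iterable[Sequence[str]]) -> Dict[str, int]:
    """Assign an integer id to each morpheme in order of first appearance."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


enc, dec_in, dec_tgt = make_training_example(["あなた", "を", "愛し", "て", "いる"], ["I", "love", "you"])
```

Here `dec_in` is `["<BOS>", "I", "love", "you"]` and `dec_tgt` is `["I", "love", "you", "<EOS>"]`, so together they cover the "<BOS>/I/love/you/<EOS>" sequence shown above.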

As also shown in FIG. 3, the following Japanese sentence, for example, is input to the bilingual encoder 111:
"幕府は、1639年、ポルトガル人を追放し、大名には沿岸の警備を命じた。" ("The shogunate expelled the Portuguese in 1639 and ordered the daimyo to guard the coast.")
The bilingual encoder 111 and the bilingual decoder 112 are trained so that the bilingual decoder 112 outputs the following English sentence from the context vector:
"The shogunate banished Portuguese in 1639, ordered Daimyo to guard the coast."
Similarly, the following Japanese sentence, for example, is input to the bilingual encoder 111:
"1639年、ポルトガル人は追放され、幕府は大名から沿岸の警備を命じられた。" ("In 1639, the Portuguese were expelled, and the shogunate was ordered by the daimyo to guard the coast.")
The bilingual encoder 111 and the bilingual decoder 112 are trained so that the bilingual decoder 112 outputs the following English sentence from the context vector:
"In 1639, the Portuguese were expelled, and the shogunate was ordered to protect the coast from Daimyo."

As shown in FIG. 3, the bilingual encoder 111 consists of an embedding layer and a recurrent layer, and learns to output a context vector from the first-language corpus text.
The embedding layer converts each word x of the input text X into a distributed representation, namely an embedding vector.
The recurrent layer then functions as a recurrent neural network that receives the embedding vectors and outputs the context vector.
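A minimal sketch of the encoder's two layers, assuming a simple tanh recurrent cell in place of the LSTM for brevity and toy hand-set weights. The final hidden state is taken as the fixed-length context vector, so its size does not depend on sentence length. All names are illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def encode(tokens: Sequence[str], embedding: Dict[str, Vector],
           w_x: List[Vector], w_h: List[Vector], b: Vector) -> Vector:
    """Embedding layer + tanh recurrent layer; returns the last hidden state."""
    hidden = [0.0] * len(b)
    for tok in tokens:
        x = embedding[tok]  # embedding layer: word -> embedding vector
        # recurrent layer: h_t = tanh(W_x x_t + W_h h_{t-1} + b)
        hidden = [
            math.tanh(sum(w * xi for w, xi in zip(w_x[k], x))
                      + sum(u * hi for u, hi in zip(w_h[k], hidden))
                      + b[k])
            for k in range(len(b))
        ]
    return hidden  # fixed-length context vector
```

In practice the embedding table and weights are learned, and an LSTM or GRU cell replaces the plain tanh update to retain information over longer sentences.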

The bilingual decoder 112, in contrast, consists of an embedding layer, a recurrent layer, and an output layer; it receives the context vector and learns to output the second-language corpus text.
The embedding layer converts each word y of the output text Y into a distributed representation, namely an embedding vector.
The recurrent layer then receives the embedding vector and the context vector and functions as a recurrent neural network.
The output layer receives the hidden-state vector that the recurrent layer outputs for each word y of the output sequence Y, and outputs text.
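On the decoder side, the output layer maps each hidden state to one score per vocabulary word. The sketch below assumes a linear output layer followed by softmax and greedy (argmax) word selection until `<EOS>`; `step` is a hypothetical callable standing in for the embedding and recurrent layers. Beam search or sampling would be equally valid choices.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def output_layer(hidden: Vector, w_o: List[Vector], b_o: Vector) -> Vector:
    """Linear output layer: one logit per vocabulary word."""
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w_o, b_o)]


def greedy_decode(context: Vector, step: Callable[[str, Vector, Vector], Vector],
                  w_o: List[Vector], b_o: Vector, vocab: Sequence[str],
                  max_len: int = 20) -> List[str]:
    """Generate words one at a time, feeding each prediction back in."""
    hidden = list(context)
    prev = "<BOS>"
    words: List[str] = []
    for _ in range(max_len):
        hidden = step(prev, hidden, context)  # embedding + recurrent layers (hypothetical)
        probs = softmax(output_layer(hidden, w_o, b_o))
        prev = vocab[max(range(len(vocab)), key=lambda k: probs[k])]
        if prev == "<EOS>":
            break
        words.append(prev)
    return words
```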

FIG. 4 is an explanatory diagram of a second embodiment showing training of the bilingual encoder and the bilingual decoder.

As shown in FIG. 4, corpus texts in four different languages are associated with one another, and a shared context vector is formed between two bilingual encoders 111 and two bilingual decoders 112.
In FIG. 4, the context vectors output from the bilingual encoder 111 for Japanese and the bilingual encoder 111 for Chinese are input, for training, to the bilingual decoder 112 for English and the bilingual decoder 112 for German. As a result, a context vector common to the synonymous sentences of the parallel corpus in all four languages is generated.

Of course, in further embodiments, the bilingual encoders 111 and the bilingual decoders 112 may be configured in other ratios, such as 1:2 or 2:1.
For example, the context vector output from the bilingual encoder 111 for Japanese may be input, for training, to both the bilingual decoder 112 for English and the bilingual decoder 112 for German.
Alternatively, the context vectors output from the bilingual encoder 111 for Japanese and the bilingual encoder 111 for Chinese may be input, for training, to the bilingual decoder 112 for English.
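One way to picture the 2:2, 1:2, and 2:1 configurations is as the set of (encoder language, decoder language) pairs drawn from each aligned entry of the parallel corpus. The helper below is hypothetical and only enumerates those training pairs; it does not model how the encoders' outputs are combined into one context vector.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def training_pairs(aligned: Dict[str, str], encoder_langs: Sequence[str],
                   decoder_langs: Sequence[str]) -> List[Tuple[str, str, str, str]]:
    """Return (src_lang, src_text, tgt_lang, tgt_text) for every encoder/decoder combination."""
    return [(s, aligned[s], t, aligned[t]) for s, t in product(encoder_langs, decoder_langs)]


entry = {"ja": "私はあなたを愛している。", "zh": "我愛你", "en": "I love you.", "de": "Ich liebe dich."}
pairs_2to2 = training_pairs(entry, ["ja", "zh"], ["en", "de"])  # FIG. 4 configuration
pairs_1to2 = training_pairs(entry, ["ja"], ["en", "de"])
pairs_2to1 = training_pairs(entry, ["ja", "zh"], ["en"])
```

Because every encoder must produce a vector that every decoder can use, training on all combinations pushes the encoders toward a language-independent context vector, which is the effect described for FIG. 4.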

<<Second training stage>>
Returning to FIG. 2, the second training stage is carried out using the learning database 102, two bilingual encoders 111, and the neural network 12.

[Learning database 102]
The learning database 102 stores learning received sentences and learning response sentences in association with each other. It is the same as the learning database of the prior art in FIG. 1.

[Two bilingual encoders 111]
The two bilingual encoders 111 are the bilingual encoder 111 trained in the first training stage, used as-is. One bilingual encoder 111 receives a learning received sentence of a dialogue, generates a context vector from it, and inputs that context vector to the neural network 12. The other bilingual encoder 111 receives the corresponding learning response sentence, generates a context vector from it, and supplies that context vector to the output side of the neural network 12.

For example, as in FIG. 1 above, the following learning received sentences and learning response sentences are associated and learned:
(1) Learning received sentence: "Recently, I started learning English conversation."
Learning response sentence: "Can't you speak English?"
(2) Learning received sentence: "My hobby is mountain climbing."
Learning response sentence: "Which mountain did you climb?"

[Neural network 12]
The neural network 12 learns, as an encoder-decoder model, so that when it receives the context vector generated by the bilingual encoder from a learning received sentence, it outputs the context vector generated by the bilingual encoder from the learning response sentence paired with that received sentence in the dialogue.
The neural network 12 is preferably, for example, a convolutional neural network.
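A minimal sketch of the second training stage, assuming (1) a linear map in place of the preferred convolutional neural network, (2) a squared-error loss between predicted and target context vectors, and (3) plain stochastic gradient descent; the patent specifies none of these. The frozen bilingual encoder is represented by a hypothetical `encode` callable that is only called, never updated, consistent with the encoders being "used as-is" in this stage.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def apply_mapping(w: List[Vector], x: Vector) -> Vector:
    """Map a received-sentence context vector to a predicted response context vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def train_mapping(pairs: Sequence[Tuple[str, str]], encode: Callable[[str], Vector],
                  dim: int, epochs: int = 200, lr: float = 0.1, seed: int = 0) -> List[Vector]:
    """Fit W so that W @ encode(received) is close to encode(response)."""
    rng = random.Random(seed)
    w = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
    # Context-vector pairs are computed once with the frozen encoder.
    data = [(encode(r), encode(s)) for r, s in pairs]
    for _ in range(epochs):
        for x, y in data:
            err = [p - t for p, t in zip(apply_mapping(w, x), y)]
            # Gradient of 0.5 * ||W x - y||^2 with respect to W[i][j] is err[i] * x[j].
            for i in range(dim):
                for j in range(dim):
                    w[i][j] -= lr * err[i] * x[j]
    return w
```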

Notably, the bilingual encoder 111 and the bilingual decoder 112 are not trained in this stage.
Also, each context vector generated by the two bilingual encoders 111 implicitly incorporates the encoder's attention mechanism.
According to the present invention, the bilingual encoder 111 is trained on the bilingual corpus database 101. As a result, its attention mechanism is not influenced by the particular language and comes to place importance on semantically important words, which suppresses breakdown of response sentences. Training the bilingual decoder 112 with bilingual corpora of several different languages can further improve performance and suppress breakdown of response sentences even more.
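The text states only that the context vectors implicitly incorporate an attention mechanism. It does not specify the form of that mechanism. As an assumption, the sketch below uses standard dot-product attention with softmax weights. The per-token states and the query are invented numbers, chosen so that the content word receives most of the weight. This illustrates what "placing importance on semantically important words" means in practice. The functions `softmax` and `attention_context` are hypothetical helpers, not part of the described device.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_context(query: Vector, states: List[Vector]) -> Tuple[List[float], Vector]:
    """Dot-product attention: score each token state against the query,
    normalise the scores with softmax, and return the weights together with
    the weighted sum of the states (the context vector)."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]
    return weights, context


# Hypothetical per-token encoder states for "Recently, I started mountain climbing".
tokens = ["recently", "mountain climbing", "started"]
states = [[0.1, 0.0], [0.9, 0.8], [0.2, 0.1]]
query = [1.0, 1.0]  # hypothetical query vector

weights, context = attention_context(query, states)
for tok, w in zip(tokens, weights):
    print(f"{tok:>18}: {w:.3f}")
```

Here "mountain climbing" gets the largest weight and therefore dominates the context vector. This is the behaviour the text attributes to an encoder trained on a bilingual corpus.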

<<Operation stage>>
FIG. 5 is a functional configuration diagram of the inference device of the present invention in the operational stage.

The inference device 1 infers a response sentence to a target received sentence.
According to FIG. 5, in the operational stage, inference is performed by an encoder-decoder model consisting of the bilingual encoder 111, the neural network 12 and the bilingual decoder 112. The bilingual encoder 111 and the bilingual decoder 112 were trained in the first training stage, and the neural network 12 was trained in the second training stage.

The bilingual encoder 111 generates a first context vector from the target received sentence.
Next, the neural network 12 generates a second context vector from the first context vector.
Then, the bilingual decoder 112 generates a response sentence from the second context vector.
Here, the first context vector and the second context vector implicitly incorporate the attention mechanism.
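The three operational-stage steps can be sketched as a plain function composition. Everything apart from `infer_response` is a hypothetical toy stand-in used only to make the example runnable. The encoder is a dictionary lookup. The mapper just swaps vector components in place of the trained neural network 12. The decoder is nearest-neighbour retrieval over stored response vectors. The actual bilingual decoder 112 generates the sentence with a sequence-to-sequence decoder, so this retrieval step is a deliberate simplification and not the described method.

```python
from typing import Callable, Dict, List

Vector = List[float]


def infer_response(sentence: str,
                   encode: Callable[[str], Vector],
                   mapper: Callable[[Vector], Vector],
                   decode: Callable[[Vector], str]) -> str:
    """Operational-stage pipeline: encoder -> mapping network -> decoder."""
    first = encode(sentence)   # bilingual encoder 111 (trained in the first stage)
    second = mapper(first)     # neural network 12 (trained in the second stage)
    return decode(second)      # bilingual decoder 112 (trained in the first stage)


# ---- Hypothetical toy stand-ins, NOT the trained models ----
TOY_ENCODINGS: Dict[str, Vector] = {
    "Recently, I started mountain climbing": [0.1, 0.9],
}
TOY_RESPONSES: Dict[str, Vector] = {
    "Which mountain did you climb?": [0.9, 0.1],
    "Can't you speak English?": [0.1, 0.9],
}


def toy_encode(sentence: str) -> Vector:
    return TOY_ENCODINGS[sentence]


def toy_mapper(v: Vector) -> Vector:
    # Swaps the two components; a placeholder for the learned mapping.
    return [v[1], v[0]]


def toy_decode(v: Vector) -> str:
    # Nearest stored response vector by squared Euclidean distance;
    # a retrieval placeholder for the generative bilingual decoder.
    return min(TOY_RESPONSES,
               key=lambda r: sum((a - b) ** 2 for a, b in zip(TOY_RESPONSES[r], v)))


print(infer_response("Recently, I started mountain climbing",
                     toy_encode, toy_mapper, toy_decode))
```

Passing the three components as arguments keeps the pipeline independent of how each one was trained. This mirrors the split between the first stage (encoder and decoder) and the second stage (the mapping network).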

According to FIG. 5, in the operational stage, a target received sentence is input to the bilingual encoder 111 and a response sentence is output from the bilingual decoder 112, for example as follows:
Target received sentence: "Recently, I started mountain climbing"
Response sentence: "Which mountain did you climb?"
Here, the response sentence is not only well-formed Japanese but also natural in context, with nothing incongruous about it. This is because the attention mechanism implicit in the context vectors of the encoder-decoder model has not overfit. In this respect, the result differs from that of the prior art shown in FIG. 1.

According to the present invention, the bilingual encoder 111 is trained using the bilingual corpus database 101. It therefore learns to clearly distinguish "I started learning (English conversation)" from "I started (mountain climbing)". As a result, the attention mechanism of the bilingual encoder 111 places importance on words other than "started" (for example, "mountain climbing") and ultimately infers a correct response sentence.
In particular, this makes "dialogue sentence generation" possible, a task at which deep learning has been weak, and broadens the range of applications of the neural network.

As described in detail above, the program, device and method of the present invention can infer a response sentence to a received sentence without falling into response breakdown or utterance breakdown.

Those skilled in the art can readily make various changes, modifications and omissions to the embodiments of the present invention described above within the scope of the technical idea and viewpoint of the present invention. The foregoing description is merely an example and is not intended to be limiting in any way. The present invention is limited only by the claims and their equivalents.

Reference Signs List
1 Inference device
101 Bilingual corpus database
102 Learning database
111 Bilingual encoder
112 Bilingual decoder
12 Neural network

Claims

1. A program for operating a computer installed in a device that infers a response sentence to a received sentence, the program causing the computer to:
in a training stage,
train, as an encoder-decoder model, a bilingual encoder that generates a context vector from a first corpus text in a first language and a bilingual decoder that outputs, from the context vector, a second corpus text in a second language that is a parallel translation of the first corpus text; and
train a neural network, as an encoder-decoder model, so that when the context vector generated by the bilingual encoder from a received training sentence is input, it outputs the context vector generated by the bilingual encoder from a training response sentence that answers the received training sentence in the dialogue; and
in an operational stage,
generate a first context vector from a target received sentence by the bilingual encoder;
generate a second context vector from the first context vector by the neural network; and
infer a response sentence from the second context vector by the bilingual decoder.

2. The program according to claim 1, wherein the bilingual encoder has an attention mechanism, and
the program causes the computer to function so that the first context vector generated by the bilingual encoder and the second context vector generated by the neural network implicitly include the attention mechanism.

3. The program according to claim 1 or 2, wherein the bilingual encoder and the bilingual decoder are each provided in a plurality according to the number of different languages, and the program causes the computer to function so that
the bilingual encoders respectively receive corpus texts in the different languages and generate a context vector, and/or
the bilingual decoders are trained to receive one context vector and respectively output corpus texts in the different languages.

4. The program according to any one of claims 1 to 3, wherein the bilingual encoder and the bilingual decoder are based on sequence-to-sequence neural networks, and the program causes the computer to function so that
the bilingual encoder consists of an embedding layer and a recurrent layer, and
the bilingual decoder consists of an embedding layer, a recurrent layer and an output layer.

5. An inference device that infers a response sentence to a received sentence, the device:
in a training stage,
training, as an encoder-decoder model, a bilingual encoder that generates a context vector from a first corpus text in a first language and a bilingual decoder that outputs, from the context vector, a second corpus text in a second language that is a parallel translation of the first corpus text, and
training a neural network, as an encoder-decoder model, so that when the context vector generated by the bilingual encoder from a received training sentence is input, it outputs the context vector generated by the bilingual encoder from a training response sentence that answers the received training sentence in the dialogue; and
in an operational stage,
generating a first context vector from a target received sentence by the bilingual encoder,
generating a second context vector from the first context vector by the neural network, and
inferring a response sentence from the second context vector by the bilingual decoder.

6. An inference method for a device that infers a response sentence to a received sentence, wherein
the device executes:
in a training stage,
a step of training, as an encoder-decoder model, a bilingual encoder that generates a context vector from a first corpus text in a first language and a bilingual decoder that outputs, from the context vector, a second corpus text in a second language that is a parallel translation of the first corpus text, and
a step of training a neural network, as an encoder-decoder model, so that when the context vector generated by the bilingual encoder from a received training sentence is input, it outputs the context vector generated by the bilingual encoder from a training response sentence that answers the received training sentence in the dialogue; and
in an operational stage,
a step of generating a first context vector from a target received sentence by the bilingual encoder,
a step of generating a second context vector from the first context vector by the neural network, and
a step of inferring a response sentence from the second context vector by the bilingual decoder.