JP2021051709A

JP2021051709A - Text processing apparatus, method, device, and computer-readable recording medium

Info

Publication number: JP2021051709A
Application number: JP2019209173A
Authority: JP
Inventors: シーホングオ; Xihong Guo; ティエンシャンリュー; Tianshang Liu; シンユグオ; xin yu Guo; アンシンリー; Anxin Li; ランチン; Lan Chen; 大志池田; Hiroshi Ikeda; 拓藤本; Hiroshi Fujimoto
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2019-09-20
Filing date: 2019-11-19
Publication date: 2021-04-01
Also published as: CN112613307A

Abstract

PROBLEM TO BE SOLVED: To provide a text processing apparatus, method, and device, and a computer-readable recording medium, that make external information appear in the result of text processing. SOLUTION: A text processing apparatus 100 includes: an encoding unit which encodes a source text to acquire a source text encoding hidden state; a decoding unit which determines a decoding hidden state; and an output unit which determines an output word probability distribution on the basis of external information, the source text encoding hidden state, and the decoding hidden state, in order to determine an output word. SELECTED DRAWING: Figure 1

Description

The present disclosure relates to the field of text processing, and more specifically to a text processing apparatus, method, and device, and a computer-readable recording medium.

In conventional text processing, such as text conversion and text generation, the final text processing result is obtained by processing the input source text.

In some cases, a user can specify external information for the text processing process in order to obtain a better result. Such external information may be important information in text specified by the user, or other text information associated with the source text. To make such external information more likely to appear in the text processing result, a text processing method that fully takes the external information into account during processing is required.

To fully take external information into account in the text processing process, the present disclosure provides a text processing method, apparatus, and device, and a computer-readable recording medium.

According to one aspect of the present disclosure, there is provided a text processing apparatus including: an encoding unit configured to encode a source text to obtain a source text encoding hidden state; a decoding unit configured to determine a decoding hidden state; and an output unit configured to determine an output word probability distribution based on external information, the source text encoding hidden state, and the decoding hidden state, in order to determine an output word.

In some embodiments, the output unit is configured to determine, based on the external information, those candidate output words whose probability is equal to or higher than an output probability threshold and which belong to the external information as the candidate output words of the current time step.
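The filtering rule in this embodiment can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function name `select_candidates`, the dictionary representation of the probability distribution, and the toy probabilities are all assumptions.

```python
def select_candidates(word_probs, external_words, prob_threshold):
    """Keep words whose probability meets the threshold and that occur in the external information."""
    external = set(external_words)
    return {
        word: p
        for word, p in word_probs.items()
        if p >= prob_threshold and word in external
    }


# Toy output word probability distribution for one time step (made-up values).
word_probs = {"stock": 0.40, "market": 0.30, "the": 0.25, "falls": 0.05}
external_words = ["market", "falls", "sharply"]  # e.g. words taken from a title

candidates = select_candidates(word_probs, external_words, prob_threshold=0.1)
print(candidates)
```

Here "falls" belongs to the external information but is dropped because its probability is below the threshold, while "stock" passes the threshold but is dropped because it is not in the external information.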

In some embodiments, the output unit is further configured to determine a candidate probability for each candidate output word based on the joint probability of the candidate output word of the current time step and the candidate sequence determined at the previous time step, and on the similarity between that candidate sequence and the external information, and to determine a predetermined number of candidate output words with the highest candidate probabilities as output words.
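This embodiment resembles a beam search whose scores are biased toward the external information. The sketch below shows one such step under stated assumptions: the text does not say how the joint probability and the similarity are combined, so the additive rule `log joint + weight * similarity`, the word-overlap similarity, and all names (`expand_beam`, `overlap_similarity`, `toy_step_probs`) are hypothetical.

```python
import math


def overlap_similarity(sequence, external_words):
    """Fraction of external words already present in the sequence (hypothetical measure)."""
    external = set(external_words)
    if not external:
        return 0.0
    return len(external & set(sequence)) / len(external)


def expand_beam(beam, step_probs, external_words, beam_size, weight=1.0):
    """One search step.

    beam: list of (sequence, log_prob) pairs kept at the previous time step.
    step_probs: function mapping a sequence to {word: P(word | sequence)}.
    """
    scored = []
    for sequence, log_prob in beam:
        # Similarity between the previous candidate sequence and the external information.
        sim = overlap_similarity(sequence, external_words)
        for word, p in step_probs(sequence).items():
            joint = log_prob + math.log(p)  # log of the joint probability
            score = joint + weight * sim    # assumed combination rule
            scored.append((score, sequence + [word], joint))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(seq, joint) for _, seq, joint in scored[:beam_size]]


def toy_step_probs(sequence):
    # Made-up next-word distribution that ignores the history.
    return {"falls": 0.5, "rises": 0.3, "today": 0.2}


beam = [(["shares"], math.log(0.6)), (["market"], math.log(0.4))]
new_beam = expand_beam(beam, toy_step_probs, ["market", "falls"], beam_size=2)
for seq, joint in new_beam:
    print(seq, round(joint, 3))
```

In this toy run the sequence starting with "market" overtakes the one starting with "shares", although its joint probability is lower, because it already overlaps with the external information.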

In some embodiments, the encoding unit is further configured to encode the external information to obtain an external information encoding hidden state, and the output unit is configured to determine the similarity between the external information encoding hidden state and the decoding hidden state and, when the similarity is equal to or greater than the similarity threshold of the current time step, to output the external information as the output word.

In some embodiments, the output unit is further configured, when the similarity is smaller than the similarity threshold of the current time step, to determine the word with the highest probability in the output word probability distribution as the output word of the current time step, and to adjust the similarity threshold of the current time step so as to determine an adjusted similarity threshold that is smaller than the similarity threshold of the current time step and is used as the similarity threshold of the next time step.
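The two embodiments above describe a threshold test with a threshold that shrinks whenever the external information is not emitted. A minimal sketch follows, assuming cosine similarity between two hidden states of equal dimension and a multiplicative decay factor; neither choice is stated in the text, and `choose_output` and `decay` are hypothetical names.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def choose_output(external_state, decoding_state, word_probs, external_text,
                  threshold, decay=0.9):
    """Return (output, next_threshold) for one time step."""
    similarity = cosine_similarity(external_state, decoding_state)
    if similarity >= threshold:
        # Similar enough: emit the external information itself.
        return external_text, threshold
    # Otherwise emit the most probable word and lower the threshold for the next step.
    best_word = max(word_probs, key=word_probs.get)
    return best_word, threshold * decay


external_state = [1.0, 0.0]
decoding_state = [0.6, 0.8]  # cosine similarity with external_state is about 0.6
word_probs = {"the": 0.5, "market": 0.3, "falls": 0.2}

out1, th1 = choose_output(external_state, decoding_state, word_probs,
                          "market falls", threshold=0.65)
out2, th2 = choose_output(external_state, decoding_state, word_probs,
                          "market falls", threshold=th1)
print(out1, th1)
print(out2, th2)
```

With the same pair of states, the first step falls below the threshold and emits the most probable word; the lowered threshold then lets the second step emit the external information.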

In some embodiments, the text processing apparatus further includes an attention generation unit configured to determine the attention distribution of the current time step based on the external information, the source text encoding hidden state, and the decoding hidden state, and the output unit is configured to determine the output word probability distribution based on the attention distribution, the source text encoding hidden state, and the decoding hidden state, in order to determine the output word.

In some embodiments, the encoding unit and the decoding unit are trained by: encoding a training source text to obtain a training source text encoding hidden state; determining a training decoding hidden state; determining the output word of the current time step based on the external information, the training source text encoding hidden state, and the training decoding hidden state; and adjusting the parameters of the encoding unit and the decoding unit so that the difference between the training output word and the words contained in the external information is minimized.

According to another aspect of the present disclosure, there is further provided a text processing method including: encoding a source text to obtain a source text encoding hidden state; determining a decoding hidden state; and determining an output word probability distribution based on external information, the source text encoding hidden state, and the decoding hidden state, in order to determine an output word.

In some embodiments, the method further includes determining, based on the external information, those candidate output words whose probability is equal to or higher than an output probability threshold and which belong to the external information as the candidate output words of the current time step.

In some embodiments, the method further includes determining a candidate probability for each candidate output word based on the joint probability of the candidate output word of the current time step and the candidate sequence determined at the previous time step, and on the similarity between that candidate sequence and the external information, and determining a predetermined number of candidate output words with the highest candidate probabilities as output words.

In some embodiments, the method further includes: encoding the external information to obtain an external information encoding hidden state; and determining the similarity between the external information encoding hidden state and the decoding hidden state and, when the similarity is equal to or greater than the similarity threshold of the current time step, outputting the external information as the output word.

In some embodiments, the method further includes, when the similarity is smaller than the similarity threshold of the current time step, determining the word with the highest probability in the output word probability distribution as the output word of the current time step, and adjusting the similarity threshold of the current time step so as to determine an adjusted similarity threshold that is smaller than the similarity threshold of the current time step and is used as the similarity threshold of the next time step.

In some embodiments, the method further includes determining the attention distribution of the current time step based on the external information, the source text encoding hidden state, and the decoding hidden state, and the step of determining the output word probability distribution based on the external information, the source text encoding hidden state, and the decoding hidden state includes determining the output word probability distribution based on the attention distribution, the source text encoding hidden state, and the decoding hidden state, in order to determine the output word.

According to yet another aspect of the present disclosure, there is provided a text processing device including a processor and a memory storing computer-readable program instructions, wherein the text processing method described above is performed when the computer-readable program instructions are executed by the processor.

According to still another aspect of the present disclosure, there is provided a computer-readable recording medium on which computer-readable instructions are recorded, the instructions, when executed by a computer, causing the computer to perform the text processing method described above.

With the text processing method, apparatus, device, and computer-readable recording medium according to the present disclosure, the attention distribution of the current time step is determined using external information, and/or the output word of the current time step is determined based on external information, during text generation. The content of the external information is thereby effectively taken into account in the text processing process, the probability that the external information is generated is increased, and the quality of the text generated when external information is considered is improved.

Embodiments of the present disclosure are described in more detail below with reference to the drawings, from which the above and other objects, features, and advantages of the present disclosure will become more apparent. The drawings are provided for a further understanding of the embodiments, form part of the specification, and are used together with the embodiments to explain the present disclosure; they do not limit the present disclosure. In the drawings, the same reference numerals generally denote the same components or steps.

A schematic block diagram of the text processing apparatus according to the present disclosure.
A schematic example of determining candidate output words based on the output probability distribution according to an embodiment of the present application.
A further schematic example of determining candidate output words based on the output probability distribution according to an embodiment of the present application.
A schematic block diagram of the attention generation unit according to an embodiment of the present disclosure.
A schematic process by which the attention generation unit according to an embodiment of the present application determines the attention distribution of the current time step.
Another schematic block diagram of the attention generation unit according to an embodiment of the present application.
Another schematic block diagram of the text processing apparatus according to an embodiment of the present application.
A schematic flowchart of the text processing method according to the present application.
A schematic flowchart of determining the attention distribution of the current time step based on external information according to an embodiment of the present application.
Another schematic flowchart of determining the attention distribution of the current time step based on external information according to an embodiment of the present application.
A schematic flowchart of another text processing method according to an embodiment of the present application.
An exemplary flowchart of yet another text processing method according to an embodiment of the present application.
A schematic diagram of a computer device according to an embodiment of the present disclosure.

The technical solutions of the embodiments of the present disclosure are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art from the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.

Unless otherwise defined, the technical and scientific terms used herein have the ordinary meaning understood by a person skilled in the art. The words "first", "second", and the like used in the present application do not indicate any order, quantity, or importance, and merely distinguish different components. Similarly, words such as "include" and "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connect" and "interconnect" are not limited to physical or mechanical connections and also include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", and the like merely indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.

The principle of the present disclosure is described below using the generation of a text summary as an example. However, a person skilled in the art will understand that the method provided by the present disclosure may also be applied to other text processing processes, such as text conversion and machine translation, without departing from the principle of the present disclosure.

FIG. 1 shows a schematic block diagram of the text processing apparatus according to the present disclosure. As shown in FIG. 1, the text processing apparatus 100 may include an encoding unit 110, a decoding unit 120, an attention generation unit 130, and an output unit 140. The text processing apparatus 100 can perform text processing on a source text I and generate a corresponding text processing result. For example, the text processing apparatus 100 may be used to generate a summary of the source text I. The source text I may include at least one sentence, and each sentence includes at least one word.

The text processing apparatus 100 provided by the present disclosure can receive external information and perform the text processing process on the source text based on the external information. In some embodiments, the external information is predefined text information that is expected to appear in the processing result of the source text. In some examples, the external information is at least one word or sentence in the source text. In other examples, the external information is a word or sentence at a predetermined position in the source text, for example the first sentence, the last sentence, or the text information at any other designated position. In still other examples, the external information is additional text associated with the source text, for example the title of the source text. In some implementations, the external information may be additional text determined based on user input. The present application does not limit how the external information is determined; in practice, the external information used in the text processing process may be determined in any feasible way.

When the source text is processed with the text processing apparatus 100, taking the external information into account at each stage of text processing increases the probability that the external information appears in the text processing result. For example, when the title of an article in the source text is determined as the external information, the words and/or sentences of the article title always appear, or are very likely to appear, in the summary of the source text output by the text processing apparatus 100 provided by the present application.

When a text processing method is executed on a computer, the computer often cannot process text data directly, so the source text must first be converted into numerical data before the source text and/or external information is processed.

In some embodiments, the source text I is given in natural language form. In this case, the text processing apparatus 100 may further include a preprocessing unit (not shown). The preprocessing unit may be used to convert the source text into numerical data before the source text is input to the encoding unit. For example, word segmentation may be performed on each sentence in the source text I to divide each sentence into a plurality of words. The words obtained by word segmentation may then each be converted into a word vector of a specific dimension, for example by word embedding.
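The preprocessing described above (word segmentation followed by word embedding) can be sketched as below. This is only an illustration under assumptions: whitespace splitting stands in for a real word segmenter, the embedding table is random rather than learned, and all function names are hypothetical.

```python
import random


def tokenize(sentence):
    """Whitespace tokenizer; a stand-in for a real word segmenter."""
    return sentence.lower().split()


def build_vocab(sentences):
    """Assign an integer id to every distinct word; id 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for word in tokenize(sentence):
            vocab.setdefault(word, len(vocab))
    return vocab


def make_embedding_table(vocab_size, dim, seed=0):
    """Random embedding table; in practice the vectors would be learned."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def embed(sentence, vocab, table):
    """Map each word of the sentence to its word vector, falling back to <unk>."""
    return [table[vocab.get(word, 0)] for word in tokenize(sentence)]


source_text = ["The market falls sharply", "Investors react to the news"]
vocab = build_vocab(source_text)
table = make_embedding_table(len(vocab), dim=4)
vectors = embed("The market rallies", vocab, table)
print(len(vectors), len(vectors[0]))
```

The same conversion can be applied to the external information so that both inputs are represented as sequences of word vectors.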

Similarly, the external information may be converted to obtain at least one word vector corresponding to the external information for use in subsequent text processing.

In some embodiments, the source text I according to the present disclosure may be given in the form of numerical data; for example, the source text I may be represented by at least one word vector. In this case, the encoding unit 110 may process the source text I directly, and the natural language may be preprocessed by a preprocessing apparatus provided separately from the text processing apparatus 100.

The following description does not distinguish whether the external information and the source text are in natural language form or in the form of numerical data. When external information and/or source text in natural language form has to be processed by a computer, a person skilled in the art may convert it into numerical data as needed.

The encoding unit 110 may be configured to encode the source text I to be processed in order to obtain a source text encoding hidden state h.

In some embodiments, the encoding unit 110 may be implemented as an encoding network. An exemplary encoding network includes a long short-term memory (LSTM) network; LSTM-based systems can be applied to tasks such as machine translation and text summary generation. The encoding network may be implemented as any machine learning model capable of encoding word vectors.

For example, when at least one word vector corresponding to the source text I is taken as input, the encoding unit may output source text encoding hidden states h_1, h_2, h_3, ... corresponding respectively to the word vectors x_1, x_2, x_3, .... The number of source text encoding hidden states may be the same as or different from the number of word vectors of the source text. For example, when k word vectors are generated from the source text I, the encoding unit may process these k word vectors to generate k corresponding source text encoding hidden states, where k is an integer greater than 1.

The decoding unit 120 may be used to determine a decoding hidden state s. In some embodiments, the decoding unit 120 may receive the decoding hidden state s_{t-1} of the previous time step t-1 and the output word x_t obtained by the text processing apparatus at the previous time step, and process s_{t-1} and x_t to obtain the decoding hidden state s_t of the current time step. In the processing of the first time step, s_0 and x_1 may be set to default initial values. The decoding hidden state s may include a plurality of decoding hidden states s_1, s_2, s_3, ... corresponding to the source text I.
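The recurrence described above, in which each decoding hidden state is computed from the previous state and the current input, can be sketched with a toy cell. This sketch assumes a plain tanh recurrent cell instead of an LSTM, uses made-up weights, and feeds fixed input vectors; in the apparatus, x_t would be the word vector of the previously output word. The name `rnn_step` is hypothetical.

```python
import math


def rnn_step(prev_state, x, W_s, W_x, b):
    """s_t = tanh(W_s s_{t-1} + W_x x_t + b), a simple stand-in for an LSTM cell."""
    return [
        math.tanh(
            sum(w * s for w, s in zip(W_s[i], prev_state))
            + sum(w * xi for w, xi in zip(W_x[i], x))
            + b[i]
        )
        for i in range(len(b))
    ]


# Made-up parameters: state dimension 2, input dimension 2.
W_s = [[0.5, 0.0], [0.0, 0.5]]
W_x = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]

state = [0.0, 0.0]  # s_0, the default initial state
inputs = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4]]  # x_1, x_2, x_3

states = []
for x in inputs:
    state = rnn_step(state, x, W_s, W_x, b)
    states.append(state)  # s_1, s_2, s_3

for s in states:
    print([round(value, 4) for value in s])
```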

In some embodiments, the decoding unit 120 may be implemented as a decoding network. An exemplary decoding network includes a long short-term memory network. The decoding network may be implemented as any machine learning model capable of decoding the output of the encoding network.

In some embodiments, the encoding network and decoding network described above can be represented as a sequence-to-sequence (Seq2Seq) model, which may be used to convert one input sequence, for example "WXYZ" (e.g., an input text), into another output sequence, for example "AXY" (e.g., a text summary).

The attention generation unit 130 may be configured to determine an attention distribution A based on the source text encoding hidden state h and the decoding hidden state s, and to output the attention distribution A for the subsequent text processing of the current time step.

In some embodiments, the attention distribution A^t of the current time step t may be the encoding attention distribution of the source text.

In some implementations, at each time step t, the encoding attention distribution a^t of the source text for the current time step may be determined using the source text encoding hidden state h_t and the decoding hidden state s_t of the current time step. For example, a^t may be determined using equations (1) and (2).

a^t = softmax(e^t)    (1)

Here t denotes the current time step, softmax denotes the normalized exponential function, and e^t is determined by equation (2), in which i is the index of a word vector, h_i is the source text encoding hidden state corresponding to the i-th word vector, v^T, W_h, W_s, and b_attn are learnable parameters to be trained, h is the source text encoding hidden state of the current time step, and s_t is the decoding hidden state of the current time step.
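Equation (2) is not reproduced in this text. Given the listed parameters v^T, W_h, W_s, and b_attn, one plausible reading is the additive attention score e_i^t = v^T tanh(W_h h_i + W_s s_t + b_attn); the tanh nonlinearity in particular is an assumption, not something the text states. The sketch below computes a^t under that assumption with made-up parameter values.

```python
import math


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_distribution(encoder_states, s_t, v, W_h, W_s, b_attn):
    """a^t = softmax(e^t), with e_i^t = v . tanh(W_h h_i + W_s s_t + b_attn) (assumed form)."""
    projected_s = matvec(W_s, s_t)
    scores = []
    for h_i in encoder_states:
        projected_h = matvec(W_h, h_i)
        hidden = [math.tanh(ph + ps + b)
                  for ph, ps, b in zip(projected_h, projected_s, b_attn)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    return softmax(scores)


# Made-up values: three encoder states of dimension 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_t = [0.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
attention = attention_distribution(encoder_states, s_t, v=[1.0, 1.0],
                                   W_h=identity, W_s=identity, b_attn=[0.0, 0.0])
print([round(a, 4) for a in attention])
```

The result has one weight per source position and sums to 1, which is what equation (1) requires regardless of the exact form of the score.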

In other embodiments, the attention generation unit 130 may determine an attention distribution A^t of the current time step that includes external information, based on the external information and the attention distribution a^t of the source text determined by equation (1), and may output the attention distribution A^t including the external information for processing by the subsequent units.

In some implementations, the attention distribution A^t of the current time step including external information may be determined by adjusting the attention distribution a^t of the source text using the external information.

In other implementations, the attention distribution A^t of the current time step including external information may include both the attention distribution a^t of the source text and an attention distribution a'^t of the external information.

The process of determining the attention distribution A^t including external information is described later with reference to FIGS. 3A, 3B, and 4, and is not detailed here.

The output unit 140 may be configured to determine an output word probability distribution based on the attention distribution A, the source text encoding hidden state h, and the decoding hidden state s, in order to determine the output word O of the current time step.

The output word probability distribution may include a generation probability distribution P_vocab. The generation probability distribution P_vocab may be determined by equations (3) and (4).

V', V, b, and b' are learnable parameters of the output unit that are subject to training, and h_t^* is the context vector determined based on the attention distribution a^t. For example, h_t^* may be determined by equation (4).

A_i^t is the i-th element of the attention distribution A^t output from the attention generation unit, and h_i is the source-text encoding hidden state of the i-th word vector.
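A plausible form of equations (3) and (4) can be reconstructed from the symbol definitions above. The sketch below assumes the standard pointer-generator formulation, which the source does not confirm; [s_t, h_t^*] denotes the concatenation of the decoding hidden state and the context vector.

```latex
\begin{align}
P_{vocab} &= \mathrm{softmax}\bigl(V'\,(V\,[s_t,\, h_t^{*}] + b) + b'\bigr) \tag{3} \\
h_t^{*} &= \sum_i A_i^{t}\, h_i \tag{4}
\end{align}
```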

In some embodiments, the output word probability distribution may include the attention distribution A^t output from the attention generation unit 130.

For example, the output word probability distribution may be determined as a weighted sum of the generation probability distribution and the attention distribution A.

In some implementations, a weighting coefficient P_gen for the generation probability distribution and the attention distribution may be determined based on the source-text encoding hidden state, the decoding hidden state, and the attention distribution of the current time step, together with the input x_t of the decoding unit at the current time step.

For example, the weighting coefficient P_gen used for the weighted sum of the generation probability distribution and the attention distribution may be expressed by equation (5).

σ denotes an activation function such as the sigmoid function; w_h^T, w_s^T, w_x^T, and b_ptr are training parameters; h_t^* is the parameter determined by equation (4) at time step t; s_t is the decoding hidden state at time step t; and x_t is the input of the decoding unit at time step t, that is, the output of the output unit at the immediately preceding time step t-1. The weighting coefficient P_gen determined by equation (5) may be implemented as a scalar. The output word probability distribution can be obtained by taking a weighted average of the generation probability distribution P_vocab and the attention distribution A^t using the weighting coefficient P_gen.
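The weighting step can be illustrated with a minimal Python sketch. It assumes that equation (5) has the form P_gen = σ(w_h^T h_t^* + w_s^T s_t + w_x^T x_t + b_ptr), which matches the listed parameters, and that the attention weight of each source position is added to the probability of the word at that position with weight (1 - P_gen). Both points follow the usual pointer-generator convention and are assumptions; the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def generation_weight(w_h, w_s, w_x, b_ptr, h_star, s_t, x_t):
    """Assumed form of equation (5): a scalar weight in (0, 1)."""
    return sigmoid(dot(w_h, h_star) + dot(w_s, s_t) + dot(w_x, x_t) + b_ptr)


def mix_distributions(p_gen, p_vocab, attention, source_tokens):
    """Weighted average of P_vocab and the attention distribution.

    p_vocab maps word -> probability; attention[i] is the attention
    on source_tokens[i] (assumption: attention is copied onto words).
    """
    mixed = {word: p_gen * p for word, p in p_vocab.items()}
    for token, a in zip(source_tokens, attention):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * a
    return mixed
```

A source word that is absent from the vocabulary distribution still receives probability mass through its attention weight, which is how words from the source text can reach the output.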

When the attention distribution A^t includes both the attention distribution a^t of the source text and the attention distribution a'^t of the external information, the weighting coefficient parameters of a^t and a'^t may be the same or different. A method for determining the respective weighting coefficient parameters of the generation probability distribution P_vocab, the source-text attention distribution a^t, and the external-information attention distribution a'^t is described below with reference to FIG. 4 and is not detailed here.

In some embodiments, the output unit 140 may determine the word with the highest probability in the output word probability distribution as the output word of the current time step.

In some other embodiments, the output unit 140 may determine and output a word probability distribution based on the external information, and determine the output word of the current time step.

In some implementations, based on the external information, the output unit 140 determines, from among the candidate output words, a word whose probability is equal to or higher than an output probability threshold and which belongs to the external information as a candidate output word of the current time step. In some examples, the output unit 140 may determine the candidate output words using the principle of beam search.

For example, the output unit 140 may determine at least two words at each time step as candidate output words of the current time step, and then use those candidate output words in the text processing of the next time step. Similarly, at the next time step the output unit 140 may again determine at least two candidate output words.

Specifically, taking the case of two candidate output words as an example, two candidate output words a and b can be output at time step t. The candidate output words a and b are then used in the text processing of the next time step to determine the candidate output words c and d of time step t+1.

FIGS. 2A and 2B show exemplary embodiments in which candidate output words are determined based on the output probability distribution according to embodiments of the present application.

In some embodiments, when determining the candidate output words of each time step, a predetermined number M of words having the highest output probabilities in the output probability distribution are determined as the candidate output words (M is 2 in the example above). M is an integer greater than or equal to 2.

In the output word probability distribution shown in FIG. 2A, the two words with the highest output probabilities are w3 and w11, so w3 and w11 are determined as the candidate output words.

In some other embodiments, when determining the candidate output words of each time step, the N words with the highest output probabilities in the output probability distribution are first selected by a predefined method, and M of these N words are then determined as the candidate output words. N is an integer greater than M. In some implementations, the value of N may be specified in advance.

In some other implementations, an output probability threshold may be determined in advance, and M words may be determined as the candidate output words from among the N words whose output probabilities exceed the output probability threshold.

If none of the N words with the highest output probabilities belongs to the external information, the M words with the highest output probabilities among these N words may be determined as the candidate output words.

If some of the N words with the highest output probabilities belong to the external information, let n be the number of such words. If n is greater than or equal to M, the M words that belong to the external information and have the highest output probabilities among these N words are determined as the candidate output words. If n is smaller than the predetermined number M, the n words that belong to the external information, together with the M-n words having the highest output probabilities among the remaining N-n words, are determined as the candidate output words.

As shown in FIG. 2B, the two words with the highest output probabilities in the output word probability distribution are w3 and w11; the words whose output probabilities exceed the preset output probability threshold are w3, w7, and w11; and w2 and w7 belong to the external information.

In this case, because w7 belongs to the external information and its output probability exceeds the output probability threshold, w7 and w11 are selected as the candidate output words, rather than w3, which has a higher output probability.

In this way, the probability that a word in the external information is determined as an output word can be increased.
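The selection rule described above (top N words, preference for external-information words, M candidates kept) can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name and the probability values in the usage note are hypothetical.

```python
def select_candidates(probs, external_words, m, n):
    """Pick m candidate words from the n most probable words,
    preferring words that belong to the external information."""
    # The n words with the highest output probability, best first.
    top_n = sorted(probs, key=probs.get, reverse=True)[:n]
    in_external = [w for w in top_n if w in external_words]
    if len(in_external) >= m:
        # Enough external words: keep the m most probable of them.
        return in_external[:m]
    # Otherwise keep all external words and fill up with the most
    # probable of the remaining words.
    rest = [w for w in top_n if w not in external_words]
    return in_external + rest[: m - len(in_external)]
```

With illustrative probabilities such as `{"w11": 0.30, "w3": 0.25, "w7": 0.15, "w2": 0.02}`, external words `{"w2", "w7"}`, `m=2`, and `n=3`, the function keeps w7 and w11, which mirrors the FIG. 2B example; with no external words it falls back to w11 and w3.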

Using the candidate output words a and b output at time step t and the candidate output words c and d of time step t+1, at least four output candidate sequences ac, ad, bc, and bd can be determined. The output probability of each output candidate sequence is determined as a joint probability, and the two sequences with the highest output probabilities among the four candidates ac, ad, bc, and bd are determined as the candidate texts after time step t+1.

For example, the output probabilities of the candidate output words a, b, c, and d can be denoted P_a, P_b, P_c, and P_d, respectively. The output probabilities of the output candidate sequences ac, ad, bc, and bd can then be expressed as P_ac = P_a * P_c, P_ad = P_a * P_d, P_bc = P_b * P_c, and P_bd = P_b * P_d, respectively. If P_ac > P_ad > P_bc > P_bd, the output sequences ac and ad are used for the subsequent text processing at time step t+1.
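The joint-probability ranking in this example can be written as a short Python sketch. It follows the simplification used in the text, where the probabilities of c and d do not depend on the preceding word; the function name and the numeric values in the usage note are hypothetical.

```python
from itertools import product


def extend_beams(prev, nxt, beam_width):
    """prev and nxt map word -> output probability at steps t and t+1.

    Returns the beam_width sequences with the highest joint
    probability, best first, as ((word_t, word_t1), probability).
    """
    scored = {
        (a, c): p_a * p_c
        for (a, p_a), (c, p_c) in product(prev.items(), nxt.items())
    }
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    return ranked[:beam_width]
```

For instance, with P_a = 0.8, P_b = 0.2, P_c = 0.6, and P_d = 0.4 the ordering is P_ac > P_ad > P_bc > P_bd, so `extend_beams` with a beam width of 2 keeps the sequences ac and ad.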

In some embodiments, the output candidate sequences may further be determined based on the external information. For example, a penalty value for an output candidate sequence can be determined using equation (6). The penalty value determined by equation (6) can be used to adjust the joint output probability of the output candidate sequence.

P(y_t | x) denotes the probability of outputting the word x at time step t, h denotes the external information, and sim(y_<t, h) denotes the similarity between the candidate text sequence generated before time step t and the external information.

In one implementation, any suitable text similarity algorithm may be used to determine the similarity between the candidate text sequence generated before time step t and the external information. For example, cosine similarity can be used for this purpose.

According to equation (6), the higher the similarity between the candidate text sequence generated before time step t and the external information, the more the penalty value increases the output probability of the output candidate sequence. In some implementations, the penalty value s(x, y) is multiplied by or added to the output probability of the output candidate sequence, so that the output candidate sequence is determined based on the similarity between the candidate text sequence generated before time step t and the external information.
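The idea of rescoring a candidate sequence by its similarity to the external information can be sketched in Python. The exact form of equation (6) is not available here, so the factor (1 + sim) below is purely an assumption chosen for illustration, as are the bag-of-words cosine similarity and the function names.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists (bag-of-words counts)."""
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(count_a[w] * count_b[w] for w in count_a)
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rescore(joint_prob, generated_tokens, external_tokens):
    """Raise the score of a candidate sequence that resembles the
    external information. The factor (1 + sim) is a hypothetical
    stand-in for the penalty value of equation (6)."""
    sim = cosine_similarity(generated_tokens, external_tokens)
    return joint_prob * (1.0 + sim)
```

A candidate that shares no words with the external information keeps its joint probability unchanged, while a candidate identical to it has its score doubled under this assumed form.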

That is, by determining the penalty value applied to the output candidate sequences based on the external information, the probability that the external information appears in the candidate text sequences can be increased. This in turn increases the probability that the external information appears in the text processing result that is finally output.

In another implementation, the output unit may be further configured to determine the similarity between the external information and the source-text encoding hidden state, and to determine the word to be output at the current time step based on that similarity.

For example, the encoding unit 110 may be used to encode the external information and obtain an external-information encoding hidden state.

The output unit 140 may be configured to determine the similarity between the external-information encoding hidden state and the decoding hidden state. If this similarity is equal to or higher than a predefined similarity threshold, the output unit outputs the external information as the output of the current time step.

If the external information is a word, it may be output as the word of the current time step. If the external information is a sentence, it may be inserted directly after the text sequence generated before the current time step t.

Note that the text sequence already generated before the current time step t may be generated from the word with the highest probability in the output probability distribution, or from several candidate output words with the highest probabilities in the output probability distribution. The candidate output words may be determined by the process described in the implementations above, which is not repeated here.

If the similarity between the external-information encoding hidden state and the decoding hidden state is smaller than the predefined similarity threshold, the output unit determines the output word probability distribution of the current time step based on the results output from the decoding unit and the attention generation unit, and determines the output word of the current time step based on that distribution.

With the above method, when the similarity between the result output from the decoding unit and the external information is relatively high, the result output from the decoding unit may be replaced directly with the external information. In that case, the text sequence determined after the output of the current time step is the text sequence determined after the output of the immediately preceding time step with the external information inserted after it.
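The threshold decision described above can be summarized in a small Python sketch. How the similarity between the two hidden states is computed is left open here; the function name and the return convention are hypothetical.

```python
def choose_output(similarity, threshold, external_words, distribution):
    """Return (output_words, inserted_external).

    similarity: similarity between the external-information encoding
    hidden state and the decoding hidden state (computed elsewhere).
    distribution: maps word -> output probability at this time step.
    """
    if similarity >= threshold:
        # Similar enough: output the external information itself.
        return list(external_words), True
    # Otherwise fall back to the most probable word.
    best = max(distribution, key=distribution.get)
    return [best], False
```

The boolean flag lets a caller stop repeating the similarity comparison once the external information has been inserted.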

Then, when the next time step is processed, the decoding unit is used to encode the external information and obtain the decoding hidden state of the next time step. Subsequent decoding can thus make use of the external information, which ensures semantic consistency between the results obtained in later decoding and the inserted external information.

If the external information is a single word, the decoding hidden state of the immediately preceding time step and the external information are processed as the input of the decoding unit to obtain the decoding hidden state of the current time step.

If the external information contains a plurality of words, the decoding unit performs several loop iterations. In the first iteration, the input of the decoding unit is the decoding hidden state of the immediately preceding time step and the first word of the external information; in each subsequent iteration, the input is the decoding hidden state obtained in the previous iteration and the next word of the external information. The iterations process the external information word by word and yield a decoding hidden state that incorporates all of the external information, which is used as the decoding hidden state of the current time step.
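The word-by-word loop can be sketched as follows. The `step` argument stands in for one application of the decoding unit and is a hypothetical placeholder; any callable that maps a hidden state and a word to a new hidden state will do.

```python
def decode_with_external(step, s_prev, external_words):
    """Feed the external information through the decoder word by word.

    step: callable (hidden_state, word) -> new hidden_state.
    s_prev: decoding hidden state of the preceding time step.
    Returns the hidden state after all external words are consumed.
    """
    state = s_prev
    for word in external_words:
        # Each iteration uses the state from the previous iteration
        # and the next word of the external information.
        state = step(state, word)
    return state
```

The single-word case is the same loop with one iteration, so both cases in the text are covered by one function.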

In some implementations, once the result output from the decoding unit has been replaced with the external information and inserted into the text processing result, the similarity comparison between the external-information encoding hidden state and the decoding hidden state is no longer performed.

In some examples, the similarity threshold may be implemented as a function of the time step t.

As described above, if the similarity between the external-information encoding hidden state and the decoding hidden state is smaller than the predefined similarity threshold, the output of the decoding unit is not replaced with the external information, and the output result is determined based on the output word probability distribution. In this case, to increase the probability that the external information appears in the final text processing result, the similarity threshold of the current time step is adjusted to obtain an adjusted similarity threshold. The adjusted similarity threshold is smaller than the similarity threshold of the current time step and is used as the similarity threshold of the next time step.

For example, the similarity threshold is adjusted using equation (7).

ε_{SIM,t+1} is the similarity threshold used at time step t+1, ε_{SIM,t} is the similarity threshold used at time step t, and f(t) is a monotonically decreasing function of time t. For example, f(t) may be implemented as in equation (8).

t is the current time step, k is the length of the source text, and e is the base of the natural logarithm. In some variations, k may be expressed as a function of the source text length. For example, k may be expressed as the product of β and the source text length, where β is a predefined parameter greater than 0 and less than 1.

With the above method, the similarity threshold is adjusted downward monotonically at each time step, so that it falls to a lower level as text processing proceeds. This increases the probability that the similarity between the external information and the output of the decoding unit exceeds the similarity threshold of the current time step, and therefore the probability that the external information appears in the final text processing result.
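The threshold schedule can be illustrated with a short Python sketch. The exact forms of equations (7) and (8) are not available here, so the sketch assumes, purely for illustration, ε_{SIM,t+1} = f(t) * ε_{SIM,t} with f(t) = e^(-t/k). These forms satisfy the stated properties (f decreases monotonically in t, and the adjusted threshold is smaller than the current one for t >= 1) but may differ from the original equations.

```python
import math


def decay_factor(t, k):
    """Hypothetical f(t): decreases monotonically as t grows."""
    return math.exp(-t / k)


def next_threshold(eps_t, t, k):
    """Hypothetical update: the threshold for step t + 1 is the
    threshold for step t scaled by f(t)."""
    return eps_t * decay_factor(t, k)


def scaled_length(beta, source_length):
    """Variant in the text: k as beta times the source length,
    with 0 < beta < 1."""
    return beta * source_length
```

Under these assumptions a longer source text (larger k) makes the threshold fall more slowly, so the external information is not forced into the output too early.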

In the process of generating a text summary, the text processing apparatus 100 provided by the present application uses the external information to determine the attention distribution of the current time step and/or determines the output word of the current time step based on the external information. The content of the external information is thereby taken into account effectively during text processing, which increases the probability that the external information is generated during text generation and improves the quality of text generation when external information is considered.

When implementing the text processing apparatus 100 provided by the present application, those skilled in the art may combine the above technical solutions in any manner. For example, when the text processing apparatus 100 processes a source text, only the attention generation unit may use the external information, generating an attention distribution that incorporates the external information for use in subsequent text processing, while the subsequent text processing does not consider the external information. As another example, only the output unit may use the external information to determine the word to be output at the current time step, while the preceding encoding, decoding, and attention generation do not consider the external information. Furthermore, the external information may be considered both when the attention generation unit generates the attention distribution of the current time step and when the output unit determines the word to be output at the current time step, which further increases the likelihood that the text processing result includes the external information.

FIG. 3A shows a schematic block diagram of an attention generation unit according to an embodiment of the present disclosure. The attention generation unit 300 shown in FIG. 3A can adjust the source-text attention distribution a^t based on the external information and determine an attention distribution A' that incorporates the external information.

As shown in FIG. 3A, the attention generation unit 300 includes a source text attention determination unit 310 and a content selection unit 320.

The source text attention determination unit 310 is used to determine the encoding attention distribution a^t of the source text based on the source-text encoding hidden state and the decoding hidden state. In some embodiments, the encoding attention distribution a^t of the source text can be determined using equation (1).

The content selection unit 320 may be used to determine a selection probability for each word in the source text. In some embodiments, the content selection unit 320 may be used to determine a selection probability distribution for the source text based on the external information, where the selection probability distribution contains the selection probability of each word in the source text.

In some embodiments, the content selection unit 320 may use a content selection network (for example, an LSTM network) to process the source text I and determine a first selection probability for each word in the source text.

The content selection network is trained using a reference text processing result ref (that is, a predetermined text processing result in the training data). During training, a tagging sequence generated from the source text I and the reference text processing result ref is input to the content selection network. The tagging sequence has the same length as the word sequence of the source text I, and the value of its i-th element indicates whether the i-th word of the source text I belongs to the reference text processing result ref. Trained in this way, the content selection network can process the source text I and output a first selection probability for each word in the source text. The first selection probability indicates the probability, according to the content selection network, that the word in the source text I is selected and appears in the final text processing result.
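Building the tagging sequence for training can be sketched in a few lines of Python. The sketch assumes that both texts are already tokenized into word lists and that membership is tested by exact word match; the function name is hypothetical.

```python
def tagging_sequence(source_tokens, reference_tokens):
    """Return one tag per source word: 1 if the word also occurs in
    the reference text processing result, otherwise 0."""
    reference_vocab = set(reference_tokens)
    return [1 if token in reference_vocab else 0 for token in source_tokens]
```

The result always has the same length as the source word sequence, as the text requires.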

In some embodiments, for at least one word in the source text that belongs to the external information, the selection probability of that word may be set to at least a predefined probability value λ. For example, the second selection probability of each word in the source text that belongs to the external information is set to the predefined probability value λ, and the second selection probability of every other word, which does not belong to the external information, is set to 0.

The selection probability of each word in the source text may be determined based on the first selection probability and the second selection probability. For example, the selection probability of each word may be determined as the sum of its first selection probability and its second selection probability. It follows that the selection probability of a word belonging to the external information is at least the predefined probability value λ.

Based on the selection probability distribution, the content selection unit 320 determines, for each word in the source text, the attention of that word according to its selection probability, thereby obtaining the attention distribution A. In some embodiments, the content selection unit 320 may be configured so that, if the selection probability of a word is lower than a predetermined selection probability threshold ε, the attention used for that word in the attention distribution of the current time step is set to zero. The content selection unit 320 may further be configured so that, if the selection probability of a word is equal to or higher than the predetermined selection probability threshold ε, the attention used for that word in the attention distribution of the current time step is set to the attention of that word in the encoding attention distribution a^t of the source text.

With the attention generation unit described above, a selection probability can be generated for each word in the source text. That is, when determining the attention of each word, at least both the magnitude of the attention calculated using equation (1) and the selection probability of the word should be considered. If the selection probability of a word is lower than the predetermined selection threshold, the word can be regarded as unlikely to appear at the current time step, so its attention need not be considered in subsequent text processing.
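The content selection step can be sketched in Python as follows. This is a minimal illustration under the description above: the selection probability of each word is the sum of the first selection probability q_i and λ for external-information words, and attention is kept only where that sum reaches ε. The function and parameter names are hypothetical, and no renormalization is applied because the text does not mention one.

```python
def select_attention(attention, first_prob, source_tokens,
                     external_words, lam, eps):
    """Zero out the attention of words whose selection probability
    is below eps; keep the source-text attention otherwise."""
    selected = []
    for a, q, token in zip(attention, first_prob, source_tokens):
        hint_tag = 1 if token in external_words else 0
        selection_prob = q + lam * hint_tag  # first + second probability
        selected.append(a if selection_prob >= eps else 0.0)
    return selected
```

With `lam` chosen larger than `eps`, a word from the external information can never be filtered out, whatever its first selection probability is.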

式（９）でコンテンツ選択部を利用して決定したアテンション分布の結果を示してもよい。 The attention distribution determined by the content selection unit may be expressed by equation (9).

式（９）の左辺はコンテンツ選択によって決定される現在タイムステップのアテンション分布であり、ｘは現在出力しようとする単語であり、ｊは現在タイムステップの番号であり、ｙ_{１：ｊ−１}は既に出力したテキスト系列、ｐ（ａ_ｊ ^ｉ｜ｘ，ｙ_{１：ｊ−１}）は、ソーステキストの符号化アテンション分布であり、例えばｐ（ａ_ｊ ^ｉ｜ｘ，ｙ_{１：ｊ−１}）は、前記式（１）、（２）によって計算された結果である。ｑは上記第一選択確率であり、λ＊ｈｉｎｔ＿ｔａｇは上記第二選択確率である。外部情報に属する第ｉ個の単語に対して、ｑ＋λｈｉｎｔ＿ｔａｇの第ｉ項の値はｑ_ｉ＋λであってもよく、外部情報に属してない第ｋ個の単語に対して、ｑ＋λｈｉｎｔ＿ｔａｇの第ｋ項の値はｑ_ｋであってもよい。 In equation (9), the left-hand side is the attention distribution of the current time step determined by content selection, x is the word currently to be output, j is the index of the current time step, y_{1:j-1} is the text sequence already output, and p(a_j^i | x, y_{1:j-1}) is the coded attention distribution of the source text, for example the result calculated by equations (1) and (2) above. q is the first selection probability and λ * hint_tag is the second selection probability. For the i-th word, which belongs to the external information, the i-th term of q + λ * hint_tag may be q_i + λ; for the k-th word, which does not belong to the external information, the k-th term of q + λ * hint_tag may be q_k.
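Equation (9) appears as an image in the original publication and is not reproduced in this text. A hedged reconstruction from the definitions in this section, assuming the left-hand side is written with a tilde and that the threshold test is applied to the summed selection probability q_i + λ * hint_tag_i, could read:

```latex
p(\tilde{a}_j^i \mid x, y_{1:j-1}) =
\begin{cases}
p(a_j^i \mid x, y_{1:j-1}) & \text{if } q_i + \lambda \cdot \mathrm{hint\_tag}_i \ge \varepsilon, \\
0 & \text{otherwise.}
\end{cases}
```

The tilde notation and the subscripted form of hint_tag are assumptions made for illustration; only the thresholding behavior itself is stated in the description.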

外部情報に含まれた単語の選択確率を少なくとも事前定義された確率値λに設定することで、事前定義された確率値が所定の選択確率閾値εより大きいと、外部情報における単語がコンテンツ選択のステップにフィルタリングされるのを防止でき、外部情報における単語が後続のテキスト処理プロセスに入ることを保証し、これによって外部情報における単語がテキスト処理結果に現れる確率を高めることができる。なお、一部の実装形態において、事前定義された確率値λは所定の選択確率閾値ε以下に設定してもよい。この場合では、単語毎の選択確率を上記第一選択確率と第二選択確率の和に決定することで、外部情報における単語の選択確率を高めることができ、かつ外部情報における単語がテキスト処理結果に現れる確率を高める効果が実装できる。 The selection probability of each word included in the external information is set to at least a predefined probability value λ. When λ is larger than the predetermined selection probability threshold ε, the words of the external information are prevented from being filtered out in the content selection step and are guaranteed to enter the subsequent text processing, which raises the probability that they appear in the text processing result. In some implementations, the predefined probability value λ may instead be set equal to or less than the threshold ε. In this case, determining the selection probability of each word as the sum of the first selection probability and the second selection probability still raises the selection probability of the words of the external information, and thus the probability that they appear in the text processing result.
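The thresholding described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_attention`, the list-based representation, and the example numbers are all assumptions introduced here.

```python
from typing import List


def select_attention(
    attention: List[float],
    first_prob: List[float],
    hint_tag: List[int],
    lam: float,
    eps: float,
) -> List[float]:
    """Keep the coded attention of a word only if its selection probability >= eps."""
    if not (len(attention) == len(first_prob) == len(hint_tag)):
        raise ValueError("all inputs must have one entry per source word")
    selected = []
    for a_i, q_i, tag_i in zip(attention, first_prob, hint_tag):
        # Selection probability = first selection probability q_i
        # plus second selection probability lam * hint_tag_i.
        p_select = q_i + lam * tag_i
        selected.append(a_i if p_select >= eps else 0.0)
    return selected


# Hypothetical four-word source text; the third word belongs to the external information.
attention = [0.4, 0.1, 0.3, 0.2]
first_prob = [0.9, 0.2, 0.1, 0.3]
hint_tag = [0, 0, 1, 0]
masked = select_attention(attention, first_prob, hint_tag, lam=0.6, eps=0.5)
```

With λ chosen above ε, the third word survives selection even though its first selection probability alone (0.1) is below the threshold.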

図３Ｂで図３Ａに示すアテンション生成部を利用して現在タイムステップのアテンション分布を決定する例示的プロセスを示している。 FIG. 3B shows an exemplary process of determining the current time step attention distribution using the attention generator shown in FIG. 3A.

図３Ｂに示すように、ソーステキストにおける四つの単語を例とし、コンテンツ選択ネットワークを利用して第一、三項目の単語のアテンションを選択し、後続のテキスト処理プロセスにおいて使用する。 As shown in FIG. 3B, taking four words in the source text as an example, the attentions of the first and third words are selected using the content selection network and used in the subsequent text processing process.

図４では本願の実施例のアテンション生成部を利用する他の概略的なブロック図を示している。図４に示すように、アテンション生成部４００は、ソーステキストアテンション決定部４１０と外部情報アテンション決定部４２０を含んでもよい。図４に示しているアテンション生成部を利用して決定したアテンション分布Ａは、ソーステキストのアテンション分布ａ^ｔと外部情報のアテンション分布ａ’^ｔとの双方を含む。 FIG. 4 shows another schematic block diagram of the attention generation unit according to an embodiment of the present application. As shown in FIG. 4, the attention generation unit 400 may include a source text attention determination unit 410 and an external information attention determination unit 420. The attention distribution A determined by the attention generation unit shown in FIG. 4 includes both the attention distribution a^t of the source text and the attention distribution a'^t of the external information.

一部の実施例において、ソーステキストアテンション決定部４１０は式（２）を利用して現在タイムステップのソーステキスト符号化非表示状態と現在タイムステップの復号化非表示状態とに基づいてソーステキストにおける単語毎の符号化アテンションパラメータｅ_ｉ ^ｔを決定する。 In some embodiments, the source text attention determination unit 410 uses equation (2) to determine the coded attention parameter e_i^t of each word in the source text based on the source text coding hidden state of the current time step and the decoding hidden state of the current time step.

外部情報アテンション決定部４２０は、前記ソーステキストにおける単語毎の外部アテンションパラメータを決定するのに使用され、外部情報に属する単語の外部アテンションパラメータは所定の第一外部アテンションパラメータに決定され、外部情報に属してない単語の外部アテンションパラメータは所定の第二外部アテンションパラメータに決定される。一部の実装形態において、第一外部アテンションパラメータはλ’として設置され、第二外部アテンションパラメータは０に設置されてもよく、λ’は０より大きい値である。 The external information attention determination unit 420 is used to determine the external attention parameter for each word in the source text, and the external attention parameter of the word belonging to the external information is determined to be a predetermined first external attention parameter and is used as the external information. The external attention parameter of a word that does not belong is determined to be a predetermined second external attention parameter. In some implementations, the first external attention parameter may be set as λ'and the second external attention parameter may be set to 0, where λ'is greater than 0.

前記符号化アテンションパラメータと外部アテンションパラメータとに基づいて前記ソーステキストに使用される単語毎のアテンションパラメータを決定する。例えば、単語毎の符号化アテンションパラメータと外部アテンションパラメータとの和を計算することで当該単語のアテンションパラメータｅ’_ｉ ^ｔを求めることができる。 The attention parameter of each word in the source text is determined based on the coded attention parameter and the external attention parameter. For example, the attention parameter e'_i^t of a word can be obtained by calculating the sum of the coded attention parameter and the external attention parameter of that word.

その後、アテンションパラメータｅ’_ｉ ^ｔに基づいてソーステキストの現在タイムステップのアテンション分布を決定する。例えば、アテンションパラメータｅ’_ｉ ^ｔに対してｓｏｆｔｍａｘ関数を運用することで、ソーステキストに使用される単語毎の現在タイムステップのアテンションを取得することができる。 Then, the attention distribution of the current time step of the source text is determined based on the attention parameters e'_i^t. For example, by applying a softmax function to the attention parameters e'_i^t, the attention of the current time step for each word in the source text can be obtained.

上記方法を利用して、事前定義された外部アテンションパラメータで、ソーステキストにおける外部情報に属する単語のアテンションパラメータを調整でき、外部情報に属する単語アテンションに対する調整を実装することができる。なお、第一外部アテンションパラメータが０より大きいハイパーパラメータλ’で、第二外部アテンションパラメータが０に設置された場合、外部情報に基づいてソーステキストの単語毎のアテンション分布を調整することで、外部情報に属する単語のアテンションがさらに重要になるようにすることができる。 With the above method, the predefined external attention parameters adjust the attention parameters of the words in the source text that belong to the external information, and thereby adjust the attention of those words. When the first external attention parameter is a hyperparameter λ' greater than 0 and the second external attention parameter is set to 0, adjusting the per-word attention distribution of the source text based on the external information gives greater weight to the attention of the words belonging to the external information.
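The adjustment described above (add an external attention parameter to each coded attention parameter, then apply softmax) can be sketched in a few lines. The function names and example values are assumptions for illustration; the coded attention parameters e_i^t would in practice come from equation (2).

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_external_bias(
    coded_params: List[float],
    in_external: List[bool],
    lam_first: float,
    lam_second: float = 0.0,
) -> List[float]:
    """Add lam_first to words in the external information and lam_second to the rest."""
    adjusted = [
        e + (lam_first if flag else lam_second)
        for e, flag in zip(coded_params, in_external)
    ]
    return softmax(adjusted)


# Hypothetical coded attention parameters; the third word is in the external information.
coded = [1.0, 0.5, 1.0, 0.2]
flags = [False, False, True, False]
plain = softmax(coded)
biased = attention_with_external_bias(coded, flags, lam_first=1.0)
```

Because softmax is monotone in each score, any choice with `lam_first > lam_second` shifts attention mass toward the external-information words, which matches the condition λ_1' > λ_2' given in the description.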

上記の例では、第一外部アテンションパラメータをλ’に、第二外部アテンションパラメータを０の場合を例として本願の原理を説明したが、本願の範囲がそれに限られるわけではない。当業者は実際状況に基づいてソーステキストに使用される単語毎の外部アテンションのパラメータを設置してよく、最終的に外部情報に属する単語のアテンションがさらに重要になる効果だけを実装できればよい。例えば、第一外部アテンションをパラメータλ_１’と設置し、第二外部アテンションをパラメータλ_２’と設置し、λ_１’、λ_２’はいかなる実数であり、λ_１’＞λ_２’だけ満足すればよい。 In the above example, the principle of the present application has been described for the case where the first external attention parameter is λ' and the second external attention parameter is 0, but the scope of the present application is not limited thereto. Those skilled in the art may set the external attention parameter of each word in the source text according to the actual situation, provided that the attention of the words belonging to the external information ends up being given greater weight. For example, the first external attention parameter may be set to λ_1' and the second external attention parameter to λ_2', where λ_1' and λ_2' may be any real numbers as long as λ_1' > λ_2' is satisfied.

一部の実施例において、ソーステキストアテンション決定部４１０は、前記式（１）、（２）を利用してソーステキストの符号化アテンション分布ａ^ｔを決定するのに使用されてもよい。外部情報アテンション決定部４２０は、前記外部情報の符号化アテンション分布ａ’^ｔを決定するのに使用されてもよい。 In some embodiments, the source text attention determination unit 410 may be used to determine the coded attention distribution a^t of the source text using equations (1) and (2) above. The external information attention determination unit 420 may be used to determine the coded attention distribution a'^t of the external information.

この場合、図１で示している符号化部１１０を利用して前記外部情報を符号化して外部情報符号化非表示状態ｈ’を取得する。外部情報アテンション部４２０は、外部情報符号化非表示状態ｈ’と復号化非表示状態ｓとに基づいて外部情報の符号化アテンション分布を決定する。 In this case, the external information is encoded by using the coding unit 110 shown in FIG. 1 to acquire the external information coding non-display state h'. The external information attention unit 420 determines the coded attention distribution of the external information based on the external information coding non-display state h'and the decoding non-display state s.

例えば、上記式（１）、（２）を利用して外部情報の符号化アテンション分布ａ’^ｔを決定し、式（１）、（２）におけるソーステキスト符号化非表示状態ｈは、外部情報符号化非表示状態ｈ’に置き換えられるべきである。 For example, equations (1) and (2) above may be used to determine the coded attention distribution a'^t of the external information, in which case the source text coding hidden state h in equations (1) and (2) should be replaced with the external information coding hidden state h'.

一部の実装形態において、それぞれ外部情報の符号化アテンション分布ａ’^ｔとソーステキストの符号化アテンション分布ａ^ｔを計算する時、シェアパラメータの式（１）、（２）で計算し、すなわち、外部情報の符号化アテンション分布ａ’^ｔとソーステキストの符号化アテンション分布ａ^ｔを計算する時に使用するパラメータｖ^Ｔ、Ｗ_ｈ、Ｗ_Ｓ、ｂ_ａｔｔｎは同じであってもよい。他の一部の実施方法において、外部情報の符号化アテンション分布ａ’^ｔとソーステキストの符号化アテンション分布ａ^ｔとを計算するために用いられるトレーニングパラメータに対してそれぞれトレーニングしてもよく、すなわち、外部情報の符号化アテンション分布ａ’^ｔとソーステキストの符号化アテンション分布ａ^ｔを計算する時に使用するパラメータｖ^Ｔ、Ｗ_ｈ、Ｗ_Ｓ、ｂ_ａｔｔｎは異なってもよい。 In some implementations, the coded attention distribution a'^t of the external information and the coded attention distribution a^t of the source text are both calculated with equations (1) and (2) using shared parameters; that is, the parameters v^T, W_h, W_S, and b_attn used to calculate a'^t and a^t may be the same. In some other implementations, the parameters used to calculate a'^t and a^t may be trained separately; that is, the parameters v^T, W_h, W_S, and b_attn used to calculate a'^t and a^t may differ.

図４で示しているアテンション生成部４００で外部情報の符号化アテンション分布ａ’^ｔとソーステキストの符号化アテンション分布ａ^ｔを生成し、図１で示している出力部で外部情報の符号化アテンション分布ａ’^ｔとソーステキストの符号化アテンション分布ａ^ｔに対して更なる処理を行い、現在タイムステップの出力単語確率分布を決定する。 The attention generation unit 400 shown in FIG. 4 generates the coded attention distribution a'^t of the external information and the coded attention distribution a^t of the source text, and the output unit shown in FIG. 1 further processes a'^t and a^t to determine the output word probability distribution of the current time step.

アテンション分布Ａ^ｔにソーステキストのアテンション分布ａ^ｔと外部情報のアテンション分布ａ’^ｔとの双方を含む場合、出力単語確率分布は、生成確率分布、ソーステキストのアテンション分布ａ^ｔと外部情報のアテンション分布ａ’^ｔの加重平均で示されてもよい。 When the attention distribution A^t includes both the attention distribution a^t of the source text and the attention distribution a'^t of the external information, the output word probability distribution may be expressed as a weighted average of the generation probability distribution, the attention distribution a^t of the source text, and the attention distribution a'^t of the external information.

一部の実施例において、式（１０）で出力単語確率分布を決定してもよい。 In some embodiments, the output word probability distribution may be determined by equation (10).

生成確率分布Ｐ_{ｖｏｃａｂ}は、ソーステキストの符号化非表示状態、復号化非表示状態及びソーステキストの符号化アテンション分布に基づいて式（３）によって決定されるのであってもよく、ａ_ｉ ^ｔは、ソーステキストの符号化アテンション分布における第ｉ個の単語のアテンションを示し、ａ’_ｉ ^ｔは、外部情報の符号化アテンション分布における第ｉ個の単語のアテンションを示している。Ｐ_{ｇｅｎｅｒａｔｏｒ}、Ｐ_{ｐｏｉｎｔｅｒ}、及びＰ_Ｔはそれぞれ生成確率分布Ｐ_{ｖｏｃａｂ}、ソーステキストの符号化アテンション分布ａ^ｔ及び外部情報の符号化アテンション分布ａ’^ｔ用の重み係数を示している。 The generation probability distribution P_vocab may be determined by equation (3) based on the coding hidden state of the source text, the decoding hidden state, and the coded attention distribution of the source text. a_i^t denotes the attention of the i-th word in the coded attention distribution of the source text, and a'_i^t denotes the attention of the i-th word in the coded attention distribution of the external information. P_generator, P_pointer, and P_T denote the weighting coefficients for the generation probability distribution P_vocab, the coded attention distribution a^t of the source text, and the coded attention distribution a'^t of the external information, respectively.
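The weighted combination described for equation (10) can be sketched as follows. Since the equation itself is not reproduced in this text, the sketch assumes a pointer-generator-style mixture in which the attention of every occurrence of a word is accumulated onto that word; the function name, the dictionary representation, and the example numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def output_word_distribution(
    p_vocab: Dict[str, float],
    source_words: List[str],
    source_attention: List[float],
    external_words: List[str],
    external_attention: List[float],
    p_generator: float,
    p_pointer: float,
    p_t: float,
) -> Dict[str, float]:
    """Mix the generation distribution with the two attention distributions."""
    dist: Dict[str, float] = defaultdict(float)
    for word, prob in p_vocab.items():
        dist[word] += p_generator * prob      # generation term
    for word, a in zip(source_words, source_attention):
        dist[word] += p_pointer * a           # copy from the source text
    for word, a in zip(external_words, external_attention):
        dist[word] += p_t * a                 # copy from the external information
    return dict(dist)


# Hypothetical example; the three weights are assumed to sum to 1.
dist = output_word_distribution(
    p_vocab={"the": 0.5, "cat": 0.3, "sat": 0.2},
    source_words=["cat", "mat"],
    source_attention=[0.7, 0.3],
    external_words=["mat"],
    external_attention=[1.0],
    p_generator=0.5,
    p_pointer=0.3,
    p_t=0.2,
)
best_word = max(dist, key=dist.get)
```

When the three inputs are each normalized and the weights sum to 1, the result is again a probability distribution, and a word such as "mat" that is absent from the vocabulary distribution can still receive probability through the two attention terms.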

一部の実装形態において、現在タイムステップｔのソーステキストの符号化非表示状態、復号化非表示状態、外部情報の符号化アテンション分布及び直前のタイムステップｔ−１の出力部の出力に基づいてＰ_{ｇｅｎｅｒａｔｏｒ}、Ｐ_{ｐｏｉｎｔｅｒ}、及びＰ_Ｔを決定してもよい。 In some implementations, based on the current time step t source text coded hidden state, decoded hidden state, external information coded attention distribution and the output of the output section of the immediately preceding time step t-1. P _generator , P _pointer , and P _T may be determined.

例えば、式（１１）に基づいてＰ_{ｇｅｎｅｒａｔｏｒ}、Ｐ_{ｐｏｉｎｔｅｒ}、及びＰ_Ｔを決定できる。 For example, P_generator, P_pointer, and P_T can be determined based on equation (11).

σは活性化関数を示し、例えばｓｉｇｍｏｉｄ関数であり、式（１１）のその他の記号はトレーニングしようとするパラメータであり、ｈ_ｔ ^＊はタイムステップｔでソーステキストの符号化非表示状態ｈと復号化非表示状態ｓに基づき式（３）、（４）によって決定されるパラメータであり、ｓ_ｔはタイムステップｔでの復号化非表示状態であり、ｘ_ｔはタイムステップｔで復号化部への入力であり、すなわち、直前のタイムステップｔ−１で出力部の出力であり、ａ’^ｔはアテンション生成部４００から出力される外部情報の符号化アテンション分布である。 σ denotes an activation function, for example a sigmoid function, and the remaining symbols in equation (11) are parameters to be trained. h_t^* is a parameter determined by equations (3) and (4) at time step t based on the coding hidden state h of the source text and the decoding hidden state s, s_t is the decoding hidden state at time step t, x_t is the input to the decoding unit at time step t, that is, the output of the output unit at the immediately preceding time step t-1, and a'^t is the coded attention distribution of the external information output from the attention generation unit 400.

一部の実施例において、出力部が出力単語確率分布を決定する時、他の方法で決定される確率分布結果を考慮してもよい。例えば、ソーステキストＩにおける単語ベクトルの形成した複数の文ベクトル間の相関性を考慮してソーステキストＩにおける各単語のソーステキストにおける重要度を決定してもよい。出力単語確率分布は、さらに上記重要度によって形成した単語確率分布を含んでもよい。当業者にとって、出力単語確率分布の生成方式はこれに限られなく、本開示原理を逸脱しない状況で、出力単語確率分布は各種形の単語確率分布を含んでもよい。 In some embodiments, when the output unit determines the output word probability distribution, the probability distribution results determined by other methods may be considered. For example, the importance of each word in the source text I in the source text may be determined in consideration of the correlation between the plurality of sentence vectors formed by the word vectors in the source text I. The output word probability distribution may further include a word probability distribution formed by the above importance. For those skilled in the art, the method of generating the output word probability distribution is not limited to this, and the output word probability distribution may include various types of word probability distributions in a situation that does not deviate from the present disclosure principle.

上記方式を利用して、アテンション生成部４００で外部情報の現在タイムステップでのアテンション分布ａ’^ｔを決定し、かつ外部情報の現在タイムステップでのアテンション分布ａ’^ｔを利用して現在タイムステップの出力確率分布を決定してもよい。本開示によって提供される実施例において、外部情報の特徴ではなく外部情報のアテンション分布を利用して現在タイムステップの出力確率分布を決定することで、外部情報の特徴における無効情報が現在タイムステップの出力確率分布に与える影響を避けることができる。 With the above scheme, the attention generation unit 400 determines the attention distribution a'^t of the external information at the current time step, and this attention distribution a'^t may be used to determine the output probability distribution of the current time step. In the embodiments provided by the present disclosure, the output probability distribution of the current time step is determined from the attention distribution of the external information rather than from the features of the external information, which avoids the influence of invalid information in those features on the output probability distribution of the current time step.

図５は本願の実施例にかかるテキスト処理装置の他の例示的な一実施例を示している。 FIG. 5 shows another exemplary embodiment of the text processing apparatus according to the embodiment of the present application.

図５に示すように、テキスト処理装置５００は符号化部５１０、復号化部５２０、アテンション生成部５３０、出力部５４０及び後処理部５５０を含んでもよい。符号化部５１０、復号化部５２０、アテンション生成部５３０、出力部５４０は、図１〜図３で説明している符号化部１１０、復号化部１２０、アテンション生成部１３０と出力部１４０で実装すればよく、ここでは、詳しく説明しない。 As shown in FIG. 5, the text processing device 500 may include a coding unit 510, a decoding unit 520, an attention generation unit 530, an output unit 540, and a post-processing unit 550. The coding unit 510, the decoding unit 520, the attention generation unit 530, and the output unit 540 may be implemented as the coding unit 110, the decoding unit 120, the attention generation unit 130, and the output unit 140 described with reference to FIGS. 1 to 3, and are not described in detail here.

後処理部５５０は、外部情報に基づいて前記候補テキストに対して後処理を行うことで、外部情報を含む出力テキストを決定するように構成されてもよい。 The post-processing unit 550 may be configured to determine the output text including the external information by performing post-processing on the candidate text based on the external information.

前述のように、符号化部５１０、復号化部５２０、アテンション生成部５３０、出力部５４０が図１〜図３で説明している符号化部１１０、復号化部１２０、アテンション生成部１３０と出力部１４０に合わせて実装できるので、出力部５４０は、外部情報を含むテキスト処理結果を出力できる。 As described above, the coding unit 510, the decoding unit 520, the attention generation unit 530, and the output unit 540 output the coding unit 110, the decoding unit 120, and the attention generation unit 130 described in FIGS. 1 to 3. Since it can be implemented according to the unit 140, the output unit 540 can output the text processing result including the external information.

出力部５４０から出力される結果に既に外部情報を含む場合は、出力部５４０から出力される結果を直接テキスト処理の結果としてもよい。 When the result output from the output unit 540 already contains external information, the result output from the output unit 540 may be directly used as the result of text processing.

出力部５４０から出力される結果に外部情報を含まない場合は、出力部５４０から出力される結果を候補テキストとし、後処理部５５０により外部情報に基づいて前記候補テキストについて後処理を行い、外部情報を含む出力テキストを決定するようにする。 When the result output from the output unit 540 does not include external information, the result output from the output unit 540 is used as the candidate text, and the post-processing unit 550 performs post-processing on the candidate text based on the external information, and externally. Try to determine the output text that contains the information.

一部の実施例において、外部情報は予め指定された情報を含んでもよい。例えば、外部情報は予め指定された文であり、又は予め指定された単語を含むソーステキストにおける文であってもよい。 In some embodiments, the external information may include pre-designated information. For example, the external information may be a pre-specified sentence or a sentence in a source text containing a pre-specified word.

前記予め指定された外部情報が文である場合、後処理部５５０は、前記候補テキストにおける文と前記外部情報の類似度を決定するように構成されてもよい。前記類似度が所定の候補類似度閾値より大きい場合、前記候補テキストにおける前記文を前記外部情報に置き換えてもよい。 When the pre-designated external information is a sentence, the post-processing unit 550 may be configured to determine the similarity between the sentence in the candidate text and the external information. When the similarity is larger than a predetermined candidate similarity threshold, the sentence in the candidate text may be replaced with the external information.

事前決定される外部情報が単語である場合、後処理部５５０は、前記候補テキストにおける文と前記外部情報の類似度を決定し、前記類似度が所定の候補類似度閾値より大きい場合、前記候補テキストにおける前記文を前記外部情報に置き換えるように構成されてもよい。 When the predetermined external information is a word, the post-processing unit 550 may be configured to determine the similarity between a sentence in the candidate text and the external information and, when the similarity is larger than the predetermined candidate similarity threshold, to replace that sentence in the candidate text with the external information.

一部の実装形態において、前記類似度が所定の候補類似度閾値より大きい場合、後処理部５５０は、前記候補テキストにおける前記文を削除し、外部情報としての文又は、外部情報としての単語を含む文で削除された候補テキストにおける文に置き換えるように構成されてもよい。 In some implementations, when the similarity is greater than a predetermined candidate similarity threshold, the post-processing unit 550 deletes the sentence in the candidate text and puts a sentence as external information or a word as external information. The containing sentence may be configured to replace the sentence in the deleted candidate text.

一部の例において、ソーステキストにおける外部情報と候補テキストにおける余剰情報のソーステキストにおける相関性に基づいて前記残りの情報に外部情報を挿入する。例えば、外部情報と候補テキストにおける余剰情報がソーステキストに現れる順番に基づいて外部情報を候補テキストの余剰情報に挿入する。 In some examples, external information is inserted into the remaining information based on the correlation between the external information in the source text and the surplus information in the candidate text in the source text. For example, the external information is inserted into the surplus information of the candidate text based on the order in which the external information and the surplus information in the candidate text appear in the source text.

他の一部の実施方法において、前記類似度が所定の候補類似度閾値より小さい場合、後処理部５５０は、前記外部情報と前記候補テキストにおける文の前記ソーステキストにおける相関性に基づいて、前記候補テキストに外部情報を插入する。 In some other embodiments, when the similarity is smaller than the predetermined candidate similarity threshold, the post-processing unit 550 inserts the external information into the candidate text based on the correlation, within the source text, between the external information and the sentences of the candidate text.

前記外部情報と前記候補テキストにおける各文の間の類似度を比較してもよい。前記外部情報と前記候補テキストにおける各文の類似度とも所定の候補類似度閾値より小さい場合、生成したテキスト処理結果に外部情報に類似する情報が含まれてないことを意味する。この場合、外部情報と候補テキストについて直接つづり合いを行い、最終のテキスト処理結果を決定してもよい。 The similarity between the external information and each sentence in the candidate text may be compared. When the similarity between the external information and every sentence in the candidate text is smaller than the predetermined candidate similarity threshold, the generated text processing result contains no information similar to the external information. In this case, the external information and the candidate text may be spliced together directly to determine the final text processing result.

例えば、外部情報と候補テキストにおける文がソーステキストに現れる順番に従って、外部情報を候補テキストに挿入することで、最終のテキスト処理結果を決定してもよい。 For example, the final text processing result may be determined by inserting the external information into the candidate text in the order in which the sentences in the external information and the candidate text appear in the source text.
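The replace-or-insert post-processing described above can be sketched as follows. The description does not specify a similarity measure, so a token-overlap (Jaccard) score stands in here as an assumption; `post_process`, `jaccard`, the sentence-list representation, and the handling of candidate sentences that do not occur verbatim in the source (treated as coming last) are likewise hypothetical choices.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an illustrative stand-in, not from the source."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def post_process(
    candidate: List[str],
    external: str,
    source: List[str],
    threshold: float,
    similarity: Callable[[str, str], float] = jaccard,
) -> List[str]:
    """Replace the most similar candidate sentence, or insert by source order."""
    scores = [similarity(sentence, external) for sentence in candidate]
    if scores and max(scores) > threshold:
        best = scores.index(max(scores))
        return candidate[:best] + [external] + candidate[best + 1:]

    def position(sentence: str) -> int:
        # Sentences not found in the source are treated as coming last (assumption).
        return source.index(sentence) if sentence in source else len(source)

    ext_pos = position(external)
    insert_at = len(candidate)
    for i, sentence in enumerate(candidate):
        if position(sentence) > ext_pos:
            insert_at = i
            break
    return candidate[:insert_at] + [external] + candidate[insert_at:]


source = ["A storm hit the coast.", "Thousands lost power.", "Repairs begin Monday."]
external = "Thousands lost power."
inserted = post_process(
    ["A storm hit the coast.", "Repairs begin Monday."], external, source, 0.5
)
replaced = post_process(
    ["Thousands lost their power.", "Repairs begin Monday."], external, source, 0.5
)
```

A similarity exactly equal to the threshold falls into the insertion branch here; the description only covers the strictly larger and strictly smaller cases, so that tie-break is a design choice of this sketch.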

本願にかかる上記テキスト処理装置を利用して、効果的にテキスト処理結果に外部情報の内容を追加することができ、テキスト処理結果に期待する外部情報の内容を追加することが保証できる。 By using the text processing apparatus according to the present application, it is possible to effectively add the content of external information to the text processing result, and it is possible to guarantee that the content of the external information expected to be added to the text processing result is added.

前記に示すように、図１〜図３を参照して示しているテキスト処理装置における符号化部１１０、復号化部１２０、アテンション生成部１３０にはすべてトレーニングする必要のあるパラメータを含んでいる。そのため、機械学習を利用して符号化部１１０、復号化部１２０、アテンション生成部１３０における少なくとも１つに対してトレーニングを行う必要がある。 As shown above, the coding unit 110, the decoding unit 120, and the attention generation unit 130 in the text processing apparatus shown with reference to FIGS. 1 to 3 all include parameters that need to be trained. Therefore, it is necessary to train at least one of the coding unit 110, the decoding unit 120, and the attention generation unit 130 by using machine learning.

一部の実施例において、所定のソーステキストトレーニングセットを利用して前記符号化部、前記アテンション生成部、前記復号化部に対してトレーニングを行う。前記ソーステキストトレーニングセットは複数のトレーニングソーステキストを含む。 In some embodiments, a predetermined source text training set is used to train the coding unit, the attention generating unit, and the decoding unit. The source text training set includes a plurality of training source texts.

図１で示しているテキスト処理装置を利用してトレーニングソーステキストについて処理を行い、トレーニングソーステキストに対するトレーニングテキスト処理結果を取得することができる。例えば、符号化部を利用してトレーニングソーステキストを符号化してトレーニングソーステキストの符号化非表示状態を取得することができる。その後、復号化部を利用してトレーニング復号化非表示状態を決定することができる。またその後、アテンション生成部を利用して前記外部情報、前記トレーニングソーステキスト符号化非表示状態と前記トレーニング復号化非表示状態に基づいて現在タイムステップのトレーニングアテンション分布を決定することができる。出力部を利用して、前記トレーニングアテンション分布、前記トレーニングソーステキスト符号化非表示状態、前記トレーニング復号化非表示状態に基づいてトレーニング出力単語確率分布を決定することで、トレーニング出力単語を決定することができる。 The training source text can be processed by using the text processing device shown in FIG. 1, and the training text processing result for the training source text can be obtained. For example, the coding unit can be used to encode the training source text to obtain the coded non-display state of the training source text. After that, the training decoding hidden state can be determined by using the decoding unit. After that, the attention generation unit can be used to determine the training attention distribution of the current time step based on the external information, the training source text-coded non-display state, and the training decoding non-display state. The training output word is determined by determining the training output word probability distribution based on the training attention distribution, the training source text coding hidden state, and the training decoding hidden state using the output unit. Can be done.

前記符号化部、前記アテンション生成部、前記復号化部におけるパラメータを調整することで、トレーニングプロセスにおいて使用する損失関数を最小化し、前記符号化部、前記アテンション生成部、前記復号化部に対するトレーニングを実装するようにする。 By adjusting the parameters in the coding unit, the attention generating unit, and the decoding unit, the loss function used in the training process is minimized, and training for the coding unit, the attention generating unit, and the decoding unit is performed. Try to implement it.

一部の例において、トレーニングプロセスにおいて使用する損失関数ｌｏｓｓは下記の式（１２）として実装されてもよい。 In some examples, the loss function loss used in the training process may be implemented as equation (12) below.

Ｐ（ｗ_ｔ ^＊）はタイムステップｔの正解単語がタイムステップｔのトレーニング出力単語確率分布における確率値であり、（ｐ_Ｔ ^ｔ _{ｌａｂｅｌ}−ｐ_Ｔ ^ｔ）はトレーニング出力単語確率分布における正解の確率値とトレーニング出力単語確率分布における外部情報の確率値間の差異である。トレーニング出力単語に外部情報に属する単語が現れた時、（ｐ_Ｔ ^ｔ _{ｌａｂｅｌ}−ｐ_Ｔ ^ｔ）の値が小さく、トレーニング出力単語に外部情報に属する単語が現れてない時、（ｐ_Ｔ ^ｔ _{ｌａｂｅｌ}−ｐ_Ｔ ^ｔ）の値が大きい。 P(w_t^*) is the probability value of the correct word of time step t in the training output word probability distribution of time step t, and (p_T^t_label − p_T^t) is the difference between the probability value of the correct answer and the probability value of the external information in the training output word probability distribution. When a word belonging to the external information appears among the training output words, the value of (p_T^t_label − p_T^t) is small; when no such word appears, the value is large.

他の例において、トレーニングプロセスにおいて使用する損失関数ｌｏｓｓは下記式（１３）、（１４）によって実装することができる。 In another example, the loss function loss used in the training process can be implemented by equations (13) and (14) below.

Ｔはテキスト処理プロセスにおける全体タイムステップであり、ｔは現在タイムステップを示し、Ｐ（ｗ_ｔ ^＊）はタイムステップｔの正解単語がタイムステップｔにトレーニング出力単語確率分布における確率値を示し、−ｌｏｇＰ（ｗ_ｔ ^＊）は負の対数尤度／クロスエントロピー項を示す。ソーステキストについての対応する項は、ソーステキストの収束メカニズムの損失項を示し、ａ_ｉ ^{ａｒｔｉｃｌｅｔ}は現在タイムステップｔのソーステキストアテンション分布であり、その累積項は前の全てのタイムステップソーステキストアテンション分布の合計であり、（ｐ_Ｔ ^ｔ _{ｌａｂｅｌ}−ｐ_Ｔ ^ｔ）はトレーニング出力単語確率分布における正解の確率値とトレーニング出力単語確率分布における外部情報の確率値間の差異であり、外部情報についての対応する項は、外部情報の収束メカニズム損失項であり、ａ_ｉ ^Ｔ＿ｔは現在タイムステップｔの外部情報アテンション分布である。 T is the total number of time steps in the text processing, t denotes the current time step, P(w_t^*) denotes the probability value of the correct word of time step t in the training output word probability distribution at time step t, and −log P(w_t^*) denotes the negative log-likelihood (cross-entropy) term. The corresponding term for the source text denotes the loss term of the convergence mechanism of the source text, where a_i^{article_t} is the source text attention distribution of the current time step t and the accumulated term is the sum of the source text attention distributions of all previous time steps. (p_T^t_label − p_T^t) is the difference between the probability value of the correct answer and the probability value of the external information in the training output word probability distribution. The corresponding term for the external information is the loss term of the convergence mechanism of the external information, where a_i^{T_t} is the external information attention distribution of the current time step t. γ and β are hyperparameters set in advance.
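Equations (13) and (14) are not reproduced in this text, so the following is only a sketch under explicit assumptions: the convergence-mechanism term is taken to be the sum over words of min(current attention, accumulated previous attention), as in coverage losses used with pointer-generator models; γ is assumed to weight the source text term and β the external information term; and the (p_T^t_label − p_T^t) term is left out. None of these choices is confirmed by the description.

```python
import math
from typing import List


def convergence_term(previous: List[List[float]], current: List[float]) -> float:
    """Sum of min(current attention, accumulated previous attention) per word."""
    accumulated = [sum(step[i] for step in previous) for i in range(len(current))]
    return sum(min(a, c) for a, c in zip(current, accumulated))


def step_loss(
    p_correct: float,
    source_previous: List[List[float]],
    source_current: List[float],
    external_previous: List[List[float]],
    external_current: List[float],
    gamma: float,
    beta: float,
) -> float:
    """Negative log-likelihood plus weighted convergence terms (assumed form)."""
    nll = -math.log(p_correct)
    return (
        nll
        + gamma * convergence_term(source_previous, source_current)
        + beta * convergence_term(external_previous, external_current)
    )


# Hypothetical attention histories for a two-word source and a one-word external text.
loss_t = step_loss(
    p_correct=0.5,
    source_previous=[[0.9, 0.1]],
    source_current=[0.8, 0.2],
    external_previous=[[1.0]],
    external_current=[1.0],
    gamma=1.0,
    beta=0.5,
)
```

The min-based term penalizes attending again to words that already received attention in earlier time steps, which is consistent with the description of an accumulated sum over all previous time steps.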

図６は本願のテキスト処理方法による例示的フローチャートを示している。図６に示すように、ステップＳ６０２において、前記ソーステキストに対して符号化を行い、ソーステキスト符号化非表示状態を取得することができる。一部の実施例において、符号化ネットワークを利用してソーステキストについて符号化を行ってもよい。例示的符号化ネットワークには長・短期記憶（ＬＳＴＭ）ネットワークを含み、ＬＳＴＭネットワークベースのシステムは例えば機械翻訳、テキスト要約生成等のタスクに適用される。ここからわかるように、符号化ネットワークは単語ベクトルに対して符号化を行ういかなる機械学習モデルとして実装されてもよい。 FIG. 6 shows an exemplary flowchart according to the text processing method of the present application. As shown in FIG. 6, in step S602, the source text can be coded to obtain the source text coded non-display state. In some embodiments, a coding network may be used to code the source text. The exemplary coded network includes long short-term memory (LSTM) networks, and LSTM network-based systems are applied to tasks such as machine translation, text summarization generation, and the like. As can be seen, the coding network may be implemented as any machine learning model that encodes the word vector.

例えば、ソーステキストＩに対応する少なくとも１つの単語ベクトルを入力とする場合、符号化ネットワークは各単語ベクトルｘ_１、ｘ_２、ｘ_３…にそれぞれ対応するソーステキスト符号化非表示状態ｈ_１、ｈ_２、ｈ_３…を出力する。ソーステキスト符号化非表示状態の数とソーステキストの単語ベクトルの数は同じでもよいし、異なってもよい。例えば、ソーステキストＩに基づいてｋ個の単語ベクトルを生成する場合、符号化ネットワークはこのｋ個の単語ベクトルを処理してｋ個の対応するソーステキスト符号化非表示状態を生成する。ｋは１より大きい整数である。 For example, when at least one word vector corresponding to the source text I is input, the coding network outputs source text coding hidden states h_1, h_2, h_3, ... corresponding to the word vectors x_1, x_2, x_3, ..., respectively. The number of source text coding hidden states and the number of word vectors of the source text may be the same or different. For example, when k word vectors are generated based on the source text I, the coding network processes these k word vectors to generate k corresponding source text coding hidden states. k is an integer greater than 1.

ステップＳ６０４において、復号化非表示状態を決定できる。一部の実施例において、復号化部１２０は直前のタイムステップｔ−１の復号化非表示状態ｓ_ｔ−１及び直前のタイムステップテキスト処理装置で得られた出力単語ｘ_ｔを受信し、かつｓ_ｔ−１とｘ_ｔとを処理することで現在タイムステップの復号化非表示状態ｓ_ｔを取得する。最初のタイムステップの処理でｓ_０とｘ_１はデフォルトの初期値として決定される。復号化非表示状態ｓは、ソーステキストＳに対応する複数の復号化非表示状態ｓ_１、ｓ_２、ｓ_３…を含んでもよい。例示的な復号化ネットワークは長・短期記憶ネットワークを含む。なお、復号化ネットワークは符号化ネットワークの出力に対して復号化を行ういかなる機械学習モデルによって実装されてもよい。 In step S604, the decoding hidden state can be determined. In some embodiments, the decoding unit 120 receives the decoding hidden state s_{t-1} of the immediately preceding time step t-1 and the output word x_t obtained by the text processing apparatus at the immediately preceding time step, and processes s_{t-1} and x_t to obtain the decoding hidden state s_t of the current time step. In the processing of the first time step, s_0 and x_1 are set to default initial values. The decoding hidden state s may include a plurality of decoding hidden states s_1, s_2, s_3, ... corresponding to the source text S. An exemplary decoding network includes a long short-term memory network. The decoding network may be implemented by any machine learning model that decodes the output of the coding network.

ステップＳ６０６において、外部情報、前記ソーステキスト符号化非表示状態と前記復号化非表示状態に基づいて現在タイムステップのアテンション分布を決定してもよい。 In step S606, the attention distribution of the current time step may be determined based on the external information, the source text coding hidden state and the decoding hidden state.

一部の実施例において、現在タイムステップｔのアテンション分布Ａ^ｔは、ソーステキストの符号化アテンション分布である。例えば、式（１）、（２）を利用してソーステキストの符号化アテンション分布ａ^ｔを決定することができる。 In some embodiments, the attention distribution A^t of the current time step t is the coded attention distribution of the source text. For example, equations (1) and (2) can be used to determine the coded attention distribution a^t of the source text.

他の一部の実施例において、外部情報及び式（１）に基づいて決定したソーステキストのアテンション分布ａ^ｔに基づいて外部情報を含む現在タイムステップのアテンション分布Ａ^ｔを決定し、かつ外部情報を含むアテンション分布Ａ^ｔを出力し、後続のテキスト処理プロセスにおいて使用する。 In some other embodiments, the attention distribution A^t of the current time step, which incorporates the external information, is determined based on the external information and on the attention distribution a^t of the source text determined by equation (1), and this attention distribution A^t is output for use in the subsequent text processing.

FIG. 7 shows an exemplary flowchart for determining the attention distribution of the current time step based on external information according to an embodiment of the present application.

In step S702, the encoding attention distribution of the source text is determined based on the source-text encoding hidden state and the decoding hidden state. In some embodiments, the encoding attention distribution a^t of the source text can be determined using equation (1).

In step S704, a selection probability distribution over the source text is determined based on the external information. The selection probability distribution includes a selection probability for each word in the source text.

In some embodiments, a content selection network (for example, an LSTM network) processes the source text I to determine a first selection probability for each word in the source text.

The content selection network processes the source text I and outputs a first selection probability for each word in the source text. The first selection probability represents the probability, according to the content selection network, that the word in the source text I is selected and appears in the final text processing result.

In some embodiments, for at least one word in the source text that belongs to the external information, the selection probability of that word is set to at least a predefined probability value λ. For example, the second selection probability of each word in the source text that belongs to the external information is set to the predefined probability value λ, and the second selection probability of every other word, that is, each word not belonging to the external information, is set to 0.

The selection probability of each word in the source text is determined from the first and second selection probabilities. For example, the selection probability of each word may be the sum of its first selection probability and its second selection probability. Consequently, for a word belonging to the external information, the selection probability is at least the predefined probability value λ.
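The combination of the two selection probabilities described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `selection_probabilities`, the example words, and the numeric values are hypothetical, and the first selection probabilities are supplied directly rather than produced by a trained content selection network.

```python
def selection_probabilities(first_probs, source_words, external_words, lam):
    """Sum the first selection probability and the second selection probability.

    first_probs    : per-word output of a content selection network (given here)
    external_words : words that belong to the external information
    lam            : the predefined probability value (lambda in the text)
    """
    external = set(external_words)
    combined = []
    for word, p_first in zip(source_words, first_probs):
        # Second selection probability: lam for external words, 0 otherwise.
        p_second = lam if word in external else 0.0
        combined.append(p_first + p_second)
    return combined


# Hypothetical example data.
source = ["the", "launch", "was", "delayed", "again"]
first = [0.10, 0.30, 0.05, 0.40, 0.20]
probs = selection_probabilities(first, source, ["launch"], lam=0.5)
print(probs)
```

Because the two values are simply added, the result for an external word is never below `lam`, which matches the statement above. The sum is not renormalized here, since the text does not say that it should be.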

In step S706, the attention distribution can be obtained by determining, for each word in the source text, the attention of that word based on its selection probability.

Based on the selection probability distribution, step S706 may include determining the attention of each word in the source text from the selection probability of that word, thereby obtaining the attention distribution A. In some embodiments, step S706 may include setting the attention for a word in the attention distribution of the current time step to 0 when the selection probability of the word is less than a predetermined selection probability threshold ε. Step S706 may also include, when the selection probability of the word is greater than or equal to ε, setting the attention for the word in the attention distribution of the current time step to the attention of that word in the encoding attention distribution a^t of the source text.

With this method of generating attention, a selection probability can be generated for each word in the source text. That is, when the attention of each word is determined, both the magnitude of the attention computed with equation (1) and the selection probability of the word are taken into account. When the selection probability of a word is below the predetermined selection probability threshold, the word can be judged very unlikely to appear at the current time step, so its attention need not be considered in the subsequent text processing.

When the selection probability of a word contained in the external information is set to at least the predefined probability value λ, and λ is greater than the predetermined selection probability threshold ε, words in the external information are guaranteed not to be filtered out in the content selection step. They are therefore processed in the subsequent text processing, which raises the probability that they appear in the text processing result. In some implementations, λ may instead be set less than or equal to ε. In that case, determining the selection probability of each word as the sum of the first and second selection probabilities still increases the selection probability of words in the external information, and thus still raises the probability that they appear in the text processing result.
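The thresholding in step S706 can be illustrated with a few lines of code. This is a minimal sketch, assuming the encoding attention a^t and the selection probabilities are already available as plain lists; the function name `gate_attention` and all numbers are hypothetical.

```python
def gate_attention(encoding_attention, selection_probs, epsilon):
    """Keep a word's encoding attention only if its selection probability
    reaches the threshold epsilon; otherwise set its attention to 0."""
    return [
        a if p >= epsilon else 0.0
        for a, p in zip(encoding_attention, selection_probs)
    ]


# Hypothetical values: a_t is the encoding attention, sel the selection probabilities.
a_t = [0.10, 0.20, 0.30, 0.40]
sel = [0.05, 0.80, 0.10, 0.60]
A_t = gate_attention(a_t, sel, epsilon=0.5)
print(A_t)
```

In this example only the second and fourth words survive the filter. If the second word belongs to the external information and λ is chosen larger than ε, it survives regardless of what the content selection network predicted for it. The sketch does not renormalize the gated values, because the text does not specify that step.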

FIG. 8 shows another exemplary flowchart of the step of determining the attention distribution of the current time step based on external information according to an embodiment of the present application.

In step S802, the encoding attention of the source text at the current time step may be determined.

In some embodiments, equation (2) can be used to determine an encoding attention parameter e_i^t for each word in the source text based on the source-text encoding hidden state and the decoding hidden state of the current time step. The encoding attention distribution a^t of the source text is then determined using equations (1) and (2).

In step S804, the external-information encoding attention of the current time step may be determined.

In some embodiments, an external attention parameter can be determined for each word in the source text: for a word belonging to the external information it is set to a predetermined first external attention parameter, and for a word not belonging to the external information it is set to a predetermined second external attention parameter. In some implementations, the first external attention parameter is set to λ' and the second to 0, where λ' is a value greater than 0.

The attention parameter for each word in the source text can be determined from the encoding attention parameter determined in step S802 and the external attention parameter. For example, the attention parameter e'_i^t of a word may be determined by summing its encoding attention parameter and its external attention parameter.

The attention distribution of the source text at the current time step may then be determined from the attention parameters e'_i^t. For example, applying the softmax function to the attention parameters e'_i^t yields the attention of the current time step for each word in the source text.

With this method, the predefined external attention parameter adjusts the attention parameter of words in the source text that belong to the external information, and thereby adjusts the attention given to those words. When the first external attention parameter is set to a hyperparameter λ' greater than 0 and the second external attention parameter is set to 0, the attention distribution over the words of the source text is adjusted based on the external information so that words belonging to the external information receive more attention.
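The effect of adding an external attention parameter before the softmax can be seen in a small numerical sketch. The function names, the example words, and the scores below are hypothetical; the encoding attention parameters e_i^t are supplied directly instead of being computed from hidden states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_external_bias(encoding_scores, source_words, external_words, lam_prime):
    """Add lam_prime to the score of each external word (0 to all others),
    then normalize with softmax to obtain the attention distribution."""
    external = set(external_words)
    adjusted = [
        e + (lam_prime if w in external else 0.0)
        for e, w in zip(encoding_scores, source_words)
    ]
    return softmax(adjusted)


# Hypothetical example: equal scores, one word belongs to the external information.
words = ["the", "launch", "was", "delayed"]
scores = [0.2, 0.2, 0.2, 0.2]
plain = softmax(scores)
biased = attention_with_external_bias(scores, words, ["launch"], lam_prime=1.0)
print(plain)
print(biased)
```

With equal scores the plain attention is uniform, while the biased attention shifts probability mass toward the external word and away from the others. The result still sums to 1 because the bias is applied before normalization.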

In some embodiments, the external information may be encoded to obtain an external-information encoding hidden state h'. The encoding attention distribution a'^t of the external information can then be determined using equations (1) and (2), with the source-text encoding hidden state h in equations (1) and (2) replaced by the external-information encoding hidden state h'.

In some implementations, shared parameters are used in equations (1) and (2) when computing the encoding attention distribution a'^t of the external information and the encoding attention distribution a^t of the source text. That is, the parameters v^T, W_h, W_S, and b_attn used to compute a'^t and a^t may be the same. In some other implementations, the parameters for computing a'^t and a^t are trained separately. That is, the parameters v^T, W_h, W_S, and b_attn used to compute a'^t and a^t may differ.
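The shared-parameter variant can be sketched as one attention function applied to two sets of encoder states. Equations (1) and (2) are not reproduced in this passage, so the sketch assumes the common additive form e_i = v^T tanh(W_h h_i + W_S s_t + b_attn) followed by a softmax; that form, the helper names, and all numeric values are assumptions for illustration only.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(hidden_states, s_t, v, W_h, W_S, b_attn):
    """Assumed form: e_i = v^T tanh(W_h h_i + W_S s_t + b_attn), a = softmax(e)."""
    ws_s = matvec(W_S, s_t)
    scores = []
    for h_i in hidden_states:
        wh_h = matvec(W_h, h_i)
        pre = [math.tanh(x + y + b) for x, y, b in zip(wh_h, ws_s, b_attn)]
        scores.append(sum(vi * pi for vi, pi in zip(v, pre)))
    return softmax(scores)


# One shared set of (hypothetical) parameters for both distributions.
params = dict(
    v=[1.0, -1.0],
    W_h=[[0.5, 0.0], [0.0, 0.5]],
    W_S=[[1.0, 0.0], [0.0, 1.0]],
    b_attn=[0.0, 0.0],
)
h_source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # source-text encoding hidden states h
h_external = [[0.0, 1.0], [1.0, 0.0]]            # external-information hidden states h'
s_t = [0.3, -0.2]                                # decoding hidden state

a_t = additive_attention(h_source, s_t, **params)        # a^t
a_ext_t = additive_attention(h_external, s_t, **params)  # a'^t
print(a_t, a_ext_t)
```

Training the two distributions separately would amount to passing a second, independent `params` dictionary in the last call.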

Returning to FIG. 6, in step S608 the output word probability distribution may be determined based on the attention distribution, the source-text encoding hidden state, and the decoding hidden state.

The output word probability distribution may include a generation probability distribution P_vocab, which can be determined using equations (3) and (4).

In some embodiments, the output word probability distribution may include the attention distribution A^t of the current time step.

For example, the output word probability distribution can be determined as a weighted sum of the generation probability distribution and the attention distribution A.

In some implementations, the weighting coefficient P_gen for the generation probability distribution and the attention distribution is determined based on the source-text encoding hidden state of the current time step, the decoding hidden state, the attention distribution, and the output of the decoding network at the immediately preceding time step.

For example, the weighting coefficient P_gen used for the weighted sum of the generation probability distribution and the attention distribution can be expressed as in equation (5).

When the attention distribution A^t includes both the source-text attention distribution a^t and the external-information attention distribution a'^t, the weighting coefficient parameters for a^t and a'^t may be the same or different.

When the attention distribution A^t includes both a^t and a'^t, the output word probability distribution can be expressed as a weighted average of the generation probability distribution, the source-text attention distribution a^t, and the external-information attention distribution a'^t.

In some embodiments, equation (10) may be used to determine the output word probability distribution.

In some implementations, P_generator, P_pointer, and P_T may be determined based on the source-text encoding hidden state of the current time step t, the decoding hidden state, the encoding attention distribution of the external information, and the output of the decoding network at the immediately preceding time step t-1. For example, equation (11) is used to determine P_generator, P_pointer, and P_T.
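The weighted average of the three distributions can be illustrated as follows. Equations (10) and (11) are not reproduced in this passage, so this sketch makes two assumptions: the three weights are fixed constants that sum to 1 (in the described method they would be computed from the hidden states), and they are applied in the order generation distribution, source-text attention, external-information attention. The function name and the example data are hypothetical.

```python
def mix_distributions(p_vocab, source_words, source_attention,
                      external_words, external_attention, weights):
    """Weighted average of the generation distribution and two attention
    distributions, accumulated per word."""
    w_gen, w_src, w_ext = weights
    mixed = {word: w_gen * p for word, p in p_vocab.items()}
    # Attention mass is credited to the word at each attended position.
    for word, a in zip(source_words, source_attention):
        mixed[word] = mixed.get(word, 0.0) + w_src * a
    for word, a in zip(external_words, external_attention):
        mixed[word] = mixed.get(word, 0.0) + w_ext * a
    return mixed


# Hypothetical example: "slipped" is out of vocabulary but appears in the
# source text and in the external information.
p_vocab = {"the": 0.5, "launch": 0.2, "delayed": 0.3}
mixed = mix_distributions(
    p_vocab,
    source_words=["launch", "slipped"], source_attention=[0.6, 0.4],
    external_words=["slipped"], external_attention=[1.0],
    weights=(0.5, 0.3, 0.2),
)
best = max(mixed, key=mixed.get)
print(mixed, best)
```

Because each input is a probability distribution and the weights sum to 1, the mixture is again a probability distribution. In this example the word favored by both attention distributions ends up with the highest output probability even though the generation distribution assigns it none.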

In some embodiments, step S608 may include determining the word with the highest probability in the output word probability distribution as the output word of the current time step.

FIG. 9 shows an exemplary flowchart of a text processing method according to an embodiment of the present application.

As shown in FIG. 9, in step S902 the source text may be encoded to obtain the source-text encoding hidden state.

In step S904, the decoding hidden state may be determined. In some embodiments, the decoding hidden state can be determined as in step S604 shown in FIG. 6, and the description is not repeated here.

In step S906, the output word probability distribution is determined based on the external information, the source-text encoding hidden state, and the decoding hidden state, and the output word is determined.

In some other embodiments, step S906 may further determine the output word of the current time step by determining and outputting the word probability distribution based on the external information.

In one implementation, step S906 may include, based on the external information, determining as candidate output words of the current time step those candidate words whose probability is greater than or equal to an output probability threshold and which belong to the external information.

For example, at least two words are determined as candidate output words at each time step, and the candidate output words are then used in the text processing of the next time step. Likewise, at least two candidate output words may be determined at the next time step.

Taking the case of two candidate output words as an example, two candidate output words a and b can be determined at time step t. The candidate output words a and b are then used in the text processing of the next time step to determine the candidate output words c and d of time step t+1.

In some embodiments, when the candidate output words of each time step are determined, the predetermined number M of words (M is 2 in the example above) with the highest output probability in the output probability distribution are determined as candidate output words. M is an integer greater than or equal to 2.

In some other embodiments, when the candidate output words of each time step are determined, the N words with the highest output probability in the output probability distribution are determined by a predefined method, and M of these N words are determined as candidate output words. N is an integer greater than M. In some implementations, the value of N may be specified in advance.

In some other implementations, an output probability threshold may be determined in advance, and M words may be selected as candidate output words from among the N words whose output probability is greater than the output probability threshold.

If none of the N words with the highest output probability belongs to the external information, the M words with the highest output probability among these N words are determined as candidate output words.

If some of the N words with the highest output probability belong to the external information and the number n of such words is greater than or equal to M, the M words that belong to the external information and have the highest output probability among these N words are determined as candidate output words. If the number n of words belonging to the external information among these N words is less than the predetermined number M, the n words belonging to the external information, together with the M-n words with the highest output probability among the remaining N-n words, are determined as candidate output words.
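The three cases above (no external word among the top N, at least M of them, or fewer than M) reduce to a single rule: rank the top-N words with external words first, then keep the first M. The following sketch shows this; the function name `pick_candidates` and the example distribution are hypothetical.

```python
def pick_candidates(output_probs, external_words, n, m):
    """Choose m candidate output words from the n most probable words,
    preferring words that belong to the external information."""
    external = set(external_words)
    # The n most probable words, in descending order of probability.
    top_n = sorted(output_probs, key=output_probs.get, reverse=True)[:n]
    preferred = [w for w in top_n if w in external]   # still sorted by probability
    others = [w for w in top_n if w not in external]  # still sorted by probability
    # External words first; fill the remaining slots with the best other words.
    return (preferred + others)[:m]


# Hypothetical output probability distribution for one time step.
probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
with_external = pick_candidates(probs, ["c"], n=3, m=2)
without_external = pick_candidates(probs, ["d"], n=3, m=2)
print(with_external, without_external)
```

In the first call, "c" is inside the top three and belongs to the external information, so it is kept together with the best remaining word. In the second call, "d" lies outside the top three, so the selection falls back to the two most probable words.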

Using the candidate output words a and b output at time step t and the candidate output words c and d of time step t+1, at least four candidate output sequences ac, ad, bc, and bd can be formed. The output probability of each candidate sequence is determined as a joint probability, and the two candidate sequences with the highest output probability among ac, ad, bc, and bd are determined as the candidate texts after time step t+1.
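The expansion from a and b to the sequences ac, ad, bc, and bd can be sketched as one beam-search step. This illustration assumes that the joint probability of a sequence is the product of the prefix probability and the conditional probability of the next word; the function name `expand_beam` and all probabilities are hypothetical.

```python
def expand_beam(prefixes, next_word_probs, beam_width):
    """Extend every prefix with every candidate next word, score each
    extension by its joint probability, and keep the best beam_width."""
    scored = []
    for prefix, p_prefix in prefixes.items():
        for word, p_word in next_word_probs[prefix].items():
            # Joint probability = P(prefix) * P(word | prefix).
            scored.append((prefix + (word,), p_prefix * p_word))
    scored.sort(key=lambda item: item[1], reverse=True)
    return dict(scored[:beam_width])


# Candidates a, b at time step t and c, d at time step t+1 (hypothetical values).
prefixes = {("a",): 0.6, ("b",): 0.4}
next_word_probs = {
    ("a",): {"c": 0.5, "d": 0.3},
    ("b",): {"c": 0.9, "d": 0.1},
}
beam = expand_beam(prefixes, next_word_probs, beam_width=2)
print(beam)
```

Here bc (about 0.36) and ac (about 0.30) are retained, even though a was the more probable single word at time step t. This is the reason for carrying several candidates forward rather than committing to the best word at every step.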

In some embodiments, the candidate output sequences may also be determined based on the external information. For example, equation (6) can be used to determine a penalty value for a candidate output sequence.

In another implementation, step S906 may include determining the similarity between the external information and the source-text encoding hidden state, and determining the word to be output at the current time step based on that similarity.

For example, the external information may be encoded using the encoding network to obtain the external-information encoding hidden state.

Step S906 may include determining the similarity between the external-information encoding hidden state and the decoding hidden state. When this similarity is greater than or equal to a predefined similarity threshold, the external information may be output as the output of the current time step.

The text sequence already generated before the current time step t may have been generated from the word with the highest probability in the output probability distribution, or from several candidate output words with the highest probabilities in the output probability distribution. The candidate output words may be determined by the process described in the implementation above, which is not described again here.

When the similarity between the external-information encoding hidden state and the decoding hidden state is less than the predefined similarity threshold, the output word probability distribution of the current time step is determined from the output of the decoding network, and the output word of the current time step is determined from that distribution.

With this method, when the similarity between the output of the decoding network and the external information is sufficiently high, the output of the decoding network may be replaced directly with the external information. In this case, the text sequence determined after the output of the current time step is the result of inserting the external information after the text sequence determined after the output of the immediately preceding time step.

Then, when the next time step is processed, the decoding network encodes the external information to obtain the decoding hidden state of the next time step, so that the subsequent decoding can make use of the external information. This ensures semantic consistency between the results of subsequent decoding and the inserted external information.

When the external information is a single word, the decoding hidden state of the immediately preceding time step and the external information are processed as the input of the decoding network to obtain the decoding hidden state of the current time step.

When the external information contains multiple words, the decoding network performs several loop iterations. The input of the decoding network in the first iteration is the decoding hidden state of the immediately preceding time step and the first word of the external information. The input in each subsequent iteration is the decoding hidden state obtained in the previous iteration and the next word of the external information. Iterating in this way processes every word of the external information, and the resulting decoding hidden state, which incorporates all of the external information, is used as the decoding hidden state of the current time step.

In some implementations, once the external information has replaced the output of the decoding network and been inserted into the text processing result, the similarity comparison between the external-information encoding hidden state and the decoding hidden state is no longer performed.

As described above, when the similarity between the external-information encoding hidden state and the decoding hidden state is less than the predefined similarity threshold, the output of the decoding network is not replaced with the external information, and the output is determined from the output word probability distribution. In this case, to raise the probability that the external information appears in the final text processing result, the similarity threshold of the current time step is adjusted to obtain an adjusted similarity threshold. The adjusted similarity threshold is smaller than the similarity threshold of the current time step and is used as the similarity threshold of the next time step.

For example, equation (7) is used to adjust the similarity threshold.

By applying a monotonically decreasing adjustment to the similarity threshold at each time step, the similarity threshold is reduced to a lower level over the course of the text processing, which increases the probability that the similarity between the external information and the output of the decoding network exceeds the similarity threshold of the current time step. In other words, the probability that the external information appears in the final text processing result increases.
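The interplay between the similarity test and the decreasing threshold can be sketched as follows. The passage names neither the similarity measure nor the form of equation (7), so this sketch assumes cosine similarity and a multiplicative decay factor; both choices, the function names, and the example vectors are hypothetical. For simplicity the decoding hidden states are given as a list up front, whereas a real decoder would produce them one step at a time.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two vectors (an assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def find_insertion_step(external_state, decoder_states, threshold, decay):
    """Return (step, threshold) for the first time step at which the external
    information would be inserted, or (None, threshold) if it never is."""
    for t, s_t in enumerate(decoder_states):
        if cosine_similarity(external_state, s_t) >= threshold:
            # Insert here; no further comparisons are made afterwards.
            return t, threshold
        # Not inserted: lower the threshold for the next time step.
        threshold *= decay
    return None, threshold


external_state = [1.0, 0.0]                       # external-information hidden state
decoder_states = [[0.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

step, final_threshold = find_insertion_step(external_state, decoder_states, 0.9, decay=0.8)
no_decay_step, _ = find_insertion_step(external_state, decoder_states, 0.9, decay=1.0)
print(step, no_decay_step)
```

With the decaying threshold the similarity of about 0.71 at the third state is enough to trigger insertion, because the threshold has dropped from 0.9 to about 0.58 by then. With a constant threshold the external information is never inserted for the same states.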

FIG. 10 shows an exemplary flowchart of a text processing method according to an embodiment of the present application.

In step S1002, the source text is encoded to obtain the source-text encoding hidden state.

In step S1004, the decoding hidden state is determined.

In step S1006, the output word of each time step is determined based on the source-text encoding hidden state and the decoding hidden state, and a candidate text is determined.

In step S1008, the candidate text is post-processed based on the external information to determine an output text that includes the external information.

When the result output from step S1006 does not include the external information, that result is taken as the candidate text, and the candidate text is post-processed based on the external information to determine an output text that includes the external information.

一部の実施例において、外部情報は予め指定された情報を含んでもよい。例えば、外部情報は予め指定された文であってもよく、予め指定された単語を含むソーステキストにおける文であってもよい。 In some embodiments, the external information may include pre-designated information. For example, the external information may be a pre-specified sentence or a sentence in a source text containing a pre-specified word.

前記予め指定された外部情報が文である場合、前記候補テキストにおける文と前記外部情報の類似度を決定してもよい。前記類似度が所定の候補類似度閾値より大きい場合、前記候補テキストにおける前記文を前記外部情報に置き換えてもよい。 When the pre-designated external information is a sentence, the similarity between the sentence in the candidate text and the external information may be determined. When the similarity is larger than a predetermined candidate similarity threshold, the sentence in the candidate text may be replaced with the external information.

事前決定される外部情報が単語である場合、前記候補テキストにおける文と前記外部情報の類似度を決定し、前記類似度が所定の候補類似度閾値より大きい場合、前記候補テキストにおける前記文を前記外部情報に置き換えてもよい。 When the pre-determined external information is a word, the similarity between the sentence in the candidate text and the external information is determined, and when the similarity is larger than the predetermined candidate similarity threshold, the sentence in the candidate text is referred to as the sentence. It may be replaced with external information.

In some implementations, when the similarity is greater than the predetermined candidate similarity threshold, the sentence in the candidate text may be deleted and replaced either with the sentence serving as the external information or with a sentence containing the word serving as the external information.

In some other implementations, when the similarity is less than the predetermined candidate similarity threshold, the external information is inserted into the candidate text based on the correlation, in the source text, between the external information and the sentences in the candidate text.
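The post-processing described above (replace a sufficiently similar candidate sentence, otherwise insert the external information) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the description does not fix a similarity measure or an insertion rule, so the token-overlap (Jaccard) similarity, the function names `jaccard` and `post_process`, the default threshold of 0.5, and the choice to insert directly after the most similar candidate sentence are all hypothetical stand-ins.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1].

    Assumption: a simple stand-in for the similarity measure,
    which the description leaves unspecified.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def post_process(candidate: list[str], external: str,
                 threshold: float = 0.5) -> list[str]:
    """Return a copy of the candidate sentences that contains `external`.

    - If the most similar candidate sentence exceeds the threshold,
      it is replaced with the external information.
    - Otherwise the external information is inserted right after that
      sentence (assumption: similarity is used here as a proxy for the
      correlation-based placement mentioned in the description).
    """
    if not candidate:
        return [external]
    scores = [jaccard(sentence, external) for sentence in candidate]
    best = max(range(len(candidate)), key=scores.__getitem__)
    result = list(candidate)
    if scores[best] > threshold:
        result[best] = external            # replace the similar sentence
    else:
        result.insert(best + 1, external)  # insert next to the closest one
    return result


if __name__ == "__main__":
    summary = ["the cat sat on the mat", "dogs bark loudly"]
    print(post_process(summary, "the cat sat on a mat"))
    print(post_process(summary, "birds sing"))
```

In this sketch a similarity exactly equal to the threshold leads to insertion rather than replacement; the description only covers the strictly greater and strictly smaller cases, so that tie-breaking choice is also an assumption.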

With the text processing method of the present application, the content of the external information can be effectively added to the text processing result, and the expected external information content is guaranteed to be added to the text processing result.

With the text processing method of the present application, during text generation the attention distribution of the current time step is determined using the external information and/or the output word of the current time step is determined based on the external information. The content of the external information can thereby be effectively taken into account during text processing, the probability of generating the external information during text generation is increased, and the quality of the text generated when external information is considered is improved.
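One way of determining the output word of the current time step based on the external information, as recited in the claims, is to prefer candidate words whose probability is at or above an output probability threshold and which belong to the external information. The sketch below illustrates only that selection rule. The function name `select_candidates`, the dictionary representation of the probability distribution, and the fallback of filling any remaining slots with the most probable other words are assumptions made for illustration and are not taken from the application.

```python
def select_candidates(word_probs: dict[str, float],
                      external_words: set[str],
                      prob_threshold: float,
                      num_candidates: int) -> list[str]:
    """Pick candidate output words for the current time step.

    Words that belong to the external information AND have probability
    >= prob_threshold are selected first. Assumption: remaining slots
    are filled with the most probable other words.
    """
    preferred = [w for w, p in word_probs.items()
                 if p >= prob_threshold and w in external_words]
    preferred.sort(key=lambda w: word_probs[w], reverse=True)

    selected = preferred[:num_candidates]
    ranked = sorted(word_probs, key=lambda w: word_probs[w], reverse=True)
    for word in ranked:
        if len(selected) >= num_candidates:
            break
        if word not in selected:
            selected.append(word)
    return selected


if __name__ == "__main__":
    probs = {"the": 0.5, "a": 0.25, "docomo": 0.2, "tokyo": 0.05}
    print(select_candidates(probs, {"docomo", "tokyo"}, 0.1, 2))
```

With these example values, "docomo" is kept even though two other words are more probable, while "tokyo" falls below the threshold and receives no preference.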

The method or apparatus according to the embodiments of the present application may also be implemented by means of the architecture of the computing device shown in FIG. 11. As shown in FIG. 11, the computing device 1100 may include a bus 1110, one or more CPUs 1120, a read-only memory (ROM) 1130, a random access memory (RAM) 1140, a communication port 1150 connected to a network, an input/output component 1160, a hard disk 1170, and the like. A storage device in the computing device 1100, such as the ROM 1130 or the hard disk 1170, may store various data or files used in the processing and/or communication of the method for detecting a target in a video provided by the present application, as well as the program instructions executed by the CPU. The computing device 1100 may further include a user interface 1180. The architecture shown in FIG. 11 is, of course, merely exemplary; when implementing different devices, one or more components of the computing device shown in FIG. 11 may be omitted according to actual needs.

The embodiments of the present application may also be implemented as a computer-readable recording medium. Computer-readable instructions are recorded on the computer-readable recording medium according to the embodiments of the present application. When the computer-readable instructions are executed by a processor, the method according to the embodiments of the present application described with reference to the above drawings may be performed. The computer-readable recording medium may include, for example and without limitation, volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like.

As those skilled in the art will appreciate, the content described in the present application may be modified and improved in various ways. For example, the various devices or components described above may be implemented in hardware, or in software, firmware, or a combination of some or all of these.

Also, as used in the present application and the claims, words such as "a", "an", "a kind of" and/or "the" do not specifically denote the singular and may include the plural, unless the context clearly indicates otherwise. In general, the terms "include" and "comprise" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exhaustive list, and a method or device may also include other steps or elements.

Although the present application makes various references to certain units in the system according to the embodiments of the present application, any number of different units may be used and run on the user side and/or the server side. The units are merely illustrative, and different aspects of the system and method may use different units.

The present application uses flowcharts to describe the operations performed by the system according to the embodiments of the present application. The preceding or following operations need not be performed exactly in order. Rather, the steps may be processed in reverse order or simultaneously. In addition, other operations may be added to these processes, or one or more steps may be omitted from them.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those skilled in the art. Further, unless expressly so defined, terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and should not be interpreted in an idealized or overly formal sense.

The above is a description of the present invention and is not to be construed as limiting it. Although several exemplary embodiments of the present invention have been described, those skilled in the art will appreciate that various modifications may be made to the exemplary embodiments without departing from the novel teachings and advantages of the present invention. Accordingly, all such modifications are intended to be included within the scope of the present invention as defined by the claims. It should be understood that the above is a description of the present invention, which is not limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments and to other embodiments are intended to be included within the scope of the appended claims. The present invention is defined by the claims and their equivalents.

Claims

A text processing apparatus comprising:
an encoding unit configured to encode a source text to obtain a source-text encoding hidden state;
a decoding unit configured to determine a decoding hidden state; and
an output unit configured to determine an output word probability distribution based on external information, the source-text encoding hidden state, and the decoding hidden state, in order to determine an output word.

The text processing apparatus according to claim 1, wherein the output unit is configured to determine, based on the external information, a word among candidate output words that has a probability equal to or higher than an output probability threshold and that belongs to the external information as a candidate output word of the current time step.

The text processing apparatus according to claim 2, wherein the output unit is further configured to:
determine a candidate probability of each candidate output word of the current time step based on the joint probability of the candidate output word and the candidate sequence determined at the previous time step, and on the similarity between the candidate sequence determined at the previous time step and the external information; and
determine a predetermined number of candidate output words having the highest candidate probabilities as output words.

The text processing apparatus according to claim 1, wherein the encoding unit is further configured to encode the external information to obtain an external-information encoding hidden state, and
the output unit is configured to determine the similarity between the external-information encoding hidden state and the decoding hidden state and, when the similarity is equal to or greater than a similarity threshold of the current time step, to output the external information as the output word.

The text processing apparatus according to claim 4, wherein the output unit is further configured to:
when the similarity is smaller than the similarity threshold of the current time step, determine the word with the highest probability in the output word probability distribution as the output word of the current time step; and
adjust the similarity threshold of the current time step to determine an adjusted similarity threshold that is smaller than the similarity threshold of the current time step and is used as the similarity threshold of the next time step.

The text processing apparatus according to claim 1, further comprising an attention generation unit configured to determine an attention distribution of the current time step based on the external information, the source-text encoding hidden state, and the decoding hidden state,
wherein the output unit is configured to determine the output word probability distribution based on the attention distribution, the source-text encoding hidden state, and the decoding hidden state, in order to determine the output word.

The text processing apparatus according to any one of claims 1 to 6, wherein the encoding unit and the decoding unit are trained by:
encoding a training source text to obtain a training source-text encoding hidden state;
determining a training decoding hidden state;
determining an output word of the current time step based on the external information, the training source-text encoding hidden state, and the training decoding hidden state; and
adjusting parameters of the encoding unit and the decoding unit so as to minimize the difference between the training output word and the words contained in the external information.

A text processing method comprising:
encoding a source text to obtain a source-text encoding hidden state;
determining a decoding hidden state; and
determining an output word probability distribution based on external information, the source-text encoding hidden state, and the decoding hidden state, in order to determine an output word.

A text processing device comprising:
a processor; and
a memory storing computer-readable program instructions,
wherein the text processing method according to claim 8 is performed when the computer-readable program instructions are executed by the processor.

A computer-readable recording medium on which computer-readable instructions are recorded, wherein the instructions, when executed by a computer, cause the computer to perform the text processing method according to claim 8.