JP2020187729A

JP2020187729A - Text processing method, apparatus, device, and storage medium

Info

Publication number: JP2020187729A
Application number: JP2020033292A
Authority: JP
Inventors: シーホングオ; Xihong Guo; シンユグオ; xin yu Guo; アンシンリー; Anxin Li; ランチェン; Lan Chen
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2019-05-15
Filing date: 2020-02-28
Publication date: 2020-11-19
Also published as: CN112036152A

Abstract

To provide a text processing method configured to reduce duplicated sentences in an output text, a text processing apparatus, a text processing device, and a computer-readable storage medium. SOLUTION: A text processing method includes: acquiring an input text; analyzing the input text to obtain an analysis result corresponding to the input text; dividing the input text into a plurality of sections by means of clustering; and generating an output text on the basis of the sections and the analysis result. SELECTED DRAWING: Figure 1

Description

The present application relates to the field of text processing, and in particular to a text processing method, a text processing apparatus, a text processing device, and a computer-readable storage medium.

A deep neural network is a large-scale, multi-parameter optimization tool. Because it can use large amounts of training data to learn hidden features that are difficult to summarize, a deep neural network can accomplish many complex tasks, such as face detection, semantic image segmentation, text summarization, object detection, motion tracking, and natural language translation.

Text summarization refers to generating a text summary by generalizing and abstracting, at a high level, text content that has a clear meaning. Conventional text summarization methods depend heavily on the specific content of the text (meaning of expressions, sentence structure, rhetorical devices, narrative style, and so on). Consequently, performance varies when different summarization methods are applied to different texts (for example, texts of different lengths). In addition, because different expressions of the same information may appear several times in a document, the summary sentences extracted during summarization may duplicate one another.

In view of the above problems, the present disclosure provides a text processing method, a text processing apparatus, a text processing device, and a computer-readable storage medium.

According to one aspect of the present disclosure, a text processing method based on a neural network is provided. The method includes: acquiring an input text; analyzing the input text to obtain an analysis result corresponding to the input text; dividing the input text into a plurality of sections by means of clustering; and generating an output text based on the plurality of sections and the analysis result.

According to one aspect of the present disclosure, dividing the input text into a plurality of sections by means of clustering includes: an initialization step of initializing a plurality of central sentences corresponding to the plurality of sections; an update step of calculating the similarity between the constituent sentences of the input text and the plurality of central sentences, assigning each constituent sentence of the input text to the section corresponding to one of the central sentences based on the similarity, and updating the constituent sentences included in the plurality of sections; a determination step of calculating, in each of the plurality of sections, the similarity between the constituent sentences and determining the constituent sentence with the largest total similarity as a new central sentence; and a repetition step of repeating the update step and the determination step until the new central sentences no longer change.

According to one aspect of the present disclosure, analyzing the input text to obtain an analysis result corresponding to the input text includes analyzing all constituent sentences of the input text and obtaining the sentence weight of each constituent sentence as the analysis result.

According to one aspect of the present disclosure, generating an output text based on the plurality of sections and the analysis result includes: selecting, in each of the plurality of sections and based on the sentence weights of all the constituent sentences, the constituent sentence with the largest sentence weight in that section as the output result corresponding to that section; and combining the output results of the plurality of sections to generate the output text.

According to one aspect of the present disclosure, the neural network includes one text processing layer, and the number of sections obtained by dividing the input text by clustering is determined by a predetermined target number of sentences of the output text of the text processing layer.

According to one aspect of the present disclosure, the neural network includes N (N ≥ 2) cascaded text processing layers. The number of sections that the n-th of the N cascaded text processing layers obtains by dividing its input text by clustering is determined by a predetermined target number of sentences of the output text of the n-th text processing layer.

According to another aspect of the present disclosure, a text processing apparatus based on a neural network is provided. The text processing apparatus includes: an acquisition unit that acquires an input text; an analysis unit that analyzes the input text to obtain an analysis result corresponding to the input text; a division unit that divides the input text into a plurality of sections by means of clustering; and a generation unit that generates an output text based on the plurality of sections and the analysis result.

According to another aspect of the present disclosure, the division unit initializes a plurality of central sentences corresponding to the plurality of sections; calculates the similarity between the constituent sentences of the input text and the plurality of central sentences, assigns each constituent sentence of the input text to the section corresponding to one of the central sentences based on the similarity, and updates the constituent sentences included in the plurality of sections; calculates, in each of the plurality of sections, the similarity between the constituent sentences and determines the constituent sentence with the largest total similarity as a new central sentence; and repeats the above process until the new central sentences no longer change.

According to another aspect of the present disclosure, the analysis unit analyzes all constituent sentences of the input text and obtains the sentence weight of each constituent sentence as the analysis result.

According to another aspect of the present disclosure, the output unit selects, in each of the plurality of sections and based on the sentence weights of all the constituent sentences, the constituent sentence with the largest sentence weight in that section as the output result corresponding to that section, and combines the output results of the plurality of sections to generate the output text.

According to another aspect of the present disclosure, the neural network includes one text processing layer, and the number of sections obtained by dividing the input text by clustering is determined by a predetermined target number of sentences of the output text of the text processing layer.

According to another aspect of the present disclosure, the neural network includes N (N ≥ 2) cascaded text processing layers. The number of sections that the n-th of the N cascaded text processing layers obtains by dividing its input text by clustering is determined by a predetermined target number of sentences of the output text of the n-th text processing layer.

According to yet another aspect of the present disclosure, a text processing device based on a neural network is provided. The text processing device includes a memory configured to store computer-readable instructions and a processor configured to execute the computer-readable instructions stored in the memory. When executed by the processor, the computer-readable instructions cause the following to be performed: acquiring an input text; analyzing the input text to obtain an analysis result corresponding to the input text; dividing the input text into a plurality of sections by means of clustering; and generating an output text based on the plurality of sections and the analysis result.

According to yet another aspect of the present disclosure, a computer-readable storage medium is provided. The computer-readable storage medium stores computer-readable instructions. When the computer-readable instructions are executed by a computer, the computer performs a text processing method. The method includes: acquiring an input text; analyzing the input text to obtain an analysis result corresponding to the input text; dividing the input text into a plurality of sections by means of clustering; and generating an output text based on the plurality of sections and the analysis result.

In the above aspects of the present disclosure, the input text is divided into a plurality of sections by clustering and an analysis result is obtained for each of the sections, which reduces duplicated sentences in the resulting output text and thus makes the output text more concise and clear.

The above and other objects, features, and advantages of the present invention will become more apparent from the following more detailed description of embodiments of the present disclosure with reference to the drawings. The drawings are intended to provide a further understanding of the embodiments of the present invention, form part of the specification, and serve together with the embodiments to explain the present disclosure; they do not limit the present disclosure. In the drawings, the same reference numerals always denote the same components or steps.

FIG. 1 is a flowchart of a text processing method according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram of obtaining the weights of the constituent sentences of an input text based on sentence vectors and vocabulary vectors according to an embodiment of the present disclosure.
FIG. 3 is a schematic diagram of applying clustering to a neural-network-based text processing method according to an embodiment of the present disclosure.
FIG. 4 is a flowchart of a method for dividing an input text using clustering according to an embodiment of the present disclosure.
FIG. 5 is a schematic diagram of dividing an input text using clustering according to an embodiment of the present disclosure.
FIG. 6 is an exemplary schematic diagram of obtaining the analysis results corresponding to each of a plurality of sections according to an embodiment of the present disclosure.
FIG. 7 is a schematic diagram of a text processing apparatus according to an embodiment of the present disclosure.
FIG. 8 is a schematic diagram of a text processing device according to an embodiment of the present disclosure.
FIG. 9 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present disclosure.

The technical solutions according to the embodiments of the present disclosure are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art based on the embodiments of the present disclosure without creative effort fall within the scope of the present disclosure.

First, a text processing method 100 for implementing an embodiment of the present disclosure is described with reference to FIG. 1. In the present disclosure, the input text is divided into a plurality of sections by clustering so that constituent sentences of the same kind are grouped together, and the analysis results extracted from the respective sections are then combined into the output text. This achieves the aim of removing duplicated sentences from the input text, making the output text more concise and clear.

As shown in FIG. 1, in step S101, an input text is acquired.

The input text is the original text to be processed, from which a summary with a desired number of characters or a desired number of sentences is generated.

In step S102, the input text is analyzed to obtain an analysis result corresponding to the input text.

Many different approaches exist in the prior art. For text summarization, for example, there are various methods such as the position method, the phrase method, the title method, and the keyword method. In step S102, for example, a different weight may be assigned to each text processing method, all constituent sentences of the input text may be analyzed with each text processing method, and the weights of all constituent sentences of the input text may be obtained as the analysis result.
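The weighting described for step S102 can be sketched as a weighted sum of per-method scores. The source names the methods but gives no formulas, so the two scoring heuristics below (`position_score`, `make_keyword_score`) and all parameter names are hypothetical stand-ins chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence

# A scorer maps (sentence index, all sentences) to a score for that sentence.
Scorer = Callable[[int, Sequence[str]], float]


def position_score(i: int, sentences: Sequence[str]) -> float:
    """Assumed heuristic: earlier sentences score higher."""
    return 1.0 - i / len(sentences)


def make_keyword_score(keywords: Sequence[str]) -> Scorer:
    """Assumed heuristic: fraction of a sentence's words that are keywords."""
    wanted = {k.lower() for k in keywords}

    def score(i: int, sentences: Sequence[str]) -> float:
        words = sentences[i].lower().split()
        if not words:
            return 0.0
        return sum(w in wanted for w in words) / len(words)

    return score


def sentence_weights(
    sentences: Sequence[str],
    scorers: Dict[str, Scorer],
    method_weights: Dict[str, float],
) -> List[float]:
    """Combine the per-method scores of every sentence into one weight."""
    return [
        sum(method_weights[name] * scorer(i, sentences)
            for name, scorer in scorers.items())
        for i in range(len(sentences))
    ]


if __name__ == "__main__":
    text = ["a b", "clustering c", "d e"]
    scorers = {"position": position_score,
               "keyword": make_keyword_score(["clustering"])}
    print(sentence_weights(text, scorers, {"position": 0.5, "keyword": 0.5}))
```

The resulting list holds one weight per constituent sentence and plays the role of the analysis result used later in step S104.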

Alternatively, the weight of each constituent sentence may be obtained using the sentence vectors of the constituent sentences of the input text. As a further alternative, vocabulary vectors may be obtained from all constituent sentences of the input text, the sentence vectors obtained by analyzing the input text with different processing methods may be combined with the vocabulary vectors, and features may then be extracted with a method such as VGG-16 to obtain the weight of each constituent sentence. Analysis based on both sentence vectors and vocabulary vectors refines the analysis from sentence-level granularity down to vocabulary-level granularity, which improves the recognition of unusual sentences. Moreover, using VGG-16 makes it possible to better capture the relationships between feature values and to extract finer features.

FIG. 2 is a schematic diagram of obtaining the weights of the constituent sentences of the input text based on sentence vectors and vocabulary vectors. As shown in FIG. 2, first, all constituent sentences 70 ({S_1, S_2, S_3, ..., S_7}) of the input text are input to the neural network. Next, the neural network analyzes all constituent sentences 70 of the input text using different processing methods and generates a sentence vector 71 for each constituent sentence. Next, the vocabulary vectors 72 of all constituent sentences of the input text are obtained. Then, based on the obtained sentence vectors 71 and vocabulary vectors 72, features 74 ({V_1, V_2, V_3, ..., V_7}) are extracted using VGG-16; the features 74 correspond to the weights of the respective constituent sentences 70 of the input text.

It should be understood that the method of obtaining the weights of the constituent sentences is not limited to the above, and any other suitable method may be adopted.

Next, in step S103, the input text is divided into a plurality of sections by means of clustering.

Here, the number of sections obtained by dividing the input text by clustering is determined by a predetermined target number of sentences of the output text. For example, if the predetermined target number of sentences of the desired output text is M, the number of sections obtained by dividing the input text by clustering is also M.

FIG. 3 is a schematic diagram of applying clustering to a neural-network-based text processing method. As shown in FIG. 3, assuming that the neural network includes N (N ≥ 2) cascaded text processing layers, the number of sections that the n-th of the N cascaded text processing layers obtains by dividing its input text by clustering is determined by a predetermined target number of sentences of the output text of the n-th text processing layer. For example, the number of sections that text processing layer 1 (denoted 52 in the figure) obtains by dividing the input text 51 using clustering 50 may be determined by the predetermined target number of sentences of output text 1 (denoted 53 in the figure) of text processing layer 1, and the number of sections that text processing layer 2 (denoted 54 in the figure) obtains by dividing its input text (that is, output text 1 in the figure) using clustering 56 may be determined by the predetermined target number of sentences of output text 2 (denoted 55 in the figure) of text processing layer 2.

It should be understood that the neural network may instead include only one text processing layer, in which case the number of sections obtained by dividing the input text by clustering is determined by the predetermined target number of sentences of the output text of that text processing layer.
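The relationship between cascaded layers and section counts can be illustrated with a small driver loop: layer n receives the output of layer n-1 and uses as many sections as its own target sentence count. The function names are hypothetical, and `stand_in_layer` is only a placeholder that keeps the first sentence of each contiguous chunk; it is not the clustering-based layer of the disclosure.

```python
from typing import Callable, List, Sequence

# A layer takes (sentences, number of sections) and returns its output sentences.
Layer = Callable[[Sequence[str], int], List[str]]


def stand_in_layer(sentences: Sequence[str], num_sections: int) -> List[str]:
    """Placeholder layer: split into contiguous chunks, keep each chunk's head."""
    n = len(sentences)
    starts = [k * n // num_sections for k in range(num_sections)]
    return [sentences[s] for s in starts]


def run_cascade(sentences: Sequence[str],
                target_counts: Sequence[int],
                layer: Layer) -> List[str]:
    """Apply the layers in order; each target count fixes that layer's section count."""
    text = list(sentences)
    for n, target in enumerate(target_counts, start=1):
        if not 1 <= target <= len(text):
            raise ValueError(f"layer {n}: target {target} not in 1..{len(text)}")
        num_sections = target  # one output sentence per section
        text = layer(text, num_sections)
    return text


if __name__ == "__main__":
    doc = [f"s{k}" for k in range(8)]
    print(run_cascade(doc, [4, 2], stand_in_layer))
```

A single-layer network corresponds to passing a one-element `target_counts` list.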

Once the number of sections into which the input text is to be divided by clustering has been obtained, the input text is divided by clustering.

FIG. 4 is a flowchart of a method 200 for dividing the input text using clustering according to an embodiment of the present disclosure.

As shown in FIG. 4, in step S201, a plurality of central sentences corresponding to the plurality of sections are initialized.

If the number of sections is M, one central sentence must be selected for each section, so M central sentences must be selected. For example, the first M sentences of the input text may be selected as the initial central sentences, or M sentences may be selected at random from the input text as the initial central sentences. It should be understood that the method of initializing the central sentences is not limited to these, and any other suitable method may be adopted.
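Both initialization options mentioned for step S201 can be sketched in a few lines. The function name `init_centers`, the `strategy` argument, and the optional `seed` are assumptions introduced for this illustration; sentences are referred to by their 0-based index.

```python
import random
from typing import List, Optional


def init_centers(num_sentences: int,
                 num_sections: int,
                 strategy: str = "first",
                 seed: Optional[int] = None) -> List[int]:
    """Return the indices of the M initial central sentences."""
    if not 1 <= num_sections <= num_sentences:
        raise ValueError("num_sections must be between 1 and num_sentences")
    if strategy == "first":
        # Take the first M sentences of the input text.
        return list(range(num_sections))
    if strategy == "random":
        # Draw M distinct sentences at random.
        return random.Random(seed).sample(range(num_sentences), num_sections)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    print(init_centers(7, 3))
    print(init_centers(7, 3, strategy="random", seed=0))
```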

In step S202, the similarity between the constituent sentences of the input text and the plurality of central sentences is calculated, the constituent sentences of the input text are assigned to the sections corresponding to the central sentences based on the similarity, and the constituent sentences included in the plurality of sections are updated.

For example, the similarity (or distance) can be calculated with the Rouge-1 method. For example, the similarity f_ij between the i-th sentence and the j-th sentence can be expressed as follows.

[Equation missing in the source text.]

Here, S_i denotes the number of p-grams in the i-th sentence (for example, p = 2 denotes bigrams and p = 3 denotes trigrams), and R_ij denotes the number of p-grams that occur in both the i-th sentence and the j-th sentence. The larger f_ij is, the higher the similarity between the i-th sentence and the j-th sentence. It should be understood that the method of calculating the similarity is not limited to this, and any other suitable method may be adopted.
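The expression for f_ij is not preserved in the text, so the following sketch only illustrates a p-gram overlap similarity built from the two quantities that are defined, S_i and R_ij. The normalization 2·R_ij / (S_i + S_j) is an assumption made for this example, as are the whitespace tokenization and the function names; it should not be read as the formula of the disclosure.

```python
from collections import Counter


def pgrams(sentence: str, p: int = 1) -> Counter:
    """Count the p-grams (tuples of p consecutive tokens) in a sentence."""
    tokens = sentence.lower().split()
    return Counter(tuple(tokens[k:k + p]) for k in range(len(tokens) - p + 1))


def similarity(sent_i: str, sent_j: str, p: int = 1) -> float:
    """Overlap similarity; the normalization is an ASSUMED Dice-style form."""
    grams_i, grams_j = pgrams(sent_i, p), pgrams(sent_j, p)
    s_i = sum(grams_i.values())   # S_i: number of p-grams in sentence i
    s_j = sum(grams_j.values())   # S_j: number of p-grams in sentence j
    if s_i + s_j == 0:
        return 0.0
    r_ij = sum((grams_i & grams_j).values())  # R_ij: shared p-grams
    return 2.0 * r_ij / (s_i + s_j)


if __name__ == "__main__":
    print(similarity("the cat sat", "the cat ran"))
    print(similarity("the cat sat", "the cat ran", p=2))
```

With this assumed normalization the value is symmetric and lies between 0 and 1, so a larger value still means a higher similarity, as the text requires.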

Next, by calculating the similarity between the constituent sentences of the input text and the plurality of central sentences, each constituent sentence of the input text is assigned to the section corresponding to the central sentence to which it is most similar.

For example, assume that the constituent sentences of the input text are T = {t_1, t_2, t_3, ..., t_z}, that the number of sections obtained by dividing the input text by clustering is M = 2 (that is, a first section and a second section), and that the first two sentences t_1 and t_2 are selected as the central sentences of the two sections. Then the similarities {f_31, f_41, ..., f_z1} between the remaining constituent sentences {t_3, t_4, ..., t_z} and the central sentence t_1, and the similarities {f_32, f_42, ..., f_z2} between {t_3, t_4, ..., t_z} and the central sentence t_2, are calculated. By comparing f_31 with f_32, the constituent sentence t_3 is assigned to the section corresponding to the more similar central sentence (for example, t_3 may be assigned to the first section if f_31 > f_32 and to the second section if f_31 < f_32); likewise, by comparing f_41 with f_42, the constituent sentence t_4 may be assigned to the section corresponding to the more similar central sentence.
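The assignment in this example can be sketched with a precomputed similarity matrix, where `sim[i][j]` stands for f_ij with 0-based indices. The function name and the tie-breaking rule (the earlier central sentence wins when similarities are equal) are assumptions; the text does not say how ties are handled.

```python
from typing import Dict, List, Sequence


def assign_to_sections(sim: Sequence[Sequence[float]],
                       centers: Sequence[int]) -> Dict[int, List[int]]:
    """Map each central sentence index to the sentence indices in its section."""
    sections: Dict[int, List[int]] = {c: [c] for c in centers}
    for i in range(len(sim)):
        if i in sections:          # central sentences stay in their own section
            continue
        # Pick the central sentence most similar to sentence i.
        best = max(centers, key=lambda c: sim[i][c])
        sections[best].append(i)
    return sections


if __name__ == "__main__":
    # Four sentences; sentences 0 and 1 are the initial central sentences.
    sim = [
        [1.0, 0.2, 0.8, 0.1],
        [0.2, 1.0, 0.3, 0.6],
        [0.8, 0.3, 1.0, 0.2],
        [0.1, 0.6, 0.2, 1.0],
    ]
    print(assign_to_sections(sim, [0, 1]))
```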

After all constituent sentences have been assigned, the constituent sentences included in the plurality of sections are updated.

Next, in step S203, in each of the plurality of sections, the similarity between the constituent sentences is calculated, and the constituent sentence with the largest total similarity is determined as the new central sentence.

For example, assuming that the constituent sentences included in the first section are {t_3, t_5, t_6}, the similarities {f_35, f_36, f_56} between the constituent sentences are calculated, and the constituent sentence with the largest total similarity is selected as the new central sentence of this section (for example, the total similarity of t_3 is (f_35 + f_36), that of t_5 is (f_53 + f_56), and that of t_6 is (f_65 + f_63)).
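The selection of a new central sentence amounts to picking, within one section, the member whose summed similarity to the other members is largest. A minimal sketch, assuming a similarity matrix `sim` with 0-based indices and a hypothetical function name:

```python
from typing import Sequence


def new_center(sim: Sequence[Sequence[float]], members: Sequence[int]) -> int:
    """Return the member with the largest total similarity to the other members."""
    def total(i: int) -> float:
        return sum(sim[i][j] for j in members if j != i)

    # On ties, max() keeps the first such member (an assumption, not from the source).
    return max(members, key=total)


if __name__ == "__main__":
    sim = [
        [1.0, 0.2, 0.5],
        [0.2, 1.0, 0.9],
        [0.5, 0.9, 1.0],
    ]
    # Totals: sentence 0 -> 0.7, sentence 1 -> 1.1, sentence 2 -> 1.4
    print(new_center(sim, [0, 1, 2]))
```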

Next, in step S204, steps S202 and S203 are repeated until the new central sentences no longer change or the number of iterations reaches a predetermined threshold, that is, until each of the plurality of sections becomes stable. In this way, the constituent sentences of each section are determined, and clustering is complete. It should be understood that the clustering method is not limited to similarity calculation, and any other suitable clustering method may be adopted.
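Taken together, steps S201 to S204 can be read as a loop in the style of k-medoids clustering. The sketch below is one possible reading under stated assumptions: the first M sentences are the initial central sentences, `sim[i][j]` is a precomputed similarity, ties go to the earlier candidate, and `max_iters` (at least 1) is the iteration threshold. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def cluster_sentences(sim: Sequence[Sequence[float]],
                      num_sections: int,
                      max_iters: int = 100) -> Tuple[List[int], List[List[int]]]:
    """Return (central sentence indices, sections as lists of sentence indices)."""
    n = len(sim)
    centers = list(range(num_sections))            # S201: first M sentences
    sections: List[List[int]] = []
    for _ in range(max_iters):
        # S202: assign every non-central sentence to its most similar center.
        sections = [[c] for c in centers]
        for i in range(n):
            if i in centers:
                continue
            k = max(range(num_sections), key=lambda k: sim[i][centers[k]])
            sections[k].append(i)
        # S203: in each section, pick the member with the largest total similarity.
        updated = [
            max(m, key=lambda i: sum(sim[i][j] for j in m if j != i))
            for m in sections
        ]
        # S204: stop once the central sentences no longer change.
        if updated == centers:
            break
        centers = updated
    return centers, sections


if __name__ == "__main__":
    sim = [
        [1.0, 0.1, 0.2, 0.1],
        [0.1, 1.0, 0.8, 0.7],
        [0.2, 0.8, 1.0, 0.9],
        [0.1, 0.7, 0.9, 1.0],
    ]
    print(cluster_sentences(sim, 1))
```

If the iteration threshold is reached first, the returned sections are those of the last assignment pass, while the returned central sentences are the latest update.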

In addition, each constituent sentence in the plurality of sections may be marked. For example, each constituent sentence in the first section may be marked "0" and each constituent sentence in the second section may be marked "1". Constituent sentences with the same mark belong to the same section.

FIG. 5 is a schematic diagram of dividing the input text using clustering according to an embodiment of the present disclosure. As shown in FIG. 5, assume that the constituent sentences 21 of the input text are T = {t_1, t_2, t_3, ..., t_7}, that the number of sections obtained by dividing the input text by clustering is M = 3 (a first section 22, a second section 23, and a third section 24), and that the central sentences selected to initialize the three sections are t_1, t_2, and t_3, respectively. Next, the similarities between the remaining constituent sentences {t_4, t_5, t_6, t_7} and the central sentences {t_1, t_2, t_3} are calculated, and based on these similarities the constituent sentences {t_4, t_5, t_6, t_7} are assigned to the sections corresponding to the central sentences {t_1, t_2, t_3}. For example, as shown in FIG. 5, t_4 is assigned to the first section 25, t_6 and t_7 are assigned to the second section 26, and t_5 is assigned to the third section 27. Next, in each section, the similarity between the constituent sentences is calculated, and the constituent sentence with the largest total similarity is determined as the new central sentence. For example, in the second section 26, the similarities among t_2, t_6, and t_7 are calculated, and t_6, which has the largest total similarity, is determined as the new central sentence (shown as a circle in the figure). The above process is repeated until the central sentences no longer change.

入力テキストがクラスタリングにより複数のセクションに分割された後、つまり、入力テキストにおける類似度の高い構成文が１つのセクションに割り当てられた後、次に、割り当てられた複数のセクションに対して処理して出力テキストを取得する。 After the input text is divided into multiple sections by clustering, that is, after the highly similar constituents in the input text are assigned to one section, the next processing is performed on the assigned multiple sections. Get the output text.

図１に戻って、ステップＳ１０４では、前記複数のセクションと前記分析結果に基づいて出力テキストを生成する。 Returning to FIG. 1, in step S104, output text is generated based on the plurality of sections and the analysis result.

例えば、ステップＳ１０２で取得された入力テキストに対応する分析結果を利用し、ステップＳ１０３で取得された複数のセクションのそれぞれにおいて、当該セクションにおける文の重みが最大となる構成文を当該セクションに対応する出力結果として選択し、各セクションに対応する分析結果をそれぞれ取得した後、複数のセクションの出力結果を結合して出力テキストを生成する。 For example, using the analysis result corresponding to the input text acquired in step S102, in each of the plurality of sections acquired in step S103, the constituent sentence having the maximum sentence weight in the section corresponds to the section. Select as the output result, obtain the analysis result corresponding to each section, and then combine the output results of multiple sections to generate the output text.
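
The selection in step S104 can be illustrated as follows. This is a sketch under stated assumptions: `generate_output` is a hypothetical name, the sentence weights are made-up numbers, and joining the chosen sentences in their original document order is an assumption, since the description only says that the per-section results are combined.

```python
def generate_output(sentences, weights, sections):
    """Pick the heaviest sentence of each section and join the picks."""
    chosen = []
    for members in sections:
        # Sentence with the maximum weight inside this section.
        best = max(members, key=lambda i: weights[i])
        chosen.append(best)
    chosen.sort()  # assumption: keep original document order
    return " ".join(sentences[i] for i in chosen)


sentences = ["A.", "B.", "C.", "D.", "E.", "F.", "G."]
weights = [0.2, 0.9, 0.1, 0.7, 0.4, 0.3, 0.5]   # illustrative values only
sections = [[0, 3], [1, 5, 6], [2, 4]]          # illustrative partition
output_text = generate_output(sentences, weights, sections)
```

Because exactly one sentence is taken from each section, the output has as many sentences as there are sections, and near-duplicate sentences that were clustered together contribute only once.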

図６は、複数のセクションのそれぞれに対応する分析結果を取得する例示的な模式図を示す。 FIG. 6 shows an exemplary schematic diagram for obtaining analysis results corresponding to each of the plurality of sections.

図６に示すように、まず、テキスト６０はニューラルネットワークに入力される。その後、クラスタリング６５を利用して入力テキスト６０を複数のセクションに分割してもよく、当該複数のセクションの数は前記テキスト処理レイヤ６１の出力テキスト６６の所定の目標文数によって決定される。同時に、異なる処理方法６４を利用して入力テキストのすべての構成文を分析し、各構成文の文ベクトルを生成し、入力テキスト６０のすべての構成文によって各構成文の語彙ベクトル６２を取得し、次に、取得した文ベクトルと語彙ベクトルに基づいて、ＶＧＧ−１６を利用して特徴量を抽出し、当該特徴量は、入力テキスト６０の各構成文の重みに対応し、かつクラスタリングにより分割して得られる複数のセクションにおける各構成文の重みにも対応する。次に、複数のセクションのそれぞれにおいて、文の重みが最大となる構成文を、当該セクションに対応する分析結果として選択する。最後に、複数のセクションの出力結果を結合して出力テキスト６６を生成する。 As shown in FIG. 6, first, the text 60 is input to the neural network. After that, the input text 60 may be divided into a plurality of sections by using the clustering 65, and the number of the plurality of sections is determined by a predetermined target number of sentences of the output text 66 of the text processing layer 61. At the same time, all the constituent sentences of the input text are analyzed using different processing methods 64, a sentence vector of each constituent sentence is generated, and the vocabulary vector 62 of each constituent sentence is acquired by all the constituent sentences of the input text 60. Next, based on the acquired sentence vector and vocabulary vector, a feature amount is extracted using VGG-16, and the feature amount corresponds to the weight of each constituent sentence of the input text 60 and is divided by clustering. It also corresponds to the weight of each constituent sentence in the plurality of sections obtained by. Next, in each of the plurality of sections, the constituent sentence having the maximum sentence weight is selected as the analysis result corresponding to the section. Finally, the output results of the plurality of sections are combined to generate the output text 66.

本開示に基づいて、文ベクトル及び語彙ベクトルに基づいて分析することにより、文の粒度の深さから語彙の粒度の深さまで細分化することができ、よって特別な文の認識性を向上することができる。さらに、ＶＧＧ−１６を利用することにより、特徴値間の関係をよりよく把握することができ、より精細な特徴を抽出することができる。さらに、クラスタリングを利用して入力テキストを複数のセクションに分割することにより、同じ種別の構成文を同一のセクションに分割し、そして複数のセクションから要約を抽出し、重複する文を削除する目的を達成することができる。 By analyzing based on the sentence vector and the vocabulary vector based on the present disclosure, it is possible to subdivide from the depth of sentence granularity to the depth of vocabulary granularity, thereby improving the recognition of special sentences. Can be done. Further, by using VGG-16, the relationship between the feature values can be better grasped, and more detailed features can be extracted. Furthermore, by using clustering to divide the input text into multiple sections, the purpose is to divide the same type of constituent sentences into the same section, and extract summaries from multiple sections to remove duplicate sentences. Can be achieved.

以下、図７を参照して、本開示の一実施形態によるテキスト処理装置１０００について説明する。図７は、本開示の一実施形態によるテキスト処理装置１０００の模式図である。本実施形態のテキスト処理装置の機能は、図１を参照して上述した方法の詳細と同じであるため、便宜上、ここでは同じ内容に対する詳細な説明を省略する。 Hereinafter, the text processing apparatus 1000 according to the embodiment of the present disclosure will be described with reference to FIG. 7. FIG. 7 is a schematic view of the text processing device 1000 according to the embodiment of the present disclosure. Since the function of the text processing apparatus of this embodiment is the same as the details of the method described above with reference to FIG. 1, detailed description of the same contents will be omitted here for convenience.

図７に示すように、本開示の一実施形態によるテキスト処理装置１０００は、取得部１００１、分析部１００２、分割部１００３、及び生成部１００４を備える。なお、テキスト処理装置１０００は、図７では４つの部品のみを備えるように示されているが、これは例示的なものに過ぎず、テキスト処理装置１０００は、１つ以上の他の部品も備えてもよい。これらの部品は本発明の思想と関係がないため、ここでは省略する。 As shown in FIG. 7, the text processing apparatus 1000 according to an embodiment of the present disclosure includes an acquisition unit 1001, an analysis unit 1002, a division unit 1003, and a generation unit 1004. Although FIG. 7 shows the text processing apparatus 1000 with only four components, this is merely an example, and the text processing apparatus 1000 may also include one or more other components. Because those components are unrelated to the idea of the present invention, they are omitted here.
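
One way to picture the four units is as four callables wired together in sequence. The sketch below is only an illustration: the class name, field names, and the toy stand-in functions are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TextProcessingApparatus:
    # Each field stands in for one unit of FIG. 7 (names are hypothetical).
    acquire: Callable[[], List[str]]                    # acquisition unit 1001
    analyze: Callable[[List[str]], List[float]]         # analysis unit 1002
    divide: Callable[[List[str]], List[List[int]]]      # division unit 1003
    generate: Callable[[List[str], List[float], List[List[int]]], str]  # 1004

    def run(self) -> str:
        sentences = self.acquire()
        weights = self.analyze(sentences)
        sections = self.divide(sentences)
        return self.generate(sentences, weights, sections)


# Toy stand-ins: weight = word count, fixed partition, max-weight pick.
apparatus = TextProcessingApparatus(
    acquire=lambda: ["a b", "c d e", "f"],
    analyze=lambda s: [float(len(x.split())) for x in s],
    divide=lambda s: [[0, 1], [2]],
    generate=lambda s, w, secs: " / ".join(
        s[max(sec, key=lambda i: w[i])] for sec in secs
    ),
)
result = apparatus.run()
```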

図７に示すように、取得部１００１は、入力テキストを取得する。 As shown in FIG. 7, the acquisition unit 1001 acquires the input text.

当該入力テキストは、処理を行うと予想される元のテキストであり、当該元のテキストによって、予想の語彙数または予想の文数の要約が生成される。 The input text is the original text that is expected to be processed, and the original text produces a summary of the expected vocabulary or expected number of sentences.

分析部１００２は、前記入力テキストを分析し、前記入力テキストに対応する分析結果を取得する。 The analysis unit 1002 analyzes the input text and acquires the analysis result corresponding to the input text.

従来、多くの異なる処理方法がある。例えば、テキスト要約抽出については、位置方法、フレーズ方法、タイトル方法、キーワード方法などのさまざまな方法がある。例えば、分析部１００２は、各テキスト処理方法に異なる重みを割り当て、各テキスト処理方法により入力テキストのすべての構成文を分析して、入力テキストのすべての構成文の重みを分析結果として取得してもよい。 Many different processing methods are conventionally available. For text summary extraction, for example, there are position-based, phrase-based, title-based, and keyword-based methods, among others. For example, the analysis unit 1002 may assign a different weight to each text processing method, analyze all constituent sentences of the input text with each method, and obtain the weights of all constituent sentences of the input text as the analysis result.
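
A minimal sketch of combining several scoring methods into one weight per sentence is given below. The linear combination, the function name `sentence_weights`, and all scores and method weights are assumptions made for illustration; the description does not fix how the per-method results are merged.

```python
def sentence_weights(method_scores, method_weights):
    """Combine per-method sentence scores into one weight per sentence.

    method_scores:  {method name: [score for each sentence]}
    method_weights: {method name: weight assigned to that method}
    Assumption: the combination is a weighted sum.
    """
    n = len(next(iter(method_scores.values())))
    totals = [0.0] * n
    for name, scores in method_scores.items():
        for i, score in enumerate(scores):
            totals[i] += method_weights[name] * score
    return totals


# Illustrative scores for three sentences from two hypothetical methods.
method_scores = {
    "position": [1.0, 0.5, 0.0],
    "keyword": [0.0, 1.0, 0.5],
}
method_weights = {"position": 0.25, "keyword": 0.75}
weights = sentence_weights(method_scores, method_weights)
```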

あるいは、分析部１００２は、入力テキストの構成文の文ベクトルを利用して、各構成文の重みを取得してもよい。代わりに、分析部１００２は、入力テキストのすべての構成文を利用して対応する語彙ベクトルを取得し、異なる処理方法を利用して入力テキストを分析することによって得られた構成文ベクトルを語彙ベクトルと結合して、そして、ＶＧＧ−１６などの方法を利用して特徴量を抽出して各構成文の重みを取得してもよい。文ベクトルと語彙ベクトルに基づいて分析を行うことにより、文の粒度の深さから語彙の粒度の深さまで細分化することができ、よって特別な文の認識制を向上することができる。また、ＶＧＧ−１６を利用することで、特徴値間の関係をよりよく把握することができ、よってより精細な特徴を抽出することができる。 Alternatively, the analysis unit 1002 may obtain the weight of each constituent sentence by using the sentence vectors of the constituent sentences of the input text. As a further alternative, the analysis unit 1002 may obtain the corresponding vocabulary vectors from all constituent sentences of the input text, combine them with the sentence vectors obtained by analyzing the input text with the different processing methods, and then extract feature values using a method such as VGG-16 to obtain the weight of each constituent sentence. Analyzing on the basis of both sentence vectors and vocabulary vectors refines the analysis from sentence-level granularity down to vocabulary-level granularity, which improves the recognizability of special sentences. In addition, using VGG-16 makes it possible to capture the relationships between feature values better and thus to extract finer features.

図２は、文ベクトル及び語彙ベクトルに基づいて入力テキストの構成文の重みを取得する模式図である。図２に示すように、まず、入力テキストのすべての構成文７０（｛Ｓ_１，Ｓ_２，Ｓ_３，…，Ｓ_７｝）がニューラルネットワークに入力される。そして、ニューラルネットワークにおいて、分析部１００２は、異なる処理方法を利用して入力テキストのすべての構成文７０を分析し、各構成文の文ベクトル７１を生成する。次に、分析部１００２は、入力テキストのすべての構成文の語彙ベクトル７２を取得する。次に、分析部１００２は、取得した文ベクトル７１及び語彙ベクトル７２に基づいて、ＶＧＧ−１６を利用して、入力テキストの各構成文７０に対応する特徴量７４（｛Ｖ_１，Ｖ_２，Ｖ_３，…，Ｖ_７｝）を抽出する。 FIG. 2 is a schematic diagram of obtaining the weights of the constituent sentences of the input text based on sentence vectors and vocabulary vectors. As shown in FIG. 2, first, all constituent sentences 70 ({S₁, S₂, S₃, ..., S₇}) of the input text are input to the neural network. Then, in the neural network, the analysis unit 1002 analyzes all constituent sentences 70 of the input text using the different processing methods and generates a sentence vector 71 for each constituent sentence. Next, the analysis unit 1002 obtains the vocabulary vectors 72 of all constituent sentences of the input text. Next, based on the obtained sentence vectors 71 and vocabulary vectors 72, the analysis unit 1002 uses VGG-16 to extract the feature values 74 ({V₁, V₂, V₃, ..., V₇}) corresponding to the constituent sentences 70 of the input text.

次に、分割部１００３は、クラスタリングを利用して入力テキストを複数のセクションに分割する。 Next, the division unit 1003 divides the input text into a plurality of sections using clustering.

ここで、分割部１００３がクラスタリングにより入力テキストを分割して得られる複数のセクションの数は、出力テキストの所定の目標文数によって決定される。例えば、所望の出力テキストの所定の目標文数がＭである場合、入力テキストをクラスタリングにより分割して得られる複数のセクションの数もＭである。 Here, the number of a plurality of sections obtained by dividing the input text by the dividing unit 1003 by clustering is determined by a predetermined target number of sentences of the output text. For example, when the predetermined target number of sentences of the desired output text is M, the number of a plurality of sections obtained by dividing the input text by clustering is also M.

図３は、クラスタリングをニューラルネットワークに基づくテキスト処理方法に適用する模式図を示す。図３に示すように、ニューラルネットワークにカスケードされたＮ（Ｎ≧２）個のテキスト処理レイヤが含まれると想定すると、カスケードされたＮ個のテキスト処理レイヤのｎ番目のテキスト処理レイヤがクラスタリングを利用して入力テキストを分割して得られるセクションの数は、ｎ番目のテキスト処理レイヤの出力テキストの所定の目標文数によって決定される。例えば、テキスト処理レイヤ１（図中で５２で表す）がクラスタリング５０を利用して入力テキスト５１を分割して得られる複数のセクションの数は、テキスト処理レイヤ１の出力テキスト１（図中では５３で表す）の所定の目標文数によって決定されてもよいし、テキスト処理レイヤ２（図中では５４で表す）がクラスタリング５６を利用して入力テキスト（つまり、図中の出力テキスト１）を分割して得られる複数のセクションの数は、テキスト処理レイヤ２の出力テキスト２（図中では５５で表す）の所定の目標文数によって決定されてもよい。 FIG. 3 shows a schematic diagram in which clustering is applied to a text processing method based on a neural network. As shown in FIG. 3, assuming that the neural network contains N (N ≧ 2) cascaded text processing layers, the nth text processing layer of the cascaded N text processing layers clusters. The number of sections obtained by dividing the input text by using it is determined by a predetermined target number of sentences of the output text of the nth text processing layer. For example, the number of a plurality of sections obtained by dividing the input text 51 by the text processing layer 1 (represented by 52 in the figure) using the clustering 50 is the output text 1 of the text processing layer 1 (53 in the figure). It may be determined by a predetermined target number of sentences (represented by), or the text processing layer 2 (represented by 54 in the figure) divides the input text (that is, the output text 1 in the figure) by using clustering 56. The number of the plurality of sections thus obtained may be determined by a predetermined target number of sentences of the output text 2 (represented by 55 in the figure) of the text processing layer 2.
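
The cascade can be sketched as a loop in which each layer receives the previous layer's output and its own target sentence count, which is also the number of sections that layer would cluster into. The names `run_cascade` and `keep_first` are hypothetical, and `keep_first` is only a placeholder layer that truncates the text; a real layer would cluster into `m` sections and pick one sentence per section.

```python
def run_cascade(sentences, target_counts, layer):
    """Feed the text through cascaded layers.

    target_counts[n] is the predetermined target sentence count of layer n,
    and therefore also the number of sections used by that layer.
    """
    outputs = []
    current = sentences
    for m in target_counts:
        current = layer(current, m)  # output of layer n is input of layer n+1
        outputs.append(current)
    return outputs


def keep_first(sentences, m):
    # Placeholder layer (assumption): simply keeps the first m sentences.
    return sentences[:m]


outputs = run_cascade([f"s{i}" for i in range(1, 8)], [4, 2], keep_first)
```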

ニューラルネットワークには１つのテキスト処理レイヤのみが含まれてもよく、この場合、入力テキストをクラスタリングにより分割して得られる複数のセクションの数は、テキスト処理レイヤの出力テキストの所定の目標文数によって決定されることを認識されたい。 The neural network may contain only one text processing layer, in which case the number of sections obtained by clustering the input text depends on a predetermined target number of output texts in the text processing layer. Please be aware that it will be decided.

入力テキストがクラスタリングにより分割されて得られる複数のセクションの数を取得した後、分割部１００３は、クラスタリングを利用して入力テキストを分割する動作を行う。 After acquiring the number of a plurality of sections obtained by dividing the input text by clustering, the division unit 1003 performs an operation of dividing the input text by using clustering.

図４に示すように、ステップＳ２０１では、分割部１００３は、複数のセクションに対応する複数の中心文を初期化する。 As shown in FIG. 4, in step S201, the division unit 1003 initializes a plurality of central sentences corresponding to the plurality of sections.

複数のセクションの数がＭである場合、各セクションに対して１つの中心文を選択する必要があるため、Ｍ個の中心文を選択する必要がある。例えば、入力テキストの最初のＭ個の文を初期化時の中心文として選択したり、入力テキストからＭ個の文をランダムに初期化時の中心文として選択したりすることができる。中心文を初期化する方法はこれに限らず、中心文を初期化するためには他の適切な方法を採用することもできることを認識されたい。 When the number of a plurality of sections is M, it is necessary to select one central sentence for each section, so it is necessary to select M central sentences. For example, the first M sentences of the input text can be selected as the central sentence at the time of initialization, or M sentences can be randomly selected as the central sentence at the time of initialization from the input text. It should be recognized that the method of initializing the central sentence is not limited to this, and other appropriate methods can be adopted to initialize the central sentence.
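
The two initialization options mentioned above can be sketched as follows. The function name `init_centers`, the `strategy` and `seed` parameters, and the use of zero-based sentence indices are assumptions made for illustration.

```python
import random


def init_centers(num_sentences, m, strategy="first", seed=None):
    """Return the indices of the M initial central sentences."""
    if not 1 <= m <= num_sentences:
        raise ValueError("m must be between 1 and the number of sentences")
    if strategy == "first":
        # First M sentences of the input text.
        return list(range(m))
    if strategy == "random":
        # M distinct sentences chosen at random.
        return random.Random(seed).sample(range(num_sentences), m)
    raise ValueError(f"unknown strategy: {strategy}")


first_centers = init_centers(7, 3)
random_centers = init_centers(7, 3, strategy="random", seed=0)
```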

ステップＳ２０２では、分割部１００３は、入力テキストの構成文と複数の中心文との類似度を計算することにより、類似度に基づいて入力テキストの構成文を複数の中心文に対応するセクションに分割し、複数のセクションに含まれる構成文を更新する。 In step S202, the division unit 1003 divides the composition sentence of the input text into sections corresponding to the plurality of central sentences based on the similarity by calculating the similarity between the composition sentence of the input text and the plurality of central sentences. And update the constructs contained in multiple sections.

例えば、Ｒｏｕｇｅ−１方式で類似度（または距離）を計算してもよい。例えば、ｉ番目の文とｊ番目の分との類似度ｆ_ｉｊは次のように表すことができる。 For example, the similarity (or distance) may be calculated by the ROUGE-1 method. For example, the similarity f_ij between the i-th sentence and the j-th sentence can be expressed as follows.

但し、Ｓ_ｉはｉ番目の文におけるｐ語彙（例えば、ｐ＝２の場合は二語彙を表し、ｐ＝３の場合は三語彙を表す）の数を表し、Ｒ_ｉｊはｉ番目の文とｊ番目の文における繰り返すｐ語彙の数を示す。ｆ_ｉｊが大きくなるほど、ｉ番目の文とｊ番目の文との類似度が高くなる。類似度を計算する方法はこれに限らず、類似度を計算するためには他の適切な方法を採用してもよいことを理解されたい。 Here, S_i denotes the number of p-word units in the i-th sentence (for example, p = 2 means two-word units and p = 3 means three-word units), and R_ij denotes the number of p-word units that occur in both the i-th sentence and the j-th sentence. The larger f_ij is, the higher the similarity between the i-th sentence and the j-th sentence. It should be understood that the method of calculating the similarity is not limited to this, and other appropriate methods may be adopted.
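
The quantities S_i and R_ij can be computed as in the sketch below. The normalization used to turn them into a similarity, f_ij = 2·R_ij / (S_i + S_j), is an assumption chosen only so that the example yields a value between 0 and 1; it is not the expression of the disclosure, which is not reproduced in this text. The helper names `pgrams` and `similarity` and the whitespace tokenization are likewise hypothetical.

```python
from collections import Counter


def pgrams(tokens, p):
    """Count the p-word units (p consecutive tokens) of one sentence."""
    return Counter(tuple(tokens[k:k + p]) for k in range(len(tokens) - p + 1))


def similarity(tokens_i, tokens_j, p=1):
    grams_i = pgrams(tokens_i, p)
    grams_j = pgrams(tokens_j, p)
    s_i = sum(grams_i.values())                 # S_i: units in sentence i
    s_j = sum(grams_j.values())                 # S_j: units in sentence j
    r_ij = sum((grams_i & grams_j).values())    # R_ij: units shared by both
    if s_i + s_j == 0:
        return 0.0
    # Assumed normalization (not from the source): 2 * R_ij / (S_i + S_j).
    return 2.0 * r_ij / (s_i + s_j)


a = "the cat sat".split()
b = "the cat ran".split()
f_unigram = similarity(a, b, p=1)
f_bigram = similarity(a, b, p=2)
```

Whatever normalization is used, the property that matters for the clustering is the one stated above: a larger f_ij means the two sentences are more similar.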

次に、ステップＳ２０３では、複数のセクションのそれぞれにおいて、分割部１００３は、各構成文間の類似度を計算することにより、類似度の合計が最大となる構成文を新しい中心文として決定する。 Next, in step S203, in each of the plurality of sections, the division unit 1003 calculates the similarity between the constituent sentences to determine the constituent sentence having the maximum total similarity as the new central sentence.

例えば、第１セクションに含まれる構成文が｛ｔ_３，ｔ_５，ｔ_６｝であると想定すると、各構成文｛ｔ_３，ｔ_５，ｔ_６｝間の類似度｛ｔ_３５，ｔ_３６，ｔ_５６｝を計算することにより、類似度の合計（例えば、ｔ_３の類似度の合計が（ｆ_３５＋ｆ_３６）であり、ｔ_５の類似度の合計が（ｆ_５３＋ｆ_５６）であり、ｔ_６の類似度の合計が（ｆ_６５＋ｆ_６３）である）が最大となる構成文をこのセクションにおける新しい中心文として選択する。 For example, assuming that the constituent sentences in the first section are {t₃, t₅, t₆}, the pairwise similarities {f₃₅, f₃₆, f₅₆} between the constituent sentences are calculated, and the constituent sentence with the largest total similarity is selected as the new central sentence of this section (for example, the total similarity of t₃ is (f₃₅ + f₃₆), that of t₅ is (f₅₃ + f₅₆), and that of t₆ is (f₆₅ + f₆₃)).
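
The selection of a new central sentence for the section {t₃, t₅, t₆} can be worked through with made-up numbers. The similarity values below are purely illustrative, and `pick_center` and `sim` are hypothetical names; the similarity is taken to be symmetric, so f₅₃ = f₃₅ and so on.

```python
# Illustrative symmetric similarities between sentences 3, 5 and 6.
pair_similarity = {(3, 5): 0.6, (3, 6): 0.2, (5, 6): 0.5}


def sim(i, j):
    """Look up f_ij, treating f_ij and f_ji as the same value."""
    return pair_similarity[(min(i, j), max(i, j))]


def pick_center(members, sim):
    """Return the member whose summed similarity to the others is largest."""
    def total(i):
        return sum(sim(i, j) for j in members if j != i)
    return max(members, key=total)


# Totals: t3 -> 0.6 + 0.2, t5 -> 0.6 + 0.5, t6 -> 0.5 + 0.2
new_center = pick_center([3, 5, 6], sim)
```

With these values t₅ has the largest total similarity and becomes the new central sentence of the section.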

次に、ステップＳ２０４において、新しい中心文が変わらなくなるか、繰り返し回数が所定の閾値に達するまで、すなわち、複数のセクションのそれぞれが安定になるまで、上記のプロセスを繰り返す。このようにして、各セクションの構成文が決定され、つまり、クラスタリングが完了する。クラスタリング方法は類似度の計算に限らず、他の適切な方法を採用してクラスタリングしてもよいことを理解されたい。 Next, in step S204, the above process is repeated until the new central sentence does not change or the number of repetitions reaches a predetermined threshold, that is, until each of the plurality of sections becomes stable. In this way, the constituent statements of each section are determined, that is, clustering is completed. It should be understood that the clustering method is not limited to the calculation of similarity, and other appropriate methods may be adopted for clustering.
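
Steps S201 to S204 together resemble a k-medoids-style loop, which can be sketched as below. This is an illustration under stated assumptions, not the implementation of the disclosure: `cluster_sentences` is a hypothetical name, the first M sentences are used as initial central sentences, sentences are identified by zero-based index, and the toy similarity (negative distance between made-up one-dimensional values) merely stands in for a sentence similarity such as f_ij.

```python
def cluster_sentences(n, m, sim, max_iters=100):
    """Cluster n sentences into m sections around central sentences."""
    def assign(centers):
        # S202: each non-central sentence joins its most similar center.
        sections = [[c] for c in centers]
        for i in range(n):
            if i in centers:
                continue
            best = max(range(m), key=lambda k: sim(i, centers[k]))
            sections[best].append(i)
        return sections

    centers = list(range(m))  # S201: first M sentences as initial centers
    for _ in range(max_iters):  # S204: stop on stability or iteration limit
        sections = assign(centers)
        # S203: new center = member with the largest total similarity.
        new_centers = [
            max(sec, key=lambda i: sum(sim(i, j) for j in sec if j != i))
            for sec in sections
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(centers)


# Toy data: one made-up number per sentence; closer values = more similar.
points = [0, 10, 1, 11, 2]
centers, sections = cluster_sentences(
    len(points), 2, lambda i, j: -abs(points[i] - points[j])
)
```

In this toy run the central sentence of the first section moves from index 0 to index 2 after one iteration and then stays fixed, at which point the loop ends.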

図５は、本開示の実施形態による、クラスタリングを利用して入力テキストを分割する模式図である。図５に示すように、入力テキストの構成文２１がＴ＝｛ｔ_１，ｔ_２，ｔ_３，…，ｔ_７｝であり、入力テキストをクラスタリングにより分割して得られる複数のセクションの数はＭ＝３（第１セクション２２、第２セクション２３及び第３セクション２４）であり、３つのセクションの初期化のために選択された中心文はそれぞれｔ_１、ｔ_２、ｔ_３であると想定する。次に、入力テキストの残りの構成文｛ｔ_４，ｔ_５，ｔ_６，ｔ_７｝と各中心文｛ｔ_１，ｔ_２，ｔ_３｝との間の類似度を計算することにより、類似度に基づいて入力テキストの構成文｛ｔ_４，ｔ_５，ｔ_６，ｔ_７｝をそれぞれ中心文｛ｔ_１，ｔ_２，ｔ_３｝の対応するセクションに割り当てることができる。例えば、図５に示すように、ｔ_４は第１セクション２５に割り当てられ、ｔ_６及びｔ_７は第２セクション２６に割り当てられ、ｔ_５は第３セクション２７に割り当てられる。次に、各セクションにおいて、各構成文間の類似度を計算し、類似度の合計が最大となる構成文を新しい中心文として決定する。例えば、第２セクション２６において、ｔ_２、ｔ_６、及びｔ_７の間の類似度を計算し、類似度の合計が最大であるｔ_６を新しい中心文（図中では円で表す）として決定する。中心文が変わらなくなるまで、上記プロセスを繰り返す。 FIG. 5 is a schematic diagram of dividing the input text by clustering according to an embodiment of the present disclosure. As shown in FIG. 5, assume that the constituent sentences 21 of the input text are T = {t₁, t₂, t₃, ..., t₇}, that the number of sections obtained by dividing the input text by clustering is M = 3 (first section 22, second section 23, and third section 24), and that the central sentences selected to initialize the three sections are t₁, t₂, and t₃, respectively. Next, the similarity between each remaining constituent sentence {t₄, t₅, t₆, t₇} and each central sentence {t₁, t₂, t₃} is calculated, and each remaining sentence is assigned, based on that similarity, to the section corresponding to one of the central sentences. For example, as shown in FIG. 5, t₄ is assigned to the first section 25, t₆ and t₇ are assigned to the second section 26, and t₅ is assigned to the third section 27. Next, in each section, the similarity between the constituent sentences is calculated, and the constituent sentence with the largest total similarity is determined as the new central sentence. For example, in the second section 26, the similarities among t₂, t₆, and t₇ are calculated, and t₆, which has the largest total similarity, is determined as the new central sentence (represented by a circle in the figure). The above process is repeated until the central sentences no longer change.

入力テキストがクラスタリングにより複数のセクションに分割された後、つまり、入力テキストにおける類似度の高い構成文が同一のセクションに割り当てられた後、次に、割り当てられた複数のセクションに対して処理して出力テキストを取得する。 After the input text has been divided into multiple sections by clustering, that is, after the highly similar constituents in the input text have been assigned to the same section, then the assigned sections are processed. Get the output text.

次に、生成部１００４は、前記複数のセクションと前記分析結果に基づいて出力テキストを生成する。 Next, the generation unit 1004 generates an output text based on the plurality of sections and the analysis result.

例えば、生成部１００４は、取得した入力テキストに対応する分析結果を利用し、取得した複数のセクションのそれぞれにおいて、当該セクションの文の重みが最大となる構成文を当該セクションに対応する出力結果として選択し、各セクションに対応する分析結果をそれぞれ取得した後、複数のセクションの出力結果を結合して出力テキストを生成する。 For example, the generation unit 1004 uses the analysis result corresponding to the acquired input text, and in each of the acquired plurality of sections, the constituent sentence having the maximum sentence weight of the section is output as the output result corresponding to the section. After selecting and acquiring the analysis results corresponding to each section, the output results of multiple sections are combined to generate the output text.

図６に示すように、まず、テキスト６０はニューラルネットワークに入力される。その後、分割部１００３は、クラスタリング６５を利用して入力テキスト６０を複数のセクションに分割してもよく、当該複数のセクションの数は前記テキスト処理レイヤ６１による出力テキスト６６の所定の目標文数によって決定される。同時に、分析部１００２は、異なる処理方法６４を利用して入力テキストのすべての構成文を分析し、各構成文の文ベクトルを生成し、入力テキスト６０のすべての構成文により各構成文の語彙ベクトル６２を取得し、次に、取得した文ベクトルと語彙ベクトルに基づいて、ＶＧＧ−１６を利用して特徴量を抽出し、当該特徴量は、入力テキスト６０の各構成文の重みに対応し、かつクラスタリングにより分割して得られる複数のセクションにおける各構成文の重みにも対応する。次に、複数のセクションのそれぞれにおいて、文の重みが最大となる構成文を、当該セクションに対応する分析結果として選択する。最後に、生成部１００４は、複数のセクションの出力結果を結合して出力テキスト６６を生成する。 As shown in FIG. 6, first, the text 60 is input to the neural network. After that, the division unit 1003 may divide the input text 60 into a plurality of sections by using the clustering 65, and the number of the plurality of sections depends on the predetermined target number of sentences of the output text 66 by the text processing layer 61. It is determined. At the same time, the analysis unit 1002 analyzes all the constituent sentences of the input text using different processing methods 64, generates a sentence vector of each constituent sentence, and vocabulary of each constituent sentence by all the constituent sentences of the input text 60. The vector 62 is acquired, and then, based on the acquired sentence vector and vocabulary vector, the feature amount is extracted using VGG-16, and the feature amount corresponds to the weight of each constituent sentence of the input text 60. Also, it corresponds to the weight of each constituent sentence in a plurality of sections obtained by dividing by clustering. Next, in each of the plurality of sections, the constituent sentence having the maximum sentence weight is selected as the analysis result corresponding to the section. Finally, the generation unit 1004 combines the output results of the plurality of sections to generate the output text 66.

以下、図８を参照して、本開示の一実施形態によるテキスト処理装置１１００について説明する。図８は、本開示の実施形態によるテキスト処理装置の模式図である。本実施形態のテキスト処理装置の機能は、図１を参照して上述した方法の詳細と同じであるため、便宜上、ここでは同じ内容に対する詳細な説明を省略する。 Hereinafter, the text processing apparatus 1100 according to the embodiment of the present disclosure will be described with reference to FIG. FIG. 8 is a schematic diagram of a text processing device according to the embodiment of the present disclosure. Since the function of the text processing apparatus of this embodiment is the same as the details of the method described above with reference to FIG. 1, detailed description of the same contents will be omitted here for convenience.

図８に示すように、テキスト処理デバイス１１００は、メモリ１１０１とプロセッサ１１０２を備える。なお、テキスト処理デバイス１１００は、図８では２つのデバイスのみを備えるように示されているが、これは例示的なものに過ぎず、テキスト処理デバイス１１００は１つ以上の他のデバイスも備えてもよい。これらのデバイスは本発明の思想と関係がないため、ここでは省略する。 As shown in FIG. 8, the text processing device 1100 includes a memory 1101 and a processor 1102. Although FIG. 8 shows the text processing device 1100 with only two devices, this is merely an example, and the text processing device 1100 may also include one or more other devices. Because those devices are unrelated to the idea of the present invention, they are omitted here.

本開示のニューラルネットワークに基づくテキスト処理デバイス１１００は、コンピュータ読取可能な命令を格納するように構成されるメモリ１１０１と、前記メモリに格納された前記コンピュータ読取可能な命令を実行するように構成されるプロセッサ１１０２とを備える。前記プロセッサ１１０２は、前記コンピュータ読取可能な命令を実行するとき、入力テキストを取得することと、前記入力テキストを分析し、前記入力テキストに対応する分析結果を取得することと、クラスタリングを利用して前記入力テキストを複数のセクションに分割することと、前記複数のセクションと前記分析結果に基づいて出力テキストを生成することと、を実行する。 The text processing device 1100 based on the neural network of the present disclosure is configured to execute a memory 1101 configured to store computer-readable instructions and the computer-readable instructions stored in the memory. It includes a processor 1102. When the processor 1102 executes the computer-readable instruction, it obtains an input text, analyzes the input text, obtains an analysis result corresponding to the input text, and utilizes clustering. Dividing the input text into a plurality of sections and generating an output text based on the plurality of sections and the analysis result are executed.

ここで、クラスタリングを利用して前記入力テキストを複数のセクションに分割することは、前記複数のセクションに対応する複数の中心文を初期化する初期化ステップと、前記入力テキストの構成文と前記複数の中心文との類似度を計算することにより、類似度に基づいて前記入力テキストの構成文をそれぞれ前記複数の中心文に対応するセクションに割り当て、前記複数のセクションに含まれる構成文を更新する更新ステップと、前記複数のセクションにおいて、各構成文間の類似度を計算することにより、類似度の合計が最大となる構成文を新しい中心文として決定する決定ステップと、新しい中心文が変わらなくなるまで上記更新ステップと決定ステップを繰り返すことと、を含む。 Here, dividing the input text into a plurality of sections by clustering includes: an initialization step of initializing a plurality of central sentences corresponding to the plurality of sections; an update step of calculating the similarity between the constituent sentences of the input text and the plurality of central sentences, assigning each constituent sentence of the input text to the section corresponding to one of the central sentences based on the similarity, and updating the constituent sentences included in the plurality of sections; a determination step of calculating, in each of the plurality of sections, the similarity between the constituent sentences and determining the constituent sentence with the largest total similarity as the new central sentence; and repeating the update step and the determination step until the new central sentences no longer change.

ここで、前記入力テキストを分析し、前記入力テキストに対応する分析結果を取得することは、前記入力テキストのすべての構成文を分析して、すべての構成文それぞれの文の重みを前記分析結果として取得することを含む。 Here, to analyze the input text and obtain the analysis result corresponding to the input text, the analysis result analyzes all the constituent sentences of the input text and determines the weight of each sentence of all the constituent sentences. Including getting as.

ここで、複数のセクションと分析結果に基づいて出力テキストを生成することは、前記すべての構成文それぞれの文の重みに基づいて、前記複数のセクションのそれぞれにおいて、当該セクションで文の重みが最大となる構成文を当該セクションに対応する出力結果として選択することと、複数のセクションの出力結果を結合して出力テキストを生成することと、を含む。 Here, generating output text based on a plurality of sections and analysis results means that in each of the plurality of sections, the sentence weight is the maximum in the section, based on the sentence weights of all the constituent sentences. This includes selecting the constituent sentence to be the output result corresponding to the section, and combining the output results of a plurality of sections to generate the output text.

ここで、ニューラルネットワークは１つのテキスト処理レイヤのみを含んでもよく、入力テキストをクラスタリングにより分割して得られる複数のセクションの数は、テキスト処理レイヤの前記出力テキストの所定の目標文数によって決定される。 Here, the neural network may include only one text processing layer, and the number of a plurality of sections obtained by dividing the input text by clustering is determined by a predetermined target number of sentences of the output text of the text processing layer. To.

ここで、ニューラルネットワークにはカスケードされたＮ（Ｎ≧２）個のテキスト処理レイヤが含まれてもよく、この場合、カスケードされたＮ個のテキスト処理レイヤにおけるｎ番目のテキスト処理レイヤがクラスタリングにより入力テキストを分割して得られる複数のセクションの数は、前記ｎ番目のテキスト処理レイヤの出力テキストの所定の目標文数によって決定される。 Here, the neural network may include N (N ≧ 2) cascaded text processing layers, in which case the nth text processing layer in the cascaded N text processing layers is clustered. The number of the plurality of sections obtained by dividing the input text is determined by a predetermined target number of sentences of the output text of the nth text processing layer.

図９は、本開示の実施形態によるコンピュータ読取可能な記憶媒体の模式図である。 FIG. 9 is a schematic diagram of a computer-readable storage medium according to the embodiment of the present disclosure.

図９に示すように、本開示は、さらに、コンピュータ読取可能な命令１２０１が格納されたコンピュータ読取可能な記憶媒体１２００を備える。当該コンピュータ読取可能な命令がコンピュータによって実行されるとき、前記コンピュータはテキスト処理方法を実行する。この方法は、入力テキストを取得することと、前記入力テキストを分析し、前記入力テキストに対応する分析結果を取得することと、クラスタリングを利用して前記入力テキストを複数のセクションに分割することと、前記複数のセクションと前記分析結果に基づいて出力テキストを生成することと、を含む。 As shown in FIG. 9, the present disclosure further comprises a computer-readable storage medium 1200 in which a computer-readable instruction 1201 is stored. When the computer-readable instruction is executed by the computer, the computer performs a text processing method. This method involves obtaining the input text, analyzing the input text, obtaining the analysis result corresponding to the input text, and using clustering to divide the input text into a plurality of sections. , The generation of output text based on the plurality of sections and the analysis results.

＜ハードウェア構成＞ <Hardware configuration>
なお、上記実施形態の説明に用いたブロック図は、機能単位のブロックを示している。これらの機能ブロック（構成部）は、ハードウェア及び／又はソフトウェアの任意の組み合わせによって実現される。また、各機能ブロックの実現手段は特に限定されない。すなわち、各機能ブロックは、物理的及び／又は論理的に結合した１つの装置により実現されてもよいし、物理的及び／又は論理的に分離した２つ以上の装置を直接的及び／又は間接的に（例えば、有線及び／又は無線で）接続し、これら複数の装置により実現されてもよい。 The block diagrams used in the description of the above embodiments show blocks in functional units. These functional blocks (components) are realized by any combination of hardware and/or software. The means for realizing each functional block is not particularly limited. That is, each functional block may be realized by one physically and/or logically coupled device, or by two or more physically and/or logically separated devices that are connected directly and/or indirectly (for example, by wire and/or wirelessly).

なお、以下の説明では、「装置」という文言は、回路、デバイス、ユニットなどに読み替えることができる。例えば、プロセッサは１つだけ図示されているが、複数のプロセッサがあってもよい。また、処理は、１のプロセッサで実行されてもよいし、処理が同時に、逐次に、又はその他の手法で、１以上のプロセッサで実行されてもよい。なお、プロセッサは、１以上のチップで実装されてもよい。 In the following description, the word "device" can be read as a circuit, device, unit, or the like. For example, although only one processor is shown, there may be multiple processors. Further, the processing may be executed by one processor, or the processing may be executed simultaneously, sequentially, or by other methods on one or more processors. The processor may be mounted on one or more chips.

プロセッサは、例えば、オペレーティングシステムを動作させてコンピュータ全体を制御する。プロセッサは、周辺装置とのインターフェース、制御装置、演算装置、レジスタなどを含む中央処理装置（ＣＰＵ：ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）で構成されてもよい。 The processor, for example, runs an operating system to control the entire computer. The processor may be composed of a central processing unit (CPU: Central Processing Unit) including an interface with peripheral devices, a control device, an arithmetic unit, a register, and the like.

また、プロセッサは、プログラム（プログラムコード）、ソフトウェアモジュール、データなどを、ストレージ及び／又は通信装置からメモリに読み出し、これらに従って各種の処理を実行する。プログラムとしては、上記の実施形態で説明した動作の少なくとも一部をコンピュータに実行させるプログラムが用いられる。例えば、制御部は、メモリに格納され、プロセッサで動作する制御プログラムによって実現されてもよく、他の機能ブロックについても同様に実現されてもよい。メモリは、コンピュータ読取可能な記憶媒体であり、例えば、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＥＰＲＯＭ（ＥｒａｓａｂｌｅＰｒｏｇＲＡＭｍａｂｌｅＲＯＭ）、ＥＥＰＲＯＭ（ＥｌｅｃｔｒｉｃａｌｌｙＥＰＲＯＭ）、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）、その他の適切な記憶媒体の少なくとも１つで構成されてもよい。メモリは、レジスタ、キャッシュ、メインメモリ（主記憶装置）などと呼ばれてもよい。メモリは、本開示の一実施形態に係る無線通信方法を実施するために実行可能なプログラム（プログラムコード）、ソフトウェアモジュールなどを保存することができる。 Further, the processor reads programs (program code), software modules, data, and the like from the storage and/or the communication device into the memory and executes various processes according to them. As the program, a program that causes a computer to execute at least part of the operations described in the above embodiments is used. For example, the control unit may be realized by a control program that is stored in the memory and runs on the processor, and the other functional blocks may be realized in the same way. The memory is a computer-readable storage medium and may consist of at least one of, for example, a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically EPROM), a RAM (Random Access Memory), or another suitable storage medium. The memory may be referred to as a register, a cache, a main memory (main storage device), or the like. The memory can store programs (program code), software modules, and the like that can be executed to implement the wireless communication method according to an embodiment of the present disclosure.

ストレージは、コンピュータ読取可能な記憶媒体であり、例えば、フレキシブルディスク、フロッピー（登録商標）ディスク、光磁気ディスク（例えば、コンパクトディスク（ＣＤ−ＲＯＭ（ＣｏｍｐａｃｔＤｉｓｃＲＯＭ）など）、デジタル多用途ディスク、Ｂｌｕ−ｒａｙ（登録商標）ディスク）、リムーバブルディスク、ハードディスクドライブ、スマートカード、フラッシュメモリデバイス（例えば、カード、スティック（ｓｔｉｃｋ）、キードライブ（ｋｅｙｄｒｉｖｅｒ））、磁気ストライプ、データベース、サーバ、その他の適切な記憶媒体の少なくとも１つで構成されてもよい。ストレージは、補助記憶装置と呼ばれてもよい。 The storage is a computer-readable storage medium and may consist of at least one of, for example, a flexible disk, a floppy (registered trademark) disk, a magneto-optical disk (for example, a compact disc (CD-ROM (Compact Disc ROM) or the like), a digital versatile disc, or a Blu-ray (registered trademark) disc), a removable disk, a hard disk drive, a smart card, a flash memory device (for example, a card, a stick, or a key drive), a magnetic stripe, a database, a server, or another suitable storage medium. The storage may be referred to as an auxiliary storage device.

The input device is a device that accepts input from the outside (for example, a keyboard, a mouse, a microphone, a switch, a button, or a sensor). The output device is a device that performs output to the outside (for example, a display, a speaker, or an LED (Light Emitting Diode) lamp). The input device and the output device may be integrated into a single component (for example, a touch panel).

The information, parameters, and the like described in this specification may be expressed as absolute values, as values relative to a predetermined value, or by other corresponding information. For example, a radio resource may be indicated by a predetermined index. Furthermore, the mathematical formulas and the like that use these parameters may differ from those explicitly disclosed in this specification.

The names used for parameters and the like in this specification are not limiting in any respect.

Information, signals, and the like that have been input or output may be stored in a specific location (for example, a memory) or may be managed in a management table. Information, signals, and the like that are input or output may be overwritten, updated, or appended. Output information, signals, and the like may be deleted. Input information, signals, and the like may be transmitted to other devices.

Notification of information is not limited to the aspects/embodiments described in this specification and may be performed by other methods. For example, notification of information may be implemented by physical layer signaling (for example, downlink control information (DCI) or uplink control information (UCI)), higher layer signaling (for example, RRC (Radio Resource Control) signaling, broadcast information (a master information block (MIB), a system information block (SIB), or the like), or medium access control (MAC) signaling), other signals, or a combination of these.

Notification of predetermined information (for example, notification of "being X") is not limited to explicit notification and may be performed implicitly (for example, by not notifying the predetermined information or by notifying other information).

A determination may be made by a value represented by one bit (0 or 1), by a Boolean value represented as true or false, or by a numerical comparison (for example, a comparison with a predetermined value).

Software, whether referred to as software, firmware, middleware, microcode, or a hardware description language, or by any other name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like.

Software, instructions, information, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using wired technology (coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), or the like) and/or wireless technology (infrared, microwave, or the like), these wired and/or wireless technologies are included within the definition of a transmission medium.

The terms "system" and "network" are used interchangeably in this specification.

Each aspect/embodiment described in this specification may be used alone, may be used in combination, or may be switched during execution. The order of the processing procedures, sequences, flowcharts, and the like of each aspect/embodiment described in this specification may be changed as long as no contradiction arises. For example, the methods described in this specification present the elements of the various steps in an exemplary order and are not limited to the specific order presented.

The aspects/embodiments described in this specification may be applied to systems that use LTE (Long Term Evolution), LTE-A (LTE-Advanced), LTE-B (LTE-Beyond), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), FRA (Future Radio Access), New-RAT (Radio Access Technology), NR (New Radio), NX (New radio access), FX (Future generation radio access), GSM (registered trademark) (Global System for Mobile communications), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), or other suitable wireless communication methods, and/or to next-generation systems extended on the basis of these.

The phrase "based on" as used in this specification does not mean "based only on" unless otherwise specified. In other words, the phrase "based on" means both "based only on" and "based at least on".

Any reference to elements using designations such as "first" and "second" in this specification does not generally limit the quantity or order of those elements. These designations may be used in this specification as a convenient way of distinguishing between two or more elements. Accordingly, a reference to first and second elements does not mean that only two elements may be employed or that the first element must precede the second element in some way.

The term "determining" as used in this specification may encompass a wide variety of operations. For example, "determining" may be regarded as making a "determination" with respect to calculating, computing, processing, deriving, investigating, looking up (for example, looking up in a table, a database, or another data structure), ascertaining, and the like. "Determining" may also be regarded as making a "determination" with respect to receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, accessing (for example, accessing data in a memory), and the like. "Determining" may further be regarded as making a "determination" with respect to resolving, selecting, choosing, establishing, comparing, and the like. That is, "determining" may be regarded as making a "determination" with respect to some operation.

The terms "connected" and "coupled" as used in this specification, and any variations of them, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination of these. For example, "connection" may be read as "access". As used in this specification, two elements can be considered to be "connected" or "coupled" to each other by using one or more wires, cables, and/or printed electrical connections and, as some non-limiting and non-exhaustive examples, by using electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and/or the optical (both visible and invisible) region.

Where the terms "including", "comprising", and variations of them are used in this specification or the claims, these terms are intended to be inclusive, in the same way as the term "having". Furthermore, the term "or" as used in this specification or the claims is not intended to be an exclusive OR.

The present disclosure has been described in detail above, but it is apparent to those skilled in the art that the present disclosure is not limited to the embodiments described in this specification. The present disclosure can be implemented with corrections and modifications without departing from the spirit and scope of the present disclosure as defined by the claims. Accordingly, the description in this specification is for illustrative purposes and has no restrictive meaning with respect to the present disclosure.

Claims

1. A text processing method based on a neural network, comprising:
acquiring an input text;
analyzing the input text to obtain an analysis result corresponding to the input text;
dividing the input text into a plurality of sections using clustering; and
generating an output text based on the plurality of sections and the analysis result.

2. The text processing method according to claim 1, wherein dividing the input text into the plurality of sections using clustering comprises:
an initialization step of initializing a plurality of central sentences corresponding to the plurality of sections;
an update step of calculating the similarity between the constituent sentences of the input text and the plurality of central sentences, assigning the constituent sentences of the input text to the sections corresponding to the plurality of central sentences based on the similarity, and updating the constituent sentences included in the plurality of sections;
a determination step of, in the plurality of sections, calculating the similarity between the constituent sentences and determining the constituent sentence having the maximum total similarity as a new central sentence; and
a repetition step of repeating the update step and the determination step until the new central sentence no longer changes.
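
The initialization, update, determination, and repetition steps recited above resemble a k-medoids style loop over sentences. The following Python sketch is illustrative only and is not taken from the specification: the word-overlap (Jaccard) similarity, the choice of the first k sentences as initial central sentences, the `max_iter` cap, and the names `jaccard` and `cluster_sentences` are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity (assumed; the claims do not fix a measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def cluster_sentences(
    sentences: List[str],
    k: int,
    similarity: Callable[[str, str], float] = jaccard,
    max_iter: int = 100,
) -> Tuple[List[List[int]], List[int]]:
    """Divide sentences into k sections; returns (sections, central indices)."""
    if not 1 <= k <= len(sentences):
        raise ValueError("k must be between 1 and the number of sentences")
    # Initialization step (assumed): the first k sentences are the central sentences.
    centers = list(range(k))
    sections: List[List[int]] = []
    for _ in range(max_iter):
        # Update step: assign each sentence to its most similar central sentence.
        sections = [[] for _ in centers]
        for i, s in enumerate(sentences):
            best = max(range(k), key=lambda c: similarity(s, sentences[centers[c]]))
            sections[best].append(i)
        # Determination step: in each section, the sentence with the largest
        # total similarity to the other members becomes the new central sentence.
        new_centers = []
        for old, members in zip(centers, sections):
            if not members:  # keep the old central sentence for an empty section
                new_centers.append(old)
                continue
            new_centers.append(max(
                members,
                key=lambda i: sum(
                    similarity(sentences[i], sentences[j]) for j in members if j != i
                ),
            ))
        # Repetition step: stop once the central sentences no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return sections, centers
```

Sections are returned as lists of sentence indices so that a later step can look up per-sentence data, such as weights, by the same index.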

3. The text processing method according to claim 1, wherein analyzing the input text to obtain the analysis result corresponding to the input text comprises analyzing all constituent sentences of the input text and obtaining the sentence weight of each of the constituent sentences as the analysis result.

4. The text processing method according to claim 3, wherein generating the output text based on the plurality of sections and the analysis result comprises:
selecting, in each of the plurality of sections and based on the sentence weights of all the constituent sentences, the constituent sentence having the maximum sentence weight in that section as the output result corresponding to that section; and
combining the output results of the plurality of sections to generate the output text.
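
As a minimal sketch of the selection and combination recited above, the following Python function picks the highest-weight sentence from each section and joins the picks. The function name `generate_output`, the representation of sections as lists of sentence indices, and the decision to emit the selected sentences in their original order are assumptions for illustration; the claim itself only requires that the per-section results be combined.

```python
from typing import List, Sequence


def generate_output(
    sentences: Sequence[str],
    weights: Sequence[float],
    sections: Sequence[Sequence[int]],
) -> str:
    """Pick the max-weight sentence of each section and combine the picks."""
    picked: List[int] = [
        max(members, key=lambda i: weights[i])  # highest sentence weight in the section
        for members in sections
        if members  # skip empty sections
    ]
    # Assumed ordering: restore the original sentence order before joining.
    return " ".join(sentences[i] for i in sorted(picked))
```

Because exactly one sentence is taken per non-empty section, near-duplicate sentences that were clustered together contribute at most one sentence to the output.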

5. The text processing method according to any one of claims 1 to 4, wherein the neural network includes one text processing layer, and the number of sections obtained by dividing the input text using clustering is determined by a predetermined target number of sentences of the output text of the text processing layer.

6. The text processing method according to any one of claims 1 to 4, wherein the neural network includes N cascaded text processing layers (N ≧ 2), and
the number of sections obtained by dividing the input text in the n-th text processing layer of the N cascaded text processing layers is determined by a predetermined target number of sentences of the output text of the n-th text processing layer.
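
The cascade recited above can be pictured as feeding each layer's output sentences into the next layer, with the section count of each layer set to that layer's target sentence count. The Python sketch below is a hedged illustration: `toy_layer` is a hypothetical stand-in (contiguous sections, longest sentence kept) rather than the clustering and weighting of the claims, and the names `cascade` and `targets` are assumptions.

```python
from typing import Callable, List, Sequence

Layer = Callable[[List[str], int], List[str]]


def toy_layer(sentences: List[str], k: int) -> List[str]:
    """Stand-in layer: k contiguous sections, keep the longest sentence of each."""
    n = len(sentences)
    bounds = [round(i * n / k) for i in range(k + 1)]
    out = []
    for lo, hi in zip(bounds, bounds[1:]):
        if lo < hi:  # ignore empty sections
            out.append(max(sentences[lo:hi], key=len))
    return out


def cascade(
    sentences: List[str],
    targets: Sequence[int],
    layer: Layer = toy_layer,
) -> List[str]:
    """Run one layer per target; layer n divides its input into targets[n] sections."""
    for k in targets:
        if not sentences:
            break
        # The section count equals the layer's target number of output sentences
        # (capped at the number of sentences actually available).
        sentences = layer(sentences, min(k, len(sentences)))
    return sentences
```

With decreasing targets such as `[4, 2]`, the text is shortened gradually, which is one plausible reason for cascading layers instead of dividing a long text into very few sections at once.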

7. A text processing apparatus based on a neural network, comprising:
an acquisition unit that acquires an input text;
an analysis unit that analyzes the input text and obtains an analysis result corresponding to the input text;
a division unit that divides the input text into a plurality of sections using clustering; and
a generation unit that generates an output text based on the plurality of sections and the analysis result.

8. The text processing apparatus according to claim 7, wherein the division unit performs:
an initialization step of initializing a plurality of central sentences corresponding to the plurality of sections;
an update step of calculating the similarity between the constituent sentences of the input text and the plurality of central sentences, assigning the constituent sentences of the input text to the sections corresponding to the plurality of central sentences based on the similarity, and updating the constituent sentences included in the plurality of sections;
a determination step of, in each of the plurality of sections, calculating the similarity between the constituent sentences and determining the constituent sentence having the maximum total similarity as a new central sentence; and
a repetition step of repeating the update step and the determination step until the new central sentence no longer changes.

9. The text processing apparatus according to claim 7, wherein the analysis unit analyzes all constituent sentences of the input text and obtains the sentence weight of each of the constituent sentences as the analysis result.

10. The text processing apparatus according to claim 9, wherein the generation unit:
selects, in each of the plurality of sections and based on the sentence weights of all the constituent sentences, the constituent sentence having the maximum sentence weight in that section as the output result corresponding to that section; and
combines the output results of the plurality of sections to generate the output text.

11. The text processing apparatus according to any one of claims 7 to 10, wherein the neural network includes one text processing layer, and the number of sections obtained by dividing the input text using clustering is determined by a predetermined target number of sentences of the output text of the text processing layer.

12. The text processing apparatus according to any one of claims 7 to 10, wherein the neural network includes N cascaded text processing layers (N ≧ 2), and
the number of sections obtained by dividing the input text in the n-th text processing layer of the N cascaded text processing layers is determined by a predetermined target number of sentences of the output text of the n-th text processing layer.

13. A text processing device based on a neural network, comprising:
a memory configured to store computer-readable instructions; and
a processor configured to execute the computer-readable instructions stored in the memory,
wherein, when executing the computer-readable instructions, the processor performs:
acquiring an input text;
analyzing the input text to obtain an analysis result corresponding to the input text;
dividing the input text into a plurality of sections using clustering; and
generating an output text based on the plurality of sections and the analysis result.

14. A computer-readable storage medium storing computer-readable instructions,
wherein, when the computer-readable instructions are executed by a computer, the computer performs a text processing method comprising:
acquiring an input text;
analyzing the input text to obtain an analysis result corresponding to the input text;
dividing the input text into a plurality of sections using clustering; and
generating an output text based on the plurality of sections and the analysis result.