JP2018073411A

JP2018073411A - Natural language generation method, natural language generation device, and electronic apparatus

Info

Publication number: JP2018073411A
Application number: JP2017204160A
Authority: JP
Inventors: レイディン; Lei Ding; ジィチョアヌジォン; Jichuan Zheng; ビンドン; Bin Don; シャヌシャヌジアン; Jin Shanshan; イシュエントン; Ishuan Tong
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2016-11-04
Filing date: 2017-10-23
Publication date: 2018-05-10
Anticipated expiration: 2037-10-23
Also published as: CN108021547B; JP6601470B2; CN108021547A

Abstract

PROBLEM TO BE SOLVED: To provide a natural language generation method, a natural language generation device, and an electronic apparatus that increase the flexibility of natural sentence generation, reduce the amount of manual work, and improve the accuracy of language generation results. SOLUTION: A natural language generation device extracts sentence pattern templates directly from a corpus, which guarantees the correctness of the sentence pattern of the natural sentences generated later. Extracting a template requires only deleting from a sentence the constituents defined in a predefined input mode, which avoids a large amount of manual work. In addition, the device selects candidate sentence pattern templates based on the matching degree between the input meaning and each sentence pattern template, which improves the accuracy of the generated natural sentences. SELECTED DRAWING: Figure 1

Description

The present invention relates to the technical field of natural language processing, and more specifically to a natural language generation method, a natural language generation device, and an electronic apparatus.

With the development of artificial intelligence, the range of applications of intelligent systems such as human-machine dialogue keeps expanding, and the need for human-like output, that is, direct output of natural language, keeps growing. Conventional methods for generating and outputting natural language include 1) generating natural sentences through a predefined language model, and 2) generating natural sentences through manually defined templates.

Each of these two methods has certain problems in practical application. In the first method, it is difficult for a mathematical model to express the grammar and logical relations of natural language well, so it is difficult to guarantee the correctness of the generated language. The second method, based on manual templates, can usually be applied only to a specific field or a single use, so it lacks flexibility and requires a large amount of manual work.

Therefore, there is an urgent need for a natural language generation method that is more flexible to implement, reduces the amount of manual work, and improves the accuracy of the generation results.

The technical problem to be solved by the embodiments of the present invention is to provide a natural language generation method, a natural language generation device, and an electronic apparatus that increase the flexibility of natural sentence generation, reduce the amount of manual work, and improve the accuracy of the language generation results.

To solve the above technical problem, an embodiment of the present invention provides a natural language generation method comprising: a step of generating, based on sentences in a corpus, at least one sentence pattern template that matches a predefined input mode; a step of acquiring an input meaning based on the input mode, calculating the matching degree between the input meaning and each sentence pattern template, and selecting at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition; and a step of generating a natural sentence based on the input meaning and the candidate sentence pattern template.

In the above method, after the step of generating at least one sentence pattern template that matches the predefined input mode, the method further includes a step of calculating the similarity between every two sentence pattern templates. When the matching degree between the input meaning and the sentence pattern templates is calculated, the next sentence pattern template for which the matching degree is to be calculated is determined based on the similarity between the current sentence pattern template and the other sentence pattern templates.

In the above method, the step of calculating the similarity between every two sentence pattern templates includes calculating the similarity Sim(p₁,p₂) between the two templates according to the following formula:

[Formula for Sim(p₁,p₂): image not reproduced]

where w denotes the word corresponding to a sub-meaning; p₁ and p₂ denote the first and second sentence pattern templates of the pair; s denotes one filling position in a sentence pattern template; T(p,s) denotes the set of words in the corpus that can fill filling position s of sentence pattern template p; Num(T()) denotes the number of words in the set T(); n denotes the number of words in T(p,s); θ_w denotes a preset weighting coefficient for word w; x denotes a word in T(p,s); cos(w,x) denotes the cosine similarity between words w and x; T(p₁,s)∩T(p₂,s) denotes the intersection of the two sets; T(p₁,s)∪T(p₂,s) denotes the union of the two sets; and the summation sign over s denotes adding up the Y values corresponding to all filling positions s in the sentence pattern template.

In the above method, the step of calculating the matching degree between the input meaning and a sentence pattern template includes: a step of determining, for each sub-meaning in the input meaning, a first set of words in the corpus that can fill the filling position of that sub-meaning in the sentence pattern template; a step of calculating a matching factor between the sub-meaning and the corresponding filling position in the sentence pattern template based on the cosine similarity between the sub-meaning and each word in the first set, the matching factor being positively correlated with the cosine similarity; and a step of calculating the matching degree between the input meaning and the sentence pattern template based on the matching factors between each sub-meaning and its corresponding filling position.

In the above method, the step of generating a natural sentence based on the input meaning and the candidate sentence pattern template includes: a step of filling words of the input meaning and/or of a conversion meaning into the corresponding positions of the candidate sentence pattern template to obtain candidate natural sentences, the semantic similarity between the conversion meaning and the input meaning being higher than a preset threshold; and a step of calculating the matching degree between the filled meaning, composed of the sub-meanings at the filling positions of each candidate natural sentence, and the corresponding candidate sentence pattern template, and selecting, based on that matching degree, the natural sentences whose matching degree reaches a predetermined threshold.

An embodiment of the present invention further provides a natural language generation device comprising: a template acquisition module that generates, based on sentences in a corpus, at least one sentence pattern template that matches a predefined input mode; a template selection module that acquires an input meaning based on the input mode, calculates the matching degree between the input meaning and each sentence pattern template, and selects at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition; and a sentence generation module that generates a natural sentence based on the input meaning and the candidate sentence pattern template.

The above device further includes a similarity calculation module that calculates the similarity between every two sentence pattern templates after the template acquisition module has generated at least one sentence pattern template that matches the predefined input mode. When calculating the matching degree between the input meaning and the sentence pattern templates, the template selection module determines the next sentence pattern template for which the matching degree is to be calculated, based on the similarity between the current sentence pattern template and the other sentence pattern templates.

In the above device, the similarity calculation module calculates the similarity Sim(p₁,p₂) between every two sentence pattern templates according to the following formula:

[Formula for Sim(p₁,p₂): image not reproduced]

where w denotes the word corresponding to a sub-meaning; p₁ and p₂ denote the first and second sentence pattern templates of the pair; s denotes one filling position in a sentence pattern template; T(p,s) denotes the set of words in the corpus that can fill filling position s of sentence pattern template p; Num(T()) denotes the number of words in the set T(); n denotes the number of words in T(p,s); θ_w denotes a preset weighting coefficient for word w; x denotes a word in T(p,s); cos(w,x) denotes the cosine similarity between words w and x; T(p₁,s)∩T(p₂,s) denotes the intersection of the two sets; T(p₁,s)∪T(p₂,s) denotes the union of the two sets; and the summation sign over s denotes adding up the Y values corresponding to all filling positions s in the sentence pattern template.

In the above device, the template selection module determines, for each sub-meaning in the input meaning, a first set of words in the corpus that can fill the filling position of that sub-meaning in the sentence pattern template; calculates a matching factor between the sub-meaning and the corresponding filling position in the sentence pattern template based on the cosine similarity between the sub-meaning and each word in the first set, the matching factor being positively correlated with the cosine similarity; and calculates the matching degree between the input meaning and the sentence pattern template based on the matching factors between each sub-meaning and its corresponding filling position.

In the above device, the sentence generation module fills words of the input meaning and/or of a conversion meaning into the corresponding positions of the candidate sentence pattern template to obtain candidate natural sentences, the semantic similarity between the conversion meaning and the input meaning being higher than a preset threshold; calculates the matching degree between the filled meaning, composed of the sub-meanings at the filling positions of each candidate natural sentence, and the corresponding candidate sentence pattern template; and selects, based on that matching degree, the natural sentences whose matching degree reaches a predetermined threshold.

An embodiment of the present invention further provides an electronic apparatus comprising a processor and a memory storing computer program instructions, wherein the computer program instructions, when executed by the processor, cause the processor to perform: a step of generating, based on sentences in a corpus, at least one sentence pattern template that matches a predefined input mode; a step of acquiring an input meaning based on the input mode, calculating the matching degree between the input meaning and each sentence pattern template, and selecting at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition; and a step of generating a natural sentence based on the input meaning and the candidate sentence pattern template.

Compared with the prior art, the natural language generation method, natural language generation device, and electronic apparatus provided by the embodiments of the present invention have at least the following beneficial effects. The embodiments extract sentence pattern templates directly from a corpus, which guarantees the correctness of the sentence pattern of the natural sentences generated later, and extracting a template requires only deleting the sentence constituents defined in the input mode, which avoids a large amount of manual work. The embodiments also select candidate sentence pattern templates based on the matching degree between the input meaning and each sentence pattern template, which improves the accuracy of the generated natural sentences. Furthermore, the embodiments filter the generated natural sentences by matching degree, so that the sentences obtained are both accurate and diverse.

A flowchart of the natural language generation method according to Embodiment 1 of the present invention.
A flowchart of the natural language generation method according to Embodiment 2 of the present invention.
A structural diagram of the natural language generation device according to Embodiment 3 of the present invention.
A structural diagram of another natural language generation device according to Embodiment 3 of the present invention.
A structural diagram of the electronic apparatus according to an embodiment of the present invention.

To make the technical problems to be solved, the technical solutions, and the advantages of the embodiments of the present invention clearer, the following description provides specific details such as particular configurations and components. These details are given merely to aid a full understanding of the embodiments. Those skilled in the art will therefore understand that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and structures are omitted for clarity and brevity.

It should be understood that "one embodiment" or "an embodiment" as used throughout the specification means that a particular feature, structure, or characteristic associated with the embodiment is included in at least one embodiment of the present invention. Thus, "in one embodiment" or "in an embodiment" appearing in various places throughout the specification does not necessarily refer to the same embodiment. In addition, these particular features, structures, or characteristics may be combined in one or more embodiments in any suitable manner.

It should be understood that, in the various embodiments of the present invention, the magnitude of the sequence numbers of the processes below does not imply an order of execution; the order of execution of each process is determined by its function and internal logic, and the numbering places no limitation on how the embodiments are implemented. It should also be understood that the term "and/or" in this text merely describes an association between related objects and indicates that three kinds of relationship may exist. For example, "A and/or B" covers three cases: A alone, both A and B, and B alone. In addition, the symbol "/" in this text generally indicates an "or" relationship between the objects before and after it. In the embodiments provided in this specification, "B corresponding to A" means that B is associated with A and that B can be determined from A. However, determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.

First, the concepts used in the following embodiments of the present invention are explained.

In the embodiments of the present invention, the input mode specifies the categories of the words to be input. Specifically, the categories may include noun, verb, adjective, numeral, measure word, adverb, pronoun, conjunction, preposition, particle, modal particle, and so on; for example, the input mode may be one noun and one verb. Alternatively, the input mode may specify the grammatical component or function that each input word serves in a sentence. Specifically, the components may be subject, predicate, object, attributive modifier, adverbial modifier, complement, and so on. In other words, the input mode defines which sentence constituents the input words occupy.

The input meaning refers to the input words or word vectors (a word vector being another representation of a word). Since one input meaning can contain several words or word vectors, each word or word vector in the input meaning is called a sub-meaning in this text. For example, if the input meaning is "京東" (Jingdong) and "一号店" (Yihaodian), the two together form one input meaning, and each of them is a separate sub-meaning of that input meaning.

A sentence pattern template is the template obtained after removing from a sentence the constituents defined in the input mode. For example, take the sentence "We buy clothes at the market" and a predefined input mode of subject and predicate. Deleting the subject "we" and the predicate "buy" from the sentence gives the sentence pattern template "[subject] [predicate] clothes at the market". Each part in [ ] is a filling position: the place in the sentence occupied by a constituent defined in the input mode, whose content has been deleted. To generate a natural sentence later, each sub-meaning of an input meaning that conforms to the input mode is filled into its corresponding filling position, which yields a natural language sentence (natural sentence). For example, if the input meaning is "Mr. Wang" and "sell", filling the template above gives the sentence "Mr. Wang sells clothes at the market".
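The extraction and filling just described can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes the sentence has already been tokenized and labeled with grammatical roles, and the names `extract_template`, `fill_template`, and the role labels are hypothetical.

```python
from typing import Dict, List, Tuple

# A token is (word, role). The role labels ("subject", "other", ...) are
# illustrative assumptions; the source does not prescribe a tag set.
Token = Tuple[str, str]


def extract_template(tokens: List[Token], input_mode: List[str]) -> List[str]:
    """Replace every token whose role is in the input mode with a [role] slot."""
    return [f"[{role}]" if role in input_mode else word for word, role in tokens]


def is_slot(item: str) -> bool:
    """A template item is a filling position if it is written as [role]."""
    return item.startswith("[") and item.endswith("]")


def fill_template(template: List[str], meaning: Dict[str, str]) -> str:
    """Fill each slot with the sub-meaning given for that role."""
    return " ".join(meaning[t[1:-1]] if is_slot(t) else t for t in template)


sentence = [("we", "subject"), ("buy", "predicate"), ("clothes", "object"),
            ("at", "other"), ("the", "other"), ("market", "other")]
template = extract_template(sentence, ["subject", "predicate"])
print(template)
print(fill_template(template, {"subject": "Mr. Wang", "predicate": "sells"}))
```

The sketch passes the already inflected form "sells"; how inflection is handled is not stated in the source.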

Next, the present invention is described in detail with reference to the drawings and specific embodiments.
<Embodiment 1>
As shown in FIG. 1, Embodiment 1 of the present invention provides a natural language generation method, which can be applied in environments such as human-machine dialogue systems or image description generation systems. Referring to FIG. 1, the method includes the following steps.

In step 11, at least one sentence pattern template that matches a predefined input mode is generated based on the sentences in a corpus.

In the embodiments of the present invention, a sentence pattern template is obtained from a sentence in a preset corpus by directly deleting from that sentence the constituents defined in the input mode. The position of each deleted constituent in the template is left empty as a filling position, to be filled later with the corresponding word of an input meaning. A corpus usually stores a large number of sentences, and many of them may match the input mode, so many sentence pattern templates can be extracted, one for each matching sentence.

Here, the input mode may be defined or chosen by a user, or it may be generated by a system. For example, an image description generation system can recognize the content of an image and describe it in natural language. In that case, the input mode may be one that the system generates after recognizing the image content.

In step 12, an input meaning is acquired based on the input mode, the matching degree between the input meaning and each sentence pattern template is calculated, and at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition is selected.

In step 12, the matching degree between the input meaning and each sentence pattern template is calculated, and the sentence pattern templates whose matching degree satisfies a predetermined condition are selected as candidate sentence pattern templates. The predetermined condition may be set according to factors such as the needs of the scenario or the amount of computation. For example, the candidates may be the templates whose matching degree exceeds a preset numerical threshold, or the N templates with the highest matching degree, where N is a positive integer. As with the input mode, the input meaning may be entered by a user or produced by a system, for example by the image description generation system mentioned above.
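The two example conditions, a numerical threshold and the N best templates, can be sketched as follows. The function name `select_candidates`, the template identifiers, and the scores are hypothetical; the sketch assumes the matching degrees have already been computed.

```python
from typing import Dict, List, Optional


def select_candidates(scores: Dict[str, float],
                      threshold: Optional[float] = None,
                      top_n: Optional[int] = None) -> List[str]:
    """Return template ids ordered by matching degree, best first.

    threshold: keep only templates whose matching degree exceeds this value.
    top_n:     keep at most this many of the best templates.
    Either condition may be used alone, or both together.
    """
    ranked = sorted(scores, key=lambda t: scores[t], reverse=True)
    if threshold is not None:
        ranked = [t for t in ranked if scores[t] > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


# Toy matching degrees for three templates (made-up values).
scores = {"T1": 0.82, "T2": 0.35, "T3": 0.67}
print(select_candidates(scores, threshold=0.5))
print(select_candidates(scores, top_n=1))
```

A threshold keeps the candidate count data-dependent, while top-N bounds the work done in the later filling step; which one suits a deployment depends on the scenario and compute budget, as the text notes.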

In step 12, the matching degree between the input meaning and a sentence pattern template can be calculated specifically as follows.

In step 121, for each sub-meaning in the input meaning, a first set of words in the corpus that can fill the filling position of that sub-meaning in the sentence pattern template is determined, and a matching factor between the sub-meaning and the corresponding filling position in the sentence pattern template is calculated based on the cosine similarity between the sub-meaning and each word in the first set. The matching factor is positively correlated with the cosine similarity.

In step 121, when determining the first set of words in the corpus that can fill the filling position, the words of the first set can be obtained from the words that occupy that filling position in the corpus sentences matching the sentence pattern template. The cosine similarity between the sub-meaning and each word of the first set is then calculated. A preferred way is to calculate the cosine similarity between the word vector of the sub-meaning and the word vector of each word in the first set. The matching factor is then calculated from these cosine similarities and is positively correlated with them: the higher the cosine similarity, the larger the matching factor and the better the match; conversely, the lower the cosine similarity, the smaller the matching factor and the poorer the match. One way of calculating the matching factor is given below; note that the embodiments of the present invention are not limited to it.

[Formula (1): image not reproduced]

In formula (1), w denotes the word corresponding to a sub-meaning; s denotes one filling position in sentence pattern template p; AM(p,s,w) denotes the matching factor between word w and filling position s of sentence pattern template p; θ_w denotes a preset weighting coefficient for word w; T(p,s) denotes the set of words in the corpus that can fill filling position s of template p; n denotes the number of words in T(p,s); x denotes a word in T(p,s); and cos(w,x) denotes the cosine similarity between words w and x.
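The formula itself does not survive in this text, so the following sketch assumes one form that is consistent with the symbol definitions and with the required positive correlation: AM(p,s,w) = θ_w · (1/n) · Σ_{x ∈ T(p,s)} cos(w,x), that is, a weighted mean cosine similarity. This form is an assumption, not the patent's confirmed formula. The function names and the toy two-dimensional word vectors are likewise hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def matching_factor(w: str,
                    slot_words: Sequence[str],
                    vectors: Dict[str, Sequence[float]],
                    theta_w: float = 1.0) -> float:
    """ASSUMED form of AM(p, s, w): theta_w times the mean of cos(w, x) over T(p, s).

    slot_words plays the role of T(p, s), the words seen at slot s in the corpus.
    """
    if not slot_words:
        return 0.0
    total = sum(cosine(vectors[w], vectors[x]) for x in slot_words)
    return theta_w * total / len(slot_words)


# Toy word vectors (made up): "sell" is identical to "buy" and orthogonal to "eat".
vectors = {"sell": (1.0, 0.0), "buy": (1.0, 0.0), "eat": (0.0, 1.0)}
am = matching_factor("sell", ["buy", "eat"], vectors)
print(am)  # 0.5 under the assumed formula: (1.0 + 0.0) / 2
```

In practice the vectors would come from a trained word-embedding model; the source does not name one.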

In step 122, the matching degree between the input meaning and the sentence pattern template is calculated based on the matching factors between each sub-meaning and its corresponding filling position in the sentence pattern template.

In step 122, the average of the matching factors between each sub-meaning and its corresponding filling position in the sentence pattern template may be calculated and used as the matching degree between the input meaning and the sentence pattern template. Alternatively, the sum of the matching factors over all sub-meanings may be calculated and used as the matching degree.
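Both aggregation options can be sketched with one small function. The name `matching_degree`, the `mode` parameter, and the factor values are hypothetical; the per-slot matching factors are assumed to have been computed already.

```python
from statistics import mean
from typing import Dict


def matching_degree(factors: Dict[str, float], mode: str = "mean") -> float:
    """Combine per-slot matching factors into one matching degree.

    factors maps a filling position (slot name) to the matching factor of the
    sub-meaning placed there. mode is "mean" or "sum", the two options in the text.
    """
    if not factors:
        raise ValueError("at least one matching factor is required")
    values = list(factors.values())
    if mode == "mean":
        return mean(values)
    if mode == "sum":
        return sum(values)
    raise ValueError(f"unknown mode: {mode}")


# Made-up matching factors for a two-slot template.
factors = {"subject": 0.75, "predicate": 0.25}
print(matching_degree(factors, "mean"))
print(matching_degree(factors, "sum"))
```

When templates differ in the number of filling positions, the mean keeps scores comparable across templates, whereas the sum favors templates with more slots.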

In step 13, natural language is generated based on the input meaning and the candidate sentence pattern template.

In step 13, a natural phrase can be obtained as follows. After the candidate sentence pattern template has been selected, the words of the input meaning are filled into the corresponding filling positions of the candidate template to obtain the natural phrase.

To obtain more varied natural phrases, step 13 may instead be implemented as follows. Several conversion meanings whose semantic similarity to the input meaning is higher than a preset threshold are determined, and the words of the input meaning and/or the conversion meanings are then filled into the corresponding positions of the candidate sentence pattern template, yielding more diverse natural phrases. Here, the semantic similarity can be calculated from the cosine similarity between word vectors.

To balance the accuracy and diversity of the natural phrases obtained, step 13 may also be implemented as follows. The words of the input meaning and/or the conversion meanings are filled into the corresponding positions of the candidate sentence pattern template to obtain candidate natural phrases. For each candidate natural phrase, the matching degree between the filling meaning, formed by the sub-meanings at its filling positions, and the corresponding candidate template is calculated, and the natural phrases whose matching degree reaches a predetermined threshold are selected. The matching degree can be calculated as in steps 121 to 122 above, so the description is not repeated.
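The variants of step 13 described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a template is a string with named placeholders such as `{drink}` and that the similarity and matching-degree functions are supplied by the caller; `expand_meanings`, `generate_phrases`, and all parameter names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List


def expand_meanings(
    meaning: Dict[str, str],
    similar: Callable[[str, str], float],
    vocabulary: List[str],
    threshold: float,
) -> Dict[str, List[str]]:
    """Per slot, keep the input word plus vocabulary words whose
    similarity to it is higher than the threshold (conversion meanings)."""
    options: Dict[str, List[str]] = {}
    for slot, word in meaning.items():
        variants = [v for v in vocabulary if v != word and similar(word, v) > threshold]
        options[slot] = [word] + variants
    return options


def generate_phrases(
    template: str,
    options: Dict[str, List[str]],
    degree: Callable[[str, Dict[str, str]], float],
    min_degree: float,
) -> List[str]:
    """Fill every combination of slot words into the template and keep
    the phrases whose matching degree reaches min_degree."""
    slots = list(options)
    phrases: List[str] = []
    for combo in product(*(options[s] for s in slots)):
        filling = dict(zip(slots, combo))
        if degree(template, filling) >= min_degree:
            phrases.append(template.format(**filling))
    return phrases
```

Expanding first and filtering afterwards mirrors the trade-off in the text: the expansion adds diversity, and the matching-degree filter removes fillings that fit the template poorly.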

Through the above steps, the embodiment of the present invention extracts sentence pattern templates directly from the corpus, which guarantees the correctness of the sentence pattern of the natural phrases generated subsequently. Moreover, extracting a sentence pattern template only requires deleting the components of the predefined words of the input mode, which avoids a large amount of manual work. In addition, the embodiment selects candidate sentence pattern templates based on the matching degree between the input meaning and each template, which improves the accuracy of the generated natural phrases. Finally, by filtering the generated natural phrases by matching degree, the embodiment balances the accuracy and diversity of the phrases obtained.
<Example 2>
As shown in FIG. 2, the natural language generation method of the second embodiment improves the efficiency of the subsequent selection of candidate sentence pattern templates: after the templates are obtained, the similarity between every two templates is calculated, and this similarity is then used to speed up candidate selection. Referring to FIG. 2, the method includes the following steps.

In step 21, at least one sentence pattern template matching a predefined input mode is generated based on the phrases in the corpus.

The generation of sentence pattern templates can be implemented as in the first embodiment, so the description is not repeated.

In step 22, the similarity between every two sentence pattern templates among the at least one sentence pattern template is calculated.

Here, in step 22, the similarity Sim(p₁,p₂) between every two sentence pattern templates can be calculated by the following formula:

where

w denotes the word corresponding to a sub-meaning; p₁ and p₂ denote the first and second sentence pattern templates of each pair; s denotes a filling position in a sentence pattern template; T(p,s) denotes the set of words in the corpus that can fill position s of template p; Num(T()) denotes the number of words in set T(); AM(p,s,w) denotes the matching factor between word w and filling position s of template p; n denotes the number of words in T(p,s); θ_w denotes a preset weighting coefficient for word w; x denotes a word in T(p,s); cos(w,x) denotes the cosine similarity between words w and x; T(p₁,s)∩T(p₂,s) denotes the intersection of the two sets; T(p₁,s)∪T(p₂,s) denotes the union of the two sets; and the summation sign Σ over s denotes summation of the Y values corresponding to all filling positions s in the sentence pattern template.
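Because the similarity formula itself is not reproduced in this text, the following Python sketch is a deliberately simplified stand-in rather than the patent's formula. It uses only the intersection, union, and Num() ingredients named in the definitions: for each filling position shared by two templates it computes Num(T(p₁,s)∩T(p₂,s)) / Num(T(p₁,s)∪T(p₂,s)) and sums the result over positions. The matching factor AM and the weights θ_w are omitted, and the names `slot_overlap` and `template_similarity` are hypothetical.

```python
from typing import Dict, Set


def slot_overlap(a: Set[str], b: Set[str]) -> float:
    """Share of words common to both fill sets: Num(a ∩ b) / Num(a ∪ b)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def template_similarity(t1: Dict[str, Set[str]], t2: Dict[str, Set[str]]) -> float:
    """ASSUMED simplification: sum the per-slot overlap over shared slots.

    Each template is represented as a mapping from filling position s
    to its fill set T(p, s).
    """
    shared = t1.keys() & t2.keys()
    return sum(slot_overlap(t1[s], t2[s]) for s in shared)
```

Whatever the exact formula, the similarities depend only on the corpus and the templates, so they can be computed once after step 21 and reused for every input meaning in step 23.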

In step 23, an input meaning is obtained based on the input mode, the matching degree between the input meaning and the sentence pattern templates is calculated, and at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition is selected. During the matching degree calculation, the next sentence pattern template for which the matching degree is to be calculated is determined based on the similarity between templates.

In step 23, while the matching degree between the input meaning and the sentence pattern templates is being calculated, the next template to be evaluated is determined based on the similarity between the template whose matching degree is currently being calculated and the other templates, which improves the efficiency of selecting candidate templates.

For example, one sentence pattern template is first selected from the at least one template obtained in step 21 as the current template, and the matching degree between the input meaning and the current template is calculated. If the matching degree does not reach a preset first threshold, one template is selected from the remaining templates as the next template for which the matching degree is calculated. If the matching degree reaches the preset first threshold, a not-yet-evaluated template whose similarity to the current template exceeds a preset second threshold is selected, based on the similarity between templates, as the next template. When no templates remain, or when the number of templates whose matching degree with the input meaning reaches a preset threshold reaches a predetermined number, the matching degree calculation ends, and at least one candidate template satisfying the predetermined condition may be selected based on the matching degrees calculated between the input meaning and the templates.
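The example procedure above can be sketched in Python as follows. Several details are assumptions made only to obtain a runnable example: templates are plain strings, the first template in the list is the starting point, the same first threshold is used both for counting hits and for the final selection, and when no remaining template is similar enough to the current one the loop falls back to the next remaining template. The names `select_candidates`, `degree`, `sim`, and `max_hits` are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def select_candidates(
    templates: List[str],
    degree: Callable[[str], float],    # matching degree of input meaning vs. template
    sim: Callable[[str, str], float],  # precomputed similarity between two templates
    first_threshold: float,
    second_threshold: float,
    max_hits: int,
) -> List[str]:
    remaining = list(templates)
    scores: Dict[str, float] = {}
    hits = 0
    current: Optional[str] = remaining.pop(0) if remaining else None
    while current is not None:
        scores[current] = degree(current)
        reached = scores[current] >= first_threshold
        if reached:
            hits += 1
        # Stop when enough good templates were found or none are left.
        if hits >= max_hits or not remaining:
            break
        nxt = None
        if reached:
            # Prefer an unevaluated template similar to the current good one.
            nxt = next((t for t in remaining if sim(current, t) > second_threshold), None)
        if nxt is None:
            nxt = remaining[0]
        remaining.remove(nxt)
        current = nxt
    return [t for t, s in scores.items() if s >= first_threshold]
```

The point of the similarity-guided step is that a template similar to one that already matched well is likely to match well too, so the stopping condition tends to be reached after fewer matching degree calculations than in a fixed-order scan.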

The above is only one example of accelerating the selection of candidate sentence pattern templates; embodiments of the present invention may use other methods based on the similarity between templates to improve selection efficiency.

The matching degree between the input meaning and a sentence pattern template can be calculated as in the first embodiment, so the description is not repeated.

In step 24, a natural phrase is generated based on the input meaning and the candidate sentence pattern template.

In step 24, the words of the input meaning and/or a conversion meaning may first be filled into the corresponding positions of the candidate sentence pattern template to obtain candidate natural phrases. The semantic similarity between the conversion meaning and the input meaning is higher than a preset threshold. The sub-meanings (words) of the conversion meaning may be taken from the same corpus as in step 21 or from another corpus, for example an Internet corpus, to increase the diversity of the natural phrases generated. The matching degree between the filling meaning, formed by the sub-meanings at the filling positions of each candidate natural phrase, and the corresponding candidate template is then calculated, and the natural phrases whose matching degree reaches a predetermined threshold are selected.
<Example 3>
This embodiment provides a device for implementing the natural language generation methods described in the above embodiments. Referring to FIG. 3, the embodiment of the present invention provides a natural language generation device 30, which includes the following modules.

The template acquisition module 31 generates at least one sentence pattern template matching a predefined input mode based on the phrases in the corpus.

The template selection module 32 obtains an input meaning based on the input mode, calculates the matching degree between the input meaning and the sentence pattern templates, and selects at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition.

The phrase generation module 33 generates a natural phrase based on the input meaning and the candidate sentence pattern template.
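The way the three modules hand data to one another can be sketched as a small Python class. This is only an illustration of the data flow between modules 31, 32, and 33; the class name `NaturalLanguageGenerator`, its field names, and the representation of corpora, meanings, and templates as lists, dictionaries, and strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Meaning = Dict[str, str]  # filling position -> word (assumed representation)


@dataclass
class NaturalLanguageGenerator:
    # Module 31: corpus -> sentence pattern templates
    acquire_templates: Callable[[List[str]], List[str]]
    # Module 32: input meaning + templates -> candidate templates
    select_templates: Callable[[Meaning, List[str]], List[str]]
    # Module 33: input meaning + candidate templates -> natural phrases
    generate_phrases: Callable[[Meaning, List[str]], List[str]]

    def run(self, corpus: List[str], meaning: Meaning) -> List[str]:
        """Chain the three modules in the order described for device 30."""
        templates = self.acquire_templates(corpus)
        candidates = self.select_templates(meaning, templates)
        return self.generate_phrases(meaning, candidates)
```

Passing each module in as a callable keeps the sketch neutral about how template extraction, selection, and filling are actually implemented.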

To improve the efficiency of selecting candidate sentence pattern templates, the natural language generation device of the embodiment may further include the following module, as shown in FIG. 4. The similarity calculation module 34 calculates the similarity between every two sentence pattern templates after the template acquisition module has generated the at least one sentence pattern template matching the predefined input mode. In this case, while calculating the matching degree between the input meaning and the sentence pattern templates, the template selection module 32 determines the next template for which the matching degree is to be calculated based on the similarity between the template whose matching degree is currently being calculated and the other templates.

Specifically, the template selection module 32 may include the following sub-modules. The first selection sub-module selects one sentence pattern template from the at least one sentence pattern template as the current template. The calculation sub-module calculates the matching degree between the input meaning and the current template. The first processing sub-module, when the matching degree between the input meaning and the current template does not reach a preset first threshold, selects one template from the remaining templates as the next template for which the matching degree is calculated. The second processing sub-module, when the matching degree between the input meaning and the current template reaches the preset first threshold, selects, based on the similarity between templates, a not-yet-evaluated template whose similarity to the current template exceeds a preset second threshold as the next template for which the matching degree is calculated.

The similarity calculation module 34 calculates the similarity Sim(p₁,p₂) between every two sentence pattern templates based on the following formula:

where

w denotes the word corresponding to a sub-meaning; p₁ and p₂ denote the first and second sentence pattern templates of each pair; s denotes a filling position in a sentence pattern template; T(p,s) denotes the set of words in the corpus that can fill position s of template p; Num(T()) denotes the number of words in set T(); n denotes the number of words in T(p,s); θ_w denotes a preset weighting coefficient for word w; x denotes a word in T(p,s); cos(w,x) denotes the cosine similarity between words w and x; T(p₁,s)∩T(p₂,s) denotes the intersection of the two sets; T(p₁,s)∪T(p₂,s) denotes the union of the two sets; and the summation sign Σ over s denotes summation of the Y values corresponding to all filling positions s in the sentence pattern template.

When calculating the matching degree between the input meaning and a sentence pattern template, the template selection module 32 specifically does the following. For each sub-meaning in the input meaning, it determines, based on the filling position of the sub-meaning in the template, the first set of words in the corpus that can fill that position. It then calculates the matching factor between the sub-meaning and its filling position in the template based on the cosine similarity between the sub-meaning and each word in the first set; the matching factor is positively correlated with the cosine similarity. Finally, it calculates the matching degree between the input meaning and the template from the matching factors of the sub-meanings, for example as their average or their sum.

Specifically, the phrase generation module 33 fills the words of the input meaning and/or a conversion meaning into the corresponding positions of the candidate sentence pattern template to obtain candidate natural phrases, where the semantic similarity between the conversion meaning and the input meaning is higher than a preset threshold. It then calculates the matching degree between the filling meaning, formed by the sub-meanings at the filling positions of each candidate natural phrase, and the corresponding candidate template, and selects the natural phrases whose matching degree reaches a predetermined threshold.
<Example 4>
Referring to FIG. 5, the embodiment of the present invention further provides an electronic device that can implement the flow of the embodiments shown in FIG. 1 or FIG. 2. The electronic device may be a personal computer (PC), a tablet computer, or any of various smart devices (including smart glasses and smartphones). As shown in FIG. 5, the electronic device 50 may include a processor 51 and a memory.

Computer program instructions are stored in the memory. Specifically, the memory may include a RAM (random access memory) 52 and a ROM (read-only memory) 53.

When the computer program instructions are executed by the processor 51, they cause the processor 51 to perform the steps of: generating at least one sentence pattern template matching a predefined input mode based on the phrases in the corpus; obtaining an input meaning based on the input mode, calculating the matching degree between the input meaning and the sentence pattern templates, and selecting at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition; and generating a natural phrase based on the input meaning and the candidate sentence pattern template.

As shown in FIG. 5, the electronic device provided by the embodiment may further include components such as a hard disk 54, an input device 55, and a display device 56. Specifically, the input device 55 may be a device with an input and/or receiving function, such as a keyboard, a touch panel, or various interfaces, used to obtain the predefined input mode and the input meaning. The display device 56 may be an LED display panel or another display capable of showing information such as the generated natural phrases.

The processor 51, RAM 52, ROM 53, hard disk 54, input device 55, and display device 56 may be interconnected through a bus architecture. The bus architecture may include any number of interconnected buses and bridges. Specifically, it connects together the various circuits of one or more central processing units (CPUs), represented by the processor 51, and of one or more memories, represented by the RAM 52 and ROM 53. The bus architecture may also connect various other circuits, such as peripheral devices, voltage regulators, and power management circuits; these are well known in the art and are not described in detail here.

The input device 55 inputs samples of the data required by the network and stores them on the hard disk 54.

The RAM 52 and ROM 53 store the programs and data essential for the operation of the system, as well as data such as intermediate results of the processor's computations.

It should be understood that the methods and devices disclosed in the embodiments provided herein may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of another form.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware combined with software functional units.

An integrated unit implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional units stored in the storage medium include a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

For example, one aspect of the embodiments of the present invention provides a natural language generation program that causes a computer to execute the steps of: generating at least one sentence pattern template matching a predefined input mode based on the phrases in a corpus; obtaining an input meaning based on the input mode, calculating the matching degree between the input meaning and the sentence pattern templates, and selecting at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition; and generating a natural phrase based on the input meaning and the candidate sentence pattern template.

Another aspect of the embodiments of the present invention provides a recording medium on which is recorded a natural language generation program that causes a computer to execute the steps of: generating at least one sentence pattern template matching a predefined input mode based on the phrases in a corpus; obtaining an input meaning based on the input mode, calculating the matching degree between the input meaning and the sentence pattern templates, and selecting at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition; and generating a natural phrase based on the input meaning and the candidate sentence pattern template.

The above describes preferred embodiments of the present invention. Those skilled in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications also fall within the scope of protection of the present invention.

Claims

A natural language generation method, comprising:
generating at least one sentence template matching a predefined input mode based on phrases in a corpus;
obtaining an input meaning based on the input mode, calculating a matching degree between the input meaning and the sentence templates, and selecting at least one candidate sentence template whose matching degree satisfies a predetermined condition; and
generating a natural phrase based on the input meaning and the candidate sentence template.

The natural language generation method according to claim 1, further comprising:
after generating the at least one sentence template matching the predefined input mode, calculating the similarity between every two sentence templates,
wherein, in calculating the matching degree between the input meaning and the sentence templates, the next sentence template for which the matching degree is to be calculated is determined based on the similarity between the sentence template whose matching degree is currently being calculated and the other sentence templates.

The natural language generation method according to claim 2, wherein calculating the similarity between every two sentence templates comprises:
calculating the similarity Sim(p₁,p₂) between every two sentence templates based on the following formula:

where

w denotes the word corresponding to a sub-meaning; p₁ and p₂ denote the first and second sentence templates of each pair; s denotes a filling position in a sentence template; T(p,s) denotes the set of words in the corpus that can fill position s of sentence template p; Num(T()) denotes the number of words in set T(); n denotes the number of words in T(p,s); θ_w denotes a preset weighting coefficient for word w; x denotes a word in T(p,s); cos(w,x) denotes the cosine similarity between words w and x; T(p₁,s)∩T(p₂,s) denotes the intersection of the two sets; T(p₁,s)∪T(p₂,s) denotes the union of the two sets; and the summation sign Σ over s denotes summation of the Y values corresponding to all filling positions s in the sentence template.

The natural language generation method according to any one of claims 1 to 3, wherein calculating the matching degree between the input meaning and the sentence template comprises:
for each sub-meaning in the input meaning, determining, based on the filling position of the sub-meaning in the sentence template, a first set of words in the corpus that can fill that filling position;
calculating a matching factor between the sub-meaning and the corresponding filling position in the sentence template based on the cosine similarity between the sub-meaning and each word in the first set, the matching factor being positively correlated with the cosine similarity; and
calculating the matching degree between the input meaning and the sentence template based on the matching factors between the sub-meanings and their corresponding filling positions in the sentence template.

The natural language generation method according to claim 1, wherein generating a natural phrase based on the input meaning and the candidate sentence template comprises:
filling words of the input meaning and/or of a conversion meaning into the corresponding positions in the candidate sentence template to obtain candidate natural phrases, the semantic similarity between the conversion meaning and the input meaning being higher than a preset threshold; and
calculating the matching degree between the filling meaning, formed by the sub-meanings at the filling positions of each candidate natural phrase, and the corresponding candidate sentence template, and selecting, based on the matching degree, the natural phrases whose matching degree reaches a predetermined threshold.

A template acquisition module that generates, based on the words in a corpus, at least one sentence pattern template that matches a predefined input mode;
A template selection module that acquires an input meaning based on the input mode, calculates the matching degree between the input meaning and each sentence pattern template, and selects at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition; and
A phrase generation module that generates a natural phrase based on the input meaning and the candidate sentence pattern template,
A natural language generation device characterized by comprising:
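
As an illustration of the template acquisition module, the Python sketch below derives sentence pattern templates by replacing the predefined words of the input mode in each corpus sentence with placeholders for their filling positions. It assumes whitespace-tokenized sentences and keeps only sentences that contain every filling position of the input mode. The name `extract_templates` and the `slot_vocab` layout are hypothetical.

```python
def extract_templates(corpus, slot_vocab):
    """Turn corpus sentences into templates by blanking out predefined words.

    corpus: iterable of whitespace-tokenized sentences
    slot_vocab: dict filling position -> set of predefined words (assumed layout)
    """
    word_to_slot = {w: s for s, words in slot_vocab.items() for w in words}
    templates = set()
    for sentence in corpus:
        out, seen = [], set()
        for token in sentence.split():
            slot = word_to_slot.get(token)
            if slot is None:
                out.append(token)
            else:
                out.append("{" + slot + "}")
                seen.add(slot)
        if seen == set(slot_vocab):  # sentence covers the whole input mode
            templates.add(" ".join(out))
    return sorted(templates)


templates = extract_templates(
    ["book a flight to Tokyo on Monday", "hello there", "fly to Osaka on Friday"],
    {"city": {"Tokyo", "Osaka"}, "day": {"Monday", "Friday"}},
)
```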

A similarity calculation module that calculates the similarity between every two sentence pattern templates after the template acquisition module has generated the at least one sentence pattern template that matches the predefined input mode;
The natural language generation device according to claim 6, wherein, when calculating the matching degree between the input meaning and the sentence pattern templates, the template selection module determines the next sentence pattern template for which the matching degree is to be calculated based on the similarity between the sentence pattern template currently being evaluated and the other sentence pattern templates.
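
The claim does not state how the similarity steers the choice of the next template. The Python sketch below assumes one simple policy for illustration only: if the current template matched well, move to the most similar remaining template, otherwise move to the least similar one. The name `next_template`, the `good_enough` cutoff, and the policy itself are hypothetical.

```python
def next_template(current, remaining, sim, current_score, good_enough=0.5):
    """Pick the next template to score (assumed policy, not from the source).

    sim: callable returning the similarity between two templates
    current_score: matching degree already computed for `current`
    """
    if not remaining:
        return None
    if current_score >= good_enough:
        # A good match suggests that similar templates may match as well.
        return max(remaining, key=lambda p: sim(current, p))
    # A poor match suggests skipping templates that resemble the current one.
    return min(remaining, key=lambda p: sim(current, p))


sims = {("a", "b"): 0.9, ("a", "c"): 0.1}
pick_after_good = next_template("a", ["b", "c"], lambda p, q: sims[(p, q)], 0.8)
pick_after_poor = next_template("a", ["b", "c"], lambda p, q: sims[(p, q)], 0.2)
```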

The similarity calculation module includes:
Calculates the similarity Sim(p₁, p₂) between every two sentence pattern templates based on the following formula,

where

The natural language generation device according to claim 7, wherein Σ_s indicates that the Y values corresponding to all filling positions s in the sentence pattern template are summed.

The template selection module
Determines, for each sub-meaning in the input meaning, a first set of words in the corpus that can fill the filling position of that sub-meaning in the sentence pattern template,
Calculates a matching factor between the sub-meaning and the corresponding filling position in the sentence pattern template based on the cosine similarity between the sub-meaning and each word in the first set, the matching factor being positively correlated with the cosine similarity, and
Calculates the matching degree between the input meaning and the sentence pattern template based on the matching factor between each sub-meaning and the corresponding filling position in the sentence pattern template. The natural language generation device.

The phrase generation module
Fills words of the input meaning and/or a converted meaning into the corresponding positions of the candidate sentence pattern template to obtain candidate natural phrases, the semantic similarity between the converted meaning and the input meaning being higher than a preset threshold, and
Calculates the matching degree between the filling meaning, defined by the sub-meanings at the filling positions of each candidate natural phrase, and the corresponding candidate sentence pattern template, and selects the natural phrases whose matching degree reaches a predetermined threshold. The natural language generation device according to claim 6.

A processor; and
A memory in which computer program instructions are stored,
Wherein, when the computer program instructions are executed by the processor, the processor is caused to execute the steps of:
Generating at least one sentence pattern template matching a predefined input mode based on words in the corpus;
Obtaining an input meaning based on the input mode, calculating a matching degree between the input meaning and each sentence pattern template, and selecting at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition; and
Generating a natural phrase based on the input meaning and the candidate sentence pattern template. An electronic apparatus comprising the above.

Generating at least one sentence pattern template matching a predefined input mode based on words in the corpus;
Obtaining an input meaning based on the input mode, calculating a matching degree between the input meaning and each sentence pattern template, and selecting at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition; and
A natural language generation program for causing a computer to execute the above steps and a step of generating a natural phrase based on the input meaning and the candidate sentence pattern template.

Generating at least one sentence pattern template matching a predefined input mode based on words in the corpus;
Obtaining an input meaning based on the input mode, calculating a matching degree between the input meaning and each sentence pattern template, and selecting at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition; and
A recording medium recording a natural language generation program for causing a computer to execute the above steps and a step of generating a natural phrase based on the input meaning and the candidate sentence pattern template.