JP6849723B2

JP6849723B2 - Methods and devices for generating information

Info

Publication number: JP6849723B2
Application number: JP2019052668A
Authority: JP
Inventors: ユグァン・チェン; ルゥ・パン; ウェンハオ・チェン; ホイ・シュウ; ウェイナ・チェン; ユホン・チェン
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2018-06-05
Filing date: 2019-03-20
Publication date: 2021-03-24
Anticipated expiration: 2039-03-20
Also published as: US20190370272A1; JP2019212289A; KR102290767B1; EP3579119A1; KR20190138562A; US11494420B2; CN110569494A; CN110569494B

Description

Embodiments of the present invention relate to the field of computer technology, and specifically to a method and an apparatus for generating information.

At present, entities in text are usually mined using Named Entity Recognition (NER) and Entity Linking (EL) techniques. NER recognizes proper nouns such as the names of persons and companies. EL links words in the text to entities in a knowledge graph, which resolves entity coreference. However, events currently cannot be recognized and linked.

Embodiments of the present invention provide a method and an apparatus for generating information.

In a first aspect, an embodiment of the present invention provides a method for generating information. The method includes: receiving a target text that includes an object and descriptive information about the object; performing dependency parsing on the target text to generate a dependency tree of the target text; matching at least one preset syntactic structure tree against the dependency tree to obtain at least one triplet consisting of a subject, a predicate, and a grammatical object; and determining one target triplet from the at least one triplet based on the words contained in each triplet and the preset weight of the syntactic structure tree that was matched to obtain that triplet.

In some embodiments, determining one target triplet from the at least one triplet, based on the words contained in each triplet and the preset weight of the syntactic structure tree matched to obtain that triplet, includes: determining the quantifiers and attributive modifiers in the target text based on the dependency tree; determining the object modified by each quantifier and the object modified by each attributive modifier; updating the at least one triplet based on the determined quantifiers, attributive modifiers, and objects; and determining one target triplet from the updated at least one triplet.

In some embodiments, updating the at least one triplet based on the determined quantifiers, attributive modifiers, and objects includes, for each triplet of the at least one triplet: determining whether a determined object matches the subject or the grammatical object of the triplet; in response to determining that the determined object matches the subject of the triplet, combining the quantifier and attributive modifier that modify the determined object with the subject of the triplet, and taking the combined text as the subject of the triplet; and in response to determining that the determined object matches the grammatical object of the triplet, combining the quantifier and attributive modifier that modify the determined object with the grammatical object of the triplet, and taking the combined text as the grammatical object of the triplet.
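The update step can be illustrated with a minimal Python sketch. The `Triplet` type, the `modifiers` mapping, and joining with spaces (suited to English rather than Chinese or Japanese text) are assumptions made for illustration; the patent text does not prescribe a data layout.

```python
from typing import Dict, List, NamedTuple


class Triplet(NamedTuple):
    subject: str
    predicate: str
    obj: str  # the grammatical object of the triplet


def update_triplet(triplet: Triplet, modifiers: Dict[str, List[str]]) -> Triplet:
    """Prepend quantifiers and attributive modifiers to a matching subject/object.

    `modifiers` (hypothetical) maps a modified word to the quantifiers and
    attributive modifiers that modify it, listed in text order.
    """
    def expand(word: str) -> str:
        # A word with no recorded modifiers is returned unchanged.
        return " ".join(modifiers.get(word, []) + [word])

    return Triplet(expand(triplet.subject), triplet.predicate, expand(triplet.obj))


original = Triplet("company", "acquired", "startups")
updated = update_triplet(original, {"startups": ["three", "small"]})
print(updated)
```

Here only the grammatical object has modifiers, so the subject and predicate are left as they were.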

In some embodiments, determining one target triplet from the at least one triplet, based on the words contained in each triplet and the preset weight of the syntactic structure tree matched to obtain that triplet, includes, for each triplet of the at least one triplet: determining the preset weight of the syntactic structure tree matched to obtain the triplet; determining the number of characters in the words contained in the triplet; determining the degree of co-occurrence of the words contained in the triplet; and determining a score for the triplet based on the determined weight, character count, and degree of co-occurrence. The triplet with the highest score among the at least one triplet is then determined as the target triplet.
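As a rough illustration of the scoring step, the Python sketch below combines the three quantities linearly and picks the highest-scoring candidate. The linear form, the coefficients `alpha`, `beta`, and `gamma`, and the `Candidate` class are assumptions; this passage only states that the score is based on the weight, the character count, and the degree of co-occurrence. The co-occurrence value is taken as a precomputed input.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    subject: str
    predicate: str
    obj: str
    tree_weight: float    # preset weight of the matched syntactic structure tree
    cooccurrence: float   # precomputed co-occurrence degree of the triplet's words


def char_count(c: Candidate) -> int:
    """Total number of characters in the triplet's words."""
    return len(c.subject) + len(c.predicate) + len(c.obj)


def score(c: Candidate, alpha: float = 1.0, beta: float = 0.1, gamma: float = 1.0) -> float:
    # Assumed combination: a weighted sum of the three signals.
    return alpha * c.tree_weight + beta * char_count(c) + gamma * c.cooccurrence


def pick_target(candidates: Iterable[Candidate]) -> Candidate:
    """Return the candidate triplet with the highest score."""
    return max(candidates, key=score)


a = Candidate("Acme", "acquired", "Beta", tree_weight=0.6, cooccurrence=0.5)
b = Candidate("Acme", "said", "deal", tree_weight=0.3, cooccurrence=0.2)
print(pick_target([a, b]).predicate)
```

Any monotone combination of the three signals would fit the description equally well; the weighted sum is simply the easiest to tune.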

In some embodiments, the method further includes: obtaining at least one historical target triplet; counting, among the at least one historical target triplet, the number of historical target triplets that were obtained by matching each given syntactic structure tree; and determining the weight of each of the at least one syntactic structure tree based on the counts.
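One simple way to turn these counts into weights is relative frequency, sketched below in Python. Normalizing by the total count is an assumption (the text only says the weights are based on the statistics), and the tree identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def tree_weights(history_tree_ids: Iterable[str], all_tree_ids: List[str]) -> Dict[str, float]:
    """Weight each syntactic structure tree by its share of historical target triplets.

    `history_tree_ids` holds, for every historical target triplet, the id of
    the syntactic structure tree whose match produced it.
    """
    counts = Counter(history_tree_ids)
    total = sum(counts[t] for t in all_tree_ids)
    if total == 0:
        # No history yet: fall back to zero weights for every tree.
        return {t: 0.0 for t in all_tree_ids}
    return {t: counts[t] / total for t in all_tree_ids}


weights = tree_weights(["svo", "svo", "passive", "svo"], ["svo", "passive", "copula"])
print(weights)
```

A tree that never produced a historical target triplet receives weight zero under this scheme; smoothing could be added if that is too harsh.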

In some embodiments, the method further includes: determining, based on the target triplet, at least one piece of historical event information related to the target text in a preset historical event information set; determining the similarity between the target text and each of the at least one piece of historical event information; and outputting the historical event information with the highest similarity to the target text.

In some embodiments, the historical event information includes participant information and trigger word information. Determining, based on the target triplet, at least one piece of historical event information related to the target text in the preset historical event information set includes: determining whether a piece of historical event information in the set satisfies a condition that the subject or grammatical object of the target triplet matches its participant information, and a condition that the predicate of the target triplet matches its trigger word information; and determining that historical event information satisfying at least one of these conditions is related to the target text.
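The two conditions can be expressed as a small filter, shown here as a Python sketch. The `HistoricalEvent` class and exact string membership as the notion of "matching" are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class HistoricalEvent:
    name: str
    participants: FrozenSet[str]   # participant information
    trigger_words: FrozenSet[str]  # trigger word information


def related_events(triplet: Tuple[str, str, str],
                   events: Iterable[HistoricalEvent]) -> List[HistoricalEvent]:
    """Keep events that satisfy at least one of the two matching conditions."""
    subject, predicate, obj = triplet
    related = []
    for event in events:
        participant_match = subject in event.participants or obj in event.participants
        trigger_match = predicate in event.trigger_words
        if participant_match or trigger_match:
            related.append(event)
    return related


events = [
    HistoricalEvent("merger", frozenset({"Acme", "Beta"}), frozenset({"acquired"})),
    HistoricalEvent("lawsuit", frozenset({"Gamma"}), frozenset({"sued"})),
]
print([e.name for e in related_events(("Acme", "acquired", "Delta"), events)])
```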

In some embodiments, the historical event information includes keywords. Determining the similarity between the target text and the at least one piece of historical event information includes: segmenting the target text into words to obtain a first word set; for each piece of historical event information, concatenating the keywords it contains and segmenting the concatenated text into words to obtain a second word set; and determining the similarity between the target text and that historical event information based on the first word set and the second word set.
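A minimal Python sketch of this comparison follows. Two stand-ins are assumed here because the passage does not fix them: lowercased whitespace splitting replaces real word segmentation, and the Jaccard index is used as the set-based similarity measure.

```python
from typing import Dict, List, Set


def word_set(text: str) -> Set[str]:
    # Stand-in for word segmentation: lowercase and split on whitespace.
    return set(text.lower().split())


def similarity(target_text: str, keywords: List[str]) -> float:
    """Jaccard similarity between the target text and an event's keywords."""
    first = word_set(target_text)            # first word set
    second = word_set(" ".join(keywords))    # second word set, from joined keywords
    union = first | second
    return len(first & second) / len(union) if union else 0.0


def most_similar(target_text: str, events: Dict[str, List[str]]) -> str:
    """Return the name of the event whose keywords best match the target text."""
    return max(events, key=lambda name: similarity(target_text, events[name]))


events = {"merger": ["acme", "beta", "merger"], "lawsuit": ["acme", "court"]}
print(most_similar("Acme acquired Beta", events))
```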

In a second aspect, an embodiment of the present invention provides an apparatus for generating information. The apparatus includes: a target text receiving unit configured to receive a target text that includes an object and descriptive information about the object; a dependency tree generation unit configured to perform dependency parsing on the target text to generate a dependency tree of the target text; a triplet determination unit configured to match at least one preset syntactic structure tree against the dependency tree to obtain at least one triplet consisting of a subject, a predicate, and a grammatical object; and a target triplet determination unit configured to determine one target triplet from the at least one triplet based on the words contained in each triplet and the preset weight of the syntactic structure tree that was matched to obtain that triplet.

In some embodiments, the target triplet determination unit includes: an attributive modifier determination module configured to determine the quantifiers and attributive modifiers in the target text based on the dependency tree; an object determination module configured to determine the object modified by each quantifier and the object modified by each attributive modifier; a triplet update module configured to update the at least one triplet based on the determined quantifiers, attributive modifiers, and objects; and a target triplet determination module configured to determine one target triplet from the updated at least one triplet.

In some embodiments, the triplet update module is further configured to, for each triplet of the at least one triplet: determine whether a determined object matches the subject or the grammatical object of the triplet; in response to determining that the determined object matches the subject of the triplet, combine the quantifier and attributive modifier that modify the determined object with the subject of the triplet, and take the combined text as the subject of the triplet; and in response to determining that the determined object matches the grammatical object of the triplet, combine the quantifier and attributive modifier that modify the determined object with the grammatical object of the triplet, and take the combined text as the grammatical object of the triplet.

In some embodiments, the target triplet determination unit is further configured to, for each triplet of the at least one triplet: determine the preset weight of the syntactic structure tree matched to obtain the triplet; determine the number of characters in the words contained in the triplet; determine the degree of co-occurrence of the words contained in the triplet; and determine a score for the triplet based on the determined weight, character count, and degree of co-occurrence. The unit then determines the triplet with the highest score among the at least one triplet as the target triplet.

In some embodiments, the apparatus further includes a weight setting unit made up of: a historical target triplet module configured to obtain at least one historical target triplet; a triplet count module configured to count, among the at least one historical target triplet, the number of historical target triplets that were obtained by matching each given syntactic structure tree; and a weight determination module configured to determine the weight of each of the at least one syntactic structure tree based on the counts.

In some embodiments, the apparatus further includes: a historical event information determination unit configured to determine, based on the target triplet, at least one piece of historical event information related to the target text in a preset historical event information set; a similarity determination unit configured to determine the similarity between the target text and each of the at least one piece of historical event information; and a historical event information output unit configured to output the historical event information with the highest similarity to the target text.

In some embodiments, the historical event information includes participant information and trigger word information. The historical event information determination unit is further configured to: determine whether a piece of historical event information in the set satisfies a condition that the subject or grammatical object of the target triplet matches its participant information, and a condition that the predicate of the target triplet matches its trigger word information; and determine that historical event information satisfying at least one of these conditions is related to the target text.

In some embodiments, the historical event information includes keywords. The similarity determination unit is further configured to: segment the target text into words to obtain a first word set; for each piece of historical event information, concatenate the keywords it contains and segment the concatenated text into words to obtain a second word set; and determine the similarity between the target text and that historical event information based on the first word set and the second word set.

In a third aspect, an embodiment of the present invention provides a device including one or more processors and a storage apparatus storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any embodiment of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, where the program, when executed by a processor, implements the method described in any embodiment of the first aspect.

With the method and apparatus for generating information according to the above embodiments of the present invention, after a target text is received, dependency parsing is performed on the target text to generate its dependency tree. At least one preset syntactic structure tree is then matched against the dependency tree to obtain at least one triplet. Finally, one target triplet is determined from the at least one triplet based on the words contained in each triplet and the preset weight of the syntactic structure tree matched to obtain that triplet. Because the method and apparatus of these embodiments can select the triplet most relevant to the event contained in the target text, the accuracy of target triplet extraction is improved.

Other features, objectives, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, given with reference to the drawings below.
A diagram showing an exemplary system architecture to which an embodiment of the present invention is applicable.
A flowchart showing an embodiment of the method for generating information according to the present invention.
A schematic structural diagram showing the dependency tree in an embodiment of the method for generating information according to the present invention.
A schematic structural diagram showing the syntactic structure tree in an embodiment of the method for generating information according to the present invention.
A schematic structural diagram showing one candidate triplet obtained by matching the dependency tree shown in FIG. 2a against the syntactic structure tree shown in FIG. 2b.
A schematic structural diagram showing another candidate triplet obtained by matching the dependency tree shown in FIG. 2a against the syntactic structure tree shown in FIG. 2b.
A schematic structural diagram showing a further candidate triplet obtained by matching the dependency tree shown in FIG. 2a against the syntactic structure tree shown in FIG. 2b.
A schematic diagram showing one application scenario of the method for generating information according to the present invention.
A flowchart for determining the target triplet in the method for generating information according to the present invention.
A flowchart showing another embodiment of the method for generating information according to the present invention.
A schematic structural diagram of an embodiment of the apparatus for generating information according to the present invention.
A schematic structural diagram of a computer system suitable for implementing the device of an embodiment of the present invention.

The present invention is described in more detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely explain the relevant invention and do not limit it. For convenience of description, the drawings show only the parts related to the invention.

The embodiments of the present invention and the features of those embodiments may be combined with one another as long as no contradiction arises. The present invention is described in detail below with reference to the drawings and embodiments.

FIG. 1 shows an exemplary system architecture 100 to which embodiments of the method for generating information or the apparatus for generating information according to the present invention can be applied.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as the medium providing communication links between the terminal devices 101, 102, and 103 and the server 105. The network 104 may include various types of connections, such as wired links, wireless communication links, or fiber optic cables.

A user can use the terminal devices 101, 102, and 103 to interact with the server 105 over the network 104 in order to send and receive messages. Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as text input applications, web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.

The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be any of various electronic devices that have a display screen and support text input, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers. When they are software, they may be installed on the electronic devices listed above, and may be implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

The server 105 may be a server that provides various services, for example a back-end server that supports the text entered on the terminal devices 101, 102, and 103. The back-end server can analyze and otherwise process received data such as the target text, and feed the processing result (for example, the target triplet) back to the terminal devices 101, 102, and 103.

The server 105 may be hardware or software. When it is hardware, it may be implemented as a distributed server cluster made up of multiple servers or as a single server. When it is software, it may be implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

The method for generating information according to the embodiments of the present invention may be executed by the terminal devices 101, 102, and 103 or by the server 105. Correspondingly, the apparatus for generating information may be provided in the terminal devices 101, 102, and 103 or in the server 105.

It should be understood that when the method for generating information according to the embodiments of the present invention is executed by the terminal devices 101, 102, and 103, the system architecture 100 need not include the network 104 and the server 105.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be used as needed.

Next, refer to FIG. 2, which shows a flow 200 of an embodiment of the method for generating information according to the present invention. The method of this embodiment includes step 201, step 202, step 203, and step 204.

In step 201, a target text is received.

In the embodiments of the present invention, the executing entity of the method for generating information (for example, the terminal devices 101, 102, and 103 or the server 105 shown in FIG. 1) can receive the target text. When the executing entity is a terminal device, it can directly receive the target text that the user enters through that terminal device. When the executing entity is a server, it can receive the target text from the terminal device the user is using, over a wired or wireless connection. The target text may include an object and descriptive information about the object. The object may be any entity recognized by NER or EL techniques, such as a person or a company. The descriptive information is information that describes the object, including but not limited to information describing the state of the object and information describing the actions of the object.

The wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, and UWB (ultra-wideband) connections, as well as other wireless connection methods that are currently known or developed in the future.

In step 202, dependency parsing is performed on the target text to generate a dependency tree of the target text.

After receiving the target text, the executing entity can perform dependency parsing on it. Dependency grammar was first proposed by the French linguist L. Tesnière in the 1950s. It is a structural grammar that describes the linguistic structure of a sentence through the dependency relations formed between its words. The structural features of a dependency analysis can be presented clearly as a dependency tree, in which each node corresponds to a word in the sentence. A dependency tree can represent not only the dependency relations between words but also each word's part of speech (for example, quantifier or particle) and each word's function in the text (for example, attributive modifier or adverbial modifier). In practice, the executing entity can perform dependency parsing on the target text using various open source toolkits. These may include the Stanford Parser, an open source toolkit provided by the Stanford NLP Group at Stanford University in the United States, and FudanNLP, an open source toolkit developed by the School of Computer Science at Fudan University in China.
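To make the dependency tree concrete, here is a minimal Python sketch of one possible in-memory representation. The `DepNode` class, its field names, the relation labels (`subj`, `obj`, `quant`), the part-of-speech tags, and the hand-built example sentence are all illustrative assumptions; in practice a parser would produce the tree, and its label inventory would differ.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class DepNode:
    word: str       # the word at this node
    pos: str        # part of speech, e.g. "n.", "v.", "q."
    relation: str   # relation of this word to its head
    children: List["DepNode"] = field(default_factory=list)


def edges(node: DepNode) -> Iterator[Tuple[str, str, str]]:
    """Yield (head word, relation, dependent word) for every edge, depth first."""
    for child in node.children:
        yield (node.word, child.relation, child.word)
        yield from edges(child)


# Hand-built tree for the sentence "Acme acquired three startups".
tree = DepNode("acquired", "v.", "root", [
    DepNode("Acme", "n.", "subj"),
    DepNode("startups", "n.", "obj", [DepNode("three", "q.", "quant")]),
])

for head, relation, dependent in edges(tree):
    print(f"{head} --{relation}--> {dependent}")
```

Storing the part of speech and the relation on each node keeps both kinds of information the text mentions available to later steps.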

In step 203, at least one preset syntactic structure tree is matched against the dependency tree to obtain at least one triplet.

Once the dependency tree of the target text has been generated, the executing entity can match the preset syntactic structure trees against it. The tree structure of a syntactic structure tree contains a plurality of nodes, and the syntactic structure tree can specify the part of speech of the word at each node. Matching a syntactic structure tree against the dependency tree yields the words in the dependency tree that have the same dependency relations as the syntactic structure tree. In addition, the part of speech of each word obtained is the same as the part of speech specified at the corresponding node of the syntactic structure tree.

As an example, FIG. 2a shows the structure of the dependency tree of a target text, and FIG. 2b shows the structure of a syntactic structure tree. The syntactic structure tree in FIG. 2b indicates the part of speech of the word at each node, where "v." denotes a verb and "n." denotes a noun. In some optional implementations, the syntactic structure tree and the dependency tree can be matched as follows. First, considering only the structures of the two trees and ignoring parts of speech, the candidate triplets formed by the words at the dashed nodes in FIGS. 2c, 2d and 2e are determined. Next, the part of speech of the word at each node of the candidate triplets in FIGS. 2c, 2d and 2e is compared with the part of speech at the corresponding node of the syntactic structure tree, which shows that the parts of speech of the words of the triplet in FIG. 2c are the same as those at the nodes of the syntactic structure tree. The triplet shown in FIG. 2c is therefore the result of matching the dependency tree against the syntactic structure tree.
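The two-phase matching described above (structure first, then part of speech) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Node` class, the relation labels "subj" and "obj", and the flat `PATTERN` dictionary are all assumptions introduced here, and a real syntactic structure tree may be deeper than one level.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    pos: str  # part-of-speech tag, e.g. "v." or "n."
    # Hypothetical layout: relation label -> child node (one child per relation).
    children: dict = field(default_factory=dict)


def walk(node):
    """Yield every node of the dependency tree, root first."""
    yield node
    for child in node.children.values():
        yield from walk(child)


# Hypothetical pattern: a verb head with a noun subject and a noun object.
PATTERN = {"head": "v.", "subj": "n.", "obj": "n."}


def match(root, pattern=PATTERN):
    relations = [r for r in pattern if r != "head"]
    # Phase 1: keep nodes whose structure fits, ignoring parts of speech.
    candidates = [n for n in walk(root) if all(r in n.children for r in relations)]
    # Phase 2: keep only candidates whose parts of speech agree with the pattern.
    triplets = []
    for head in candidates:
        pos_ok = head.pos == pattern["head"] and all(
            head.children[r].pos == pattern[r] for r in relations
        )
        if pos_ok:
            # Tuple order here is (subject, predicate, object).
            triplets.append(
                (head.children["subj"].word, head.word, head.children["obj"].word)
            )
    return triplets
```

A candidate that passes phase 1 but fails phase 2 corresponds to the rejected candidates of FIGS. 2d and 2e.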

A triplet may include a subject, a predicate and an object, and the term is used here in a broad sense. For example, if a sentence has no object, the object of the resulting triplet is "null". Likewise, if a sentence contains coordinated predicates, the predicate of the resulting triplet may contain two words. The subject, predicate and object of a triplet may be the same as or different from those of the target text. As an example, if the target text is "The bike-sharing industry expanded rapidly in 2016 and the first half of 2017 and then showed a gradual downward trend in the second half of 2017", the triplets obtained may include "bike-sharing industry / expand / null" and "bike-sharing industry / downward trend / show". The target text has "bike-sharing industry" as its subject, "show" as its predicate and "downward trend" as its object. The predicate "expand" of the first triplet differs from the predicate "show" of the target text, whereas the subject, predicate and object of the second triplet are the same as those of the target text.

In step 204, one target triplet is determined from the at least one triplet based on the words contained in each triplet and the preset weight of the syntactic structure tree whose matching yielded that triplet.

Once the at least one triplet has been obtained, the executing entity can determine one target triplet from among them based, for each triplet, on the words it contains and on the preset weight of the syntactic structure tree whose matching yielded it. The weight of a syntactic structure tree can be set by a technician according to the specific application scenario. For example, the technician may select one syntactic structure tree at a time from the at least one syntactic structure tree to perform triplet matching, and set the weights according to the number of times each syntactic structure tree was selected for triplet matching within a past time range. Alternatively, the technician may set the weights according to the number of nodes each syntactic structure tree contains.

Next, refer to FIG. 3, a schematic diagram of an application scenario of the method for generating information according to this embodiment. In the application scenario of FIG. 3, the target text that the user enters at the terminal is a video title, which the terminal sends to the server. On receiving the video title, the server first generates a dependency tree of the video title. It then matches the dependency tree against the syntactic structure trees to obtain at least one triplet, and determines one target triplet from the at least one triplet. Finally, the target triplet is output to the terminal so that the user can view it.

In the method for generating information according to the above embodiment of the present invention, once the target text has been received, dependency parsing is performed on it to generate a dependency tree of the target text. At least one preset syntactic structure tree is then matched against the dependency tree to obtain at least one triplet. Finally, one target triplet is determined from the at least one triplet based on the words contained in each triplet and the preset weight of the syntactic structure tree whose matching yielded that triplet. Because the method of this embodiment can select the triplet most relevant to the event contained in the target text, the accuracy of target triplet extraction is improved.

In some optional implementations of this embodiment, the executing entity can determine the weights of the syntactic structure trees through the following steps, which are not shown in FIG. 2. First, at least one historical target triplet is acquired. Next, among the at least one historical target triplet, the number of historical target triplets obtained by matching each given syntactic structure tree is counted. Finally, the weight of the at least one syntactic structure tree is determined from the counts.

In this implementation, the executing entity can first acquire at least one historical target triplet. Here, a historical target triplet is a target triplet that the executing entity obtained by processing a target text received within a past time range. The executing entity can then count, among the at least one historical target triplet, the number of historical target triplets obtained by matching each given syntactic structure tree. It should be understood that the more historical target triplets were obtained by matching a given syntactic structure tree, the higher the accuracy of that syntactic structure tree, and hence the larger its weight. Finally, the executing entity can determine the weight of each syntactic structure tree from the counts. As an example, the executing entity acquires 100 historical target triplets, and the counts show that 50 were obtained from syntactic structure tree a, 30 from syntactic structure tree b and the remaining 20 from syntactic structure tree c. From these counts the executing entity can determine that the weight of syntactic structure tree a is 50/100 = 0.5, the weight of syntactic structure tree b is 30/100 = 0.3, and the weight of syntactic structure tree c is 20/100 = 0.2.
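The weight computation in this example amounts to normalizing per-tree counts. A minimal sketch, assuming each historical target triplet is recorded simply by the identifier of the syntactic structure tree that produced it (the `history` list and the function name are illustrative, not from the source):

```python
from collections import Counter


def tree_weights(history):
    """Map each tree id to its share of the historical target triplets.

    `history` holds one tree id per historical target triplet.
    """
    counts = Counter(history)
    total = sum(counts.values())
    return {tree: n / total for tree, n in counts.items()}


history = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
print(tree_weights(history))  # {'a': 0.5, 'b': 0.3, 'c': 0.2}
```

Recomputing the weights whenever new historical target triplets arrive is one way to realize the timely adjustment described in the text.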

Because the method for generating information of this implementation can adjust the weights of the syntactic structure trees in a timely manner using historical target triplets, the accuracy of target triplet determination is improved.

Next, refer to FIG. 4, which shows a flow 400 for determining the target triplet in the method for generating information according to the present invention. As shown in FIG. 4, the target triplet can be determined through steps 401, 402, 403 and 404.

In step 401, the quantifiers and adnominal modifiers in the target text are determined based on the dependency tree.

In this embodiment, because the dependency tree represents the part of speech and the function of each word, the executing entity can determine the quantifiers and adnominal modifiers in the target text from the generated dependency tree. An adnominal modifier is used to modify a subject or an object and may be a noun, a pronoun or an adjective.

In step 402, the object modified by each quantifier and the object modified by each adnominal modifier are determined.

Once the quantifiers and adnominal modifiers have been determined, the executing entity can determine the object that each quantifier modifies and the object that each adnominal modifier modifies. Such a modified object may be the subject of a triplet or the object of a triplet. As an example, in the text "one apple", "one" is a quantifier and "apple" is the object it modifies. In the text "red apple", "red" is an adnominal modifier and "apple" is the object it modifies.

In step 403, the at least one triplet is updated based on the determined quantifiers, adnominal modifiers and modified objects.

After the quantifiers, the adnominal modifiers and the objects they modify have been determined, the executing entity can update the at least one triplet. For example, if a determined modified object is the object of a triplet, the executing entity combines the quantifier and/or adnominal modifier that modifies it with the modified object and takes the combined text as the new object of the triplet, thereby updating the triplet. This update can increase the number of words in each triplet, and because the target triplet can be determined using the number of characters contained in the updated triplets, the accuracy of target triplet determination can be improved. As an example, the target text is "Zhang San attends the Shenzhen birthday party". After generating the dependency tree, the executing entity can obtain the triplets "Zhang San / Shenzhen / attend" and "Zhang San / birthday party / attend" by matching against the syntactic structure trees. Based on the dependency tree, it can determine that "Shenzhen" is an adnominal modifier of "birthday party", and by updating it can obtain the triplet "Zhang San / Shenzhen birthday party / attend".

In some optional implementations of this embodiment, step 403 may further include the following, which is not shown in FIG. 4: deleting, from the at least one triplet, any triplet whose object matches an adnominal modifier of the target text.

In this implementation, the executing entity can determine whether the at least one triplet obtained includes a triplet whose object is an adnominal modifier of the target text. If so, the executing entity can delete that triplet. For example, for the triplet "Zhang San / Shenzhen / attend", the executing entity can determine that the adnominal modifier "Shenzhen" must not serve as the object of a triplet. It therefore judges the triplet "Zhang San / Shenzhen / attend" to be erroneous and deletes it. This effectively reduces the amount of computation and thus improves computational efficiency.
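The pruning rule can be illustrated with a short sketch. It assumes triplets are stored as (subject, predicate, object) tuples and that the adnominal modifiers found in step 401 are available as a set of strings; both representations are assumptions made for this example.

```python
def drop_modifier_objects(triplets, adnominal_modifiers):
    """Remove triplets whose object is an adnominal modifier of the target text.

    Each triplet is a (subject, predicate, object) tuple.
    """
    return [t for t in triplets if t[2] not in adnominal_modifiers]


triplets = [
    ("Zhang San", "attend", "Shenzhen"),
    ("Zhang San", "attend", "birthday party"),
]
print(drop_modifier_objects(triplets, {"Shenzhen"}))
# [('Zhang San', 'attend', 'birthday party')]
```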

In some optional implementations of this embodiment, the executing entity can update the triplets through the following steps, which are not shown in FIG. 4. First, for each triplet of the at least one triplet, it is determined whether a determined modified object matches the subject or the object of the triplet. If the modified object is determined to match the subject of the triplet, the quantifier and adnominal modifier that modify it are combined with the subject of the triplet, and the combined text is taken as the subject of the triplet. If the modified object is determined to match the object of the triplet, the quantifier and adnominal modifier that modify it are combined with the object of the triplet, and the combined text is taken as the object of the triplet.

For each triplet of the at least one triplet, the executing entity can first determine whether a determined modified object matches the subject or the object of the triplet. It should be understood that a match here may mean that at least one character of the modified object is the same as at least one character of the subject or object of the triplet. For example, if the modified object is "Zhang" and the subject of the triplet is "Zhang San", the modified object can be determined to match the subject of the triplet.

After determining that the modified object matches the subject of the triplet, the executing entity can combine the quantifier and adnominal modifier that modify it with the subject of the triplet and take the combined text as the subject of the triplet. For example, if the modified object is "Zhang", the adnominal modifier that modifies it is "refreshing", and the subject of the triplet is "Zhang San", the combined text may be "refreshing Zhang San", which then becomes the subject of the triplet. This completes the update of the subject of the triplet.

After determining that the modified object matches the object of the triplet, the executing entity can combine the quantifier and adnominal modifier that modify it with the object of the triplet and take the combined text as the object of the triplet. This completes the update of the object of the triplet.

It should be understood that when a triplet is updated, only its subject may be updated, only its object may be updated, or its subject and object may both be updated. In addition, when performing the combination, either the quantifier or the adnominal modifier alone may be combined with the subject or the object of the triplet.
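The update of subject and object can be sketched as below. Several choices here are assumptions made for illustration only: triplets are (subject, predicate, object) tuples, `modifiers` maps each modified word to the list of its quantifiers and adnominal modifiers, the separator is a space (a language written without spaces would use an empty separator), and "match" is implemented as substring containment, which is stricter than the at-least-one-shared-character rule the text permits.

```python
def update_triplet(triplet, modifiers, sep=" "):
    """Prepend modifiers to the subject and/or object they modify.

    `modifiers` maps a modified word to its quantifiers and adnominal modifiers.
    """
    subj, pred, obj = triplet
    for modified, mods in modifiers.items():
        prefix = sep.join(mods)
        # Assumed matching rule: the modified word occurs inside the phrase.
        if modified in subj:
            subj = prefix + sep + subj
        if obj is not None and modified in obj:
            obj = prefix + sep + obj
    return (subj, pred, obj)


print(update_triplet(("Zhang San", "attend", "birthday party"),
                     {"birthday party": ["Shenzhen"]}))
# ('Zhang San', 'attend', 'Shenzhen birthday party')
```

A triplet whose object is "null" can be represented with `None`, in which case only the subject is considered for updating.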

In step 404, one target triplet is determined from the updated at least one triplet.

After the triplets have been updated, one target triplet can be determined from the updated at least one triplet. Specifically, the executing entity can determine the target triplet through sub-steps 4041 and 4042.

In sub-step 4041, for each triplet of the at least one triplet, the preset weight of the syntactic structure tree whose matching yielded the triplet is determined, the number of characters of the words contained in the triplet is determined, the co-occurrence degree of the words in the triplet is determined, and a score of the triplet is determined from the weight, the character count and the co-occurrence degree.

For each triplet of the at least one triplet, the executing entity can first determine the weight of the syntactic structure tree whose matching yielded the triplet. Next, it determines the number of characters of the words contained in the triplet. It then determines the co-occurrence degree of the words in the triplet. Finally, it calculates the score of the triplet from the weight, the character count and the co-occurrence degree. Co-occurrence here may mean that the words of the triplet appear in the same sentence, the same paragraph or the same article. The co-occurrence degree may be the product of three probabilities: the probability that the first word of the triplet appears, the probability that the second word appears given that the first word has appeared, and the probability that the third word appears given that the first and second words have appeared.

For example, if the triplet is "Zhang San / newborn / visit", the executing entity can first determine the probability that "Zhang San" appears in a preset information set. The information set may be a set of web page titles, a set of a large number of articles, or the like. Assuming that the information set contains 10,000 messages, of which 100 contain "Zhang San", the probability that "Zhang San" appears is 1%. The executing entity can then determine, from the messages in the information set that contain "Zhang San", the probability that "visit" appears. Assuming that 20 of the 100 messages containing "Zhang San" also contain "visit", the probability that "visit" appears given that "Zhang San" has appeared is 20%. In the same way, the executing entity can determine that the probability that "newborn" appears as the object of "visit", given that "Zhang San" and "visit" have appeared, is 50%. The co-occurrence degree is therefore 1% × 20% × 50% = 0.1%.
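The chained probabilities in this example can be estimated from counts over the information set. The following is a rough sketch under simplifying assumptions: messages are plain strings, "appears" is tested by substring containment, and the grammatical condition that the third word appears as the object of the second is not checked. The function name is illustrative.

```python
def cooccurrence_degree(messages, first, second, third):
    """Estimate P(first) * P(second | first) * P(third | first, second)."""
    with_first = [m for m in messages if first in m]
    with_both = [m for m in with_first if second in m]
    with_all = [m for m in with_both if third in m]
    if not with_both:  # also covers an empty corpus or no message with `first`
        return 0.0
    p_first = len(with_first) / len(messages)
    p_second = len(with_both) / len(with_first)
    p_third = len(with_all) / len(with_both)
    return p_first * p_second * p_third
```

With 10,000 messages of which 100 contain the first word, 20 of those contain the second, and 10 of those contain the third, the three factors are 1%, 20% and 50%, as in the example above.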

After obtaining the weight, the character count and the co-occurrence degree, the executing entity can determine the score of the triplet according to the following formula: score = a × weight + b × character count + c × co-occurrence degree, where a, b and c are preset coefficients.

In sub-step 4042, the triplet with the highest score among the at least one triplet is determined as the target triplet.

After the score of each triplet has been obtained, the executing entity can take the triplet with the highest score among the at least one triplet as the target triplet. It should be understood that the higher the score of a triplet, the higher its accuracy and the better it expresses the object contained in the target text and the descriptive information about that object.
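Sub-steps 4041 and 4042 together amount to a weighted sum followed by an arg-max. The sketch below assumes a hypothetical `Candidate` record holding a triplet, the weight of the tree that produced it, and its co-occurrence degree. The default coefficient values are arbitrary placeholders, since the source only says that a, b and c are preset.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    triplet: tuple       # (subject, predicate, object); object may be None
    tree_weight: float   # preset weight of the matching syntactic structure tree
    cooccurrence: float  # co-occurrence degree of the triplet's words


def char_count(triplet):
    """Total number of characters over the non-null words of the triplet."""
    return sum(len(word) for word in triplet if word is not None)


def score(cand, a=1.0, b=0.05, c=100.0):
    # score = a * weight + b * character count + c * co-occurrence degree
    return a * cand.tree_weight + b * char_count(cand.triplet) + c * cand.cooccurrence


def pick_target(candidates, **coefficients):
    """Return the triplet of the highest-scoring candidate."""
    return max(candidates, key=lambda cand: score(cand, **coefficients)).triplet
```

Because the character count enters the score, a triplet enriched with modifiers in step 403 tends to score higher than its shorter variants, which is consistent with the rationale given for the update step.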

Because the method for generating information according to the above embodiment of the present invention can determine, from a plurality of triplets, the one triplet most relevant to the target text, the accuracy of triplet extraction is improved.

Next, refer to FIG. 5, which shows a flow 500 of another embodiment of the method for generating information of the present invention. As shown in FIG. 5, after the target triplet has been obtained, the method for generating information of this embodiment may further include steps 501, 502 and 503.

In step 501, at least one piece of historical event information related to the target text is determined from a preset historical event information set based on the target triplet.

After the target triplet has been determined, at least one piece of historical event information related to the target text can be determined from a preset historical event information set based on the target triplet. The historical event information may include an object and descriptive information about the object. In this embodiment, a piece of historical event information can be regarded as related to the target text if its object is the same as the subject of the target triplet, or if it contains the subject, predicate or object of the target triplet.

In some optional implementations of this embodiment, the historical event information may include participant information and trigger word information. The executing entity can determine whether a piece of historical event information is related to the target text through the following steps, which are not shown in FIG. 5. First, it is determined whether the following conditions are satisfied: the subject or object of the target triplet matches the participant information of a piece of historical event information in the historical event information set; and the predicate of the target triplet matches the trigger word information of that piece of historical event information. Next, historical event information that satisfies at least one of these conditions is determined to be related to the target text.

In this implementation, the participant information may be information on the persons involved in the historical event, and the trigger word information may be information on the action performed by the participants. For example, if the historical event information is "Xiao Ming and Xiao Hong go to the first canteen together to eat lunch", the participant information may include "Xiao Ming" and "Xiao Hong", and the trigger word information is "eat". The subject or object of the target triplet is compared with the participant information; if they match, the subject or object of the triplet is considered to be the same as a participant in the historical event. The predicate of the target triplet is compared with the trigger word information; if they match, the predicate of the triplet is considered to be the same as the trigger word of the historical event. When at least one of these two conditions is satisfied, the executing entity can regard the historical event as related to the target text.
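The two conditions can be expressed as a small predicate. In this sketch a piece of historical event information is a dictionary with hypothetical keys "participants" and "triggers", each holding a set of strings, and matching is exact string equality; the source does not prescribe either choice.

```python
def is_related(target_triplet, event):
    """True if the event satisfies at least one of the two matching conditions."""
    subj, pred, obj = target_triplet
    # Condition 1: subject or object of the triplet matches a participant.
    participant_match = subj in event["participants"] or obj in event["participants"]
    # Condition 2: predicate of the triplet matches a trigger word.
    trigger_match = pred in event["triggers"]
    return participant_match or trigger_match


def related_events(target_triplet, history):
    """Filter the historical event information set down to related events."""
    return [event for event in history if is_related(target_triplet, event)]


lunch = {"participants": {"Xiao Ming", "Xiao Hong"}, "triggers": {"eat"}}
print(is_related(("Xiao Ming", "eat", "lunch"), lunch))        # True
print(is_related(("Zhang San", "attend", "party"), lunch))     # False
```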

In step 502, the similarity between the target text and the at least one piece of historical event information is determined.

After at least one piece of historical event information has been determined based on the target triplet, the executing entity can determine the similarity between the target text and each piece of the at least one piece of historical event information, in order to further obtain the historical event information most relevant to the target text. The executing entity can determine the similarity from the number of identical characters or words in the target text and the historical event information. Alternatively, the executing entity can determine the similarity from the number of the above conditions that the historical event information satisfies.

In some optional implementations of this embodiment, the historical event information may include keywords, such as an event name or the time at which the event occurred. The event name may include the subject, predicate and object of the historical event. The executing entity can determine the similarity between the target text and the historical event information through the following steps, which are not shown in FIG. 5. First, the target text is segmented to obtain a first word set. Next, for each piece of the at least one piece of historical event information, the keywords it contains are concatenated and the concatenated text is segmented to obtain a second word set. The similarity between the target text and that piece of historical event information is then determined from the first word set and the second word set.

In this embodiment, the executing entity first segments the target text to obtain the first word set. The text may be segmented by meaning or by number of characters. Next, for each piece of historical event information among the at least one piece of historical event information, the executing entity concatenates the keywords contained in that historical event information and segments the concatenated text to obtain the second word set. To ensure that the similarity is accurate, both texts can be segmented at the same granularity: the target text and the concatenated text are both segmented with the bigram method, or both with the trigram method, so that every resulting word contains the same number of characters. For example, if the target text is 「私は中国人」 ("I am Chinese"), the bigram method yields 「私は」「は中」「中国」「国人」, whereas the trigram method yields 「私は中」「は中国」「中国人」.
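The same-granularity segmentation described above can be sketched as follows. This is a minimal illustration that assumes segmentation by character count only; the function name `char_ngrams` is hypothetical and does not appear in the specification.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Split text into overlapping character n-grams (n=2: bigram, n=3: trigram)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A text shorter than n characters yields an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


target_text = "私は中国人"
bigrams = char_ngrams(target_text, 2)   # ['私は', 'は中', '中国', '国人']
trigrams = char_ngrams(target_text, 3)  # ['私は中', 'は中国', '中国人']
```

Applying the same value of `n` to both the target text and the concatenated keyword text keeps the two word sets comparable.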

After obtaining the first word set and the second word set, the executing entity can enumerate all the words in the two sets. It then counts the number of times each of these words appears in the target text and combines the counts into a first word vector A. Likewise, it counts the number of times each word appears in the concatenated text and combines the counts into a second word vector B. The executing entity then calculates the similarity between the target text and the concatenated text using the vector cosine formula:

$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

In the formula, $A = (A_1, A_2, \ldots, A_n)$ and $B = (B_1, B_2, \ldots, B_n)$, where $A_i$ is the i-th value of the first word vector A and $B_i$ is the i-th value of the second word vector B.
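The count-vector construction and cosine calculation described above can be sketched as follows. This is an illustrative sketch only: it assumes character n-gram segmentation, and the helper names `char_ngrams` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Split text into overlapping character n-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def cosine_similarity(target_text: str, concatenated_text: str, n: int = 2) -> float:
    """Cosine of the angle between the n-gram count vectors A and B."""
    first = Counter(char_ngrams(target_text, n))         # counts in the target text
    second = Counter(char_ngrams(concatenated_text, n))  # counts in the concatenated text
    vocabulary = sorted(set(first) | set(second))        # all words from both word sets
    a = [first[word] for word in vocabulary]             # first word vector A
    b = [second[word] for word in vocabulary]            # second word vector B
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treating an all-zero vector as similarity 0.0 is an assumption of this sketch.
    return dot / norm if norm else 0.0
```

Identical texts score 1.0 and texts with no n-gram in common score 0.0; the specification does not say how an empty text should be handled, so the zero fallback is a choice made here.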

ステップ５０３においては、ターゲットテキストとの類似度が最も高い履歴イベント情報を出力する。 In step 503, the historical event information having the highest degree of similarity to the target text is output.

ターゲットテキストと関連する各履歴イベント情報とターゲットテキストの類似度が決定された後、実行主体は、ターゲットテキストとの類似度が最も高い履歴イベント情報を出力することができる。 After the similarity between each historical event information associated with the target text and the target text is determined, the execution subject can output the historical event information having the highest similarity with the target text.
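Selecting the output of step 503 can be sketched as follows. The example assumes that each piece of historical event information is a dictionary with a `keywords` list, and it uses a deliberately simple similarity (the number of distinct shared characters) as a stand-in; both the data layout and the function names are hypothetical.

```python
def shared_characters(text_a: str, text_b: str) -> int:
    """Number of distinct characters that appear in both texts."""
    return len(set(text_a) & set(text_b))


def most_similar_event(target_text, events, similarity=shared_characters):
    """Return the event whose concatenated keywords are most similar to the target text."""
    if not events:
        return None  # no related historical event information was found
    return max(events, key=lambda e: similarity(target_text, "".join(e["keywords"])))


events = [
    {"name": "A", "keywords": ["team", "wins", "final"]},
    {"name": "B", "keywords": ["storm", "hits", "coast"]},
]
best = most_similar_event("team wins the final", events)
```

Any other similarity function with the same two-argument signature, such as a cosine similarity over n-gram counts, could be passed in place of `shared_characters`.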

With the method for generating information according to the above embodiment of the present invention, the historical event information in the historical event information set that is most relevant to the target text can be determined, which enriches the information available to the user. The method of this embodiment can also be applied to video filtering: a video title is used as the target text, the target triplet of the title is determined, and the historical events related to the title are then retrieved, making it possible to judge whether the video is an old one.

更に、図６に示すように、前記各図面に示される方法の実現として、本発明は、情報を生成するための装置の一実施形態を提供する。前記装置の実施例は、図２に示される方法の実施例に対応し、前記装置は、具体的に様々な電子機器に応用することができる。 Further, as shown in FIG. 6, as a realization of the method shown in each of the drawings, the present invention provides an embodiment of an apparatus for generating information. The embodiment of the apparatus corresponds to the embodiment of the method shown in FIG. 2, and the apparatus can be specifically applied to various electronic devices.

図６に示すように、本実施例の情報を生成するための装置６００は、ターゲットテキスト受信ユニット６０１、依存関係ツリー生成ユニット６０２、トリプレット決定ユニット６０３及びターゲットトリプレット決定ユニット６０４を含む。 As shown in FIG. 6, the device 600 for generating the information of this embodiment includes a target text receiving unit 601, a dependency tree generating unit 602, a triplet determining unit 603, and a target triplet determining unit 604.

ここで、ターゲットテキスト受信ユニット６０１は、ターゲットテキストを受信するように構成されている。ターゲットテキストは、オブジェクト及びオブジェクトに対する記述情報を含む。 Here, the target text receiving unit 601 is configured to receive the target text. The target text contains the object and descriptive information for the object.

依存関係ツリー生成ユニット６０２は、ターゲットテキストに対して依存構文解析を行い、ターゲットテキストの依存関係ツリーを生成するように構成されている。 The dependency tree generation unit 602 is configured to perform dependency parsing on the target text and generate a dependency tree for the target text.

トリプレット決定ユニット６０３は、予め設定された少なくとも１つの構文構造ツリーと依存関係ツリーとをマッチングさせることで、主語、述語及び目的語からなる少なくとも１つのトリプレットを取得するように構成されている。 The triplet determination unit 603 is configured to acquire at least one triplet composed of a subject, a predicate, and an object by matching at least one preset syntactic structure tree with a dependency tree.

The target triplet determination unit 604 is configured to determine one target triplet from the at least one triplet, based on the words contained in each of the at least one triplet and on the preset weight of the syntactic structure tree that was matched to obtain that triplet.

In some optional implementations of this embodiment, the target triplet determination unit 604 may further include an adnominal modifier determination module, an object determination module, a triplet update module, and a target triplet determination module, none of which are shown in FIG. 6.

連体修飾語決定モジュールは、依存関係ツリーに基づいてターゲットテキストの中の数量詞及び連体修飾語を決定するように構成されている。 The adnominal modifier determination module is configured to determine the quantifiers and adnominal modifiers in the target text based on the dependency tree.

オブジェクト決定モジュールは、数量詞が修飾するオブジェクト及び連体修飾語が修飾するオブジェクトを決定するように構成されている。 The object determination module is configured to determine the object to be modified by the quantifier and the object to be modified by the adnominal modifier.

トリプレット更新モジュールは、決定された数量詞、連体修飾語及びオブジェクトに基づいて少なくとも１つのトリプレットを更新するように構成されている。 The triplet update module is configured to update at least one triplet based on the determined quantifiers, adnominal modifiers and objects.

ターゲットトリプレット決定モジュールは、更新された少なくとも１つのトリプレットから１つのターゲットトリプレットを決定するように構成されている。 The target triplet determination module is configured to determine one target triplet from at least one updated triplet.

In some optional implementations of this embodiment, the triplet update module may be further configured to do the following for each triplet among the at least one triplet: determine whether the determined object matches the subject or the grammatical object of the triplet; in response to determining that the determined object matches the subject of the triplet, combine the quantifier and adnominal modifier that modify the determined object with the subject of the triplet and take the combined text as the subject of the triplet; and in response to determining that the determined object matches the grammatical object of the triplet, combine the quantifier and adnominal modifier that modify the determined object with the grammatical object of the triplet and take the combined text as the grammatical object of the triplet.
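The update rule described above can be sketched as follows. The `Triplet` class, the `update_triplet` function, the combination order (quantifier, then modifier, then head word), and the `sep` parameter are all assumptions made for illustration; the specification only states that the three pieces are combined.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    obj: str  # grammatical object


def update_triplet(triplet, modified_object, quantifier="", modifier="", sep=""):
    """Fold a quantifier and adnominal modifier into the matching subject or object."""

    def combine(head):
        # Skip empty parts so that a missing quantifier or modifier adds nothing.
        return sep.join(part for part in (quantifier, modifier, head) if part)

    if modified_object == triplet.subject:
        return replace(triplet, subject=combine(triplet.subject))
    if modified_object == triplet.obj:
        return replace(triplet, obj=combine(triplet.obj))
    return triplet  # the modified word is not part of this triplet


original = Triplet("company", "acquires", "startups")
updated = update_triplet(original, "startups", quantifier="two", modifier="small", sep=" ")
```

For languages written without spaces, the default `sep=""` joins the parts directly.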

本実施例のいくつかの選択的な実施態様では、前記ターゲットトリプレット決定モジュールは、少なくとも１つのトリプレットのうちのトリプレットに対して、前記トリプレットを取得するためにマッチングされる前記構文構造ツリーの事前設定重みを決定し、前記トリプレットに含まれる単語の文字数を決定し、前記トリプレットにおける単語の共起度を決定し、決定された重み、文字数及び共起度に基づいて前記トリプレットの得点を決定し、少なくとも１つのトリプレットのうちの、得点が最も高いトリプレットをターゲットトリプレットとして決定するように更に構成されてもよい。 In some selective embodiments of this embodiment, the target triplet determination module presets the syntactic structure tree that is matched against the triplet of at least one triplet to obtain the triplet. The weight is determined, the number of characters of the word contained in the triplet is determined, the degree of co-occurrence of the word in the triplet is determined, and the score of the triplet is determined based on the determined weight, the number of characters and the degree of co-occurrence. The triplet with the highest score among at least one triplet may be further configured to be determined as the target triplet.
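One possible way to combine the three factors is sketched below. The specification does not give the scoring rule, so the product used here is purely an assumption, as are the `Candidate` structure and the idea that the co-occurrence degree is supplied as a precomputed number.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    subject: str
    predicate: str
    obj: str
    tree_weight: float    # preset weight of the matched syntactic structure tree
    cooccurrence: float   # co-occurrence degree of the words (assumed precomputed)


def score(candidate: Candidate) -> float:
    """Hypothetical rule: weight x total character count x co-occurrence degree."""
    char_count = len(candidate.subject) + len(candidate.predicate) + len(candidate.obj)
    return candidate.tree_weight * char_count * candidate.cooccurrence


def select_target_triplet(candidates):
    """Return the highest-scoring candidate; expects at least one candidate."""
    return max(candidates, key=score)


candidates = [
    Candidate("Acme", "acquires", "Beta", tree_weight=0.6, cooccurrence=0.5),
    Candidate("Acme", "says", "plan", tree_weight=0.3, cooccurrence=0.4),
]
target = select_target_triplet(candidates)
```

A weighted sum or any other monotonic combination would fit the description equally well; only the final step of taking the highest score is stated in the text.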

In some optional implementations of this embodiment, the device 600 may further include a weight setting unit not shown in FIG. 6. The weight setting unit may include a history target triplet module, a triplet quantity statistics module, and a weight determination module.

履歴ターゲットトリプレットモジュールは、少なくとも１つの履歴ターゲットトリプレットを取得するように構成されている。 The history target triplet module is configured to acquire at least one history target triplet.

The triplet quantity statistics module is configured to count, among the at least one historical target triplet, the number of historical target triplets that were obtained by matching against a given syntactic structure tree.

重み決定モジュールは、統計の結果に基づいて前記少なくとも１つの構文構造ツリーの重みを決定するように構成されている。 The weight determination module is configured to determine the weight of at least one syntactic structure tree based on the result of statistics.
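The counting and weight determination performed by these modules can be sketched as follows. The sketch assumes that each historical target triplet is recorded together with an identifier of the syntactic structure tree that produced it, and that a tree's weight is its share of the historical target triplets; the specification only says the weights are based on the counts, so the normalization and the names are hypothetical.

```python
from collections import Counter


def tree_weights(history_tree_ids, tree_ids):
    """Weight each syntactic structure tree by its share of historical target triplets."""
    counts = Counter(history_tree_ids)
    total = sum(counts[tree_id] for tree_id in tree_ids)
    if total == 0:
        # No history yet: fall back to uniform weights (an assumption of this sketch).
        return {tree_id: 1 / len(tree_ids) for tree_id in tree_ids}
    return {tree_id: counts[tree_id] / total for tree_id in tree_ids}


# Each entry names the tree that yielded one historical target triplet.
history = ["svo", "svo", "svo", "passive"]
weights = tree_weights(history, ["svo", "passive", "copula"])
```

A tree that never produced a historical target triplet receives weight 0.0 under this rule, so some form of smoothing may be preferable in practice.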

本実施例のいくつかの選択的な実施態様においては、前記装置６００は、図６に示されていない履歴イベント情報決定ユニット、類似度決定ユニット及び履歴イベント情報出力ユニットを更に含んでもよい。 In some selective embodiments of this embodiment, the device 600 may further include a history event information determination unit, a similarity determination unit and a history event information output unit not shown in FIG.

ここで、履歴イベント情報決定ユニットは、ターゲットトリプレットに基づいて予め設定された履歴イベント情報集合中のターゲットテキストと関連する少なくとも１つの履歴イベント情報を決定するように構成されている。 Here, the historical event information determination unit is configured to determine at least one historical event information associated with the target text in the preset historical event information set based on the target triplet.

類似度決定ユニットは、ターゲットテキストと少なくとも１つの履歴イベント情報の類似度を決定するように構成されている。 The similarity determination unit is configured to determine the similarity between the target text and at least one historical event information.

履歴イベント情報出力ユニットは、ターゲットテキストとの類似度が最も高い履歴イベント情報を出力するように構成されている。 The history event information output unit is configured to output history event information having the highest degree of similarity to the target text.

In some optional implementations of this embodiment, the historical event information may include participant information and trigger word information. The historical event information determination unit is further configured to determine whether either of the following conditions is satisfied: the subject or object of the target triplet matches the participant information of a piece of historical event information in the historical event information set; or the predicate of the target triplet matches the trigger word information of that historical event information. Historical event information that satisfies at least one of these conditions is determined to be related to the target text.
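The two matching conditions can be sketched as follows. The sketch assumes that "matches" means exact membership and that each piece of historical event information is a dictionary with `participants` and `triggers` collections; these details and the function names are hypothetical.

```python
def is_related(triplet, event) -> bool:
    """True if the target triplet satisfies at least one of the two conditions."""
    subject, predicate, obj = triplet
    participant_match = subject in event["participants"] or obj in event["participants"]
    trigger_match = predicate in event["triggers"]
    return participant_match or trigger_match


def related_events(triplet, events):
    """Filter the historical event information set down to the related entries."""
    return [event for event in events if is_related(triplet, event)]


events = [
    {"name": "merger", "participants": {"Acme", "Beta"}, "triggers": {"acquires"}},
    {"name": "storm", "participants": {"coast"}, "triggers": {"hits"}},
    {"name": "launch", "participants": {"Gamma"}, "triggers": {"acquires"}},
]
matches = related_events(("Acme", "acquires", "Beta"), events)
```

Here the first event matches on both conditions and the third on the trigger word alone, so both are treated as related.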

本実施例のいくつかの選択的な実施態様においては、前記履歴イベント情報は、キーワードを含んでもよい。前記類似度決定ユニットは更に、ターゲットテキストを分割することによって、第１の単語集合を取得し、少なくとも１つの履歴イベント情報のうちの履歴イベント情報に対して、前記履歴イベント情報に含まれる各キーワードを連結し、連結されたテキストを分割することによって、第２の単語集合を取得し、第１の単語集合及び第２の単語集合に基づいてターゲットテキストと前記履歴イベント情報の類似度を決定するように構成されている。 In some selective embodiments of this embodiment, the historical event information may include keywords. The similarity determination unit further acquires a first word set by dividing the target text, and for the historical event information of at least one historical event information, each keyword included in the historical event information. Is concatenated and the concatenated text is divided to obtain a second word set, and the similarity between the target text and the history event information is determined based on the first word set and the second word set. It is configured as follows.

After receiving the target text, the device for generating information according to the above embodiment of the present invention performs dependency parsing on the target text to generate its dependency tree. Next, it matches at least one preset syntactic structure tree against the dependency tree to obtain at least one triplet. Finally, it determines one target triplet from the at least one triplet, based on the words contained in each triplet and on the preset weight of the syntactic structure tree matched to obtain that triplet. Because the device of this embodiment can select the triplet most relevant to the event described in the target text, the accuracy of target triplet extraction is improved.
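The matching step in this pipeline can be illustrated with a small sketch. It assumes the dependency tree has already been produced by a parser and is given as a list of tokens; the `Token` fields, the dictionary form of the pattern, and the Universal Dependencies style labels (`nsubj`, `obj`) are all assumptions, since the specification only states that each node of a syntactic structure tree carries a part of speech.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    pos: str       # part of speech of the word
    head: int      # index of the parent token, -1 for the root
    relation: str  # dependency label linking the token to its parent


# Hypothetical syntactic structure tree: a verb with a noun subject and a noun object.
SVO_PATTERN = {
    "predicate_pos": "VERB",
    "subject": ("nsubj", "NOUN"),
    "object": ("obj", "NOUN"),
}


def match_pattern(tokens, pattern):
    """Return (subject, predicate, object) triplets for nodes that fit the pattern."""
    triplets = []
    for index, token in enumerate(tokens):
        if token.pos != pattern["predicate_pos"]:
            continue
        children = [t for t in tokens if t.head == index]
        subjects = [t.word for t in children if (t.relation, t.pos) == pattern["subject"]]
        objects = [t.word for t in children if (t.relation, t.pos) == pattern["object"]]
        triplets.extend((s, token.word, o) for s in subjects for o in objects)
    return triplets


tokens = [
    Token("Acme", "NOUN", 1, "nsubj"),
    Token("acquires", "VERB", -1, "root"),
    Token("Beta", "NOUN", 1, "obj"),
]
found = match_pattern(tokens, SVO_PATTERN)
```

Running several such patterns over the same tree would yield the set of candidate triplets from which the target triplet is later chosen.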

情報を生成するための装置６００に記載のユニット６０１〜ユニット６０４は、それぞれ図２に示されている方法の中の各ステップに対応していることを理解されたい。従って、以上、情報を生成するための方法について説明した操作及び特徴は、同様に装置６００及びその中に含まれるユニットにも適しているので、ここではこれ以上くどくど述べない。 It should be understood that the units 601 to 604 described in the device 600 for generating information correspond to each step in the method shown in FIG. Therefore, the operations and features described above for the method for generating information are also suitable for the device 600 and the units contained therein, and will not be described further here.

以下、本発明の実施例を実現するための設備に適用されるコンピュータシステム７００を示す構造模式図である図７を参照する。図７に示す設備は、一例に過ぎず、本発明の実施例の機能及び使用範囲を限定するものではない。 Hereinafter, FIG. 7 which is a structural schematic diagram showing a computer system 700 applied to the equipment for realizing the embodiment of the present invention will be referred to. The equipment shown in FIG. 7 is merely an example, and does not limit the functions and the range of use of the embodiments of the present invention.

As shown in FIG. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate operations and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input unit 706 including a keyboard, a mouse, and the like; an output unit 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage unit 708 including a hard disk and the like; and a communication unit 709 including a network interface card such as a LAN card or a modem. The communication unit 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed in the storage unit 708 as needed.

In particular, according to the embodiments of the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present invention includes a computer program product comprising a computer program embodied on a machine-readable storage medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication unit 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the above functions defined in the method of the present invention are performed.

なお、本発明の前記コンピュータ可読記憶媒体は、コンピュータ可読信号記憶媒体又はコンピュータ可読記憶媒体、又はこれらの任意の組み合わせであってもよい。コンピュータ可読記憶媒体は、例えば、電子、磁気、光学、電磁気、赤外線、又は半導体システム、装置もしくはデバイス、又はこれらの任意の組み合わせであることができるが、これらに限定されない。コンピュータ可読記憶媒体のより具体的な例としては、１本以上の導線を有する電気的接続、ポータブルコンピュータディスク、ハードディスク、ランダムアクセスメモリ（ＲＡＭ）、読み出し専用メモリ（ＲＯＭ）、消去可能プログラマブル読み出し専用メモリ（ＥＰＲＯＭもしくはフラッシュメモリ）、光ファイバ、ポータブルコンパクトディスク読み出し専用メモリ（ＣＤ−ＲＯＭ）、光メモリ、磁気メモリ、又はこれらの任意の適切な組み合わせを含むことができるが、これらに限定されない。 The computer-readable storage medium of the present invention may be a computer-readable signal storage medium, a computer-readable storage medium, or any combination thereof. The computer-readable storage medium can be, but is not limited to, for example, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices or devices, or any combination thereof. More specific examples of computer-readable storage media include electrical connections with one or more leads, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), and erasable programmable read-only memory. It can include, but is not limited to, (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical memory, magnetic memory, or any suitable combination thereof.

In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device. A computer-readable signal storage medium may include a data signal that carries computer-readable program code and is propagated in baseband or as part of a carrier wave. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal storage medium may also be any computer-readable medium other than a computer-readable storage medium, and such a medium can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium can be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination thereof.

本発明の動作を実行するためのコンピュータプログラムコードは、１種以上のプログラミング言語、又はそれらの組み合わせで作成されることができ、前記プログラミング言語は、Ｊａｖａ（登録商標）、Ｓｍａｌｌｔａｌｋ、Ｃ＋＋などのオブジェクト指向プログラミング言語と、「Ｃ」言語又は同様のプログラミング言語などの従来の手続き型プログラミング言語とを含む。プログラムコードは、完全にユーザのコンピュータ上で実行され、部分的にユーザのコンピュータ上で実行され、独立したソフトウェアパッケージとして実行され、一部がユーザのコンピュータ上で一部がリモートコンピュータ上で実行され、又は完全にリモートコンピュータ又はサーバ上で実行されてもよい。リモートコンピュータに関わる場合、リモートコンピュータは、ローカルエリアネットワーク（ＬＡＮ）又はワイドエリアネットワーク（ＷＡＮ）を含む任意の種類のネットワークを介してユーザのコンピュータに接続されることができ、又は外部のコンピュータに接続されることができる（例えばインターネットサービスプロバイダによりインターネットで接続される）。 The computer program code for performing the operation of the present invention can be created in one or more programming languages or a combination thereof, and the programming language is an object such as Java (registered trademark), Smalltalk, C ++, etc. Includes oriented programming languages and traditional procedural programming languages such as the "C" language or similar programming languages. The program code runs entirely on the user's computer, partially on the user's computer, as a separate software package, partly on the user's computer, and partly on the remote computer. , Or may be run entirely on a remote computer or server. When involved with a remote computer, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or wide area network (WAN), or is connected to an external computer. Can be (eg, connected to the internet by an internet service provider).

図面におけるフローチャート及びブロック図は、本発明の各実施例に係るシステム、方法及びコンピュータプログラム製品により実現可能なアーキテクチャ、機能及び操作を示す。ここで、フローチャート又はブロック図における各ブロックは、１つのモジュール、プログラムセグメントもしくはコードの一部を表してもよく、前記モジュール、プログラムセグメントもしくはコードの一部は、規定されたロジック機能を達成するための１つ以上の実行可能な命令を含む。なお、いくつかの代替実施態様において、ブロック内に示された機能は、図面に示された順番とは異なるもので実行されてもよい。例えば、連続して示された２つのブロックは、実際には関連する機能に応じて、ほぼ並行に実行されてもよく、逆の順番で実行されてもよい。なお、ブロック図及び／又はフローチャートにおける各ブロック、並びに、ブロック図および／又はフローチャートにおけるブロックの組み合わせは、規定された機能もしくは動作を実行する、ハードウェアに基づく専用システムで実現されてもよく、又は、専用ハードウェアとコンピュータ命令との組み合わせで実行されてもよい。 The flowcharts and block diagrams in the drawings show the architecture, functions, and operations that can be realized by the system, method, and computer program product according to each embodiment of the present invention. Here, each block in the flowchart or block diagram may represent one module, program segment or part of code, and the module, program segment or part of code is for achieving a defined logic function. Contains one or more executable instructions of. It should be noted that in some alternative embodiments, the functions shown within the block may be performed in a different order than shown in the drawings. For example, two blocks shown in succession may actually be executed approximately in parallel or in reverse order, depending on the associated function. In addition, each block in the block diagram and / or the flowchart, and the combination of the blocks in the block diagram and / or the flowchart may be realized by a dedicated hardware-based system that executes a specified function or operation, or may be realized. , May be executed in combination with dedicated hardware and computer instructions.

本発明の実施例に記載されたユニットは、ソフトウェアで実現されてもよく、ハードウェアで実現されてもよい。記載されたユニットは、プロセッサに設けられてもよく、例えば、「プロセッサは、ターゲットテキスト受信ユニット、依存関係ツリー生成ユニット、トリプレット決定ユニット及びターゲットトリプレット決定ユニットを備える」ように記載されてもよい。ここで、これらのユニットの名称は、ある場合において前記ユニット自体を限定するものではなく、例えば、ターゲットテキスト受信ユニットは、「ターゲットテキストを受信するユニット」として記載されてもよい。 The units described in the examples of the present invention may be realized by software or hardware. The described units may be provided in the processor, for example, "the processor includes a target text receiving unit, a dependency tree generation unit, a triplet determination unit, and a target triplet determination unit". Here, the names of these units do not limit the unit itself in some cases, and for example, the target text receiving unit may be described as "a unit that receives the target text".

一方、本発明は、コンピュータ可読記憶媒体を更に提供し、前記コンピュータ可読記憶媒体は、前記実施例に記載された装置に含まれるものであってもよく、独立に存在して前記装置に組み立てられていないものであってもよい。前記コンピュータ可読記憶媒体は、一つ又は複数のプログラムを担持しており、前記一つ又は複数のプログラムが前記装置によって実行される場合に、前記装置は、オブジェクト及び前記オブジェクトに対する記述情報を含むターゲットテキストを受信し、ターゲットテキストに対して依存構文解析を行ってターゲットテキストの依存関係ツリーを生成し、予め設定された少なくとも１つの構文構造ツリーと依存関係ツリーとをマッチングさせて、主語、述語及び目的語からなる少なくとも１つのトリプレットを取得し、少なくとも１つのトリプレットにおける１つのトリプレットに含まれる単語及びトリプレットを取得するためにマッチングされる前記構文構造ツリーの事前設定重みに基づいて、少なくとも１つのトリプレットから１つのターゲットトリプレットを決定する。 On the other hand, the present invention further provides a computer-readable storage medium, and the computer-readable storage medium may be included in the apparatus described in the above embodiment, and may exist independently and be assembled in the apparatus. It may not be. The computer-readable storage medium carries one or more programs, and when the one or more programs are executed by the device, the device includes an object and a target containing descriptive information about the object. It receives the text, performs a dependency parsing on the target text to generate a dependency tree for the target text, matches at least one preset syntax structure tree with the dependency tree, and matches the subject, predicate, and dependency tree. At least one triplet based on the preset weights of the parse structure tree that gets at least one triplet of objects and is matched to get the words and triplets contained in one triplet in at least one triplet. Determine one target triplet from.

以上の記載は、本発明の好ましい実施例、及び使用された技術的原理の説明に過ぎない。本発明に係る発明の範囲が、上記の技術的特徴の特定な組み合わせからなる技術案に限定されるものではなく、上記の本発明の趣旨を逸脱しない範囲で、上記の技術的特徴又はそれらの同等の特徴を任意に組み合わせたものからなる他の技術案も含むべきであることを、当業者は理解すべきである。例えば、上記の特徴と、本発明に開示された（これらに限定されない）類似の機能を持っている技術的特徴とを互いに置き換えてなる技術案が挙げられる。 The above description is merely a description of the preferred embodiments of the present invention and the technical principles used. The scope of the invention according to the present invention is not limited to the technical proposal consisting of a specific combination of the above technical features, and the above technical features or their technical features are not deviated from the above-mentioned gist of the present invention. Those skilled in the art should understand that other technical proposals consisting of any combination of equivalent features should also be included. For example, there is a technical proposal that replaces the above-mentioned features with the technical features having similar functions disclosed in the present invention (not limited to these).

Claims

A computer-implemented method of generating information, comprising:
a step of executing a process of receiving a target text that includes an object and descriptive information about the object;
a step of executing a process of performing dependency parsing on the target text and generating a dependency tree of the target text, wherein each node of the dependency tree corresponds to a word of the target text;
a step of executing a process of matching at least one preset syntactic structure tree against the dependency tree to obtain at least one triplet consisting of a subject, a predicate, and an object, wherein each node of the syntactic structure tree contains the part of speech of the word located at that node, and each triplet consists of the nodes of the dependency tree that match the syntactic structure tree;
a step of executing a process of determining, for each of the at least one triplet, a score of the triplet according to a predetermined rule, based on characteristics of the words contained in the triplet and on the preset weight of the syntactic structure tree matched to obtain the triplet, and determining one target triplet from the at least one triplet based on the determined scores, wherein the score evaluates how accurately the triplet expresses the object and the descriptive information about the object included in the target text; and
a step of executing a process of determining, based on the target triplet, at least one piece of historical event information in a preset historical event information set that is related to the target text.

The method according to claim 1, wherein the process of determining one target triplet from the at least one triplet comprises:
a step of determining a quantifier and an adnominal modifier in the target text based on the dependency tree;
a step of determining the object modified by the quantifier and the object modified by the adnominal modifier;
a step of updating the at least one triplet based on the determined quantifier, adnominal modifier, and object; and
a step of determining the one target triplet from the updated at least one triplet.

The method according to claim 2, wherein the step of updating the at least one triplet based on the determined quantifier, adnominal modifier, and object comprises:
a step of determining, for each triplet among the at least one triplet, whether the determined object matches the subject or the object of the triplet;
a step of, in response to determining that the determined object matches the subject of the triplet, combining the quantifier and the adnominal modifier that modify the determined object with the subject of the triplet, and determining the combined text as the subject of the triplet; and
a step of, in response to determining that the determined object matches the object of the triplet, combining the quantifier and the adnominal modifier that modify the determined object with the object of the triplet, and determining the combined text as the object of the triplet.

The method according to any one of claims 1 to 3, wherein the process of determining one target triplet from the at least one triplet comprises:
a step of, for each triplet among the at least one triplet, determining the preset weight of the syntactic structure tree matched to obtain the triplet, determining the number of characters of the words contained in the triplet, determining the degree of co-occurrence of the words contained in the triplet, and determining the score of the triplet based on the determined weight, number of characters, and degree of co-occurrence; and
a step of determining the triplet with the highest score among the at least one triplet as the target triplet.

The method according to any one of claims 1 to 3, further comprising:
a step of executing a process of acquiring at least one historical target triplet;
a step of executing a process of counting, among the at least one historical target triplet, the number of historical target triplets obtained by matching against a given syntactic structure tree; and
a step of executing a process of determining the weight of the at least one syntactic structure tree based on the counting result.

The method according to claim 1, further comprising:
a step of executing a process of determining the similarity between the target text and the at least one piece of historical event information; and
a step of executing a process of outputting the historical event information having the highest similarity to the target text.

The historical event information includes participant information and trigger word information,
The process of determining, based on the target triplet, at least one historical event information related to the target text in the preset historical event information set includes:
A step of determining, for each historical event information in the historical event information set, whether at least one of the following conditions is satisfied: the subject or the grammatical object of the target triplet matches the participant information of the historical event information; or the predicate of the target triplet matches the trigger word information of the historical event information, and
A step of determining that historical event information satisfying at least one of the conditions is related to the target text,
The method according to claim 1.
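
The two matching conditions of this claim can be sketched as a simple filter. Representing participant information and trigger word information as sets of strings, treating "matches" as exact string membership, and the names `HistoricalEvent` and `related_events` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HistoricalEvent:
    name: str
    participants: set   # participant information
    trigger_words: set  # trigger word information


def related_events(target_triplet, events):
    """Keep events whose participants or trigger words match the target triplet."""
    subject, predicate, obj = target_triplet
    return [
        e for e in events
        if subject in e.participants      # condition 1: subject matches a participant
        or obj in e.participants          # condition 1: object matches a participant
        or predicate in e.trigger_words   # condition 2: predicate matches a trigger word
    ]


events = [
    HistoricalEvent("merger", {"Company A", "Company B"}, {"acquires", "merges"}),
    HistoricalEvent("launch", {"Company C"}, {"launches"}),
    HistoricalEvent("takeover", {"Company D"}, {"acquires"}),
]
hits = related_events(("Company A", "acquires", "Company E"), events)
print([e.name for e in hits])
```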

The historical event information includes keywords,
The process of determining the similarity between the target text and the at least one historical event information includes:
A step of segmenting the target text into words to obtain a first word set,
A step of, for each historical event information of the at least one historical event information, concatenating the keywords included in the historical event information and segmenting the concatenated text into words to obtain a second word set, and
A step of determining the similarity between the target text and the historical event information based on the first word set and the second word set,
The method according to claim 6.
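
A minimal sketch of this word-set comparison follows. The claim does not name a similarity measure or a segmenter, so the Jaccard index and whitespace splitting used here are stand-in assumptions, as are the function names `segment`, `jaccard`, and `most_similar`. A real system for Japanese or Chinese text would need a proper word segmenter.

```python
def segment(text):
    """Stand-in word segmentation: lowercase and split on whitespace."""
    return set(text.lower().split())


def jaccard(first, second):
    """Jaccard index of two word sets (assumed similarity measure)."""
    if not first and not second:
        return 0.0
    return len(first & second) / len(first | second)


def most_similar(target_text, event_keywords):
    """Return the name of the event whose keywords best match the target text.

    event_keywords maps an event name to its list of keywords.
    """
    first = segment(target_text)

    def similarity(name):
        # Concatenate the keywords, then segment the concatenated text.
        second = segment(" ".join(event_keywords[name]))
        return jaccard(first, second)

    return max(event_keywords, key=similarity)


best = most_similar(
    "Company A acquires Company B",
    {
        "acquisition": ["Company A", "acquires"],
        "launch": ["Company C", "launches phone"],
    },
)
print(best)
```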

A target text receiving unit configured to perform a process of receiving a target text containing an object and descriptive information about the object,
A dependency tree generation unit configured to perform dependency parsing on the target text and generate a dependency tree of the target text, wherein each node of the dependency tree corresponds to a word in the target text,
A triplet determination unit configured to perform a process of matching at least one preset syntactic structure tree with the dependency tree to acquire at least one triplet consisting of a subject, a predicate, and an object, wherein each node of the syntactic structure tree contains the part of speech of the word located at the node, and each triplet is composed of nodes of the dependency tree that match the syntactic structure tree,
A target triplet determination unit configured to perform a process of determining, for each triplet of the at least one triplet, a score of the triplet according to a predetermined rule, based on characteristics of the words contained in the triplet and on the preset weight of the syntactic structure tree matched to obtain the triplet, and determining one target triplet from the at least one triplet based on the determined scores, wherein the score evaluates the accuracy of the object contained in the target text and of the descriptive information about the object, and
A historical event information determination unit configured to perform a process of determining, based on the target triplet, at least one historical event information related to the target text in a preset historical event information set,
A device for generating information, including the foregoing units.
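
The matching performed by the triplet determination unit can be pictured with the sketch below, in which a small pattern tree is matched against every node of a dependency tree. The claim only says that pattern nodes carry a part of speech; the dependency relation labels (`nsubj`, `obj`), the `role` field, the greedy child matching, and all class and function names are assumptions added so that the example is unambiguous.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DepNode:
    word: str
    pos: str                 # part of speech of the word
    rel: str = "root"        # relation to the parent node (assumed label set)
    children: list = field(default_factory=list)


@dataclass
class PatternNode:
    pos: str                     # required part of speech
    role: str = ""               # "subject", "predicate", "object", or ""
    rel: Optional[str] = None    # required relation; None accepts any
    children: list = field(default_factory=list)


def match(pattern, node):
    """Return {role: word} if the pattern matches at this node, else None."""
    if pattern.pos != node.pos:
        return None
    if pattern.rel is not None and pattern.rel != node.rel:
        return None
    bound = {pattern.role: node.word} if pattern.role else {}
    for pattern_child in pattern.children:
        for dep_child in node.children:
            sub = match(pattern_child, dep_child)
            if sub is not None:
                bound.update(sub)
                break
        else:
            return None  # a required pattern child has no counterpart
    return bound


def extract_triplets(root, patterns):
    """Try every pattern at every node and collect complete triplets."""
    triplets, stack = [], [root]
    while stack:
        node = stack.pop()
        for pattern in patterns:
            bound = match(pattern, node)
            if bound and {"subject", "predicate", "object"} <= bound.keys():
                triplets.append((bound["subject"], bound["predicate"], bound["object"]))
        stack.extend(node.children)
    return triplets


tree = DepNode("acquires", "VERB", children=[
    DepNode("A", "NOUN", rel="nsubj"),
    DepNode("B", "NOUN", rel="obj"),
])
svo = PatternNode("VERB", role="predicate", children=[
    PatternNode("NOUN", role="subject", rel="nsubj"),
    PatternNode("NOUN", role="object", rel="obj"),
])
print(extract_triplets(tree, [svo]))
```

Relation labels are used here because a pattern with two noun children would otherwise bind the subject and the object to the same dependency node.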

The target triplet determination unit includes:
An adnominal modifier determination module configured to determine quantifiers and adnominal modifiers in the target text based on the dependency tree,
An object determination module configured to determine the object modified by the quantifier and the object modified by the adnominal modifier,
A triplet update module configured to update the at least one triplet based on the determined quantifier, adnominal modifier, and object, and
A target triplet determination module configured to determine the one target triplet from the updated at least one triplet,
The device according to claim 9.

The triplet update module is further configured to determine, for each triplet of the at least one triplet, whether the determined object matches the subject or the grammatical object of the triplet,
In response to determining that the determined object matches the subject of the triplet, to combine the quantifier and the adnominal modifier that modify the determined object with the subject of the triplet and determine the combined text as the subject of the triplet, and
In response to determining that the determined object matches the grammatical object of the triplet, to combine the quantifier and the adnominal modifier that modify the determined object with the grammatical object of the triplet and determine the combined text as the grammatical object of the triplet,
The device according to claim 10.

The target triplet determination unit is further configured to, for each triplet of the at least one triplet, determine the preset weight of the syntactic structure tree matched to obtain the triplet, determine the number of characters of the words contained in the triplet, determine the degree of co-occurrence of the words contained in the triplet, and determine the score of the triplet based on the determined weight, number of characters, and degree of co-occurrence, and
To determine the triplet with the highest score among the at least one triplet as the target triplet,
The device according to any one of claims 9 to 11.

The device further includes a weight setting unit consisting of:
A historical target triplet acquisition module configured to acquire at least one historical target triplet,
A triplet counting module configured to count, among the at least one historical target triplet, the number of historical target triplets obtained by matching a preset syntactic structure tree, and
A weight determination module configured to determine the weight of the at least one syntactic structure tree based on the counting result,
The device according to any one of claims 9 to 11.

The device further includes:
A similarity determination unit configured to perform a process of determining the similarity between the target text and the at least one historical event information, and
A historical event information output unit configured to perform a process of outputting the historical event information having the highest degree of similarity to the target text,
The device according to claim 9.

The historical event information includes participant information and trigger word information,
The historical event information determination unit is configured to determine, for each historical event information in the historical event information set, whether at least one of the following conditions is satisfied: the subject or the grammatical object of the target triplet matches the participant information of the historical event information; or the predicate of the target triplet matches the trigger word information of the historical event information, and
To determine that historical event information satisfying at least one of the conditions is related to the target text,
The device according to claim 9.

The historical event information includes keywords,
The similarity determination unit is further configured to segment the target text into words to obtain a first word set,
For each historical event information of the at least one historical event information, to concatenate the keywords included in the historical event information and segment the concatenated text into words to obtain a second word set, and to determine the similarity between the target text and the historical event information based on the first word set and the second word set,
The device according to claim 14.

Equipment including:
One or more processors, and
A storage device in which one or more programs are stored,
Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to execute the method according to any one of claims 1 to 8.

A computer-readable storage medium storing a computer program,
Wherein the method according to any one of claims 1 to 8 is implemented when the computer program is executed by a processor.

A computer program,
Wherein the method according to any one of claims 1 to 8 is implemented when the computer program is executed by a processor.