JP3241556U

JP3241556U - language data processing system

Info

Publication number: JP3241556U
Application number: JP2023000396U
Authority: JP
Inventors: ユーロンチョウ
Original assignee: 犀動智能科技股份有限公司
Priority date: 2022-10-24
Filing date: 2023-02-13
Publication date: 2023-04-13
Anticipated expiration: 2033-02-13

Abstract

[Problem] To provide a language data processing system with a wider range of application. [Solution] A language data processing system 1 includes a processing unit 11 and a storage unit 12. The storage unit 12 stores a language processing model M that is implemented by machine learning and includes a plurality of intent tags d1. The processing unit 11 is configured to: perform semantic analysis, using the language processing model M, on text data corresponding to a voice input to obtain a semantic analysis result; determine, based on the semantic analysis result, whether any of the intent tags d1 is matched; and, when no intent tag is matched, perform action prediction using the language processing model M based on the semantic analysis result and N language processing histories related to the text data, thereby selecting from action procedures d2 a target action procedure corresponding to the text data and executing it, where N is an integer of 1 or more. [Representative drawing] FIG. 1

Description

The present utility model relates to a data processing system, and more particularly to a language data processing system that processes content spoken by a user.

With advances in language processing technology, a user operating a voice-controlled electronic device is no longer limited to specific voice commands and can issue commands to the device in more everyday expressions.

Taiwan Patent Application Publication No. 201916002

However, the same request can be phrased in many different ways, and for a command that does not state the request clearly, the prior art may be unable to determine the user's intent. The prior art therefore still leaves room for improvement.

Accordingly, an object of the present utility model is to provide a language data processing system that alleviates at least one drawback of the prior art.

The language data processing system includes a processing unit and a storage unit electrically connected to the processing unit.

The storage unit stores a language processing model implemented by machine learning. The language processing model includes a plurality of intent tags, each of which corresponds to at least one of a plurality of action procedures executable by the processing unit.

The processing unit is configured to: obtain text data corresponding to a voice input; perform semantic analysis on the text data using the language processing model to obtain a semantic analysis result corresponding to the text data; determine, based on the semantic analysis result, whether any of the intent tags is matched; when no intent tag is matched, perform action prediction using the language processing model based on the semantic analysis result and N language processing histories related to the text data, thereby selecting from the action procedures a target action procedure corresponding to the text data, where N is an integer of 1 or more; and execute the target action procedure.

When no intent tag is matched (that is, when the user's intent expressed by the text data corresponding to the voice input cannot be clearly determined), the processing unit performs action prediction based on the semantic analysis result and the language processing histories to select, from the action procedures, a target action procedure corresponding to the text data. In other words, when the processing unit determines that the user's intent cannot be inferred from the current utterance alone, it additionally draws on what the user said recently to infer the intent and select the action procedure to execute. Consequently, even if the user expresses an intent incompletely, or interjects unrelated content in the middle of expressing it, the language data processing system can still determine the intent, which broadens the range of situations in which voice operation can be used.

Other features and advantages of the present utility model will become apparent from the following detailed description of an embodiment with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating an embodiment of the language data processing system.
FIG. 2 is a flowchart illustrating a language data processing method executed by the language data processing system.
FIG. 3 is a diagram illustrating an example of a conversation between a user and a service device.

Before the present utility model is described in detail, note that in this specification the terms "coupled" and "connected" mean either that multiple electrical devices, apparatuses, or pieces of equipment are directly connected by a conductive material (for example, wires), or that two such devices are indirectly connected through one or more other devices or through wireless communication. Also, in this specification the term "unit" refers to computer hardware; for example, a "control unit" refers to computer hardware, or a combination thereof, having a control function.

Referring to FIG. 1, an embodiment of a language data processing system 1 is shown. The language data processing system 1 can be electrically connected to one or more service devices 2 via a network. For convenience of explanation, the following describes the language data processing system 1 connected to a single service device 2, which is why only one service device 2 is depicted in FIG. 1.

In this embodiment, the service device 2 is installed in a guest room of a lodging facility (for example, a hotel) and includes a microphone, a speaker, and a display (none shown). The service device 2 can thereby pick up the voice of a user (for example, a guest staying in the room) through the microphone, play corresponding audio data through the speaker, and show corresponding information on the display for the user's reference. The service device 2 is not limited to this embodiment; in other words, the present utility model is not limited to use in lodging facilities.

In this embodiment, the language data processing system 1 is a server and includes a processing unit 11 and a storage unit 12 electrically connected to the processing unit 11. The processing unit 11 is electrically connected to the service device 2 via the network. In this embodiment, the processing unit 11 is a single central processing unit (CPU) capable of computing and processing data; in other embodiments it may be a combination of multiple CPUs. Likewise, the storage unit 12 in this embodiment is a single data storage device for digital data, such as a hard disk; in other embodiments it may be a different type of computer-readable storage medium or a combination of several such media. In still other embodiments, the language data processing system 1 may consist of multiple servers electrically connected to one another.

The storage unit 12 stores a language processing model M that is implemented by machine learning and used by the processing unit 11. In this embodiment, the language processing model M includes a semantic analysis sub-model m1 and an action prediction sub-model m2.

More specifically, the semantic analysis sub-model m1 includes a plurality of preset intent tags d1. Each intent tag d1 represents an intent, such as "turn off the lights", "set an alarm", or "turn on the TV", though the intent tags are not limited to these. In this embodiment, the semantic analysis sub-model m1 is a neural network that, after the intent tags d1 are defined, is trained by machine learning using a plurality of pieces of phrase data as training data. Each piece of phrase data is a natural-language phrase that expresses a request directly or indirectly, for example "The TV in the room won't turn on. It seems to be broken."

The trained semantic analysis sub-model m1 can thus perform semantic analysis on input data. In doing so, it infers an intent from the input data and extracts at least one key part, consisting of one or more characters, from the input data.

The semantic analysis sub-model m1 can also determine whether the user's intent expressed by the input data matches at least one of the intent tags d1. For example, for the input "Turn off the lights in the room", the semantic analysis sub-model m1 performs semantic analysis to determine the intent expressed by the input, determines that this intent matches the intent tag d1 representing "turn off the lights", and extracts "lights" and "turn off" from the input.

The action prediction sub-model m2 includes a plurality of preset action procedures d2 that are executed by the processing unit 11. By executing an action procedure d2, the processing unit 11 controls the service device 2 and causes it to output data: for example, causing the service device 2 to display text or video, to play audio data, or to transmit control signals, over wired or wireless communication, to electronic devices connected to it (for example, lights, a TV, or motorized curtains in the same guest room) in order to control those devices.

The action procedures d2 correspond to the intent tags d1 included in the semantic analysis sub-model m1. More specifically, each intent tag d1 corresponds to at least one action procedure d2. For example, the intent tag d1 representing "turn off the lights" corresponds to an action procedure d2 that causes the service device 2 to send a control signal to the lamp to switch it off, and an action procedure d2 that causes the service device 2 to play audio data announcing that the lights have been turned off.
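As an illustration only, the correspondence between intent tags d1 and action procedures d2 can be pictured as a lookup table from each tag to a list of procedures, which step S3 later runs in order. The tag names, the procedure functions, and the use of a Python dict are hypothetical; the source does not specify how the correspondence is stored or how procedures drive the hardware.

```python
from typing import Callable, Dict, List

# Hypothetical action procedures: each appends what the service device 2
# would do to an output log instead of driving real hardware.
ActionProcedure = Callable[[List[str]], None]


def send_light_off_signal(log: List[str]) -> None:
    """Have the service device send a 'switch off' control signal to the lamp."""
    log.append("control: lamp -> off")


def announce_lights_off(log: List[str]) -> None:
    """Have the service device play audio confirming the lights are off."""
    log.append("speak: The lights have been turned off.")


def send_tv_on_signal(log: List[str]) -> None:
    """Have the service device send a 'power on' control signal to the TV."""
    log.append("control: tv -> on")


# Each intent tag (d1) maps to at least one action procedure (d2).
INTENT_TO_PROCEDURES: Dict[str, List[ActionProcedure]] = {
    "turn_off_lights": [send_light_off_signal, announce_lights_off],
    "turn_on_tv": [send_tv_on_signal],
}


def run_procedures_for_intent(intent_tag: str, log: List[str]) -> None:
    """Step S3: execute every action procedure registered for the matched tag."""
    for procedure in INTENT_TO_PROCEDURES[intent_tag]:
        procedure(log)
```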

In this embodiment, the action prediction sub-model m2 is a neural network that, after the action procedures d2 are defined, is trained by machine learning using a plurality of pieces of conversation data as training data. Each piece of conversation data records a conversation in which two people take turns speaking several times, and includes a plurality of request phrases, a plurality of reply phrases respectively corresponding to the request phrases, a plurality of request intents associated with the request phrases, and a plurality of request key parts associated with the request phrases. The request phrases and reply phrases are ordered, and each reply phrase follows its corresponding request phrase; in other words, request phrases and reply phrases alternate.

More specifically, a request phrase is a request expressed colloquially in natural language by the person being served (for example, a hotel guest), and a reply phrase is a response to that request expressed colloquially in natural language by the person providing the service (for example, a hotel employee). Each request intent corresponds to one request phrase and represents the intent of that phrase; it may be generated by running semantic analysis on the phrase with the semantic analysis sub-model m1, or it may be set manually. Each request key part corresponds to one request phrase and consists of one or more characters that form the key point of that phrase; it, too, may be generated with the semantic analysis sub-model m1 or set manually.
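A minimal sketch of one conversation-data training sample, assuming a simple in-memory schema (the class and field names are hypothetical, not taken from the source). The `validate` method encodes the constraints described above: turns alternate request and reply starting with a request, every request phrase has a reply, and each request phrase has exactly one request intent and one list of request key parts.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ConversationSample:
    """One training dialogue for the action prediction sub-model m2 (hypothetical schema)."""

    # Ordered turns as (role, phrase); role is "request" (guest) or "reply" (staff).
    turns: List[Tuple[str, str]]
    # One request intent per request phrase, in the same order.
    request_intents: List[str]
    # One list of request key parts per request phrase, in the same order.
    request_key_parts: List[List[str]] = field(default_factory=list)

    def request_phrases(self) -> List[str]:
        return [text for role, text in self.turns if role == "request"]

    def reply_phrases(self) -> List[str]:
        return [text for role, text in self.turns if role == "reply"]

    def validate(self) -> None:
        """Raise ValueError unless the sample satisfies the stated constraints."""
        if len(self.turns) % 2 != 0:
            raise ValueError("every request phrase needs a reply phrase")
        for i, (role, _) in enumerate(self.turns):
            expected = "request" if i % 2 == 0 else "reply"
            if role != expected:
                raise ValueError(f"turn {i} should be a {expected}, got {role!r}")
        n_requests = len(self.request_phrases())
        if len(self.request_intents) != n_requests:
            raise ValueError("need exactly one request intent per request phrase")
        if len(self.request_key_parts) != n_requests:
            raise ValueError("need exactly one key-part list per request phrase")
```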

The trained action prediction sub-model m2 can thus perform action prediction on input data and, in doing so, select one of the action procedures d2.

The action procedures d2 need not be contained in the action prediction sub-model m2. For example, in another embodiment, the action prediction sub-model m2 includes a plurality of action tags respectively corresponding to the action procedures d2, and the processing unit 11 is configured to execute the action procedure d2 corresponding to a given action tag.
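For the alternative embodiment, a sketch of how action tags could separate the model's output from the procedures themselves: the model predicts only a tag, and the processing unit looks up and runs the matching procedure. The enum values and procedure bodies are hypothetical placeholders.

```python
from enum import Enum
from typing import Callable, Dict


class ActionTag(Enum):
    """Hypothetical action tags that the action prediction sub-model could output."""

    SHOW_RESTAURANT_PRICES = "show_restaurant_prices"
    LOWER_VOLUME = "lower_volume"


def show_restaurant_prices() -> str:
    return "display: restaurant price list"


def lower_volume() -> str:
    return "control: speaker volume down"


# The model only predicts a tag; the processing unit owns the procedures.
PROCEDURE_FOR_TAG: Dict[ActionTag, Callable[[], str]] = {
    ActionTag.SHOW_RESTAURANT_PRICES: show_restaurant_prices,
    ActionTag.LOWER_VOLUME: lower_volume,
}


def execute_tag(tag: ActionTag) -> str:
    """Run the action procedure registered for a predicted action tag."""
    return PROCEDURE_FOR_TAG[tag]()
```

One plausible benefit of this split, not stated in the source, is that a procedure's implementation can change without retraining the model, as long as the set of tags stays the same.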

Referring to FIG. 2, a language data processing method executed by the language data processing system 1 is described below by way of example.

In step S1, the processing unit 11 obtains, from the service device 2, text data corresponding to a voice input, and performs semantic analysis on the text data using the semantic analysis sub-model m1 of the language processing model M to obtain a semantic analysis result corresponding to the text data. That is, the text data is fed to the semantic analysis sub-model m1 as input data, and the semantic analysis result is the output data that the semantic analysis sub-model m1 generates from that text data.

In this embodiment, the semantic analysis result includes inferred intent data indicating the intent inferred from the text data, and at least one key part, each consisting of at least one character extracted from the text data as a key point. The inferred intent data is a multidimensional semantic vector; that is, it represents the intent expressed by the text data in vector form. Specifically, the components of this semantic vector correspond one-to-one to the intent tags d1, and the magnitude of each component (for example, but not limited to, a value from 0 to 1) indicates how well the inferred intent data matches the corresponding intent tag d1. Each key part is at least one character in the text data that the semantic analysis sub-model m1 judges to be highly relevant to the intent.
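The shape of a semantic analysis result can be sketched as a small record: a vector with one matching degree per intent tag, plus the extracted key parts. The tag list and the [0, 1] range check are assumptions for illustration; the source gives 0 to 1 only as an example range.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, fixed ordering of intent tags d1: component i of the
# inferred intent vector is the matching degree for INTENT_TAGS[i].
INTENT_TAGS: List[str] = ["turn_off_lights", "set_alarm", "turn_on_tv"]


@dataclass
class SemanticAnalysisResult:
    """Output of the semantic analysis sub-model m1 for one utterance."""

    intent_vector: List[float]  # inferred intent data, one degree per intent tag
    key_parts: List[str]        # key parts extracted from the text data

    def __post_init__(self) -> None:
        if len(self.intent_vector) != len(INTENT_TAGS):
            raise ValueError("intent_vector needs one component per intent tag")
        if any(not 0.0 <= v <= 1.0 for v in self.intent_vector):
            raise ValueError("this sketch assumes matching degrees in [0, 1]")

    def degree_for(self, tag: str) -> float:
        """Matching degree between the inferred intent and one intent tag."""
        return self.intent_vector[INTENT_TAGS.index(tag)]
```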

The semantic analysis sub-model m1 performs semantic analysis and generates the semantic analysis result by computing on the text data with its own parameters. These parameters and computations are the product of training the semantic analysis sub-model m1 by machine learning; since they are not the focus of this specification, they are not described in detail.

In this embodiment, the voice input is an electrical signal of speech picked up by the service device 2, for example the signal of a single phrase spoken by a user (such as a guest) to make a request. After obtaining the voice input, the service device 2 processes it with speech recognition to generate corresponding text data and sends the text data to the processing unit 11 of the language data processing system 1. In another embodiment, the service device 2 may instead forward the voice input as-is to the processing unit 11, which then performs speech recognition on it to generate the corresponding text data.

In step S2, the processing unit 11 determines, based on the semantic analysis result, whether any of the intent tags d1 is matched. More specifically, it does so by determining whether at least one intent tag d1 has a matching degree with the inferred intent data that is equal to or greater than a matching threshold. If there is at least one such intent tag d1, the processing unit 11 takes the intent tag d1 with the highest matching degree as the matched intent tag and determines that a matched intent tag exists. If no intent tag d1 reaches the matching threshold, the processing unit 11 determines that there is no matched intent tag.
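Step S2 reduces to a thresholded argmax over the components of the inferred intent vector. A minimal sketch, assuming the vector components are ordered like `intent_tags`; the source does not give a value for the matching threshold, so it is left as a parameter.

```python
from typing import Optional, Sequence


def match_intent_tag(
    intent_vector: Sequence[float],
    intent_tags: Sequence[str],
    threshold: float,
) -> Optional[str]:
    """Step S2: return the matched intent tag, or None if no tag qualifies.

    A tag qualifies when its matching degree is >= threshold; among
    qualifying tags, the one with the highest degree is returned.
    """
    if len(intent_vector) != len(intent_tags):
        raise ValueError("need one matching degree per intent tag")
    candidates = [
        (degree, tag)
        for degree, tag in zip(intent_vector, intent_tags)
        if degree >= threshold
    ]
    if not candidates:
        return None  # no matched intent tag: fall back to action prediction (S4)
    # max() keeps the first maximal element, so ties go to the earlier tag.
    _, best_tag = max(candidates, key=lambda c: c[0])
    return best_tag
```

Returning `None` instead of raising keeps the fallback to step S4 an ordinary branch. Ties at the highest degree resolve to the earlier tag, an arbitrary choice the source does not address.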

If the processing unit 11 determines that there is a matched intent tag, the flow proceeds to step S3; otherwise, it proceeds to step S4.

When a matched intent tag exists, in step S3 the processing unit 11 executes the at least one action procedure d2 corresponding to the matched intent tag. For example, when the matched intent tag is the intent tag d1 representing "turn off the lights", the processing unit 11 in this embodiment executes the action procedure d2 that causes the service device 2 to send a control signal to the lamp to switch it off, and the action procedure d2 that causes the service device 2 to play audio data announcing that the lights have been turned off; however, the actions are not limited to these.

After step S3, the flow proceeds to step S6.

When no matched intent tag exists, in step S4 the processing unit 11 uses the action prediction sub-model m2 of the language processing model M to perform action prediction based on the semantic analysis result and N language processing histories related to the text data, thereby selecting from the action procedures d2 a target action procedure corresponding to the text data. More specifically, the processing unit 11 generates a semantic integration result from the semantic analysis result and the N language processing histories, and selects as the target action procedure the action procedure d2 that best matches the semantic integration result. That is, the semantic analysis result and the N language processing histories are fed to the action prediction sub-model m2 as input data, and the target action procedure is the output data that the action prediction sub-model m2 generates from them. In this embodiment N is 2, but in other embodiments N may be any integer of 1 or more.

The N language processing histories respectively correspond to N pieces of text data (hereinafter, the N past text data) that the processing unit 11 obtained from the service device 2 within a fixed period before obtaining the current text data (for example, the 5 minutes before the current text data was obtained). The N past text data in turn correspond to N voice inputs (hereinafter, the N past voice inputs) entered into the service device 2 before the current voice input, that is, to N phrases the user spoke within that period.

In this embodiment, the N language processing histories were produced as follows. Before the processing unit 11 ran the current language data processing method on the current text data, it ran the method on each of the N past text data, performing semantic analysis on each to generate N corresponding semantic analysis results (hereinafter, the N past semantic analysis results). Each language processing history was then generated based at least on the corresponding past semantic analysis result and stored in the storage unit 12. Each language processing history includes past inferred intent data and at least one past key part. The past inferred intent data is a multidimensional semantic vector, namely the inferred intent data of the corresponding past semantic analysis result. Each past key part is one of the key parts of the corresponding past semantic analysis result, that is, characters in the corresponding past text data that the semantic analysis sub-model m1 judged to be highly relevant to the intent.
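Selecting the N histories from a fixed time window can be sketched as follows. `HistoryStore` is a hypothetical in-memory stand-in for the storage unit 12; the 300-second default mirrors the 5-minute example above, and returning fewer than N entries when fewer are available is an assumption, since the source does not say what happens in that case.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class LanguageProcessingHistory:
    """One stored history entry (fields follow the description above)."""

    timestamp: float            # when the text data was obtained, in seconds
    intent_vector: List[float]  # past inferred intent data
    key_parts: List[str]        # past key parts


class HistoryStore:
    """Hypothetical in-memory stand-in for the histories kept in storage unit 12."""

    def __init__(self, window_seconds: float = 300.0, capacity: int = 100) -> None:
        self.window_seconds = window_seconds
        # Oldest entries are discarded automatically once capacity is reached.
        self._entries: Deque[LanguageProcessingHistory] = deque(maxlen=capacity)

    def add(self, entry: LanguageProcessingHistory) -> None:
        """Store the latest history; entries are assumed to arrive in time order."""
        self._entries.append(entry)

    def recent(self, n: int, now: float) -> List[LanguageProcessingHistory]:
        """Return up to n of the newest histories in the window before `now`, oldest first."""
        if n < 1:
            raise ValueError("N must be an integer of 1 or more")
        start = now - self.window_seconds
        in_window = [e for e in self._entries if start <= e.timestamp < now]
        return in_window[-n:]
```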

Further, the action prediction sub-model m2 computes on the semantic analysis result and on each of the N language processing histories with its own parameters to generate (N+1) semantic vectors corresponding to them, concatenates the (N+1) semantic vectors into a semantic matrix, and uses this semantic matrix as the semantic integration result. The action prediction sub-model m2 then uses its own parameters to compute the matching degree between each action procedure d2 and the semantic integration result, and selects the action procedure d2 with the highest matching degree as the target action procedure. The parameters of the action prediction sub-model m2 and its computations on the semantic analysis result and the N language processing histories are the product of training the sub-model by machine learning; since they are not the focus of this specification, they are not described in detail.
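Because the trained parameters of the action prediction sub-model m2 are not disclosed, the sketch below substitutes a deliberately simple, hand-written scoring rule. Each of the (N+1) semantic vectors becomes one row of the semantic matrix, and each action procedure is scored as a weighted sum of dot products between a hypothetical "prototype" vector for that procedure and the rows. It illustrates only the data flow (stack, score every procedure, take the highest), not the learned model.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def build_semantic_matrix(current: Vector, histories: Sequence[Vector]) -> List[Vector]:
    """Stack N history vectors and the current vector into an (N+1) x d matrix.

    Rows are in chronological order: oldest history first, current utterance last.
    """
    rows = [list(h) for h in histories] + [list(current)]
    if any(len(row) != len(current) for row in rows):
        raise ValueError("all semantic vectors must have the same dimension")
    return rows


def score_procedure(matrix: List[Vector], prototype: Vector, row_weights: Sequence[float]) -> float:
    """Stand-in matching degree: weighted sum of row-by-prototype dot products."""
    return sum(
        weight * sum(a * b for a, b in zip(row, prototype))
        for weight, row in zip(row_weights, matrix)
    )


def select_target_procedure(
    matrix: List[Vector],
    prototypes: Dict[str, Vector],
    row_weights: Sequence[float],
) -> str:
    """Return the name of the action procedure with the highest stand-in score."""
    if len(row_weights) != len(matrix):
        raise ValueError("need one weight per matrix row")
    return max(prototypes, key=lambda name: score_procedure(matrix, prototypes[name], row_weights))
```

Keeping the rows in chronological order with the current utterance last lets the row weights express recency; in the described system, that role is played by the trained parameters rather than hand-set weights.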

In step S5, the processing unit 11 executes the target action procedure.

In step S6, which follows step S3 or step S5, the processing unit 11 generates a further language processing history (hereinafter, the latest language processing history) based at least on the semantic analysis result, and stores it in the storage unit 12 so that it can be used in subsequent runs of the language data processing method. The latest language processing history has the same form as the N language processing histories described above, so its description is not repeated here.
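Putting steps S1 to S6 together, the control flow can be sketched as below. The two sub-models are injected as plain callables (`analyze` standing in for m1, `predict` for m2), so any stand-in can be plugged in. The threshold of 0.5 is an assumed value, N defaults to 2 as in this embodiment, and the time-window filter on the histories is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Procedure = Callable[[], str]


@dataclass
class Analysis:
    """Semantic analysis result: inferred intent vector plus key parts."""

    intent_vector: List[float]
    key_parts: List[str]


@dataclass
class LanguageDataProcessor:
    """Hypothetical wiring of steps S1 to S6 around two injected sub-models."""

    analyze: Callable[[str], Analysis]                        # stand-in for m1
    intent_tags: Sequence[str]                                # intent tags d1, vector order
    procedures_for_intent: Dict[str, List[Procedure]]         # d1 -> action procedures d2
    predict: Callable[[Analysis, List[Analysis]], Procedure]  # stand-in for m2
    threshold: float = 0.5                                    # assumed matching threshold
    n_history: int = 2                                        # N = 2 as in this embodiment
    history: List[Analysis] = field(default_factory=list)

    def handle(self, text: str) -> List[str]:
        """Process one utterance's text data and return the outputs produced."""
        result = self.analyze(text)  # S1: semantic analysis
        candidates = [
            (degree, tag)
            for degree, tag in zip(result.intent_vector, self.intent_tags)
            if degree >= self.threshold
        ]
        if candidates:  # S2 -> S3: run every procedure for the best-matching tag
            _, tag = max(candidates, key=lambda c: c[0])
            outputs = [proc() for proc in self.procedures_for_intent[tag]]
        else:  # S2 -> S4 -> S5: predict a target procedure from context, then run it
            recent = self.history[-self.n_history:]
            outputs = [self.predict(result, recent)()]
        self.history.append(result)  # S6: store the latest history
        return outputs
```

Injecting the sub-models as callables keeps the control flow testable without a trained network; in the described system, both roles are filled by the trained neural networks stored in the storage unit 12.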

Referring to FIG. 3, the technical effect of the language data processing system 1 is described using a conversation between a user and the service device 2 as an example.

First, in conversation 1, the user entered a first voice input msg1 ("Is there a restaurant in the hotel?" in FIG. 3) into the service device 2, and the service device 2 responded by outputting the audio "Yes, the restaurant is on the second floor." Next, in conversation 2, the user, finding for example that the audio output by the service device 2 was too loud, entered a second voice input msg2 ("Turn down the volume" in FIG. 3), and the service device 2 responded by outputting the audio "OK, the volume has been turned down." The user then entered a third voice input msg3 ("How much is it?" in FIG. 3) into the service device 2.

Taken on its own, the third voice input msg3 is incomplete: it does not say what the user is asking the price of. When the language data processing system 1 executes the language data processing method on msg3 (that is, in step S1 the processing unit 11 obtains the text data corresponding to msg3), the processing unit 11 determines in step S2 that no intention tag d1 matches, and the process proceeds to step S4. In other words, the text data corresponding to msg3 alone is not enough for the processing unit 11 to determine the user's intention. In step S4, the processing unit 11 executes the action prediction process based on the semantic analysis result for msg3, the language processing history for msg1, and the language processing history for msg2, and selects from the operation procedures d2 a target operation procedure for msg3.

The language processing history for msg1 includes past inferred intention data indicating that the intention inferred from the text data of msg1 is "ask about the restaurant", and a past point portion, "restaurant", extracted from that text data. The language processing history for msg2 includes past inferred intention data indicating that the intention inferred from the text data of msg2 is "turn down the voice volume", and two past point portions, "volume" and "turn down", extracted from that text data. The processing unit 11 thus analyzes the user's intention based on the semantic relevance among the three inputs and selects the target operation procedure. In this example, using the action prediction submodel m2 trained by machine learning, the processing unit 11 determines that msg3 is more closely related semantically to msg1 than to msg2 and concludes that the user is asking about the restaurant's prices. The selected target operation procedure therefore, for example, causes the service device 2 to output the restaurant's price information.
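The action prediction submodel m2 is a trained model whose internals the document does not disclose. The sketch below only illustrates the reasoning of the FIG. 3 example: a hypothetical relevance table stands in for m2, the most related past input is chosen, and its point portion supplies the missing object of "How much is it?". The intention labels, scores, and function names are invented for illustration; the resulting interpretation would then be mapped to an operation procedure such as outputting the restaurant's price information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class History:
    inferred_intent: str
    point_portions: tuple


# Hypothetical relevance scores standing in for the trained action prediction
# submodel m2: how strongly an incomplete current intent relates to a past intent.
RELEVANCE = {
    ("ask_price", "ask_about_restaurant"): 0.9,
    ("ask_price", "turn_down_volume"): 0.1,
}


def most_related_history(current_intent: str, histories: list) -> Optional[History]:
    """Return the past history most related to the current intent, if any."""
    best, best_score = None, 0.0
    for h in histories:  # oldest first, so ">=" lets the newer entry win a tie
        score = RELEVANCE.get((current_intent, h.inferred_intent), 0.0)
        if score > 0.0 and score >= best_score:
            best, best_score = h, score
    return best


def resolve(current_intent: str, current_points: tuple, histories: list):
    """Complete an incomplete request with point portions from the related history."""
    related = most_related_history(current_intent, histories)
    if related is None:
        return None
    return current_intent, current_points + related.point_portions


histories = [
    History("ask_about_restaurant", ("restaurant",)),  # msg1
    History("turn_down_volume", ("volume", "lower")),  # msg2
]
print(resolve("ask_price", (), histories))  # msg3: "How much is it?"
# ('ask_price', ('restaurant',))
```

The interjected volume request (msg2) is more recent, but it loses because relevance, not recency, decides which history fills the gap.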

It should be understood that steps S1 to S6 and the flowchart of FIG. 2 merely illustrate one example of the language data processing method. A method in which steps S1 to S6 are combined, divided, or reordered falls within the scope of the present invention as long as it achieves the same effect in substantially the same way as the language data processing method of the present invention. Accordingly, steps S1 to S6 and the flowchart of FIG. 2 do not limit the present invention.

In summary, when the processing unit 11 determines that no intention tag d1 matches (that is, when the user's intention expressed by the text data of a voice input cannot be determined clearly), it executes the action prediction process based on the semantic analysis result and the language processing histories, and thereby selects from the operation procedures d2 the target operation procedure corresponding to the text data. In other words, when what the user has just said is not enough to determine the user's intention, the processing unit 11 also draws on what the user said recently to determine the intention and select the operation procedure d2 to execute. As a result, the language data processing system 1 can determine the user's intention even when the request is expressed incompletely or when the user interjects unrelated remarks between related requests, which broadens the range of situations in which voice operation can be used.
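Claim 3 characterizes the action prediction process as generating a semantic integration result from the semantic analysis result and the N language processing histories, then selecting the operation procedure with the highest degree of matching. The document does not define either quantity. One way to sketch it, assuming each history carries a relevance weight (for example, produced by a model such as m2) and each operation procedure is described by a hypothetical keyword set, is a weighted term map scored against each procedure:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationProcedure:
    name: str
    keywords: frozenset  # hypothetical: the point portions this procedure handles


def integrate(current_points, weighted_histories):
    """Build a semantic integration result as a term -> weight map.

    Current point portions get weight 1.0; point portions from each past history
    get that history's relevance weight (assumed to come from a trained model).
    """
    merged = {term: 1.0 for term in current_points}
    for past_points, weight in weighted_histories:
        for term in past_points:
            merged[term] = max(merged.get(term, 0.0), weight)
    return merged


def degree_of_matching(integrated, procedure):
    """Average weight of the procedure's keywords within the integration result."""
    if not procedure.keywords:
        return 0.0
    return sum(integrated.get(k, 0.0) for k in procedure.keywords) / len(procedure.keywords)


def select_target(integrated, procedures):
    """Pick the procedure with the highest degree of matching (ties: first listed)."""
    return max(procedures, key=lambda p: degree_of_matching(integrated, p))


procedures = [
    OperationProcedure("output_restaurant_location", frozenset({"restaurant", "location"})),
    OperationProcedure("output_restaurant_price", frozenset({"restaurant", "price"})),
    OperationProcedure("turn_down_volume", frozenset({"volume", "lower"})),
]
integrated = integrate(("price",), [(("restaurant",), 0.9), (("volume", "lower"), 0.1)])
print(select_target(integrated, procedures).name)
# output_restaurant_price
```

The weighting is what prevents the interjected request from interfering: if every history were merged with weight 1.0, the volume procedure would score exactly as high as the restaurant price procedure.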

In the above description, numerous specific details have been set forth for purposes of explanation in order to provide a thorough understanding of the embodiments. It will be apparent to those skilled in the art, however, that one or more other embodiments may be practiced without some of these specific details. It should also be understood that references throughout this specification to "an embodiment", "one embodiment", or an embodiment designated with an ordinal number mean that a particular feature, structure, or characteristic may be included in the practice of the present invention. Furthermore, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of the various aspects of the invention; where appropriate, one or more features or specific details of one embodiment may be practiced together with one or more features or specific details of another embodiment.

Although embodiments and variations of the present invention have been described above, the present invention is not limited to them and is intended to cover all modifications and equivalent arrangements included within the spirit and scope of the broadest interpretation.

1 Language data processing system
11 Processing unit
12 Storage unit
M Language processing model
m1 Semantic analysis submodel
m2 Action prediction submodel
d1 Intention tag
d2 Operation procedure
2 Service device
msg1 First voice input
msg2 Second voice input
msg3 Third voice input
S1-S6 Steps

Claims

1. A language data processing system, comprising:
a processing unit; and
a storage unit electrically connected to the processing unit,
wherein the storage unit stores a language processing model implemented by machine learning technology,
the language processing model includes a plurality of intention tags,
each of the intention tags corresponds to at least one of a plurality of operation procedures executable by the processing unit, and
the processing unit is configured to:
obtain text data corresponding to a voice input and perform semantic analysis processing on the text data using the language processing model to obtain a semantic analysis result corresponding to the text data;
determine, based on the semantic analysis result, whether there is a matched intention tag among the intention tags;
when it is determined that there is no matched intention tag among the intention tags, perform action prediction processing using the language processing model based on the semantic analysis result and N language processing histories related to the text data, so as to select, from the operation procedures, a target operation procedure corresponding to the text data, N being an integer equal to or greater than 1; and
execute the target operation procedure.

2. The language data processing system according to claim 1, wherein
the processing unit is electrically connected to a service device,
the voice input is input through the service device, and
the N language processing histories respectively correspond to N pieces of past text data obtained before the text data was obtained, and the N pieces of past text data respectively correspond to N past voice inputs that were input before the voice input.

3. The language data processing system according to claim 1, wherein the action prediction processing generates a semantic integration result based on the semantic analysis result and the N language processing histories, and selects, as the target operation procedure, the operation procedure having the highest degree of matching with the semantic integration result among the operation procedures.

4. The language data processing system according to claim 3, wherein
the semantic analysis result includes inferred intention data corresponding to the intention tags, and
each of the language processing histories includes past inferred intention data corresponding to the intention tags.

5. The language data processing system according to claim 4, wherein
the semantic analysis result further includes at least one point portion extracted from the text data using the language processing model, and
each of the language processing histories further includes at least one past point portion extracted, using the language processing model, from a corresponding one of the N pieces of past text data.

6. The language data processing system according to claim 1, wherein
the semantic analysis result includes inferred intention data corresponding to the intention tags, and
the processing unit is further configured to determine whether the intention tags include an intention tag whose degree of matching with the inferred intention data is equal to or greater than a matching threshold, and, when it is determined that such an intention tag exists, to regard the intention tag having the highest degree of matching with the inferred intention data as the matched intention tag and determine that the matched intention tag exists among the intention tags.

7. The language data processing system according to claim 1, wherein the processing unit is further configured to execute at least one of the operation procedures corresponding to the matched intention tag when it is determined that the matched intention tag exists among the intention tags.
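Claims 6 and 7 describe matching against the intention tags with a threshold. A minimal sketch, assuming the degree of matching is available as one score per intention tag (for instance, a classifier probability) and using an arbitrary threshold of 0.7, since the document gives no value; the tag names are hypothetical:

```python
from typing import Optional


def match_intention_tag(scores: dict, threshold: float = 0.7) -> Optional[str]:
    """Return the matched intention tag, or None if no tag reaches the threshold.

    scores: hypothetical mapping of intention tag -> degree of matching with the
    inferred intention data. The 0.7 default is an assumption, not from the source.
    """
    candidates = {tag: s for tag, s in scores.items() if s >= threshold}
    if not candidates:
        return None  # no matched tag: fall back to action prediction (step S4)
    # Among tags at or above the threshold, take the highest degree of matching.
    return max(candidates, key=candidates.get)


print(match_intention_tag({"ask_about_restaurant": 0.92, "ask_price": 0.40}))
# ask_about_restaurant
print(match_intention_tag({"ask_price": 0.55, "ask_about_restaurant": 0.30}))
# None
```

A `None` result corresponds to the branch in which the history-based action prediction takes over; a tag result corresponds to claim 7, where an operation procedure associated with that tag is executed directly.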