JP6799574B2

JP6799574B2 - Method and device for determining satisfaction with voice dialogue

Info

Publication number: JP6799574B2
Application number: JP2018225189A
Authority: JP
Inventors: マオ、ワントン
Original assignee: バイドゥオンラインネットワークテクノロジー（ベイジン）カンパニーリミテッド
Priority date: 2018-03-15
Filing date: 2018-11-30
Publication date: 2020-12-16
Anticipated expiration: 2038-11-30
Also published as: US10950234B2; CN108388926B; JP2019159309A; CN108388926A; US20190287524A1

Description

Embodiments of the present invention relate to the field of voice dialogue, and in particular to a method and device for determining satisfaction with voice dialogue.

With the development of artificial intelligence technology, the development and use of smart voice dialogue products have attracted attention. Smart voice dialogue is an interaction mode based on voice input: the user states a request by voice, and the product responds with content that matches the intent of the request.

In the field of voice dialogue, evaluating satisfaction with a smart voice dialogue product is important for building and upgrading the product, because the evaluation reflects whether users accept the product's voice dialogue function. In the prior art, satisfaction is evaluated mainly by taking the intent of a single user request and the content of the terminal's response as the evaluation data, computing the relevance between the intent and the content with text processing techniques, and labeling the user's satisfaction with that response accordingly.

However, deriving user satisfaction solely from the relevance between the content returned by the terminal and the user's intent does not capture the user's true and comprehensive evaluation of the voice dialogue. It is therefore difficult to evaluate satisfaction with voice dialogue using the existing evaluation approach.

Embodiments of the present invention provide a method and device for determining satisfaction with voice dialogue that can provide a true and comprehensive evaluation of the voice dialogue.

In a first aspect, an embodiment of the present invention provides a method for determining satisfaction with voice dialogue, comprising: acquiring voice dialogue features that include objective data of the voice dialogue and subjective data of the voice dialogue, the objective data and the subjective data being data on the same subject; evaluating the objective data to obtain an objective evaluation, and evaluating the subjective data to obtain a subjective evaluation; and taking the objective evaluation and the subjective evaluation as inputs of a satisfaction evaluation model to obtain the satisfaction with the voice dialogue output by the satisfaction evaluation model.

In one possible design, the objective data of the voice dialogue includes the user intent and the response content, the response delay, and the current playback time of the response content; the subjective data of the voice dialogue includes text information corresponding to the user's voice input after playback of the response content is interrupted.
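The data listed in this design can be pictured as a small record type. The following is a minimal sketch only: the class names, field names, units, and sample values are all hypothetical and are not taken from the patent, and the reading of "response delay" as the time between voice input and start of the response is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectiveData:
    """Objective data for one response (all field names are illustrative)."""
    user_intent: str           # intent recognized from the user's voice input
    response_content: str      # content the terminal returned for that intent
    response_delay_s: float    # assumed: seconds between voice input and response
    current_playback_s: float  # seconds the response had played when it stopped


@dataclass
class SubjectiveData:
    """Text of the user's voice input after playback was interrupted, if any."""
    follow_up_text: Optional[str]


@dataclass
class VoiceDialogueFeature:
    """Objective and subjective data that refer to the same subject."""
    subject: str
    objective: ObjectiveData
    subjective: SubjectiveData


# Illustrative instance; the numeric values are made up for the sketch.
feature = VoiceDialogueFeature(
    subject="song",
    objective=ObjectiveData(
        user_intent="I want to hear Zhang San's new song",
        response_content="audio file of XX song",
        response_delay_s=0.8,
        current_playback_s=12.0,
    ),
    subjective=SubjectiveData(follow_up_text="This song is not what I want"),
)
```

Keeping the subject on the outer record makes the "same subject" constraint explicit: one feature record never mixes objective data about one subject with subjective data about another.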

In one possible design, evaluating the objective data to obtain the objective evaluation comprises: obtaining a first objective evaluation based on the degree of intent match between the user intent and the response content; obtaining a second objective evaluation based on the response delay and a standard delay; and obtaining a third objective evaluation based on the current playback time of the response content and the standard playback time of the response content. In this design, taking the objective evaluation and the subjective evaluation as inputs of the satisfaction evaluation model comprises: taking the first objective evaluation, the second objective evaluation, the third objective evaluation, and the subjective evaluation as inputs of the satisfaction evaluation model to obtain the satisfaction with the voice dialogue output by the satisfaction evaluation model.
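The three objective evaluations might be computed along the following lines. This is a hedged sketch: the text only states which quantities each evaluation is based on, so the concrete formulas here (keyword overlap for intent match, a ratio against the standard delay, a capped playback fraction) and all function names are assumptions chosen for illustration.

```python
from typing import Set


def intent_match_score(intent_keywords: Set[str], response_keywords: Set[str]) -> float:
    """First objective evaluation (assumed formula): Jaccard overlap of keywords."""
    union = intent_keywords | response_keywords
    if not union:
        return 0.0
    return len(intent_keywords & response_keywords) / len(union)


def delay_score(response_delay_s: float, standard_delay_s: float) -> float:
    """Second objective evaluation (assumed formula): 1.0 when the response is
    at least as fast as the standard delay, lower as the delay grows."""
    if response_delay_s <= standard_delay_s:
        return 1.0
    return standard_delay_s / response_delay_s


def playback_score(current_playback_s: float, standard_playback_s: float) -> float:
    """Third objective evaluation (assumed formula): fraction of the standard
    playback time that was actually played, capped at 1.0."""
    if standard_playback_s <= 0:
        return 0.0
    return min(current_playback_s / standard_playback_s, 1.0)
```

Each function returns a value in [0, 1], which keeps the three evaluations on a comparable scale before they are passed to the satisfaction evaluation model; that normalization is a design choice of this sketch, not a requirement stated in the text.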

In one possible design, evaluating the subjective data to obtain the subjective evaluation comprises: performing semantic analysis on the text information to obtain a content attribute corresponding to the text information, the content attribute being either an emotional attribute or a subject attribute; and obtaining the subjective evaluation based on the content attribute corresponding to the text information.

In one possible design, if the content attribute is a subject attribute, obtaining the subjective evaluation based on the content attribute corresponding to the text information comprises: obtaining the subject type corresponding to the text information; and, if the subject type corresponding to the text information matches the subject type corresponding to the user intent, determining that the subjective evaluation is lower than a predetermined evaluation value.
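The subject-attribute rule can be sketched as a single comparison. The threshold, the low score, and the function name below are hypothetical. This design does not say what happens when the subject types differ, so the sketch returns `None` in that case rather than guessing.

```python
from typing import Optional

PREDETERMINED_VALUE = 0.5  # hypothetical "predetermined evaluation value"
LOW_SCORE = 0.2            # hypothetical score below that value


def subjective_score_from_subject(follow_up_subject: str,
                                  intent_subject: str) -> Optional[float]:
    """Return a score below the predetermined value when the user's follow-up
    input is on the same subject type as the original intent.

    A follow-up on the same subject suggests the first response did not meet
    the request. The non-matching case is left undetermined (None) because
    this design does not specify it.
    """
    if follow_up_subject == intent_subject:
        return LOW_SCORE
    return None
```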

In one possible design, if the content attribute is an emotional attribute, obtaining the subjective evaluation based on the content attribute corresponding to the text information comprises: extracting emotion keywords from the text information; obtaining, based on a correspondence between emotion keywords and mood types, one mood type from the group consisting of a positive mood, a negative mood, and a neutral mood; and obtaining the subjective evaluation based on a correspondence between mood types and predetermined evaluations.
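The emotion-attribute path is two table lookups: keyword to mood type, then mood type to evaluation. The sketch below assumes small hand-written tables, simple substring matching, and a "negative wins" tie-break; none of these details come from the patent, and a real system would use a proper tokenizer and a much larger lexicon.

```python
# Hypothetical correspondence between emotion keywords and mood types.
EMOTION_KEYWORDS = {
    "love": "positive",
    "great": "positive",
    "not what i want": "negative",
    "terrible": "negative",
    "okay": "neutral",
}

# Hypothetical correspondence between mood types and predetermined evaluations.
MOOD_SCORES = {"positive": 1.0, "neutral": 0.5, "negative": 0.0}


def mood_type(text: str) -> str:
    """Map text to one of: positive, negative, neutral.

    Assumption: if both positive and negative keywords occur, negative wins;
    text without any known keyword is treated as neutral.
    """
    lowered = text.lower()
    found = {mood for keyword, mood in EMOTION_KEYWORDS.items() if keyword in lowered}
    if "negative" in found:
        return "negative"
    if "positive" in found:
        return "positive"
    return "neutral"


def subjective_score_from_emotion(text: str) -> float:
    """Look up the predetermined evaluation for the text's mood type."""
    return MOOD_SCORES[mood_type(text)]
```

For the sentence used later in the description, `subjective_score_from_emotion("This song is not what I want")` falls in the negative class under these tables.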

In one possible design, acquiring the voice dialogue features comprises: acquiring first log data whose time interval from second log data in the immediately preceding time period and from third log data in the immediately following time period is greater than a predetermined threshold; acquiring, from the first log data, the subject corresponding to each of two adjacent voice inputs by the user; and acquiring the voice dialogue features based on the subjects corresponding to the two adjacent voice inputs.
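One way to read this design is as session segmentation: a block of log entries is treated as one unit ("first log data") when it is separated from the entries before and after it by a gap larger than the threshold, and adjacent voice inputs inside that block are then compared by subject. The sketch below follows that interpretation; the log entry layout, the function names, and the 60-second threshold are assumptions.

```python
from typing import List, Tuple

# Hypothetical log entry: (timestamp in seconds, subject, recognized text).
LogEntry = Tuple[float, str, str]


def split_sessions(entries: List[LogEntry], gap_threshold_s: float) -> List[List[LogEntry]]:
    """Split log entries into blocks separated by gaps larger than the threshold."""
    sessions: List[List[LogEntry]] = []
    current: List[LogEntry] = []
    for entry in sorted(entries, key=lambda e: e[0]):
        if current and entry[0] - current[-1][0] > gap_threshold_s:
            sessions.append(current)
            current = []
        current.append(entry)
    if current:
        sessions.append(current)
    return sessions


def same_subject_pairs(session: List[LogEntry]) -> List[Tuple[LogEntry, LogEntry]]:
    """Return pairs of adjacent voice inputs in one block that share a subject."""
    return [(a, b) for a, b in zip(session, session[1:]) if a[1] == b[1]]


log = [
    (0.0, "song", "I want to hear Zhang San's new song"),
    (20.0, "song", "This song is not what I want"),
    (500.0, "weather", "What is the weather today"),
]
sessions = split_sessions(log, gap_threshold_s=60.0)
pairs = same_subject_pairs(sessions[0])
```

Each pair returned by `same_subject_pairs` is a candidate for one voice dialogue feature: the first input supplies the intent and the objective data, and the second supplies the subjective text.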

In one possible design, before taking the objective evaluation and the subjective evaluation as inputs of the satisfaction evaluation model to obtain the satisfaction with the voice dialogue output by the satisfaction evaluation model, the method further comprises: acquiring a training sample set that includes a first sample evaluation obtained by evaluating objective sample data, a second sample evaluation obtained by evaluating subjective sample data, and a satisfaction entered by the user, the objective sample data and the subjective sample data being data on the same subject; and obtaining the satisfaction evaluation model through iterative training based on the training sample set.

In a second aspect, an embodiment of the present invention provides a device for determining satisfaction with voice dialogue, comprising: an acquisition module configured to acquire voice dialogue features that include objective data of the voice dialogue and subjective data of the voice dialogue, the objective data and the subjective data being data on the same subject; a processing module configured to evaluate the objective data to obtain an objective evaluation and to evaluate the subjective data to obtain a subjective evaluation; and a determination module configured to take the objective evaluation and the subjective evaluation as inputs of a satisfaction evaluation model and obtain the satisfaction with the voice dialogue output by the satisfaction evaluation model.

In one possible design, the objective data of the voice dialogue includes the user intent and the response content, the response delay, and the current playback time of the response content; the subjective data of the voice dialogue includes text information corresponding to the user's voice input after playback of the response content is interrupted, or text information entered by the user after playback of the response content has finished.

In one possible design, the processing module is specifically configured to obtain a first objective evaluation based on the degree of intent match between the user intent and the response content, obtain a second objective evaluation based on the response delay and a standard delay, and obtain a third objective evaluation based on the current playback time of the response content and the standard playback time of the response content. The determination module is specifically configured to take the first objective evaluation, the second objective evaluation, the third objective evaluation, and the subjective evaluation as inputs of the satisfaction evaluation model and obtain the satisfaction with the voice dialogue output by the satisfaction evaluation model.

In one possible design, the processing module is specifically configured to perform semantic analysis on the text information to obtain a content attribute corresponding to the text information, the content attribute being either an emotional attribute or a subject attribute, and to obtain the subjective evaluation based on the content attribute corresponding to the text information.

In one possible design, when the content attribute is a subject attribute, the processing module is specifically configured to obtain the subject type corresponding to the text information and, if the subject type corresponding to the text information matches the subject type corresponding to the user intent, determine that the subjective evaluation is lower than a predetermined evaluation value.

In one possible design, when the content attribute is an emotional attribute, the processing module is specifically configured to extract emotion keywords from the text information, obtain, based on a correspondence between emotion keywords and mood types, one mood type from the group consisting of a positive mood, a negative mood, and a neutral mood, and obtain the subjective evaluation based on a correspondence between mood types and predetermined evaluations.

In one possible design, the acquisition module is specifically configured to acquire first log data whose time interval from second log data in the immediately preceding time period and from third log data in the immediately following time period is greater than a predetermined threshold, acquire from the first log data the subject corresponding to each of two adjacent voice inputs by the user, and acquire the voice dialogue features based on the subjects corresponding to the two adjacent voice inputs.

In one possible design, the device further comprises a training module. Before the objective evaluation and the subjective evaluation are taken as inputs of the satisfaction evaluation model to obtain the satisfaction with the voice dialogue output by the model, the training module is configured to acquire a training sample set that includes a first sample evaluation obtained by evaluating objective sample data, a second sample evaluation obtained by evaluating subjective sample data, and a satisfaction entered by the user, the objective sample data and the subjective sample data being data on the same subject, and to obtain the satisfaction evaluation model through iterative training based on the training sample set.

In a third aspect, an embodiment of the present invention provides a device for determining satisfaction with voice dialogue, comprising at least one processor and a memory. The memory stores computer-executable instructions, and the at least one processor executes the computer-executable instructions stored in the memory so that the at least one processor performs the method for determining satisfaction with voice dialogue according to the first aspect or any of the possible designs of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium that stores computer-executable instructions. When the computer-executable instructions are executed by a processor, the method for determining satisfaction with voice dialogue according to the first aspect or any of the possible designs of the first aspect is implemented.

The method and device for determining satisfaction with voice dialogue provided by the embodiments of the present invention acquire voice dialogue features that include objective data and subjective data of the voice dialogue on the same subject. Because subjective and objective data on the same subject are both collected, the data used for the satisfaction evaluation are true and comprehensive, so the resulting satisfaction is more complete and closer to the user's actual evaluation. The objective data is evaluated to obtain an objective evaluation, the subjective data is evaluated to obtain a subjective evaluation, and both evaluations are input to the satisfaction evaluation model to obtain the satisfaction with the voice dialogue output by the model. Because satisfaction is obtained through the model, it can be obtained quickly and accurately, which makes the method suitable for large volumes of voice dialogue.

To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the architecture of the system for determining satisfaction with voice dialogue provided by an embodiment of the present invention.
FIG. 2 is a schematic diagram of obtaining the satisfaction evaluation model provided by an embodiment of the present invention.
FIG. 3 is a first flowchart of the method for determining satisfaction with voice dialogue provided by an embodiment of the present invention.
FIG. 4 is a schematic diagram of log data provided by an embodiment of the present invention.
FIG. 5 is a second flowchart of the method for determining satisfaction with voice dialogue provided by an embodiment of the present invention.
FIG. 6 is a flowchart of obtaining the subjective evaluation provided by an embodiment of the present invention.
FIG. 7 is a schematic diagram of the structure of the device for determining satisfaction with voice dialogue provided by an embodiment of the present invention.
FIG. 8 is a hardware configuration diagram of the device for determining satisfaction with voice dialogue provided by an embodiment of the present invention.

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The embodiments described are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

FIG. 1 is a schematic diagram of the architecture of the system for determining satisfaction with voice dialogue provided by an embodiment of the present invention. As shown in FIG. 1, the system provided by this embodiment includes a terminal 101 and a server 102. The terminal 101 may be a children's story machine, a mobile phone, a tablet, an in-vehicle terminal, or the like. This embodiment does not limit how the terminal 101 is implemented, as long as the terminal 101 can conduct a voice dialogue with the user.

Voice dialogue (Speech Interaction) is a smart human-machine interaction experience that uses technologies such as speech recognition, speech synthesis, and natural language analysis to give a terminal the ability to "listen, speak, and understand" in a variety of practical application scenarios. It applies to many scenarios, including smart question answering, smart playback, and intelligent search.

The user enters a query by voice into the terminal 101. The terminal 101 derives the user intent from the query, obtains response content corresponding to the intent either locally or from the server 102, and feeds the response content back to the user. Examples include ordering food, booking tickets, and searching for music, movies, or a particular product.

Because the terminal 101 can provide users with a variety of voice dialogue services, obtaining the user's true and comprehensive evaluation of the voice dialogue is important for developing and upgrading the terminal's voice dialogue function.

The embodiments of the present invention provide a method for determining satisfaction with voice dialogue that can provide a true and comprehensive evaluation of the voice dialogue. The method can be executed by the terminal 101 shown in FIG. 1: the terminal 101 determines the satisfaction from log data and then feeds the satisfaction back to the server 102, which performs further processing based on it. The method can also be executed by the server 102 shown in FIG. 1: the terminal 101 sends the log data to the server 102, and the server 102 determines the satisfaction and performs further processing.

This embodiment places no particular restriction on which entity executes the method; it may be executed by the terminal 101 or by the server 102. After acquiring the log data, the terminal 101 and the server 102 may both use the same method to determine satisfaction with the voice dialogue.

In this embodiment, satisfaction is determined by a satisfaction evaluation model so that it can be determined quickly. The process of obtaining the satisfaction evaluation model is described first, with reference to FIG. 2.

FIG. 2 is a schematic diagram of obtaining the satisfaction evaluation model provided by an embodiment of the present invention. In this embodiment, satisfaction is obtained by evaluating both objective data and subjective data. Accordingly, when the satisfaction evaluation model is obtained, objective sample data and subjective sample data are considered together so that the model reflects the user's true and comprehensive evaluation of the voice dialogue. The model may be obtained by the terminal or by the server; alternatively, after the server obtains the model, it may send the model to the terminal in the form of an installation package.

As shown in FIG. 2, a training sample set is acquired first. The training sample set includes a first sample evaluation, a second sample evaluation, and a satisfaction entered by the user. As a person skilled in the art will understand, the satisfaction evaluation model can be obtained by performing iterative training on a suitable amount of data from the training sample set, that is, on the first sample evaluations, the second sample evaluations, and the satisfaction values entered by users.

The first sample evaluation is obtained by evaluating objective sample data, and the second sample evaluation is obtained by evaluating subjective sample data. In this embodiment, the objective sample data and the subjective sample data are data on the same subject.

The objective sample data is data that carries no emotional coloring from the user, for example data about the terminal. The subjective sample data is data that carries the user's emotional coloring. Any data on the same subject can be collected and classified as either subjective or objective.

In a specific example, the user intent obtained from the user's voice input is "I want to hear Zhang San's new song," and the response content returned by the terminal for that intent is the audio file of "XX song." To give subjective feedback on "XX song," the user stops playback of the audio file, then says "This song is not what I want," and enters his or her own satisfaction through the terminal. During this process, the playback time and the response delay of "XX song" are recorded. The user has thus carried out two rounds of dialogue with the terminal on the subject "song."

In this process, the objective sample data may include the user intent and the response content, the response delay, and the current playback time of the response content. An objective evaluation can be obtained from these objective sample data using a predetermined algorithm. The predetermined algorithm may be, for example, a function of the objective sample data; this embodiment does not limit the algorithm.

The subjective sample data includes text information corresponding to the user's voice input after playback of the response content is interrupted. A subjective evaluation can be obtained by performing semantic analysis on this text information.

As a person skilled in the art will understand, the objective evaluation and the subjective evaluation may be specific evaluation values, and the satisfaction entered by the user may also be a specific value. The satisfaction evaluation model can be obtained by performing iterative training on these values.

Optionally, different objective sample data may be evaluated separately to obtain a corresponding evaluation value for each. For example, an evaluation value x1 may be obtained from the user intent and the response content, an evaluation value y1 from the response delay, and an evaluation value z1 from the current playback time of the response content. The evaluation value corresponding to the subjective evaluation is p1, and the satisfaction entered by the user is n1.

This embodiment gives one possible satisfaction evaluation model. The satisfaction evaluation model may be Ax + By + Cz + Dp = n. By substituting the evaluation values into the model and performing iterative training, A, B, C, and D can be obtained, which yields the satisfaction evaluation model. Once new evaluation values have been obtained, substituting them directly into the model gives the final satisfaction level n. This embodiment uses one type of model to illustrate how a satisfaction evaluation model is constructed, but any other form of model that can provide a satisfaction level from the subjective evaluation and the objective evaluation falls within the scope of protection of this embodiment. This embodiment places no restriction on the specific implementation of the satisfaction evaluation model.
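The linear model Ax + By + Cz + Dp = n can be illustrated with a minimal sketch. The text only says that the coefficients are obtained by iterative training; batch gradient descent on the squared error is an assumption chosen here for illustration, and the names `predict` and `train_satisfaction_model`, the learning rate, and the epoch count are all hypothetical.

```python
from typing import List, Sequence, Tuple

# One training sample: evaluation values (x, y, z, p) and the
# satisfaction level n that the user entered for that dialogue.
Sample = Tuple[float, float, float, float, float]


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Evaluate A*x + B*y + C*z + D*p for weights (A, B, C, D)."""
    return sum(w * f for w, f in zip(weights, features))


def train_satisfaction_model(
    samples: Sequence[Sample],
    learning_rate: float = 0.05,  # assumed value, not from the source
    epochs: int = 5000,           # assumed value, not from the source
) -> List[float]:
    """Fit (A, B, C, D) by batch gradient descent on the mean squared error."""
    if not samples:
        raise ValueError("at least one sample is required")
    weights = [0.0, 0.0, 0.0, 0.0]
    for _ in range(epochs):
        gradients = [0.0, 0.0, 0.0, 0.0]
        for *features, target in samples:
            error = predict(weights, features) - target
            for i, feature in enumerate(features):
                gradients[i] += 2.0 * error * feature / len(samples)
        weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
    return weights
```

After training, `predict(weights, (x, y, z, p))` returns the satisfaction level n for newly computed evaluation values. Any other regression method that recovers A, B, C, and D would serve the same purpose.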

The following uses detailed embodiments to explain how the satisfaction level is obtained with the satisfaction evaluation model in the embodiments of the present invention.

FIG. 3 is a first flowchart of the method for determining satisfaction with voice dialogue provided by an embodiment of the present invention. The method may be executed by the terminal or the server shown in FIG. 1. As shown in FIG. 3, the method includes the following steps.

In S301, voice dialogue features are acquired. The voice dialogue features include objective data of the voice dialogue and subjective data of the voice dialogue, where the objective data and the subjective data are data on the same subject.

The voice dialogue features of this embodiment can be obtained from the log data of the terminal. In a specific implementation, after acquiring the voice input by the user, the terminal converts the voice into text information, obtains the user intention from the text information, obtains response content according to the user intention, and feeds the response content back to the user. The user can then give subjective feedback on the response content.

For the dialogue data between the user and the terminal, once the subject corresponding to each piece of data has been obtained, the subjective data and the objective data on the same subject can be obtained. Specifically, semantic analysis, content analysis, time-series analysis of when the data was generated, and the like can be performed on the dialogue data to produce the subjective data and the objective data on the same subject. This embodiment places no restriction on how data on the same subject is obtained.

Optionally, in a specific example, the voice input time, the identifiers, attributes, and the like of the text information and the response content, and the response time all form log data. As those skilled in the art will understand, a time is recorded for each log entry in the log data, and the first log data is obtained based on this time information. The first log data is a series of log data in which the user and the terminal interact continuously.

In a specific implementation, the time interval between consecutive log records is obtained. When two time intervals are each larger than a predetermined threshold, the log data between those two time intervals is obtained as the first log data. The first log data is all of the data between the two time intervals.

As those skilled in the art will understand, the time interval between the first log data and the second log data in the adjacent preceding time period, and between the first log data and the third log data in the adjacent following time period, is larger than the predetermined threshold. The second log data and the third log data may each be understood as a single log record or, like the first log data, as all of the data between two time intervals. FIG. 4 is a schematic diagram of log data provided by an embodiment of the present invention. As shown in FIG. 4, the central portion of the time axis shows the recording time of each piece of data in the first log data. As those skilled in the art will understand, the first log data, the second log data, and the third log data correspond to data partitioned by session.
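The session partitioning described above can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the `LogRecord` type, its fields, and the function name `split_into_sessions` are hypothetical, and a session boundary is taken to be any gap between consecutive records that exceeds the threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogRecord:
    timestamp: float  # seconds; hypothetical field
    text: str         # recognized text or response identifier; hypothetical field


def split_into_sessions(records: List[LogRecord], gap_threshold: float) -> List[List[LogRecord]]:
    """Group records so that consecutive records within a session are at most
    gap_threshold apart; a larger gap starts a new session."""
    sessions: List[List[LogRecord]] = []
    current: List[LogRecord] = []
    for record in sorted(records, key=lambda r: r.timestamp):
        if current and record.timestamp - current[-1].timestamp > gap_threshold:
            sessions.append(current)
            current = []
        current.append(record)
    if current:
        sessions.append(current)
    return sessions
```

In terms of the text, any session that has one over-threshold gap before it and another after it plays the role of the first log data, and its neighbours play the roles of the second and third log data.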

After the first log data is obtained, the voice dialogue features are acquired from the first log data based on the text information corresponding to two adjacent voice inputs by the user.

As those skilled in the art will understand, in a voice dialogue the interaction between the user and the terminal proceeds as voice input, response content, semantic input, response content, and so on. Voice input and response content thus occur repeatedly.

After a voice input is converted into text information, semantic analysis can be performed on the text information. The text information may be a user intention or user feedback. In this embodiment, when two adjacent pieces of text information are a user intention and user feedback respectively, the user intention, the user feedback, and the related information of the response content fed back by the terminal are extracted to obtain the voice dialogue features. That is, the feature information is extracted from the sequence voice input, response content, semantic input, which corresponds to user intention, response content, user feedback. If the user feedback contains nothing related to another subject, the user feedback is considered to still belong to the subject corresponding to the user intention; that is, the subject has not changed. The subjective data is the user feedback, and the objective data includes the user intention and the related information of the response content.

As those skilled in the art will understand, if a continuous voice input involves two different subjects at the same time, the voice input can be divided into two parts. For example, if the first part relates to one subject and the latter part relates to another, voice dialogue features can be extracted from the first part under the first subject and from the latter part under the other subject.

In S302, evaluation processing is performed on the objective data to obtain the objective evaluation, and evaluation processing is performed on the subjective data to obtain the subjective evaluation.

As those skilled in the art will understand, the objective data and the subjective data each consist of several pieces of data, which differ in form, format, or type. To unify the inputs of the satisfaction evaluation model, evaluation processing is performed on the objective data to obtain the objective evaluation, and on the subjective data to obtain the subjective evaluation.

Both the objective evaluation and the subjective evaluation are evaluation values. The evaluation value may be calculated based on the total value of satisfaction, or may be determined separately for different data types. Optionally, the evaluation value may range from -1 to 1.

As those skilled in the art will understand, the evaluation processing methods for the objective data and the subjective data are the same as those used when the satisfaction evaluation model was established, which ensures the soundness of the evaluation values and of the satisfaction evaluation model.

In S303, the objective evaluation and the subjective evaluation are used as inputs of the satisfaction evaluation model, and the satisfaction level of the voice dialogue output by the satisfaction evaluation model is obtained.

After the objective evaluation and the subjective evaluation are obtained, they are input to the satisfaction evaluation model, which performs calculation, analysis, and the like. The output of the satisfaction evaluation model is the satisfaction level of the voice dialogue.

In the method for determining satisfaction with voice dialogue provided by the embodiment of the present invention, voice dialogue features are acquired that include objective data and subjective data of the voice dialogue on the same subject. Because subjective data and objective data on the same subject are acquired, the data used for satisfaction evaluation is obtained truthfully and comprehensively, which ensures the authenticity and completeness of the evaluation data, so that the satisfaction level is more comprehensive and closer to the user's real evaluation. Evaluation processing is performed on the objective data to obtain the objective evaluation and on the subjective data to obtain the subjective evaluation, and the two evaluations are used as inputs of the satisfaction evaluation model to obtain the satisfaction level of the voice dialogue output by the model. Because the satisfaction level is obtained by means of the satisfaction evaluation model, it can be obtained quickly and accurately, so the method is applicable when the volume of voice dialogue is large.

The following describes in detail specific implementations for obtaining the objective evaluation and the subjective evaluation. In this embodiment, the objective data of the voice dialogue includes the user intention and the response content, the response delay, and the current playback time of the response content. The subjective data of the voice dialogue includes the text information corresponding to the user's voice input after playback of the response content is interrupted.

The higher the intent match degree between the user intention and the response content, the shorter the response delay, and the longer the current playback time of the response content, the higher the user's satisfaction and the larger the objective evaluation value.

The mood information of the text information is obtained. The better the user's mood, the larger the subjective evaluation value.

A detailed description follows with reference to FIG. 5 and FIG. 6. The implementations described below for obtaining the objective evaluation and the subjective evaluation may be applied to the model establishment embodiment shown in FIG. 2 above, or to the satisfaction determination embodiment shown in FIG. 3.

FIG. 5 is a second flowchart of the method for determining satisfaction with voice dialogue provided by an embodiment of the present invention. As shown in FIG. 5, the method includes the following.

In S501, voice dialogue features are acquired. The voice dialogue features include objective data of the voice dialogue and subjective data of the voice dialogue. The objective data of the voice dialogue includes the user intention and the response content, the response delay, and the current playback time of the response content.

Specifically, the way voice dialogue features are acquired in S501 is similar to the way they are acquired in S301. The objective data of this embodiment includes the user intention and the response content, the response delay, and the current playback time of the response content.

The objective data includes the user intention and the terminal's various kinds of feedback on that user intention. An intent is an operation on domain data and is generally named with a verb-object phrase, for example, querying the weather or searching for music. The terminal's feedback on the user intention includes the response content, the response delay, and the current playback time of the response content.

In S502, a first objective evaluation is obtained based on the intent match degree between the user intention and the response content.

Semantic analysis is performed on the user intention to obtain content such as the domain attributes in the user intention, and content such as the domain attributes of the response content is extracted. The intent match degree is determined based on the similarity of the domain attributes. The intent match degree may take a value from 0% to 100%. It can be converted into the first objective evaluation, whose value accordingly ranges from 0 to 1. For example, if the user intention is song A by Zhang San, the response content is song B by Zhang San, and the domain attributes are the singer name and the song name, the similarity is 50% and the corresponding first objective evaluation is 0.5.
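The singer and song example can be reproduced with a small sketch. The text does not specify how the similarity is computed; the fraction of the intent's domain attributes that the response matches exactly is an assumption that happens to yield the 50% of the example. The function name `intent_match_degree` and the attribute keys are hypothetical.

```python
from typing import Dict


def intent_match_degree(intent_attrs: Dict[str, str], response_attrs: Dict[str, str]) -> float:
    """Return the fraction of the intent's domain attributes that the response
    reproduces exactly, as a value in [0, 1] (the first objective evaluation)."""
    if not intent_attrs:
        return 0.0
    matched = sum(
        1 for name, value in intent_attrs.items() if response_attrs.get(name) == value
    )
    return matched / len(intent_attrs)


# The example from the text: same singer, different song.
intent = {"singer": "Zhang San", "song": "A"}
response = {"singer": "Zhang San", "song": "B"}
first_objective_evaluation = intent_match_degree(intent, response)
```

A production system would more likely use a graded similarity per attribute rather than exact equality; exact matching is used here only to keep the sketch short.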

In S503, a second objective evaluation is obtained based on the response delay and a standard delay.

The standard delay may be a preset delay that is acceptable to the user. For example, the standard delay is 200 ms, and the second objective evaluation corresponding to the standard delay is 0. When the response delay is larger than the standard delay, the second objective evaluation is less than 0; when the response delay is smaller than the standard delay, the second objective evaluation is greater than 0.

In one possible implementation, normalization is performed based on the standard delay so that the second objective evaluation lies between -1 and 1. For example, when the response delay is larger than the standard delay: if the difference between the response delay and the standard delay is larger than the standard delay, the second objective evaluation is -1; if the difference is smaller than the standard delay, the second objective evaluation is the negative of the ratio of the difference to the standard delay. When the response delay is smaller than the standard delay, the difference between the standard delay and the response delay is obtained, and the ratio of that difference to the standard delay is used as the second objective evaluation.
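The normalization rule can be written out as a short function. This sketch assumes that a response slower than twice the standard delay is clamped to -1, in line with the rule that a delay above the standard yields a negative evaluation. The function name `delay_score` is hypothetical; the 200 ms default is the example value from the text.

```python
def delay_score(response_delay_ms: float, standard_delay_ms: float = 200.0) -> float:
    """Map a response delay to the second objective evaluation in [-1, 1].

    0 at the standard delay, positive when faster, negative when slower,
    and clamped to -1 (assumed) once the excess exceeds the standard delay.
    """
    if standard_delay_ms <= 0:
        raise ValueError("standard_delay_ms must be positive")
    if response_delay_ms > standard_delay_ms:
        excess = response_delay_ms - standard_delay_ms
        if excess > standard_delay_ms:
            return -1.0
        return -excess / standard_delay_ms
    return (standard_delay_ms - response_delay_ms) / standard_delay_ms
```

With the 200 ms standard, a 100 ms response scores 0.5, a 300 ms response scores -0.5, and anything slower than 400 ms scores -1.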

In S504, a third objective evaluation is obtained based on the current playback time of the response content and the standard playback time of the response content.

In this embodiment, the longer the current playback time of the response content, the higher the user's satisfaction. The ratio of the current playback time to the standard playback time can be used as the third objective evaluation.
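As a brief illustration, the ratio can be computed as below. Clamping the result to the range 0 to 1 is an assumption added here so that replays or timing jitter cannot push the value outside the range used for the other evaluations; the text itself only specifies the ratio. The function name `playback_score` is hypothetical.

```python
def playback_score(current_playback_s: float, standard_playback_s: float) -> float:
    """Return current/standard playback time as the third objective evaluation,
    clamped to [0, 1] (the clamp is an assumption, not stated in the source)."""
    if standard_playback_s <= 0:
        raise ValueError("standard_playback_s must be positive")
    ratio = current_playback_s / standard_playback_s
    return max(0.0, min(1.0, ratio))
```

For instance, a 120-second song interrupted after 30 seconds gives a ratio of 0.25, while a song played to the end gives 1.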

In S505, evaluation processing is performed on the subjective data to obtain the subjective evaluation.

In S506, the first objective evaluation, the second objective evaluation, the third objective evaluation, and the subjective evaluation are used as inputs of the satisfaction evaluation model, and the satisfaction level of the voice dialogue output by the satisfaction evaluation model is obtained.

With the first objective evaluation, the second objective evaluation, the third objective evaluation, and the subjective evaluation as inputs, the satisfaction evaluation model processes them and directly outputs the satisfaction level of the voice dialogue.

As those skilled in the art will understand, when the satisfaction evaluation model is established, the inputs of the model are the first objective evaluation, the second objective evaluation, the third objective evaluation, and the subjective evaluation corresponding to the sample data, together with the satisfaction level input by the user. Iterative training is then performed to obtain the satisfaction evaluation model.

By obtaining separate objective evaluations for the user intention and response content, the response delay, and the current playback time of the response content, the embodiment of the present invention combines various objective factors such as content, delay, and playback time in the objective evaluation to obtain the user's satisfaction level, so that the satisfaction level is accurate and comprehensive.

FIG. 6 is a flowchart for obtaining the subjective evaluation provided by an embodiment of the present invention. The subjective evaluation provided by this embodiment is applicable to any of the above embodiments. The subjective data of the voice dialogue corresponding to this subjective evaluation includes the text information corresponding to the user's voice input after playback of the response content is interrupted, or the text information input by the user after playback of the response content has finished; that is, the user's feedback on the response content. The feedback may be direct feedback or emotional feedback. As shown in FIG. 6, the method includes the following.

In S601, semantic analysis is performed on the text information to obtain the content attribute corresponding to the text information. The content attribute is an emotional attribute or a subject attribute.

After the text information corresponding to the user's voice feedback on the response content is obtained, semantic analysis is performed on the text information to obtain the corresponding content attribute. In this embodiment, content attributes are divided into emotional attributes and subject attributes. An emotional attribute means that the content conveys the user's emotion, and a subject attribute means that the user is performing a further operation on the current subject.

In a specific implementation, keywords can be extracted from the text information by semantic analysis and matched against the words in an emotion bank or a subject bank to determine the content attribute corresponding to the text information.
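The keyword matching step can be sketched as follows. The bank contents, the function name `content_attribute`, and the choice to check the emotion bank before the subject bank are all assumptions made for illustration; the text does not say how a conflict between the two banks is resolved.

```python
from typing import Iterable, Optional, Set

# Hypothetical word banks; a real system would load much larger lexicons.
EMOTION_BANK: Set[str] = {"great", "love", "terrible", "annoying"}
SUBJECT_BANK: Set[str] = {"play", "song", "weather", "next"}


def content_attribute(
    keywords: Iterable[str],
    emotion_bank: Set[str] = EMOTION_BANK,
    subject_bank: Set[str] = SUBJECT_BANK,
) -> Optional[str]:
    """Classify extracted keywords as "emotion" or "subject", or return None
    when no keyword appears in either bank. Emotion takes priority (assumed)."""
    normalized = {keyword.lower() for keyword in keywords}
    if normalized & emotion_bank:
        return "emotion"
    if normalized & subject_bank:
        return "subject"
    return None
```

The returned label decides which branch of the flow applies: the subject branch (S602 and S603) or the emotion branch (S604 to S606).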

In S602, the subject type corresponding to the text information is obtained.

In S603, if the subject type corresponding to the text information matches the subject type corresponding to the user intention, the subjective evaluation is determined to be lower than a predetermined evaluation value.

S602 and S603 apply when the content attribute of the text information is the subject attribute. If the subject type corresponding to the text information matches the subject type corresponding to the user intention, the user is not satisfied with the response content fed back on the current subject and is therefore repeating the input. The subjective evaluation is accordingly lower than the predetermined evaluation value. The specific evaluation value can be determined based on the number of times the user repeats the input and the length of the input text.

In S604, emotion keywords are extracted from the text information.

In S605, a mood type is obtained based on the correspondence between emotion keywords and mood types. The mood types include positive mood, negative mood, and neutral mood.

In S606, the subjective evaluation is obtained based on the correspondence between mood types and predetermined evaluations.

S604 to S606 apply when the content attribute is the emotional attribute. Emotion keywords can be extracted from the text information. In this embodiment, a mood database can be set up in advance. The mood database includes a positive mood sub-database, a negative mood sub-database, and a neutral mood sub-database.

The emotion keyword is matched against the words in the mood database. If the emotion keyword matches the positive mood sub-database, the mood type is positive mood and the corresponding subjective evaluation value is 1. If the emotion keyword matches the negative mood sub-database, the mood type is negative mood and the corresponding subjective evaluation value is -1. If the emotion keyword matches the neutral mood sub-database, the mood type is neutral mood and the corresponding subjective evaluation value is 0.
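The mapping from emotion keyword to mood type to evaluation value can be sketched with two lookup tables. The values 1, -1, and 0 come from the text; the sub-database contents and the names `mood_type` and `subjective_score` are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical contents of the three mood sub-databases.
MOOD_SUBDATABASES: Dict[str, Set[str]] = {
    "positive": {"great", "love", "perfect"},
    "negative": {"terrible", "annoying", "wrong"},
    "neutral": {"okay", "fine"},
}

# Evaluation values given in the text for each mood type.
MOOD_SCORES: Dict[str, float] = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}


def mood_type(emotion_keyword: str) -> Optional[str]:
    """Return the mood type whose sub-database contains the keyword, if any."""
    word = emotion_keyword.lower()
    for mood, words in MOOD_SUBDATABASES.items():
        if word in words:
            return mood
    return None


def subjective_score(emotion_keyword: str) -> Optional[float]:
    """Return the subjective evaluation for the keyword, or None if unmatched."""
    mood = mood_type(emotion_keyword)
    return None if mood is None else MOOD_SCORES[mood]
```

Returning `None` for an unmatched keyword is a design choice of this sketch; it leaves the caller free to decide whether a missing match should be treated as neutral.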

By analyzing the user's subjective feedback, the embodiment of the present invention can obtain the user's subjective satisfaction with the voice dialogue. When determining the subjective evaluation, the embodiment analyzes the user's mood and the user's repeated instructions on the same subject, so that the user's various operations can be evaluated, none of the user's expressions is glossed over, and the user's satisfaction is reflected truthfully.

The above embodiments show the process of determining satisfaction for a dialogue on a single subject. In a specific implementation, the satisfaction levels of dialogues on multiple subjects can be obtained in succession. The satisfaction level for the dialogue on each subject can be determined as described in the above embodiments. Several specific examples are given below to explain how satisfaction is obtained for dialogues on multiple subjects.

One specific example is: user intention 1, response content 1, user intention 2, response content 2, subjective feedback + user intention 3, response content 3, and so on. As can be seen, the user gives no feedback on user intention 1 and response content 1, so only the objective evaluation needs to be obtained. The objective evaluation can be implemented as described in the above embodiments. The satisfaction evaluation model may involve only the objective evaluation, or may involve both the objective and the subjective evaluation with the subjective evaluation input set to 0. For user intention 2, response content 2, and the subjective feedback, the satisfaction determination method described above can be used. For user intention 3 and response content 3, the satisfaction level can be determined according to the specific implementation situation.
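The option of setting the subjective input to 0 when no feedback exists can be illustrated with the linear model form Ax + By + Cz + Dp = n. The function name `turn_satisfaction` and the use of `None` to mark missing feedback are assumptions of this sketch, and the weights in the note below are made-up numbers, not trained values.

```python
from typing import Optional, Sequence


def turn_satisfaction(
    weights: Sequence[float],
    x: float,
    y: float,
    z: float,
    p: Optional[float] = None,
) -> float:
    """Compute n = A*x + B*y + C*z + D*p for one subject's dialogue, using
    p = 0 when the user gave no subjective feedback (p is None)."""
    a, b, c, d = weights
    subjective = 0.0 if p is None else p
    return a * x + b * y + c * z + d * subjective
```

With illustrative weights (0.4, 0.2, 0.1, 0.3), a dialogue with objective evaluations x = 1.0, y = 0.5, z = 1.0 and no feedback scores about 0.6, and the same dialogue followed by negative feedback (p = -1) scores about 0.3.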

After multiple satisfaction levels are obtained, the terminal or the server performs various analyses on the satisfaction data to assess the product performance of the terminal.

As those skilled in the art will understand, the way of assigning the evaluation values for each subjective or objective evaluation above is exemplary, and other ways of assigning values may be adopted. For example, the assignment differs between a 5-point, a 10-point, and a 100-point scale; this embodiment places no restriction on the specific way of assigning values.

FIG. 7 is a schematic structural diagram of the device for determining satisfaction with voice dialogue provided by an embodiment of the present invention. As shown in FIG. 7, the device 70 includes an acquisition module 701, a processing module 702, and a determination module 703, and optionally further includes a training module 704.

The acquisition module 701 acquires voice dialogue features. The voice dialogue features include objective data of the voice dialogue and subjective data of the voice dialogue, where the objective data and the subjective data are data on the same subject.

The processing module 702 performs evaluation processing on the objective data to obtain the objective evaluation, and performs evaluation processing on the subjective data to obtain the subjective evaluation.

The determination module 703 uses the objective evaluation and the subjective evaluation as inputs of the satisfaction evaluation model and obtains the satisfaction level of the voice dialogue output by the satisfaction evaluation model.

選択的に、前記音声対話の客観データには、ユーザ意図と応答内容、応答遅延、及び応答内容の現在再生時間が含まれる。前記音声対話の主観データには、応答内容の再生が中断された後にユーザによる音声入力に対応するテキスト情報、又は応答内容の再生が終了した後にユーザから入力されたテキスト情報が含まれる。 Optionally, the objective data of the voice dialogue includes the user's intention and response content, response delay, and the current playback time of the response content. The subjective data of the voice dialogue includes text information corresponding to the voice input by the user after the reproduction of the response content is interrupted, or text information input by the user after the reproduction of the response content is completed.

Optionally, the processing module 702 specifically obtains a first objective evaluation based on the degree of intention match between the user intention and the response content, obtains a second objective evaluation based on the response delay and a standard delay, and obtains a third objective evaluation based on the current playback time of the response content and the standard playback time of the response content.

The confirmation module 703 specifically takes the first objective evaluation, the second objective evaluation, the third objective evaluation, and the subjective evaluation as inputs to the satisfaction evaluation model and obtains the satisfaction with the voice dialogue output by the satisfaction evaluation model.
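The three objective evaluations can be sketched as follows. This passage only states which quantities each evaluation is based on, so the ratio-based scoring, the [0, 1] range, and all function and parameter names below are illustrative assumptions rather than the scoring rules of the embodiment.

```python
def clamp01(x: float) -> float:
    """Clamp a value into the range [0, 1]."""
    return max(0.0, min(1.0, x))


def delay_score(response_delay_s: float, standard_delay_s: float) -> float:
    """Second objective evaluation (assumed rule): full score at or below
    the standard delay, decaying as the delay grows beyond it."""
    if response_delay_s <= standard_delay_s:
        return 1.0
    return clamp01(standard_delay_s / response_delay_s)


def playback_score(current_playback_s: float, standard_playback_s: float) -> float:
    """Third objective evaluation (assumed rule): fraction of the response
    content that was played before playback stopped."""
    if standard_playback_s <= 0:
        return 0.0
    return clamp01(current_playback_s / standard_playback_s)


def objective_evaluations(intent_match: float,
                          response_delay_s: float,
                          standard_delay_s: float,
                          current_playback_s: float,
                          standard_playback_s: float) -> tuple:
    """Return (first, second, third) objective evaluations.

    intent_match is assumed to be a precomputed match degree in [0, 1]."""
    return (
        clamp01(intent_match),
        delay_score(response_delay_s, standard_delay_s),
        playback_score(current_playback_s, standard_playback_s),
    )
```

For example, a response with an intention match degree of 0.8, a 2-second delay against a 1-second standard, and 15 of 30 seconds played would yield the tuple (0.8, 0.5, 0.5) under these assumed rules. The tuple, together with the subjective evaluation, would then be passed to the satisfaction evaluation model.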

Optionally, the processing module 702 specifically performs semantic analysis on the text information to obtain a content attribute corresponding to the text information, the content attribute being an emotional attribute or a subject attribute, and obtains the subjective evaluation based on the content attribute corresponding to the text information.

Optionally, if the content attribute is a subject attribute, the processing module 702 specifically obtains the subject type corresponding to the text information and, if that subject type matches the subject type corresponding to the user intention, determines that the subjective evaluation is lower than a predetermined evaluation value.
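A minimal sketch of the subject-attribute branch is given below. The passage only says that a matching subject type leads to a subjective evaluation below a predetermined value, so the concrete values, the penalty, and the behaviour when the subject types differ are assumptions chosen for illustration.

```python
def subjective_from_subject(text_subject: str,
                            intent_subject: str,
                            predetermined: float = 0.5,
                            penalty: float = 0.3) -> float:
    """Return a subjective evaluation for the subject-attribute case.

    predetermined and penalty are hypothetical values. When the follow-up
    text has the same subject type as the original user intention, the
    result is forced below the predetermined evaluation value."""
    if text_subject == intent_subject:
        return max(0.0, predetermined - penalty)
    # Assumption: with no match, fall back to the predetermined value itself.
    return predetermined
```

One possible reading, which this passage does not state, is that a user who asks again about the same subject did not receive a satisfactory response the first time.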

Optionally, if the content attribute is an emotional attribute, the processing module 702 specifically extracts an emotional keyword from the text information, obtains a mood type based on the correspondence between emotional keywords and mood types, the mood type being one of a positive mood, a negative mood, and a neutral mood, and obtains the subjective evaluation based on the correspondence between mood types and predetermined evaluations.
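The emotional-attribute branch amounts to two table lookups, which can be sketched as follows. The keyword list, the evaluation values, and the rule of taking the first keyword found are hypothetical; the passage specifies only that such correspondences exist.

```python
# Hypothetical correspondence between emotional keywords and mood types.
KEYWORD_TO_MOOD = {
    "great": "positive",
    "thanks": "positive",
    "wrong": "negative",
    "stupid": "negative",
    "okay": "neutral",
}

# Hypothetical correspondence between mood types and predetermined evaluations.
MOOD_TO_EVALUATION = {"positive": 1.0, "neutral": 0.5, "negative": 0.0}


def mood_type(text: str) -> str:
    """Return the mood type of the first emotional keyword found in the text.

    Assumption: text without any known keyword is treated as neutral."""
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in KEYWORD_TO_MOOD:
            return KEYWORD_TO_MOOD[word]
    return "neutral"


def subjective_from_emotion(text: str) -> float:
    """Map text to a subjective evaluation via its mood type."""
    return MOOD_TO_EVALUATION[mood_type(text)]
```

A production system would more likely use a trained sentiment classifier than a fixed keyword table; the table is used here only because it mirrors the keyword-to-mood correspondence described above.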

Optionally, the acquisition module 701 specifically acquires first log data whose time interval from second log data in the immediately preceding (previous adjacent) time period and from third log data in the immediately following (next adjacent) time period is greater than a predetermined threshold, obtains from the first log data the subject corresponding to each of two adjacent voice inputs by the user, and obtains the voice dialogue features based on the subjects corresponding to the two adjacent voice inputs.
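The log selection described above can be read as splitting a log into sessions wherever the gap between consecutive entries exceeds the threshold, then pairing adjacent voice inputs inside each session. The sketch below follows that reading; the `(timestamp, subject)` entry layout and the function names are assumptions.

```python
from typing import List, Tuple

LogEntry = Tuple[float, str]  # (timestamp in seconds, subject of the voice input)


def split_sessions(entries: List[LogEntry],
                   gap_threshold_s: float) -> List[List[LogEntry]]:
    """Group log entries into sessions separated by gaps larger than the threshold.

    Each returned session plays the role of the "first log data": it is
    separated from the preceding and following log data by more than the
    predetermined threshold."""
    sessions: List[List[LogEntry]] = []
    for entry in sorted(entries):
        if sessions and entry[0] - sessions[-1][-1][0] <= gap_threshold_s:
            sessions[-1].append(entry)
        else:
            sessions.append([entry])
    return sessions


def adjacent_subject_pairs(session: List[LogEntry]) -> List[Tuple[str, str]]:
    """Return the subjects of each pair of adjacent voice inputs in a session."""
    return [(a[1], b[1]) for a, b in zip(session, session[1:])]
```

Pairs whose two subjects are the same are the candidates from which objective and subjective data for one subject could then be extracted.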

Optionally, before the objective evaluation and the subjective evaluation are input to the satisfaction evaluation model to obtain the satisfaction with the voice dialogue output by the model, the training module 704 acquires a training sample set that includes a first sample evaluation, a second sample evaluation, and a satisfaction level input by the user, and obtains the satisfaction evaluation model through iterative training based on the training sample set. The first sample evaluation is obtained by performing evaluation processing on objective sample data, and the second sample evaluation is obtained by performing evaluation processing on subjective sample data. The objective sample data and the subjective sample data are data for the same subject.
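The iterative training step can be illustrated with a small stand-in model. This passage does not name the model type, so the linear model, the mean-squared-error loss, the batch gradient descent loop, and the hyperparameters below are assumptions used only to show the shape of the training data and the iteration.

```python
from typing import List, Sequence, Tuple

# One training sample: (sample evaluations, satisfaction level input by the user).
Sample = Tuple[Sequence[float], float]


def predict(weights: List[float], bias: float, features: Sequence[float]) -> float:
    """Linear stand-in for the satisfaction evaluation model."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train(samples: List[Sample],
          lr: float = 0.1,
          iterations: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on mean squared error."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(samples)
    for _ in range(iterations):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, target in samples:
            error = predict(weights, bias, features) - target
            for i, x in enumerate(features):
                grad_w[i] += 2 * error * x / n
            grad_b += 2 * error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Each feature vector is (first sample evaluation, second sample evaluation).
samples = [([0.0, 0.0], 0.0), ([1.0, 0.0], 0.5), ([0.0, 1.0], 0.5), ([1.0, 1.0], 1.0)]
weights, bias = train(samples)
satisfaction = predict(weights, bias, [0.5, 0.5])
```

The four samples here are made-up values in which the labelled satisfaction is the average of the two evaluations, so the fitted model should approximately reproduce that average for unseen inputs.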

The device for determining satisfaction with voice dialogue provided by this embodiment can execute the method embodiments described above. Its implementation and technical effects are similar and are not described again here.

FIG. 8 is a hardware configuration diagram of a device for determining satisfaction with voice dialogue according to an embodiment of the present invention. As shown in FIG. 8, the device 80 includes at least one processor 801 and a memory 802. The memory 802 stores computer-executable instructions. The at least one processor 801 executes the computer-executable instructions stored in the memory 802, so that the at least one processor 801 performs the method for determining satisfaction with voice dialogue described above.

For the specific implementation process of the processor 801, refer to the method embodiments described above. The implementation principle and technical effects are similar and are not described again here.

The determination device 80 further includes a communication component 803. The processor 801, the memory 802, and the communication component 803 are connected by a bus 804.

In the embodiments of FIG. 7 and FIG. 8 above, the device for determining satisfaction with voice dialogue may be the terminal or the server shown in FIG. 1.

An embodiment of the present invention further provides a computer-readable storage medium. The computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the method for determining satisfaction with voice dialogue described above.

It should be understood that the devices and methods disclosed in the above embodiments may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into modules is merely a division by logical function, and other divisions may be used in an actual implementation; for example, multiple modules may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or module, and may be electrical, mechanical, or of another form.

The modules described above as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the technical solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each module may exist physically on its own, or two or more modules may be integrated into one unit. The unit into which the modules are integrated may be implemented in the form of hardware, or in the form of hardware together with software functional units.

An integrated module implemented as a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods in the embodiments of the present application.

It should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.

The memory may include high-speed RAM and may further include non-volatile memory (NVM), for example at least one magnetic disk memory; it may also be a memory card, a mobile hard disk, a read-only memory, a magnetic disk, an optical disk, or the like.

The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of representation, the bus in the drawings of the present application is not limited to a single bus or to a single type of bus.

The storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, for example static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an application-specific integrated circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in a terminal or a server.

As those skilled in the art will understand, all or some of the steps of the method embodiments above may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments above. The storage medium includes various media capable of storing program code, such as ROM, RAM, a magnetic disk, or an optical disk.

Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art will understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

A method for determining satisfaction with a voice dialogue, performed by a computer, the method comprising:
a step of acquiring voice dialogue features including objective data of the voice dialogue and subjective data of the voice dialogue, wherein the objective data of the voice dialogue and the subjective data of the voice dialogue are data for the same subject;
a step of performing evaluation processing on the objective data to obtain an objective evaluation, and performing evaluation processing on the subjective data to obtain a subjective evaluation; and
a step of taking the objective evaluation and the subjective evaluation as inputs to a satisfaction evaluation model and obtaining the satisfaction with the voice dialogue output by the satisfaction evaluation model,
wherein the objective data of the voice dialogue includes a user intention and response content, a response delay, and a current playback time of the response content, and
the subjective data of the voice dialogue includes text information corresponding to voice input by the user after playback of the response content is interrupted.

The method according to claim 1, wherein the step of performing evaluation processing on the objective data to obtain the objective evaluation includes:
a step of obtaining a first objective evaluation based on the degree of intention match between the user intention and the response content;
a step of obtaining a second objective evaluation based on the response delay and a standard delay; and
a step of obtaining a third objective evaluation based on the current playback time of the response content and a standard playback time of the response content,
and wherein the step of taking the objective evaluation and the subjective evaluation as inputs to the satisfaction evaluation model and obtaining the satisfaction with the voice dialogue output by the satisfaction evaluation model includes:
a step of taking the first objective evaluation, the second objective evaluation, the third objective evaluation, and the subjective evaluation as inputs to the satisfaction evaluation model and obtaining the satisfaction with the voice dialogue output by the satisfaction evaluation model.

The method according to claim 1, wherein the step of performing evaluation processing on the subjective data to obtain the subjective evaluation includes:
a step of performing semantic analysis on the text information to obtain a content attribute corresponding to the text information, the content attribute being an emotional attribute or a subject attribute; and
a step of obtaining the subjective evaluation based on the content attribute corresponding to the text information.

The method according to claim 3, wherein, if the content attribute is a subject attribute, the step of obtaining the subjective evaluation based on the content attribute corresponding to the text information includes:
a step of obtaining the subject type corresponding to the text information; and
a step of determining that the subjective evaluation is lower than a predetermined evaluation value if the subject type corresponding to the text information matches the subject type corresponding to the user intention.

The method according to claim 3, wherein, if the content attribute is an emotional attribute, the step of obtaining the subjective evaluation based on the content attribute corresponding to the text information includes:
a step of extracting an emotional keyword from the text information;
a step of obtaining, based on the correspondence between emotional keywords and mood types, a mood type that is one of a positive mood, a negative mood, and a neutral mood; and
a step of obtaining the subjective evaluation based on the correspondence between mood types and predetermined evaluations.

The method according to claim 1, wherein the step of acquiring the voice dialogue features includes:
a step of acquiring first log data whose time interval from second log data in the immediately preceding time period and from third log data in the immediately following time period is greater than a predetermined threshold;
a step of obtaining, from the first log data, the subject corresponding to each of two adjacent voice inputs by the user; and
a step of obtaining the voice dialogue features based on the subjects corresponding to the two adjacent voice inputs.

The method according to any one of claims 1 to 6, further comprising, before the step of taking the objective evaluation and the subjective evaluation as inputs to the satisfaction evaluation model and obtaining the satisfaction with the voice dialogue output by the satisfaction evaluation model:
a step of acquiring a training sample set that includes a first sample evaluation obtained by performing evaluation processing on objective sample data, a second sample evaluation obtained by performing evaluation processing on subjective sample data, and a satisfaction level input by the user, wherein the objective sample data and the subjective sample data are data for the same subject; and
a step of obtaining the satisfaction evaluation model by iterative training based on the training sample set.

A device for determining satisfaction with a voice dialogue, comprising:
an acquisition module that acquires voice dialogue features including objective data of the voice dialogue and subjective data of the voice dialogue, wherein the objective data of the voice dialogue and the subjective data of the voice dialogue are data for the same subject;
a processing module that performs evaluation processing on the objective data to obtain an objective evaluation, and performs evaluation processing on the subjective data to obtain a subjective evaluation; and
a confirmation module that takes the objective evaluation and the subjective evaluation as inputs to a satisfaction evaluation model and obtains the satisfaction with the voice dialogue output by the satisfaction evaluation model,
wherein the objective data of the voice dialogue includes a user intention, response content, a response delay, and a current playback time of the response content, and
the subjective data of the voice dialogue includes text information corresponding to voice input by the user after playback of the response content is interrupted, or text information input by the user after playback of the response content has finished.

The device according to claim 8, wherein the processing module is specifically configured to:
obtain a first objective evaluation based on the degree of intention match between the user intention and the response content;
obtain a second objective evaluation based on the response delay and a standard delay; and
obtain a third objective evaluation based on the current playback time of the response content and a standard playback time of the response content,
and wherein the confirmation module is specifically configured to:
take the first objective evaluation, the second objective evaluation, the third objective evaluation, and the subjective evaluation as inputs to the satisfaction evaluation model and obtain the satisfaction with the voice dialogue output by the satisfaction evaluation model.

The device according to claim 8, wherein the processing module is specifically configured to:
perform semantic analysis on the text information to obtain a content attribute corresponding to the text information, the content attribute being an emotional attribute or a subject attribute; and
obtain the subjective evaluation based on the content attribute corresponding to the text information.

The device according to claim 10, wherein, if the content attribute is a subject attribute, the processing module is specifically configured to:
obtain the subject type corresponding to the text information; and
determine that the subjective evaluation is lower than a predetermined evaluation value if the subject type corresponding to the text information matches the subject type corresponding to the user intention.

The device according to claim 10, wherein, if the content attribute is an emotional attribute, the processing module is specifically configured to:
extract an emotional keyword from the text information;
obtain, based on the correspondence between emotional keywords and mood types, a mood type that is one of a positive mood, a negative mood, and a neutral mood; and
obtain the subjective evaluation based on the correspondence between mood types and predetermined evaluations.

The device according to claim 8, wherein the acquisition module is specifically configured to:
acquire first log data whose time interval from second log data in the immediately preceding time period and from third log data in the immediately following time period is greater than a predetermined threshold;
obtain, from the first log data, the subject corresponding to each of two adjacent voice inputs by the user; and
obtain the voice dialogue features based on the subjects corresponding to the two adjacent voice inputs.

The device according to any one of claims 8 to 13, further comprising a training module configured to, before the objective evaluation and the subjective evaluation are taken as inputs to the satisfaction evaluation model and the satisfaction with the voice dialogue output by the satisfaction evaluation model is obtained:
acquire a training sample set that includes a first sample evaluation obtained by performing evaluation processing on objective sample data, a second sample evaluation obtained by performing evaluation processing on subjective sample data, and a satisfaction level input by the user, wherein the objective sample data and the subjective sample data are data for the same subject; and
obtain the satisfaction evaluation model by iterative training based on the training sample set.

A device for determining satisfaction with a voice dialogue, comprising:
at least one processor and a memory,
wherein the memory stores computer-executable instructions, and
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the method for determining satisfaction with a voice dialogue according to any one of claims 1 to 7.

A computer-readable storage medium,
wherein the computer-readable storage medium stores computer-executable instructions, and
when the computer-executable instructions are executed by a processor, the method for determining satisfaction with a voice dialogue according to any one of claims 1 to 7 is implemented.

A computer program
which, when executed by a processor, implements the method according to any one of claims 1 to 7.