JP2019191490A5


Info

Publication number: JP2019191490A5
Application number: JP2018086985A
Authority: JP
Filing date: 2018-04-27
Publication date: 2020-06-25

Description

The voice interaction service 2 receives the voice data sent from the voice interaction terminal 1 and analyzes it in the speech recognition unit 2-1. The dialogue processing unit 2-2 then generates a response that corresponds to the analyzed content.

Once the response has been generated, the speech synthesis unit 2-3 converts its content into voice data, and the converted voice data is transmitted to the voice interaction terminal 1 via the network 3.
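The round trip through the voice interaction service 2 can be sketched as a three-stage pipeline. This is a minimal illustration under stated assumptions. The functions `recognize`, `generate_response`, `synthesize`, and `handle_request` are hypothetical stand-ins for the speech recognition unit 2-1, the dialogue processing unit 2-2, and the speech synthesis unit 2-3. The real recognition and synthesis engines are replaced by trivial text and byte placeholders.

```python
from typing import Callable


# Hypothetical placeholders for units 2-1, 2-2, and 2-3. A real service would
# call actual speech recognition and speech synthesis engines here.
def recognize(voice_data: bytes) -> str:
    """Speech recognition unit 2-1 (placeholder): treat bytes as UTF-8 text."""
    return voice_data.decode("utf-8")


def generate_response(text: str) -> str:
    """Dialogue processing unit 2-2 (placeholder): choose a reply for the text."""
    if "weather" in text:
        return "It will be sunny today."
    return "Sorry, I did not understand."


def synthesize(text: str) -> bytes:
    """Speech synthesis unit 2-3 (placeholder): turn the reply into bytes."""
    return text.encode("utf-8")


def handle_request(
    voice_data: bytes,
    asr: Callable[[bytes], str] = recognize,
    dialogue: Callable[[str], str] = generate_response,
    tts: Callable[[str], bytes] = synthesize,
) -> bytes:
    """Analyze received voice data, generate a response, and return it as
    voice data to be sent back to the terminal over the network."""
    text = asr(voice_data)
    reply = dialogue(text)
    return tts(reply)


if __name__ == "__main__":
    print(handle_request(b"what is the weather"))
```

Passing the three stages in as parameters keeps each unit replaceable, which mirrors the way the description treats them as separate units.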

When the response takes the form of a command, its content is, for example, the command "device = air conditioner 11, operation = ON, mode = cooling, setting = temperature 26 degrees, maximum airflow". This command is issued in reply to the user saying "turn on the air conditioner" to the voice interaction terminal 1. Another example is the command "device = lighting 10, operation = ON", issued in reply to "turn on the lights, would you".
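One way to represent such a command-type response is as a small record with optional fields. The sketch below assumes a hypothetical `DeviceCommand` class and a hypothetical `COMMANDS` lookup table. The field names and the utterance-to-command mapping are illustrative only; the description does not define a data format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class DeviceCommand:
    """A command-type response. Field names are illustrative assumptions."""
    device: str                       # target device, e.g. "air conditioner 11"
    operation: str                    # e.g. "ON"
    mode: Optional[str] = None        # e.g. "cooling"; absent for simple devices
    settings: Dict[str, Union[int, str]] = field(default_factory=dict)


# Hypothetical mapping from recognized utterances to commands.
COMMANDS: Dict[str, DeviceCommand] = {
    "turn on the air conditioner": DeviceCommand(
        device="air conditioner 11",
        operation="ON",
        mode="cooling",
        settings={"temperature": 26, "airflow": "max"},
    ),
    "turn on the lights": DeviceCommand(device="lighting 10", operation="ON"),
}

if __name__ == "__main__":
    for utterance, command in COMMANDS.items():
        print(f"{utterance!r} -> {command}")
```

The lighting command leaves `mode` and `settings` empty, which matches the shorter "device = lighting 10, operation = ON" example.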

When the response received from the voice interaction service 2 is a command, the voice interaction terminal 1 controls the target device specified in the command. For example, suppose the command is "device = air conditioner 11, operation = ON, mode = cooling, setting = temperature 26 degrees, maximum airflow". The voice interaction terminal 1 then uses its built-in short-range wireless communication system, such as Wi-Fi, ZigBee, or Bluetooth, to start the air conditioner 11 with a set temperature of 26 degrees and maximum airflow.
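The terminal-side dispatch can be sketched as follows, under loudly stated assumptions. `Response`, `FakeRadio`, `ROUTES`, and `handle_response` are hypothetical names. The assignment of each device to a particular link, for example the air conditioner to Wi-Fi, is invented purely for illustration. The "radio" only records what would be transmitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Response:
    """Response from voice interaction service 2 (hypothetical shape)."""
    kind: str                                        # "command" or "speech"
    payload: Dict[str, object] = field(default_factory=dict)


class FakeRadio:
    """Stands in for a built-in short-range link such as Wi-Fi, ZigBee, or Bluetooth."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sent: List[Tuple[str, Dict[str, object]]] = []

    def send(self, device: str, params: Dict[str, object]) -> None:
        # A real implementation would transmit over the radio; here we record.
        self.sent.append((device, params))


# Which link reaches which device is an assumption for illustration.
ROUTES: Dict[str, FakeRadio] = {
    "air conditioner 11": FakeRadio("Wi-Fi"),
    "lighting 10": FakeRadio("ZigBee"),
}


def handle_response(resp: Response) -> Optional[str]:
    """Forward a command to its target device and return the link name used.

    Non-command responses (e.g. synthesized speech) return None.
    """
    if resp.kind != "command":
        return None
    device = str(resp.payload["device"])
    radio = ROUTES[device]
    # Everything except the device name is passed on as control parameters.
    params = {k: v for k, v in resp.payload.items() if k != "device"}
    radio.send(device, params)
    return radio.name
```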

The voice interaction terminal 1 includes the following units:

- a microphone 1-1 that collects the user's utterances;
- a sound processing unit 201 that applies acoustic processing, such as noise cancellation, to the collected utterances;
- a trigger word detection processing unit 203 that detects a trigger word in the collected utterances;
- a voice data output processing unit 202 that, once the trigger word detection processing unit 203 has detected the trigger word, performs the processing for transmitting the user's subsequent utterances to the voice interaction service 2;
- a communication control unit 204 that exchanges the output voice data with the voice interaction service 2.

The voice interaction terminal 1 further includes a trigger word detection sensitivity setting processing unit 205. This unit sets the detection sensitivity that the trigger word detection processing unit 203 uses when detecting the trigger word.
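The data flow between these units can be sketched as a single class. This is a hypothetical sketch, not the terminal's actual design. `TerminalSketch`, `Frame`, the default threshold of 0.5, and the noise-floor "noise cancellation" are all assumptions. Audio is simplified to `(word, trigger_score, level)` tuples so that the example runs without real signal processing.

```python
from typing import Callable, List, Tuple

# (word, trigger-word score, signal level): a stand-in for real audio frames.
Frame = Tuple[str, float, float]


class TerminalSketch:
    """How data flows through the units of terminal 1 (interfaces hypothetical)."""

    def __init__(self, trigger_word: str, send: Callable[[List[str]], None]) -> None:
        self.trigger_word = trigger_word
        self.threshold = 0.5  # assumed default; unit 205 may change it
        self.send = send      # communication control unit 204

    def set_threshold(self, threshold: float) -> None:
        """Unit 205: set the detection sensitivity used by unit 203."""
        self.threshold = threshold

    def sound_processing(self, frames: List[Frame], noise_floor: float = 0.1) -> List[Frame]:
        """Unit 201 (placeholder noise cancellation): drop frames below a noise floor."""
        return [f for f in frames if f[2] >= noise_floor]

    def is_trigger(self, frame: Frame) -> bool:
        """Unit 203: does this frame contain the trigger word?"""
        word, score, _level = frame
        return word == self.trigger_word and score > self.threshold

    def handle_input(self, frames: List[Frame]) -> None:
        """Unit 202: after a trigger word, hand the following words to unit 204."""
        clean = self.sound_processing(frames)
        for i, frame in enumerate(clean):
            if self.is_trigger(frame):
                self.send([word for word, _score, _level in clean[i + 1:]])
                return
```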

When no user utterance has been input from the microphone 1-1 for a certain period of time, the voice interaction terminal 1 transitions to a trigger word input waiting state, in which it waits for the trigger word to be input. The trigger word detection processing unit 203 of the voice interaction terminal 1 continuously checks whether the voice input from the microphone 1-1 matches a pre-registered trigger word. Once the trigger word detection processing unit 203 detects the trigger word, the voice interaction terminal 1 transmits the user's subsequent utterances to the voice interaction service 2 via the network 3. The trigger word may already have been detected. Even so, if no user utterance is then input from the microphone 1-1 for a certain period of time, the voice interaction terminal 1 returns to the trigger word input waiting state.
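The waiting and forwarding behavior can be sketched as a two-state machine with an inactivity timeout. This is a minimal sketch under assumptions. `TriggerStateMachine`, the `State` names, the 8-second default timeout, and the exact-text trigger match are hypothetical; the description only says "a certain period of time".

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    WAITING_FOR_TRIGGER = auto()  # trigger word input waiting state
    FORWARDING = auto()           # sending utterances to voice interaction service 2


class TriggerStateMachine:
    """Waiting/forwarding sketch; timeout value and input format are assumed."""

    def __init__(self, trigger_word: str, timeout_s: float = 8.0) -> None:
        self.trigger_word = trigger_word
        self.timeout_s = timeout_s
        self.state = State.WAITING_FOR_TRIGGER
        self.last_input_at = 0.0
        self.forwarded: List[str] = []  # what would be sent over network 3

    def on_tick(self, now: float) -> None:
        """Return to the waiting state after timeout_s seconds without speech."""
        if now - self.last_input_at >= self.timeout_s:
            self.state = State.WAITING_FOR_TRIGGER

    def on_utterance(self, now: float, text: str) -> None:
        self.on_tick(now)  # apply any timeout that elapsed before this input
        self.last_input_at = now
        if self.state is State.WAITING_FOR_TRIGGER:
            # Unit 203 compares the input with the registered trigger word.
            if text == self.trigger_word:
                self.state = State.FORWARDING
        else:
            self.forwarded.append(text)
```

Checking the timeout at the start of each utterance, and not only on a periodic tick, makes sure that speech arriving after a long silence is treated as coming from the waiting state.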
(First embodiment)
In the first embodiment, the user operates an application (a sensitivity setting application) installed on the mobile terminal 20 or the mobile terminal 21 and enters a sensitivity setting value for any desired detection sensitivity. The voice interaction terminal then changes the detection sensitivity of the trigger word detection processing unit 203 accordingly.

In the example of FIG. 4, the user raises or lowers the detection sensitivity setting by moving the slide bar 401 left or right. The user interface may instead let the user enter the numerical value of the detection sensitivity setting (for example, 0 to 100) directly, or step it up and down.
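The description does not say how a 0 to 100 setting becomes a detection threshold. The sketch below assumes one plausible convention: a linear mapping in which a higher sensitivity gives a lower, more lenient score threshold. The function names and the bounds `t_min = 0.2` and `t_max = 0.9` are hypothetical.

```python
def clamp_setting(value: int, lo: int = 0, hi: int = 100) -> int:
    """Keep a user-entered sensitivity setting inside the allowed range."""
    return max(lo, min(hi, value))


def step_setting(value: int, delta: int) -> int:
    """Up/down input: move the setting by delta and clamp it."""
    return clamp_setting(value + delta)


def setting_to_threshold(setting: int, t_min: float = 0.2, t_max: float = 0.9) -> float:
    """Map a 0..100 sensitivity setting to a detector score threshold.

    Assumption: higher sensitivity means a LOWER threshold, so
    0 -> t_max (strict) and 100 -> t_min (lenient), linear in between.
    """
    s = clamp_setting(setting) / 100.0
    return t_max - s * (t_max - t_min)
```

Clamping in one place means that the slider, direct entry, and the up/down buttons all produce values in the same range.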

Suppose the determination in S506 shows that the threshold is not exceeded (No in S507). The voice interaction terminal 1 then determines that it could not recognize the trigger word in the voice input from the microphone 1-1 in S502. It does not transmit the voice data input after the trigger word to the voice interaction service 2.

If the power is ON (No in S509), the voice interaction terminal 1 returns to S502 and repeats the processing from S503 onward for as long as the power remains ON.
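The S502 to S509 loop can be sketched as follows. This sketch assumes that each voice input has already been reduced to a trigger word score paired with the utterance that follows it. How the score in S506 is computed is not specified in the description, so the `(score, following)` input format, `run_detection_loop`, and the `power_is_on` callback are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def run_detection_loop(
    inputs: Iterable[Tuple[float, str]],
    threshold: float,
    power_is_on: Callable[[], bool],
) -> List[str]:
    """Return the utterances that would be transmitted to service 2.

    Each input is (trigger-word score, utterance following it).
    """
    transmitted: List[str] = []
    for score, following in inputs:       # S502: voice input from microphone 1-1
        if score > threshold:             # S507 Yes: trigger word recognized
            transmitted.append(following)
        # S507 No: trigger word not recognized, so nothing is transmitted.
        if not power_is_on():             # S509: stop once the power is OFF
            break
    return transmitted
```

The power check sits at the end of each pass, following the order of the flow, so an input already in progress is always finished before the loop stops.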

The user of the mobile terminal 21 taps the icon that launches the sensitivity setting application, which is used to set the sensitivity setting value. The sensitivity setting application then starts the mobile terminal processing (S520). Once launched by the icon tap (S521), the application displays, for example, the display content (GUI screen) shown in FIG. 4 on the display screen of the mobile terminal 21.

Using the display content of FIG. 4 shown on the display screen of the mobile terminal 21, the user adjusts and sets the sensitivity setting value (S522). After setting the value, the user presses the setting button 402 in FIG. 4. The set sensitivity setting value is then sent to the voice interaction terminal 1 via the network 3 (S523), and the trigger word detection sensitivity setting processing unit 205 updates the threshold of the trigger word detection processing unit 203.
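The S523 hand-off from the application to the terminal can be sketched as a message exchange. The JSON message shape `{"type": "set_sensitivity", "value": ...}` is a hypothetical wire format. `build_setting_message`, `TriggerWordDetector`, and `SensitivitySettingUnit` are likewise illustrative names. The linear setting-to-threshold mapping is the same assumption discussed for FIG. 4 and is not taken from the description.

```python
import json


def build_setting_message(setting: int) -> str:
    """Mobile terminal side (S523): serialize the value chosen on the FIG. 4 screen."""
    return json.dumps({"type": "set_sensitivity", "value": setting})


class TriggerWordDetector:
    """Stand-in for trigger word detection processing unit 203."""

    def __init__(self, threshold: float = 0.55) -> None:
        self.threshold = threshold


class SensitivitySettingUnit:
    """Stand-in for unit 205: applies received settings to the detector."""

    def __init__(self, detector: TriggerWordDetector) -> None:
        self.detector = detector

    def on_message(self, raw: str) -> bool:
        """Update the detector threshold; return False for unrelated messages."""
        msg = json.loads(raw)
        if msg.get("type") != "set_sensitivity":
            return False
        setting = max(0, min(100, int(msg["value"])))
        # Assumed linear mapping: higher setting -> lower (more lenient) threshold.
        self.detector.threshold = 0.9 - 0.7 * setting / 100
        return True
```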

FIG. 7 shows an example of a setting screen for starting and ending the measurement of ambient noise. The screen belongs to an application for controlling the voice interaction terminal 1, installed on the mobile terminal 20 or the mobile terminal 21. The user of the mobile terminal 20 or the mobile terminal 21 can press the start button 702 and the end button 703 at any time. When the user presses the start button 702, a measurement start notification (sensitivity setting start event) is sent to the voice interaction terminal 1 via the network 3, and the trigger word detection processing unit 203 starts measuring the ambient noise. When the user then presses the end button 703, a measurement end notification (sensitivity setting end event) is sent to the voice interaction terminal 1 via the network 3, and the trigger word detection processing unit 203 stops measuring the ambient noise.
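Measuring ambient noise only between the two events can be sketched as an event-gated accumulator. The event strings, the `AmbientNoiseMeter` class, and the choice of RMS level as the noise metric are all assumptions. The description states only that the trigger word detection processing unit 203 measures ambient noise between the start and end notifications.

```python
import math
from typing import List, Optional, Sequence


class AmbientNoiseMeter:
    """Event-gated ambient noise measurement (names and metric are assumed)."""

    def __init__(self) -> None:
        self.measuring = False
        self.samples: List[float] = []

    def on_event(self, event: str) -> Optional[float]:
        """Handle start/end events; return the measured level at the end event."""
        if event == "sensitivity_setting_start":   # start button 702
            self.measuring = True
            self.samples.clear()
        elif event == "sensitivity_setting_end" and self.measuring:  # end button 703
            self.measuring = False
            return self.rms()
        return None

    def on_audio(self, frame: Sequence[float]) -> None:
        """Accumulate samples only while a measurement is in progress."""
        if self.measuring:
            self.samples.extend(frame)

    def rms(self) -> float:
        """Root-mean-square level of the collected samples (0.0 if none)."""
        if not self.samples:
            return 0.0
        return math.sqrt(sum(x * x for x in self.samples) / len(self.samples))
```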

When the user presses the start button 702 in the display content of FIG. 7 shown on the display screen of the mobile terminal 21, a sensitivity setting start event is transmitted to the voice interaction terminal 1 via the network 3. On receiving the sensitivity setting start event, the voice interaction terminal 1 determines Yes in S804 of FIG. 8A and starts transmitting the acquired ambient noise voice data to the voice interaction service 2.

When the user subsequently presses the end button 703 in the display content of FIG. 7 shown on the display screen of the mobile terminal 21, a sensitivity setting end event is sent to the voice interaction terminal 1 via the network 3. On receiving the sensitivity setting end event, the voice interaction terminal 1 determines Yes in S813 of FIG. 8A and stops transmitting the ambient noise voice data to the voice interaction service 2.
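On the terminal side, the S804 and S813 decisions reduce to a flag that controls whether audio chunks are forwarded. The sketch assumes a hypothetical input format of `("start", b"")`, `("end", b"")`, or `("audio", chunk)` items and a hypothetical `forward_ambient_noise` generator. It yields the chunks that would be sent to the voice interaction service 2.

```python
from typing import Iterable, Iterator, Tuple


def forward_ambient_noise(items: Iterable[Tuple[str, bytes]]) -> Iterator[bytes]:
    """Yield the audio chunks terminal 1 would send to service 2.

    Sending begins at the sensitivity setting start event (S804: Yes) and
    stops at the sensitivity setting end event (S813: Yes). The item format
    is a hypothetical stand-in for the real event and audio interfaces.
    """
    sending = False
    for kind, payload in items:
        if kind == "start":          # S804: Yes
            sending = True
        elif kind == "end":          # S813: Yes
            sending = False
        elif kind == "audio" and sending:
            yield payload            # ambient noise voice data
```

Writing this as a generator lets audio be forwarded chunk by chunk while it is captured, with no need to buffer the whole measurement period first.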

The detection sensitivity calculation unit 2-4 recognizes two pieces of voice data sent from the voice interaction terminal 1: "trigger word detection sensitivity start", which marks the start of a period, and "trigger word detection sensitivity end", which marks its end. Using the voice data received during this period, the unit calculates a trigger word detection sensitivity. The sensitivity is chosen so that false detections, meaning cases where the trigger word is detected although it was not uttered, stay at or below a set number. When the calculation is complete, the detection sensitivity calculation unit 2-4 transmits the calculated sensitivity setting value to the voice interaction terminal 1 (S93).
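One way to meet the "at most a set number of false detections" criterion is to sort the detector scores from the noise-only period and pick the threshold from that ranking. This is a sketch of that approach, not the patent's stated method. It assumes that a detection fires when a score is strictly greater than the threshold, and that every score in the period is a false detection because no trigger word was spoken. `calibrate_threshold`, `max_false`, and `floor` are hypothetical names.

```python
from typing import Sequence


def calibrate_threshold(noise_scores: Sequence[float], max_false: int, floor: float = 0.0) -> float:
    """Lowest threshold giving at most max_false false detections on noise.

    noise_scores: detector scores for windows recorded between the start and
    end of the measurement period, during which no trigger word was spoken.
    """
    if max_false < 0:
        raise ValueError("max_false must be non-negative")
    ranked = sorted(noise_scores, reverse=True)
    if len(ranked) <= max_false:
        return floor  # even the most lenient threshold stays within the limit
    # With the (max_false + 1)-th highest score as threshold, only scores
    # strictly above it fire (at most max_false of them); any lower threshold
    # would let at least max_false + 1 scores through.
    return max(floor, ranked[max_false])
```

Choosing the lowest threshold that satisfies the limit keeps the terminal as sensitive as possible to real trigger words in that environment. The `floor` parameter stops the threshold from dropping below a minimum in very quiet rooms.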

As described above, the voice interaction terminal of the third embodiment can update the sensitivity setting value used for trigger word detection. It uses an optimal value calculated from the sound conditions around the terminal. The user's utterance can also trigger the calculation of this optimal value, so the user no longer needs to operate a mobile terminal, which further improves usability. In this way, the voice interaction terminal 1 can detect the trigger word with a sensitivity suited to the environment in which it is placed.

As described above, the voice interaction terminal of the present embodiments can update the trigger word detection sensitivity setting according to the conditions in which the terminal is installed. In a relatively noisy environment, for example, the function of the second embodiment can set the detection sensitivity from the surrounding environment. The detection sensitivity may additionally be set by user operation, as in the first embodiment.

1 ... voice interaction terminal, 2 ... voice interaction service, 2-1 ... speech recognition unit, 2-2 ... dialogue processing unit, 2-3 ... speech synthesis unit, 2-4 ... detection sensitivity calculation unit, 3 ... network, 203 ... trigger word detection processing unit, 205 ... trigger word detection sensitivity setting processing unit.