WO2020195457A1

WO2020195457A1 - Speech interaction device, input device, and output device

Info

Publication number: WO2020195457A1
Application number: PCT/JP2020/007447
Authority: WO
Inventors: 沙織岩田
Original assignee: 株式会社東海理化電機製作所
Priority date: 2019-03-26
Filing date: 2020-02-25
Publication date: 2020-10-01
Also published as: JP2020160725A; CN113544771A

Abstract

A speech interaction device (1) is provided with: a determination unit (13) that, on the basis of the result of recognition performed by a recognition unit (10) for recognizing the purpose of a motion indicated by a user's body movement, determines whether or not there is an intention of temporarily stopping a speech interactive operation; and a stop unit (14) that, when it is determined as a result of the determination performed by the determination unit (13) that there is the intention of temporarily stopping the speech interactive operation, stops the speech interactive operation for a prescribed time.

Description

Voice dialogue device, input device and output device

The present invention relates to a voice dialogue device, an input device, and an output device that interact by voice.

Conventionally, a voice dialogue device in which a person and a computer exchange intentions through voice dialogue is well known (see Patent Document 1 and the like).

Patent Document 1: JP-A-2018-189984

However, a voice dialogue, once started, currently cannot be stopped partway through its input or output. For this reason, when the user is conducting the voice dialogue while performing another task at the same time, the user cannot always concentrate on the voice dialogue, and the voice dialogue may not proceed smoothly.

An object of the present invention is to provide a voice dialogue device, an input device, and an output device that enable smooth voice dialogue.

The voice dialogue device according to one embodiment includes: a determination unit that determines, based on the recognition result of a recognition unit that recognizes the intention of an action indicated by the user moving the body, whether or not there is an intention to suspend the operation of the voice dialogue; and a stop unit that stops the operation of the voice dialogue for a specified time when the determination unit determines that there is the intention to suspend the operation of the voice dialogue.

The input device according to one embodiment includes: a determination unit that determines, based on the recognition result of a recognition unit that recognizes the intention of an action indicated by the user moving the body, whether or not there is an intention to suspend the input of the voice dialogue; and a stop unit that stops the input of the voice dialogue for a specified time when the determination unit determines that there is the intention to suspend the input of the voice dialogue.

The output device according to one embodiment includes: a determination unit that determines, based on the recognition result of a recognition unit that recognizes the intention of an action indicated by the user moving the body, whether or not there is an intention to suspend the output of the voice dialogue; and a stop unit that stops the output of the voice dialogue for a specified time when the determination unit determines that there is the intention to suspend the output of the voice dialogue.

FIG. 1 is a block diagram of the voice dialogue device of one embodiment.
FIG. 2 is a diagram showing an arrangement example of a camera.
FIG. 3 is a flowchart executed when pausing voice input.
FIG. 4 is a diagram showing a gesture for pausing voice input.
FIG. 5 is a flowchart executed when pausing voice output.
FIG. 6 is a diagram showing a gesture for pausing voice output.
FIG. 7 is a diagram showing a detection unit of another example.
FIG. 8 is a block diagram of an input device of another example.
FIG. 9 is a block diagram of an output device of another example.

Hereinafter, an embodiment of the voice dialogue device, the input device, and the output device will be described with reference to FIGS. 1 to 6.

As shown in FIG. 1, the voice dialogue device 1 supports the driving of a vehicle through an interactive exchange of intentions with an occupant such as the driver. Specifically, through voice dialogue with the occupant, it provides various driving-related information by voice and controls in-vehicle devices.

The voice dialogue device 1 includes a controller 2 that controls the operation of the voice dialogue device 1, a sound collecting unit 3 that collects sound, and a sound output unit 4 that outputs sound. The sound collecting unit 3 includes, for example, a microphone. The sound output unit 4 includes, for example, an in-vehicle speaker. The controller 2 understands the content spoken by the other party by voice recognition based on the voice data Da input from the sound collecting unit 3, and executes voice output such as voice guidance from the sound output unit 4 as a response to the content.

The controller 2 includes a voice recognition unit 5 that recognizes the voice data Da input from the sound collecting unit 3, a search database 6 that stores the search data used in the voice recognition determination, and a guidance output unit 7 that outputs voice guidance according to the voice recognition result. When the voice data Da is input from the sound collecting unit 3, the voice recognition unit 5 recognizes the input voice by comparing the voice data Da with the search database 6 and analyzing the voice data Da. After voice recognition, the voice recognition unit 5 outputs the voice recognition result to the guidance output unit 7. The guidance output unit 7 outputs the corresponding output data Db to the sound output unit 4 based on the received voice recognition result. The sound output unit 4 outputs voice guidance according to the voice recognition result based on the received output data Db.
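The data flow just described (sound collecting unit 3, voice recognition unit 5 consulting search database 6, guidance output unit 7) can be pictured with a minimal sketch. Everything concrete here is an assumption made for illustration: the voice data Da is modeled as already-transcribed text, the search database is a plain dictionary, and the function names and phrases are hypothetical rather than taken from the publication.

```python
from typing import Optional

# Hypothetical stand-in for search database 6: recognizable phrases mapped to guidance text.
SEARCH_DATABASE = {
    "navigate home": "Starting route guidance to home.",
    "turn on air conditioner": "Turning on the air conditioner.",
}


def recognize_voice(voice_data_da: str) -> Optional[str]:
    """Sketch of voice recognition unit 5: compare the input with the search database."""
    phrase = voice_data_da.strip().lower()
    return phrase if phrase in SEARCH_DATABASE else None


def guidance_output(recognition_result: Optional[str]) -> str:
    """Sketch of guidance output unit 7: build output data Db from the recognition result."""
    if recognition_result is None:
        # Assumed fallback; the source does not describe the unrecognized case.
        return "Sorry, I did not understand."
    return SEARCH_DATABASE[recognition_result]


def dialogue_turn(voice_data_da: str) -> str:
    """One input/output turn: Da -> recognition result -> Db."""
    return guidance_output(recognize_voice(voice_data_da))
```

Modeling Da as text keeps the sketch self-contained; a real device would run acoustic recognition at this point.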

The voice dialogue device 1 has a function of stopping the voice dialogue partway through, a so-called operation pause function. This is because a smooth voice dialogue is difficult while driving: the driver, concentrating on driving, may miss the voice guidance or falter during voice input.

The voice dialogue device 1 includes a gesture recognition unit 10 that recognizes a gesture performed by an occupant such as the driver using a part of the body. The gesture recognition unit 10 is provided in the controller 2. The gesture recognition unit 10 receives a detection signal St from a detection unit 11 capable of detecting the body movement of an occupant such as the driver, and recognizes a gesture based on the detection signal St.

As shown in FIG. 2, the detection unit 11 includes a camera 12 that captures an image. It is preferable that the camera 12 is arranged at a position where it is easy to photograph the driver, for example, on the upper surface of the instrument panel of the vehicle. The camera 12 outputs the captured surrounding image data St1 to the controller 2. The gesture recognition unit 10 recognizes the gesture reflected in the camera 12 by performing image analysis of the image data St1 input from the camera 12, for example.

Returning to FIG. 1, the voice dialogue device 1 includes a determination unit 13 that determines whether or not the user intends to suspend the operation of the voice dialogue. The determination unit 13 is provided in the controller 2. The determination unit 13 of this example makes this determination based on the recognition result of a recognition unit 15 that recognizes the intention of an action indicated by the user moving the body. The recognition unit 15 is preferably the gesture recognition unit 10, which recognizes a body movement as an expression of intention made through the user's body. Further, the determination unit 13 determines whether or not a pause gesture has been made for both the input and the output of the voice dialogue.

The voice dialogue device 1 includes a stop unit 14 that stops the operation of the voice dialogue based on the determination result of the determination unit 13. The stop unit 14 is provided in the controller 2. When the determination unit 13 determines that there is an intention to suspend the operation of the voice dialogue, the stop unit 14 stops the operation of the voice dialogue for a predetermined time. The stop unit 14 of this example can suspend both the input and the output of the voice dialogue.

The stop unit 14 cancels the stop when a stop release condition is satisfied while the operation of the voice dialogue is paused. In the case of voice input, the stop release condition is preferably the resumption of utterance during the pause. In the case of voice output, the stop release condition is preferably, for example, the execution of the same gesture as the one performed to pause the voice guidance.
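The pause and release behavior of the stop unit 14 amounts to a small state holder with one release condition per channel. The following is a minimal sketch under stated assumptions: the class name, the channel keys "input" and "output", and the event names "utterance" and "pause_gesture" are all hypothetical labels, since the source describes these conditions only in prose.

```python
class StopUnit:
    """Sketch of stop unit 14: tracks the pause state of input and output separately."""

    # Assumed event names standing in for the preferred release conditions.
    RELEASE_EVENT = {
        "input": "utterance",       # resuming speech releases a paused voice input
        "output": "pause_gesture",  # repeating the pause gesture releases a paused voice output
    }

    def __init__(self) -> None:
        self.paused = {"input": False, "output": False}

    def pause(self, channel: str) -> None:
        """Called when determination unit 13 reports an intention to pause this channel."""
        self.paused[channel] = True

    def handle_event(self, channel: str, event: str) -> bool:
        """Release the pause if the event satisfies the channel's release condition.

        Returns True only when a pause was actually released.
        """
        if self.paused[channel] and event == self.RELEASE_EVENT[channel]:
            self.paused[channel] = False
            return True
        return False
```

Keeping the release conditions in a table makes it straightforward to swap in the alternative conditions listed among the modified examples, such as a switch operation.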

Next, the operation of the voice dialogue device 1 of the present embodiment will be described with reference to FIGS. 3 to 6.

As shown in FIG. 3, when the system start operation is executed, the voice dialogue device 1 starts executing the flowchart shown in the figure. The system activation includes, for example, a start instruction by voice input, an operation of a switch provided around the driver's seat, and a start operation in a car navigation system.

In step 101, when the determination unit 13 detects the utterance, it starts determining whether or not to suspend the input of the voice dialogue (hereinafter referred to as voice input). The determination unit 13 of this example detects the utterance based on the voice data Da input from the sound collecting unit 3.

In step 102, the determination unit 13 determines whether or not the gesture recognition unit 10 has recognized the gesture of pausing the voice input. In the case of this example, the gesture recognition unit 10 recognizes the driver's gesture by, for example, monitoring the image data St1 of the camera 12 as the detection unit 11. Then, the determination unit 13 monitors whether or not there is a gesture of pausing the voice input based on the recognition result of this image recognition.

As shown in FIG. 4, the gesture for pausing voice input is preferably an action that interrupts a conversation, for example one meaning "wait". The action of interrupting the conversation is preferably, for example, holding out an open hand.

Returning to FIG. 3, if the gesture of pausing the voice input is recognized in step 102, the process proceeds to step 103. On the other hand, if the gesture of pausing the voice input is not recognized in step 102, the process proceeds to step 105.

When the determination unit 13 determines that the voice input pause gesture has been made, the stop unit 14 suspends the voice input in step 103. Therefore, the input of the voice dialogue can be suspended when the driver wants to concentrate on driving. At this time, the stop unit 14 suspends the voice input while keeping the microphone, which is the sound collecting unit 3, activated. This allows a subsequent utterance to be detected again.

In step 104, when the stop unit 14 detects an utterance during the period in which the voice input is paused, the stop unit 14 cancels the pause of the voice input. That is, if the driver tries to resume the voice input and makes an utterance while the voice input is paused, the pause of the voice input is released. This makes it possible to resume voice input. The utterance at this time may have any content as long as the voice input can be recognized.

In step 105, the voice dialogue device 1 determines whether or not the utterance is completed, that is, whether or not the voice input is completed. When the voice dialogue device 1 determines that the utterance is finished, the voice input is finished, and when it is determined that the utterance is not finished, the process returns to step 102 and repeats the above-described processing.
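Steps 101 to 105 form a loop that can be traced with a short simulation. This is a sketch only: the event tuples, the function name, and the choice to treat the utterance that releases the pause as part of the captured input are assumptions, not details confirmed by the source.

```python
from typing import List, Tuple


def run_voice_input(events: List[Tuple[str, ...]]) -> str:
    """Simulate the voice input flow for a list of events.

    Each event is ("speech", text), ("pause_gesture",) or ("end",).
    Returns the captured utterance text joined with spaces.
    """
    captured: List[str] = []
    paused = False
    for event in events:
        kind = event[0]
        if paused:
            # Step 104: an utterance during the pause releases the pause.
            if kind == "speech":
                paused = False
                captured.append(event[1])  # assumption: the releasing utterance counts as input
            continue  # any other event is ignored while paused
        if kind == "pause_gesture":
            # Step 102 -> step 103: the pause gesture suspends voice input.
            paused = True
        elif kind == "speech":
            captured.append(event[1])
        elif kind == "end":
            # Step 105: the utterance is finished, so voice input ends.
            break
    return " ".join(captured)
```

For example, the sequence speech, pause gesture, speech, end yields both speech fragments as one input, which matches the intent that the driver can continue an utterance after attending to driving.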

The voice recognition unit 5 analyzes the input voice by referring to the search database 6 based on the voice data Da obtained by the voice input. Then, the voice recognition unit 5 outputs the voice recognition result after the analysis to the guidance output unit 7. The guidance output unit 7 operates the sound output unit 4 based on the voice recognition result received from the voice recognition unit 5, and executes the voice guidance.

As shown in FIG. 5, when the voice dialogue device 1 executes the voice guidance output (hereinafter referred to as voice output) from the sound output unit 4, the voice dialogue device 1 starts executing the flowchart shown in the figure.

In step 201, the determination unit 13 determines whether or not the gesture recognition unit 10 has recognized the gesture of pausing the voice output. The gesture recognition unit 10 recognizes the driver's gesture, for example, by monitoring the image data St1 of the camera 12 as the detection unit 11. Then, the determination unit 13 monitors whether or not there is a gesture of pausing the voice output based on the recognition result of this image recognition.

As shown in FIG. 6, the gesture for pausing the voice output is preferably an action that asks for silence, for example one meaning "be quiet". The action of asking for silence is preferably, for example, raising the index finger to the mouth.

Returning to FIG. 5, if the gesture of pausing the voice output is recognized in step 201, the process proceeds to step 202. On the other hand, if the gesture of pausing the audio output is not recognized in step 201, the process proceeds to step 205.

When the determination unit 13 determines that the voice output pause gesture has been made, the stop unit 14 suspends the voice guidance in step 202. Therefore, it is possible to pause the voice guidance when it is desired to concentrate on driving.

In step 203, the determination unit 13 determines whether or not the gesture recognition unit 10 has recognized the gesture of canceling the pause of the voice output. It is preferable that the pause release gesture is the same operation as when the audio output is paused, for example. When the determination unit 13 recognizes the gesture of canceling the pause, the determination unit 13 proceeds to step 204. On the other hand, if the determination unit 13 does not recognize the pause release gesture, the determination unit 13 returns to step 202 and maintains the state in which the voice output is paused.

When the determination unit 13 recognizes that there is a gesture to release the pause of the voice output, the stop unit 14 resumes the voice guidance in step 204. Therefore, it is possible to listen to the continuation of the voice guidance stopped in the middle again.

In step 205, the voice dialogue device 1 determines whether or not the voice guidance has ended, that is, whether or not the voice output has ended. When the voice dialogue device 1 determines that the voice guidance has ended, it ends the voice output; when it determines that the voice guidance has not yet ended, it returns to step 201 and repeats the above-described processing.
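The voice output flow of FIG. 5 can be traced in the same way. In this sketch the guidance is split into chunks and advanced one chunk per tick; the chunking, the tick model, the function name, and the `max_ticks` guard are all assumptions added for illustration. The same gesture toggles between pause and release, following the preferred release condition described for step 203.

```python
from typing import List, Set


def run_voice_output(guidance_chunks: List[str], gesture_ticks: Set[int], max_ticks: int = 100) -> List[str]:
    """Simulate the voice output flow and return what is emitted at each tick."""
    timeline: List[str] = []
    paused = False
    position = 0
    tick = 0
    # Step 205: loop until the guidance has ended (max_ticks is an assumed safety guard).
    while position < len(guidance_chunks) and tick < max_ticks:
        if tick in gesture_ticks:
            # Step 201 -> 202 (pause) or step 203 -> 204 (release): the same gesture toggles.
            paused = not paused
        if paused:
            timeline.append("(paused)")
        else:
            timeline.append(guidance_chunks[position])
            position += 1
        tick += 1
    return timeline
```

Because `position` is not advanced while paused, the guidance resumes from where it stopped, so the listener hears the continuation rather than a restart.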

According to the voice dialogue device 1 of the above embodiment, the following effects can be obtained.

During the operation of the voice dialogue, the determination unit 13 of the voice dialogue device 1 determines, based on the recognition result of the gesture recognition unit 10, whether or not there is an intention to suspend the operation of the voice dialogue. When the determination unit 13 determines that there is an intention to suspend the operation, the stop unit 14 of the voice dialogue device 1 stops the operation of the voice dialogue for a specified time. Therefore, a user such as a driver who cannot concentrate on the voice dialogue can pause it with a gesture and continue it once conditions allow the user to concentrate on it again. The voice dialogue can thus be made smooth. In addition, the driver can conduct the voice dialogue smoothly according to the driving situation, speaking and listening at his or her own pace.

The voice dialogue operation to be paused is voice input. As a result of the determination by the determination unit 13, the stop unit 14 stops the voice input for a predetermined time when it is determined that there is an intention to stop the voice input. Therefore, it is possible to appropriately shift the voice input to the paused state according to the user's intention, so that it is possible to concentrate on other work at the time of voice input.

The operation of the voice dialogue to be paused is the output of voice. As a result of the determination by the determination unit 13, the stop unit 14 stops the audio output for a predetermined time when it is determined that there is an intention to stop the audio output. Therefore, it is possible to appropriately shift the voice output to the paused state according to the user's intention, so that it is possible to concentrate on other work at the time of voice output.

A gesture is a movement that uses a part of the user's body. Therefore, the operation of the voice dialogue can be shifted to the paused state in an easy-to-understand manner using the movement of the user's body.

The stop unit 14 releases the stop when the stop release condition is satisfied during the period during which the voice dialogue operation is suspended. Therefore, even if the voice dialogue is stopped by the specified gesture, the stopped state can be released and the original normal state can be restored.

The stop unit 14 treats the stop release condition as satisfied when the determination unit 13 determines that there is an intention to release the stop of the operation of the voice dialogue. Therefore, the paused voice dialogue operation can be restored to the original state by a simple operation in which the user moves the body.

Note that this embodiment can be modified and implemented as follows. The present embodiment and the following modified examples can be implemented in combination with each other within a technically consistent range.

[About voice dialogue device 1]
-As shown in FIG. 7, the detection unit 11 may be a touch pad 32 provided on the steering wheel 31 mounted on the vehicle. The touch pad 32 may use any of various sensor types, such as a capacitive type or a resistive film type. Gestures using the touch pad 32 include, for example, the number of touch operations on the touch pad 32, the timing of the touch operations, the direction in which the surface of the touch pad 32 is traced, and combinations thereof. In this way, the operation of the voice dialogue can be paused by the simple method of operating the touch pad 32 with a finger.

-As shown in FIG. 8, the operation pause function may be applied to, for example, an input device 35 that executes only voice input. The input device 35 includes, for example, a voice recognition unit 5, a search database 6, a gesture recognition unit 10, a determination unit 13, and a stop unit 14. Further, the input device 35 is not limited to an input device for voice dialogue, and may be used as an input device for other equipment or apparatus. Using such an input device 35 makes it possible to appropriately suspend voice input by a user's gesture.

-As shown in FIG. 9, the operation pause function may be applied to, for example, an output device 36 that executes only voice output. The output device 36 includes, for example, a guidance output unit 7, a gesture recognition unit 10, a determination unit 13, and a stop unit 14. Further, the output device 36 is not limited to an output device for voice dialogue, and may be used as an output device for other equipment or apparatus. Using such an output device 36 makes it possible to appropriately pause voice output by a user's gesture.
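For the touch pad 32 variant of FIG. 7, the gesture is defined by the number of touches, their timing, and the tracing direction. One possible way to turn raw touches into such gestures is sketched below. The `Touch` record, the thresholds, the gesture names, and the mapping from gestures to pause actions are all hypothetical; the source does not specify any of them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Touch:
    start_time: float  # touch-down time in seconds
    x0: float          # touch-down position
    y0: float
    x1: float          # lift-off position
    y1: float


def classify(touches: List[Touch], tap_window: float = 0.5, swipe_min: float = 30.0) -> Optional[str]:
    """Classify touches by count, timing, and tracing direction (thresholds are assumed)."""
    if not touches:
        return None
    moves = [(t.x1 - t.x0, t.y1 - t.y0) for t in touches]
    is_tap = [max(abs(dx), abs(dy)) < swipe_min for dx, dy in moves]
    if len(touches) == 1:
        if is_tap[0]:
            return "single_tap"
        dx, dy = moves[0]
        if abs(dx) >= abs(dy):
            return "swipe_right" if dx > 0 else "swipe_left"
        return "swipe_down" if dy > 0 else "swipe_up"
    if len(touches) == 2 and all(is_tap):
        # Two taps count as a double tap only if they fall within the timing window.
        if touches[1].start_time - touches[0].start_time <= tap_window:
            return "double_tap"
    return None


# Hypothetical assignment of touch gestures to pause actions.
PAUSE_ACTIONS = {"double_tap": "pause_input", "swipe_left": "pause_output"}
```

A lookup such as `PAUSE_ACTIONS.get(classify(touches))` then gives the determination unit 13 either a pause action or `None` when the touches carry no pause intention.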

[About voice dialogue]
-Voice dialogue is not limited to communication in which only voice is exchanged, and may include, for example, a gesture by a part of the body.

-The voice dialogue may use an existing in-vehicle device, or may use a component different from the in-vehicle product.

-Voice dialogue may take the form of interacting with a robot, for example. The robot may have any of various designs, such as an anthropomorphic design or one modeled on an animal.

[Intention to suspend the operation of voice dialogue]
-The gesture for suspending the operation of the voice dialogue may be changed to various modes such as changing the facial expression and shaking the head from side to side.

-The gesture for suspending the operation of the voice dialogue is not limited to the format described in the embodiment, and may be any format that can convey the suspension to the vehicle side.

-The gesture that suspends the operation of the voice dialogue may be combined with voice.

-The gesture that suspends the operation of the voice dialogue may be configured so that it can be freely registered or changed.

[Pause of voice dialogue]
-The specified time for pausing the voice dialogue is not limited to a fixed time, and may vary according to the method of returning from the pause.

-The pause of the voice dialogue may be a mode in which the input and output of the voice dialogue are temporarily stopped, and the operating state of various devices for constructing the voice dialogue device 1 is not particularly limited.

[Conditions for canceling the suspension]
-The condition for canceling the pause may be, for example, a mode in which a return is instructed by voice.

-The suspension release condition may be, for example, an operation of a switch provided on the vehicle or an operation of touching a sensor mounted on the vehicle.

-The paused voice dialogue may automatically return to the original normal state after being stopped for a specified time. In this case, it is possible to prevent the operation of the voice dialogue from being left in a paused state.

-When the voice dialogue is paused, the device may ask the user, by voice or the like, whether it may return to the original normal state, and may return to that state if the user permits the return.
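The automatic return after a specified time, mentioned among the release conditions above, can be sketched as a pause that expires on its own. The class name, the `timeout_s` parameter, and the injectable clock are assumptions introduced for illustration; the source gives no concrete duration or mechanism.

```python
import time
from typing import Callable, Optional


class TimedPause:
    """Sketch of a pause that is released automatically after a specified time."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s  # assumed "specified time" in seconds
        self._clock = clock         # injectable so the behavior can be checked without waiting
        self._paused_at: Optional[float] = None

    def pause(self) -> None:
        """Start (or restart) the pause."""
        self._paused_at = self._clock()

    def release(self) -> None:
        """Explicit release, e.g. by a gesture, an utterance, or a switch operation."""
        self._paused_at = None

    def is_paused(self) -> bool:
        """Return True while paused; clear the pause once the specified time has elapsed."""
        if self._paused_at is None:
            return False
        if self._clock() - self._paused_at >= self.timeout_s:
            self._paused_at = None  # automatic return to the normal state
            return False
        return True
```

Using a monotonic clock avoids spurious releases if the wall-clock time is adjusted while the dialogue is paused.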

[About manifestation of intention through the user's body]
-The expression of intention through the user's body is not limited to gestures made with the body, and may be, for example, a voice instruction.

-The expression of intention through the user's body includes a mode in which only a part of the body is moved without moving the body as a whole, for example, a movement of the line of sight.

[About recognition unit 15]
-The recognition unit 15 is not limited to the gesture recognition unit 10, and may include the voice recognition unit 5 when, for example, voice is also monitored as a pause condition.

-The recognition unit 15 may be a voice recognition unit 5 instead of the gesture recognition unit 10.

-The recognition unit 15 is not limited to the voice recognition unit 5 and the gesture recognition unit 10, and may be a unit capable of recognizing other physical movements.

[Other]
-The detection unit 11 is not limited to an on-board device or component, and may be a terminal such as a smartphone. In this case, for example, the user's gestures and voice may be captured using the camera function and the microphone function of the terminal.

-The voice dialogue device 1 is not limited to being used in a vehicle, and may be used in other systems and devices.

The controller 2 (voice recognition unit 5, guidance output unit 7, gesture recognition unit 10, determination unit 13, and/or stop unit 14) shown in FIG. 1, which constitutes the voice dialogue device 1, can be constructed as a computer system including one or more processors and a non-transitory memory storing instructions that are executable by the processors and that realize the voice dialogue processing according to any of the above embodiment and other examples. Similarly, the input device 35 of FIG. 8 and the output device 36 of FIG. 9 can also be constructed as such a computer system. Alternatively, the controller 2, the input device 35, and the output device 36 may be configured with dedicated hardware such as an application specific integrated circuit (ASIC).

The present disclosure includes the following embodiments.
(Embodiment 1)
A computer system comprising:
one or more processors; and
a non-transitory memory storing instructions that are executable by the processors and that realize voice dialogue processing,
wherein the voice dialogue processing includes:
recognizing voice for a voice dialogue;
recognizing a gesture associated with the operation of the voice dialogue based on a detection signal from a detection unit that detects the movement of a user;
determining, after the start of the voice dialogue, whether or not a gesture indicating an intention to suspend the operation of the voice dialogue has been performed; and
pausing the operation of the voice dialogue when the gesture indicating the intention to suspend the operation of the voice dialogue has been performed.
(Embodiment 2)
The computer system of embodiment 1, wherein the operation of the voice dialogue includes voice recognition based on voice input, and
pausing the operation of the voice dialogue includes pausing the voice recognition based on the voice input in response to recognition of a first gesture.
(Embodiment 3)
The computer system of embodiment 2, wherein the voice dialogue processing further includes releasing the pause of the voice recognition based on the voice input in response to an utterance by the user.
(Embodiment 4)
The computer system according to any one of embodiments 1 to 3, wherein the operation of the voice dialogue includes voice output based on the voice recognition result, and
pausing the operation of the voice dialogue includes pausing the voice output based on the voice recognition result in response to recognition of a second gesture.
(Embodiment 5)
The computer system of embodiment 4, wherein the voice dialogue processing further includes releasing the pause of the voice output based on the voice recognition result in response to recognition of a third gesture.
(Embodiment 6)
The computer system of embodiment 5, wherein the third gesture is the same as the second gesture.

Claims

1. A voice dialogue device comprising:
a determination unit that determines, based on the recognition result of a recognition unit that recognizes the intention of an action indicated by the user moving the body, whether or not there is an intention to suspend the operation of a voice dialogue; and
a stop unit that stops the operation of the voice dialogue for a specified time when the determination unit determines that there is the intention to suspend the operation of the voice dialogue.
2. The voice dialogue device according to claim 1, wherein the operation of the voice dialogue is voice input, and
the stop unit stops the voice input for a specified time when the determination unit determines that there is an intention to stop the voice input.
3. The voice dialogue device according to claim 1 or 2, wherein the operation of the voice dialogue is voice output, and
the stop unit stops the voice output for a specified time when the determination unit determines that there is an intention to stop the voice output.
4. The voice dialogue device according to any one of claims 1 to 3, wherein the stop unit cancels the stop when a stop release condition is satisfied during the period in which the operation of the voice dialogue is suspended.
5. The voice dialogue device according to claim 4, wherein the stop unit treats the stop release condition as satisfied when the determination unit determines that there is an intention to release the stop of the operation of the voice dialogue.
6. An input device comprising:
a determination unit that determines, based on the recognition result of a recognition unit that recognizes the intention of an action indicated by the user moving the body, whether or not there is an intention to suspend the input of a voice dialogue; and
a stop unit that stops the input of the voice dialogue for a predetermined time when the determination unit determines that there is the intention to suspend the input of the voice dialogue.
7. An output device comprising:
a determination unit that determines, based on the recognition result of a recognition unit that recognizes the intention of an action indicated by the user moving the body, whether or not there is an intention to suspend the output of a voice dialogue; and
a stop unit that stops the output of the voice dialogue for a specified time when the determination unit determines that there is the intention to suspend the output of the voice dialogue.