WO2020195457A1 - Speech interaction device, input device, and output device - Google Patents

Speech interaction device, input device, and output device

Info

Publication number
WO2020195457A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
unit
voice dialogue
input
output
Prior art date
Application number
PCT/JP2020/007447
Other languages
French (fr)
Japanese (ja)
Inventor
沙織 岩田
Original Assignee
株式会社東海理化電機製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社東海理化電機製作所
Priority to CN202080016426.XA (publication CN113544771A)
Publication of WO2020195457A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems

Definitions

  • The present invention relates to a voice dialogue device, an input device, and an output device that interact by voice.
  • An object of the present invention is to provide a voice dialogue device, an input device, and an output device that enable smooth voice dialogue.
  • The voice dialogue device according to one embodiment includes a determination unit that determines, based on the recognition result of a recognition unit that recognizes the intention of a motion the user indicates by moving the body, whether there is an intention to pause the voice dialogue operation, and a stop unit that stops the voice dialogue operation for a specified time when the determination unit determines that there is such an intention.
  • The input device according to one embodiment includes a determination unit that determines, based on the recognition result of a recognition unit that recognizes the intention of a motion the user indicates by moving the body, whether there is an intention to pause the input of the voice dialogue, and a stop unit that stops the input of the voice dialogue for a specified time when the determination unit determines that there is such an intention.
  • The output device according to one embodiment includes a determination unit that determines, based on the recognition result of a recognition unit that recognizes the intention of a motion the user indicates by moving the body, whether there is an intention to pause the output of the voice dialogue, and a stop unit that stops the output of the voice dialogue for a specified time when the determination unit determines that there is such an intention.
  • FIG. 1 is a block diagram of the voice dialogue device of one embodiment. FIG. 2 shows an arrangement example of a camera. FIG. 3 is a flowchart executed when voice input is paused. FIG. 4 shows a gesture that pauses voice input. FIG. 5 is a flowchart executed when voice output is paused. FIG. 6 shows a gesture that pauses voice output. FIG. 7 shows a detection unit of another example.
  • the voice dialogue device 1 supports the driving of the vehicle through interactive exchange of intentions with the occupants such as the driver.
  • the voice dialogue device 1 supports the driving of a vehicle by providing various information related to driving by voice and controlling an in-vehicle device through voice dialogue with an occupant such as a driver.
  • the voice dialogue device 1 includes a controller 2 that controls the operation of the voice dialogue device 1, a sound collecting unit 3 that collects sound, and a sound output unit 4 that outputs sound.
  • the sound collecting unit 3 includes, for example, a microphone.
  • the sound output unit 4 includes, for example, an in-vehicle speaker.
  • the controller 2 understands the content spoken by the other party by voice recognition based on the voice data Da input from the sound collecting unit 3, and executes voice output such as voice guidance from the sound output unit 4 as a response to the content.
  • The controller 2 includes a voice recognition unit 5 that performs voice recognition on the voice data Da input from the sound collecting unit 3, a search database 6 that stores the search data used in the voice recognition determination, and a guidance output unit 7 that outputs voice guidance according to the result of voice recognition.
  • the voice recognition unit 5 recognizes the input voice by comparing the voice data Da with the search database 6 and analyzing the voice data Da.
  • the voice recognition unit 5 outputs the voice recognition result to the guidance output unit 7.
  • the guidance output unit 7 outputs the corresponding output data Db to the sound output unit 4 based on the input voice recognition result.
  • the sound output unit 4 outputs voice guidance according to the voice recognition result based on the input output data Db.
  • The voice dialogue device 1 has a function of stopping the voice dialogue partway through, a so-called operation pause function. This is because smooth voice dialogue is difficult while driving: for example, a driver concentrating on driving may miss the voice guidance or stumble during voice input.
  • The voice dialogue device 1 includes a gesture recognition unit 10 that recognizes a gesture performed by an occupant such as a driver using a part of the body.
  • The gesture recognition unit 10 is provided in the controller 2.
  • The gesture recognition unit 10 receives a detection signal St from a detection unit 11 capable of detecting the body movement of an occupant such as a driver, and recognizes a gesture based on the detection signal St.
  • the detection unit 11 includes a camera 12 that captures an image. It is preferable that the camera 12 is arranged at a position where it is easy to photograph the driver, for example, on the upper surface of the instrument panel of the vehicle.
  • the camera 12 outputs the captured surrounding image data St1 to the controller 2.
  • the gesture recognition unit 10 recognizes the gesture reflected in the camera 12 by performing image analysis of the image data St1 input from the camera 12, for example.
  • the voice dialogue device 1 includes a determination unit 13 for determining whether or not the user has an intention to suspend the operation of the voice dialogue.
  • the determination unit 13 is provided in the controller 2.
  • The determination unit 13 of this example determines whether there is an intention to suspend the operation of the voice dialogue based on the recognition result of the recognition unit 15, which recognizes the intention of the motion the user indicates by moving the body.
  • the recognition unit 15 is preferably a gesture recognition unit 10 that recognizes the movement of the body as a manifestation of intention through the user's body. Further, the determination unit 13 determines whether or not there is a gesture to suspend these operations in both the input and the output of the voice dialogue.
  • the voice dialogue device 1 includes a stop unit 14 that stops the operation of the voice dialogue based on the determination result of the determination unit 13.
  • The stop unit 14 is provided in the controller 2. When the determination unit 13 determines that there is an intention to suspend the operation of the voice dialogue, the stop unit 14 stops the operation of the voice dialogue for a predetermined time.
  • the stop unit 14 of this example can suspend these operations at both the input and output of the voice dialogue.
  • the stop unit 14 cancels the stop when the condition for canceling the stop is satisfied while the operation of the voice dialogue is paused.
  • The condition for canceling the stop is, for example, that the user resumes speaking while the voice input is stopped.
  • The stop release condition is, for example, the execution of the same gesture as the gesture performed when pausing the voice guidance.
  • When the system is activated, the voice dialogue device 1 starts executing the voice input pause flowchart shown in the figure.
  • the system activation includes, for example, a start instruction by voice input, an operation of a switch provided around the driver's seat, and a start operation in a car navigation system.
  • In step 101, when the determination unit 13 detects an utterance, it starts determining whether to suspend the input of the voice dialogue (hereinafter referred to as voice input).
  • The determination unit 13 of this example detects the utterance based on the voice data Da input from the sound collecting unit 3.
  • In step 102, the determination unit 13 determines whether the gesture recognition unit 10 has recognized the gesture of pausing the voice input.
  • The gesture recognition unit 10 recognizes the driver's gesture by, for example, monitoring the image data St1 of the camera 12 serving as the detection unit 11. The determination unit 13 then monitors whether there is a gesture of pausing the voice input based on the result of this image recognition.
  • The gesture of pausing voice input is preferably a motion that interrupts a conversation, for example a "wait" motion.
  • The motion that interrupts the conversation is preferably, for example, holding out an open hand.
  • If the gesture of pausing the voice input is recognized in step 102, the process proceeds to step 103. If it is not recognized in step 102, the process proceeds to step 105.
  • In step 103, the stop unit 14 suspends the voice input. The input of the voice dialogue can therefore be suspended when the driver wants to concentrate on driving. At this time, the stop unit 14 suspends the voice input while keeping the microphone, which is the sound collecting unit 3, activated. This allows a later utterance to be detected.
  • In step 104, when the stop unit 14 detects an utterance during the period in which the voice input is paused, it cancels the pause of the voice input. That is, if the driver speaks in order to resume the voice input while it is paused, the pause is released. This makes it possible to resume voice input.
  • The utterance at this time may have any content as long as it can be recognized as voice input.
  • In step 105, the voice dialogue device 1 determines whether the utterance is completed, that is, whether the voice input is completed.
  • When the voice dialogue device 1 determines that the utterance is finished, it ends the voice input. When it determines that the utterance is not finished, the process returns to step 102 and repeats the above-described processing.
  • The voice recognition unit 5 analyzes the voice input by checking the voice data Da obtained from the voice input against the search database 6. The voice recognition unit 5 then outputs the voice recognition result to the guidance output unit 7.
  • the guidance output unit 7 operates the sound output unit 4 based on the voice recognition result input from the voice recognition unit 5, and executes the voice guidance.
  • When the voice dialogue device 1 executes the voice guidance output (hereinafter referred to as voice output) from the sound output unit 4, it starts executing the flowchart shown in the figure.
  • In step 201, the determination unit 13 determines whether the gesture recognition unit 10 has recognized the gesture of pausing the voice output.
  • The gesture recognition unit 10 recognizes the driver's gesture by, for example, monitoring the image data St1 of the camera 12 serving as the detection unit 11. The determination unit 13 then monitors whether there is a gesture of pausing the voice output based on the result of this image recognition.
  • The gesture of pausing the voice output is preferably a motion that asks for silence, for example a "be quiet" motion.
  • The motion that asks for silence is preferably, for example, raising the index finger to the mouth.
  • If the gesture of pausing the voice output is recognized in step 201, the process proceeds to step 202. If it is not recognized in step 201, the process proceeds to step 205.
  • In step 202, the stop unit 14 suspends the voice guidance. The voice guidance can therefore be paused when the driver wants to concentrate on driving.
  • In step 203, the determination unit 13 determines whether the gesture recognition unit 10 has recognized the gesture of canceling the pause of the voice output. The pause release gesture is preferably, for example, the same motion as the one used to pause the voice output. When the determination unit 13 recognizes the pause release gesture, the process proceeds to step 204. If the determination unit 13 does not recognize the pause release gesture, the process returns to step 202 and the voice output remains paused.
  • In step 204, the stop unit 14 resumes the voice guidance. The user can therefore listen to the rest of the voice guidance that was stopped partway through.
  • In step 205, the voice dialogue device 1 determines whether the voice guidance has ended, that is, whether the voice output has ended.
  • When the voice dialogue device 1 determines that the voice guidance has ended, it ends the voice output. When it determines that the voice guidance has not yet ended, the process returns to step 201 and repeats the above-described processing.
  • the determination unit 13 of the voice dialogue device 1 determines whether or not there is an intention to suspend the operation of the voice dialogue based on the recognition result of the gesture recognition unit 10 during the operation of the voice dialogue.
  • The stop unit 14 stops the voice dialogue operation for a specified time. Therefore, if a user such as a driver cannot concentrate on the voice dialogue, the user can pause it with a gesture and continue it once conditions allow the user to concentrate on it again. The voice dialogue can thus be made smooth.
  • the driver can perform a smooth voice dialogue according to the driving situation. The driver can input and listen to the voice dialogue at his own pace.
  • the voice dialogue operation to be paused is voice input.
  • the stop unit 14 stops the voice input for a predetermined time when it is determined that there is an intention to stop the voice input. Therefore, it is possible to appropriately shift the voice input to the paused state according to the user's intention, so that it is possible to concentrate on other work at the time of voice input.
  • the operation of the voice dialogue to be paused is the output of voice.
  • the stop unit 14 stops the audio output for a predetermined time when it is determined that there is an intention to stop the audio output. Therefore, it is possible to appropriately shift the voice output to the paused state according to the user's intention, so that it is possible to concentrate on other work at the time of voice output.
  • A gesture is a movement that uses a part of the user's body. Therefore, the operation of the voice dialogue can be shifted to the paused state in an easy-to-understand manner using the movement of the user's body.
  • The stop unit 14 releases the stop when the stop release condition is satisfied during the period in which the voice dialogue operation is suspended. Therefore, even if the voice dialogue is stopped by the specified gesture, the stopped state can be released and the original normal state can be restored.
  • The stop release condition is satisfied when the determination unit 13 determines that there is an intention to release the stop of the voice dialogue operation. Therefore, the paused voice dialogue operation can be restored to the original state by a simple operation in which the user moves the body.
  • the detection unit 11 may be a touch pad 32 provided on the steering wheel 31 mounted on the vehicle.
  • The touch pad 32 may use various sensor types, such as capacitive and resistive film sensors. Gestures using the touch pad 32 include, for example, the number of touch operations on the touch pad 32, the timing of the touch operations, the direction in which the surface of the touch pad 32 is traced, and combinations thereof. In this way, the operation of the voice dialogue can be paused by the simple method of operating the touch pad 32 with a finger.
  • the operation pause function may be applied to, for example, an input device 35 that executes only voice input.
  • the input device 35 includes, for example, a voice recognition unit 5, a search database 6, a gesture recognition unit 10, a determination unit 13, and a stop unit 14.
  • The input device 35 is not limited to an input device for voice dialogue, and may be used as an input device for other equipment or devices. Using such an input device 35 makes it possible to provide an input device 35 whose voice input can be appropriately paused by a user's gesture.
  • the operation pause function may be applied to, for example, an output device 36 that executes only audio output.
  • the output device 36 includes, for example, a guidance output unit 7, a gesture recognition unit 10, a determination unit 13, and a stop unit 14.
  • The output device 36 is not limited to an output device for voice dialogue, and may be used as an output device for other equipment or devices. Using such an output device 36 makes it possible to provide an output device 36 whose voice output can be appropriately paused by a user's gesture.
  • [About voice dialogue] Voice dialogue is not limited to communication in which only voice is exchanged, and may include, for example, a gesture made with a part of the body.
  • The voice dialogue may use an existing in-vehicle device, or may use a component separate from the in-vehicle equipment.
  • Voice dialogue may take the form of interacting with a robot, for example.
  • Various designs can be applied to this robot, such as humanoid designs and designs modeled on animals.
  • the gesture for suspending the operation of the voice dialogue may be changed to various modes such as changing the facial expression and shaking the head from side to side.
  • the gesture for suspending the operation of the voice dialogue is not limited to the format described in the embodiment, and may be any format that can convey the suspension to the vehicle side.
  • The gesture that suspends the operation of the voice dialogue may be combined with voice.
  • the gesture that suspends the operation of the voice dialogue may be configured so that it can be freely registered or changed.
  • The specified time for pausing the voice dialogue is not limited to a fixed time, and may take various lengths depending on the method of returning from the pause.
  • the pause of the voice dialogue may be a mode in which the input and output of the voice dialogue are temporarily stopped, and the operating state of various devices for constructing the voice dialogue device 1 is not particularly limited.
  • The condition for canceling the suspension may be, for example, a mode in which a return is instructed by voice.
  • the suspension release condition may be, for example, an operation of a switch provided on the vehicle or an operation of touching a sensor mounted on the vehicle.
  • the pause of voice dialogue may automatically return to the original normal state after stopping for a specified time. In this case, it is possible to prevent the operation of the voice dialogue from being left in a paused state.
  • The user may be asked, by voice or the like, whether it is acceptable to return to the original normal state, and the original normal state may be restored if the user permits the return.
  • the manifestation of intention through the user's body is not limited to gestures that are physical gestures, and may be, for example, voice instructions.
  • the manifestation of intention through the user's body includes a mode in which a part of the body is moved without having the body itself move, for example, the movement of the line of sight.
  • the recognition unit 15 is not limited to the gesture recognition unit 10, and may include the voice recognition unit 5 when, for example, the voice is also monitored as a pause condition.
  • the recognition unit 15 may be a voice recognition unit 5 instead of the gesture recognition unit 10.
  • the recognition unit 15 is not limited to the voice recognition unit 5 and the gesture recognition unit 10, and may be capable of recognizing other physical movements.
  • the detection unit 11 is not limited to the on-board device or component, and may be a terminal such as a high-performance mobile phone.
  • the user's gestures and voices may be collected by using the camera function and the microphone function provided in the terminal.
  • the voice dialogue device 1 is not limited to being used in a vehicle, and may be used in other systems and devices.
  • The controller 2 (voice recognition unit 5, guidance output unit 7, gesture recognition unit 10, determination unit 13, and/or stop unit 14) shown in FIG. 1, which constitutes the voice dialogue device 1, can be constructed as a computer system including one or more processors and a non-transitory memory that stores instructions executable by the processors for realizing the voice dialogue processing according to any of the above embodiment and other examples. Similarly, the input device 35 of FIG. 8 and the output device 36 of FIG. 9 can also be constructed as such a computer system. Alternatively, the controller 2, the input device 35, and the output device 36 may be configured with dedicated hardware such as an application specific integrated circuit (ASIC).
  • the present disclosure includes the following embodiments.
  • (Embodiment 1) A computer system including one or more processors and a non-transitory memory that stores instructions executable by the processors for realizing voice dialogue processing.
  • The voice dialogue processing includes: recognizing voice for a voice dialogue; recognizing a gesture associated with an operation of the voice dialogue based on a detection signal from a detection unit that detects the movement of the user; determining whether a gesture indicating the intention to suspend the operation of the voice dialogue has been performed after the start of the voice dialogue; and pausing the operation of the voice dialogue when a gesture indicating the intention to suspend the operation of the voice dialogue is performed.
  • (Embodiment 2) The voice dialogue operation includes voice recognition based on voice input, and suspending the operation of the voice dialogue includes suspending the voice recognition based on the voice input in response to the recognition of a first gesture.
  • (Embodiment 3) The computer system of embodiment 2, wherein the voice dialogue processing further includes releasing the pause in voice recognition based on the voice input in response to a user's utterance.
  • (Embodiment 4) The voice dialogue operation includes voice output based on the voice recognition result, and pausing the operation of the voice dialogue includes pausing the voice output based on the voice recognition result in response to the recognition of a second gesture.
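The two pause flowcharts described in the list above (steps 101 to 105 for voice input, steps 201 to 205 for voice output) can be read as small event loops. The following Python sketch is only an illustration under stated assumptions, not the implementation disclosed in the publication: the `Event` names, the function names, and the returned log strings are all hypothetical, and each "event" stands for one observation from the gesture recognition unit or the sound collecting unit.

```python
from enum import Enum, auto


class Event(Enum):
    """Hypothetical per-tick observations fed to the flowcharts."""
    NONE = auto()            # nothing noteworthy happened
    PAUSE_GESTURE = auto()   # the pause (or pause-release) gesture was recognized
    UTTERANCE = auto()       # the sound collecting unit detected speech
    FINISHED = auto()        # the utterance or the guidance has ended


def run_voice_input(events):
    """Sketch of steps 102-105: pause voice input on a gesture, resume on speech."""
    paused = False
    log = []
    for ev in events:
        if paused:
            # Step 104: the microphone stays active, so a new utterance lifts the pause.
            if ev is Event.UTTERANCE:
                paused = False
                log.append("resume_input")
            continue
        if ev is Event.PAUSE_GESTURE:      # step 102 -> step 103
            paused = True
            log.append("pause_input")
        elif ev is Event.FINISHED:         # step 105
            log.append("end_input")
            break
    return log


def run_voice_output(events):
    """Sketch of steps 201-205: the same gesture pauses and resumes the guidance."""
    paused = False
    log = []
    for ev in events:
        if ev is Event.PAUSE_GESTURE:
            # Step 201 -> 202 while playing, step 203 -> 204 while paused.
            paused = not paused
            log.append("pause_output" if paused else "resume_output")
        elif ev is Event.FINISHED and not paused:   # step 205
            log.append("end_output")
            break
    return log
```

For example, feeding `run_voice_input` a pause gesture, an idle tick, an utterance, and a finish event yields a pause, a resume, and an end entry in that order. The asymmetry between the two functions mirrors the text: input resumes on any utterance, while output resumes on a repeated gesture.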

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A speech interaction device (1) is provided with: a determination unit (13) that, on the basis of the result of recognition performed by a recognition unit (10) for recognizing the purpose of a motion indicated by a user's body movement, determines whether or not there is an intention of temporarily stopping a speech interactive operation; and a stop unit (14) that, when it is determined as a result of the determination performed by the determination unit (13) that there is the intention of temporarily stopping the speech interactive operation, stops the speech interactive operation for a prescribed time.

Description

Voice dialogue device, input device, and output device
The present invention relates to a voice dialogue device, an input device, and an output device that interact by voice.
Conventionally, voice dialogue devices in which a person and a computer exchange intentions through voice dialogue are well known (see, for example, Patent Document 1).
Patent Document 1: Japanese Laid-Open Patent Publication No. 2018-189984 (JP-A-2018-189984)
However, at present, input and output in a voice dialogue cannot be stopped partway through once they have started. Therefore, when the user of a voice dialogue performs the voice dialogue and another task at the same time, the user cannot necessarily concentrate on the voice dialogue, and the voice dialogue may not proceed smoothly.
An object of the present invention is to provide a voice dialogue device, an input device, and an output device that enable smooth voice dialogue.
A voice dialogue device according to one embodiment includes: a determination unit that determines, based on the recognition result of a recognition unit that recognizes the intention of a motion the user indicates by moving the body, whether there is an intention to pause the operation of the voice dialogue; and a stop unit that stops the operation of the voice dialogue for a specified time when the determination unit determines that there is an intention to pause the operation of the voice dialogue.
An input device according to one embodiment includes: a determination unit that determines, based on the recognition result of a recognition unit that recognizes the intention of a motion the user indicates by moving the body, whether there is an intention to pause the input of the voice dialogue; and a stop unit that stops the input of the voice dialogue for a specified time when the determination unit determines that there is an intention to pause the input of the voice dialogue.
An output device according to one embodiment includes: a determination unit that determines, based on the recognition result of a recognition unit that recognizes the intention of a motion the user indicates by moving the body, whether there is an intention to pause the output of the voice dialogue; and a stop unit that stops the output of the voice dialogue for a specified time when the determination unit determines that there is an intention to pause the output of the voice dialogue.
FIG. 1 is a block diagram of the voice dialogue device of one embodiment.
FIG. 2 is a diagram showing an arrangement example of a camera.
FIG. 3 is a flowchart executed when voice input is paused.
FIG. 4 is a diagram showing a gesture that pauses voice input.
FIG. 5 is a flowchart executed when voice output is paused.
FIG. 6 is a diagram showing a gesture that pauses voice output.
FIG. 7 is a diagram showing a detection unit of another example.
FIG. 8 is a block diagram of an input device of another example.
FIG. 9 is a block diagram of an output device of another example.
Hereinafter, an embodiment of the voice dialogue device, the input device, and the output device will be described with reference to FIGS. 1 to 6.
As shown in FIG. 1, the voice dialogue device 1 supports the driving of a vehicle by exchanging intentions in dialogue form with an occupant such as the driver. Through voice dialogue with the occupant, the voice dialogue device 1 supports driving by, for example, providing various driving-related information by voice and controlling in-vehicle devices.
The voice dialogue device 1 includes a controller 2 that controls the operation of the voice dialogue device 1, a sound collecting unit 3 that collects sound, and a sound output unit 4 that outputs sound. The sound collecting unit 3 is, for example, a microphone. The sound output unit 4 is, for example, an in-vehicle speaker. Based on the voice data Da input from the sound collecting unit 3, the controller 2 understands what the speaker says by voice recognition and, as a response, executes voice output such as voice guidance from the sound output unit 4.
The controller 2 includes a voice recognition unit 5 that performs voice recognition on the voice data Da input from the sound collecting unit 3, a search database 6 that stores the search data used in the voice recognition determination, and a guidance output unit 7 that outputs voice guidance according to the result of voice recognition. When the voice data Da is input from the sound collecting unit 3, the voice recognition unit 5 recognizes the input voice by checking the voice data Da against the search database 6 and analyzing it. After voice recognition, the voice recognition unit 5 outputs the voice recognition result to the guidance output unit 7. Based on the input voice recognition result, the guidance output unit 7 outputs the corresponding output data Db to the sound output unit 4. Based on the input output data Db, the sound output unit 4 outputs voice guidance according to the voice recognition result.
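The data flow in the controller 2 (voice data Da, lookup against the search database 6, recognition result, output data Db) can be sketched as follows. This is a hedged illustration only: the class names, the database entries, the intent labels, and the guidance strings are all assumptions made for the example, and the voice data Da is represented as already-transcribed text rather than audio.

```python
from dataclasses import dataclass, field


@dataclass
class SearchDatabase:
    """Stand-in for search database 6: phrase -> intent (hypothetical entries)."""
    entries: dict = field(default_factory=lambda: {
        "navigate home": "ROUTE_HOME",
        "turn on the air conditioner": "AC_ON",
    })

    def lookup(self, phrase: str):
        # Normalize the phrase before checking it against the stored search data.
        return self.entries.get(phrase.strip().lower())


class VoiceRecognitionUnit:
    """Stand-in for voice recognition unit 5."""

    def __init__(self, database: SearchDatabase):
        self.database = database

    def recognize(self, voice_data_da: str):
        # A real unit would analyze audio; here Da is plain text for simplicity.
        return self.database.lookup(voice_data_da)


class GuidanceOutputUnit:
    """Stand-in for guidance output unit 7: recognition result -> output data Db."""

    GUIDANCE = {
        "ROUTE_HOME": "Starting route guidance to home.",
        "AC_ON": "Turning on the air conditioner.",
    }

    def output_data_db(self, result):
        return self.GUIDANCE.get(result, "Sorry, I did not understand.")


def controller(voice_data_da: str) -> str:
    """Da -> recognition result -> Db, following the order described for controller 2."""
    recognizer = VoiceRecognitionUnit(SearchDatabase())
    return GuidanceOutputUnit().output_data_db(recognizer.recognize(voice_data_da))
```

Calling `controller("Navigate home")` returns the guidance string registered for the hypothetical `ROUTE_HOME` intent, while an unregistered phrase falls back to the default reply. Keeping recognition and guidance in separate classes follows the unit boundaries in the text, which is also what lets either stage be paused independently.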
The voice dialogue device 1 has a function for stopping the voice dialogue partway through, a so-called operation pause function. This is because smooth voice dialogue is difficult while driving: for example, a driver who is concentrating on driving may miss the voice guidance or falter during voice input.
The voice dialogue device 1 includes a gesture recognition unit 10 that recognizes a gesture performed by an occupant, such as the driver, using a part of the body. The gesture recognition unit 10 is provided in the controller 2. The gesture recognition unit 10 receives a detection signal St from a detection unit 11 capable of detecting the body movement of an occupant such as the driver, and recognizes a gesture based on the detection signal St.
As shown in FIG. 2, the detection unit 11 includes a camera 12 that captures images. The camera 12 is preferably arranged at a position where the driver is easy to photograph, for example on the upper surface of the vehicle's instrument panel. The camera 12 outputs the captured image data St1 of its surroundings to the controller 2. The gesture recognition unit 10 recognizes a gesture captured by the camera 12 by, for example, analyzing the image data St1 input from the camera 12.
Returning to FIG. 1, the voice dialogue device 1 includes a determination unit 13 that determines whether the user intended to pause the operation of the voice dialogue. The determination unit 13 is provided in the controller 2. The determination unit 13 of this example makes this determination based on the recognition result of a recognition unit 15 that recognizes the intention of an action the user indicates by moving the body. The recognition unit 15 is preferably the gesture recognition unit 10, which recognizes body movement as an expression of intention made through the user's body. The determination unit 13 determines, for both the input and the output of the voice dialogue, whether a gesture to pause the operation has been made.
The voice dialogue device 1 includes a stop unit 14 that stops the operation of the voice dialogue based on the determination result of the determination unit 13. The stop unit 14 is provided in the controller 2. When the determination unit 13 determines that there was an intention to pause the operation of the voice dialogue, the stop unit 14 stops the operation of the voice dialogue for a specified time. The stop unit 14 of this example can pause both the input and the output of the voice dialogue.
While the operation of the voice dialogue is paused, the stop unit 14 releases the pause when the release condition is satisfied. For voice input, the release condition is preferably the resumption of speech during the pause. For voice output, the release condition is preferably, for example, the same gesture that was used to pause the voice guidance.
Next, the operation of the voice dialogue device 1 of the present embodiment will be described with reference to FIGS. 3 to 6.
As shown in FIG. 3, when a system start operation is performed, the voice dialogue device 1 starts executing the flowchart shown in that figure. System startup may be triggered by, for example, a start instruction given by voice input, operation of a switch provided near the driver's seat, or a start operation on the car navigation system.
In step 101, when the determination unit 13 detects an utterance, it starts determining whether the input of the voice dialogue (hereinafter referred to as voice input) should be paused. The determination unit 13 of this example detects the utterance based on the voice data Da input from the sound collecting unit 3.
In step 102, the determination unit 13 determines whether the gesture recognition unit 10 has recognized a gesture for pausing the voice input. In this example, the gesture recognition unit 10 recognizes the driver's gesture by, for example, monitoring the image data St1 of the camera 12 serving as the detection unit 11. The determination unit 13 then monitors, based on the result of this image recognition, whether a gesture for pausing the voice input has been made.
As shown in FIG. 4, the gesture for pausing the voice input is preferably a motion that interrupts the conversation, as if to say "wait". The interrupting motion is preferably, for example, holding out an open hand.
Returning to FIG. 3, if the gesture for pausing the voice input is recognized in step 102, the process proceeds to step 103. If the gesture is not recognized in step 102, the process proceeds to step 105.
When the determination unit 13 determines that the gesture for pausing the voice input has been made, the stop unit 14 pauses the voice input in step 103. This makes it possible to keep the input of the voice dialogue paused when, for example, the driver wants to concentrate on driving. At this time, the stop unit 14 pauses the voice input while keeping the microphone serving as the sound collecting unit 3 active, so that voice input can be detected again.
In step 104, when the stop unit 14 detects an utterance while the voice input is paused, it releases the pause. That is, if the driver speaks in order to resume voice input while the voice input is paused, the pause is released and voice input can resume. The utterance at this time may have any content as long as it can be recognized as voice input.
In step 105, the voice dialogue device 1 determines whether the utterance has ended, that is, whether the voice input has ended. If it determines that the utterance has ended, it ends the voice input; if it determines that the utterance has not yet ended, it returns to step 102 and repeats the processing described above.
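The flow of FIG. 3 (steps 101 to 105) can be illustrated with a small event-driven simulation. This is only a sketch under stated assumptions: the event names, the `silence_limit` parameter, and the idea that the utterance ends after a run of silence are all hypothetical, since the patent does not say how the end of an utterance is detected. The sketch also assumes that the utterance that releases the pause is itself accepted as input.

```python
def run_voice_input(events, silence_limit=2):
    """Simulate the voice input flow of FIG. 3 over (kind, payload) events.

    kind is "speech", "pause_gesture", or "silence" (illustrative names).
    Returns the speech fragments accepted as voice input.
    """
    accepted = []
    paused = False
    silent_ticks = 0
    for kind, payload in events:
        if kind == "pause_gesture":      # step 102 -> step 103: pause input
            paused = True                # the microphone stays active
            silent_ticks = 0
        elif kind == "speech":           # step 104: any utterance releases the pause
            paused = False
            silent_ticks = 0
            accepted.append(payload)
        elif kind == "silence":
            if paused:                   # silence does not end input while paused
                continue
            silent_ticks += 1
            if silent_ticks >= silence_limit:   # step 105: utterance has ended
                break
    return accepted
```

For example, with a pause gesture between two fragments, a long silence no longer ends the input:

```python
events = [("speech", "set destination"), ("pause_gesture", None),
          ("silence", None), ("silence", None), ("silence", None),
          ("speech", "to the station"), ("silence", None), ("silence", None)]
fragments = run_voice_input(events)  # both fragments are accepted
```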
The voice recognition unit 5 analyzes the voice input by referring to the search database 6 based on the input voice data Da. The voice recognition unit 5 then outputs the resulting voice recognition result to the guidance output unit 7. Based on the voice recognition result input from the voice recognition unit 5, the guidance output unit 7 operates the sound output unit 4 to deliver the voice guidance.
As shown in FIG. 5, when the voice dialogue device 1 outputs voice guidance (hereinafter referred to as voice output) from the sound output unit 4, it starts executing the flowchart shown in that figure.
In step 201, the determination unit 13 determines whether the gesture recognition unit 10 has recognized a gesture for pausing the voice output. The gesture recognition unit 10 recognizes the driver's gesture by, for example, monitoring the image data St1 of the camera 12 serving as the detection unit 11. The determination unit 13 then monitors, based on the result of this image recognition, whether a gesture for pausing the voice output has been made.
As shown in FIG. 6, the gesture for pausing the voice output is preferably a motion that asks for silence, as if to say "quiet". The motion asking for silence is preferably, for example, raising the index finger to the lips.
Returning to FIG. 5, if the gesture for pausing the voice output is recognized in step 201, the process proceeds to step 202. If the gesture is not recognized in step 201, the process proceeds to step 205.
When the determination unit 13 determines that the gesture for pausing the voice output has been made, the stop unit 14 pauses the voice guidance in step 202. This makes it possible to keep the voice guidance paused when, for example, the driver wants to concentrate on driving.
In step 203, the determination unit 13 determines whether the gesture recognition unit 10 has recognized a gesture for releasing the pause of the voice output. The release gesture is preferably, for example, the same motion that was used to pause the voice output. If the determination unit 13 recognizes the release gesture, the process proceeds to step 204. If it does not, the process returns to step 202 and the voice output remains paused.
When the determination unit 13 recognizes that the gesture for releasing the pause of the voice output has been made, the stop unit 14 resumes the voice guidance in step 204. The user can therefore listen to the rest of the voice guidance that was stopped partway through.
In step 205, the voice dialogue device 1 determines whether the voice guidance has ended, that is, whether the voice output has ended. If it determines that the voice guidance has ended, it ends the voice output; if it determines that the voice guidance has not yet ended, it returns to step 201 and repeats the processing described above.
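The voice output flow of FIG. 5 (steps 201 to 205) can be sketched in the same style. The sketch assumes that the guidance is split into segments, that one segment is spoken per tick, and that the same hypothetical "quiet" gesture both pauses and resumes output; none of these details or names are specified in the patent.

```python
def run_voice_output(segments, gestures_per_tick):
    """Simulate the voice output flow of FIG. 5.

    segments: guidance split into pieces (an assumption of this sketch).
    gestures_per_tick: the gesture seen at each tick, or None.
    Returns what was output at each tick (None while paused).
    """
    output = []
    paused = False
    position = 0
    for gesture in gestures_per_tick:
        if position >= len(segments):     # step 205: guidance has ended
            break
        if gesture == "quiet":            # steps 201 and 203: same gesture toggles
            paused = not paused
        if paused:                        # step 202: keep the guidance paused
            output.append(None)
            continue
        output.append(segments[position])  # step 204: continue where it stopped
        position += 1
    return output
```

Because `position` is preserved across the pause, the guidance resumes from the segment at which it was stopped rather than restarting.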
The voice dialogue device 1 of the above embodiment provides the following effects.
During a voice dialogue, the determination unit 13 of the voice dialogue device 1 determines, based on the recognition result of the gesture recognition unit 10, whether there was an intention to pause the operation of the voice dialogue. When the determination unit 13 determines that there was such an intention, the stop unit 14 of the voice dialogue device 1 stops the operation of the voice dialogue for a specified time. Therefore, when a user such as the driver cannot concentrate on the voice dialogue, the user can pause it with a gesture and continue it later, once the user is again able to concentrate on it. The voice dialogue thus proceeds smoothly. The driver can carry out a smooth voice dialogue suited to the driving situation, and can provide voice input and listen to voice output at the driver's own pace.
One voice dialogue operation that can be paused is voice input. When the determination unit 13 determines that there was an intention to stop the voice input, the stop unit 14 stops the voice input for a specified time. The voice input can thus be moved to the paused state as needed in accordance with the user's intention, so the user can concentrate on other tasks during voice input.
Another voice dialogue operation that can be paused is voice output. When the determination unit 13 determines that there was an intention to stop the voice output, the stop unit 14 stops the voice output for a specified time. The voice output can thus be moved to the paused state as needed in accordance with the user's intention, so the user can concentrate on other tasks during voice output.
The gesture is a movement made with a part of the user's body. The operation of the voice dialogue can therefore be moved to the paused state in an easily understood manner that uses the user's body movement.
While the operation of the voice dialogue is paused, the stop unit 14 releases the pause when the release condition is satisfied. Therefore, even after the voice dialogue has been stopped by the prescribed gesture, the stopped state can be released and the original normal state restored.
The stop unit 14 treats the release condition as satisfied when the determination unit 13 determines that there was an intention to release the pause of the voice dialogue operation. The paused voice dialogue operation can therefore be restored to its original state by the simple act of the user moving the body.
The present embodiment can be modified as follows. The present embodiment and the following modifications can be combined with one another as long as no technical contradiction arises.
[About the voice dialogue device 1]
・As shown in FIG. 7, the detection unit 11 may be a touch pad 32 provided on the steering wheel 31 of the vehicle. The touch pad 32 is made up of a sensor of any of various types, such as a capacitive or resistive-film sensor. Gestures using the touch pad 32 include, for example, the number of touch operations on the touch pad 32, the timing of the touch operations, the direction in which the surface of the touch pad 32 is traced, and combinations of these. In this way, the operation of the voice dialogue can be paused by the simple method of operating the touch pad 32 with a finger.
・As shown in FIG. 8, the operation pause function may be applied to, for example, an input device 35 that performs only voice input. The input device 35 includes, for example, the voice recognition unit 5, the search database 6, the gesture recognition unit 10, the determination unit 13, and the stop unit 14. The input device 35 is not limited to an input device for voice dialogue and may be used as an input device for other equipment or devices. This provides an input device 35 whose voice input can be paused as needed by a user's gesture.
・As shown in FIG. 9, the operation pause function may be applied to, for example, an output device 36 that performs only voice output. The output device 36 includes, for example, the guidance output unit 7, the gesture recognition unit 10, the determination unit 13, and the stop unit 14. The output device 36 is not limited to an output device for voice dialogue and may be used as an output device for other equipment or devices. This provides an output device 36 whose voice output can be paused as needed by a user's gesture.
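For the touch pad variant of FIG. 7, gestures are distinguished by the number of touches and the tracing direction. One way this could be classified is sketched below. The function name, the stroke representation, the threshold value, and the returned labels are all hypothetical, and the sketch assumes screen-style coordinates in which y increases downward.

```python
def classify_touch(strokes, swipe_threshold=30.0):
    """Classify touch pad input into a simple gesture label.

    strokes: list of strokes, each a non-empty list of (x, y) points.
    Returns "none", "tap_xN" (N touches), or "swipe_<direction>".
    """
    if not strokes:
        return "none"
    if len(strokes) == 1:
        (x0, y0), (x1, y1) = strokes[0][0], strokes[0][-1]
        dx, dy = x1 - x0, y1 - y0
        # A single stroke that travels far enough counts as a swipe.
        if max(abs(dx), abs(dy)) >= swipe_threshold:
            if abs(dx) >= abs(dy):
                return "swipe_right" if dx > 0 else "swipe_left"
            return "swipe_down" if dy > 0 else "swipe_up"
    # Otherwise count the touches (one stroke per touch).
    return f"tap_x{len(strokes)}"
```

The determination unit could then map a label such as `"tap_x2"` to the intention to pause. Touch timing, which the text also mentions, is left out of this sketch.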
[About voice dialogue]
・Voice dialogue is not limited to communication in which only voice is exchanged and may include, for example, gestures made with a part of the body.
・Voice dialogue may use existing in-vehicle equipment or may use components separate from the in-vehicle equipment.
・Voice dialogue may take the form of, for example, a dialogue with a robot. Various designs can be used for the robot, such as an anthropomorphic design or one modeled on an animal.
[About the intention to pause the operation of the voice dialogue]
・The gesture for pausing the operation of the voice dialogue may be changed to various forms, such as a change in facial expression or shaking the head from side to side.
・The gesture for pausing the operation of the voice dialogue is not limited to the forms described in the embodiment and may be any form that can convey the pause to the vehicle.
・The gesture for pausing the operation of the voice dialogue may be combined with voice.
・The gesture for pausing the operation of the voice dialogue may be freely registered or changed.
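The last modification above, in which pause gestures can be registered or changed, could be realized with a small lookup table. This is only a sketch under stated assumptions: the class `GestureRegistry`, the operation keys `"input"` and `"output"`, and the gesture labels are hypothetical names chosen for illustration.

```python
class GestureRegistry:
    """Maps each voice dialogue operation to its pause gesture (illustrative)."""

    def __init__(self):
        # Defaults loosely follow FIG. 4 (open hand) and FIG. 6 (finger to lips).
        self._bindings = {"input": "open_hand", "output": "finger_to_lips"}

    def register(self, operation, gesture):
        """Register or change the pause gesture for 'input' or 'output'."""
        if operation not in self._bindings:
            raise ValueError(f"unknown operation: {operation!r}")
        self._bindings[operation] = gesture

    def is_pause_gesture(self, operation, gesture):
        """Return True if the gesture pauses the given operation."""
        return self._bindings.get(operation) == gesture
```

A user who prefers a head shake for pausing guidance would call `register("output", "head_shake")`, after which the default gesture no longer pauses the output.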
[About pausing the voice dialogue]
・The specified time for which the voice dialogue is paused is not limited to a fixed length and may take various lengths depending on how the dialogue returns from the pause.
・Pausing the voice dialogue only requires that the input or output of the voice dialogue be temporarily stopped; it does not place any particular limit on the operating state of the devices that make up the voice dialogue device 1.
[About the condition for releasing the pause]
・The condition for releasing the pause may be, for example, a spoken instruction to resume.
・The condition for releasing the pause may be, for example, operating a switch provided in the vehicle or touching a sensor mounted in the vehicle.
・The paused voice dialogue may automatically return to the original normal state after being stopped for the specified time. This prevents the voice dialogue from being left paused indefinitely.
・While the voice dialogue is paused, the device may ask the user, by voice or the like, whether it may return to the original normal state, and return to that state when the user permits it.
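The automatic return after a specified time, listed among the release conditions above, can be sketched with a timer. This is a minimal sketch: the class `TimedPause` and its interface are assumptions. The clock is passed in so the behavior can be checked without waiting; in practice the standard library's `time.monotonic` could be supplied.

```python
class TimedPause:
    """Pause state that releases itself after timeout_s seconds (illustrative)."""

    def __init__(self, timeout_s, clock):
        self.timeout_s = timeout_s
        self._clock = clock          # callable returning the current time in seconds
        self._paused_at = None       # None means "not paused"

    def pause(self):
        """Enter the paused state and remember when it started."""
        self._paused_at = self._clock()

    def release(self):
        """Explicit release, e.g. by a gesture, an utterance, or a switch."""
        self._paused_at = None

    @property
    def paused(self):
        """True while paused; flips to False once the timeout has elapsed."""
        if self._paused_at is None:
            return False
        if self._clock() - self._paused_at >= self.timeout_s:
            self._paused_at = None   # automatic return to the normal state
            return False
        return True
```

The other release conditions in the list, such as a spoken instruction or a switch operation, would simply call `release()`.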
[About expressions of intention made through the user's body]
・An expression of intention made through the user's body is not limited to a gesture, that is, a motion of the body or hands, and may be, for example, a spoken instruction.
・An expression of intention made through the user's body includes forms in which only a part of the body moves while the body as a whole stays still, such as movement of the line of sight.
[About the recognition unit 15]
・The recognition unit 15 is not limited to the gesture recognition unit 10 and may include the voice recognition unit 5 when, for example, voice is also monitored as a pause condition.
・The recognition unit 15 may be the voice recognition unit 5 instead of the gesture recognition unit 10.
・The recognition unit 15 is not limited to the voice recognition unit 5 or the gesture recognition unit 10 and may be any unit capable of recognizing other physical movements.
[Other]
・The detection unit 11 is not limited to devices or components mounted in the vehicle and may be a terminal such as a smartphone. In this case, the user's gestures and voice may be captured using, for example, the camera and microphone functions of the terminal.
・The voice dialogue device 1 is not limited to use in a vehicle and may be used in other systems and equipment.
・The controller 2 shown in FIG. 1 (the voice recognition unit 5, the guidance output unit 7, the gesture recognition unit 10, the determination unit 13, and/or the stop unit 14), which forms part of the voice dialogue device 1, can be built as a computer system that includes one or more processors and a non-transitory memory storing instructions executable by the processors to implement voice dialogue processing according to any of the above embodiment and modifications. Similarly, the input device 35 of FIG. 8 and the output device 36 of FIG. 9 can also be built as such a computer system. Alternatively, the controller 2, the input device 35, and the output device 36 may be implemented with dedicated hardware such as an application-specific integrated circuit (ASIC).
・The present disclosure encompasses the following embodiments.
(Embodiment 1)
A computer system comprising:
one or more processors; and
a non-transitory memory storing instructions that are executable by the processors to implement voice dialogue processing,
wherein the voice dialogue processing includes:
 recognizing voice for a voice dialogue;
 recognizing a gesture associated with an operation of the voice dialogue based on a detection signal from a detection unit that detects movement of a user;
 determining, after the voice dialogue has started, whether a gesture expressing an intention to pause the operation of the voice dialogue has been performed; and
 pausing the operation of the voice dialogue when the gesture expressing the intention to pause the operation of the voice dialogue has been performed.
(Embodiment 2)
The computer system of embodiment 1, wherein the operation of the voice dialogue includes voice recognition based on voice input, and
pausing the operation of the voice dialogue includes pausing the voice recognition based on the voice input in response to recognition of a first gesture.
(Embodiment 3)
The computer system of embodiment 2, wherein the voice dialogue processing further includes releasing the pause of the voice recognition based on the voice input in response to an utterance by the user.
(Embodiment 4)
The computer system of any one of embodiments 1 to 3, wherein the operation of the voice dialogue includes voice output based on a voice recognition result, and
pausing the operation of the voice dialogue includes pausing the voice output based on the voice recognition result in response to recognition of a second gesture.
(Embodiment 5)
The computer system of embodiment 4, wherein the voice dialogue processing further includes releasing the pause of the voice output based on the voice recognition result in response to recognition of a third gesture.
(Embodiment 6)
The computer system of embodiment 5, wherein the third gesture is the same as the second gesture.

Claims (7)

  1.  ユーザが身体を動かすことで示した動作の意図を認識する認識部の認識結果を基に、音声対話の動作を一時停止する旨の意思があったか否かを判定する判定部と、
     前記判定部での判定の結果、前記音声対話の動作を一時停止する旨の前記意思があったと判定された場合に、前記音声対話の動作を規定時間の間、停止する停止部と
    を備えた音声対話装置。
    Based on the recognition result of the recognition unit that recognizes the intention of the movement indicated by the user moving the body, the judgment unit that determines whether or not there is an intention to suspend the movement of the voice dialogue, and
    As a result of the determination by the determination unit, when it is determined that there is the intention to suspend the operation of the voice dialogue, the stop unit is provided to stop the operation of the voice dialogue for a specified time. Voice dialogue device.
  2.  前記音声対話の動作は、音声の入力であり、
     前記停止部は、前記判定部での判定の結果、前記音声の入力を停止する旨の前記意思があったと判定された場合、前記音声の入力を規定時間の間、停止する
    請求項1に記載の音声対話装置。
    The operation of the voice dialogue is voice input, and is
    The first aspect of claim 1 is that the stop unit stops the voice input for a specified time when it is determined as a result of the determination by the determination unit that the voice input is stopped. Voice dialogue device.
  3.  前記音声対話の動作は、音声の出力であり、
     前記停止部は、前記判定部での判定の結果、前記音声の出力を停止する旨の前記意思があったと判定された場合、前記音声の出力を規定時間の間、停止する
    請求項1又は2に記載の音声対話装置。
    The operation of the voice dialogue is the output of voice, and is
    Claim 1 or 2 that the stop unit stops the output of the voice for a specified time when it is determined as a result of the judgment by the judgment unit that the stop unit has the intention to stop the output of the voice. The voice dialogue device described in.
  4.  前記停止部は、前記音声対話の動作を一時停止させている期間に、前記停止の解除条件が満たされると、前記停止を解除する
    請求項1~3のうちいずれか一項に記載の音声対話装置。
    The voice dialogue according to any one of claims 1 to 3, wherein the stop unit cancels the stop when the stop release condition is satisfied during the period during which the operation of the voice dialogue is suspended. apparatus.
  5.  前記停止部は、前記判定部での判定の結果、前記音声対話の動作における停止を解除する前記意思があったと判定された場合を、前記停止の解除条件が満たされたものとする
    請求項4に記載の音声対話装置。
    4. Claim 4 that the stop release condition is satisfied when it is determined as a result of the determination by the determination unit that the stop unit has the intention to release the stop in the operation of the voice dialogue. The voice dialogue device described in.
  6.  ユーザが身体を動かすことで示した動作の意図を認識する認識部の認識結果を基に、音声対話の入力を一時停止する旨の意思があったか否かを判定する判定部と、
     前記判定部での判定の結果、前記音声対話の入力を一時停止する旨の前記意思があったと判定された場合に、前記音声対話の入力を規定時間の間、停止する停止部と
    を備えた入力装置。
    Based on the recognition result of the recognition unit that recognizes the intention of the movement indicated by the user moving the body, a determination unit that determines whether or not there is an intention to suspend the input of the voice dialogue, and
    As a result of the determination by the determination unit, when it is determined that there is the intention to suspend the input of the voice dialogue, the stop unit is provided to stop the input of the voice dialogue for a predetermined time. Input device.
  7.  An output device comprising:
     a determination unit that determines, based on a recognition result of a recognition unit that recognizes the intention of a movement indicated by a user moving their body, whether or not there was an intention to suspend the output of a voice dialogue; and
     a stop unit that stops the output of the voice dialogue for a predetermined time when, as a result of the determination by the determination unit, it is determined that there was the intention to suspend the output of the voice dialogue.
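
The timed stop and its release, as recited in the claims above, can be illustrated with a minimal sketch. It is not the applicant's implementation. The names Intent, StopUnit, hold_seconds and paused_until are hypothetical, the 10-second hold is an arbitrary example value, and the gesture recognition unit is assumed to exist elsewhere and to deliver an already-classified intent.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Intent(Enum):
    """Hypothetical intents that a gesture recognition unit might report."""
    NONE = auto()
    PAUSE = auto()
    RESUME = auto()


@dataclass
class StopUnit:
    """Suspends one dialogue channel (input or output) for a fixed time."""
    hold_seconds: float                    # the "predetermined time" (assumed value)
    paused_until: Optional[float] = None   # None means the channel is running

    def on_intent(self, intent: Intent, now: float) -> None:
        # Determination step: only recognised pause/resume intents have an effect.
        if intent is Intent.PAUSE:
            self.paused_until = now + self.hold_seconds
        elif intent is Intent.RESUME and self.paused_until is not None:
            # Release condition: the user signals that the stop should end.
            self.paused_until = None

    def is_paused(self, now: float) -> bool:
        # The stop also lapses on its own once the hold time has elapsed.
        if self.paused_until is not None and now >= self.paused_until:
            self.paused_until = None
        return self.paused_until is not None


# Usage: timestamps are passed in explicitly so the behaviour is deterministic.
output_stop = StopUnit(hold_seconds=10.0)
output_stop.on_intent(Intent.PAUSE, now=0.0)
print(output_stop.is_paused(now=5.0))    # True
output_stop.on_intent(Intent.RESUME, now=6.0)
print(output_stop.is_paused(now=6.0))    # False
```

One StopUnit instance per channel would mirror the separate input device and output device claims; whether a real system shares one timer between them is not stated in the claims.
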
PCT/JP2020/007447 2019-03-26 2020-02-25 Speech interaction device, input device, and output device WO2020195457A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202080016426.XA CN113544771A (en) 2019-03-26 2020-02-25 Voice conversation device, input device, and output device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-058651 2019-03-26
JP2019058651A JP2020160725A (en) 2019-03-26 2019-03-26 Audio interactive device, input device and output device

Publications (1)

Publication Number Publication Date
WO2020195457A1 (en) 2020-10-01

Family

ID=72608994

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/007447 WO2020195457A1 (en) 2019-03-26 2020-02-25 Speech interaction device, input device, and output device

Country Status (3)

Country Link
JP (1) JP2020160725A (en)
CN (1) CN113544771A (en)
WO (1) WO2020195457A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003114691A (en) * 2001-10-03 2003-04-18 Nissan Motor Co Ltd Text speech synthesizing device for vehicle
JP2010238145A (en) * 2009-03-31 2010-10-21 Casio Computer Co Ltd Information output device, remote control method and program
JP2016029466A (en) * 2014-07-16 2016-03-03 Panasonic Intellectual Property Corporation of America Control method of voice recognition and text creation system and control method of portable terminal

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE602004017955D1 (en) * 2004-01-29 2009-01-08 Daimler Ag Method and system for voice dialogue interface
CN102546795A (en) * 2011-12-31 2012-07-04 成都巴比塔网络技术股份有限公司 Client-server conversation persisting method based on user dialogue mode
JP6589514B2 (en) * 2015-09-28 2019-10-16 株式会社デンソー Dialogue device and dialogue control method
CN106648054B (en) * 2016-10-08 2019-07-16 河海大学常州校区 A kind of Multimodal interaction method of the company robot based on RealSense
TWI646529B (en) * 2017-07-25 2019-01-01 雲拓科技有限公司 Active chat device

Also Published As

Publication number Publication date
JP2020160725A (en) 2020-10-01
CN113544771A (en) 2021-10-22


Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20776616; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20776616; Country of ref document: EP; Kind code of ref document: A1)