WO2019058453A1 - Voice interaction control device and voice interaction control method for controlling voice interaction - Google Patents
- Publication number
- WO2019058453A1 (international application PCT/JP2017/033902)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- response
- speech
- unit
- user
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Definitions
- The present invention relates to a voice interaction control device and a voice interaction control method that, when a user operates a system through interaction between the user and the system, cause the system to present to the user a response corresponding to the voice input by the user.
- A system having a voice recognition function receives a voice uttered by a user and outputs a response corresponding to that voice.
- In Patent Document 1, when the user inputs an interrupting voice while the system is outputting voice, the voice output is either continued or paused depending on the importance of the voice being output.
- That is, a speech dialogue control method that processes the interrupting speech has been proposed.
- However, the method of Patent Document 1 cannot capture a subsequent second voice at certain timings, for example immediately after the end of the first voice is detected, that is, immediately after capture of the first voice ends.
- The present invention has been made to solve this problem, and aims to provide a voice interaction control device that performs interaction control so that the system can respond appropriately to a second voice input after a first voice.
- A voice interaction control device according to the present invention performs interaction control that causes the system to present to the user a response to voice input by the user, when the user operates the system through interaction between the user and the system.
- The device includes a voice section detection unit that detects a voice section extending from the beginning to the end of an input series of continuous voice; a voice recognition unit that recognizes the voice in the voice section; a response generation unit that generates a response corresponding to the voice recognition result, to be presented from the system to the user; and a dialogue control unit that controls the voice section detection unit, the voice recognition unit, and the response generation unit.
- The processing for the first voice includes the processing from detection of a first voice section, which forms a series of first voice, until a first response corresponding to the voice recognition result of the first voice is presented from the system to the user.
- Even if this processing for the first voice has not finished, the dialogue control unit causes the voice section detection unit to detect a second voice section, which forms a series of second voice input after the first voice, so that a second response to the second voice can be generated.
- According to the present invention, it is possible to provide a voice interaction control device that performs interaction control so that the system can respond appropriately to the second voice input after the first voice.
- FIG. 1 is a block diagram showing the configuration of the voice interaction control device and the system in Embodiment 1.
- FIG. 2 is a diagram showing an example of the processing circuit included in the voice interaction control device.
- FIG. 3 is a diagram showing another example of the processing circuit included in the voice interaction control device.
- FIG. 4 is a sequence chart showing an example of the operation of the voice interaction control device and the voice interaction control method according to Embodiment 1.
- FIG. 5 is a flowchart showing an example of the operation of the voice interaction control device and the voice interaction control method according to Embodiment 1.
- FIG. 6 is a block diagram showing the configuration of the voice interaction control device and the system in Embodiment 2.
- FIG. 7 is a diagram showing an example of the configuration of the system response database in Embodiment 2.
- FIG. 8 is a sequence chart showing an example of the operation of the voice interaction control device and the voice interaction control method according to Embodiment 2.
- FIG. 9 is a flowchart showing an example of the operation of the voice interaction control device and the voice interaction control method according to Embodiment 2.
- FIG. 10 is a block diagram showing the configuration of the voice interaction control device and the system in Embodiment 3.
- FIG. 11 is a sequence chart showing an example of the operation of the voice interaction control device and the voice interaction control method according to Embodiment 3.
- FIG. 12 is a flowchart showing an example of the operation of the voice interaction control device and the voice interaction control method according to Embodiment 3.
- FIG. 13 is a block diagram showing the configuration of the voice interaction control device and the system in Embodiment 4.
- FIG. 14 is a diagram showing an example of the configuration of the first dictionary database in Embodiment 4.
- FIG. 15 is a diagram showing an example of the configuration of the second dictionary database in Embodiment 4.
- FIG. 16 is a diagram showing an example of the configuration of the system response database in Embodiment 4.
- FIG. 17 is a sequence chart showing an example of the operation of the voice interaction control device and the voice interaction control method according to Embodiment 4.
- FIG. 18 is a flowchart showing an example of the operation of the voice interaction control device and the voice interaction control method according to Embodiment 4.
- FIG. 19 is a block diagram showing the configuration of the voice interaction control device and the system in Embodiment 5.
- FIG. 20 is a flowchart showing an example of the operation of the voice interaction control device and the voice interaction control method according to Embodiment 5.
- FIG. 21 is a flowchart showing an example of the operation of the voice interaction control device and the voice interaction control method according to Embodiment 6.
- FIG. 22 is a block diagram showing an example of the configuration of a voice interaction control device mounted on a vehicle in Embodiment 7.
- FIG. 23 is a block diagram showing an example of the configuration of a voice interaction control device provided in a server in Embodiment 7.
- Embodiment 1. A voice interaction control device and a voice interaction control method according to Embodiment 1 will be described.
- FIG. 1 is a block diagram showing the configuration of the voice interaction control device 100 and the system 200 in Embodiment 1.
- The system 200 receives a voice uttered by the user to operate the system 200, and presents a response to that voice to the user.
- The system 200 includes a voice input device 21, the voice interaction control device 100, and a response presentation device 22.
- The system 200 is, for example, a navigation system, an audio system, a control system that controls devices related to driving a vehicle, or a control system that controls the driving environment.
- The voice input device 21 is an interface through which the user operates the system 200.
- The voice input device 21 receives the voice uttered by the user to operate the system 200 and outputs it to the voice interaction control device 100.
- The voice input device 21 is, for example, a microphone.
- The voice interaction control device 100 receives voice from the voice input device 21 and performs interaction control that causes the system 200 to present a response corresponding to the voice to the user.
- The response presentation device 22 presents the response generated by the voice interaction control device 100 to the user. Note that "presenting" includes the response presentation device 22 operating in accordance with the generated response.
- When the system 200 is a navigation system, the response presentation device 22 is an audio output device or a display device.
- The audio output device presents a response by, for example, outputting guidance information to a destination by voice.
- The display device presents a response by, for example, displaying guidance information to a destination together with a map.
- When the system 200 is an audio system, the response presentation device 22 is a music playback device, which presents a response by playing music.
- When the system 200 controls devices related to driving the vehicle, the response presentation device 22 is a drive control device of the vehicle.
- When the system 200 controls the driving environment, the response presentation device 22 is an air conditioner, a light, a mirror position adjustment device, a seat position adjustment device, or the like.
- The voice interaction control device 100 includes a voice section detection unit 11, a voice recognition unit 12, a response generation unit 13, and a dialogue control unit 14.
- The voice section detection unit 11 detects a voice section extending from the beginning to the end of an input series of continuous voice.
- The voice section detection unit 11 constantly monitors the input for voice.
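The document does not say how the voice section detection unit 11 finds the beginning and end of a voice section. As a minimal sketch, assuming a simple short-time energy threshold with a "hangover" of quiet frames (a common technique, not necessarily the one used here), constant monitoring can be modeled as a single pass over frame energies. The function name, `threshold`, and `hangover` are hypothetical.

```python
from typing import List, Optional, Tuple


def detect_voice_sections(
    frame_energies: List[float],
    threshold: float = 0.5,
    hangover: int = 2,
) -> List[Tuple[int, int]]:
    """Return (beginning, end) frame indices of each voice section, end inclusive.

    Hypothetical energy-threshold detector: a section begins at the first frame
    whose energy reaches `threshold` and ends at the last such frame once
    `hangover` consecutive quieter frames follow it (or the input runs out).
    """
    sections: List[Tuple[int, int]] = []
    begin: Optional[int] = None  # beginning of the section being tracked
    last_voiced = -1             # most recent frame at or above threshold
    quiet_run = 0                # consecutive frames below threshold
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if begin is None:
                begin = i        # beginning of a voice section detected
            last_voiced = i
            quiet_run = 0
        elif begin is not None:
            quiet_run += 1
            if quiet_run >= hangover:
                sections.append((begin, last_voiced))  # end detected
                begin, last_voiced, quiet_run = None, -1, 0
    if begin is not None:        # input ended while voice was still active
        sections.append((begin, last_voiced))
    return sections
```

For example, `detect_voice_sections([0.0, 0.0, 0.9, 0.8, 0.1, 0.9, 0.0, 0.0, 0.0, 0.7, 0.7, 0.0, 0.0])` returns `[(2, 5), (9, 10)]`: because detection never pauses, the second section is found even though it starts shortly after the first one ends.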
- The voice recognition unit 12 performs voice recognition on the voice in the voice section detected by the voice section detection unit 11. In doing so, it selects as the recognition vocabulary the vocabulary that is acoustically or linguistically most probable for the voice in that voice section.
- The voice recognition unit 12 performs voice recognition by referring to, for example, a dictionary database (not shown).
- The dictionary database may be provided in the voice interaction control device 100 or in an external server. When it is provided in a server, the voice interaction control device 100 communicates with the server so that the voice recognition unit 12 can perform voice recognition with reference to the dictionary database.
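The selection of the "acoustically or linguistically most probable" vocabulary can be illustrated with a log-linear combination of an acoustic score and a language-model probability. This is an assumed scoring scheme chosen for illustration; the document gives no formula, and the candidate words and scores below are made up.

```python
import math
from typing import Dict


def select_vocabulary(
    acoustic_loglik: Dict[str, float],
    lm_prob: Dict[str, float],
    lm_weight: float = 1.0,
) -> str:
    """Return the candidate with the highest combined log score.

    acoustic_loglik: log-likelihood of the voice given each candidate word.
    lm_prob: prior probability of each candidate from a language model.
    lm_weight: how strongly linguistic plausibility counts (0 = acoustic only).
    """
    def score(word: str) -> float:
        return acoustic_loglik[word] + lm_weight * math.log(lm_prob[word])

    return max(acoustic_loglik, key=score)
```

With `acoustic_loglik = {"super": -10.0, "souper": -9.5}` and `lm_prob = {"super": 0.2, "souper": 0.01}`, the combined score favors `"super"`, while `lm_weight=0.0` falls back to the acoustically best `"souper"`. The weight makes the trade-off between the two kinds of evidence explicit.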
- The response generation unit 13 generates a response corresponding to the voice recognition result produced by the voice recognition unit 12.
- The response generation unit 13 generates the response by referring to, for example, a system response database (not shown).
- The system response database is, for example, a table in which each recognition vocabulary item contained in a voice recognition result is stored in association with a response.
- The system response database may be provided in the voice interaction control device 100 or in an external server. When it is provided in a server, the voice interaction control device 100 communicates with the server, and the response generation unit 13 generates the response with reference to the system response database.
- The response generation unit 13 outputs the response to the response presentation device 22.
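A table that associates recognition vocabulary with responses can be sketched as a dictionary lookup. The entries below are hypothetical, since the document does not list the actual table contents, and plain substring matching is an assumption made for brevity.

```python
from typing import Dict, Optional

# Hypothetical table contents; the document does not list the actual entries.
SYSTEM_RESPONSE_TABLE: Dict[str, str] = {
    "super": "Searching for supermarkets near the current position.",
    "music": "Starting music playback.",
}


def generate_response(
    recognition_result: str,
    table: Dict[str, str] = SYSTEM_RESPONSE_TABLE,
) -> Optional[str]:
    """Return the response for the first table vocabulary found in the
    recognition result, or None when no vocabulary item matches."""
    text = recognition_result.lower()
    for vocabulary, response in table.items():
        if vocabulary in text:
            return response
    return None
```

For instance, `generate_response("I want to go to the supermarket")` matches the vocabulary item `"super"`. A real system would match on the recognizer's word tokens rather than raw substrings, which can produce false matches.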
- The dialogue control unit 14 controls the operations of the voice section detection unit 11, the voice recognition unit 12, and the response generation unit 13.
- The dialogue control unit 14 controls each unit while monitoring the dialogue state of the system 200.
- The dialogue state is the state at any point from when a voice is detected by the voice section detection unit 11, through generation of the corresponding response, until that response is presented to the user.
- The dialogue control unit 14 controls the operation of the voice recognition unit 12 based on a notification that the voice section detection unit 11 has detected the beginning or the end of a voice section.
- Based on a notification that the voice recognition unit 12 has finished voice recognition, the dialogue control unit 14 controls the start of response generation in the response generation unit 13, or causes the voice recognition unit 12 to start voice recognition of the subsequent voice.
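The notification-driven control just described can be sketched as a dispatch table that maps each incoming notification to the control action issued next. The class, event names, and action strings are hypothetical labels for illustration, not interfaces defined in the document.

```python
from typing import Callable, Dict, List


class DialogueControlUnit:
    """Hypothetical sketch: maps notifications from units 11 and 12 to the
    control action the dialogue control unit issues next."""

    def __init__(self) -> None:
        self.commands: List[str] = []
        self._handlers: Dict[str, Callable[[str], str]] = {
            # unit 11 detected a beginning -> start recognition in unit 12
            "section_begin": lambda v: f"start recognition: {v}",
            # unit 11 detected an end -> finish recognition in unit 12
            "section_end": lambda v: f"finish recognition: {v}",
            # unit 12 finished recognition -> start response generation in unit 13
            "recognition_done": lambda v: f"start response generation: {v}",
        }

    def notify(self, event: str, voice_id: str) -> None:
        """Record the action triggered by one notification."""
        self.commands.append(self._handlers[event](voice_id))
```

Keeping the mapping in one table makes the dialogue state transitions easy to inspect and extend, for example with an extra handler that starts recognition of a subsequent voice.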
- The dialogue control unit 14 controls the processing for a series of first voice and the processing for a second voice input after the first voice.
- The processing for the first voice includes the processing from detection of the first voice section forming the first voice until presentation of the first response from the system 200 to the user. More specifically, it includes at least the voice recognition of the first voice by the voice recognition unit 12 and the generation, by the response generation unit 13, of the first response corresponding to the voice recognition result of the first voice.
- The processing for the first voice may also include the processing from detection of the end of the first voice section, through presentation of the first response by the response presentation device 22, until detection of the beginning of the voice section forming the next input voice.
- Even if the processing for the first voice has not finished, the dialogue control unit 14 causes the voice section detection unit 11 to detect the second voice section forming the second voice, so that a second response to the second voice can be generated.
- The dialogue control unit 14 causes the voice recognition unit 12 to recognize the second voice in the second voice section, causes the response generation unit 13 to generate the second response corresponding to the voice recognition result of the second voice, and causes the system 200 to present it to the user.
- FIG. 2 is a diagram showing an example of the processing circuit 50 included in the voice interaction control device 100. Each function of the voice section detection unit 11, the voice recognition unit 12, the response generation unit 13, and the dialogue control unit 14 is realized by the processing circuit 50. That is, the processing circuit 50 includes these units.
- The processing circuit 50 is, for example, a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of these.
- The functions of the voice section detection unit 11, the voice recognition unit 12, the response generation unit 13, and the dialogue control unit 14 may be realized individually by a plurality of processing circuits, or collectively by a single processing circuit.
- FIG. 3 is a diagram showing another example of the processing circuit included in the voice interaction control device 100.
- This processing circuit includes a processor 51 and a memory 52.
- When the processor 51 executes the program stored in the memory 52, the functions of the voice section detection unit 11, the voice recognition unit 12, the response generation unit 13, and the dialogue control unit 14 are realized.
- Each function is realized by the processor 51 executing software or firmware written as a program. That is, the voice interaction control device 100 includes the memory 52 that stores the program and the processor 51 that executes it.
- The program describes functions and operations by which the voice interaction control device 100 detects a voice section extending from the beginning to the end of an input series of voice, recognizes the voice in the detected voice section, generates a response corresponding to the voice recognition result, and controls the voice section detection, voice recognition, and response generation.
- The program also describes functions and operations for detecting, while these controls are executed, the second voice section forming a series of second voice input after the first voice, even when the processing for the first voice has not finished.
- The program further describes causing the second voice in the second voice section to be recognized, generating a second response corresponding to the voice recognition result of the second voice, and causing the system 200 to present it to the user.
- In other words, the program causes a computer to execute the procedures or methods of the voice section detection unit 11, the voice recognition unit 12, the response generation unit 13, and the dialogue control unit 14 described above.
- The processor 51 is, for example, a central processing unit (CPU), a processing unit, an arithmetic unit, a microprocessor, a microcomputer, or a digital signal processor (DSP).
- The memory 52 is, for example, a nonvolatile or volatile semiconductor memory such as a random access memory (RAM), read-only memory (ROM), flash memory, erasable programmable read-only memory (EPROM), or electrically erasable programmable read-only memory (EEPROM).
- The memory 52 may also be a magnetic disk, a flexible disk, an optical disc, a compact disc, a MiniDisc, a DVD, or any other storage medium, including ones to be used in the future.
- Some of the functions of the voice section detection unit 11, the voice recognition unit 12, the response generation unit 13, and the dialogue control unit 14 may be realized by dedicated hardware and the others by software or firmware. In this way, the processing circuit realizes each of the functions described above by hardware, software, firmware, or a combination thereof.
- FIG. 4 is a sequence chart showing an example of the operation of the voice interaction control device 100 and the voice interaction control method according to Embodiment 1.
- FIG. 5 is a flowchart showing an example of the operation of the voice interaction control device 100 and the voice interaction control method according to Embodiment 1.
- First, the dialogue control unit 14 places the voice section detection unit 11 in a standby state in which it can receive voice, and the voice recognition unit 12 in a standby state in which it can perform voice recognition. This control is triggered, for example, by a user operation instructing the system 200 to start accepting voice section detection. Alternatively, the dialogue control unit 14 may automatically place the voice section detection unit 11 in the standby state after the system 200 starts up. After this control, the voice section detection unit 11 constantly monitors the voice input, that is, it remains in a state in which it can detect voice.
- In step S10, the voice section detection unit 11 receives the first voice and detects the beginning of the first voice section.
- The detected beginning is notified to the voice recognition unit 12 or the dialogue control unit 14.
- In step S20, based on the notification of beginning detection, the voice recognition unit 12 starts voice recognition of the first voice from the beginning of the first voice section detected by the voice section detection unit 11.
- In step S30, the voice section detection unit 11 detects the end of the first voice section. The detected end is notified to the voice recognition unit 12 or the dialogue control unit 14.
- In step S40, based on the notification of end detection, the voice recognition unit 12 finishes voice recognition of the first voice up to the end of the first voice section detected by the voice section detection unit 11.
- The voice recognition unit 12 outputs the voice recognition result of the first voice to the response generation unit 13 and notifies the dialogue control unit 14 that recognition has finished.
- In step S50, under control of the dialogue control unit 14, the response generation unit 13 starts generating the first response corresponding to the voice recognition result of the first voice.
- In step S60, the voice section detection unit 11 detects the beginning of the second voice section of the second voice input after the first voice. The detected beginning is notified to the voice recognition unit 12 or the dialogue control unit 14. Note that step S60 and the following step S70 are executed in parallel with the generation of the first response by the response generation unit 13.
- In step S70, based on the notification of beginning detection, the voice recognition unit 12 starts voice recognition of the second voice from the beginning of the second voice section detected by the voice section detection unit 11.
- In step S80, the response generation unit 13 completes generation of the first response.
- The dialogue control unit 14 causes the system 200 to present the first response to the user; that is, the response presentation device 22 presents the first response to the user.
- In step S90, the voice section detection unit 11 detects the end of the second voice section. The detected end is notified to the voice recognition unit 12 or the dialogue control unit 14.
- In step S100, the voice recognition unit 12 finishes voice recognition of the second voice up to the end of the second voice section detected by the voice section detection unit 11.
- The voice recognition unit 12 outputs the voice recognition result of the second voice to the response generation unit 13 and notifies the dialogue control unit 14 that recognition has finished.
- In step S110, under control of the dialogue control unit 14, the response generation unit 13 starts generating the second response corresponding to the voice recognition result of the second voice received from the voice recognition unit 12.
- In step S120, the response generation unit 13 completes generation of the second response.
- The dialogue control unit 14 causes the system 200 to present the second response to the user; that is, the response presentation device 22 presents the second response to the user.
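The key point of steps S10 to S120 is that generating the first response and capturing the second voice overlap in time. A minimal sketch using Python's `asyncio` models each unit as a coroutine, with sleeps standing in for processing and speaking time. The durations and function names are made up, and the document does not prescribe any concurrency mechanism.

```python
import asyncio
from typing import List, Tuple


async def make_response(text: str, seconds: float, events: List[str]) -> str:
    """Stand-in for response generation unit 13; `seconds` is a made-up cost."""
    await asyncio.sleep(seconds)
    events.append(f"response ready: {text}")
    return f"response to {text}"


async def capture_second_voice(events: List[str]) -> str:
    """Stand-in for detecting and recognizing the second voice (S60-S100)."""
    events.append("second voice: beginning detected")  # S60, S70
    await asyncio.sleep(0.10)                           # the user is still speaking
    events.append("second voice: end detected")        # S90, S100
    return "second"


async def run_embodiment_1() -> Tuple[List[str], List[str]]:
    events: List[str] = []
    presented: List[str] = []
    events.append("first voice recognized")                                # S10-S40
    first = asyncio.create_task(make_response("first", 0.05, events))      # S50
    second = asyncio.create_task(capture_second_voice(events))             # S60 onward
    presented.append(await first)                                           # S80
    events.append("first response presented")
    second_text = await second
    presented.append(await make_response(second_text, 0.01, events))       # S110-S120
    events.append("second response presented")
    return events, presented
```

Running `asyncio.run(run_embodiment_1())` logs the second voice's beginning before the first response is ready, which mirrors the parallel execution of S60 and S70 with S50 to S80.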
- In summary, the voice interaction control device 100 of Embodiment 1 performs interaction control that causes the system 200 to present to the user a response to voice input by the user, when the user operates the system 200 through interaction between the user and the system 200. It includes the voice section detection unit 11 that detects a voice section extending from the beginning to the end of an input series of voice, the voice recognition unit 12 that recognizes the voice in the voice section, the response generation unit 13 that generates a response corresponding to the voice recognition result to be presented from the system 200 to the user, and the dialogue control unit 14 that controls the voice section detection unit 11, the voice recognition unit 12, and the response generation unit 13.
- Even if the processing for the first voice, which runs from detection of the first voice section forming the input series of first voice until presentation of the first response corresponding to its voice recognition result from the system 200 to the user, has not finished, the dialogue control unit 14 causes the voice section detection unit 11 to detect the second voice section, so that a second response can be generated for the series of second voice input after the first voice.
- With this configuration, the voice interaction control device 100 can perform interaction control so that the system can respond appropriately to the second voice input after the first voice.
- In particular, the voice interaction control device 100 can generate a response, without omission, to a second voice input immediately after the end of the first voice section.
- Moreover, because the voice interaction control device 100 constantly receives voice and performs voice section detection, there is no period during which a voice uttered by the user cannot be captured.
- Similarly, the voice interaction control method of Embodiment 1 performs interaction control by detecting a voice section extending from the beginning to the end of an input series of voice, recognizing the voice in the voice section, generating a response corresponding to the voice recognition result to be presented from the system 200 to the user, and controlling each of the voice section detection, the voice recognition, and the response generation.
- In this method, even if the processing for the first voice, from detection of the first voice section until presentation of the first response from the system to the user, has not finished, the second voice section forming the second voice is detected so that a second response to the series of second voice input after the first voice can be generated.
- With this method, interaction control can be performed so that the system can respond appropriately to the second voice input after the first voice.
- The method can generate a response, without omission, to a second voice input immediately after the end of the first voice section.
- Because voice is always received for voice section detection, the method also eliminates any period during which a voice uttered by the user cannot be captured.
- Embodiment 2. FIG. 6 is a block diagram showing the configurations of the voice interaction control device 101 and the system 200 in Embodiment 2.
- In addition to the configuration shown in Embodiment 1, the system 200 includes a dictionary database storage device 23.
- The voice recognition unit 12 of the voice interaction control device 101 performs voice recognition by referring to the dictionary database stored in the dictionary database storage device 23.
- The voice interaction control device 101 further includes a voice storage unit 15.
- The voice storage unit 15 stores the voice in the voice section detected by the voice section detection unit 11.
- In Embodiment 2, the voice storage unit 15 stores the second voice in the second voice section. However, the present invention is not limited to this, and the voice storage unit 15 may also store the first voice in the first voice section.
- Based on a notification that the voice recognition unit 12 has finished voice recognition of the first voice, the dialogue control unit 14 causes the voice recognition unit 12 to recognize the second voice stored in the voice storage unit 15, and causes the response generation unit 13 to generate the second response corresponding to the voice recognition result of the second voice. Further, the dialogue control unit 14 causes the response generation unit 13 to generate the second response based on a notification that generation of the first response by the response generation unit 13 has been completed.
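The two ordering rules of Embodiment 2 can be sketched as a small event-driven controller. The first rule is that the stored second voice is recognized only after recognition of the first voice finishes. The second rule is that the second response is generated only after the first response is complete. The class and method names are hypothetical, and the "recognition" here is a placeholder string.

```python
from typing import List, Optional


class Embodiment2Control:
    """Hypothetical sketch of the two ordering rules of dialogue control unit 14
    in Embodiment 2. Method and attribute names are illustrative only."""

    def __init__(self) -> None:
        self.stored_second: Optional[str] = None  # plays the role of voice storage unit 15
        self.first_recognized = False
        self.first_response_done = False
        self.second_result: Optional[str] = None
        self.second_response_started = False
        self.actions: List[str] = []

    def on_second_voice(self, audio: str) -> None:
        self.stored_second = audio                # buffer the voice instead of dropping it
        self.actions.append("store second voice")
        self._advance()

    def on_first_recognition_finished(self) -> None:
        self.first_recognized = True
        self.actions.append("start first response")
        self._advance()

    def on_first_response_finished(self) -> None:
        self.first_response_done = True
        self._advance()

    def _advance(self) -> None:
        # Rule 1: recognize the stored second voice only after recognition
        # of the first voice has finished.
        if (self.first_recognized and self.stored_second is not None
                and self.second_result is None):
            self.second_result = f"text of {self.stored_second}"  # placeholder result
            self.actions.append("recognize stored second voice")
        # Rule 2: generate the second response only after the first
        # response has been generated.
        if (self.first_response_done and self.second_result is not None
                and not self.second_response_started):
            self.second_response_started = True
            self.actions.append("start second response")
```

Because every notification calls `_advance`, the same ordering results whether the second voice arrives before or after recognition of the first voice finishes.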
- The response generation unit 13 generates each response by looking up, in the system response database, the response corresponding to each voice recognition result.
- FIG. 7 is a diagram showing an example of the configuration of the system response database in Embodiment 2.
- The system response database consists of the recognition vocabulary contained in voice recognition results and the responses corresponding to them. Depending on the configuration of the response presentation device 22 that presents the response to the user, an entry may include a plurality of responses.
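One way to hold several responses per recognition vocabulary item, one per kind of response presentation device 22, is a nested mapping. The device keys and response texts below are hypothetical. FIG. 7's actual contents are not given in the text, so this only illustrates the structure.

```python
from typing import Dict, List, Tuple

# Hypothetical entries: each recognition vocabulary item maps to one response
# per presentation device. FIG. 7's actual contents are not reproduced here.
RESPONSE_DB: Dict[str, Dict[str, str]] = {
    "super": {
        "voice_output": "Searching for nearby supermarkets.",
        "display": "Map showing nearby supermarkets",
    },
}


def responses_for(vocabulary: str, devices: List[str]) -> List[Tuple[str, str]]:
    """Return (device, response) pairs for the presentation devices the
    system actually has, in the order the devices are listed."""
    entry = RESPONSE_DB.get(vocabulary, {})
    return [(device, entry[device]) for device in devices if device in entry]
```

A system with only an audio output device would call `responses_for("super", ["voice_output"])` and receive just the spoken response. Unknown vocabulary yields an empty list rather than an error.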
- Each function of the voice storage unit 15 and the dialogue control unit 14 described above is realized, for example, by the processing circuit 50 shown in FIG. 2. That is, the processing circuit 50 includes the voice storage unit 15 and the dialogue control unit 14 having these functions.
- In the configuration of FIG. 3, the function of the voice storage unit 15 is realized by, for example, the memory 52.
- The program stored in the memory 52 describes functions and operations for storing the second voice in the second voice section and, based on a notification that voice recognition of the first voice has finished, recognizing the second voice stored in the memory 52 and generating a second response corresponding to its voice recognition result.
- The program also describes functions and operations for generating the second response based on a notification that generation of the first response has been completed.
- FIG. 8 is a sequence chart showing an example of the operation of the voice interaction control device 101 and the voice interaction control method according to Embodiment 2.
- FIG. 9 is a flowchart showing an example of the operation of the voice interaction control device 101 and the voice interaction control method according to Embodiment 2.
- In Embodiment 1, the second voice is input during generation of the first response. Here, an example is shown in which the second voice is input during voice recognition of the first voice.
- step S10 the voice activity detection unit 11 receives the first voice and detects the beginning of the first voice activity.
- “I want to go to the supermarket” uttered by the user is input as the first voice.
- the detected start point is notified to the speech recognition unit 12 or the dialogue control unit 14.
- step S20 the voice recognition unit 12 starts voice recognition of the first voice after the start end of the first voice section detected by the voice section detection unit 11 based on the notification of the start end detection.
- the speech recognition unit 12 starts speech recognition of the first speech with reference to the dictionary database.
- step S30 the voice activity detection unit 11 detects the end of the first voice activity. The detected end is notified to the speech recognition unit 12 or the dialogue control unit 14.
- step S32 the voice activity detection unit 11 receives the second voice and detects the beginning of the second voice activity.
- the detected start point is notified to the speech recognition unit 12 or the dialogue control unit 14.
- in step S34, based on the notification that the start of the second voice section has been detected, the dialogue control unit 14 causes the voice storage unit 15 to start storing the second voice.
- illustration of the operation regarding this notification is omitted.
- in step S40, based on the notification of end detection, the voice recognition unit 12 finishes voice recognition of the first voice up to the end of the first voice section detected by the voice section detection unit 11.
- the speech recognition result of the first voice includes “supermarket” as a recognized vocabulary item.
- the voice recognition unit 12 notifies the dialogue control unit 14 of the end of the voice recognition.
- based on this notification, the dialogue control unit 14 causes the following steps S50, S62, and S70 to be executed.
- in step S50, under the control of the dialogue control unit 14, the response generation unit 13 starts generating a first response corresponding to the speech recognition result of the first voice.
- the response generation unit 13 refers to the system response database shown in FIG. 7 and starts generating the first response.
- in step S62, under the control of the dialogue control unit 14, the voice recognition unit 12 starts reading the second voice from the voice storage unit 15.
- while continuing to store the second voice of the second voice section, the voice storage unit 15 outputs the already stored portion of the second voice to the voice recognition unit 12 with a time lag.
- steps S62 through S73 are executed in parallel with generation of the first response by the response generation unit 13.
- in step S70, based on the notification of start detection, the voice recognition unit 12 starts voice recognition of the second voice from the start of the second voice section read from the voice storage unit 15.
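The time-lagged read-out in steps S62 through S73, where the voice storage unit 15 keeps storing the second voice while the voice recognition unit 12 reads out what has already been stored, can be pictured as a first-in, first-out buffer shared by a writer and a reader. This is a minimal sketch under assumptions: the class `VoiceStorage`, the frame strings, and the end-of-section sentinel are hypothetical names for illustration, not the implementation described in the patent.

```python
import queue
import threading

_END = object()  # hypothetical sentinel marking the end of the voice section


class VoiceStorage:
    """Hypothetical stand-in for the voice storage unit 15.

    Frames are appended while the voice section is still being captured,
    and a reader consumes the already-stored frames in order, lagging
    behind the writer.
    """

    def __init__(self):
        self._frames = queue.Queue()  # thread-safe FIFO

    def store(self, frame):
        self._frames.put(frame)

    def end_storage(self):
        # Loosely corresponds to step S72: no more frames will be stored.
        self._frames.put(_END)

    def read(self):
        """Yield stored frames in order until the end of the section.

        Blocks whenever the reader has caught up with the writer;
        intended to be consumed once.
        """
        while True:
            frame = self._frames.get()
            if frame is _END:
                return
            yield frame


def capture(storage, frames):
    """Simulated capture of the second voice section, frame by frame."""
    for frame in frames:
        storage.store(frame)
    storage.end_storage()


storage = VoiceStorage()
writer = threading.Thread(target=capture, args=(storage, ["f0", "f1", "f2"]))
writer.start()
recognized_input = list(storage.read())  # reading may overlap with storing
writer.join()
print(recognized_input)  # ['f0', 'f1', 'f2']
```

The same buffer would also cover the variant where the first voice is stored and read out after a delay; only the moment at which the reader starts changes.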
- because the voice recognition unit 12 starts voice recognition of the second voice based on the notification that voice recognition of the first voice has finished, it can start recognizing the second voice after recognition of the first voice is complete.
- the speech recognition unit 12 starts speech recognition of the second speech with reference to the dictionary database.
- in step S71, the voice section detection unit 11 detects the end of the second voice section. The detected end is notified to the speech recognition unit 12 or the dialogue control unit 14.
- in step S72, the voice storage unit 15 finishes storing the second voice.
- in step S73, reading of the second voice from the voice storage unit 15 ends.
- the response generation unit 13 completes the generation of the first response.
- the response generation unit 13 generates a first response including “Displaying supermarket search results.”
- the dialogue control unit 14 performs control so that the response presentation device 22 presents the first response to the user.
- when the response presentation device 22 is a speaker, the speaker presents the first response to the user by outputting “Displaying supermarket search results” as voice, in accordance with the first response.
- when the response presentation device 22 is a display device, the display device presents the first response to the user by displaying “Displaying supermarket search results.”
- the response generation unit 13 may also generate a first response including a control signal for searching for supermarkets.
- a destination search unit included in the system 200 searches for a supermarket based on the first response, and the response presentation device 22 presents the search result of the supermarket to the user.
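The presentation path just described, where a response carries text for voice or display output and may also carry a control signal for a destination search, can be illustrated with a small data type. This is a hypothetical sketch: `Response`, `present`, `dispatch_control`, and the string-valued control signal are assumptions for illustration only, and the search function is a stub.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Response:
    text: str                      # information for voice or display output
    control: Optional[str] = None  # hypothetical control signal (search target)


def present(response: Response, device: str) -> str:
    """Render a response on a hypothetical presentation device."""
    if device == "speaker":
        return f"[speech] {response.text}"
    if device == "display":
        return f"[screen] {response.text}"
    raise ValueError(f"unknown device: {device}")


def dispatch_control(response: Response,
                     destination_search: Callable[[str], List[str]]) -> Optional[List[str]]:
    """Forward the control signal, if any, to a destination search function."""
    if response.control is None:
        return None
    return destination_search(response.control)


first = Response("Displaying supermarket search results.", control="supermarket")
print(present(first, "speaker"))
# [speech] Displaying supermarket search results.
print(dispatch_control(first, lambda target: [f"{target} A", f"{target} B"]))
# ['supermarket A', 'supermarket B']
```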
- the response generation unit 13 notifies the dialogue control unit 14 that the generation of the first response is completed.
- in step S100, the speech recognition unit 12 finishes speech recognition of the second voice up to the end of the second voice section.
- the speech recognition result of the second voice includes “convenience store” as a recognized vocabulary item. The voice recognition unit 12 also notifies the dialogue control unit 14 that voice recognition has ended.
- in step S110, under the control of the dialogue control unit 14, the response generation unit 13 starts generating a second response corresponding to the speech recognition result of the second voice received from the speech recognition unit 12.
- the response generation unit 13 refers to the system response database shown in FIG. 7 and starts generation of the second response.
- step S110 is executed after step S90; that is, the dialogue control unit 14 causes step S110 to be executed based on the notification that generation of the first response is complete.
- in step S120, the response generation unit 13 completes generation of the second response.
- the response generation unit 13 generates a second response including “Displaying convenience store search results” as information for voice output or display output.
- the dialogue control unit 14 performs control so that the response presentation device 22 presents the second response to the user.
- when the response presentation device 22 is a speaker, the speaker presents the second response to the user by outputting “Displaying convenience store search results” as voice, in accordance with the second response.
- when the response presentation device 22 is a display device, the display device presents the second response to the user by displaying “Displaying convenience store search results” in accordance with the second response.
- the response generation unit 13 may also generate a second response including a control signal for searching for convenience stores.
- the destination search unit included in the system 200 searches for convenience stores based on the second response, and the response presentation device 22 presents the convenience store search results to the user.
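The two ordering constraints of the second embodiment can be expressed as one-shot notifications: recognition of the second voice waits for the notification that recognition of the first voice has finished, and generation of the second response waits for the notification that the first response is complete. The sketch below uses Python's `threading.Event` as a stand-in for these notifications; the pipeline functions and log strings are hypothetical, and real recognition and generation work is omitted.

```python
import threading

log = []
log_lock = threading.Lock()


def record(event):
    with log_lock:
        log.append(event)


# Stand-ins for the notifications handled by the dialogue control unit 14.
first_recognized = threading.Event()     # recognition of the first voice finished
first_response_done = threading.Event()  # generation of the first response finished


def first_pipeline():
    record("recognize first")
    first_recognized.set()           # notification after step S40
    record("generate first response")
    first_response_done.set()        # completion notification for the first response


def second_pipeline():
    first_recognized.wait()          # steps S62/S70 begin only after this
    record("recognize second")       # may overlap with first-response generation
    first_response_done.wait()       # step S110 begins only after this
    record("generate second response")


threads = [threading.Thread(target=second_pipeline),
           threading.Thread(target=first_pipeline)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log)  # the order of the two middle entries may vary between runs
```

Only the two waits are guaranteed orderings; recognition of the second voice and generation of the first response are free to interleave, which mirrors the parallel execution of steps S62 through S73.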
- the voice stored in the voice storage unit 15 is not limited to the second voice.
- the voice storage unit 15 may also store the first voice. That is, the voice dialogue control device 101 may first store the first voice of the first voice section detected by the voice section detection unit 11 in the voice storage unit 15, read it out after a predetermined time has elapsed, and send it to the voice recognition unit 12 for speech recognition.
- the voice dialogue control device 101 further includes the voice storage unit 15 that stores the second voice in the second voice section detected by the voice section detection unit 11.
- the dialogue control unit 14 causes the voice recognition unit 12 to recognize the second voice stored in the voice storage unit 15 based on the notification indicating that the voice recognition unit 12 has finished the voice recognition of the first voice.
- the response generation unit 13 generates a second response that corresponds to the result of speech recognition of the second speech.
- with this configuration, the voice interaction control apparatus 101 can acquire the second voice even while the first voice is being processed, for example during voice recognition or response generation. That is, the voice interaction control apparatus 101 can generate an appropriate response to each of multiple voices that the user utters at any timing.
- based on the notification indicating that the response generation unit 13 has finished generating the first response, the dialogue control unit 14 of the voice dialogue control device 101 causes the response generation unit 13 to generate a second response corresponding to the speech recognition result of the second voice in the second voice section.
- with this configuration, the voice interaction control apparatus 101 can present both the first response to the first voice and the second response to the second voice to the user in sequence. For example, suppose the system receives the first voice “I want to go to the supermarket” and starts processing it, and immediately afterward the user utters the second voice “I want to go to the convenience store after all.” A conventional system might be unable to recognize the second voice and could therefore present only the supermarket search results. In contrast, the voice interaction control apparatus 101 according to the present embodiment can accept both the first voice and the second voice and present both the supermarket search results and the convenience store search results.
- Embodiment 3: a voice dialogue control apparatus and a voice dialogue control method according to the third embodiment will be described.
- FIG. 10 is a block diagram showing configurations of the voice dialogue control device 102 and the system 200 in the third embodiment.
- the voice dialogue control apparatus 102 includes a dialogue state determination unit 16.
- the dialogue state determination unit 16 determines whether the speech recognition result of the second speech recognized by the speech recognition unit 12 is to update the speech recognition result of the first speech.
- based on the determination result of the dialogue state determination unit 16, the dialogue control unit 14 aborts the processing of the first voice partway through and causes the response generation unit 13 to generate a second response.
- each function of the dialogue state determination unit 16 and the dialogue control unit 14 described above is realized by, for example, the processing circuit 50 shown in FIG. That is, the processing circuit 50 includes the dialogue state determination unit 16 and the dialogue control unit 14 having the respective functions described above.
- FIG. 11 is a sequence chart showing an example of the operation of the voice interaction control apparatus 102 and the voice interaction control method according to the third embodiment.
- FIG. 12 is a flowchart showing an example of the operation of the voice interaction control apparatus 102 and the voice interaction control method according to the third embodiment. In the following description, the description of the operation of the voice storage unit 15 is omitted, but the operation is the same as that of the second embodiment.
- in step S10, the voice section detection unit 11 receives the first voice and detects the start of the first voice section.
- here, the user utters “I want to go to a convenience store,” which is input as the first voice.
- the detected start point is notified to the speech recognition unit 12 or the dialogue control unit 14.
- in step S20, based on the notification of start detection, the voice recognition unit 12 starts voice recognition of the first voice from the start of the first voice section detected by the voice section detection unit 11.
- the speech recognition unit 12 performs speech recognition with reference to the dictionary database.
- in step S30, the voice section detection unit 11 detects the end of the first voice section. The detected end is notified to the speech recognition unit 12 or the dialogue control unit 14.
- in step S40, based on the notification of end detection, the voice recognition unit 12 finishes voice recognition of the first voice up to the end of the first voice section detected by the voice section detection unit 11.
- the speech recognition result of the first voice includes “convenience store” as a recognized vocabulary item. The voice recognition unit 12 also notifies the dialogue control unit 14 that voice recognition has ended.
- in step S50, under the control of the dialogue control unit 14, the response generation unit 13 starts generating a first response corresponding to the speech recognition result of the first voice.
- the response generation unit 13 refers to the system response database shown in FIG. 7 and starts generating the first response.
- in step S60, the voice section detection unit 11 detects the start of the second voice section of a second voice input after the first voice.
- here, the user utters “I want to go to a restaurant after all,” which is input as the second voice.
- the detected start point is notified to the speech recognition unit 12 or the dialogue control unit 14.
- in step S70, the speech recognition unit 12 starts speech recognition of the second voice from the start of the second voice section detected by the voice section detection unit 11.
- the speech recognition unit 12 refers to the dictionary database stored in the dictionary database storage unit 23 to perform speech recognition.
- in step S90, the voice section detection unit 11 detects the end of the second voice section. The detected end is notified to the speech recognition unit 12 or the dialogue control unit 14.
- in step S100, the speech recognition unit 12 finishes speech recognition of the second voice up to the end of the second voice section.
- the speech recognition result of the second voice includes “restaurant” as a recognized vocabulary item.
- the voice recognition unit 12 notifies the dialogue control unit 14 of the end of the voice recognition.
- in step S102, the dialogue state determination unit 16 determines whether the speech recognition result of the second voice updates the speech recognition result of the first voice, and outputs the determination result to the dialogue control unit 14. In the present embodiment, it determines whether the recognition result of the second voice, which includes “restaurant,” updates the recognition result of the first voice, which includes “convenience store.” If it determines that no update is to be made, step S104 is executed; if it determines that an update is to be made, step S106 is executed. In the present embodiment, the dialogue state determination unit 16 determines that the speech recognition result of the second voice including “restaurant” updates the speech recognition result of the first voice including “convenience store.”
- the dialogue state determination unit 16 may decide whether an update is needed based on the parallel relationship between the vocabulary items “convenience store” and “restaurant,” or based on other vocabulary contained in the second voice, for example the adversative expression “after all.”
- in step S104, under the control of the dialogue control unit 14 based on the determination result, the response generation unit 13 completes generation of the first response, and the response presentation device 22 presents the first response to the user.
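One way to picture the determination in step S102 is a rule that flags an update when the two results name different items from the same parallel category, or when the second voice contains an adversative expression such as “after all.” This is a hypothetical sketch: the category set, the marker list, and the choice to combine the two cues with a logical OR are assumptions; the text only states that either cue may be used.

```python
# Hypothetical vocabulary: destination categories treated as parallel items
PARALLEL_CATEGORIES = {"supermarket", "convenience store", "restaurant"}
# Hypothetical adversative expressions that signal a change of mind
ADVERSATIVE_MARKERS = ("after all",)


def updates_first(first_vocab, second_vocab, second_text):
    """Return True if the second recognition result should replace the first."""
    first_items = PARALLEL_CATEGORIES & set(first_vocab)
    second_items = PARALLEL_CATEGORIES & set(second_vocab)
    # Cue 1: both results name a parallel item, and the items differ.
    parallel = bool(first_items) and bool(second_items) and first_items != second_items
    # Cue 2: the second utterance contains an adversative expression.
    adversative = any(marker in second_text.lower() for marker in ADVERSATIVE_MARKERS)
    return parallel or adversative


print(updates_first(["convenience store"], ["restaurant"],
                    "I want to go to a restaurant after all"))  # True
print(updates_first(["play"], ["music"], "music"))              # False
```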
- the same response presentation as step S80 shown in the second embodiment is performed.
- the response presentation device 22 then presents a response to the second voice after step S110 shown in FIG.
- in step S106, based on the determination result, the dialogue control unit 14 aborts the processing of the first voice partway through.
- in step S110, the response generation unit 13 starts generating a second response corresponding to the speech recognition result of the second voice.
- the response generation unit 13 refers to the system response database shown in FIG. 7 and starts generation of the second response.
- the response generation unit 13 completes the generation of the second response.
- the response generation unit 13 generates a second response including “Displaying restaurant search results” as information for voice output or display output.
- the dialogue control unit 14 performs control so that the response presentation device 22 presents the second response to the user.
- when the response presentation device 22 is a speaker, the speaker presents the second response to the user by outputting “Displaying restaurant search results” as voice, in accordance with the second response.
- when the response presentation device 22 is a display device, the display device presents the second response to the user by displaying “Displaying restaurant search results” in accordance with the second response.
- the response generation unit 13 may also generate a second response including a control signal for searching for restaurants.
- the destination search unit included in the system 200 starts a restaurant search based on the second response, and the response presentation device 22 displays the restaurant search results.
- in this way, the dialogue control unit 14 cancels the processing of the first voice partway through and performs control so that only the second response, corresponding to the input second voice, is generated. As a result, only the second response is presented by the response presentation device 22.
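The branch between step S104 and step S106 can be sketched as a cancellable response generator: when the determination says the second result updates the first, a cancel flag is raised and the first response is abandoned partway through, so only the second response is presented. In this hypothetical sketch, `generate_response`, `handle`, and the step-wise loop are assumptions; for deterministic behavior the cancel flag is raised before the first generation runs, whereas in the described apparatus it would arrive while that generation is in progress.

```python
import threading


def generate_response(item, cancel, steps=3):
    """Hypothetical step-wise response generation that checks a cancel flag."""
    for _ in range(steps):
        if cancel.is_set():
            return None  # processing of this voice is aborted partway through
        # ... one unit of response-generation work would happen here ...
    return f"Displaying {item} search results."


def handle(first_item, second_item, second_updates_first):
    """Return the responses presented, following steps S104/S106 and S110."""
    cancel_first = threading.Event()
    if second_updates_first:
        cancel_first.set()  # step S106: abort processing of the first voice
    presented = []
    first = generate_response(first_item, cancel_first)
    if first is not None:
        presented.append(first)  # step S104: first response is presented
    presented.append(generate_response(second_item, threading.Event()))  # step S110
    return presented


print(handle("convenience store", "restaurant", True))
# ['Displaying restaurant search results.']
```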
- as described above, the voice interaction control device 102 further includes the dialogue state determination unit 16, which determines whether the speech recognition result of the second voice in the second voice section, recognized by the speech recognition unit 12, updates the speech recognition result of the first voice. Based on the determination result of the dialogue state determination unit 16, the dialogue control unit 14 aborts the processing of the first voice partway through and causes the response generation unit 13 to generate a second response.
- with this configuration, the voice interaction control device 102 can abort the processing of the first voice partway through and respond to the second voice, which improves usability for the user. For example, suppose the system receives the first voice “I want to go to a convenience store” and starts processing it, and immediately afterward the user utters the second voice “I want to go to a restaurant after all.” A conventional system might be unable to recognize the second voice and could therefore present only the convenience store search results.
- in contrast, the voice dialogue control device 102 of the third embodiment can present the restaurant search results, which better match the user's intention as expressed in the second voice, earlier than the voice dialogue control device 101 of the second embodiment can.
- FIG. 13 is a block diagram showing configurations of the voice interaction control device 103 and the system 200 in the fourth embodiment.
- the dictionary database storage unit 23 of the system 200 stores a plurality of dictionary databases.
- the dictionary database storage unit 23 stores a first dictionary database 24 and a second dictionary database 25.
- the first dictionary database 24 is a dictionary database prepared corresponding to the standby state of the system 200.
- the standby state is, for example, a state in which the voice input device 21 of the system 200 can receive an operation by the user, that is, a state in which the input of the first voice is awaited.
- in the standby state, the display device, which is another user interface included in the system 200, displays a menu screen, for example.
- the second dictionary database 25 is a dictionary database that corresponds to the state after the system 200 has recognized the first speech, and is associated with a specific vocabulary included in the speech recognition result of the first speech.
- the speech recognition unit 12 performs speech recognition with reference to one dictionary database corresponding to the state of the system 200 among a plurality of dictionary databases.
- when the system 200 is in the standby state, the speech recognition unit 12 recognizes the first voice by referring to the first dictionary database 24 as the dictionary database corresponding to the standby state. Alternatively, when the system 200 is in the standby state, the speech recognition unit 12 may search all the dictionary databases and recognize the first voice by referring to the first dictionary database 24 as the one dictionary database corresponding to the standby state.
- FIG. 14 is a diagram showing an example of a configuration of the first dictionary database 24 in the fourth embodiment.
- the first dictionary database 24 includes the state of the system 200 and the recognition vocabulary.
- the first screen in FIG. 14 is a standby screen such as a menu screen.
- when the system 200 is in the state after the first voice has been recognized and the recognition result of the first voice includes a specific vocabulary item, the speech recognition unit 12 recognizes the second voice by referring to the second dictionary database 25, associated with that specific vocabulary item, as the one dictionary database corresponding to that state.
- after recognizing the first voice, the speech recognition unit 12 or the dialogue control unit 14 determines whether the specific vocabulary item is included in the recognition result of the first voice.
- the speech recognition unit 12 has a function of performing processing such as switching the dictionary database used for speech recognition according to the state of the system 200.
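The state-dependent choice of dictionary database can be sketched as a lookup: the standby state maps to the first dictionary database 24, and once the first voice is recognized, a specific vocabulary item in its result selects the related second dictionary database 25. In this hypothetical sketch only the words “play” and “music” come from the example in this embodiment; the dictionary layout, the state names, the toy word-matching `recognize` function, and the fallback to the standby dictionary are assumptions.

```python
# Hypothetical layout loosely following FIG. 14 (state, recognition vocabulary)
FIRST_DICTIONARY = {"name": "first dictionary database",
                    "state": "first screen", "vocabulary": {"play"}}

# Hypothetical layout loosely following FIG. 15, keyed by the specific
# vocabulary item found in the first voice's recognition result
SECOND_DICTIONARIES = {
    "play": {"name": "second dictionary database", "main_state": "first screen",
             "related_state": "play", "vocabulary": {"music"}},
}


def select_dictionary(system_state, first_result=()):
    """Pick one dictionary database according to the system state."""
    if system_state == "standby":
        return FIRST_DICTIONARY
    for word in first_result:
        if word in SECOND_DICTIONARIES:  # specific vocabulary item found
            return SECOND_DICTIONARIES[word]
    return FIRST_DICTIONARY  # assumption: fall back to the standby dictionary


def recognize(words, dictionary):
    """Toy stand-in for recognition: keep words the dictionary contains."""
    return [w for w in words if w in dictionary["vocabulary"]]


first_result = recognize(["play"], select_dictionary("standby"))
second_dict = select_dictionary("after_first_voice", first_result)
print(first_result, recognize(["music"], second_dict), second_dict["name"])
# ['play'] ['music'] second dictionary database
```

The point of the design is that “music” is only in the vocabulary once “play” has been recognized, so the second utterance is interpreted in the context set by the first.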
- FIG. 15 is a diagram showing an example of a configuration of the second dictionary database 25 in the fourth embodiment.
- the second dictionary database 25 includes the main state of the system 200, the related state of the system 200, and the recognition vocabulary.
- the response generation unit 13 generates a response corresponding to the speech recognition result of speech and information of one dictionary database referred to for speech recognition of the speech. For example, the response generation unit 13 generates a first response corresponding to the speech recognition result of the first speech and the information of the first dictionary database 24 referred to for speech recognition of the first speech. Alternatively, for example, the response generation unit 13 generates a second response corresponding to the speech recognition result of the second speech and the information of the second dictionary database 25 referred to for speech recognition of the second speech.
- FIG. 16 is a diagram showing an example of a configuration of a system response database in the fourth embodiment.
- the system response database is composed of recognition vocabulary contained in the speech recognition result, information of the dictionary database referenced for speech recognition, and responses corresponding thereto.
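Because the system response database is keyed on both the recognized vocabulary and the dictionary database that was used, the same word can yield different responses in different dialogue states. A hypothetical sketch: the two entries paraphrase the example responses for “play” and “music” described later in this embodiment, while the tuple key format and the fallback message are assumptions.

```python
# Hypothetical system response database: (recognized vocabulary, dictionary) -> response
SYSTEM_RESPONSES = {
    ("play", "first dictionary database"): "What would you like to play?",
    ("music", "second dictionary database"): "Playing music.",
}

FALLBACK = "Sorry, could you say that again?"  # assumption, not from the text


def generate_response(vocabulary, dictionary_name):
    """Look up a response from the vocabulary and the dictionary database used."""
    return SYSTEM_RESPONSES.get((vocabulary, dictionary_name), FALLBACK)


print(generate_response("play", "first dictionary database"))    # What would you like to play?
print(generate_response("music", "second dictionary database"))  # Playing music.
```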
- each function of the speech recognition unit 12 and the response generation unit 13 described above is realized by, for example, the processing circuit 50 shown in FIG. That is, the processing circuit 50 includes the speech recognition unit 12 and the response generation unit 13 having the respective functions described above.
- the program stored in the memory 52 describes functions and operations for recognizing speech by referring to one of a plurality of dictionary databases, and for generating a response corresponding to the speech recognition result of the speech and to the information of the one dictionary database referred to for that recognition.
- the program also describes functions and operations for recognizing the first voice by referring to the first dictionary database 24, prepared for the standby state of the system 200, and for recognizing the second voice by referring to the second dictionary database 25, associated with the specific vocabulary item included in the speech recognition result of the first voice.
- the program describes a function and an operation for generating a second response corresponding to the speech recognition result of the second speech and the information of the second dictionary database 25.
- FIG. 17 is a sequence chart showing an example of the operation of the voice interaction control apparatus 103 and the voice interaction control method according to the fourth embodiment.
- FIG. 18 is a flow chart showing an example of the operation of the voice interaction control apparatus 103 and the voice interaction control method according to the fourth embodiment.
- the description of the operation of the voice storage unit 15 is omitted, but the operation is the same as that of the second embodiment.
- in step S10, the voice section detection unit 11 receives the first voice and detects the start of the first voice section.
- here, the user utters “play,” which is input as the first voice.
- the detected start point is notified to the speech recognition unit 12 or the dialogue control unit 14.
- the speech recognition unit 12 selects the first dictionary database 24 corresponding to the standby state of the system 200.
- the speech recognition unit 12 acquires information indicating that the system 200 is in a standby state, and selects the first dictionary database 24 shown in FIG. 14 from among a plurality of dictionary databases based on the information.
- here, the information that the voice recognition unit 12 acquires to identify the standby state is information indicating that the first screen is displayed.
- in step S24, the speech recognition unit 12 refers to the first dictionary database 24 and starts speech recognition of the first voice from the start of the first voice section detected by the voice section detection unit 11.
- alternatively, based on the information that the system 200 is in the standby state, the speech recognition unit 12 may search all the dictionary databases and recognize the first voice by referring to the first dictionary database 24 shown in FIG., which corresponds to the standby state.
- in step S30, the voice section detection unit 11 detects the end of the first voice section. The detected end is notified to the speech recognition unit 12 or the dialogue control unit 14.
- in step S40, based on the notification of end detection, the voice recognition unit 12 finishes voice recognition of the first voice up to the end of the first voice section detected by the voice section detection unit 11.
- the speech recognition result of the first voice includes “play” as a recognized vocabulary item.
- in step S60, the voice section detection unit 11 detects the start of the second voice section of a second voice input after the first voice.
- here, the user utters “music,” which is input as the second voice.
- the detected start point is notified to the speech recognition unit 12 or the dialogue control unit 14.
- the speech recognition unit 12 selects the second dictionary database 25, which corresponds to the state of the system 200 after the first voice has been recognized, in which a specific vocabulary item is included in the recognition result of the first voice.
- specifically, the speech recognition unit 12 determines whether the speech recognition result of the first voice includes a specific vocabulary item and, when it determines that it does, selects from the plurality of dictionary databases the second dictionary database 25 related to that vocabulary item.
- in this example, the speech recognition unit 12 determines whether the speech recognition result of the first voice includes the specific vocabulary item “play” and, having determined that it does, recognizes the second voice by referring to the second dictionary database 25 shown in FIG.
- in step S76, the speech recognition unit 12 refers to the second dictionary database 25 and starts speech recognition of the second voice from the start of the second voice section detected by the voice section detection unit 11.
- the speech recognition unit 12 has a function of switching the dictionary database used for speech recognition from the first dictionary database 24 to the second dictionary database 25 according to the state of the system 200.
- in step S90, the voice section detection unit 11 detects the end of the second voice section. The detected end is notified to the speech recognition unit 12 or the dialogue control unit 14.
- in step S100, the speech recognition unit 12 finishes speech recognition of the second voice up to the end of the second voice section.
- the speech recognition result of the second speech and the information of the second dictionary database 25 referred to for speech recognition of the second speech are output to the response generation unit 13.
- the speech recognition result of the second voice includes “music” as a recognized vocabulary item.
- the voice recognition unit 12 notifies the dialogue control unit 14 of the end of the voice recognition.
- in step S110, the response generation unit 13 starts generating a second response corresponding to the speech recognition result of the second voice.
- the response generation unit 13 refers to the system response database shown in FIG. 16 and starts generation of the second response.
- in step S120, the response generation unit 13 completes generation of the second response.
- since the recognized vocabulary is “music” and the dictionary database information is “second dictionary database,” the response generation unit 13 generates a second response including “Playing music” as information for voice output.
- the dialogue control unit 14 performs control so that the response presentation device 22 presents the second response to the user.
- the speaker included in the response presentation device 22 presents the second response to the user by outputting “Playing music” as voice, in accordance with the second response.
- alternatively, the response generation unit 13 may generate a second response including a control signal that causes a music playback device included in the response presentation device 22 to play music, and the music playback device may play music based on the second response.
- for the first voice, the speech recognition result of the first voice and the information of the first dictionary database 24 referred to for its recognition are output to the response generation unit 13. Since the recognized vocabulary is “play” and the dictionary database information is “first dictionary database,” the response generation unit 13 generates a first response including “What would you like to play?” as information for voice output or display output, and the response presentation device 22 presents the first response to the user.
- the speech recognition unit 12 of the speech dialogue control apparatus 103 recognizes speech by referring to one of a plurality of dictionary databases according to the state of the system.
- the response generation unit 13 generates a response corresponding to the speech recognition result of speech and information of one dictionary database referred to for speech recognition of speech.
- in this way, the speech interaction control apparatus 103 can switch the dictionary database referred to in speech recognition according to the state of the system 200, that is, the dialogue state, and can thereby generate an accurate response to the user's speech.
- the voice recognition unit 12 of the voice dialogue control device 103 recognizes the first voice by referring to the first dictionary database 24, which among the plurality of dictionary databases is prepared for the standby state of the system 200, and recognizes the second voice by referring to the second dictionary database 25, which corresponds to the state after recognition of the first voice and is associated with a specific vocabulary item included in the speech recognition result of the first voice.
- the response generation unit 13 generates a second response corresponding to the speech recognition result of the second speech and the information of the second dictionary database referred to for speech recognition of the second speech.
- the voice dialog control device 103 can generate a response reflecting the contents of both the first voice and the second voice, and can generate an accurate response to the user's speech. .
- a conventional system, by contrast, may be unable to recognize the second voice and may instead present the user with a response asking what to play.
- because the voice interaction control apparatus 103 in the present embodiment recognizes the second voice by referring to the second dictionary database related to the voice recognition result of the first voice, it can play back music in accordance with the user's intention.
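- Because the response depends on both the recognized vocabulary and the dictionary database used to recognize it, a response table keyed by that pair is a natural way to picture the lookup. The sketch below is hypothetical: the "What to play?" and "Display music screen" entries follow the examples given in this description, while the "Play music" entry and all control-signal names are invented for illustration.

```python
from typing import NamedTuple, Optional


class Response(NamedTuple):
    text: str                      # information for voice or display output
    control: Optional[str] = None  # hypothetical control signal for the system 200


# Hypothetical excerpt of a system response database, keyed by
# (recognized vocabulary, dictionary database used for recognition).
RESPONSE_DATABASE = {
    ("play", "first dictionary database"): Response("What to play?"),
    ("music", "first dictionary database"): Response("Display music screen", "SHOW_MUSIC_SCREEN"),
    ("music", "second dictionary database"): Response("Play music", "PLAY_MUSIC"),
}


def generate_response(vocabulary: str, dictionary_name: str) -> Optional[Response]:
    """Look up a response from the recognition result and the dictionary that produced it."""
    return RESPONSE_DATABASE.get((vocabulary, dictionary_name))
```

The same word "music" maps to different responses depending on which dictionary database recognized it, which is how the context of the first voice carries over to the second.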
- Embodiment 5. A voice dialogue control apparatus and a voice dialogue control method according to the fifth embodiment will be described. Descriptions of configurations and operations similar to those of the other embodiments are omitted.
- FIG. 19 is a block diagram showing configurations of the voice dialogue control device 104 and the system 200 in the fifth embodiment.
- the response generation unit 13 further includes a confirmation response generation unit 17 that generates a confirmation response for causing the user to select one response from a plurality of responses generated corresponding to the speech recognition result of speech.
- the dialogue control unit 14 causes the system 200 to present a confirmation response to the user, causes the response generation unit to generate one response corresponding to the voice input by the user according to the confirmation response, and causes the system 200 to present it to the user.
- Each function of the confirmation response generation unit 17 and the response generation unit 13 described above is realized by, for example, the processing circuit shown in FIG. 2 or 3.
- the program stored in the memory 52 describes functions and operations for generating a confirmation response that causes the user to select one response from a plurality of responses generated corresponding to the speech recognition result of speech.
- the program also describes functions and operations for causing the system 200 to present the confirmation response to the user, generating one response corresponding to the voice input by the user according to the confirmation response, and causing the system 200 to present that response to the user.
- FIG. 20 is a flow chart showing an example of the operation of the speech dialog control device 104 and the speech dialog control method according to the fifth embodiment.
- steps S10 to S110 are the same as in the fourth embodiment, and therefore the description thereof is omitted.
- In step S112, the response generation unit 13 determines whether a plurality of second responses can be generated corresponding to the speech recognition result of the second speech. For example, if the system 200 includes a portable device for playing music and a CD (Compact Disc) player, the response generation unit 13 can generate both a second response including a control signal for playing the music stored in the portable device and a second response including a control signal for playing the music stored on the CD. If it is determined that a plurality of second responses are not to be generated, step S120 is executed; in this case, the processes from step S120 onward are the same as in the fourth embodiment. If it is determined that a plurality of second responses are to be generated, step S122 is executed.
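- The branch in step S112 amounts to counting the candidate second responses. The following is a hedged sketch only: the `MusicSource` type, its `available` flag, and the `PLAY_MUSIC:` control-signal strings are hypothetical, and treating zero candidates like a single candidate (going to step S120) is an assumption the description does not address.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MusicSource:
    name: str        # e.g. "portable device" or "CD"
    available: bool  # hypothetical flag: whether this source can play music now


def plan_second_response(sources: List[MusicSource]) -> Tuple[str, List[str]]:
    """Return the next step (S120 or S122) and the candidate second responses."""
    # One hypothetical "play music" control signal per usable source.
    candidates = [f"PLAY_MUSIC:{s.name}" for s in sources if s.available]
    # More than one candidate is ambiguous, so ask the user to confirm (S122).
    next_step = "S122" if len(candidates) > 1 else "S120"
    return next_step, candidates
```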
- In step S122, the confirmation response generation unit 17 generates a confirmation response for causing the user to select one second response from among the plurality of second responses generated corresponding to the voice recognition result of the second voice.
- the confirmation response generation unit 17 generates a confirmation response including "Do you want to play music on a portable device or play music on a CD?" as information for audio output or display output.
- In step S124, the dialog control unit 14 causes the response presentation device 22 to present the confirmation response to the user.
- the response presentation device 22 presents "Do you want to play the music of the portable device or the music of the CD?" to the user, and the user re-enters a voice for operating the system according to the confirmation response.
- the voice interaction control apparatus 104 generates one second response by voice recognition and response generation similar to the above steps.
- the response presentation device 22 then presents the selected second response to the user, for example by playing the music of the portable device.
- the response generation unit 13 of the speech dialog control device 104 further includes a confirmation response generation unit 17 that generates a confirmation response for causing the user to select one response from a plurality of responses generated corresponding to the speech recognition result of speech.
- the dialogue control unit 14 causes the system 200 to present a confirmation response to the user, causes the response generation unit 13 to generate one response corresponding to the voice input by the user according to the confirmation response, and causes the system 200 to present it to the user.
- the voice dialog control device 104 can ask the user for confirmation when there is an ambiguity in the interaction between the user and the system.
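- The confirmation exchange of the fifth embodiment can be pictured as building a prompt from the candidates and then matching the user's re-entered utterance against them. This is a toy sketch under stated assumptions: in the described device the reply goes through the same voice recognition and response generation again, whereas here plain substring matching on text stands in for that step, and both function names are hypothetical.

```python
from typing import List, Optional


def build_confirmation(options: List[str]) -> str:
    """Compose a confirmation prompt that lists every candidate source (step S122)."""
    return "Do you want to " + " or ".join(f"play music on a {o}" for o in options) + "?"


def resolve_reply(reply: str, options: List[str]) -> Optional[str]:
    """Return the single option named in the user's reply, or None if still ambiguous."""
    # Naive case-insensitive substring match; a stand-in for re-running recognition.
    matches = [o for o in options if o.lower() in reply.lower()]
    return matches[0] if len(matches) == 1 else None
```

For the options `["portable device", "CD"]`, the prompt reads "Do you want to play music on a portable device or play music on a CD?", matching the example above; a reply naming neither option leaves the ambiguity unresolved.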
- Embodiment 6. A voice dialogue control apparatus and a voice dialogue control method according to the sixth embodiment will be described.
- the configurations of the voice interaction control device 104 and the system 200 in the sixth embodiment are the same as in the fourth embodiment.
- the dialog control unit 14 in the present embodiment determines whether or not the elapsed time from the end of the first voice section to the start of the second voice section is equal to or greater than a specific value.
- based on this determination, the dialogue control unit 14 causes the second voice to be recognized either with reference to the first dictionary database 24, which is prepared for the standby state of the system 200 among the plurality of dictionary databases, or with reference to the second dictionary database 25, which corresponds to the state after the speech recognition of the first voice and is associated with the specific vocabulary included in the speech recognition result of the first voice.
- in other words, the dialogue control unit 14 determines the relevance between the first voice and the second voice based on whether the elapsed time is equal to or greater than a threshold, and generates the response to be presented to the user accordingly.
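- The timing rule of the sixth embodiment reduces to one comparison of the pause between utterances against a threshold. A minimal sketch, assuming times in seconds and a hypothetical 2.0 s threshold (the description only speaks of "a specific value"); an elapsed time exactly equal to the threshold counts as unrelated, matching the "equal to or greater than" wording.

```python
def choose_dictionary(first_end_s: float, second_start_s: float,
                      threshold_s: float = 2.0) -> str:
    """Pick the dictionary for the second voice from the pause between utterances."""
    elapsed = second_start_s - first_end_s
    if elapsed >= threshold_s:
        # Long pause: treat the utterances as unrelated (step S70).
        return "first dictionary database"
    # Short pause: treat them as related (steps S74 and S76).
    return "second dictionary database"
```

With a first voice ending at 10.0 s, a second voice starting at 10.8 s would be recognized against the second dictionary database, while one starting at 13.5 s would fall back to the first.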
- the above-mentioned function of the dialogue control unit 14 is realized by, for example, the processing circuit shown in FIG. 2 or FIG. 3.
- the program stored in the memory 52 describes functions and operations for determining whether the elapsed time from the end of the first voice section to the start of the second voice section is equal to or greater than a specific value and, based on the determination, causing the second voice to be recognized either with reference to the first dictionary database 24, which is prepared for the standby state of the system 200 among the plurality of dictionary databases, or with reference to the second dictionary database, which corresponds to the state after speech recognition of the first voice and is associated with a specific vocabulary included in the speech recognition result of the first voice.
- FIG. 21 is a flow chart showing an example of the operation of the voice interaction control apparatus 104 and the voice interaction control method according to the sixth embodiment. Steps S10 to S60 in the present embodiment are the same as in the fourth embodiment, and therefore, the description thereof is omitted.
- In step S64, the dialogue control unit 14 determines whether the elapsed time from the end of the first speech zone to the beginning of the second speech zone is equal to or greater than a specific value. If the elapsed time is less than the specific value, that is, if the utterances are determined to be related, step S74 is executed. If the elapsed time is equal to or greater than the specific value, that is, if the utterances are determined to be unrelated, step S70 is executed.
- In steps S74 and S76, the speech recognition unit 12 starts speech recognition of the second speech after the start of the second speech zone detected by the speech zone detection unit 11.
- the speech recognition unit 12 performs speech recognition of the second speech by referring to the second dictionary database 25 associated with the specific vocabulary included in the speech recognition result of the first speech.
- Each process after step S74 is the same as each process in the fourth embodiment shown in FIG.
- In step S70, the speech recognition unit 12 starts speech recognition of the second speech after the start of the second speech segment detected by the speech segment detection unit 11. In this case, however, the speech recognition unit 12 performs speech recognition with reference to the first dictionary database 24 prepared for the standby state of the system 200.
- In step S90, the voice activity detection unit 11 detects the end of the second voice section and notifies the speech recognition unit 12 or the dialogue control unit 14 of the detected end.
- In step S100, the speech recognition unit 12 ends the speech recognition of the second speech at the end of the second speech segment.
- the speech recognition result of the second speech and the information of the first dictionary database 24 referred to for speech recognition of the second speech are output to the response generation unit 13. Note that "music" is included as a recognition vocabulary in the speech recognition result of the second speech.
- the voice recognition unit 12 notifies the dialogue control unit 14 of the end of the voice recognition.
- In step S110, the response generation unit 13 starts generating a second response corresponding to the speech recognition result of the second speech.
- the response generation unit 13 refers to the system response database shown in FIG. 16 and starts generation of the second response.
- In step S120, the response generation unit 13 completes the generation of the second response.
- since the recognition vocabulary is "music" and the dictionary database information is "first dictionary database", the response generation unit 13 generates a second response including "Display music screen" as information for voice output.
- the dialogue control unit 14 performs control so that the response presentation device 22 presents the second response to the user.
- the speaker included in the response presentation device 22 presents the second response to the user by outputting the voice message "Display music screen" according to the second response.
- a display device may display the music screen based on the second response.
- thus, based on whether the elapsed time from the end of the first voice section to the start of the second voice section is equal to or greater than a specific value, the dialog control unit 14 of the voice dialog control device 104 causes the second speech to be recognized either with reference to the first dictionary database 24, which is prepared for the standby state of the system 200 among the plurality of dictionary databases, or with reference to the second dictionary database 25, which corresponds to the state after the speech recognition of the first speech and is associated with the specific vocabulary included in the speech recognition result of the first speech.
- by generating a response that takes into account the timing of the user's utterances in addition to the speech recognition result, the voice interaction control device 104 can generate an accurate response to the user's speech.
- FIG. 22 is a block diagram showing an example of the configuration of the voice interaction control device 105 mounted on the vehicle 30.
- the voice interaction control device 105 is any one of the voice interaction control devices 100 to 104 shown in the first to sixth embodiments.
- the system 200 includes, for example, an on-vehicle device (not shown) such as a navigation device, an audio device, and a PND (Portable Navigation Device).
- when the voice input device (not shown) of the on-vehicle apparatus receives voice uttered by the user, the voice dialogue control device 105 generates a response corresponding to the voice, and the response presentation device (not shown) of the on-vehicle apparatus presents the response to the user.
- FIG. 23 is a block diagram showing an example of the configuration of the voice interaction control device 105 provided in the server 40.
- Voice input from a voice input device (not shown) of the communication terminal 32 is received by the communication device 41 of the server 40 via the network and processed by the voice interaction control device 105.
- the voice interaction control device 105 generates a response corresponding to the voice.
- the generated response is transmitted from the communication device 41 via the network and presented to the user by the response presentation device (not shown) of the on-vehicle device 31.
- the response presentation device may be included in the communication terminal 32.
- the communication terminal 32 is, for example, a mobile phone, a smartphone, or a tablet.
- each component of the voice interaction control device 105 may be distributed and disposed in each device configuring the system 200. In that case, each function is realized by each component communicating with each other as appropriate.
- the configuration of the vehicle 30 or the on-vehicle device 31 can be simplified.
- the functions of the voice interaction control device 105 are realized.
- the embodiments can be freely combined, and each embodiment can be appropriately modified or omitted.
- although the present invention has been described in detail, the above description is illustrative in all aspects, and the present invention is not limited thereto. It is understood that countless variations not illustrated can be envisaged without departing from the scope of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The purpose of the present invention is to provide a voice interaction control device for controlling voice interaction such that a system can respond appropriately to a second voice input after a first voice. The voice interaction control device according to the present invention causes the system to present to a user a response to voice input by the user, and comprises: a voice segment detection unit for detecting a voice segment for a series of voice inputs; a voice recognition unit for recognizing voice in a voice segment; a response generation unit for generating a response corresponding to the voice recognition results; and a voice interaction control unit that controls the voice segment detection unit, the voice recognition unit, and the response generation unit. The voice interaction control unit causes the voice segment detection unit to detect a second voice segment constituting the second voice, so that a second response can be generated for a second series of voice input after the first voice, even when processing for the first voice, which includes processing up to the point where the system presents to the user a first response to a first series of voice, has not yet been completed.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019542865A JP6851491B2 (ja) | 2017-09-20 | 2017-09-20 | 音声対話制御装置および音声対話制御方法 |
PCT/JP2017/033902 WO2019058453A1 (fr) | 2017-09-20 | 2017-09-20 | Dispositif et procédé de commande d'interaction vocale permettant de commander une interaction vocale |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2017/033902 WO2019058453A1 (fr) | 2017-09-20 | 2017-09-20 | Dispositif et procédé de commande d'interaction vocale permettant de commander une interaction vocale |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019058453A1 true WO2019058453A1 (fr) | 2019-03-28 |
Family
ID=65811399
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2017/033902 WO2019058453A1 (fr) | 2017-09-20 | 2017-09-20 | Dispositif et procédé de commande d'interaction vocale permettant de commander une interaction vocale |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP6851491B2 (fr) |
WO (1) | WO2019058453A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112599133A (zh) * | 2020-12-15 | 2021-04-02 | 北京百度网讯科技有限公司 | 基于车辆的语音处理方法、语音处理器、车载处理器 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001014165A (ja) * | 1999-06-30 | 2001-01-19 | Toshiba Corp | 応答生成装置、対話管理装置、応答生成方法および応答生成プログラムを格納するコンピュータ読み取り可能な記録媒体 |
JP2003058188A (ja) * | 2001-08-13 | 2003-02-28 | Fujitsu Ten Ltd | 音声対話システム |
JP2004037910A (ja) * | 2002-07-04 | 2004-02-05 | Denso Corp | 対話システム及び対話型しりとりシステム |
JP2015064450A (ja) * | 2013-09-24 | 2015-04-09 | シャープ株式会社 | 情報処理装置、サーバ、および、制御プログラム |
JP2017102320A (ja) * | 2015-12-03 | 2017-06-08 | アルパイン株式会社 | 音声認識装置 |
-
2017
- 2017-09-20 WO PCT/JP2017/033902 patent/WO2019058453A1/fr active Application Filing
- 2017-09-20 JP JP2019542865A patent/JP6851491B2/ja active Active
Also Published As
Publication number | Publication date |
---|---|
JP6851491B2 (ja) | 2021-03-31 |
JPWO2019058453A1 (ja) | 2019-12-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11356730B2 (en) | Systems and methods for routing content to an associated output device | |
KR101418163B1 (ko) | 컨텍스트 정보를 이용한 음성 인식 복구 | |
KR102100389B1 (ko) | 개인화된 엔티티 발음 학습 | |
US10706853B2 (en) | Speech dialogue device and speech dialogue method | |
EP3475942B1 (fr) | Systèmes et procédés d'acheminement de contenu vers un dispositif de sortie associé | |
US8484033B2 (en) | Speech recognizer control system, speech recognizer control method, and speech recognizer control program | |
US9092435B2 (en) | System and method for extraction of meta data from a digital media storage device for media selection in a vehicle | |
US7822613B2 (en) | Vehicle-mounted control apparatus and program that causes computer to execute method of providing guidance on the operation of the vehicle-mounted control apparatus | |
JP4260788B2 (ja) | 音声認識機器制御装置 | |
KR102360589B1 (ko) | 관련 출력 디바이스에 컨텐츠를 라우팅하기 위한 시스템 및 방법 | |
CN111095400A (zh) | 选择系统和方法 | |
US10599469B2 (en) | Methods to present the context of virtual assistant conversation | |
US20150039316A1 (en) | Systems and methods for managing dialog context in speech systems | |
JP2001083991A (ja) | ユーザインタフェース装置、ナビゲーションシステム、情報処理装置及び記録媒体 | |
JP7347217B2 (ja) | 情報処理装置、情報処理システム、および情報処理方法、並びにプログラム | |
JP6851491B2 (ja) | 音声対話制御装置および音声対話制御方法 | |
JP7456387B2 (ja) | 情報処理装置、及び情報処理方法 | |
JP2006058641A (ja) | 音声認識装置 | |
JP4585759B2 (ja) | 音声合成装置、音声合成方法、プログラム、及び記録媒体 | |
JP2004354942A (ja) | 音声対話システム、音声対話方法及び音声対話プログラム | |
CN117090668A (zh) | 车辆排气声音调节方法、装置及车辆 | |
JP2001209394A (ja) | 音声認識装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17925620 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2019542865 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17925620 Country of ref document: EP Kind code of ref document: A1 |