WO2015102039A1 - Appareil de reconnaissance vocale - Google Patents

Appareil de reconnaissance vocale (Speech Recognition Apparatus)

Info

Publication number
WO2015102039A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
user
content
speech
voice recognition
Prior art date
Application number
PCT/JP2014/006171
Other languages
English (en)
Japanese (ja)
Inventor
鈴木 竜一 (Ryuichi Suzuki)
Original Assignee
株式会社デンソー (Denso Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社デンソー (Denso Corporation)
Publication of WO2015102039A1 publication Critical patent/WO2015102039A1/fr

Links

Images

Classifications

    • GPHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • This disclosure relates to a speech recognition apparatus that recognizes, in an interactive manner, speech uttered by a user.
  • In a known configuration, a speech recognition device interprets input information entered by a user, inquires of multiple dialogue agents what information each can process, collates the input information against that processable information, selects a dialogue agent capable of processing the input information, transmits the input information to the selected agent, requests a response, and outputs the agent's response (see, for example, Patent Document 1).
  • Also known is a voice recognition device that records the user's utterance history for each subject, determines a new dialogue scenario for each subject based on that history, determines the speech recognition dictionary to be referred to based on the determined scenario, and recognizes the user's utterances using that dictionary (see, for example, Patent Document 2).
  • The apparatus described in Patent Document 1 selects a dialogue agent that can handle the input information, transmits the input information to the selected agent, and receives a response, which makes a smooth conversation possible in a state close to natural conversation, in which the topic category changes frequently. However, when the interactive speech recognition process is completed and then started again, the user needs to speak the same input information over again, which may make the user feel bothered.
  • The device described in Patent Document 2 determines a new scenario for each subject based on the user's utterance history in order to improve the accuracy of speech recognition, and limits the speech recognition dictionary to be referred to on that basis. Even so, when the interactive speech recognition process is started again after it has been completed, the voice dialogue does not continue the past operation; the user again needs to speak the same input information and may feel annoyed.
  • This disclosure aims to provide a voice recognition device that eliminates the annoyance users feel when past operations are not continued.
  • a speech recognition apparatus includes a storage control section and a speech recognition processing section.
  • the voice recognition device recognizes the utterance content spoken by the user, generates voice for voice conversation with the user based on the utterance content recognized by the voice recognition, and performs voice recognition processing in an interactive format.
  • the storage control section causes the storage unit to store at least one of the content recognized by the voice recognition and the content executed in response to the user's manual operation.
  • When performing the voice recognition processing, the voice recognition processing section generates the voice for the voice conversation with the user using the content stored in the storage unit, and performs the voice recognition processing.
  • Since at least one of the content recognized by the voice recognition and the content executed in response to the user's manual operation is stored in the storage unit, and since the voice for the voice conversation with the user is generated using the stored content when the voice recognition process is performed, the user's annoyance caused by past operations not being continued can be eliminated.
  • FIG. 1: Diagram showing the overall configuration of the navigation device
  • FIG. 2: Diagram showing the configuration of the voice recognition unit and the voice dialogue control unit
  • FIG. 3: Flowchart of the control circuit and voice dialogue control unit of the navigation device
  • FIG. 4: Diagram showing a display example of the voice recognition top screen image
  • FIG. 5: Diagram showing a display example of the context display screen image
  • FIGS. 6 to 8: Diagrams for explaining the difference between dialogues with and without a continued context
  • FIG. 1 shows the overall configuration of a speech recognition apparatus according to an embodiment of the present disclosure.
  • the voice recognition device is configured as a navigation device 20 that is mounted and used in a vehicle (also referred to as a host vehicle).
  • the navigation device 20 recognizes speech content uttered by the user, generates speech for voice conversation with the user based on the speech content recognized by the speech recognition, and performs speech recognition processing in an interactive format. In addition, a process of executing an operation according to the utterance content recognized by the voice recognition is performed.
  • This navigation device 20 includes a position detector 21, a data input device 22, an operation switch group 23, a communication device 24, an external memory 25, a display device 26, a remote control sensor 27, a control circuit 28, and a voice recognition unit 10.
  • the position detector 21 includes a gyroscope 21a, a distance sensor 21b, and a GPS receiver 21c, and outputs various information for specifying the current position input from these to the control circuit 28.
  • the data input device 22 is a device for inputting map data for map display and route search.
  • the data input device 22 reads out necessary map data from a map data storage medium in which map data is stored in response to a request from the control circuit 28.
  • the map data storage medium includes not only map data for map display and route search, but also dictionary data used when the speech recognition unit 10 performs recognition processing.
  • the map data storage medium can be configured using a hard disk drive, CD, DVD, flash memory, or the like.
  • the operation switch group 23 includes various switches such as a touch switch arranged on the front surface of a display (also referred to as a display unit or a display panel) of a display device 26 described later and a mechanical switch provided around the display. Then, signals corresponding to various user switch operations are output to the control circuit 28.
  • the communication device 24 is for communicating with the outside, and is configured by a mobile communication device such as a mobile phone, for example.
  • the external memory 25 is composed of a portable storage medium such as a USB memory or an SD card. Various data are stored in the external memory 25.
  • the display device 26 has a display such as a liquid crystal, and displays video and images (including screen images) according to the video signal input from the control circuit 28 on the display.
  • the remote control sensor 27 receives a radio signal transmitted from a remote control 27a for performing a remote operation.
  • The control circuit 28 is configured as a computer including a CPU, ROM, RAM, I/O, and the like, and the CPU of the control circuit 28 performs various processes according to programs stored in the ROM. Part or all of the processes executed by the programs may instead be executed by hardware components.
  • The control circuit 28 performs, for example: host vehicle position detection processing for detecting the host vehicle position based on the various information input from the position detector 21; map display processing for displaying the host vehicle position mark superimposed on a map of the area around the host vehicle position; destination setting processing for setting a destination; route search processing for searching for a guidance route to the destination; and travel guidance processing for providing travel guidance along the guidance route.
  • the voice recognition unit 10 is a device that performs processing for recognizing input voice collected by the microphone 15, generates dialogue voice, and outputs (speaks) the dialogue voice from the speaker 14.
  • The voice recognition unit 10 is configured, for example, as one or more computers each including a CPU, RAM, ROM, I/O, and the like, and performs various processes in accordance with programs stored in the ROM. Part or all of the processes executed by the programs may instead be executed by hardware components.
  • The voice recognition unit 10 includes a voice synthesis circuit 11, a voice recognition circuit 12, and a voice dialogue control circuit 13. These may each be implemented as an individual computer, or may be combined into one or two computers.
  • a speaker 14 is connected to the speech synthesis circuit 11, and a microphone 15 is connected to the speech recognition circuit 12.
  • A PTT (Push-To-Talk) switch 16 is connected to the voice dialogue control circuit 13.
  • The voice recognition circuit 12 recognizes the input voice collected by the microphone 15 in accordance with an instruction from the voice dialogue control circuit 13, and notifies the voice dialogue control circuit 13 of the recognition result. That is, the voice recognition circuit 12 collates the voice data acquired from the microphone 15 against the stored dictionary data, selects, from a plurality of comparison target pattern candidates, the higher-order patterns with the highest degree of matching, and outputs them to the voice dialogue control circuit 13.
  • To do so, the voice recognition circuit 12 sequentially analyzes the voice data input from the microphone 15 to extract acoustic feature quantities (for example, cepstra), and, using the time series of acoustic feature data obtained by this analysis together with a known technique such as an HMM (Hidden Markov Model), the DP matching method, or a neural network, determines which word stored in the dictionary data each section of the input corresponds to, thereby recognizing word sequences.
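  • As a rough, illustrative sketch only (the patent names HMMs, the DP matching method, and neural networks merely as known techniques), the following fragment pairs cepstral feature extraction with a simple DP (dynamic time warping) match against per-word templates; the use of librosa and all function names are assumptions, not the patent's implementation.

```python
# Hedged sketch: cepstral features + DP matching against word templates.
# librosa and every name here are illustrative assumptions.
import numpy as np
import librosa

def acoustic_features(samples: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Extract a time series of cepstral feature vectors (frames x coefficients)."""
    return librosa.feature.mfcc(y=samples, sr=sr, n_mfcc=13).T

def dp_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Length-normalised DP matching (DTW) cost between two feature sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

def recognize_word(samples: np.ndarray, templates: dict, sr: int = 16000) -> str:
    """Return the dictionary word whose stored feature template matches best."""
    feats = acoustic_features(samples, sr)
    return min(templates, key=lambda word: dp_distance(feats, templates[word]))
```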
  • The voice dialogue control circuit 13 instructs the voice synthesis circuit 11 to output a response voice based on the recognition result from the voice recognition circuit 12, and also notifies the control circuit 28 of the navigation device 20 of the destination and the commands necessary for the travel guidance processing, instructing it to set the destination and execute the commands.
  • The voice synthesis circuit 11 has a waveform database in which various speech waveforms are stored, and synthesizes speech from those waveforms based on the response voice output instruction from the voice dialogue control circuit 13. The synthesized speech is output from the speaker 14.
  • The user speaks various commands toward the microphone 15 for executing various processes such as route setting, route guidance, facility search, and facility display while pressing the PTT switch 16. Specifically, the voice dialogue control circuit 13 monitors the timing at which the PTT switch 16 is pressed, the timing at which it is released, and the duration for which it is held, and instructs the voice recognition circuit 12 to execute recognition processing while the switch is pressed; when the PTT switch 16 is not pressed, the processing is not executed. Accordingly, only voice data input via the microphone 15 while the PTT switch 16 is being pressed is output to the voice recognition circuit 12.
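  • A minimal sketch of this push-to-talk gating, assuming hypothetical recognizer callbacks (start_session, feed, and end_session are invented names): audio frames are forwarded to the recognition circuit only while the switch is held, and the press/release timing is monitored as described.

```python
# Hedged sketch of PTT gating; the recognizer interface is hypothetical.
import time

class PttGate:
    def __init__(self, recognizer):
        self.recognizer = recognizer
        self.pressed_at = None  # None means the switch is not pressed

    def on_press(self) -> None:
        # The timing of the press is monitored, as the text describes.
        self.pressed_at = time.monotonic()
        self.recognizer.start_session()

    def on_release(self) -> None:
        held_for = time.monotonic() - self.pressed_at
        self.pressed_at = None
        self.recognizer.end_session(held_for)  # duration the switch was held

    def on_audio_frame(self, frame: bytes) -> None:
        # Frames arriving while the switch is not pressed are discarded.
        if self.pressed_at is not None:
            self.recognizer.feed(frame)
```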
  • FIG. 2 shows configurations of the voice recognition circuit 12 and the voice dialogue control circuit 13.
  • the speech recognition circuit 12 includes a speech extraction unit 101, a speech recognition collation unit 103, a speech recognition result output unit 105, a speech recognition dictionary unit 107, and a target dictionary determination unit 109.
  • the voice recognition dictionary unit 107 includes a command correspondence dictionary 201, an address correspondence dictionary 203, a music correspondence dictionary 205, a telephone directory correspondence dictionary 207, and the like.
  • the voice extraction unit 101 extracts words from the voice data input from the microphone 15.
  • The target dictionary determination unit 109 determines the target dictionary to be used for speech recognition from among the dictionaries 201 to 207 of the speech recognition dictionary unit 107.
  • The speech recognition collation unit 103 collates the words extracted by the speech extraction unit 101 using the target dictionary determined by the target dictionary determination unit 109.
  • the voice recognition result output unit 105 outputs the result of voice recognition based on the collation result of the voice recognition collation unit 103 to the voice dialogue processing unit 121 of the voice dialogue control circuit 13.
  • the voice dialogue control circuit 13 includes a voice dialogue processing unit 121, a function execution processing determination unit 123, a voice output content determination unit 125, and a context history management unit 127.
  • The voice dialogue processing unit 121 determines, from among question phrases and dialogue phrases prepared in advance, a phrase that matches the voice recognition result output from the voice recognition result output unit 105 of the voice recognition circuit 12. In determining such a phrase, the voice dialogue processing unit 121 of the present embodiment can also use a context managed by the context history management unit 127 described later.
  • the function execution process determination unit 123 determines a process to execute a function based on the content processed by the voice interaction processing unit 121 and notifies the control circuit 28 of the determined process. In addition, the function execution process determination unit 123 acquires the speech recognition result output from the speech recognition result output unit 105 via the speech recognition circuit 12 and notifies the control circuit 28 of the speech recognition result.
  • the voice output content determination unit 125 determines the voice data to be output based on the content processed by the voice dialog processing unit 121 and notifies the voice synthesis circuit 11 of the determined voice data.
  • the control circuit 28 executes the function in accordance with the notification from the function execution process determination unit 123, and when the function execution is completed, notifies the context history management unit 127 of the executed content. In addition, the control circuit 28 notifies the context history management unit 127 of the voice recognition result acquired via the function execution process determination unit 123.
  • the context history management unit 127 has a memory (not shown), and sequentially stores contents (contexts) notified from the control circuit 28 in this memory to manage the context history.
  • Here, a context refers to the utterance content spoken by the user in the voice dialogue, or the operation content executed in response to the user's manual operation. For example, when the user speaks “Destination” and then “Tokyo” in a voice dialogue, “Destination, Tokyo” is stored as a context in the memory of the context history management unit 127. Likewise, when the user manually selects “Audio” and then “AM radio”, “Audio, AM radio” is stored as a context.
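  • A minimal sketch of such a context history manager, under assumed names (the patent does not give an implementation): contexts are stored in order and can be listed most-recent-first, erased one by one, or reset in full, matching the behaviour described for the memory and for the reset button 310 later in the text.

```python
# Hedged sketch of the context history management; all names are assumed.
from dataclasses import dataclass, field

@dataclass
class Context:
    category: str  # e.g. "Destination", "Audio", "Weather information"
    value: str     # e.g. "Tokyo", "AM radio"

@dataclass
class ContextHistoryManager:
    _history: list = field(default_factory=list)

    def store(self, category: str, value: str) -> None:
        # e.g. store("Destination", "Tokyo") after the dialogue above
        self._history.append(Context(category, value))

    def latest(self, count: int = 5) -> list:
        """Most recent contexts first, limited to a fixed number of entries."""
        return list(reversed(self._history))[:count]

    def erase(self, index: int) -> None:
        """Erase one selected context."""
        del self._history[index]

    def reset(self) -> None:
        """Erase all contexts (the reset button 310 behaviour)."""
        self._history.clear()
```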
  • FIG. 3 shows a flowchart of the voice recognition process of the navigation device 20.
  • the processing of the control circuit 28 and the voice recognition unit 10 of the navigation device 20 will be described as the voice recognition processing of the navigation device 20.
  • When the ignition switch of the vehicle is turned from off to on, the navigation device 20 enters the operating state, and the control circuit 28 and the voice recognition unit 10 of the navigation device 20 perform the process shown in FIG. 3.
  • each section is expressed as S100, for example.
  • each section can be divided into a plurality of subsections, while a plurality of sections can be combined into one section.
  • each section can be referred to as a device, module, or means.
  • Each of the above sections, or a combination thereof, can be realized not only as (i) a software section combined with a hardware unit (e.g., a computer) but also as (ii) a hardware section (e.g., an integrated circuit or a hard-wired logic circuit), with or without the functions of related devices.
  • the hardware section can be included inside the microcomputer.
  • a voice recognition top screen image (also referred to as a top screen) is displayed on the display unit (or display panel) of the display device 26 (S100). Specifically, a voice recognition top screen image is displayed on the display unit of the display device 26 according to an instruction from the control circuit 28 of the navigation device 20.
  • FIG. 4 shows a display example of the voice recognition top screen image.
  • This voice recognition top screen image includes “Yes” and “No” buttons along with the message “Do you want to continue the context?”. “Yes” is selected when the user wants to continue the utterance content from past speech recognition processing, and “No” is selected when the user does not. If no context is stored in the memory of the context history management unit 127, only “No” can be selected.
  • The control circuit 28 determines that the context is to be continued when the user selects “Yes”, and that it is not to be continued when the user selects “No”.
  • Next, a user voice input is performed (S200). While the PTT switch 16 is pressed by a user operation, for example when the user speaks “Set destination” as shown in FIG. 6A, the voice data from the microphone 15 is input to the voice recognition circuit 12.
  • In this case, the determination in S204 is NO, and a dialogue voice is generated and output (S206).
  • This processing is performed by the voice dialogue processing unit 121 and the voice output content determination unit 125.
  • For example, a dialogue voice such as “Please tell the destination” is generated for the utterance content “Set destination”, this dialogue voice is output from the speaker 14, and the process returns to S200.
  • When the user then speaks “Tokyo”, it is recognized in S202, and a dialogue voice such as “Set destination to Tokyo” is generated in S206 and output.
  • The voice recognition unit 10 then notifies the control circuit 28 that the destination is to be set to Tokyo; in response, the control circuit 28 sets the destination to Tokyo and notifies the voice recognition unit 10 that the destination has been set to Tokyo.
  • the context is stored (S106). Specifically, the control circuit 28 instructs the context history management unit 127 to store the context.
  • Here, “Destination” is associated with the specific place name “Tokyo”, “Destination, Tokyo” is stored as a context in the memory of the context history management unit 127, and this processing ends.
  • the voice recognition top screen image as shown in FIG. 4 is displayed on the display unit of the display device 26 (S100), and then it is determined whether or not to continue the context (S102).
  • Next, a user voice input is performed (S200). As shown in FIG. 6A, for example, when the user speaks “What is the weather today?” while the PTT switch 16 is pressed by a user operation, the voice data from the microphone 15 is input to the voice recognition circuit 12.
  • In this case, the determination in S204 is NO, and a dialogue voice is generated and output (S206). For example, a dialogue voice such as “Where do you want to know the weather?” is generated for the utterance content “What is the weather today?”, the dialogue voice is output from the speaker 14, and the process returns to S200.
  • The voice recognition unit 10 notifies the control circuit 28 that the voice recognition has finished; in response, the control circuit 28 ends the voice recognition and notifies the voice recognition unit 10 that today's weather information for Tokyo has been presented.
  • the context is stored (S106).
  • Here, “Weather information” is associated with the specific place name “Tokyo”, “Weather information, Tokyo” is stored as a context in the memory of the context history management unit 127, and this processing ends.
  • the voice recognition top screen image as shown in FIG. 4 is displayed on the display unit of the display device 26 (S100).
  • FIG. 5 shows a display example of a context display screen image.
  • the context display screen image includes a context display image 300 and a reset button 310.
  • the reset button 310 is a button used when all contexts are reset.
  • Next, the context is determined (S110). In the display example of FIG. 5, the context indicating “Destination Tokyo” is highlighted. Here, for example, when the user utters “next”, the utterance is recognized and “Destination AA restaurant” is highlighted instead. Likewise, in accordance with the context display screen image of FIG. 5, when the user utters “fifth”, the utterance is recognized and “air conditioner on” is highlighted. The highlighted context is thus switched according to the content of the user's utterances, and when the user utters “determine”, the highlighted context is fixed as the context to be used for speech recognition.
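  • This highlight-switching can be pictured with the following sketch (the utterance strings and list contents are taken from the example above; the function itself is an illustrative assumption):

```python
# Hedged sketch of voice-driven context selection on the display screen.
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}

def select_context(utterances, contexts):
    """Walk through recognized utterances and return the confirmed context."""
    highlighted = 0  # e.g. "Destination Tokyo" highlighted initially
    for word in utterances:
        if word == "next":
            highlighted = (highlighted + 1) % len(contexts)
        elif word in ORDINALS and ORDINALS[word] < len(contexts):
            highlighted = ORDINALS[word]  # e.g. "fifth" -> fifth entry
        elif word == "determine":
            return contexts[highlighted]
    return None  # dialogue ended without confirmation

# select_context(["next", "determine"],
#                ["Destination Tokyo", "Destination AA restaurant"])
# -> "Destination AA restaurant"
```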
  • the process proceeds to the second voice recognition process (S300 to S306).
  • In this second voice recognition process, the voice for the voice conversation with the user is generated so as to continue the utterance content of the past voice recognition process using the context determined in S110, and the voice recognition process is performed.
  • a user's voice input is performed (S300). For example, when the user speaks “What is the weather today?” During the period in which the PTT switch 16 is being pressed by a user operation, voice data from the microphone 15 is input to the voice recognition circuit 12.
  • voice recognition processing is performed (S302). Specifically, the voice dialogue control circuit 13 instructs the voice recognition circuit 12 to execute voice recognition processing, and the voice recognition circuit 12 performs voice recognition processing of voice data from the microphone 15 in response to this instruction.
  • Here, a target dictionary to be used for speech recognition is specified based on the context determined in S110, and speech recognition processing is performed using this target dictionary. For example, when the context determined in S110 is “Destination, Tokyo”, the address correspondence dictionary 203 and the telephone directory correspondence dictionary 207, which relate to destination setting, are used, while the music correspondence dictionary 205, which is unrelated, is not. Using only the minimum necessary dictionaries in this way improves the recognition rate.
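  • The mapping might look like the following sketch (the category-to-dictionary table is an illustrative assumption built from the reference numerals above):

```python
# Hedged sketch: choose the minimum necessary dictionaries per context.
DICTIONARIES_BY_CATEGORY = {
    "Destination": ["address_dictionary_203", "phonebook_dictionary_207"],
    "Artist": ["music_dictionary_205"],
    None: ["command_dictionary_201", "address_dictionary_203",
           "music_dictionary_205", "phonebook_dictionary_207"],
}

def target_dictionaries(context_category):
    """Return only the dictionaries relevant to the given context category."""
    return DICTIONARIES_BY_CATEGORY.get(context_category,
                                        DICTIONARIES_BY_CATEGORY[None])

# target_dictionaries("Destination")
# -> ["address_dictionary_203", "phonebook_dictionary_207"]
```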
  • In S304, it is determined whether or not the operation to be executed has been determined. Specifically, this is decided based on whether or not the utterance content recognized in the voice recognition processing in S302 is an operation command that instructs execution of a predetermined function.
  • If no operation has been determined, the determination in S304 is NO, a dialogue voice is generated, and the voice is output (S306).
  • Here, the voice for the voice conversation with the user is generated so as to continue the utterance content of the past voice recognition processing. For example, when the context determined in S110 is “Destination, Tokyo” and the user's utterance is recognized in S302 as “What is today's weather?”, the specific place name “Tokyo” included in the context is combined with the utterance “How is the weather today?”, a phrase such as “Today's weather in Tokyo is sunny” is generated, and this phrase is output as audio from the speaker 14.
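  • This slot filling can be sketched as follows (the template string and the weather_lookup callable are illustrative assumptions):

```python
# Hedged sketch: the continued context supplies the place name that the
# user's utterance left open, as in the "Today's weather in Tokyo" example.
def respond_with_context(utterance, context, weather_lookup):
    if "weather" in utterance and context.get("category") == "Destination":
        place = context["value"]           # e.g. "Tokyo" from "Destination, Tokyo"
        condition = weather_lookup(place)  # e.g. "sunny"
        return f"Today's weather in {place} is {condition}"
    return "Where do you want to know the weather?"  # no usable context

# respond_with_context("What is today's weather?",
#                      {"category": "Destination", "value": "Tokyo"},
#                      lambda place: "sunny")
# -> "Today's weather in Tokyo is sunny"
```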
  • The voice recognition unit 10 notifies the control circuit 28 that the voice recognition has finished; in response, the control circuit 28 ends the voice recognition and notifies the voice recognition unit 10 that today's weather information for Tokyo has been presented.
  • the context is stored (S114).
  • Here, “Weather information” is associated with the specific place name “Tokyo”, “Weather information, Tokyo” is stored as a context in the memory of the context history management unit 127, and this processing ends.
  • FIG. 7 shows an example of the dialogue when searching for a song by artist name and displaying the artist's album list; (a) is an example of the dialogue when the context is not continued, and (b) is an example when the context is continued.
  • In (a), a dialogue voice “Please tell the artist name” is output from the speaker 14, and in response to the user's utterance of the artist name “Michael”, a dialogue voice “Play Michael's song” is output; when the function is executed, “Artist, Michael” is stored as a context in the memory of the context history management unit 127. When the user later utters “Display album list”, the device asks for the artist name again, and only after the user utters “Michael” once more is the dialogue voice “Michael's album list is displayed” output from the speaker 14. That is, the input “Michael” has to be repeated.
  • In (b), “Artist, Michael” is stored as a context in the memory of the context history management unit 127, and the speech recognition processing is started again after the earlier series of speech recognition processing has been completed. In response to the user's utterance “Display album list”, a dialogue voice “Display Michael's album list” is output from the speaker 14. That is, Michael's album list can be displayed without inputting “Michael” again.
  • FIG. 8 shows an example of the dialogue when searching for a destination and then wanting to call the destination; (a) is an example of the dialogue when the context is not continued, and (b) is an example when the context is continued.
  • In (a), a dialogue voice “Please tell the destination” is output from the speaker 14, and in response to the user's utterance “AA restaurant”, the speaker 14 outputs a dialogue voice “Set AA restaurant as the destination”; when the function is executed, “Destination, AA restaurant” is stored as a context in the memory of the context history management unit 127. When the user later utters “Make a call”, the device asks “Where do you want to call?”, and only after the user utters “AA restaurant” once more does the speaker 14 output the dialogue voice “I will call the AA restaurant”. That is, the input “AA restaurant” has to be repeated.
  • In (b), “Destination, AA restaurant” is stored as a context in the memory of the context history management unit 127, and the voice recognition processing is performed again after the earlier series of voice recognition processing. In response to the user's utterance “Make a call”, a dialogue voice “Call the AA restaurant” is output from the speaker 14. That is, the call can be placed without inputting “AA restaurant” again.
  • The ability to perform recognition processing with a continued context in this way eliminates the hassle of repeating the same input over and over, and also reduces the number of dialogue exchanges, which helps ensure safety in a vehicle interior environment.
  • As described above, the content recognized by the voice recognition is stored as a context in the memory of the context history management unit 127, and when the voice recognition process is performed, the voice for the voice conversation with the user is generated using the context stored in the memory; the user's annoyance caused by past operations not being continued can therefore be eliminated.
  • In addition, the user is asked to confirm whether or not to hold a voice conversation using the context stored in the memory, and the voice for the voice dialogue is generated using the stored context only when the user confirms; this prevents context-based voices from being generated against the user's intention.
  • Furthermore, the contexts stored in the memory are displayed on the display unit, the content of the dialogue to be continued is specified according to a user operation, and the voice for the voice conversation with the user is generated using the specified content, so the user can easily specify the content of the dialogue to continue.
  • The contexts stored in the memory can be displayed on the display unit in order from the most recent. It is also possible to display only a fixed number of the most recent contexts (for example, five).
  • Since the dictionary used for speech recognition is changed according to the context specified by the user operation, the recognition rate of the speech recognition can be improved. The content of the dialogue to be continued can also be specified by an instruction given by the user's utterance.
  • The contexts stored in the memory can also be deleted in response to a user operation. In the present embodiment, a configuration is shown in which all contexts stored in the memory are erased by operating the reset button 310; alternatively, the device can be configured so that contexts are selected and erased one by one.
  • In the above embodiment, the content recognized by the speech recognition is stored as a context in the memory of the context history management unit 127, and when the speech recognition process is performed again after it has been completed, the voice for the voice dialogue is generated so as to continue the utterance content of the past voice recognition process using the determined context. More generally, the voice recognition process can be performed by generating the voice for the voice conversation with the user using the context stored in the memory.
  • The memory of the context history management unit 127 is also referred to as a storage unit/device/means, and S106 and S114 are also referred to as a storage control section/device/means or a storage instruction section/device/means.
  • S306 is also referred to as a speech recognition processing section/device/means or a content usage recognition section/device/means.
  • S102 is also referred to as a confirmation section/device/means or a content confirmation section/device/means.
  • S108 is also referred to as a display control section/device/means.
  • S110 is also referred to as a specifying section/device/means or a content specifying section/device/means.
  • The voice output content determination unit 125 is also referred to as a voice generation section/device/means, and the reset button 310 is also referred to as an erasing section/device/means or a content deletion section/device/means.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Navigation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

In this speech recognition apparatus, the content recognized by speech recognition and/or the content executed in accordance with a user's manual operation is stored (S106, S114) in a memory, and a speech recognition process is executed. When a speech recognition process is subsequently executed again, the content stored in the memory is used to generate a voice for voice interaction with the user, and a speech recognition process (S300-S306) is executed.
PCT/JP2014/006171 2014-01-06 2014-12-11 Appareil de reconnaissance vocale WO2015102039A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2014-000264 2014-01-06
JP2014000264A JP2015129793A (ja) 2014-01-06 2014-01-06 音声認識装置

Publications (1)

Publication Number Publication Date
WO2015102039A1 true WO2015102039A1 (fr) 2015-07-09

Family

ID=53493388

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/006171 WO2015102039A1 (fr) 2014-01-06 2014-12-11 Appareil de reconnaissance vocale

Country Status (2)

Country Link
JP (1) JP2015129793A (fr)
WO (1) WO2015102039A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10269351B2 (en) 2017-05-16 2019-04-23 Google Llc Systems, methods, and apparatuses for resuming dialog sessions via automated assistant

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007264198A (ja) * 2006-03-28 2007-10-11 Toshiba Corp 対話装置、対話方法、対話システム、コンピュータプログラム及び対話シナリオ生成装置
JP2008083100A (ja) * 2006-09-25 2008-04-10 Toshiba Corp 音声対話装置及びその方法
JP2010073192A (ja) * 2008-08-20 2010-04-02 Universal Entertainment Corp 会話シナリオ編集装置、ユーザ端末装置、並びに電話取り次ぎシステム
JP2012008554A (ja) * 2010-05-24 2012-01-12 Denso Corp 音声認識装置

Also Published As

Publication number Publication date
JP2015129793A (ja) 2015-07-16

Similar Documents

Publication Publication Date Title
JP5821639B2 (ja) 音声認識装置
US10706853B2 (en) Speech dialogue device and speech dialogue method
US9239829B2 (en) Speech recognition device
EP1450349B1 (fr) Dispositif de contrôle embarqué et programme amenant un ordinateur à exécuter un procédé visant à fournir un guidage d'opération du dispositif de contrôle embarqué
TWI281146B (en) Apparatus and method for synthesized audible response to an utterance in speaker-independent voice recognition
US9123327B2 (en) Voice recognition apparatus for recognizing a command portion and a data portion of a voice input
WO2015098109A1 (fr) Dispositif de traitement de reconnaissance de paroles, procede de traitement de reconnaissance de paroles et dispositif d'affichage
CN105355202A (zh) 语音识别装置、具有语音识别装置的车辆及其控制方法
JP2009169139A (ja) 音声認識装置
JP2010191400A (ja) 音声認識装置およびデータ更新方法
JP3702867B2 (ja) 音声制御装置
EP3540565A1 (fr) Procédé de commande pour dispositif de traduction, dispositif de traduction et programme
CN103426429B (zh) 语音控制方法和装置
JP2008145693A (ja) 情報処理装置及び情報処理方法
EP1899955B1 (fr) Procede et systeme de dialogue vocal
US20170301349A1 (en) Speech recognition system
WO2015102039A1 (fr) Appareil de reconnaissance vocale
JP4498906B2 (ja) 音声認識装置
JP2007183516A (ja) 音声対話装置及び音声認識方法
JP2011180416A (ja) 音声合成装置、音声合成方法およびカーナビゲーションシステム
JP4093394B2 (ja) 音声認識装置
JP2005114964A (ja) 音声認識方法および音声認識処理装置
JP2017102320A (ja) 音声認識装置
JP2008233009A (ja) カーナビゲーション装置及びカーナビゲーション装置用プログラム
JPH06110495A (ja) 音声認識装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14876040

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14876040

Country of ref document: EP

Kind code of ref document: A1