WO2016103465A1 - Speech recognition system - Google Patents
Speech recognition system
- Publication number: WO2016103465A1 (application PCT/JP2014/084571)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- unit
- voice
- recognition result
- user
- recognition
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/26—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
- G01C21/34—Route searching; Route guidance
- G01C21/36—Input/output arrangements for on-board computers
- G01C21/3605—Destination input or retrieval
- G01C21/3608—Destination input or retrieval using speech input, e.g. using speech recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/04817—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance using icons
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/0482—Interaction with lists of selectable items, e.g. menus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Definitions
- the present invention relates to a speech recognition system that recognizes a user's utterance.
- In a conventional system, the user must first consider the content that the system can recognize, and then speak only after giving a speech recognition start instruction, for example by pressing a PTT (Push To Talk) button.
- A word that appears in natural conversation between users cannot be recognized automatically; the user must press the PTT button or the like and utter the word again for it to be recognized. This makes the operation cumbersome, and the content to be recognized may be forgotten in the meantime.
- Patent Document 1 describes an operation control device that constantly recognizes speech, and generates and displays a shortcut button for executing a function corresponding to the recognition result.
- In Patent Document 1, the function corresponding to the recognition result is executed only when the user presses the shortcut button, so the device does not act against the user's intention. However, part of the information displayed on the screen is hidden by the shortcut button, or the display content changes when the screen is updated to show the button. This can cause discomfort, or reduce the user's concentration while driving.
- The present invention has been made to solve the above problems, and an object thereof is to provide a voice recognition system that constantly recognizes speech and presents a function execution button for executing a function corresponding to the recognition result at the timing required by the user.
- The speech recognition system includes: a speech acquisition unit that acquires speech uttered by a user over a preset speech acquisition period; a speech recognition unit that recognizes the speech acquired by the speech acquisition unit; a determination unit that determines whether the user has performed a predetermined operation or action; and a display control unit that, when the determination unit determines that the user has performed the predetermined operation or action, causes a display unit to display a function execution button that makes a controlled device execute the function corresponding to the recognition result of the speech recognition unit.
- According to the present invention, voice is captured over a preset voice acquisition period, and a function execution button based on the utterance content is displayed when a predetermined operation or action is performed by the user.
- Therefore, no operation contrary to the user's intention occurs, and a decrease in concentration caused by a screen update when the function execution button is displayed can be suppressed.
- Since the user is presented with a function execution button that anticipates his or her operation intention, user friendliness and ease of use can be improved.
- FIG. 1 is a block diagram showing an example of the navigation system to which the speech recognition system according to Embodiment 1 of the present invention is applied.
- FIG. 2 is a schematic block diagram showing the main hardware configuration of the navigation system to which the speech recognition system according to Embodiment 1 is applied.
- FIG. 3 is an explanatory diagram outlining the operation of the speech recognition system according to Embodiment 1.
- FIG. 4 is a diagram showing examples of the recognition result character string and recognition result type contained in a recognition result.
- FIG. 5 is a diagram showing an example of the correspondence between recognition result types and assigned functions.
- FIG. 6 is a flowchart illustrating the processing for holding the recognition result of a user utterance in the speech recognition system according to Embodiment 1.
- FIG. 7 is a flowchart illustrating the processing for displaying a function execution button in the speech recognition system according to Embodiment 1.
- FIG. 8 is a diagram showing a display example of the function execution buttons.
- FIG. 9 is a diagram showing an example of recognition results stored by the recognition result storage unit.
- FIG. 10 is a diagram showing examples of the display mode of the function execution button.
- FIG. 11 is a block diagram showing a modification of the speech recognition system according to Embodiment 1.
- FIG. 12 is a diagram showing an example of a correspondence.
- FIG. 1 is a block diagram showing an example of a navigation system 1 to which a speech recognition system 2 according to Embodiment 1 of the present invention is applied.
- the navigation system 1 includes a control unit 3, an input reception unit 5, a navigation unit 6, a voice control unit 7, a voice acquisition unit 10, a voice recognition unit 11, a determination unit 14, and a display control unit 15.
- The constituent elements of the navigation system 1 may be distributed among a server on a network, a mobile terminal such as a smartphone, and an in-vehicle device.
- the voice acquisition unit 10, the voice recognition unit 11, the determination unit 14, and the display control unit 15 constitute the voice recognition system 2.
- FIG. 2 is a schematic diagram showing main hardware configurations of the navigation system 1 and its peripheral devices in the first embodiment.
- The hardware includes a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory), and an HDD (Hard Disk Drive) 104.
- The CPU 101 reads and executes various programs stored in the ROM 102 or the HDD 104 and, in cooperation with the other hardware, realizes the functions of the control unit 3, the input reception unit 5, the navigation unit 6, and the voice control unit 7 of the navigation system 1, as well as the functions of the voice acquisition unit 10, the voice recognition unit 11, the determination unit 14, and the display control unit 15.
- The input device 105 corresponds to the instruction input unit 4, the input reception unit 5, and the microphone 9.
- The output device 106 corresponds to the speaker 8 and the display unit 18.
- The voice recognition system 2 continuously captures the voice collected by the microphone 9 over a preset voice acquisition period, recognizes predetermined keywords, and holds the recognition results. It then determines whether the user of the moving body has performed a predetermined operation on the navigation system 1; when the operation is performed, it generates function execution buttons for executing the functions corresponding to the held recognition results, and outputs the generated function execution buttons to the display unit 18.
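The overall behavior described above can be sketched as follows. This is an illustrative Python sketch, not an implementation from the patent; the class name, operation names, and button labels are all assumptions.

```python
# Sketch of the overall flow: recognition results are held continuously
# during the voice acquisition period, and function execution buttons are
# produced only when a predefined user operation occurs.
class SpeechRecognitionSystem:
    # Hypothetical set of predefined trigger operations.
    PREDEFINED_OPERATIONS = {"press_menu", "press_destination", "press_av"}

    def __init__(self):
        self.held_results = []  # recognition results held during the acquisition period

    def on_recognized(self, keyword):
        """Called whenever the always-on recognizer detects a keyword."""
        self.held_results.append(keyword)

    def on_user_operation(self, operation):
        """Return function execution buttons only for predefined operations."""
        if operation not in self.PREDEFINED_OPERATIONS:
            return []  # no screen update against the user's intention
        return [f"[{kw}]" for kw in self.held_results]

system = SpeechRecognitionSystem()
system.on_recognized("convenience store")
system.on_recognized("restaurant")
assert system.on_user_operation("touch_map") == []  # not a trigger operation
assert system.on_user_operation("press_menu") == ["[convenience store]", "[restaurant]"]
```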
- the preset voice acquisition period will be described later.
- The system 2 displays on the display unit 18 the “mischild” button SW1, the “restaurant” button SW2, and the “convenience store” button SW3, which are function execution buttons corresponding to the recognition results “mischild”, “restaurant”, and “convenience store”.
- These function execution buttons are software (SW) keys displayed on the menu screen.
- On the other hand, the “Destination setting” button SW11, “AV” button SW12, “Telephone” button SW13, and “Setting” button SW14 are ordinary software keys, not function execution buttons.
- the navigation unit 6 of the navigation system 1 searches for a convenience store around the current location and displays the search result on the display unit 18.
- Without the function execution button, the user B would, for example, press the “menu” button HW1 to display the menu screen, press the “Destination setting” button SW11 to display the destination search screen, press the “Search nearby facilities” button on the destination search screen to display the surrounding facility search screen, set “convenience store” as the search key, and instruct execution of the search. That is, a function that normally requires a series of operations to call and execute can be called and executed by a single operation of the function execution button.
- the control unit 3 controls the operation of the entire navigation system 1.
- the microphone 9 collects the voice spoken by the user.
- As the microphone 9, for example, an omnidirectional microphone, an array microphone in which a plurality of omnidirectional microphones are arranged in an array and whose directivity can be adjusted, or a unidirectional microphone that has directivity in only one direction and whose directivity cannot be adjusted may be used.
- the display unit 18 is, for example, an LCD (Liquid Crystal Display) or an organic EL (Electroluminescence) display.
- the display unit 18 may be a display-integrated touch panel that includes an LCD or organic EL display and a touch sensor.
- The instruction input unit 4 is used to input the user's manual instructions. Examples include hardware buttons (keys), switches, and touch sensors installed on the steering wheel or elsewhere, a separate remote controller, and a recognition device that recognizes instructions given by gesture. The touch sensor may use any of a pressure-sensitive method, an electromagnetic induction method, a capacitance method, or a combination of these.
- the input receiving unit 5 receives the instruction input from the instruction input unit 4 and outputs the instruction to the control unit 3.
- The navigation unit 6 performs screen transitions according to user operations received by the input reception unit 5 and input via the control unit 3, and performs various searches, such as facility search and address search, using map data (not shown). It also calculates a route to the address or facility set by the user, generates voice information and display content for route guidance, and instructs the display control unit 15 and the voice control unit 7 (described later) via the control unit 3 to output them.
- In addition, the navigation unit 6 searches for music by music name or artist name, plays music, and operates other in-vehicle devices such as an air conditioner according to user instructions.
- the voice control unit 7 outputs, from the speaker 8, guidance voice, music, and the like instructed from the navigation unit 6 via the control unit 3.
- The voice acquisition unit 10 continuously captures the voice collected by the microphone 9 and performs A/D (Analog/Digital) conversion, for example by PCM (Pulse Code Modulation).
- “continuous” means “over a preset voice acquisition period” and is not limited to “always”.
- The “voice acquisition period” is, for example, a period of 5 minutes after the navigation system 1 is activated, a period of 1 minute after the moving body stops, or the period from when the navigation system 1 is activated until it stops. In Embodiment 1, it is assumed that the voice acquisition unit 10 captures voice from when the navigation system 1 starts up until it stops.
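The bounded acquisition period can be sketched as below. The function name and the injected frame source and clock are illustrative assumptions chosen to keep the sketch self-contained; in the patent, the period is, for example, 5 minutes after start-up or the whole run from system start to stop.

```python
# Minimal sketch of a preset "voice acquisition period": frames are
# captured continuously, but only until the period elapses.
def acquire_voice(read_frame, period_seconds, clock):
    """Capture audio frames from read_frame until period_seconds
    have elapsed according to clock."""
    frames = []
    start = clock()
    while clock() - start < period_seconds:
        frames.append(read_frame())
    return frames

# Fake clock that ticks once per call, with one dummy frame per tick.
ticks = iter(range(100))
captured = acquire_voice(lambda: b"frame", period_seconds=5,
                         clock=lambda: next(ticks))
assert len(captured) == 4  # one clock call at start, one per loop check
```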
- In the above description, the microphone 9 and the voice acquisition unit 10 are separate, but the voice acquisition unit 10 may be built into the microphone 9.
- the voice recognition unit 11 includes a processing unit 12 and a recognition result storage unit 13.
- The processing unit 12 detects, from the voice data digitized by the voice acquisition unit 10, the voice section corresponding to the content uttered by the user (hereinafter, “utterance section”), extracts a feature amount from the voice data of the utterance section, performs recognition processing using a speech recognition dictionary based on the feature amount, and outputs the recognition result to the recognition result storage unit 13.
- As the recognition processing method, a general method such as the HMM (Hidden Markov Model) method may be used.
- the speech recognition unit 11 may include a well-known intention understanding process, and may output a result obtained by estimating or searching for a user's intention from a recognition result obtained by large vocabulary continuous speech recognition as a recognition result.
- The processing unit 12 outputs, as the recognition result, at least a recognition result character string and the type of the recognition result (hereinafter, “recognition result type”).
- FIG. 4 shows an example of the recognition result character string and the recognition result type. For example, when the recognition result character string is “convenience store”, the processing unit 12 outputs the recognition result type “facility genre name”.
- The recognition result type is not limited to a character string; it may be an ID represented by a number, or the name of the dictionary used for recognition processing (the name of a dictionary that includes the recognition result character string in its recognition vocabulary).
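The pairing of recognition result character string and recognition result type can be illustrated as a small lookup, following the “convenience store” → “facility genre name” example of FIG. 4. The vocabulary table and function names here are assumptions, not part of the patent.

```python
# Illustrative mapping from recognition result strings to recognition
# result types, as in the FIG. 4 example.
RESULT_TYPES = {
    "convenience store": "facility genre name",
    "restaurant": "facility genre name",
    "mischild": "artist name",
}

def recognition_result(text):
    """Return the pair the processing unit outputs: string and type."""
    return {"string": text, "type": RESULT_TYPES.get(text, "unknown")}

assert recognition_result("convenience store")["type"] == "facility genre name"
assert recognition_result("mischild")["type"] == "artist name"
```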
- the recognition target vocabulary of the speech recognition unit 11 is described as a facility genre name such as “convenience store” or “restaurant” and an artist name such as “mischild”, but is not limited thereto.
- the recognition result storage unit 13 stores the recognition result output by the processing unit 12. When receiving an instruction from the determination unit 14 described later, the stored recognition result is output to the generation unit 16.
- In a common speech recognition system, a button for instructing the start of voice recognition (hereinafter, “voice recognition start instruction unit”) is displayed on the touch panel or installed on the steering wheel, and the voice uttered after the user presses the voice recognition start instruction unit is recognized. That is, the voice recognition start instruction unit outputs a voice recognition start signal; upon receiving the signal, the voice recognition unit detects the utterance section corresponding to the content uttered by the user from the voice data acquired by the voice acquisition unit after the signal is received, and performs the above-described recognition processing.
- In contrast, the voice recognition unit 11 of Embodiment 1 always recognizes the voice data captured by the voice acquisition unit 10, even without a voice recognition start instruction from the user. That is, without receiving a voice recognition start signal, the voice recognition unit 11 repeatedly detects the utterance section corresponding to the content uttered by the user from the voice data acquired by the voice acquisition unit 10, extracts the feature amount of the voice data of the utterance section, performs recognition processing using the speech recognition dictionary based on the feature amount, and outputs the recognition result.
- In the determination unit 14, the user operations that cause the display unit 18 to display function execution buttons corresponding to the recognition results of user utterances are defined in advance. In other words, the determination unit 14 predefines the user operations that trigger it to instruct the recognition result storage unit 13 to output its stored recognition results to the generation unit 16 (described later).
- The user operations predefined in the determination unit 14 are, for example, presses of a button that causes the display unit 18 to display the menu screen showing the function list of the navigation system 1, the destination search screen, or the AV screen. The button may be, for example, a software key displayed on the display (such as the “Destination setting” button SW11 in FIG. 3(b)), a hardware key (such as the “menu” button HW1 in FIG. 3(a)), or a remote controller key.
- the determination unit 14 acquires the user operation content from the input reception unit 5 through the control unit 3 and determines whether or not the acquired operation content matches a predefined operation. If the acquired operation content matches a predefined operation, the determination unit 14 instructs the recognition result storage unit 13 to output the stored recognition result to the generation unit 16. On the other hand, if they do not match, the determination unit 14 does nothing.
- the display control unit 15 includes a generation unit 16 and a drawing unit 17.
- the generation unit 16 acquires a recognition result from the recognition result storage unit 13, and generates a function execution button corresponding to the acquired recognition result.
- In the generation unit 16, the correspondence between each recognition result type and the function to be assigned to the function execution button (hereinafter, “function assigned to the function execution button”) is defined in advance. The generation unit 16 determines the function assigned to the function execution button according to the recognition result type included in the recognition result acquired from the recognition result storage unit 13, generates a function execution button to which the determined function is assigned, and then instructs the drawing unit 17 to display the generated function execution button on the display unit 18.
- For example, when the recognition result character string is “convenience store” with the recognition result type “facility genre name”, the generation unit 16 refers to this correspondence table and determines the function assigned to the function execution button as “surrounding facility search using ‘convenience store’ as a search key”.
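The generation unit's table lookup can be sketched as follows. The table contents and function-description templates are illustrative assumptions based on the two examples given in the text (“surrounding facility search” and “music search”).

```python
# Sketch of the generation unit: the recognition result type selects the
# function assigned to the generated function execution button.
ASSIGNED_FUNCTIONS = {
    "facility genre name": "surrounding facility search using '{}' as a search key",
    "artist name": "music search using '{}' as a search key",
}

def generate_button(result):
    """Build a button dict from a recognition result (string + type)."""
    template = ASSIGNED_FUNCTIONS[result["type"]]
    return {"label": result["string"],
            "function": template.format(result["string"])}

button = generate_button({"string": "convenience store",
                          "type": "facility genre name"})
assert button["function"] == "surrounding facility search using 'convenience store' as a search key"
```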
- the drawing unit 17 causes the display unit 18 to display the content instructed by the navigation unit 6 via the control unit 3 and the function execution button generated by the generation unit 16.
- the operation of the speech recognition system 2 according to the first embodiment will be described using the flowcharts and specific examples shown in FIGS.
- Here, it is assumed that the user operations that cause function execution buttons to be displayed on the display unit 18 are presses of the “menu” button HW1, the “destination” button HW2, and the “AV” button HW3, which are hardware keys installed on the edge of the display as shown in FIG. 3(a). To simplify the description, the operation of the control unit 3 is omitted below.
- the “menu” button HW1 is for displaying a menu screen for presenting various functions to the user as shown in FIG.
- the “destination” button HW2 is used to display a destination search screen as shown in FIG.
- the “AV” button HW3 is for displaying an AV screen as shown in FIG. Note that the operations after pressing these hardware keys are examples, and are not limited to these operations.
- FIG. 6 shows a flowchart for recognizing a user utterance and holding the recognition result.
- Here, the voice acquisition unit 10 is assumed to always capture the voice collected by the microphone 9 during the voice acquisition period, which lasts from when the navigation system 1 is started until it is stopped.
- the voice acquisition unit 10 captures a user utterance collected by the microphone 9, that is, an input voice, and performs A / D conversion using, for example, PCM (step ST01).
- the processing unit 12 detects an utterance section corresponding to the content uttered by the user from the voice data digitized by the voice acquisition unit 10, extracts the feature amount of the voice data in the utterance section, and the feature Based on the amount, a recognition process is performed using a speech recognition dictionary (step ST02), and the recognition result is stored in the recognition result storage unit 13 (step ST03). As a result, the recognition result is stored in the recognition result storage unit 13 as shown in FIG. If the navigation system 1 is not stopped (step ST04 “NO”), the speech recognition system 2 returns to the process of step ST01, and if it is stopped (step ST04 “YES”), the process ends.
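The loop of FIG. 6 (capture at ST01, recognize at ST02, store at ST03, repeat until the system stops at ST04) can be sketched as below. All function names and the canned input data are illustrative assumptions; the real recognizer would operate on digitized audio, not strings.

```python
# Sketch of the FIG. 6 flow under illustrative names.
def recognition_loop(capture, recognize, store, stopped):
    while not stopped():           # ST04: repeat until the system stops
        audio = capture()          # ST01: A/D-converted input voice
        result = recognize(audio)  # ST02: dictionary-based recognition
        if result is not None:
            store(result)          # ST03: hold in the recognition result storage

# Drive the loop with canned data for illustration.
inputs = iter(["restaurant", "hmm", "convenience store"])
stored = []
recognition_loop(
    capture=lambda: next(inputs, None),
    recognize=lambda a: a if a in {"restaurant", "convenience store"} else None,
    store=stored.append,
    stopped=lambda: len(stored) >= 2,  # stand-in for "system stopped"
)
assert stored == ["restaurant", "convenience store"]
```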
- FIG. 7 shows a flowchart for displaying a function execution button.
- the determination unit 14 acquires the user operation content from the input reception unit 5 (step ST11).
- If the operation content has been acquired (step ST12 “YES”), the determination unit 14 proceeds to step ST13; otherwise (step ST12 “NO”), it returns to step ST11.
- the determination unit 14 determines whether or not the operation content acquired from the input reception unit 5 matches a predefined operation. If they match (step ST13 “YES”), the determination unit 14 instructs the recognition result storage unit 13 to output the stored recognition result to the generation unit 16. On the other hand, when the operation content acquired from the input receiving unit 5 does not match the predefined operation (“NO” in step ST13), the determination unit 14 returns to the process in step ST11.
- When the users are merely conversing and no operation is performed, the process does not proceed to step ST13. Therefore, even if the recognition target words “mischild”, “restaurant”, and “convenience store” are included in the utterance content, no function execution button is displayed on the display unit 18.
- When the user presses the “destination” button HW2, the determination unit 14 acquires the operation content (step ST11, step ST12 “YES”). Since the press of the “destination” button HW2 matches an operation predefined in the determination unit 14 (step ST13 “YES”), the determination unit 14 instructs the recognition result storage unit 13 to output the stored recognition results to the generation unit 16. The same applies when the “menu” button HW1 or the “AV” button HW3 is pressed.
- Upon receiving the instruction from the determination unit 14, the recognition result storage unit 13 outputs the recognition results stored at that point to the generation unit 16 (step ST14). The generation unit 16 then generates function execution buttons corresponding to the recognition results acquired from the recognition result storage unit 13 (step ST15) and instructs the drawing unit 17 to display them on the display unit 18. Finally, the drawing unit 17 displays the function execution buttons on the display unit 18 (step ST16).
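The button-display flow of FIG. 7 can be sketched as follows; the set of predefined operations and the button labels are illustrative assumptions.

```python
# Sketch of the FIG. 7 flow: only when the acquired operation matches a
# predefined one (ST13) are the stored recognition results turned into
# buttons (ST14-ST15) and drawn (ST16).
PREDEFINED = {"menu", "destination", "AV"}

def handle_operation(operation, stored_results, drawn):
    if operation not in PREDEFINED:                       # ST13 "NO"
        return                                            # do nothing
    buttons = [f"[{r}] button" for r in stored_results]   # ST14-ST15
    drawn.extend(buttons)                                 # ST16: display

screen = []
handle_operation("volume_up", ["mischild"], screen)
assert screen == []  # not a predefined operation
handle_operation("destination", ["mischild", "restaurant"], screen)
assert screen == ["[mischild] button", "[restaurant] button"]
```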
- the recognition result storage unit 13 outputs the recognition results “mischild”, “convenience store”, and “restaurant” to the generation unit 16 (step ST14).
- The generation unit 16 generates a function execution button to which the function “music search using ‘mischild’ as a search key” is assigned, a function execution button to which the function “surrounding facility search using ‘convenience store’ as a search key” is assigned, and a function execution button to which the function “surrounding facility search using ‘restaurant’ as a search key” is assigned (step ST15), and instructs the drawing unit 17 to display them on the display unit 18.
- The drawing unit 17 displays the function execution buttons generated by the generation unit 16 on the screen whose display is instructed by the navigation unit 6. For example, when the user presses the “menu” button HW1, the drawing unit 17 displays the menu screen instructed by the navigation unit 6 together with the function execution buttons generated by the generation unit 16, namely the “mischild” button SW1, the “restaurant” button SW2, and the “convenience store” button SW3, as shown in FIG. 8.
- When the “destination” button HW2 or the “AV” button HW3 is pressed by the user, the screens shown in FIGS. 8(c) and 8(d) are displayed, respectively.
- When a displayed function execution button is pressed, the navigation unit 6, receiving an instruction from the input reception unit 5, executes the function assigned to that function execution button.
- As described above, the voice recognition system 2 according to Embodiment 1 includes the voice acquisition unit 10 that acquires the voice uttered by the user over a preset voice acquisition period, the voice recognition unit 11 that recognizes the acquired voice, the determination unit 14 that determines whether the user has performed a predetermined operation, and the display control unit 15 that, when the determination unit 14 determines that the user has performed the predetermined operation, causes the display unit 18 to display a function execution button that makes the navigation system 1 execute the function corresponding to the recognition result of the voice recognition unit 11. Since voice is captured over the preset voice acquisition period and function execution buttons based on the utterance content are displayed, the user is spared the trouble of uttering the same content again later.
- In Embodiment 1, the generation unit 16 has been described as generating function execution buttons that display only the recognition result character string. However, icons corresponding to recognition result character strings may be defined in advance, and the generation unit 16 may generate a function execution button combining the recognition result character string and the icon as shown in FIG. 10(a), or a button consisting only of the icon corresponding to the recognition result character string as shown in FIG. 10(b). In the following Embodiments 2 and 3, the display form of the function execution button does not matter.
- the generation unit 16 may change the display mode of the function execution button according to the recognition result type.
- For example, the function execution button corresponding to the recognition result type “artist name” may be a jacket image of the artist's album, and the function execution button corresponding to the recognition result type “facility genre name” may be an icon.
- The speech recognition system 2 may include a priority assigning unit that assigns a priority to each recognition result according to its type, and the generation unit 16 may change at least one of the size and the display order of the function execution button corresponding to a recognition result based on the priority of that recognition result.
- the voice recognition system 2 includes a priority assigning unit 19.
- The priority assigning unit 19 acquires the user's operation content from the input reception unit 5 via the control unit 3 and manages it as an operation history. It also monitors the recognition result storage unit 13 and, when a recognition result is stored there, assigns to the recognition result a priority based on the user's past operation history.
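One hypothetical way to derive such a priority from the operation history is simple counting, sketched below. The patent states only that past operation history is used; this counting scheme, and the type names in the example, are assumptions.

```python
from collections import Counter

# Hypothetical priority rule: recognition result types the user has
# operated on more often in the past get a higher priority.
def assign_priority(result_type, operation_history):
    return Counter(operation_history)[result_type]

history = ["artist name", "artist name", "facility genre name"]
assert assign_priority("artist name", history) == 2
assert assign_priority("facility genre name", history) == 1
```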
- when the recognition result storage unit 13 outputs a recognition result to the generation unit 16, it also outputs the priority assigned by the priority assigning unit 19.
- for example, if the user has manually searched for facilities by genre name more often than by artist name, the priority assigning unit 19 sets the priority of recognition results whose recognition result type is “facility genre name” higher than that of recognition results whose recognition result type is “artist name”. The generation unit 16 then generates each function execution button so that, for example, the buttons for high-priority recognition results are larger than those for low-priority recognition results. This also makes the function execution buttons the user is likely to need stand out, improving convenience.
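As an illustration only (not part of the patent), the priority assignment based on past operation counts could be sketched as follows; the dictionary layout, field names, and the rule "priority = number of past manual operations of the same type" are all assumptions:

```python
from collections import Counter

def assign_priorities(operation_history, recognition_results):
    """Give each recognition result a priority equal to the number of past
    manual operations of the same type (an assumed rule, for illustration)."""
    type_counts = Counter(op["type"] for op in operation_history)
    return [{**result, "priority": type_counts.get(result["type"], 0)}
            for result in recognition_results]

# Hypothetical history: three manual facility-genre searches, one artist search.
history = [{"type": "facility genre name"}] * 3 + [{"type": "artist name"}]
results = [
    {"string": "convenience store", "type": "facility genre name"},
    {"string": "Mischild", "type": "artist name"},
]
ranked = assign_priorities(history, results)
```

A type the user operates on often simply inherits a larger count, which the generation unit can then translate into button size or display order.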
- when the drawing unit 17 displays the function execution buttons on the display unit 18, it may display the buttons corresponding to high-priority recognition results above those corresponding to low-priority recognition results. This too makes the function execution buttons the user may need conspicuous, improving convenience.
- whether to output a function execution button at all may also be decided based on the priority of the recognition result. For example, when the number of function execution buttons generated by the generation unit 16 exceeds a predetermined upper limit on the number displayed, the drawing unit 17 may display the buttons corresponding to high-priority recognition results preferentially and omit the buttons that exceed the limit. This way, the function execution buttons the user is likely to need are displayed preferentially, improving convenience.
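The size, ordering, and display-limit behavior just described might be sketched like this; the button fields, the limit of three, and the linear size step are assumed for illustration:

```python
def layout_buttons(buttons, max_display=3, base_size=40, step=8):
    """Order function-execution buttons by priority, keep at most
    `max_display` of them, and draw higher-priority buttons larger."""
    ordered = sorted(buttons, key=lambda b: b["priority"], reverse=True)
    visible = ordered[:max_display]
    # Scale each button's height by its rank so high-priority buttons stand out.
    for rank, button in enumerate(visible):
        button["height"] = base_size + step * (len(visible) - 1 - rank)
    return visible

buttons = [
    {"label": "restaurant", "priority": 1},
    {"label": "convenience store", "priority": 3},
    {"label": "Mischild", "priority": 2},
    {"label": "gas station", "priority": 0},
]
shown = layout_buttons(buttons)
```

The lowest-priority button is dropped entirely once the display limit is reached, matching the "omit buttons that exceed the limit" behavior.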
- the function execution button has been described as being displayed when the user operates a button such as a hardware key or a software key. However, the function execution button may instead be displayed when the user performs a predetermined action, such as an utterance or a gesture.
- the processing unit 12 sets as recognition target vocabulary utterances considered to include an intention to operate the controlled device, such as commands for operating the controlled device like “telephone” and “audio” and expressions like “I want to go”, “I want to hear”, and “email”. The processing unit 12 then outputs the recognition result not only to the recognition result storage unit 13 but also to the determination unit 14.
- in addition to the user operations described above, utterances that trigger the display of the function execution button are defined in advance in the determination unit 14; for example, utterances such as “I want to go”, “I want to hear”, and “audio” are defined. The determination unit 14 acquires the recognition result output by the processing unit 12, and when that recognition result matches one of the predefined utterances, instructs the recognition result storage unit 13 to output the stored recognition results to the generation unit 16.
- the voice recognition system 2 may also display the function execution button triggered by a user gesture, such as looking around the inside of the vehicle or touching the steering wheel.
- for example, the determination unit 14 acquires information measured by a visible light camera, an infrared camera, or the like (not shown) installed in the vehicle, and detects the movement of the user's face from the acquired information. Taking the direction in which the face squarely faces the camera as 0 degrees, the determination unit 14 determines that the user is looking around the vehicle when the face sweeps a range of 45 degrees to the left and right within one second.
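A rough sketch of that look-around detection, assuming the camera pipeline already yields a face yaw angle per frame; the sampling rate, the window length, and the reading of "45 degrees left and right" as a ±45-degree sweep are assumptions:

```python
def is_looking_around(yaw_samples, threshold_deg=45.0, window=1.0, rate_hz=30):
    """Return True if, within any one-second window of yaw samples
    (0 deg = facing the camera), the face swings at least `threshold_deg`
    to both the left (negative) and the right (positive)."""
    n = int(window * rate_hz)
    for start in range(0, max(1, len(yaw_samples) - n + 1)):
        window_samples = yaw_samples[start:start + n]
        if (max(window_samples, default=0.0) >= threshold_deg
                and min(window_samples, default=0.0) <= -threshold_deg):
            return True
    return False
```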
- the drawing unit 17 may display the function execution buttons superimposed on the currently displayed screen, without performing the screen transition that would otherwise correspond to the operation.
- alternatively, the drawing unit 17 may transition to the menu screen of FIG. 3B and display the function execution buttons there, or display the function execution buttons on the map display screen of FIG.
- a block diagram showing an example of a navigation system to which the speech recognition system according to Embodiment 2 of the present invention is applied is the same as FIG. 1 shown in Embodiment 1; illustration and description thereof are therefore omitted.
- the second embodiment shown below is different from the first embodiment in that the determination unit 14 stores a user operation and a recognition result type in association with each other as shown in FIG. 12, for example.
- the hardware keys in FIG. 12 are, for example, a “menu” button HW1, a “destination” button HW2, an “AV” button HW3, and the like installed on the edge of the display as shown in FIG.
- the software keys in FIG. 12 are, for example, “Destination setting” button SW11, “AV” button SW12, etc. displayed on the display as shown in FIG.
- the determination unit 14 acquires the operation content of the user from the input reception unit 5, and determines whether or not the acquired operation content matches a predefined operation. If the acquired operation content matches a predefined operation, the determination unit 14 determines a recognition result type corresponding to the operation content. Thereafter, the determination unit 14 instructs the recognition result storage unit 13 to output a recognition result having the determined recognition result type to the generation unit 16. On the other hand, when the acquired operation content does not match the predefined operation, the determination unit 14 does nothing.
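The operation-to-type lookup and the filtering of stored recognition results can be sketched as below; the operation names and the table contents are assumptions standing in for FIG. 12:

```python
# Assumed correspondence between user operations and recognition result
# types, modeled loosely on the table of FIG. 12.
OPERATION_TO_TYPE = {
    "destination button HW2": "facility genre name",
    "AV button HW3": "artist name",
}

def select_results(operation, stored_results):
    """Return only the stored recognition results whose type corresponds
    to the operation; an operation not in the table selects nothing."""
    wanted = OPERATION_TO_TYPE.get(operation)
    if wanted is None:
        return []
    return [r for r in stored_results if r["type"] == wanted]

stored = [
    {"string": "convenience store", "type": "facility genre name"},
    {"string": "restaurant", "type": "facility genre name"},
    {"string": "Mischild", "type": "artist name"},
]
```

Pressing the "destination" button thus surfaces only the facility-genre results, as in the walkthrough below.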
- when the recognition result storage unit 13 receives an instruction from the determination unit 14, it outputs to the generation unit 16 a recognition result having a recognition result type that matches the type specified by the determination unit 14.
- the operation of the speech recognition system 2 according to the second embodiment will be described using the flowchart shown in FIG. 13 and a specific example.
- it is assumed that the user operations that cause the function execution button to be displayed on the display unit 18 are the operations defined in FIG. 12.
- the conversation between users is the same as in the first embodiment.
- the flowchart for recognizing the user utterance and holding the recognition result is the same as the flowchart in FIG. Further, the processing from step ST21 to step ST23 in the flowchart of FIG. 13 is the same as that from step ST11 to step ST13 in the flowchart of FIG. In the following description, it is assumed that the process of FIG. 6 is executed and the recognition result storage unit 13 stores the recognition result as shown in FIG.
- when the acquired operation content matches a predefined operation, the determination unit 14 determines the recognition result type corresponding to the operation content and instructs the recognition result storage unit 13 to output a recognition result having the determined type to the generation unit 16 (step ST24). The recognition result storage unit 13 then outputs to the generation unit 16 a recognition result having a recognition result type that matches the type specified by the determination unit 14 (step ST25).
- suppose, for example, that user B wants to search for a convenience store around the current location and presses the “destination” button HW2, the operation for performing that function (step ST21, step ST22). Since pressing the “destination” button HW2 matches an operation predefined in the determination unit 14 (“YES” in step ST23), the determination unit 14 refers to the table shown in FIG. 12 and determines the recognition result type corresponding to the operation to be “facility genre name” (step ST24). The determination unit 14 then instructs the recognition result storage unit 13 to output a recognition result having the recognition result type “facility genre name” to the generation unit 16.
- upon receiving the instruction from the determination unit 14, the recognition result storage unit 13 outputs to the generation unit 16 the recognition results whose recognition result type is “facility genre name”, that is, the recognition results whose recognition result character strings are “convenience store” and “restaurant” (step ST25).
- the generation unit 16 generates a function execution button assigned the function “search for nearby facilities using ‘convenience store’ as the search key” and a function execution button assigned the function “search for nearby facilities using ‘restaurant’ as the search key” (step ST26).
- the drawing unit 17 displays the function execution buttons, a “convenience store” button SW3 and a “restaurant” button SW2, on the display unit 18 as shown in FIG. 14A (step ST27).
- similarly, a “Mischild” button SW1, a function execution button assigned the function “search for music using ‘Mischild’ as the search key”, is displayed on the display unit 18 as shown in FIG.
- the determination unit 14 may store the user's utterance content or gesture content in association with a recognition result type, and output to the recognition result storage unit 13 the recognition result type that matches the user's utterance content acquired from the speech recognition unit 11 or the user's gesture determined from information acquired from a camera or a touch sensor.
- as described above, according to the second embodiment, the determination unit 14 uses information indicating the correspondence between operations or actions performed by the user and the types of recognition results of the speech recognition unit 11 to determine the type corresponding to the operation when it determines that the operation has been performed, and the display control unit 15 selects, from the recognition results of the speech recognition unit 11, a recognition result matching the type determined by the determination unit 14 and causes the display unit 18 to display a function execution button for making the navigation system 1 execute the function corresponding to that recognition result. Function execution buttons highly relevant to the content of the user's operation are thus presented; the user's operation intention is anticipated and presented more accurately, further improving user friendliness and ease of use.
- FIG. 16 is a block diagram showing an example of a navigation system 1 to which the speech recognition system 2 according to Embodiment 3 of the present invention is applied. Configurations similar to those described in Embodiment 1 are given the same reference numerals, and duplicate descriptions are omitted.
- compared with the first embodiment, the speech recognition system 2 does not include the recognition result storage unit 13. Instead, the voice recognition system 2 includes a voice data storage unit 20, and the voice acquisition unit 10 continuously captures the voice collected by the microphone 9, digitizes it by A/D conversion, and stores all or part of the resulting voice data in the voice data storage unit 20.
- for example, the voice acquisition unit 10 stores in the voice data storage unit 20 voice data obtained by digitizing the voice collected by the microphone 9 during the voice acquisition period, e.g. one minute after the moving object stops.
- alternatively, the voice acquisition unit 10 may capture the voice collected by the microphone 9 during the period from the start to the stop of the navigation system 1 as the voice acquisition period, and store, for example, the voice data for the past 30 seconds in the voice data storage unit 20.
- the voice acquisition unit 10 may also be configured to detect utterance sections in the voice data and extract them, storing only the voice data of the utterance sections in the voice data storage unit 20. In that case, voice data for a predetermined number of utterance sections may be stored in the voice data storage unit 20, and voice data exceeding that number may be deleted in order from the oldest.
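The "keep N most recent utterance sections" policy maps naturally onto a bounded queue; a minimal sketch, with the section count and PCM representation assumed:

```python
from collections import deque

class UtteranceBuffer:
    """Keep audio for at most `max_sections` utterance sections; when a new
    section arrives and the buffer is full, the oldest is dropped."""
    def __init__(self, max_sections=5):
        self._sections = deque(maxlen=max_sections)

    def add_section(self, pcm_samples):
        self._sections.append(pcm_samples)

    def dump(self):
        # All retained sections, oldest first, for the recognizer to consume.
        return list(self._sections)

buf = UtteranceBuffer(max_sections=3)
for i in range(5):
    buf.add_section([i])  # placeholder PCM data
```

`deque(maxlen=…)` discards the oldest entry automatically on overflow, which is exactly the "delete in order from the oldest" behavior described above.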
- the determination unit 14 acquires the user's operation content from the input receiving unit 5, and outputs a voice recognition start instruction to the processing unit 12 when the acquired operation content matches a predefined operation.
- when the processing unit 12 receives a voice recognition start instruction from the determination unit 14, it acquires voice data from the voice data storage unit 20, performs voice recognition processing on the acquired voice data, and outputs the recognition result to the generation unit 16.
- in the following, it is assumed that the voice acquisition unit 10 captures the voice collected by the microphone 9 during the voice acquisition period from the start to the stop of the navigation system 1, and that the voice data is stored in the voice data storage unit 20.
- FIG. 17 shows a flowchart for capturing and holding user utterances.
- the voice acquisition unit 10 captures a user utterance collected by the microphone 9, that is, an input voice, and performs A / D conversion by, for example, PCM (step ST31).
- the voice acquisition unit 10 stores the digitized voice data in the voice data storage unit 20 (step ST32).
- if the navigation system 1 has not stopped, the voice acquisition unit 10 returns to the process of step ST31; when it has stopped (“YES” in step ST33), the process ends.
- FIG. 18 shows a flowchart for displaying a function execution button.
- the processing from step ST41 to step ST43 is the same as step ST11 to step ST13 in the flowchart of FIG.
- the determination unit 14 outputs a voice recognition start instruction to the processing unit 12 when the user operation content acquired from the input reception unit 5 matches a predefined operation ("YES" in step ST43).
- the processing unit 12 acquires voice data from the voice data storage unit 20 (step ST44), performs voice recognition processing on the acquired voice data, and outputs the recognition result to the generation unit 16.
- as described above, according to the third embodiment, the voice acquired by the voice acquisition unit 10 over the voice acquisition period is recognized only when the determination unit 14 determines that the user has performed a predetermined operation or action. Since voice recognition processing is not running at other times, resources such as memory can be allocated to other processing, such as map screen drawing, and the response speed to user operations other than voice operations can be improved.
- the voice recognition system according to the present invention presents the function execution button at the timing the user requires it, and is therefore suitable for use in a voice recognition system that constantly recognizes the user's utterances.
- 1 navigation system (controlled device), 2 voice recognition system, 3 control unit, 4 instruction input unit, 5 input receiving unit, 6 navigation unit, 7 voice control unit, 8 speaker, 9 microphone, 10 voice acquisition unit, 11 voice recognition unit, 12 processing unit, 13 recognition result storage unit, 14 determination unit, 15 display control unit, 16 generation unit, 17 drawing unit, 18 display unit, 19 priority assigning unit, 20 voice data storage unit, 100 bus, 101 CPU, 102 ROM, 103 RAM, 104 HDD, 105 input device, 106 output device.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Remote Sensing (AREA)
- Radar, Positioning & Navigation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Automation & Control Theory (AREA)
- Computational Linguistics (AREA)
- Navigation (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Description
In the following embodiments, a case where the voice recognition system according to the present invention is applied to a navigation system (controlled device) for a moving object such as a vehicle is described as an example; however, the invention may be applied to any system that has a voice operation function.
FIG. 1 is a block diagram showing an example of a navigation system 1 to which a voice recognition system 2 according to Embodiment 1 of the present invention is applied. This navigation system 1 includes a control unit 3, an input receiving unit 5, a navigation unit 6, a voice control unit 7, a voice acquisition unit 10, a voice recognition unit 11, a determination unit 14, and a display control unit 15. The components of the navigation system 1 may be distributed among a server on a network, a mobile terminal such as a smartphone, and an in-vehicle device.
The voice recognition system 2 continuously captures the voice collected by the microphone 9 over a preset voice acquisition period, recognizes predetermined keywords, and holds the recognition results. The voice recognition system 2 then determines whether the user of the moving object has performed a predetermined operation on the navigation system 1; when the operation is performed, it generates, using the held recognition results, function execution buttons for executing the functions corresponding to the recognition results, and outputs the generated function execution buttons to the display unit 18.
The preset voice acquisition period will be described later.
A: "When this song ends, what shall we play next?"
B: "I'd like to listen to Mischild again; it's been a while."
A: "Sounds good. Come to think of it, is a restaurant OK for lunch?"
B: "Maybe we should just buy something at a convenience store."
A: "Got it."
Here, the voice recognition system 2 recognizes the artist name "Mischild" and the facility genre names "restaurant" and "convenience store" as keywords, but at this stage it does not display the function execution buttons corresponding to these recognition results on the display unit 18. The "menu" button HW1, "destination" button HW2, "AV (Audio Visual)" button HW3, and "current location" button HW4 shown in FIG. 3 are hardware (HW) keys installed on the display housing of the display unit 18.
On the other hand, if user B were to search for convenience stores around the current location without using the "convenience store" button SW3, he would, for example, press the "menu" button HW1 to display the menu screen, press the "destination setting" button SW11 on the menu screen to display the destination search screen, press the "nearby facility search" button on the destination search screen to display the nearby facility search screen, set "convenience store" as the search key, and instruct the search to be executed. In other words, a function that would normally require multiple operations to invoke and execute can be invoked and executed with a single operation of a function execution button.
The microphone 9 collects the voice uttered by the user. Examples of the microphone 9 include an omnidirectional microphone, an array microphone in which a plurality of omnidirectional microphones are arranged in an array so that the directivity can be adjusted, and a unidirectional microphone that has directivity in only one direction and whose directivity cannot be adjusted.
The input receiving unit 5 receives the instruction input via the instruction input unit 4 and outputs it to the control unit 3.
The voice acquisition unit 10 continuously captures the voice collected by the microphone 9 and performs A/D (Analog/Digital) conversion on it, for example by PCM (Pulse Code Modulation).
The processing unit 12 detects, from the voice data digitized by the voice acquisition unit 10, voice sections corresponding to content uttered by the user (hereinafter referred to as "utterance sections"), extracts the feature amounts of the voice data of the utterance sections, performs recognition processing using a voice recognition dictionary based on those feature amounts, and outputs the recognition results to the recognition result storage unit 13. The recognition processing may be performed using a general method such as the HMM (Hidden Markov Model) method, so a detailed description is omitted.
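A crude, illustrative stand-in for utterance-section detection (the real system extracts feature amounts and uses an HMM-based recognizer; here a simple amplitude threshold marks sections, with the threshold and minimum length assumed):

```python
def detect_utterance_sections(samples, threshold=500, min_len=3):
    """Return (start, end) index pairs of runs whose absolute amplitude
    stays at or above `threshold` for at least `min_len` samples."""
    sections, start = [], None
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i  # a loud run begins
        else:
            if start is not None and i - start >= min_len:
                sections.append((start, i))  # long enough to count
            start = None
    if start is not None and len(samples) - start >= min_len:
        sections.append((start, len(samples)))  # run reaches end of data
    return sections

# Toy PCM data: two bursts of speech separated by silence.
pcm = [0, 0, 800, 900, 850, 0, 0, 700, 760, 810, 900, 0]
```

Only the detected sections would then be passed on to feature extraction and recognition.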
Here, it is assumed that the voice acquisition unit 10 always captures the voice collected by the microphone 9 during the voice acquisition period from the start to the stop of the navigation system 1. First, the voice acquisition unit 10 captures the user utterance collected by the microphone 9, that is, the input voice, and performs A/D conversion on it, for example by PCM (step ST01).
First, the determination unit 14 acquires the user's operation content from the input receiving unit 5 (step ST11). If the operation content could be acquired, that is, if there was some user operation ("YES" in step ST12), the determination unit 14 proceeds to the process of step ST13. If the operation content could not be acquired ("NO" in step ST12), the determination unit 14 returns to the process of step ST11.
Thereafter, the generation unit 16 generates function execution buttons corresponding to the recognition results acquired from the recognition result storage unit 13 (step ST15) and instructs the drawing unit 17 to display the generated function execution buttons on the display unit 18. Finally, the drawing unit 17 displays the function execution buttons on the display unit 18 (step ST16).
Specifically, if the number of manual facility searches by genre name exceeds the number of searches by artist name, the priority assigning unit 19 sets the priority of recognition results whose recognition result type is "facility genre name" higher than the priority of recognition results whose type is "artist name". Then, for example, the generation unit 16 generates each function execution button so that the buttons for high-priority recognition results are larger than those for low-priority recognition results. This also makes the function execution buttons the user is likely to need stand out, improving convenience.
A block diagram showing an example of a navigation system to which the voice recognition system according to Embodiment 2 of the present invention is applied is the same as FIG. 1 shown in Embodiment 1, so illustration and description are omitted. Embodiment 2 described below differs from Embodiment 1 in that the determination unit 14 stores user operations and recognition result types in association with each other, for example as shown in FIG. 12. The hardware keys in FIG. 12 are, for example, the "menu" button HW1, "destination" button HW2, and "AV" button HW3 installed on the edge of the display as shown in FIG. 3(a). The software keys in FIG. 12 are, for example, the "destination setting" button SW11 and "AV" button SW12 displayed on the display as shown in FIG. 3(b).
FIG. 16 is a block diagram showing an example of a navigation system 1 to which the voice recognition system 2 according to Embodiment 3 of the present invention is applied. Configurations similar to those described in Embodiment 1 are given the same reference numerals, and duplicate descriptions are omitted.
Claims (5)
- A voice acquisition unit that acquires voice uttered by a user over a preset voice acquisition period;
a voice recognition unit that recognizes the voice acquired by the voice acquisition unit;
a determination unit that determines whether the user has performed a predetermined operation or action; and
a display control unit that, when the determination unit determines that the user has performed a predetermined operation or action, causes a display unit to display a function execution button for causing a controlled device to execute a function corresponding to a recognition result of the voice recognition unit. - The determination unit determines, using information indicating a correspondence between operations or actions performed by the user and types of recognition results of the voice recognition unit, the type corresponding to the case where it determines that the user has performed the operation or action, and
the display control unit selects, from the recognition results of the voice recognition unit, a recognition result that matches the type determined by the determination unit, and causes the display unit to display a function execution button for causing the controlled device to execute a function corresponding to the selected recognition result; the voice recognition system according to claim 1. - The display control unit changes the display mode of the function execution button according to the type of the recognition result of the voice recognition unit; the voice recognition system according to claim 1.
- A priority assigning unit that assigns a priority, for each type, to the recognition results of the voice recognition unit is provided, and
the display control unit changes the display mode of the function execution button based on the priority that the priority assigning unit has assigned to the recognition results of the voice recognition unit; the voice recognition system according to claim 3. - The voice recognition unit recognizes the voice that the voice acquisition unit has acquired over the voice acquisition period when the determination unit determines that the user has performed a predetermined operation or action; the voice recognition system according to claim 1.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE112014007288.5T DE112014007288T5 (de) | 2014-12-26 | 2014-12-26 | Spracherkennungssystem |
US15/509,981 US20170301349A1 (en) | 2014-12-26 | 2014-12-26 | Speech recognition system |
PCT/JP2014/084571 WO2016103465A1 (ja) | 2014-12-26 | 2014-12-26 | 音声認識システム |
CN201480084386.7A CN107110660A (zh) | 2014-12-26 | 2014-12-26 | 语音识别系统 |
JP2016565813A JP6522009B2 (ja) | 2014-12-26 | 2014-12-26 | 音声認識システム |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2014/084571 WO2016103465A1 (ja) | 2014-12-26 | 2014-12-26 | 音声認識システム |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016103465A1 true WO2016103465A1 (ja) | 2016-06-30 |
Family
ID=56149553
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/084571 WO2016103465A1 (ja) | 2014-12-26 | 2014-12-26 | 音声認識システム |
Country Status (5)
Country | Link |
---|---|
US (1) | US20170301349A1 (ja) |
JP (1) | JP6522009B2 (ja) |
CN (1) | CN107110660A (ja) |
DE (1) | DE112014007288T5 (ja) |
WO (1) | WO2016103465A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPWO2016002406A1 (ja) * | 2014-07-04 | 2017-04-27 | クラリオン株式会社 | 車載対話型システム、及び車載情報機器 |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11176930B1 (en) | 2016-03-28 | 2021-11-16 | Amazon Technologies, Inc. | Storing audio commands for time-delayed execution |
DE102018006480A1 (de) | 2018-08-16 | 2020-02-20 | Daimler Ag | Schlüsselvorrichtung zum Einstellen eines Fahrzeugparameters |
JP2020144209A (ja) * | 2019-03-06 | 2020-09-10 | シャープ株式会社 | 音声処理装置、会議システム、及び音声処理方法 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004239963A (ja) * | 2003-02-03 | 2004-08-26 | Mitsubishi Electric Corp | 車載制御装置 |
JP2011080824A (ja) * | 2009-10-06 | 2011-04-21 | Clarion Co Ltd | ナビゲーション装置 |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3380992B2 (ja) * | 1994-12-14 | 2003-02-24 | ソニー株式会社 | ナビゲーションシステム |
US8768286B2 (en) * | 2001-10-24 | 2014-07-01 | Mouhamad Ahmad Naboulsi | Hands on steering wheel vehicle safety control system |
JP3948357B2 (ja) * | 2002-07-02 | 2007-07-25 | 株式会社デンソー | ナビゲーション支援システム、移動装置、ナビゲーション支援サーバおよびコンピュータプログラム |
US20120253823A1 (en) * | 2004-09-10 | 2012-10-04 | Thomas Barton Schalk | Hybrid Dialog Speech Recognition for In-Vehicle Automated Interaction and In-Vehicle Interfaces Requiring Minimal Driver Processing |
JP2010205130A (ja) * | 2009-03-05 | 2010-09-16 | Denso Corp | 制御装置 |
US9213466B2 (en) * | 2009-07-20 | 2015-12-15 | Apple Inc. | Displaying recently used functions in context sensitive menu |
JP2011113483A (ja) * | 2009-11-30 | 2011-06-09 | Fujitsu Ten Ltd | 情報処理装置、オーディオ装置及び情報処理方法 |
US9417754B2 (en) * | 2011-08-05 | 2016-08-16 | P4tents1, LLC | User interface system, method, and computer program product |
US20180032997A1 (en) * | 2012-10-09 | 2018-02-01 | George A. Gordon | System, method, and computer program product for determining whether to prompt an action by a platform in connection with a mobile device |
CN103917847B (zh) * | 2011-11-10 | 2017-03-01 | 三菱电机株式会社 | 导航装置及方法 |
KR101992676B1 (ko) * | 2012-07-26 | 2019-06-25 | 삼성전자주식회사 | 영상 인식을 이용하여 음성 인식을 하는 방법 및 장치 |
CN105246743B (zh) * | 2013-05-21 | 2017-03-29 | 三菱电机株式会社 | 语音识别装置、识别结果显示装置及显示方法 |
US20150052459A1 (en) * | 2013-08-13 | 2015-02-19 | Unisys Corporation | Shortcut command button for a hierarchy tree |
KR20150025214A (ko) * | 2013-08-28 | 2015-03-10 | 삼성전자주식회사 | 동영상에 비주얼 객체를 중첩 표시하는 방법, 저장 매체 및 전자 장치 |
KR102229356B1 (ko) * | 2013-09-05 | 2021-03-19 | 삼성전자주식회사 | 제어 장치 |
US9383827B1 (en) * | 2014-04-07 | 2016-07-05 | Google Inc. | Multi-modal command display |
US9576575B2 (en) * | 2014-10-27 | 2017-02-21 | Toyota Motor Engineering & Manufacturing North America, Inc. | Providing voice recognition shortcuts based on user verbal input |
2014
- 2014-12-26 DE DE112014007288.5T patent/DE112014007288T5/de not_active Ceased
- 2014-12-26 WO PCT/JP2014/084571 patent/WO2016103465A1/ja active Application Filing
- 2014-12-26 CN CN201480084386.7A patent/CN107110660A/zh active Pending
- 2014-12-26 JP JP2016565813A patent/JP6522009B2/ja not_active Expired - Fee Related
- 2014-12-26 US US15/509,981 patent/US20170301349A1/en not_active Abandoned
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004239963A (ja) * | 2003-02-03 | 2004-08-26 | Mitsubishi Electric Corp | 車載制御装置 |
JP2011080824A (ja) * | 2009-10-06 | 2011-04-21 | Clarion Co Ltd | ナビゲーション装置 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPWO2016002406A1 (ja) * | 2014-07-04 | 2017-04-27 | クラリオン株式会社 | 車載対話型システム、及び車載情報機器 |
Also Published As
Publication number | Publication date |
---|---|
JPWO2016103465A1 (ja) | 2017-04-27 |
US20170301349A1 (en) | 2017-10-19 |
CN107110660A (zh) | 2017-08-29 |
JP6522009B2 (ja) | 2019-05-29 |
DE112014007288T5 (de) | 2017-09-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6570651B2 (ja) | 音声対話装置および音声対話方法 | |
JP6400109B2 (ja) | 音声認識システム | |
JP5762660B2 (ja) | 音声認識装置、認識結果表示装置および表示方法 | |
JP5925313B2 (ja) | 音声認識装置 | |
WO2013005248A1 (ja) | 音声認識装置およびナビゲーション装置 | |
US20150331665A1 (en) | Information provision method using voice recognition function and control method for device | |
JP6725006B2 (ja) | 制御装置および機器制御システム | |
JP5677650B2 (ja) | 音声認識装置 | |
CN105448293B (zh) | 语音监听及处理方法和设备 | |
WO2016103465A1 (ja) | 音声認識システム | |
KR20150089145A (ko) | 음성 제어를 수행하는 디스플레이 장치 및 그 음성 제어 방법 | |
US10671343B1 (en) | Graphical interface to preview functionality available for speech-enabled processing | |
JP6214297B2 (ja) | ナビゲーション装置および方法 | |
JP2008145693A (ja) | 情報処理装置及び情報処理方法 | |
WO2004019197A1 (ja) | リズムパターンを用いた制御システム、方法およびプログラム | |
JP4498906B2 (ja) | 音声認識装置 | |
JP3296783B2 (ja) | 車載用ナビゲーション装置および音声認識方法 | |
JP5446540B2 (ja) | 情報検索装置、制御方法及びプログラム | |
JP2008233009A (ja) | カーナビゲーション装置及びカーナビゲーション装置用プログラム | |
JP2015129672A (ja) | 施設検索装置および方法 | |
WO2015102039A1 (ja) | 音声認識装置 | |
JP2017102320A (ja) | 音声認識装置 | |
KR20210015986A (ko) | 전자 장치 및 이의 음성 인식 방법 | |
JP2017167600A (ja) | 端末装置 | |
JPWO2013005248A1 (ja) | 音声認識装置およびナビゲーション装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14909069 Country of ref document: EP Kind code of ref document: A1 |
ENP | Entry into the national phase |
Ref document number: 2016565813 Country of ref document: JP Kind code of ref document: A |
WWE | Wipo information: entry into national phase |
Ref document number: 15509981 Country of ref document: US |
WWE | Wipo information: entry into national phase |
Ref document number: 112014007288 Country of ref document: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 14909069 Country of ref document: EP Kind code of ref document: A1 |