WO2014006690A1 - Speech recognition apparatus - Google Patents
Speech recognition apparatus
- Publication number
- WO2014006690A1 (PCT/JP2012/066974)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- unit
- information
- user
- search
- display
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/243—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/685—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
Definitions
- The present invention relates to a speech recognition apparatus that recognizes speech uttered by a user and searches for information.
- A button for instructing the start of voice recognition (hereinafter referred to as a voice recognition start instruction unit) may be displayed on the touch panel or installed on the steering wheel, in which case the voice uttered after a passenger (user) presses the voice recognition start instruction unit is recognized. That is, the voice recognition start instruction unit outputs a voice recognition start signal; when the voice recognition unit receives the signal, it detects, from the voice data acquired by the voice acquisition unit after the signal, the voice section corresponding to what the passenger (user) said, and performs speech recognition processing.
- Alternatively, without receiving a voice recognition start signal, the voice recognition unit detects the voice section corresponding to what the passenger (user) said from the voice data acquired by the voice acquisition unit, extracts a feature quantity of the voice data in that section, performs recognition processing using a recognition dictionary based on the feature quantity, and outputs a character string as the voice recognition result; this process is performed repeatedly.
- The database is then searched based on the character string, and the search result is displayed.
- Patent Document 1 discloses a speech recognition apparatus in which voice uttered by a user is constantly input and recognized and the recognition result is displayed; the user then performs a confirmation operation with an operation button, which executes processing based on the recognition result.
- However, a conventional speech recognition apparatus such as that of Patent Document 1 always displays search results at the same level whenever the same utterance is recognized. For example, when the user says “gas station”, the store names and locations of nearby gas stations are always displayed, and to learn the price at each gas station the user has to perform a separate predetermined operation every time.
- The present invention has been made to solve the above problem, and its object is to provide a voice recognition device that can immediately present information at the level the user requires.
- A speech recognition apparatus according to the present invention includes: a voice acquisition unit that detects and acquires voice uttered by a user; a voice recognition unit that recognizes the voice data acquired by the voice acquisition unit and extracts a keyword; an operation input unit that receives operation input from the user; a display unit that presents information to the user; an operation response analysis unit that identifies the user's operation based on the information received by the operation input unit and the information displayed on the display unit; an operation display history storage unit that stores, as history information for each keyword extracted by the voice recognition unit, the display contents displayed on the display unit by the operation identified by the operation response analysis unit and the number of times they were displayed; a search level setting unit that sets a search level for the keyword extracted by the voice recognition unit according to the history information stored in the operation display history storage unit; an information search control unit that searches for information according to the search level set by the search level setting unit, using the keyword extracted by the voice recognition unit as a search key, and acquires the search result; and an information presentation control unit that gives an instruction to display the search result acquired by the information search control unit on the display unit. The search level setting unit changes the search level when, for the keyword extracted by the voice recognition unit, the number of display times in the history information stored in the operation display history storage unit exceeds a predetermined number.
- According to the speech recognition apparatus of the present invention, information at the level the user requires can be presented immediately, and the detailed information the user needs can always be provided efficiently, which improves convenience for the user.
- FIG. 2 is a block diagram showing an example of the speech recognition apparatus according to Embodiment 1.
- FIG. 3 is a diagram showing an example of the definition of search levels.
- FIG. 4 is a diagram showing an example of the search level set for each keyword in the information search control unit.
- FIG. 5 is a diagram showing the user's operation history and display history stored for each keyword in the operation display history storage unit.
- A flowchart shows the operation of the speech recognition apparatus in Embodiment 1.
- A diagram shows an example in which the stored operation history and display history are updated for one keyword (gas station).
- FIG. 10 is a flowchart showing the operation of the speech recognition apparatus in Embodiment 2.
- FIG. 11 is a block diagram showing an example of the speech recognition apparatus according to Embodiment 3.
- A flowchart shows the operation of the speech recognition apparatus in Embodiment 3.
- FIG. 13 is a block diagram showing an example of the speech recognition apparatus according to Embodiment 4.
- A flowchart shows the operation of the speech recognition apparatus in Embodiment 4.
- FIG. 1 is a diagram illustrating an example of a display screen of a general navigation device.
- Suppose the following conversation takes place while a map for normal road guidance and the vehicle mark 71 are displayed on the screen 70 of the navigation device.
- User A: “We will run out of gasoline soon.”
- User B: “Is there a gas station nearby?”
- A genre name icon 72 corresponding to the genre name included in the utterance (in this example, “gas station”) is then displayed on the screen 70 of the navigation device (FIG. 1A).
- Gas stations around the current location are searched, and, for example, their names and addresses are displayed as a search result list 73 (FIG. 1B).
- The location of the selected gas station is displayed as a facility mark 74, together with detail buttons 75 (for example, a “business hours” button 75a and a “price” button 75b) for displaying detailed information about the gas station, such as business hours and gasoline price (FIG. 1C).
- When the “business hours” button 75a is pressed, the business hours of the gas station are displayed (FIG. 1D).
- In the following, a facility search by genre, such as the gas station search above, is described as an example.
- The information to be searched in the present invention is not limited to facility information, however, and may be traffic information, weather information, address information, news, music information, movie information, program information, and the like.
- FIG. 2 is a block diagram showing an example of a speech recognition apparatus according to Embodiment 1 of the present invention.
- This voice recognition device is used by being incorporated in a navigation device mounted on a vehicle (moving body), and includes a voice acquisition unit 1, a voice recognition unit 2, a voice recognition dictionary 3, an information database 4, an information search control unit 5, an information presentation control unit 6, a display unit 7, an operation input unit 8, an operation response analysis unit 9, an operation display history storage unit 10, and a search level setting unit 11.
- The voice acquisition unit 1 takes in a user utterance collected by a microphone, that is, an input voice, and performs A/D (Analog/Digital) conversion by, for example, PCM (Pulse Code Modulation).
- The voice recognition unit 2 detects, from the voice signal digitized by the voice acquisition unit 1, a voice section corresponding to the content uttered by the user, extracts a feature quantity of the voice data in that section, performs recognition processing using the speech recognition dictionary 3 based on the feature quantity, and outputs a character string as the speech recognition result.
- The recognition process may use a general method such as the HMM (Hidden Markov Model) method.
- A button for instructing the start of voice recognition (hereinafter referred to as a voice recognition start instruction unit) may be displayed on the touch panel or installed on the steering wheel, in which case the voice uttered after the user presses the voice recognition start instruction unit is recognized. That is, the voice recognition start instruction unit outputs a voice recognition start signal; when the voice recognition unit receives the signal, it detects, from the voice data acquired by the voice acquisition unit after the signal, the voice section corresponding to the content uttered by the user, and performs the recognition process described above.
- In contrast, the voice recognition unit 2 in Embodiment 1 always recognizes what the user says, without such a voice recognition start instruction from the user. That is, even without receiving a voice recognition start signal, while the navigation device incorporating the voice recognition device is running, the voice recognition unit 2 repeatedly detects a voice section corresponding to the user's utterance from the voice data acquired by the voice acquisition unit 1, extracts a feature quantity of the voice data in that section, performs recognition processing using the speech recognition dictionary 3 based on the feature quantity, and outputs the character string of the recognition result. The same applies to the following embodiments.
- The information database 4 stores at least one of facility information, address information, song information, and the like.
- The facility information includes, for example, the facility name, the genre to which the facility belongs, position data, business hours, and the presence or absence of a parking lot.
- The address information includes, for example, an address and position data. The song information includes information such as name, artist name, song title, and era.
- The information database 4 is described here as storing facility information, but it may store traffic information, weather information, address information, news, music information, movie information, program information, and the like.
- The information database 4 may be stored in, for example, an HDD or a flash memory, or may reside on a network and be accessed via communication means (not shown).
- The information search control unit 5 searches the information database 4 using the keyword output by the voice recognition unit 2 as a search key, according to the search level set by the search level setting unit 11 described later, and acquires information.
- The search level is an index representing how detailed the information acquired from the information database 4 is (down to which hierarchy level), and it is defined for each keyword.
- FIG. 3 shows an example of search level definitions. For example, when searching with the keyword “gas station” as the search key, the facility name and address information are acquired if the search level is “1”; if the search level is “2”, information on at least one specified item among business hours and gasoline price is acquired in addition to the facility name and address information.
- When no search level is set, the information search control unit 5 does not perform a search.
- A search level of “0” may be used to represent the state in which no search level is set.
- FIG. 4 shows an example of the search level for each keyword set in the information search control unit 5 by the search level setting unit 11 described later.
- One item may be set as additional information, as shown in FIG. 4A.
- In that case, for example, business hours information is acquired in addition to the facility name and address information.
- As shown in FIG. 4B, a plurality of items may also be set as additional information. If only the search level is set, information may be acquired for all items of that level.
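The search level definition and the optional additional-information items described above can be sketched as a small lookup. This is only an illustration under assumptions: the table contents are modelled on the gas station and convenience store examples in the text, and the names `SEARCH_LEVELS` and `items_to_fetch` are hypothetical, not taken from the patent.

```python
# Hypothetical table modelled on the description of FIG. 3: for each keyword,
# the items retrieved at each search level. Level 0 stands for "not set".
SEARCH_LEVELS = {
    "gas station": {
        1: ["name", "address"],
        2: ["business hours", "price"],
    },
    "convenience store": {
        1: ["name", "address"],
        2: ["business hours", "recommended products"],
    },
}


def items_to_fetch(keyword, level, additional=None):
    """Return the items to retrieve for a keyword at a given search level.

    Level 0 (or an unknown keyword) means that no search is performed.
    `additional` optionally restricts the level-2 items (cf. FIG. 4A/4B);
    when it is None, every item of that level is retrieved.
    """
    levels = SEARCH_LEVELS.get(keyword)
    if not levels or level <= 0:
        return []
    items = list(levels[1])
    if level >= 2:
        extra = levels.get(2, [])
        if additional is not None:
            extra = [item for item in extra if item in additional]
        items += extra
    return items
```

For instance, `items_to_fetch("gas station", 2, ["business hours"])` yields the name, address, and business hours, while omitting `additional` returns every level-2 item.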
- The information presentation control unit 6 instructs the display unit 7, described later, to display either an icon or the search result acquired by the information search control unit 5, depending on the search level. Specifically, when no search level is set, the genre name icon 72 shown in FIG. 1A is displayed; when a search level is set, the search result acquired by the information search control unit 5 is displayed as a search result list 73.
- The display unit 7 is a display-integrated touch panel composed of, for example, an LCD (Liquid Crystal Display) and a touch sensor, and displays search results according to instructions from the information presentation control unit 6. The user can also operate the device by directly touching the display unit (touch panel) 7.
- The operation input unit 8 is an operation key, an operation button, a touch panel, or the like that receives an operation input from the user and passes the instruction to the in-vehicle navigation device.
- The user's instructions may be given through, for example, hardware switches provided on the in-vehicle navigation device, touch switches set and displayed on the display, or a remote control installed on the steering wheel or a separate remote control device.
- The operation response analysis unit 9 identifies the user's operation based on the information received by the operation input unit 8 and the information on the screen displayed on the display unit 7.
- How the user's operation is identified is not essential to the present invention; a known technique may be used, so its description is omitted.
- The operation display history storage unit 10 stores, as history information for each keyword, the display contents displayed on the display unit 7 by the user's operation identified by the operation response analysis unit 9 and the number of times they were displayed.
- FIG. 5 shows the per-keyword history information stored in the operation display history storage unit 10. As shown in FIG. 5, the content displayed by the user's operation and the number of times it was displayed are stored for each keyword; each time the operation response analysis unit 9 identifies a user operation, the count for the displayed content is incremented and saved.
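The per-keyword history described above amounts to a nested counter. The following is a minimal sketch, assuming an in-memory store; the class and method names are hypothetical and the patent does not prescribe a storage format.

```python
from collections import defaultdict


class OperationDisplayHistory:
    """Per-keyword display counts, in the spirit of FIG. 5 (names are hypothetical)."""

    def __init__(self):
        # keyword -> displayed content -> number of times displayed
        self._counts = defaultdict(lambda: defaultdict(int))

    def record_display(self, keyword, content):
        """Increment and return the count for content shown by a user operation."""
        self._counts[keyword][content] += 1
        return self._counts[keyword][content]

    def count(self, keyword, content):
        """Return how many times the content was displayed (0 if never)."""
        return self._counts.get(keyword, {}).get(content, 0)
```

Reading a count through `count` does not create entries, so keywords the user never acted on stay absent from the store.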
- The search level setting unit 11 refers to the history information stored in the operation display history storage unit 10 and, according to that history information, sets in the information search control unit 5 a search level for each keyword used as a search key.
- The search level set in the information search control unit 5 is the level corresponding to display contents whose display count is equal to or greater than the predetermined number (or exceeds the predetermined number).
- That is, when a display count in the history information stored in the operation display history storage unit 10 reaches or exceeds the predetermined number, the search level is changed; each time a display count exceeds the predetermined number, the search level is raised.
- For example, with a predetermined number of 3, the search level “1” (see FIG. 3), which searches for the name and address, is set when the name/address display count is 3 or more.
- When the display count of a level-2 item subsequently reaches the predetermined number as well, the search level is raised to “2”.
- When several display contents meet the threshold, the search level for the display content with the deepest hierarchy may be set. For example, with the threshold again set to 3, the keyword “convenience store” shown in FIG. 5 has a level-1 name/address display count of 5 and level-2 business hours and recommended product display counts of 4 each, so the search level “2” (see FIG. 3), which searches for business hours and recommended products, is set because that is the deepest display content whose count is 3 or more.
- The threshold has been described as 3, but the same value may be used for all keywords, or a different value may be used for each keyword.
- The search level setting method shown here is only an example, and a search level determined by another method may be set.
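The threshold rule above, including the "deepest qualifying level" variant, can be sketched as follows. This is an illustration under assumptions: the mapping `CONTENT_LEVEL` and the function name are hypothetical, and the comparison uses "equal to or greater than", which is one of the two options the text allows.

```python
# Hypothetical mapping from displayed content to the search level it belongs to.
CONTENT_LEVEL = {
    "name/address": 1,
    "business hours": 2,
    "recommended products": 2,
    "price": 2,
}


def decide_search_level(display_counts, threshold=3):
    """Return (level, additional_items) for one keyword.

    display_counts maps displayed content to its display count. Content
    qualifies when its count is >= threshold, and the deepest qualifying
    level wins. Level 0 stands for "search level not set".
    """
    qualifying = [c for c, n in display_counts.items() if n >= threshold]
    if not qualifying:
        return 0, []
    level = max(CONTENT_LEVEL[c] for c in qualifying)
    # Additional items are only meaningful above level 1.
    additional = sorted(
        c for c in qualifying if level > 1 and CONTENT_LEVEL[c] == level
    )
    return level, additional
```

With the convenience store counts from the text (5, 4, and 4 against a threshold of 3), the function returns level 2 with both business hours and recommended products as additional items.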
- The voice acquisition unit 1 takes in a user utterance collected by a microphone, that is, an input voice, and performs A/D conversion using, for example, PCM (step ST01).
- The voice recognition unit 2 detects, from the voice signal digitized by the voice acquisition unit 1, a voice section corresponding to the content spoken by the user, extracts a feature quantity of the voice data in that section, performs recognition processing using the speech recognition dictionary 3 based on the feature quantity, and extracts and outputs a character string serving as a keyword (step ST02).
- If a search level is set in the information search control unit 5 for the keyword (YES in step ST03), the information search control unit 5 searches the information database 4 according to that search level, using the keyword output by the voice recognition unit 2 as the search key, and acquires information (step ST04).
- The information presentation control unit 6 then instructs the display unit 7 to display the search result acquired by the information search control unit 5 (step ST05).
- On the other hand, when the search level is not set (NO in step ST03), an icon corresponding to the keyword is displayed (step ST06). Subsequently, when the user operates the display screen via the operation input unit 8, the operation response analysis unit 9 analyzes the operation and identifies it (step ST07), and the operation history and display history stored in the operation display history storage unit 10 are updated by incrementing, for the search keyword, the count of the content displayed by the user's operation (step ST08).
- The search level setting unit 11 determines whether, for the keyword extracted in step ST02, any display content stored in the operation display history storage unit 10 has a display count equal to or greater than the predetermined number, which is a preset threshold (step ST09). If there is no such display content (NO in step ST09), the process returns to step ST01. If there is (YES in step ST09), the search level is determined based on that content and set in the information search control unit 5 (step ST10).
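The branch at step ST03 and the history update in steps ST07 to ST10 can be sketched as two small functions. This is a simplified illustration, not the patented implementation: speech capture and recognition (ST01, ST02) are omitted, the database is a plain dictionary, the threshold of 2 and all names are assumptions, and at level 2 every item is fetched (the "only the search level is set" case).

```python
THRESHOLD = 2  # assumed "predetermined number of times"
LEVEL_OF = {"name/address": 1, "business hours": 2, "price": 2}  # hypothetical

search_level = {}  # keyword -> level set in the information search control unit
history = {}       # keyword -> {content: display count}


def on_keyword(keyword, database):
    """ST03 to ST06: search if a level is set, otherwise show only an icon."""
    level = search_level.get(keyword, 0)
    if level == 0:                        # NO in ST03
        return ("icon", keyword)          # ST06
    wanted = [c for c, lv in LEVEL_OF.items() if lv <= level]
    results = [{c: row[c] for c in wanted} for row in database.get(keyword, [])]  # ST04
    return ("results", results)           # ST05


def on_user_display(keyword, content):
    """ST07 to ST10: record what the user displayed and update the level."""
    counts = history.setdefault(keyword, {})
    counts[content] = counts.get(content, 0) + 1                          # ST08
    reached = [LEVEL_OF[c] for c, n in counts.items() if n >= THRESHOLD]  # ST09
    if reached:
        search_level[keyword] = max(reached)                              # ST10
    return search_level.get(keyword, 0)
```

Calling `on_keyword` before any history exists returns only the icon; after the user has displayed the name and address twice, the same keyword returns a result list instead.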
- In the initial state, no search level is set in the information search control unit 5, and the screen display counts for every keyword are all zero.
- The “predetermined number of times” used as the threshold in the search level setting unit 11 is assumed to be 2.
- A screen for normal road guidance and the vehicle mark 71 are displayed on the screen 70 of the navigation device.
- User A: “We will run out of gasoline soon.”
- User B: “Is there a gas station nearby?”
- The voice signal digitized by the voice acquisition unit 1 is recognized by the voice recognition unit 2, and the keyword “gas station” is extracted and output (steps ST01 and ST02).
- Because no search level is set in the information search control unit 5 for the keyword “gas station”, the information search control unit 5 does not search the information database 4 (NO in step ST03). A display corresponding to the unset search level, that is, the genre name icon 72 for “gas station”, is then shown on the screen 70 of the display unit 7, for example as in FIG. 1A (step ST06).
- the information stored in the operation display history storage unit 10 is the keyword as shown in FIG.
- Once the search level “1” is set in the information search control unit 5 for the keyword “gas station”, the name and address information is acquired, and the search result list 73 is displayed as the search result as shown in FIG. 8A (YES in step ST03, then steps ST04 and ST05).
- A screen like the one shown in FIG. 1C is then displayed.
- When the information stored in the operation display history storage unit 10 shows a name/address display count of “3”, a business hours display count of “2”, and a price display count of “0”, the business hours display count is equal to or greater than the threshold of “2”. The search level “2” and the additional information “business hours” are therefore set in the information search control unit 5.
- When the stored information shows a name/address display count of “4”, a business hours display count of “2”, and a price display count of “2”, every item is equal to or greater than the threshold of “2” used by the search level setting unit 11, so the search level “2” and the additional information “business hours” and “price” (or no additional information) are set.
- Because the search level “2” with the additional information “business hours” and “price” (or no additional information) is then set for the keyword “gas station”, the information search control unit 5 acquires the business hours and prices from the information database 4, and the search result list 73 including business hours and prices is displayed as shown in FIG. 8C.
- As described above, the contents displayed by the user's operations and the number of times they were displayed are stored as history information.
- Whether the same operation and display, such as checking the business hours every time, have been performed a predetermined number of times or more is then determined from that history.
- FIG. 9 is a block diagram showing an example of a speech recognition apparatus according to Embodiment 2 of the present invention.
- The same components as those described in Embodiment 1 are denoted by the same reference numerals, and redundant description is omitted.
- In Embodiment 2, a ringing setting unit 12 is further provided, and the user is alerted when the number of times the user has displayed information for the keyword recognized by the voice recognition unit 2 is equal to or greater than a predetermined number (or exceeds the predetermined number).
- Based on the number of times the user has displayed information for the keyword recognized by the voice recognition unit 2, the information search control unit 5 instructs the ringing setting unit 12 to produce an output.
- When the ringing setting unit 12 receives the instruction from the information search control unit 5, it changes the setting of the navigation device so that a predetermined output is produced.
- The predetermined output is a predetermined vibration or sound output, for example, vibration of a seat, a notification sound, or a voice output indicating that the keyword has been recognized.
- The processing in steps ST11 to ST19 is the same as steps ST01 to ST09 in the flowchart of Embodiment 1.
- The search level is then set (step ST20), after which the ringing setting unit 12 changes the ringing setting and produces the predetermined output (step ST21).
- As described above, when it is determined that the user has displayed information about the keyword a predetermined number of times or more (or more than the predetermined number of times) in the past, that is, according to the search level of the keyword, the ringing setting unit produces a predetermined output by vibration or sound to alert the user, so the user can properly recognize that detailed information matched to the search level is about to be presented.
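The alert condition of Embodiment 2 can be sketched as below. This is an assumption-laden illustration: `RingingSetting` merely records what it would output, standing in for seat vibration or a notification sound, and the names are hypothetical.

```python
class RingingSetting:
    """Hypothetical stand-in for the ringing setting unit 12."""

    def __init__(self):
        self.outputs = []

    def alert(self, keyword):
        # A real device might vibrate the seat or play a notification sound;
        # this sketch only records the event.
        self.outputs.append(f"notification sound: {keyword}")


def maybe_alert(keyword, display_counts, threshold, ringing):
    """Alert the user if any display count for the keyword reached the threshold."""
    if any(n >= threshold for n in display_counts.values()):
        ringing.alert(keyword)
        return True
    return False
```

The return value lets the caller know whether an alert preceded the presentation of results.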
- FIG. 11 is a block diagram showing an example of a speech recognition apparatus according to Embodiment 3 of the present invention.
- The same components as those described in Embodiments 1 and 2 are denoted by the same reference numerals, and redundant description is omitted.
- In Embodiment 3, a search level initialization unit 13 is further provided, so that a user who wants to initialize the history information stored in the operation display history storage unit 10 can do so by speaking.
- The voice recognition dictionary 3 is further configured to recognize keywords such as “initialization” and “reset”, which denote a command that returns the history information stored in the operation display history storage unit 10 to its initial state.
- The voice recognition unit 2 outputs such a keyword as a recognition result.
- When the voice recognition unit 2 extracts a keyword denoting a command for returning to the initial state, such as “initialization” or “reset”, the search level initialization unit 13 initializes the history information stored in the operation display history storage unit 10.
- Steps ST31 to ST32 and steps ST35 to ST42 are the same as steps ST11 to ST12 and steps ST13 to ST20 in the flowchart of Embodiment 2.
- If the keyword extracted by the voice recognition unit 2 in step ST32 denotes a command for returning to the initial state, such as “initialization” or “reset” (YES in step ST33), the information stored in the operation display history storage unit 10 is initialized, that is, returned to its initial state (step ST34). For any other keyword, the processing from step ST35 onward is performed.
- As described above, when the keyword extracted from the user's utterance by the voice recognition unit denotes a command for returning to the initial state, such as “initialization” or “reset”, the history information stored in the operation display history storage unit is initialized. Therefore, when the detailed information displayed according to the search level is not what the user expected, or when the user changes, the content of the operation display history storage unit can be returned to its initial state simply by speaking a keyword denoting this command.
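The reset check in steps ST33 and ST34 can be sketched as follows. The keyword set comes from the text; clearing the already-set search levels together with the history is an assumption of this sketch, as are the function and variable names.

```python
RESET_KEYWORDS = {"initialization", "reset"}


def handle_keyword(keyword, history, search_level):
    """ST33/ST34: clear the stored state when a reset keyword is recognized.

    Returns True if the state was initialized, False for any other keyword
    (in which case normal processing would continue).
    """
    if keyword in RESET_KEYWORDS:
        history.clear()
        # Assumption: the derived search levels are cleared along with the history.
        search_level.clear()
        return True
    return False
```

Any keyword outside `RESET_KEYWORDS` leaves both dictionaries untouched.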
- FIG. 13 is a block diagram showing an example of a voice recognition device according to Embodiment 4 of the present invention. Components that are the same as those described in Embodiments 1 to 3 are denoted by the same reference numerals, and redundant description is omitted.
- Compared with Embodiment 1, the speaker identification unit 14 is further provided, and the history information that is referred to is changed for each speaker (the user who spoke).
- The speaker identification unit 14 analyzes the voice signal digitized by the voice acquisition unit 1 and identifies the speaker (the user who spoke).
- The speaker identification method is not essential to the present invention, and a known technique may be used.
- The operation display history storage unit 10 holds history information as shown in FIG. 5 for each user. When the speaker identification unit 14 identifies a speaker (the user who spoke), the history information corresponding to the identified user is validated. The other processing is the same as in Embodiment 1, so its description is omitted. The speaker identified by the speaker identification unit 14 is assumed to be the user who operates the operation input unit 8.
- The search level setting unit 11 refers to the validated history information stored in the operation display history storage unit 10 and, according to that history information, sets the search level for each keyword used as a search key in the information search control unit 5.
- The operation response analysis unit 9 validates, in the operation display history storage unit 10, the history information corresponding to the speaker identified by the speaker identification unit 14 (step ST53).
- The subsequent processing of steps ST54 to ST62 is the same as steps ST02 to ST10 of the flowchart shown in FIG. 6 in Embodiment 1.
- The speaker is identified from the user's utterance, the search level is set with reference to the history information stored for each speaker, and the corresponding detailed information is displayed. Therefore, even if the user of the navigation device incorporating the voice recognition device changes, information at the level each user requires can be presented immediately, the detailed information the user needs can always be provided efficiently, and convenience for the user is further improved.
- The user's utterance content is always recognized.
- Alternatively, voice recognition may be performed only for a predetermined time after the button is pressed.
- The user may be able to set whether recognition is performed at all times or only for a predetermined period.
- By performing voice acquisition and voice recognition at all times while the navigation device incorporating the voice recognition device is active, even when the user is not conscious of it, voice acquisition and voice recognition are performed automatically whenever there is an utterance, keywords are extracted from the voice recognition result, a search level is set, and information at the level the user requires is displayed immediately. The detailed information the user needs can therefore always be provided efficiently, without requiring a manual operation or any deliberate input from the user to start voice recognition.
- The voice recognition device is described as being incorporated in a vehicle-mounted navigation device.
- The device in which the voice recognition device of the invention is incorporated is not limited to a vehicle-mounted navigation device. It can be applied to any device that can search for and display information through dialogue between the user and the device, such as a navigation device for a moving body including a vehicle, a railroad, a ship, or an aircraft, a portable navigation device, or a portable information processing device.
- The voice recognition device of the present invention is not limited to a vehicle-mounted navigation device. It can be applied in any form to a device that can search for and display information through dialogue between the user and the device, such as a navigation device for a moving body including a person, a vehicle, a railroad, a ship, or an aircraft, a portable navigation device, or a portable information processing device.
- 1 voice acquisition unit, 2 voice recognition unit, 3 voice recognition dictionary, 4 information database, 5 information search control unit, 6 information presentation control unit, 7 display unit, 8 operation input unit, 9 operation response analysis unit, 10 operation display history storage unit, 11 search level setting unit, 12 ringing setting unit, 13 search level initialization unit, 14 speaker identification unit, 70 navigation device screen, 71 vehicle mark, 72 genre name icon, 73 search result list, 74 facility mark, 75 details button.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Library & Information Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Remote Sensing (AREA)
- Navigation (AREA)
- User Interface Of Digital Computer (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
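First, a navigation device incorporating a voice recognition device, which is the premise of the present invention, will be described. FIG. 1 is a diagram showing an example of the display screen of a typical navigation device.
User A: “We're about to run out of gas.”
User B: “I wonder if there's a gas station nearby.”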
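FIG. 2 is a block diagram showing an example of a voice recognition device according to Embodiment 1 of the present invention. This voice recognition device is used by being incorporated in a navigation device mounted on a vehicle (moving body), and includes a voice acquisition unit 1, a voice recognition unit 2, a voice recognition dictionary 3, an information database 4, an information search control unit 5, an information presentation control unit 6, a display unit 7, an operation input unit 8, an operation response analysis unit 9, an operation display history storage unit 10, and a search level setting unit 11.
Note that the method of setting the search level shown here is only an example; a search level determined by another method may be set instead.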
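First, the voice acquisition unit 1 captures the user's utterance collected by a microphone, that is, the input voice, and performs A/D conversion on it using, for example, PCM (step ST01).
Next, the voice recognition unit 2 detects, from the voice signal digitized by the voice acquisition unit 1, the voice section corresponding to what the user uttered, extracts feature values from the voice data of that section, performs recognition processing based on those feature values using the voice recognition dictionary 3, and extracts and outputs a character string that serves as a keyword (step ST02).
Subsequently, when the user operates the display screen via the operation input unit 8, the operation response analysis unit 9 analyzes the operation and identifies the user's operation (step ST07). Then, for the search keyword, it increments the count for the content displayed by the identified user operation and updates the operation history and display history stored in the operation display history storage unit 10 (step ST08).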
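The history update in step ST08 can be pictured as a per-keyword counter. The following is a minimal sketch under stated assumptions, not the patented implementation: the class name `OperationDisplayHistory`, its methods, and the example display contents ("name and address", "business hours") are hypothetical and only illustrate counting how often each kind of content was displayed for a keyword.

```python
from collections import defaultdict


class OperationDisplayHistory:
    """Hypothetical stand-in for the operation display history storage unit 10."""

    def __init__(self):
        # keyword -> displayed content -> number of times it was displayed
        self._counts = defaultdict(lambda: defaultdict(int))

    def record_display(self, keyword, content):
        """Increment the count for content the user displayed (cf. step ST08)."""
        self._counts[keyword][content] += 1
        return self._counts[keyword][content]

    def count(self, keyword, content):
        # .get() is used so that a read never creates an empty entry.
        return self._counts.get(keyword, {}).get(content, 0)


history = OperationDisplayHistory()
history.record_display("gas station", "name and address")
history.record_display("gas station", "name and address")
history.record_display("gas station", "business hours")
```

Keeping the counts keyed first by keyword and then by displayed content mirrors the claim language, which stores the display content and its display count for each extracted keyword.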
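User A: “We're about to run out of gas.”
User B: “I wonder if there's a gas station nearby.”
Suppose the conversation above takes place. The voice signal digitized by the voice acquisition unit 1 is recognized by the voice recognition unit 2, and the keyword “gas station” is extracted and output (steps ST01 and ST02).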
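FIG. 9 is a block diagram showing an example of a voice recognition device according to Embodiment 2 of the present invention. Components that are the same as those described in Embodiment 1 are denoted by the same reference numerals, and redundant description is omitted. Compared with Embodiment 1, Embodiment 2 described below further includes a ringing setting unit 12, which alerts the user when the number of times the user has displayed information for the keyword recognized by the voice recognition unit 2 is equal to or greater than a predetermined number (or exceeds the predetermined number).
On receiving an instruction from the information search control unit 5, the ringing setting unit 12 changes the settings of the navigation device so that a predetermined output is produced. Here, the predetermined output means a predefined ringing output by vibration or sound, such as vibration of the seat, output of a notification sound, or voice output indicating that the keyword has been recognized.
The processing of steps ST11 to ST19 is the same as steps ST01 to ST09 in the flowchart of FIG. 6 in Embodiment 1, so its description is omitted.
Then, when it is determined that, for the keyword extracted by the voice recognition unit 2, there is display content whose operation history and display history count is equal to or greater than the predetermined number (YES in step ST19), the search level is set in the same manner as in Embodiment 1 (step ST20), after which the ringing setting unit 12 changes the ringing setting and produces the predetermined output (step ST21).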
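The threshold check and the alert that follows it can be sketched as below. This is an illustration under assumptions rather than the method of the patent: the threshold value `3`, the function names, and the rule of preferring the most-viewed content when several entries qualify are all hypothetical choices.

```python
THRESHOLD = 3  # hypothetical value for the "predetermined number of times"


def select_search_level(display_counts, threshold=THRESHOLD):
    """Return the display content whose count reached the threshold, or None.

    display_counts maps display content (e.g. "business hours") to how many
    times the user displayed it for the recognized keyword.
    """
    candidates = {c: n for c, n in display_counts.items() if n >= threshold}
    if not candidates:
        return None  # no entry has reached the threshold
    # Assumption: when several entries qualify, prefer the most-viewed one.
    return max(candidates, key=candidates.get)


def handle_keyword(display_counts, threshold=THRESHOLD):
    """Set the search level, then request the alert (cf. steps ST19 to ST21)."""
    level = select_search_level(display_counts, threshold)
    alert = level is not None  # the alert is requested only when a level was set
    return level, alert
```

For example, `handle_keyword({"name and address": 1, "business hours": 3})` selects "business hours" and requests the alert, while a history with no entry at the threshold selects nothing and requests no alert.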
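FIG. 11 is a block diagram showing an example of a voice recognition device according to Embodiment 3 of the present invention. Components that are the same as those described in Embodiments 1 and 2 are denoted by the same reference numerals, and redundant description is omitted. Compared with Embodiment 2, Embodiment 3 described below further includes a search level initialization unit 13, so that when the user wants to initialize the history information stored in the operation display history storage unit 10, the user can do so by speaking.
When the voice recognition unit 2 extracts a keyword meaning a command to return to the initial state, such as “initialization” or “reset”, the search level initialization unit 13 initializes the history information stored in the operation display history storage unit 10.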
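The reset behavior can be sketched as a simple keyword check ahead of normal processing. This is a minimal sketch under assumptions: the function name `process_keyword`, the dict-based history, and treating the "initial state" as an empty history are hypothetical; only the example keywords come from the text.

```python
# Example reset keywords taken from the description; the set itself is illustrative.
RESET_KEYWORDS = frozenset({"initialization", "reset"})


def process_keyword(keyword, history):
    """Clear the history when the keyword is a reset command.

    history is a dict mapping keyword -> {display content: count}.
    Returns True if the history was reset, False if normal processing
    of the keyword should continue.
    """
    if keyword in RESET_KEYWORDS:
        history.clear()  # assumption: the "initial state" is an empty history
        return True
    return False


history = {"gas station": {"business hours": 3}}
was_reset_by_search_keyword = process_keyword("gas station", history)
snapshot_before_reset = dict(history)
was_reset_by_command = process_keyword("reset", history)
```

Checking for the command before any search logic keeps an ordinary keyword such as "gas station" from ever touching the stored history.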
Steps ST31 to ST32 and steps ST35 to ST42 are the same as steps ST11 to ST12 and steps ST13 to ST20 in the flowchart of FIG. 10 in Embodiment 2, so their description is omitted.
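FIG. 13 is a block diagram showing an example of a voice recognition device according to Embodiment 4 of the present invention. Components that are the same as those described in Embodiments 1 to 3 are denoted by the same reference numerals, and redundant description is omitted. Compared with Embodiment 1, Embodiment 4 described below further includes a speaker identification unit 14, and changes the history information that is referred to for each speaker (the user who spoke).
First, the voice acquisition unit 1 captures the user's utterance collected by a microphone, that is, the input voice, and performs A/D conversion on it using, for example, PCM (step ST51).
Next, the speaker identification unit 14 analyzes the voice signal captured by the voice acquisition unit 1 and identifies the speaker (step ST52).
The subsequent processing of steps ST54 to ST62 is the same as steps ST02 to ST10 in the flowchart shown in FIG. 6 in Embodiment 1, so its description is omitted.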
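Switching the history by speaker can be sketched as one history table per user, with the identified speaker's table made active before any counting. This is an illustration under assumptions: the class `PerSpeakerHistory`, its method names, and the speaker IDs "A" and "B" are hypothetical, and speaker identification itself is treated as already done by some known technique.

```python
class PerSpeakerHistory:
    """Hypothetical per-user history: one keyword table per identified speaker."""

    def __init__(self):
        self._by_speaker = {}  # speaker id -> {keyword: {content: count}}
        self._active = None    # history table of the current speaker

    def activate(self, speaker_id):
        """Make the identified speaker's history the valid one."""
        self._active = self._by_speaker.setdefault(speaker_id, {})

    def record_display(self, keyword, content):
        if self._active is None:
            raise RuntimeError("no speaker has been identified yet")
        per_keyword = self._active.setdefault(keyword, {})
        per_keyword[content] = per_keyword.get(content, 0) + 1

    def count(self, keyword, content):
        if self._active is None:
            return 0
        return self._active.get(keyword, {}).get(content, 0)


h = PerSpeakerHistory()
h.activate("A")
h.record_display("gas station", "business hours")
h.activate("B")
count_for_b = h.count("gas station", "business hours")
h.activate("A")
count_for_a = h.count("gas station", "business hours")
```

Because each speaker has a separate table, what user A viewed does not affect the search level derived for user B.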
Claims (6)
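- A voice recognition device comprising:
a voice acquisition unit that detects and acquires voice uttered by a user;
a voice recognition unit that recognizes the voice data acquired by the voice acquisition unit and extracts a keyword;
an operation input unit that receives operation input from the user;
a display unit that presents information to the user;
an operation response analysis unit that identifies the user's operation based on the information received by the operation input unit and the information displayed on the display unit;
an operation display history storage unit that stores, as history information, for each keyword extracted by the voice recognition unit, the display content displayed on the display unit by the operation identified by the operation response analysis unit and the number of times it was displayed;
a search level setting unit that sets a search level for the keyword extracted by the voice recognition unit according to the history information stored in the operation display history storage unit;
an information search control unit that searches for information using the keyword extracted by the voice recognition unit as a search key, in accordance with the search level set by the search level setting unit, and acquires a search result; and
an information presentation control unit that issues an instruction to display the search result acquired by the information search control unit on the display unit,
wherein the search level setting unit changes the search level when, for the keyword extracted by the voice recognition unit, the display count in the history information stored in the operation display history storage unit becomes equal to or greater than a predetermined number.
- The voice recognition device according to claim 1, wherein the search level setting unit raises the search level each time, for the keyword extracted by the voice recognition unit, the display count in the history information stored in the operation display history storage unit becomes equal to or greater than the predetermined number.
- The voice recognition device according to claim 1, wherein the information that the information search control unit searches for using the keyword extracted by the voice recognition unit as a search key is any of facility information, traffic information, weather information, address information, news, music information, movie information, or program information.
- The voice recognition device according to claim 1, further comprising a speaker identification unit that identifies the user who uttered the voice acquired by the voice acquisition unit,
wherein the operation display history storage unit stores history information for each user and validates the history information of the user identified by the speaker identification unit, and
the search level setting unit sets the search level with reference to the history information validated in the operation display history storage unit.
- The voice recognition device according to claim 1, further comprising a ringing setting unit that alerts the user by vibration or sound according to the search level.
- The voice recognition device according to claim 1, further comprising a search level initialization unit that returns the history information stored in the operation display history storage unit to the initial state when the keyword extracted by the voice recognition unit is a keyword meaning a command to return to the initial state.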
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE112012006652.9T DE112012006652T5 (de) | 2012-07-03 | 2012-07-03 | Spracherkennungsvorrichtung |
CN201280074470.1A CN104428766B (zh) | 2012-07-03 | 2012-07-03 | 语音识别装置 |
US14/398,933 US9269351B2 (en) | 2012-07-03 | 2012-07-03 | Voice recognition device |
JP2014523470A JP5925313B2 (ja) | 2012-07-03 | 2012-07-03 | 音声認識装置 |
PCT/JP2012/066974 WO2014006690A1 (ja) | 2012-07-03 | 2012-07-03 | 音声認識装置 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2012/066974 WO2014006690A1 (ja) | 2012-07-03 | 2012-07-03 | 音声認識装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014006690A1 (ja) | 2014-01-09 |
Family
ID=49881481
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/066974 WO2014006690A1 (ja) | 2012-07-03 | 2012-07-03 | 音声認識装置 |
Country Status (5)
Country | Link |
---|---|
US (1) | US9269351B2 (ja) |
JP (1) | JP5925313B2 (ja) |
CN (1) | CN104428766B (ja) |
DE (1) | DE112012006652T5 (ja) |
WO (1) | WO2014006690A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017004193A (ja) * | 2015-06-09 | 2017-01-05 | 凸版印刷株式会社 | 情報処理装置、情報処理方法、及びプログラム |
JP2019079345A (ja) * | 2017-10-25 | 2019-05-23 | アルパイン株式会社 | 情報提示装置、情報提示システム、端末装置 |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102091003B1 (ko) * | 2012-12-10 | 2020-03-19 | 삼성전자 주식회사 | 음성인식 기술을 이용한 상황 인식 서비스 제공 방법 및 장치 |
CN105144222A (zh) * | 2013-04-25 | 2015-12-09 | 三菱电机株式会社 | 评价信息投稿装置及评价信息投稿方法 |
US10008204B2 (en) * | 2014-06-30 | 2018-06-26 | Clarion Co., Ltd. | Information processing system, and vehicle-mounted device |
JP6418820B2 (ja) * | 2014-07-07 | 2018-11-07 | キヤノン株式会社 | 情報処理装置、表示制御方法、及びコンピュータプログラム |
CN104834691A (zh) * | 2015-04-22 | 2015-08-12 | 中国建设银行股份有限公司 | 一种语音机器人 |
US10018977B2 (en) * | 2015-10-05 | 2018-07-10 | Savant Systems, Llc | History-based key phrase suggestions for voice control of a home automation system |
JP6625508B2 (ja) * | 2016-10-24 | 2019-12-25 | クラリオン株式会社 | 制御装置、制御システム |
JP6920878B2 (ja) | 2017-04-28 | 2021-08-18 | フォルシアクラリオン・エレクトロニクス株式会社 | 情報提供装置、及び情報提供方法 |
KR102353486B1 (ko) * | 2017-07-18 | 2022-01-20 | 엘지전자 주식회사 | 이동 단말기 및 그 제어 방법 |
JP6978174B2 (ja) * | 2017-10-11 | 2021-12-08 | アルパイン株式会社 | 評価情報生成システムおよび車載装置 |
KR20200042127A (ko) * | 2018-10-15 | 2020-04-23 | 현대자동차주식회사 | 대화 시스템, 이를 포함하는 차량 및 대화 처리 방법 |
CN113113029A (zh) * | 2018-08-29 | 2021-07-13 | 胡开良 | 无人机声纹新闻追踪方法 |
US11094327B2 (en) * | 2018-09-28 | 2021-08-17 | Lenovo (Singapore) Pte. Ltd. | Audible input transcription |
JP7266432B2 (ja) * | 2019-03-14 | 2023-04-28 | 本田技研工業株式会社 | エージェント装置、エージェント装置の制御方法、およびプログラム |
CN109996026B (zh) * | 2019-04-23 | 2021-01-19 | 广东小天才科技有限公司 | 基于穿戴式设备的视频特效互动方法、装置、设备及介质 |
CN111696548A (zh) * | 2020-05-13 | 2020-09-22 | 深圳追一科技有限公司 | 显示行车提示信息的方法、装置、电子设备以及存储介质 |
CN113470636B (zh) * | 2020-07-09 | 2023-10-27 | 青岛海信电子产业控股股份有限公司 | 一种语音信息处理方法、装置、设备及介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002297374A (ja) * | 2001-03-30 | 2002-10-11 | Alpine Electronics Inc | 音声検索装置 |
JP2007206886A (ja) * | 2006-01-31 | 2007-08-16 | Canon Inc | 情報処理装置および方法 |
WO2008136105A1 (ja) * | 2007-04-25 | 2008-11-13 | Pioneer Corporation | 表示装置、表示方法、表示プログラム、および記録媒体 |
WO2009147745A1 (ja) * | 2008-06-06 | 2009-12-10 | 三菱電機株式会社 | 検索装置 |
WO2010013369A1 (ja) * | 2008-07-30 | 2010-02-04 | 三菱電機株式会社 | 音声認識装置 |
JP2011075525A (ja) * | 2009-10-02 | 2011-04-14 | Clarion Co Ltd | ナビゲーション装置、および操作メニュー変更方法 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004030400A (ja) * | 2002-06-27 | 2004-01-29 | Fujitsu Ten Ltd | 検索システム |
US7386454B2 (en) * | 2002-07-31 | 2008-06-10 | International Business Machines Corporation | Natural error handling in speech recognition |
JP2004185240A (ja) * | 2002-12-02 | 2004-07-02 | Alpine Electronics Inc | 操作履歴再現機能を有する電子機器および操作履歴の再現方法 |
US9224394B2 (en) * | 2009-03-24 | 2015-12-29 | Sirius Xm Connected Vehicle Services Inc | Service oriented speech recognition for in-vehicle automated interaction and in-vehicle user interfaces requiring minimal cognitive driver processing for same |
WO2006085565A1 (ja) * | 2005-02-08 | 2006-08-17 | Nippon Telegraph And Telephone Corporation | 情報通信端末、情報通信システム、情報通信方法、情報通信プログラムおよびそれを記録した記録媒体 |
JP4736982B2 (ja) | 2006-07-06 | 2011-07-27 | 株式会社デンソー | 作動制御装置、プログラム |
DE112007002665B4 (de) * | 2006-12-15 | 2017-12-28 | Mitsubishi Electric Corp. | Spracherkennungssystem |
WO2008084575A1 (ja) * | 2006-12-28 | 2008-07-17 | Mitsubishi Electric Corporation | 車載用音声認識装置 |
CN101499277B (zh) * | 2008-07-25 | 2011-05-04 | 中国科学院计算技术研究所 | 一种服务智能导航方法和系统 |
CN104412323B (zh) * | 2012-06-25 | 2017-12-12 | 三菱电机株式会社 | 车载信息装置 |
JP2014109889A (ja) * | 2012-11-30 | 2014-06-12 | Toshiba Corp | コンテンツ検索装置、コンテンツ検索方法及び制御プログラム |
-
2012
- 2012-07-03 JP JP2014523470A patent/JP5925313B2/ja not_active Expired - Fee Related
- 2012-07-03 CN CN201280074470.1A patent/CN104428766B/zh not_active Expired - Fee Related
- 2012-07-03 US US14/398,933 patent/US9269351B2/en not_active Expired - Fee Related
- 2012-07-03 DE DE112012006652.9T patent/DE112012006652T5/de not_active Withdrawn
- 2012-07-03 WO PCT/JP2012/066974 patent/WO2014006690A1/ja active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN104428766A (zh) | 2015-03-18 |
JPWO2014006690A1 (ja) | 2016-06-02 |
JP5925313B2 (ja) | 2016-05-25 |
US9269351B2 (en) | 2016-02-23 |
US20150120300A1 (en) | 2015-04-30 |
CN104428766B (zh) | 2017-07-11 |
DE112012006652T5 (de) | 2015-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5925313B2 (ja) | 音声認識装置 | |
JP6400109B2 (ja) | 音声認識システム | |
JP5921722B2 (ja) | 音声認識装置および表示方法 | |
JP5158174B2 (ja) | 音声認識装置 | |
JP5677650B2 (ja) | 音声認識装置 | |
WO2013005248A1 (ja) | 音声認識装置およびナビゲーション装置 | |
JP5893217B2 (ja) | 音声認識装置および表示方法 | |
JP2014142566A (ja) | 音声認識システムおよび音声認識方法 | |
US9881609B2 (en) | Gesture-based cues for an automatic speech recognition system | |
JP4466379B2 (ja) | 車載音声認識装置 | |
WO2015125274A1 (ja) | 音声認識装置、システムおよび方法 | |
CN105448293A (zh) | 语音监听及处理方法和设备 | |
US20130013310A1 (en) | Speech recognition system | |
US20220198151A1 (en) | Dialogue system, a vehicle having the same, and a method of controlling a dialogue system | |
JP6522009B2 (ja) | 音声認識システム | |
US20160019892A1 (en) | Procedure to automate/simplify internet search based on audio content from a vehicle radio | |
JP3296783B2 (ja) | 車載用ナビゲーション装置および音声認識方法 | |
US20210303263A1 (en) | Dialogue system and vehicle having the same, and method of controlling dialogue system | |
JP4624825B2 (ja) | 音声対話装置および音声対話方法 | |
JP3759313B2 (ja) | 車載用ナビゲーション装置 | |
JP5446540B2 (ja) | 情報検索装置、制御方法及びプログラム | |
JP2001154691A (ja) | 音声認識装置 | |
WO2015102039A1 (ja) | 音声認識装置 | |
JP7010585B2 (ja) | 音コマンド入力装置 | |
JP2022018605A (ja) | 電子機器及び音声起動方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 12880630 Country of ref document: EP Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2014523470 Country of ref document: JP Kind code of ref document: A |
 | WWE | Wipo information: entry into national phase | Ref document number: 14398933 Country of ref document: US |
 | WWE | Wipo information: entry into national phase | Ref document number: 1120120066529 Country of ref document: DE Ref document number: 112012006652 Country of ref document: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 12880630 Country of ref document: EP Kind code of ref document: A1 |