WO2013005248A1 - Speech recognition device and navigation device
- Publication number
- WO2013005248A1 (application PCT/JP2011/003827)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- recognition
- unit
- voice
- speech
- recognition result
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
Definitions
- The present invention relates to a voice recognition device and a navigation device equipped with the same.
- Patent Document 1 discloses a speech recognition apparatus that divides the speech recognition target and performs recognition in multiple passes.
- In this apparatus, the speech recognition target is divided and speech recognition is performed sequentially. If the recognition score (likelihood) of a recognition result is equal to or greater than a threshold, that recognition result is confirmed and the process ends. If no recognition result has a score at or above the threshold, the result with the highest recognition score among those obtained becomes the final recognition result. This prevents the recognition rate from dropping because the target was divided. In addition, because the process ends as soon as a score reaches the threshold, the time required for recognition can be shortened.
- In Patent Document 1, however, when recognition is performed sequentially by different speech recognition processes, such as a syntax type and a dictation type, the recognition scores (likelihoods) of their results cannot simply be compared. Consequently, if no recognition result has a score at or above the threshold, the result with the highest score cannot be selected, and no recognition result can be presented to the user.
- The present invention has been made to solve the above problem. Its object is to obtain a speech recognition apparatus that can accurately present recognition results obtained by different speech recognition processes while shortening the recognition processing, and a navigation device provided with this apparatus.
- The speech recognition apparatus according to the present invention includes: an acquisition unit that digitally converts input speech and acquires it as speech data; a speech data storage unit that stores the speech data acquired by the acquisition unit; a plurality of speech recognition units that each detect a speech section from the speech data stored in the speech data storage unit, extract feature amounts of the speech data in that section, and perform recognition processing by referring to a recognition dictionary based on the extracted feature amounts; a switching unit that switches among the plurality of speech recognition units; a control unit that controls the switching performed by the switching unit and acquires the recognition result of the speech recognition unit switched to; and a selection unit that selects the recognition result to be presented.
- According to the present invention, recognition results obtained by different voice recognition processes can be presented accurately, and the recognition processing can be shortened.
- FIG. 1 is a block diagram showing the configuration of a navigation apparatus provided with the speech recognition apparatus according to Embodiment 1 of the present invention.
- FIG. 2 is a flowchart showing the flow of speech recognition processing by the speech recognition apparatus according to Embodiment 1.
- FIG. 3 is a diagram showing a display example of the recognition results for each voice recognition unit.
- FIG. 4 is a diagram showing a display example of recognition results selected by a different method for each voice recognition unit.
- FIG. 5 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 2 of the present invention.
- FIG. 6 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 3 of the present invention.
- FIG. 7 is a flowchart showing the flow of speech recognition processing by the speech recognition apparatus according to Embodiment 3.
- FIG. 8 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 4 of the present invention.
- FIG. 9 is a flowchart showing the flow of speech recognition processing by the speech recognition apparatus according to Embodiment 4.
- FIG. 10 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 5 of the present invention.
- FIG. 11 is a flowchart showing the flow of speech recognition processing by the speech recognition apparatus according to Embodiment 5.
- FIG. 1 is a block diagram showing the configuration of a navigation apparatus provided with the speech recognition apparatus according to Embodiment 1 of the present invention.
- FIG. 1 shows a case where the speech recognition apparatus according to Embodiment 1 is applied to an in-vehicle navigation apparatus mounted on a vehicle, which is a moving body.
- As the configuration of the voice recognition device, the navigation apparatus includes a voice acquisition unit 1, a voice data storage unit 2, a voice recognition unit 3, a voice recognition switching unit 4, a recognition control unit 5, a recognition result selection unit 6, and a recognition result storage unit 7. As the configuration for navigation, it includes a display unit 8, a navigation processing unit 9, a position detection unit 10, a map database (DB) 11, and an input unit 12.
- The voice acquisition unit 1 is an acquisition unit that performs analog/digital conversion on speech input over a predetermined period through a microphone or the like, and acquires it as voice data in, for example, PCM (Pulse Code Modulation) format.
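The acquisition step can be pictured with a minimal sketch. Everything here is an assumption for illustration: the patent names PCM only as an example format and does not state a sample rate or bit depth, so the 16 kHz rate, the 16-bit depth, and the function names `to_pcm16` and `capture_period` are hypothetical, and a synthesized tone stands in for the microphone.

```python
import math
import struct


def to_pcm16(samples):
    """Quantize float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def capture_period(duration_ms, sample_rate=16000, tone_hz=440.0):
    """Stand-in for the microphone: synthesize a tone for the given period."""
    n = duration_ms * sample_rate // 1000
    return [0.5 * math.sin(2 * math.pi * tone_hz * i / sample_rate) for i in range(n)]


# Voice data for a predetermined period (10 ms here), as PCM bytes.
pcm = to_pcm16(capture_period(10))
```

Out-of-range samples are clipped before quantization so that `struct.pack` never receives a value outside the signed 16-bit range.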
- The voice data storage unit 2 is a storage unit that stores the voice data acquired by the voice acquisition unit 1.
- The speech recognition unit 3 includes a plurality of speech recognition units (hereinafter referred to as the first to Mth speech recognition units) that perform different speech recognition processes, such as a syntax type and a dictation type.
- Each of the first to Mth speech recognition units, according to its own speech recognition algorithm, detects the speech section corresponding to the content spoken by the user from the speech data acquired by the voice acquisition unit 1, extracts the feature amounts of the speech data in that section, and performs recognition processing while referring to the recognition dictionary based on the extracted feature amounts.
- The voice recognition switching unit 4 is a switching unit that switches among the first to Mth voice recognition units in response to a switching control signal from the recognition control unit 5.
- The recognition control unit 5 is a control unit that controls the switching of the voice recognition unit by the voice recognition switching unit 4 and acquires the recognition result of the voice recognition unit switched to.
- The recognition result selection unit 6 is a selection unit that selects the recognition result to be output from the recognition results acquired by the recognition control unit 5.
- The recognition result storage unit 7 is a storage unit that stores the recognition result selected by the recognition result selection unit 6.
- The display unit 8 displays the recognition result stored in the recognition result storage unit 7 or the processing result of the navigation processing unit 9.
- The navigation processing unit 9 is a functional configuration unit that performs navigation processing such as route calculation, route guidance, and map display.
- For example, the navigation processing unit 9 calculates the route from the current vehicle position to the destination using the current position of the vehicle acquired by the position detection unit 10, the destination input through the voice recognition device according to Embodiment 1 or through the input unit 12, and the map data stored in the map database (DB) 11. The navigation processing unit 9 then provides guidance along the route obtained by the route calculation. It also displays a map including the vehicle position on the display unit 8, using the current position of the vehicle and the map data stored in the map DB 11.
- The position detection unit 10 is a functional configuration unit that acquires the position information (latitude and longitude) of the vehicle by analyzing GPS (Global Positioning System) radio waves or the like.
- The map DB 11 is a database in which the map data used by the navigation processing unit 9 is registered. The map data includes topographic map data, residential map data, road networks, and the like.
- The input unit 12 is a functional configuration unit that receives a destination setting or various other operations from the user, and is realized by, for example, a touch panel mounted on the screen of the display unit 8.
- FIG. 2 is a flowchart showing the flow of speech recognition processing by the speech recognition apparatus according to Embodiment 1.
- The voice acquisition unit 1 performs A/D conversion on speech input over a predetermined period through a microphone or the like, and acquires it as voice data in, for example, PCM format (step ST10).
- The voice data storage unit 2 stores the voice data acquired by the voice acquisition unit 1 (step ST20).
- The recognition control unit 5 initializes the variable N to 1 (step ST30). N is a variable that takes values from 1 to M.
- The recognition control unit 5 outputs to the voice recognition switching unit 4 a switching control signal for switching the voice recognition unit 3 to the Nth voice recognition unit.
- The speech recognition switching unit 4 switches the speech recognition unit 3 to the Nth speech recognition unit in accordance with the switching control signal from the recognition control unit 5 (step ST40).
- The Nth speech recognition unit detects the speech section corresponding to the user's utterance from the speech data stored in the speech data storage unit 2, extracts the feature amounts of the speech data in that section, and performs recognition processing with reference to the recognition dictionary based on those feature amounts (step ST50).
- The recognition control unit 5 acquires the recognition result from the Nth speech recognition unit, compares the first-place recognition score (likelihood) in that result with a predetermined threshold, and determines whether the score is equal to or greater than the threshold (step ST60).
- The predetermined threshold is used to decide whether to continue the recognition process by switching to another voice recognition unit, and it is set individually for each of the first to Mth voice recognition units.
- When the first-place recognition score is equal to or greater than the threshold (step ST60; YES), the recognition result selection unit 6 selects the recognition result to be output from the recognition results of the Nth speech recognition unit acquired by the recognition control unit 5, using a method described later (step ST70). The display unit 8 then displays the recognition result selected by the recognition result selection unit 6 and stored in the recognition result storage unit 7 (step ST80). On the other hand, when the first-place recognition score is less than the threshold (step ST60; NO), the recognition result selection unit 6 likewise selects the recognition result to be output from the recognition results of the Nth speech recognition unit, using a method described later (step ST90).
- The recognition result selection unit 6 stores the selected recognition result in the recognition result storage unit 7 (step ST100).
- The recognition control unit 5 increments the variable N by 1 (step ST110) and determines whether the value of N exceeds the number M of voice recognition units (step ST120).
- When the value of N exceeds the number M of speech recognition units (step ST120; YES), the display unit 8 outputs the recognition results of the first to Mth speech recognition units stored in the recognition result storage unit 7 (step ST130). The display unit 8 may output the recognition results in order, one voice recognition unit at a time.
- When the value of N is equal to or less than the number M of voice recognition units (step ST120; NO), the process returns to step ST40, and the above processing is repeated by the voice recognition unit switched to.
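The control flow of steps ST30 to ST130 can be sketched as a loop over recognizers. This is an illustrative reading under stated assumptions, not the patent's implementation: the `Recognizer` class, `run_recognition`, and `select_top` are hypothetical names, each recognizer is assumed to return hypotheses sorted best first, and on early exit (ST60; YES) the sketch returns only the confident recognizer's selection, which is one possible interpretation of step ST80.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (recognized text, recognition score)


@dataclass
class Recognizer:
    name: str
    recognize: Callable[[bytes], List[Hypothesis]]  # hypotheses, best first
    threshold: float  # per-recognizer threshold used in step ST60


def select_top(hyps, x=1):
    """Placeholder selection method: keep the top x hypotheses."""
    return hyps[:x]


def run_recognition(voice_data, recognizers, select=select_top):
    stored = []  # plays the role of recognition result storage unit 7
    for rec in recognizers:                        # ST30, ST110, ST120
        hyps = rec.recognize(voice_data)           # ST40 to ST50
        chosen = select(hyps)
        if hyps and hyps[0][1] >= rec.threshold:   # ST60; YES
            return [(rec.name, chosen)]            # ST70 to ST80, stop early
        stored.append((rec.name, chosen))          # ST90 to ST100
    return stored                                  # ST130: present per recognizer


# Scores on different scales are never compared with each other.
grammar = Recognizer("grammar", lambda d: [("tokyo station", 0.4)], 0.8)
dictation = Recognizer("dictation", lambda d: [("go to tokyo station", 120.0)], 100.0)
results = run_recognition(b"", [grammar, dictation])
```

Each score is compared only with its own recognizer's threshold, so the loop never needs a common scale across the syntax-type and dictation-type recognizers.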
- The recognition result selection unit 6 selects recognition results with higher recognition scores from those acquired by the recognition control unit 5.
- For example, only the recognition result with the first-place recognition score may be selected, or all of the recognition results acquired by the recognition control unit 5 may be selected.
- The recognition results ranked from first place to Xth place by recognition score may be selected.
- Recognition results whose score differs from the first-place recognition score by a predetermined value or less may be selected.
- Furthermore, even among the recognition results ranked within the top X places, or those whose difference from the first-place score is a predetermined value or less, a recognition result whose recognition score is less than a predetermined threshold need not be selected.
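The selection methods listed above can be written as small functions. This is a sketch under the assumption that hypotheses are `(text, score)` pairs already sorted from highest to lowest score; the function names are hypothetical.

```python
def select_first(hyps):
    """Only the first-place result."""
    return hyps[:1]


def select_all(hyps):
    """Every result acquired from the recognizer."""
    return list(hyps)


def select_top_x(hyps, x):
    """Results ranked from first place to Xth place."""
    return hyps[:x]


def select_within_margin(hyps, margin):
    """Results whose score is within `margin` of the first-place score."""
    if not hyps:
        return []
    best = hyps[0][1]
    return [h for h in hyps if best - h[1] <= margin]


def drop_below(hyps, floor):
    """Optional final filter: discard results scoring below `floor`."""
    return [h for h in hyps if h[1] >= floor]


hyps = [("a", 0.9), ("b", 0.85), ("c", 0.5), ("d", 0.2)]
top3_filtered = drop_below(select_top_x(hyps, 3), 0.6)
```

The floor filter composes with either the top-X or the margin method, matching the last item in the list above.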
- FIG. 3 is a diagram showing a display example in which the recognition results with the first- and second-place recognition scores are shown for each voice recognition unit.
- In FIG. 3, “speech recognition processing 1” indicates, for example, the recognition result of the first speech recognition unit, and “speech recognition processing 2” indicates, for example, the recognition result of the second speech recognition unit. The same applies to “voice recognition processing 3”, “voice recognition processing 4”, and so on.
- In this example, the recognition results with the first- and second-place recognition scores (likelihoods) are displayed in order for each voice recognition unit.
- FIG. 4 is a diagram illustrating a display example of recognition results selected by a different method for each voice recognition unit.
- In FIG. 4, for the first speech recognition unit (“speech recognition processing 1”), the recognition results with the first- and second-place recognition scores are selected and displayed.
- For the second speech recognition unit (“speech recognition processing 2”), all recognition results are selected and displayed.
- In this way, the recognition result selection method may differ for each voice recognition unit.
- The navigation processing unit 9 calculates the route from the current position to the destination using the current position of the host vehicle acquired by the position detection unit 10, the recognition result for the destination read from the recognition result storage unit 7, and the map data stored in the map DB 11, and provides guidance along the obtained route.
- As described above, the apparatus according to Embodiment 1 includes: the voice acquisition unit 1 that digitally converts the input voice and acquires it as voice data; the voice data storage unit 2 that stores the voice data acquired by the voice acquisition unit 1; the first to Mth voice recognition units that detect a speech section from the voice data stored in the voice data storage unit 2, extract the feature amounts of the voice data in that section, and perform recognition processing by referring to the recognition dictionary based on the extracted feature amounts; the voice recognition switching unit 4 that switches among the first to Mth voice recognition units; the recognition control unit 5 that controls the switching of the voice recognition unit by the voice recognition switching unit 4 and acquires the recognition result of the voice recognition unit switched to; and the recognition result selection unit 6 that selects the recognition result to be presented to the user from the recognition results acquired by the recognition control unit 5.
- FIG. 5 is a block diagram showing the configuration of a speech recognition apparatus according to Embodiment 2 of the present invention.
- The speech recognition apparatus according to Embodiment 2 includes a speech acquisition unit 1, a speech data storage unit 2, a speech recognition unit 3, a speech recognition switching unit 4, a recognition control unit 5, a recognition result selection unit 6A, a recognition result storage unit 7, and a recognition result selection method changing unit 13.
- The recognition result selection unit 6A selects the recognition result to be output from the recognition results acquired by the recognition control unit 5, in accordance with the selection method control signal from the recognition result selection method changing unit 13.
- The recognition result selection method changing unit 13 is a functional configuration unit that accepts, for each of the first to Mth speech recognition units, the user's designation of the selection method to be used by the recognition result selection unit 6A, and outputs to the recognition result selection unit 6A a selection method control signal for changing to the designated method.
- In FIG. 5, components identical to those of Embodiment 1 carry the same reference numerals, and their description is omitted.
- The recognition result selection method changing unit 13 displays a screen for designating the recognition result selection method on the display unit 8 and provides an HMI (Human Machine Interface) that accepts the user's designation. For example, a designation screen is displayed on which the user associates each of the first to Mth speech recognition units with a selection method.
- As a result, a selection method is set in the recognition result selection unit 6A for each speech recognition unit.
- The user may designate the selection method for each voice recognition unit as desired, or according to how the voice recognition device is being used.
- The selection method may be designated so that more recognition results are selected from a voice recognition unit of high importance.
- Alternatively, no selection method may be designated for a voice recognition unit, in which case the recognition results of that unit are not output.
- The flow of voice recognition by this apparatus is the same as the flowchart of FIG. 2 shown in Embodiment 1.
- The difference is that the recognition result selection unit 6A selects the recognition result by the selection method set through the recognition result selection method changing unit 13. For example, from the recognition results that the recognition control unit 5 acquires from the first speech recognition unit, the one with the first-place recognition score is selected, while all of the recognition results acquired from the second speech recognition unit are selected.
- In this way, the user can determine the recognition result selection method for each voice recognition unit.
- The other processes are the same as those in Embodiment 1.
- As described above, Embodiment 2 provides the recognition result selection method changing unit 13, which changes the selection method used by the recognition result selection unit 6A.
- Because the user can designate how the recognition result selection unit 6A selects recognition results, it is possible, for example, to emphasize the results of the speech recognition process considered optimal for the usage situation.
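A per-recognizer selection table is one way to picture what the selection method changing unit 13 configures. The sketch below is an assumption-laden illustration: the class `SelectionMethodTable`, its methods, and the recognizer names are hypothetical, and storing `None` is used here to model the case where no selection method is designated and the recognizer's results are not output.

```python
from typing import Callable, Dict, List, Optional, Tuple

Hypothesis = Tuple[str, float]
Selector = Callable[[List[Hypothesis]], List[Hypothesis]]


class SelectionMethodTable:
    """Maps each recognizer name to the selection method chosen by the user."""

    def __init__(self, default: Optional[Selector] = None):
        self._default = default
        self._methods: Dict[str, Optional[Selector]] = {}

    def set_method(self, recognizer: str, selector: Optional[Selector]) -> None:
        # None means "no method designated": suppress this recognizer's output.
        self._methods[recognizer] = selector

    def select(self, recognizer: str, hyps: List[Hypothesis]) -> List[Hypothesis]:
        selector = self._methods.get(recognizer, self._default)
        return [] if selector is None else selector(hyps)


table = SelectionMethodTable(default=lambda h: h[:1])   # first place by default
table.set_method("dictation", lambda h: list(h))         # show everything
table.set_method("low_priority", None)                   # show nothing

hyps = [("tokyo station", 0.9), ("kyoto station", 0.7)]
```

With this table, `table.select("grammar", hyps)` falls back to the default, while the other two recognizers follow their explicit settings.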
- FIG. 6 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 3 of the present invention.
- The speech recognition apparatus according to Embodiment 3 includes a speech acquisition unit 1, a speech data storage unit 2A, a speech recognition unit 3, a speech recognition switching unit 4, a recognition control unit 5, a recognition result selection unit 6, a recognition result storage unit 7, and a voice section detection unit 14.
- In FIG. 6, components identical to those of Embodiment 1 carry the same reference numerals, and their description is omitted.
- The voice data storage unit 2A is a storage unit that stores the voice data of the voice section detected by the voice section detection unit 14.
- The voice section detection unit 14 detects, from the voice data acquired by the voice acquisition unit 1, the voice data of the voice section corresponding to the content spoken by the user.
- The first to Mth speech recognition units extract feature amounts from the speech data stored in the speech data storage unit 2A and perform recognition processing with reference to the recognition dictionary based on those feature amounts. Thus, in Embodiment 3, the first to Mth speech recognition units do not each perform speech section detection.
- FIG. 7 is a flowchart showing the flow of speech recognition processing by the speech recognition apparatus according to Embodiment 3.
- The voice acquisition unit 1 performs A/D conversion on speech input over a predetermined period through a microphone or the like, and acquires it as voice data in, for example, PCM format (step ST210).
- The voice section detection unit 14 detects, from the voice data acquired by the voice acquisition unit 1, the voice data of the section corresponding to the content spoken by the user (step ST220).
- The voice data storage unit 2A stores the voice data detected by the voice section detection unit 14 (step ST230).
- The recognition control unit 5 initializes the variable N to 1 (step ST240). The recognition control unit 5 then outputs to the voice recognition switching unit 4 a switching control signal for switching the voice recognition unit 3 to the Nth voice recognition unit. The speech recognition switching unit 4 switches the speech recognition unit 3 to the Nth speech recognition unit in accordance with the switching control signal from the recognition control unit 5 (step ST250).
- The Nth speech recognition unit extracts feature amounts from the speech data of each speech section stored in the speech data storage unit 2A, and performs recognition processing while referring to the recognition dictionary based on those feature amounts (step ST260).
- The subsequent processing from step ST270 to step ST340 is the same as the processing from step ST60 to step ST130 in FIG. 2.
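The point of Embodiment 3 is that voice section detection runs once and every recognizer reuses its output. The sketch below illustrates only that sharing. The patent does not say how unit 14 detects the section, so the frame-energy detector, its parameters, and the names `detect_voice_section` and `recognize_all` are assumptions; the threshold check and early exit of steps ST270 to ST340 are omitted for brevity.

```python
def detect_voice_section(samples, frame_len=4, energy_threshold=0.01):
    """Return (start, end) sample indices spanning all frames whose mean
    energy exceeds the threshold, or None when no frame is active."""
    active = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > energy_threshold:
            active.append((start, start + len(frame)))
    if not active:
        return None
    return active[0][0], active[-1][1]


def recognize_all(samples, recognizers):
    """Detect the voice section once, then hand the same data to each recognizer."""
    section = detect_voice_section(samples)          # ST220, performed once
    if section is None:
        return {}
    stored = samples[section[0]:section[1]]          # ST230: storage unit 2A
    return {name: fn(stored) for name, fn in recognizers.items()}  # ST250 to ST260


samples = [0.0] * 8 + [0.5] * 8 + [0.0] * 8
out = recognize_all(samples, {"length_probe": lambda s: len(s)})
```

Because detection happens before the recognizer loop, adding more recognizers does not repeat the detection cost.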
- As described above, the apparatus according to Embodiment 3 includes: the voice acquisition unit 1 that digitally converts the input voice and acquires it as voice data; the voice segment detection unit 14 that detects, from the voice data acquired by the voice acquisition unit 1, the voice segment corresponding to the user's utterance; the voice data storage unit 2A that stores the voice data of each voice segment detected by the voice segment detection unit 14; the first to Mth speech recognition units that extract the feature amounts of the voice data stored in the voice data storage unit 2A and perform recognition processing with reference to the recognition dictionary based on the extracted feature amounts; the voice recognition switching unit 4 that switches among the first to Mth voice recognition units; the recognition control unit 5 that controls the switching of the voice recognition unit by the voice recognition switching unit 4 and acquires the recognition result of the voice recognition unit switched to; and the recognition result selection unit 6 that selects the recognition result to be presented to the user from the recognition results acquired by the recognition control unit 5.
- FIG. 8 is a block diagram showing the configuration of a speech recognition apparatus according to Embodiment 4 of the present invention.
- The speech recognition apparatus according to Embodiment 4 includes a speech acquisition unit 1, a speech data storage unit 2, a speech recognition unit 3A, a speech recognition switching unit 4, a recognition control unit 5, a recognition result selection unit 6, and a recognition result storage unit 7.
- In FIG. 8, components identical to those of Embodiment 1 carry the same reference numerals, and their description is omitted.
- Variables that contribute to the accuracy of speech recognition include the frame period used when extracting the feature amounts of a speech section, the number of mixture distributions in the acoustic model, the number of acoustic models, or a combination of these.
- A speech recognition method with low recognition accuracy is one defined, in terms of these variables, by a frame period made longer than a predetermined value, a number of mixture distributions reduced below a predetermined value, a number of acoustic models reduced below a predetermined value, or a combination of these.
- A speech recognition method with high recognition accuracy is one defined by a frame period shortened to the predetermined value or less, a number of mixture distributions increased to the predetermined value or more, a number of acoustic models increased to the predetermined value or more, or a combination of these.
- The variables that contribute to the recognition accuracy of the first to Mth speech recognition units may be set by the user as appropriate to determine the recognition accuracy.
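The three accuracy variables can be grouped into one settings record. This is an illustrative sketch only: the class `AccuracyProfile`, the reference values standing in for the "predetermined values", and the halving and doubling factor in `coarse` are all hypothetical, since the patent gives no concrete numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyProfile:
    frame_period_ms: float   # frame period for feature extraction
    mixtures: int            # mixture distributions per acoustic model
    models: int              # number of acoustic models


# Hypothetical stand-ins for the "predetermined values" in the text.
REFERENCE = AccuracyProfile(frame_period_ms=10.0, mixtures=16, models=1000)


def coarse(ref: AccuracyProfile, factor: int = 2) -> AccuracyProfile:
    """Derive a cheaper, lower-accuracy profile from a reference profile."""
    return AccuracyProfile(
        frame_period_ms=ref.frame_period_ms * factor,
        mixtures=max(1, ref.mixtures // factor),
        models=max(1, ref.models // factor),
    )


def is_low_accuracy(p: AccuracyProfile, ref: AccuracyProfile = REFERENCE) -> bool:
    """True when any variable is on the low-accuracy side of the reference."""
    return (
        p.frame_period_ms > ref.frame_period_ms
        or p.mixtures < ref.mixtures
        or p.models < ref.models
    )


fast_profile = coarse(REFERENCE)
```

Using `or` in `is_low_accuracy` reflects the text's wording that any one variable, or a combination, can define the low-accuracy method.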
- FIG. 9 is a flowchart showing the flow of speech recognition processing by the speech recognition apparatus according to Embodiment 4.
- The voice acquisition unit 1 performs A/D conversion on speech input over a predetermined period through a microphone or the like, and acquires it as voice data in, for example, PCM format (step ST410).
- The voice data storage unit 2 stores the voice data acquired by the voice acquisition unit 1 (step ST420).
- The recognition control unit 5 initializes the variable N to 1 (step ST430). N is a variable that takes values from 1 to M.
- The recognition control unit 5 outputs to the voice recognition switching unit 4 a switching control signal for switching the voice recognition unit 3A to the Nth voice recognition unit.
- The speech recognition switching unit 4 switches the speech recognition unit 3A to the Nth speech recognition unit in accordance with the switching control signal from the recognition control unit 5 (step ST440).
- Using a speech recognition method with low recognition accuracy, the Nth speech recognition unit detects the speech segment corresponding to the user's utterance from the speech data stored in the speech data storage unit 2, extracts the feature amounts of the speech data in that segment, and performs recognition processing while referring to the recognition dictionary based on those feature amounts (step ST450).
- The recognition control unit 5 increments the variable N by 1 (step ST460) and determines whether the value of N exceeds the number M of voice recognition units (step ST470).
- When the value of N is equal to or less than M (step ST470; NO), the process returns to step ST440, and the above processing is repeated by the speech recognition unit switched to.
- When the value of N exceeds M (step ST470; YES), the recognition control unit 5 acquires the recognition result from each speech recognition unit, compares the first-place recognition score (likelihood) in each result with a predetermined threshold, and identifies the K speech recognition units whose score is equal to or greater than the threshold (step ST480). In this way, the first to Mth speech recognition units are narrowed down to the K speech recognition units L(1) to L(K) that obtained, by the low-accuracy method, a recognition result whose first-place score is equal to or greater than the threshold.
- The recognition control unit 5 initializes the variable n to 1 (step ST490). Note that n is a variable that takes values from 1 to K.
- The recognition control unit 5 outputs to the voice recognition switching unit 4 a switching control signal for switching to the voice recognition unit L(n) among the voice recognition units L(1) to L(K) selected in step ST480.
- The speech recognition switching unit 4 switches the speech recognition unit 3A to the speech recognition unit L(n) in accordance with the switching control signal from the recognition control unit 5 (step ST500).
- Using a speech recognition method with high recognition accuracy, the voice recognition unit L(n) detects the voice section corresponding to the user's utterance from the voice data stored in the voice data storage unit 2, extracts the feature amounts of the voice data in that section, and performs recognition processing with reference to the recognition dictionary based on those feature amounts (step ST510).
- The recognition control unit 5 acquires the recognition result each time the recognition process of a voice recognition unit L(n) finishes.
- The recognition result selection unit 6 selects the recognition result to be output from the recognition results of the voice recognition unit L(n) acquired by the recognition control unit 5, by the same method as in Embodiment 1 (steps ST70 and ST90 in FIG. 2) (step ST520).
- The recognition result selection unit 6 stores the selected recognition result in the recognition result storage unit 7 (step ST530).
- The recognition control unit 5 increments the variable n by 1 (step ST540) and determines whether the value of n exceeds K, the number of voice recognition units selected in step ST480 (step ST550).
- When the value of n is equal to or less than K (step ST550; NO), the process returns to step ST500.
- When the value of n exceeds K (step ST550; YES), the display unit 8 outputs the recognition results of the speech recognition units L(1) to L(K) stored in the recognition result storage unit 7 (step ST560).
- The display unit 8 may output the recognition results in the order of the speech recognition units L(1) to L(K).
- As described above, according to Embodiment 4, the first to Mth speech recognition units of the speech recognition unit 3A can perform recognition processing at different accuracies, and the recognition control unit 5 has the recognition processing performed so that the accuracy increases step by step while narrowing down, based on the recognition scores of the recognition results, the speech recognition units that perform it.
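The coarse-to-fine flow of steps ST430 to ST560 can be sketched as two loops. This is a simplified illustration under stated assumptions: the function names `two_pass` and `fake_recognizer` are hypothetical, each recognizer is modeled as a callable taking a `"low"` or `"high"` accuracy mode, and the selection in step ST520 is reduced to keeping the first-place result.

```python
def two_pass(voice_data, recognizers, thresholds):
    """recognizers: name -> callable(voice_data, mode) returning (text, score)
    pairs sorted best first; thresholds: name -> score threshold."""
    survivors = []
    for name, rec in recognizers.items():               # ST430 to ST470
        hyps = rec(voice_data, "low")                    # cheap first pass
        if hyps and hyps[0][1] >= thresholds[name]:      # ST480: narrow to L(1)..L(K)
            survivors.append(name)

    stored = []
    for name in survivors:                               # ST490 to ST550
        hyps = recognizers[name](voice_data, "high")     # ST510: accurate pass
        stored.append((name, hyps[:1]))                  # ST520 to ST530
    return stored                                        # ST560


def fake_recognizer(low, high):
    """Test double returning canned results for each accuracy mode."""
    return lambda data, mode: low if mode == "low" else high


recognizers = {
    "grammar": fake_recognizer([("tokyo", 0.9)], [("tokyo station", 0.95)]),
    "dictation": fake_recognizer([("uh", 10.0)], [("never used", 99.0)]),
}
thresholds = {"grammar": 0.8, "dictation": 50.0}
final = two_pass(b"", recognizers, thresholds)
```

Only recognizers that clear their own threshold in the cheap pass pay for the accurate pass, which is where the time saving comes from.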
- FIG. FIG. 10 is a block diagram showing a configuration of a speech recognition apparatus according to Embodiment 5 of the present invention.
- the speech recognition apparatus according to the fifth embodiment includes a speech acquisition unit 1, a speech data storage unit 2, a speech recognition unit 3, a speech recognition switching unit 4, a recognition control unit 5, and a recognition result determination unit 15.
- the recognition result determination unit 15 is a determination unit that accepts the user's selection of a recognition result from the recognition result candidates displayed on the display unit 8 and determines the selected candidate as the final recognition result.
- the recognition result determination unit 15 displays a recognition result selection screen on the display unit 8 and, based on that screen, provides an HMI for selecting a recognition result candidate with an input device such as a touch panel, hard keys, or buttons.
- in FIG. 10, the same components as those in FIG. 1 are denoted by the same reference numerals, and their description is omitted.
- FIG. 11 is a flowchart showing the flow of speech recognition processing by the speech recognition apparatus according to the fifth embodiment.
- the voice acquisition unit 1 performs A/D conversion on the voice input through a microphone or the like for a predetermined period and acquires it as, for example, PCM-format voice data (step ST610).
- the voice data storage unit 2 stores the voice data acquired by the voice acquisition unit 1 (step ST620).
- the recognition control unit 5 initializes the variable N to 1 (step ST630).
- N is a variable that can take values from 1 to M.
- the recognition control unit 5 outputs a switching control signal for switching the voice recognition unit 3 to the Nth voice recognition unit to the voice recognition switching unit 4.
- the speech recognition switching unit 4 switches the speech recognition unit 3 to the Nth speech recognition unit in accordance with the switching control signal from the recognition control unit 5 (step ST640).
- the Nth speech recognition unit detects a speech section corresponding to the user's utterance in the speech data stored in the speech data storage unit 2, extracts a feature amount of the speech data in that section, and performs recognition processing by referring to the recognition dictionary based on the feature amount (step ST650).
- the recognition control unit 5 acquires the recognition result from the Nth speech recognition unit and outputs it to the display unit 8.
- the display unit 8 displays the input recognition result as a recognition result candidate under the control of the recognition result determination unit 15 (step ST660).
- the recognition result determination unit 15 waits for the user to select a recognition result and determines whether the user has selected a recognition result candidate displayed on the display unit 8 (step ST670).
- when the user has selected a candidate (step ST670; YES), the recognition result determination unit 15 determines the recognition result candidate selected by the user as the final recognition result (step ST680). The recognition process then ends.
- when the user has not selected a candidate (step ST670; NO), the recognition control unit 5 increments the variable N by 1 (step ST690) and determines whether the value of N has exceeded the number M of speech recognition units (step ST700). If the value of N exceeds M (step ST700; YES), the recognition process ends. If the value of N is M or less (step ST700; NO), the process returns to step ST640, and the above processing is repeated by the speech recognition unit switched to.
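The flow of steps ST630 to ST700 can be sketched as a loop that stops as soon as the user confirms a candidate. The names `recognize_until_confirmed` and `ask_user` are hypothetical, and the recognizers are modelled as plain callables that return candidate strings; the source does not prescribe any of these details.

```python
from typing import Callable, List, Optional, Sequence

Recognize = Callable[[bytes], List[str]]         # one voice recognition unit
AskUser = Callable[[List[str]], Optional[str]]   # HMI: chosen candidate or None


def recognize_until_confirmed(voice_data: bytes,
                              recognizers: Sequence[Recognize],
                              ask_user: AskUser) -> Optional[str]:
    """Try the recognizers in order until the user confirms a candidate."""
    n = 1                                            # ST630: initialise N
    m = len(recognizers)                             # M: number of recognition units
    while n <= m:                                    # ST700: stop once N exceeds M
        candidates = recognizers[n - 1](voice_data)  # ST640, ST650
        choice = ask_user(candidates)                # ST660, ST670
        if choice is not None:
            return choice                            # ST680: final recognition result
        n += 1                                       # ST690
    return None                                      # no candidate was confirmed
```

Recognizers after the confirmed one are never invoked, which is where the time saving described for this embodiment comes from.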
- the speech recognition apparatus according to Embodiment 5 includes: the voice acquisition unit 1, which digitally converts the input voice and acquires it as voice data; the voice data storage unit 2, which stores the voice data acquired by the voice acquisition unit 1; the first to Mth voice recognition units, which detect a speech section from the voice data stored in the voice data storage unit 2, extract the feature amount of the voice data in the speech section, and perform recognition processing by referring to the recognition dictionary based on the extracted feature amount; the voice recognition switching unit 4, which switches among the first to Mth voice recognition units; the recognition control unit 5, which controls the switching of the voice recognition units by the voice recognition switching unit 4 and acquires the recognition result from the voice recognition unit switched to; and the recognition result determination unit 15, which accepts the user's selection from among the recognition results acquired by the recognition control unit 5 and presented to the user, and determines the recognition result selected by the user as the final recognition result. With this configuration, the recognition result designated by the user can be confirmed as the final recognition result before all the recognition processes have been performed, so the overall recognition processing time can be reduced.
- in the first to fifth embodiments the recognition result is displayed on the display unit 8, but the presentation of the recognition result to the user is not limited to screen display on the display unit 8.
- for example, the recognition result may be announced by voice guidance using a voice output device such as a speaker.
- in the above embodiments, the navigation device according to the present invention is applied to a vehicle-mounted navigation device; however, it may be applied not only to a vehicle-mounted device but also to a mobile phone terminal or a personal digital assistant (PDA).
- the present invention may also be applied to a PND (Portable Navigation Device) or the like that a person carries into and uses in a moving body such as a vehicle, a train, a ship, or an aircraft.
- the speech recognition apparatus according to the second to fifth embodiments as well as the first embodiment may be applied to the navigation apparatus.
- within the scope of the present invention, the embodiments may be freely combined, any component of any embodiment may be modified, and any component of any embodiment may be omitted.
- because the speech recognition apparatus according to the present invention can accurately present recognition results obtained by different speech recognition processes and can shorten the recognition process, it is suitable for speech recognition in an on-vehicle navigation device, which requires both fast recognition processing and accurate recognition results.
- Reference signs: 1 voice acquisition unit; 2, 2A voice data storage unit; 3, 3A voice recognition unit; 4 voice recognition switching unit; 5 recognition control unit; 6, 6A recognition result selection unit; 7 recognition result storage unit; 8 display unit; 9 navigation processing unit; 10 position detection unit; 11 map database (DB); 12 input unit; 13 recognition result selection method change unit; 14 speech section detection unit; 15 recognition result determination unit.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Navigation (AREA)
Abstract
Description
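In this way, dividing the speech recognition targets prevents the recognition rate from degrading. In addition, since processing ends as soon as the recognition score of a recognition result reaches or exceeds a threshold, the time required for recognition processing can be shortened.
Embodiment 1.
FIG. 1 is a block diagram showing the configuration of a navigation device equipped with the speech recognition device according to Embodiment 1 of the present invention. FIG. 1 shows the case where the speech recognition device according to Embodiment 1 is applied to an in-vehicle navigation device mounted on a vehicle, which is a moving body. As the speech recognition configuration, the device includes a voice acquisition unit 1, a voice data storage unit 2, a voice recognition unit 3, a voice recognition switching unit 4, a recognition control unit 5, a recognition result selection unit 6 and a recognition result storage unit 7; as the navigation configuration, it includes a display unit 8, a navigation processing unit 9, a position detection unit 10, a map database (DB) 11 and an input unit 12.
The voice recognition unit 3 consists of a plurality of voice recognition units (hereinafter, the first to Mth voice recognition units) that perform different speech recognition processes, for example syntax-type and dictation-type recognition.
Each of the first to Mth voice recognition units, following its own speech recognition algorithm, detects the speech section corresponding to the user's utterance in the voice data acquired by the voice acquisition unit 1, extracts the feature amount of the voice data in that section, and performs recognition processing by referring to a recognition dictionary based on the extracted feature amount.
FIG. 2 is a flowchart showing the flow of speech recognition processing by the speech recognition device according to Embodiment 1. First, the voice acquisition unit 1 A/D-converts the voice input through a microphone or the like for a predetermined period and acquires it, for example, as PCM-format voice data (step ST10). The voice data storage unit 2 stores the voice data acquired by the voice acquisition unit 1 (step ST20).
The recognition control unit 5 acquires the recognition result from the Nth voice recognition unit, compares the top-ranked recognition score (likelihood) in that result with a predetermined threshold, and determines whether the score is equal to or greater than the threshold (step ST60). The predetermined threshold is used to decide whether to switch to another voice recognition unit and continue recognition processing, and it is set individually for each of the first to Mth voice recognition units.
On the other hand, if the top-ranked recognition score is below the threshold (step ST60; NO), the recognition result selection unit 6 selects the recognition results to be output from among the results of the Nth voice recognition unit acquired by the recognition control unit 5, using the method described below (step ST90).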
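The recognition result selection unit 6 selects the recognition results with the highest recognition scores from among those acquired by the recognition control unit 5.
As the selection method, for example, the recognition result with the top-ranked score may be selected as described above, or all of the recognition results acquired by the recognition control unit 5 may be selected.
Alternatively, the recognition results ranked from first to Xth by recognition score may be selected.
Furthermore, the recognition results whose scores differ from the top-ranked score by no more than a predetermined value may be selected.
Note that even a recognition result ranked within the top X, or within the predetermined difference from the top-ranked score, need not be selected if its recognition score is below a predetermined threshold.
Thus, in steps ST70 and ST90, the method of selecting recognition results may differ for each voice recognition unit.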
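The selection methods listed above (top-ranked only, all results, top X, and within a margin of the top score, each optionally combined with a minimum score) can be sketched as small functions. The function names, the integer scores and the `floor` parameter are assumptions made for illustration; the source does not define a concrete interface.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[str, int]  # (recognized text, recognition score)


def _ranked(candidates: List[Candidate],
            floor: Optional[int] = None) -> List[Candidate]:
    """Sort by descending score, dropping scores below the optional floor."""
    kept = [c for c in candidates if floor is None or c[1] >= floor]
    return sorted(kept, key=lambda c: c[1], reverse=True)


def select_top(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the top-ranked recognition result."""
    return _ranked(candidates)[:1]


def select_all(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every recognition result, best first."""
    return _ranked(candidates)


def select_top_x(candidates: List[Candidate], x: int,
                 floor: Optional[int] = None) -> List[Candidate]:
    """Keep the results ranked first to Xth, minus any below the floor."""
    return _ranked(candidates, floor)[:x]


def select_within_margin(candidates: List[Candidate], margin: int,
                         floor: Optional[int] = None) -> List[Candidate]:
    """Keep results whose score is within `margin` of the top-ranked score."""
    ranked = _ranked(candidates)
    if not ranked:
        return []
    best = ranked[0][1]  # the top score is taken before the floor is applied
    return [c for c in ranked
            if best - c[1] <= margin and (floor is None or c[1] >= floor)]
```

Since each method is an ordinary function, a different one can be attached to each voice recognition unit, which matches the remark that the selection method may differ per unit.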
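FIG. 5 is a block diagram showing the configuration of the speech recognition device according to Embodiment 2 of the present invention. In FIG. 5, the speech recognition device according to Embodiment 2 includes a voice acquisition unit 1, a voice data storage unit 2, a voice recognition unit 3, a voice recognition switching unit 4, a recognition control unit 5, a recognition result selection unit 6A, a recognition result storage unit 7, and a recognition result selection method change unit 13. The recognition result selection unit 6A selects the recognition results to be output from those acquired by the recognition control unit 5, in accordance with a selection method control signal from the recognition result selection method change unit 13. The recognition result selection method change unit 13 is a functional component that accepts, for each of the first to Mth voice recognition units, a designation of the method by which the recognition result selection unit 6A selects recognition results, and outputs to the recognition result selection unit 6A a selection method control signal that changes the method to the one designated by the user. In FIG. 5, components identical to those in FIG. 1 are given the same reference numerals and their description is omitted.
The recognition result selection method change unit 13 displays a screen for designating the recognition result selection method on the display unit 8 and provides an HMI (Human Machine Interface) that accepts the user's designation.
For example, it displays a designation screen on which the user associates each of the first to Mth voice recognition units with a selection method. In this way, a selection method is set in the recognition result selection unit 6A for each voice recognition unit. The user can designate the selection method for each voice recognition unit according to preference, or according to how the speech recognition device is being used. Furthermore, if a degree of importance has been set in advance for each voice recognition unit, the selection methods may be designated so that more recognition results are selected from the units with higher importance. A voice recognition unit may also be left without a selection method, that is, designated so that its recognition results are not output.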
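One way to picture the per-unit designation is a table that maps each voice recognition unit to a selection method, with `None` meaning that the unit's results are not output. The class `SelectionMethodTable` and the helper `top_n` are hypothetical and only illustrate the idea; the source describes the behaviour, not a data structure.

```python
from typing import Callable, Dict, List, Optional, Tuple

Candidate = Tuple[str, int]  # (recognized text, recognition score)
Selector = Callable[[List[Candidate]], List[Candidate]]


def top_n(n: int) -> Selector:
    """Build a selector that keeps the n best-scoring candidates."""
    def select(candidates: List[Candidate]) -> List[Candidate]:
        return sorted(candidates, key=lambda c: c[1], reverse=True)[:n]
    return select


class SelectionMethodTable:
    """Holds the selection method designated for each voice recognition unit."""

    def __init__(self, default: Selector) -> None:
        self._default = default
        self._methods: Dict[int, Optional[Selector]] = {}

    def set_method(self, unit_index: int, method: Optional[Selector]) -> None:
        """Record the user's designation; None means 'do not output'."""
        self._methods[unit_index] = method

    def select(self, unit_index: int,
               candidates: List[Candidate]) -> List[Candidate]:
        """Apply the method designated for this unit, or the default."""
        method = self._methods.get(unit_index, self._default)
        if method is None:
            return []          # the user chose to suppress this unit's results
        return method(candidates)
```

A unit with higher importance could, for instance, be given `top_n(3)` while the others keep `top_n(1)`, so that more of its results are selected.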
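FIG. 6 is a block diagram showing the configuration of the speech recognition device according to Embodiment 3 of the present invention. As shown in FIG. 6, the speech recognition device according to Embodiment 3 includes a voice acquisition unit 1, a voice data storage unit 2A, a voice recognition unit 3, a voice recognition switching unit 4, a recognition control unit 5, a recognition result selection unit 6, a recognition result storage unit 7 and a speech section detection unit 14. In FIG. 6, components identical to those in FIG. 1 are given the same reference numerals and their description is omitted.
FIG. 7 is a flowchart showing the flow of speech recognition processing by the speech recognition device according to Embodiment 3. First, the voice acquisition unit 1 A/D-converts the voice input through a microphone or the like for a predetermined period and acquires it, for example, as PCM-format voice data (step ST210). Next, the speech section detection unit 14 detects, in the voice data acquired by the voice acquisition unit 1, the voice data of the section corresponding to the user's utterance (step ST220). The voice data storage unit 2A stores the voice data detected by the speech section detection unit 14 (step ST230).
With this configuration, the first to Mth voice recognition units do not perform speech section detection themselves, so the time required for recognition processing can be shortened.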
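The idea of detecting the speech section once, before storage, can be sketched as below. The source does not say how the section is detected, so the frame-energy rule, the function names and the default parameters here are assumptions chosen only to keep the example small.

```python
from typing import List, Optional, Tuple


def detect_voice_section(samples: List[int], frame_len: int,
                         threshold: float) -> Optional[Tuple[int, int]]:
    """Return (begin, end) sample indices of the speech section, or None.

    Assumed rule: a frame counts as speech when its mean absolute
    amplitude is at least `threshold`.
    """
    active_starts = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        level = sum(abs(s) for s in frame) / len(frame)
        if level >= threshold:
            active_starts.append(start)
    if not active_starts:
        return None
    end = min(active_starts[-1] + frame_len, len(samples))
    return active_starts[0], end


def store_voice_section(samples: List[int], frame_len: int = 2,
                        threshold: float = 10) -> List[int]:
    """Detect the section once (ST220) and keep only that part (ST230)."""
    section = detect_voice_section(samples, frame_len, threshold)
    if section is None:
        return []
    begin, end = section
    return samples[begin:end]
```

Each recognizer would then receive the already trimmed data and skip its own detection step, which is the saving this embodiment describes.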
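FIG. 8 is a block diagram showing the configuration of the speech recognition device according to Embodiment 4 of the present invention. As shown in FIG. 8, the speech recognition device according to Embodiment 4 includes a voice acquisition unit 1, a voice data storage unit 2, a voice recognition unit 3A, a voice recognition switching unit 4, a recognition control unit 5, a recognition result selection unit 6 and a recognition result storage unit 7. In FIG. 8, components identical to those in FIG. 1 are given the same reference numerals and their description is omitted.
Conversely, a speech recognition method with high recognition accuracy is defined as one in which the frame period used when extracting the feature amount of the speech section is shortened to the above predetermined value or less, the number of mixture components of the acoustic model is increased to the above predetermined value or more, the number of acoustic models is increased to the above predetermined value or more, or a combination of these.
The above variables, which contribute to the recognition accuracy of the speech recognition methods in the first to Mth voice recognition units, may be set by the user as appropriate to determine the recognition accuracy.
FIG. 9 is a flowchart showing the flow of speech recognition processing by the speech recognition device according to Embodiment 4. First, the voice acquisition unit 1 A/D-converts the voice input through a microphone or the like for a predetermined period and acquires it, for example, as PCM-format voice data (step ST410). The voice data storage unit 2 stores the voice data acquired by the voice acquisition unit 1 (step ST420).
Next, the recognition control unit 5 outputs to the voice recognition switching unit 4 a switching control signal for switching to the voice recognition unit L(n) among the voice recognition units L(1) to L(K) selected in step ST480. The voice recognition switching unit 4 switches the voice recognition unit 3A to the voice recognition unit L(n) in accordance with that switching control signal from the recognition control unit 5 (step ST500).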
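FIG. 10 is a block diagram showing the configuration of the speech recognition device according to Embodiment 5 of the present invention. As shown in FIG. 10, the speech recognition device according to Embodiment 5 includes a voice acquisition unit 1, a voice data storage unit 2, a voice recognition unit 3, a voice recognition switching unit 4, a recognition control unit 5 and a recognition result determination unit 15. The recognition result determination unit 15 is a determination unit that accepts the user's selection of a recognition result from the recognition result candidates displayed on the display unit 8 and determines the selected candidate as the final recognition result. For example, the recognition result determination unit 15 displays a recognition result selection screen on the display unit 8 and, based on that screen, provides an HMI for selecting a recognition result candidate with an input device such as a touch panel, hard keys or buttons. In FIG. 10, components identical to those in FIG. 1 are given the same reference numerals and their description is omitted.
FIG. 11 is a flowchart showing the flow of speech recognition processing by the speech recognition device according to Embodiment 5. First, the voice acquisition unit 1 A/D-converts the voice input through a microphone or the like for a predetermined period and acquires it, for example, as PCM-format voice data (step ST610). The voice data storage unit 2 stores the voice data acquired by the voice acquisition unit 1 (step ST620).
If the value of the variable N exceeds the number M of voice recognition units (step ST700; YES), the recognition processing ends. If the value of N is M or less (step ST700; NO), the process returns to step ST640. The above processing is thus repeated by the voice recognition unit switched to.
Furthermore, the invention may be applied to a PND (Portable Navigation Device) or similar device that a person carries into and uses in a moving body such as a vehicle, train, ship or aircraft.
In addition, not only the speech recognition device of Embodiment 1 but also those of Embodiments 2 to 5 may be applied to the navigation device.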
Claims (6)
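- A speech recognition device comprising:
an acquisition unit that digitally converts input speech and acquires it as voice data;
a voice data storage unit that stores the voice data acquired by the acquisition unit;
a plurality of voice recognition units that detect a speech section from the voice data stored in the voice data storage unit, extract a feature amount of the voice data in the speech section, and perform recognition processing by referring to a recognition dictionary based on the extracted feature amount;
a switching unit that switches among the plurality of voice recognition units;
a control unit that controls the switching of the voice recognition units by the switching unit and acquires the recognition result from the voice recognition unit switched to; and
a selection unit that selects, from among the recognition results acquired by the control unit, the recognition results to be presented to the user.
- A speech recognition device comprising:
an acquisition unit that digitally converts input speech and acquires it as voice data;
a speech section detection unit that detects, in the voice data acquired by the acquisition unit, the speech section corresponding to the user's utterance;
a voice data storage unit that stores the voice data of each speech section detected by the speech section detection unit;
a plurality of voice recognition units that extract a feature amount of the voice data stored in the voice data storage unit and perform recognition processing by referring to a recognition dictionary based on the extracted feature amount;
a switching unit that switches among the plurality of voice recognition units;
a control unit that controls the switching of the voice recognition units by the switching unit and acquires the recognition result from the voice recognition unit switched to; and
a selection unit that selects, from among the recognition results acquired by the control unit, the recognition results to be presented to the user.
- A speech recognition device comprising:
an acquisition unit that digitally converts input speech and acquires it as voice data;
a voice data storage unit that stores the voice data acquired by the acquisition unit;
a plurality of voice recognition units that detect a speech section from the voice data stored in the voice data storage unit, extract a feature amount of the voice data in the speech section, and perform recognition processing by referring to a recognition dictionary based on the extracted feature amount;
a switching unit that switches among the plurality of voice recognition units;
a control unit that controls the switching of the voice recognition units by the switching unit and acquires the recognition result from the voice recognition unit switched to; and
a determination unit that accepts the user's selection from among the recognition results acquired by the control unit and presented to the user, and determines the recognition result selected by the user as the final recognition result.
- The speech recognition device according to claim 1 or claim 2, further comprising a change unit that accepts designation of a selection method for selecting, from among the recognition results acquired by the control unit, the recognition results to be presented to the user, and changes the selection method used by the selection unit to the designated selection method.
- The speech recognition device according to any one of claims 1 to 4, wherein the plurality of voice recognition units are each capable of performing recognition processing of differing accuracy, and
the control unit causes the voice recognition units to perform recognition processing so that the accuracy increases in stages while narrowing down, based on the recognition scores of the recognition results, the voice recognition units that perform recognition processing.
- A navigation device comprising the speech recognition device according to any one of claims 1 to 5, the navigation device performing navigation processing using the recognition results of the voice recognition units.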
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE112011105407.6T DE112011105407T5 (de) | 2011-07-05 | 2011-07-05 | Spracherkennungsvorrichtung und Navigationsvorrichtung |
US14/117,830 US20140100847A1 (en) | 2011-07-05 | 2011-07-05 | Voice recognition device and navigation device |
CN201180071882.5A CN103650034A (zh) | 2011-07-05 | 2011-07-05 | 语音识别装置及导航装置 |
PCT/JP2011/003827 WO2013005248A1 (ja) | 2011-07-05 | 2011-07-05 | 音声認識装置およびナビゲーション装置 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2011/003827 WO2013005248A1 (ja) | 2011-07-05 | 2011-07-05 | 音声認識装置およびナビゲーション装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013005248A1 (ja) | 2013-01-10 |
Family
ID=47436626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/003827 WO2013005248A1 (ja) | 2011-07-05 | 2011-07-05 | 音声認識装置およびナビゲーション装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20140100847A1 (ja) |
CN (1) | CN103650034A (ja) |
DE (1) | DE112011105407T5 (ja) |
WO (1) | WO2013005248A1 (ja) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3053587A1 (en) | 2015-02-05 | 2016-08-10 | Linde AG | Combination of nitric oxide, helium and antibiotic to treat bacterial lung infections |
EP3108920A1 (en) | 2015-06-22 | 2016-12-28 | Linde AG | Device for delivering nitric oxide and oxygen to a patient |
WO2019016938A1 (ja) * | 2017-07-21 | 2019-01-24 | 三菱電機株式会社 | 音声認識装置及び音声認識方法 |
WO2020065840A1 (ja) * | 2018-09-27 | 2020-04-02 | 株式会社オプティム | コンピュータシステム、音声認識方法及びプログラム |
JP2020201363A (ja) * | 2019-06-09 | 2020-12-17 | 株式会社Tbsテレビ | 音声認識テキストデータ出力制御装置、音声認識テキストデータ出力制御方法、及びプログラム |
Families Citing this family (79)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10255566B2 (en) | 2011-06-03 | 2019-04-09 | Apple Inc. | Generating and processing task items that represent tasks to perform |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
KR20150104615A (ko) | 2013-02-07 | 2015-09-15 | 애플 인크. | 디지털 어시스턴트를 위한 음성 트리거 |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US9786296B2 (en) * | 2013-07-08 | 2017-10-10 | Qualcomm Incorporated | Method and apparatus for assigning keyword model to voice operated function |
DE112014003653B4 (de) | 2013-08-06 | 2024-04-18 | Apple Inc. | Automatisch aktivierende intelligente Antworten auf der Grundlage von Aktivitäten von entfernt angeordneten Vorrichtungen |
WO2015072816A1 (ko) * | 2013-11-18 | 2015-05-21 | 삼성전자 주식회사 | 디스플레이 장치 및 제어 방법 |
EP3149728B1 (en) | 2014-05-30 | 2019-01-16 | Apple Inc. | Multi-command single utterance input method |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10115394B2 (en) | 2014-07-08 | 2018-10-30 | Mitsubishi Electric Corporation | Apparatus and method for decoding to recognize speech using a third speech recognizer based on first and second recognizer results |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10460227B2 (en) | 2015-05-15 | 2019-10-29 | Apple Inc. | Virtual assistant in a communication session |
US10200824B2 (en) | 2015-05-27 | 2019-02-05 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device |
JP6516585B2 (ja) * | 2015-06-24 | 2019-05-22 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America | 制御装置、その方法及びプログラム |
US20160378747A1 (en) | 2015-06-29 | 2016-12-29 | Apple Inc. | Virtual assistant for media playback |
KR101736109B1 (ko) * | 2015-08-20 | 2017-05-16 | 현대자동차주식회사 | 음성인식 장치, 이를 포함하는 차량, 및 그 제어방법 |
US10331312B2 (en) | 2015-09-08 | 2019-06-25 | Apple Inc. | Intelligent automated assistant in a media environment |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10740384B2 (en) | 2015-09-08 | 2020-08-11 | Apple Inc. | Intelligent automated assistant for media search and playback |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10956666B2 (en) | 2015-11-09 | 2021-03-23 | Apple Inc. | Unconventional virtual assistant interactions |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
US10271093B1 (en) * | 2016-06-27 | 2019-04-23 | Amazon Technologies, Inc. | Systems and methods for routing content to an associated output device |
US10931999B1 (en) | 2016-06-27 | 2021-02-23 | Amazon Technologies, Inc. | Systems and methods for routing content to an associated output device |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
DK201770383A1 (en) | 2017-05-09 | 2018-12-14 | Apple Inc. | USER INTERFACE FOR CORRECTING RECOGNITION ERRORS |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
DK180048B1 (en) | 2017-05-11 | 2020-02-04 | Apple Inc. | MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION |
DK201770427A1 (en) | 2017-05-12 | 2018-12-20 | Apple Inc. | LOW-LATENCY INTELLIGENT AUTOMATED ASSISTANT |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
US20180336892A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Detecting a trigger of a digital assistant |
US20180336275A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Intelligent automated assistant for media exploration |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT |
DK179822B1 (da) | 2018-06-01 | 2019-07-12 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
DK180091B1 (en) * | 2018-06-03 | 2020-04-22 | Apple Inc. | ACCELERATED TASK PERFORMANCE |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
WO2020141615A1 (ko) * | 2018-12-31 | 2020-07-09 | 엘지전자 주식회사 | 차량용 전자 장치 및 차량용 전자 장치의 동작 방법 |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
DK201970509A1 (en) | 2019-05-06 | 2021-01-15 | Apple Inc | Spoken notifications |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
DK180129B1 (en) | 2019-05-31 | 2020-06-02 | Apple Inc. | USER ACTIVITY SHORTCUT SUGGESTIONS |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
DK201970510A1 (en) | 2019-05-31 | 2021-02-11 | Apple Inc | Voice identification in digital assistant systems |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11227599B2 (en) | 2019-06-01 | 2022-01-18 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
CN110415685A (zh) * | 2019-08-20 | 2019-11-05 | 河海大学 | 一种语音识别方法 |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11810578B2 (en) | 2020-05-11 | 2023-11-07 | Apple Inc. | Device arbitration for digital assistant-based intercom systems |
US11038934B1 (en) | 2020-05-11 | 2021-06-15 | Apple Inc. | Digital assistant hardware abstraction |
US11061543B1 (en) | 2020-05-11 | 2021-07-13 | Apple Inc. | Providing relevant data items based on context |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11490204B2 (en) | 2020-07-20 | 2022-11-01 | Apple Inc. | Multi-device audio adjustment coordination |
US11438683B2 (en) | 2020-07-21 | 2022-09-06 | Apple Inc. | User identification using headphones |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS62201498A (ja) * | 1986-02-28 | 1987-09-05 | 沖電気工業株式会社 | 音声認識方法 |
JPS6332596A (ja) * | 1986-07-25 | 1988-02-12 | 日本電信電話株式会社 | 音声認識装置 |
JPH04163597A (ja) * | 1990-10-29 | 1992-06-09 | Ricoh Co Ltd | 車載用音声認識装置 |
JPH06266393A (ja) * | 1993-03-12 | 1994-09-22 | Matsushita Electric Ind Co Ltd | 音声認識装置 |
JP2003295893A (ja) * | 2002-04-01 | 2003-10-15 | Omron Corp | 音声認識システム、装置、音声認識方法、音声認識プログラム及び音声認識プログラムを記録したコンピュータ読み取り可能な記録媒体 |
JP2007156974A (ja) * | 2005-12-07 | 2007-06-21 | Kddi Corp | 個人認証・識別システム |
JP2008210132A (ja) * | 2007-02-26 | 2008-09-11 | Toshiba Corp | 原言語による音声を目的言語に翻訳する装置、方法およびプログラム |
JP2009116107A (ja) * | 2007-11-07 | 2009-05-28 | Canon Inc | 情報処理装置及び方法 |
JP2009230068A (ja) * | 2008-03-25 | 2009-10-08 | Denso Corp | 音声認識装置及びナビゲーションシステム |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1197949B1 (en) * | 2000-10-10 | 2004-01-07 | Sony International (Europe) GmbH | Avoiding online speaker over-adaptation in speech recognition |
US6996525B2 (en) * | 2001-06-15 | 2006-02-07 | Intel Corporation | Selecting one of multiple speech recognizers in a system based on performance predections resulting from experience |
US7478044B2 (en) * | 2004-03-04 | 2009-01-13 | International Business Machines Corporation | Facilitating navigation of voice data |
JP4282704B2 (ja) * | 2006-09-27 | 2009-06-24 | 株式会社東芝 | 音声区間検出装置およびプログラム |
US8949130B2 (en) * | 2007-03-07 | 2015-02-03 | Vlingo Corporation | Internal and external speech recognition use with a mobile communication facility |
US7933777B2 (en) * | 2008-08-29 | 2011-04-26 | Multimodal Technologies, Inc. | Hybrid speech recognition |
WO2011010604A1 (ja) * | 2009-07-21 | 2011-01-27 | 日本電信電話株式会社 | 音声信号区間推定装置と音声信号区間推定方法及びそのプログラムと記録媒体 |
2011
- 2011-07-05 US US14/117,830 patent/US20140100847A1/en not_active Abandoned
- 2011-07-05 DE DE112011105407.6T patent/DE112011105407T5/de not_active Withdrawn
- 2011-07-05 WO PCT/JP2011/003827 patent/WO2013005248A1/ja active Application Filing
- 2011-07-05 CN CN201180071882.5A patent/CN103650034A/zh active Pending
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3053587A1 (en) | 2015-02-05 | 2016-08-10 | Linde AG | Combination of nitric oxide, helium and antibiotic to treat bacterial lung infections |
EP3108920A1 (en) | 2015-06-22 | 2016-12-28 | Linde AG | Device for delivering nitric oxide and oxygen to a patient |
WO2016207227A1 (en) | 2015-06-22 | 2016-12-29 | Linde Ag | Device for delivering nitric oxide and oxygen to a patient |
WO2019016938A1 (ja) * | 2017-07-21 | 2019-01-24 | 三菱電機株式会社 | 音声認識装置及び音声認識方法 |
WO2020065840A1 (ja) * | 2018-09-27 | 2020-04-02 | 株式会社オプティム | コンピュータシステム、音声認識方法及びプログラム |
JPWO2020065840A1 (ja) * | 2018-09-27 | 2021-08-30 | 株式会社オプティム | コンピュータシステム、音声認識方法及びプログラム |
JP7121461B2 (ja) | 2018-09-27 | 2022-08-18 | 株式会社オプティム | コンピュータシステム、音声認識方法及びプログラム |
JP2020201363A (ja) * | 2019-06-09 | 2020-12-17 | 株式会社Tbsテレビ | 音声認識テキストデータ出力制御装置、音声認識テキストデータ出力制御方法、及びプログラム |
Also Published As
Publication number | Publication date |
---|---|
US20140100847A1 (en) | 2014-04-10 |
CN103650034A (zh) | 2014-03-19 |
DE112011105407T5 (de) | 2014-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2013005248A1 (ja) | 音声認識装置およびナビゲーション装置 | |
JP6400109B2 (ja) | 音声認識システム | |
JP4304952B2 (ja) | 車載制御装置、並びにその操作説明方法をコンピュータに実行させるプログラム | |
US8831938B2 (en) | Speech recognition adjustment based on manual interaction | |
US9123327B2 (en) | Voice recognition apparatus for recognizing a command portion and a data portion of a voice input | |
US20020035475A1 (en) | Voice recognition apparatus | |
US9881609B2 (en) | Gesture-based cues for an automatic speech recognition system | |
US9715877B2 (en) | Systems and methods for a navigation system utilizing dictation and partial match search | |
JP4357867B2 (ja) | 音声認識装置、音声認識方法、並びに、音声認識プログラムおよびそれを記録した記録媒体 | |
JP6214297B2 (ja) | ナビゲーション装置および方法 | |
JP4104313B2 (ja) | 音声認識装置、プログラム及びナビゲーションシステム | |
JP5606951B2 (ja) | 音声認識システムおよびこれを用いた検索システム | |
JP2009230068A (ja) | 音声認識装置及びナビゲーションシステム | |
JP6522009B2 (ja) | 音声認識システム | |
JP3296783B2 (ja) | 車載用ナビゲーション装置および音声認識方法 | |
JP2011180416A (ja) | 音声合成装置、音声合成方法およびカーナビゲーションシステム | |
JPWO2013005248A1 (ja) | 音声認識装置およびナビゲーション装置 | |
JP3700533B2 (ja) | 音声認識装置及び処理システム | |
JPH0916191A (ja) | ナビゲータ用音声認識装置および方法 | |
JP4941494B2 (ja) | 音声認識システム | |
JP2005031260A (ja) | 情報処理方法及び装置 | |
JP2017102320A (ja) | 音声認識装置 | |
JP2014232289A (ja) | 誘導音声調整装置、誘導音声調整方法および誘導音声調整プログラム | |
JP2008298851A (ja) | 音声入力処理装置および音声入力処理方法 | |
JP2006184421A (ja) | 音声認識装置及び音声認識方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11868878; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2013522362; Country of ref document: JP; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 14117830; Country of ref document: US |
WWE | Wipo information: entry into national phase | Ref document number: 112011105407; Country of ref document: DE; Ref document number: 1120111054076; Country of ref document: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 11868878; Country of ref document: EP; Kind code of ref document: A1 |