WO2019016938A1 - Speech recognition device and speech recognition method - Google Patents
- Publication number
- WO2019016938A1 (PCT/JP2017/026450)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- speech
- unit
- recognition
- speech recognition
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
Definitions
- the present invention relates to a speech recognition apparatus capable of recognizing an input speech, and to a speech recognition method in the speech recognition apparatus.
- Patent Document 1 proposes a technique for switching between a push-to-talk mode and a hands-free mode based on the position of a speaker in a vehicle cabin.
- the push-to-talk mode is a mode for recognizing voice when the button switch is pressed
- the hands-free mode is a mode for recognizing voice regardless of the pressing of the button switch.
- the speech recognition apparatus performs speech recognition processing on the speech in the speech segment after determining the speech segment, which is a period in which speech is recognized, and the non-speech segment, which is a period in which speech is not recognized.
- in such conventional techniques, however, the accuracy of speech recognition is not sufficient, because the processing is not switched to the appropriate processing from among the plurality of speech segment determinations and the plurality of speech recognition processes.
- the present invention has been made in view of the above-described problems, and an object of the present invention is to provide a technology capable of enhancing the accuracy of speech recognition.
- a speech recognition apparatus according to the present invention is a speech recognition apparatus capable of recognizing an input speech, which is a speech that has been input, and includes: an acquisition unit that acquires a situation related to recognition of the input speech; and a control unit that, based on the situation acquired by the acquisition unit, performs control to use for recognition of the input speech at least one speech segment determination unit from among a plurality of predetermined speech segment determination units capable of determining speech segments, which are periods in which speech is recognized, and at least one speech recognition processing unit from among a plurality of predetermined speech recognition processing units capable of performing recognition processing on the speech in the speech segments determined by the plurality of speech segment determination units.
- control is performed to use at least one speech segment determination unit and at least one speech recognition processing unit for recognition of an input speech based on a situation related to recognition of the input speech. This can improve the accuracy of speech recognition.
- FIG. 1 is a block diagram showing the configuration of the speech recognition device according to Embodiment 1.
- FIG. 2 is a block diagram showing the configuration of the speech recognition device according to Embodiment 2.
- FIG. 3 is a diagram showing discrimination data according to Embodiment 2.
- FIG. 4 is a flowchart showing the operation of the speech recognition device according to Embodiment 2.
- FIG. 5 is a block diagram showing the configuration of the speech recognition device according to Embodiment 3.
- FIG. 6 is a diagram showing discrimination data according to Embodiment 3.
- FIG. 7 is a flowchart showing the operation of the speech recognition device according to Embodiment 3.
- FIG. 8 is a block diagram showing the configuration of the speech recognition device according to Embodiment 4.
- FIG. 9 is a flowchart showing the operation of the speech recognition device according to Embodiment 4.
- FIG. 10 is a block diagram showing the configuration of the speech recognition device according to Embodiment 5.
- FIG. 11 is a diagram showing discrimination data according to Embodiment 5.
- FIG. 12 is a flowchart showing the operation of the speech recognition device according to Embodiment 5.
- FIG. 13 is a diagram showing discrimination data according to Embodiment 6.
- FIG. 14 is a flowchart showing the operation of the speech recognition device according to Embodiment 6.
- FIG. 15 is a diagram showing discrimination data according to a modification.
- FIG. 16 is a block diagram showing the hardware configuration of a navigation device according to another modification.
- FIG. 17 is a block diagram showing the configuration of a communication terminal according to another modification.
- Embodiment 1: In the following, the vehicle on which the voice recognition device according to the first embodiment of the present invention is mounted, which is the vehicle to be focused on, is described as the "own vehicle".
- This voice recognition device can be applied to, for example, a navigation device mounted on the vehicle.
- FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus 1 according to the first embodiment.
- the speech recognition apparatus 1 of FIG. 1 is an apparatus capable of recognizing an input speech, which is a speech input to the speech recognition apparatus 1. That is, the speech recognition device 1 is a device that selects, from the vocabularies recognized by speech recognition, the vocabulary that is acoustically and linguistically most probable for the user's utterance as the recognition vocabulary.
- An example of such an apparatus is disclosed in, for example, Japanese Patent Application Laid-Open No. 9-50291.
- in the following, the input voice is described as voice data indicating the strength (amplitude) and pitch (frequency) of the voice.
- the speech recognition apparatus 1 of FIG. 1 includes an acquisition unit 11 and a control unit 12.
- the acquisition unit 11 acquires a situation related to recognition of input speech.
- a situation related to recognition of input speech may be referred to as a "recognition related situation".
- based on the recognition related situation acquired by the acquisition unit 11, the control unit 12 performs control to use, for recognition of the input speech, at least one speech segment determination unit from among a plurality of predetermined speech segment determination units and at least one speech recognition processing unit from among a plurality of predetermined speech recognition processing units.
- in the following, a plurality of predetermined speech segment determination units may be described simply as "a plurality of speech segment determination units", and a plurality of predetermined speech recognition processing units may be described simply as "a plurality of speech recognition processing units".
- Each of the plurality of voice segment determination units determines a voice segment, which is a period in which voice is recognized, and a non-voice segment, which is a period in which voice is not recognized. Note that at least one of the plurality of voice segment determination units is provided, for example, in at least one of the voice recognition device 1 and a server that can communicate with the voice recognition device 1 wirelessly or the like.
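The patent does not specify how a voice segment determination unit decides where a voice segment begins and ends, so the following is only an illustrative sketch: a hypothetical energy-threshold detector that splits sampled audio into voice and non-voice segments. The frame length and threshold values are assumptions.

```python
# Hypothetical sketch of a voice segment determination unit. The patent does
# not fix an algorithm; a simple per-frame energy threshold is assumed here.
def determine_voice_segments(samples, frame_len=160, threshold=500.0):
    """Return (start_frame, end_frame) pairs of detected voice segments."""
    segments = []
    in_voice = False
    start = 0
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len  # mean squared amplitude
        if energy >= threshold and not in_voice:
            in_voice, start = True, i          # voice segment begins
        elif energy < threshold and in_voice:
            in_voice = False
            segments.append((start, i))        # voice segment ends
    if in_voice:
        segments.append((start, n_frames))     # voice continued to the end
    return segments
```

Frames whose energy stays below the threshold form the non-voice segments between the returned pairs.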
- Each of the plurality of voice recognition processing units is configured to be able to perform recognition processing on the voice in the voice sections determined by the plurality of voice section determination units. That is, each of the plurality of voice recognition processing units extracts a feature included in the voice within the voice section determined by at least one of the plurality of voice section determination units, and obtains a vocabulary or word based on the feature as the recognition vocabulary, which is the recognition result.
- at least one of the plurality of voice recognition processing units is provided, for example, in at least one of the voice recognition device 1 and a server that can communicate with the voice recognition device 1 by wireless or the like.
- <Summary of Embodiment 1> According to the speech recognition apparatus 1 according to the first embodiment described above, at least one speech segment determination unit and at least one speech recognition processing unit are used for recognition of the input speech based on the situation related to recognition of the input speech. This makes it possible to perform speech segment determination and recognition processing suitable for the situation related to recognition of the input speech, so that the accuracy of speech recognition can be enhanced.
- FIG. 2 is a block diagram showing a configuration of the speech recognition device 1 according to Embodiment 2 of the present invention.
- components that are the same as or similar to the components described above are designated by the same reference numerals, and the description below focuses mainly on the differing components.
- FIG. 2 also shows a server 6 capable of communicating with the voice recognition device 1 by radio or the like.
- the speech recognition apparatus 1 of FIG. 2 includes a speech recognition method determination unit 2 and a speech recognition unit 3.
- the speech recognition method determination unit 2 includes a storage unit 13 in addition to the acquisition unit 11 and the control unit 12, which correspond to the acquisition unit 11 and the control unit 12 described in the first embodiment.
- the acquisition unit 11 acquires the communication status between the speech recognition apparatus 1 and the server 6 as a recognition related status.
- the communication status is one of high-quality online, low-quality online, and offline.
- High-quality online is a situation in which communication between the speech recognition device 1 and the server 6 is established and the evaluation value of the communication is equal to or greater than a predetermined threshold.
- the evaluation value of the communication may be, for example, a value that increases as the field strength of the communication increases, a value that increases as the speed of the communication increases, or a value that combines the two.
- Low-quality online is a situation in which communication between the speech recognition device 1 and the server 6 is established but the evaluation value of the communication is less than the threshold.
- Offline is a situation in which communication between the speech recognition device 1 and the server 6 is not established.
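The classification above can be sketched as follows. The evaluation formula (an assumed equal weighting of field strength and communication speed) and the threshold value are illustrative assumptions; the patent leaves both unspecified.

```python
# Sketch of the communication-status classification. The weighting and the
# threshold are assumptions for illustration only.
THRESHOLD = 50.0

def classify_communication(connected, field_strength=0.0, speed=0.0):
    """Return 'high_quality_online', 'low_quality_online', or 'offline'."""
    if not connected:
        return "offline"
    # The evaluation value grows with field strength and communication speed.
    evaluation = 0.5 * field_strength + 0.5 * speed
    if evaluation >= THRESHOLD:
        return "high_quality_online"
    return "low_quality_online"
```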
- the acquisition unit 11 may appropriately store the acquired communication status in the storage unit 13.
- the plurality of voice section determination units 8a and 8b are components equivalent to the plurality of voice section determination units described in the first embodiment, and consist of the first voice section determination unit 8a provided in the voice recognition unit 3 and the second voice section determination unit 8b provided in the server 6.
- the plurality of voice recognition processing units 9a and 9b are components equivalent to the plurality of voice recognition processing units described in the first embodiment, and consist of the first voice recognition processing unit 9a provided in the voice recognition unit 3 and the second voice recognition processing unit 9b provided in the server 6.
- based on the communication status acquired by the acquisition unit 11, the control unit 12 determines one pair consisting of a voice section determination unit and a voice recognition processing unit from among the plurality of voice section determination units 8a and 8b and the plurality of voice recognition processing units 9a and 9b. Then, the control unit 12 performs control to use the determined pair of voice section determination unit and voice recognition processing unit for recognition of the input voice.
- FIG. 3 is a diagram showing discrimination data according to the second embodiment.
- In the discrimination data, one pair of a voice section determination unit and a voice recognition processing unit is associated with each communication status.
- "1" and "2" of the "voice segment determination unit" indicate the first voice segment determination unit 8a and the second voice segment determination unit 8b, respectively, and "1" and "2" of the "voice recognition processing unit" indicate the first voice recognition processing unit 9a and the second voice recognition processing unit 9b, respectively.
- for example, when the communication status is low-quality online, the control unit 12 performs control to use the first voice segment determination unit 8a and the second voice recognition processing unit 9b for recognition of the input speech.
- the voice section determination unit such as the first voice section determination unit 8a and the voice recognition processing unit such as the second voice recognition processing unit 9b may perform a plurality of processes at the same time.
- the voice recognition unit 3 includes a first voice section determination unit 8a which is an off-line voice section determination unit, a first voice recognition processing unit 9a which is an off-line voice recognition processing unit, and a first recognition dictionary storage unit 10a.
- the first recognition dictionary storage unit 10a stores a dictionary used when the first speech recognition processing unit 9a performs speech recognition processing.
- the voice recognition unit 3 appropriately uses the first voice section determination unit 8a and the first voice recognition processing unit 9a for recognition of the input voice based on the control of the control unit 12.
- the server 6 includes a second speech zone determination unit 8b which is an online speech zone judgment unit, a second speech recognition processing unit 9b which is an online speech recognition processing unit, and a second recognition dictionary storage unit 10b.
- the second recognition dictionary storage unit 10 b stores a dictionary used when the second speech recognition processing unit 9 b performs a speech recognition process.
- the server 6 appropriately uses the second voice section determining unit 8b and the second voice recognition processing unit 9b to recognize an input voice based on the control of the control unit 12.
- In general, due to hardware limitations, the first speech section determination unit 8a on the speech recognition device 1 side has lower speech-section determination accuracy than the second speech section determination unit 8b on the server 6 side, but it can perform the determination regardless of the communication status. Similarly, due to hardware limitations, the first speech recognition processing unit 9a on the speech recognition device 1 side generally has a smaller recognizable vocabulary than the second speech recognition processing unit 9b on the server 6 side, but it can perform recognition processing regardless of the communication status.
- the speech recognition apparatus 1 can perform speech recognition suitable for the communication situation as follows.
- when the communication status is high-quality online, the second voice section determination unit 8b and the second voice recognition processing unit 9b are used according to the determination data of FIG. 3, so speech recognition with a large recognizable vocabulary can be performed.
- when the communication status is low-quality online, the first voice section determination unit 8a and the second voice recognition processing unit 9b are used according to the determination data of FIG. 3. As a result, the speech recognition device 1 performs the voice section determination and transmits to the server 6 only the input voice used for the voice recognition processing, that is, the input voice from which the data used for the voice section determination has been removed, so that the server 6 can perform online speech recognition. That is, even if the communication situation is bad, speech recognition by the server 6 can be performed by reducing the amount of input-voice data to be communicated.
- when the communication status is offline, the first voice section determination unit 8a and the first voice recognition processing unit 9a are used according to the determination data of FIG. 3, so speech recognition can be performed regardless of the communication status.
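The discrimination data of FIG. 3 can be represented as a small lookup table, as in this sketch. The tuple values follow the "1"/"2" unit-numbering convention of the figure, and the table contents reflect the three cases described above.

```python
# Sketch of the discrimination data of FIG. 3 as a lookup table:
# (voice_segment_determination_unit, voice_recognition_processing_unit),
# where 1 = on-device unit and 2 = server-side unit.
DISCRIMINATION_DATA = {
    "high_quality_online": (2, 2),  # both on the server: large vocabulary
    "low_quality_online":  (1, 2),  # local segmentation, server recognition
    "offline":             (1, 1),  # everything on the device
}

def select_units(communication_status):
    """Return the pair of units to use for the given communication status."""
    return DISCRIMINATION_DATA[communication_status]
```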
- FIG. 4 is a flowchart showing the operation of the speech recognition apparatus 1 according to the second embodiment. The operation of FIG. 4 is performed as needed.
- In step S1, the acquisition unit 11 acquires an input voice and outputs the input voice to the control unit 12.
- In step S2, the control unit 12 determines whether the intensity of the input voice is equal to or greater than a predetermined threshold. If it is determined that the intensity of the input voice is equal to or greater than the threshold, the process proceeds to step S3. If it is determined that the intensity of the input voice is less than the threshold, the operation of FIG. 4 ends.
- In step S3, the acquisition unit 11 acquires the communication status between the speech recognition device 1 and the server 6, and outputs the communication status to the control unit 12.
- step S4 the control unit 12 determines one set of voice segment determination unit and voice recognition processing unit to be used for recognizing the input voice according to the determination data based on the communication status. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
- step S5 the speech recognition unit 3 and the server 6 recognize the input speech using the set of speech segment determination unit and speech recognition processing unit determined by the control unit 12. Then, the recognition vocabulary as the recognition result is output from the speech recognition device 1. Thereafter, the operation of FIG. 4 ends.
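The flow of FIG. 4 (steps S1 to S5) might be sketched as follows; the function signature, the intensity threshold value, and the string returned in place of a real recognition result are all hypothetical stand-ins for the acquisition unit, control unit, and recognition units described above.

```python
# Sketch of the flow of FIG. 4 (steps S1-S5). The threshold and the returned
# description string are illustrative assumptions.
INTENSITY_THRESHOLD = 0.1

def recognize(input_voice, intensity, communication_status, table):
    # S1: the input voice has been acquired; S2: gate on its intensity.
    if intensity < INTENSITY_THRESHOLD:
        return None                       # below threshold: operation ends
    # S3: the communication status has been acquired.
    # S4: determine the segment-determination / recognition-processing pair.
    segment_unit, recognition_unit = table[communication_status]
    # S5: run the selected pair (represented here by a description string).
    return f"unit{segment_unit}+unit{recognition_unit} recognizes {input_voice!r}"
```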
- FIG. 5 is a block diagram showing the configuration of the speech recognition apparatus 1 according to Embodiment 3 of the present invention.
- components that are the same as or similar to the components described above are designated by the same reference numerals, and the description below focuses mainly on the differing components.
- the speech recognition apparatus 1 of FIG. 5 includes a speech recognition method determination unit 2 and a speech recognition unit 3 as in the speech recognition apparatus 1 (FIG. 2) according to the second embodiment.
- a recognition start button 21, described later, is connected to the speech recognition apparatus 1 of FIG. 5.
- the speech recognition method determination unit 2 includes an acquisition unit 11, a control unit 12, and a storage unit 13, like the speech recognition method determination unit 2 according to the second embodiment.
- the acquisition unit 11 acquires the communication status between the speech recognition apparatus 1 and the server 6 as a recognition related situation.
- the acquisition unit 11 also acquires, as a recognition related situation, the riding condition of the occupant of the host vehicle on which the voice recognition device 1 is mounted.
- the riding condition includes the presence or absence of the driver and the presence or absence of a passenger, that is, an occupant other than the driver.
- for example, the acquisition unit 11 may be configured to determine the riding condition based on whether the seat belts of the own vehicle are in use, or, if a sensor for detecting seating is provided in each seat of the own vehicle, based on the detection results of the sensors.
- the acquisition unit 11 may be configured to capture an image of the interior of the host vehicle and to determine the riding condition by performing image recognition on the image.
- the acquisition unit 11 may be an interface that acquires the result of the determination from an external device provided outside the speech recognition apparatus 1. The acquisition unit 11 may appropriately store the acquired communication status and boarding status in the storage unit 13.
- based on the communication status and the riding condition acquired by the acquisition unit 11, the control unit 12 determines one pair consisting of a voice section determination unit and a voice recognition processing unit from among a plurality of predetermined voice section determination units 8c, 8d, 8e, and 8f and a plurality of predetermined voice recognition processing units 9c, 9d, 9e, and 9f. Then, the control unit 12 performs control to use the determined pair of voice section determination unit and voice recognition processing unit for recognition of the input voice.
- the plurality of voice section determination units 8c to 8f consist of the first voice section determination unit 8c and the second voice section determination unit 8d provided in the voice recognition unit 3, and the third voice section determination unit 8e and the fourth voice section determination unit 8f provided in the server 6.
- the plurality of voice recognition processing units 9c to 9f consist of the first voice recognition processing unit 9c and the second voice recognition processing unit 9d provided in the voice recognition unit 3, and the third voice recognition processing unit 9e and the fourth voice recognition processing unit 9f provided in the server 6.
- FIG. 6 is a diagram showing discrimination data according to the third embodiment.
- In the discrimination data, one pair of a voice section determination unit and a voice recognition processing unit is associated with each combination of the presence or absence of a passenger, the presence or absence of the driver, and the communication status.
- "1" to "4" of the "voice section determination unit" indicate the first voice section determination unit 8c to the fourth voice section determination unit 8f, respectively, and "1" to "4" of the "voice recognition processing unit" indicate the first voice recognition processing unit 9c to the fourth voice recognition processing unit 9f, respectively.
- the voice recognition unit 3 includes a first voice section determination unit 8c, which is an always-offline voice section determination unit, a first voice recognition processing unit 9c, which is an always-offline voice recognition processing unit, a first recognition dictionary storage unit 10c, a second voice section determination unit 8d, which is a normal offline voice section determination unit, a second voice recognition processing unit 9d, which is a normal offline voice recognition processing unit, and a second recognition dictionary storage unit 10d.
- the first and second recognition dictionary storage units 10c and 10d respectively store dictionaries used when the first and second speech recognition processing units 9c and 9d perform speech recognition processing.
- the voice recognition unit 3 appropriately uses the first and second voice section determination units 8c and 8d and the first and second voice recognition processing units 9c and 9d based on the control of the control unit 12 for recognition of the input voice. .
- the server 6 includes a third voice section determination unit 8e, which is an always-online voice section determination unit, a third voice recognition processing unit 9e, which is an always-online voice recognition processing unit, a third recognition dictionary storage unit 10e, a fourth voice section determination unit 8f, which is a normal online voice section determination unit, a fourth voice recognition processing unit 9f, which is a normal online voice recognition processing unit, and a fourth recognition dictionary storage unit 10f.
- the third and fourth recognition dictionary storage units 10e and 10f respectively store dictionaries used when the third and fourth speech recognition processing units 9e and 9f perform speech recognition processing.
- the server 6 appropriately uses the third and fourth speech zone determination units 8e and 8f and the third and fourth speech recognition processing units 9e and 9f for recognition of the input speech based on the control of the control unit 12.
- based on the communication status, the speech recognition apparatus 1 can perform speech recognition suitable for the communication situation, as in the speech recognition apparatus 1 according to the second embodiment.
- the first and third voice section determination units 8c and 8e are voice section determination units that constantly determine a voice section, and the second and fourth voice section determination units 8d and 8f are voice section determination units that determine a voice section in response to a predetermined operation.
- Similarly, the first and third voice recognition processing units 9c and 9e are voice recognition processing units that constantly perform recognition processing on voice, and the second and fourth voice recognition processing units 9d and 9f are voice recognition processing units that perform recognition processing on voice in response to the predetermined operation.
- the predetermined operation is an operation on the recognition start button 21 for starting voice recognition.
- the speech recognition apparatus 1 can perform speech recognition suitable for the riding situation as follows.
- for example, when a passenger is aboard, the constantly operating third voice section determination unit 8e and third voice recognition processing unit 9e are used according to the discrimination data of FIG. 6.
- a passenger who desires voice recognition can perform voice recognition without performing an operation on the recognition start button 21. This is convenient for the passenger especially when the recognition start button 21 is provided at a place away from the passenger.
- On the other hand, when no passenger is aboard and the driver is aboard, the second voice section determination unit 8d or the fourth voice section determination unit 8f, and the second voice recognition processing unit 9d or the fourth voice recognition processing unit 9f, are used according to the discrimination data of FIG. 6. This can suppress unintended speech recognition.
- the discrimination data is not limited to the data set as shown in FIG.
- as for the discrimination data, when no passenger is aboard and the driver is aboard, it may be desirable for the driver to concentrate on driving. Therefore, when no passenger is aboard and the driver is aboard, the first voice section determination unit 8c or the third voice section determination unit 8e, and the first voice recognition processing unit 9c or the third voice recognition processing unit 9e, may be used.
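One way to express discrimination data along the lines of FIG. 6 is a function keyed on the riding condition and the communication status, as in this sketch. The exact pairings are assumptions illustrating the described behaviour: constantly operating ("always") units when a passenger is aboard, operation-triggered ("normal") units otherwise, with unit numbers following the figure's 1-to-4 convention.

```python
# Sketch of discrimination data along the lines of FIG. 6. Unit numbers:
# 1/2 = always/normal offline units, 3/4 = always/normal online units.
# The pairings are assumptions for illustration.
def select_units_emb3(passenger, driver, status):
    """Return (voice_section_unit, recognition_unit) for the situation."""
    always = passenger  # a passenger aboard: no button operation required
    if status == "high_quality_online":
        return (3, 3) if always else (4, 4)   # online units
    return (1, 1) if always else (2, 2)       # offline units
```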
- FIG. 7 is a flowchart showing the operation of the speech recognition apparatus 1 according to the third embodiment.
- the operation in FIG. 7 is the same as the operation (FIG. 4) of the speech recognition apparatus 1 according to the second embodiment, except that step S3 is changed to step S3a.
- In steps S1 and S2, processing similar to that in steps S1 and S2 of FIG. 4 is performed.
- In step S3a, the acquisition unit 11 acquires the communication status between the speech recognition device 1 and the server 6 and outputs the communication status to the control unit 12. Further, the acquisition unit 11 acquires the riding condition of the occupants of the own vehicle on which the voice recognition device 1 is mounted, and outputs the acquired riding condition to the control unit 12.
- step S4 the control unit 12 determines one set of voice section determining unit and voice recognition processing unit to be used for recognizing the input voice according to the determination data based on the communication status and the boarding status. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
- step S5 the same processing as step S5 in FIG. 4 is performed, and the operation in FIG. 7 ends.
- In the above description, the control unit 12 determines one pair of a speech segment determination unit and a speech recognition processing unit in a single determination.
- However, the present invention is not limited to this, and the control unit 12 may determine one pair of a voice section determination unit and a voice recognition processing unit through multiple determinations. For example, based on one of the communication status and the riding condition, the control unit 12 may first narrow the candidates down to several speech segment determination units and several speech recognition processing units from among the plurality of speech segment determination units and the plurality of speech recognition processing units. After that, based on the other of the communication status and the riding condition, the control unit 12 may determine one speech segment determination unit and one speech recognition processing unit from among those candidates. The same applies to the fourth and subsequent embodiments described later.
- the speech segment determination unit and the speech recognition processing unit suitable for the communication status and the boarding status can be used for recognition of input speech.
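The multi-step determination described above might be sketched as a two-stage filter. The candidate sets and the order of the two conditions are assumptions for illustration.

```python
# Sketch of the two-step determination: narrow by one condition, then pick a
# single pair by the other. Candidate pairs are hypothetical, with unit
# numbers following the 1-to-4 convention (1/2 offline, 3/4 online).
def two_step_select(communication_status, passenger_aboard):
    # Step 1: narrow the candidates by communication status.
    if communication_status == "offline":
        candidates = [(1, 1), (2, 2)]    # offline units only
    else:
        candidates = [(3, 3), (4, 4)]    # online units only
    # Step 2: pick one pair by the riding condition
    # ("always" units when a passenger is aboard).
    return candidates[0] if passenger_aboard else candidates[1]
```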
- In the above description, the control unit 12 performs control to use the voice section determination unit and the voice recognition processing unit for recognition of the input voice based on both the communication status and the riding condition acquired by the acquisition unit 11.
- However, the present invention is not limited to this, and the control unit 12 may perform control to use the voice section determination unit and the voice recognition processing unit for recognition of the input voice based on the riding condition acquired by the acquisition unit 11, without considering the communication status.
- FIG. 8 is a block diagram showing a configuration of a speech recognition device 1 according to a fourth embodiment of the present invention.
- Among the constituent elements described in the fourth embodiment, constituent elements that are the same as or similar to the above constituent elements are given the same reference numerals, and the description below focuses mainly on the differing constituent elements.
- the speech recognition apparatus 1 shown in FIG. 8 has the same configuration as the speech recognition apparatus 1 (FIG. 5) according to the third embodiment, with an input device 22 additionally connected.
- an operation for specifying the use of one pair of a speech segment determination unit and a speech recognition processing unit (hereinafter referred to as a "designation operation") is input to the input device 22.
- the speech recognition method determination unit 2 includes an acquisition unit 11, a control unit 12, and a storage unit 13, like the speech recognition method determination unit 2 according to the third embodiment.
- the acquisition unit 11 acquires the communication status and the boarding status as the recognition related situation.
- the designation operation input to the input device 22 is stored in the storage unit 13, and the acquisition unit 11 acquires the history of designation operations (hereinafter referred to as the "operation history") from the storage unit 13.
- when a designation operation is input, the control unit 12 selects, from among the plurality of speech segment determination units 8c to 8f and the plurality of speech recognition processing units 9c to 9f, the set designated by the designation operation, and performs control to use the designated voice segment determination unit and voice recognition processing unit for recognition of the input voice.
- in addition, the control unit 12 determines one set of a voice section determination unit and a voice recognition processing unit from among the plurality of voice section determination units 8c to 8f and the plurality of voice recognition processing units 9c to 9f, based on the communication status, the boarding status, and the operation history acquired by the acquisition unit 11. Then, the control unit 12 performs control to use the determined set of the voice section determination unit and the voice recognition processing unit for recognition of the input voice.
- specifically, the control unit 12 determines whether or not the number of times the same set of a voice section determination unit and a voice recognition processing unit has been designated is equal to or more than a predetermined threshold. When the number of times is determined to be equal to or more than the threshold, the control unit 12 performs control to use the set of the voice section determination unit and the voice recognition processing unit designated by the designation operations in the operation history for recognition of the input voice. On the other hand, when the number of times is determined to be less than the threshold, the control unit 12 performs control to use one set of a voice section determination unit and a voice recognition processing unit for recognition of the input voice based on the communication status and the boarding status acquired by the acquisition unit 11, as in the third embodiment.
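The threshold check on the operation history can be sketched as follows. The threshold value, unit labels, and the shape of the history data are hypothetical assumptions for illustration; the patent does not fix them.

```python
from collections import Counter

THRESHOLD = 3  # hypothetical number of designations required

def choose_units(operation_history, fallback):
    """operation_history: list of (segment_unit, recognition_unit) pairs
    designated by past operations; fallback: the set determined from the
    communication/boarding status as in the third embodiment."""
    if operation_history:
        pair, count = Counter(operation_history).most_common(1)[0]
        if count >= THRESHOLD:
            return pair  # the user's habitual choice is used
    return fallback      # otherwise fall back to the status-based choice

history = [("8c", "9c"), ("8c", "9c"), ("8d", "9d"), ("8c", "9c")]
print(choose_units(history, fallback=("8e", "9e")))  # ('8c', '9c')
print(choose_units([], fallback=("8e", "9e")))       # ('8e', '9e')
```

Here the most frequently designated set is compared against the threshold; with fewer than three designations of any single set, the status-based determination of the third embodiment is used instead.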
- FIG. 9 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fourth embodiment.
- the operation of FIG. 9 is the same as the operation (FIG. 4) of the speech recognition apparatus 1 according to the second embodiment, except that step S3 is changed to step S3b and steps S11 and S12 are added.
- in step S11, the control unit 12 determines whether or not a designation operation has been input to the input device 22. If it is determined that a designation operation has been input, the process proceeds to step S12; if it is determined that no designation operation has been input, the process proceeds to step S1.
- in step S12, the control unit 12 performs control to use the set of the voice segment determination unit and the voice recognition processing unit designated by the designation operation for recognition of the input voice.
- as a result, the speech recognition unit 3 and the server 6 recognize the input speech using the set of the speech segment determination unit and the speech recognition processing unit designated by the designation operation. Thereafter, the operation of FIG. 9 ends.
- in steps S1 and S2, processing similar to that in steps S1 and S2 of FIG. 4 is performed.
- in step S3b, the acquisition unit 11 acquires the communication status, the boarding status, and the operation history, and outputs them to the control unit 12.
- in step S4, the control unit 12 determines, in accordance with the discrimination data, one set of a voice section determination unit and a voice recognition processing unit to be used for recognition of the input voice, based on the communication status, the boarding status, and the operation history. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
- in step S5, the same processing as step S5 in FIG. 4 is performed, and the operation in FIG. 9 ends.
- <Summary of Embodiment 4> According to the speech recognition apparatus 1 of the fourth embodiment as described above, one set of a speech segment determination unit and a speech recognition processing unit is used for recognition of the input speech based on the recognition related situation and the operation history. As a result, a speech segment determination unit and a speech recognition processing unit that match the user's usage tendency can be used for recognition of the input speech.
- FIG. 10 is a block diagram showing the configuration of the speech recognition device 1 according to the fifth embodiment of the present invention.
- components that are the same as or similar to the components described above are given the same reference numerals, and the description below focuses on the differing components.
- like the voice recognition system determination unit 2 according to the third embodiment, the voice recognition system determination unit 2 includes an acquisition unit 11, a control unit 12, and a storage unit 13.
- the acquisition unit 11 acquires the communication status and the boarding status as the recognition related situation.
- the acquisition unit 11 also acquires the recognition result of the speech recognition unit 3 and the recognition result of the server 6. Then, based on the acquired recognition results, the acquisition unit 11 determines whether the voice input in the past, that is, the past input voice, includes a predetermined command.
- the configuration is not limited to this, as long as the acquisition unit 11 is configured to acquire the determination result as to whether or not the predetermined command is included in the past input voice. For example, this determination may be performed by an external device provided outside the voice recognition device 1, and the acquisition unit 11 may acquire the determination result from the external device.
- the predetermined command includes an upper-level command, which is a command for starting execution of a function.
- the upper-level command includes, for example, a command such as "Navigation" for starting execution of a navigation function capable of destination search, route search, and the like.
- lower-level commands, which are commands other than the upper-level command, include the command "destination search" for executing a destination search, the command "route search" for executing a route search, and the like.
- based on the communication status, the boarding status, and the determination result acquired by the acquisition unit 11, the control unit 12 performs control to use one set of a speech section determination unit and a speech recognition processing unit, selected from among the plurality of speech section determination units 8c to 8f and the plurality of speech recognition processing units 9c to 9f, for recognition of the input speech.
- FIG. 11 is a diagram showing part of the discrimination data according to the fifth embodiment.
- in FIG. 11, one of the 12 combinations shown in FIG. 6 is further divided according to whether the determination result includes the upper-level command or a lower-level command, and a voice section determination unit and a voice recognition processing unit are set for each case.
- the voice section determination unit and the voice recognition processing unit are set similarly for the other 11 combinations.
- as the voice section determination unit and the voice recognition processing unit used when the determination result includes the upper-level command, a constant voice section determination unit and a constant voice recognition processing unit are set.
- as the voice section determination unit and the voice recognition processing unit used when the determination result includes a lower-level command, a normal voice section determination unit and a normal voice recognition processing unit are set.
- with the discrimination data as shown in FIG. 11, constant voice recognition is performed only on the relatively small number of upper-level commands, so that voice recognition against the user's intention can be suppressed. It should be noted that the user may input the voices of a plurality of lower-level commands following the voice of the upper-level command; therefore, once the determination result is determined to include the upper-level command, the determination result that a lower-level command is included may be invalidated for a certain period of time.
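The selection based on the upper-/lower-level command determination can be sketched as follows. The table fragment, the command classification rule, and the unit labels are hypothetical stand-ins; in particular, treating only the literal utterance "Navigation" as an upper-level command is a simplification for illustration.

```python
# Hypothetical fragment of the discrimination data of FIG. 11: for one of the
# 12 communication/boarding combinations, the command level selects between
# constant (always-listening) units and normal units. Labels are illustrative.
DISCRIMINATION_DATA = {
    # (comm_status, boarding_status, command_level) -> (segment_unit, recog_unit)
    ("online", "driver_only", "upper"): ("constant_8e", "constant_9e"),
    ("online", "driver_only", "lower"): ("normal_8c", "normal_9c"),
}

def classify_command(text):
    """Rough stand-in for the upper-command determination: "Navigation"
    starts execution of a function, so it is treated as upper-level here."""
    return "upper" if text == "Navigation" else "lower"

def select_units(comm, boarding, past_utterance):
    level = classify_command(past_utterance)
    return DISCRIMINATION_DATA[(comm, boarding, level)]

print(select_units("online", "driver_only", "Navigation"))
print(select_units("online", "driver_only", "destination search"))
```

An upper-level command in the past input thus routes subsequent recognition to the constant units, while lower-level commands use the normal units.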
- FIG. 12 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fifth embodiment.
- the operation in FIG. 12 is the same as the operation (FIG. 4) of the speech recognition apparatus 1 according to the second embodiment, except that step S3 is changed to step S3c.
- in steps S1 and S2, processing similar to that in steps S1 and S2 of FIG. 4 is performed.
- in step S3c, the acquisition unit 11 acquires the communication status, the boarding status, and the determination result regarding the upper-level command, and outputs them to the control unit 12.
- in step S4, the control unit 12 determines, in accordance with the discrimination data, one set of a voice section determination unit and a voice recognition processing unit to be used for recognition of the input voice, based on the communication status, the boarding status, and the determination result. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
- in step S5, the same processing as step S5 in FIG. 4 is performed, and the operation in FIG. 12 ends.
- <Summary of Embodiment 5> According to the speech recognition device 1 of the fifth embodiment as described above, one set of a speech segment determination unit and a speech recognition processing unit is used for recognition of the input speech based on the recognition related situation and the determination result as to whether or not the upper-level command is included. As a result, a speech segment determination unit and a speech recognition processing unit that match the user's usage tendency can be used for recognition of the input speech.
- <Embodiment 6> The block configuration of the speech recognition device 1 according to the sixth embodiment of the present invention is the same as the block configuration (FIG. 5) of the speech recognition device 1 according to the third embodiment.
- components that are the same as or similar to the components described above are given the same reference numerals, and the description below focuses on the differing components.
- like the voice recognition system determination unit 2 according to the third embodiment, the voice recognition system determination unit 2 includes an acquisition unit 11, a control unit 12, and a storage unit 13.
- the acquisition unit 11 acquires the communication status and the boarding status as the recognition related situation.
- the acquisition unit 11 also acquires the usage state of the hardware of the speech recognition device 1.
- the hardware of the speech recognition device 1 is the hardware of a device, such as a navigation device, to which the speech recognition device 1 is applied, and the usage state of the hardware includes the usage rate of the hardware.
- the usage rate of hardware includes, for example, a usage rate of a central processing unit (CPU), a usage rate of a memory, and the like.
- based on the communication status, the boarding status, and the usage state acquired by the acquisition unit 11, the control unit 12 performs control to use one set of a speech segment determination unit and a speech recognition processing unit, selected from among the plurality of speech segment determination units 8c to 8f and the plurality of speech recognition processing units 9c to 9f, for recognition of the input voice.
- the storage unit 13 stores discrimination data used when the control unit 12 performs discrimination.
- FIG. 13 is a diagram showing a part of discrimination data according to the sixth embodiment.
- in FIG. 13, one of the 12 combinations shown in FIG. 6 is further divided according to whether the usage rate included in the usage state is equal to or more than a predetermined threshold, and a voice section determination unit and a voice recognition processing unit are set for each case; the voice section determination unit and the voice recognition processing unit are set similarly for the other 11 combinations.
- as the voice segment determination unit and the voice recognition processing unit used when the usage rate included in the usage state is equal to or more than the predetermined threshold, a normal voice segment determination unit and a normal voice recognition processing unit are set; as the voice segment determination unit and the voice recognition processing unit used when the usage rate is less than the threshold, a constant voice segment determination unit and a constant voice recognition processing unit are set.
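The usage-rate threshold branch can be sketched as follows. The threshold value, the choice of CPU and memory as the monitored rates, and the unit labels are hypothetical assumptions for illustration.

```python
USAGE_THRESHOLD = 0.8  # hypothetical usage-rate threshold (80%)

def select_by_usage(cpu_usage, memory_usage):
    """High hardware load -> normal units; spare capacity -> constant
    (always-listening) units. All labels are illustrative."""
    if max(cpu_usage, memory_usage) >= USAGE_THRESHOLD:
        return ("normal_segment_unit", "normal_recognition_unit")
    return ("constant_segment_unit", "constant_recognition_unit")

print(select_by_usage(cpu_usage=0.9, memory_usage=0.4))  # loaded -> normal
print(select_by_usage(cpu_usage=0.3, memory_usage=0.4))  # idle -> constant
```

Taking the maximum of the two rates is one possible way to reduce several usage rates to a single comparison; the patent leaves the exact rule open.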
- FIG. 14 is a flowchart showing the operation of the speech recognition apparatus 1 according to the sixth embodiment.
- the operation in FIG. 14 is the same as the operation (FIG. 4) of the speech recognition apparatus 1 according to the second embodiment, except that step S3 is changed to step S3d.
- in steps S1 and S2, processing similar to that in steps S1 and S2 of FIG. 4 is performed.
- in step S3d, the acquisition unit 11 acquires the communication status, the boarding status, and the usage state of the hardware, and outputs them to the control unit 12.
- in step S4, the control unit 12 determines, in accordance with the discrimination data, one set of a voice section determination unit and a voice recognition processing unit to be used for recognition of the input voice, based on the communication status, the boarding status, and the usage state of the hardware. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
- in step S5, the same processing as step S5 of FIG. 4 is performed, and the operation of FIG. 14 ends.
- <Summary of Embodiment 6> According to the speech recognition apparatus 1 of the sixth embodiment as described above, one set of a speech segment determination unit and a speech recognition processing unit is used for recognition of the input speech based on the recognition related situation and the usage state of the hardware. As a result, a speech segment determination unit and a speech recognition processing unit suited to the usage state of the hardware can be used for recognition of the input speech.
- in the first embodiment, the control unit 12 performs control to use one set of a speech segment determination unit and a speech recognition processing unit for recognition of the input speech based on the communication status acquired by the acquisition unit 11, but the present invention is not limited to this. For example, as illustrated in FIG. 15, when the communication status acquired by the acquisition unit 11 is low-quality online, the control unit 12 may perform control to use the first speech segment determination unit 8a and the second speech segment determination unit 8b in parallel, and to use the first speech recognition processing unit 9a and the second speech recognition processing unit 9b in parallel. That is, based on the communication status acquired by the acquisition unit 11, the control unit 12 may perform control to use a combination of a plurality of speech segment determination units and a combination of a plurality of speech recognition processing units for recognition of the input speech.
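Using two determination/recognition pairs in parallel can be sketched as follows. The two recognizers here are hypothetical stand-ins with canned results, and picking the higher-confidence result is only one possible way to combine parallel outputs; the patent does not specify the combination rule.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical recognizers standing in for the pairs (8a, 9a) and (8b, 9b);
# each returns (recognized_text, confidence_score). Canned results for
# illustration only.
def recognizer_a(audio):
    return ("set destination", 0.62)

def recognizer_b(audio):
    return ("set destination home", 0.83)

def recognize_in_parallel(audio):
    """Run both segment-determination/recognition pairs in parallel and
    keep the more confident result (one possible combination rule)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(r, audio) for r in (recognizer_a, recognizer_b)]
        return max((f.result() for f in futures), key=lambda r: r[1])

print(recognize_in_parallel(b"...")[0])  # the higher-confidence text wins
```

Running both pairs concurrently is what makes this usable under low-quality online conditions: a slow or degraded server-side result does not block the local one.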
- in short, the control unit 12 may perform control to use at least one speech segment determination unit and at least one speech recognition processing unit for recognition of the input speech. The same applies to the third to sixth embodiments and to the designation operation of the fourth embodiment.
- the fourth to sixth embodiments may be combined as appropriate. That is, the acquisition unit 11 may acquire the recognition related situation, and may also acquire at least one of the operation history, the determination result as to whether or not the upper-level command is included, and the usage state of the hardware. Then, the control unit 12 may perform control to use at least one voice segment determination unit and at least one voice recognition processing unit for recognition of the input voice, based on the recognition related situation acquired by the acquisition unit 11 and at least one of the operation history, the determination result, and the usage state acquired by the acquisition unit 11.
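One way such a combined determination could be structured is sketched below. The priority order among the three optional signals, the threshold values, and the unit labels are all hypothetical assumptions; the patent only states that the signals may be combined as appropriate.

```python
def determine(comm_status, boarding_status, operation_history=None,
              has_upper_command=None, hw_usage=None):
    """Hypothetical combined determination: the recognition related situation
    is always used, and any of the optional signals of the fourth to sixth
    embodiments further refines the choice. Rules are illustrative only."""
    # Embodiment 4: a sufficiently frequent user designation takes priority.
    if operation_history and len(operation_history) >= 3:
        return operation_history[-1]
    # Embodiment 6: under heavy hardware load, prefer the normal units.
    if hw_usage is not None and hw_usage >= 0.8:
        return ("normal_segment_unit", "normal_recognition_unit")
    # Embodiment 5: an upper-level command selects constant recognition.
    if has_upper_command:
        return ("constant_segment_unit", "constant_recognition_unit")
    # Otherwise fall back to the status-based choice of the third embodiment.
    return ("status_segment_unit_" + comm_status,
            "status_recognition_unit_" + boarding_status)

print(determine("online", "driver_only", hw_usage=0.9))
```

Any other priority order is equally consistent with the text; the sketch only shows that the signals compose into a single determination.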
- in the following, the acquisition unit 11 and the control unit 12 of FIG. 1 in the above-described speech recognition apparatus 1 are referred to as the "acquisition unit 11 and the like".
- the acquisition unit 11 and the like are realized by the processing circuit 81 shown in FIG. That is, the processing circuit 81 includes the acquisition unit 11 that acquires the recognition related situation, and the control unit 12 that, based on the recognition related situation acquired by the acquisition unit 11, performs control to use at least one speech segment determination unit and at least one speech recognition processing unit, from among the plurality of predetermined speech segment determination units and the plurality of predetermined speech recognition processing units, for recognition of the input speech.
- Dedicated hardware may be applied to the processing circuit 81, or a processor that executes a program stored in a memory may be applied.
- the processor corresponds to, for example, a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor), and the like.
- when the processing circuit 81 is dedicated hardware, the processing circuit 81 corresponds to, for example, a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination thereof.
- the functions of the units such as the acquisition unit 11 may each be realized by separate processing circuits, or the functions of the units may be collectively realized by one processing circuit.
- when the processing circuit 81 is a processor, the functions of the acquisition unit 11 and the like are realized by a combination with software and the like.
- the software and the like correspond to, for example, software, firmware, or software and firmware.
- Software and the like are described as a program and stored in the memory 83.
- the processor 82 applied to the processing circuit 81 realizes the functions of the respective units by reading and executing the program stored in the memory 83. That is, the speech recognition device 1 includes the memory 83 for storing a program that, when executed by the processing circuit 81, results in the execution of the steps of acquiring the recognition related situation and performing control, based on the acquired recognition related situation, to use at least one voice segment determination unit and at least one voice recognition processing unit, from among the plurality of predetermined voice segment determination units and the plurality of predetermined voice recognition processing units, for recognition of the input voice.
- this program causes a computer to execute the procedure and method of the acquisition unit 11 and the like.
- here, the memory 83 corresponds to, for example, a non-volatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory), an HDD (Hard Disk Drive), a magnetic disk, a flexible disk, an optical disc, a compact disc, a mini disc, a DVD (Digital Versatile Disc), a drive device thereof, or any storage medium to be used in the future.
- however, the present invention is not limited to this; a part of the acquisition unit 11 and the like may be realized by dedicated hardware, and another part may be realized by software or the like.
- for example, the function of the acquisition unit 11 can be realized by the processing circuit 81 as dedicated hardware, a receiver, and the like, while the functions of the remaining units can be realized by the processing circuit 81 as the processor 82 reading and executing the program stored in the memory 83.
- as described above, the processing circuit 81 can realize each of the functions described above by hardware, software, or the like, or a combination thereof.
- the voice recognition device 1 described above can also be applied to a speech recognition system constructed as a system by appropriately combining a navigation device such as a PND (Portable Navigation Device), a communication terminal including a portable terminal such as a mobile phone, a smartphone, or a tablet, the functions of applications installed in at least one of the navigation device and the communication terminal, and a server.
- in this case, each function or each component of the speech recognition apparatus 1 described above may be distributed among the devices constituting the system, or may be concentrated in any one of the devices.
- FIG. 18 is a block diagram showing the configuration of the server 91 according to the present modification.
- the server 91 of FIG. 18 includes a communication unit 91a and a control unit 91b, and can perform wireless communication with the navigation device 93 of the vehicle 92.
- the communication unit 91a, which is an acquisition unit, wirelessly communicates with the navigation device 93 to receive the recognition related situation.
- the control unit 91b has a function similar to that of the control unit 12 of FIG. 1, realized by a processor (not shown) of the server 91 executing a program stored in a memory (not shown) of the server 91, or the like. That is, the control unit 91b determines at least one voice segment determination unit and at least one voice recognition processing unit based on the recognition related situation received by the communication unit 91a, and transmits the determination result to the navigation device 93.
- with the server 91 configured as described above, the same effect as that of the speech recognition device 1 described in the first embodiment can be obtained.
- FIG. 19 is a block diagram showing a configuration of communication terminal 96 according to the present modification.
- the communication terminal 96 of FIG. 19 includes a communication unit 96a similar to the communication unit 91a and a control unit 96b similar to the control unit 91b, and can communicate wirelessly with the navigation device 98 of the vehicle 97.
- to the communication terminal 96, a mobile terminal carried by the driver of the vehicle 97, such as a mobile phone, a smartphone, or a tablet, is applied.
- with the communication terminal 96 configured as described above, the same effect as that of the speech recognition device 1 described in the first embodiment can be obtained.
- the embodiments and the modifications can be freely combined, and each embodiment and each modification can be appropriately modified or omitted.
- Reference Signs List: 1 voice recognition apparatus, 6 server, 8a to 8f voice section determination unit, 9a to 9f voice recognition processing unit, 11 acquisition unit, 12 control unit, 21 recognition start button.
Abstract
The purpose of the present invention is to provide a technique capable of enhancing the accuracy of speech recognition. This speech recognition device is provided with an acquisition unit and a control unit. On the basis of a status acquired by the acquisition unit, the control unit performs control so as to use, for recognition of input speech, at least one of a plurality of speech segment determination units that are determined in advance so as to be able to determine a speech segment, which is a period for recognizing speech, and at least one of a plurality of speech recognition processing units that are determined in advance so as to be able to perform a process of recognizing the speech in the speech segment determined by the speech segment determination units.
Description
The present invention relates to a speech recognition apparatus capable of recognizing input speech, which is speech that has been input, and to a speech recognition method in the speech recognition apparatus.
In recent years, various techniques have been proposed for a speech recognition apparatus that recognizes speech. For example, Patent Document 1 proposes a technique for switching between a push talk mode and a hands free mode based on the position of a speaker in a vehicle cabin. The push-to-talk mode is a mode for recognizing voice when the button switch is pressed, and the hands-free mode is a mode for recognizing voice regardless of the pressing of the button switch.
A speech recognition apparatus performs speech recognition processing on the speech in a speech segment after determining the speech segment, which is a period in which speech is recognized, and the non-speech segment, which is a period in which speech is not recognized. However, in the prior art, the processing is not switched, in accordance with the speaker's position or the usage situation, to an appropriate combination from among a plurality of speech segment determinations and a plurality of speech recognition processes, so there is a problem that the accuracy of speech recognition is not sufficient.
Therefore, the present invention has been made in view of the above-described problems, and an object of the present invention is to provide a technology capable of enhancing the accuracy of speech recognition.
A speech recognition apparatus according to the present invention is a speech recognition apparatus capable of recognizing input speech, which is speech that has been input, and includes: an acquisition unit that acquires a situation related to recognition of the input speech; and a control unit that, based on the situation acquired by the acquisition unit, performs control to use, for recognition of the input speech, at least one speech segment determination unit from among a plurality of predetermined speech segment determination units capable of determining a speech segment, which is a period in which speech is recognized, and at least one speech recognition processing unit from among a plurality of predetermined speech recognition processing units capable of performing processing of recognizing the speech in the speech segment determined by the speech segment determination units.
According to the present invention, control is performed to use at least one speech segment determination unit and at least one speech recognition processing unit for recognition of an input speech based on a situation related to recognition of the input speech. This can improve the accuracy of speech recognition.
The objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description and the accompanying drawings.
<Embodiment 1>
Hereinafter, the voice recognition device according to the first embodiment of the present invention is described as being mounted on a vehicle, and the vehicle of interest is referred to as the "own vehicle". This voice recognition device can be applied to, for example, a navigation device mounted on the own vehicle.
FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus 1 according to the first embodiment. The speech recognition apparatus 1 of FIG. 1 is an apparatus capable of recognizing input speech, which is speech input to the speech recognition apparatus 1. That is, the speech recognition apparatus 1 is an apparatus that selects, from among the vocabulary recognized by speech recognition, the recognition vocabulary that is acoustically and linguistically most probable as the user's utterance. An example of such an apparatus is disclosed in, for example, Japanese Patent Application Laid-Open No. 9-50291. In the following, the input speech is described as speech data indicating the strength (amplitude) and pitch (frequency) of the speech.
The speech recognition apparatus 1 of FIG. 1 includes an acquisition unit 11 and a control unit 12.
The acquisition unit 11 acquires a situation related to recognition of input speech. In the following description, a situation related to recognition of input speech may be referred to as "recognition related situation".
The control unit 12 performs control to use at least one speech segment determination unit and at least one speech recognition processing unit, from among a plurality of predetermined speech segment determination units and a plurality of predetermined speech recognition processing units, for recognition of the input speech, based on the recognition related situation acquired by the acquisition unit 11. In the following description, the plurality of predetermined speech segment determination units may be written simply as the "plurality of speech segment determination units", and the plurality of predetermined speech recognition processing units simply as the "plurality of speech recognition processing units".
Each of the plurality of voice segment determination units determines a voice segment that is a period for recognizing voice and a non-voice segment that is a period for not recognizing voice. Note that at least one of the plurality of voice section determination units is provided, for example, in at least one of the voice recognition device 1 and a server that can communicate with the voice recognition device 1 by wireless or the like.
Each of the plurality of voice recognition processing units is configured to be able to recognize the speech within a voice segment determined by the plurality of voice segment determination units. That is, each of the plurality of voice recognition processing units extracts features contained in the speech within a voice segment determined by at least one of the plurality of voice segment determination units, and obtains, based on those features, a vocabulary item or word as the recognition vocabulary that constitutes the recognition result. At least one of the plurality of voice recognition processing units is provided, for example, in at least one of the speech recognition apparatus 1 and a server capable of communicating with the speech recognition apparatus 1 wirelessly or otherwise.
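The relationship between the acquisition unit, the control unit, and the two sets of predetermined units can be sketched as follows. All class and method names are illustrative stand-ins, and the selection logic is reduced to a plain mapping; the patent does not fix any of these details:

```python
class AcquisitionUnit:
    """Acquires the recognition-related situation from some source,
    here abstracted as any zero-argument callable (e.g. a link probe)."""
    def __init__(self, situation_source):
        self._source = situation_source

    def acquire(self):
        return self._source()


class ControlUnit:
    """Selects which voice segment determination unit and which voice
    recognition processing unit to use, based on the acquired situation."""
    def __init__(self, vad_units, asr_units, selection_rule):
        self.vad_units = vad_units            # predetermined voice segment determination units
        self.asr_units = asr_units            # predetermined voice recognition processing units
        self.selection_rule = selection_rule  # situation -> (vad key, asr key)

    def select(self, situation):
        vad_key, asr_key = self.selection_rule[situation]
        return self.vad_units[vad_key], self.asr_units[asr_key]
```

With a rule such as `{"offline": ("device", "device")}`, the control unit would hand back the device-side pair whenever the acquired situation is "offline".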
<Summary of Embodiment 1>
According to the speech recognition apparatus 1 of the first embodiment described above, control is performed to use at least one voice segment determination unit and at least one voice recognition processing unit for recognition of the input speech, based on the situation related to recognition of the input speech. Voice segment determination and recognition processing suited to that situation can therefore be performed, so the accuracy of speech recognition can be enhanced.
<Embodiment 2>
FIG. 2 is a block diagram showing the configuration of the speech recognition apparatus 1 according to Embodiment 2 of the present invention. Among the constituent elements described in this second embodiment, those that are the same as or similar to the constituent elements described above are given the same reference numerals, and the description below focuses on the differences. FIG. 2 also shows a server 6 capable of communicating with the speech recognition apparatus 1 wirelessly or otherwise.
The speech recognition apparatus 1 of FIG. 2 includes a speech recognition method determination unit 2 and a speech recognition unit 3.
<Speech recognition method determination unit>
The speech recognition method determination unit 2 includes a storage unit 13 in addition to an acquisition unit 11 and a control unit 12 corresponding to the acquisition unit 11 and the control unit 12 described in the first embodiment.
The acquisition unit 11 acquires, as the recognition-related situation, the communication status between the speech recognition apparatus 1 and the server 6. The communication status is one of high-quality online, low-quality online, and offline.
High-quality online is the status in which communication between the speech recognition apparatus 1 and the server 6 is taking place and the evaluation value of that communication is at or above a predetermined threshold. The evaluation value of the communication may be, for example, a value that increases as the radio signal strength of the communication increases, a value that increases as the communication speed increases, or a value combining both. Low-quality online is the status in which communication between the speech recognition apparatus 1 and the server 6 is taking place and the evaluation value of that communication is below the threshold. Offline is the status in which no communication between the speech recognition apparatus 1 and the server 6 is taking place. The acquisition unit 11 may store the acquired communication status in the storage unit 13 as appropriate.
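This three-way classification can be sketched directly; the computation of the evaluation value is kept abstract, since the text allows signal strength, speed, or a combination of the two:

```python
def classify_communication_status(connected, evaluation_value, threshold):
    """Map the link state to the three communication statuses of Embodiment 2.
    `evaluation_value` stands in for whatever combination of signal strength
    and speed the implementation chooses."""
    if not connected:
        return "offline"
    if evaluation_value >= threshold:
        return "high-quality online"
    return "low-quality online"
```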
The plurality of voice segment determination units 8a and 8b are constituent elements corresponding to the plurality of voice segment determination units described in the first embodiment, and include a first voice segment determination unit 8a provided in the speech recognition unit 3 and a second voice segment determination unit 8b provided in the server 6. The plurality of voice recognition processing units 9a and 9b are constituent elements corresponding to the plurality of voice recognition processing units described in the first embodiment, and include a first voice recognition processing unit 9a provided in the speech recognition unit 3 and a second voice recognition processing unit 9b provided in the server 6.
Based on the communication status acquired by the acquisition unit 11, the control unit 12 determines one pair consisting of a voice segment determination unit and a voice recognition processing unit from among the plurality of voice segment determination units 8a and 8b and the plurality of voice recognition processing units 9a and 9b. The control unit 12 then performs control to use the determined pair for recognition of the input speech.
The storage unit 13 stores discrimination data used when the control unit 12 performs this determination. FIG. 3 is a diagram showing the discrimination data according to the second embodiment. In the discrimination data of FIG. 3, one pair consisting of a voice segment determination unit and a voice recognition processing unit is associated with each communication status. In FIG. 3, "1" and "2" in the "voice segment determination unit" column denote the first voice segment determination unit 8a and the second voice segment determination unit 8b, respectively, and "1" and "2" in the "voice recognition processing unit" column denote the first voice recognition processing unit 9a and the second voice recognition processing unit 9b, respectively.
With the discrimination data of FIG. 3 stored in the storage unit 13, when the communication status acquired by the acquisition unit 11 is low-quality online, for example, the control unit 12 performs control to use the first voice segment determination unit 8a and the second voice recognition processing unit 9b for recognition of the input speech. A voice segment determination unit such as the first voice segment determination unit 8a and a voice recognition processing unit such as the second voice recognition processing unit 9b may each perform a plurality of processes simultaneously.
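The pairings that the surrounding text attributes to FIG. 3 can be written as a literal lookup table, where 1 denotes the unit on the apparatus side and 2 the unit on the server side (the table shape is a sketch, not a reproduction of the drawing):

```python
# Discrimination data in the spirit of FIG. 3: communication status ->
# (voice segment determination unit, voice recognition processing unit).
DISCRIMINATION_DATA = {
    "high-quality online": (2, 2),  # determine and recognize on the server
    "low-quality online":  (1, 2),  # determine on the apparatus, recognize on the server
    "offline":             (1, 1),  # determine and recognize on the apparatus
}

def discriminate(status):
    """Return the pair of unit indices to use for the given status."""
    return DISCRIMINATION_DATA[status]
```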
<Speech recognition unit and server>
The speech recognition unit 3 includes a first voice segment determination unit 8a serving as an offline voice segment determination unit, a first voice recognition processing unit 9a serving as an offline voice recognition processing unit, and a first recognition dictionary storage unit 10a. The first recognition dictionary storage unit 10a stores a dictionary used when the first voice recognition processing unit 9a performs speech recognition processing. Based on the control of the control unit 12, the speech recognition unit 3 uses the first voice segment determination unit 8a and the first voice recognition processing unit 9a for recognition of the input speech as appropriate.
The server 6 includes a second voice segment determination unit 8b serving as an online voice segment determination unit, a second voice recognition processing unit 9b serving as an online voice recognition processing unit, and a second recognition dictionary storage unit 10b. The second recognition dictionary storage unit 10b stores a dictionary used when the second voice recognition processing unit 9b performs speech recognition processing. Based on the control of the control unit 12, the server 6 uses the second voice segment determination unit 8b and the second voice recognition processing unit 9b for recognition of the input speech as appropriate.
In general, owing to hardware limitations, the first voice segment determination unit 8a on the speech recognition apparatus 1 side determines voice segments less accurately than the second voice segment determination unit 8b on the server 6 side, but it can perform the determination regardless of the communication status. Likewise, owing to hardware limitations, the first voice recognition processing unit 9a on the speech recognition apparatus 1 side can generally recognize fewer vocabulary items than the second voice recognition processing unit 9b on the server 6 side, but it can perform recognition processing regardless of the communication status.
From this property and the discrimination data of FIG. 3, the speech recognition apparatus 1 according to the second embodiment can perform speech recognition suited to the communication status, as follows.
For example, when the communication status is high-quality online, the second voice segment determination unit 8b and the second voice recognition processing unit 9b are used according to the discrimination data of FIG. 3, so speech recognition with high determination accuracy and a large recognizable vocabulary can be performed.
When the communication status is low-quality online, the first voice segment determination unit 8a and the second voice recognition processing unit 9b are used according to the discrimination data of FIG. 3. The voice segment determination is thus performed on the speech recognition apparatus 1 side, and only the input speech from which the data used for the voice segment determination has been removed, that is, the input speech containing only the data used for recognition processing, is transmitted to the server 6, enabling online speech recognition. In other words, even when the communication status is poor, recognition by the server 6 can still be performed because the amount of input speech data to be communicated is reduced.
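The data reduction described above can be illustrated as trimming the input to the determined voice segments before transmission; this is only a sketch, since the actual framing and encoding of the transmitted speech are not specified:

```python
def trim_to_voice_segments(samples, segments):
    """Keep only the samples that fall inside the determined voice segments,
    so the non-voice data already consumed by the apparatus-side voice
    segment determination is not transmitted to the server."""
    trimmed = []
    for start, end in segments:
        trimmed.extend(samples[start:end])
    return trimmed
```

With segments `[(1, 3), (5, 6)]` over a six-sample input, only three samples need to be sent, which is the bandwidth saving the text describes.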
When the communication status is offline, the first voice segment determination unit 8a and the first voice recognition processing unit 9a are used according to the discrimination data of FIG. 3, so speech recognition unaffected by the communication status can be performed.
<Operation>
FIG. 4 is a flowchart showing the operation of the speech recognition apparatus 1 according to the second embodiment. The operation of FIG. 4 is performed as needed.
First, in step S1, the acquisition unit 11 acquires the input speech and outputs it to the control unit 12.
In step S2, the control unit 12 determines whether the strength of the input speech is at or above a predetermined threshold. If the strength of the input speech is at or above the threshold, the process proceeds to step S3; if it is below the threshold, the operation of FIG. 4 ends.
In step S3, the acquisition unit 11 acquires the communication status between the speech recognition apparatus 1 and the server 6 and outputs it to the control unit 12.
In step S4, the control unit 12 determines, according to the discrimination data and based on the communication status, the pair consisting of a voice segment determination unit and a voice recognition processing unit to be used for recognition of the input speech. At this point, the control unit 12 outputs the input speech to the speech recognition unit 3 or transmits it to the server 6, as appropriate.
In step S5, the speech recognition unit 3 and the server 6 recognize the input speech using the pair of voice segment determination unit and voice recognition processing unit determined by the control unit 12. The recognition vocabulary obtained as the recognition result is then output from the speech recognition apparatus 1, and the operation of FIG. 4 ends.
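Steps S2, S4, and S5 of the flowchart can be condensed into straight-line code as a sketch. All argument names are illustrative; the acquisition steps S1 and S3 are assumed to have already produced `voice` and `status`, and the units are modeled as plain callables:

```python
def run_recognition(voice, strength, strength_threshold,
                    status, discrimination_data, vad_units, asr_units):
    # S2: end immediately when the input speech is too weak
    if strength < strength_threshold:
        return None
    # S4: determine the pair of units to use from the communication status
    vad_key, asr_key = discrimination_data[status]
    # S5: determine the voice segment, then recognize the speech within it
    segment = vad_units[vad_key](voice)
    return asr_units[asr_key](segment)
```

For instance, with toy units that strip whitespace (segment determination) and uppercase (recognition), an "offline" status routes both steps to the index-1 units.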
<Summary of Embodiment 2>
According to the speech recognition apparatus 1 of the second embodiment described above, a voice segment determination unit and a voice recognition processing unit suited to the communication status between the speech recognition apparatus 1 and the server 6 can be used for recognition of the input speech.
<Embodiment 3>
FIG. 5 is a block diagram showing the configuration of the speech recognition apparatus 1 according to Embodiment 3 of the present invention. Among the constituent elements described in this third embodiment, those that are the same as or similar to the constituent elements described above are given the same reference numerals, and the description below focuses on the differences.
Like the speech recognition apparatus 1 according to the second embodiment (FIG. 2), the speech recognition apparatus 1 of FIG. 5 includes a speech recognition method determination unit 2 and a speech recognition unit 3. A recognition start button 21, described later, is connected to the speech recognition apparatus 1 of FIG. 5.
<Speech recognition method determination unit>
Like the speech recognition method determination unit 2 according to the second embodiment, the speech recognition method determination unit 2 includes an acquisition unit 11, a control unit 12, and a storage unit 13.
Like the acquisition unit 11 according to the second embodiment, the acquisition unit 11 acquires, as a recognition-related situation, the communication status between the speech recognition apparatus 1 and the server 6.
The acquisition unit 11 also acquires, as a recognition-related situation, the boarding status of the occupants of the host vehicle in which the speech recognition apparatus 1 is mounted. The boarding status covers whether a driver is on board and whether a fellow passenger, that is, an occupant other than the driver, is on board.
The acquisition unit 11 may be configured to determine the boarding status based on whether the seat belts of the host vehicle are in use or, when each seat of the host vehicle is provided with a sensor that detects seating, based on the detection results of those sensors. Alternatively, the acquisition unit 11 may be configured to capture an image of the interior of the host vehicle and determine the boarding status by performing image recognition on that image, or it may be an interface that acquires the result of such a determination from an external device provided outside the speech recognition apparatus 1. The acquisition unit 11 may store the acquired communication status and boarding status in the storage unit 13 as appropriate.
Based on the communication status and the boarding status acquired by the acquisition unit 11, the control unit 12 determines one pair consisting of a voice segment determination unit and a voice recognition processing unit from among a plurality of predetermined voice segment determination units 8c, 8d, 8e, and 8f and a plurality of predetermined voice recognition processing units 9c, 9d, 9e, and 9f. The control unit 12 then performs control to use the determined pair for recognition of the input speech.
The plurality of voice segment determination units 8c to 8f include a first voice segment determination unit 8c and a second voice segment determination unit 8d provided in the speech recognition unit 3, and a third voice segment determination unit 8e and a fourth voice segment determination unit 8f provided in the server 6. The plurality of voice recognition processing units 9c to 9f include a first voice recognition processing unit 9c and a second voice recognition processing unit 9d provided in the speech recognition unit 3, and a third voice recognition processing unit 9e and a fourth voice recognition processing unit 9f provided in the server 6.
The storage unit 13 stores discrimination data used when the control unit 12 performs this determination. FIG. 6 is a diagram showing the discrimination data according to the third embodiment. In the discrimination data of FIG. 6, one pair consisting of a voice segment determination unit and a voice recognition processing unit is associated with each combination of fellow-passenger presence, driver presence, and communication status. In FIG. 6, "1" to "4" in the "voice segment determination unit" column denote the first voice segment determination unit 8c to the fourth voice segment determination unit 8f, respectively, and "1" to "4" in the "voice recognition processing unit" column denote the first voice recognition processing unit 9c to the fourth voice recognition processing unit 9f, respectively.
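Only part of the FIG. 6 table is restated in the text: always-on units are used when a fellow passenger is on board without a driver, button-triggered units otherwise, and apparatus-side units when offline. That stated behaviour could be encoded roughly as below; the exact pairings per communication status in the actual FIG. 6 may differ, so this is an assumption-laden sketch:

```python
def discriminate(passenger_on_board, driver_on_board, status):
    """Rough stand-in for the FIG. 6 lookup: returns the index pair
    (voice segment determination unit, voice recognition processing unit)."""
    always_on = passenger_on_board and not driver_on_board
    online = status != "offline"
    if always_on:
        return (3, 3) if online else (1, 1)  # always-on units (server / apparatus)
    return (4, 4) if online else (2, 2)      # button-triggered units (server / apparatus)
```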
<Speech recognition unit and server>
The speech recognition unit 3 includes a first voice segment determination unit 8c serving as an always-on offline voice segment determination unit, a first voice recognition processing unit 9c serving as an always-on offline voice recognition processing unit, a first recognition dictionary storage unit 10c, a second voice segment determination unit 8d serving as a normal offline voice segment determination unit, a second voice recognition processing unit 9d serving as a normal offline voice recognition processing unit, and a second recognition dictionary storage unit 10d. The first and second recognition dictionary storage units 10c and 10d respectively store the dictionaries used when the first and second voice recognition processing units 9c and 9d perform speech recognition processing. Based on the control of the control unit 12, the speech recognition unit 3 uses the first and second voice segment determination units 8c and 8d and the first and second voice recognition processing units 9c and 9d for recognition of the input speech as appropriate.
The server 6 includes a third voice segment determination unit 8e serving as an always-on online voice segment determination unit, a third voice recognition processing unit 9e serving as an always-on online voice recognition processing unit, a third recognition dictionary storage unit 10e, a fourth voice segment determination unit 8f serving as a normal online voice segment determination unit, a fourth voice recognition processing unit 9f serving as a normal online voice recognition processing unit, and a fourth recognition dictionary storage unit 10f. The third and fourth recognition dictionary storage units 10e and 10f respectively store the dictionaries used when the third and fourth voice recognition processing units 9e and 9f perform speech recognition processing. Based on the control of the control unit 12, the server 6 uses the third and fourth voice segment determination units 8e and 8f and the third and fourth voice recognition processing units 9e and 9f for recognition of the input speech as appropriate.
The first and second voice segment determination units 8c and 8d on the speech recognition apparatus 1 side determine voice segments less accurately than the third and fourth voice segment determination units 8e and 8f on the server 6 side, but they can perform the determination regardless of the communication status. Likewise, the first and second voice recognition processing units 9c and 9d on the speech recognition apparatus 1 side can recognize fewer vocabulary items than the third and fourth voice recognition processing units 9e and 9f on the server 6 side, but they can perform recognition processing regardless of the communication status. From this property and the discrimination data of FIG. 6, the speech recognition apparatus 1 according to the third embodiment can, like the speech recognition apparatus 1 according to the second embodiment, perform speech recognition suited to the communication status.
The first and third voice segment determination units 8c and 8e are voice segment determination units that determine voice segments at all times, whereas the second and fourth voice segment determination units 8d and 8f are voice segment determination units that determine voice segments in response to a predetermined operation. Similarly, the first and third voice recognition processing units 9c and 9e are voice recognition processing units that perform recognition processing on speech at all times, whereas the second and fourth voice recognition processing units 9d and 9f are voice recognition processing units that perform recognition processing on speech in response to a predetermined operation. In this third embodiment, the predetermined operation is an operation on the recognition start button 21, which starts speech recognition.
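The difference between the always-on and button-triggered variants can be sketched as a wrapper that only delegates to an underlying determination function after the recognition start button has been operated; the class and method names are hypothetical, and real trigger handling is not specified by the text:

```python
class ButtonTriggeredSegmentDeterminer:
    """Wraps an underlying voice segment determination function and runs it
    only once per operation of the recognition start button; until then,
    all input is treated as a non-voice segment."""
    def __init__(self, determine_segments):
        self._determine_segments = determine_segments
        self._armed = False

    def on_recognition_start_button(self):
        self._armed = True                 # the predetermined operation occurred

    def determine(self, samples):
        if not self._armed:
            return []                      # ignore input until the button is operated
        self._armed = False                # one determination per button operation
        return self._determine_segments(samples)
```

An always-on unit, by contrast, would simply call `determine_segments` on every input without the arming step.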
From this property and the discrimination data of FIG. 6, the speech recognition apparatus 1 according to the third embodiment can perform speech recognition suited to the boarding status, as follows.
For example, when a fellow passenger is on board and the driver is not, the first voice segment determination unit 8c or the third voice segment determination unit 8e and the first voice recognition processing unit 9c or the third voice recognition processing unit 9e are used according to the discrimination data of FIG. 6. A fellow passenger who wants speech recognition can thus have speech recognized without operating the recognition start button 21, which is convenient particularly when the recognition start button 21 is located away from the fellow passenger.
In all other cases, the second voice segment determination unit 8d or the fourth voice segment determination unit 8f and the second voice recognition processing unit 9d or the fourth voice recognition processing unit 9f are used according to the discrimination data of FIG. 6. Unintended speech recognition can thereby be suppressed.
The discrimination data is not limited to the settings shown in FIG. 6. For example, when no fellow passenger is on board and the driver is, it is considered desirable for the driver to concentrate on driving. In that case, therefore, the first voice segment determination unit 8c or the third voice segment determination unit 8e and the first voice recognition processing unit 9c or the third voice recognition processing unit 9e may be used instead.
<Operation>
FIG. 7 is a flowchart showing the operation of the speech recognition apparatus 1 according to the third embodiment. The operation of FIG. 7 is the same as the operation of the speech recognition apparatus 1 according to the second embodiment (FIG. 4) except that step S3 is replaced with step S3a. Only the processing that differs from that of the speech recognition apparatus 1 according to the second embodiment is described below.
In steps S1 and S2, the same processing as in steps S1 and S2 of FIG. 4 is performed.
In step S3a, the acquisition unit 11 acquires the communication status between the speech recognition device 1 and the server 6 and outputs it to the control unit 12. The acquisition unit 11 also acquires the boarding status of the occupants of the host vehicle in which the speech recognition device 1 is mounted and outputs it to the control unit 12.
In step S4, the control unit 12 determines, according to the discrimination data and based on the communication status and the boarding status, the one pair of voice section determination unit and voice recognition processing unit to be used for recognizing the input voice. At this point, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
In step S5, the same processing as in step S5 of FIG. 4 is performed, and the operation of FIG. 7 ends.
In the operation of FIG. 7, the control unit 12 determines the one pair of voice section determination unit and voice recognition processing unit in a single determination. However, this is not restrictive; the control unit 12 may make the determination in multiple steps. For example, the control unit 12 may first narrow the plurality of voice section determination units and the plurality of voice recognition processing units down to several candidates based on one of the communication status and the boarding status, and then select the one pair from those candidates based on the other. The same applies to the fourth and subsequent embodiments described later.
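The multi-step determination described above can be sketched as a two-stage filter; the candidate table, the statuses, and which stage uses which status are illustrative assumptions.

```python
# Hypothetical two-stage determination: stage 1 narrows the candidate pairs
# by communication status; stage 2 picks the final pair by boarding status.
CANDIDATES = {
    "connected":    [("8c", "9c"), ("8d", "9d")],  # pairs usable with the server
    "disconnected": [("8e", "9e"), ("8f", "9f")],  # local-only pairs
}

def determine_pair(comm_status: str, passenger_only: bool):
    # Stage 1: narrow by communication status.
    pairs = CANDIDATES[comm_status]
    # Stage 2: choose by boarding status (always-on pair when only a
    # passenger is on board, otherwise the button-triggered pair).
    return pairs[0] if passenger_only else pairs[1]
```

The two stages could equally be run in the opposite order, as the text allows either status to be used first.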
<Summary of Embodiment 3>
According to the speech recognition device 1 of the third embodiment described above, a voice section determination unit and a voice recognition processing unit suited to the communication status and the boarding status can be used for recognizing the input voice. In the third embodiment, the control unit 12 performs this control based on both the communication status and the boarding status acquired by the acquisition unit 11. However, this is not restrictive; the control unit 12 may perform the control based on the boarding status acquired by the acquisition unit 11 alone, without considering the communication status.
<Embodiment 4>
FIG. 8 is a block diagram showing the configuration of the speech recognition device 1 according to the fourth embodiment of the present invention. Among the components described in the fourth embodiment, those that are the same as or similar to the components described above are given the same reference numerals, and mainly the differing components are described below.
The speech recognition device 1 of FIG. 8 has the same configuration as the speech recognition device 1 according to the third embodiment (FIG. 5) with an input device 22 connected to it. The input device 22 receives an operation that designates the pair of voice section determination unit and voice recognition processing unit to use (hereinafter referred to as a "designation operation").
<Voice recognition method determination unit>
Like the voice recognition method determination unit 2 according to the third embodiment, the voice recognition method determination unit 2 includes the acquisition unit 11, the control unit 12, and the storage unit 13.
Like the acquisition unit 11 according to the third embodiment, the acquisition unit 11 acquires the communication status and the boarding status as the recognition-related status.
Here, designation operations input to the input device 22 are stored in the storage unit 13, and the acquisition unit 11 acquires the history of designation operations (hereinafter referred to as the "operation history") from the storage unit 13.
When a designation operation is input to the input device 22, the control unit 12 performs control so that the pair of voice section determination unit and voice recognition processing unit designated by the designation operation, out of the plurality of voice section determination units 8c to 8f and the plurality of voice recognition processing units 9c to 9f, is used for recognizing the input voice.
On the other hand, when no designation operation is input to the input device 22, the control unit 12 determines one pair of voice section determination unit and voice recognition processing unit from among the plurality of voice section determination units 8c to 8f and the plurality of voice recognition processing units 9c to 9f, based on the communication status, the boarding status, and the operation history acquired by the acquisition unit 11. The control unit 12 then performs control so that the determined pair is used for recognizing the input voice.
For example, based on the operation history, the control unit 12 determines whether the number of times the same pair of voice section determination unit and voice recognition processing unit has been designated is equal to or greater than a predetermined threshold. When the number is determined to be equal to or greater than the threshold, the control unit 12 performs control so that the pair designated in the operation history is used for recognizing the input voice. When the number is determined to be less than the threshold, the control unit 12 performs control, as in the third embodiment, so that a pair determined based on the communication status and the boarding status acquired by the acquisition unit 11 is used for recognizing the input voice.
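The threshold rule above can be sketched as follows; the history representation, the default threshold of 3, and the fallback mechanism are assumptions made for illustration.

```python
from collections import Counter

def choose_pair(history, fallback_pair, threshold=3):
    """Hypothetical sketch of the Embodiment 4 rule: if any one pair has been
    designated at least `threshold` times in the operation history, reuse it;
    otherwise fall back to the status-based determination of Embodiment 3
    (represented here by a precomputed `fallback_pair`)."""
    if history:
        pair, count = Counter(history).most_common(1)[0]
        if count >= threshold:
            return pair
    return fallback_pair
```

In a real device the fallback would itself be the discrimination-data lookup rather than a fixed argument.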
<Operation>
FIG. 9 is a flowchart showing the operation of the speech recognition device 1 according to the fourth embodiment. The operation of FIG. 9 is the same as the operation of the speech recognition device 1 according to the second embodiment (FIG. 4) except that step S3 is changed to step S3b and steps S11 and S12 are added. Only the processing that differs from that of the second embodiment is described below.
In step S11, the control unit 12 determines whether a designation operation has been input to the input device 22. If a designation operation has been input, the process proceeds to step S12; otherwise, the process proceeds to step S1.
When the process proceeds from step S11 to step S12, the control unit 12 performs control so that the pair of voice section determination unit and voice recognition processing unit designated by the designation operation is used for recognizing the input voice. The voice recognition unit 3 and the server 6 thereby recognize the input voice using the designated pair. The operation of FIG. 9 then ends.
When the process proceeds from step S11 to step S1, the same processing as in steps S1 and S2 of FIG. 4 is performed.
In step S3b, the acquisition unit 11 acquires the communication status, the boarding status, and the operation history, and outputs them to the control unit 12.
In step S4, the control unit 12 determines, according to the discrimination data and based on the communication status, the boarding status, and the operation history, the one pair of voice section determination unit and voice recognition processing unit to be used for recognizing the input voice. At this point, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
In step S5, the same processing as in step S5 of FIG. 4 is performed, and the operation of FIG. 9 ends.
<Summary of Embodiment 4>
According to the speech recognition device 1 of the fourth embodiment described above, one pair of voice section determination unit and voice recognition processing unit is used for recognizing the input voice based on the recognition-related status and the operation history. A voice section determination unit and a voice recognition processing unit that match the user's usage tendency can thereby be used for recognizing the input voice.
<Embodiment 5>
FIG. 10 is a block diagram showing the configuration of the speech recognition device 1 according to the fifth embodiment of the present invention. Among the components described in the fifth embodiment, those that are the same as or similar to the components described above are given the same reference numerals, and mainly the differing components are described below.
<Voice recognition method determination unit>
Like the voice recognition method determination unit 2 according to the third embodiment, the voice recognition method determination unit 2 includes the acquisition unit 11, the control unit 12, and the storage unit 13.
Like the acquisition unit 11 according to the third embodiment, the acquisition unit 11 acquires the communication status and the boarding status as the recognition-related status.
The acquisition unit 11 also acquires the recognition results of the voice recognition unit 3 and the server 6. Based on the acquired recognition results, the acquisition unit 11 determines whether voice input in the past, that is, past input voice, contains a predetermined command. Any configuration in which the acquisition unit 11 obtains the result of this determination is acceptable; for example, the determination may be performed by an external device provided outside the speech recognition device 1, with the acquisition unit 11 acquiring the result from that external device.
In the fifth embodiment, the predetermined command includes, among the commands for executing functions of application software, an upper-level command that starts such execution. For example, upper-level commands include a command such as "navigation" for starting execution of a navigation function capable of searching for a destination, searching for a route, and so on, while lower-level commands, which are not upper-level commands, include a "destination search" command for executing a destination search, a "route search" command for executing a route search, and the like.
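The upper-level/lower-level distinction can be sketched as a simple lookup; the command strings follow the examples in the text, but the sets themselves are illustrative assumptions rather than an exhaustive grammar.

```python
# Illustrative command sets based on the examples above. A real system would
# derive these from the application's command grammar.
UPPER_COMMANDS = {"navigation"}                          # start execution of a function
LOWER_COMMANDS = {"destination search", "route search"}  # operate within a function

def classify(command: str) -> str:
    """Classify a recognized command as upper-level, lower-level, or unknown."""
    if command in UPPER_COMMANDS:
        return "upper"
    if command in LOWER_COMMANDS:
        return "lower"
    return "unknown"
```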
Based on the communication status, the boarding status, and the above determination result acquired by the acquisition unit 11, the control unit 12 performs control so that one pair of voice section determination unit and voice recognition processing unit, out of the plurality of voice section determination units 8c to 8f and the plurality of voice recognition processing units 9c to 9f, is used for recognizing the input voice.
The storage unit 13 stores the discrimination data that the control unit 12 uses for the determination. FIG. 11 shows part of the discrimination data according to the fifth embodiment. In FIG. 11, for one of the twelve combinations of FIG. 6, a voice section determination unit and a voice recognition processing unit are set depending on whether the determination result contains an upper-level command or a lower-level command; voice section determination units and voice recognition processing units are set in the same manner for the other eleven combinations.
In the example of FIG. 11, the always-on voice section determination unit and the always-on voice recognition processing unit are set as the units to use when the determination result contains an upper-level command, while the normal voice section determination unit and the normal voice recognition processing unit are set as the units to use when the determination result contains a lower-level command.
According to discrimination data such as that of FIG. 11, always-on voice recognition is performed only for the relatively small number of upper-level commands, so voice recognition contrary to the user's intention can be suppressed. Note that, so that the user can speak several lower-level commands in succession after speaking an upper-level command, once the determination result is judged to contain an upper-level command, the judgment that a lower-level command is contained may be invalidated for a certain period of time.
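One possible reading of the time-window rule above is sketched below: for a fixed period after an upper-level command, lower-command determinations are invalidated (treated as upper-level ones), so the always-on units stay in use while the user speaks follow-up commands. The class, the window length, and the timestamp interface are all assumptions.

```python
class DeterminationFilter:
    """Hypothetical filter: after an upper-level command, lower-command
    determinations are invalidated for `window_sec` seconds, keeping the
    always-on units active for follow-up commands."""

    def __init__(self, window_sec: float = 10.0):
        self.window_sec = window_sec
        self.last_upper = None  # monotonic timestamp of the last upper command

    def filter(self, kind: str, now: float) -> str:
        if kind == "upper":
            self.last_upper = now
            return "upper"
        if (kind == "lower" and self.last_upper is not None
                and now - self.last_upper < self.window_sec):
            return "upper"  # invalidate the lower determination inside the window
        return kind
```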
<Operation>
FIG. 12 is a flowchart showing the operation of the speech recognition device 1 according to the fifth embodiment. The operation of FIG. 12 is the same as the operation of the speech recognition device 1 according to the second embodiment (FIG. 4) except that step S3 is changed to step S3c. Only the processing that differs from that of the second embodiment is described below.
In steps S1 and S2, the same processing as in steps S1 and S2 of FIG. 4 is performed.
In step S3c, the acquisition unit 11 acquires the communication status, the boarding status, and the determination result regarding upper-level commands, and outputs them to the control unit 12.
In step S4, the control unit 12 determines, according to the discrimination data and based on the communication status, the boarding status, and the determination result, the one pair of voice section determination unit and voice recognition processing unit to be used for recognizing the input voice. At this point, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
In step S5, the same processing as in step S5 of FIG. 4 is performed, and the operation of FIG. 12 ends.
<Summary of Embodiment 5>
According to the speech recognition device 1 of the fifth embodiment described above, one pair of voice section determination unit and voice recognition processing unit is used for recognizing the input voice based on the recognition-related status and the determination of whether an upper-level command is contained. A voice section determination unit and a voice recognition processing unit that match the user's usage tendency can thereby be used for recognizing the input voice.
<Embodiment 6>
The block configuration of the speech recognition device 1 according to the sixth embodiment of the present invention is the same as that of the speech recognition device 1 according to the third embodiment (FIG. 5). Among the components described in the sixth embodiment, those that are the same as or similar to the components described above are given the same reference numerals, and mainly the differing components are described below.
<Voice recognition method determination unit>
Like the voice recognition method determination unit 2 according to the third embodiment, the voice recognition method determination unit 2 includes the acquisition unit 11, the control unit 12, and the storage unit 13.
Like the acquisition unit 11 according to the third embodiment, the acquisition unit 11 acquires the communication status and the boarding status as the recognition-related status.
The acquisition unit 11 also acquires the usage state of the hardware of the speech recognition device 1. In the sixth embodiment, the hardware of the speech recognition device 1 is the hardware of a device, such as a navigation device, to which the speech recognition device 1 is applied, and the usage state of the hardware includes hardware usage rates, for example the CPU (Central Processing Unit) usage rate and the memory usage rate.
Based on the communication status, the boarding status, and the usage state acquired by the acquisition unit 11, the control unit 12 performs control so that one pair of voice section determination unit and voice recognition processing unit, out of the plurality of voice section determination units 8c to 8f and the plurality of voice recognition processing units 9c to 9f, is used for recognizing the input voice.
The storage unit 13 stores the discrimination data that the control unit 12 uses for the determination. FIG. 13 shows part of the discrimination data according to the sixth embodiment. In FIG. 13, for one of the twelve combinations of FIG. 6, a voice section determination unit and a voice recognition processing unit are set depending on whether the usage rate included in the usage state is equal to or greater than a predetermined threshold; voice section determination units and voice recognition processing units are set in the same manner for the other eleven combinations.
In the example of FIG. 13, the normal voice section determination unit and the normal voice recognition processing unit are set as the units to use when the usage rate included in the usage state is equal to or greater than the predetermined threshold, while the always-on voice section determination unit and the always-on voice recognition processing unit are set as the units to use when the usage rate is less than the threshold.
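The usage-rate branch above amounts to a single threshold comparison; the following sketch assumes a usage rate expressed as a fraction and an 80% threshold, both of which are illustrative values not given in the publication.

```python
def select_by_usage(cpu_usage: float, threshold: float = 0.8):
    """Hypothetical Embodiment 6 rule: at or above the threshold, use the
    lighter 'normal' units; below it, the heavier 'always-on' units.
    `cpu_usage` is a fraction in [0.0, 1.0]."""
    if cpu_usage >= threshold:
        return ("normal_section_unit", "normal_recognition_unit")
    return ("always_on_section_unit", "always_on_recognition_unit")
```

In practice the comparison would be repeated per combination of communication and boarding status, as the FIG. 13 discrimination data describes.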
According to discrimination data such as that of FIG. 13, when the hardware usage rate is relatively high, normal voice recognition, which places a light load on the hardware, can be performed; when the hardware usage rate is relatively low, always-on voice recognition, which places a heavy load on the hardware, can be performed.
<Operation>
FIG. 14 is a flowchart showing the operation of the speech recognition device 1 according to the sixth embodiment. The operation of FIG. 14 is the same as the operation of the speech recognition device 1 according to the second embodiment (FIG. 4) except that step S3 is changed to step S3d. Only the processing that differs from that of the second embodiment is described below.
In steps S1 and S2, the same processing as in steps S1 and S2 of FIG. 4 is performed.
In step S3d, the acquisition unit 11 acquires the communication status, the boarding status, and the hardware usage state, and outputs them to the control unit 12.
In step S4, the control unit 12 determines, in accordance with the discrimination data and on the basis of the communication status, the boarding status, and the hardware usage state, the set of one voice segment determination unit and one voice recognition processing unit to be used for recognition of the input voice. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
In step S5, the same processing as in step S5 of FIG. 4 is performed, and the operation of FIG. 14 ends.
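The determination of step S4 amounts to matching the acquired statuses against the discrimination data. A minimal table-lookup sketch follows; the status keys, the table rows, and the unit names are assumptions made for illustration:

```python
# Hypothetical sketch of step S4: pick the first row of the discrimination
# data whose conditions all match the statuses acquired in step S3d.
def decide(statuses: dict, discrimination_data: list) -> tuple[str, str]:
    """Return the (voice segment determination unit, voice recognition
    processing unit) pair registered for the matching conditions."""
    for conditions, units in discrimination_data:
        if all(statuses.get(key) == value for key, value in conditions.items()):
            return units
    raise LookupError("no matching row in the discrimination data")

# Assumed example table: offline falls back to a local recognizer,
# online uses a server-side recognizer.
TABLE = [
    ({"communication": "offline"}, ("segment_b", "local_recognizer")),
    ({"communication": "online"}, ("segment_a", "server_recognizer")),
]
```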
<Summary of Embodiment 6>
According to the speech recognition apparatus 1 of the sixth embodiment described above, one set of a voice segment determination unit and a voice recognition processing unit is used for recognition of the input voice on the basis of the recognition-related situation and the hardware usage state. As a result, a voice segment determination unit and a voice recognition processing unit suited to the hardware usage state can be used for recognition of the input voice.
<Modification>
In the second embodiment, the control unit 12 performs control to use one set of a voice segment determination unit and a voice recognition processing unit for recognition of the input voice on the basis of the communication status acquired by the acquisition unit 11, but this is not restrictive. For example, as illustrated in FIG. 15, when the communication status acquired by the acquisition unit 11 is low-quality online, the control unit 12 may perform control to use the first voice segment determination unit 8a and the second voice segment determination unit 8b in parallel, and to use the first voice recognition processing unit 9a and the second voice recognition processing unit 9b in parallel. That is, the control unit 12 may perform control to use a combination of a plurality of voice segment determination units and a combination of a plurality of voice recognition processing units for recognition of the input voice on the basis of the communication status acquired by the acquisition unit 11. In other words, the control unit 12 may perform control to use at least one voice segment determination unit and at least one voice recognition processing unit for recognition of the input voice on the basis of the communication status acquired by the acquisition unit 11. The same applies to the third to sixth embodiments, and also to the designation operation of the third embodiment.
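Running two recognizers in parallel under a low-quality connection can be sketched with a thread pool. The merge policy (keep the first non-None result) and the recognizer callables are assumptions for illustration only:

```python
# Hypothetical sketch of the modification: run several voice recognition
# processing units in parallel and merge their results.
from concurrent.futures import ThreadPoolExecutor

def recognize_parallel(input_voice, recognizers):
    """Run every recognizer on the same input voice concurrently.

    Illustrative merge policy: prefer the first non-None result, e.g. a
    server recognizer may fail on a poor connection while a local one
    still returns a hypothesis.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(recognize, input_voice) for recognize in recognizers]
        results = [future.result() for future in futures]
    return next((result for result in results if result is not None), None)
```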
The functions of the fourth to sixth embodiments may also be combined as appropriate. That is, the acquisition unit 11 may acquire the recognition-related situation together with at least one of the operation history, the determination result as to whether a higher-level command is included, and the hardware usage state. The control unit 12 may then perform control to use at least one voice segment determination unit and at least one voice recognition processing unit for recognition of the input voice on the basis of the recognition-related situation acquired by the acquisition unit 11 and at least one of the operation history, the determination result, and the usage state acquired by the acquisition unit 11.
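One way such a combination could weigh the optional factors is sketched below. The override order and the concrete conditions are assumptions invented for illustration; the patent only states that the factors are combined:

```python
# Hypothetical sketch of combining Embodiments 4-6: the selection starts
# from the recognition-related situation and lets the optional factors
# (hardware usage, operation history, past command result) override it.
def select_with_extras(situation, history=None, had_command=None, usage=None):
    continuous = situation == "online"  # assumed baseline choice
    if usage is not None and usage >= 0.7:
        continuous = False  # heavy hardware load: avoid continuous recognition
    if history is not None and history.count("manual") > history.count("auto"):
        continuous = False  # the user habitually triggers recognition manually
    if had_command:
        continuous = True   # a past input contained the predetermined command
    return "continuous" if continuous else "operation_triggered"
```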
<Other Modifications>
Hereinafter, the acquisition unit 11 and the control unit 12 of FIG. 1 in the speech recognition apparatus 1 described above are referred to as the "acquisition unit 11 and the like". The acquisition unit 11 and the like are realized by the processing circuit 81 shown in FIG. 16. That is, the processing circuit 81 includes the acquisition unit 11, which acquires the recognition-related situation, and the control unit 12, which performs control to use, on the basis of the recognition-related situation acquired by the acquisition unit 11, at least one voice segment determination unit and at least one voice recognition processing unit from among a plurality of predetermined voice segment determination units and a plurality of predetermined voice recognition processing units for recognition of the input voice. The processing circuit 81 may be dedicated hardware, or may be a processor that executes a program stored in a memory. The processor corresponds to, for example, a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, or a DSP (Digital Signal Processor).
When the processing circuit 81 is dedicated hardware, the processing circuit 81 corresponds to, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination of these. The functions of the respective units such as the acquisition unit 11 may each be realized by separate processing circuits, or the functions of the units may be collectively realized by a single processing circuit.
When the processing circuit 81 is a processor, the functions of the acquisition unit 11 and the like are realized in combination with software or the like. Software or the like corresponds to, for example, software, firmware, or software and firmware. The software or the like is described as a program and stored in the memory 83. As shown in FIG. 17, the processor 82 applied to the processing circuit 81 realizes the functions of the respective units by reading and executing the program stored in the memory 83. That is, the speech recognition apparatus 1 includes the memory 83 for storing a program that, when executed by the processing circuit 81, results in execution of a step of acquiring the recognition-related situation, and a step of performing control to use, on the basis of the acquired recognition-related situation, at least one voice segment determination unit and at least one voice recognition processing unit from among a plurality of predetermined voice segment determination units and a plurality of predetermined voice recognition processing units for recognition of the input voice. In other words, this program can be said to cause a computer to execute the procedures and methods of the acquisition unit 11 and the like. Here, the memory 83 may be a non-volatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory), an HDD (Hard Disk Drive), a magnetic disk, a flexible disk, an optical disc, a compact disc, a mini disc, a DVD (Digital Versatile Disc), a drive device therefor, or any storage medium to be used in the future.
The configuration in which each function of the acquisition unit 11 and the like is realized by either hardware or software or the like has been described above. However, the configuration is not limited to this; part of the acquisition unit 11 and the like may be realized by dedicated hardware, and another part by software or the like. For example, the function of the acquisition unit 11 can be realized by the processing circuit 81 as dedicated hardware together with a receiver or the like, while the remaining functions can be realized by the processing circuit 81 as the processor 82 reading and executing the program stored in the memory 83.
As described above, the processing circuit 81 can realize each of the functions described above by hardware, software or the like, or a combination thereof.
The speech recognition apparatus 1 described above can also be applied to a speech recognition system constructed as a system by appropriately combining a navigation device such as a PND (Portable Navigation Device), a communication terminal including portable terminals such as a mobile phone, a smartphone, and a tablet, the functions of an application installed in at least one of the navigation device and the communication terminal, and a server. In this case, the functions or constituent elements of the speech recognition apparatus 1 described above may be distributed among the devices constructing the system, or may be concentrated in one of the devices.
FIG. 18 is a block diagram showing the configuration of a server 91 according to the present modification. The server 91 of FIG. 18 includes a communication unit 91a and a control unit 91b, and is capable of wireless communication with a navigation device 93 of a vehicle 92.
The communication unit 91a, which serves as an acquisition unit, receives the recognition-related situation by performing wireless communication with the navigation device 93.
The control unit 91b has the same function as the control unit 12 of FIG. 1 in that a processor (not shown) of the server 91 executes a program stored in a memory (not shown) of the server 91. That is, the control unit 91b determines at least one voice segment determination unit and at least one voice recognition processing unit on the basis of the recognition-related situation received by the communication unit 91a, and transmits the determination result to the navigation device 93.
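The server-side split can be sketched as a small request/response handler: the situation arrives from the navigation device, the determination is made on the server, and only the result is sent back. The JSON message shapes and the selection rule below are assumptions, not part of the patent:

```python
# Hypothetical sketch of the FIG. 18 server 91: receive the
# recognition-related situation, determine the units, return the result.
import json

def server_determine(request_json: str) -> str:
    """Determine the unit pair from a JSON-encoded recognition-related
    situation and return a JSON-encoded determination result."""
    situation = json.loads(request_json)
    if situation.get("communication") == "online":
        units = ("segment_a", "server_recognizer")
    else:
        units = ("segment_b", "local_recognizer")
    return json.dumps({"segment_unit": units[0], "recognition_unit": units[1]})
```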
The server 91 configured in this way can obtain the same effects as the speech recognition apparatus 1 described in the first embodiment.
FIG. 19 is a block diagram showing the configuration of a communication terminal 96 according to the present modification. The communication terminal 96 of FIG. 19 includes a communication unit 96a similar to the communication unit 91a and a control unit 96b similar to the control unit 91b, and is capable of wireless communication with a navigation device 98 of a vehicle 97. A portable terminal carried by, for example, the driver of the vehicle 97, such as a mobile phone, a smartphone, or a tablet, is applied as the communication terminal 96. The communication terminal 96 configured in this way can obtain the same effects as the speech recognition apparatus 1 described in the first embodiment.
In the present invention, within the scope of the invention, the embodiments and modifications may be freely combined, and each embodiment and each modification may be modified or omitted as appropriate.
Although the present invention has been described in detail, the above description is in all aspects illustrative, and the present invention is not limited thereto. It is understood that innumerable modifications not illustrated can be envisaged without departing from the scope of the present invention.
Reference Signs List: 1 speech recognition apparatus; 6 server; 8a to 8f voice segment determination unit; 9a to 9f voice recognition processing unit; 11 acquisition unit; 12 control unit; 21 recognition start button.
Claims (8)
- A speech recognition apparatus capable of recognizing an input voice, which is an inputted voice, the speech recognition apparatus comprising: an acquisition unit that acquires a situation related to recognition of the input voice; and a control unit that performs control to use, on the basis of the situation acquired by the acquisition unit, at least one voice segment determination unit and at least one voice recognition processing unit for recognition of the input voice, from among a plurality of predetermined voice segment determination units each capable of determining a voice segment, which is a period in which a voice is recognized, and a plurality of predetermined voice recognition processing units each capable of processing for recognizing the voice of the voice segment determined by the plurality of voice segment determination units.
- The speech recognition apparatus according to claim 1, wherein the situation includes at least one of a communication status between the speech recognition apparatus and a server, and a boarding status of an occupant of a vehicle in which the speech recognition apparatus is mounted.
- The speech recognition apparatus according to claim 1, wherein the plurality of voice segment determination units include a voice segment determination unit that constantly determines the voice segment and a voice segment determination unit that determines the voice segment in response to a predetermined operation, and the plurality of voice recognition processing units include a voice recognition processing unit that constantly performs recognition processing on a voice and a voice recognition processing unit that performs recognition processing on a voice in response to a predetermined operation.
- The speech recognition apparatus according to claim 1, wherein the acquisition unit further acquires a history of operations designating use of a voice segment determination unit and a voice recognition processing unit, and the control unit performs control to use the at least one voice segment determination unit and the at least one voice recognition processing unit for recognition of the input voice on the basis of the situation acquired by the acquisition unit and the history acquired by the acquisition unit.
- The speech recognition apparatus according to claim 1, wherein the acquisition unit further acquires a determination result as to whether a voice inputted in the past includes a predetermined command, and the control unit performs control to use the at least one voice segment determination unit and the at least one voice recognition processing unit for recognition of the input voice on the basis of the situation acquired by the acquisition unit and the determination result acquired by the acquisition unit.
- The speech recognition apparatus according to claim 1, wherein the acquisition unit further acquires a usage state of hardware of the speech recognition apparatus, and the control unit performs control to use the at least one voice segment determination unit and the at least one voice recognition processing unit for recognition of the input voice on the basis of the situation acquired by the acquisition unit and the usage state acquired by the acquisition unit.
- The speech recognition apparatus according to claim 1, wherein the acquisition unit further acquires at least one of a history of operations designating use of a voice recognition processing unit and a voice segment determination unit, a determination result as to whether a voice inputted in the past includes a predetermined command, and a usage state of hardware of the speech recognition apparatus, and the control unit performs control to use the at least one voice segment determination unit and the at least one voice recognition processing unit for recognition of the input voice on the basis of the situation acquired by the acquisition unit and at least one of the history, the determination result, and the usage state acquired by the acquisition unit.
- A speech recognition method capable of recognizing an input voice, which is an inputted voice, the method comprising: acquiring a situation related to recognition of the input voice; and performing control to use, on the basis of the acquired situation, at least one voice segment determination unit and at least one voice recognition processing unit for recognition of the input voice, from among a plurality of predetermined voice segment determination units each capable of determining a voice segment, which is a period in which a voice is recognized, and a plurality of predetermined voice recognition processing units each capable of processing for recognizing the voice of the voice segment determined by the plurality of voice segment determination units.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2017/026450 WO2019016938A1 (en) | 2017-07-21 | 2017-07-21 | Speech recognition device and speech recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2017/026450 WO2019016938A1 (en) | 2017-07-21 | 2017-07-21 | Speech recognition device and speech recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019016938A1 true WO2019016938A1 (en) | 2019-01-24 |
Family
ID=65015595
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2017/026450 WO2019016938A1 (en) | 2017-07-21 | 2017-07-21 | Speech recognition device and speech recognition method |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2019016938A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017151210A (en) * | 2016-02-23 | 2017-08-31 | Nttテクノクロス株式会社 | Information processing device, voice recognition method, and program |
CN112802471A (en) * | 2020-12-31 | 2021-05-14 | 北京梧桐车联科技有限责任公司 | Voice sound zone switching method, device, equipment and storage medium |
US20210383808A1 (en) * | 2019-02-26 | 2021-12-09 | Preferred Networks, Inc. | Control device, system, and control method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09127982A (en) * | 1995-10-27 | 1997-05-16 | Nec Robotics Eng Ltd | Voice recognition device |
JP2005031758A (en) * | 2003-07-07 | 2005-02-03 | Canon Inc | Voice processing device and method |
WO2011148594A1 (en) * | 2010-05-26 | 2011-12-01 | 日本電気株式会社 | Voice recognition system, voice acquisition terminal, voice recognition distribution method and voice recognition program |
WO2013005248A1 (en) * | 2011-07-05 | 2013-01-10 | 三菱電機株式会社 | Voice recognition device and navigation device |
JP2014186295A (en) * | 2013-02-21 | 2014-10-02 | Nippon Telegr & Teleph Corp <Ntt> | Voice section detection device, voice recognition device, voice section detection method, and program |
JP2015200860A (en) * | 2014-04-01 | 2015-11-12 | ソフトバンク株式会社 | Dictionary database management device, api server, dictionary database management method, and dictionary database management program |
- 2017-07-21: WO PCT/JP2017/026450 patent/WO2019016938A1/en active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09127982A (en) * | 1995-10-27 | 1997-05-16 | Nec Robotics Eng Ltd | Voice recognition device |
JP2005031758A (en) * | 2003-07-07 | 2005-02-03 | Canon Inc | Voice processing device and method |
WO2011148594A1 (en) * | 2010-05-26 | 2011-12-01 | 日本電気株式会社 | Voice recognition system, voice acquisition terminal, voice recognition distribution method and voice recognition program |
WO2013005248A1 (en) * | 2011-07-05 | 2013-01-10 | 三菱電機株式会社 | Voice recognition device and navigation device |
JP2014186295A (en) * | 2013-02-21 | 2014-10-02 | Nippon Telegr & Teleph Corp <Ntt> | Voice section detection device, voice recognition device, voice section detection method, and program |
JP2015200860A (en) * | 2014-04-01 | 2015-11-12 | ソフトバンク株式会社 | Dictionary database management device, api server, dictionary database management method, and dictionary database management program |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017151210A (en) * | 2016-02-23 | 2017-08-31 | Nttテクノクロス株式会社 | Information processing device, voice recognition method, and program |
US20210383808A1 (en) * | 2019-02-26 | 2021-12-09 | Preferred Networks, Inc. | Control device, system, and control method |
US12051412B2 (en) * | 2019-02-26 | 2024-07-30 | Preferred Networks, Inc. | Control device, system, and control method |
CN112802471A (en) * | 2020-12-31 | 2021-05-14 | 北京梧桐车联科技有限责任公司 | Voice sound zone switching method, device, equipment and storage medium |
CN112802471B (en) * | 2020-12-31 | 2024-01-23 | 北京梧桐车联科技有限责任公司 | Voice voice zone switching method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9230538B2 (en) | Voice recognition device and navigation device | |
CN106209138B (en) | Vehicle cautious emergency response system and method | |
WO2013005248A1 (en) | Voice recognition device and navigation device | |
US20180277119A1 (en) | Speech dialogue device and speech dialogue method | |
US10176806B2 (en) | Motor vehicle operating device with a correction strategy for voice recognition | |
US20200160861A1 (en) | Apparatus and method for processing voice commands of multiple talkers | |
JP4260788B2 (en) | Voice recognition device controller | |
US20160004501A1 (en) | Audio command intent determination system and method | |
CN105355202A (en) | Voice recognition apparatus, vehicle having the same, and method of controlling the vehicle | |
WO2019016938A1 (en) | Speech recognition device and speech recognition method | |
US20190130908A1 (en) | Speech recognition device and method for vehicle | |
US20130013310A1 (en) | Speech recognition system | |
US10276180B2 (en) | Audio command adaptive processing system and method | |
JP6459330B2 (en) | Speech recognition apparatus, speech recognition method, and speech recognition program | |
JP2020086571A (en) | In-vehicle device and speech recognition method | |
JP2016133378A (en) | Car navigation device | |
US10522141B2 (en) | Vehicle voice recognition including a wearable device | |
CN110556104A (en) | Speech recognition device, speech recognition method, and storage medium storing program | |
JP4770374B2 (en) | Voice recognition device | |
WO2024070080A1 (en) | Information processing device, information processing method, and program | |
KR102417901B1 (en) | Apparatus and method for recognizing voice using manual operation | |
JP2005084589A (en) | Voice recognition device | |
US20230197076A1 (en) | Vehicle and control method thereof | |
US20150317973A1 (en) | Systems and methods for coordinating speech recognition | |
JP2005084590A (en) | Speech recognition device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17918190 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17918190 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: JP |