WO2019016938A1 - Speech recognition device and speech recognition method - Google Patents

Speech recognition device and speech recognition method

Info

Publication number
WO2019016938A1
Authority
WO
WIPO (PCT)
Prior art keywords: voice, speech, unit, recognition, speech recognition
Prior art date
Application number
PCT/JP2017/026450
Other languages
English (en)
Japanese (ja)
Inventor
Akio Horii (昭男 堀井)
Original Assignee
Mitsubishi Electric Corporation (三菱電機株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to PCT/JP2017/026450
Publication of WO2019016938A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems

Definitions

  • The present invention relates to a speech recognition apparatus capable of recognizing input speech, and to a speech recognition method for the speech recognition apparatus.
  • Patent Document 1 proposes a technique for switching between a push talk mode and a hands free mode based on the position of a speaker in a vehicle cabin.
  • the push-to-talk mode is a mode for recognizing voice when the button switch is pressed
  • the hands-free mode is a mode for recognizing voice regardless of the pressing of the button switch.
  • In general, a speech recognition apparatus first determines speech segments, which are periods in which speech is to be recognized, and non-speech segments, which are periods in which speech is not recognized, and then performs speech recognition processing on the speech in the speech segments.
  • However, with conventional techniques, the accuracy of speech recognition is not sufficient, because the processing is not switched to the appropriate one from among a plurality of speech segment determination processes and a plurality of speech recognition processes.
  • the present invention has been made in view of the above-described problems, and an object of the present invention is to provide a technology capable of enhancing the accuracy of speech recognition.
  • A speech recognition apparatus according to the present invention is a speech recognition apparatus capable of recognizing input speech. It includes an acquisition unit that acquires a situation related to recognition of the input speech, and a control unit that, based on the situation acquired by the acquisition unit, performs control to use, for recognition of the input speech, at least one speech segment determination unit from among a plurality of predetermined speech segment determination units capable of determining a speech segment, which is a period in which speech is recognized, and at least one speech recognition processing unit from among a plurality of predetermined speech recognition processing units capable of processing recognition of the speech in the speech segments determined by the plurality of speech segment determination units.
  • In this configuration, control is performed to use at least one speech segment determination unit and at least one speech recognition processing unit for recognition of the input speech based on the situation related to recognition of the input speech, which can improve the accuracy of speech recognition.
  • FIG. 1 is a block diagram showing a configuration of a speech recognition device according to Embodiment 1.
  • FIG. 2 is a block diagram showing a configuration of a speech recognition device according to Embodiment 2.
  • FIG. 3 is a diagram showing discrimination data according to Embodiment 2.
  • FIG. 4 is a flowchart showing the operation of the speech recognition device according to Embodiment 2.
  • FIG. 5 is a block diagram showing a configuration of a speech recognition device according to Embodiment 3.
  • FIG. 6 is a diagram showing discrimination data according to Embodiment 3.
  • FIG. 7 is a flowchart showing the operation of the speech recognition device according to Embodiment 3.
  • FIG. 8 is a block diagram showing a configuration of a speech recognition device according to Embodiment 4.
  • FIG. 9 is a flowchart showing the operation of the speech recognition device according to Embodiment 4.
  • FIG. 10 is a block diagram showing a configuration of a speech recognition device according to Embodiment 5.
  • FIG. 11 is a diagram showing discrimination data according to Embodiment 5.
  • FIG. 12 is a flowchart showing the operation of the speech recognition device according to Embodiment 5.
  • FIG. 13 is a diagram showing discrimination data according to Embodiment 6.
  • FIG. 14 is a flowchart showing the operation of the speech recognition device according to Embodiment 6.
  • FIG. 15 is a diagram showing discrimination data according to a modification.
  • It is a block diagram showing a hardware configuration of the navigation device according to another modification.
  • It is a block diagram showing a configuration of a communication terminal according to another modification.
  • Embodiment 1: In the following, the vehicle in which the voice recognition device according to Embodiment 1 of the present invention is mounted, and on which attention is focused, is described as the "own vehicle".
  • This voice recognition device can be applied to, for example, a navigation device mounted on the vehicle.
  • FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus 1 according to the first embodiment.
  • The speech recognition apparatus 1 of FIG. 1 is an apparatus capable of recognizing an input speech, which is a speech input to the speech recognition apparatus 1. That is, the speech recognition apparatus 1 is an apparatus that selects, from the vocabularies recognizable by speech recognition, the recognition vocabulary that is acoustically and linguistically most probable for the speech uttered by the user.
  • An example of such an apparatus is disclosed in, for example, Japanese Patent Application Laid-Open No. 9-50291.
  • In the following, the input voice is described as voice data indicating the strength (amplitude) and pitch (frequency) of the voice.
  • the speech recognition apparatus 1 of FIG. 1 includes an acquisition unit 11 and a control unit 12.
  • the acquisition unit 11 acquires a situation related to recognition of input speech.
  • a situation related to recognition of input speech may be referred to as "recognition related situation”.
  • Based on the recognition related situation acquired by the acquisition unit 11, the control unit 12 performs control to use, for recognition of the input speech, at least one speech segment determination unit from among a plurality of predetermined speech segment determination units and at least one speech recognition processing unit from among a plurality of predetermined speech recognition processing units.
  • In the following, the plurality of predetermined speech segment determination units may be referred to simply as "the plurality of speech segment determination units", and the plurality of predetermined speech recognition processing units simply as "the plurality of speech recognition processing units".
  • Each of the plurality of voice segment determination units determines a voice segment that is a period for recognizing voice and a non-voice segment that is a period for not recognizing voice. Note that at least one of the plurality of voice section determination units is provided, for example, in at least one of the voice recognition device 1 and a server that can communicate with the voice recognition device 1 by wireless or the like.
  • Each of the plurality of voice recognition processing units is configured to be able to process the voice in the voice segments determined by the plurality of voice segment determination units. That is, each of the plurality of voice recognition processing units extracts features contained in the voice within the voice segment determined by at least one of the plurality of voice segment determination units, and obtains, based on those features, a vocabulary or word as the recognition vocabulary, which is the recognition result.
  • at least one of the plurality of voice recognition processing units is provided, for example, in at least one of the voice recognition device 1 and a server that can communicate with the voice recognition device 1 by wireless or the like.
  • <Summary of Embodiment 1> According to the speech recognition apparatus 1 of Embodiment 1 described above, at least one speech segment determination unit and at least one speech recognition processing unit are used for recognition of the input speech based on the situation related to recognition of the input speech. This makes it possible to perform speech segment determination and recognition processing suited to the situation related to recognition of the input speech, so the accuracy of speech recognition can be enhanced.
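  • As a rough illustration of the structure just summarized, the following Python sketch pairs an acquisition unit with a control unit that picks one determiner and one processor from predetermined sets. The names SpeechRecognizer, SegmentDeterminer, and RecognitionProcessor, the placeholder situation, and the fallback selection policy are illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# A speech segment determiner returns (start, end) indices of the speech segment.
SegmentDeterminer = Callable[[bytes], Tuple[int, int]]
# A recognition processor turns the audio of a segment into a recognition vocabulary.
RecognitionProcessor = Callable[[bytes], str]

@dataclass
class SpeechRecognizer:
    determiners: Sequence[SegmentDeterminer]    # plurality of predetermined determination units
    processors: Sequence[RecognitionProcessor]  # plurality of predetermined processing units

    def acquire_situation(self) -> dict:
        """Acquisition unit: gather the situation related to recognition (placeholder)."""
        return {"communication": "offline"}

    def select(self, situation: dict) -> Tuple[SegmentDeterminer, RecognitionProcessor]:
        """Control unit: pick at least one determiner and one processor for this input."""
        # Embodiment 1 leaves the policy open; later embodiments fill it in.
        if situation.get("communication") == "offline":
            return self.determiners[0], self.processors[0]
        return self.determiners[-1], self.processors[-1]

    def recognize(self, audio: bytes) -> str:
        determiner, processor = self.select(self.acquire_situation())
        start, end = determiner(audio)
        return processor(audio[start:end])
```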
  • FIG. 2 is a block diagram showing a configuration of the speech recognition device 1 according to Embodiment 2 of the present invention.
  • Components that are the same as or similar to the components described above are designated by the same reference numerals, and different components are mainly described.
  • FIG. 2 also shows a server 6 capable of communicating with the voice recognition device 1 by radio or the like.
  • the speech recognition apparatus 1 of FIG. 2 includes a speech recognition method determination unit 2 and a speech recognition unit 3.
  • The speech recognition method determination unit 2 includes, in addition to an acquisition unit 11 and a control unit 12 corresponding to the acquisition unit 11 and the control unit 12 described in Embodiment 1, a storage unit 13.
  • the acquisition unit 11 acquires the communication status between the speech recognition apparatus 1 and the server 6 as a recognition related status.
  • the communication status selectively includes high quality online, low quality online, and offline.
  • High-quality online is a situation in which communication between the speech recognition device 1 and the server 6 is being performed and the evaluation value of the communication is equal to or greater than a predetermined threshold.
  • The evaluation value of the communication may be, for example, a value that increases as the field strength of the communication increases, a value that increases as the communication speed increases, or a value combining the two.
  • Low-quality online is a situation in which communication between the speech recognition device 1 and the server 6 is being performed and the evaluation value of the communication is less than the threshold.
  • Offline is a situation where communication between the speech recognition device 1 and the server 6 is not performed.
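  • As a rough illustration of the three communication statuses above, the sketch below classifies a connection as high-quality online, low-quality online, or offline. The equal weighting of field strength and communication speed and the threshold value are assumptions; the description only requires some evaluation value compared against a predetermined threshold.

```python
HIGH_QUALITY_ONLINE = "high-quality online"
LOW_QUALITY_ONLINE = "low-quality online"
OFFLINE = "offline"

def communication_status(connected: bool,
                         field_strength: float,
                         speed_mbps: float,
                         threshold: float = 1.0) -> str:
    """Classify the communication with the server into the three statuses."""
    if not connected:
        return OFFLINE
    # Evaluation value grows with field strength and with speed (assumed equal mix).
    evaluation = 0.5 * field_strength + 0.5 * speed_mbps
    return HIGH_QUALITY_ONLINE if evaluation >= threshold else LOW_QUALITY_ONLINE
```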
  • the acquisition unit 11 may appropriately store the acquired communication status in the storage unit 13.
  • The plurality of voice segment determination units 8a and 8b correspond to the plurality of voice segment determination units described in Embodiment 1, and consist of the first voice segment determination unit 8a provided in the voice recognition unit 3 and the second voice segment determination unit 8b provided in the server 6.
  • The plurality of voice recognition processing units 9a and 9b correspond to the plurality of voice recognition processing units described in Embodiment 1, and consist of the first voice recognition processing unit 9a provided in the voice recognition unit 3 and the second voice recognition processing unit 9b provided in the server 6.
  • Based on the communication status acquired by the acquisition unit 11, the control unit 12 determines one pair consisting of a voice segment determination unit and a voice recognition processing unit from among the plurality of voice segment determination units 8a and 8b and the plurality of voice recognition processing units 9a and 9b. The control unit 12 then performs control to use the determined pair of voice segment determination unit and voice recognition processing unit for recognition of the input voice.
  • FIG. 3 is a diagram showing discrimination data according to the second embodiment.
  • In the discrimination data, one pair of voice segment determination unit and voice recognition processing unit is associated with each communication status.
  • In FIG. 3, "1" and "2" of the "voice segment determination unit" indicate the first voice segment determination unit 8a and the second voice segment determination unit 8b, respectively, and "1" and "2" of the "voice recognition processing unit" indicate the first voice recognition processing unit 9a and the second voice recognition processing unit 9b, respectively.
  • For example, when the communication status is low-quality online, the control unit 12 performs control to use the first voice segment determination unit 8a and the second voice recognition processing unit 9b for recognition of the input voice.
  • the voice section determination unit such as the first voice section determination unit 8a and the voice recognition processing unit such as the second voice recognition processing unit 9b may perform a plurality of processes at the same time.
  • the voice recognition unit 3 includes a first voice section determination unit 8a which is an off-line voice section determination unit, a first voice recognition processing unit 9a which is an off-line voice recognition processing unit, and a first recognition dictionary storage unit 10a.
  • the first recognition dictionary storage unit 10a stores a dictionary used when the first speech recognition processing unit 9a performs speech recognition processing.
  • the voice recognition unit 3 appropriately uses the first voice section determination unit 8a and the first voice recognition processing unit 9a for recognition of the input voice based on the control of the control unit 12.
  • the server 6 includes a second speech zone determination unit 8b which is an online speech zone judgment unit, a second speech recognition processing unit 9b which is an online speech recognition processing unit, and a second recognition dictionary storage unit 10b.
  • the second recognition dictionary storage unit 10 b stores a dictionary used when the second speech recognition processing unit 9 b performs a speech recognition process.
  • the server 6 appropriately uses the second voice section determining unit 8b and the second voice recognition processing unit 9b to recognize an input voice based on the control of the control unit 12.
  • In general, because of hardware limitations, the first voice segment determination unit 8a on the speech recognition device 1 side has lower segment determination accuracy than the second voice segment determination unit 8b on the server 6 side, but it can perform the determination regardless of the communication status. Likewise, in general, because of hardware limitations, the first voice recognition processing unit 9a on the speech recognition device 1 side has a smaller number of recognizable vocabularies than the second voice recognition processing unit 9b on the server 6 side, but it can perform the recognition processing regardless of the communication status.
  • the speech recognition apparatus 1 can perform speech recognition suitable for the communication situation as follows.
  • When the communication status is high-quality online, the second voice segment determination unit 8b and the second voice recognition processing unit 9b are used according to the discrimination data of FIG. 3, so highly accurate voice segment determination and speech recognition with a large number of recognizable vocabularies can be performed.
  • When the communication status is low-quality online, the first voice segment determination unit 8a and the second voice recognition processing unit 9b are used according to the discrimination data of FIG. 3. As a result, the voice recognition device 1 performs the voice segment determination and transmits to the server 6 only the input voice used for the voice recognition processing, that is, the input voice from which the data needed only for segment determination has been removed, so that the server 6 can perform online speech recognition. In other words, even when the communication situation is poor, speech recognition by the server 6 can be performed because the amount of input voice data to be transmitted is reduced.
  • When the communication status is offline, the first voice segment determination unit 8a and the first voice recognition processing unit 9a are used according to the discrimination data of FIG. 3, so voice recognition can be performed regardless of the communication status.
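  • The three cases above amount to a small lookup table. The sketch below is one assumed encoding of the FIG. 3 discrimination data, with "1" and "2" indexing the device-side and server-side units as in the description; the dictionary layout and the function name choose_units are illustrative.

```python
# '1' = device-side unit (8a / 9a), '2' = server-side unit (8b / 9b).
DISCRIMINATION_DATA = {
    "high-quality online": {"segment_determiner": 2, "recognition_processor": 2},
    "low-quality online":  {"segment_determiner": 1, "recognition_processor": 2},
    "offline":             {"segment_determiner": 1, "recognition_processor": 1},
}

def choose_units(status: str) -> tuple:
    """Return (segment determiner index, recognition processor index) for a status."""
    entry = DISCRIMINATION_DATA[status]
    return entry["segment_determiner"], entry["recognition_processor"]

# e.g. choose_units("low-quality online") -> (1, 2): determine the segment on the
# device and send only that segment to the server for recognition.
```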
  • FIG. 4 is a flowchart showing the operation of the speech recognition apparatus 1 according to the second embodiment. The operation of FIG. 4 is performed as needed.
  • In step S1, the acquisition unit 11 acquires an input voice and outputs the input voice to the control unit 12.
  • In step S2, the control unit 12 determines whether the intensity of the input voice is equal to or greater than a predetermined threshold. If it is determined that the intensity of the input voice is equal to or greater than the threshold, the process proceeds to step S3. If it is determined that the intensity of the input voice is less than the threshold, the operation of FIG. 4 ends.
  • In step S3, the acquisition unit 11 acquires the communication status between the speech recognition device 1 and the server 6 and outputs the communication status to the control unit 12.
  • In step S4, the control unit 12 determines, according to the discrimination data and based on the communication status, one pair of voice segment determination unit and voice recognition processing unit to be used for recognition of the input voice. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
  • In step S5, the speech recognition unit 3 and the server 6 recognize the input speech using the pair of speech segment determination unit and speech recognition processing unit determined by the control unit 12. The recognition vocabulary, which is the recognition result, is then output from the speech recognition device 1. Thereafter, the operation of FIG. 4 ends.
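  • A compact sketch of the FIG. 4 flow (steps S1 to S5) follows. It reuses the choose_units lookup sketched above; device, server, and the helpers acquire_input_voice, intensity, get_communication_status, and run_recognition are hypothetical stand-ins for parts the description leaves unspecified, and the intensity threshold value is an assumption.

```python
def recognition_cycle(device, server, intensity_threshold: float = 0.1):
    voice = device.acquire_input_voice()                   # S1: acquire input voice
    if device.intensity(voice) < intensity_threshold:      # S2: too quiet, end here
        return None
    status = device.get_communication_status(server)       # S3: acquire communication status
    determiner, processor = choose_units(status)           # S4: look up the pair to use
    return device.run_recognition(voice, determiner, processor, server)  # S5: recognize
```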
  • FIG. 5 is a block diagram showing the configuration of the speech recognition apparatus 1 according to Embodiment 3 of the present invention.
  • Components that are the same as or similar to the components described above are designated by the same reference numerals, and different components are mainly described.
  • the speech recognition apparatus 1 of FIG. 5 includes a speech recognition method determination unit 2 and a speech recognition unit 3 as in the speech recognition apparatus 1 (FIG. 2) according to the second embodiment.
  • A recognition start button 21, described later, is connected to the speech recognition apparatus 1 of FIG. 5.
  • The speech recognition method determination unit 2 includes an acquisition unit 11, a control unit 12, and a storage unit 13, like the speech recognition method determination unit 2 according to Embodiment 2.
  • the acquisition unit 11 acquires the communication status between the speech recognition apparatus 1 and the server 6 as a recognition related situation.
  • the acquisition unit 11 also acquires, as a recognition related situation, the riding condition of the occupant of the host vehicle on which the voice recognition device 1 is mounted.
  • the riding condition includes the presence or absence of the driver and the presence or absence of the passenger who is a passenger other than the driver.
  • For example, the acquisition unit 11 may be configured to determine the riding condition based on whether the seat belts of the own vehicle are in use, or, if a sensor for detecting seating is provided in each seat of the own vehicle, based on the detection results of those sensors.
  • the acquisition unit 11 may be configured to capture an image of the interior of the host vehicle and to determine the riding condition by performing image recognition on the image.
  • the acquisition unit 11 may be an interface that acquires the result of the determination from an external device provided outside the speech recognition apparatus 1. The acquisition unit 11 may appropriately store the acquired communication status and boarding status in the storage unit 13.
  • Based on the communication status and the riding condition acquired by the acquisition unit 11, the control unit 12 determines one pair of voice segment determination unit and voice recognition processing unit from among a plurality of predetermined voice segment determination units 8c, 8d, 8e, and 8f and a plurality of predetermined voice recognition processing units 9c, 9d, 9e, and 9f. The control unit 12 then performs control to use the determined pair of voice segment determination unit and voice recognition processing unit for recognition of the input voice.
  • The plurality of voice segment determination units 8c to 8f consist of the first voice segment determination unit 8c and the second voice segment determination unit 8d provided in the voice recognition unit 3, and the third voice segment determination unit 8e and the fourth voice segment determination unit 8f provided in the server 6.
  • The plurality of voice recognition processing units 9c to 9f consist of the first voice recognition processing unit 9c and the second voice recognition processing unit 9d provided in the voice recognition unit 3, and the third voice recognition processing unit 9e and the fourth voice recognition processing unit 9f provided in the server 6.
  • FIG. 6 is a diagram showing discrimination data according to the third embodiment.
  • In the discrimination data, one pair of voice segment determination unit and voice recognition processing unit is associated with each combination of the presence or absence of a fellow passenger, the presence or absence of the driver, and the communication status.
  • In FIG. 6, "1" to "4" of the "voice segment determination unit" indicate the first voice segment determination unit 8c to the fourth voice segment determination unit 8f, respectively, and "1" to "4" of the "voice recognition processing unit" indicate the first voice recognition processing unit 9c to the fourth voice recognition processing unit 9f, respectively.
  • The voice recognition unit 3 includes a first voice segment determination unit 8c, which is an always-offline voice segment determination unit, a first voice recognition processing unit 9c, which is an always-offline voice recognition processing unit, a first recognition dictionary storage unit 10c, a second voice segment determination unit 8d, which is a normal offline voice segment determination unit, a second voice recognition processing unit 9d, which is a normal offline voice recognition processing unit, and a second recognition dictionary storage unit 10d.
  • the first and second recognition dictionary storage units 10c and 10d respectively store dictionaries used when the first and second speech recognition processing units 9c and 9d perform speech recognition processing.
  • the voice recognition unit 3 appropriately uses the first and second voice section determination units 8c and 8d and the first and second voice recognition processing units 9c and 9d based on the control of the control unit 12 for recognition of the input voice. .
  • The server 6 includes a third voice segment determination unit 8e, which is an always-online voice segment determination unit, a third voice recognition processing unit 9e, which is an always-online voice recognition processing unit, a third recognition dictionary storage unit 10e, a fourth voice segment determination unit 8f, which is a normal online voice segment determination unit, a fourth voice recognition processing unit 9f, which is a normal online voice recognition processing unit, and a fourth recognition dictionary storage unit 10f.
  • the third and fourth recognition dictionary storage units 10e and 10f respectively store dictionaries used when the third and fourth speech recognition processing units 9e and 9f perform speech recognition processing.
  • the server 6 appropriately uses the third and fourth speech zone determination units 8e and 8f and the third and fourth speech recognition processing units 9e and 9f for recognition of the input speech based on the control of the control unit 12.
  • As with the speech recognition apparatus 1 according to Embodiment 2, the speech recognition apparatus 1 can perform speech recognition suited to the communication situation.
  • The first and third voice segment determination units 8c and 8e are voice segment determination units that constantly determine voice segments, whereas the second and fourth voice segment determination units 8d and 8f are voice segment determination units that determine voice segments in response to a predetermined operation.
  • Similarly, the first and third voice recognition processing units 9c and 9e are voice recognition processing units that constantly recognize and process voice, whereas the second and fourth voice recognition processing units 9d and 9f are voice recognition processing units that recognize and process voice in response to a predetermined operation.
  • the predetermined operation is an operation on the recognition start button 21 for starting voice recognition.
  • the speech recognition apparatus 1 can perform speech recognition suitable for the riding situation as follows.
  • For example, when a fellow passenger is riding in the vehicle, the third voice recognition processing unit 9e, which constantly performs recognition, is used according to the discrimination data of FIG. 6.
  • As a result, a passenger who desires voice recognition can have voice recognized without operating the recognition start button 21. This is convenient for the passenger, especially when the recognition start button 21 is provided at a place away from the passenger.
  • On the other hand, when no fellow passenger is riding, the second voice segment determination unit 8d or the fourth voice segment determination unit 8f, and the second voice recognition processing unit 9d or the fourth voice recognition processing unit 9f, are used according to the discrimination data of FIG. 6. This can suppress unintended voice recognition.
  • Note that the discrimination data is not limited to the settings shown in FIG. 6.
  • For example, when no fellow passenger is riding and the driver is riding, it may be desirable for the driver to concentrate on driving. Therefore, when no fellow passenger is riding and the driver is riding, the first voice segment determination unit 8c or the third voice segment determination unit 8e, and the first voice recognition processing unit 9c or the third voice recognition processing unit 9e, may be used.
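  • Since the full twelve-row table of FIG. 6 is not reproduced in the text, the sketch below reconstructs only the combinations discussed above as a lookup keyed by (fellow passenger present, driver present, communication status). The concrete unit numbers assigned to each row are assumptions for illustration; only the always-on versus operation-triggered split follows the description.

```python
# Key: (fellow passenger present, driver present, communication status).
# Unit numbers: 1/3 = always-on (offline/online), 2/4 = operation-triggered.
EMBODIMENT3_DATA = {
    (True,  True, "high-quality online"): {"segment_determiner": 3, "recognition_processor": 3},
    (False, True, "high-quality online"): {"segment_determiner": 4, "recognition_processor": 4},
    (False, True, "offline"):             {"segment_determiner": 2, "recognition_processor": 2},
}

def choose_units_embodiment3(passenger: bool, driver: bool, status: str) -> tuple:
    entry = EMBODIMENT3_DATA[(passenger, driver, status)]
    return entry["segment_determiner"], entry["recognition_processor"]
```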
  • FIG. 7 is a flowchart showing the operation of the speech recognition apparatus 1 according to the third embodiment.
  • The operation in FIG. 7 is the same as the operation (FIG. 4) of the speech recognition apparatus 1 according to Embodiment 2, except that step S3 is changed to step S3a.
  • In steps S1 and S2, processing similar to that in steps S1 and S2 of FIG. 4 is performed.
  • In step S3a, the acquisition unit 11 acquires the communication status between the speech recognition device 1 and the server 6 and outputs the communication status to the control unit 12. The acquisition unit 11 also acquires the riding condition of the occupants of the host vehicle on which the voice recognition device 1 is mounted, and outputs the acquired riding condition to the control unit 12.
  • In step S4, the control unit 12 determines, according to the discrimination data and based on the communication status and the riding condition, one pair of voice segment determination unit and voice recognition processing unit to be used for recognition of the input voice. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
  • In step S5, the same processing as step S5 of FIG. 4 is performed, and the operation of FIG. 7 ends.
  • In the above description, the control unit 12 determines one pair of voice segment determination unit and voice recognition processing unit in a single determination step.
  • However, the present invention is not limited to this, and the control unit 12 may determine one pair of voice segment determination unit and voice recognition processing unit through multiple determination steps. For example, based on one of the communication status and the boarding status, the control unit 12 may first narrow the plurality of voice segment determination units and the plurality of voice recognition processing units down to several candidates. Then, based on the other of the communication status and the boarding status, the control unit 12 may determine one voice segment determination unit and one voice recognition processing unit from among those candidates. The same applies to Embodiment 4 and the subsequent embodiments described later.
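  • The multi-step determination just described might look like the following sketch, assuming the communication status first splits the candidates into device-side units (1, 2) and server-side units (3, 4), and the boarding status then picks the always-on or operation-triggered member of that pair. Both the ordering of the two steps and the concrete indices are assumptions.

```python
def choose_units_two_stage(status: str, passenger: bool) -> tuple:
    # Stage 1: communication status narrows the candidates to one side.
    always_on, triggered = (3, 4) if status == "high-quality online" else (1, 2)
    # Stage 2: boarding status picks one unit out of the remaining candidates.
    chosen = always_on if passenger else triggered
    return chosen, chosen   # same index used for determiner and processor
```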
  • the speech segment determination unit and the speech recognition processing unit suitable for the communication status and the boarding status can be used for recognition of input speech.
  • In Embodiment 3, the control unit 12 performs control to use the voice segment determination unit and the voice recognition processing unit for recognition of the input voice based on both the communication status and the boarding status acquired by the acquisition unit 11.
  • However, the present invention is not limited to this; the control unit 12 may perform control to use the voice segment determination unit and the voice recognition processing unit for recognition of the input voice based on the boarding status acquired by the acquisition unit 11, without considering the communication status acquired by the acquisition unit 11.
  • FIG. 8 is a block diagram showing a configuration of a speech recognition device 1 according to a fourth embodiment of the present invention.
  • Among the constituent elements described in Embodiment 4, those that are the same as or similar to the constituent elements described above are given the same reference numerals, and different constituent elements are mainly described.
  • The speech recognition apparatus 1 shown in FIG. 8 has the same configuration as the speech recognition apparatus 1 (FIG. 5) according to Embodiment 3, except that an input device 22 is connected to it.
  • To the input device 22, an operation for designating the pair of speech segment determination unit and speech recognition processing unit to be used (hereinafter referred to as a "designation operation") is input.
  • The speech recognition method determination unit 2 includes an acquisition unit 11, a control unit 12, and a storage unit 13, like the speech recognition method determination unit 2 according to Embodiment 3.
  • the acquiring unit 11 acquires the communication status and the boarding status as the recognition related status.
  • the designation operation input to the input device 22 is stored in the storage unit 13, and the acquisition unit 11 acquires the history of the designation operation (hereinafter referred to as “operation history”) from the storage unit 13.
  • When a designation operation is input, the control unit 12 performs control to use, for recognition of the input voice, the voice segment determination unit and the voice recognition processing unit designated by the designation operation from among the plurality of voice segment determination units 8c to 8f and the plurality of voice recognition processing units 9c to 9f.
  • In addition, the control unit 12 determines one pair of voice segment determination unit and voice recognition processing unit from among the plurality of voice segment determination units 8c to 8f and the plurality of voice recognition processing units 9c to 9f based on the communication status, the boarding status, and the operation history acquired by the acquisition unit 11. The control unit 12 then performs control to use the determined pair of voice segment determination unit and voice recognition processing unit for recognition of the input voice.
  • For example, the control unit 12 determines whether the number of times the same pair of voice segment determination unit and voice recognition processing unit has been designated is equal to or greater than a predetermined threshold. When it is determined that the number of times is equal to or greater than the threshold, the control unit 12 performs control to use the pair of voice segment determination unit and voice recognition processing unit designated by the designation operations in the operation history for recognition of the input voice. On the other hand, when it is determined that the number of times is less than the threshold, the control unit 12 performs control to use a pair of voice segment determination unit and voice recognition processing unit determined based on the communication status and the boarding status acquired by the acquisition unit 11, as in Embodiment 3, for recognition of the input voice.
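  • A sketch of this history-based policy follows. The operation history is assumed to be a list of previously designated (determiner, processor) pairs, the threshold of three designations is an arbitrary example, and the fallback reuses the Embodiment 3 lookup sketched earlier.

```python
from collections import Counter

def choose_units_embodiment4(history, passenger, driver, status, threshold=3):
    """history: list of (determiner, processor) pairs designated so far."""
    if history:
        pair, count = Counter(history).most_common(1)[0]
        if count >= threshold:          # the same pair was designated often enough
            return pair
    # Otherwise fall back to the communication/boarding based selection (Embodiment 3).
    return choose_units_embodiment3(passenger, driver, status)
```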
  • FIG. 9 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fourth embodiment.
  • The operation of FIG. 9 is the same as the operation (FIG. 4) of the speech recognition apparatus 1 according to Embodiment 2, except that step S3 is changed to step S3b and steps S11 and S12 are added.
  • In step S11, the control unit 12 determines whether a designation operation has been input to the input device 22. If it is determined that a designation operation has been input, the process proceeds to step S12. If it is determined that no designation operation has been input, the process proceeds to step S1.
  • In step S12, the control unit 12 performs control to use the pair of voice segment determination unit and voice recognition processing unit designated by the designation operation for recognition of the input voice.
  • The speech recognition unit 3 and the server 6 then recognize the input speech using the pair of speech segment determination unit and speech recognition processing unit designated by the designation operation. Thereafter, the operation of FIG. 9 ends.
  • When the process proceeds from step S11 to step S1, processing similar to that in steps S1 and S2 of FIG. 4 is performed in steps S1 and S2.
  • In step S3b, the acquisition unit 11 acquires the communication status, the boarding status, and the operation history, and outputs them to the control unit 12.
  • In step S4, the control unit 12 determines, according to the discrimination data and based on the communication status, the boarding status, and the operation history, one pair of voice segment determination unit and voice recognition processing unit to be used for recognition of the input voice. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
  • In step S5, the same processing as step S5 of FIG. 4 is performed, and the operation of FIG. 9 ends.
  • <Summary of Embodiment 4> According to the speech recognition apparatus 1 of Embodiment 4 described above, one pair of speech segment determination unit and speech recognition processing unit is used for recognition of the input speech based on the recognition related situation and the operation history. As a result, a speech segment determination unit and a speech recognition processing unit that match the user's usage tendency can be used for recognition of the input speech.
  • FIG. 10 is a block diagram showing the configuration of the speech recognition device 1 according to the fifth embodiment of the present invention.
  • Components that are the same as or similar to the components described above are designated by the same reference numerals, and different components are mainly described.
  • The speech recognition method determination unit 2 includes an acquisition unit 11, a control unit 12, and a storage unit 13, like the speech recognition method determination unit 2 according to Embodiment 3.
  • the acquiring unit 11 acquires the communication status and the boarding status as the recognition related status.
  • the acquisition unit 11 also acquires the recognition result of the speech recognition unit 3 and the recognition result of the server 6. Then, the acquisition unit 11 determines, based on the acquired recognition result, whether the voice input in the past, that is, the input voice in the past includes a predetermined command.
  • the configuration is not limited to this as long as the acquisition unit 11 is configured to acquire the determination result as to whether or not a predetermined command is included in the past input voice. For example, this determination may be performed by an external device provided outside the voice recognition device 1, and the acquisition unit 11 may be configured to acquire the determination result from the external device.
  • The predetermined commands include upper-level commands, which are commands for starting execution of a function.
  • The upper-level commands include, for example, a command such as "Navigation" for starting execution of a navigation function capable of destination search, route search, and the like.
  • The lower-level commands, which are commands other than the upper-level commands, include, for example, the command "destination search" for executing a destination search and the command "route search" for executing a route search.
  • Based on the communication status, the boarding status, and the determination result acquired by the acquisition unit 11, the control unit 12 performs control to use one pair of speech segment determination unit and speech recognition processing unit, selected from among the plurality of speech segment determination units 8c to 8f and the plurality of speech recognition processing units 9c to 9f, for recognition of the input speech.
  • FIG. 11 is a diagram showing part of the discrimination data according to the fifth embodiment.
  • In FIG. 11, one of the 12 combinations shown in FIG. 6 is further divided according to whether the determination result includes an upper-level command or a lower-level command, and a voice segment determination unit and a voice recognition processing unit are set for each case.
  • The voice segment determination unit and the voice recognition processing unit are set in the same way for the other 11 combinations.
  • In the example of FIG. 11, the constantly operating ("always") voice segment determination unit and voice recognition processing unit are set as the units used when the determination result includes an upper-level command, and the operation-triggered ("normal") voice segment determination unit and voice recognition processing unit are set as the units used when the determination result includes a lower-level command.
  • With discrimination data such as that shown in FIG. 11, constant voice recognition is performed only for the relatively small number of upper-level commands, so voice recognition contrary to the user's intention can be suppressed. Note that the user may input the voices of a plurality of lower-level commands following the voice of an upper-level command; therefore, once the determination result is found to include an upper-level command, the determination result that a lower-level command is included may be invalidated for a certain period of time.
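  • The command-based split can be sketched as below. "Navigation", "destination search", and "route search" are the examples given in the text; the ten-second grace period, during which a lower-level determination is still treated like the preceding upper-level one, is an assumed value.

```python
import time

UPPER_COMMANDS = {"Navigation"}                              # start execution of a function
LOWER_COMMANDS = {"destination search", "route search"}

GRACE_SECONDS = 10.0             # assumed period after an upper-level command
_last_upper_time = float("-inf")

def units_for_command(vocabulary: str) -> str:
    """Return which kind of units to keep using after this recognition result."""
    global _last_upper_time
    now = time.monotonic()
    if vocabulary in UPPER_COMMANDS:
        _last_upper_time = now
        return "always-on"
    if vocabulary in LOWER_COMMANDS and now - _last_upper_time <= GRACE_SECONDS:
        return "always-on"       # lower-level determination invalidated for a while
    return "operation-triggered"
```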
  • FIG. 12 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fifth embodiment.
  • The operation in FIG. 12 is the same as the operation (FIG. 4) of the speech recognition apparatus 1 according to Embodiment 2, except that step S3 is changed to step S3c.
  • In steps S1 and S2, processing similar to that in steps S1 and S2 of FIG. 4 is performed.
  • In step S3c, the acquisition unit 11 acquires the communication status, the boarding status, and the determination result regarding the upper-level command, and outputs them to the control unit 12.
  • In step S4, the control unit 12 determines, according to the discrimination data and based on the communication status, the boarding status, and the determination result, one pair of voice segment determination unit and voice recognition processing unit to be used for recognition of the input voice. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
  • In step S5, the same processing as step S5 of FIG. 4 is performed, and the operation of FIG. 12 ends.
  • <Summary of Embodiment 5> According to the speech recognition device 1 of Embodiment 5 described above, one pair of speech segment determination unit and speech recognition processing unit is used for recognition of the input speech based on the recognition related situation and the determination result of whether an upper-level command is included. As a result, a speech segment determination unit and a speech recognition processing unit that match the user's usage tendency can be used for recognition of the input speech.
  • Embodiment 6 The block configuration of the speech recognition device 1 according to the sixth embodiment of the present invention is the same as the block configuration (FIG. 5) of the speech recognition device 1 according to the third embodiment.
  • Components that are the same as or similar to the components described above are designated by the same reference numerals, and different components are mainly described.
  • The speech recognition method determination unit 2 includes an acquisition unit 11, a control unit 12, and a storage unit 13, like the speech recognition method determination unit 2 according to Embodiment 3.
  • the acquiring unit 11 acquires the communication status and the boarding status as the recognition related status.
  • the acquisition unit 11 also acquires the usage state of the hardware of the speech recognition device 1.
  • hardware of the speech recognition device 1 is hardware of a device such as a navigation device to which the speech recognition device 1 is applied, and the usage state of the hardware includes the usage rate of the hardware.
  • the usage rate of hardware includes, for example, a usage rate of a central processing unit (CPU), a usage rate of a memory, and the like.
  • Based on the communication status, the boarding status, and the usage state acquired by the acquisition unit 11, the control unit 12 performs control to use one pair of voice segment determination unit and voice recognition processing unit, selected from among the plurality of voice segment determination units 8c to 8f and the plurality of voice recognition processing units 9c to 9f, for recognition of the input voice.
  • the storage unit 13 stores discrimination data used when the control unit 12 performs discrimination.
  • FIG. 13 is a diagram showing a part of discrimination data according to the sixth embodiment.
  • In FIG. 13, one of the 12 combinations shown in FIG. 6 is further divided according to whether the usage rate included in the usage state is equal to or greater than a predetermined threshold, and a voice segment determination unit and a voice recognition processing unit are set for each case; the voice segment determination unit and the voice recognition processing unit are set in the same way for the other 11 combinations.
  • In the example of FIG. 13, the operation-triggered ("normal") voice segment determination unit and voice recognition processing unit are set as the units used when the usage rate included in the usage state is equal to or greater than the predetermined threshold, and the constantly operating ("always") voice segment determination unit and voice recognition processing unit are set as the units used when the usage rate is less than the threshold.
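  • The load-based split above can be sketched as follows. Aggregating the CPU and memory usage rates with max and the threshold of 80 percent are assumptions; the description only requires comparing a usage rate with a predetermined threshold.

```python
def units_for_load(cpu_percent: float, memory_percent: float, threshold: float = 80.0) -> str:
    usage = max(cpu_percent, memory_percent)   # illustrative aggregation of the usage rates
    return "operation-triggered" if usage >= threshold else "always-on"
```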
  • FIG. 14 is a flowchart showing the operation of the speech recognition apparatus 1 according to the sixth embodiment.
  • The operation in FIG. 14 is the same as the operation (FIG. 4) of the speech recognition apparatus 1 according to Embodiment 2, except that step S3 is changed to step S3d.
  • In steps S1 and S2, processing similar to that in steps S1 and S2 of FIG. 4 is performed.
  • In step S3d, the acquisition unit 11 acquires the communication status, the boarding status, and the usage state of the hardware, and outputs them to the control unit 12.
  • In step S4, the control unit 12 determines, according to the discrimination data and based on the communication status, the boarding status, and the usage state of the hardware, one pair of voice segment determination unit and voice recognition processing unit to be used for recognition of the input voice. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.
  • In step S5, the same processing as step S5 of FIG. 4 is performed, and the operation of FIG. 14 ends.
  • <Summary of Embodiment 6> According to the speech recognition apparatus 1 of Embodiment 6 described above, one pair of speech segment determination unit and speech recognition processing unit is used for recognition of the input speech based on the recognition related situation and the usage state of the hardware. As a result, a speech segment determination unit and a speech recognition processing unit suited to the usage state of the hardware can be used for recognition of the input speech.
  • In Embodiment 2, the control unit 12 performs control to use one pair of speech segment determination unit and speech recognition processing unit for recognition of the input speech based on the communication status acquired by the acquisition unit 11, but the present invention is not limited to this. For example, as illustrated in FIG. 15, when the communication status acquired by the acquisition unit 11 is low-quality online, the control unit 12 may perform control to use the first speech segment determination unit 8a and the second speech segment determination unit 8b in parallel, and the first speech recognition processing unit 9a and the second speech recognition processing unit 9b in parallel. That is, based on the communication status acquired by the acquisition unit 11, the control unit 12 may perform control to use a combination of a plurality of speech segment determination units and a combination of a plurality of speech recognition processing units for recognition of the input speech.
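  • A minimal sketch of such parallel use follows, assuming the device-side and server-side pipelines run concurrently and the first result to arrive is adopted. The first-result policy and the timeout are assumptions; the description only states that the units may be used in parallel.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def recognize_in_parallel(audio, device_pipeline, server_pipeline, timeout=5.0):
    pool = ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(device_pipeline, audio),    # e.g. 8a + 9a on the device
               pool.submit(server_pipeline, audio)]    # e.g. 8b + 9b on the server
    done, _ = wait(futures, timeout=timeout, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)                          # let the slower pipeline finish on its own
    return next(iter(done)).result() if done else None
```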
  • In other words, the control unit 12 may perform control to use at least one speech segment determination unit and at least one speech recognition processing unit for recognition of the input speech. The same applies to Embodiments 3 to 6 and to the designation operation of Embodiment 4.
  • Embodiments 4 to 6 may be combined as appropriate. That is, the acquisition unit 11 may acquire the recognition related situation, and may acquire at least one of the operation history, the determination result of whether or not the upper-level command is included, and the usage state of the hardware. . Then, the control unit 12 performs at least one of the recognition related situation acquired by the acquisition unit 11 and at least one of the operation history acquired by the acquisition unit 11, the determination result, and the use state. Control may be performed in which one voice segment determination unit and at least one voice recognition processing unit are used to recognize an input voice.
  • Hereinafter, the acquisition unit 11 and the control unit 12 of FIG. 1 in the above-described speech recognition apparatus 1 are referred to as "the acquisition unit 11 and the like".
  • The acquisition unit 11 and the like are realized by a processing circuit 81 shown in the hardware configuration diagram. That is, the processing circuit 81 includes the acquisition unit 11 that acquires the recognition related situation, and the control unit 12 that, based on the recognition related situation acquired by the acquisition unit 11, performs control to use at least one speech segment determination unit from among a plurality of predetermined speech segment determination units and at least one speech recognition processing unit from among a plurality of predetermined speech recognition processing units for recognition of the input speech.
  • Dedicated hardware may be applied to the processing circuit 81, or a processor that executes a program stored in a memory may be applied.
  • the processor corresponds to, for example, a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor), and the like.
  • The processing circuit 81 may be, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination thereof.
  • Each function of each unit such as the acquisition unit 11 may be realized by a circuit in which processing circuits are dispersed, or the function of each unit may be realized by one processing circuit.
  • When the processing circuit 81 is a processor, the functions of the acquisition unit 11 and the like are realized by a combination with software or the like.
  • the software and the like correspond to, for example, software, firmware, or software and firmware.
  • Software and the like are described as a program and stored in the memory 83.
  • The processor 82 applied to the processing circuit 81 implements the functions of the respective units by reading and executing the program stored in the memory 83. That is, the speech recognition device 1 includes the memory 83 for storing a program that, when executed by the processing circuit 81, results in acquiring the recognition related situation and, based on the acquired recognition related situation, performing control to use at least one speech segment determination unit from among a plurality of predetermined speech segment determination units and at least one speech recognition processing unit from among a plurality of predetermined speech recognition processing units for recognition of the input speech.
  • this program causes a computer to execute the procedure and method of the acquisition unit 11 and the like.
  • The memory 83 may be, for example, a non-volatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory), an HDD (Hard Disk Drive), a magnetic disk, a flexible disk, an optical disc, a compact disc, a mini disc, a DVD (Digital Versatile Disc), a drive device thereof, or any storage medium to be used in the future.
  • However, the present invention is not limited to this; a part of the acquisition unit 11 and the like may be realized by dedicated hardware, and another part may be realized by software or the like.
  • For example, the function of the acquisition unit 11 can be realized by the processing circuit 81 as dedicated hardware, such as a receiver, while the functions of the remaining units can be realized by the processing circuit 81 as the processor 82 reading and executing the program stored in the memory 83.
  • the processing circuit 81 can realize each of the functions described above by hardware, software, etc., or a combination thereof.
  • The voice recognition device 1 described above can also be applied to a voice recognition system constructed as a system by appropriately combining a navigation device such as a PND (Portable Navigation Device), a communication terminal including a portable terminal such as a mobile phone, a smartphone, or a tablet, the functions of applications installed on at least one of the navigation device and the communication terminal, and a server.
  • In this case, each function or each component of the speech recognition apparatus 1 described above may be distributed among the devices constituting the system, or may be concentrated in any one of those devices.
  • FIG. 18 is a block diagram showing the configuration of the server 91 according to the present modification.
  • the server 91 of FIG. 18 includes a communication unit 91a and a control unit 91b, and can perform wireless communication with the navigation device 93 of the vehicle 92.
  • the communication unit 91a which is an acquisition unit, wirelessly communicates with the navigation device 93 to receive the recognition related situation.
  • The control unit 91b has a function similar to that of the control unit 12 of FIG. 1, realized by a processor (not shown) of the server 91 executing a program stored in a memory (not shown) of the server 91, or the like. That is, the control unit 91b determines at least one voice segment determination unit and at least one voice recognition processing unit based on the recognition related situation received by the communication unit 91a, and transmits the determination result to the navigation device 93.
  • Even with the server 91 configured in this way, the same effect as that of the speech recognition device 1 described in Embodiment 1 can be obtained.
  • FIG. 19 is a block diagram showing a configuration of communication terminal 96 according to the present modification.
  • The communication terminal 96 of FIG. 19 includes a communication unit 96a similar to the communication unit 91a and a control unit 96b similar to the control unit 91b, and can communicate wirelessly with a navigation device 98 of a vehicle 97.
  • As the communication terminal 96, a portable terminal carried by the driver of the vehicle 97, such as a mobile phone, a smartphone, or a tablet, is applied, for example.
  • communication terminal 96 configured as described above, the same effect as that of speech recognition device 1 described in Embodiment 1 can be obtained.
  • In the present invention, the embodiments and the modifications can be freely combined, and each embodiment and each modification can be modified or omitted as appropriate.
  • Reference Signs List: 1 voice recognition device, 6 server, 8a to 8f voice segment determination unit, 9a to 9f voice recognition processing unit, 11 acquisition unit, 12 control unit, 21 recognition start button.

Abstract

An object of the present invention is to provide a technique capable of improving the accuracy of speech recognition. This speech recognition device is provided with an acquisition unit and a control unit. Based on a situation acquired by the acquisition unit, the control unit performs control so as to use, for recognition of input speech, at least one speech segment determination unit from among a plurality of predetermined speech segment determination units capable of determining a speech segment, which is a period for speech recognition, and at least one speech recognition processing unit from among a plurality of predetermined speech recognition processing units capable of performing processing for recognizing the speech in the speech segment determined by the speech segment determination units.
PCT/JP2017/026450 2017-07-21 2017-07-21 Dispositif et procédé de reconnaissance de la parole WO2019016938A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/026450 WO2019016938A1 (fr) 2017-07-21 2017-07-21 Dispositif et procédé de reconnaissance de la parole

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/026450 WO2019016938A1 (fr) 2017-07-21 2017-07-21 Dispositif et procédé de reconnaissance de la parole

Publications (1)

Publication Number Publication Date
WO2019016938A1 true WO2019016938A1 (fr) 2019-01-24

Family

ID=65015595

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/026450 WO2019016938A1 (fr) 2017-07-21 2017-07-21 Dispositif et procédé de reconnaissance de la parole

Country Status (1)

Country Link
WO (1) WO2019016938A1 (fr)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09127982A (ja) * 1995-10-27 1997-05-16 Nec Robotics Eng Ltd Speech recognition device
JP2005031758A (ja) * 2003-07-07 2005-02-03 Canon Inc Speech processing apparatus and method
WO2011148594A1 (fr) * 2010-05-26 2011-12-01 NEC Corporation Speech recognition system, speech acquisition terminal, speech recognition distribution method, and speech recognition program
WO2013005248A1 (fr) * 2011-07-05 2013-01-10 Mitsubishi Electric Corporation Voice recognition device and navigation device
JP2014186295A (ja) * 2013-02-21 2014-10-02 Nippon Telegraph & Telephone Corp (NTT) Voice activity detection device, speech recognition device, method thereof, and program
JP2015200860A (ja) * 2014-04-01 2015-11-12 SoftBank Corp Dictionary database management device, API server, dictionary database management method, and dictionary database management program

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017151210A (ja) * 2016-02-23 2017-08-31 NTT TechnoCross Corporation Information processing device, speech recognition method, and program
US20210383808A1 (en) * 2019-02-26 2021-12-09 Preferred Networks, Inc. Control device, system, and control method
CN112802471A (zh) * 2020-12-31 2021-05-14 Beijing Wutong Chelian Technology Co., Ltd. Voice zone switching method, apparatus, device, and storage medium
CN112802471B (zh) * 2020-12-31 2024-01-23 Beijing Wutong Chelian Technology Co., Ltd. Voice zone switching method, apparatus, device, and storage medium

Similar Documents

Publication Publication Date Title
US10706853B2 (en) Speech dialogue device and speech dialogue method
US9230538B2 (en) Voice recognition device and navigation device
CN106209138B (zh) 一种车辆谨慎紧急响应系统及方法
WO2013005248A1 (fr) Dispositif de reconnaissance vocale et dispositif de navigation
JP4260788B2 (ja) 音声認識機器制御装置
US10176806B2 (en) Motor vehicle operating device with a correction strategy for voice recognition
CN104978015B (zh) 具有语种自适用功能的导航系统及其控制方法
EP2963644A1 (fr) Système et procédé de détermination d'intention à commande audio
CN105355202A (zh) 语音识别装置、具有语音识别装置的车辆及其控制方法
US20200160861A1 (en) Apparatus and method for processing voice commands of multiple talkers
WO2019016938A1 (fr) Dispositif et procédé de reconnaissance de la parole
US20200211562A1 (en) Voice recognition device and voice recognition method
US20190130908A1 (en) Speech recognition device and method for vehicle
US20130013310A1 (en) Speech recognition system
US10276180B2 (en) Audio command adaptive processing system and method
JP6459330B2 (ja) 音声認識装置、音声認識方法、及び音声認識プログラム
JP2016133378A (ja) カーナビゲーション装置
JP4770374B2 (ja) 音声認識装置
US10522141B2 (en) Vehicle voice recognition including a wearable device
JP2005084589A (ja) 音声認識装置
JP2019191477A (ja) 音声認識装置及び音声認識方法
JP2005084590A (ja) 音声認識装置
JP2008309865A (ja) 音声認識装置および音声認識方法
JP2008145676A (ja) 音声認識装置及び車両ナビゲーション装置
JP2018124805A (ja) 車載情報端末及び情報検索プログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17918190

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17918190

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP