WO2019016938A1

WO2019016938A1 - Speech recognition device and speech recognition method

Info

Publication number: WO2019016938A1
Application number: PCT/JP2017/026450
Authority: WO
Inventors: 昭男堀井
Original assignee: 三菱電機株式会社
Priority date: 2017-07-21
Filing date: 2017-07-21
Publication date: 2019-01-24

Abstract

The purpose of the present invention is to provide a technique capable of enhancing the accuracy of speech recognition. This speech recognition device is provided with an acquisition unit and a control unit. On the basis of a status acquired by the acquisition unit, the control unit carries out control so as to use, for recognition of input speech, at least one among a plurality of speech segment determination units that are determined in advance so as to be able to determine a speech segment which is a period for recognizing speech and at least one among a plurality of speech recognition processing units that are determined in advance so as to be able to carry out a process for recognizing the speech in the speech segment determined by the speech segment determination units.

Description

Speech recognition apparatus and speech recognition method

The present invention relates to a speech recognition apparatus capable of recognizing an input speech which is an input speech, and a speech recognition method in the speech recognition apparatus.

In recent years, various techniques have been proposed for a speech recognition apparatus that recognizes speech. For example, Patent Document 1 proposes a technique for switching between a push talk mode and a hands free mode based on the position of a speaker in a vehicle cabin. The push-to-talk mode is a mode for recognizing voice when the button switch is pressed, and the hands-free mode is a mode for recognizing voice regardless of the pressing of the button switch.

JP 2001-42894 A

The speech recognition apparatus performs speech recognition processing on the speech in the speech segment after determining the speech segment which is a speech recognition period and the non-speech segment which is a period in which speech is not recognized. However, in the prior art, there is a problem that the accuracy of the speech recognition is not sufficient because the processing is not switched from the plurality of speech segment determinations and the plurality of speech recognition processing to the appropriate processing.

Therefore, the present invention has been made in view of the above-described problems, and an object of the present invention is to provide a technology capable of enhancing the accuracy of speech recognition.

A speech recognition apparatus according to the present invention is a speech recognition apparatus capable of recognizing an input speech that is an input speech, and acquired by an acquisition unit that acquires a situation related to recognition of the input speech, and an acquisition unit. Based on the situation, it is possible to process voice recognition of voice segments determined by a plurality of predetermined voice segment determination units that can determine voice segments that are voice recognition periods and multiple voice segment determination units And a control unit for performing control to use at least any one voice segment determination unit and at least one voice recognition processing unit from among a plurality of predetermined voice recognition processing units for recognizing an input voice. Equipped with

According to the present invention, control is performed to use at least one speech segment determination unit and at least one speech recognition processing unit for recognition of an input speech based on a situation related to recognition of the input speech. This can improve the accuracy of speech recognition.

The objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description and the accompanying drawings.

FIG. 1 is a block diagram showing a configuration of a speech recognition device according to Embodiment 1. FIG. 7 is a block diagram showing the configuration of a speech recognition device according to Embodiment 2; FIG. 7 is a diagram showing discrimination data according to Embodiment 2; 7 is a flowchart showing the operation of the speech recognition apparatus according to the second embodiment. FIG. 10 is a block diagram showing a configuration of a speech recognition device according to a third embodiment. FIG. 16 is a diagram showing discrimination data according to Embodiment 3. 7 is a flowchart showing the operation of the speech recognition apparatus according to the third embodiment. FIG. 16 is a block diagram showing a configuration of a speech recognition device according to a fourth embodiment. 15 is a flowchart showing the operation of the speech recognition device according to the fourth embodiment. FIG. 18 is a block diagram showing a configuration of a speech recognition device according to Embodiment 5. FIG. 18 is a diagram showing discrimination data according to the fifth embodiment. 21 is a flowchart showing the operation of the speech recognition device according to the fifth embodiment. FIG. 18 is a diagram showing discrimination data according to Embodiment 6. FIG. 16 is a flowchart showing the operation of the speech recognition device according to Embodiment 6. FIG. It is a figure which shows the data for discrimination | determination which concerns on a modification. It is a block diagram which shows the hardware constitutions of the navigation apparatus concerning the other modification. It is a block diagram which shows the hardware constitutions of the navigation apparatus concerning the other modification. It is a block diagram showing composition of a server concerning other modifications. It is a block diagram which shows the structure of the communication terminal which concerns on another modification.

Embodiment 1
Hereinafter, the voice recognition device according to the first embodiment of the present invention is mounted, and a vehicle to be focused is described as “own vehicle”. This voice recognition device can be applied to, for example, a navigation device mounted on the vehicle.

FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus 1 according to the first embodiment. The speech recognition apparatus 1 of FIG. 1 is an apparatus capable of recognizing an input speech which is a speech input to the speech recognition apparatus 1. That is, the speech recognition device 1 is a device that selects a recognition vocabulary based on a vocabulary recognized by speech recognition and that is most likely to be acoustically and linguistically probable by the user. An example of such an apparatus is disclosed in, for example, Japanese Patent Application Laid-Open No. 9-50291. Hereinafter, the input voice will be described as voice data indicating the strength (amplitude) and height (frequency) of the voice.

The speech recognition apparatus 1 of FIG. 1 includes an acquisition unit 11 and a control unit 12.

The acquisition unit 11 acquires a situation related to recognition of input speech. In the following description, a situation related to recognition of input speech may be referred to as "recognition related situation".

The control unit 12 selects at least one of a plurality of speech segment determination units determined in advance and a plurality of speech recognition processing units determined in advance based on the recognition related situation acquired by the acquisition unit 11. Control is performed to use one speech segment determination unit and at least one speech recognition processing unit for recognition of input speech. In the following description, a plurality of predetermined speech segment determination units may be described only as "a plurality of speech segment determination units", and a plurality of predetermined speech recognition processing units may be referred to as "a plurality of speech recognition processing units. There is also a case where it is described only as "."

Each of the plurality of voice segment determination units determines a voice segment that is a period for recognizing voice and a non-voice segment that is a period for not recognizing voice. Note that at least one of the plurality of voice section determination units is provided, for example, in at least one of the voice recognition device 1 and a server that can communicate with the voice recognition device 1 by wireless or the like.

Each of the plurality of voice recognition processing units is configured to be able to process voices in the voice section determined by the plurality of voice section determination units. That is, each of the plurality of voice recognition processing units extracts a feature included in the voice within the voice section determined by at least one of the plurality of voice section determination units, and a vocabulary or a word is extracted based on the feature. , It asks for as a recognition vocabulary which is a recognition result. Note that at least one of the plurality of voice recognition processing units is provided, for example, in at least one of the voice recognition device 1 and a server that can communicate with the voice recognition device 1 by wireless or the like.

<Summary of Embodiment 1>
According to the speech recognition apparatus 1 according to the first embodiment as described above, at least one speech segment determination unit and at least one speech recognition processing unit based on the situation related to the recognition of the input speech Is used to recognize input speech. This makes it possible to perform determination and recognition processing of the determination section suitable for the situation related to the recognition of the input speech, so that the accuracy of the speech recognition can be enhanced.

Second Embodiment
FIG. 2 is a block diagram showing a configuration of the speech recognition device 1 according to Embodiment 2 of the present invention. Hereinafter, among the components described in the second embodiment, the same or similar components as or to the components described above are designated by the same reference numerals, and different components will be mainly described. Note that FIG. 2 also shows a server 6 capable of communicating with the voice recognition device 1 by radio or the like.

The speech recognition apparatus 1 of FIG. 2 includes a speech recognition method determination unit 2 and a speech recognition unit 3.

<Voice recognition method discrimination unit>
The voice recognition system determination unit 2 includes a storage unit 13 in addition to the acquisition unit 11 and the control unit 12 corresponding to the acquisition unit 11 and the control unit 12 described in the first embodiment.

The acquisition unit 11 acquires the communication status between the speech recognition apparatus 1 and the server 6 as a recognition related status. The communication status selectively includes high quality online, low quality online, and offline.

High quality online is a situation in which communication between the speech recognition device 1 and the server 6 is being performed, and a situation in which the evaluation value of the communication is equal to or greater than a predetermined threshold. The evaluation value of communication may be, for example, a value that increases as the field strength of communication increases, or may be a value that increases as the speed of communication increases, or a combination of the two values. It may be a value. The low quality online is a state in which communication between the speech recognition device 1 and the server 6 is performed, and the evaluation value of the communication is less than a threshold. Offline is a situation where communication between the speech recognition device 1 and the server 6 is not performed. The acquisition unit 11 may appropriately store the acquired communication status in the storage unit 13.

The plurality of voice section determination units 8a and 8b are components equivalent to the plurality of voice section determination units described in the first embodiment, and the first voice section determination unit 8a included in the voice recognition unit 3 and the server And the second voice section determining unit 8b provided in the second embodiment. The plurality of voice

recognition processing units

9a and 9b are components equivalent to the plurality of voice recognition processing units described in the first embodiment, and the first voice recognition processing unit 9a included in the voice recognition unit 3 and the server And the second speech recognition processing unit 9b provided in the second embodiment.

The control unit 12 selects one set of voice interval determination units from among the plurality of voice interval determination units 8a and 8b and the plurality of voice

recognition processing units

9a and 9b based on the communication status acquired by the acquisition unit 11. Determine the speech recognition processor. Then, the control unit 12 performs control to use one set of the determined voice section determination unit and the voice recognition processing unit for recognition of the input voice.

The storage unit 13 stores discrimination data used when the control unit 12 performs discrimination. FIG. 3 is a diagram showing discrimination data according to the second embodiment. In the discrimination data of FIG. 3, one set of speech zone determination unit and speech recognition processing unit are associated with one communication situation. In FIG. 3, “1” and “2” of the “voice segment determination unit” indicate the first voice segment determination unit 8 a and the second voice segment determination unit 8 b, respectively, and “1” of the “voice recognition processing unit”. And "2" indicate the first speech recognition processor 9a and the second speech recognition processor 9b, respectively.

In the configuration in which the data for determination of FIG. 3 is stored in the storage unit 13, for example, when the communication status acquired by the acquisition unit 11 is low quality online, the control unit 12 determines the first voice segment determination unit Control to use 8a and the 2nd speech recognition processing part 9b for recognition of an input speech will be performed. The voice section determination unit such as the first voice section determination unit 8a and the voice recognition processing unit such as the second voice recognition processing unit 9b may perform a plurality of processes at the same time.

<Voice recognition unit and server>
The voice recognition unit 3 includes a first voice section determination unit 8a which is an off-line voice section determination unit, a first voice recognition processing unit 9a which is an off-line voice recognition processing unit, and a first recognition dictionary storage unit 10a. The first recognition dictionary storage unit 10a stores a dictionary used when the first speech recognition processing unit 9a performs speech recognition processing. The voice recognition unit 3 appropriately uses the first voice section determination unit 8a and the first voice recognition processing unit 9a for recognition of the input voice based on the control of the control unit 12.

The server 6 includes a second speech zone determination unit 8b which is an online speech zone judgment unit, a second speech recognition processing unit 9b which is an online speech recognition processing unit, and a second recognition dictionary storage unit 10b. The second recognition dictionary storage unit 10 b stores a dictionary used when the second speech recognition processing unit 9 b performs a speech recognition process. The server 6 appropriately uses the second voice section determining unit 8b and the second voice recognition processing unit 9b to recognize an input voice based on the control of the control unit 12.

In general, the first speech zone determination unit 8a on the speech recognition device 1 side has lower determination accuracy of the speech zone than the second speech zone determination unit 8b on the server 6 side because of hardware limitations. It is possible to make the determination regardless of the communication status. Also, the first speech recognition processing unit 9a on the speech recognition device 1 side has a smaller number of recognizable vocabularies than the second speech recognition processing unit 9b on the server 6 side in general due to hardware limitations. However, it is possible to carry out recognition processing regardless of the communication situation.

Based on this and the discrimination data of FIG. 3, the speech recognition apparatus 1 according to the second embodiment can perform speech recognition suitable for the communication situation as follows.

For example, when the communication status is high quality online, the second voice section determining unit 8b and the second voice recognition processing unit 9b are used according to the determination data of FIG. A large number of speech recognitions can be performed.

When the communication status is low quality online, the first voice section determining unit 8a and the second voice recognition processing unit 9b are used according to the determination data of FIG. As a result, the voice recognition unit 1 performs voice zone determination, and transmits to the server 6 input voice from which data used for voice zone determination is removed, that is, only input voice used for voice recognition processing. Can perform online speech recognition. That is, even if the communication situation is bad, the speech recognition of the server 6 can be performed by reducing the data amount of the input speech to be communicated.

When the communication status is offline, the first voice section determining unit 8a and the first voice recognition processing unit 9a are used according to the determination data of FIG. 3, so voice recognition can be performed regardless of the communication status. .

<Operation>
FIG. 4 is a flowchart showing the operation of the speech recognition apparatus 1 according to the second embodiment. The operation of FIG. 4 is performed as needed.

First, in step S <b> 1, the acquisition unit 11 acquires an input voice and outputs the input voice to the control unit 12.

At step S2, control unit 12 determines whether the intensity of the input voice is equal to or greater than a predetermined threshold. If it is determined that the strength of the input voice is equal to or greater than the threshold, the process proceeds to step S3. If it is determined that the strength of the input voice is less than the threshold, the operation of FIG. 4 ends.

In step S 3, the acquisition unit 11 acquires the communication status between the speech recognition device 1 and the server 6, and outputs the communication status to the control unit 12.

In step S4, the control unit 12 determines one set of voice segment determination unit and voice recognition processing unit to be used for recognizing the input voice according to the determination data based on the communication status. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.

In step S5, the speech recognition unit 3 and the server 6 recognize the input speech using the set of speech segment determination unit and speech recognition processing unit determined by the control unit 12. Then, the recognition vocabulary as the recognition result is output from the speech recognition device 1. Thereafter, the operation of FIG. 4 ends.

<Summary of Embodiment 2>
According to the speech recognition device 1 according to the second embodiment as described above, the speech segment determination unit and the speech recognition processing unit suitable for the communication situation between the speech recognition device 1 and the server 6 It can be used for

Embodiment 3
FIG. 5 is a block diagram showing the configuration of the speech recognition apparatus 1 according to Embodiment 3 of the present invention. Hereinafter, among the components described in the third embodiment, the same or similar components as or to the components described above are designated by the same reference numerals, and different components will be mainly described.

The speech recognition apparatus 1 of FIG. 5 includes a speech recognition method determination unit 2 and a speech recognition unit 3 as in the speech recognition apparatus 1 (FIG. 2) according to the second embodiment. In addition, the recognition start button 21 mentioned later is connected to the speech recognition apparatus 1 of FIG.

<Voice recognition method discrimination unit>
The speech recognition method determination unit 2 includes an acquisition unit 11, a control unit 12, and a storage unit 13, as the speech recognition method determination unit 2 according to the second embodiment.

Similar to the acquisition unit 11 according to the second embodiment, the acquisition unit 11 acquires the communication status between the speech recognition apparatus 1 and the server 6 as a recognition related situation.

In addition, the acquisition unit 11 also acquires, as a recognition related situation, the riding condition of the occupant of the host vehicle on which the voice recognition device 1 is mounted. The riding condition includes the presence or absence of the driver and the presence or absence of the passenger who is a passenger other than the driver.

In addition, the acquisition unit 11 may be configured to determine the riding condition based on whether or not the seat belt of the own vehicle is used, and a sensor for detecting the seating of the own vehicle is provided in each seat If yes, the boarding situation may be determined based on the detection result of the sensor. Alternatively, the acquisition unit 11 may be configured to capture an image of the interior of the host vehicle and to determine the riding condition by performing image recognition on the image. Alternatively, the acquisition unit 11 may be an interface that acquires the result of the determination from an external device provided outside the speech recognition apparatus 1. The acquisition unit 11 may appropriately store the acquired communication status and boarding status in the storage unit 13.

Based on the communication status and the boarding status acquired by the acquisition section 11, the control section 12 determines a plurality of speech

segment determination sections

8c, 8d, 8e and 8f determined in advance and a plurality of speech recognition processing sections 9c determined in advance. , 9d, 9e, and 9f, one set of voice section determination unit and voice recognition processing unit are determined. Then, the control unit 12 performs control to use one set of the determined voice section determination unit and the voice recognition processing unit for recognition of the input voice.

The plurality of voice section determination units 8c to 8f are a first voice section determination unit 8c and a second voice section determination unit 8d provided in the voice recognition unit 3, and a third voice section determination unit 8e provided in the server 6. And a fourth speech zone determination unit 8f. The plurality of voice recognition processing units 9c to 9f are a first voice recognition processing unit 9c and a second voice recognition processing unit 9d provided in the voice recognition unit 3, and a third voice recognition processing unit 9e provided in the server 6. And a fourth speech recognition processor 9f.

The storage unit 13 stores discrimination data used when the control unit 12 performs discrimination. FIG. 6 is a diagram showing discrimination data according to the third embodiment. In the discrimination data of FIG. 6, one set of voice section determination unit and voice recognition processing unit are associated with the presence or absence of one set of passenger, the presence or absence of the driver, and the communication status. In FIG. 6, "1" to "4" of the "voice section determination unit" indicate the first voice section determination unit 8c to the fourth voice section determination unit 8f, respectively, and "1" of the "voice recognition processing unit". '4' indicate the first speech recognition processor 9c to the fourth speech recognition processor 9f, respectively.

<Voice recognition unit and server>
The voice recognition unit 3 includes a first voice section determination unit 8c, which is an always-offline voice section determination unit, a first voice recognition processing unit 9c, which is an always-offline voice recognition processing unit, a first recognition dictionary storage unit 10c, A second speech zone judgment unit 8d which is an offline speech zone judgment unit, a second speech recognition processing unit 9d which is a normal offline speech recognition processor, and a second recognition dictionary storage unit 10d. The first and second recognition

dictionary storage units

10c and 10d respectively store dictionaries used when the first and second speech

recognition processing units

9c and 9d perform speech recognition processing. The voice recognition unit 3 appropriately uses the first and second voice

section determination units

8c and 8d and the first and second voice

recognition processing units

9c and 9d based on the control of the control unit 12 for recognition of the input voice. .

The server 6 includes a third voice zone determination unit 8e which is an always online voice zone determination unit, a third voice recognition processing unit 9e which is an always online voice recognition processing unit, a third recognition dictionary storage unit 10e, and a normal online voice It includes a fourth speech section judging section 8f which is a section judging section, a fourth speech recognition processing section 9f which is a normal online speech recognition processing section, and a fourth recognition dictionary storage section 10f. The third and fourth recognition

dictionary storage units

10e and 10f respectively store dictionaries used when the third and fourth speech recognition processing units 9e and 9f perform speech recognition processing. The server 6 appropriately uses the third and fourth speech

zone determination units

8e and 8f and the third and fourth speech recognition processing units 9e and 9f for recognition of the input speech based on the control of the control unit 12.

In addition, although the first and second voice

section determination units

8c and 8d on the voice recognition device 1 side have lower determination accuracy of voice sections than the third and fourth voice

section determination units

8e and 8f on the server 6 side, communication It is possible to make the determination independently of the situation. In addition, although the first and second speech

recognition processing units

9c and 9d on the speech recognition device 1 side have fewer recognizable vocabularies than the third and fourth speech recognition processing units 9e and 9f on the server 6 side, It is possible to perform recognition processing regardless of communication conditions. Based on this and the discrimination data shown in FIG. 6, the speech recognition apparatus 1 according to the third embodiment performs speech recognition suitable for the communication situation as in the speech recognition apparatus 1 according to the second embodiment. Can.

Further, the first and third speech

segment determination units

8c and 8e are speech segment determination units that constantly determine a speech segment, and the second and fourth speech

segment determination units

8d and 8f respond to a predetermined operation. It is a voice section determination unit that determines a voice section. The first and third speech recognition processing units 9c and 9e are speech recognition processing units that always recognize and process speech, and the second and fourth speech

recognition processing units

9d and 9f perform speech in accordance with a predetermined operation. Is a speech recognition processing unit that recognizes and processes. In the third embodiment, the predetermined operation is an operation on the recognition start button 21 for starting voice recognition.

Based on this and the discrimination data of FIG. 6, the speech recognition apparatus 1 according to the third embodiment can perform speech recognition suitable for the riding situation as follows.

For example, when there is a passenger's boarding and there is no driver's boarding, the first voice section judging section 8c or the third voice section judging section 8e and the first voice recognition processing section 9c according to the discrimination data of FIG. Alternatively, the third speech recognition processor 9e is used. Thus, a passenger who desires voice recognition can perform voice recognition without performing an operation on the recognition start button 21. This is convenient for the passenger especially when the recognition start button 21 is provided at a place away from the passenger.

On the other hand, except when there is a passenger's boarding and there is no driver's boarding, the second voice section judging unit 8d or the fourth voice section judging unit 8f and the second voice section judging unit 8f according to the discrimination data of FIG. A voice recognition processor 9d or a fourth voice recognition processor 9f is used. This can suppress unintended speech recognition.

Note that the discrimination data is not limited to the data set as shown in FIG. For example, when there is no passenger's boarding and there is a driver's boarding, it may be desirable for the driver to concentrate on driving. Therefore, when the passenger does not get in and the driver gets in, the first voice section determination unit 8c or the third voice section determination unit 8e, and the first voice recognition processing unit 9c or the third voice The recognition processor 9e may be used.

<Operation>
FIG. 7 is a flowchart showing the operation of the speech recognition apparatus 1 according to the third embodiment. The operation in FIG. 7 is the same as that in step S3 of the operation (FIG. 4) of the speech recognition apparatus 1 according to the second embodiment is changed to step S3a. Hereinafter, only processing different from the processing of the speech recognition apparatus 1 according to the second embodiment will be described.

In steps S1 and S2, processing similar to that in steps S1 and S2 of FIG. 4 is performed.

In step S3a, the acquisition unit 11 acquires the communication status between the speech recognition device 1 and the server 6 and outputs the communication state to the control unit 12. Further, the acquisition unit 11 acquires the riding condition of the occupant of the host vehicle on which the voice recognition device 1 is mounted, and outputs the acquired riding condition to the control unit 12.

In step S4, the control unit 12 determines one set of voice section determining unit and voice recognition processing unit to be used for recognizing the input voice according to the determination data based on the communication status and the boarding status. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.

In step S5, the same processing as step S5 in FIG. 4 is performed, and the operation in FIG. 7 ends.

Note that, according to the operation of FIG. 7, the control unit 12 determines one set of speech segment determination unit and speech recognition processing unit by one determination. However, the present invention is not limited to this, and the control unit 12 may determine one set of voice section determination unit and voice recognition processing unit by multiple determinations. For example, based on one of the communication status and the boarding status, the control unit 12 may select one or more speech segment determination units and some speech recognition processing units among the plurality of speech segment determination units and the plurality of speech recognition processing units. You may judge. After that, based on the other of the communication status and the boarding status, the control unit 12 sets one voice segment determination unit and one voice recognition process out of several voice segment determination units and several voice recognition processing units. The part may be determined. The same applies to the fourth and subsequent embodiments described later.

<Summary of Embodiment 3>
According to the speech recognition device 1 according to the third embodiment as described above, the speech segment determination unit and the speech recognition processing unit suitable for the communication status and the boarding status can be used for recognition of input speech. In the third embodiment, the control unit 12 uses the voice section determination unit and the voice recognition processing unit for recognizing the input voice, based on both of the communication status and the boarding status acquired by the acquisition unit 11. I took control. However, the present invention is not limited to this, and the control unit 12 does not consider the communication status acquired by the acquisition unit 11, and based on the boarding status acquired by the acquisition unit 11, the voice section determination unit and the speech recognition process The unit may be controlled to use for recognition of input speech.

Fourth Preferred Embodiment
Fourth Embodiment FIG. 8 is a block diagram showing a configuration of a speech recognition device 1 according to a fourth embodiment of the present invention. Hereinafter, among constituent elements described in the fourth embodiment, constituent elements that are the same as or similar to the above constituent elements are given the same reference numerals, and different constituent elements are mainly described.

The speech recognition apparatus 1 shown in FIG. 8 has the same configuration as the speech recognition apparatus 1 (FIG. 5) according to the third embodiment, in which the input device 22 is connected. In the input device 22, an operation (hereinafter, referred to as a "specified operation") for specifying the use of one set of speech segment determination unit and speech recognition processing unit is input.

<Voice recognition method discrimination unit>
The voice recognition system determination unit 2 includes an acquisition unit 11, a control unit 12, and a storage unit 13, as the voice recognition system determination unit 2 according to the third embodiment.

Similar to the acquiring unit 11 according to the third embodiment, the acquiring unit 11 acquires the communication status and the boarding status as the recognition related status.

Here, the designation operation input to the input device 22 is stored in the storage unit 13, and the acquisition unit 11 acquires the history of the designation operation (hereinafter referred to as “operation history”) from the storage unit 13.

When the designation operation is input to the input device 22, the control unit 12 selects one of the plurality of speech segment determination units 8c to 8f and the plurality of speech recognition processing units 9c to 9f, which is designated by the designation operation. The voice segment determination unit and the voice recognition processing unit are controlled to be used for recognizing an input voice.

On the other hand, when the designating operation is not input to the input device 22, the control unit 12 sets a plurality of voice section determination units 8c to 8f based on the communication status, the boarding status, and the operation history acquired by the acquisition unit 11. Among the speech recognition processing units 9c to 9f, one set of speech segment determination unit and speech recognition processing unit are determined. Then, the control unit 12 performs control to use one set of the determined voice section determination unit and the voice recognition processing unit for recognition of the input voice.

For example, based on the operation history, the control unit 12 determines whether or not the number of times when the same set of voice section determination unit and voice recognition processing unit is specified is equal to or more than a predetermined threshold. Then, when it is determined that the number of times is equal to or more than the threshold value, the control unit 12 recognizes one set of the voice section determination unit and the voice recognition processing unit designated by the designation operation of the operation history as the input voice. Perform control to use. On the other hand, when it is determined that the above-mentioned number of times is less than the threshold value, the control unit 12 determines one set of voice based on the communication status and the boarding status acquired by the acquisition section 11 as in the third embodiment. The section determination unit and the speech recognition processing unit are controlled to be used for recognition of input speech.

<Operation>
FIG. 9 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fourth embodiment. The operation of FIG. 9 is the same as that of step S3 of the operation of the speech recognition apparatus 1 according to the second embodiment (FIG. 4) except that step S3 is changed to step S3 b and steps S11 and S12 are added. Hereinafter, only processing different from the processing of the speech recognition apparatus 1 according to the second embodiment will be described.

In step S11, control unit 12 determines whether or not a designation operation has been input to input device 22. If it is determined that the designating operation has been input, the process proceeds to step S12. If it is determined that the designating operation has not been input, the process proceeds to step S1.

When the process proceeds from step S11 to step S12, the control unit 12 performs control to use one set of voice segment determination unit and voice recognition processing unit designated by the designation operation for recognition of input voice. Thereby, the speech recognition unit 3 and the server 6 recognize the input speech using the pair of speech segment determination unit and speech recognition processing unit designated by the designation operation. Thereafter, the operation of FIG. 9 ends.

When the process proceeds from step S11 to step S1, processing similar to that of steps S1 and S2 of FIG. 4 is performed.

In step S3b, the acquisition unit 11 acquires the communication status, the boarding status, and the operation history, and outputs the communication status, the boarding status, and the operation history to the control unit 12.

In step S4, the control unit 12 determines one set of voice section determination unit and voice recognition processing unit to be used for recognition of the input voice according to the determination data based on the communication status, the boarding status, and the operation history. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.

In step S5, the same processing as step S5 in FIG. 4 is performed, and the operation in FIG. 9 ends.

<Summary of Embodiment 4>
According to the speech recognition apparatus 1 according to the fourth embodiment as described above, one set of speech segment determination unit and speech recognition processing unit are used for recognition of input speech based on the recognition related situation and the operation history. . As a result, it is possible to use the speech zone determination unit and the speech recognition processing unit that match the user's usage tendency for recognition of the input speech.

The Fifth Preferred Embodiment
FIG. 10 is a block diagram showing the configuration of the speech recognition device 1 according to the fifth embodiment of the present invention. Hereinafter, among the components described in the fifth embodiment, the same or similar components as or to the components described above are designated by the same reference numerals, and different components will be mainly described.

The acquisition unit 11 also acquires the recognition result of the speech recognition unit 3 and the recognition result of the server 6. Then, the acquisition unit 11 determines, based on the acquired recognition result, whether the voice input in the past, that is, the input voice in the past includes a predetermined command. The configuration is not limited to this as long as the acquisition unit 11 is configured to acquire the determination result as to whether or not a predetermined command is included in the past input voice. For example, this determination may be performed by an external device provided outside the voice recognition device 1, and the acquisition unit 11 may be configured to acquire the determination result from the external device.

In the fifth embodiment, among the commands for executing the function of the application software, the predetermined command includes an upper-level command which is a command for starting the execution. For example, the upper level command includes a command such as "Navigation" for starting execution of a navigation function capable of searching for a destination, a route search, etc. The lower level command that is not the upper level command is a destination. The command “destination search” for executing a search, the command “route search” for executing a route search, and the like are included.

The control unit 12 selects one of the plurality of speech section determination units 8c to 8f and the plurality of speech recognition processing units 9c to 9f based on the communication status, the boarding status, and the determination result acquired by the acquisition unit 11. A control is performed to use a set of speech zone determination unit and speech recognition processing unit for recognition of input speech.

The storage unit 13 stores discrimination data used when the control unit 12 performs discrimination. FIG. 11 is a diagram showing part of the discrimination data according to the fifth embodiment. In FIG. 11, one of the 12 combinations shown in FIG. 6 is divided into whether the determination result includes the upper command or the lower command, and the voice section determining unit and the voice recognition processing unit are set. The voice section determination unit and the voice recognition processing unit are similarly set for the other 11 methods.

In the example of FIG. 11, a regular voice section determination unit and a regular voice recognition processing unit are set as the voice section determination unit and the voice recognition processing unit used when the determination result includes the upper level command. On the other hand, a normal voice section determination unit and a normal voice recognition processing unit are set as the voice section determination unit and the voice recognition processing unit used when the determination result includes the lower order command.

According to the discrimination data as shown in FIG. 11, since constant voice recognition is performed on a relatively small number of upper commands, it is possible to suppress voice recognition against the user's intention. It should be noted that the user can input voices of a plurality of lower level commands subsequently to the voice of the upper level command, and if the determination result is once determined to include the upper level command, the user can enter for a certain period of time. The determination result that the lower order command is included may be invalidated.

<Operation>
FIG. 12 is a flowchart showing the operation of the speech recognition apparatus 1 according to the fifth embodiment. The operation in FIG. 12 is the same as that in step S3 of the operation (FIG. 4) of the speech recognition apparatus 1 according to the second embodiment is changed to step S3c. Hereinafter, only processing different from the processing of the speech recognition apparatus 1 according to the second embodiment will be described.

In step S3c, the acquisition unit 11 acquires the communication status, the boarding status, and the determination result of the upper-level command, and outputs the communication status, the boarding status, and the determination result of the upper level command to the control unit 12.

In step S4, the control unit 12 determines one set of voice section determination unit and voice recognition processing unit to be used for recognition of the input voice according to the determination data based on the communication status, the boarding status, and the determination result. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.

In step S5, the same processing as step S5 in FIG. 4 is performed, and the operation in FIG. 12 ends.

<Summary of Embodiment 5>
According to the speech recognition device 1 according to the fifth embodiment as described above, one set of speech segment judgment unit and speech recognition processing are based on the recognition related situation and the judgment result of whether or not the upper level command is included. Part is used to recognize the input speech. As a result, it is possible to use the speech zone determination unit and the speech recognition processing unit that match the user's usage tendency for recognition of the input speech.

Embodiment 6
The block configuration of the speech recognition device 1 according to the sixth embodiment of the present invention is the same as the block configuration (FIG. 5) of the speech recognition device 1 according to the third embodiment. Hereinafter, among the components described in the sixth embodiment, the same or similar components as or to the components described above are designated by the same reference numerals, and different components will be mainly described.

The acquisition unit 11 also acquires the usage state of the hardware of the speech recognition device 1. In the sixth embodiment, hardware of the speech recognition device 1 is hardware of a device such as a navigation device to which the speech recognition device 1 is applied, and the usage state of the hardware includes the usage rate of the hardware. The usage rate of hardware includes, for example, a usage rate of a central processing unit (CPU), a usage rate of a memory, and the like.

Based on the communication status, the boarding status, and the use status acquired by the acquisition section 11, the control section 12 selects one set out of the plurality of speech segment determination sections 8c to 8f and the plurality of speech recognition processing sections 9c to 9f. The voice section determination unit and the voice recognition processing unit of the above are controlled to be used for recognition of the input voice.

The storage unit 13 stores discrimination data used when the control unit 12 performs discrimination. FIG. 13 is a diagram showing a part of discrimination data according to the sixth embodiment. In FIG. 13, one of the 12 combinations shown in FIG. 6 is divided into whether the usage rate included in the usage state is equal to or more than a predetermined threshold value, and the voice section determination unit and the voice recognition processing unit Are set, but the voice section determination unit and the voice recognition processing unit are similarly set for the other 11 methods.

In the example of FIG. 13, a voice segment determining unit and a voice recognition processing unit used when the usage rate included in the usage state is equal to or more than a predetermined threshold, a normal voice segment determining unit and a normal voice recognition process A section is set, and as a voice section determination section and a voice recognition processing section used when the usage rate is less than the threshold, a regular voice section determination section and a regular voice recognition processing section are set.

According to the discrimination data as shown in FIG. 13, when the hardware usage rate is relatively high, normal voice recognition with low hardware usage rate can be performed. On the other hand, when the hardware usage rate is relatively low, it is possible to perform continuous voice recognition with high hardware usage rate.

<Operation>
FIG. 14 is a flowchart showing the operation of the speech recognition apparatus 1 according to the sixth embodiment. The operation in FIG. 14 is the same as that in step S3 of the operation (FIG. 4) of the speech recognition apparatus 1 according to the second embodiment is changed to step S3d. Hereinafter, only processing different from the processing of the speech recognition apparatus 1 according to the second embodiment will be described.

In step S3d, the acquisition unit 11 acquires the communication status, the boarding status, and the usage status of the hardware, and outputs the acquired status to the control unit 12.

In step S4, the control unit 12 determines one set of voice section determination unit and voice recognition processing unit to be used for recognizing the input voice according to the determination data based on the communication status, the boarding status, and the use status of the hardware. Do. At this time, the control unit 12 outputs the input voice to the voice recognition unit 3 or transmits it to the server 6.

In step S5, the same process as step S5 of FIG. 4 is performed, and the operation of FIG. 14 ends.

<Summary of Embodiment 6>
According to the speech recognition apparatus 1 according to the sixth embodiment as described above, one set of speech segment determination unit and speech recognition processing unit are used to input an input speech based on the recognition related situation and the usage state of the hardware. Used for recognition. As a result, it is possible to use the speech segment determination unit and the speech recognition processing unit suitable for the usage state of the hardware for recognition of the input speech.

<Modification>
In the second embodiment, the control unit 12 performs control to use one set of speech segment determination unit and speech recognition processing unit for recognition of input speech based on the communication status acquired by the acquisition unit 11. Not limited to this. For example, as illustrated in FIG. 15, when the communication status acquired by the acquisition unit 11 is low quality online, the control unit 12 performs the first speech zone judgment unit 8a and the second speech zone judgment unit 8b in parallel. Control may be performed using the first voice recognition processing unit 9a and the second voice section determination unit 8b in parallel. That is, based on the communication status acquired by the acquisition unit 11, the control unit 12 performs control of using a combination of a plurality of speech segment determination units and a combination of a plurality of speech recognition processing units for recognition of input speech. You may go. Furthermore, in other words, based on the communication status acquired by the acquisition unit 11, the control unit 12 uses at least one speech segment determination unit and at least one speech recognition processing unit for recognition of input speech. Control may be performed. The same applies to the third to sixth embodiments and to the designation operation of the third embodiment.

Also, the functions of Embodiments 4 to 6 may be combined as appropriate. That is, the acquisition unit 11 may acquire the recognition related situation, and may acquire at least one of the operation history, the determination result of whether or not the upper-level command is included, and the usage state of the hardware. . Then, the control unit 12 performs at least one of the recognition related situation acquired by the acquisition unit 11 and at least one of the operation history acquired by the acquisition unit 11, the determination result, and the use state. Control may be performed in which one voice segment determination unit and at least one voice recognition processing unit are used to recognize an input voice.

<Other Modifications>
Hereinafter, the acquisition unit 11 and the control unit 12 in FIG. 1 in the above-described speech recognition apparatus 1 will be referred to as “acquisition unit 11 and the like”. The acquisition unit 11 and the like are realized by the processing circuit 81 shown in FIG. That is, the processing circuit 81 obtains a plurality of speech segment determination units determined in advance and a plurality of voices determined in advance based on the acquisition unit 11 acquiring the recognition related situation and the recognition related situation acquired by the acquisition unit 11. Among the recognition processing units, the control unit 12 performs control to use at least one speech segment determination unit and at least one speech recognition processing unit for recognition of input speech. Dedicated hardware may be applied to the processing circuit 81, or a processor that executes a program stored in a memory may be applied. The processor corresponds to, for example, a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a DSP (Digital Signal Processor), and the like.

When the processing circuit 81 is dedicated hardware, the processing circuit 81 may be, for example, a single circuit, a complex circuit, a programmed processor, a parallel programmed processor, an application specific integrated circuit (ASIC), an FPGA (field programmable gate) An array) or a combination thereof is applicable. Each function of each unit such as the acquisition unit 11 may be realized by a circuit in which processing circuits are dispersed, or the function of each unit may be realized by one processing circuit.

When the processing circuit 81 is a processor, the functions of the acquisition unit 11 and the like are realized by a combination with software and the like. The software and the like correspond to, for example, software, firmware, or software and firmware. Software and the like are described as a program and stored in the memory 83. As shown in FIG. 17, the processor 82 applied to the processing circuit 81 implements the functions of the respective units by reading and executing the program stored in the memory 83. That is, when executed by the processing circuit 81, the speech recognition device 1 obtains a recognition related situation, and determines in advance a plurality of predetermined voice interval judging units based on the acquired recognition related situation. Performing control of using at least any one voice segment determination unit and at least one voice recognition processing unit among the plurality of voice recognition processing units to be used for recognizing the input voice; And a memory 83 for storing a program to be executed. In other words, it can be said that this program causes a computer to execute the procedure and method of the acquisition unit 11 and the like. Here, the memory 83 is, for example, non-volatile or non-volatile, such as random access memory (RAM), read only memory (ROM), flash memory, erasable programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM). Volatile semiconductor memory, HDD (Hard Disk Drive), magnetic disk, flexible disk, optical disk, compact disk, mini disk, DVD (Digital Versatile Disc), its drive device, etc. or any storage medium used in the future May be

In the above, the structure by which each function of the acquisition part 11 grade | etc., Is implement | achieved by either hardware, software, etc. was demonstrated. However, the present invention is not limited to this, and a part of the acquisition unit 11 or the like may be realized by dedicated hardware, and another part may be realized by software or the like. For example, the function of the acquisition unit 11 is realized by the processing circuit 81 and the receiver as dedicated hardware, and the processing circuit 81 as the processor 82 reads out and executes the program stored in the memory 83 for the rest. It is possible to realize the function by that.

As described above, the processing circuit 81 can realize each of the functions described above by hardware, software, etc., or a combination thereof.

Further, the voice recognition device 1 described above includes at least one of a navigation device such as a portable navigation device (PND), a communication terminal including a portable terminal such as a mobile phone, a smartphone and a tablet, a navigation device and a communication terminal. The present invention can also be applied to a speech recognition system constructed as a system by appropriately combining the function of an application installed in one and the server. In this case, each function or each component of the speech recognition apparatus 1 described above may be distributed to each device configuring the system, or may be concentrated to any device. Good.

FIG. 18 is a block diagram showing the configuration of the server 91 according to the present modification. The server 91 of FIG. 18 includes a communication unit 91a and a control unit 91b, and can perform wireless communication with the navigation device 93 of the vehicle 92.

The communication unit 91a, which is an acquisition unit, wirelessly communicates with the navigation device 93 to receive the recognition related situation.

The control unit 91 b has a function similar to that of the control unit 12 of FIG. 1 by executing a program stored in a memory (not shown) of the server 91 by a processor (not shown) of the server 91 or the like. That is, the control unit 91b determines at least one voice segment determination unit and at least one voice recognition processing unit based on the recognition related situation received by the communication unit 91a, and determines the determination result as a navigation device. Send to 93

According to the server 91 configured as described above, the same effect as that of the speech recognition device 1 described in the first embodiment can be obtained.

FIG. 19 is a block diagram showing a configuration of communication terminal 96 according to the present modification. The communication terminal 96 of FIG. 19 includes a communication unit 96a similar to the communication unit 91a and a control unit 96b similar to the control unit 91b, and can communicate wirelessly with the navigation device 98 of the vehicle 97. ing. For the communication terminal 96, for example, a mobile terminal such as a mobile phone carried by the driver of the vehicle 97, a smart phone, and a tablet is applied. According to communication terminal 96 configured as described above, the same effect as that of speech recognition device 1 described in Embodiment 1 can be obtained.

In the present invention, within the scope of the invention, each embodiment and each modification can be freely combined, or each embodiment and each modification can be suitably modified or omitted.

Although the present invention has been described in detail, the above description is an exemplification in all aspects, and the present invention is not limited thereto. It is understood that countless variations not illustrated are conceivable without departing from the scope of the present invention.

Reference Signs List 1 voice recognition apparatus, 6 servers, 8a to 8f voice section determination unit, 9a to 9f voice recognition processing unit, 11 acquisition unit, 12 control unit, 21 recognition start button.

Claims

A speech recognition apparatus capable of recognizing an input speech, which is an input speech, comprising:
An acquisition unit that acquires a situation related to recognition of the input speech;
Based on the situation acquired by the acquisition unit, a plurality of predetermined voice segment determination units capable of determining a voice segment which is a period for recognizing a voice, and the plurality of voice segment determination units determined Among a plurality of predetermined voice recognition processing units capable of processing voice recognition of voice segments, at least one voice segment determination unit and at least one voice recognition processing unit, the input voice And a control unit that performs control used to recognize the voice recognition device.
The speech recognition apparatus according to claim 1, wherein
The situation is
A voice recognition device comprising at least one of a communication status between the voice recognition device and a server, and a riding condition of a passenger of a vehicle equipped with the voice recognition device.
The speech recognition apparatus according to claim 1, wherein
The plurality of voice segment determination units are:
A voice period determination unit that constantly determines the voice period; and a voice period determination unit that determines the voice period according to a predetermined operation;
The plurality of voice recognition processing units are:
A voice recognition apparatus, comprising: a voice recognition processing unit that constantly recognizes voice; and a voice recognition processing unit that recognizes voice according to a predetermined operation.
The speech recognition apparatus according to claim 1, wherein
The acquisition unit
Acquiring a history of operations designating use of the voice section determination unit and the voice recognition processing unit;
The control unit
The at least one speech segment determination unit and the at least one speech recognition processing unit are input based on the situation acquired by the acquisition unit and the history acquired by the acquisition unit. A speech recognition device that performs control used for speech recognition.
The speech recognition apparatus according to claim 1, wherein
The acquisition unit
Further acquire the determination result as to whether the voice input in the past includes a predetermined command or not,
The control unit
The at least one speech segment determination unit and the at least one speech recognition processing unit are based on the situation acquired by the acquisition unit and the determination result acquired by the acquisition unit. A speech recognition device that performs control used to recognize input speech.
The speech recognition apparatus according to claim 1, wherein
The acquisition unit
Further acquiring the usage status of the voice recognition device hardware;
The control unit
The at least one speech segment determination unit and the at least one speech recognition processing unit are based on the situation acquired by the acquisition unit and the use state acquired by the acquisition unit. A speech recognition device that performs control used to recognize input speech.
The speech recognition apparatus according to claim 1, wherein
The acquisition unit
A history of operations for designating use of a voice recognition processing unit and a voice section determination unit, a determination result as to whether or not a voice input in the past includes a predetermined command, and hardware of the voice recognition device Get at least one of the usage status, and
The control unit
The at least one voice segment determination unit based on the situation acquired by the acquisition unit and at least one of the history acquired by the acquisition unit, the determination result, and the usage state And a voice recognition device that performs control to use the at least one voice recognition processing unit for recognition of the input voice.
A speech recognition method capable of recognizing an input speech, which is an input speech, comprising:
Get the situation related to the recognition of the input speech,
Based on the acquired situation, a plurality of predetermined voice section determination units capable of determining a voice section that is a period for recognizing voice, and the voice of the voice section determined by the plurality of voice section determination units And at least one voice segment determination unit and at least one voice recognition processing unit among a plurality of predetermined voice recognition processing units capable of recognizing the input voice for recognition of the input voice To control the voice recognition method.