JP6966374B2

JP6966374B2 - Speech recognition system and computer program

Info

Publication number: JP6966374B2
Application number: JP2018070589A
Authority: JP
Inventors: 信範工藤
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2018-04-02
Filing date: 2018-04-02
Publication date: 2021-11-17
Anticipated expiration: 2038-04-02
Also published as: JP2019184631A

Description

The present invention relates to a speech recognition technique for recognizing a user's spoken utterance.

One known speech recognition technique works as follows. For each word registered in advance in a speech recognition dictionary, it calculates a likelihood representing how plausible it is that the word is the one represented by the spoken voice. It then recognizes the word with the maximum likelihood as the word spoken by the user, but only when that likelihood exceeds a predetermined threshold.
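
The conventional rule can be illustrated with a rough sketch: pick the most likely word and accept it only if its likelihood exceeds one threshold. The vocabulary, likelihood values, threshold, and the function name `recognize_best_word` are hypothetical.

```python
from typing import Dict, Optional


def recognize_best_word(likelihoods: Dict[str, float],
                        threshold: float) -> Optional[str]:
    """Return the most likely word, or None if even it fails the threshold."""
    if not likelihoods:
        return None
    best = max(likelihoods, key=likelihoods.get)
    # Accept only when the best likelihood strictly exceeds the threshold.
    return best if likelihoods[best] > threshold else None


print(recognize_best_word({"map zoom in": 0.82, "map zoom out": 0.40}, 0.7))  # map zoom in
print(recognize_best_word({"map zoom in": 0.55, "map zoom out": 0.40}, 0.7))  # None
```

A rejection here is a false rejection (FR) whenever the user really did say the best-scoring word.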

A related technique is also known (for example, Patent Document 1). Suppose recognition of the user's previous utterance failed because no word's likelihood exceeded the threshold, and the word recognized from the user's current utterance was the word with the maximum likelihood in the recognition of that previous utterance. The technique then judges that the previous recognition falsely rejected (FR; False Rejection) the word recognized this time, and changes the threshold to a value at which that word is more easily recognized.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2007-41319

The above threshold-changing technique fails in the following case. The user utters the same word in both the previous and the current recognition attempt. In the previous attempt the word was not the word with the maximum likelihood, and in the current attempt it is recognized. The previous attempt did falsely reject the word recognized this time, yet the technique cannot change the threshold. As a result, it cannot reduce the false rejection rate (FRR) efficiently.

The above technique also changes the threshold whenever the word with the maximum likelihood in the previous recognition is the word recognized in the current recognition. If that word's maximum likelihood in the previous recognition was low, however, the user may not have spoken the same word both times. In that case the previous recognition may not have been a false rejection (FR), and changing the threshold makes false acceptance (FA; False Acceptance), in which a wrong utterance is recognized as that word, more likely.

Accordingly, an object of the present invention is to reduce the false rejection rate (FRR) in speech recognition efficiently, while increasing the false acceptance rate (FAR) as little as possible.

To achieve the above object, the present invention provides a speech recognition system that recognizes words by voice. The system comprises the following:
a microphone;
a speech recognition dictionary in which a plurality of words are registered, each with a reference value representing a degree of matching;
speech recognition means that detects, as a preliminary recognition word, a word registered in the speech recognition dictionary that matches the voice picked up by the microphone to a degree equal to or higher than that represented by a preliminary reference value (set to represent a lower degree of matching than the reference value set for that word), and that outputs, as a recognition result, a word registered in the speech recognition dictionary that matches the voice picked up by the microphone to a degree equal to or higher than that represented by the reference value set for that word;
preliminary recognition counting means that, when the speech recognition means detects a preliminary recognition word, increments the preliminary recognition count value of that word by 1 and then, once a predetermined period has elapsed, decrements the preliminary recognition count value of that word by 1; and
reference value changing means that, when the speech recognition means outputs a recognition result and the preliminary recognition count value of the word output as the recognition result is equal to or greater than a predetermined value (the predetermined value being an integer of 2 or more), changes the reference value of that word so that it represents a lower degree of matching.

Such a speech recognition system may also be configured so that, when the reference value changing means changes the reference value of the word output as a recognition result to represent a lower degree of matching, it clears the preliminary recognition count values of all words.
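
The claimed counting and reference-value lowering can be sketched roughly as below. This is an illustration under stated assumptions, not the patented implementation:

- Degrees of matching are treated as "higher is better".
- The preliminary reference value is the reference value minus a fixed margin.
- Time is passed in explicitly instead of using real timers.
- When several words pass, the best match is taken as the result.
- All names and defaults (`FalseRejectionAdapter`, `margin`, `period`, `min_count`, `step`, `clear_on_change`) are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class FalseRejectionAdapter:
    """Hypothetical sketch of the claimed counting and reference-value logic."""

    def __init__(self, reference: Dict[str, float], margin: float = 0.1,
                 period: float = 10.0, min_count: int = 3,
                 step: float = 0.05, clear_on_change: bool = True) -> None:
        if min_count < 2:
            raise ValueError("the predetermined value must be 2 or more")
        self.reference = dict(reference)   # per-word reference values
        self.margin = margin               # gap down to the preliminary value
        self.period = period               # seconds until a count is decremented
        self.min_count = min_count         # the predetermined value
        self.step = step                   # amount by which a reference is lowered
        self.clear_on_change = clear_on_change
        self.count: Dict[str, int] = defaultdict(int)
        self._pending: List[Tuple[float, str]] = []  # (decrement time, word)

    def _expire(self, now: float) -> None:
        # Apply every decrement whose period has elapsed by `now`.
        while self._pending and self._pending[0][0] <= now:
            _, word = heapq.heappop(self._pending)
            self.count[word] -= 1

    def on_utterance(self, now: float,
                     match: Dict[str, float]) -> Optional[str]:
        """Process one utterance's match degrees; return the recognized word."""
        self._expire(now)
        # Preliminary recognition: +1 now, -1 scheduled after `period`.
        for word, degree in match.items():
            if degree >= self.reference[word] - self.margin:
                self.count[word] += 1
                heapq.heappush(self._pending, (now + self.period, word))
        # Recognition proper: the full reference value must be reached.
        passed = [w for w, d in match.items() if d >= self.reference[w]]
        if not passed:
            return None
        word = max(passed, key=lambda w: match[w])
        if self.count[word] >= self.min_count:
            self.reference[word] -= self.step  # now easier to recognize
            if self.clear_on_change:           # optional clearing variant
                self.count.clear()
                self._pending.clear()
        return word


adapter = FalseRejectionAdapter({"zoom in": 0.8, "zoom out": 0.8})
adapter.on_utterance(0.0, {"zoom in": 0.75, "zoom out": 0.3})  # None
adapter.on_utterance(3.0, {"zoom in": 0.74, "zoom out": 0.3})  # None
adapter.on_utterance(6.0, {"zoom in": 0.82, "zoom out": 0.3})  # "zoom in"
```

In this usage, two near-misses followed by a success within the period lower the reference value for "zoom in". If the same utterances were spread further apart, the earlier counts would already have been decremented and nothing would change.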

To achieve the above object, the present invention also provides a second speech recognition system that recognizes words by voice. This system comprises the following:
a microphone;
a speech recognition dictionary in which a plurality of words are registered, each with a reference value representing a degree of matching;
speech recognition means that detects, as a preliminary recognition word, a word registered in the speech recognition dictionary that matches the voice picked up by the microphone to a degree equal to or higher than that represented by a preliminary reference value (set to represent a lower degree of matching than the reference value set for that word), and that outputs, as a recognition result, a word registered in the speech recognition dictionary that matches the voice picked up by the microphone to a degree equal to or higher than that represented by the reference value set for that word;
preliminary recognition counting means that, when the speech recognition means detects a preliminary recognition word, increments the preliminary recognition count value of that word by 1 and then, once a predetermined period has elapsed, decrements the preliminary recognition count value of that word by 1;
reference value change proposal means that, when the speech recognition means outputs a recognition result and the preliminary recognition count value of the word output as the recognition result is equal to or greater than a predetermined value (the predetermined value being an integer of 2 or more), proposes changing the reference value of that word so that it represents a lower degree of matching; and
reference value editing means that changes the reference value of a word in response to a user operation.

Such a speech recognition system may also be configured so that, when the reference value change proposal means proposes changing the reference value of the word output as a recognition result to represent a lower degree of matching, it clears the preliminary recognition count values of all words.
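
For the proposal-based variant, one hypothetical way to structure the code separates the suggestion from the change:

- `propose_lowering` only returns a suggested value, optionally clearing all counts.
- `edit_reference` applies a value, and only when the user acts.

The function names, the fixed `step`, and the plain count dictionary are illustrative assumptions.

```python
from typing import Dict, Optional


def propose_lowering(word: str, counts: Dict[str, int],
                     reference: Dict[str, float], min_count: int = 3,
                     step: float = 0.05,
                     clear_counts: bool = True) -> Optional[float]:
    """Return a suggested lower reference value for `word`, or None.

    Nothing in `reference` is modified; the result is only a proposal.
    """
    if counts.get(word, 0) < min_count:
        return None
    if clear_counts:          # optional: clear every word's count
        counts.clear()
    return reference[word] - step


def edit_reference(reference: Dict[str, float], word: str,
                   value: float) -> None:
    """Reference value editing: called only in response to a user operation."""
    reference[word] = value


reference = {"zoom in": 0.8}
counts = {"zoom in": 3, "zoom out": 1}
suggestion = propose_lowering("zoom in", counts, reference)
# ...show `suggestion` to the user and apply it only if they accept:
if suggestion is not None:
    edit_reference(reference, "zoom in", suggestion)
```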

In each of the above speech recognition systems, the predetermined value may be an integer of 3 or more.
The above speech recognition systems may also be used for voice input in an information processing system mounted on an automobile.
In these systems, a word's reference value is changed to represent a lower degree of matching, or such a change is proposed, only under one condition: a voice similar to the voice uttering the recognized word was input multiple times in the period immediately before the word was recognized. That situation most likely means the user kept repeating the same word until it was recognized. In other words, a word that had been falsely rejected (FR) was finally recognized correctly when the user said it again. The likelihood is especially high when such a similar voice was input three or more times in that period.

Therefore, with the speech recognition systems described above, the reference value is changed to represent a lower degree of matching only for words that were genuinely falsely rejected (FR), which makes those words easier to recognize. The false rejection rate (FRR) can therefore be reduced without increasing the false acceptance rate (FAR).

These systems decide whether a voice input shortly before a word is recognized is similar to the voice uttering that word by checking whether the earlier voice matched the word at least to the degree given by the word's preliminary reference value. This check has two consequences:
- A false rejection (FR) can be detected, and the reference value lowered (or the change proposed), even if the recognized word was not the best-matching word for the earlier voice.
- If the earlier voice did match the recognized word best but its degree of matching was low, the earlier voice is not treated as similar. This prevents an FR from being detected by mistake and the reference value from being changed, or a change proposed, as a result.
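
The difference between the two triggers can be shown with two hypothetical earlier utterances. The sketch uses "higher is better" degrees of matching and assumes a preliminary reference value of 0.7 for the recognized word. The prior-art check is a simplification of the summary of Patent Document 1 given earlier, not that document's code. In the described system, such hits must also occur n times within the period before anything changes.

```python
from typing import Dict


def prior_art_triggers(previous: Dict[str, float], recognized: str) -> bool:
    """Simplified Patent Document 1 rule: adjust only if the recognized word
    was also the best match for the failed previous utterance."""
    return max(previous, key=previous.get) == recognized


def preliminary_hit(previous: Dict[str, float], recognized: str,
                    preliminary_reference: float) -> bool:
    """This system's test: did the earlier utterance reach the recognized
    word's preliminary reference value, whatever its rank?"""
    return previous[recognized] >= preliminary_reference


PRELIM = 0.7  # assumed preliminary reference value for "zoom in"
near_miss = {"zoom in": 0.71, "zoom out": 0.72}  # right word, ranked second
weak = {"zoom in": 0.35, "zoom out": 0.20}       # ranked first, poor match

print(prior_art_triggers(near_miss, "zoom in"),
      preliminary_hit(near_miss, "zoom in", PRELIM))  # False True
print(prior_art_triggers(weak, "zoom in"),
      preliminary_hit(weak, "zoom in", PRELIM))       # True False
```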

As described above, the present invention can efficiently reduce the false rejection rate (FRR) in speech recognition while increasing the false acceptance rate (FAR) as little as possible.

FIG. 1 is a block diagram showing the configuration of an information processing system according to an embodiment of the present invention.
FIG. 2 shows a speech recognition dictionary and a threshold table according to the embodiment.
FIG. 3 illustrates the speech recognition method used by the speech recognition engine according to the embodiment.
FIG. 4 is a flowchart of the FR-handling threshold adjustment process according to the embodiment.
FIG. 5 shows a processing example of the FR-handling threshold adjustment process according to the embodiment.
FIG. 6 shows a threshold adjustment screen according to the embodiment.

An embodiment of the present invention is described below, using its application to an information processing system mounted on an automobile as an example.
As shown in the figure, the information processing system includes a data processing unit 1, a microphone 2, a voice input unit 3, an input device 4, a display device 5, and other peripheral devices 6 such as a camera, an audio device, and a GPS receiver.

The voice input unit 3 performs speech recognition on the user's utterance input from the microphone 2 and outputs the recognition result to the data processing unit 1.
The data processing unit 1 provides various functions, such as a car navigation function, a music player function, and a function for displaying images captured by the camera. It performs processing according to the recognition result input from the voice input unit 3.

The voice input unit 3 includes a speech recognition engine 31, a speech recognition dictionary 32, a threshold table 33, and a voice input control unit 34.
The information processing system may be built on a computer with a CPU, memory, peripheral devices, and the like. In that case, the data processing unit 1 and the voice input unit 3 described above may be implemented by the CPU executing a computer program.

As shown in FIG. 2a, the speech recognition dictionary 32 registers a plurality of words to be recognized by the speech recognition engine 31, each with an identification number (No.).
The threshold table 33 registers the following for each word to be recognized by the speech recognition engine 31: the word's identification number (No.), its threshold Th, and an adjustment setting indicating whether its threshold is to be adjusted.

Next, the speech recognition operation of the speech recognition engine 31 is described.
The voice input from the microphone 2 is called the recognition target voice. While it is being input, the speech recognition engine 31 calculates, in parallel, a score against it for each word stored in the speech recognition dictionary 32.

The score of a word registered in the speech recognition dictionary 32 represents the predicted size of the difference between that word and the phrase spoken in the recognition target voice. The larger the predicted difference, the larger the score, so a lower score indicates a closer match.

More specifically, the score is calculated as follows. A predetermined initial value is first set as the score. Each time the sound of one voice section of the recognition target voice (for example, one phoneme) is input, it is compared with the pronunciation of the corresponding section of each word registered in the speech recognition dictionary 32. If they match, the word's score is decreased by a predetermined amount; if not, it is increased by a predetermined amount. The amount per voice section is, for example, the initial score multiplied by the fraction of the word's total voice sections that the section represents.
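
A minimal sketch of this scoring rule is shown below. It rests on these assumptions:

- The input and each word are already segmented into syllable-sized voice sections (romanized here).
- Input sections beyond the end of a word count as mismatches.
- The initial score is 100.
- The name `score_trajectory` is hypothetical.

With a five-section word, each section moves the score by 100 / 5 = 20.

```python
from typing import List, Sequence


def score_trajectory(target: Sequence[str], word: Sequence[str],
                     initial: float = 100.0) -> List[float]:
    """Return the word's score after each voice section of the target.

    Lower scores mean a closer match. Each section moves the score by
    initial / len(word): down on a match, up on a mismatch (including
    input sections that extend past the end of the word).
    """
    step = initial / len(word)
    score = initial
    trajectory = []
    for i, sound in enumerate(target):
        matched = i < len(word) and word[i] == sound
        score += -step if matched else step
        trajectory.append(score)
    return trajectory


target = ["a", "i", "u", "e", "o", "ka"]
print(score_trajectory(target, ["a", "i", "u", "e", "o"]))
# [80.0, 60.0, 40.0, 20.0, 0.0, 20.0]
print(score_trajectory(target, ["a", "i", "u", "a", "i"]))
# [80.0, 60.0, 40.0, 60.0, 80.0, 100.0]
```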

With this scoring, take the recognition target voice "aiueoka" (あいうえおか). FIG. 3a shows how the score for the word "aiueo" (あいうえお) evolves, and FIG. 3b shows the same for the word "aiuai" (あいうあい). While the input sounds match a word, that word's score decreases step by step. While they do not match, it increases step by step.

For example, as shown in FIG. 3a, the score of the word "aiueo" against the recognition target voice "aiueoka" decreases step by step while the "aiueo" part of the target is input. It then increases when the "ka" is input.

Similarly, as shown in FIG. 3b, the score of the word "aiuai" against the recognition target voice "aiueoka" decreases step by step while the "aiu" part is input. It then increases step by step while the remaining "eoka" is input.

When a word's score against the recognition target voice falls to or below that word's threshold Th in the threshold table 33, the speech recognition engine 31 outputs the word to the voice input control unit 34 as a recognized word. The voice input control unit 34 then outputs the recognized word to the data processing unit 1 as the recognition result.

For example, the score of the word "aiueo" in FIG. 3a falls to or below the threshold Th when the "e" of the recognition target voice "aiueoka" is input. The recognized word "aiueo" is output at that point.

The score of the word "aiuai" in FIG. 3b, on the other hand, never falls to or below the threshold Th, so "aiuai" is not output as a recognized word.
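
The threshold test reduces to finding the first voice section at which a word's score reaches Th. The trajectories below are assumptions that only mirror the shapes described for FIG. 3a and FIG. 3b (initial score 100, 20 points per section). The threshold value Th = 30 and the function name are also hypothetical.

```python
from typing import Optional, Sequence


def first_crossing(trajectory: Sequence[float],
                   threshold: float) -> Optional[int]:
    """Index of the first voice section whose score is <= threshold, or None."""
    for i, score in enumerate(trajectory):
        if score <= threshold:
            return i
    return None


# Assumed score trajectories for the input "a-i-u-e-o-ka".
aiueo = [80.0, 60.0, 40.0, 20.0, 0.0, 20.0]
aiuai = [80.0, 60.0, 40.0, 60.0, 80.0, 100.0]
TH = 30.0  # hypothetical threshold Th for both words

print(first_crossing(aiueo, TH))  # 3 -> recognized when "e" arrives
print(first_crossing(aiuai, TH))  # None -> never recognized
```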

The speech recognition engine 31 also applies a second, looser test. When a word's score against the recognition target voice falls to or below the preliminary recognition threshold Pth set for that word, the engine preliminarily recognizes the word and outputs it to the voice input control unit 34 as a preliminary recognition word.

The preliminary recognition threshold Pth of each word is set in one of two ways: the word's threshold Th in the threshold table 33 plus a predetermined value, or that threshold Th increased by a predetermined percentage of Th.
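
Both ways of deriving Pth fit in a small helper. The sketch below is hypothetical: the parameter names `offset` and `ratio` and the example values are assumptions, not figures from the source. Because scores are "lower is better", a Pth above Th is the looser test.

```python
from typing import Optional


def preliminary_threshold(th: float, offset: Optional[float] = None,
                          ratio: Optional[float] = None) -> float:
    """Return Pth for a word from its threshold Th.

    Exactly one of `offset` (absolute) or `ratio` (fraction of Th) is used.
    """
    if (offset is None) == (ratio is None):
        raise ValueError("pass exactly one of offset or ratio")
    if offset is not None:
        return th + offset
    return th * (1.0 + ratio)


print(preliminary_threshold(30.0, offset=15.0))  # 45.0
print(preliminary_threshold(30.0, ratio=0.5))    # 45.0
```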

With Pth set this way, consider the word "aiueo" in FIG. 3a. Its score falls to or below the preliminary recognition threshold Pth when the "u" of the recognition target voice "aiueoka" is input, so "aiueo" is preliminarily recognized and output as a preliminary recognition word at that point. This happens before the "e" is input, before the score falls to or below Th, and before "aiueo" is output as a recognized word.

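The score of the word "aiuai" in FIG. 3b never falls to or below the threshold Th, so "aiuai" is never output as a recognized word. Its score does, however, fall to or below Pth when the "u" of the recognition target voice "aiueoka" is input. At that point "aiuai" is preliminarily recognized and output as a preliminary recognition word.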
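
The two thresholds together produce a stream of time-ordered events, which the sketch below reconstructs. It assumes that each word yields at most one preliminary and one recognized event per utterance. It also assumes the hypothetical trajectories and values Th = 30 and Pth = 45, which reproduce the behavior described for FIG. 3a and FIG. 3b.

```python
from typing import Dict, List, Sequence, Tuple


def recognition_events(trajectories: Dict[str, Sequence[float]],
                       th: Dict[str, float],
                       pth: Dict[str, float]) -> List[Tuple[int, str, str]]:
    """Return (section index, event kind, word) tuples in input order.

    Each word yields at most one "preliminary" event (first score <= Pth)
    and at most one "recognized" event (first score <= Th).
    """
    events = []
    for word, trajectory in trajectories.items():
        for kind, limit in (("preliminary", pth[word]), ("recognized", th[word])):
            for i, score in enumerate(trajectory):
                if score <= limit:
                    events.append((i, kind, word))
                    break
    # Sort by section; at equal sections "preliminary" sorts before
    # "recognized" alphabetically, which matches Pth >= Th.
    events.sort()
    return events


trajectories = {
    "aiueo": [80.0, 60.0, 40.0, 20.0, 0.0, 20.0],
    "aiuai": [80.0, 60.0, 40.0, 60.0, 80.0, 100.0],
}
th = {"aiueo": 30.0, "aiuai": 30.0}   # assumed Th
pth = {"aiueo": 45.0, "aiuai": 45.0}  # assumed Pth = Th + 15
print(recognition_events(trajectories, th, pth))
# [(2, 'preliminary', 'aiuai'), (2, 'preliminary', 'aiueo'), (3, 'recognized', 'aiueo')]
```
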
Next, the FR-handling threshold adjustment process is described. The voice input control unit 34 performs this process to reduce the false rejection rate (FRR).

FIG. 4 shows the procedure of the FR-handling threshold adjustment process.
As shown, in this process the voice input control unit 34 monitors for three events:
- output of a preliminary recognition word from the speech recognition engine 31 (step 402);
- timeout of a timer, described later (step 404);
- output of a recognized word from the speech recognition engine 31 (step 406).

Suppose a preliminary recognition word is output during monitoring in steps 402, 404, and 406. The voice input control unit 34 checks whether adjustment is enabled in the threshold table 33 for the preliminarily recognized word, that is, the word output as the preliminary recognition word (step 412). If it is not, the process returns to monitoring in steps 402, 404, and 406.

In the threshold table 33, adjustment is initially enabled for every word.
If adjustment is enabled for the preliminarily recognized word (step 412), a flag is set for that word (step 414) and a timer associated with the flag is started (step 416). The process then returns to monitoring in steps 402, 404, and 406. The timer started in step 416 has a predetermined timeout, for example 10 seconds. Alternatively, the timeout may depend on the length (number of characters) of the preliminarily recognized word, with longer words given longer timeouts.

If any timer times out during monitoring in steps 402, 404, and 406 (step 404), the flag associated with that timer is cleared (step 422). The process then returns to monitoring in steps 402, 404, and 406.

If the speech recognition engine 31 outputs a recognized word during monitoring in steps 402, 404, and 406 (step 406), the voice input control unit 34 checks whether adjustment is enabled in the threshold table 33 for the recognized word (step 432). If it is not, the process returns to monitoring in steps 402, 404, and 406.

If adjustment is enabled for the recognized word (step 432), the voice input control unit 34 checks whether the number of flags set for that word is equal to or greater than a predetermined value n, for example n = 3 (step 434). If it is not, the process returns to monitoring in steps 402, 404, and 406.

If the number of flags set for the recognized word is n or more (step 434), the recognized word's threshold Th in the threshold table 33 is increased by a predetermined amount (step 436). Because a lower score indicates a closer match, raising Th makes the word easier to recognize.

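All flags currently set, for every word, are then cleared (step 438), and the process returns to monitoring in steps 402, 404, and 406.
This completes the description of the FR-handling threshold adjustment process performed by the voice input control unit 34.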
In this process, each word's flags act as a counter for that word, with the number of flags as the count value. An equivalent counter-based version of the process works as follows:
- Step 414 increments the counter of the preliminarily recognized word by 1.
- Step 416 starts a timer associated with that word.
- Step 422 decrements by 1 the counter of the word whose timer timed out.
- Step 438 clears the counters of all words.
- Step 434 uses the recognized word's counter value as its number of flags.

When the process is implemented with counters in this way, each word's counter value is treated in the following description as that word's number of flags.
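
The counter form can be sketched with a simulated clock and a heap of pending timer expiries. The hypothetical helper `live_flags` counts unexpired flags directly, so the two views can be compared. All names here are illustrative assumptions.

```python
import heapq
from collections import Counter
from typing import List, Tuple


class WordCounters:
    """Counter form of the flags: +1 on preliminary recognition (step 414),
    -1 when the timer started for that increment expires (step 422)."""

    def __init__(self, timeout: float = 10.0) -> None:
        self.timeout = timeout
        self.value: Counter = Counter()
        self._timers: List[Tuple[float, str]] = []  # (expiry time, word)

    def advance(self, now: float) -> None:
        # Decrement once for every timer that has expired by `now`.
        while self._timers and self._timers[0][0] <= now:
            _, word = heapq.heappop(self._timers)
            self.value[word] -= 1

    def preliminary(self, word: str, now: float) -> None:
        self.advance(now)
        self.value[word] += 1
        heapq.heappush(self._timers, (now + self.timeout, word))  # step 416

    def clear_all(self) -> None:                                    # step 438
        self.value.clear()
        self._timers.clear()


def live_flags(times: List[float], now: float, timeout: float = 10.0) -> int:
    """Flag view: count flags whose timers have not yet expired at `now`."""
    return sum(1 for t in times if t <= now < t + timeout)


times = [0.0, 3.0, 7.0, 12.0]
counters = WordCounters()
for t in times:
    counters.preliminary("chizukakudai", t)
counters.advance(12.5)
print(counters.value["chizukakudai"], live_flags(times, 12.5))  # 3 3
```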

FIG. 5 shows a processing example of the FR-handling threshold adjustment process.
In the illustrated example, the user says "chizukakudai" (ちずかくだい, "enlarge map") and no word is recognized. The user repeats "chizukakudai", and the word is finally recognized on the third utterance.

In other words, the same word was falsely rejected (FR) twice in succession and then correctly recognized on its third utterance.

In this example, the timeout of the timer used in the FR-compatible threshold adjustment process of FIG. 4 is 10 seconds, and the predetermined value n is 3.
As shown in the figure, the user's utterance "Chizukakudai" is input three times from the microphone 2 to the voice recognition engine 31 as recognition target speech.
For the first and second recognition target speech "Chizukakudai", no word receives a score at or below its threshold Th (a lower score indicates a closer match), so the voice recognition engine 31 outputs no recognition word. For the third recognition target speech "Chizukakudai", the engine computes, for the word "Chizukakudai" registered in the voice recognition dictionary 32, a score at or below that word's threshold Th before it does so for any other word, and outputs the recognition word "Chizukakudai" (t4).

On the other hand, because each recognition target speech "Chizukakudai" is at least similar to speech uttering the word "Chizukakudai", the voice recognition engine 31 computes, for the first and second recognition target speech, a score for the word "Chizukakudai" that is at or below that word's preliminary recognition threshold PTh, and outputs the preliminary recognition word "Chizukakudai" (t1, t2). For the third recognition target speech as well, before outputting the recognition word "Chizukakudai", the engine computes a score for the word at or below PTh and outputs the preliminary recognition word "Chizukakudai" (t3). Each time the preliminary recognition word "Chizukakudai" is output, the voice input control unit 34 sets a flag for the word "Chizukakudai" and keeps that flag set for 10 seconds.
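The two thresholds can be illustrated per utterance. The sketch below is hypothetical: the function name, the competing word "chizukensaku", and all score and threshold values are invented for illustration. It assumes, as the text implies, that scores behave like distances (lower is better) and that PTh is looser than Th. It also simplifies the engine by picking the best accepted word instead of the first word to cross its threshold during decoding.

```python
from typing import Dict, List, Optional, Tuple


def classify(
    scores: Dict[str, float], th: Dict[str, float], pth: Dict[str, float]
) -> Tuple[List[str], Optional[str]]:
    """Classify one utterance against per-word thresholds (hypothetical sketch).

    A score is treated as a distance: lower means a closer match. A word is
    a preliminary recognition word if score <= PTh and is accepted if
    score <= Th, with PTh > Th so that the preliminary test is looser.
    """
    preliminary = sorted(w for w, s in scores.items() if s <= pth[w])
    accepted = [w for w, s in scores.items() if s <= th[w]]
    # Simplification: pick the best accepted word rather than the first one
    # to cross its threshold during decoding.
    recognized = min(accepted, key=lambda w: scores[w]) if accepted else None
    return preliminary, recognized


TH = {"chizukakudai": 50.0, "chizukensaku": 50.0}   # hypothetical values
PTH = {"chizukakudai": 80.0, "chizukensaku": 80.0}
attempts = [
    {"chizukakudai": 62.0, "chizukensaku": 77.0},   # 1st utterance (t1)
    {"chizukakudai": 58.0, "chizukensaku": 85.0},   # 2nd utterance (t2)
    {"chizukakudai": 47.0, "chizukensaku": 79.0},   # 3rd utterance (t3, t4)
]
for scores in attempts:
    print(classify(scores, TH, PTH))
```

The first two utterances yield preliminary recognition words but no recognition word, which reproduces the two false rejections in FIG. 5. The third yields both.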

Likewise, for each recognition target speech "Chizukakudai", the voice recognition engine 31 computes, for any other word whose pronunciation is similar to "Chizukakudai" beyond a predetermined level, a score at or below that word's preliminary recognition threshold PTh, and outputs that word as a preliminary recognition word. Each time such a word is output as a preliminary recognition word, the voice input control unit 34 sets a flag for it and keeps the flag set for 10 seconds.

When the recognition word "Chizukakudai" is output at time t4, the voice input control unit 34 checks the number of flags set for the word "Chizukakudai". In the illustrated example this number is 3, which is at least the predetermined value n = 3, so the threshold Th of the word "Chizukakudai" is increased. Here it is assumed that the time from the output of the first preliminary recognition word "Chizukakudai" at time t1 to the output of the recognition word "Chizukakudai" at time t4 is within the 10-second timeout.
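The flag-based decision at t4 can be sketched as follows. This is a hypothetical Python sketch. The 10-second flag lifetime and n = 3 come from the example above, but the increment `DELTA`, the class name, and the event times are assumptions. Each flag is stored as the time it was set and counts only while it is at most 10 seconds old. Once Th is raised, all flags are cleared, as in step 438.

```python
TIMEOUT_S = 10.0  # flag lifetime used in the FIG. 5 example
N_REQUIRED = 3    # predetermined value n in the FIG. 5 example
DELTA = 5.0       # hypothetical amount by which Th is raised


class FrThresholdAdjuster:
    """Flag-based FR-compatible threshold adjustment (hypothetical sketch)."""

    def __init__(self, th):
        self.th = dict(th)  # per-word threshold Th (score is a distance)
        self.flags = {}     # word -> times at which a flag was set

    def _live_flags(self, word, now):
        # Drop flags older than TIMEOUT_S; each flag stays set for 10 s.
        alive = [t for t in self.flags.get(word, []) if now - t <= TIMEOUT_S]
        self.flags[word] = alive
        return len(alive)

    def on_preliminary(self, word, now):
        """A preliminary recognition word was output: set one more flag."""
        self.flags.setdefault(word, []).append(now)

    def on_recognized(self, word, now):
        """A recognition word was output: raise its Th if enough flags are set."""
        if self._live_flags(word, now) >= N_REQUIRED:
            self.th[word] += DELTA  # larger Th -> easier to recognize
            self.flags.clear()      # clear all flags (step 438)
            return True
        return False


adj = FrThresholdAdjuster({"chizukakudai": 50.0})
for t in (0.0, 3.5, 7.0):  # t1, t2, t3 (hypothetical times)
    adj.on_preliminary("chizukakudai", t)
print(adj.on_recognized("chizukakudai", 7.2), adj.th)  # t4
```

In this timing the three flags are all still set at t4, so Th is raised. If the utterances were spread over more than 10 seconds, the oldest flag would expire and no adjustment would occur.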

Note that even if several words, including "Chizukakudai", are output as preliminary recognition words for the first and second recognition target speech, and the score of "Chizukakudai" is not the lowest (best) among them, this process still increases the threshold Th of "Chizukakudai" when the recognition word "Chizukakudai" is recognized. In other words, when several words are output as preliminary recognition words for both the first and the second recognition target speech, the process increases the threshold Th of whichever of them is output as the recognition word for the third recognition target speech.
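The point that rank does not matter can be shown in a few lines. In this hypothetical sketch the function name, the competing word, and the scores are invented. The decision counts only how often the recognized word itself appeared among the preliminary recognition words. It never asks whether that word had the best score in each utterance.

```python
from typing import Dict, List, Optional


def should_raise(
    history: List[Dict[str, float]], recognized: Optional[str], n: int = 3
) -> bool:
    """Decide whether to raise Th for the recognized word (hypothetical sketch).

    `history` holds, for each recent utterance inside the timeout window, the
    preliminary recognition words output for it, mapped to their scores.
    Only the recognized word is counted; its rank among the preliminary
    words of each utterance does not matter.
    """
    if recognized is None:
        return False
    hits = sum(1 for words in history if recognized in words)
    return hits >= n


history = [
    {"chizukensaku": 60.0, "chizukakudai": 70.0},  # another word scored better
    {"chizukensaku": 65.0, "chizukakudai": 68.0},  # ... and again
    {"chizukakudai": 47.0},                        # third utterance
]
print(should_raise(history, "chizukakudai"))
```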

This increase of the threshold Th makes the word "Chizukakudai" easier to recognize, so its false rejection rate (FRR) is reduced from then on.

For example, if the threshold Th of "Chizukakudai" is increased to the value X in FIG. 4, the word "Chizukakudai" is recognized as a recognition word for speech identical to the first or second recognition target speech "Chizukakudai" in FIG. 4.

Speech for which the recognized word was also detected as a preliminary recognition word is speech similar to an utterance of that recognized word.
As described above, the FR-compatible threshold adjustment process increases a word's threshold Th only when speech similar to an utterance of the recognized word (speech for which that word was detected as a preliminary recognition word) has been input several times in the period immediately before the word is recognized. Such a situation is very likely one in which the user repeated the same word until it was recognized, that is, one in which a falsely rejected (FR) word was correctly recognized on a repeated utterance. This likelihood is particularly high when such similar speech has been input three or more times in that period.

Therefore, the threshold Th is increased only for words for which false rejection (FR) has truly occurred, making those words easier to recognize, so that the false rejection rate (FRR) can be reduced without increasing the false acceptance rate (FAR).

Furthermore, whether speech input in the period immediately before recognition is similar to an utterance of the recognized word is judged by whether that speech yields a score for the recognized word at or below the word's preliminary recognition threshold PTh. Consequently, false rejection (FR) can be detected, and the word's threshold Th increased, even when the word that best matched the earlier speech was a different word. Conversely, even when the word that best matched the earlier speech was the recognized word itself, the speech is not treated as similar unless its score is at or below PTh. This prevents false rejection (FR) from being detected by mistake and the word's threshold Th from being increased unnecessarily.
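The difference between this similarity test and a best-match test can be made concrete. The sketch below is hypothetical: function names, the competing word, and all numbers are illustrative, and scores are again assumed to be distances (lower is better). Case 1 counts as similar even though another word matched best. Case 2 does not count, even though the word matched best, because its score exceeds PTh.

```python
from typing import Dict


def best_match(scores: Dict[str, float]) -> str:
    """Word with the lowest (best) score for one utterance."""
    return min(scores, key=lambda w: scores[w])


def is_similar_to(scores: Dict[str, float], word: str, pth: Dict[str, float]) -> bool:
    """Similarity test described above: the word's own score must be <= its PTh."""
    return word in scores and scores[word] <= pth[word]


PTH = {"chizukakudai": 80.0, "chizukensaku": 80.0}  # hypothetical values

# Case 1: a different word matched best, yet the speech counts as similar.
case1 = {"chizukensaku": 60.0, "chizukakudai": 70.0}
# Case 2: the word matched best, but too weakly to count as similar.
case2 = {"chizukakudai": 90.0, "chizukensaku": 95.0}

for scores in (case1, case2):
    print(best_match(scores), is_similar_to(scores, "chizukakudai", PTH))
```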

In the FR-compatible threshold adjustment process above, step 436 increases the threshold Th registered in the threshold table 33 for the recognized word by a predetermined amount. Alternatively, step 436 may display or speak a message prompting the user to increase the threshold of the recognized word, such as "'Chizukakudai' becomes easier to recognize if its threshold is increased." In this case, the data processing unit 1 is provided with a function that changes the threshold Th of each word in the threshold table 33 in response to user operations, so that users can adjust word thresholds themselves.
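The two forms of step 436 can be contrasted in a short sketch. The function name, `DELTA`, and the message wording are hypothetical. With `auto_raise=True` the table entry is raised directly. Otherwise the table is left untouched and a prompt is returned for display or speech output, and the user changes the threshold later through the user-operated function.

```python
from typing import Dict, Optional

DELTA = 5.0  # hypothetical predetermined increment


def handle_false_rejection(
    word: str, th: Dict[str, float], auto_raise: bool
) -> Optional[str]:
    """Step 436 variants for `word` (hypothetical sketch).

    auto_raise=True : raise Th in the threshold table directly.
    auto_raise=False: leave Th unchanged and return a prompt to display or
                      speak; the user then adjusts Th through a separate
                      user-operated editing function.
    """
    if auto_raise:
        th[word] += DELTA
        return None
    return f'"{word}" becomes easier to recognize if its threshold is increased.'


table = {"chizukakudai": 50.0}
print(handle_false_rejection("chizukakudai", table, auto_raise=False))
print(table)  # unchanged: the user decides
```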

Alternatively, the data processing unit 1 may be provided with a threshold editing function that displays a threshold adjustment screen for each word on the display device 5, as shown in FIG. 6a, and performs threshold change processing that changes the threshold Th in the threshold table 33 according to user operations on that screen, so that users can adjust word thresholds themselves. Step 436 then starts the threshold change processing of the threshold editing function for the recognized word with a threshold-increase-proposal attribute. When the threshold editing function starts threshold change processing for a specific word, it displays the threshold adjustment screen for that word, shown in FIG. 6a, on the display device 5. It then accepts user operations on the increase key 601 and the decrease key 602 on the screen as threshold adjustment operations, as shown in FIGS. 6b and 6c, and changes the threshold Th of that word in the threshold table 33 accordingly. When threshold change processing is started with the threshold-increase-proposal attribute, the threshold adjustment screen also includes a message proposing that the threshold be increased, such as "If you raise the setting, 'map enlargement' will be recognized more easily."
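A per-word adjustment screen like that of FIGS. 6a to 6c could be modeled as below. This is a hypothetical sketch. The class name, `STEP`, the key names, and the message wording are assumptions, and a plain dictionary stands in for the threshold table 33. The proposal message is shown only when processing was started with the threshold-increase-proposal attribute.

```python
from dataclasses import dataclass
from typing import Dict

STEP = 1.0  # hypothetical change per key press


@dataclass
class ThresholdEditScreen:
    """Per-word threshold adjustment screen (hypothetical sketch)."""

    table: Dict[str, float]          # stands in for threshold table 33
    word: str
    propose_increase: bool = False  # threshold-increase-proposal attribute

    def message(self) -> str:
        if self.propose_increase:
            return f'If you raise the setting, "{self.word}" will be recognized more easily.'
        return ""

    def press(self, key: str) -> float:
        """Apply one press of the increase key (601) or decrease key (602)."""
        if key == "increase":
            self.table[self.word] += STEP
        elif key == "decrease":
            self.table[self.word] -= STEP
        else:
            raise ValueError(f"unknown key: {key}")
        return self.table[self.word]


screen = ThresholdEditScreen({"chizukakudai": 50.0}, "chizukakudai", propose_increase=True)
print(screen.message())
print(screen.press("increase"), screen.press("increase"))
```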

In addition to the processing above, when a user operation later decreases the threshold of a word whose threshold Th was increased by the FR-compatible threshold adjustment process, the voice input control unit 34 sets that word's adjustment status in the threshold table 33 to "not adjusted".
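This bookkeeping can be sketched with one row of the threshold table. The field and function names below are hypothetical. The row records whether Th was raised by the FR-compatible process. A later user edit that lowers Th resets that status, while a user edit that raises it further leaves the status unchanged.

```python
from dataclasses import dataclass


@dataclass
class ThresholdEntry:
    """One row of the threshold table (hypothetical field names)."""

    th: float
    adjusted: bool = False  # whether Th was raised by FR-compatible adjustment


def fr_raise(entry: ThresholdEntry, delta: float) -> None:
    """FR-compatible adjustment: raise Th and mark the row as adjusted."""
    entry.th += delta
    entry.adjusted = True


def user_set(entry: ThresholdEntry, new_th: float) -> None:
    """User edit: lowering an FR-raised threshold resets the adjusted status."""
    if entry.adjusted and new_th < entry.th:
        entry.adjusted = False
    entry.th = new_th


row = ThresholdEntry(th=50.0)
fr_raise(row, 5.0)
user_set(row, 52.0)
print(row)
```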

The embodiment of the present invention has been described above.

1 ... Data processing unit, 2 ... Microphone, 3 ... Voice input unit, 4 ... Input device, 5 ... Display device, 6 ... Peripheral device, 31 ... Voice recognition engine, 32 ... Voice recognition dictionary, 33 ... Threshold table, 34 ... Voice input control unit.

Claims

A voice recognition system that recognizes spoken words, comprising:
a microphone;
a voice recognition dictionary in which a plurality of words are registered, each with a reference value representing a degree of matching;
voice recognition means that detects, as a preliminary recognition word, a word registered in the voice recognition dictionary that matches speech picked up by the microphone to a degree equal to or higher than the degree represented by a preliminary reference value, the preliminary reference value being set for that word to represent a lower degree of matching than the word's reference value, and that outputs, as a recognition result, a word registered in the voice recognition dictionary that matches the speech picked up by the microphone to a degree equal to or higher than the degree represented by the reference value set for that word;
preliminary recognition counting means that, when the voice recognition means detects a preliminary recognition word, increments the preliminary recognition count value of that word by 1 and, when a predetermined period has elapsed thereafter, decrements the preliminary recognition count value of that word by 1; and
reference value changing means that, when the voice recognition means outputs a recognition result and the preliminary recognition count value of the word output as the recognition result is equal to or greater than a predetermined value (the predetermined value being an integer of 2 or more), changes the reference value of that word so that it represents a lower degree of matching.

The voice recognition system according to claim 1,
wherein, when changing the reference value of the word output as the recognition result so that it represents a lower degree of matching, the reference value changing means clears the preliminary recognition count value of every word.

A voice recognition system that recognizes spoken words, comprising:
a microphone;
a voice recognition dictionary in which a plurality of words are registered, each with a reference value representing a degree of matching;
voice recognition means that detects, as a preliminary recognition word, a word registered in the voice recognition dictionary that matches speech picked up by the microphone to a degree equal to or higher than the degree represented by a preliminary reference value, the preliminary reference value being set for that word to represent a lower degree of matching than the word's reference value, and that outputs, as a recognition result, a word registered in the voice recognition dictionary that matches the speech picked up by the microphone to a degree equal to or higher than the degree represented by the reference value set for that word;
preliminary recognition counting means that, when the voice recognition means detects a preliminary recognition word, increments the preliminary recognition count value of that word by 1 and, when a predetermined period has elapsed thereafter, decrements the preliminary recognition count value of that word by 1;
reference value change proposal means that, when the voice recognition means outputs a recognition result and the preliminary recognition count value of the word output as the recognition result is equal to or greater than a predetermined value (the predetermined value being an integer of 2 or more), proposes to the user that the reference value of that word be changed so that it represents a lower degree of matching; and
reference value editing means that changes the reference value of a word according to a user operation.

The voice recognition system according to claim 3,
wherein, when proposing that the reference value of the word output as the recognition result be changed so that it represents a lower degree of matching, the reference value change proposal means clears the preliminary recognition count value of every word.

The voice recognition system according to claim 1, 2, 3, or 4,
wherein the predetermined value is an integer of 3 or more.

The voice recognition system according to claim 1, 2, 3, 4, or 5,
wherein the voice recognition system is used for voice input to an information processing system mounted on an automobile.

A computer program that is read and executed by a computer equipped with a microphone, the computer program causing the computer to function as:
a voice recognition dictionary in which a plurality of words are registered, each with a reference value representing a degree of matching;
voice recognition means that detects, as a preliminary recognition word, a word registered in the voice recognition dictionary that matches speech picked up by the microphone to a degree equal to or higher than the degree represented by a preliminary reference value, the preliminary reference value being set for that word to represent a lower degree of matching than the word's reference value, and that outputs, as a recognition result, a word registered in the voice recognition dictionary that matches the speech picked up by the microphone to a degree equal to or higher than the degree represented by the reference value set for that word;
preliminary recognition counting means that, when the voice recognition means detects a preliminary recognition word, increments the preliminary recognition count value of that word by 1 and, when a predetermined period has elapsed thereafter, decrements the preliminary recognition count value of that word by 1; and
reference value changing means that, when the voice recognition means outputs a recognition result and the preliminary recognition count value of the word output as the recognition result is equal to or greater than a predetermined value (the predetermined value being an integer of 2 or more), changes the reference value of that word so that it represents a lower degree of matching.

A computer program that is read and executed by a computer equipped with a microphone, the computer program causing the computer to function as:
a voice recognition dictionary in which a plurality of words are registered, each with a reference value representing a degree of matching;
voice recognition means that detects, as a preliminary recognition word, a word registered in the voice recognition dictionary that matches speech picked up by the microphone to a degree equal to or higher than the degree represented by a preliminary reference value, the preliminary reference value being set for that word to represent a lower degree of matching than the word's reference value, and that outputs, as a recognition result, a word registered in the voice recognition dictionary that matches the speech picked up by the microphone to a degree equal to or higher than the degree represented by the reference value set for that word;
preliminary recognition counting means that, when the voice recognition means detects a preliminary recognition word, increments the preliminary recognition count value of that word by 1 and, when a predetermined period has elapsed thereafter, decrements the preliminary recognition count value of that word by 1;
reference value change proposal means that, when the voice recognition means outputs a recognition result and the preliminary recognition count value of the word output as the recognition result is equal to or greater than a predetermined value (the predetermined value being an integer of 2 or more), proposes to the user that the reference value of that word be changed so that it represents a lower degree of matching; and
reference value editing means that changes the reference value of a word according to a user operation.