JP3415500B2

JP3415500B2 - Speech recognition system for mobile phone

Info

Publication number: JP3415500B2
Application number: JP20685699A
Authority: JP
Inventors: 勝美塩野
Original assignee: 埼玉日本電気株式会社
Priority date: 1999-07-21
Filing date: 1999-07-21
Publication date: 2003-06-09
Anticipated expiration: 2019-07-21
Also published as: JP2001034288A

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a mobile phone device. In particular, the present invention relates to a speech recognition system and method for a mobile phone device.

[0002]

[Description of the Related Art] One conventional technique is described in Japanese Unexamined Patent Application Publication No. S61-114299. That publication describes an example of a method for improving the recognition rate of speech recognition. In that method, when the similarity of the first candidate is smaller than a predetermined first threshold, the difference between the similarity of the second candidate, which has the second-largest similarity, and the similarity of the first candidate is computed. When this difference is smaller than a second threshold, the input is judged to be outside the recognition targets, which reduces misrecognition.

[0003]

[Problems to Be Solved by the Invention] However, when speech recognition is used in a noisy environment, the similarity of the first candidate can become large even though the first candidate is not the correct result. In that case the condition that the similarity be smaller than the first threshold is not satisfied, and the determination process for reducing misrecognition cannot be performed.

[0004] Accordingly, in view of the above problem, an object of the present invention is to provide a speech recognition system for a mobile phone device that can reduce misrecognition when speech is input in a noisy environment to a speech recognition device in which many similar recognition words are registered.

[0005]

[Means for Solving the Problems] To solve the above problem, the present invention provides a speech recognition system for a mobile phone device that accepts speech input, comprising: a speech recognition unit that holds a dictionary in which a plurality of recognition words representing speech and a plurality of noise words are registered, recognizes the speech input, and outputs recognition results for a plurality of candidates; and a recognition determination unit that determines whether the result is a recognition word or a noise word from the appearance ratio of recognition-word candidates and the appearance ratio of noise-word candidates recognized by the speech recognition unit. When the first candidate is a recognition word and the appearance ratio of noise words among the second and subsequent candidates is equal to or greater than a determination value, the recognition determination unit determines that the recognition result is a noise word. When the appearance ratio of noise words among the second and subsequent candidates is less than the determination value, it determines that the recognition result is a recognition word.

[0006] This arrangement makes it possible to reduce misrecognition when speech is input in a noisy environment to a speech recognition device in which many similar recognition words are registered. Specifically, the system uses not only the first candidate but also the recognition results from the second through the k-th candidate to determine whether the input during speech recognition is speech or noise. When the first candidate is a recognition word and the appearance ratio of noise words among the second and subsequent candidates is equal to or greater than the determination value, the recognition result is determined to be a noise word. Sudden misrecognition is thereby prevented and accurate speech recognition becomes possible.

[0007] The present invention further provides a speech recognition system for a mobile phone device that accepts speech input, comprising: a speech recognition unit that holds a dictionary in which a plurality of recognition words representing speech and a plurality of noise words are registered, recognizes the speech input, and outputs recognition results for a plurality of candidates; and a recognition determination unit that determines whether the result is a recognition word or a noise word from the appearance ratio of recognition-word candidates and the appearance ratio of noise-word candidates recognized by the speech recognition unit. When the first candidate is a noise word and the appearance ratio of recognition words among the second and subsequent candidates is equal to or greater than a determination value, the recognition determination unit determines that the recognition result is a recognition word. When the appearance ratio of recognition words among the second and subsequent candidates is less than the determination value, it determines that the recognition result is a noise word.

[0008] As with the invention described above, this arrangement makes it possible to reduce misrecognition when speech is input in a noisy environment to a speech recognition device in which many similar recognition words are registered. Specifically, the system uses not only the first candidate but also the recognition results from the second through the k-th candidate to determine whether the input during speech recognition is speech or noise. When the first candidate is a noise word and the appearance ratio of recognition words among the second and subsequent candidates is equal to or greater than the determination value, the recognition result is determined to be a recognition word. Sudden misrecognition is thereby prevented and accurate speech recognition becomes possible.
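
The two determination rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `judge`, the `NOISE` marker, and the use of a plain count of candidates as the "appearance ratio" are all choices made for the example. Returning the highest-ranked recognition word when a first-place noise word is overturned is likewise an assumption.

```python
NOISE = "<noise>"  # hypothetical marker returned when the input is judged to be noise


def judge(candidates, noise_words, judgment_value):
    """Decide between a recognition word and noise from a ranked candidate list.

    candidates: recognition results ordered from the first to the k-th candidate.
    noise_words: set of dictionary entries registered as noise words.
    judgment_value: minimum count among candidates 2..k that overturns the
        first candidate (assumption: the appearance ratio is a plain count).
    """
    first, rest = candidates[0], candidates[1:]
    if first not in noise_words:
        # First candidate is a recognition word: count the noise words below it.
        noise_count = sum(1 for c in rest if c in noise_words)
        return NOISE if noise_count >= judgment_value else first
    # First candidate is a noise word: collect the recognition words below it.
    words = [c for c in rest if c not in noise_words]
    if len(words) >= judgment_value:
        return words[0]  # highest-ranked recognition word (assumed choice)
    return NOISE
```

For instance, `judge(["Suzuki", "n1", "n2"], {"n1", "n2"}, 2)` rejects the first candidate as noise, because both lower-ranked candidates are noise words.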

[0009] Preferably, the appearance ratio of noise words and the appearance ratio of recognition words are calculated by assigning weights to the plurality of noise words and the plurality of recognition words, respectively, and adding the corresponding weight each time a noise word or a recognition word appears as a candidate. This improves speech recognition performance.
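
A weighted appearance ratio of this kind can be sketched as a simple accumulation. The sketch assumes that the weight depends on the rank of the candidate, as in the embodiment described later; the function name `weighted_appearance` and the list layout of `rank_weights` are illustrative and do not come from the patent.

```python
def weighted_appearance(candidates, targets, rank_weights):
    """Sum the weights of candidates 2..k that belong to `targets`.

    candidates: recognition results ordered from the first to the k-th candidate.
    targets: set of words to look for (noise words or recognition words).
    rank_weights: rank_weights[i] is the weight for the (i + 2)-th candidate,
        so there is one entry per candidate after the first (assumed layout).
    """
    rest = candidates[1:]
    if len(rank_weights) != len(rest):
        raise ValueError("need one weight per candidate after the first")
    # Add a candidate's weight only when that candidate is one of the targets.
    return sum(w for c, w in zip(rest, rank_weights) if c in targets)
```

With weights `[2, 1]`, a noise word in second place contributes 2 and one in third place contributes 1, so higher-ranked candidates influence the decision more.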

[0010] The present invention further provides a speech recognition system for a mobile phone device that accepts speech input, comprising: a speech recognition unit that holds a dictionary in which a plurality of recognition words representing speech and a plurality of noise words are registered, recognizes the speech input, and outputs recognition results for a plurality of candidates; and a recognition determination unit that determines whether the result is a recognition word or a noise word from the appearance ratio of recognition-word candidates and the appearance ratio of noise-word candidates recognized by the speech recognition unit, and that treats, among the recognition-word candidates, any candidate made up of fewer than a predetermined number of characters as a noise-word candidate. With this arrangement, when the recognition result candidates include a recognition word, the input at the time of recognition can be judged likely to be noise if the word has few characters and likely to be speech if it has many. In this way, in addition to determining whether each candidate is speech or noise, taking the character count of recognition words among the candidates into the determination value makes it possible to output a more correct speech recognition result.
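
The character-count rule can be illustrated with a small partitioning helper. This is a sketch only: `split_candidates` and `min_chars` are hypothetical names, and counting characters with `len()` (Unicode code points) is an assumption about how the length of a recognition word would be measured.

```python
def split_candidates(candidates, noise_words, min_chars):
    """Partition candidates into (recognition, noise) lists.

    A recognition word with fewer than `min_chars` characters is moved to the
    noise list, following the rule described above.
    """
    recognition, noise = [], []
    for c in candidates:
        if c in noise_words or len(c) < min_chars:
            noise.append(c)       # registered noise word or too-short word
        else:
            recognition.append(c)
    return recognition, noise
```

The two lists can then feed whichever appearance-ratio calculation is in use, so short recognition words count toward the noise side of the decision.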

[0011]

[Embodiments of the Invention] Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of a speech recognition system for a mobile phone device according to the present invention. As shown in the figure, the mobile phone device includes a wireless unit 1, which transmits and receives radio signals to and from a base station (not shown).

[0012] The mobile phone device further includes an operation unit 2, which is used to operate the phone, start speech recognition, and so on. It also includes a display unit 3, which displays digits, characters, and the like. A main CPU (Central Processing Unit) 4 is connected to the wireless unit 1, the operation unit 2, and the display unit 3. The main CPU 4 controls transmission and reception by the wireless unit 1, receives control instructions from the operation unit 2, and controls the display on the display unit 3.

[0013] A speech recognition unit 5 is connected to the main CPU 4. The speech recognition unit 5 is implemented as a speech recognition LSI (Large Scale Integrated circuit) and performs the speech recognition processing. When the speech recognition start key on the operation unit 2 is pressed, the main CPU 4 detects the key press, confirms that speech recognition is to be started, and sends a speech recognition start command to the speech recognition unit 5.

[0014] An A/D (analog-to-digital) converter 6 and a speech synthesis unit 10 are connected to the main CPU 4 and the speech recognition unit 5. The speech synthesis unit 10 synthesizes speech from the recognition result supplied by the speech recognition unit 5. A D/A (digital-to-analog) converter 7 is connected to the speech synthesis unit 10.

[0015] A microphone 8, which takes in speech, is connected to the A/D converter 6. The A/D converter 6 converts the analog speech signal into digital speech data. Speech input to the microphone 8 is processed as the transmitted audio of the mobile phone.

[0016] On receiving the start command from the main CPU 4, the speech recognition unit 5 performs recognition processing on the speech input from the microphone 8. A speaker 9, which outputs sound, is connected to the D/A converter 7. The D/A converter 7 converts digital speech data into an analog speech signal.

[0017] Sound output to the speaker 9 is processed as the received audio of the mobile phone. In addition, the start tone that sounds when speech recognition begins and, once the speech recognition unit 5 has fixed the recognition result, the speech corresponding to that result are synthesized by the speech synthesis unit 10 and output from the speaker 9 via the D/A converter 7. The recognition result fixed by the speech recognition unit 5 is also displayed on the display unit 3.

[0018] Next, the dictionary of the speech recognition unit 5 holds m recognition words, which are used for memory dialing and function calls on the mobile phone device, and n noise words, which are registered to prevent malfunction caused by noise. Noise words are entries added to the speech recognition dictionary so that noise input during speech recognition does not falsely trigger a memory-dial or function-call recognition word. They cover several kinds of sound, such as sudden noises.

[0019] With these entries registered, even if sudden noise is input during speech recognition, pattern matching against the speech recognition dictionary gives the noise words a higher similarity to the input sound than the memory-dial and function-call recognition words.

[0020] The recognition result is therefore a noise word, and malfunction in response to noise input can be prevented. The main CPU 4 further includes a recognition determination unit 4A, which determines whether the recognition result from the speech recognition unit 5 is a recognition word or a noise word.
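
The idea of registering noise words alongside recognition words can be illustrated with a toy pattern matcher. Everything in this sketch is hypothetical: the feature vectors, the entry names `noise_click` and `noise_hum`, and the use of Euclidean distance as the similarity measure are stand-ins, since the patent does not specify how the recognition LSI computes similarity.

```python
import math

# Hypothetical dictionary: label -> (toy feature vector, is_noise_word flag).
# Real templates would be acoustic models, not 3-element tuples.
DICTIONARY = {
    "Suzuki": ((0.9, 0.1, 0.3), False),
    "Sato": ((0.8, 0.2, 0.7), False),
    "noise_click": ((0.1, 0.9, 0.1), True),
    "noise_hum": ((0.2, 0.8, 0.4), True),
}


def rank_candidates(features, k):
    """Return the k dictionary labels closest to `features`, best match first."""
    def distance(label):
        template, _ = DICTIONARY[label]
        return math.dist(features, template)  # smaller distance = more similar
    return sorted(DICTIONARY, key=distance)[:k]


def is_noise_word(label):
    """Report whether a dictionary entry is registered as a noise word."""
    return DICTIONARY[label][1]
```

When the input resembles a registered noise template, a noise word takes first place, which is the behavior the paragraphs above rely on to suppress false triggers.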

[0021] FIG. 2 is a flowchart outlining the operation of the main CPU 4, the speech recognition unit 5, and the recognition determination unit 4A of FIG. 1. In step S1, the speech recognition start key on the operation unit 2 is pressed. In step S2, the main CPU 4 detects that the speech recognition start key has been pressed. In step S3, the main CPU 4 confirms that speech recognition is to be started.

[0022] In step S4, the main CPU 4 sends a speech recognition start command to the speech recognition unit 5. In step S5, the speech recognition unit 5 starts speech recognition. In step S6, the speech recognition unit 5 performs speech recognition on the speech input from the microphone 8.

[0023] In step S7, the speech recognition unit 5 fixes the result of speech recognition. In step S8, it outputs the recognition results from the first candidate through the k-th candidate to the recognition determination unit 4A. In step S9, the recognition determination unit 4A performs the determination process using the second through k-th candidates in addition to the first candidate.

[0024] In step S10, when the recognition determination unit 4A of the main CPU 4 determines that the recognition result is a recognition word, the maintenance corresponding to the recognition word is displayed on the display unit 3 and the corresponding speech is output from the speaker 9. In step S11, when the recognition determination unit 4A of the main CPU 4 determines that the recognition result is a noise word, it is determined that noise was input during speech recognition and caused a malfunction. In step S12, a message reporting that the system malfunctioned in response to the detected noise is displayed on the display unit 3 and the corresponding speech is output from the speaker 9.
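
The control flow of FIG. 2 after the start key has been detected can be sketched as a single function. The callables `recognize` and `judge`, the function name, and the message strings are all hypothetical placeholders; the sketch only mirrors the order of steps S4 to S12 and says nothing about how the hardware is actually driven.

```python
def run_recognition_session(recognize, judge, k=3):
    """Trace steps S4 to S12 once the start key press has been confirmed.

    recognize: hypothetical callable that returns the ranked candidates,
        standing in for the speech recognition unit (S4 to S8).
    judge: hypothetical callable that returns a recognition word, or None
        when the input is judged to be noise (S9).
    Returns the text that would go to the display unit and the synthesizer.
    """
    candidates = recognize(k)       # S4 to S8: start recognition, get k results
    result = judge(candidates)      # S9: judge using candidates 1..k
    if result is not None:          # S10: recognition word accepted
        return f"Recognized: {result}"
    # S11 to S12: noise was input, so report the malfunction to the user.
    return "Noise detected; please try again"
```

Separating recognition from judgment in this way matches the division of work between the speech recognition unit 5 and the recognition determination unit 4A.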

[0025] FIGS. 3 and 4 are flowcharts explaining in detail how the recognition determination unit 4A of the main CPU 4 of FIG. 1 determines the recognition result. As shown in FIG. 3, in step S21, when the speech recognition start key on the operation unit 2 is pressed, speech recognition is started, and the recognition determination unit 4A waits until the speech recognition unit 5 has performed recognition processing on the speech input from the microphone 8 and the recognition result is fixed.

[0026] In step S22, once the recognition result is fixed and k candidates, from the first candidate through the k-th candidate, are output as the recognition result, the recognition determination unit 4A acquires the k candidates and uses them to make the determination as follows. In step S23, it first determines whether the first candidate is a recognition word. In step S24, if the first candidate is a recognition word, it determines whether the second candidate is a noise word. If the second candidate is not a noise word, the process proceeds to step S26.

[0027] In step S25, when the second candidate is a noise word, the weight w1[k] is added to the determination value. In step S26, it is next determined whether the third candidate is a noise word. If it is not a noise word, the process proceeds to step S28. In step S27, when the third candidate is a noise word, the weight w1[k-1] is added to the determination value.

[0028] In step S28, the same determination of whether each candidate is a noise word continues up to the k-th candidate. If the candidate is not a noise word, the process proceeds to step S30. In step S29, when the k-th candidate is a noise word, the weight w1[1] is added to the determination value. In step S30, once the determination has been completed up to the k-th candidate, the accumulated determination value is compared with the threshold Th1.

[0029] In step S31, when the determination value is larger than Th1, the speech input during speech recognition is determined to be noise even though the first candidate of the recognition result is a recognition word. In step S32, the recognition result for noise input is displayed and the process ends. In step S33, when the determination value was found in step S30 to be smaller than Th1, the input at the time of speech recognition is determined to be speech.

[0030] In step S34, the recognition result for the recognition word of the first candidate is displayed and the process ends. Thus, even if noise is input during speech recognition and the first candidate is mistakenly a recognition word, the results of the second through k-th candidates are used: when those candidates contain many noise words and the determination value is Th1 or more, the input during speech recognition is determined to be noise rather than speech.

[0031] Displaying the recognition result for noise input therefore makes it possible to avoid misrecognition. As shown in FIG. 4, in step S35, when the first candidate was determined in step S23 to be a noise word, it is determined whether the second candidate is a recognition word. If it is not a recognition word, the process proceeds to step S37.

[0032] In step S36, when the second candidate is a recognition word, the weight w2[k] is added to the determination value. In step S37, it is next determined whether the third candidate is a recognition word. If it is not a recognition word, the process proceeds to step S39. In step S38, when the third candidate is a recognition word, the weight w2[k-1] is added to the determination value.

[0033] In step S39, the same determination of whether each candidate is a recognition word continues up to the k-th candidate. If the candidate is not a recognition word, the process proceeds to step S41. In step S40, when the k-th candidate is a recognition word, the weight w2[1] is added to the determination value. In step S41, once the determination has been completed up to the k-th candidate, the accumulated determination value is compared with the threshold Th2.

[0034] In step S42, when the determination value is larger than Th2, the input during speech recognition is determined to be speech rather than noise even though the first candidate of the recognition result is a noise word. In step S43, the recognition result for the recognition word of the second candidate is displayed and the process ends.

[0035] In step S44, when the determination value was found in step S41 to be smaller than Th2, the input at the time of speech recognition is determined to be noise. In step S45, the recognition result for the case where the result is a noise word is displayed and the process ends.
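
The branching of FIGS. 3 and 4 (steps S23 to S45) can be summarized in one function. This is a sketch under assumptions rather than the actual firmware: the weight lists are laid out so that the first entry applies to the second candidate, the comparison uses "equal to or greater than" as in the claims and the worked example, and the word returned when a first-place noise word is overturned is the highest-ranked recognition word. The names `determine`, `w1`, `w2`, `th1`, and `th2` follow the description only loosely.

```python
def determine(candidates, noise_words, w1, w2, th1, th2):
    """Follow steps S23 to S45 for candidates ordered first to k-th.

    w1, w2: weights for candidates 2..k, so w1[0] applies to the second
        candidate and the last entry to the k-th candidate (assumed layout).
    Returns ("speech", word) or ("noise", None).
    """
    first, rest = candidates[0], candidates[1:]
    if first not in noise_words:
        # S24 to S29: accumulate weights of noise words among candidates 2..k.
        score = sum(w for c, w in zip(rest, w1) if c in noise_words)
        if score >= th1:                  # S30 to S32: input judged to be noise
            return ("noise", None)
        return ("speech", first)          # S33 to S34: first candidate accepted
    # S35 to S40: accumulate weights of recognition words among candidates 2..k.
    words = [c for c in rest if c not in noise_words]
    score = sum(w for c, w in zip(rest, w2) if c not in noise_words)
    if words and score >= th2:            # S41 to S43: input judged to be speech
        return ("speech", words[0])
    return ("noise", None)                # S44 to S45: input judged to be noise
```

Keeping separate weights and thresholds for the two branches lets the rejection of a first-place recognition word be tuned independently of the rescue of a speech input whose first candidate was a noise word.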

[0036] FIG. 5 illustrates a concrete example. As shown in FIG. 5(a), which gives the configuration of the recognition dictionary in the speech recognition unit 5, the dictionary holds 10 recognition words (m) and 5 noise words (n), and results up to the third candidate (k) are output as the recognition result. Consider first the case shown in recognition result example 1 of FIG. 5(d), where noise is input during speech recognition and the first candidate is the recognition word Suzuki while the second and third candidates are noise words.

[0037] When the first candidate is a recognition word, it is determined whether the second and third candidates are noise words, and the determination value is calculated. In example 1, both the second and third candidates are noise words, so the determination weights w1[2] (= 2) and w1[1] (= 1) shown in FIG. 5(b) are added, giving a determination value of 3.

[0038] Because this value is equal to or greater than the determination threshold Th1 (= 3), the input at the time of speech recognition is determined to be noise even though the first candidate is the recognition word Suzuki, and the recognition result for a noise word is displayed, which prevents misrecognition. Next, consider the case shown in recognition result example 2 of FIG. 5(e), where Suzuki is uttered during speech recognition and the first candidate is a noise word, the second candidate is Suzuki, and the third candidate is Sato.

[0039] When the first candidate is a noise word, it is determined whether the second and third candidates are recognition words, and the determination value is calculated. In example 2, both the second and third candidates are recognition words, so the determination weights w2[2] (= 2) and w2[1] (= 1) of FIG. 5(b) are added, giving a determination value of 3.

[0040] Because the determination value is equal to or greater than Th2 (= 3), the input at the time of speech recognition is determined to be speech even though the first candidate is a noise word, and the second candidate, Suzuki, is output as the recognition result. Because the determination of recognition word or noise word is made in this way from the appearance ratio of noise-word candidates and the appearance ratio of recognition-word candidates, misrecognition can be reduced.
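
The arithmetic of the two examples can be checked directly. The snippet below only restates the numbers given for FIG. 5 (weights of 2 and 1, thresholds of 3); the variable names are illustrative, and the `>=` comparison follows the "equal to or greater than" wording used in the examples.

```python
# Values taken from the description of FIG. 5; names are illustrative only.
w1 = {2: 2, 1: 1}   # weights applied when a lower candidate is a noise word
w2 = {2: 2, 1: 1}   # weights applied when a lower candidate is a recognition word
th1 = 3             # threshold for rejecting a first-place recognition word
th2 = 3             # threshold for accepting speech despite a first-place noise word

# Example 1: first = Suzuki (recognition word), second and third = noise words.
score1 = w1[2] + w1[1]
example1_is_noise = score1 >= th1

# Example 2: first = noise word, second = Suzuki, third = Sato.
score2 = w2[2] + w2[1]
example2_is_speech = score2 >= th2

print(score1, example1_is_noise, score2, example2_is_speech)
```

Both scores land exactly on the threshold, so the outcome in each example depends on the comparison being inclusive.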

[0041] The speech recognition system for a mobile phone device according to an embodiment of the present invention has been described above. The present invention is not limited to this embodiment, however, and various modifications can be made within the spirit of the invention. According to the present invention, misrecognition is prevented and a more accurate speech recognition result can be output. As a result, speech recognition performance is improved.

[0042] This is because the determination of whether the input during speech recognition is speech or noise uses not only the first candidate of the recognition result but also the recognition results from the second through the k-th candidate before the recognition result is output. Next, another embodiment of the present invention will be described. In this embodiment, as the means for determining whether the input during speech recognition is speech or noise, in addition to determining whether each of the second through k-th candidates is noise or speech, whether the recognition words included among the second through k-th candidates have many or few characters is added to the determination value.

[0043] For example, when the recognition result candidates include a recognition word, the input at the time of speech recognition can be judged likely to be noise if the word has few characters and likely to be speech if it has many. In this way, in addition to determining whether each candidate is speech or noise, taking the character count of recognition words among the candidates into the determination value makes it possible to output a more correct speech recognition result.

[0044]

[Effects of the Invention] As described above, according to the present invention, not only the first candidate of the recognition result but also the recognition results from the second through the k-th candidate are used to determine whether the input during speech recognition is speech or noise before the recognition result is output. Misrecognition can therefore be prevented and accurate speech recognition becomes possible.

[Brief description of drawings]

FIG. 1 is a block diagram showing the schematic configuration of a speech recognition system for a mobile phone device according to the present invention.

FIG. 2 is a flowchart outlining the operation of the main CPU 4, the speech recognition unit 5, and the recognition determination unit 4A of FIG. 1.

FIG. 3 is a flowchart explaining in detail how the recognition determination unit 4A of the main CPU 4 of FIG. 1 determines the recognition result.

FIG. 4 is a flowchart explaining in detail how the recognition determination unit 4A of the main CPU 4 of FIG. 1 determines the recognition result.

FIG. 5 is a diagram illustrating a concrete example.

[Explanation of symbols]

1 ... Wireless unit
2 ... Operation unit
3 ... Display unit
4 ... Main CPU
4A ... Recognition determination unit
5 ... Voice recognition unit
6 ... A/D converter
7 ... D/A converter
8 ... Microphone
9 ... Speaker
10 ... Voice synthesis unit

Continuation of front page
(58) Fields surveyed (Int. Cl.⁷, DB name): G10L 15/20

Claims

(57) [Claims]

1. A voice recognition system for a mobile phone device that accepts voice input, comprising: a voice recognition unit that holds a dictionary in which a plurality of recognition words representing voice and a plurality of noise words are registered, recognizes the voice input, and outputs a plurality of candidates as a recognition result; and a recognition determination unit that determines whether the recognition result is a recognition word or a noise word based on the appearance ratio of recognition word candidates and the appearance ratio of noise word candidates recognized by the voice recognition unit, wherein, when the first candidate is a recognition word, the recognition determination unit determines the recognition result to be a noise word if the appearance ratio of noise words among the second and subsequent candidates is greater than or equal to a determination value, and determines the recognition result to be the recognition word if the appearance ratio of noise words among the second and subsequent candidates is less than the determination value.

2. A voice recognition system for a mobile phone device that accepts voice input, comprising: a voice recognition unit that holds a dictionary in which a plurality of recognition words representing voice and a plurality of noise words are registered, recognizes the voice input, and outputs a plurality of candidates as a recognition result; and a recognition determination unit that determines whether the recognition result is a recognition word or a noise word based on the appearance ratio of recognition word candidates and the appearance ratio of noise word candidates recognized by the voice recognition unit, wherein, when the first candidate is a noise word, the recognition determination unit determines the recognition result to be a recognition word if the appearance ratio of recognition words among the second and subsequent candidates is greater than or equal to a determination value, and determines the recognition result to be a noise word if the appearance ratio of recognition words among the second and subsequent candidates is less than the determination value.
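As an illustration only, the determination rules of claims 1 and 2 might be sketched in Python as follows. The function name `judge`, the `(word, is_noise)` tuple layout, the default determination value of 0.5, and the handling of a single-candidate result are assumptions that do not appear in the claims.

```python
def judge(candidates, threshold=0.5):
    """Classify a recognition result as 'recognition' or 'noise'.

    candidates: list of (word, is_noise) tuples, best candidate first.
    threshold:  determination value (0.5 is a hypothetical default).
    """
    first_is_noise = candidates[0][1]
    rest = candidates[1:]  # second and subsequent candidates
    if not rest:
        # No further candidates to consult: fall back to the first one
        # (an assumption; the claims do not cover this case).
        return "noise" if first_is_noise else "recognition"

    noise_ratio = sum(1 for _, is_noise in rest if is_noise) / len(rest)
    recognition_ratio = 1.0 - noise_ratio

    if not first_is_noise:
        # Claim 1: first candidate is a recognition word.
        return "noise" if noise_ratio >= threshold else "recognition"
    # Claim 2: first candidate is a noise word.
    return "recognition" if recognition_ratio >= threshold else "noise"


if __name__ == "__main__":
    result = [("mom", False), ("<hiss>", True), ("<click>", True), ("tom", False)]
    print(judge(result))  # noise ratio among the rest is 2/3
```

The point of the sketch is that the first candidate alone never decides the outcome when further candidates exist; the second and subsequent candidates can overturn it in either direction.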

3. The voice recognition system for a mobile phone device according to claim 1 or 2, wherein the appearance ratio of the noise words and the appearance ratio of the recognition words are calculated by assigning different weights to the plurality of noise words and the plurality of recognition words, and adding the corresponding weight each time a noise word or a recognition word appears as a candidate.
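The weighted calculation of claim 3 could look roughly like the following Python sketch. The function name `weighted_ratios`, the per-word `weights` mapping, the default weight of 1.0, and the normalization by the total weight are assumptions; the claim only states that a weight is added each time a word appears as a candidate.

```python
def weighted_ratios(candidates, weights, default_weight=1.0):
    """Return (recognition_ratio, noise_ratio) using per-word weights.

    candidates:     list of (word, is_noise) tuples.
    weights:        hypothetical mapping from dictionary word to weight.
    default_weight: weight used for words missing from the mapping.
    """
    recognition_score = 0.0
    noise_score = 0.0
    for word, is_noise in candidates:
        w = weights.get(word, default_weight)
        # Add the word's weight to the side it belongs to.
        if is_noise:
            noise_score += w
        else:
            recognition_score += w

    total = recognition_score + noise_score
    if total == 0:
        return 0.0, 0.0
    # Normalizing to ratios is an illustrative choice, not from the claim.
    return recognition_score / total, noise_score / total


if __name__ == "__main__":
    rest = [("<click>", True), ("mom", False), ("<hiss>", True)]
    print(weighted_ratios(rest, {"<click>": 2.0, "mom": 1.0, "<hiss>": 1.0}))
```

With weights, a noise word that is known to be a reliable indicator of noise can count for more than one that is easily confused with speech.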

4. A voice recognition system for a mobile phone device that accepts voice input, comprising: a voice recognition unit that holds a dictionary in which a plurality of recognition words representing voice and a plurality of noise words are registered, recognizes the voice input, and outputs a plurality of candidates as a recognition result; and a recognition determination unit that determines whether the recognition result is a recognition word or a noise word based on the appearance ratio of recognition word candidates and the appearance ratio of noise word candidates recognized by the voice recognition unit, and that treats, among the recognition word candidates, those composed of a smaller number of characters as noise word candidates.