JP2001034288A

JP2001034288A - Voice recognition system and method for portable telephone system

Info

Publication number: JP2001034288A
Application number: JP20685699A
Authority: JP
Inventors: Katsumi Shiono; 勝美塩野
Original assignee: NEC Saitama Ltd
Current assignee: NEC Saitama Ltd
Priority date: 1999-07-21
Filing date: 1999-07-21
Publication date: 2001-02-09
Anticipated expiration: 2019-07-21
Also published as: JP3415500B2

Abstract

PROBLEM TO BE SOLVED: To reduce erroneous recognition when speech is input, in a noisy environment, to a speech recognition device in which many similar recognition words are registered. SOLUTION: The speech recognition system of a mobile phone device that accepts speech input is provided with a speech recognition unit 5, which holds a dictionary in which plural recognition words indicating speech and plural noise words are registered and which recognizes the input speech and outputs recognition results for plural candidates, and a recognition determination unit 4A, which decides whether the input speech is a recognition word or a noise word on the basis of the appearance ratio of recognition-word candidates and the appearance ratio of noise-word candidates recognized by the speech recognition unit 5.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a mobile phone device. In particular, the present invention relates to a speech recognition system and method for a mobile phone device.

[0002]

[Prior Art] As prior art, there is the technique described in Japanese Patent Laid-Open No. 61-114299. This publication describes one example of a method for improving the recognition rate of speech recognition. In that speech recognition, when the similarity value of the first candidate is smaller than a predetermined first threshold, the difference between the similarity of the second candidate, which has the second-largest similarity, and the similarity of the first candidate is taken, and when this difference is smaller than a second threshold the input is judged to be outside the recognition targets, thereby reducing erroneous recognition.
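The two-threshold rejection test of the cited publication can be sketched as follows. This is a minimal illustration, not the publication's implementation: the function name, the use of a list of similarity scores, and the threshold values are all assumptions made for the example.

```python
def prior_art_is_rejected(similarities, th1, th2):
    """Return True when the input is judged to be outside the recognition targets.

    similarities: candidate similarity scores (higher means more similar).
    th1, th2:     first and second thresholds (hypothetical values chosen by the caller).
    """
    ranked = sorted(similarities, reverse=True)
    first, second = ranked[0], ranked[1]
    # The rejection test is only reached when the best similarity is below th1.
    if first < th1:
        return (first - second) < th2
    return False


# Weak, ambiguous match: the best score is below th1 and close to the runner-up.
print(prior_art_is_rejected([0.55, 0.52, 0.30], th1=0.6, th2=0.1))  # True
# A wrong word that still scores above th1 is never tested for rejection.
print(prior_art_is_rejected([0.80, 0.78, 0.40], th1=0.6, th2=0.1))  # False
```

The second call shows the weak point of this scheme: once the best similarity is at or above the first threshold, the difference test is skipped regardless of how close the runner-up is.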

[0003]

[Problem to Be Solved by the Invention] However, when speech recognition is used in a noisy environment, the similarity value of the first candidate may become large even though the recognition result of the first candidate is not correct, so that the condition of being smaller than the first threshold is not satisfied. There is thus a problem that the determination process for reducing erroneous recognition cannot be performed.

[0004] Therefore, in view of the above problem, an object of the present invention is to provide a speech recognition system and method for a mobile phone device that make it possible to reduce erroneous recognition when speech is input in a noisy environment to a speech recognition device in which many similar recognition words are registered.

[0005]

[Means for Solving the Problem] To solve the above problem, the present invention provides a speech recognition system for a mobile phone device that accepts speech input, the system comprising: a speech recognition unit that holds a dictionary in which a plurality of recognition words indicating speech and a plurality of noise words are registered, recognizes the speech input, and outputs recognition results for a plurality of candidates; and a recognition determination unit that determines whether the result is a recognition word or a noise word on the basis of the appearance ratio of recognition-word candidates and the appearance ratio of noise-word candidates recognized by the speech recognition unit.

[0006] By this means, erroneous recognition can be reduced when speech is input in a noisy environment to a speech recognition device in which many similar recognition words are registered. Preferably, when the first candidate is a recognition word, the recognition determination unit determines the recognition result to be a noise word if the appearance ratio of noise words among the second and subsequent candidates is equal to or greater than a determination value, and determines the recognition result to be a recognition word if that appearance ratio is less than the determination value. Preferably, when the first candidate is a noise word, the recognition determination unit determines the recognition result to be a recognition word if the appearance ratio of recognition words among the second and subsequent candidates is equal to or greater than a determination value, and determines the recognition result to be a noise word if that appearance ratio is less than the determination value. Preferably, the appearance ratio of noise words and the appearance ratio of recognition words are calculated by assigning weights to the plurality of noise words and the plurality of recognition words, respectively, and adding the corresponding weight each time a noise word or a recognition word appears as a candidate.

[0007] By this means, not only the first candidate of the recognition result but also the plurality of recognition results from the second candidate to the k-th candidate are used to determine whether the input during speech recognition is speech or noise before the recognition result is output, so that erroneous recognition can be prevented and accurate speech recognition becomes possible. Preferably, among the recognition-word candidates, those consisting of fewer characters than a predetermined number are treated as noise-word candidates.

[0008] By this means, when the recognition result candidates include a recognition word, it can be judged that the input at the time of speech recognition is likely to be noise if the recognition word has few characters and likely to be speech if it has many characters. Thus, in addition to judging whether the recognition result candidates are speech or noise, when the candidates include a recognition word, its number of characters is also added to the determination value, which makes it possible to output a more correct speech recognition result.

[0009] Further, the present invention provides a speech recognition method for a mobile phone device that accepts speech input, the method comprising: a step of holding a dictionary in which a plurality of recognition words indicating speech and a plurality of noise words are registered, recognizing the speech input, and outputting recognition results for a plurality of candidates; and a step of determining whether the result is a recognition word or a noise word on the basis of the appearance ratio of the recognized recognition-word candidates and the appearance ratio of noise-word candidates.

[0010] By this means, as with the invention described above, erroneous recognition can be reduced when speech is input in a noisy environment to a speech recognition device in which many similar recognition words are registered.

[0011]

[Embodiments of the Invention] Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing a schematic configuration of a speech recognition system for a mobile phone device according to the present invention. As shown in the figure, the mobile phone device is provided with a radio unit 1, which transmits and receives radio signals to and from a base station (not shown).

[0012] The mobile phone device is further provided with an operation unit 2, which is used to operate the mobile phone, start speech recognition, and so on. The mobile phone device is also provided with a display unit 3, which displays numerals, characters, and the like. A main CPU (Central Processing Unit) 4 is connected to the radio unit 1, the operation unit 2, and the display unit 3. The main CPU 4 controls transmission and reception by the radio unit 1, receives control instructions from the operation unit 2, and controls the display on the display unit 3.

[0013] A speech recognition unit 5 is connected to the main CPU 4. The speech recognition unit 5 is implemented as a speech recognition LSI (Large Scale Integrated circuit) and performs the speech recognition process. When the speech recognition start key on the operation unit 2 is pressed, the main CPU 4 detects the key press, confirms that speech recognition is to be started, and sends a speech recognition start command to the speech recognition unit 5.

[0014] An A/D (Analog to Digital) converter 6 and a speech synthesis unit 10 are connected to the main CPU 4 and the speech recognition unit 5. The speech synthesis unit 10 synthesizes the recognition result from the speech recognition unit 5 into speech. A D/A (Digital to Analog) converter 7 is connected to the speech synthesis unit 10.

[0015] A microphone 8, through which speech is input, is connected to the A/D converter 6. The A/D converter 6 converts the analog speech signal into digital speech data. Speech input to the microphone 8 is processed as the transmitted sound of the mobile phone.

[0016] Further, when the speech recognition unit 5 receives the start command from the main CPU 4, it executes the recognition process on the speech input from the microphone 8. A speaker 9, which outputs sound, is connected to the D/A converter 7. The D/A converter 7 converts digital speech data into an analog speech signal.

[0017] Sound output to the speaker 9 is processed as the received sound of the mobile phone. In addition, the start tone sounded when speech recognition begins and, once the speech recognition unit 5 has fixed the recognition result, the speech corresponding to that result are synthesized by the speech synthesis unit 10 and output from the speaker 9 via the D/A converter 7. The recognition result fixed by the speech recognition unit 5 is also displayed on the display unit 3.

[0018] Next, the dictionary of the speech recognition unit 5 holds m recognition words, which invoke the memory dial and functions of the mobile phone device, and n noise words, which are registered to prevent malfunctions caused by noise. Noise words are entries registered in the speech recognition dictionary so that the recognition words for memory dial and function invocation are not triggered erroneously when noise is input during speech recognition. Noise of several kinds of sound, such as sudden noise, is registered.
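The two kinds of dictionary entry can be pictured with a small data structure. The sketch below is only an illustration under assumptions: the class name, the `kind` method, and the noise-word labels are hypothetical, and the names "Suzuki" and "Sato" are borrowed from the specific example given later in the description.

```python
from dataclasses import dataclass, field


@dataclass
class RecognitionDictionary:
    """Dictionary holding m recognition words and n noise words."""
    recognition_words: set = field(default_factory=set)
    noise_words: set = field(default_factory=set)

    def kind(self, entry):
        """Classify a registered entry as 'recognition' or 'noise'."""
        if entry in self.recognition_words:
            return "recognition"
        if entry in self.noise_words:
            return "noise"
        raise KeyError(f"{entry!r} is not registered")


# Hypothetical contents: two recognition words and two noise patterns.
dictionary = RecognitionDictionary(
    recognition_words={"Suzuki", "Sato"},
    noise_words={"noise_1", "noise_2"},
)
print(dictionary.kind("Suzuki"))   # recognition
print(dictionary.kind("noise_1"))  # noise
```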

[0019] With this registration, even if sudden noise is input during speech recognition, when pattern matching against the speech recognition dictionary is performed, a noise word has higher similarity to the input sound than the recognition words for memory dial and function invocation.

[0020] The recognition result therefore becomes a noise word, which makes it possible to prevent malfunctions in response to noise input. Further, the main CPU 4 is provided with a recognition determination unit 4A, which performs the determination process on the recognition words and noise words that make up the recognition result of the speech recognition unit 5.

[0021] FIG. 2 is a flowchart outlining the operation of the main CPU 4, the speech recognition unit 5, and the recognition determination unit 4A of FIG. 1. In step S1, the speech recognition start key on the operation unit 2 is pressed. In step S2, the main CPU 4 detects the pressing of the speech recognition start key. In step S3, the main CPU 4 confirms that speech recognition is to be started.

[0022] In step S4, the main CPU 4 sends a speech recognition start command to the speech recognition unit 5. In step S5, the speech recognition unit 5 starts speech recognition. In step S6, the speech recognition unit 5 performs speech recognition on the speech input from the microphone 8.

[0023] In step S7, the speech recognition unit 5 fixes the speech recognition result. In step S8, it outputs the recognition results from the first candidate to the k-th candidate to the recognition determination unit 4A. In step S9, the recognition determination unit 4A performs the determination process on the second to k-th candidates in addition to the first candidate of the recognition result.
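The description does not say how the speech recognition LSI ranks its candidates. As one way to picture step S8, the sketch below assumes that each dictionary entry receives a similarity score and that the k highest-scoring entries are passed on, best first. The function name and the score values are hypothetical.

```python
import heapq


def top_k_candidates(scores, k):
    """Return the k dictionary entries with the highest similarity, best first."""
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [word for word, _ in best]


# Hypothetical similarity scores for one utterance.
scores = {"Suzuki": 0.81, "Sato": 0.64, "noise_1": 0.77, "noise_2": 0.70}
print(top_k_candidates(scores, 3))  # ['Suzuki', 'noise_1', 'noise_2']
```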

[0024] In step S10, when the recognition determination unit 4A of the main CPU 4 determines that the recognition result is a recognition word, the maintenance corresponding to the recognition word is displayed on the display unit 3 and the corresponding speech is output from the speaker 9. In step S11, when the recognition determination unit 4A of the main CPU 4 determines that the recognition result is a noise word, it is judged that noise was input during speech recognition and caused a malfunction. In step S12, a message reporting that a malfunction occurred in response to the detected noise is displayed on the display unit 3, and the corresponding speech is output from the speaker 9.

[0025] FIGS. 3 and 4 are flowcharts describing in detail the process by which the recognition determination unit 4A of the main CPU 4 of FIG. 1 judges the recognition result. As shown in FIG. 3, in step S21, when the speech recognition start key on the operation unit 2 is pressed, the recognition determination unit 4A starts speech recognition and waits for the speech recognition unit 5 to perform the recognition process on the speech input from the microphone 8 and fix the recognition result.

[0026] In step S22, when the recognition result is fixed and speech recognition outputs k candidates, from the first candidate to the k-th candidate, as the recognition result, the recognition determination unit 4A obtains the recognition results of the k candidates and uses them to make the determination as follows. In step S23, it first determines whether the first candidate is a recognition word. In step S24, if the first candidate is a recognition word, it determines whether the second candidate is a noise word. If the second candidate is not a noise word, the process proceeds to step S26.

[0027] In step S25, when the second candidate is a noise word, the weight w1[k] is added to the determination value. In step S26, it is next determined whether the third candidate is a noise word. If the third candidate is not a noise word, the process proceeds to step S28. In step S27, when the third candidate is a noise word, the weight w1[k-1] is added to the determination value.

[0028] In step S28, the determination of whether each candidate is a noise word continues in the same way up to the k-th candidate. If the candidate is not a noise word, the process proceeds to step S30. In step S29, when the k-th candidate is a noise word, the weight w1[1] is added to the determination value. In step S30, once the determination has been completed up to the k-th candidate, the accumulated determination value is compared with the threshold Th1.

[0029] In step S31, when the determination value is greater than Th1, the speech input during speech recognition is judged to be noise even though the first candidate of the recognition result is a recognition word. In step S32, the recognition result for noise input is displayed and the process ends. In step S33, when the determination value is smaller than Th1 in step S30, the input at the time of speech recognition is judged to be speech.

[0030] In step S34, the recognition result for the recognition word of the first candidate is displayed and the process ends. Consequently, even if noise input during speech recognition is erroneously recognized so that the first candidate is a recognition word, the results of the second to k-th candidates are used: when the second to k-th candidates contain many noise words and the determination value is equal to or greater than Th1, the input during speech recognition is judged to be noise input rather than speech.
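The branch taken when the first candidate is a recognition word (steps S24 to S33) can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation. The function name and data layout are hypothetical. The weight index is anchored so that the k-th candidate contributes w1[1], which agrees with the FIG. 5 example but differs by one from the index written for the second candidate in the step description. A determination value equal to Th1 is treated as noise, following the "equal to or greater than" wording.

```python
def judge_first_is_recognition_word(kinds, w1, th1):
    """Sketch of steps S24 to S33 (first candidate is a recognition word).

    kinds: "recognition" or "noise" for candidates 1..k, best first.
    w1:    mapping from weight index to weight (hypothetical layout).
    th1:   threshold Th1.
    Returns "noise" or "recognition" as the final judgment.
    """
    if kinds[0] != "recognition":
        raise ValueError("first candidate must be a recognition word")
    k = len(kinds)
    determination_value = 0
    # Candidates 2..k; candidate i uses weight index k - i + 1 (assumption),
    # so the k-th candidate contributes w1[1].
    for i in range(2, k + 1):
        if kinds[i - 1] == "noise":
            determination_value += w1[k - i + 1]
    # "Th1 or more" is treated as noise (assumption on the boundary case).
    return "noise" if determination_value >= th1 else "recognition"


# Values from recognition result example 1 of FIG. 5: w1[2] = 2, w1[1] = 1, Th1 = 3.
w1 = {1: 1, 2: 2}
print(judge_first_is_recognition_word(["recognition", "noise", "noise"], w1, 3))        # noise
print(judge_first_is_recognition_word(["recognition", "recognition", "noise"], w1, 3))  # recognition
```

Giving larger weights to higher-ranked candidates means that a noise word in second place counts for more than one in last place, which is how the example weights in FIG. 5 are ordered.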

[0031] Erroneous recognition can therefore be avoided by displaying the recognition result for noise input. As shown in FIG. 4, in step S35, when the first candidate was determined to be a noise word in step S23, it is determined whether the second candidate is a recognition word. If the second candidate is not a recognition word, the process proceeds to step S37.

[0032] In step S36, when the second candidate is a recognition word, the weight w2[k] is added to the determination value. In step S37, it is next determined whether the third candidate is a recognition word. If the third candidate is not a recognition word, the process proceeds to step S39. In step S38, when the third candidate is a recognition word, the weight w2[k-1] is added to the determination value.

[0033] In step S39, the determination of whether each candidate is a recognition word continues in the same way up to the k-th candidate. If the candidate is not a recognition word, the process proceeds to step S41. In step S40, when the k-th candidate is a recognition word, the weight w2[1] is added to the determination value. In step S41, once the determination has been completed up to the k-th candidate, the accumulated determination value is compared with the threshold Th2.

[0034] In step S42, when the determination value is greater than Th2, the speech input during speech recognition is judged to be speech rather than noise even though the first candidate of the recognition result is a noise word. In step S43, the recognition result for the recognition word of the second candidate is displayed and the process ends.

[0035] In step S44, when the determination value is smaller than Th2 in step S41, the input at the time of speech recognition is judged to be noise. In step S45, the recognition result for the case where the recognition result is a noise word is displayed and the process ends.
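The complementary branch, taken when the first candidate is a noise word (steps S35 to S45), can be sketched in the same way. The names and data layout are hypothetical. Returning the highest-ranked recognition word among the second to k-th candidates is an assumption; the description only states that the recognition word of the second candidate is displayed. The threshold Th2 is assumed to be positive, and a determination value equal to Th2 is treated as speech.

```python
def judge_first_is_noise_word(candidates, w2, th2):
    """Sketch of steps S35 to S45 (first candidate is a noise word).

    candidates: list of (word, kind) pairs, best first; kind is
                "recognition" or "noise".
    w2:         mapping from weight index to weight (hypothetical layout).
    th2:        threshold Th2, assumed to be positive.
    Returns the recognition word to output, or None when the input is
    judged to be noise.
    """
    if candidates[0][1] != "noise":
        raise ValueError("first candidate must be a noise word")
    k = len(candidates)
    determination_value = 0
    best_word = None
    # Candidates 2..k; candidate i uses weight index k - i + 1 (assumption).
    for i in range(2, k + 1):
        word, kind = candidates[i - 1]
        if kind == "recognition":
            determination_value += w2[k - i + 1]
            if best_word is None:
                best_word = word  # highest-ranked recognition word so far
    # "Th2 or more" is treated as speech (assumption on the boundary case).
    return best_word if determination_value >= th2 else None


# Values from recognition result example 2 of FIG. 5: w2[2] = 2, w2[1] = 1, Th2 = 3.
w2 = {1: 1, 2: 2}
example_2 = [("noise_1", "noise"), ("Suzuki", "recognition"), ("Sato", "recognition")]
print(judge_first_is_noise_word(example_2, w2, 3))  # Suzuki
```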

[0036] FIG. 5 is a diagram describing a specific example. As shown in FIG. 5(a), the configuration of the recognition dictionary in the speech recognition unit 5, ten recognition words (m) and five noise words (n) are registered in the recognition dictionary for speech recognition, and up to the third candidate (k) is output as the recognition result. Further, as shown in recognition result example 1 of FIG. 5(d), an example is described in which, when noise is input during speech recognition, the first candidate is the recognition word "Suzuki" and the second and third candidates are noise words.

[0037] When the first candidate is a recognition word, it is determined whether the second and third candidates are noise words, and the determination value is calculated. In example 1 above, since both the second and third candidates are noise words, the determination weights w1[2] (= 2) and w1[1] (= 1) shown in FIG. 5(b) are added, giving a determination value of 3.

[0038] Since this value is equal to or greater than the threshold Th1 (= 3), the input at the time of speech recognition is judged to be noise even though the first candidate of the recognition result is the recognition word "Suzuki", and the recognition result for a noise word is displayed, preventing erroneous recognition. Next, as shown in recognition result example 2 of FIG. 5(e), an example is described in which, when "Suzuki" is uttered during speech recognition, the first candidate is a noise word, the second candidate is "Suzuki", and the third candidate is "Sato".

[0039] When the first candidate is a noise word, it is determined whether the second and third candidates are recognition words, and the determination value is calculated. In example 2 above, since both the second and third candidates are recognition words, the determination weights w2[2] (= 2) and w2[1] (= 1) of FIG. 5(b) are added, giving a determination value of 3.

[0040] Since the determination value is equal to or greater than Th2 (= 3), the input at the time of speech recognition is judged to be speech even though the first candidate of the recognition result is a noise word, and the second candidate, "Suzuki", is output as the recognition result. Because the determination of recognition word or noise word is made in this way on the basis of the appearance ratio of noise-word candidates and the appearance ratio of recognition-word candidates, erroneous recognition can be reduced.

[0041] The speech recognition system for a mobile phone device according to the embodiment of the present invention has been described above, but the present invention is not limited to this embodiment, and various modifications are possible in keeping with the gist of the invention. Thus, according to the present invention, erroneous recognition can be prevented and a more accurate speech recognition result can be output. As a result, the recognition performance of speech recognition can be improved.

[0042] This is because not only the first candidate of the recognition result but also the plurality of recognition results from the second candidate to the k-th candidate are used to determine whether the input during speech recognition is speech or noise before the recognition result is output. Next, another embodiment of the present invention will be described. In this embodiment, as the means for determining whether the input during speech recognition is speech or noise, in addition to determining whether each of the second to k-th candidates is noise or speech, whether the number of characters of a recognition word is large or small is added to the determination value when the second to k-th candidates include a recognition word.

[0043] For example, when the recognition result candidates include a recognition word, it can be judged that the input at the time of speech recognition is likely to be noise if the recognition word has few characters and likely to be speech if it has many characters. Thus, in addition to judging whether the recognition result candidates are speech or noise, when the candidates include a recognition word, its number of characters is also added to the determination value, which makes it possible to output a more correct speech recognition result.
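The description leaves open exactly how the character count enters the determination value. One possible reading, following the preferred form stated in paragraph [0007], is to treat recognition-word candidates with fewer characters than a predetermined number as noise-word candidates before the weights are accumulated. The sketch below shows only that relabeling step. The function name, the `min_chars` parameter, and the short name "Oe" are hypothetical, and character counts are taken on romanized strings purely for illustration.

```python
def relabel_short_words(candidates, min_chars):
    """Treat recognition-word candidates shorter than min_chars as noise-word candidates.

    candidates: list of (word, kind) pairs; kind is "recognition" or "noise".
    min_chars:  the predetermined number of characters (hypothetical value).
    """
    relabeled = []
    for word, kind in candidates:
        if kind == "recognition" and len(word) < min_chars:
            kind = "noise"  # a short recognition word is more likely triggered by noise
        relabeled.append((word, kind))
    return relabeled


candidates = [("Suzuki", "recognition"), ("Oe", "recognition"), ("noise_1", "noise")]
print(relabel_short_words(candidates, min_chars=3))
# [('Suzuki', 'recognition'), ('Oe', 'noise'), ('noise_1', 'noise')]
```

The relabeled list can then be passed to the same weighted determination used in the first embodiment.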

[0044]

[Effect of the Invention] As described above, according to the present invention, not only the first candidate of the recognition result but also the plurality of recognition results from the second candidate to the k-th candidate are used to determine whether the input during speech recognition is speech or noise before the recognition result is output, so that erroneous recognition can be prevented and accurate speech recognition has become possible.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a schematic configuration of a speech recognition system for a mobile phone device according to the present invention.

FIG. 2 is a flowchart outlining the operation of the main CPU 4, the speech recognition unit 5, and the recognition determination unit 4A of FIG. 1.

FIG. 3 is a flowchart describing in detail the process of judging the recognition result in the recognition determination unit 4A of the main CPU 4 of FIG. 1.

FIG. 4 is a flowchart describing in detail the process of judging the recognition result in the recognition determination unit 4A of the main CPU 4 of FIG. 1.

FIG. 5 is a diagram describing a specific example.

[Explanation of symbols]

1: radio unit
2: operation unit
3: display unit
4: main CPU
4A: recognition determination unit
5: speech recognition unit
6: A/D converter
7: D/A converter
8: microphone
9: speaker
10: speech synthesis unit

Claims

[Claims]

1. A speech recognition system for a mobile phone device that accepts speech input, comprising: a speech recognition unit that holds a dictionary in which a plurality of recognition words indicating speech and a plurality of noise words are registered, recognizes the speech input, and outputs recognition results for a plurality of candidates; and a recognition determination unit that determines whether the result is a recognition word or a noise word on the basis of the appearance ratio of recognition-word candidates and the appearance ratio of noise-word candidates recognized by the speech recognition unit.

2. The speech recognition system for a mobile phone device according to claim 1, wherein, when the first candidate is a recognition word, the recognition determination unit determines the recognition result to be a noise word if the appearance ratio of noise words among the second and subsequent candidates is equal to or greater than a determination value, and determines the recognition result to be a recognition word if the appearance ratio of noise words among the second and subsequent candidates is less than the determination value.

3. The speech recognition system for a mobile phone device according to claim 1, wherein, when the first candidate is a noise word, the recognition determination unit determines the recognition result to be a recognition word if the appearance ratio of recognition words among the second and subsequent candidates is equal to or greater than a determination value, and determines the recognition result to be a noise word if the appearance ratio of recognition words among the second and subsequent candidates is less than the determination value.

4. The speech recognition system for a mobile phone device according to claim 2, wherein the appearance ratio of noise words and the appearance ratio of recognition words are calculated by assigning weights to the plurality of noise words and the plurality of recognition words, respectively, and adding the corresponding weight each time a noise word or a recognition word appears as a candidate.

5. The speech recognition system for a mobile phone device according to claim 1, wherein, among the recognition-word candidates, those consisting of fewer characters than a predetermined number are treated as noise-word candidates.

6. A speech recognition method for a mobile phone device that accepts speech input, comprising: a step of holding a dictionary in which a plurality of recognition words indicating speech and a plurality of noise words are registered, recognizing the speech input, and outputting recognition results for a plurality of candidates; and a step of determining whether the result is a recognition word or a noise word on the basis of the appearance ratio of the recognized recognition-word candidates and the appearance ratio of noise-word candidates.