JP2002258889A

JP2002258889A - Dictionary-editable speech recognition device

Info

Publication number: JP2002258889A
Application number: JP2001059940A
Authority: JP
Inventors: Takeshi Ono; 健大野
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2001-03-05
Filing date: 2001-03-05
Publication date: 2002-09-11

Abstract

(57) [Abstract] [Problem] To improve the recognition rate for input words. [Solution] An edited word is converted into a speech signal, and the degree of matching between the converted speech signal and each word in the dictionary is calculated. If the word with the highest degree of matching differs from the edited word, the edited word is judged to be inappropriate, and the judgment result is output. The user is thereby prompted to re-edit any word judged inappropriate, which improves the recognition rate.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field] The present invention relates to a speech recognition device whose dictionary of recognition target words can be edited.

[0002]

[Prior Art] A dictionary-editable speech recognition device is known in which a recognition target word is input as a hiragana character string and registered in the dictionary (see, for example, Japanese Patent Laid-Open No. 11-311991).

[0003]

[Problem to Be Solved by the Invention] However, in conventional dictionary-editable speech recognition devices, the recognition rate drops when the user enters a word whose pronunciation is close to that of a word already registered in the dictionary.

[0004] An object of the present invention is to improve the recognition rate for input words.

[0005]

[Means for Solving the Problem]
(1) The invention of claim 1 is a speech recognition device that allows a user to edit a dictionary of recognition target words, comprising: conversion means for converting an edited word into a speech signal; calculation means for calculating the degree of matching between the converted speech signal and each word in the dictionary; determination means for determining that the edited word is inappropriate when the word with the highest degree of matching differs from the edited word; and output means for outputting the determination result of the determination means.
(2) In the dictionary-editable speech recognition device of claim 2, when the word with the highest degree of matching is the edited word, the determination means determines that the edited word is inappropriate if the difference between the highest degree of matching and the next-highest degree of matching is smaller than a threshold.
(3) In the dictionary-editable speech recognition device of claim 3, the conversion means converts the word into a speech signal using user-specific phoneme segments stored in advance.
(4) In the dictionary-editable speech recognition device of claim 4, the conversion means adds noise stored in advance to the speech signal.
(5) The dictionary-editable speech recognition device of claim 5 edits the dictionary by receiving a word and its attribute information from an external device.
(6) The dictionary-editable speech recognition device of claim 6 comprises generation means that, when the determination means judges a word input from the external device to be inappropriate, generates a new word by combining that word with its attribute information.

[0006]

[Effects of the Invention]
(1) According to the invention of claim 1, the user is prompted to re-edit a word judged inappropriate, so the recognition rate can be improved.
(2) According to the invention of claim 2, an inappropriate word is kept from being edited into and stored in the dictionary; as a result, a drop in the recognition rate for speech uttered by the user can be prevented.
(3) According to the invention of claim 3, a more accurate judgment that reflects the user's voice is made.
(4) According to the invention of claim 4, a more accurate judgment that reflects the noise conditions of actual use is made.
(5) According to the invention of claim 5, the user's burden of editing the dictionary can be reduced.
(6) According to the invention of claim 6, the user's burden of re-editing can be reduced.

[0007]

[Embodiments of the Invention] << First Embodiment of the Invention >> The speech recognition unit 1 comprises: a signal processing device 11 composed of a CPU 11a, a memory 11b, and the like; an external storage device 12 that stores the dictionary of recognition target words and the phoneme patterns used to reconstruct a speech signal from a word's reading; an AD converter 13 that converts the speech recorded by the microphone 2 into a digital signal; a DA converter 14 that converts a digital audio signal into an analog audio signal; an output amplifier 15 that amplifies the analog audio signal; and so on. The microphone 2, a display 3, a speaker 4, an input device 5, and the like are connected to the speech recognition unit 1. The input device 5 is the device through which the user enters a speech recognition start request or an editing operation on the recognition target words.

[0008] FIG. 2 is a flowchart of the speech recognition process. The signal processing device 11 starts this process when an utterance operation is performed on the input device 5. In step 1, to notify the user that speech recognition has started, a notification sound signal stored in the external storage device 12 is output through the DA converter 14 and the output amplifier 15 to the speaker 4, which plays it.

[0009] In step 2, speech capture starts; the user now utters a word contained in the dictionary. FIG. 4 shows an example of the dictionary stored in the external storage device 12, here a personal name dictionary. Until the utterance switch of the input device 5 is operated, the signal processing device 11 keeps calculating the average power of the audio signal recorded by the microphone 2 and digitized by the AD converter 13. After the utterance switch is operated, when the instantaneous power of the audio signal exceeds this average power by a predetermined value or more, the device judges that the user has started speaking and begins capturing the speech.
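A minimal sketch of the onset rule in [0009], assuming the digitized audio is already split into frames and that "instantaneous power" means the mean squared amplitude of one frame. The frame layout and the `margin` parameter (standing in for the unspecified "predetermined value") are hypothetical.

```python
from typing import Optional, Sequence

Frame = Sequence[float]


def frame_power(frame: Frame) -> float:
    # "Instantaneous power" taken here as the mean squared sample value of one frame.
    return sum(x * x for x in frame) / len(frame)


def average_power(background: Sequence[Frame]) -> float:
    # Average power of the frames recorded while waiting for the utterance switch.
    return sum(frame_power(f) for f in background) / len(background)


def detect_onset(frames: Sequence[Frame], avg_power: float, margin: float) -> Optional[int]:
    # Index of the first frame whose power exceeds avg_power by at least
    # `margin` (hypothetical stand-in for the "predetermined value"), else None.
    for i, frame in enumerate(frames):
        if frame_power(frame) - avg_power >= margin:
            return i
    return None
```

An additive margin is used because the text says "larger by a predetermined value"; if powers were kept in decibels, the same comparison would act as a ratio test.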

[0010] In step 3, calculation of the degree of matching between the captured speech and the recognition target words in the dictionary stored in the external storage device 12 begins. The degree of matching indicates how similar the speech segment is to each recognition target word; the larger the value, the higher the match. The speech capture described above continues in parallel with this calculation. In step 4, when the instantaneous power of the captured speech stays at or below a predetermined value for a predetermined time or longer, the device judges that the user's utterance has ended and stops input.
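The end-of-utterance rule of step 4 can be sketched as a run-length check over per-frame power values. This assumes power is already computed per frame; `threshold` and `min_frames` are hypothetical stand-ins for the unspecified "predetermined value" and "predetermined time".

```python
from typing import Optional, Sequence


def detect_end(powers: Sequence[float], threshold: float, min_frames: int) -> Optional[int]:
    # Return the frame index at which the utterance is judged to have ended:
    # the frame that completes a run of `min_frames` consecutive frames whose
    # power is <= threshold. Returns None if no such run occurs.
    run = 0
    for i, p in enumerate(powers):
        if p <= threshold:
            run += 1
            if run >= min_frames:
                return i
        else:
            run = 0  # speech resumed, so the silence run starts over
    return None
```

Resetting the counter on any loud frame is what makes the condition "for a predetermined time or longer" rather than "at some point".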

[0011] Next, in step 5, the device waits for the matching calculation to finish and extracts, from the recognition target words that were scored, the word with the highest degree of matching. In step 6, that word is shown on the display 3 and, at the same time, output to the device being operated. For example, if the operated device is a telephone and the user utters "Katou", the third personal name in the dictionary of FIG. 4, "Kato" (加藤), is obtained and sent to the telephone, which then starts a call to Kato's telephone number.

[0012] FIG. 3 is a flowchart of the dictionary editing input process. The signal processing device 11 starts this process when an editing operation is performed on the input device 5. FIG. 5 shows an example in which "Sato" (佐藤) has been added to the personal name dictionary of FIG. 4. In step 11, a speech signal is generated by combining the phoneme patterns stored in the external storage device 12 according to text data representing the reading of the edited word. This signal need not be audible, and it need not reflect pitch frequency or speaking rate.
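Step 11 is described only as combining stored phoneme patterns according to the word's reading. The sketch below assumes a hypothetical store that maps each kana to a short template of feature frames and simply concatenates the templates; like the signal in [0012], it carries no pitch or speaking-rate information. It treats each character as one unit, so it does not handle small kana such as ゃ that a real mora segmentation would need.

```python
from typing import Dict, List

# Hypothetical phoneme-pattern store: kana -> list of feature frames.
# In the device these patterns would come from the external storage device 12.
PhonemeStore = Dict[str, List[List[float]]]


def synthesize(reading: str, store: PhonemeStore) -> List[List[float]]:
    # Concatenate the stored template of each kana in the reading, in order.
    # Raises KeyError if a kana has no stored pattern.
    signal: List[List[float]] = []
    for kana in reading:
        signal.extend(store[kana])
    return signal
```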

[0013] Next, in step 12, the degree of matching between the generated speech signal and every word in the dictionary of the external storage device 12 is calculated; the dictionary already contains the edited recognition target word. FIG. 6 shows an example of the results. Because the matching is computed against a signal generated from the edited word "Sato", "Sato" normally scores highest; in this example, "Kato" ranks second. In step 13, the device judges whether the edited word "Sato" is appropriate. If the highest-scoring word in the dictionary differs from "Sato", then "Sato" is likely to be misrecognized as that word, which is already stored in the dictionary, so "Sato" is judged inappropriate. If the highest-scoring word is "Sato" itself, the device does not immediately judge it appropriate but performs the following check for a more accurate judgment.

[0014] When the word with the highest degree of matching is the edited word "Sato", the highest and the next-highest degrees of matching are compared. Let R0 be the degree of matching of the edited word and R1 the next-highest degree of matching. If

$$R_0 (1 - k) < R_1 \qquad \text{(Equation 1)}$$

holds, the two scores differ only slightly: even though "Sato" scores highest, it is likely to be confused with the next-highest word, so the edited word is judged inappropriate. In Equation 1, k is a positive value smaller than 1 whose optimum is determined experimentally; in this embodiment it is, for example, 0.01.
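Equation 1 is the "difference smaller than a threshold" test of claim 2 written in relative form. Rearranging (the last step additionally assumes R0 > 0, as in the FIG. 6 example):

$$R_0(1-k) < R_1 \iff R_0 - R_1 < k R_0 \iff \frac{R_0 - R_1}{R_0} < k$$

So the threshold on the difference is k R0, which scales with the score itself; with k = 0.01, the edited word is rejected unless it beats the runner-up by at least 1% of its own degree of matching.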

[0015] In the example of FIG. 6, the edited word "Sato" has the highest degree of matching, R0 = 6150, and among the other words "Kato" scores highest, R1 = 6100. Substituting these values into Equation 1 gives

$$6150 \times (1 - 0.01) = 6088.5 < 6100 \qquad \text{(Equation 2)}$$

so the edited word "Sato" is easily misrecognized as "Kato", which is already stored in the dictionary, and is judged inappropriate. Note that if R0 < R1 because of a calculation error, Equation 1 naturally holds as well.
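A sketch of the step 13 judgment (claims 1 and 2), assuming the step 12 results are given as a mapping from each dictionary word to its degree of matching. The function name and the behaviour for a dictionary that contains only the edited word are assumptions. With the FIG. 6 values it reproduces the worked example above: 6150 × 0.99 = 6088.5 < 6100, so "Sato" is rejected.

```python
from typing import Dict


def is_inappropriate(scores: Dict[str, float], edited: str, k: float = 0.01) -> bool:
    # scores: degree of matching between the signal synthesized from `edited`
    # and every dictionary word, the edited word itself included (step 12).
    best = max(scores, key=lambda w: scores[w])
    if best != edited:
        # Claim 1: another stored word matches the edited word's signal better.
        return True
    others = [v for w, v in scores.items() if w != edited]
    if not others:
        # Assumption: with no other words there is nothing to confuse it with.
        return False
    r0, r1 = scores[edited], max(others)
    # Equation 1 (claim 2): margin over the runner-up is below k * R0.
    return r0 * (1 - k) < r1
```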

[0016] In step 14, the judgment result is shown on the display 3 as a message. If the word is judged inappropriate, a message such as "The edited 'Sato' may be misrecognized as 'Kato', which is already stored in the dictionary" is displayed.

[0017] This prevents an inappropriate word from being edited into and stored in the dictionary and, as a result, prevents a drop in the recognition rate for speech uttered by the user.

[0018] << Second Embodiment of the Invention >> A second embodiment is described, in which phoneme patterns generated in advance from the user's voice are stored in the external storage device 12. The configuration of the second embodiment is the same as that of the first embodiment shown in FIG. 1, and its description is omitted.

[0019] The dictionary editing input process of the second embodiment is the same as that of the first embodiment shown in FIG. 3 except for step 11, so it is not illustrated and only the difference is described. In the second embodiment, step 11 again generates a speech signal by combining the phoneme patterns stored in the external storage device 12 according to text data representing the reading of the edited word, but those phoneme patterns are ones generated in advance from the user's voice.

[0020] According to the second embodiment, even when the drop in the recognition rate depends on the user's voice quality, whether a word is inappropriate can be judged accurately, and a drop in the recognition rate can consequently be prevented.

[0021] << Third Embodiment of the Invention >> A third embodiment is described, in which data for outputting an audible speech signal and noise data from actual use are stored in the external storage device 12. The configuration of the third embodiment is the same as that of the first embodiment shown in FIG. 1, and its description is omitted.

[0022] The dictionary editing input process of the third embodiment is the same as that of the first embodiment shown in FIG. 3 except for step 11, so it is not illustrated and only the differences are described. In the third embodiment, step 11 of FIG. 3 generates an audible speech signal by combining the phoneme patterns stored in the external storage device 12 according to text data representing the reading of the edited word. A method of generating an audible speech signal is disclosed, for example, in Japanese Patent Laid-Open No. 09-081175. In that method, phonemes are created while the user is presented with a melody and sings along; when text data to be converted into speech is input, phoneme segments are combined according to the text data and converted into a speech signal. Presenting a melody to the user is not strictly necessary.

[0023] Also in the third embodiment, step 11 of FIG. 3 adds noise data to the generated speech signal and outputs the sum as a new speech signal. The noise data is noise captured from non-speech segments when the user used the speech recognition device in the past, and it is stored in the external storage device 12. If the noise environment changes over time, several types of noise are stored. In that case, one edit yields several speech signals, and steps 12 and 13 of FIG. 3 are performed several times.
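A sketch of the noise step, assuming the synthesized speech and each stored noise recording are sample lists at the same sampling rate. Repeating a short noise recording cyclically and the `gain` parameter are assumptions not found in the text. One edit produces one candidate signal per stored noise type, matching the repeated execution of steps 12 and 13.

```python
from typing import List, Sequence


def add_noise(speech: Sequence[float], noise: Sequence[float], gain: float = 1.0) -> List[float]:
    # Sample-wise sum of speech and stored noise; the noise is repeated
    # cyclically if it is shorter than the speech (an assumption).
    if not noise:
        return list(speech)
    return [s + gain * noise[i % len(noise)] for i, s in enumerate(speech)]


def noisy_variants(speech: Sequence[float],
                   noise_bank: Sequence[Sequence[float]],
                   gain: float = 1.0) -> List[List[float]]:
    # One candidate signal per stored noise type: one edit -> several signals.
    return [add_noise(speech, n, gain) for n in noise_bank]
```

The text does not say how the several step 13 results are combined. One conservative choice, which is only an assumption here, is to reject the edit if any noisy variant is judged inappropriate.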

[0024] According to the third embodiment, even when the drop in the recognition rate depends on noise, whether a word is inappropriate can be judged accurately, and a drop in the recognition rate can consequently be prevented.

[0025] << Fourth Embodiment of the Invention >> A fourth embodiment is described, in which recognition target words are transferred from a mobile phone to edit the dictionary. FIG. 7 shows the configuration of the fourth embodiment. Devices identical to those of the first embodiment shown in FIG. 1 carry the same reference numerals, and the description focuses on the differences. In the fourth embodiment, a mobile phone 6 is connected to the signal processing device 11 of the speech recognition unit 1.

[0026] FIG. 8 is a flowchart of the dictionary editing input process of the fourth embodiment. The signal processing device 11 starts this process when a recognition target word is transferred from the mobile phone 6. The example here transfers the word "Sato" from the mobile phone 6 to the personal name dictionary of FIG. 4, adding it to the dictionary as shown in FIG. 5. In step 21, a speech signal is generated by combining the phoneme patterns stored in the external storage device 12 according to text data representing the reading of the added word. This signal need not be audible, and it need not reflect pitch frequency or speaking rate.

[0027] Next, in step 22, the degree of matching between the generated speech signal and every word in the dictionary of the external storage device 12 is calculated; the dictionary already contains the edited recognition target word. FIG. 6 shows an example of the results. Because the matching is computed against a signal generated from the edited word "Sato", "Sato" normally scores highest; in this example, "Kato" ranks second. In step 23, the device judges whether the edited word "Sato" is appropriate. If the highest-scoring word in the dictionary differs from "Sato", then "Sato" is likely to be misrecognized as that word, which is already stored in the dictionary, so "Sato" is judged inappropriate. If the highest-scoring word is "Sato" itself, the device does not immediately judge it appropriate but performs the following check for a more accurate judgment.

[0028] When the word with the highest degree of matching is the edited word "Sato", the highest and the next-highest degrees of matching are compared. Let R0 be the degree of matching of the edited word and R1 the next-highest degree of matching. If

$$R_0 (1 - k) < R_1 \qquad \text{(Equation 3)}$$

holds, the two scores differ only slightly: even though "Sato" scores highest, it is likely to be confused with the next-highest word, so the edited word is judged inappropriate. In Equation 3, k is a positive value smaller than 1 whose optimum is determined experimentally; in this embodiment it is, for example, 0.01.

[0029] In the example of FIG. 6, the edited word "Sato" has the highest degree of matching, R0 = 6150, and among the other words "Kato" scores highest, R1 = 6100. Substituting these values into Equation 3 gives

$$6150 \times (1 - 0.01) = 6088.5 < 6100 \qquad \text{(Equation 4)}$$

so the edited word "Sato" is easily misrecognized as "Kato", which is already stored in the dictionary, and is judged inappropriate. Note that if R0 < R1 because of a calculation error, Equation 3 naturally holds as well.

[0030] If the edited word "Sato" is judged appropriate in step 23, the process ends; if it is judged inappropriate, the process proceeds to step 24. In step 24, the edited recognition target word is re-edited. The address book of the mobile phone 6 stores attribute data for the personal names that serve as recognition target words, such as gender, whether the person is a relative, or a company name, written as text. Adding such text before or after the recognition target word generates a new recognition target word, as shown in FIG. 9: the person's company, "●● Co.", is added to the transferred "Sato". Using gender instead, a form such as "●●-kun" can be produced by appending an ending to the word. If there is no text to add, or the loop has exhausted all combinations to add, re-editing fails and the process proceeds to step 25. If another combination remains, the process returns to step 21 and the new word is evaluated again.
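A sketch of the step 24 loop, assuming the address-book attribute texts are available as strings and that the full judgment of steps 21 to 23 (synthesize, score, compare) is passed in as a function. The function names are hypothetical, and trying the prefix before the suffix is an arbitrary ordering; the text only says the attribute text is added before or after the word.

```python
from typing import Callable, Iterable, Optional


def re_edit(word: str, attributes: Iterable[str],
            is_inappropriate: Callable[[str], bool]) -> Optional[str]:
    # Try each attribute text (e.g. a company name or a gender-based ending)
    # first before and then after the word. The first candidate the judgment
    # accepts is returned; None means every combination failed (step 25).
    for attr in attributes:
        for candidate in (attr + word, word + attr):
            if not is_inappropriate(candidate):
                return candidate
    return None
```

Passing the judgment in as a function keeps the loop independent of how the degree of matching is computed, which the patent leaves open.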

[0031] In step 25, the judgment result is shown on the display 3 as a message. If the word is inappropriate, messages such as "The edited '●●' may be misrecognized as '××' in the dictionary" and "There are no valid additional words in the address book" are displayed. If re-editing of the dictionary succeeds, the display 3 shows the message "Registered '●●社さとう' (●● Co. Sato)".

[0032] According to the fourth embodiment, words and their attribute information are obtained from the mobile phone to edit the dictionary, which reduces the user's burden of editing the dictionary. Moreover, when a word input from the mobile phone is judged inappropriate, a new word is generated by combining that word with its attribute information and the dictionary is edited with the new word, which reduces the user's burden of re-editing.

[0033] In the configurations of the above embodiments, the speech recognition unit 1 constitutes the conversion means, the calculation means, the determination means, the output means, and the generation means, and the mobile phone 6 constitutes the external device. The external device is not limited to a mobile phone.

[Brief description of the drawings]

FIG. 1 is a diagram showing the configuration of the first embodiment.

FIG. 2 is a flowchart of the utterance switch operation input process of the first embodiment.

FIG. 3 is a flowchart of the dictionary editing input process of the first embodiment.

FIG. 4 is a diagram showing an example of a personal name dictionary.

FIG. 5 is a diagram showing an example of dictionary editing.

FIG. 6 is a diagram showing an example of the calculated degrees of matching between an edited word and the recognition target words in the dictionary.

FIG. 7 is a diagram showing the configuration of the fourth embodiment.

FIG. 8 is a flowchart of the dictionary editing input process of the fourth embodiment.

FIG. 9 is a diagram showing an example of a dictionary re-edited using personal name attribute data.

[Explanation of symbols]

1 Speech recognition unit
2 Microphone
3 Display
4 Speaker
5 Input device
11 Signal processing device
11a CPU
11b Memory
12 External storage device
13 AD converter
14 DA converter
15 Output amplifier

Claims


1. A dictionary-editable speech recognition device that allows a user to edit a dictionary of words to be recognized, comprising: conversion means for converting an edited word into a speech signal; calculation means for calculating the degree of matching between the converted speech signal and each word in the dictionary; determination means for determining that the edited word is inappropriate if the word having the highest degree of matching differs from the edited word; and output means for outputting the determination result of the determination means.

2. The dictionary-editable speech recognition device according to claim 1, wherein, when the word having the highest degree of matching is the edited word, the determination means determines that the edited word is inappropriate if the difference between the highest degree of matching and the next-highest degree of matching is smaller than a threshold.

3. The dictionary-editable speech recognition device according to claim 1, wherein the conversion means converts the edited word into a speech signal using user-specific phoneme segments stored in advance.

4. The dictionary-editable speech recognition device according to claim 1, wherein the conversion means adds noise stored in advance to the speech signal.

5. The dictionary-editable speech recognition device according to claim 1, wherein the dictionary is edited by inputting a word and its attribute information from an external device.

6. The dictionary-editable speech recognition device according to claim 5, further comprising generation means for generating a new word by combining the word input from the external device with its attribute information when the determination means judges that word to be inappropriate.