JP2000099082A

JP2000099082A - Voice recognition device

Info

Publication number: JP2000099082A
Application number: JP10268594A
Authority: JP
Inventors: Hirotaka Goi; 啓恭伍井
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1998-09-22
Filing date: 1998-09-22
Publication date: 2000-04-07

Abstract

PROBLEM TO BE SOLVED: To estimate and generate foreign words that are not registered in a word dictionary, and thereby obtain an input code division string, by storing foreign word divided character strings, obtained by dividing foreign words into pieces of a fixed length, in association with their foreign word divided speech code strings. SOLUTION: The device is provided with a word dictionary 3 that stores words in association with registered word character strings and registered word phoneme strings, and with an n-gram dictionary (foreign word divided character string storage means) 4 that stores foreign word divided character strings, obtained by dividing foreign words into pieces of a fixed length, in association with their foreign word divided phoneme strings. A word prediction means 5 and a katakana unknown word correction means 6 compare an input phoneme string with the registered word phoneme strings and the foreign word divided phoneme strings, and generate an input code division string in which the input phoneme string is divided at each matching portion. The input code division string can thus be generated while estimating foreign words that are not registered in the word dictionary 3.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION

The present invention relates to a speech recognition device that generates a recognized character string from speech input, and in particular to an improvement that makes it possible to recognize speech containing undefined foreign words and generate a recognized character string from it.

[0002]

DESCRIPTION OF THE RELATED ART

FIG. 11 is a block diagram showing the configuration of a conventional speech recognition device disclosed in, for example, Japanese Patent Application Laid-Open No. 4-73694. In the figure, 1 is a microphone that outputs an analog signal in response to speech input; 2 is a phoneme probability calculation means that performs A/D conversion, quantization, and spectral analysis on this analog signal and generates a network-structured phoneme lattice consisting of a plurality of input phoneme strings separated into phoneme units and input appearance probabilities indicating the likelihood of each phoneme; 12 is a kanji dictionary that stores kanji in association with their registered kanji character strings, registered kanji phoneme strings, and kanji appearance probabilities; 11 is a kanji prediction means that, for each input phoneme string, compares the input phoneme string with the registered kanji phoneme strings and sets every matching portion as a kanji conversion section; 7 is a random access memory (RAM) that stores the phoneme lattice and the kanji conversion sections together with their corresponding appearance probabilities; 8 is a selection means that selects, from among a plurality of input phoneme division strings each formed as a set of kanji conversion sections, the one with the highest overall appearance probability as the optimal input phoneme division string; and 9 is an output means that generates a recognized character string by associating the corresponding registered kanji character string with each divided section of the optimal input phoneme division string.

[0003] Next, the operation is described. When speech is input, the microphone 1 outputs an analog signal corresponding to the speech, and the phoneme probability calculation means 2 outputs a plurality of input phoneme strings as a phoneme lattice based on this analog signal. The kanji prediction means 11 refers to the kanji dictionary for each input phoneme string and sets the kanji conversion sections.

[0004] The selection means 8 selects the input phoneme string whose overall appearance probability is highest as a result of the kanji conversion sections set in this way, and the output means 9 generates a recognized character string by associating kanji with each divided section of this optimal input phoneme string.

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION

Because the conventional speech recognition device is configured as described above, it has the problem that the words it can recognize (convert) are limited to those registered in the kanji dictionary 12.

[0006] Therefore, to make the device recognize a foreign word whose manner of sound chaining (the relationship between the sequence of sounds and the sequence of written characters) differs from that of the given language, the foreign word must be registered in the kanji dictionary 12. However, in a situation such as that of recent Japanese, where foreign words increase and change rapidly, it is impossible to register every foreign word in the kanji dictionary 12, and the practicality of speech recognition technology for such a language is significantly impaired.

[0007] Moreover, particularly in continuous word recognition of speech mixed with foreign words, not only can the foreign word itself not be converted, but because that portion is not converted correctly, portions of the correct input speech code string are also segmented wrongly. The overall appearance probability of the correct string then becomes unduly low, and a recognized character string ends up being generated from some other, incorrect speech code string.

[0008] The present invention has been made to solve the above problems, and its object is to obtain a speech recognition device that can recognize speech containing foreign words with high accuracy without registering every foreign word in a dictionary.

[0009]

MEANS FOR SOLVING THE PROBLEMS

A speech recognition device according to the present invention comprises: a preprocessing means that receives speech in a given language and generates, based on the speech, an input speech code string in that language; a word dictionary that stores words of the language in association with their registered word character strings and registered word speech code strings; a foreign word divided character string storage means that stores foreign word divided character strings, obtained by dividing foreign words into pieces of a fixed length, in association with their foreign word divided speech code strings; a dividing means that compares the input speech code string with the registered word speech code strings and the foreign word divided speech code strings and generates an input code division string in which the input speech code string is divided at each matching portion; and a character string generation means that generates a recognized character string by associating the corresponding character string with each divided portion of the input code division string.

[0010] A speech recognition device according to the present invention comprises: a preprocessing means that receives speech in a given language and generates, based on the speech, a plurality of input speech code strings in that language together with their respective input appearance probabilities; a word dictionary that stores words of the language in association with their registered word character strings, registered word speech code strings, and word appearance probabilities; a foreign word divided character string storage means that stores foreign word divided character strings, obtained by dividing foreign words into pieces of a fixed length, in association with their foreign word divided speech code strings and divided character string appearance probabilities; a dividing means that, for each input speech code string, compares the input speech code string with the registered word speech code strings and the foreign word divided speech code strings and generates an input code division string in which the input speech code string is divided at each matching portion; and a character string generation means that obtains the overall appearance probability of each input code division string from the input appearance probabilities, the word appearance probabilities, and the divided character string appearance probabilities, and generates a recognized character string by associating the corresponding character string with each divided portion of the input code division string that gives the combination with the highest appearance probability.

[0011] In the speech recognition device according to the present invention, the speech code is a phone (音素), a phoneme (音韻), or a syllable (音節) of the given language.

[0012] In the speech recognition device according to the present invention, the foreign word divided character string is a character string of a foreign word written in katakana or in the alphabet.

[0013] In the speech recognition device according to the present invention, the dividing means compares the portions that do not match any registered word speech code string with the foreign word divided speech code strings, and divides off the portions that match a foreign word divided speech code string.

[0014] In the speech recognition device according to the present invention, the word dictionary stores a word appearance probability for each word, the foreign word divided character string storage means stores a divided character string appearance probability for each foreign word divided character string, and the dividing means generates the input code division string so that the overall appearance probability is highest when the input speech code string is divided.

[0015] In the speech recognition device according to the present invention, the foreign word divided character string storage means stores a divided character string appearance probability for each foreign word divided character string, and, when dividing the input speech code string yields a plurality of foreign word divided character strings that follow one another intermittently, the dividing means compares the probability obtained when the whole of the intermittent run is converted into foreign word divided character strings with the probability obtained when it is converted intermittently, and generates the input code division string using whichever gives the higher section probability.

[0016] In the speech recognition device according to the present invention, the dividing means compares the section probabilities after applying a corrective weight to the probability obtained when the whole of the intermittent run is converted into foreign word divided character strings.
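The comparison described in the two paragraphs above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the corrective weight is applied, so a multiplicative weight is assumed here, and the function name, the tie-breaking rule, and the probability values are all hypothetical.

```python
def choose_section_reading(p_whole: float, p_intermittent: float,
                           weight: float = 1.0) -> str:
    """Pick how to convert a run of intermittently consecutive foreign-word pieces.

    p_whole:        probability when the whole run is converted into
                    foreign word divided character strings
    p_intermittent: probability when the run is converted intermittently
    weight:         corrective weight applied to p_whole (assumed multiplicative)
    """
    weighted_whole = weight * p_whole
    # Ties go to the whole-run reading; this tie-break is an assumption.
    return "whole" if weighted_whole >= p_intermittent else "intermittent"


# Hypothetical probabilities, for illustration only.
unweighted_choice = choose_section_reading(1e-8, 2e-8)
weighted_choice = choose_section_reading(1e-8, 2e-8, weight=3.0)
```

With a weight of 1.0 the two probabilities are compared directly; a weight above 1.0 biases the decision toward treating the whole run as one foreign word.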

[0017] In the speech recognition device according to the present invention, an example sentence storage means that stores sentences in the given language is provided, and the dividing means divides the input speech code string at portions that match words extracted from these example sentences.

[0018]

EMBODIMENTS OF THE INVENTION

An embodiment of the present invention is described below.

Embodiment 1. FIG. 1 is a block diagram showing the configuration of a speech recognition device for Japanese input according to Embodiment 1 of the present invention. In the figure, 1 is a microphone (preprocessing means) that outputs an analog signal in response to speech input; 2 is a phoneme probability calculation means (preprocessing means) that performs A/D conversion, quantization, and spectral analysis on this analog signal and generates a network-structured phoneme lattice consisting of a plurality of input phoneme strings separated into phoneme units and input appearance probabilities indicating the likelihood of each phoneme; 3 is a word dictionary that stores words such as kanji words and foreign words in association with their registered word character strings, registered word phoneme strings, and word appearance probabilities; 4 is an n-gram dictionary (foreign word divided character string storage means) that stores foreign word divided character strings, obtained by dividing foreign words written in katakana into pieces of n characters (n is an integer), in association with their foreign word divided phoneme strings and divided character string appearance probabilities; 5 is a word prediction means (dividing means) that, for each input phoneme string, compares the input phoneme string with the registered word phoneme strings and sets every matching portion as a word conversion section; 6 is a katakana unknown word correction means (dividing means) that compares the portions that do not match any registered word phoneme string with the foreign word divided phoneme strings and sets the portions that match as unknown foreign word conversion sections; 7 is a random access memory (RAM) that stores the phoneme lattice, the word conversion sections, and the unknown foreign word conversion sections together with their corresponding appearance probabilities; 8 is a selection means (character string generation means) that selects, from among a plurality of input phoneme division strings each formed as a set of the above conversion sections, the one with the highest overall appearance probability as the optimal input phoneme division string; and 9 is an output means (character string generation means) that generates a recognized character string by associating the corresponding character string with each divided section of the optimal input phoneme division string.
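As a rough illustration of the two stores named above, the word dictionary 3 and the n-gram dictionary 4 could be modeled as records keyed by their phoneme strings. The class names, field names, phoneme transcriptions, and probability values below are assumptions made for this sketch and do not come from the patent figures.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WordEntry:
    """One entry of the word dictionary (3)."""
    text: str                     # registered word character string
    phonemes: Tuple[str, ...]     # registered word phoneme string
    prob: float                   # word appearance probability


@dataclass(frozen=True)
class NgramEntry:
    """One entry of the n-gram dictionary (4), here with n = 3."""
    chars: Tuple[str, ...]        # foreign word divided character string
    phonemes: Tuple[str, ...]     # foreign word divided phoneme string
    prob: float                   # divided character string appearance probability


# Hypothetical contents, for illustration only.
word_dictionary = [
    WordEntry("ディスク", ("di", "su", "ku"), 1e-4),
    WordEntry("ディレクター", ("di", "re", "ku", "ta", "a"), 5e-5),
]
ngram_dictionary = [
    NgramEntry(("ディ", "ク", "テ"), ("di", "ku", "te"), 0.25),
]

# Index both stores by phoneme string so that matching is a dictionary lookup.
words_by_phonemes = {entry.phonemes: entry for entry in word_dictionary}
ngrams_by_phonemes = {entry.phonemes: entry for entry in ngram_dictionary}
```

Keeping the two stores separate mirrors the description: a phoneme run such as `("di", "ku", "te")` is absent from the word dictionary but can still be matched against the n-gram dictionary.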

[0019]

(Equation 1)

[0020] Equation 1 above is the formula for the appearance probability P(W|Y) of each word in the input phoneme string (hereinafter called the predicted word probability) according to Embodiment 1 of the present invention. In the equation, W is part or all of the input phoneme string, Y is a word phoneme string, P(Y|W) is the probability of the phoneme string given the word W, which can be obtained, for example, by Equation 2 below, P(W) is the appearance probability of the word string, which can be obtained, for example, by Equation 3 below, and P(Y) is the appearance probability of the phoneme string. When merely comparing the appearance probabilities of words for the same input phoneme string, it suffices to calculate and compare only P(Y|W)P(W). In Embodiment 1, the W that maximizes the predicted word probability P(W|Y) is selected as the conversion candidate. Although Equations 2 and 3 below are written as products, each probability may instead be expressed as a log probability and the products replaced by sums.
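The image of Equation 1 is not reproduced in this text. The quantities described in the paragraph above (P(W|Y), P(Y|W), P(W), P(Y), and the remark that P(Y) can be ignored when comparing words for the same input) are consistent with Bayes' rule, so a reconstruction under that assumption, not a copy of the original equation, would read:

```latex
P(W \mid Y) = \frac{P(Y \mid W)\,P(W)}{P(Y)}
\qquad\Longrightarrow\qquad
\hat{W} = \arg\max_{W} P(Y \mid W)\,P(W)
        = \arg\max_{W} \bigl[\log P(Y \mid W) + \log P(W)\bigr]
```

The last form reflects the remark that the products may be replaced by sums of log probabilities; the symbol \hat{W} for the selected candidate is introduced here for illustration only.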

[0021]

(Equation 2)

[0022]

(Equation 3)

[0023]

(Equation 4)

[0024] Equation 4 above is the formula for the appearance probability of a katakana unknown word in the input phoneme string according to Embodiment 1 of the present invention. In the equation, Ci is a katakana character. The appearance probability of the katakana unknown word is obtained from this formula, and substituting it into Equation 1 above gives the predicted word probability for the katakana unknown word.
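The image of Equation 4 is likewise not reproduced here. Given that the n-gram dictionary stores probabilities for pieces of n katakana characters, one plausible form is the usual character n-gram chain rule shown below. This is an assumption for illustration, not the original equation; m (the number of characters in the unknown word) is a symbol introduced here.

```latex
P(C_1 C_2 \cdots C_m) \approx \prod_{i=1}^{m} P\bigl(C_i \mid C_{i-n+1} \cdots C_{i-1}\bigr)
```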

[0025] Next, the operation is described. FIG. 2 is a flowchart showing the speech recognition operation according to Embodiment 1 of the present invention. In the figure, ST1 is a speech input step in which speech is input to the microphone 1; ST2 is an analog signal output step in which the microphone 1 outputs an analog signal corresponding to the speech; and ST3 is a phoneme lattice generation step in which the phoneme probability calculation means 2 analyzes this analog signal, generates a phoneme lattice consisting of a plurality of input phoneme strings and input appearance probabilities, and stores it in the RAM 7.

[0026] ST4 is a comparison section extraction step that, for each section of a predetermined n characters of the phoneme lattice in turn, takes out the input phoneme strings one at a time; ST5 is a partial word match determination step that determines whether there is a word phoneme string containing the same phoneme string as this section input phoneme string; ST6 is a failure position storage step that, when there is no such word at all, stores the position of the section input phoneme string on the phoneme lattice; ST7 is a predicted word probability calculation step that, for a word matched in the above match determination step, calculates the predicted word probability from the input appearance probability of the word conversion section and the word appearance probability of the matched word; ST8 is a complete word match determination step that determines whether the whole of the input phoneme string matches the whole of the word phoneme string; ST9 is a solution word storage step that stores in the RAM 7 the word conversion section for the completely matched word and the appearance probability of that word; ST10 is a confirmation step that checks whether all words in the word dictionary 3 have been compared; and ST11 is a section word comparison completion determination step that determines whether the section comparison has been completed for all input phoneme strings contained in the section of the predetermined n characters of the phoneme lattice. As a result, the words that are the solutions for each input phoneme string are basically stored in the RAM 7, one set each, together with their appearance probabilities.

[0027] ST12 is a solution confirmation step that determines whether a solution word is stored in the RAM 7, and ST13 is an unknown word generation step in which, when no solution exists, the katakana unknown word correction means 6 refers to the n-gram dictionary 4 and, if there is a match, generates a katakana unknown word. As a result, for a section in which no word matches (a section for which no solution can be obtained from any input phoneme string), a katakana unknown word is stored in the RAM 7 together with its appearance probability.

[0028] ST14 is a selection step that selects and reads out, from among the combinations of word strings and katakana unknown words stored in the RAM 7 for each section, the combination of conversion candidates with the highest appearance probability, and ST15 is an output step that generates a recognized character string by associating the corresponding character string with each divided section of the input phoneme division string corresponding to the selected conversion candidates.
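The control flow of steps ST4 to ST14 for a single section can be sketched as below: dictionary matching is tried first, and the katakana unknown word generator is consulted only when no solution was found. This is a simplified sketch, not the flowchart of FIG. 2 itself. The function names, the callback interfaces, and the input appearance probabilities (0.5 and 0.3) are assumptions; only the value 2.95e-8 for 「ディクテート」 is taken from the description.

```python
def convert_section(input_strings, word_lookup, unknown_word_generator):
    """Return the best (text, probability) for one section, or None.

    input_strings: list of (phoneme tuple, input appearance probability)
    word_lookup: callable returning [(text, word probability), ...] for a phoneme tuple
    unknown_word_generator: callable returning (text, probability) or None
    """
    solutions = []
    for phonemes, p_input in input_strings:            # roughly ST4 to ST11
        for text, p_word in word_lookup(phonemes):
            solutions.append((text, p_input * p_word))

    if not solutions:                                   # ST12: no solution stored
        for phonemes, p_input in input_strings:        # ST13: unknown word generation
            guess = unknown_word_generator(phonemes)
            if guess is not None:
                text, p_unknown = guess
                solutions.append((text, p_input * p_unknown))

    if not solutions:
        return None
    return max(solutions, key=lambda item: item[1])     # ST14: highest probability


def no_hits(phonemes):
    """Stand-in word dictionary that never matches."""
    return []


def guess_katakana(phonemes):
    """Stand-in for the katakana unknown word correction means (6)."""
    if phonemes == ("di", "ku", "te", "e", "to"):
        return ("ディクテート", 2.95e-8)
    return None


inputs = [(("di", "ku", "te", "e", "to"), 0.5),
          (("de", "ku", "te", "e", "to"), 0.3)]
result = convert_section(inputs, no_hits, guess_katakana)
```

Consulting the generator only on failure corresponds to Embodiment 1; calling it for every input phoneme string regardless of dictionary hits would correspond more closely to step ST16 of Embodiment 2.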

[0029] Next, a detailed description is given using a specific example. When the user utters 「・・・ディクテートする」 ("... to dictate") into the microphone 1, a phoneme lattice like the one shown in FIG. 3 (only part is shown) is generated. In the figure, "de", "di", and so on are input phonemes predicted from the input, arranged horizontally (left to right in the figure) in the order in which they were uttered. Input phonemes stacked vertically are alternatives predicted from the same sound, and the value attached to the line segment connected to each of them is the appearance probability of that phoneme. Each phoneme string formed by linking these line segments is one input phoneme string (for example, "de, ku, wa, ...", "de, ku, te, ...", "de, re, ku, ...").
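To make the lattice structure concrete, the sketch below models it as a list of columns, each holding alternative phonemes with their appearance probabilities, and enumerates every path. This is a simplification: the column-aligned layout, the independence of columns, and all probability values are assumptions for illustration, whereas the lattice of FIG. 3 is a general network.

```python
from itertools import product
from math import prod

# Each column lists alternative phonemes predicted from the same sound,
# paired with a hypothetical appearance probability.
lattice = [
    [("de", 0.6), ("di", 0.4)],
    [("ku", 0.7), ("re", 0.3)],
    [("wa", 0.5), ("te", 0.5)],
]


def enumerate_paths(columns):
    """Yield (phoneme string, probability) for every path through the columns."""
    for combination in product(*columns):
        phonemes = tuple(phoneme for phoneme, _ in combination)
        probability = prod(p for _, p in combination)
        yield phonemes, probability


paths = dict(enumerate_paths(lattice))
```

Each key of `paths` plays the role of one input phoneme string, and its value is the product of the probabilities along that path.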

[0030] Next, each input phoneme string of this phoneme lattice is compared with the words by prefix matching. For example, when words are stored in the word dictionary 3 as shown in FIG. 4, a match search is performed against each of these words. Taking the input phoneme string "di, na, a" as an example, P(Y|W) is obtained by Equation 5 below, and P(W) is simply the appearance probability of the word itself when n = 1, so the predicted word probability is obtained as in Equation 6 below.
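A minimal sketch of prefix matching between a section of an input phoneme string and the registered word phoneme strings follows. The word list and its phoneme transcriptions are hypothetical stand-ins for FIG. 4, and the function name is introduced here for illustration.

```python
# Hypothetical registered words and phoneme strings (stand-ins for FIG. 4).
WORDS = {
    "ディレクター": ("di", "re", "ku", "ta", "a"),
    "ディスク": ("di", "su", "ku"),
    "でくわす": ("de", "ku", "wa", "su"),
}


def prefix_matches(section, words):
    """Return [(text, is_complete_match), ...] for words that start with `section`."""
    section = tuple(section)
    hits = []
    for text, phonemes in words.items():
        if phonemes[:len(section)] == section:
            # A complete match means the section covers the whole word.
            hits.append((text, len(phonemes) == len(section)))
    return hits


partial = prefix_matches(("di",), WORDS)
complete = prefix_matches(("di", "su", "ku"), WORDS)
missing = prefix_matches(("na",), WORDS)
```

The partial and complete cases correspond loosely to the partial match determination (ST5) and the complete match determination (ST8) in the flowchart description.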

[0031]

(Equation 5)

[0032]

(Equation 6)

[0033] In Embodiment 1, this calculation is performed for every word whose phoneme string matches (for example, match determination and probability calculation are performed against 「ディレクター」 ("director"), 「ディスク」 ("disk"), 「でくわす」 ("to run into"), and so on), and the one with the highest appearance probability is selected as the word conversion candidate. Although, for simplicity, the description here compares every input phoneme string with every word phoneme string, the processing can be performed at high speed by using dynamic programming such as the Viterbi algorithm.
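The description only notes that dynamic programming such as Viterbi can speed up the search. One way this might look, for a single phoneme string rather than a full lattice, is the following best-segmentation sketch using log probabilities. The lexicon contents and probabilities are hypothetical, and the function is an illustration rather than the patented procedure.

```python
import math


def best_segmentation(phonemes, lexicon):
    """Split `phonemes` into lexicon entries with the highest total probability.

    lexicon maps phoneme tuples to (text, probability).
    Returns (list of texts, log probability), or None if no full split exists.
    """
    n = len(phonemes)
    # best[i] = (best log probability of phonemes[:i], backpointer)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(end):
            entry = lexicon.get(tuple(phonemes[start:end]))
            if entry is None or best[start][0] == -math.inf:
                continue
            text, probability = entry
            score = best[start][0] + math.log(probability)
            if score > best[end][0]:
                best[end] = (score, (start, text))
    if best[n][0] == -math.inf:
        return None
    # Follow the backpointers from the end to recover the words.
    words, position = [], n
    while position > 0:
        start, text = best[position][1]
        words.append(text)
        position = start
    return list(reversed(words)), best[n][0]


# Hypothetical lexicon, for illustration only.
lexicon = {
    ("su", "ru"): ("する", 0.01),
    ("su",): ("す", 0.001),
    ("ru",): ("る", 0.001),
}
segmentation = best_segmentation(["su", "ru"], lexicon)
unsegmentable = best_segmentation(["di"], lexicon)
```

Because each prefix stores only its best score, the number of lookups grows quadratically with the length of the phoneme string instead of enumerating every possible split.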

[0034] Next, a specific example of katakana unknown word processing for the case where no such conversion candidate is found is described. For simplicity, the input phoneme string "Y = di, ku, te, e, to, su, ru" is used as the example.

[0035] FIG. 5 is a data list showing a partial example of the n-gram dictionary 4 according to Embodiment 1 of the present invention. In the figure, "#" denotes a pause, and the example is for the case n = 3. For example, the character 3-gram (conditional probability) that the character 「テ」 appears after the characters 「ディク」 is approximated by dividing the frequency of the three-character sequence 「ディクテ」 by the frequency of the two-character sequence 「ディク」 (a contracted sound (yōon) is also counted as one character).
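The frequency ratio described above can be sketched as follows. The counts are hypothetical, and characters are stored as tuple elements so that 「ディ」 can be treated as a single character, in line with the three-character count given for 「ディクテ」.

```python
def conditional_ngram_probability(counts, context, next_char):
    """Approximate P(next_char | context) as count(context + next_char) / count(context)."""
    denominator = counts.get(context, 0)
    if denominator == 0:
        return 0.0  # unseen context; returning 0.0 is an assumption of this sketch
    return counts.get(context + (next_char,), 0) / denominator


# Hypothetical frequency counts, for illustration only.
counts = {
    ("ディ", "ク"): 40,        # frequency of the two-character sequence
    ("ディ", "ク", "テ"): 10,  # frequency of the three-character sequence
}
p_te = conditional_ngram_probability(counts, ("ディ", "ク"), "テ")
p_unseen = conditional_ngram_probability(counts, ("ア", "イ"), "ウ")
```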

[0036] The word appearance probability of the above input phoneme string is given by Equation 7 below, from which the character string 「ディクテートする」 can be obtained. In the written character string predicted in this way, the portion up to 「ディクテート」 is written in katakana, so this range is regarded as a katakana unknown word, and its appearance probability is obtained by Equation 8 below. That is, the word 「ディクテート」 is taken as the conversion candidate for the section "di, ku, te, e, to", and its word appearance probability is treated as 2.95e-08 (= 2.95 × 10^-8). If a character string containing the word 「ディクテート」 has the highest appearance probability, that character string, including the katakana unknown word, is output as the recognized character string.
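Equations 7 and 8 are not reproduced in this text, so the following only illustrates how an appearance probability for a katakana string could be accumulated from character 3-gram probabilities in the log domain. The trigram values and the use of two leading pause symbols "#" as padding are assumptions, and the result is not meant to reproduce the 2.95e-08 figure.

```python
import math


def sequence_log_probability(chars, trigram_prob, pad="#"):
    """Sum log P(c_i | c_{i-2}, c_{i-1}) over a character sequence."""
    padded = [pad, pad] + list(chars)
    total = 0.0
    for i in range(2, len(padded)):
        probability = trigram_prob.get(tuple(padded[i - 2:i + 1]), 0.0)
        if probability == 0.0:
            return -math.inf  # an unseen trigram makes the sequence impossible here
        total += math.log(probability)
    return total


# Hypothetical trigram probabilities, for illustration only.
trigram_prob = {
    ("#", "#", "ディ"): 0.1,
    ("#", "ディ", "ク"): 0.2,
    ("ディ", "ク", "テ"): 0.25,
    ("ク", "テ", "ー"): 0.5,
    ("テ", "ー", "ト"): 0.4,
}
log_p = sequence_log_probability(["ディ", "ク", "テ", "ー", "ト"], trigram_prob)
p_unknown = math.exp(log_p)
```

Summing logarithms avoids underflow when many small probabilities are multiplied, which matches the earlier remark that log probabilities and sums may be used in place of products.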

[0037]

(Equation 7)

[0038]

(Equation 8)

[0039] As described above, according to Embodiment 1, the word dictionary 3, which stores words in association with their registered word character strings and registered word phoneme strings, is provided together with the n-gram dictionary 4, which stores foreign word divided character strings, obtained by dividing foreign words into pieces of a fixed length, in association with their foreign word divided phoneme strings. The word prediction means 5 and the katakana unknown word correction means 6 compare the input phoneme string with the registered word phoneme strings and the foreign word divided phoneme strings and generate an input code division string in which the input phoneme string is divided at each matching portion. The input code division string can therefore be generated while estimating foreign words that are not registered in the word dictionary 3.

[0040] In particular, foreign words often chain sounds differently from the given language. In Embodiment 1, undefined foreign words are estimated during speech-based word division even in such cases. Compared with, for example, first converting the speech into a character string of the given language and then estimating the foreign word, conversion errors can be suppressed markedly, and undefined foreign words can be estimated with high accuracy.

[0041] In addition, because undefined foreign words are estimated during speech-based word division in this way, word division in the rest of the input phoneme string is less likely to be adversely affected, and the recognition process as a whole can be kept from breaking down when speech containing undefined foreign words is recognized.

[0042] As a result, even in a situation such as that of recent Japanese, where foreign words increase and change so rapidly that registering all of them in the word dictionary 3 is practically difficult, speech containing foreign words can be recognized with high accuracy.

[0043] According to Embodiment 1, the katakana unknown word correcting means 6 compares only the portions in which no input phoneme string matches a registered word phoneme string against the foreign word divided phoneme strings, and divides the input at the portions that match a foreign word divided phoneme string. Speech containing undefined foreign words can therefore be recognized at high speed without extra processing.
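The two-stage matching described in this paragraph can be sketched as follows. This is a minimal illustration, not the patented implementation: the greedy longest-match strategy, the function names `longest_match` and `segment`, and every dictionary entry are assumptions made for the example, and the probability-based selection used elsewhere in the description is left out.

```python
# Hypothetical toy dictionaries keyed by phoneme tuples.
# The entries are illustrative and do not come from the patent.
WORD_DICT = {("ki", "no", "u"): "昨日", ("wo",): "を"}
NGRAM_DICT = {("de", "e"): "デー", ("ta",): "タ"}


def longest_match(phonemes, start, table):
    """Return (end, text) for the longest entry of `table` that starts at
    index `start`, or None when no entry matches."""
    for end in range(len(phonemes), start, -1):
        text = table.get(tuple(phonemes[start:end]))
        if text is not None:
            return end, text
    return None


def segment(phonemes):
    """Divide a phoneme string, consulting the n-gram table only at
    positions where the word dictionary fails to match."""
    pieces = []
    i = 0
    while i < len(phonemes):
        source = "word"
        hit = longest_match(phonemes, i, WORD_DICT)
        if hit is None:
            # Only the unmatched portion is compared with the fragments.
            source = "ngram"
            hit = longest_match(phonemes, i, NGRAM_DICT)
        if hit is None:
            # Nothing matches here: keep the raw phoneme and move on.
            pieces.append((phonemes[i], "raw"))
            i += 1
            continue
        end, text = hit
        pieces.append((text, source))
        i = end
    return pieces


print(segment(["de", "e", "ta", "wo"]))
```

Because the n-gram table is consulted only where the word dictionary finds nothing, stretches already covered by registered words incur no additional lookups in this sketch.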

[0044] Although Embodiment 1 uses phonemes (音韻) as the speech codes, the same effect is obtained when other units such as phonemic segments (音素) or syllables (音節) are used as the speech codes. Likewise, although katakana character strings of foreign words are used as the foreign word divided character strings, the same effect is obtained with alphabetic character strings.

[0045] Embodiment 2. FIG. 6 is a flowchart showing a speech recognition operation according to Embodiment 2 of the present invention. In the figure, ST16 is an unknown word generation step in which, for each input phoneme string, the katakana unknown word correcting means 6 refers to the n-gram dictionary 4 and generates a katakana unknown word when a match is found. A word or a katakana unknown word is thereby assigned to each phoneme string. The remaining configuration and operation are the same as in Embodiment 1, and their description is omitted.

[0046] As described above, Embodiment 2 provides the word dictionary 3, which stores words in association with their registered word character strings and registered word phoneme strings, together with the n-gram dictionary 4, which stores foreign word divided character strings, obtained by dividing foreign words into pieces of a fixed length, in association with their foreign word divided phoneme strings. The word prediction means 5 and the katakana unknown word correcting means 6 compare the input phoneme string with the registered word phoneme strings and the foreign word divided phoneme strings, and generate an input code division string in which the input phoneme string is divided at each matching portion. As in Embodiment 1, a foreign word not registered in the word dictionary 3 can therefore be estimated and generated when the input code division string is produced. Consequently, even in a situation such as recent Japanese, where foreign words increase and change so rapidly that registering all of them in the word dictionary 3 is practically difficult, speech containing foreign words can be recognized with high accuracy.

[0047] According to Embodiment 2, the katakana unknown word correcting means 6 generates a katakana unknown word for each input phoneme string, and the choice between it and the other words is made on the basis of appearance probability. Compared with Embodiment 1, processing therefore needs less RAM capacity while achieving equally fast speech recognition.

[0048] Although Embodiment 2 uses phonemes (音韻) as the speech codes, the same effect is obtained when other units such as phonemic segments (音素) or syllables (音節) are used as the speech codes. Likewise, although katakana character strings of foreign words are used as the foreign word divided character strings, the same effect is obtained with alphabetic character strings.

[0049] Embodiment 3. FIG. 7 is a flowchart showing a speech recognition operation according to Embodiment 3 of the present invention. In the figure, ST17 is an unknown word prediction step in which the katakana unknown word correcting means 6 determines whether a plurality of foreign word divided character strings are intermittently continuous within the word string of the solution. ST18 is an unknown word generation step in which the katakana unknown word correcting means 6 calculates the character string obtained when the whole of that intermittently continuous section is converted into foreign word divided character strings, together with its appearance probability. ST19 is a solution correction output step that compares the appearance probability of this unknown word with the probability of the intermittent conversion, corrects the word string of the solution using whichever section probability is higher, and causes the output means 9 to output it. The remaining configuration and operation are the same as in Embodiment 1, and their description is omitted.

[0050] Next, a specific example is described. Suppose the input phoneme string "di, ku, te, e, to" is input and words such as "ディ" and "テー" are registered in the word dictionary 3. The word comparison process then produces the erroneous recognized character string "ディ/く/テー/と". Even in such a case, the katakana unknown word processing means described above converts the whole of "di, ku, te, e, to" into "ディクテート" (dictate) and calculates its appearance probability. As a result, a recognized character string containing "ディクテート" can be generated instead of "ディ/く/テー/と".
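The comparison in this example can be sketched as below, under stated assumptions. The text gives no numeric probabilities for this case, so every value is hypothetical; treating a section probability as the product of its parts is also an assumption, and `section_probability` and `pick_section` are illustrative names.

```python
from math import prod

# Hypothetical (text, probability) pairs for the section "di, ku, te, e, to".
# fragmented: the erroneous reading built from registered words.
fragmented = [("ディ", 1e-4), ("く", 2e-3), ("テー", 1e-4), ("と", 5e-3)]
# merged: the same section covered entirely by n-gram fragments.
merged = [("ディ", 3e-3), ("ク", 4e-3), ("テー", 2e-3), ("ト", 6e-3)]


def section_probability(parts):
    """Section probability, assumed here to be the product of the parts."""
    return prod(p for _, p in parts)


def pick_section(fragmented, merged):
    """Return the reading whose section probability is higher."""
    if section_probability(merged) > section_probability(fragmented):
        # The whole section becomes one katakana unknown word.
        return "".join(text for text, _ in merged)
    # Otherwise keep the fragmented reading, shown with "/" separators.
    return "/".join(text for text, _ in fragmented)


print(pick_section(fragmented, merged))
```

With these made-up figures the merged reading has the higher section probability, so the section is output as the single word "ディクテート".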

[0051] As described above, according to Embodiment 3, the n-gram dictionary 4 stores a divided character string appearance probability for each foreign word divided character string. When dividing the input phoneme string of the solution yields a plurality of foreign word divided character strings that are intermittently continuous, the katakana unknown word correcting means 6 compares the probability obtained by converting the whole intermittently continuous section into foreign word divided character strings with the probability of the intermittent conversion, and corrects the input code division string using whichever section probability is higher. Therefore, even when a portion that should have been converted as a single undefined foreign word has been mistakenly judged to be a place where a plurality of foreign word divided character strings are intermittently continuous, it can still be converted into the intended undefined foreign word.

[0052] Embodiment 4. Embodiment 4 has the same configuration and operation as Embodiment 3, except that in the unknown word generation step ST18 of FIG. 7 a predetermined weighting is applied to the appearance probability of the character string obtained when the whole intermittently continuous section is converted into foreign word divided character strings. Further description is omitted.
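As an illustration only, the weighting could be applied as a multiplicative factor before the section probabilities are compared. The text says only that a predetermined weighting is used, so the multiplicative form, the function name `prefer_merged`, and the numbers below are all assumptions.

```python
def prefer_merged(fragmented_p, merged_p, weight=1.0):
    """Return True when the weighted whole-section probability exceeds the
    probability of the intermittent (fragmented) conversion.

    `weight` stands in for the predetermined weighting; its multiplicative
    form is an assumption made for this sketch.
    """
    return merged_p * weight > fragmented_p


# Hypothetical section probabilities.
fragmented_p = 1e-10
merged_p = 4e-11

print(prefer_merged(fragmented_p, merged_p))              # unweighted comparison
print(prefer_merged(fragmented_p, merged_p, weight=5.0))  # weighted comparison
```

In this sketch a weight above 1 biases the decision toward treating the whole section as one unknown foreign word, and a weight below 1 biases it toward the fragmented reading.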

[0053] As described above, according to Embodiment 4, the katakana unknown word correcting means 6 applies a correction weighting to the probability obtained when the whole intermittently continuous section is converted into foreign word divided character strings, and then compares the section probabilities. Therefore, even when a portion that should have been converted as a single undefined foreign word has been mistakenly judged to be a place where a plurality of foreign word divided character strings are intermittently continuous, it can still be converted into the intended undefined foreign word.

[0054] Embodiment 5. FIG. 8 is a flowchart showing a speech recognition operation according to Embodiment 5 of the present invention. In the figure, ST20 is a batch processing step that refers to the word dictionary 3 to determine whether there is a word phoneme string containing the same phoneme string as the section input phoneme string, refers to the n-gram dictionary 4 to generate a katakana unknown word matching the same section input phoneme string, and stores the results in the RAM 7. ST21 is a batch selection step that selects and reads out, from the combinations of words and katakana unknown words corresponding to the plurality of input phoneme strings, the combination of conversion candidates with the highest appearance probability. The remaining configuration and operation are the same as in Embodiment 1, and their description is omitted.

[0055] Next, a specific example is described. Suppose the input phoneme string "e, N, zi, N" is input and the word dictionary 3 contains an entry such as "円陣, eNziN, 4e-07". Two conversion candidates are then generated from this input phoneme string: the registered word "円陣" (enjin, a circular formation) and the undefined word "エンジン" (engine). If the appearance probability of the undefined word "エンジン" is higher than that of "円陣", the undefined katakana character string can be generated even though a homophonous word is registered in the word dictionary 3.
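A minimal sketch of this homophone case follows. The word dictionary entry is the one quoted in the text; the n-gram entries, their probabilities, the product rule for combining them, and the names `unknown_word` and `best_candidate` are assumptions made for illustration.

```python
# Word dictionary entry quoted in the text: 円陣, eNziN, 4e-07.
WORD_DICT = {"eNziN": [("円陣", 4e-07)]}
# Hypothetical n-gram entries: phoneme fragment -> (katakana, probability).
NGRAM_DICT = {"eN": ("エン", 2e-3), "ziN": ("ジン", 1e-3)}


def unknown_word(phonemes):
    """Cover the whole phoneme string with n-gram fragments (longest match
    first) and return (text, probability), or None if it cannot be covered."""
    text, prob, i = "", 1.0, 0
    while i < len(phonemes):
        for j in range(len(phonemes), i, -1):
            entry = NGRAM_DICT.get("".join(phonemes[i:j]))
            if entry is not None:
                text += entry[0]
                prob *= entry[1]  # assumed: fragment probabilities multiply
                i = j
                break
        else:
            return None
    return text, prob


def best_candidate(phonemes):
    """Collect registered words and the katakana unknown word for the same
    phoneme string, then return the candidate with the highest probability."""
    candidates = list(WORD_DICT.get("".join(phonemes), []))
    unknown = unknown_word(phonemes)
    if unknown is not None:
        candidates.append(unknown)
    return max(candidates, key=lambda c: c[1]) if candidates else None


print(best_candidate(["e", "N", "zi", "N"]))
```

With the assumed fragment probabilities the unknown word outscores the registered homophone, so "エンジン" is selected. Different figures could just as well favour "円陣".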

[0056] As described above, according to Embodiment 5, the word dictionary 3 stores a word appearance probability for each word, the n-gram dictionary 4 stores a divided character string appearance probability for each foreign word divided character string, and the input code division string is generated so that the overall appearance probability is highest when the input phoneme string is divided. Therefore, even when an undefined foreign word has the same pronunciation as a word registered in the word dictionary 3, the appearance probability of the undefined foreign word is compared with that of the word, and conversion into the undefined foreign word is possible.

[0057] Embodiment 6. FIG. 9 is a block diagram showing the configuration of a speech recognition device for Japanese input according to Embodiment 6 of the present invention. In the figure, 10 is an example sentence storage means that stores sentences. 11 is a katakana unknown word correcting means (dividing means) that handles the portions not matching any registered word phoneme string: where such a portion matches a foreign word written in katakana that is extracted from the example sentences, it generates a katakana unknown word using that foreign word, and it treats portions matching a foreign word divided phoneme string of the n-gram dictionary 4 as unknown foreign word conversion sections. The remaining configuration is the same as in Embodiment 1; the same reference numerals are used and the description is omitted.

[0058] Next, the operation is described. FIG. 10 is a flowchart showing a speech recognition operation according to Embodiment 6 of the present invention. In the figure, ST22 is an example sentence unknown word generation step performed when no solution exists. The katakana unknown word correcting means 11 first refers to the example sentences and generates a katakana unknown word if a match is found. If no example sentence matches, it refers to the n-gram dictionary 4 and generates a katakana unknown word if a match is found there. The remaining operations are the same as in Embodiment 1; the same reference numerals are used and the description is omitted.

[0059] Next, a specific example is described. Suppose the example sentence storage means 10 holds the example sentence "三菱電機ではアプリコットを販売している。" ("Mitsubishi Electric sells Apricot.") and a phoneme string such as "a, pu, ri, ko, Q, to" is input. The katakana unknown word correcting means 11 compares this input phoneme string with the phoneme string obtained by converting the katakana character string of the example sentence, and converts the input phoneme string into the katakana unknown word "アプリコット".
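The example-sentence lookup can be sketched as follows. The example sentence and the input phoneme string are taken from the text; the regular expression for katakana runs, the one-character-per-phoneme conversion table, and the function names are assumptions made for illustration. In particular, the table covers only the characters of this example and ignores two-character kana such as "ディ".

```python
import re

# Hypothetical conversion table, limited to the characters of this example.
KANA_TO_PHONEME = {"ア": "a", "プ": "pu", "リ": "ri", "コ": "ko", "ッ": "Q", "ト": "to"}
# Runs of katakana (U+30A1 to U+30F4) plus the long-vowel mark.
KATAKANA_RUN = re.compile(r"[ァ-ヴー]+")


def katakana_words(sentence):
    """Extract the katakana character strings contained in a sentence."""
    return KATAKANA_RUN.findall(sentence)


def to_phonemes(word):
    """Convert a katakana string to a phoneme list, one character at a time.
    Raises KeyError for characters missing from the toy table."""
    return [KANA_TO_PHONEME[ch] for ch in word]


def match_example(phonemes, sentences):
    """Return the katakana word from the example sentences whose phoneme
    string equals `phonemes`, or None when there is no such word."""
    for sentence in sentences:
        for word in katakana_words(sentence):
            try:
                if to_phonemes(word) == phonemes:
                    return word
            except KeyError:
                continue  # word uses characters the toy table lacks
    return None


examples = ["三菱電機ではアプリコットを販売している。"]
print(match_example(["a", "pu", "ri", "ko", "Q", "to"], examples))
```

Adding a new sentence to `examples` is all it takes to teach this sketch a new katakana word, which mirrors the sentence-form learning the embodiment describes.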

[0060] The word appearance probability of a katakana unknown word based on the example sentences may be calculated using, for example, Equation 9 below. Equation 9 uses the total number of read-out characters of the example sentences as the denominator, but, more simply, the number of characters in the example sentences multiplied by a predetermined coefficient may be used instead.

【００６１】[0061]

【数９】 (Equation 9)

[0062] As described above, Embodiment 6 provides the example sentence storage means 10, which stores sentences in the predetermined language, and the katakana unknown word correcting means 11 divides the input phoneme string at portions matching words extracted from the example sentences. The user can therefore freely teach the device, in plain sentence form and without impairing its estimation function, undefined foreign words such as words not registered in advance in the word dictionary 3 and coined words or technical terms that the foreign word divided character string storage means cannot fully estimate, and have them recognized by speech.

[0063] Although Embodiment 6 has been described with the example sentence storage means 10 added to Embodiment 1, the same effect is obtained when it is added to Embodiment 2 or Embodiment 5.

【００６４】[0064]

[Effects of the Invention] As described above, according to the present invention, a word dictionary that stores words in association with their registered word character strings and registered word speech code strings is provided together with a foreign word divided character string storage means that stores foreign word divided character strings, obtained by dividing foreign words into pieces of a fixed length, in association with their foreign word divided speech code strings. The dividing means compares the input speech code string with the registered word speech code strings and the foreign word divided speech code strings, and generates an input code division string in which the input speech code string is divided at each matching portion. A foreign word not registered in the word dictionary can therefore be estimated and generated when the input code division string is produced.

[0065] In particular, foreign words often chain sounds differently from words of the predetermined language. Even so, the present invention estimates undefined foreign words at the time of speech-based word division, so conversion errors are suppressed far more than when, for example, the input is first converted into a character string of the predetermined language and foreign words are estimated afterward. Undefined foreign words can therefore be estimated with high accuracy.

[0066] Furthermore, because undefined foreign words are estimated at the time of speech-based word division, word division in the rest of the input speech code string is less likely to be adversely affected. When speech containing an undefined foreign word is recognized, the recognition process as a whole is therefore kept from breaking down.

[0067] As a result, even in a situation such as recent Japanese, where foreign words increase and change so rapidly that registering all of them in the word dictionary is practically difficult, speech containing foreign words can be recognized with high accuracy.

[0068] According to the present invention, the dividing means compares the portions that do not match any registered word speech code string with the foreign word divided speech code strings, and divides the input at the portions that match a foreign word divided speech code string. Speech containing undefined foreign words can therefore be recognized at high speed without extra processing.

[0069] According to the present invention, the word dictionary stores a word appearance probability for each word, the foreign word divided character string storage means stores a divided character string appearance probability for each foreign word divided character string, and the dividing means generates the input code division string so that the overall appearance probability is highest when the input speech code string is divided. Therefore, even when an undefined foreign word has the same pronunciation as a word registered in the word dictionary, the appearance probability of the undefined foreign word is compared with that of the word, and conversion into the undefined foreign word is possible.

[0070] According to the present invention, the foreign word divided character string storage means stores a divided character string appearance probability for each foreign word divided character string. When dividing the input speech code string yields a plurality of foreign word divided character strings that are intermittently continuous, the dividing means compares the probability obtained by converting the whole intermittently continuous section into foreign word divided character strings with the probability of the intermittent conversion, and generates the input code division string using whichever section probability is higher. Therefore, even when the dividing means has at first mistakenly judged a portion that should be converted as a single undefined foreign word to be a place where a plurality of foreign word divided character strings are intermittently continuous, it can still be converted into the intended undefined foreign word.

[0071] According to the present invention, the dividing means applies a correction weighting to the probability obtained when the whole intermittently continuous section is converted into foreign word divided character strings, and then compares the section probabilities. Therefore, even when the dividing means, in the probability comparison, has at first mistakenly judged a portion that should be converted as a single undefined foreign word to be a place where a plurality of foreign word divided character strings are intermittently continuous, it can still be converted into the intended undefined foreign word.

[0072] According to the present invention, an example sentence storage means that stores sentences in the predetermined language is provided, and the dividing means divides the input speech code string at portions matching words extracted from the example sentences. The user can therefore freely teach the device, without impairing its estimation function, undefined foreign words such as words not registered in advance in the word dictionary and coined words or technical terms that the foreign word divided character string storage means cannot fully estimate, and have them recognized by speech.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech recognition device for Japanese input according to Embodiment 1 of the present invention.

FIG. 2 is a flowchart showing a speech recognition operation according to Embodiment 1 of the present invention.

FIG. 3 is an explanatory diagram of a phoneme lattice according to Embodiment 1 of the present invention.

FIG. 4 is a data list showing a partial example of the word dictionary according to Embodiment 1 of the present invention.

FIG. 5 is a data list showing a partial example of the n-gram dictionary according to Embodiment 1 of the present invention.

FIG. 6 is a flowchart showing a speech recognition operation according to Embodiment 2 of the present invention.

FIG. 7 is a flowchart showing a speech recognition operation according to Embodiment 3 of the present invention.

FIG. 8 is a flowchart showing a speech recognition operation according to Embodiment 5 of the present invention.

FIG. 9 is a block diagram showing the configuration of a speech recognition device for Japanese input according to Embodiment 6 of the present invention.

FIG. 10 is a flowchart showing a speech recognition operation according to Embodiment 6 of the present invention.

FIG. 11 is a block diagram showing the configuration of a conventional speech recognition device.

[Explanation of symbols]

1: microphone (preprocessing means); 2: phoneme probability calculating means (preprocessing means); 3: word dictionary; 4: n-gram dictionary (foreign word divided character string storage means); 5: word prediction means (dividing means); 6, 11: katakana unknown word correcting means (dividing means); 8: selection means (character string generating means); 9: output means (character string generating means); 10: example sentence storage means.

Claims

[Claims]

1. A speech recognition device comprising: preprocessing means for receiving speech in a predetermined language and generating, based on the speech, an input speech code string in that language; a word dictionary that stores words of the language in association with their registered word character strings and registered word speech code strings; foreign word divided character string storage means that stores foreign word divided character strings, obtained by dividing foreign words into pieces of a fixed length, in association with their foreign word divided speech code strings; dividing means that compares the input speech code string with the registered word speech code strings and the foreign word divided speech code strings and generates an input code division string in which the input speech code string is divided at each matching portion; and character string generating means that generates a recognized character string by assigning the corresponding character string to each divided portion of the input code division string.

2. A speech recognition device comprising: preprocessing means for receiving speech in a predetermined language and generating, based on the speech, a plurality of input speech code strings in that language together with their respective input appearance probabilities; a word dictionary that stores words of the language in association with their registered word character strings, registered word speech code strings, and word appearance probabilities; foreign word divided character string storage means that stores foreign word divided character strings, obtained by dividing foreign words into pieces of a fixed length, in association with their foreign word divided speech code strings and divided character string appearance probabilities; dividing means that, for each input speech code string, compares the input speech code string with the registered word speech code strings and the foreign word divided speech code strings and generates an input code division string in which the input speech code string is divided at each matching portion; and character string generating means that obtains an overall appearance probability for each input code division string from the input appearance probability, the word appearance probabilities, and the divided character string appearance probabilities, and generates a recognized character string by assigning the corresponding character string to each divided portion of the input code division string having the highest overall appearance probability.

3. The speech recognition device according to claim 1, wherein the speech code is a phoneme (音韻), a phonemic segment (音素), or a syllable of the predetermined language.

4. The speech recognition apparatus according to claim 1, wherein the foreign word division character string is a katakana notation character string or an alphabet notation character string of a foreign word.

5. The speech recognition device according to claim 1 or 2, wherein the dividing means compares the portions that do not match any registered word speech code string with the foreign word divided speech code strings, and divides the input at the portions that match a foreign word divided speech code string.

6. The speech recognition device according to claim 1, wherein the word dictionary stores a word appearance probability for each word, the foreign word divided character string storage means stores a divided character string appearance probability for each foreign word divided character string, and the dividing means generates the input code division string so that the overall appearance probability is highest when the input speech code string is divided.

7. The speech recognition device according to claim 5, wherein the foreign word divided character string storage means stores a divided character string appearance probability for each foreign word divided character string, and the dividing means, when dividing the input speech code string yields a plurality of foreign word divided character strings that are intermittently continuous, compares the probability obtained by converting the whole intermittently continuous section into foreign word divided character strings with the probability of the intermittent conversion, and generates the input code division string using whichever section probability is higher.

8. The speech recognition device according to claim 7, wherein the dividing means compares the section probabilities after applying a correction weighting to the probability obtained when the whole intermittently continuous section is converted into foreign word divided character strings.

9. The speech recognition device according to any one of claims 1 to 8, further comprising example sentence storage means that stores sentences in the predetermined language, wherein the dividing means divides the input speech code string at portions matching words extracted from the example sentences.