JPS6039522A - Word voice recognizing method - Google Patents
Word voice recognizing method
- Publication number
- JPS6039522A
- Authority
- JP
- Japan
- Prior art keywords
- word
- value
- dictionary
- recognition
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
Abstract
Description
DETAILED DESCRIPTION OF THE INVENTION
Field of Industrial Application
The present invention relates to a word speech recognition method that recognizes words by matching input speech against a word dictionary written in phoneme notation.
Configuration of the Conventional Example and Its Problems
A conventional word speech recognition method is described with reference to the figure. As shown in the figure, the input speech is first analyzed, the features of the input word speech are extracted, and the phonemes that make up the input word speech are recognized.
The recognized phoneme sequence is matched against the dictionary phoneme sequence of each dictionary entry in the word dictionary. The similarity between the two phoneme sequences is calculated by obtaining the recognition probability of each phoneme from a confusion matrix (CM) between phonemes. Next, a likelihood weight, predetermined for each combination of a dictionary entry and the dictionary entry ranked first in similarity, is added to or subtracted from the similarity of each dictionary entry, and the dictionary entry with the largest resulting weighted similarity is taken as the recognized word. Table 1 shows an example of the word dictionary used in this word speech recognition method; each word is written according to the phoneme notation shown in Table 2.
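The selection step of the conventional method can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `recognize`, the dictionary layout of `weights` keyed by `(entry, top_entry)`, the additive form of the weight, and the default of 0 for a missing pair are all assumptions made here.

```python
from typing import Dict, Tuple


def recognize(similarity: Dict[str, float],
              weights: Dict[Tuple[str, str], float]) -> str:
    """Return the dictionary entry with the largest weighted similarity.

    similarity: raw similarity of each dictionary entry to the input speech.
    weights:    likelihood weight for each (entry, first-ranked entry) pair;
                a missing pair counts as 0 (an assumption of this sketch).
    """
    # Entry ranked first by raw (unweighted) similarity.
    top = max(similarity, key=similarity.get)
    # Add the pair-specific likelihood weight to every entry's similarity.
    weighted = {
        entry: score + weights.get((entry, top), 0.0)
        for entry, score in similarity.items()
    }
    return max(weighted, key=weighted.get)


scores = {"A": -4.0, "B": -4.5, "C": -9.0}
print(recognize(scores, {}))                   # no weights: raw ranking decides
print(recognize(scores, {("B", "A"): 1.0}))    # B gets +1.0 when A ranks first
```

Keying the weight on the first-ranked entry means a correction for the A/B confusion does not disturb cases where some other word ranks first.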
Table 1
Table 2
Table 3
Table 3 shows part of the confusion matrix. In Table 3, the rows indicate phonemes in the word dictionary and the columns indicate recognized phonemes. The numbers in Table 3 give, in percent, the probability that each phoneme in the word dictionary is recognized as each phoneme. For example, Table 3 shows that the dictionary phoneme I is recognized as I with a probability of 75%, as U with a probability of 5%, and as A with a probability of 0%, and is dropped with a probability of 8%.
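One way to turn such a confusion matrix into a similarity score is sketched below. Only the four percentages quoted in the example come from the text, and the row is assumed to belong to the dictionary phoneme I. The source does not say how per-phoneme probabilities are combined or how the two sequences are aligned, so the sum of log probabilities, the pre-aligned equal-length sequences, the `DELETED` marker, and the `floor` value are all assumptions of this sketch.

```python
import math
from typing import Dict, Sequence

DELETED = "-"  # hypothetical marker for a phoneme that was dropped

# One row of the confusion matrix, using the percentages quoted in the text.
# The remaining rows and columns are not reproduced in the source.
CM: Dict[str, Dict[str, float]] = {
    "I": {"I": 0.75, "U": 0.05, "A": 0.00, DELETED: 0.08},
}


def log_similarity(dict_seq: Sequence[str],
                   rec_seq: Sequence[str],
                   cm: Dict[str, Dict[str, float]],
                   floor: float = 1e-3) -> float:
    """Sum of log recognition probabilities over aligned phoneme pairs.

    Both sequences must already be aligned and of equal length, with
    DELETED standing in for a dropped phoneme. Probabilities below
    `floor` (including unknown pairs) are clamped so the log is defined.
    """
    if len(dict_seq) != len(rec_seq):
        raise ValueError("sequences must be aligned to the same length")
    total = 0.0
    for dict_phoneme, rec_phoneme in zip(dict_seq, rec_seq):
        p = cm.get(dict_phoneme, {}).get(rec_phoneme, 0.0)
        total += math.log(max(p, floor))
    return total


print(log_similarity(["I"], ["I"], CM))      # log(0.75)
print(log_similarity(["I"], [DELETED], CM))  # log(0.08)
```

Working in the log domain keeps the score additive, which matches the description of likelihood weights being added to or subtracted from the similarity.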
In the above conventional example, the recognized phoneme sequence for word A may show a characteristic pattern of phoneme recognition errors, so that it is more similar to the dictionary phoneme sequence of word B than to that of word A. Input speech of word A is then misrecognized as B in a concentrated manner. The purpose of the likelihood weight is to adjust the similarity values so as to prevent this.
The values of these likelihood weights were set in advance, by running recognition experiments on a large amount of speech data, to the values that maximize the word recognition rate. If the phoneme recognition tendencies of each word never changed, the conventional method would be adequate. When those tendencies change because of the transmission path, ambient noise, or similar influences, however, the likelihood weight values are no longer optimal and the word recognition rate falls.
Object of the Invention
An object of the present invention is to solve the problem of the conventional example and to provide a word speech recognition method that maintains a high word recognition rate even when the environment changes.
Structure of the Invention
To achieve the above object, the word speech recognition method of the present invention is characterized in that, each time a misrecognition occurs during actual word speech recognition, the value of the likelihood weight used to calculate the weighted similarity between the input speech and a dictionary phoneme sequence is corrected in the direction that reduces the difference between the first-ranked weighted similarity at that time and the weighted similarity of the dictionary entry for the input word, and the corrected value is used in subsequent word speech recognition.
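The correction rule stated above can be written compactly for a pair of confusable words A and B, where A ranks first by raw similarity. The symbols below ($s_k$ for raw similarity, $w_{k \mid j}$ for the weight applied to entry $k$ when entry $j$ ranks first, $m$ for the step size, $x$ for the spoken word, $\hat{k}$ for the recognition result) are notation assumed for this sketch and do not appear in the source. Additive weights are also assumed.

```latex
\begin{aligned}
k^{*} &= \arg\max_{j} s_{j}, \qquad
\tilde{s}_{k} = s_{k} + w_{k \mid k^{*}}, \qquad
\hat{k} = \arg\max_{k} \tilde{s}_{k}, \\
\Delta &= \tilde{s}_{\hat{k}} - \tilde{s}_{x} > 0
\quad \text{when } \hat{k} \neq x \text{ (the spoken word } x \text{ is misrecognized)}, \\
w_{B \mid A} &\leftarrow
\begin{cases}
w_{B \mid A} + m, & x = B,\ \hat{k} = A, \\
w_{B \mid A} - m, & x = A,\ \hat{k} = B, \\
w_{B \mid A}, & \hat{k} = x.
\end{cases}
\end{aligned}
```

In both error cases only $\tilde{s}_{B}$ changes, and it moves so that $\Delta$ shrinks by $m$ for the same input, which is the sense in which the correction reduces the difference between the first-ranked weighted similarity and that of the input word.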
Description of the Embodiment
The word speech recognition method of this embodiment is an improvement on the conventional example, and the outline of the recognition algorithm is represented by the same figure as the conventional example. The embodiment shares the following with the conventional example: phoneme recognition is performed on the input speech; the similarity between the recognized phoneme sequence and the dictionary phoneme sequence of each dictionary entry in the word dictionary is obtained; a likelihood weight is added to or subtracted from this similarity to calculate a weighted similarity; and the dictionary entry with the largest value is taken as the recognized word. It differs in that the likelihood weight values are not fixed but are corrected toward optimal values as the phoneme recognition tendencies change.
The correction method is as follows. Suppose that the likelihood weight added to the similarity of word B when word A ranks first in similarity is n points. If the word recognition result for an input of B is A, the likelihood weight is automatically corrected to n + m points, and n + m points is used in subsequent recognition. If the correct recognition result is obtained for an input of A or B, the likelihood weight is left unchanged. If the word recognition result for an input of A is B, the likelihood weight is corrected in the opposite direction, to n - m. By repeating this, the likelihood weight converges to the value at which A and B are misrecognized as each other with equal probability, that is, the value at which the overall recognition rate is highest. Whether the initial value is 0 points or a value obtained experimentally in advance, repeated correction converges to the same value.
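The three cases of the correction method can be sketched as a single update function. This is an illustration under stated assumptions rather than the patent's implementation: the function name `update_weight`, its parameters, the `(other, top)` key layout, and the default starting weight of 0 are hypothetical, and the sketch assumes the correct word is known after each recognition.

```python
from typing import Dict, Tuple

Weights = Dict[Tuple[str, str], float]


def update_weight(weights: Weights, top: str, other: str,
                  spoken: str, recognized: str, step: float) -> float:
    """Correct the weight added to `other` when `top` ranks first.

    top, other: the confusable pair (A and B in the text).
    spoken:     the word that was actually input.
    recognized: the word the recognizer returned.
    step:       the correction amount (m in the text).
    Returns the weight to use from the next recognition onward.
    """
    key = (other, top)
    n = weights.get(key, 0.0)
    if spoken == other and recognized == top:
        n += step  # B was spoken but A won: raise B's weight to n + m
    elif spoken == top and recognized == other:
        n -= step  # A was spoken but B won: lower B's weight to n - m
    # A correct result leaves the weight unchanged.
    weights[key] = n
    return n


w: Weights = {("B", "A"): 2.0}
print(update_weight(w, "A", "B", spoken="B", recognized="A", step=0.5))
print(update_weight(w, "A", "B", spoken="A", recognized="A", step=0.5))
print(update_weight(w, "A", "B", spoken="A", recognized="B", step=0.5))
```

Because the weight moves only on errors and in opposite directions for the two kinds of error, it stops drifting once both kinds occur equally often, which is the equilibrium the text describes.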
In this embodiment, even when the phoneme recognition tendencies change because of the transmission path, ambient noise, or similar influences, corrected likelihood weight values are obtained while word speech recognition is being performed. Using them in subsequent word speech recognition maintains a high recognition rate.
The present invention is not limited to the case where the recognized phoneme sequence is determined uniquely. The same effect is obtained when a recognized phoneme sequence in lattice form is used, and also when the word speech recognition algorithm does not produce an explicit recognized phoneme sequence but instead obtains the likelihood by directly matching the analysis result of the input speech against the dictionary phoneme sequence of each entry in the phoneme-notated word dictionary.
Effects of the Invention
According to the present invention, a high word recognition rate can be maintained even when the environment, such as the transmission path or ambient noise, changes.
The figure shows an outline of the word speech recognition method, for explaining the embodiment of the present invention and the conventional example.
Claims (1)
A word speech recognition method in which words are recognized by calculating the similarity between input speech and the dictionary phoneme sequence of each dictionary entry in a phoneme-notated word dictionary, wherein, when the similarity has been calculated for the input speech, the similarity of each dictionary entry is combined by addition, subtraction, multiplication, or division with a likelihood weight predetermined for each combination of a dictionary entry and the dictionary entry ranked first in similarity to calculate a weighted similarity, and the dictionary entry with the largest weighted similarity is taken as the recognized word, the method being characterized in that, each time a word is misrecognized, the likelihood weight value is successively and automatically corrected in the direction that reduces the difference between the first-ranked weighted similarity at that time and the weighted similarity of the dictionary entry for the input word.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP58147312A JPS6039522A (en) | 1983-08-13 | 1983-08-13 | Word voice recognizing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP58147312A JPS6039522A (en) | 1983-08-13 | 1983-08-13 | Word voice recognizing method |
Publications (2)
Publication Number | Publication Date |
---|---|
JPS6039522A (en) | 1985-03-01
JPH0158519B2 (en) | 1989-12-12
Family
ID=15427343
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP58147312A Granted JPS6039522A (en) | 1983-08-13 | 1983-08-13 | Word voice recognizing method |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS6039522A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH01150876U (en) * | 1988-04-08 | 1989-10-18 |
Also Published As
Publication number | Publication date |
---|---|
JPH0158519B2 (en) | 1989-12-12 |