JPS6039522A - Word voice recognizing method - Google Patents

Word voice recognizing method

Info

Publication number
JPS6039522A
Authority
JP
Japan
Prior art keywords
word
value
dictionary
recognition
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP58147312A
Other languages
Japanese (ja)
Other versions
JPH0158519B2 (en)
Inventor
Takao Irumano
入間野 孝雄
Kunio Akiba
秋場 国夫
Hisanori Kanezashi
金指 久則
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Basic Technology Research Association Corp
Original Assignee
Computer Basic Technology Research Association Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Basic Technology Research Association Corp
Priority to JP58147312A
Publication of JPS6039522A
Publication of JPH0158519B2

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)

Abstract

PURPOSE: To maintain a high recognition rate by correcting the likelihood weight, each time a misrecognition occurs, in the direction that reduces the difference between the first-ranked weighted similarity and the weighted similarity of the dictionary item for the input word, and by using the corrected value from the next recognition onward. CONSTITUTION: Suppose that, when word A ranks first in similarity, the likelihood weight added to the similarity of word B is n points. If the recognition result is A for an input of B, the weight is automatically corrected to n + m points. If inputs of A and B are recognized correctly, the weight is left unchanged. Conversely, if the recognition result is B for an input of A, the weight is corrected to n - m points. Repeating this, the weight converges to the value at which the overall recognition rate is highest.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a word speech recognition method that recognizes words by matching input speech against a word dictionary whose entries are written in phoneme notation.

Configuration of the Conventional Example and Its Problems

A conventional word speech recognition method is described with reference to the figure. As shown in the figure, the input speech is first analyzed, the features of the input word utterance are extracted, and the phonemes that make up the utterance are recognized.

The recognized phoneme sequence is then matched against the dictionary phoneme sequence of each dictionary item in the word dictionary. The similarity between the two phoneme sequences is calculated by obtaining the recognition probability of each phoneme from an inter-phoneme confusion matrix (CM). Next, a likelihood weight, predetermined for each combination of a dictionary item and the dictionary item ranked first in similarity, is added to or subtracted from the similarity of each dictionary item, and the dictionary item with the largest resulting weighted similarity is taken as the recognized word. Table 1 shows an example of the word dictionary used in this method; each word is written in the phoneme notation shown in Table 2.
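The selection step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `recognize`, the dictionary layout, and the sample scores are assumptions, and the weight is applied by addition only.

```python
from typing import Dict, Tuple


def recognize(similarity: Dict[str, float],
              weights: Dict[Tuple[str, str], float]) -> str:
    """Return the dictionary item with the largest weighted similarity."""
    # Item ranked first by raw (unweighted) similarity.
    top = max(similarity, key=similarity.get)
    # weights[(item, top)] is the likelihood weight applied to `item`
    # when `top` ranks first; pairs without an entry get a weight of 0.
    weighted = {item: score + weights.get((item, top), 0.0)
                for item, score in similarity.items()}
    return max(weighted, key=weighted.get)


# Hypothetical similarity scores for three dictionary items.
scores = {"A": 62.0, "B": 60.0, "C": 41.0}
# Hypothetical weight: add 3 points to B whenever A ranks first.
weights = {("B", "A"): 3.0}
```

With these made-up numbers, `recognize(scores, {})` selects A, while `recognize(scores, weights)` selects B, because the weight lifts B from 60 to 63 points.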

[Table 1] [Table 2] [Table 3]

Table 3 shows part of the confusion matrix. In Table 3, the rows are phonemes in the word dictionary and the columns are recognized phonemes. The numbers in Table 3 give, in percent, the probability that each dictionary phoneme is recognized as each phoneme. For example, according to Table 3, the dictionary phoneme I is recognized as I with a probability of 75%, as U with 5%, and as A with 0%, and is omitted with a probability of 8%.
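A confusion-matrix lookup of this kind might be sketched as below. Only the row for dictionary phoneme I uses percentages from the text; the other rows are invented for illustration. The scoring rule (the average probability over phoneme pairs that are already aligned, with `"-"` marking an omitted phoneme) is also an assumption, since the text does not spell out how the per-phoneme probabilities are combined.

```python
from typing import Dict, List

OMITTED = "-"  # marker for a dictionary phoneme that was dropped

# Rows: dictionary phonemes. Columns: recognized phonemes. Values in percent.
CM: Dict[str, Dict[str, float]] = {
    "I": {"I": 75.0, "U": 5.0, "A": 0.0, OMITTED: 8.0},   # values from the text
    "U": {"I": 6.0, "U": 70.0, "A": 4.0, OMITTED: 10.0},  # invented
    "A": {"I": 1.0, "U": 3.0, "A": 85.0, OMITTED: 5.0},   # invented
}


def similarity(dictionary_seq: List[str], recognized_seq: List[str]) -> float:
    """Average confusion-matrix probability over aligned phoneme pairs."""
    if not dictionary_seq or len(dictionary_seq) != len(recognized_seq):
        raise ValueError("sequences must be non-empty and aligned")
    total = sum(CM[d].get(r, 0.0)
                for d, r in zip(dictionary_seq, recognized_seq))
    return total / len(dictionary_seq)
```

Under these assumptions, `similarity(["I", "A"], ["I", "A"])` gives 80.0 and `similarity(["I", "A"], ["U", "A"])` gives 45.0, so a dictionary item whose phonemes are often confused with the recognized ones still receives a non-zero score.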

In the conventional example above, the recognized phoneme sequence for word A may show a characteristic pattern of phoneme recognition errors, so that it is more similar to the dictionary phoneme sequence of word B than to that of word A. The purpose of the likelihood weight is to adjust the similarity values so that input utterances of word A are not systematically misrecognized as B.

The likelihood weight values were set in advance, through recognition experiments on a large amount of speech data, to the values that maximized the word recognition rate. This conventional approach is adequate as long as the phoneme recognition tendency of each word never changes. However, when that tendency changes because of the transmission path, ambient noise, or similar influences, the likelihood weights are no longer optimal and the word recognition rate falls.

Object of the Invention

The object of the present invention is to solve the above problem of the conventional example and to provide a word speech recognition method that maintains a high word recognition rate even when the environment changes.

Constitution of the Invention

To achieve the above object, the word speech recognition method of the present invention is characterized in that the likelihood weight used to calculate the weighted similarity between the input speech and a dictionary phoneme sequence is corrected each time a misrecognition occurs during actual word speech recognition. The correction is made in the direction that reduces the difference between the first-ranked weighted similarity at that time and the weighted similarity of the dictionary item for the input word, and the corrected value is used in subsequent word speech recognition.
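The correction can be written compactly as follows. The notation is not the patent's own and is introduced here only for illustration: $S_k$ is the similarity of dictionary item $k$, $w_k$ its likelihood weight for the current first-ranked item, $T$ the input word, $R$ the recognized word, and $m > 0$ an assumed fixed correction step. Additive weights are assumed.

```latex
\begin{align*}
S'_k &= S_k + w_k, \qquad R = \arg\max_k S'_k, \\
\Delta &= S'_R - S'_T \ge 0 \quad \text{when } R \ne T, \\
w_T &\leftarrow w_T + m \quad \text{or} \quad w_R \leftarrow w_R - m
  \;\Longrightarrow\; \Delta \leftarrow \Delta - m .
\end{align*}
```

Either update narrows the gap between the first-ranked weighted similarity and that of the input word by $m$, which is the direction of correction the text requires.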

Description of the Embodiment

The word speech recognition method of this embodiment is an improvement on the conventional example, and the outline of its recognition algorithm is shown in the same figure. As in the conventional example, phoneme recognition is performed on the input speech, the similarity between the recognized phoneme sequence and the dictionary phoneme sequence of each dictionary item is obtained, a likelihood weight is added to or subtracted from this similarity to give a weighted similarity, and the dictionary item with the largest weighted similarity is taken as the recognized word. The difference is that the likelihood weight is not fixed; it is corrected toward the optimal value as the phoneme recognition tendency changes.

The correction procedure is as follows. Suppose that, when word A ranks first in similarity, the likelihood weight added to the similarity of word B is n points. If the recognition result is A for an input of B, the weight is automatically corrected to n + m points, and n + m is used in subsequent recognition. If inputs of A and B are recognized correctly, the weight is left unchanged. If the recognition result is B for an input of A, the weight is corrected in the opposite direction, to n - m points. Repeating this, the weight converges to the state in which A and B are misrecognized as each other with equal probability, that is, to the value at which the overall recognition rate is highest. The initial value may be 0 points or a value obtained experimentally in advance; with repeated correction it converges to the same value.
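The two-word correction rule can be sketched as below. The function name `update_weight` and the sample sequence of utterances are assumptions made for illustration; the rule itself follows the n + m / n - m description in the text.

```python
def update_weight(n: float, m: float, spoken: str, recognized: str) -> float:
    """Return the corrected weight added to B's similarity when A ranks first."""
    if spoken == "B" and recognized == "A":
        return n + m  # B was misrecognized as A: favor B more
    if spoken == "A" and recognized == "B":
        return n - m  # A was misrecognized as B: favor B less
    return n          # correct recognition: leave the weight unchanged


# Hypothetical run starting from 0 points with a step of m = 1.
n = 0.0
history = [("B", "A"), ("B", "A"), ("A", "A"), ("A", "B"), ("B", "B")]
for spoken, recognized in history:
    n = update_weight(n, 1.0, spoken, recognized)
```

After this run `n` is 1.0: two upward corrections, one downward correction, and two correct results that leave it alone. The weight stops drifting once upward and downward corrections occur equally often, which matches the balance point described in the text.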

In this embodiment, even when the phoneme recognition tendency changes because of the transmission path, ambient noise, or similar influences, corrected likelihood weights are obtained during word speech recognition itself and are used for the following recognitions, so a high recognition rate can be maintained.

The present invention is not limited to the case where the recognized phoneme sequence is uniquely determined. The same effect is obtained when a lattice-form recognized phoneme sequence is used, and also when the word speech recognition algorithm forms no explicit recognized phoneme sequence and instead obtains the likelihood by directly matching the analysis result of the input speech against the dictionary phoneme sequence of each item in the phoneme-notated word dictionary.

Effects of the Invention

According to the present invention, a high word recognition rate can be maintained even when the environment, such as the transmission path or ambient noise, changes.

Brief Description of the Drawing

The figure is a schematic diagram of the word speech recognition method, used to explain both the embodiment of the present invention and the conventional example.

Claims (1)

A word speech recognition method in which a word is recognized by calculating the similarity between input speech and the dictionary phoneme sequence of each dictionary item in a phoneme-notated word dictionary, wherein, when the similarity has been calculated for the input speech, a weighted similarity is calculated by applying to the similarity of each dictionary item, through addition, subtraction, multiplication, or division, a likelihood weight predetermined for each combination of that dictionary item and the dictionary item ranked first in similarity, and the dictionary item with the largest weighted similarity is taken as the recognized word, characterized in that, each time a word is misrecognized, the likelihood weight value is successively and automatically corrected in the direction that reduces the difference between the first-ranked weighted similarity at that time and the weighted similarity of the dictionary item for the input word.
JP58147312A 1983-08-13 1983-08-13 Word voice recognizing method Granted JPS6039522A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58147312A JPS6039522A (en) 1983-08-13 1983-08-13 Word voice recognizing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58147312A JPS6039522A (en) 1983-08-13 1983-08-13 Word voice recognizing method

Publications (2)

Publication Number Publication Date
JPS6039522A true JPS6039522A (en) 1985-03-01
JPH0158519B2 JPH0158519B2 (en) 1989-12-12

Family

ID=15427343

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58147312A Granted JPS6039522A (en) 1983-08-13 1983-08-13 Word voice recognizing method

Country Status (1)

Country Link
JP (1) JPS6039522A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01150876U (en) * 1988-04-08 1989-10-18

Also Published As

Publication number Publication date
JPH0158519B2 (en) 1989-12-12

Similar Documents

Publication Publication Date Title
EP3750110B1 (en) Methods and systems for intent detection and slot filling in spoken dialogue systems
Bahl et al. Maximum mutual information estimation of hidden Markov model parameters for speech recognition
KR920008624A (en) Method for Recognizing Target of Image
JPS6039522A (en) Word voice recognizing method
CN115630635B (en) Chinese text proofreading method, system and equipment based on retrieval and multiple stages
WO2022242535A1 (en) Translation method, translation apparatus, translation device and storage medium
JPH0486899A (en) Standard pattern adaption system
JP2009217006A (en) Dictionary correction device, system and computer program
Tian et al. End-to-end speech recognition with Alignment RNN-Transducer
JP2007286511A (en) Method and device for structuring speech synthesis dictionary, and program
KR100322730B1 (en) Speaker adapting method
JP3007357B2 (en) Dictionary update method for speech recognition device
JPH11133994A (en) Voice input device, and recording medium recorded with mechanically readable program
JP2979999B2 (en) Voice recognition device
JP2545960B2 (en) Learning method for adaptive speech recognition
JPS6281699A (en) Forming and updating method for dictoinary for voice word processor
JPS5968796A (en) Recognition of word voice
JPS595292A (en) Word voice recognition method
JPS59160276A (en) Pattern recognizing device
JPS5978399A (en) Recognition of word voice
JPH0690635B2 (en) Pitchiera-correction method
JPS617894A (en) Voice recognition
JPH0352089A (en) Character information deciding system
JPH0573094A (en) Continuous speech recognizing method
JPH0556515B2 (en)