JPS58129498A - Voice recognition - Google Patents

Voice recognition

Info

Publication number
JPS58129498A
JPS58129498A JP1087482A
Authority
JP
Japan
Prior art keywords
voiceless
consonant
phoneme
unvoiced
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP1087482A
Other languages
Japanese (ja)
Other versions
JPS637399B2 (en)
Inventor
入間野 孝雄
金指 久則
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Basic Technology Research Association Corp
Original Assignee
Computer Basic Technology Research Association Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Basic Technology Research Association Corp filed Critical Computer Basic Technology Research Association Corp
Priority to JP1087482A priority Critical patent/JPS58129498A/en
Publication of JPS58129498A publication Critical patent/JPS58129498A/en
Publication of JPS637399B2 publication Critical patent/JPS637399B2/ja
Granted legal-status Critical Current

Abstract

(57) [Abstract] This publication contains application data filed before the electronic filing system, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a speech recognition method; its object is to extract, with high accuracy, devoiced vowels that cannot be recognized by ordinary vowel recognition means, and thereby to improve the speech recognition rate.

First, the conventional devoiced-vowel extraction method is described. The basic idea is as follows: rough phoneme recognition is performed first, and when an unvoiced interval is long and, moreover, a pronounced dip in speech power exists within it, this dip is regarded as the boundary between two syllables, and the unvoiced interval is recognized as a sequence of three phonemes, namely an unvoiced consonant, a devoiced vowel, and an unvoiced consonant. This is explained concretely with reference to Fig. 1. The input speech is divided into frames of 10 ms each, acoustic-level analysis is performed, and the parameters needed for phoneme recognition are extracted. Next, segmentation (finding phoneme boundaries) and preliminary frame-by-frame phoneme recognition are carried out. At this point voiced vowels are recognized as such; a devoiced vowel, however, merges with the consonants on both sides and is recognized as a single unvoiced consonant. Therefore, when the preliminary phoneme recognition yields an unvoiced-consonant interval, it is necessary to check for the presence of a devoiced vowel. The conventional check routine is the part represented by the diamond-shaped branches in Fig. 1. First, the presence of an unvoiced consonant is checked; if there is none, no further check is made. If there is an unvoiced-consonant interval, and the number of frames in it exceeds a threshold T11 determined in advance from experimental results, and a silent frame (denoted Q) exists within the interval, and the number of frames in this Q interval exceeds a predetermined threshold T12, then the Q interval is regarded as forming the boundary between two syllables: the unvoiced-consonant interval is divided into a front half and a rear half, and the vowel UI is inserted immediately after the front unvoiced consonant. UI here means the phoneme U or I. After this devoicing processing, the recognized phoneme sequence is constructed and matched against a word dictionary to perform word recognition.
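The conventional check just described can be sketched as follows (a minimal illustration, not the patent's implementation; the function and variable names are ours, and frames are assumed to be pre-labeled 'C' for unvoiced consonant and 'Q' for silence; only the thresholds T11 and T12 come from the description):

```python
def max_q_run(labels):
    """Length of the longest run of silent ('Q') frames."""
    run = best = 0
    for lab in labels:
        run = run + 1 if lab == 'Q' else 0
        best = max(best, run)
    return best

def conventional_check(labels, t11, t12):
    """True if the unvoiced interval should be split as C, UI, C:
    the interval is longer than T11 frames AND it contains a
    silent (Q) run longer than T12 frames."""
    return len(labels) > t11 and max_q_run(labels) > t12
```

For example, a seven-frame interval containing a two-frame silence is split when T11 = 5 and T12 = 1, but not when T12 = 3.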

In the conventional devoiced-vowel extraction method described above, setting T12 small increases the number of cases in which a devoiced vowel is extracted in error from an interval that is in fact a single unvoiced consonant; conversely, setting T12 large lowers the extraction rate of devoiced vowels.

The present invention eliminates the drawbacks of the conventional example. An embodiment is described below, concentrating on the points of difference from the conventional example. Fig. 2 shows the recognition flow of one embodiment of the invention; the difference from the conventional flow is only the portion inside the circle (201) shown in Fig. 2. That is, even when the Q interval is short, the maximum and minimum of the power difference values within the Q interval (the frame-to-frame power differences) are found, and when their difference exceeds a predetermined threshold T23, a clear break in the speech is judged to exist there, and the interval is handled in the same way as a long Q interval is handled conventionally.
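The added test can be sketched as follows (an illustrative reading of the description, with names of our choosing; only the threshold T23 comes from the text, and the "difference values" are taken to be successive frame-to-frame power differences):

```python
def has_clear_break(powers, t23):
    """True if the spread between the largest and smallest
    frame-to-frame power difference across the Q interval
    exceeds the threshold T23 -- i.e. the power both falls
    and rises steeply, marking a clear break in the speech."""
    diffs = [b - a for a, b in zip(powers, powers[1:])]
    return max(diffs) - min(diffs) > t23
```

A short Q interval whose power drops by 8 and then recovers by 8 gives a spread of 16 and passes a threshold of 10; a flat interval does not.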

The operation will now be explained using the word kikai ("machine").

The ki of kikai is devoiced in standard Japanese.

Consequently, the preliminary recognition result for kikai is not K I K A I: the I drops out, and the two K's run together.

Fig. 3 shows the preliminary recognition result for the first half of kikai together with the power variation, drawn on the same time axis.

The silent Q is recognized as overlapping the consonant K. This is because a silent portion within a word (more precisely, a portion where the power is below a predetermined threshold) can also arise from instability during consonant articulation, so the power level alone cannot determine whether it is part of a consonant or a syllable boundary. The consonant K and the vowel A are recognized with an overlap of one frame.

This is simply because phoneme boundaries are at present recognized with overlap; there is no special reason for it. Turning now to the Q interval in Fig. 3: if this Q interval is longer than the predetermined threshold, or if the power change within it is large, it is regarded as a syllable boundary; the K of the first half then becomes K followed by UI, the K of the second half becomes the K connected to A, and the devoiced vowel has thus been extracted.
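The overall decision and the resulting split described above can be put schematically (names ours, with T12 and T23 as in the earlier description; this is a sketch of the logic, not the patent's code):

```python
def is_syllable_boundary(q_frames, diff_spread, t12, t23):
    """Boundary if the silence is long (conventional test) OR the
    power change across it is steep (the added test)."""
    return q_frames > t12 or diff_spread > t23

def split_merged_consonant(label, boundary):
    """A merged unvoiced segment such as 'K' becomes K, UI, K when a
    boundary is found inside it (UI standing for the phoneme U or I)."""
    return [label, 'UI', label] if boundary else [label]
```

For kikai, the single merged K is thus replaced by K, UI, K, the final K being the one that leads into A.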

Note that, when computing power, the true power is the integral of the squared instantaneous value over some period of time; however, using the absolute value instead of the square gives almost the same result.
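The two power definitions just mentioned can be written out as follows (a sketch; averaging over the frame is our assumption — the text specifies only squaring versus absolute value):

```python
def frame_power_sq(samples):
    """'True' power: mean of squared instantaneous values over a frame."""
    return sum(s * s for s in samples) / len(samples)

def frame_power_abs(samples):
    """Cheaper variant: mean absolute value, said to give nearly the
    same result for this purpose."""
    return sum(abs(s) for s in samples) / len(samples)
```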

The following table shows the results of recognition experiments carried out with the above embodiment and with the conventional method.

In the table, the input CVC covers only the cases of an unvoiced consonant, a devoiced vowel, and an unvoiced consonant in succession; shown are the number of such inputs correctly recognized as CVC and the number recognized as a single consonant.

The lower row shows the number of cases in which the input was a single unvoiced consonant but the devoiced-vowel extraction method was applied in error, so that the recognition result came out as CVC. The various thresholds were set to their optimum values by preliminary experiments, for the conventional method and the embodiment alike. The results show that the extraction rate of devoiced vowels, conventionally 71.5%, rose to 98.7% with the present invention. Meanwhile, erroneous application of the devoiced-vowel extraction routine, which formerly amounted to 12% of the correctly applied cases, was eliminated entirely. Although reducing erroneous application is not a direct object of the present invention, tightening the threshold on the length of the Q interval is what made this reduction possible.

Thus, according to the present invention, the extraction rate of devoiced vowels is greatly improved while the erroneous-extraction rate is greatly reduced. Such an improvement in the recognition of devoiced vowels translates directly into an improvement in the word recognition rate, so the effect of the invention is substantial.

[Brief Description of the Drawings]

Fig. 1 is a flow diagram of the conventional speech recognition method; Fig. 2 is a flow diagram of the speech recognition method of one embodiment of the present invention; and Fig. 3 is a diagram showing an example of the preliminary phoneme recognition result and the power variation of input speech.

Claims (1)

[Claims]

1. A speech recognition method characterized in that, when an unvoiced sound continues for longer than the usual length of one phoneme, a dip in speech power exists within this unvoiced interval, and the rate of change of the power with respect to time at the dip is greater than a predetermined value, phoneme recognition is performed by treating the unvoiced interval not as a single unvoiced consonant but as a sequence of three phonemes, namely an unvoiced consonant, a devoiced vowel, and an unvoiced consonant.
JP1087482A 1982-01-28 1982-01-28 Voice recognition Granted JPS58129498A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1087482A JPS58129498A (en) 1982-01-28 1982-01-28 Voice recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1087482A JPS58129498A (en) 1982-01-28 1982-01-28 Voice recognition

Publications (2)

Publication Number Publication Date
JPS58129498A true JPS58129498A (en) 1983-08-02
JPS637399B2 JPS637399B2 (en) 1988-02-16

Family

ID=11762475

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1087482A Granted JPS58129498A (en) 1982-01-28 1982-01-28 Voice recognition

Country Status (1)

Country Link
JP (1) JPS58129498A (en)

Also Published As

Publication number Publication date
JPS637399B2 (en) 1988-02-16

Similar Documents

Publication Publication Date Title
CN107045870B (en) Speech signal endpoint detection method based on characteristic value coding
Sarmah et al. Contextual variation of tones in Mizo
JPS58129498A (en) Voice recognition
JPS58129499A (en) Voice recognition
JPS5925240B2 (en) Word beginning detection method for speech sections
JPS6039691A (en) Voice recognition
Elghonemy et al. Speaker independent isolated Arabic word recognition system
JPS59143200A (en) Continuous voice recognition system
JPS5969798A (en) Extraction of pitch
JPH0229229B2 (en)
KR100263297B1 (en) A method to set up units for speech recognition using pseudo morpheme
JP3253753B2 (en) Formatting method and apparatus for text to be read aloud
JPS5872995A (en) Word voice recognition
Salam Recognition of Holy Quran Recitation Rules Using Phoneme Duration
JPS5978399A (en) Recognition of word voice
JPS6033599A (en) Voice recognition equipment
JPS59204099A (en) Voice recognition system
JPS63247798A (en) Voice section detecting system
JPS62223798A (en) Voice recognition equipment
JPS617894A (en) Voice recognition
JPS617896A (en) Word voice recognition method
JPS6147992A (en) Voice recognition system
JPH0792675B2 (en) Voice recognizer
JPS60172098A (en) Monosyllabic voice recognition equipment
JPS6075890A (en) Recognition of vowel