JPS60254099A

JPS60254099A - Voice recognition system

Info

Publication number: JPS60254099A
Application number: JP59108667A
Authority: JP
Inventors: 広田　敦子; 陽一山田; 裕飯塚
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1984-05-30
Filing date: 1984-05-30
Publication date: 1985-12-14
Anticipated expiration: 2009-05-02
Also published as: JPH0634192B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Technical Field) The present invention relates to a speech recognition method, and in particular to an improved method of matching speech data so as to improve recognition performance.

(Background Art) A conventional speech recognition device is configured as shown in FIG. 1, where 1 is an input terminal, 2 is a frequency analysis section, 3 is a spectrum conversion section, 4 is a speech interval determination section, 5 is a dissimilarity calculation section, 6 is a standard speech spectrum pattern memory, 7 is a decision section, and 8 is a recognition result output terminal.

In a conventional speech recognition device, the dissimilarity between the spectrum-converted input speech spectrum pattern and each standard spectrum pattern k (k = 1 to K) is calculated as follows. Let A(m, n) be the element of the m-th channel at the n-th time sample point of the input spectrum pattern, and let Sk(m, n) be the element of the m-th channel at the n-th time sample point of standard spectrum pattern k. The dissimilarity Dk is calculated by equation (1), and the category of the standard spectrum pattern that minimizes Dk among the K standard spectrum patterns is taken as the recognition result. There are many methods for calculating the weight W(m, n), but they are not the subject of this invention and are therefore omitted.
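As a rough illustration of the matching step described above, the sketch below computes a weighted dissimilarity for each standard pattern and picks the category with the smallest value. Equation (1) itself does not survive in this text, so the weighted absolute difference used here, the function names `dissimilarity` and `recognize`, and the sample values are all assumptions made only for illustration.

```python
def dissimilarity(a, s, w):
    """Weighted sum of absolute differences over channels m and time points n.

    ASSUMPTION: the exact form of equation (1) is not available in the text;
    a weighted absolute difference is used here purely as an example.
    """
    return sum(
        w[m][n] * abs(a[m][n] - s[m][n])
        for m in range(len(a))
        for n in range(len(a[0]))
    )


def recognize(a, templates, weights):
    """Return the category whose standard pattern gives the smallest Dk."""
    scores = {k: dissimilarity(a, templates[k], weights[k]) for k in templates}
    return min(scores, key=scores.get), scores


# Hypothetical 2-channel, 2-point patterns.
a = [[1, 2], [3, 4]]
templates = {"ichi": [[1, 2], [3, 5]], "ni": [[0, 0], [0, 0]]}
weights = {k: [[1, 1], [1, 1]] for k in templates}
best, scores = recognize(a, templates, weights)
```

With these made-up values the closer pattern wins because the decision rule is a plain minimum over Dk.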

In a conventional recognition device, the power information of the input speech is completely lost through spectrum conversion. As a result, "ichi" (one) may be misrecognized as "ni" (two), or "go" (five) as "roku" (six), for example.

FIG. 2 shows example sonagrams of the speech patterns for "ichi", "ni", "go", and "roku". In FIG. 2 the horizontal direction is the frequency axis and the vertical direction is the time axis.

As these show, after spectrum conversion "ichi" and "ni", and likewise "go" and "roku", become quite similar patterns. The main difference between them is the silent interval between "i" and "chi" and between "ro" and "ku", but because the power information has been lost, misrecognition can result, and this has been a cause of reduced recognition rates.

(Problem to be Solved by the Invention) The object of this invention is to overcome these drawbacks and to provide a speech recognition method that improves the recognition rate by using the power information, and in particular the silent-interval information, that was completely lost through conventional spectrum conversion. The key point is that information on the presence or absence of a valley in the power within a word (hereinafter called a power dip) is used for the recognition decision together with the conventional spectral distance information. This eliminates confusions such as the digit "ichi" being taken for "ni" or "roku" for "go", and improves the recognition rate, especially for speech whose phonemic information is degraded, such as telephone speech.

FIG. 3 shows how power dips appear for different words.

(Structure and Operation of the Invention) FIG. 4 is a block diagram showing one embodiment of the invention. In FIG. 4, 100 is an input terminal, 200 is a frequency analysis section, 300 is a spectrum conversion section, 400 is a speech interval determination section, and 500 is a resampling section. 600 is a power dip calculation section, which consists of power information memory section 601, shift register 1 602, shift register 2 603, shift register 3 604, shift register 4 605, adder-subtractor 606, power differentiator 607, comparison section 1 608, comparison section 2 609, comparison section 3 610, DS determination section 611, DE determination section 612, power dip interval determination section 613, PA calculator 614, QA calculator 615, power dip evaluation value calculation section 616, power dip distance calculation section 617, comparison section 4 618, latch 619, division calculator 620, subtraction calculator 1 621, subtraction calculator 2 622, subtraction calculator 3 623, subtraction calculator 4 624, comparison section 5 625, comparison section 6 626, comparison section 7 627, comparison section 8 628, and OR gate 629. 700 is a power dip constant table, 800 is a total distance calculation section, and 900 is a distance output terminal.

In this configuration, the input speech signal from input terminal 100 is fed to frequency analysis section 200, where it is frequency-analyzed into quantized signals corresponding to a plurality of frequency bands, and is then sent to spectrum conversion section 300. The data sent to spectrum conversion section 300 undergo spectrum conversion, yielding normalized spectrum information and speech power information for each frame, which are sent to speech interval determination section 400 and resampling section 500. Speech interval determination section 400 uses the speech power information to determine the start and end of the speech interval and sends them to resampling section 500 and power dip calculation section 600. In resampling section 500, the spectrum data and power data for the extracted speech interval are time-normalized to 16 or 32 points, and of these only the power information is sent to power dip calculation section 600. The spectrum data, meanwhile, are sent to a separate spectrum dissimilarity calculation section (not shown here), where the spectral distance is obtained by the same conventional method described for FIG. 1. The result is input at 1000 in FIG. 4.
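The time normalization to a fixed number of points can be pictured with the following minimal sketch. The text does not say how the resampling section selects or interpolates frames, so nearest-frame selection at evenly spaced positions is an assumption here, as are the function name `resample` and the sample data.

```python
def resample(frames, n_points=16):
    """Pick n_points frames spread evenly over the detected speech interval.

    ASSUMPTION: nearest-frame selection; the actual resampling rule of the
    resampling section is not described in the text.
    """
    if not frames:
        raise ValueError("empty speech interval")
    if n_points == 1:
        return [frames[0]]
    last = len(frames) - 1
    return [frames[round(i * last / (n_points - 1))] for i in range(n_points)]


# Hypothetical power values for a 31-frame speech interval.
normalized = resample(list(range(31)), 16)
```

Whatever the original interval length, the output always has the requested number of points, which is what lets patterns of different durations be compared point by point.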

Next, in order to perform the distance calculation using power dips, the resampled power information written in power information memory section 601 is transferred to shift registers 602 to 605 in sequence, one frame at a time.

The main aim of the present invention is to improve the recognition rate by converting the characteristics of the silent portions, which differ from category to category, into a distance for the matching calculation, using information on the presence or absence of power dips within a word.

FIG. 5 shows a speech power pattern; the speech has been cut out over the interval between the start frame (STFR) and the end frame (EDFR). The resampled speech power data POW(j) (j = STFR to EDFR), transferred successively to shift registers 602 to 605, are sent to adder-subtractor 606 and then to power differentiator 607, where the value DFPW(j) is calculated by equation (1):

DFPW(j) = POW(j+2) + POW(j+1) - POW(j) - POW(j-1)   ... (1)   (j = STFR to EDFR)

j starts from the start frame.
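The power differentiation step can be sketched as follows. The signs in the scanned equation are partly lost; this sketch reads it as DFPW(j) = POW(j+2) + POW(j+1) - POW(j) - POW(j-1), which matches the four shift registers (J+2, J+1, J, J-1) feeding an adder-subtractor. Restricting j to frames where all four samples exist is an assumption, as are the function names and sample values.

```python
def dfpw(power, j):
    """Difference of the two frames ahead of j and the two frames at/behind j."""
    return power[j + 2] + power[j + 1] - power[j] - power[j - 1]


def dfpw_sequence(power):
    """DFPW for every j where POW(j-1) through POW(j+2) are all available.

    ASSUMPTION: boundary frames are simply skipped; the text does not say
    how the ends of the speech interval are handled.
    """
    return {j: dfpw(power, j) for j in range(1, len(power) - 2)}


# Hypothetical power pattern with a valley in the middle.
power = [10, 10, 10, 2, 2, 2, 10, 10, 10]
d = dfpw_sequence(power)
```

The values go negative where the power falls into the valley and positive where it recovers, which is what the threshold comparisons that follow rely on.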

Next, to determine the start and end points of the power dip, comparisons are made in the following order.

DFPW(j) calculated by equation (1) is first compared in comparison section 1 608 against threshold THL1, and the point at which DFPW(j) becomes smaller than THL1 is taken as j1. Next, starting from j1, the point at which DFPW(j) becomes greater than or equal to threshold THL2 is taken as j2. Then, starting from j2, the point at which DFPW(j) becomes smaller than threshold THL3 is taken as j3. If a power dip interval candidate has already been obtained by comparison sections 1 608 to 3 610, j1 is set to j3 and the same operation is repeated. If no power dip interval candidate was obtained, j2 and j3 are determined anew.
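The three threshold comparisons can be sketched as a sequential scan. The threshold values are not given in the text, so the numbers below are hypothetical, as are the helper names `first_index` and `find_dip_points`; list indices stand in for frame numbers.

```python
def first_index(values, start, predicate):
    """Index of the first element at or after start that satisfies predicate."""
    for j in range(start, len(values)):
        if predicate(values[j]):
            return j
    return None


def find_dip_points(d, thl1, thl2, thl3, start=0):
    """Return (j1, j2, j3), or None if the scan runs off the end.

    j1: first point where DFPW <  THL1
    j2: first point from j1 where DFPW >= THL2
    j3: first point from j2 where DFPW <  THL3
    """
    j1 = first_index(d, start, lambda v: v < thl1)
    if j1 is None:
        return None
    j2 = first_index(d, j1, lambda v: v >= thl2)
    if j2 is None:
        return None
    j3 = first_index(d, j2, lambda v: v < thl3)
    if j3 is None:
        return None
    return j1, j2, j3


# Hypothetical DFPW values and thresholds (not taken from the patent).
d = [0, -8, -16, -8, 8, 16, 8, 0, 0]
points = find_dip_points(d, thl1=-4, thl2=4, thl3=4)
```

In this example j1 falls where the power begins to drop, j2 where it begins to recover, and j3 where the recovery ends. The `start` argument is there so that a later scan can resume further along the pattern.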

In FIG. 5, j1 = P1, j2 = P2, and j3 = P3, and these are sent to DS determination section 611 and DE determination section 612. The data written in power information memory section 601 are also sent to comparison section 4 618, where successive comparisons using comparison section 4 618 and latch 619 yield the maximum power PMAX. Equation (2) gives PMAX:

PMAX = MAX(POW(j)),  j = STFR to EDFR   ... (2)

The values at j1, j2, j2-1, and j3 obtained as described above using equation (1) are subtracted and compared by subtraction calculators 621 to 624 and comparison sections 5 625 to 8 628, and the results are passed through OR gate 629. Only when the output is "1" is a power dip interval detected and output to DS determination section 611 and DE determination section 612.

After the maximum power PMAX has been obtained from equation (2), j1 to j3 are taken as a power dip interval candidate if any one of conditions 1) to 4) of expression (4) below is satisfied.

1) POW(j1) - POW(j2) > PMAX/8
2) POW(j1) - POW(j2-1) > PMAX/8
3) POW(j3) - POW(j2) > PMAX/8
4) POW(j3) - POW(j2-1) > PMAX/8   ... (4)

Subtraction calculators 621 to 624 perform the subtractions on the left-hand sides of expression (4), while for the right-hand side division calculator 620 divides by 8 the maximum power PMAX already obtained from equation (2). Once both sides have been obtained they are compared in comparison sections 5 625 to 8 628. If any one of 1) to 4) is satisfied, the interval is regarded, via OR gate 629, as a power dip interval candidate, and this information is sent to DS determination section 611 and DE determination section 612. DS determination section 611 stores the start point DS of the power dip, and DE determination section 612 stores its end point DE.
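The candidate test can be sketched as four subtractions compared against PMAX/8 and combined with a logical OR. The scanned text of condition 3) is garbled, so it is read here as POW(j3) - POW(j2) by symmetry with condition 1); that reading, the function name `is_dip_candidate`, and the sample data are assumptions.

```python
def is_dip_candidate(power, j1, j2, j3):
    """True if any of the four power drops exceeds one eighth of PMAX."""
    pmax = max(power)          # equation (2): maximum power over the interval
    limit = pmax / 8           # right-hand side of expression (4)
    drops = (
        power[j1] - power[j2],      # condition 1)
        power[j1] - power[j2 - 1],  # condition 2)
        power[j3] - power[j2],      # condition 3), ASSUMED reading
        power[j3] - power[j2 - 1],  # condition 4)
    )
    return any(drop > limit for drop in drops)  # the OR gate


# Hypothetical power patterns: one with a valley, one flat.
with_dip = is_dip_candidate([10, 10, 10, 2, 2, 2, 10, 10, 10], 1, 4, 7)
flat = is_dip_candidate([10] * 9, 1, 4, 7)
```

Scaling the threshold by PMAX makes the test relative to how loud the utterance is, so a quiet speaker and a loud one are judged on the same footing.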

The values j1 and j3 and the power dip interval detection data sent to DS determination section 611 and DE determination section 612 are passed on through power dip interval determination section 613 to PA calculator 614 and QA calculator 615. When two power dip interval candidates have been found, for example P1 to P3 and P3 to P5 in FIG. 5, power dip interval determination section 613 treats P1 to P5 as a single power dip interval if all of conditions 1) to 3) of expression (5) below are satisfied.

1) POW(P1) - POW(P3) > 0
2) POW(P5) - POW(P3) > 0
3) POW(P1) + POW(P5) > 3 + POW(P3)   ... (5)

In FIG. 5, the two intervals D1 to D2 and D2 to D3 are power dip intervals.

The data determined by power dip interval determination section 613 are sent to PA calculator 614 and QA calculator 615, which calculate the slope and intercept of the straight line, shown as f(j) in FIG. 5, that connects the values at the start and end of the power dip interval.

First, the equation of the straight line connecting the two points (DS, POW(DS)) and (DE, POW(DE)) is obtained from equation (7):

f(j) = PA * j + QA   ... (7)

Here PA is the slope of the line and QA is its intercept.

Equation (8) gives the slope PA of the line, and equation (9) gives the intercept QA.
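The slope and intercept of a line through two points follow from the ordinary two-point form, PA = (POW(DE) - POW(DS)) / (DE - DS) and QA = POW(DS) - PA * DS. The sketch below assumes that this is what the PA and QA calculators evaluate, since equations (8) and (9) do not survive in this text; the function name `line_through` and the sample values are illustrative only.

```python
def line_through(ds, pow_ds, de, pow_de):
    """Slope PA and intercept QA of the line through (DS, POW(DS)) and (DE, POW(DE))."""
    if de == ds:
        raise ValueError("start and end points must differ")
    pa = (pow_de - pow_ds) / (de - ds)
    qa = pow_ds - pa * ds
    return pa, qa


# Hypothetical dip from frame 2 (power 10) to frame 6 (power 18).
pa, qa = line_through(2, 10, 6, 18)
f = lambda j: pa * j + qa  # the line f(j) = PA * j + QA
```

By construction the line reproduces the power at both ends of the interval, so it can serve as a baseline against which the depth of the valley is judged.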

After the intercept and slope of the line have been obtained in this way, power dip evaluation value calculation section 616 calculates the evaluation function value for the magnitude of the power dip. The evaluation function value is defined in normalized form by equation (10).

PDV = C * SS / (WW * AA * PMAX)   ... (10)

In the power dip evaluation function value PDV of equation (10), C is a constant with C = 2. SS is a power dip quantity whose definition is incomplete in this text. WW represents the width of the power dip and is calculated by equation (12).

WW = DE - DS + 1   ... (12)

That is, the larger the value of WW, the lower the likelihood of a power dip.

AA represents the slope of the power dip and is calculated by equation (13).

AA = PA + 1   ... (13)

The power dip magnitude PDV calculated from equation (10) in this way is sent to power dip distance calculation section 617. Power dip distance calculation section 617 calculates the distance due to the power dip by equation (14), giving a distance value that depends on whether a power dip is present.

(14-1) applies when a power dip is present, and (14-2) when it is absent. C1, DBMAX, and C2 are constants, set to C1 = 3, C2 = 5, and DBMAX = 300. The values of CCP and CCN are supplied from power dip constant table 700.

FIG. 6 shows some of the CCP and CCN values set for each category.

CCP and CCN are thus determined in advance for each category according to the presence or absence of a power dip, the size of the power dip, and so on. The value of CCP is set in the range 0 to 8. For categories with a power dip, such as "ichi", "roku", "hachi", and "horyuu", CCP is set to "0"; conversely, for categories without a power dip, such as "yon", "go", and "hai", it is set to "8".

The value of CCN is the reverse of CCP: it is "0" for categories without a power dip and "30" for categories with a power dip.
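A per-category constant table of this kind could be held as a simple lookup, as in the sketch below. Only the two extreme settings stated in the text are used (CCP 0 / CCN 30 with a dip, CCP 8 / CCN 0 without); the intermediate CCP values permitted by the 0 to 8 range, and the actual contents of FIG. 6, are not reproduced. The names `dip_constants` and `CONSTANT_TABLE` are hypothetical.

```python
def dip_constants(has_dip):
    """CCP and CCN for a category, using only the extreme values in the text."""
    if has_dip:
        return {"CCP": 0, "CCN": 30}
    return {"CCP": 8, "CCN": 0}


# Categories named in the text, flagged by whether they contain a power dip.
HAS_DIP = {
    "ichi": True, "roku": True, "hachi": True,
    "yon": False, "go": False, "hai": False,
}

CONSTANT_TABLE = {name: dip_constants(flag) for name, flag in HAS_DIP.items()}
```

Keeping the constants in a table keyed by category means the power dip distance calculation can look up a word's expected behavior without any per-word logic.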

If the result calculated by equation (14) satisfies the condition of (14-1), a power dip is judged to be present; if it satisfies the condition of (14-2), a power dip is judged to be absent. The decision result DBC is sent to total distance calculation section 800. Total distance calculation section 800 adds the conventional spectral distance value supplied from spectral distance information 1000 to the power dip decision result DBC obtained from equation (14), and outputs the sum from distance output terminal 900.
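The final combination step is a per-category addition followed by a minimum search. The sketch below illustrates this with made-up distances; the function names `total_distances` and `decide` and all numeric values are assumptions, and the DBC values are simply given rather than derived from equation (14).

```python
def total_distances(spectral, dbc):
    """Add the power dip distance DBC to the spectral distance, per category."""
    return {k: spectral[k] + dbc[k] for k in spectral}


def decide(spectral, dbc):
    """Category with the minimum total matching distance."""
    total = total_distances(spectral, dbc)
    return min(total, key=total.get)


# Hypothetical case: the input contains a power dip, so "ni" (no dip expected)
# receives a penalty while "ichi" (dip expected) does not.
spectral = {"ichi": 120, "ni": 115}
dbc = {"ichi": 0, "ni": 30}

without_power = min(spectral, key=spectral.get)
with_power = decide(spectral, dbc)
```

In this example the spectral distance alone would pick the wrong word, and the added penalty reverses the ranking, which is the kind of confusion the method is meant to resolve.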

As described above, in the present invention information on the presence or absence of a power dip in each category is incorporated in addition to the ordinary spectrum matching distance. This makes it possible to reduce confusion between categories that have a power dip, such as "ichi", "roku", and "hachi", and categories that do not, such as "ni", "yon", and "go", and so to improve the recognition rate.

To demonstrate the effectiveness of the invention described above, the results of a recognition experiment are now described.

A recognition experiment was performed on about 2,900 patterns of male speech data, using standard patterns created from about 5,800 patterns. Against the conventional recognition rate of 96.61% without power dips, a rate of 97.80% was obtained, an improvement of a little over 1%. At the same time, the difference in distance between the first and second candidates widened, showing improved stability of recognition.

(Effects of the Invention) By converting speech power information into a matching distance and incorporating it in addition to the ordinary pattern matching distance, the present invention allows words to be distinguished more accurately and is effective in improving the recognition performance of speech recognition devices.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a conventional speech recognition device; FIG. 2 shows examples of speech patterns; FIG. 3 shows examples of how power dips appear for different words; FIG. 4 is a block diagram of one embodiment of a speech recognition device according to the present invention; FIG. 5 shows the setting range of a power dip; and FIG. 6 shows the power dip constant table.

100: input terminal; 200: frequency analysis section; 300: spectrum conversion section; 400: speech interval determination section; 500: resampling section; 600: matching calculation section; 601: power information memory section; 602: shift register (J+2); 603: shift register (J+1); 604: shift register (J); 605: shift register (J-1); 606: adder-subtractor; 607: power differentiator; 608: comparison section 1; 609: comparison section 2; 610: comparison section 3; 611: DS determination section; 612: DE determination section; 613: power dip interval determination section; 614: PA calculator; 615: QA calculator; 616: power dip evaluation value calculation section; 617: adder; 618: comparison section 4; 619: latch; 620: division calculator; 621: subtraction calculator 1; 622: subtraction calculator 2; 623: subtraction calculator 3; 624: subtraction calculator 4; 625: comparison section 5; 626: comparison section 6; 627: comparison section 7; 628: comparison section 8; 629: OR gate; 700: power dip constant table; 800: total distance calculation section; 900: distance output terminal.

Patent applicant: Oki Electric Industry Co., Ltd. Patent attorney for the applicant: Keiichi Yamamoto

[Drawings: FIGS. 1 to 6. FIG. 2 has panels (a) to (f); FIG. 6 tabulates the CCP and CCN values.]

Procedural Amendment (voluntary), July 31, 1984
To: Manabu Shiga, Commissioner of the Patent Office
1. Indication of the case: Patent Application No. 108667 of 1984
2. Title of the invention: Speech recognition method
3. Person making the amendment: relationship to the case: patent applicant; name: (029) Oki Electric Industry Co., Ltd.
4. Agent
5. Subject of the amendment: the "Detailed Description of the Invention" section of the specification
6. Content of the amendment
(1) In equation (1) on page 3 of the specification, "×W(m, n)" is corrected to "×WK(m, n)".
(2) "W(m, n)" on page 3, line 13 is corrected to "WK(m, n)".
End

Claims

[Claims]

A speech recognition method in which an input speech signal is frequency-analyzed, the analyzed spectrum is normalized for spectral slope and resampled to a fixed data length to create an input speech pattern, the distance between that pattern and standard patterns is calculated, and the category giving the minimum distance is determined as the recognition result, the method comprising: means for creating a power pattern of the input speech; means for creating a spectrum pattern normalized by the spectral slope of the input speech; means for matching the spectrum pattern of the input speech against the standard spectrum patterns to calculate a first dissimilarity; means for normalizing the power pattern on the basis of the average power and then calculating a second dissimilarity by an evaluation operation between the normalized power pattern and the power dip information of the input speech; means for producing a total matching distance by adding, for each category, the distance penalty relating to power dips to the ordinary spectrum matching distance; and means for determining as the recognition result the category name that gives the minimum total matching distance.