JPS60129800A

JPS60129800A - Voice recognition system

Info

Publication number: JPS60129800A
Application number: JP23634583A
Authority: JP
Inventors: 広田　敦子; 裕飯塚; 山田　興三
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1983-12-16
Filing date: 1983-12-16
Publication date: 1985-07-11

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Technical Field) The present invention relates to an improved method of resampling speech data, intended to improve the matching accuracy of a speech recognition device.

(Technical Background) A conventional speech recognition device is configured as shown in Fig. 1. In Fig. 1, 1 is an input terminal, 2 is a frequency analysis section, 3 is spectrum data, 4 is a speech interval determination section, 5 is a resampling section, 6 is a distance calculation section, 7 is a standard pattern memory, 8 is a judgment section, and 9 is a recognition result output terminal.

A conventional speech recognition device computes a distance between the spectrally converted input speech pattern and each standard pattern K (K = 1, ..., K). Let A(m, n) be the element of the m-th channel at the n-th time sample point of the input pattern, and let S_K(m, n) be the element of the m-th channel at the n-th time sample point of standard pattern K. The distance D_K is calculated by Equation (1). The category of the standard pattern that minimizes D_K among the K standard patterns is taken as the recognition result.

Many methods exist for calculating the weights W(m, n). Because they are not the subject of this invention, they are not described here.
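Equation (1) itself does not survive in this text. To show how D_K and the weights W(m, n) combine into a decision, the sketch below assumes, purely for illustration, a weighted city-block distance D_K = Σ_m Σ_n W(m, n)·|A(m, n) − S_K(m, n)|. The function names and the dictionary of standard patterns are hypothetical.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # indexed [m][n]: channel m, time sample n


def weighted_distance(a: Matrix, s: Matrix, w: Matrix) -> float:
    """D_K under an ASSUMED weighted city-block form of Equation (1)."""
    return sum(
        w[m][n] * abs(a[m][n] - s[m][n])
        for m in range(len(a))
        for n in range(len(a[m]))
    )


def recognize(a: Matrix, standards: dict[str, Matrix], w: Matrix) -> str:
    """Return the category whose standard pattern gives the smallest D_K."""
    return min(standards, key=lambda cat: weighted_distance(a, standards[cat], w))


a = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 1.0], [1.0, 1.0]]
print(recognize(a, {"x": [[0.0, 0.0], [0.0, 0.0]], "y": [[1.0, 2.0], [3.0, 5.0]]}, w))
```

With uniform weights, "y" differs from the input by 1 and "x" differs by 10, so "y" is chosen. Any other form of Equation (1) would only change `weighted_distance`.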

A conventional speech recognition device frequency-analyzes the input speech, computes the least-squares straight line approximating the speech spectrum, and uses the slope of that line as the spectral slope value. If the spectral slope is negative, the input speech is judged to be voiced, and the least-squares line is subtracted from the speech spectrum. If the spectral slope is positive, the input speech is judged to be unvoiced, and the mean of the speech spectrum is subtracted from the spectrum. This normalizes both the slope of the vocal-cord source characteristics and differences in phonation intensity of the input speech.
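The voiced/unvoiced normalization described above can be sketched for a single spectral frame as follows. This assumes channel indices 0..M−1 serve as the x axis of the least-squares fit (the text does not say how channels are placed along frequency). It also treats a slope of exactly zero as unvoiced, because the text only covers the negative and positive cases.

```python
def normalize_spectrum(spectrum: list[float]) -> tuple[str, list[float]]:
    """Slope/level normalization of one spectral frame (needs >= 2 channels).

    Channel index is used as x for the least-squares line (an assumption).
    """
    count = len(spectrum)
    xs = range(count)
    x_mean = (count - 1) / 2
    y_mean = sum(spectrum) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, spectrum))
    slope = sxy / sxx                 # spectral slope value
    intercept = y_mean - slope * x_mean
    if slope < 0:
        # Voiced: subtract the least-squares line (removes source tilt and level).
        return "voiced", [y - (slope * x + intercept) for x, y in zip(xs, spectrum)]
    # Unvoiced (slope >= 0 here): subtract the mean (removes level only).
    return "unvoiced", [y - y_mean for y in spectrum]


print(normalize_spectrum([1.0, 2.0, 3.0, 6.0]))  # ('unvoiced', [-2.0, -1.0, 0.0, 3.0])
```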

The resampling section 5 resamples the normalized speech data over the speech interval determined by the speech interval determination section 4, so that every utterance yields the same number of speech samples. Resampling is usually performed by dividing the utterance interval into 16 or 32 points.

Fig. 2 shows the correspondence between input speech data and resampled data in the linear matching method.

As the figure shows, the resampled data (a fixed number n of samples) is mapped linearly onto the input speech, from its start frame STFR to its end frame EDFR.

In this case, depending on the number of input speech frames and the number of resampled data points, the resampled data can lose its temporal linearity with respect to the input speech data.
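The text does not state the exact rule the conventional method uses to pick frames. As a hypothetical illustration, suppose each resample simply copies the input frame at or just below its ideal linear position. With 6 input frames and 5 resamples (the sizes used later for Fig. 4), the frame advance is then uneven and one frame is skipped entirely.

```python
def conventional_frame_picks(n_frames: int, n_resamples: int) -> list[int]:
    """HYPOTHETICAL conventional rule: resample j copies the single input frame
    at or just below its ideal linear position (frames numbered from 1).

    Integer arithmetic keeps the truncation exact.
    """
    return [
        1 + (j - 1) * (n_frames - 1) // (n_resamples - 1)
        for j in range(1, n_resamples + 1)
    ]


def step_sizes(picks: list[int]) -> list[int]:
    """Frame advance between consecutive resamples; uneven steps mean the
    resampled sequence is no longer linear in time."""
    return [b - a for a, b in zip(picks, picks[1:])]


picks = conventional_frame_picks(6, 5)
print(picks, step_sizes(picks))  # [1, 2, 3, 4, 6] [1, 1, 1, 2]
```

Frame 5 never contributes, so any spectral transition that occurs there is lost. This is the kind of failure described next.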

Fig. 3 shows an example of the resulting resampling. In conventional linear matching, the resampling of the input speech data can become non-linear in this way, so the temporal changes of the true input speech spectrum are not captured. As a result, in word recognition for example, characteristic spectral transitions within a word are not reflected in the resampled data. This readily leads to misrecognition during matching.

(Object of the Invention) The object of the present invention is to provide a speech recognition method that eliminates such misrecognition and improves the recognition rate. It does so by reflecting changes in the spectrum of the input speech data correctly in the resampled data, which solves the problems above. The method is described in detail below.

(Structure of the Invention) Fig. 4 shows the relationship between the input data, the resampled data of the present invention, and their windows. In this example, 6 frames of input data are resampled to 5 frames.

In this arrangement, the 6 frames of input speech data are in general resampled according to Equation (1), which gives the spacing a between successive resamples:

$$a = \frac{n - 1}{m - 1} \qquad (1)$$

In Equation (1), n is the number of input speech data frames and m is the number of resamples. The resamples are taken at correctly and equally spaced positions within the input speech data, so they accurately reflect the temporal changes of the input speech spectrum.
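The spacing in Equation (1) is hard to read in the scanned text. Taking it as a = (n − 1)/(m − 1) frames, which agrees with Equation (6) given later for the circuit embodiment, the ideal resample positions for the Fig. 4 case (n = 6, m = 5) can be sketched as below. Exact fractions are used so that no rounding creeps in. The function name is hypothetical.

```python
from fractions import Fraction


def ideal_positions(n_frames: int, n_resamples: int) -> list[Fraction]:
    """Positions (in frame units, first frame = 1) of equally spaced
    resamples, ASSUMING the spacing a = (n - 1) / (m - 1)."""
    a = Fraction(n_frames - 1, n_resamples - 1)
    return [1 + (j - 1) * a for j in range(1, n_resamples + 1)]


print([str(p) for p in ideal_positions(6, 5)])  # ['1', '9/4', '7/2', '19/4', '6']
```

The spacing is a constant 5/4 frame, so the resampled sequence stays linear in time. The remaining question, handled by the windows below, is how to derive a spectrum at a fractional position.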

The window widths w1 and w2 are each set to about one frame of resampled data. The window can also be made wider, as shown by w2, so that the windows of neighboring samples overlap.

Fig. 5 illustrates the method of calculating the resampled spectrum data in the resampling section.

The window length W satisfies Equation (2):

$$W = t_1 + t_2 \qquad (2)$$

Here t_1 is the part of the window W of resampled frame m that lies in the k-th frame of the input speech data, and t_2 is the part that lies in the (k+1)-th frame.

The resampled spectrum data v(m) is calculated by Equation (3):

$$v(m) = \frac{t_1\,u(k) + t_2\,u(k+1)}{W} \qquad (3)$$

In Equation (3), u(k) is the input speech spectrum data of the k-th frame, u(k+1) is the input speech spectrum data of the (k+1)-th frame, and v(m) is the m-th resampled spectrum data.
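The two-frame weighted average v(m) = (t1·u(k) + t2·u(k+1))/W can be sketched as follows. Several conventions here are assumptions made for the sketch: frame k (0-based) spans the time interval [k, k + 1), and the window [start, start + W) is at most one frame long, so it touches at most frames k and k + 1.

```python
import math
from typing import Sequence


def resample_two_frame(
    u: Sequence[Sequence[float]], start: float, width: float
) -> list[float]:
    """Weighted average v = (t1 * u(k) + t2 * u(k+1)) / W.

    Assumptions: u[k] is the spectrum of frame k (0-based), frame k spans
    [k, k + 1), and 0 < width <= 1, so the window [start, start + width)
    touches at most frames k and k + 1.
    """
    k = math.floor(start)
    t2 = max(0.0, start + width - (k + 1))  # part of W inside frame k+1
    t1 = width - t2                          # part of W inside frame k
    if t2 == 0.0:
        return list(u[k])  # window lies entirely inside frame k
    # Per-channel multiply and divide: the hardware cost the text wants to avoid.
    return [(t1 * a + t2 * b) / width for a, b in zip(u[k], u[k + 1])]


u = [[0.0, 4.0], [8.0, 0.0]]
print(resample_two_frame(u, 0.25, 1.0))  # t1 = 0.75, t2 = 0.25 -> [2.0, 3.0]
```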

With the method of Fig. 5, computing v(m) requires multiplication and division, which complicates a circuit implementation. To simplify the computation, each input speech frame is therefore divided into a fixed number of parts, so that t_1 and t_2 become integers.

Fig. 6 illustrates how resampled data is calculated by the intra-frame equal-division method. The example shown divides each frame into four parts and uses a resampled-data window length of four.
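With each frame split into a fixed number of subframes, t1 and t2 reduce to subframe counts. The sketch below computes them from the window's starting subframe. It assumes subframes are numbered globally from 0 and that the window is no longer than one frame. For four divisions and a window of four, a window starting one subframe into a frame gives t1 = 3 and t2 = 1, matching the 3 : 1 ratio stated for the Fig. 6 example. The function and parameter names are hypothetical.

```python
def integer_weights(p: int, divisions: int = 4, window: int = 4) -> tuple[int, int, int]:
    """Integer t1, t2 after splitting each frame into `divisions` subframes.

    p is the 0-based global subframe index where the window starts; the
    window is `window` subframes long (window <= divisions assumed, so it
    covers at most two frames). Returns (k, t1, t2): the first frame index
    k and the number of window subframes falling in frames k and k + 1.
    """
    k, offset = divmod(p, divisions)
    t1 = min(window, divisions - offset)
    return k, t1, window - t1


print(integer_weights(1))  # (0, 3, 1): v = (3 u(k) + u(k+1)) / 4
```

Because t1 + t2 is always the fixed window length, the division by W becomes a division by a constant (4 here), which is simple to realize in hardware.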

In this case the ratio is t_1 : t_2 = 3 : 1, and the resampled spectrum data is calculated by Equation (4):

$$v(m) = \frac{3\,u(k) + u(k+1)}{4} \qquad (4)$$

Fig. 7 is a block diagram showing one embodiment of the present invention.

In Fig. 7, 100 is an input terminal, 200 is a frequency analysis section, 300 is a spectrum conversion section, and 400 is a speech interval determination section. 500 is a resampling section. It consists of an accumulator 501, a spectrum addition circuit 502, a division circuit 503, an input frame count calculation circuit 504, a P1 calculation circuit 505, an integerization circuit 506, an addition circuit 507, a division circuit 508, an integerization circuit 509, a resample step counter 510, an in-sample counter 511, and a resample control section 512. 600 is a matching calculation section, 700 is a standard pattern memory section, 800 is a judgment section, and 900 is a recognition result output terminal.

In this configuration, the speech signal entering at input terminal 100 is fed to the frequency analysis section 200. There it is frequency-analyzed into quantized signals corresponding to a plurality of frequency bands and sent to the spectrum conversion section 300.

The spectrum conversion section 300 converts this data into spectrum information, speech power information, and similar data. The result is sent to the speech interval determination section 400 and the resampling section 500.

The speech interval determination section 400 uses the speech power information to determine the start and end of the speech interval. It sends them to the resampling section 500 and the matching calculation section 600.

The spectral data u(i, j) sent to the resampling section 500 is resampled by the method described below into resampled data v(i, j), which is sent to the matching calculation section 600. The matching calculation section 600 calculates the distance to each standard pattern stored in the standard pattern memory 700 and sends the results to the judgment section 800. The judgment section 800 compares the total distance values and outputs, from output terminal 900, the category name with the smallest total distance as the recognition result.

The operation of the resampling section 500 of the present invention is described next.

The resampling method described so far represents the ideal resampling conditions.

An embodiment that implements this method as a circuit is described below.

For the circuit implementation, the calculation of the resampled data is modified slightly to simplify the arithmetic. The case of four divisions per frame and a window length of four is described.

Let the input speech data be u(i, j) and the resampled data be v(i, j). v(i, j) is calculated using the following equations.

$$P_1 = \left[\frac{4\,(n-1)\,(j-1)}{31}\right] \qquad (6)$$

$$P_2 = P_1 + 3 \qquad (7)$$

Here [ ] denotes conversion of the value to an integer, and n is the total number of frames in the extracted input speech data. P denotes the range of the window section of the resampled data in the subframe numbering system, in which each input frame is divided into four subframes. P1 is the starting subframe number of P and P2 is the ending subframe number. Because the window is four subframes long, P1 and P2 are related by Equation (7).
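Equations (6) and (7) can be checked with a short sketch. It assumes that the integerization [ ] is truncation and that subframes are numbered from 0, so that frame k (1-based) holds subframes 4(k − 1) to 4(k − 1) + 3. That numbering is what makes j = 32 land on the first subframe of the last frame. The names are hypothetical.

```python
DIVISIONS = 4     # subframes per input frame (embodiment value)
N_RESAMPLES = 32  # resampled frames per utterance (embodiment value)


def window_bounds(n: int, j: int) -> tuple[int, int]:
    """(P1, P2) for resample j (1..32) per Equations (6) and (7).

    [.] is ASSUMED to be truncation; integer floor division keeps it exact.
    """
    p1 = DIVISIONS * (n - 1) * (j - 1) // (N_RESAMPLES - 1)  # Eq. (6)
    return p1, p1 + 3                                         # Eq. (7)


def frame_of(p: int) -> int:
    """1-based input frame number containing subframe p (assumed numbering)."""
    return p // DIVISIONS + 1


print(window_bounds(10, 1), window_bounds(10, 32))  # (0, 3) (36, 39)
print(frame_of(36), frame_of(39))                     # 10 10
```

Using integer floor division rather than floating-point arithmetic mirrors the circuit's integer-only datapath and avoids rounding at the endpoints.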

Fig. 8 shows the relationship between frame numbers and subframe numbers.

Equation (6) gives the value of P1 for resample number j, and the result is converted to an integer. Given the number of input speech data frames n, the endpoints are:

$$j = 1:\quad P_1 = \left[\frac{4\,(n-1)\cdot 0}{31}\right] = 0$$

$$j = 32:\quad P_1 = \left[\frac{4\,(n-1)\cdot 31}{31}\right] = 4\,(n-1)$$

In this case, the first resampled sample v (j = 1) therefore points to the first subframe of the first input frame. The final, 32nd sample v (j = 32) points to the first subframe of the last (n-th) input frame.

Once each resampling interval has been specified as above, the resampled spectrum data sequence given by Equation (5) is computed.

That is, in Equation (5), for each resampling interval P1 to P2, the frame region of the input speech data to which each subframe belongs is determined. The corresponding input speech spectrum data is added four times, and the sum is divided to average the spectrum. This produces the averaged resampled spectrum data sequence that is the object of the present invention. Fig. 7 shows the circuit that performs this processing.
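The complete averaging step can be sketched in software. For each of the 32 resamples, it takes the four subframes P1 to P2, looks up the spectrum of the frame each subframe belongs to, and averages the four spectra. The same assumptions as above apply: [ ] is truncation, and subframes are numbered from 0. This is an illustrative model of the computation, not the circuit of Fig. 7. Equation (5) itself does not appear in this text.

```python
from typing import Sequence

DIVISIONS = 4     # subframes per input frame (embodiment value)
WINDOW = 4        # window length in subframes (embodiment value)
N_RESAMPLES = 32  # resampled frames per utterance (embodiment value)


def resample_average(u: Sequence[Sequence[float]]) -> list[list[float]]:
    """Average-of-subframes resampling following Equations (6) and (7).

    u[k] is the spectrum of input frame k + 1 (0-based list). Every subframe
    carries the spectrum of the frame it belongs to.
    """
    n = len(u)
    channels = len(u[0])
    v = []
    for j in range(1, N_RESAMPLES + 1):
        p1 = DIVISIONS * (n - 1) * (j - 1) // (N_RESAMPLES - 1)  # Eq. (6)
        p2 = p1 + WINDOW - 1                                      # Eq. (7)
        acc = [0.0] * channels
        for p in range(p1, p2 + 1):
            frame = u[p // DIVISIONS]  # frame that owns subframe p
            acc = [a + b for a, b in zip(acc, frame)]
        v.append([a / WINDOW for a in acc])  # spectral average over the window
    return v


v = resample_average([[0.0], [4.0]])
print(v[0], v[8], v[31])  # [0.0] [1.0] [4.0]
```

In the two-frame example, resample 9 straddles the frame boundary with three subframes in frame 1 and one in frame 2. That gives the 3 : 1 weighting of Equation (4), implemented with additions and one constant division.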

The spectral data u(i, j) from the spectrum conversion section 300, already converted into spectrum information, speech power information, and similar data, is sent to the resampling section 500 and stored in the accumulator 501.

The spectrum-converted speech data stored in the accumulator 501 is sent to the spectrum addition circuit 502, which adds the input speech spectrum data. The division circuit 503 then divides the sum to average the spectrum and sends the resampled data v(i, j) to the matching calculation section 600.

Meanwhile, once the speech interval determination section 400 has used the speech power information to determine the start and end of the speech interval, that data is sent to the input frame count calculation circuit 504. Circuit 504 calculates the number of input frames n and sends it to the P1 calculation circuit 505. Circuit 505 calculates P1, and the integerization circuit 506 converts it to an integer according to Equation (6). The addition circuit 507 then obtains the P values in sequence according to Equation (7). The division circuit 508 computes the corresponding input speech frame numbers and sends them to the accumulator 501 and the spectrum addition circuit 502.

Next, driven by the count of the in-sample counter 511, the addition circuit repeatedly adds 1 to the P value to obtain the four values from P1 to P2 in turn. Each value is converted to an integer and sent to the accumulator 501 and the spectrum addition circuit 502.

The spectrum addition circuit 502 reads from the accumulator 501 the spectral data for the four subframes obtained above and adds them. The division circuit 503 then averages the sum, producing one resampled data item.

When the resampling start signal from the speech interval determination section reaches the resample control section 512, the resampling section 500 uses the resample step counter 510 to count from start frame 1 to end frame 32. It creates the 32 resampled data items in sequence.

When the resample step counter 510 reaches the required count, the resample control section 512 sends a matching start signal to the matching calculation section 600.

(Effects of the Invention) As explained above, the configuration of the present invention prevents the resampled data from losing temporal linearity with respect to the input speech data. At the same time, in word recognition, characteristic spectral transitions within words are reflected in the resampled results. This reduces misrecognition during matching and improves recognition performance. In experiments with this method, compared with the conventional resampling method, the recognition rate improved. The distance between the first- and second-ranked candidates also increased, indicating more stable recognition.

[Brief Description of the Drawings]

Fig. 1 is a block diagram of a conventional speech recognition device. Fig. 2 shows the correspondence between input speech data and resampled data in the linear matching method. Fig. 3 shows an example of the resampling result. Fig. 4 shows the relationship between the input data, the resampled data of the present invention, and their windows. Fig. 5 illustrates the method of calculating resampled spectrum data. Fig. 6 illustrates the calculation of resampled data by the intra-frame equal-division method. Fig. 7 is a block diagram showing one embodiment of the present invention. Fig. 8 shows the relationship between frame numbers and subframe numbers.

100: input terminal; 200: frequency analysis section; 300: spectrum conversion section; 400: speech interval determination section; 500: resampling section; 501: accumulator; 502: spectrum addition circuit; 503: division circuit; 504: input frame count calculation circuit; 505: P1 calculation circuit; 506: integerization circuit; 507: addition circuit; 508: division circuit; 509: integerization circuit; 510: resample step counter; 511: in-sample counter; 512: resample control section; 600: matching calculation section; 700: standard pattern memory section; 800: judgment section; 900: recognition result output terminal.

Patent applicant: Oki Electric Industry Co., Ltd.

Procedural amendment (voluntary), Showa 59 (1984), addressed to the Commissioner of the Japan Patent Office. 1. Indication of the case: Showa 58 (1983) Patent Application No. 236345. 2. Title of the invention: Speech recognition method. 3. Person making the amendment: relationship to the case, patent applicant.

Claims

[Claims] A speech recognition method for a speech recognition device that frequency-analyzes input speech, normalizes its spectral characteristics, performs time normalization by resampling, matches the result against standard patterns, and judges the category with the highest similarity to be the category of the input speech, wherein the resampling is performed by: (a) dividing each original frame of the input speech data into a plurality of subframes that carry the same spectral characteristics as the original frame; (b) extracting the subframe information series corresponding to the whole frame information series of the input speech; (c) dividing the subframe information series at equal intervals into a fixed number of parts corresponding to the number of resamples; and (d) determining the starting subframe and ending subframe of each resample according to a window length corresponding to a set time length of the spectral data to be averaged as resampled data, and calculating the average of the spectral characteristics carried by all subframes from the starting subframe to the ending subframe.