JPS6073600A

JPS6073600A - Voice recognition equipment

Info

Publication number: JPS6073600A
Application number: JP58181135A
Authority: JP
Inventors: 和弘金子; 角石　光夫; 誠治加藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-09-29
Filing date: 1983-09-29
Publication date: 1985-04-25

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

(a) Technical Field of the Invention

The present invention relates to a speech recognition device, and more particularly to a speech recognition device capable of extracting the feature parameters of an input speech signal efficiently and accurately.

(b) Prior Art and Problems

A speech recognition device comprises, for example as shown in FIG. 1, a feature extraction section 2 that extracts the features of a speech signal input from a microphone 1 or the like; a speech pattern memory 5 in which the features obtained from the extraction section 2 are registered as parameters; and a recognition section 4 that recognizes input speech on the basis of the feature parameters stored in the memory 5 and outputs the recognition result.

The feature extraction section 2 can be built either from a plurality of band-pass filters, each with a different center frequency, or from a correlator used in place of those band-pass filters. The former method risks increasing the circuit scale or the amount of computation, and is therefore somewhat disadvantageous compared with the latter.

The latter method, on the other hand, is advantageous in terms of circuit scale and amount of computation, but its output characteristic may change when the correlation time is lengthened. Specifically, let Y be the autocorrelation function, sin(2πft + θ) the input speech waveform, sin 2πf₀t and cos 2πf₀t the reference functions, and ΔT the sampling interval of the waveform. Y is then defined by the following equation.

The correlation time (frame width) is T = ΔT × N.
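
Equation (1) itself does not survive in this text. As a sketch only, a quadrature correlator built from sine and cosine reference functions followed by squaring and addition is commonly written as below. The index $i$, the summation limits, and the placement of the sums inside the squares are assumptions made for illustration, not the formula confirmed by the publication.

```latex
Y = \left[\sum_{i=0}^{N-1} \sin(2\pi f \Delta T\, i + \theta)\,\sin(2\pi f_0 \Delta T\, i)\right]^2
  + \left[\sum_{i=0}^{N-1} \sin(2\pi f \Delta T\, i + \theta)\,\cos(2\pi f_0 \Delta T\, i)\right]^2
```

Under this assumed form, Y peaks when the input frequency $f$ equals the reference frequency $f_0$, and the width of the peak is set by the frame width $T = \Delta T \times N$.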

FIG. 3 shows the characteristic of equation (1) above. That is, the speech waveform is converted with this characteristic to obtain the feature parameters. Lengthening the correlation time makes the converted feature parameters more accurate. However, when the correlation time N is extended to 2N, 3N, ..., mN, the output characteristic changes as shown in FIG. 4, which is a drawback.
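
The dependence of the output characteristic on the correlation time can be illustrated numerically. The following is a minimal sketch, assuming a standard quadrature correlator (sum of products, then square, then add). The sampling rate, reference frequency, frame length, and all function names are hypothetical values chosen for illustration and do not come from the publication.

```python
import math

def unit_correlation(f, f0, dt, n, theta=0.0, start=0):
    """Correlate sin(2*pi*f*t + theta) with sine and cosine references at f0
    over n samples, then square and add the two branch sums (assumed form)."""
    s = c = 0.0
    for i in range(start, start + n):
        x = math.sin(2 * math.pi * f * dt * i + theta)
        s += x * math.sin(2 * math.pi * f0 * dt * i)
        c += x * math.cos(2 * math.pi * f0 * dt * i)
    return s * s + c * c

def relative_response(f, f0, dt, n):
    """Response at input frequency f, normalized by the response at f0."""
    return unit_correlation(f, f0, dt, n) / unit_correlation(f0, f0, dt, n)

DT = 1.0 / 8000.0   # assumed sampling interval (8 kHz)
F0 = 1000.0         # assumed reference frequency in Hz
N = 80              # assumed unit frame length in samples

# Response to a tone 50 Hz away from F0 as the frame grows from N to 2N to 4N.
for frames in (1, 2, 4):
    print(frames, relative_response(1050.0, F0, DT, frames * N))
```

With these assumed values the response to the off-center tone drops sharply once the frame is lengthened, which is the change in characteristic that the passage describes.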

(c) Object of the Invention

The object of the present invention is to provide a speech recognition device which, in the method that uses a correlator in the feature extraction section, provides a correlation function that converts the input speech signal into accurate feature parameters without any change in the output characteristic even when the correlation time is changed, and which can thereby improve speech recognition performance.

(d) Structure of the Invention

To achieve the above object, the present invention is configured to repeat the unit correlation of equation (1) above (corresponding to a duration of ΔT × N) a plurality of times, and to perform feature extraction using the result of cumulatively adding the individual correlation results.

The present invention is described in detail below with reference to an embodiment.

(e) Embodiment of the Invention

FIG. 2 is a block diagram showing the configuration of one embodiment of the present invention; it shows the configuration of the correlator in the extraction section 2 of FIG. 1. In the figure, 21 and 22 denote multiplication sections, 23 and 24 squaring sections, 25 an addition section, and 26 an accumulation section. Ys(ΔT) and Yc(ΔT) denote the reference functions sin 2πf₀ΔT and cos 2πf₀ΔT of equation (1) above, respectively.

That is, in this embodiment, the multiplication sections 21 and 22 multiply the input speech signal IN by the reference functions at the sampling period ΔT, and the squaring sections 23 and 24 square the respective products. This yields the unit-frame correlation Y′. The unit correlation output Y′ is successively added and accumulated by the accumulation section 26. T is a delay circuit that buffers the accumulated result until the correlation output of the next frame is obtained.
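
The data flow through the correlator can be sketched in software as follows. This is an illustrative model, not the circuit of FIG. 2: it assumes that each branch sums its products over the unit frame before squaring, and the assignment of the sine reference to section 21 and the cosine reference to section 22 is a guess. All function and parameter names are hypothetical.

```python
import math

def unit_frame_correlation(samples, f0, dt, start_index):
    """Unit-frame correlation Y' for one frame of input samples."""
    s = c = 0.0
    for k, x in enumerate(samples):
        phase = 2 * math.pi * f0 * dt * (start_index + k)
        s += x * math.sin(phase)   # multiplication section 21 (assumed sine branch)
        c += x * math.cos(phase)   # multiplication section 22 (assumed cosine branch)
    return s * s + c * c           # squaring sections 23, 24 and addition section 25

def accumulated_correlation(signal, f0, dt, n, frames):
    """Accumulate Y' over consecutive frames, as accumulation section 26 does."""
    total = 0.0                    # running sum held by the delay element T
    for j in range(frames):
        frame = signal[j * n:(j + 1) * n]
        total += unit_frame_correlation(frame, f0, dt, j * n)
    return total

# Usage: a pure tone at the reference frequency, three unit frames of 80 samples.
dt, f0, n = 1.0 / 8000.0, 1000.0, 80
tone = [math.sin(2 * math.pi * f0 * dt * i) for i in range(3 * n)]
out = accumulated_correlation(tone, f0, dt, n, 3)
```

Keeping the reference phase continuous across frames (via `start_index`) is a design choice of this sketch; the text does not say whether the reference functions restart at each frame.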

The result of performing the unit-frame correlation (a duration of ΔT × N) a plurality of times in this way (for example, m times) is supplied, as the final output OUT of the correlation function Y, to a feature parameter conversion section (not shown). Expressed in correspondence with equation (1) above, this output OUT is built from the products

$$Y_1 = \sin(2\pi f_0 \Delta T\, i)\,\sin(2\pi f \Delta T\, i + \theta)$$

$$Y_2 = \cos(2\pi f_0 \Delta T\, i)\,\sin(2\pi f \Delta T\, i + \theta)$$

Therefore, by averaging the correlation output over the number of repetitions (m + 1), for example, an output characteristic similar to that shown in FIG. 3 can be obtained.
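
The effect of averaging can be checked numerically. The sketch below, again assuming a standard quadrature correlator and hypothetical parameter values, compares three quantities for a tone 50 Hz away from the reference frequency: the single unit-frame response, the average of four accumulated unit-frame responses, and the response of one frame four times as long.

```python
import cmath
import math

def frame_power(f, f0, dt, start, n, theta=0.0):
    """Squared magnitude of the quadrature correlation over one frame.
    The real part is the cosine branch and the imaginary part the sine branch."""
    acc = 0j
    for i in range(start, start + n):
        x = math.sin(2 * math.pi * f * dt * i + theta)
        acc += x * cmath.exp(2j * math.pi * f0 * dt * i)
    return abs(acc) ** 2

def averaged_output(f, f0, dt, n, frames):
    """Accumulate unit-frame correlations over consecutive frames, then average."""
    total = sum(frame_power(f, f0, dt, j * n, n) for j in range(frames))
    return total / frames

DT, F0, N = 1.0 / 8000.0, 1000.0, 80   # assumed sampling interval, reference, frame length
peak = frame_power(F0, F0, DT, 0, N)   # unit-frame response at the reference frequency

single = frame_power(1050.0, F0, DT, 0, N) / peak
averaged = averaged_output(1050.0, F0, DT, N, 4) / peak
long_frame = frame_power(1050.0, F0, DT, 0, 4 * N) / frame_power(F0, F0, DT, 0, 4 * N)
print(single, averaged, long_frame)
```

Under these assumptions the averaged value stays close to the single-frame value, whereas the single long frame gives a markedly different response, consistent with the behavior the text attributes to FIG. 3 and FIG. 4.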

Moreover, the correlation effectively spans a wide interval in time, so the feature parameters can be extracted accurately.

(f) Effects of the Invention

As described in detail above, according to the present invention the correlation time can be made substantially longer without changing the output characteristic of the correlator in the feature extraction section. The speech waveform can therefore be converted into more accurate feature parameters, and the recognition rate can in turn be greatly improved.

[Brief explanation of drawings]

FIG. 1 shows an example configuration of a speech recognition device, FIG. 2 shows one embodiment of the present invention, and FIGS. 3 and 4 show output characteristics of the correlator. Reference numeral 2 denotes the feature extraction section, 21 and 22 the multiplication sections, and 26 the accumulation section. In FIGS. 3 and 4, the horizontal axis represents frequency and the vertical axis represents attenuation.

Claims

[Claims]

A speech recognition device comprising a feature extraction section that extracts features of a speech signal waveform by taking a correlation, characterized in that the feature extraction section includes a correlation section that takes the correlation between each of temporally different waveform portions of the input speech signal and a reference signal, and in that the calculation results obtained from the correlation section are summed.