JP2507311B2 - Voice analyzer

Info

Publication number: JP2507311B2
Application number: JP61017157A
Authority: JP
Inventors: 久夫石塚; 雄一郎池田
Original assignee: NIPPON DENKI AISHII MAIKON SHISUTEMU KK; Nippon Electric Co Ltd
Current assignee: NIPPON DENKI AISHII MAIKON SHISUTEMU KK; NEC Corp
Priority date: 1986-01-28
Filing date: 1986-01-28
Publication date: 1996-06-12
Anticipated expiration: 2011-06-12
Also published as: JPS62174799A

Description

1) Field of the Invention
The present invention relates to speech recognition and speech analysis by digital processing.

2) Prior Art
In digital speech recognition and speech analysis, the speech signal carries a huge amount of information. It is therefore generally transformed with an orthogonal transform, such as the Fourier transform or the Walsh transform, to compress the amount of information and extract feature quantities.

When the Fourier or Walsh transform is used, it is common to apply a time window, chosen as a compromise between frequency resolution and how quickly the features vary over time, and to extract and process one segment of the speech, which is time-series data.

As a compromise between the sampling interval of the speech signal and ease of processing, this time window is often about 32 msec (see, for example, Yasunaga Niimi, "Speech Recognition" (「音声認識」), Kyoritsu Shuppan; and Shuzo Saito et al., "Fundamentals of Speech Information Processing" (「音声情報処理の基礎」), Ohmsha).

For example, at a sampling interval of 0.125 msec (8 kHz), this window length corresponds to 256 samples, so 256 storage locations are needed to hold the data.

Furthermore, to track changes in the speech features over time more closely, it is common to compute the windows with partial overlap; for example, consecutive 32 msec windows typically overlap by 16 msec. The time interval at which processed frames are produced is called the frame rate.

A processing device that meets these conditions needs at least 384 storage locations and must be fast enough to finish all processing within 16 msec. It has therefore not been possible until now to build one with an inexpensive, general-purpose single-chip microprocessor, which has little memory and is relatively slow.

3) Object of the Invention
The object of the present invention is to provide a speech analysis device that can be built at low cost by reducing the storage and the amount of computation, which are the drawbacks noted above.

4) Configuration of the Invention
The invention is a speech analysis device that performs real-time processing and is characterized by comprising: a speech analysis unit whose time window is shorter than the conventional one, so that it needs less storage and less processing; a buffer unit that holds the analysis result; one or more adders; buffer units, each paired with an adder, that hold the addition results; a control unit that controls the adders and buffer units; an adder that adds the contents of those buffer units; and a buffer unit that holds the resulting sum.

5) Effect of the Invention
With the present invention, a speech analysis device that conventionally required a large storage capacity and a large amount of processing can be realized with a small storage capacity and a small amount of processing while obtaining an equivalent amount of information.

For example, in the case above (analysis window 32 msec, frame rate 16 msec, sampling interval 0.125 msec), the storage requirement is 384 locations, and if an FFT is used, the processing requires 1024 butterfly operations.

If the speech analysis unit instead uses an FFT with an 8 msec time window, the storage requirement is 64 locations and the processing is 192 butterfly operations, reducing storage to 1/6 and processing to about 1/5.
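The two ratios can be checked with exact fractions. This sketch reuses the (N/2)·log2 N butterfly formula; the variable names are illustrative.

```python
from fractions import Fraction


def butterflies(n: int) -> int:
    """Radix-2 FFT butterfly count, (n/2) * log2(n); n must be a power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return (n // 2) * (n.bit_length() - 1)


conventional_storage, conventional_work = 384, butterflies(256)  # 32 msec window
proposed_storage, proposed_work = 64, butterflies(64)            # 8 msec window

print(conventional_work, proposed_work)                  # 1024 192
print(Fraction(proposed_storage, conventional_storage))  # 1/6
print(Fraction(proposed_work, conventional_work))        # 3/16, about 1/5.3
```

Like the text, this compares one FFT of each size. In the embodiment the 8 msec analysis runs once every 8 msec, that is, twice per 16 msec frame, so the per-frame count is 2 × 192 = 384 butterflies, still 3/8 of the conventional 1024.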

6) Embodiment
An embodiment of the speech analysis device of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram of the main parts of a speech analysis device that uses a speech analysis unit with an 8 msec time window and is equivalent to an analyzer with a 32 msec analysis window and a 16 msec frame rate.

The speech analysis unit may use any known analysis method.

In FIG. 1, speech analysis unit 1 analyzes 8 msec of speech data by a known method and writes the result into analysis buffer 2. Each time it completes one analysis, speech analysis unit 1 also sends a signal to controller 6.

On each signal from speech analysis unit 1, controller 6 steps changeover switch 3 through contact 12 → contact 13 → contact 14 → contact 15 → contact 12, and so on. When contact 12 is connected, the contents of analysis buffer 2 are sent to buffer 7. When contact 13 is connected, adder 4 adds the contents of analysis buffer 2 to those of buffer 7 and stores the sum in buffer 7.

When contact 14 is connected, the contents of analysis buffer 2 are sent to buffer 8. When contact 15 is connected, adder 5 adds the contents of analysis buffer 2 to those of buffer 8 and stores the sum in buffer 8.

Thus, whenever contact 13 or contact 15 has just been connected, buffers 7 and 8 each hold 16 msec worth of analysis results (32 msec in total).
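The switching described above is a four-state cycle. The following sketch models controller 6, switch 3, adders 4 and 5, and buffers 7 and 8 as a small Python class; the class and method names are hypothetical, and analysis results are modeled as plain lists of numbers.

```python
from typing import List, Optional

Vector = List[float]


def vadd(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


class SwitchController:
    """Steps changeover switch 3 through contacts 12 -> 13 -> 14 -> 15 -> 12 ...
    once per 'analysis finished' signal."""
    CONTACTS = (12, 13, 14, 15)

    def __init__(self) -> None:
        self.step = 0
        self.buffer7: Optional[Vector] = None
        self.buffer8: Optional[Vector] = None

    def on_analysis(self, analysis_buffer: Vector) -> int:
        contact = self.CONTACTS[self.step]
        if contact == 12:    # copy analysis buffer 2 into buffer 7
            self.buffer7 = list(analysis_buffer)
        elif contact == 13:  # adder 4: buffer 7 += analysis buffer 2
            assert self.buffer7 is not None
            self.buffer7 = vadd(self.buffer7, analysis_buffer)
        elif contact == 14:  # copy analysis buffer 2 into buffer 8
            self.buffer8 = list(analysis_buffer)
        else:                # contact 15, adder 5: buffer 8 += analysis buffer 2
            assert self.buffer8 is not None
            self.buffer8 = vadd(self.buffer8, analysis_buffer)
        self.step = (self.step + 1) % len(self.CONTACTS)
        return contact


ctrl = SwitchController()
for value in ([1.0], [2.0], [3.0], [4.0]):
    ctrl.on_analysis(value)
print(ctrl.buffer7, ctrl.buffer8)  # [3.0] [7.0]
```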

Only when contact 13 or contact 15 is connected does controller 6 send a signal to adder 9, which adds the contents of buffers 7 and 8 and sends the result to divider 10.

In this case the divider averages four analysis results, so it divides by 4.

Because division by 4 can also be performed as a 2-bit shift, divider 10 can be replaced by a shifter.
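Adder 9 and the shift-based divider can be sketched in a few lines, assuming the analysis values are integers (fixed-point), which the text does not state. In Python, `>> 2` on an integer equals floor division by 4, so for non-negative values such as power spectra it matches ordinary integer division.

```python
from typing import List


def average_of_four(buffer7: List[int], buffer8: List[int]) -> List[int]:
    """Adder 9 followed by divider 10, with the division by 4 done as a
    2-bit right shift (assumes integer, fixed-point analysis values)."""
    return [(a + b) >> 2 for a, b in zip(buffer7, buffer8)]


print(average_of_four([10, 3], [6, 1]))  # [4, 1]
```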

The quotient is sent to averaging buffer 11, which outputs its contents as the analysis result.

FIG. 2 shows the timing of this sequence of operations.

In FIG. 2, timing 1 marks the instants at which speech analysis unit 1 of FIG. 1 outputs an analysis result.

T is 8 msec in the example above.

Timing 2 marks when contacts 12 and 14 of FIG. 1 are connected, and timing 3 marks when contacts 13 and 15 are connected, adder 9 operates, and the analysis result is output.
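The timing relationship can be traced with a short simulation that records, for each output, its time and which analysis results were averaged. Two assumptions not stated in the text: analysis i (counting from 0) finishes at (i + 1)·T, and no output is produced at start-up until both buffers hold two results.

```python
from typing import List, Tuple

T_MS = 8  # time window T of the speech analysis unit


def timeline(num_analyses: int) -> List[Tuple[int, int, List[int]]]:
    """Return (time_ms, contact, indices of averaged analyses) for each output."""
    contacts = (12, 13, 14, 15)
    buf7: List[int] = []
    buf8: List[int] = []
    events = []
    for i in range(num_analyses):
        contact = contacts[i % 4]
        if contact == 12:
            buf7 = [i]
        elif contact == 13:
            buf7 = buf7 + [i]
        elif contact == 14:
            buf8 = [i]
        else:
            buf8 = buf8 + [i]
        # Adder 9 fires only at contacts 13 and 15 (timing 3), once both buffers are full.
        if contact in (13, 15) and len(buf7) == 2 and len(buf8) == 2:
            events.append(((i + 1) * T_MS, contact, sorted(buf7 + buf8)))
    return events


for event in timeline(8):
    print(event)
# (32, 15, [0, 1, 2, 3])
# (48, 13, [2, 3, 4, 5])
# (64, 15, [4, 5, 6, 7])
```

Outputs appear every 2T = 16 msec, and each covers four consecutive 8 msec analyses, i.e. a 4T = 32 msec window.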

The embodiment above shows an analyzer with a time window of 4T and a frame rate of 2T, built from a speech analysis unit with time window T. Clearly, by choosing an arbitrary number of contacts for changeover switch 3 and an arbitrary number of buffer-adder pairs in FIG. 1, other configurations are possible, for example a time window of 6T and a frame rate of 3T.
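One way to read the generalization, assumed here rather than spelled out in the source, is `num_buffers` buffer-adder pairs, each accumulating `per_buffer` consecutive results in turn. That gives a window of num_buffers·per_buffer·T and a frame rate of per_buffer·T. The function name and parameters are hypothetical.

```python
from typing import List, Sequence


def sliding_average(results: Sequence[Sequence[float]], per_buffer: int,
                    num_buffers: int = 2) -> List[List[float]]:
    """Generalized Fig. 1: each time a buffer completes its `per_buffer`
    results, all buffers are summed and divided by the total result count."""
    buffers: List[List[float]] = [[] for _ in range(num_buffers)]
    counts = [0] * num_buffers
    total = per_buffer * num_buffers
    outputs: List[List[float]] = []
    for i, r in enumerate(results):
        b = (i // per_buffer) % num_buffers  # buffer-adder pair selected by the switch
        if i % per_buffer == 0:              # first contact of the pair: overwrite
            buffers[b], counts[b] = list(r), 1
        else:                                # later contacts: accumulate via the adder
            buffers[b] = [x + y for x, y in zip(buffers[b], r)]
            counts[b] += 1
        if all(c == per_buffer for c in counts):  # true only when a buffer just completed
            summed = [sum(col) for col in zip(*buffers)]
            outputs.append([s / total for s in summed])
    return outputs


results = [[float(i)] for i in range(12)]
print(sliding_average(results, per_buffer=2))  # 4T window, 2T frame rate
print(sliding_average(results, per_buffer=3))  # 6T window, 3T frame rate
```

With six results per average, the final division is by 6, so in hardware divider 10 could no longer be replaced by a simple shifter.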

As described above, the speech analysis device of the present invention provides, with a small storage capacity and a small amount of processing, the same amount of information as a conventional speech analysis device with a large storage capacity and a large amount of processing.

[Brief description of drawings]

FIG. 1 is a block diagram of the main parts of an analyzer according to an embodiment of the present invention, and FIG. 2 is a timing diagram for FIG. 1.
1 ... speech analysis unit, 2 ... analysis buffer, 3 ... changeover switch, 4 ... adder, 5 ... adder, 6 ... controller, 7 ... buffer, 8 ... buffer, 9 ... adder, 10 ... divider or shifter, 11 ... averaging buffer.

Claims

(57) [Claims]

1. A speech analysis device that stores an analysis signal produced by a speech analysis unit in a buffer of a predetermined size and outputs it, characterized in that it uses buffers of smaller capacity than the predetermined size, adds the analysis signal held in such a smaller buffer to a newly input analysis signal, and then performs averaging.