JPS60254100A

JPS60254100A - Voice recognition system

Info

Publication number: JPS60254100A
Application number: JP59108668A
Authority: JP
Inventors: 広田　敦子; 裕飯塚
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1984-05-30
Filing date: 1984-05-30
Publication date: 1985-12-14
Also published as: JPH0424717B2

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Technical Field) The present invention relates to a speech recognition system, and in particular to voice section detection that detects voice sections with high accuracy.

(Background Art) FIG. 1 shows a block diagram of a conventional speech recognition device. In FIG. 1, 1 is a signal input terminal, 2 is a frequency analysis section, 3 is a voice capture control section, 4 is a capture start signal, 5 is a voice section detection section, 6 is a capture end signal, 7 is start/end information, 8 is a recognition section, and 9 is an output terminal. Each part is described below.

The frequency analysis section 2 is configured as shown in FIG. 2. The input voice signal 11 is amplified to an appropriate level by the preamplifier 12 and analyzed by a group of N bandpass filters 13, which divide the range from approximately 200 Hz [lower limit partly illegible in the source] to 6000 Hz at equal intervals on a logarithmic scale, a group of full-wave rectifiers 14, and a group of low-pass filters 15. The outputs are then quantized by the AD converter 17 while the multiplexer 16 is switched in sequence at every predetermined time period (hereinafter, the sample period), so that N analysis results 18 are output per sample period.

After receiving the capture start signal 4, the voice capture control section 3 outputs the analysis results 18 of the frequency analysis section 2 to the voice section detection section 5 and the recognition section 8 for a fixed time, or until it judges that the voice input has definitely ended. One way to judge the end of voice input uses the average of the N data values in each sample period (hereinafter, the frame power): once a certain number of frames whose frame power exceeds a preset threshold have occurred, the input is judged to have ended when a fixed number of consecutive frames that do not exceed the threshold follow.
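The end-of-input rule described above can be sketched as follows. This is a minimal illustration, not the device's circuit: the function name and the parameters `min_active` and `min_silent` are hypothetical stand-ins for the "certain number" and "fixed number" that the text leaves unspecified.

```python
from typing import Optional, Sequence


def find_input_end(frame_powers: Sequence[float], threshold: float,
                   min_active: int, min_silent: int) -> Optional[int]:
    """Return the index of the frame at which input is judged ended, or None.

    Assumption: 'active' frames need not be consecutive, while the
    below-threshold frames that close the input must be consecutive.
    """
    active = 0       # frames seen so far whose power exceeded the threshold
    silent_run = 0   # current run of consecutive frames at or below threshold
    for idx, power in enumerate(frame_powers):
        if power > threshold:
            active += 1
            silent_run = 0
        else:
            silent_run += 1
            if active >= min_active and silent_run >= min_silent:
                return idx
    return None
```

With `frame_powers = [0, 0, 5, 5, 5, 0, 0, 0]`, `threshold = 1`, `min_active = 3`, and `min_silent = 2`, the function returns 6, the second quiet frame after the burst.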

FIG. 3 shows a block diagram of the voice section detection section 5.

In FIG. 3, 18 is the analysis result, 21 is a parameter calculation section, 6 is the capture end signal, 22 is a blocking section, 23 is a voice section determination section, and 7 is the start/end information. Each is described in detail below.

The parameter calculation section 21 obtains, from the analysis results 18, the parameter defined by equation (1), which is used for voice section detection.

$$P_j = a_j \cdot \bar{x}_j \tag{1}$$

where $a_j$ is the spectral slope of the $j$-th analysis result and $\bar{x}_j$ is the mean of the $j$-th analysis result. The spectral slope $a_j$, that is, the slope of the least-squares fitted line, is obtained by equation (2), where $X_{ij}$ denotes the $j$-th set of N analysis results ($i$ is the number assigned to the N bandpass filters in order from the lowest frequency). [Equation (2) is not reproduced in the source text.]

[Text missing in the source] can be substituted, so that equation (2) is transformed into equation (3), [text missing in the source] can be obtained. [Equation (3) is not reproduced in the source text.]
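Since equations (2) and (3) did not survive in the text, the following is only a sketch of what they plausibly contain, assuming the ordinary least-squares slope over the channel index $i = 1, \ldots, N$. The constant name $c_2$ is an assumption; $c_1$ appears in the description of FIG. 4.

```latex
% Ordinary least-squares slope of X_{ij} against channel index i (assumed form of (2))
a_j = \frac{N \sum_{i=1}^{N} i\,X_{ij} - \sum_{i=1}^{N} i \,\sum_{i=1}^{N} X_{ij}}
           {N \sum_{i=1}^{N} i^{2} - \left(\sum_{i=1}^{N} i\right)^{2}}

% The sums over i alone depend only on N, so they can be precomputed as constants:
c_1 = \sum_{i=1}^{N} i = \frac{N(N+1)}{2}, \qquad
c_2 = N \sum_{i=1}^{N} i^{2} - \left(\sum_{i=1}^{N} i\right)^{2} = \frac{N^{2}(N^{2}-1)}{12}

% Assumed form of (3), matching the data flow of FIG. 4
a_j = \frac{N \sum_{i=1}^{N} i\,X_{ij} - c_1 \sum_{i=1}^{N} X_{ij}}{c_2}
```

This form agrees with the FIG. 4 description, in which one branch produces $-c_1 \sum X_{ij}$ and the other produces $N \sum i\,X_{ij}$ before the two are added.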

Also, $\bar{x}_j$ is obtained by dividing $\sum_{i=1}^{N} X_{ij}$ by N. FIG. 4 is a block diagram of the computation of $P_j$, which is described below with reference to the figure.

Assuming that the $j$-th set of N analysis results $X_{ij}$ ($i = 1, 2, \ldots, N$) is output in order, the adder 101 [text missing in the source; by the list of reference numerals, together with the register 102] accumulates them, and the result is output to the multiplier 103 and the divider 106.

[Text missing in the source; the multiplier 103 evidently multiplies by $c_1$], and the complementer 104 then produces $-c_1 \cdot \sum_{i=1}^{N} X_{ij}$, which is input to one side of the adder 105. Meanwhile, the multiplier 108 multiplies $X_{ij}$ by the output of the counter 107, which runs in synchronization with the data output of $X_{ij}$, giving $i \cdot X_{ij}$. The adder 109 connected to the output of the multiplier 108, and the register 110 connected to it in turn, produce $\sum_{i=1}^{N} i \cdot X_{ij}$. The output of the register 110, $\sum_{i=1}^{N} i \cdot X_{ij}$, is connected to one input of the multiplier 111, whose other input is set to N. The multiplier 111 computes $N \cdot \sum_{i=1}^{N} i \cdot X_{ij}$, which is input to the other side of the adder 105. The adder 105 [and, by the list of reference numerals, the divider 112; the text is garbled here] yields the spectral slope $a_j$ of the sample data, and the result becomes one input of the multiplier 113. The divider 106 obtains $\bar{x}_j$ by dividing $\sum_{i=1}^{N} X_{ij}$ by N, and the result becomes the other input of the multiplier 113. The multiplier 113 thus gives $P_j$ ($= a_j \cdot \bar{x}_j$). By performing the above calculation in every sample period, the value of P at every sample time can be computed.
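The data flow of FIG. 4 can be mirrored in a few lines. This is a sketch under the assumption that the slope is the ordinary least-squares slope over channel indices 1 to N; the function name and the constant `c2` are hypothetical, since the original equations are lost.

```python
from typing import Sequence


def p_parameter(x: Sequence[float]) -> float:
    """Compute P_j = a_j * mean for one sample period.

    x holds the N channel outputs X_1j .. X_Nj, lowest frequency first.
    """
    n = len(x)
    if n < 2:
        raise ValueError("a slope needs at least two channels")
    c1 = n * (n + 1) / 2            # sum of i for i = 1..N
    c2 = n * n * (n * n - 1) / 12   # N*sum(i^2) - (sum(i))^2 (assumed constant)
    sum_x = sum(x)                                           # adder 101 branch
    sum_ix = sum(i * xi for i, xi in enumerate(x, start=1))  # counter 107 branch
    slope = (n * sum_ix - c1 * sum_x) / c2                   # spectral slope a_j
    mean = sum_x / n                                         # divider 106
    return slope * mean                                      # multiplier 113
```

For `x = [1, 2, 3]` the slope is 1 and the mean is 2, so the result is 2; a flat spectrum such as `[5, 5, 5, 5]` gives 0, which illustrates drawback (2) listed later in the text.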

The blocking section 22 receives the results $P_j$ of the parameter calculation section 21 until it detects the capture end signal 6. After detecting the capture end signal 6, it performs blocking of the voice (detection of clusters of samples that appear to be voice). FIG. 5 shows its block diagram, and the description below follows FIG. 5.

The values $P_j$ for each sample period from the parameter calculation section 21 have been stored in sequence in the P parameter memory 200. They are read out in order and converted to absolute values by the absolute value circuit 201, and $|P_j|$ is input to one side of the comparator 202. The other input of the comparator 202 is set to the threshold $P_{TH}$ for $|P_j|$. The comparator 202 outputs a significant signal on its α output when $|P_j| \ge P_{TH}$ and on its β output when $|P_j| < P_{TH}$.

The counter 203 counts up when $|P_j| \ge P_{TH}$ and is cleared when $|P_j| < P_{TH}$, so it counts the number of consecutive samples for which $|P_j| \ge P_{TH}$. The output of the counter 203 is always set in the register 204. The value set in the register 204 (the number of consecutive samples with $|P_j| \ge P_{TH}$) is input to the comparator 205, whose other input is set to K. When the number of consecutive samples with $|P_j| \ge P_{TH}$ (hereinafter, the block length) is K or more, a significant signal is output on the output C of the comparator 205.

The AND circuit 206 captures the timing at which the block length is K or more (K is a natural number with K ≥ 2; the C signal is being output) and the β output of the comparator 202 ($|P_j| < P_{TH}$) appears. The counter 207 counts the number of $P_j$ values read out between one output of the AND circuit 206 and the next. The subtractor 208 subtracts the content of the register 204 (the block length) from the output of the counter 207, which gives the distance (time) between blocks. The counter 209 counts in synchronization with the readout of $P_j$, and the subtractor 210 subtracts the output of the register 204 (the block length) from the count of the counter 209, which gives the start of the block.

The adder 211 and the register 212 accumulate $|P_j|$ over the portion where $|P_j| \ge P_{TH}$, giving a quantity that represents the size of the block [its symbol is illegible in the source]. When the signal of the AND circuit 206 is detected, this quantity is set in the register 213, and at the same time the output of the register 213 (hereinafter, the block amount), the output of the subtractor 210 (block start information), the output of the register 204 (block length), and the output of the subtractor 208 (inter-block distance) are registered in the block table 214. In this way, blocking is carried out for all captured values of $P_j$.
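The conventional blocking procedure amounts to run-length detection over $|P_j|$. The sketch below is a software analogue of FIG. 5, not the circuit itself: the `Block` type and function name are hypothetical, and flushing a run that is still open when the data ends is an assumption, because the text only describes registration when a below-threshold sample appears.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Block:
    start: int     # index of the first sample in the block (subtractor 210)
    length: int    # consecutive samples with |P| >= p_th (register 204)
    amount: float  # accumulated |P| over the block (register 213)
    gap: int       # samples between the previous block and this one (subtractor 208)


def build_block_table(p: Sequence[float], p_th: float, k: int) -> List[Block]:
    table: List[Block] = []
    run_len = 0
    amount = 0.0
    prev_end = 0  # index just past the previously registered block
    for idx in range(len(p) + 1):
        above = idx < len(p) and abs(p[idx]) >= p_th
        if above:
            run_len += 1
            amount += abs(p[idx])
            continue
        # A below-threshold sample (or end of data, by assumption) closes the run.
        if run_len >= k:
            start = idx - run_len
            table.append(Block(start, run_len, amount, start - prev_end))
            prev_end = idx
        run_len = 0
        amount = 0.0
    return table
```

Runs shorter than K are discarded without updating `prev_end`, so the gap is always measured between registered blocks, as the counter 207 does by counting between successive outputs of the AND circuit 206.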

The voice section determination section 23 determined the voice section from the block table 214 obtained by the blocking section 22 as follows. It detected the block with the largest block amount and took it as the center of the voice section. For the blocks before and after it, a block was also included in the voice section if the inter-block distance was at or below a fixed value.
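The conventional determination rule can be sketched as below. The tuple layout, function name, and `max_gap` parameter are hypothetical, and extending outward block by block until a gap is too large is an assumed reading of "the blocks before and after".

```python
from typing import List, Optional, Tuple

# Each block is (start, length, amount, gap_to_previous_block).
BlockRow = Tuple[int, int, float, int]


def select_voice_section(blocks: List[BlockRow],
                         max_gap: int) -> Optional[Tuple[int, int]]:
    """Return (first_sample, last_sample) of the voice section, or None."""
    if not blocks:
        return None
    # The block with the largest block amount is the center of the section.
    center = max(range(len(blocks)), key=lambda b: blocks[b][2])
    first = last = center
    # Extend backward while the gap in front of the current first block is small.
    while first > 0 and blocks[first][3] <= max_gap:
        first -= 1
    # Extend forward while the gap in front of the next block is small.
    while last + 1 < len(blocks) and blocks[last + 1][3] <= max_gap:
        last += 1
    start = blocks[first][0]
    end = blocks[last][0] + blocks[last][1] - 1
    return start, end
```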

The recognition section 8 sends the capture start signal to the voice capture control section 3 and stores the analysis results from the voice capture control section 3. On receiving the start/end information 7 from the voice section detection section 5, it computes the similarity to standard patterns of known content prepared in advance, judges that speech with the same content as the most similar standard pattern was input, and outputs the result.

However, voice section detection in the conventional technique described above had the following drawbacks.

(1) The spectral slope $a_j$ changes with the loudness of the input voice, so $P_j$ is an unstable parameter.

(2) Depending on the phoneme and the speaker, and also on the microphone characteristics and the like, the spectral slope $a_j$ often takes a value close to 0 even in voiced portions. As a result $P_j$ is also close to 0, and blocking fails.

(3) When the noise is large, speech (consonants in particular) is hard to distinguish from the noise.

(Problem to Be Solved by the Invention) An object of this invention is to provide a speech recognition system that eliminates misrecognition and improves the recognition rate. Its feature is that, during voice section detection, the noise pattern is subtracted from the voice pattern, so that voice sections are detected more accurately and the recognition rate rises. It is described in detail below.

(Structure and Operation of the Invention) FIG. 6 is a block diagram of the present invention. 100 is an input terminal, 200 is a frequency analysis section, 300 is a logarithmic conversion section, 400 is a spectrum conversion section, and 500 is a voice section determination section. The voice section determination section 500 consists of a log-converted data section 501, a noise pattern detection section 502, a subtraction circuit 503, a multiplication circuit 504, an addition circuit 505, a division circuit 506, a P parameter memory 507, comparator 1 508, FLAG 509, smoothing 1 510, smoothing 2 511, blocking 512, comparator 2 513, block determination 514, voice section determination 515, and a MAXBLK table 516. 600 is a resampling section, 700 is a distance calculation section, 800 is a standard pattern memory, 900 is a determination section, and 1000 is a recognition result output terminal.

In this configuration, the input voice signal from the input terminal 100 is fed to the frequency analysis section 200, where it is frequency-analyzed into quantized signals U(i, j) corresponding to a plurality of frequency bands, and then sent to the logarithmic conversion section 300.

The data sent to the logarithmic conversion section 300 becomes spectral information, power information, and so on. The spectral information is sent to the spectrum conversion section 400, and both the spectral information and the power information are sent to the voice section determination section 500.

The logarithmic conversion section 300 performs the calculation of equation (4).

Let the frequency analysis data be U(i, j):

U(i, j), i = 1 to 19, j = 1 to ∞ [upper limit garbled in the source], 0 ≤ U(i, j) ≤ 2047

Let the log-converted data be V(i, j):

V(i, j), i = 1 to 19, j = 1 to ∞ [upper limit garbled in the source]

Here i indicates the frequency (channels 1 to 19) and j indicates time (frame 1 onward). The input data from the preprocessing section is U(i, j), with the same ranges as above. Let NB be the number of log-conversion bits; here NB = 4. [Equation (4) is not reproduced in the source text; only its condition U(i, j) > 0 survives.]

The power of the input pattern, POW(j), and the 10-frame power of the input pattern, POW10(k), are defined by equations (5) and (6). [Equations (5) and (6) are not reproduced in the source text.]

Here k = (j − 1)/10 + 1, where j = (k − 1) * 10 + 1.

The noise level is defined by equation (7). [Equation (7) is not reproduced in the source text.]

Let the noise level measurement interval be k = k1 to k2, where k2 = k1 + 2. With the cut-out slice level $L_1$ defined as

$$L_1 = \mathrm{NLEVEL} + L_0$$

find the first point k3 at which POW10(k3) is greater than $L_1$ and POW10(k3 + 1) is also greater than $L_1$. Going back 40 frames from k3 gives the frame J1,

$$J1 = (k3 - 1) \times 10 + 1 - 40$$

and the provisional voice start frame STFR1 is

$$\mathrm{STFR1} = \mathrm{MAX}(J1, 1)$$

For end detection, when k4 is greater than k2 + 1 and POW10(k4) is less than or equal to $L_1$, the provisional voice end frame EDFR1 is

$$\mathrm{EDFR1} = (k4 - 1) \times 10 - 1 + 9$$

[The operator signs in the last expression are partly illegible in the source.]
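The provisional start detection can be sketched as follows. Only the start side is shown, because the end-frame expression is partly illegible. The function name is hypothetical, and `pow10` is assumed to be a list whose element 0 holds POW10(1), since equations (5) and (6) that define it are lost.

```python
from typing import Optional, Sequence


def provisional_start_frame(pow10: Sequence[float],
                            slice_level: float) -> Optional[int]:
    """Return STFR1 (1-based frame number), or None if no start is found.

    pow10[k - 1] is POW10(k); slice_level is L1 = NLEVEL + L0.
    """
    for k3 in range(1, len(pow10)):  # k3 is 1-based and k3 + 1 must exist
        if pow10[k3 - 1] > slice_level and pow10[k3] > slice_level:
            j1 = (k3 - 1) * 10 + 1 - 40  # go back 40 frames from block k3
            return max(j1, 1)            # STFR1 = MAX(J1, 1)
    return None
```

Requiring two consecutive 10-frame blocks above the slice level keeps a single noisy block from triggering a start, and the 40-frame look-back leaves room for weak onsets that fall below the slice level.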

The log-converted data V(i, j) computed by the logarithmic conversion section 300 is sent to the log-converted data section 501, after which the noise pattern detection section 502 calculates the noise pattern NPAT(i). With the noise level measurement interval k = k1 to k2, the values of J2 and J3 are calculated by equation (8). [Equation (8) is not reproduced in the source text.]

The noise pattern NPAT(i) is obtained by equation (9). [Equation (9) is not reproduced in the source text; the fragment "j = STFR1 to EDFR1" survives at this point.]

Next, the subtraction circuit 503, multiplication circuit 504, addition circuit 505, and division circuit 506 use V(i, j) stored in the log-converted data section 501 and NPAT(i) obtained by equation (9) in the noise pattern detection section 502 to calculate, by equation (10), the power with the noise pattern subtracted.

$$P(j) = \frac{1}{19}\sum_{i=1}^{19}\left(\frac{V(i,j) - \mathrm{NPAT}(i)}{4}\right)^{2} \tag{10}$$

[Equation (10) is reconstructed from a garbled original; the 1/19 factor and the summation over the 19 channels are inferred from the surrounding text.]

P(j) obtained from equation (10) is stored in the P parameter memory 507, and comparator 1 508 then performs the comparison of equation (11). [Equation (11) is not reproduced in the source text.]
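A rough sketch of the noise-subtracted power follows. It assumes that equation (10), which is garbled in the source, is the mean over channels of the squared, scaled difference between the log spectrum and the noise pattern. The function name and the `scale` parameter (standing in for the divisor 4) are hypothetical.

```python
from typing import Sequence


def noise_subtracted_power(v_frame: Sequence[float],
                           npat: Sequence[float],
                           scale: float = 4.0) -> float:
    """Power of one frame after subtracting the noise pattern channel by channel.

    v_frame[i] is V(i, j) for a fixed frame j; npat[i] is NPAT(i).
    Assumed form: mean over channels of ((V - NPAT) / scale) ** 2.
    """
    if len(v_frame) != len(npat) or not v_frame:
        raise ValueError("v_frame and npat must be non-empty and equal in length")
    total = sum(((v - n) / scale) ** 2 for v, n in zip(v_frame, npat))
    return total / len(v_frame)
```

A frame whose log spectrum equals the noise pattern yields 0 however loud the noise is, which is how the method addresses drawback (3) of the conventional technique.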

In equation (11), when the slice level $L_2$ is greater than P(j), FLAG(j) = 0; when $L_2$ is less than or equal to P(j), FLAG(j) = 1. The value of FLAG(j) determined by equation (11) is stored in FLAG 509 and, according to its value, is sent to smoothing 1 510 or smoothing 2 511. Smoothing 1 510 handles the case FLAG(j) = 0: when FLAG(j − 1) = 0 and FLAG(j + 1) = 0, FLAG(j) is set to 0.

Smoothing 2 511 handles the case FLAG(j) = 1: when FLAG(j − 1) = 1 and FLAG(j + 1) = 1, FLAG(j) is set to 1.
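The flagging and smoothing steps can be sketched as below. The text's routing by the current flag value is ambiguous as printed, so this sketch assumes the neighbor rule is applied whatever the current value is, and that neighbors are read from the unsmoothed flags. The function names are hypothetical.

```python
from typing import List, Sequence


def speech_flags(p: Sequence[float], l2: float) -> List[int]:
    """FLAG(j) = 1 when L2 <= P(j), otherwise 0."""
    return [1 if value >= l2 else 0 for value in p]


def smooth_flags(flags: Sequence[int]) -> List[int]:
    """Set each interior flag to its neighbors' value when both neighbors agree."""
    out = list(flags)
    for j in range(1, len(flags) - 1):
        if flags[j - 1] == flags[j + 1]:
            out[j] = flags[j - 1]
    return out
```

Under these assumptions an isolated 1 between two 0s is cleared and a one-frame dropout between two 1s is filled, so short glitches do not split or create blocks in the following stage.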

Next, in blocking 512, an interval in which FLAG(j) = 1 continues for 4 frames or more is registered as a block if the interval's power POW1(ℓ), computed from Σ P(j) over the interval [the exact expression is garbled in the source], satisfies POW1(ℓ) ≥ $L_3$, that is, if POW1(ℓ) is greater than or equal to the slice level $L_3$.
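The blocking rule can be sketched as follows, assuming POW1 is the plain sum of P(j) over the run (the exact expression is garbled in the source). Frame indices here are 0-based, the tuple layout is hypothetical, and `min_frames` defaults to the 4 frames stated in the text.

```python
from typing import List, Sequence, Tuple


def find_blocks(flags: Sequence[int], p: Sequence[float], l3: float,
                min_frames: int = 4) -> List[Tuple[int, int, float]]:
    """Return (S, E, POW1) for each accepted block.

    S and E are the first and last frame index of a run of FLAG = 1.
    """
    blocks: List[Tuple[int, int, float]] = []
    j = 0
    n = len(flags)
    while j < n:
        if flags[j] != 1:
            j += 1
            continue
        s = j
        while j < n and flags[j] == 1:
            j += 1
        e = j - 1
        pow1 = sum(p[s:e + 1])  # assumed definition of POW1
        if e - s + 1 >= min_frames and pow1 >= l3:
            blocks.append((s, e, pow1))
    return blocks
```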

Let the number of blocks be BLKS, the first frame of block ℓ be S(ℓ), and the last frame of block ℓ be E(ℓ). The sum of the noise-subtracted power P(j) over block ℓ is obtained by equation (12). [Equation (12) is not reproduced in the source text.]

The number of frames in block ℓ is obtained by equation (13).

$$\mathrm{FR1}(\ell) = E(\ell) - S(\ell) + 1 \tag{13}$$

The interval from the preceding block (ℓ − 1) is obtained by equation (14).

$$\mathrm{FR2}(\ell) = S(\ell) - E(\ell - 1) \tag{14}$$

Here, let $\ell_1$ be the first voice block and $\ell_2$ the last voice block. In comparator 2 513, for the first voice block $\ell_1$, $\ell_1 = \ell_1 - 1$ is applied as long as the condition of equation (15) is satisfied.

$$\mathrm{FR2}(\ell_1) \le \mathrm{MIN}\left(\mathrm{POW1}(\ell_1 - 1)/\mathrm{SC1} + \mathrm{SC2},\ \mathrm{SC3}\right) \tag{15}$$

For the last voice block $\ell_2$, $\ell_2 = \ell_2 + 1$ is applied as long as the condition of equation (16) is satisfied.

$$\mathrm{FR2}(\ell_2 + 1) \le \mathrm{MIN}\left(\mathrm{POW1}(\ell_2 + 1)/\mathrm{SC1} + \mathrm{SC2},\ \mathrm{SC3}\right) \tag{16}$$

Here SC1 to SC3 are constants: SC1 = 16, SC2 = 8, SC3 = 30. [The block indices in equations (15) and (16) are partly illegible in the source; the forms above are the most consistent reading.]

Using the above equations, it is judged, working outward from the largest block, whether each preceding and following block is to be taken in as a block of the voice section, and those that qualify are adopted as part of the voice section.
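The outward extension from the largest block can be sketched as below. It assumes the reading of equations (15) and (16) in which the allowed gap depends on the power of the candidate block being added. The block indices in the source are partly illegible, so this is an interpretation rather than a confirmed reproduction. The function name and tuple layout are hypothetical.

```python
from typing import List, Tuple

SC1, SC2, SC3 = 16, 8, 30  # constants given in the text

# Each block is (S, E, POW1): first frame, last frame, block power.
BlockRow = Tuple[int, int, float]


def extend_section(blocks: List[BlockRow], center: int) -> Tuple[int, int]:
    """Return (l1, l2), the first and last block index of the voice section."""

    def gap(l: int) -> int:
        # FR2(l) = S(l) - E(l - 1), equation (14)
        return blocks[l][0] - blocks[l - 1][1]

    def limit(l: int) -> float:
        # MIN(POW1(l) / SC1 + SC2, SC3): stronger blocks may sit farther away
        return min(blocks[l][2] / SC1 + SC2, SC3)

    l1 = l2 = center
    while l1 > 0 and gap(l1) <= limit(l1 - 1):
        l1 -= 1
    while l2 + 1 < len(blocks) and gap(l2 + 1) <= limit(l2 + 1):
        l2 += 1
    return l1, l2
```

The tolerated gap grows with the candidate block's power but is capped at SC3 = 30 frames, so even a very strong block is not attached across a long pause.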

The values of the first voice block $\ell_1$ and the last voice block $\ell_2$, which are the voice section block candidates determined in this way, are sent to block determination 514.

Next, the MAXBLK table 516, a table of the maximum number of blocks of each recognition word used by voice section determination 515, is described.

FIG. 8 shows an example of the maximum number of blocks MAXBLK.

The left side shows the categories (16 words), and the right side shows the maximum number of blocks of each category, obtained in advance from utterance data. The largest MAXBLK within the recognition word set is selected. For example, if the recognition words include 「モーイチド」 ("once more"), MAXBLK = 3.

In the voice section determination section 515, when BLKS ≤ MAXBLK, that is, when the number of blocks BLKS is less than or equal to the maximum number of blocks MAXBLK, all blocks are taken as the voice section. Conversely, when BLKS > MAXBLK, that is, when the number of blocks BLKS is greater than the maximum number of blocks MAXBLK, for example when BLKS = 3 and MAXBLK = 2 in FIG. 7, two combinations of blocks are possible [the block labels are illegible in the source]. The power PP(ℓ) of each combination of blocks is obtained, the PP values are compared, and the combination of blocks whose power PP(ℓ) is largest is taken as the voice section. The block power PP(ℓ) is obtained by equation (17). [Equation (17) is not reproduced in the source text; only its index range survives.]

ℓ = 1 to BLKS − MAXBLK + 1

S($\ell_1$) obtained from equation (17) is the first voice block and E($\ell_2$) is the last voice block, so the voice start frame STFR is

$$\mathrm{STFR} = S(\ell_1)$$

and the voice end frame EDFR is

$$\mathrm{EDFR} = E(\ell_2)$$

The number of input pattern frames IFR is given by equation (18).

$$\mathrm{IFR} = \mathrm{EDFR} - \mathrm{STFR} + 1 \tag{18}$$

Processing is judged complete when the last voice block $\ell_2$ satisfies all the conditions of equation (19). [Equation (19) is not reproduced in the source text.]

That is, processing ends when $L_1$ is greater than or equal to [POW10 at] each of k4, k4 + 1, k4 + 2, k4 + 3, and k4 + 4 [indices partly illegible in the source].

If the condition of equation (19) is not satisfied, recognition is broken off and the next value of k4 is sought for which POW10(k4) ≤ $L_1$, that is, for which $L_1$ is greater than or equal.
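The final selection between block combinations, described above for the case BLKS > MAXBLK, can be sketched as follows. Equation (17) is lost, so this assumes PP(ℓ) is the summed power of MAXBLK consecutive blocks starting at block ℓ, which matches the surviving index range ℓ = 1 to BLKS − MAXBLK + 1. The function name and tuple layout are hypothetical, and block indices are 0-based.

```python
from typing import List, Optional, Tuple

# Each block is (S, E, POW1): first frame, last frame, block power.
BlockRow = Tuple[int, int, float]


def choose_section(blocks: List[BlockRow],
                   maxblk: int) -> Optional[Tuple[int, int, int]]:
    """Return (STFR, EDFR, IFR), or None when there are no blocks."""
    blks = len(blocks)
    if blks == 0:
        return None
    if blks <= maxblk:
        first, last = 0, blks - 1  # all blocks form the voice section
    else:
        # Assumed PP(l): total power of the window of maxblk consecutive blocks.
        best = max(range(blks - maxblk + 1),
                   key=lambda l: sum(b[2] for b in blocks[l:l + maxblk]))
        first, last = best, best + maxblk - 1
    stfr = blocks[first][0]
    edfr = blocks[last][1]
    ifr = edfr - stfr + 1  # equation (18)
    return stfr, edfr, ifr
```

With three blocks and `maxblk = 2`, the two candidate windows are blocks 1 and 2 or blocks 2 and 3, as in the FIG. 7 example.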

The voice section determined in this way, STFR and EDFR, is sent to the resampling section 600 [printed as 500 in the source, which conflicts with FIG. 6] together with W(i, j) from the spectrum conversion section 400. The resampling section 600 normalizes the time axis of the voice. Time-axis normalization is a conventionally known technique; in the linear matching method, the voice section is divided at equal time intervals into a fixed number of points determined by the conditions of the recognition device and resampled.
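Linear time normalization can be sketched as below. The text does not say whether frames are interpolated or simply picked, so this sketch assumes the simplest variant, which picks the source frame at each equally spaced position. The function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def resample_linear(frames: Sequence[T], target_len: int) -> List[T]:
    """Pick target_len frames at equal time intervals from the voice section."""
    n = len(frames)
    if n == 0 or target_len <= 0:
        raise ValueError("need at least one frame and a positive target length")
    # Integer arithmetic: position t maps to source frame floor(t * n / target_len).
    return [frames[t * n // target_len] for t in range(target_len)]
```

Every utterance then has the same number of frames, so the later distance calculation can compare it frame by frame with each standard pattern.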

The distance calculation section 700 then computes the distance to the output of the standard pattern memory 800, which was created in the same way, and sends the result to the determination section 900.

The determination section 900 compares the total distances and outputs the category name with the smallest total distance as the recognition result from the recognition result output terminal 1000.
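The distance calculation and decision steps amount to nearest-template classification. The text does not name the distance measure, so the sketch below assumes a city-block (sum of absolute differences) distance. The function name and the dictionary of templates are hypothetical.

```python
from typing import Dict, List, Sequence

Pattern = Sequence[Sequence[float]]  # frames x channels, already resampled


def recognize(pattern: Pattern, templates: Dict[str, Pattern]) -> str:
    """Return the category name whose standard pattern has the smallest total distance."""

    def total_distance(a: Pattern, b: Pattern) -> float:
        # Assumed measure: sum of absolute differences over all frames and channels.
        return sum(abs(x - y)
                   for frame_a, frame_b in zip(a, b)
                   for x, y in zip(frame_a, frame_b))

    return min(templates, key=lambda name: total_distance(pattern, templates[name]))
```

For example, with templates `{"yes": [[1, 1], [2, 2]], "no": [[5, 5], [5, 5]]}` and the input `[[1, 2], [2, 2]]`, the total distances are 1 and 13, so the result is `"yes"`.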

As explained above, the present invention subtracts the noise pattern from the voice pattern during voice section detection, so that voice sections are detected more accurately and the recognition rate can be raised.

(Effects of the Invention) During voice section detection, the present invention subtracts the noise pattern information from the voice pattern information. This allows voice sections to be detected more accurately and is effective in improving the recognition performance of a speech recognition device.

[Brief explanation of drawings]

FIG. 1 is a block diagram of a conventional speech recognition device, FIG. 2 is a detailed block diagram of the frequency analysis section of FIG. 1, FIG. 3 is a block diagram of the voice section detection section of FIG. 1, FIG. 4 is a detailed block diagram of the parameter calculation section of FIG. 3, FIG. 5 is a detailed diagram of the blocking section of FIG. 3, FIG. 6 is a block diagram of the speech recognition device of the present invention, FIG. 7 is a diagram showing combinations of blocks in a voice section, and FIG. 8 is a diagram showing the maximum number of blocks of the voice.

1: input terminal, 2: frequency analysis section, 3: voice capture control section, 4: capture start signal, 5: voice section detection section, 6: capture end signal, 7: start/end information, 8: recognition section, 9: output terminal, 11: input voice signal, 12: preamplifier, 13: bandpass filter group, 14: full-wave rectifier group, 15: low-pass filter group, 16: multiplexer, 17: AD converter, 18: analysis result, 21: parameter calculation section, 22: blocking section, 23: voice section determination section, 101, 105, 109: adders, 102, 110: registers, 103, 108, 111, 113: multipliers, 104: complementer, 106, 112: dividers, 107: counter, 200: P parameter calculation section, 201: absolute value circuit, 202, 205: comparators, 203, 207, 209: counters, 204, 212, 213: registers, 206: AND circuit, 208, 210: subtractors, 211: adder, 214: block table, 100: input terminal, 200: frequency analysis section, 300: logarithmic conversion section, 400: spectrum conversion section, 500: voice section determination section, 501: log-converted data section, 502: noise pattern detection section, 503: subtraction circuit, 504: multiplication circuit, 505: addition circuit, 506: division circuit, 507: P parameter memory, 508: comparator 1, 509: FLAG, 510: smoothing 1, 511: smoothing 2, 512: blocking, 513: comparator 2, 514: block determination, 515: voice section determination, 516: MAXBLK table, 600: resampling section, 700: distance calculation section, 800: standard pattern memory, 900: determination section, 1000: recognition result output terminal.

Patent applicant: Oki Electric Industry Co., Ltd. Patent application agent: patent attorney 山本 恵− [given name partly illegible in the source]

Claims

[Claims]

In a speech recognition system in which an input voice signal is frequency-analyzed, the result is logarithmically transformed, the analyzed spectral characteristics are normalized and resampled to a fixed data length to create an input voice pattern, the distance between that pattern and standard patterns is calculated, and the recognition category with the minimum distance is taken as the recognition result, a speech recognition system comprising: means for storing the logarithmically transformed data; means for calculating the power obtained by subtracting the noise pattern from the voice pattern; means for setting voice section flags from the calculated power information and performing smoothing; means for determining voice block candidates based on the voice section flags obtained by that processing; and means for determining the voice section using the voice block candidates while referring to a maximum block number (MAXBLK) table.
A speech recognition method comprising: means for determining a speech block candidate based on a speech section flag obtained by the processing; and means for determining a speech section using the speech block candidate while referring to a maximum block (MAXBLK) table. .