JPS6075900A

JPS6075900A - Word voice recognition equipment

Info

Publication number: JPS6075900A
Application number: JP58183841A
Authority: JP
Inventors: 光生下谷; 日比野　昌弘; 憲司嶋
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1983-09-30
Filing date: 1983-09-30
Publication date: 1985-04-30

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION [Technical Field of the Invention] The present invention relates to a word speech recognition device, and more particularly, for example, to an improvement in detecting the start and end of a word speech section.

[Prior Art] FIG. 1 is a schematic block diagram showing an example of a conventional word speech recognition device. In the figure, a speech signal input from a microphone 11 is amplified by a microphone amplifier 12 and then given to an AGC circuit 13. The AGC circuit 13 automatically controls the gain of its internal amplifier so that a constant output is obtained even if the magnitude of the input signal varies. The output of the AGC circuit 13 is given to an A/D conversion circuit 14 and converted into a digital signal. The output of the A/D conversion circuit 14 is given to a waveform memory 15. The waveform memory 15 temporarily stores one frame of input waveform data. The output of the waveform memory 15 is given to a power calculation circuit 21 and also to a feature extraction section 3. The power calculation circuit 21 calculates the power of the waveform. The output of the power calculation circuit 21 is given to a recognition processing section 6 and also to a start/end detection circuit 22. The start/end detection circuit 22 detects the start and end of the speech signal, and its output is given to the recognition processing section 6.

Meanwhile, the feature extraction section 3 includes a digital filter and the like, and is a circuit that extracts feature parameters of the input speech waveform. The output of the feature extraction section 3 is given to the recognition processing section 6. An input pattern memory 4 and a registered pattern memory 5 are connected to the recognition processing section 6. The input pattern memory 4 temporarily stores, in the word speech recognition mode, the feature parameters of the speech to be recognized that were analyzed and extracted by the feature extraction section 3. The registered pattern memory 5 stores in advance, in the registration mode, the analyzed and extracted feature parameters of registered words or the feature parameters of standard speech. The recognition processing section 6 includes, for example, a microprocessor or a microcomputer, and performs recognition processing using the feature parameters in the input pattern memory 4 and the registered pattern memory 5. In such a word speech recognition device, the speech section is divided into fixed time intervals called frames, and speech features are extracted frame by frame.

Next, the operation of the circuit of FIG. 1 will be explained. The speech signal input from the microphone 11 passes through the microphone amplifier 12, the AGC circuit 13, and the A/D conversion circuit 14, and is temporarily stored in the waveform memory 15.

The feature extraction section 3 receives one frame of waveform data from the waveform memory 15 and extracts feature parameters. In the registration mode, the obtained feature parameters are stored in the registered pattern memory 5. In the recognition mode, the obtained feature parameters are first stored in the input pattern memory 4, after which the recognition processing section 6 performs recognition processing by a technique such as pattern matching.

Meanwhile, the start/end detection circuit 22 detects the start and end of the speech signal section based on the power of the speech signal calculated by the power calculation circuit 21. The recognition processing section 6 treats the speech signal in the section defined by the start/end detection circuit 22 as the speech signal to be recognized and performs recognition processing on it.

FIG. 2 is a diagram showing the power waveform of a speech signal.

Referring to FIG. 2, the operation of the start/end detection circuit 22 shown in FIG. 1 will be explained. The start/end detection circuit 22 detects the start of the speech signal when the power of the speech signal exceeds a preset threshold Ps, and detects the end of the speech signal when frames whose power is at or below a preset threshold Pe continue for a preset section Kth. In this example, K1 and K2 are the start frame and end frame of the speech signal, respectively. As described above, the recognition processing section 6 performs recognition processing on the speech signal section defined by the start/end detection circuit 22, that is, the section K1 to K2, as the word speech to be recognized. In such a recognition device, therefore, the performance of start/end detection has a large influence on the recognition result.
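The power-only rule described above can be sketched as follows. This is a minimal illustration, not the circuit itself: the function name `detect_endpoints_power`, the list-of-frame-powers input, and the choice to report the first frame of the silent run as the end frame are assumptions made for this example.

```python
def detect_endpoints_power(power, ps, pe, kth):
    """Return (start_frame, end_frame), or None if no complete section is found.

    power: per-frame power values
    ps:    start threshold (a frame with power above it marks the start)
    pe:    end threshold (frames at or below it count as silent)
    kth:   number of consecutive silent frames that confirms the end
    """
    start = None
    silent_run = 0
    first_silent = None
    for k, p in enumerate(power):
        if start is None:
            if p > ps:
                start = k
            continue
        if p <= pe:
            if silent_run == 0:
                first_silent = k  # assumption: end frame = first frame of the run
            silent_run += 1
            if silent_run >= kth:
                return start, first_silent
        else:
            silent_run = 0  # a short dip does not end the section
    return None


# Hypothetical frame powers with one short dip inside the utterance.
frames = [0, 0, 5, 6, 7, 1, 6, 1, 1, 1, 0]
result = detect_endpoints_power(frames, ps=4, pe=2, kth=3)
print(result)
```

Because the decision depends on power alone, any noise loud enough to exceed `ps` would also be taken as a start in this sketch.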

FIG. 3 is a diagram showing the power waveform of a speech signal when noise is added to the speech. Although the speech signal section of this waveform is K1 to K2, as shown in FIG. 2, the start/end detection method of the conventional device detects K3 to K4 as the speech signal section. Thus, the conventional speech recognition device has the drawback that, in a noisy environment, the start and end of word speech are not detected accurately and recognition performance drops.

[Summary of the Invention] The present invention was made to eliminate the drawbacks of the conventional device described above. Its object is to provide a speech recognition device with excellent recognition performance that can detect the start and end of word speech accurately even in a noisy environment, by detecting them using the maximum value of the autocorrelation function within a preset range (hereinafter referred to as CORMAX).

[Embodiment of the Invention] FIG. 4 is a schematic block diagram showing one embodiment of the present invention. The embodiment of FIG. 4 is the same as the circuit of FIG. 1 except for the following points; corresponding parts are given the same reference numerals and their description is omitted. In the embodiment of FIG. 4, a CORMAX calculation circuit 23 is provided. The CORMAX calculation circuit 23 includes an autocorrelator made up of, for example, multipliers and adders, and calculates the CORMAX of the speech signal waveform input from the waveform memory 15. The calculated CORMAX is given to one input of a start/end detection circuit 220 and also to the recognition processing section 6. The output of the power calculation circuit 21 is given to the other input of the start/end detection circuit 220. That is, the feature of this embodiment is that the start and end of the speech signal are detected based on the power calculated by the power calculation circuit 21 and the CORMAX calculated by the CORMAX calculation circuit 23.

Next, CORMAX will be explained. Letting the waveform data for one frame be x(i) (i = 1, 2, ...), the power P is expressed by equation (1), and the τ-th order autocorrelation function COR(τ) is expressed by equation (2). [Equations (1) and (2) are not recorded in this text.]

Letting the range of autocorrelation orders set for obtaining CORMAX be τs to τe (τs < τe), CORMAX is expressed by the following equation (3).

$$\mathrm{CORMAX} = \max_{\tau_s \le \tau \le \tau_e} \left[ \mathrm{COR}(\tau) \right] \tag{3}$$

Even for waveforms of the same power, CORMAX is large for waveforms with strong pitch, such as vowels, and small for waveforms such as environmental noise that is close to white noise. The present invention uses this fact to detect the start and end of the speech signal.
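The contrast between pitched and noise-like waveforms can be illustrated numerically. Since equations (1) and (2) did not survive in this text, the sketch below assumes conventional unnormalized forms: power as the sum of squared samples, and COR(τ) as the sum of products of samples τ apart. The function names, the frame length, the lag range 20 to 80, and the test signals are all hypothetical choices for illustration.

```python
import math
import random


def frame_power(x):
    """Assumed form of equation (1): sum of squared samples in the frame."""
    return sum(v * v for v in x)


def autocorr(x, tau):
    """Assumed form of equation (2): sum of products of samples tau apart."""
    return sum(x[i] * x[i + tau] for i in range(len(x) - tau))


def cormax(x, tau_s, tau_e):
    """Equation (3): maximum of COR(tau) over tau_s <= tau <= tau_e."""
    return max(autocorr(x, tau) for tau in range(tau_s, tau_e + 1))


# Hypothetical frames: a pitched tone (period 40 samples) and white noise.
n = 200
tone = [math.sin(2 * math.pi * i / 40) for i in range(n)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Ratio of CORMAX to power: near 1 for periodic input, small for noise.
ratio_tone = cormax(tone, 20, 80) / frame_power(tone)
ratio_noise = cormax(noise, 20, 80) / frame_power(noise)
print(ratio_tone, ratio_noise)
```

Under these assumed definitions COR(0) equals the power, which is why CORMAX can be compared directly with the power after scaling by a constant.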

FIG. 5 is a diagram showing the CORMAX waveform when noise is added to the speech. As shown, the influence of the noise is reduced in the CORMAX waveform. Therefore, when the start and end are detected by threshold discrimination as in the circuit of FIG. 1, the speech signal section becomes K5 to K6, and the start and end can be detected more accurately than with the conventional device. In this invention, the start and end of word speech may be detected based on CORMAX alone. However, if a quantity corresponding to the magnitude of the waveform, such as the power or level of the speech signal, is used in combination with CORMAX as an element of the detection, the start and end can be detected even more accurately. For this reason, the embodiment of FIG. 4 uses both the power of the speech signal and CORMAX as elements of start/end detection.

That is, in the embodiment of FIG. 4, a frame in which the power of the speech signal is at or above a preset threshold Ps and the value obtained by multiplying CORMAX by a preset constant Cs is at or above the power is taken as the start frame of the speech signal. A frame that satisfies at least one of two conditions, namely that the power is at or below a preset threshold Pe, or that the value obtained by multiplying CORMAX by a preset constant Ce is at or below the power, is taken as a silent frame. When silent frames continue for a preset number of frames Kth, the end of the speech signal is detected, and the first silent frame is taken as the end frame of the speech signal.
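The decision rule of this embodiment can be sketched in a few lines. This is an illustrative reading of the rule, not the circuit 220 itself: the function name, the per-frame list inputs, and the sample values are assumptions; the parameters `ps`, `pe`, `cs`, `ce`, and `kth` correspond to Ps, Pe, Cs, Ce, and Kth in the text.

```python
def detect_endpoints_cormax(power, cormax, ps, pe, cs, ce, kth):
    """Return (start_frame, end_frame), or None if no complete section is found.

    Start frame:  power >= ps and cormax * cs >= power.
    Silent frame: power <= pe or cormax * ce <= power.
    End: kth consecutive silent frames; the first of them is the end frame.
    """
    start = None
    silent_run = 0
    first_silent = None
    for k, (p, c) in enumerate(zip(power, cormax)):
        if start is None:
            if p >= ps and c * cs >= p:
                start = k
            continue
        if p <= pe or c * ce <= p:
            if silent_run == 0:
                first_silent = k
            silent_run += 1
            if silent_run >= kth:
                return start, first_silent
        else:
            silent_run = 0
    return None


# Hypothetical data: steady noise (power 10, low CORMAX) around a voiced burst.
power = [10, 10, 50, 60, 55, 10, 10, 10, 10]
cmax = [1, 1, 40, 50, 45, 1, 1, 1, 1]
section = detect_endpoints_cormax(power, cmax, ps=5, pe=5, cs=1.5, ce=1.5, kth=3)
print(section)
```

In this example the noise frames exceed the power threshold throughout, so a power-only rule would start at frame 0 and never find an end; the CORMAX conditions reject the noise frames at both ends.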

Next, the operation of the embodiment of FIG. 4 will be explained in more detail.

The power calculation circuit 21 and the CORMAX calculation circuit 23 each receive one frame of waveform data from the waveform memory 15, perform the calculations of equations (1) and (2) respectively, and send the power and CORMAX to the start/end detection circuit 220. The start/end detection circuit 220 uses the power and CORMAX to determine the start and end of the speech signal by the method described above, and gives the result to the recognition processing section 6.

The recognition processing section 6 stores the speech signal from the start to the end detected by the start/end detection circuit 220 in the input pattern memory 4 or the registered pattern memory 5, and performs recognition processing. The other operations are the same as those of the conventional device shown in FIG. 1.

As another embodiment, the start/end detection circuit 220 may detect, as the start frame, a frame in which the power is at or above a preset threshold Ps and CORMAX is at or above a preset threshold CMs. It may take as a silent frame a frame that satisfies at least one of two conditions, namely that the power is at or below a preset threshold Pe, or that CORMAX is at or below a preset threshold CMe. When silent frames continue for a preset number of frames Kth, the end is detected, and the first silent frame is taken as the end frame of the speech signal.
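The per-frame tests of this alternative embodiment differ from the first only in comparing CORMAX against fixed thresholds instead of against the power. A minimal sketch of the two predicates follows; the function names and sample values are hypothetical, and `cms` and `cme` stand for CMs and CMe.

```python
def is_start_frame_abs(p, c, ps, cms):
    """Start test: power at or above Ps and CORMAX at or above CMs."""
    return p >= ps and c >= cms


def is_silent_frame_abs(p, c, pe, cme):
    """Silent test: power at or below Pe, or CORMAX at or below CMe."""
    return p <= pe or c <= cme


# Hypothetical frames as (power, CORMAX) pairs.
voiced = (50, 40)
noisy = (50, 2)
print(is_start_frame_abs(*voiced, ps=5, cms=10))
print(is_start_frame_abs(*noisy, ps=5, cms=10))
print(is_silent_frame_abs(*noisy, pe=5, cme=10))
```

These predicates would slot into the same run-counting loop as before, with Kth consecutive silent frames confirming the end.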

In this case too, it goes without saying that the same effect as in the embodiment of FIG. 4 is obtained.

As yet another embodiment, when detecting the start, a condition that the power value is larger than the power value of the preceding frame may be added. In this case, the start detection capability can be improved.
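The added rising-power condition can be sketched as a start search. The function name is hypothetical, and skipping frame 0 (which has no preceding frame to compare against) is an assumption of this example rather than something the text specifies.

```python
def find_start_rising(power, cormax, ps, cs):
    """Return the first frame k >= 1 that meets all three start conditions:
    power >= ps, cormax * cs >= power, and power greater than the previous frame.
    Returns None if no frame qualifies.
    """
    for k in range(1, len(power)):
        p, c = power[k], cormax[k]
        if p >= ps and c * cs >= p and p > power[k - 1]:
            return k
    return None


# Hypothetical data: power is flat for two frames, then rises.
start_frame = find_start_rising([6, 6, 8, 9], [10, 10, 10, 10], ps=5, cs=1.0)
print(start_frame)
```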

In the above embodiments, the power of the speech signal is used as one element for detecting the start and end. In place of the power, another quantity representing the magnitude of the speech signal waveform (such as the waveform level) may be calculated and used as an element of start/end detection.

In the above embodiment, a frame in which the power is at or above the preset threshold Ps and the value obtained by multiplying CORMAX by the preset constant Cs is at or above the power is taken as the start frame of the speech signal. However, the start frame may instead be a frame in its vicinity.

Likewise, in the above embodiment the end frame of the speech signal is the first silent frame, but the end frame may instead be a frame in its vicinity.

Furthermore, for convenience of explanation, the above embodiment describes the word speech recognition device as a speaker-dependent (specific-speaker registration) type. It may of course be a speaker-independent word speech recognition device in which the features of standard speech are registered in the registered pattern memory in advance.

[Effects of the Invention] As described above, according to the present invention, the start and end of word speech are detected based on CORMAX, so the start and end of the speech signal can be detected accurately even in a noisy environment, and the recognition performance of the speech recognition device can be improved.

[Brief explanation of drawings]

FIG. 1 is a schematic block diagram showing an example of a conventional word speech recognition device. FIG. 2 is a diagram showing the power waveform of a speech signal. FIG. 3 is a diagram showing the power waveform when noise is added to the speech. FIG. 4 is a schematic block diagram showing one embodiment of the present invention. FIG. 5 is a diagram showing the CORMAX waveform when noise is added to the speech. In the figures, 3 is a feature extraction section, 4 is an input pattern memory, 5 is a registered pattern memory, 6 is a recognition processing section, 11 is a microphone, 21 is a power calculation circuit, 23 is a CORMAX calculation circuit, and 220 is a start/end detection circuit. Representative: Masu Oiwa

Claims

[Scope of Claims]
(1) A word speech recognition device that extracts feature parameters of an input speech signal and performs word speech recognition processing based on the degree of similarity between the extracted feature parameters and the feature parameters of a plurality of pre-registered word sounds, the device comprising start/end detection means for detecting the start and end of the input speech signal in order to define the section of the speech signal on which the recognition processing is performed, wherein the start/end detection means comprises: CORMAX calculation means for calculating the maximum value CORMAX of the autocorrelation function of the input speech signal within a predetermined range; and start/end determination means for determining the start and end of the speech signal based on the CORMAX.
(2) The word speech recognition device according to claim 1, wherein the start/end determination means determines the start of the speech signal section on the condition that the CORMAX exceeds a preset CORMAX threshold.
(3) The word speech recognition device according to claim 1 or 2, wherein the start/end determination means determines the end of the speech signal section on the condition that an input waveform in which the CORMAX is smaller than a preset CORMAX threshold continues for a preset section.
(4) The word speech recognition device according to claim 1, wherein the start/end detection means further includes level calculation means for calculating a quantity corresponding to the magnitude of the waveform of the input speech signal, and the start/end determination means determines the start and end of the speech signal based on the CORMAX and the output of the level calculation means.
(5) The word speech recognition device according to claim 1, wherein the level calculation means includes means for calculating the power of the input speech signal.
(6) The word speech recognition device according to claim 5, wherein the start/end determination means determines the start of the speech signal section on the condition that the power calculated by the power calculation means exceeds a preset power threshold and the CORMAX exceeds a preset CORMAX threshold.
(7) The word speech recognition device according to claim 5, wherein the start/end determination means determines the start of the speech signal section on the condition that the power calculated by the power calculation means exceeds a preset power threshold and the value obtained by multiplying the CORMAX by a preset constant is larger than the power.
(8) The word speech recognition device according to claim 5, wherein the start/end determination means determines the end of the speech signal section on the condition that an input waveform satisfying at least one of the conditions that the CORMAX is smaller than a preset CORMAX threshold and that the power calculated by the power calculation means is smaller than a preset power threshold continues for a preset section.
(9) The word speech recognition device according to claim 5, wherein the start/end determination means determines the end of the speech signal section on the condition that an input waveform satisfying at least one of the conditions that the power calculated by the power calculation means is smaller than a preset power threshold and that the value obtained by multiplying the CORMAX by a preset constant is smaller than the power continues for a preset section.