JPH0713585A

JPH0713585A - Speech section segmentation device

Info

Publication number: JPH0713585A
Application number: JP5149030A
Authority: JP
Inventors: Seiya Kato; 誠也加藤
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 1993-06-21
Filing date: 1993-06-21
Publication date: 1995-01-17

Abstract

PURPOSE: To provide a speech section segmentation device that correctly segments a speech section from an input signal containing speech. CONSTITUTION: The input signal containing speech is stored in a data buffer 13. An amplitude threshold At is derived from the noise amplitude in a non-speech section and stored in an amplitude threshold buffer 17, and a count threshold Nt is derived from the number of times the input signal in the non-speech section exceeds the amplitude threshold At and stored in a count value threshold buffer 21. An amplitude comparison part 16 compares the input signal stored in the data buffer 13 with the amplitude threshold At, and a counter 19 counts how many times the input signal exceeds At. A count value comparison part 20 compares the count value N with the count threshold Nt, and the speech section of the input signal is determined according to the comparison result.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech section segmentation device that cuts out speech sections for speech recognition.

[0002]

[Prior Art] Conventionally, a machine has been made to perform a predetermined operation by a person directly operating a handle or the like. Recently, however, attempts have been made to give commands by inputting speech directly to the machine so that it performs the predetermined operation, and various speech recognition devices have been developed for this purpose.

[0003] One conventional speech recognition device of this type operates as shown in FIG. 9. Speech is input from a microphone 1 and A/D-converted by an A/D converter 2. The converted data is supplied to a speech section segmentation unit 3, which cuts out the speech sections. An acoustic analysis unit 4 continuously performs acoustic analysis such as frequency analysis on these sections, converts them into parameters, and supplies them to a comparison unit 5. The comparison unit 5 compares them with the dictionary parameters of a parameter dictionary 6 prepared in advance and outputs the most similar entry to a display unit 7 as the recognition result.

[0004] The recognition result in this case depends on the speech section that is cut out. If the cut-out speech section differs, the parameter values naturally differ as well, so failing to cut out the correct speech section leads to misrecognition.

[0005] When a speech section is actually cut out from speech data, however, the speech input from the microphone 1 contains ambient noise from around the speaker, so it can be difficult to cut out the speech section correctly.

[0006] One conventional speech section segmentation device operates as shown in FIG. 10. The input signal is A/D-converted by an A/D converter 32 using the clock signal from a clock generator 31, and the resulting data is written to a buffer memory 33. In response to the output of a frame data counter 34, an address generation unit 35 outputs addresses in units of a predetermined time section T (one frame), and the speech data is read from the buffer memory 33 and supplied to an amplitude comparison unit 36. The amplitude comparison unit 36 compares the data with an amplitude threshold that an amplitude threshold calculation unit 37 has derived in advance from the output of the buffer memory 33 in a non-speech section (ambient noise) and stored in a threshold buffer 38, and the speech section is detected from the comparison result.

[0007] In such a device, as shown in FIG. 11, an input signal I is compared against an amplitude threshold S set in advance on the basis of the noise in a non-speech section. The point at which the amplitude of the input signal I first exceeds the threshold S is taken as the start point a of the speech. The point at which the amplitude last exceeds the threshold S is taken as point b, and the end point c of the speech is set several points after point b. The interval between points a and c is treated as the speech section.
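The conventional rule described in [0007] can be sketched as follows. This is an illustrative Python sketch, not code from the patent: the function name `conventional_section`, the `tail` parameter (standing in for "several points after point b"), and the use of the absolute sample value as the amplitude are all assumptions made for the example.

```python
def conventional_section(samples, s, tail=3):
    """Return (a, c) for a single amplitude threshold s, or None if no sample exceeds it.

    a: index of the first sample whose amplitude exceeds s
    c: index `tail` samples after the last such sample b (clamped to the signal end)
    `tail` is a hypothetical stand-in for "several points after b".
    """
    over = [i for i, x in enumerate(samples) if abs(x) > s]
    if not over:
        return None
    a, b = over[0], over[-1]
    c = min(b + tail, len(samples) - 1)
    return a, c
```

Because a and b depend on single threshold crossings, one noisy sample above S or one quiet onset below S moves the detected boundary, which is the weakness the following paragraphs discuss.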

[0008]

[Problems to Be Solved by the Invention] In such a speech section segmentation device, however, the amplitude threshold S is set on the basis of the ambient noise in a non-speech section and therefore varies widely. If the threshold S is set rather high, as shown in FIG. 12(a), and a speech signal whose initial portion has a small amplitude is input, that portion may not exceed the threshold S. A later point a' is then erroneously detected as the start of the speech instead of the true start point a, and the correct speech section cannot be cut out. Conversely, if the threshold S is set rather low, as shown in FIG. 12(b), a noise portion that is not speech exceeds the threshold S, so that a point a'' is erroneously detected as the start of the speech instead of the true start point a. In this case, too, the correct speech section cannot be cut out.

[0009] The present invention has been made in view of the above circumstances, and its object is to provide a speech section segmentation device that can correctly cut out a speech section from an input signal containing speech.

[0010]

[Means for Solving the Problems] The present invention comprises storage means for storing an input signal containing speech, comparing means for comparing the input signal with an amplitude threshold, counter means for counting the number of times the input signal exceeds the amplitude threshold, count value comparing means for comparing the count value of the counter means with a count threshold, and speech section determining means for determining the speech section of the input signal from the comparison result of the count value comparing means.

[0011] The amplitude threshold is set by adding a predetermined value to the average amplitude of the noise in a non-speech section of the input signal. The count threshold is set by dividing the non-speech section into predetermined time sections, counting the number of times the input signal exceeds the amplitude threshold in each time section, and adding a predetermined value to the average of these count values.

[0012] The speech section determining means determines the speech section of the input signal on condition that the same state is obtained a plurality of times in succession as the comparison result of the count value comparing means.

[0013]

[Operation] According to the present invention, an amplitude threshold set on the basis of the noise amplitude in a non-speech section of an input signal containing speech and a count threshold set on the basis of the number of times the input signal in the non-speech section exceeds the amplitude threshold are used, and the speech section of the input signal is determined by comparing the number of times the input signal exceeds the amplitude threshold with the count threshold. As a result, the number of times the input signal exceeds the amplitude threshold outside the speech section is clearly smaller than inside it, which prevents noise outside the speech section from being erroneously detected as a speech section.

[0014]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. (First Embodiment) FIG. 1 shows the schematic configuration of this embodiment. In the figure, 11 denotes an A/D converter. The A/D converter 11 receives an input signal containing speech, A/D-converts it using the clock signal from a clock generator 12, and writes the resulting data to a buffer memory 13.

[0015] The clock signal from the clock generator 12 is also supplied to a frame data counter 14. The frame data counter 14 counts the number of data items in a predetermined time section T (one frame). In response to its output, an address generation unit 15 generates addresses, and data is read from the buffer memory 13. When the thresholds described later are being set, addresses are specified at intervals of the time section T. When the start point or end point of a speech section is being determined, addresses are specified while the time section T is shifted along the time axis by a predetermined amount.

[0016] The frame data counter 14 clears its contents each time it has counted the data of one time section T (one frame). The data read from the buffer memory 13 is supplied to an amplitude comparison unit 16. The amplitude comparison unit 16 compares the data A read from the buffer memory 13 with the amplitude threshold At stored in an amplitude threshold buffer 17, and generates an output whenever a data item A exceeds the threshold (A > At).

[0017] The amplitude threshold At in the amplitude threshold buffer 17 is derived by an amplitude threshold calculation unit 18 from the data read from the buffer memory 13 in a non-speech section (ambient noise).

[0018] The output of the amplitude comparison unit 16 is supplied to a counter 19, which counts the number of times the amplitude comparison unit 16 generates its output (A > At). The counter 19 is cleared by the output that the frame data counter 14 generates for each time section T.

[0019] The count value of the counter 19 is supplied to a count value comparison unit 20. The count value comparison unit 20 compares the count value N of the counter 19 with the count threshold Nt stored in a count value threshold buffer 21 and, when N > Nt, outputs the section as a speech section.

[0020] The count threshold Nt in the count value threshold buffer 21 is derived by a count value calculation unit 22 from the count values of the counter 19 in a non-speech section (ambient noise).

[0021] The operation of the embodiment configured as above is described next. First, in a non-speech section containing no speech, the amplitude threshold At and the count threshold Nt are set according to the flowchart shown in FIG. 2.

[0022] In the non-speech section, the output of the buffer memory 13 is supplied to the amplitude threshold calculation unit 18, which obtains the average noise amplitude An (step 201), adds α to An to determine the amplitude threshold At (step 202), and writes At to the amplitude threshold buffer 17. Next, the non-speech section in the buffer memory 13 from which An was obtained is divided into a plurality of time sections T. For each time section T, the amplitude comparison unit 16 compares the data A with At, and the counter 19 counts the number of times A > At, giving a count value N. These count values are supplied to the count value calculation unit 22, which obtains the average count value Nn over the time sections T (step 203), adds β to Nn to determine the count threshold Nt (step 204), and writes Nt to the count value threshold buffer 21.
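Steps 201 to 204 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `count_exceed` and `set_thresholds`, the parameter `frame_len` (the number of samples in one time section T), and the use of the absolute sample value as the amplitude are hypothetical choices for the example. The values of α and β are left to the caller because the text only calls them predetermined values.

```python
def count_exceed(frame, at):
    """Count the samples in one time section whose amplitude exceeds At."""
    return sum(1 for x in frame if abs(x) > at)


def set_thresholds(noise, frame_len, alpha, beta):
    """Derive (At, Nt) from a non-speech recording.

    noise     : samples of the non-speech section
    frame_len : samples per time section T (assumed parameter)
    alpha     : margin added to the average amplitude An
    beta      : margin added to the average count Nn
    """
    an = sum(abs(x) for x in noise) / len(noise)           # step 201
    at = an + alpha                                        # step 202
    # Split the same non-speech section into whole, non-overlapping sections T.
    frames = [noise[i:i + frame_len]
              for i in range(0, len(noise) - frame_len + 1, frame_len)]
    nn = sum(count_exceed(f, at) for f in frames) / len(frames)  # step 203
    nt = nn + beta                                         # step 204
    return at, nt
```

For the noise samples `[1, -1, 3, -1, 1, 1, -3, 1]` with `frame_len=4`, `alpha=1`, and `beta=1`, the average amplitude is 1.5, so At is 2.5; each of the two sections has one sample above At, so Nn is 1.0 and Nt is 2.0.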

[0023] Next, when a speech section containing speech is supplied, its start point is determined according to the flowchart shown in FIG. 3. The data A for the first time section T is read from the buffer memory 13 and compared with the amplitude threshold At by the amplitude comparison unit 16. The counter 19 counts the number of times N that the data A exceeds At (step 301), and the count value is supplied to the count value comparison unit 20 and compared with the count threshold Nt (step 302). If N > Nt, the start of that time section T is determined to be the start point of the speech section (step 303). If N ≤ Nt, the time section T is shifted along the time axis by a predetermined amount (step 304), and the process returns to step 301 to repeat the same operation. The process ends once a time section T with N > Nt appears and the start point of the speech section has been determined.
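The loop of steps 301 to 304 can be sketched as a sliding window over the buffered samples. This is an illustrative Python sketch only: `find_start`, `frame_len` (samples per time section T), and `shift` (the predetermined shift, in samples) are hypothetical names, and returning `None` when no section qualifies is an assumption, since the text does not say what happens if the buffer is exhausted.

```python
def find_start(samples, at, nt, frame_len, shift):
    """Return the index where the first time section with N > Nt begins, or None."""
    pos = 0
    while pos + frame_len <= len(samples):
        frame = samples[pos:pos + frame_len]
        n = sum(1 for x in frame if abs(x) > at)   # step 301: count A > At
        if n > nt:                                 # step 302: compare with Nt
            return pos                             # step 303: start of this section
        pos += shift                               # step 304: shift the section
    return None
```

With `shift` smaller than `frame_len` the sections overlap, so the start point is resolved to within one shift rather than one whole section.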

[0024] Next, the end point of the speech section is determined according to the flowchart shown in FIG. 4. Once it has been confirmed that the start point of the speech section has been determined (step 401), the counter 19 counts, for the time section T immediately after it, the number of times N that the data A exceeds the threshold At (step 402). The count value is supplied to the count value comparison unit 20 and compared with the count threshold Nt (step 403). If N ≤ Nt, the end of that time section T is determined to be the end point of the speech section (step 404). If N > Nt, the time section T is shifted along the time axis by a predetermined amount (step 405), and the process returns to step 402 to repeat the same operation. The process ends once a time section T with N ≤ Nt appears and the end point of the speech section has been determined.
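Steps 402 to 405 mirror the start-point search with the comparison inverted. The Python sketch below is illustrative only: `find_end`, `first_pos` (the index of the first time section to examine after the start point has been fixed), `frame_len`, and `shift` are hypothetical names, and the returned value is the index one past the last sample of the qualifying section, which is one possible reading of "the end of that time section".

```python
def find_end(samples, first_pos, at, nt, frame_len, shift):
    """Return the index just past the first time section with N <= Nt, or None."""
    pos = first_pos
    while pos + frame_len <= len(samples):
        frame = samples[pos:pos + frame_len]
        n = sum(1 for x in frame if abs(x) > at)   # step 402: count A > At
        if n <= nt:                                # step 403: compare with Nt
            return pos + frame_len                 # step 404: end of this section
        pos += shift                               # step 405: shift the section
    return None
```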

[0025] Next, the way in which the thresholds are set and the start point of the speech section is determined for an actual input signal using this device is described. FIG. 5 shows the input signal in a non-speech section containing no speech. The average amplitude An of the data (noise) A is obtained from this signal, and α is added to An to determine the amplitude threshold At. The non-speech section is then divided by the time section T into M sections N0 to NM-1. For each of the sections N0 to NM-1, the number of times N that the data (noise) A exceeds the amplitude threshold At is counted. The counts are averaged over the M time sections T to obtain the average count value Nn per time section T, and β is added to Nn to determine the count threshold Nt.
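Written as formulas, the threshold setting of [0025] reads as below. The symbols A_t, A_n, N_t, N_n, α, β, and M come from the text; the sample notation x_i, the sample count L of the non-speech section, and the reading of N_k as the count obtained in section k are assumptions introduced here for illustration.

```latex
\begin{align}
A_n &= \frac{1}{L}\sum_{i=0}^{L-1} \lvert x_i \rvert, &
A_t &= A_n + \alpha, \\
N_n &= \frac{1}{M}\sum_{k=0}^{M-1} N_k, &
N_t &= N_n + \beta,
\end{align}
% N_k: number of samples in the k-th time section T of the non-speech
% section whose amplitude exceeds A_t.
```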

[0026] FIG. 6 shows the input signal in a speech section containing speech. For the first time section T0, the number N of data items A exceeding the amplitude threshold At is counted, and this count value N is compared with the count threshold Nt. If N ≤ Nt, the time section T0 is shifted along the time axis by a predetermined amount to set a new time section T1, the number N of data items A exceeding At is counted for T1, and N is compared with Nt. If N ≤ Nt again, the time section T1 is shifted further to set a new time section T2, and the operation is repeated. When N > Nt is obtained in a time section Ti, the start of Ti is determined to be the start point of the speech section.

[0027] In this way, the speech section of the input signal is determined by comparing the number of times N that the input signal exceeds the amplitude threshold At with the count threshold Nt, where At is set on the basis of the noise amplitude in the non-speech section of the input signal containing speech and Nt is set on the basis of the number of times the input signal in the non-speech section exceeds At. Because the number of times the input signal exceeds the amplitude threshold outside the speech section is clearly smaller than inside it, noise outside the speech section is reliably prevented from being erroneously detected as a speech section, in contrast to the conventional approach that sets only an amplitude threshold, and the speech section can be cut out correctly from the input sound signal.

[0028] (Second Embodiment) FIG. 7 shows a case in which a relatively large noise NS is present before the start of the speech section and may be misrecognized as the start of the speech section.

[0029] To eliminate this problem, in the second embodiment the start point is not determined immediately even if, for the first time section T0, the count value N of data A exceeding the amplitude threshold At satisfies N > Nt. Instead, the time section T0 is shifted to set a new time section T1, and this operation is repeated. The start point of the speech section is determined only when N > Nt is obtained a predetermined number of times in succession. In FIG. 7, N > Nt holds three times in succession in the noise NS portion, so setting the required number of consecutive occurrences to four or more in advance prevents the noise from being misrecognized as the start of the speech section.
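The consecutive-occurrence condition of the second embodiment can be sketched as a run counter added to the start-point search. This Python sketch is illustrative only: `find_start_consecutive`, `frame_len`, `shift`, and `required` are hypothetical names. The text does not state which section's start is taken once the run is long enough; reporting the start of the first section in the run is an assumption made here.

```python
def find_start_consecutive(samples, at, nt, frame_len, shift, required):
    """Return the start of the first run of `required` consecutive sections with N > Nt."""
    run = 0            # length of the current run of sections with N > Nt
    run_start = None   # index where the current run began
    pos = 0
    while pos + frame_len <= len(samples):
        n = sum(1 for x in samples[pos:pos + frame_len] if abs(x) > at)
        if n > nt:
            if run == 0:
                run_start = pos
            run += 1
            if run >= required:
                return run_start
        else:
            run = 0    # a section with N <= Nt breaks the run
        pos += shift
    return None
```

With `required=1` this reduces to the first embodiment's search, so a short noise burst is accepted as the start; a larger `required` skips bursts shorter than the run length.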

[0030] (Third Embodiment) FIG. 8 shows a case in which a signal portion of small amplitude is present in the middle of the speech section and may be misrecognized as the end of the speech section.

[0031] To eliminate this problem, in the third embodiment the end point is not determined immediately even if, for the first time section T0, the count value N of data A exceeding the amplitude threshold At satisfies N ≤ Nt. Instead, the time section T0 is shifted to set a new time section T1, and this operation is repeated. The end point of the speech section is determined only when N ≤ Nt is obtained a predetermined number of times in succession. In FIG. 8, N ≤ Nt holds twice in succession in the small-amplitude signal portion, so setting the required number of consecutive occurrences to three or more in advance prevents that portion from being misrecognized as the end point of the speech section.
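The third embodiment applies the same run-length idea to the end point. The Python sketch below is illustrative only: `find_end_consecutive`, `first_pos`, `frame_len`, `shift`, and `required` are hypothetical names, and reporting the end of the first section in the qualifying run is an assumption, since the text does not specify which section's end is used.

```python
def find_end_consecutive(samples, first_pos, at, nt, frame_len, shift, required):
    """Return the end of the first run of `required` consecutive sections with N <= Nt."""
    run = 0          # length of the current run of sections with N <= Nt
    run_end = None   # index just past the first section of the current run
    pos = first_pos
    while pos + frame_len <= len(samples):
        n = sum(1 for x in samples[pos:pos + frame_len] if abs(x) > at)
        if n <= nt:
            if run == 0:
                run_end = pos + frame_len
            run += 1
            if run >= required:
                return run_end
        else:
            run = 0  # a section with N > Nt breaks the run
        pos += shift
    return None
```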

[0032] The present invention is not limited to the above embodiments and can be modified as appropriate without departing from its gist. For example, although the thresholds against noise are set in the above embodiments using the absolute value of the amplitude of the input signal, they may instead be set using data obtained by taking the logarithm of the input signal and converting it to power.
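One way to read the logarithmic variant is to map each sample to a decibel-like value before the thresholds are computed and applied. The text gives no formula, so the Python sketch below is an assumption throughout: the name `to_log_power`, the 20·log10 scaling, and the `floor` clamp that avoids taking the logarithm of zero are all hypothetical.

```python
import math


def to_log_power(samples, floor=1e-6):
    """Map each sample to 20*log10(|x|), clamping |x| at `floor` to avoid log(0)."""
    return [20.0 * math.log10(max(abs(x), floor)) for x in samples]
```

Log values of samples below full scale are negative, so code that consumes this data would compare the values directly rather than through an absolute value.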

[0033]

[Effects of the Invention] According to the present invention, an amplitude threshold set on the basis of the noise amplitude in a non-speech section of an input signal containing speech and a count threshold set on the basis of the number of times the input signal in the non-speech section exceeds the amplitude threshold are used, and the speech section of the input signal is determined by comparing the number of times the input signal exceeds the amplitude threshold with the count threshold. Because the number of times the input signal exceeds the amplitude threshold outside the speech section is clearly smaller than inside it, noise outside the speech section is prevented from being erroneously detected as a speech section, and the speech section can be cut out correctly from the input sound signal. Furthermore, because the threshold can be set low, the speech section can be cut out correctly even for speech that begins with a consonant of relatively small amplitude.

[Brief description of drawings]

FIG. 1 is a diagram showing the schematic configuration of an embodiment of the present invention.

FIG. 2 is a flowchart for explaining the operation of the embodiment.

FIG. 3 is a flowchart for explaining the operation of the embodiment.

FIG. 4 is a flowchart for explaining the operation of the embodiment.

FIG. 5 is a diagram explaining how the thresholds are set for an actual input signal.

FIG. 6 is a diagram explaining how the start point of a speech section is determined for an actual input signal.

FIG. 7 is a diagram for explaining the second embodiment of the present invention.

FIG. 8 is a diagram for explaining the third embodiment of the present invention.

FIG. 9 is a diagram showing the schematic configuration of a conventional speech recognition device.

FIG. 10 is a diagram showing the schematic configuration of a conventional speech section segmentation device.

FIG. 11 is a diagram for explaining a conventional speech section segmentation device.

FIG. 12 is a diagram for explaining a conventional speech section segmentation device.

[Explanation of symbols]

11: A/D converter, 12: clock generator, 13: buffer memory, 14: frame data counter, 15: address generation unit, 16: amplitude comparison unit, 17: threshold buffer, 18: amplitude threshold calculation unit, 19: counter, 20: count value comparison unit, 21: count value threshold buffer, 22: count value calculation unit.

Claims

[Claims]

1. A speech section segmentation device comprising: storage means for storing an input signal containing speech; comparing means for comparing the input signal with an amplitude threshold; counter means for counting the number of times the input signal exceeds the amplitude threshold; count value comparing means for comparing the count value of the counter means with a count threshold; and speech section determining means for determining the speech section of the input signal from the comparison result of the count value comparing means.

2. The speech section segmentation device according to claim 1, wherein the amplitude threshold is set by adding a predetermined value to the average amplitude of noise in a non-speech section of the input signal.

3. The speech section segmentation device according to claim 1, wherein the count threshold is set by dividing a non-speech section into predetermined time sections, counting the number of times the input signal exceeds the amplitude threshold in each of these time sections, and adding a predetermined value to the average of these count values.

4. The speech section segmentation device, wherein the speech section determining means determines the speech section of the input signal on condition that the same state is obtained a plurality of times in succession as the comparison result of the count value comparing means.