JP2002091470A

JP2002091470A - Voice section detection device

Info

Publication number: JP2002091470A
Application number: JP2000286011A
Authority: JP
Inventors: Toshitaka Yamato; 俊孝大和; Hideki Kitao; 英樹北尾; Shinichi Iwamoto; 真一岩本; Osamu Iwata; 收岩田; Masataka Nakamura; 正孝中村; Yoshihisa Omoto; 芳尚大元
Original assignee: Denso Ten Ltd; Tsuru Gakuen
Current assignee: Denso Ten Ltd; Tsuru Gakuen
Priority date: 2000-09-20
Filing date: 2000-09-20
Publication date: 2002-03-27

Abstract

(57) [Abstract] [Problem] To provide a voice section detection device capable of reliably detecting a voice section even for words containing a geminate consonant (sokuon) or words containing consecutive sa-row or ha-row sounds. [Solution] An audio signal detected by a microphone 21 is amplified by a line amplifier 22, digitized by an analog/digital conversion unit 23, and stored in a storage unit 24. The stored audio signal is read into a pitch detection unit 25, which extracts the voice pitch by time-domain processing. A gate signal generation unit 26 controls a gate signal based on this voice pitch, and a voice section signal generation unit 28 controls a voice section signal based on the gate signal. Words can be extracted by dividing the audio signal stored in the storage unit according to the voice section signal.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a voice section detection device, and more particularly to a voice section detection device capable of reliably detecting a voice section even for words containing a geminate consonant (sokuon) or words containing consecutive sa-row or ha-row sounds.

[0002]

[Prior Art] In speech recognition, the voice section to be recognized must be extracted from the time-series signal captured by a microphone. A method has been proposed that treats as the voice section any period in which the short-time power of the speech is at or above a predetermined threshold, but when the goal is to recognize many kinds of words spoken by unspecified speakers, it has been difficult to achieve sufficient accuracy with this method.

[0003] The present applicant has already proposed a pitch period extraction apparatus and method that detects the pitch (the height of the voice) from a speech signal in the time domain with high accuracy (Japanese Patent Laid-Open No. 9-50297), and the voice section can also be determined on the basis of this pitch period.

[0004]

[Problem to Be Solved by the Invention] However, for a word A containing a geminate consonant (for example, "chisso", "nitrogen"), a word B containing consecutive sa-row sounds (for example, "sushiya", "sushi restaurant"), or a word C containing consecutive ha-row sounds (for example, "hifuka", "dermatology"), it has not been possible to rule out erroneous detection in which the sounds making up the word are not all detected as one continuous voice section.

[0005] FIG. 1 shows voice section detection results obtained with the conventional pitch-period-based method: (a) for word A, (b) for word B, and (c) for word C. In each panel, the upper trace is the audio signal and the lower trace is the detected voice section. As the figure shows, for word A the leading sound ("chit" in "chisso") is detected within the voice section, but the trailing sound ("so" in "chisso") is not.

[0006] For "sushiya" the voice section breaks between "sushi" and "ya", and for "hifuka" it breaks between "hifu" and "ka", so neither word is detected as a single voice section. The following are considered causes of this erroneous detection.
A: In word A the fricative "so" that follows the geminate (small "tsu"), and in word B the fricative "shi" that follows the sa-row sound "su", are low in level and also hard to distinguish from noise, so their pitch period is difficult to detect in the first place.

[0007] B: When no aspirated or noise portion precedes the word and the pitch is low, the pitch period cannot be detected.
C: In word C, the silent interval between the ha-row sounds ("hifu" in "hifuka") and the following sound ("ka") is long.
D: Noise during silent intervals.
The present invention has been made in view of the above problems, and its object is to provide a voice section detection device capable of reliably detecting a voice section even for words containing a geminate consonant or words containing consecutive sa-row or ha-row sounds.

[0008]

[Means for Solving the Problem] The voice section detection device according to the first invention comprises: preprocessing means for removing noise contained in an audio signal; voice pitch extraction means for extracting a voice pitch signal from the audio signal from which noise has been removed by the preprocessing means; gate signal generation means for generating a gate signal based on the voice pitch extracted by the voice pitch extraction means; and voice section signal generation means for generating a voice section signal based on the gate signal generated by the gate signal generation means.

[0009] In this invention, the gate signal is controlled based on the voice pitch extracted from the audio signal, and the voice section signal is controlled based on this gate signal. The voice section detection device according to the second invention further comprises audio signal dividing means for dividing the audio signal from which noise has been removed by the preprocessing means into a plurality of audio signals, based on the voice section signal generated by the voice section signal generation means.

[0010] In this invention, the audio signal is divided into a plurality of sections by the voice section signal. In the voice section detection device according to the third invention, the voice pitch extraction means comprises: subtraction processing means for applying, to the audio signal from which noise has been removed by the preprocessing means, subtraction processing that removes audio signal components at or below a predetermined amplitude; amplitude equalizing means for bringing the amplitude of the subtraction-processed audio signal to a substantially constant level; negative peak emphasizing means for detecting, in the audio signal equalized to a substantially constant amplitude, a positive peak and the negative peak that follows it, and subtracting the positive peak from the negative peak to generate an audio signal in which the negative peak is emphasized; and differentiating means for applying detection processing to the audio signal whose negative peak has been emphasized by the negative peak emphasizing means and differentiating the detected signal.

[0011] In this invention, the voice pitch is extracted by time-domain processing. In the voice section detection device according to the fourth invention, the subtraction processing means comprises: envelope difference calculating means for calculating the positive-side envelope and the negative-side envelope of the audio signal from which noise has been removed by the preprocessing means, and calculating the envelope difference, which is the difference between the positive-side and negative-side envelopes; subtraction threshold calculating means for calculating a subtraction threshold by multiplying the envelope difference calculated by the envelope difference calculating means by a predetermined coefficient; and subtraction threshold subtracting means for subtracting the subtraction threshold from the amplitude of the audio signal when the amplitude of the audio signal from which noise has been removed by the preprocessing means is at or above the subtraction threshold calculated by the subtraction threshold calculating means.

[0012] In this invention, a predetermined multiple of the envelope difference of the audio signal is used as the subtraction threshold. In the voice section detection device according to the fifth invention, the subtraction processing means further comprises zero setting means for setting the amplitude of the audio signal to zero when the amplitude of the audio signal from which noise has been removed by the preprocessing means is less than the subtraction threshold calculated by the subtraction threshold calculating means.

[0013] In this invention, the amplitude of the audio signal is set to zero when it is at or below the subtraction threshold. In the voice section detection device according to the sixth invention, the amplitude equalizing means comprises: envelope difference calculating means for calculating the positive-side and negative-side envelopes of the audio signal from which noise has been removed by the preprocessing means, and calculating the envelope difference between them; maximum envelope difference holding means for holding the largest of the envelope differences calculated up to the present by the envelope difference calculating means; and equalizing gain calculating means for calculating an equalizing gain by dividing the maximum envelope difference held by the maximum envelope difference holding means by the current envelope difference.

[0014] In this invention, the equalizing gain is determined based on the envelope difference of the audio signal. In the voice section detection device according to the seventh invention, the amplitude equalizing means further comprises unity gain setting means for setting the equalizing gain to unity when the equalizing gain calculated by the equalizing gain calculating means is at or above a predetermined threshold. In this invention, when the equalizing gain is at or above the predetermined threshold, it is set to unity gain.

[0015] In the voice section detection device according to the eighth invention, the gate signal generation means comprises gate signal opening means for opening the gate signal when the average of a predetermined number of consecutive voice pitches extracted by the voice pitch extraction means becomes equal to or greater than a predetermined gate open threshold. In this invention, the gate signal is opened when the average of the predetermined number of voice pitches is at or above the gate open threshold.

[0016] In the voice section detection device according to the ninth invention, the gate signal generation means further comprises gate signal open maintaining means which, once the gate signal has been opened by the gate signal opening means, keeps the gate signal open as long as the average of the predetermined number of consecutive voice pitches extracted by the voice pitch extraction means is at or above a predetermined gate close threshold that is smaller than the gate open threshold.

[0017] In this invention, the gate signal is kept open as long as the average of the predetermined number of consecutive voice pitches is at or above the gate close threshold. In the voice section detection device according to the tenth invention, the gate signal generation means further comprises gate signal closing means for closing the gate signal when the average of the predetermined number of consecutive voice pitches extracted by the voice pitch extraction means falls below the gate open threshold.
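Read together with [0018], claims 8 to 10 describe a hysteresis gate: it opens when the running average of n consecutive pitch values reaches the open threshold and, once open, stays open until that average falls below the smaller close threshold. The Python sketch below illustrates that reading. The window length n, both threshold values, and treating each pitch as a plain number are assumptions for illustration; the text does not specify them.

```python
from collections import deque


def gate_signal(pitches, n=4, open_thr=0.5, close_thr=0.3):
    """Hysteresis gate driven by the running average of the last n pitch values.

    Returns one boolean per input value (True = gate open). The window
    length n and both thresholds are illustrative placeholders.
    """
    if close_thr >= open_thr:
        raise ValueError("close threshold must be smaller than open threshold")
    window = deque(maxlen=n)  # holds the n most recent pitch values
    is_open = False
    states = []
    for p in pitches:
        window.append(p)
        if len(window) == n:
            avg = sum(window) / n
            if is_open:
                # Ninth invention: stay open while avg >= close threshold.
                is_open = avg >= close_thr
            else:
                # Eighth invention: open once avg >= open threshold.
                is_open = avg >= open_thr
        states.append(is_open)
    return states
```

Because the close threshold sits below the open threshold, a short dip in the average does not immediately close the gate, while a value between the two thresholds cannot open a closed gate on its own.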

[0018] In this invention, the gate signal is closed when the average of the voice pitches falls below the gate close threshold. In the voice section detection device according to the eleventh invention, the voice section signal generation means comprises: first period timing means for timing a predetermined first period from the moment the gate signal generated by the gate signal generation means opens; and voice section signal opening means for opening the voice section signal retroactively, going back a predetermined second period from the moment the first period timed by the first period timing means ends.

[0019] In this invention, when the gate signal stays open continuously for the first period, the voice section signal is opened retroactively, going back the second period from the moment the first period elapses. In the voice section detection device according to the twelfth invention, the voice section signal generation means further comprises: third period timing means for timing a predetermined third period from the moment the gate signal generated by the second gate signal generation means closes; and voice section signal closing means for closing the voice section signal when the third period timed by the third period timing means ends.

[0020] In this invention, the voice section signal is closed once the third period has elapsed after the gate signal closes. In the voice section detection device according to the thirteenth invention, the voice section signal generation means further comprises voice section signal open state maintaining means for keeping the voice section signal open when, before the third period timed by the third period timing means has ended, the voice section signal opening means opens the voice section signal retroactively by the second period.

[0021] In this invention, the voice section signal is kept open when the third period and the second period overlap.
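Paragraphs [0018] to [0021] can be sketched offline over a recorded gate signal; an offline pass is reasonable here because the samples are buffered in the storage unit 24, which is what makes a retroactive opening possible at all. In this Python sketch the periods t1, t2 and t3 are counted in samples, their values are hypothetical, and the function name voice_section is illustrative only.

```python
def voice_section(gate, t1, t2, t3):
    """Offline sketch of the voice section signal built from a sampled gate signal.

    gate: list of booleans (one per sample), True while the gate is open.
    t1, t2, t3: first, second and third periods, in samples.
    """
    n = len(gate)
    section = [False] * n
    i = 0
    while i < n:
        if not gate[i]:
            i += 1
            continue
        start = i
        while i < n and gate[i]:
            i += 1
        end = i  # gate closes at index `end` (or the data runs out)
        if end - start < t1:
            continue  # first period never completed: section is not opened
        on = max(0, start + t1 - t2)  # open retroactively t2 samples before t1 expires
        off = min(n, end + t3)        # close t3 samples after the gate closes
        for k in range(on, off):
            # OR-ing keeps the section open where windows overlap (thirteenth invention)
            section[k] = True
    return section
```

With t2 larger than t1 the section begins before the gate opened, and t3 bridges short gaps between gate openings; presumably this is how weak onsets and silences such as the one in "hifu-ka" are absorbed into one section.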

[0022]

[Embodiments of the Invention] FIG. 2 is a functional block diagram of the voice section detection device according to the present invention. An audio signal converted into an electrical signal by a microphone 21 is amplified by a line amplifier 22; the analog/digital conversion unit 23 then samples it at every predetermined sampling time Δt, converts it into a digital signal, and stores it in the storage unit 24.

[0023] The gate signal generation unit 26 generates a gate signal based on the pitch detected by the pitch detection unit 25, and the voice section signal generation unit 27 generates a voice section signal based on the gate signal generated by the gate signal generation unit 26. The word extraction unit 28 processes the digital signal stored in the storage unit 24 according to the voice section signal generated by the voice section signal generation unit 27, and extracts and outputs the words contained in the voice section.
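The division performed by the word extraction unit 28 can be illustrated as splitting the stored samples into the runs where the voice section signal is open. This is a minimal Python sketch assuming one boolean section flag per stored sample; the function name is hypothetical, and the text does not say how the unit formats its output.

```python
def extract_words(samples, section):
    """Split stored samples into the contiguous runs where section is True."""
    if len(samples) != len(section):
        raise ValueError("samples and section must have equal length")
    words, current = [], []
    for x, is_on in zip(samples, section):
        if is_on:
            current.append(x)
        elif current:
            # The section just closed: emit the finished segment.
            words.append(current)
            current = []
    if current:
        # A section still open at the end of the data is emitted as well.
        words.append(current)
    return words
```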

[0024] In this embodiment, the analog/digital conversion unit 23, storage unit 24, pitch detection unit 25, gate signal generation unit 26, voice section signal generation unit 27 and word extraction unit 28 are implemented by, for example, a personal computer, and the pitch detection unit 25, first gate generation unit 26, second gate generation unit 27 and word extraction unit 28 are implemented in software.

[0025] FIG. 3 is a flowchart of the voice sampling routine executed by the analog/digital conversion unit 23 and the storage unit 24; it runs as an interrupt routine every sampling time Δt. First, in step 30, the audio signal V sampled by the analog/digital conversion unit 23 is read, and in step 31 preprocessing is applied to V; the preprocessing is described in detail later.

[0026] In step 32, an index i indicating the storage order in the storage unit 24 is set to 1, and in steps 33 to 35 the audio samples X(i) already stored in the storage unit 24 are shifted forward by one position:

$$X(i+1) \leftarrow X(i)$$

When the shift is complete, the audio signal V just read is stored at the first address X(1) of the storage unit 24, and the routine ends.
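The shift-and-store routine of steps 32 to 35 keeps the newest sample at X(1). A minimal Python sketch of the same newest-first buffer follows; it uses collections.deque with a fixed maxlen, which discards the oldest sample automatically and avoids copying every element on each interrupt. An in-place shift X(i+1) ← X(i) would have to run from the oldest end toward X(1) so that no sample is overwritten before it is moved. The capacity and the preprocess hook are assumptions; the text does not state the buffer length.

```python
from collections import deque


class SampleStore:
    """Newest-first buffer: x(1) is the most recent sample, as in X(1) <- V."""

    def __init__(self, capacity):
        # With maxlen set, appendleft() drops the oldest item from the right end.
        self.buf = deque(maxlen=capacity)

    def on_sample(self, v, preprocess=lambda x: x):
        # preprocess is a hypothetical hook standing in for step 31 (identity here).
        self.buf.appendleft(preprocess(v))

    def x(self, i):
        """1-based access matching the text's X(i) indexing."""
        return self.buf[i - 1]
```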

[0027] FIG. 4 is a detailed flowchart of the preprocessing routine executed in step 31. In step 310, high-frequency noise removal is applied to the digital signal, using for example a low-pass filter with a cutoff frequency of 4 kHz and a roll-off of 18 dB/oct. In step 311, low-frequency noise removal is applied to the digital signal from which the high-frequency noise has been removed, using for example a high-pass filter with a cutoff frequency of 300 Hz and a roll-off of 18 dB/oct.
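An 18 dB/oct roll-off corresponds to a third-order slope, since each first-order section contributes 6 dB/oct. The Python sketch below approximates steps 310 and 311 by cascading three first-order RC-style low-pass sections at 4 kHz, then three first-order high-pass sections at 300 Hz. This is an assumption: the text does not name a filter type, a cascade of identical first-order sections is not a Butterworth design, and its response is already roughly 9 dB down at each nominal cutoff. The sampling rate fs and the function names are likewise hypothetical.

```python
import math


def lowpass1(x, fc, fs):
    """First-order (6 dB/oct) RC-style low-pass: y[n] = y[n-1] + a*(x[n] - y[n-1])."""
    rc = 1.0 / (2 * math.pi * fc)
    dt = 1.0 / fs
    a = dt / (rc + dt)
    y, prev = [], 0.0
    for v in x:
        prev = prev + a * (v - prev)
        y.append(prev)
    return y


def highpass1(x, fc, fs):
    """First-order (6 dB/oct) RC-style high-pass: y[n] = a*(y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2 * math.pi * fc)
    dt = 1.0 / fs
    a = rc / (rc + dt)
    y, prev_y, prev_x = [], 0.0, 0.0
    for v in x:
        prev_y = a * (prev_y + v - prev_x)
        prev_x = v
        y.append(prev_y)
    return y


def preprocess(x, fs, lp_fc=4000.0, hp_fc=300.0, order=3):
    # Three cascaded 6 dB/oct sections give an 18 dB/oct asymptotic slope.
    for _ in range(order):
        x = lowpass1(x, lp_fc, fs)   # step 310: high-frequency noise removal
    for _ in range(order):
        x = highpass1(x, hp_fc, fs)  # step 311: low-frequency noise removal
    return x
```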

[0028] In the above embodiment, the high-frequency and low-frequency noise removal are performed in software, but a hardware filter may instead be built into the line amplifier 22. FIG. 5 is a detailed flowchart of the pitch detection routine executed by the pitch detection unit 25; in step 50, the audio signal X(i) stored in the storage unit 24 is read.

[0029] Next, subtraction processing is executed in step 51, AGC processing in step 52, and peak detection in step 53. Then extreme-value detection and clamping is executed in step 54 and pitch period detection in step 55, and the routine ends. Steps 51 to 55 are described in detail below.

[0030] FIG. 6 is a flowchart of the subtraction processing routine executed in step 51 of the pitch detection routine. Its purpose is to remove components at or below a predetermined amplitude so that, in the AGC processing that equalizes the amplitude of the audio signal, even tiny noise components are not amplified. First, in step 51a, the envelope value difference ΔE is calculated; this is described in detail with reference to FIG. 7.

[0031] In step 51b, it is determined whether the envelope value difference ΔE is less than a predetermined amplitude removal threshold r. If it is (ΔE < r), the audio signal X(i) is set to 0 in step 51c and the routine proceeds to step 51d. If it is not (ΔE ≥ r), the routine proceeds directly to step 51d.

[0032] In step 51d, it is determined whether the current positive-side envelope value E_p is greater than the previous positive-side envelope value E_pb. If it is, i.e., the positive-side envelope is increasing, the index s is set to 1 in step 51e and the routine proceeds to step 51g.

[0033] Conversely, if the determination in step 51d is negative, i.e., the current positive-side envelope value E_p is not greater than the previous value E_pb and the positive-side envelope is decreasing, the index s is set to 0 in step 51f and the routine proceeds to step 51g. Step 51g checks whether the previous value s_b of the index s is 1 and the current index s is 0, i.e., whether a positive-side peak has been detected.

[0034] If step 51g is affirmative, i.e., a positive-side peak has been detected, the subtraction threshold bc is calculated in step 51h by the following equation, and the routine proceeds to step 51i.

$$\mathit{bc} \leftarrow \alpha \cdot \Delta E$$

Here α is a predetermined constant; when the voice section detection device according to the present invention is used in a vehicle cabin, it can be fixed at 0.05.

[0035] Conversely, if step 51g is negative, i.e., no positive-side peak has been detected, the routine proceeds directly to step 51i. Step 51i determines whether the audio signal X(i) is at or above the subtraction threshold bc, i.e., whether the amplitude of X(i) is large. If it is, then in step 51j the value obtained by subtracting bc from X(i) is set as the subtraction-processed audio signal X_S(i), and the routine proceeds to step 51l.

[0036]

$$X_S(i) \leftarrow X(i) - \mathit{bc}$$

If step 51i is negative, i.e., the amplitude of X(i) is less than bc, X_S(i) is set to zero in step 51k and the routine proceeds to step 51l. Alternatively, step 51k may be omitted so that a negative determination in step 51i leads directly to step 51l.
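Steps 51b to 51l can be sketched per sample as follows, with the envelope values supplied by the caller. Read literally, step 51i compares X(i) itself with bc, which would zero every negative sample; because the third invention later works on negative peaks, this sketch assumes by default that the comparison is on the magnitude |X(i)| (symmetric center clipping) and keeps the literal reading behind symmetric=False. The class and parameter names, the default r, and the zero initial state are hypothetical; α = 0.05 is the in-vehicle value given in [0034].

```python
import math


class SubtractionStage:
    """Per-sample sketch of subtraction steps 51b to 51l (FIG. 6).

    The caller supplies the current envelope values E_p and E_m.
    Default r and the zero initial state are assumptions.
    """

    def __init__(self, alpha=0.05, r=0.0, symmetric=True):
        self.alpha = alpha          # coefficient alpha in bc <- alpha * dE
        self.r = r                  # amplitude-removal threshold r
        self.symmetric = symmetric  # assumption: compare |X(i)| with bc
        self.bc = 0.0               # subtraction threshold bc
        self.e_pb = 0.0             # previous positive-side envelope E_pb
        self.s_b = 0                # previous index s_b

    def step(self, x, e_p, e_m):
        delta_e = e_p - e_m
        if delta_e < self.r:                  # steps 51b/51c
            x = 0.0
        s = 1 if e_p > self.e_pb else 0       # steps 51d to 51f
        if self.s_b == 1 and s == 0:          # step 51g: positive envelope peak
            self.bc = self.alpha * delta_e    # step 51h
        if self.symmetric:
            mag = abs(x)
            x_s = math.copysign(mag - self.bc, x) if mag >= self.bc else 0.0
        else:
            x_s = x - self.bc if x >= self.bc else 0.0  # literal steps 51i to 51k
        # Step 51l: E_mb is tracked by the caller's envelope routine here.
        self.e_pb, self.s_b = e_p, s
        return x_s
```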

[0037] Finally, in step 51l, the previous positive-side envelope value E_pb, the previous negative-side envelope value E_mb and the previous index s_b are updated, and the routine ends:

$$E_{pb} \leftarrow E_p,\qquad E_{mb} \leftarrow E_m,\qquad s_b \leftarrow s$$

FIG. 7 is a flowchart of the envelope value difference calculation routine executed in step 51a of the subtraction processing routine. In step a1, the current positive-side envelope value E_p is calculated by the following equation.

[0038]

$$E_p = E_{pb}\,\exp\!\left(-\frac{1}{\tau f_s}\right)$$

Here τ is a time constant and f_s is the sampling frequency. In step a2, the current negative-side envelope value E_m is calculated by the following equation.

$$E_m = E_{mb}\,\exp\!\left(-\frac{1}{\tau f_s}\right)$$

In step a3, the larger of the subtraction-processed audio signal X_S(i) and the positive-side envelope value E_p calculated in step a1 becomes the new current positive-side envelope value E_p.

[0039] In step a4, the smaller of the subtraction-processed audio signal X_S(i) and the negative-side envelope value E_m calculated in step a2 becomes the new current negative-side envelope value E_m. In step a5, the envelope value difference ΔE is calculated by the following equation, and the routine ends.

$$\Delta E = E_p - E_m$$

FIG. 8 illustrates the effect of the subtraction processing: (a) shows the audio signal before subtraction and (b) the audio signal after subtraction. The figure shows that the subtraction processing removes small noise components.
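The envelope routine of FIG. 7 behaves as a peak-hold detector with exponential decay: each envelope shrinks by the factor exp(−1/(τ·f_s)) per sample and is pushed outward whenever a sample exceeds it. A minimal Python sketch follows, assuming both envelopes start at zero; the text gives no initial values and no values for τ or f_s, and the class name is hypothetical.

```python
import math


class EnvelopeTracker:
    """Peak-hold envelopes with exponential decay, returning dE = E_p - E_m."""

    def __init__(self, tau, fs):
        self.decay = math.exp(-1.0 / (tau * fs))  # per-sample decay factor
        self.ep = 0.0  # positive-side envelope E_p (initial value assumed 0)
        self.em = 0.0  # negative-side envelope E_m (initial value assumed 0)

    def update(self, xs):
        # Steps a1/a2: both envelopes decay toward zero.
        ep = self.ep * self.decay
        em = self.em * self.decay
        # Steps a3/a4: a sample beyond the decayed envelope replaces it.
        self.ep = max(xs, ep)
        self.em = min(xs, em)
        # Step a5: envelope value difference.
        return self.ep - self.em
```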

[0040] FIG. 9 is a flowchart of the AGC processing routine executed in step 52 of the pitch detection routine; its purpose is to equalize the amplitude of the subtraction-processed audio signal X_S(i). First, in step 52a, the maximum envelope value difference ΔE_max is initialized to 0, and in step 52b the envelope value difference calculation routine shown in FIG. 7 is executed to calculate ΔE. In this case, X(i) in steps a3 and a4 of that routine is of course replaced by X_S(i).

[0041] Next, in step 52c, it is determined whether

$$X_S(i-2) < X_S(i-1),\qquad X_S(i) < X_S(i-1),\qquad X_S(i-1) > 0$$

all hold, i.e., whether the subtraction-processed audio signal X_S(i−1), sampled Δt earlier, is a positive peak.

[0042] If step 52c is affirmative, i.e., X_S(i−1) is a positive peak, then in step 52d the maximum envelope value difference ΔE_max is updated to the larger of the envelope value difference ΔE and the previously determined ΔE_max, and the routine proceeds to step 52e. If step 52c is negative, i.e., X_S(i−1) is not a positive peak, the routine proceeds directly to step 52e.

[0043] Step 52e determines whether the envelope value difference $\Delta E$ calculated in step 52b is "0". If it is not, step 52f sets the gain $G$ to $\Delta E_{max}/\Delta E$. Step 52g then determines whether $G$ is equal to or larger than a predetermined threshold $\beta$ (for example, 10). If it is, step 52h sets $G$ to "1" and the routine proceeds to step 52i. The determination in step 52g may be omitted, in which case the process proceeds directly from step 52f to step 52i.

[0044] Conversely, when the determination in step 52g is negative, that is, when $G$ is less than $\beta$, the flow proceeds directly to step 52i. When the determination in step 52e is affirmative, that is, when $\Delta E$ is "0", step 52h likewise sets $G$ to "1" and the flow proceeds to step 52i. Finally, step 52i multiplies the subtracted audio signal $X_S(i-1)$ by the gain $G$ to obtain the AGC-processed audio signal $X_G(i-1)$, and this routine ends.

[0045]

$$X_G(i-1) \leftarrow G \cdot X_S(i-1)$$

FIG. 10 is an explanatory view of the effect of the AGC processing: (a) shows the audio signal before AGC processing and (b) the audio signal after it. When the amplitude of the voice waveform changes abruptly, as in (a), erroneous detections cannot be avoided in the pitch period detection described later. AGC processing brings the voice waveform to a substantially constant amplitude, which prevents these erroneous detections.
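Steps 52a to 52i can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The envelope value difference routine of FIG. 7 is not reproduced here, so the hypothetical function `agc` takes one precomputed $\Delta E$ value per sample. Step 52a, which clears $\Delta E_{max}$, is assumed to run once before processing starts.

```python
def agc(x_s, delta_e, beta=10.0):
    """Sketch of the AGC routine of FIG. 9 (steps 52a-52i).

    x_s     -- subtracted audio signal X_S(i)
    delta_e -- envelope value difference per sample (assumed precomputed)
    beta    -- gain threshold of step 52g
    """
    x_g = list(x_s)            # edge samples pass through with unit gain
    de_max = 0.0               # step 52a (assumed to be done once)
    for i in range(2, len(x_s)):
        de = delta_e[i]        # step 52b: envelope value difference at time i
        # Step 52c: is X_S(i-1) a positive peak?
        if x_s[i - 2] < x_s[i - 1] and x_s[i] < x_s[i - 1] and x_s[i - 1] > 0:
            de_max = max(de_max, de)          # step 52d
        if de == 0:
            g = 1.0                           # step 52e -> step 52h
        else:
            g = de_max / de                   # step 52f
            if g >= beta:
                g = 1.0                       # step 52g -> step 52h
        x_g[i - 1] = g * x_s[i - 1]           # step 52i
    return x_g
```

Resetting $G$ to 1 once it reaches $\beta$ prevents near-silent stretches, where $\Delta E$ is tiny, from being amplified into noise.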

[0046] FIG. 11 is a detailed flowchart of the peak detection routine executed in step 53 of the pitch detection routine. Step 53a determines whether a positive peak has been detected in the AGC-processed audio signal. $X_G(i-2)$ is judged to be a positive peak when

$$X_G(i-3) < X_G(i-2),\quad X_G(i-1) < X_G(i-2),\quad 0 < X_G(i-2).$$

When the determination in step 53a is affirmative, that is, when a positive peak has been detected in the AGC-processed audio signal, step 53b stores the peak value $X_G(i-2)$ as $P$, and this routine ends.

[0047] When the determination in step 53a is negative, that is, when no positive peak is detected in the AGC-processed audio signal, this routine ends immediately. FIG. 12 is a detailed flowchart of the extremum detection and clamp routine executed in step 54 of the pitch detection routine. Step 54a determines whether a negative peak has been detected in the AGC-processed audio signal $X_G$. $X_G(i-2)$ is judged to be a negative peak when the following condition is satisfied.

[0048]

$$X_G(i-3) > X_G(i-2),\quad X_G(i-1) > X_G(i-2),\quad 0 > X_G(i-2)$$

When the determination in step 54a is affirmative, that is, when a negative peak has been detected in the AGC-processed audio signal, step 54b subtracts the peak value $P$ from $X_G(i-2)$. The result is the clamped audio signal $X_C(i-2)$, in which the negative peak is emphasized, and this routine ends.

[0049]

$$X_C(i-2) \leftarrow X_G(i-2) - P$$

When the determination in step 54a is negative, that is, when no negative peak is detected in the AGC-processed audio signal, step 54c uses $X_G(i-2)$ unchanged as the clamped audio signal $X_C(i-2)$, and this routine ends.
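Taken together, the peak detection of FIG. 11 and the clamp of FIG. 12 can be sketched as a single pass over the AGC output. This is a hedged illustration, not the patented code. The hypothetical function `clamp_negative_peaks` assumes an initial $P$ of 0, which the text does not specify. Subtracting the most recent positive peak $P$ pushes each negative peak further down by the height of that positive peak, and this is the feature the pitch period detector relies on.

```python
def clamp_negative_peaks(x_g):
    """Sketch of FIG. 11 (peak detection) followed by FIG. 12 (clamp)."""
    x_c = list(x_g)        # step 54c: unchanged unless a negative peak is found
    p = 0.0                # latest positive peak P (initial value assumed 0)
    for i in range(3, len(x_g) + 1):
        a, b, c = x_g[i - 3], x_g[i - 2], x_g[i - 1]
        if a < b and c < b and b > 0:      # step 53a: X_G(i-2) is a positive peak
            p = b                          # step 53b: store it as P
        if a > b and c > b and b < 0:      # step 54a: X_G(i-2) is a negative peak
            x_c[i - 2] = b - p             # step 54b: X_C(i-2) = X_G(i-2) - P
    return x_c
```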

[0050]

$$X_C(i-2) \leftarrow X_G(i-2) \qquad \text{(step 54c)}$$

FIG. 13 is a detailed flowchart of the pitch period detection routine executed in step 55 of the pitch detection routine. Step 55a calculates the detected output $X_D(i-3)$ by the following equation:

$$X_D(i-3) \leftarrow E \cdot \exp(-\Delta t/\tau)$$

where $\Delta t$ is the sampling time and $\tau$ is a predetermined time constant.

[0051] $E$ is described below. Step 55b determines whether the absolute value of the clamped audio signal $X_C(i-3)$ is greater than the absolute value of the detected output $X_D(i-3)$. When the determination in step 55b is negative, that is, when $|X_C(i-3)|$ is less than or equal to $|X_D(i-3)|$, step 55c stores the detected output $X_D(i-3)$ as $E$, and the routine proceeds to step 55f.

[0052] When the determination in step 55b is affirmative, that is, when $|X_C(i-3)|$ is greater than $|X_D(i-3)|$, step 55d determines whether the clamped audio signal has a negative peak. $X_C(i-3)$ is judged to be a negative peak when

$$X_C(i-4) > X_C(i-3),\quad X_C(i-2) > X_C(i-3),\quad 0 > X_C(i-3).$$

When the determination in step 55d is affirmative, that is, when a negative peak is detected in the clamped audio signal, step 55e stores the negative peak value $X_C(i-3)$ as $E$, and the routine proceeds to step 55f. When the determination in step 55d is negative, that is, when no negative peak is detected in the clamped audio signal, the process proceeds to step 55c.

[0053] Step 55f sets the value stored as $E$ as the post-detection signal $X_D(i-3)$. Step 55g then calculates the change in the post-detection signal, $\Delta X_D$, by

$$\Delta X_D \leftarrow X_D(i-3) - X_D(i-4).$$

Step 55h determines whether the absolute value of $\Delta X_D$ is equal to or larger than a predetermined threshold $\gamma$.

[0054] When the determination in step 55h is affirmative, that is, when the post-detection output has dropped sharply, step 55i sets the voice pitch signal $X_P(i-3)$ to "−1", and this routine ends. Conversely, when the determination in step 55h is negative, that is, when the post-detection output has not dropped sharply, step 55j sets $X_P(i-3)$ to "0", and this routine ends.
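Steps 55a to 55j describe a peak-hold detector with exponential decay. A minimal Python sketch follows. The text gives no values for $\tau$, $\gamma$ or the initial $E$, so those, and the hypothetical function name `detect_pitch_pulses`, are assumptions. The loop index `k` stands for the patent's $(i-3)$.

```python
import math


def detect_pitch_pulses(x_c, dt, tau, gamma):
    """Sketch of the pitch period detection routine of FIG. 13."""
    n = len(x_c)
    x_d = [0.0] * n            # post-detection signal X_D
    x_p = [0] * n              # voice pitch signal X_P (-1 marks a pitch pulse)
    e = 0.0                    # held value E (initial value assumed 0)
    decay = math.exp(-dt / tau)
    for k in range(1, n - 1):  # k plays the role of (i-3)
        x_d[k] = e * decay                                  # step 55a
        if abs(x_c[k]) > abs(x_d[k]):                       # step 55b
            neg_peak = x_c[k - 1] > x_c[k] and x_c[k + 1] > x_c[k] and x_c[k] < 0
            e = x_c[k] if neg_peak else x_d[k]              # steps 55d/55e or 55c
        else:
            e = x_d[k]                                      # step 55c
        x_d[k] = e                                          # step 55f
        if abs(x_d[k] - x_d[k - 1]) >= gamma:               # steps 55g, 55h
            x_p[k] = -1                                     # step 55i (else 55j: 0)
    return x_d, x_p
```

Between negative peaks, $X_D$ relaxes toward zero in small steps. When a new negative peak escapes the envelope, $X_D$ jumps, and that jump is flagged as a pitch pulse. For example, with negative spikes every five samples, $\Delta t = 1$, $\tau = 2$ and $\gamma = 0.5$, pulses are reported only at the spike positions.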

[0055] FIGS. 14 and 15 are explanatory views (1/2) and (2/2) of the pitch period detection method applied in the present invention. In FIG. 14, (a) shows the clamped audio signal, and (b) and (c) show enlarged views of the corresponding portions. The horizontal axis represents time and the vertical axis amplitude. When the clamped audio signal lies inside the envelope that starts from a negative peak, as in (b), the envelope is maintained. When it lies outside the envelope, as in (c), the clamped audio signal itself becomes the post-detection output.

[0056] In FIG. 15, (d) shows the post-detection signal and (e) the waveform of the voice pitch signal. Pitch pulses are detected at times $t_2$, $t_4$ and $t_6$. FIG. 16 is a flowchart of the first gate signal generation routine executed by the first gate generation unit 26. Step 160 determines whether the voice pitch signal $X_P(i-3)$ is "−1" and whether the index $j$, which marks the time at which the voice pitch signal was last "−1", differs from $(i-3)$.

[0057] When the determination in step 160 is negative, that is, when $X_P(i-3)$ is not "−1" or $j$ equals $(i-3)$, this routine ends immediately. When the determination in step 160 is affirmative, that is, when $X_P(i-3)$ is "−1" and $j$ differs from $(i-3)$, the routine proceeds to step 161, where the pitch frequency $f$ is calculated by the following equation.

[0058]

$$f(i-3) = \frac{f_s}{(i-3) - j}$$

Here $f_s$ is the sampling frequency, equal to $1/\Delta t$. Step 162 determines whether the pitch frequency $f$ is equal to or higher than a predetermined maximum frequency of 500 Hz. If it is, step 163 sets $f$ to "0" and the process proceeds to step 164. When the determination in step 162 is negative, the process proceeds directly to step 164.

[0059] Step 164 updates the index $j$, which marks the time at which the voice pitch signal was last "−1", to $(i-3)$. Step 165 then updates the stored pitch frequencies by the following expressions and calculates the average pitch frequency $f_m$. In this embodiment, the average pitch frequency is the arithmetic mean of three pitch frequencies. However, the number of pitch frequencies used is not limited to three, and the averaging method is not limited to the arithmetic mean; other methods such as a weighted average or a moving average may be used.

[0060]

$$f_3 \leftarrow f_2,\quad f_2 \leftarrow f_1,\quad f_1 \leftarrow f(i-3),\qquad f_m = \frac{f_3 + f_2 + f_1}{3}$$

Step 166 then determines whether the average pitch frequency $f_m$ is equal to or higher than a predetermined first threshold $Th_1$ (for example, 200 Hz).
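Steps 161 to 165 can be sketched as a small update function. This is an illustration only. The name `update_pitch` and the tuple layout `(f1, f2, f3)` are hypothetical, and the 8 kHz sampling frequency used in the check is an assumed value. At that rate, pitch pulses 40 samples apart give $f = 8000/40 = 200$ Hz.

```python
def update_pitch(i3, j, fs, history, f_max=500.0):
    """Sketch of steps 161-165 of FIG. 16.

    i3      -- processing index (i-3) at which a pitch pulse was found
    j       -- index of the previous pitch pulse
    fs      -- sampling frequency in Hz
    history -- (f1, f2, f3), the three most recent pitch frequencies
    """
    f = fs / (i3 - j)              # step 161: f(i-3) = fs / ((i-3) - j)
    if f >= f_max:                 # steps 162-163: reject implausibly high pitch
        f = 0.0
    j = i3                         # step 164
    f1, f2, _ = history            # step 165: f3 <- f2, f2 <- f1, f1 <- f
    history = (f, f1, f2)
    f_m = sum(history) / 3.0       # arithmetic mean of the three values
    return j, history, f_m
```

A pulse only 10 samples after the previous one would imply 800 Hz. That value is replaced by 0 and so pulls the average down instead of up.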

[0061] When the determination in step 166 is affirmative, that is, when $f_m$ is equal to or higher than $Th_1$, the voice section is regarded as having begun. Step 167 sets the first gate signal $g_1$ to "1", and this routine ends. Conversely, when the determination in step 166 is negative, that is, when $f_m$ is lower than $Th_1$, step 168 determines whether $f_m$ is equal to or higher than a predetermined second threshold $Th_2$ (for example, 80 Hz).

[0062] When the determination in step 168 is affirmative, that is, when $f_m$ is equal to or higher than $Th_2$, the voice section is regarded as continuing. The routine proceeds to step 167, keeps the gate signal $g_1$ at "1", and ends. Conversely, when the determination in step 168 is negative, that is, when $f_m$ is lower than $Th_2$, the voice section is regarded as having ended. The routine proceeds to step 169, resets $g_1$ to "0", and ends.
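The complete first gate routine (steps 160 to 169) can be sketched as a pass over the voice pitch signal. Two assumptions are labelled here. First, the text gives no initial value for $j$, so the first pulse only records its position. Second, read literally, step 168 sends any $f_m \ge Th_2$ to step 167. However, [0063] and [0064] describe a gate that opens only at $Th_1$ and stays open down to $Th_2$, so the sketch keeps the current state when $Th_2 \le f_m < Th_1$ (a hysteresis reading). The function name `first_gate` is hypothetical.

```python
def first_gate(x_p, fs, th1=200.0, th2=80.0, f_max=500.0):
    """Sketch of the first gate signal generation routine (FIG. 16)."""
    g1 = 0
    j = None                   # index of the previous pitch pulse (none yet)
    f1 = f2 = f3 = 0.0
    gate = []
    for k, v in enumerate(x_p):
        if v == -1 and j != k:                 # step 160
            if j is not None:
                f = fs / (k - j)               # step 161
                if f >= f_max:                 # steps 162-163
                    f = 0.0
                f3, f2, f1 = f2, f1, f         # step 165: shift the history
                f_m = (f1 + f2 + f3) / 3.0
                if f_m >= th1:                 # steps 166-167: open
                    g1 = 1
                elif f_m < th2:                # steps 168-169: close
                    g1 = 0
                # Th2 <= f_m < Th1: keep the current state (assumed hysteresis)
            j = k                              # step 164
        gate.append(g1)
    return gate
```

With $f_s$ = 1 kHz and pulses 4 samples apart (250 Hz), the gate opens at the fourth pulse, once three 250 Hz values fill the average. When the spacing widens to 20 samples (50 Hz), the gate stays open until all three stored values fall to 50 Hz.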

[0063] FIG. 17 is an explanatory view of the gate signal generation method: (a) shows the pitch frequency and (b) the gate signal $g_1$. The black dots in (a) represent the pitch frequency $f$ at each time. The gate signal $g_1$ becomes "1", that is, opens, when the average of three consecutive pitch frequencies reaches the first threshold $Th_1$ (200 Hz) or more.

[0064] The gate signal $g_1$ remains open while the average of three consecutive pitch frequencies stays at or above the second threshold $Th_2$ (80 Hz). It becomes "0", that is, closes, when that average falls below $Th_2$. FIG. 18 shows an example of audio signal processing. In it, (a) shows the audio signal $X$ obtained by filtering the target audio signal $V$, in the preprocessing routine, with a high-pass filter that has a cutoff frequency of 300 Hz to remove low-frequency noise.

[0065] (b) is the waveform of the audio signal $X_G$ after processing by the AGC routine. Components above a predetermined amplitude have been shaped to a substantially constant amplitude. (c) shows the signal $X_D$ after detection by the pitch period detection routine, and (d) shows the pitch frequency $f$ calculated in step 341 of the first gate signal generation routine.

[0066] (e) shows the gate signal $g_1$ generated by the first gate signal generation routine. As the figure shows, the period in which the audio signal is present coincides with the period in which $g_1$ is open. However, when noise occurs after the voice has stopped, pitch frequencies caused by the noise appear (the ○ marks in (d)), and the closing of $g_1$ is delayed.

[0067] FIG. 19 is a flowchart of the second gate signal generation routine. It adds steps 190 to 193 to the first gate signal generation routine in order to solve the above problem. Step 190 calculates the elapsed time $Dt$ from index $j$, the time at which the voice pitch signal $X_P$ was last "−1", to $(i-3)$, using the following equation.

[0068]

$$Dt \leftarrow \frac{(i-3) - j}{f_s}$$

Step 191 then determines whether the elapsed time $Dt$ is equal to or longer than a predetermined threshold time $Dt_{th}$ (for example, 0.025 s) and whether the gate signal $g_1$ is "1", that is, whether the gate is open. When the determination in step 191 is affirmative, the gate is open and 25 ms or more have passed since a voice pitch signal of "−1" was last detected. In that case, step 193 sets the corrected gate signal $g_1$ to "0" to close the gate, updates the index $j$, and resets $f_2$ and $f_3$, and this routine ends.

[0069] Conversely, when the determination in step 192 is negative, the gate is closed or fewer than 25 ms have passed since a voice pitch signal of "−1" was last detected. In that case, step 194 executes the first gate signal generation routine shown in FIG. 16, and this routine ends. In the above embodiment, the threshold time $Dt_{th}$ is set to 25 ms because an interval of 25 ms or more corresponds to a pitch frequency of 40 Hz or less ($1/0.025\ \mathrm{s} = 40$ Hz). A human voice is unlikely to have a pitch frequency that low.
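The second gate routine adds a timeout in front of the first routine. The Python sketch below combines the two, using the same assumptions as a first-gate sketch: $j$ has no initial value, and $g_1$ is held when $Th_2 \le f_m < Th_1$. Step 193 resets $f_2$ and $f_3$; resetting them to 0 is an assumption, because the text does not give a reset value. The function name `second_gate` is hypothetical.

```python
def second_gate(x_p, fs, dt_th=0.025, th1=200.0, th2=80.0, f_max=500.0):
    """Sketch of the second gate signal generation routine (FIG. 19)."""
    g1, j = 0, None            # gate state and index of the last pitch pulse
    f1 = f2 = f3 = 0.0
    gate = []
    for k, v in enumerate(x_p):
        # Steps 190-191: Dt = ((i-3) - j) / fs, checked while the gate is open.
        if g1 == 1 and j is not None and (k - j) / fs >= dt_th:
            g1, j = 0, k                       # step 193: close gate, update j
            f2 = f3 = 0.0                      # step 193: reset f2 and f3 (to 0, assumed)
        elif v == -1 and j != k:               # step 194: first routine, step 160
            if j is not None:
                f = fs / (k - j)               # step 161
                if f >= f_max:                 # steps 162-163
                    f = 0.0
                f3, f2, f1 = f2, f1, f         # step 165
                f_m = (f1 + f2 + f3) / 3.0
                if f_m >= th1:
                    g1 = 1                     # open
                elif f_m < th2:
                    g1 = 0                     # close
            j = k                              # step 164
        gate.append(g1)
    return gate
```

In the check below, speech-like pulses stop at sample 16 and a stray noise pulse follows at sample 60. With $f_s$ = 1 kHz, the gate closes 25 samples after the last real pulse. Because $f_2$ and $f_3$ were cleared, the noise pulse cannot reopen it.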

[0070] FIG. 18(f) shows the corrected gate signal generated by the second gate signal generation routine. It closes without being affected by the noise-induced pitch frequencies (the ○ marks in (d)). The corrected gate allows the voice section to be detected accurately, and solving the following problems makes the detection even more accurate.

[0071]
1. Because the gate opens only when the average of three pitch frequencies reaches the first threshold $Th_1$, the opening timing tends to be late.
2. Sporadic large-amplitude noise cannot be distinguished from the speech signal.
3. Aspirated sounds cannot be distinguished from noise.
4. Geminate consonants (sokuon) have a small amplitude and therefore cannot be detected.

[0072] The present invention therefore solves these problems by introducing a voice section signal that is controlled by the gate signal (including the corrected gate signal) as follows. To solve problems 1, 2 and 3, when the gate signal has remained open for a first predetermined period (for example, 50 ms) or longer, the voice section signal is opened retroactively over a second predetermined period (for example, 100 ms) before the current time.

[0073] To solve problem 4, the voice section signal is kept open for a third predetermined period (for example, 150 ms) after the gate signal closes. FIG. 20 is a flowchart of the voice section signal generation routine executed by the voice section signal generation unit 27. Step 200 determines whether the previously calculated gate signal $g_{1b}$ is "0", that is, whether the gate was closed.

[0074] When the determination in step 200 is affirmative, that is, when the gate was closed, step 201 determines whether the gate signal $g_1$ calculated this time is "0", that is, whether the gate remains closed. When the determination in step 201 is affirmative, that is, when the gate remains closed, step 202 executes the closed-state maintenance process and the routine proceeds to step 207.

[0075] When the determination in step 201 is negative, that is, when the gate has changed to open, step 203 executes the opening process and the routine proceeds to step 207. On the other hand, when the determination in step 200 is negative, that is, when the gate was open, step 204 determines whether the gate signal $g_1$ calculated this time is "1", that is, whether the gate remains open.

[0076] When the determination in step 204 is affirmative, that is, when the gate remains open, step 205 executes the open-state maintenance process and the routine proceeds to step 207. When the determination in step 204 is negative, that is, when the gate has changed to closed, step 206 executes the closing process and the routine proceeds to step 207. Step 207 outputs the voice section signal. Step 208 updates the previously calculated gate signal $g_{1b}$ with the gate signal $g_1$ calculated this time, and this routine ends.

[0077] FIG. 21 is a flowchart of the closed-state maintenance routine executed in step 202 of the voice section signal generation routine. Step 2a adds the sampling time $\Delta t$ to the closed duration $t_{ce}$, which is the time for which the gate signal $g_1$ has remained closed. Step 2b determines whether $t_{ce}$ has reached the third predetermined period of 150 ms or more.

[0078] When the determination in step 2b is affirmative, that is, when 150 ms have passed since the gate signal $g_1$ closed, step 2c sets the voice section signal $g_2$ at processing time $(i-3)$ to "1", and this routine ends. Conversely, when the determination in step 2b is negative, that is, when 150 ms have not yet passed since $g_1$ closed, step 2d sets $g_2(i-3)$, the second gate signal at processing-time index $(i-3)$, to "0", and this routine ends.

[0079] FIG. 22 is a flowchart of the opening routine executed in step 203 of the voice section signal generation routine. Step 3a sets the previously calculated gate signal $g_{1b}$ to "1". Step 3b resets the closed duration $t_{ce}$ to "0". Step 3c sets the voice section signal $g_2(i-3)$ at processing-time index $(i-3)$ to "1", and this routine ends.

[0080] FIG. 23 is a flowchart of the open-state maintenance routine executed in step 205 of the voice section signal generation routine. Step 5a adds the sampling time $\Delta t$ to the open duration $t_{oe}$, which is the time for which the gate signal $g_1$ has remained open. Step 5b determines whether $t_{oe}$ has reached the first predetermined period of 50 ms or more.

[0081] When the determination in step 5b is negative, that is, when 50 ms have not yet passed since the gate signal $g_1$ opened, step 5c sets the voice section signal $g_2(i-3)$ at processing-time index $(i-3)$ to "0", and this routine ends. When the determination in step 5b is affirmative, that is, when 50 ms have passed since $g_1$ opened, step 5d calculates the index $i_B$ by the following equation. $i_B$ marks the time 100 ms (the second predetermined period) before the processing time.

[0082]

$$i_B \leftarrow (i-3) - \frac{0.1}{\Delta t}$$

The second term on the right-hand side is the number of samples contained in 100 ms. Step 5e limits $i_B$ to zero or more, so that the processing does not reach back into a region where no audio signal exists. Step 5f sets the voice section signal $g_2(i_B)$ at index $i_B$ to "1".

[0083] Step 5g determines whether $i_B$ has reached the processing-time index $(i-3)$, that is, whether the retroactive processing over the second predetermined period is complete. If it is not, step 5h increments $i_B$ and the routine returns to step 5f. Conversely, when the determination in step 5g is affirmative, that is, when the retroactive processing is complete, this routine ends.

[0084] FIG. 24 is a flowchart of the closing routine executed in step 206 of the voice section signal generation routine. Step 6a sets the previously calculated gate signal $g_{1b}$ to "0". Step 6b resets the open duration $t_{oe}$ to "0". Step 6c sets the voice section signal $g_2(i-3)$ at processing-time index $(i-3)$ to "0", and this routine ends.

[0085] FIG. 25 is a flowchart of the second gate signal output routine executed in step 207 of the voice section signal generation routine. Step 7a calculates the index $i_b$, which marks the time 100 ms (the second predetermined period) before the processing time, by

$$i_b \leftarrow (i-3) - \frac{0.1}{\Delta t}.$$

Step 7b limits $i_b$ to zero or more, so that the processing does not reach back into a region where no audio signal exists. Step 7c outputs $g_2(i_b)$, and this routine ends.
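The net effect described in [0072] and [0073] can be sketched in Python. The voice section signal $g_2$ opens retroactively once the gate has been open long enough, and it is held open for a hangover period after the gate closes. This is a simplified sketch of that stated behaviour, not a line-by-line transcription of FIGS. 21 to 25. It counts samples instead of accumulating $t_{oe}$ and $t_{ce}$. It also assumes that the hangover applies only after a section has actually been opened. The function name `voice_section` is hypothetical.

```python
def voice_section(g1, fs, open_ms=50, back_ms=100, hang_ms=150):
    """Simplified sketch of the voice section signal g2 of [0072]-[0073].

    g1 -- (corrected) gate signal per sample, 0 or 1
    fs -- sampling frequency in Hz
    """
    n_open = int(fs * open_ms / 1000)   # first predetermined period, in samples
    n_back = int(fs * back_ms / 1000)   # second predetermined period (look-back)
    n_hang = int(fs * hang_ms / 1000)   # third predetermined period (hangover)
    g2 = [0] * len(g1)
    open_run = 0      # consecutive open samples (plays the role of t_oe)
    closed_run = 0    # consecutive closed samples (plays the role of t_ce)
    for k, g in enumerate(g1):
        if g:
            open_run += 1
            closed_run = 0
            if open_run >= n_open:
                # Open g2 retroactively over n_back samples (clamped at 0, cf. step 5e).
                for m in range(max(0, k - n_back), k + 1):
                    g2[m] = 1
        else:
            open_run = 0
            closed_run += 1
            # Hangover: keep an already-open section open for n_hang samples.
            if k > 0 and g2[k - 1] == 1 and closed_run <= n_hang:
                g2[k] = 1
    return g2
```

In a streaming implementation, the 100 ms output delay of step 207 is what makes the retroactive writes usable. By the time $g_2(i_b)$ is output, any look-back update covering $i_b$ has already been made. A gate burst shorter than 50 ms, such as one caused by sporadic noise, never opens $g_2$ at all.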

[0086] FIG. 26 is a flowchart of the word extraction routine executed by the word extraction unit 28. Step 260 calculates the word signal $W(i_b)$ at time index $i_b$ by

$$W(i_b) \leftarrow X(i_b) \cdot g_2(i_b)$$

where $X(i_b)$ is the audio signal stored in the storage unit 24.

[0087] Step 261 outputs $W(i_b)$, and this routine ends.
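When $g_2$ is known for every sample, as in offline processing, step 260 reduces to an element-wise product. The sketch below also splits the result into contiguous runs where $g_2 = 1$. This is an assumed illustration of dividing the signal into separate voice sections; the text does not specify how that split is done. The function name `extract_words` is hypothetical.

```python
def extract_words(x, g2):
    """Sketch of step 260 plus a simple split into voice sections."""
    w = [xi * gi for xi, gi in zip(x, g2)]   # step 260: W = X * g2
    segments, start = [], None
    for k, g in enumerate(g2):
        if g and start is None:
            start = k                        # a voice section begins
        elif not g and start is not None:
            segments.append((start, k))      # half-open range [start, k)
            start = None
    if start is not None:
        segments.append((start, len(g2)))    # section still open at the end
    return w, segments
```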

[0088]

[Effects of the Invention] In the voice section detection device according to the first invention, the gate signal is controlled on the basis of the voice pitch extracted by processing the audio signal in the time domain, and the voice section is detected from the gate signal. Voice sections can therefore be detected with a simple configuration. The device according to the second invention can divide the audio signal into a plurality of voice sections on the basis of the detected voice sections.

[0089] In the device according to the third invention, the voice section is detected from the voice pitch extracted by time-domain processing of the audio signal, so voice sections can be detected almost in real time. The device according to the fourth invention can suppress fluctuations in the amplitude of the audio signal.

[0090] The device according to the fifth invention can reliably remove noise present in the audio signal. The device according to the sixth invention can reliably extract the voice pitch by making the amplitude of the audio signal substantially uniform. In the device according to the seventh invention, the amplitude-equalizing gain is reset to unity when it reaches a predetermined threshold, which prevents noise from being mixed into the signal.

[0091] The device according to the eighth invention can prevent the gate signal from opening erroneously because of noise. The device according to the ninth invention can prevent the gate signal from closing erroneously because of noise. The device according to the tenth invention can reliably close the first gate signal when the voice pitch is no longer extracted.

[0092] According to the voice section detection device of the eleventh invention, the delay in opening the gate signal can be compensated for, and noise can be reliably rejected by distinguishing it from aspirated (breath) sounds. According to the voice section detection device of the twelfth invention, small-amplitude geminate consonants (sokuon) can be reliably detected. According to the voice section detection device of the thirteenth invention, erroneous detection can be prevented even when voice sections overlap.

[Brief description of the drawings]

FIG. 1 shows a voice section detection result based on a conventional pitch period.

FIG. 2 is a functional configuration diagram of a voice section detection device according to the present invention.

FIG. 3 is a flowchart of a voice sampling routine.

FIG. 4 is a flowchart of a preprocessing routine.

FIG. 5 is a flowchart of a pitch detection routine.

FIG. 6 is a flowchart of a subtraction processing routine.

FIG. 7 is a flowchart of an envelope difference calculation routine.

FIG. 8 is an explanatory diagram of the effect of the subtraction processing.

FIG. 9 is a flowchart of an automatic gain control (AGC) processing routine.

FIG. 10 is an explanatory diagram of the effect of AGC.

FIG. 11 is a flowchart of a peak detection processing routine.

FIG. 12 is a flowchart of an extreme value detection and clamp processing routine.

FIG. 13 is a flowchart of a pitch period detection processing routine.

FIG. 14 is an explanatory diagram (1/2) of the pitch period detection method.

FIG. 15 is an explanatory diagram (2/2) of the pitch period detection method.

FIG. 16 is a flowchart of a first gate signal generation routine.

FIG. 17 is an explanatory diagram of the gate signal generation method.

FIG. 18 shows an example of voice signal processing.

FIG. 19 is a flowchart of a second gate signal generation routine.

FIG. 20 is a flowchart of a voice section signal generation routine.

FIG. 21 is a flowchart of a close-maintaining processing routine.

FIG. 22 is a flowchart of an opening processing routine.

FIG. 23 is a flowchart of an open-maintaining processing routine.

FIG. 24 is a flowchart of a closing processing routine.

FIG. 25 is a flowchart of a voice section signal output routine.

FIG. 26 is a flowchart of a word extraction routine.

[Explanation of symbols]

21 ... Microphone
22 ... Line amplifier
23 ... Analog/digital conversion unit
24 ... Storage unit
25 ... Pitch detection unit
26 ... Gate signal generation unit
27 ... Voice section signal generation unit
28 ... Word extraction unit

Continued from the front page
(72) Inventor: Hideki Kitao, c/o Fujitsu Ten Limited, 1-2-28 Gosho-dori, Hyogo-ku, Kobe City, Hyogo Prefecture
(72) Inventor: Shinichi Iwamoto, c/o Fujitsu Ten Limited, 1-2-28 Gosho-dori, Hyogo-ku, Kobe City, Hyogo Prefecture
(72) Inventor: Osamu Iwata, c/o Fujitsu Ten Limited, 1-2-28 Gosho-dori, Hyogo-ku, Kobe City, Hyogo Prefecture
(72) Inventor: Masataka Nakamura, c/o Tsuru Gakuen (educational corporation), 2-1-1 Miyake, Saeki-ku, Hiroshima City, Hiroshima Prefecture
(72) Inventor: Yoshinao Omoto, 1-5-25 Kugayama, Suginami-ku, Tokyo
F-term (reference): 5D015 AA05 DD03 EE05

Claims

1. A voice section detection device comprising: pre-processing means for removing noise contained in a voice signal; voice pitch extraction means for extracting a voice pitch from the voice signal from which noise has been removed by the pre-processing means; gate signal generation means for generating a gate signal based on the voice pitch extracted by the voice pitch extraction means; and voice section signal generation means for generating a voice section signal based on the gate signal generated by the gate signal generation means.

2. The voice section detection device according to claim 1, further comprising voice signal dividing means for dividing the voice signal from which noise has been removed by the pre-processing means into a plurality of voice signals based on the voice section signal generated by the voice section signal generation means.
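As a minimal sketch of the dividing step in claim 2, assume the voice section signal is available as one boolean per stored sample (True while the section is open). The stored, noise-removed signal can then be cut into one chunk per open run. The function names `voice_sections` and `split_by_sections` are hypothetical illustrations, not taken from the patent.

```python
from typing import List, Sequence, Tuple


def voice_sections(mask: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where the voice section signal is open."""
    sections: List[Tuple[int, int]] = []
    start = None
    for i, is_open in enumerate(mask):
        if is_open and start is None:
            start = i                    # voice section signal opens
        elif not is_open and start is not None:
            sections.append((start, i))  # voice section signal closes
            start = None
    if start is not None:                # still open at the end of the buffer
        sections.append((start, len(mask)))
    return sections


def split_by_sections(signal: Sequence[float], mask: Sequence[bool]) -> List[List[float]]:
    """Cut the stored signal into one list of samples per voice section (hypothetical helper)."""
    return [list(signal[s:e]) for s, e in voice_sections(mask)]
```

For example, `split_by_sections([0, 1, 2, 3, 4, 5, 6], [False, True, True, False, False, True, True])` returns `[[1, 2], [5, 6]]`, one list per candidate word.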

3. The voice section detection device according to claim 1, further comprising: subtraction processing means for performing subtraction processing that removes components at or below a predetermined amplitude from the voice signal from which noise has been removed by the pre-processing means; amplitude equalizing means for adjusting the amplitude of the voice signal subjected to the subtraction processing to a substantially constant amplitude; negative peak emphasizing means for detecting, in the voice signal adjusted to a substantially constant amplitude by the amplitude equalizing means, a positive peak and the negative peak following it, and subtracting the positive peak from the negative peak to generate a voice signal in which the negative peak is emphasized; and differential processing means for performing detection processing on the voice signal in which the negative peak has been emphasized by the negative peak emphasizing means and differentiating the signal obtained after the detection processing.

4. The voice section detection device according to claim 3, wherein the subtraction processing means comprises: envelope difference calculation means for calculating a positive envelope and a negative envelope of the voice signal from which noise has been removed by the pre-processing means, and calculating an envelope difference, which is the difference between the positive envelope and the negative envelope; subtraction threshold calculation means for calculating a subtraction threshold by multiplying the envelope difference calculated by the envelope difference calculation means by a predetermined coefficient; and subtraction threshold subtracting means for subtracting the subtraction threshold from the amplitude of the voice signal when the amplitude of the voice signal from which noise has been removed by the pre-processing means is equal to or greater than the subtraction threshold calculated by the subtraction threshold calculation means.

5. The voice section detection device according to claim 4, wherein the subtraction processing means further comprises zero setting means for setting the amplitude of the voice signal to zero when the amplitude of the voice signal from which noise has been removed by the pre-processing means is smaller than the subtraction threshold calculated by the subtraction threshold calculation means.
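Claims 4 and 5 together describe what is commonly called center clipping with an adaptive threshold. The sketch below is one possible reading under stated assumptions: the positive and negative envelopes are tracked with a simple decaying peak follower controlled by a hypothetical `decay` factor, the predetermined coefficient is the hypothetical parameter `k`, and "amplitude" is taken as the magnitude of each sample so that the sign survives the subtraction. None of these values or names come from the patent.

```python
import math
from typing import List, Sequence, Tuple


def envelopes(x: Sequence[float], decay: float = 0.99) -> Tuple[List[float], List[float]]:
    """Positive and negative envelopes via a decaying peak follower (assumed method)."""
    pos: List[float] = []
    neg: List[float] = []
    p = q = 0.0
    for v in x:
        p = max(v, decay * p)  # jump up to new positive peaks, otherwise decay toward 0
        q = min(v, decay * q)  # jump down to new negative peaks, otherwise decay toward 0
        pos.append(p)
        neg.append(q)
    return pos, neg


def subtraction_process(x: Sequence[float], k: float = 0.3, decay: float = 0.99) -> List[float]:
    """Adaptive center clipping: threshold = k * (positive envelope - negative envelope)."""
    pos, neg = envelopes(x, decay)
    out: List[float] = []
    for v, p, q in zip(x, pos, neg):
        th = k * (p - q)  # claim 4: coefficient times the envelope difference
        if abs(v) >= th:
            out.append(math.copysign(abs(v) - th, v))  # claim 4: subtract threshold from the magnitude
        else:
            out.append(0.0)  # claim 5: below the threshold -> zero
    return out
```

With `k=0.25` and no decay (`decay=1.0`), the input `[1.0, -1.0, 0.1, -0.1]` becomes `[0.75, -0.5, 0.0, 0.0]`: the two small samples fall under the threshold and are zeroed, which is the low-level suppression claim 5 describes.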

6. The voice section detection device according to claim 3, wherein the amplitude equalizing means comprises: envelope difference calculation means for calculating a positive envelope and a negative envelope of the voice signal from which noise has been removed by the pre-processing means, and calculating an envelope difference, which is the difference between the positive envelope and the negative envelope; maximum envelope difference holding means for holding the largest of the envelope differences calculated so far, including the current one, by the envelope difference calculation means; and uniform amplitude gain calculation means for calculating a uniform amplitude gain by dividing the maximum envelope difference held by the maximum envelope difference holding means by the current envelope difference.

7. The voice section detection device according to claim 6, further comprising unity gain setting means for setting the uniform amplitude gain to unity gain when the uniform amplitude gain calculated by the uniform amplitude gain calculation means is equal to or greater than a predetermined threshold.
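A minimal sketch of the gain rule in claims 6 and 7 follows. It assumes the per-sample envelope difference has already been computed (for example with an envelope follower as described in claim 4) and is passed in as `env_diff`. The parameter `gain_limit` stands in for the unspecified predetermined threshold; its default value and the function name `equalize_amplitude` are hypothetical.

```python
from typing import List, Sequence, Tuple


def equalize_amplitude(x: Sequence[float], env_diff: Sequence[float],
                       gain_limit: float = 20.0) -> Tuple[List[float], List[float]]:
    """Scale samples toward a common amplitude (claim 6) with a unity-gain fallback (claim 7)."""
    out: List[float] = []
    gains: List[float] = []
    max_diff = 0.0
    for v, d in zip(x, env_diff):
        max_diff = max(max_diff, d)  # largest envelope difference seen up to now
        # Held maximum divided by the current envelope difference; a zero difference
        # would give an unbounded gain, so treat it like the over-limit case.
        gain = max_diff / d if d > 0 else gain_limit
        if gain >= gain_limit:
            gain = 1.0  # limit reached: input is probably silence or noise, do not amplify
        gains.append(gain)
        out.append(gain * v)
    return out, gains
```

For example, `equalize_amplitude([2.0, 1.0, 0.01], [4.0, 2.0, 0.02])` returns `([2.0, 2.0, 0.01], [1.0, 2.0, 1.0])`. The third sample would need a gain of 200, so the sketch falls back to unity gain rather than amplifying what is likely noise.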

8. The voice section detection device according to claim 1, wherein the gate signal generation means comprises gate signal opening means for opening the gate signal when the average of a predetermined number of consecutive voice pitches extracted by the voice pitch extraction means is equal to or greater than a predetermined gate-open threshold.

9. The voice section detection device according to claim 8, wherein the gate signal generation means comprises gate signal open maintaining means for keeping the gate signal open, once it has been opened by the gate signal opening means, while the average of a predetermined number of consecutive voice pitches extracted by the voice pitch extraction means is equal to or greater than a predetermined gate-close threshold that is smaller than the gate-open threshold.

10. The voice section detection device according to claim 9, wherein the gate signal generation means comprises gate signal closing means for closing the gate signal when the average of a predetermined number of consecutive voice pitches extracted by the voice pitch extraction means falls below the gate-open threshold.
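The hysteresis in claims 8 and 9 can be sketched as follows. The sketch assumes that the voice pitch extraction means yields one numeric value per frame (0 when no pitch is found), and that `n_avg`, `open_th` and `close_th` stand in for the unspecified predetermined number and thresholds. One assumption deserves a flag: this sketch closes the gate when the moving average falls below the gate-close threshold, which matches the hysteresis of claim 9, whereas the translated text of claim 10 names the gate-open threshold.

```python
from collections import deque
from typing import List, Sequence


def gate_signal(pitches: Sequence[float], n_avg: int,
                open_th: float, close_th: float) -> List[bool]:
    """Hysteresis gate driven by a moving average of extracted pitch values."""
    if not close_th < open_th:
        raise ValueError("close_th must be smaller than open_th (claim 9)")
    window: deque = deque(maxlen=n_avg)  # the last n_avg pitch values
    gate = False
    out: List[bool] = []
    for p in pitches:
        window.append(p)
        if len(window) == n_avg:  # judge only once n_avg values are available
            avg = sum(window) / n_avg
            if not gate and avg >= open_th:
                gate = True   # claim 8: open on a high average
            elif gate and avg < close_th:
                gate = False  # close only below the lower threshold (assumed reading)
        out.append(gate)
    return out
```

With `n_avg=2`, `open_th=90` and `close_th=50`, the input `[0, 100, 100, 100, 60, 60, 0, 0]` gives `[False, False, True, True, True, True, False, False]`. The average of 80 at the fifth frame lies between the two thresholds, so the gate stays open instead of chattering, and the run of zero pitches at the end closes it.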

11. The voice section detection device according to claim 1 or 2, further comprising: first predetermined period timing means for timing a predetermined first period starting from the time at which the gate signal generated by the gate signal generation means opens; and voice section signal opening means for opening the voice section signal retroactively by a predetermined second period, counted back from the time at which the timing of the first period by the first predetermined period timing means ends.

12. The voice section detection device according to claim 11, further comprising: third predetermined period timing means for timing a predetermined third period starting from the time at which the gate signal generated by the gate signal generation means closes; and voice section signal closing means for closing the voice section signal when the timing of the third period by the third predetermined period timing means ends.

13. The voice section detection device according to claim 12, wherein the voice section signal generation means further comprises voice section signal open state maintaining means for keeping the voice section signal open when the voice section signal opening means retroactively opens the voice section signal by the second predetermined period before the timing of the third period by the third predetermined period timing means has ended.
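Claims 11 to 13 can be illustrated with an offline sketch that processes a whole buffer of gate values, one per frame. The periods `t1`, `t2` and `t3` are hypothetical frame counts standing in for the first, second and third predetermined periods. Two simplifications are assumptions, not statements from the patent: a gate that closes before the first period has elapsed is discarded as noise, and the overlap case of claim 13 is handled by merging any voice section whose retroactive start falls at or before the end of the previous one.

```python
from typing import List, Sequence, Tuple


def gate_runs(gate: Sequence[bool]) -> List[Tuple[int, int]]:
    """Half-open (open_index, close_index) runs during which the gate is open."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, is_open in enumerate(gate):
        if is_open and start is None:
            start = i
        elif not is_open and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(gate)))
    return runs


def voice_section_signal(gate: Sequence[bool], t1: int, t2: int, t3: int) -> List[bool]:
    """Derive a voice section signal (one bool per frame) from a gate signal."""
    n = len(gate)
    sections: List[Tuple[int, int]] = []
    for g_open, g_close in gate_runs(gate):
        if g_close - g_open < t1:
            continue  # assumption: gate closed before the first period ended -> noise
        start = max(0, g_open + t1 - t2)  # claim 11: open retroactively by t2 when t1 ends
        end = min(n, g_close + t3)        # claim 12: close t3 after the gate closes
        if sections and start <= sections[-1][1]:
            # claim 13 (simplified): the new section reaches back into the previous one,
            # so keep the voice section signal open instead of closing and reopening it
            sections[-1] = (sections[-1][0], max(sections[-1][1], end))
        else:
            sections.append((start, end))
    signal = [False] * n
    for s, e in sections:
        for i in range(s, e):
            signal[i] = True
    return signal
```

For a gate that is open at frames 3 to 5 and 9 to 11, with `t1=2`, `t2=3` and `t3=2`, the two sections merge into one voice section covering frames 2 to 13; with `t3=1` they stay separate (frames 2 to 6 and 8 to 12). Choosing `t2` larger than `t1` is what lets the section start before the gate opened, which corresponds to the open-delay compensation mentioned for the eleventh invention.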