JPS6146998A

JPS6146998A - Voice head detector

Info

Publication number: JPS6146998A
Application number: JP59168421A
Authority: JP
Inventors: 吉村　元一; 高木　琢美
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 1984-08-10
Filing date: 1984-08-10
Publication date: 1986-03-07

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Field of Industrial Application] The present invention relates to a voice start-point detection device that detects and outputs the time point corresponding to the start of a voice section by sampling and otherwise processing the amplitude changes of a voice waveform over time.

[Prior Art] Conventionally, a voice start-point detection device of this kind first samples the continuous change in amplitude of the voice waveform over time and converts it into a sequence of values, then squares each value to obtain a sequence of short-time energy values of the voice. Next, within this sequence of short-time energy values, it detects a section that exceeds a preset threshold corresponding to the lower limit of the short-time energy of voice, treats that section as the voice section, and determines its start point.

[Problems to Be Solved by the Invention] In general, the short-time energy of voice does not necessarily rise and fall smoothly even within a single voice section. For example, as in a burst, energy that had been gradually increasing may suddenly drop to a very low level and then rise again to a high level.

As a result, the short-time energy of the voice crosses the preset threshold several times, and if the section taken as the start of the voice section is fixed by a single rule, the correct start of the voice section may not be detected, which is a drawback.
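The multiple-crossing problem can be illustrated with a minimal sketch. The energy values and the threshold below are made-up illustrative numbers, not data from this publication, and `upward_crossings` is a hypothetical helper name.

```python
def upward_crossings(energy, threshold):
    """Return the frame indices where energy rises above the threshold."""
    hits = []
    prev_above = False
    for j, e in enumerate(energy):
        above = e > threshold
        if above and not prev_above:
            hits.append(j)  # energy has just moved from at-or-below to above
        prev_above = above
    return hits


# Hypothetical energy contour: noise, a burst, a dip back to noise level,
# then the main voiced part.
energy = [0.001, 0.002, 0.03, 0.06, 0.002, 0.3, 0.8, 0.5, 0.002]
crossings = upward_crossings(energy, 0.01)
print(crossings)  # prints [2, 5]
```

With a single threshold there are two candidate start points here, and nothing in the threshold test alone says which one begins the voice section.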

In addition, when unexpected noise rises before or after the voice section because of the speaker's movements during voice input, changes in the surroundings, or the like, the short-time energy may exceed the threshold even outside the voice section, which likewise hinders accurate detection of the start of the voice section.

[Object of the Invention] The object of the present invention is to solve the above problems of the prior art and to provide a voice start-point detection device that keeps the influence of energy fluctuations as low as possible, even when the short-time energy reverses irregularly, as in voice containing bursts, or when unpredictable noise is temporarily mixed in, and that can detect the start of the voice section by simple, short processing.

[Means for Solving the Problems] In the present invention, when a high-energy section is detected in which the sequence of short-time energy values of the input voice continuously exceeds a first threshold for at least a predetermined time, the device detects the time point, earlier than the high-energy section, at which the sequence of short-time energy values falls below a second threshold set at or below the first threshold. Next, the span from this time point back to a time point a predetermined time earlier is set as the start-point detection section.

Next, moving backward in time within the start-point detection section, the device detects the time point at which the sequence of short-time energy values falls below a third threshold, set at or below the second threshold, without again exceeding the second threshold. The detected time point is taken as the start of the voice section.

[Operation] First, by confirming the existence of the high-energy section using the first threshold, any section that lacks the energy that voice normally has, at or above a predetermined level and sustained for at least a predetermined time (in other words, a section in which the noise level has merely risen temporarily), is excluded from the outset as not being a voice section. Second, by taking as the start of the voice section the time point at which the energy falls below the third threshold, set lower than the second threshold, without again exceeding the second threshold, the device suppresses the error of taking as the start a point inside the voice section where the short-time energy has temporarily dropped to the level of a noise-only section. Third, setting the start-point detection section limits the range over which the start-point detection processing runs, which suppresses misidentification of the start of the voice section caused by fluctuations in the noise level outside the voice section.

[Embodiment] An embodiment of the present invention is described below with reference to FIGS. 1 to 5.

In FIG. 2, microphone 1, which picks up the voice uttered by the speaker, is connected to A/D converter 3 via amplifier 2. Amplifier 2 amplifies the level of the voice picked up by microphone 1 to a level suitable for the subsequent processing.

A/D converter 3 is connected to a central processing unit (hereinafter CPU) 4. Connected to CPU 4 are ROM (read-only memory) 5, in which the programs for each process and the like are written, and RAM (readable and writable memory) 6.

RAM 6 serves as working memory and provides: amplitude buffer 6a, into which the amplitude values of the voice waveform sampled by A/D converter 3 are written one after another; square buffer 6b, into which the squares of those amplitude values are written; energy buffer 6c, into which the short-time energy values calculated from the squared values are written; first threshold buffer 6d, into which the two time points at which the short-time energy value falls below the preset first threshold are written; and second threshold buffer 6e, into which are written the time points at which the energy first falls below the second threshold when moving backward in time and forward in time, respectively, from the span between those two time points.

With the above configuration, the operation is described next for the case where the speaker pronounces the Japanese syllable "が" (ga). The voice is first picked up by microphone 1 and amplified to an appropriate level by amplifier 2. It is then sampled by A/D converter 3, which converts it into data in the form of a sequence of values representing the change in amplitude of the voice waveform over time, and this data is input to CPU 4. The A/D conversion shown in step 21 of FIG. 1 uses a sampling frequency of 8 kHz, chosen according to the sampling theorem so as to capture voice waveform information up to 4 kHz. Accordingly, 8000 amplitude values are obtained per second, that is, one every 125 microseconds, and they are written sequentially into amplitude buffer 6a. Connecting adjacent amplitude values with straight lines and plotting time on the horizontal axis and voltage on the vertical axis gives the waveform shown in FIG. 3(a). The A/D conversion starts when CPU 4 detects that the switch of microphone 1 has been turned ON, and it continues while the switch remains ON. Consequently, noise-only sections appear in the waveform before and after the voice section. The q-th amplitude value among those taken every 125 microseconds is denoted A(q), where q is an integer from 1 to n and n is the total number of amplitude values. Each increment of q by 1 corresponds to 125 microseconds of elapsed time.
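The sampling arithmetic in step 21 can be checked with a short sketch. The function names are hypothetical; only the 8 kHz rate, the 4 kHz band, and the 125 microsecond spacing come from the text.

```python
SAMPLE_RATE_HZ = 8000  # sampling frequency stated for step 21


def sample_period_us(rate_hz=SAMPLE_RATE_HZ):
    """Spacing between consecutive samples, in microseconds."""
    return 1_000_000 / rate_hz


def highest_representable_hz(rate_hz=SAMPLE_RATE_HZ):
    """Sampling theorem: content up to half the sampling rate is captured."""
    return rate_hz / 2


def elapsed_us(q, rate_hz=SAMPLE_RATE_HZ):
    """Time between sample 1 and sample q (q is 1-based, as in A(q))."""
    return (q - 1) * sample_period_us(rate_hz)


print(sample_period_us())          # 125.0
print(highest_representable_hz())  # 4000.0
print(elapsed_us(8001))            # 1000000.0, i.e. one second after sample 1
```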

Next, in step 22, each of the n amplitude values A(q) obtained in step 21 is squared to give n squared values S(q), which are written sequentially into square buffer 6b. Expressed as a formula, S(q) = A(q)^2.

Next, in step 23, the squared values S(q) are summed in groups of 64, that is, over each 8 milliseconds, and each sum is written sequentially into energy buffer 6c as the short-time energy value E(j) at the center of that interval.

Expressed as a formula, E(j) = S(64(j-1)+1) + S(64(j-1)+2) + ... + S(64j), where j is an integer from 1 to n/64. The addresses of energy buffer 6c are set to correspond to the time points of the short-time energy values E(j), so that specifying a value E(j) identifies the corresponding time point. Plotting time on the horizontal axis and the short-time energy values E(j), scaled relative to their maximum, on the vertical axis gives FIG. 3(b). In that figure, the low-level sections before and after are sections of ambient noise other than voice. In sections not shown, further back or further forward in time, roughly the same low level continues as long as no voice is input, with only slight rises and falls due to fluctuations in the noise level.
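Steps 22 and 23 can be sketched as follows. This is a minimal illustration, not the device's program: `short_time_energy` is a hypothetical name, samples left over after the last full group of 64 are simply dropped (an assumption), and the scaling relative to the maximum follows the way FIG. 3(b) is described. Whether the thresholds in the later steps are applied to the scaled values is not stated outright and is also assumed here.

```python
def short_time_energy(samples, frame_len=64):
    """Square each sample, then sum the squares in groups of frame_len.

    Returns (raw, scaled): raw frame sums E(j) and the same values
    divided by their maximum, as plotted in FIG. 3(b).
    """
    squares = [a * a for a in samples]        # step 22: S(q) = A(q)^2
    n_frames = len(squares) // frame_len      # incomplete last group is dropped
    raw = [sum(squares[j * frame_len:(j + 1) * frame_len])
           for j in range(n_frames)]          # step 23: one sum per 8 ms at 8 kHz
    peak = max(raw, default=0.0)
    scaled = [e / peak for e in raw] if peak > 0 else list(raw)
    return raw, scaled


# Two full frames plus 10 leftover samples.
raw, scaled = short_time_energy([1] * 64 + [2] * 64 + [0] * 10)
print(raw)     # [64, 256]
print(scaled)  # [0.25, 1.0]
```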

Next, in step 24, the sequence of short-time energy values E(j) calculated in step 23 is searched for a section that exceeds a preset first threshold T1 = 0.1 for at least a predetermined time, 120 milliseconds in this embodiment. In FIG. 3(b) this is section t1 to t2. This section is taken as the high-energy section, and the two time points t1 and t2 are written into first threshold buffer 6d. The first threshold T1, shown by a broken line in FIG. 3(b), is a value corresponding to the lower limit of the short-time energy in the high-energy section that each voice generally sustains for at least the predetermined time.
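A minimal sketch of the step 24 search is given below. The function name is hypothetical, the returned indices are 0-based and inclusive (an assumption about how t1 and t2 are represented), and the minimum run of 15 frames is derived here as 120 ms divided by the 8 ms frame length.

```python
def find_high_energy_section(energy, threshold=0.1, min_frames=15):
    """Return (t1, t2) for the first run above threshold lasting min_frames.

    t1 and t2 are the first and last frame indices of the run (inclusive).
    Returns None when no run is long enough, e.g. for a brief noise spike.
    """
    run_start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for j, e in enumerate(list(energy) + [float("-inf")]):
        if e > threshold:
            if run_start is None:
                run_start = j
        elif run_start is not None:
            if j - run_start >= min_frames:
                return run_start, j - 1
            run_start = None  # run too short: treat it as transient noise
    return None


# Hypothetical contour: a 3-frame noise spike, then 20 frames of voice.
energy = [0.01] * 5 + [0.5] * 3 + [0.01] * 4 + [0.6] * 20 + [0.01] * 5
print(find_high_energy_section(energy))  # (12, 31)
```

The 3-frame spike at frames 5 to 7 exceeds T1 but lasts only 24 ms, so it is skipped, which is the noise rejection the first threshold is meant to provide.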

Next, in step 25, two time points are written into second threshold buffer 6e: the time point reached by going back in time from t1 at which the energy first falls below a preset second threshold T2 = 0.004, namely the start-point detection start time t3, and the time point reached by going forward in time from t2 at which the energy first falls below the second threshold, namely the end-point detection start time t4. The second threshold T2, shown by a broken line in FIG. 3(b), is set slightly higher than the lower limit of the short-time energy of the voice section, so that outside the voice section the short-time energy does not exceed T2 even if the noise level fluctuates somewhat.
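The two searches in step 25 differ only in direction, so a single helper can sketch both. `first_below` is a hypothetical name, the energy values are illustrative, and frame indices are 0-based.

```python
def first_below(energy, start, step, threshold):
    """Walk from index start in direction step (+1 or -1) and return the
    first index whose energy is below threshold, or None if none is found."""
    j = start
    while 0 <= j < len(energy):
        if energy[j] < threshold:
            return j
        j += step
    return None


T2 = 0.004  # second threshold given in the embodiment
energy = [0.001, 0.002, 0.003, 0.5, 0.6, 0.5, 0.003, 0.001]
t1, t2 = 3, 5                          # ends of the high-energy section
t3 = first_below(energy, t1, -1, T2)   # backward from t1
t4 = first_below(energy, t2, +1, T2)   # forward from t2
print(t3, t4)  # 2 6
```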

Next, in step 26, the search is limited to a range extending back in time from the start-point detection start time t3 by a predetermined time, 140 milliseconds in this embodiment. Moving backward in time within that range, the device detects the time point t5 at which the energy falls below a preset third threshold, shown by a broken line in FIG. 3(b), without thereafter exceeding the second threshold again, and outputs the data signal corresponding to t5 as the start of the voice section. The predetermined time is set rather long so that the start of the voice section is always contained within it. The third threshold corresponds to the lower limit of the short-time energy of the voice section.
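One way to read step 26 is sketched below: scan backward from t3, remember the first frame below the third threshold, and discard that candidate whenever a frame further back exceeds the second threshold again. This reading, the name `find_start`, and the third-threshold value 0.002 are all assumptions, since the text gives no number for the third threshold. The 17-frame window is 140 ms divided by 8 ms, rounded down.

```python
def find_start(energy, t3, thr2=0.004, thr3=0.002, max_back=17):
    """Scan backward from t3 over at most max_back frames and return the
    frame taken as the start of the voice section, or None."""
    lowest = max(0, t3 - max_back)
    candidate = None
    for j in range(t3, lowest - 1, -1):
        if energy[j] > thr2:
            candidate = None      # energy rose above T2 again: start lies earlier
        elif candidate is None and energy[j] < thr3:
            candidate = j         # first frame below T3 since the last reset
    return candidate


# Hypothetical contour: noise, a burst at frame 4, a dip, then the vowel.
energy = [0.001, 0.001, 0.001, 0.001, 0.02, 0.0015, 0.003, 0.2, 0.5]
print(find_start(energy, t3=6))  # 3
```

Frame 5 is below the third threshold, but the burst at frame 4 exceeds the second threshold, so the start is pushed back to frame 3 rather than being placed in the dip inside the utterance.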

Next, in step 27, the search is limited to a range extending forward in time from the end-point detection start time t4 by a predetermined time, 140 milliseconds in this embodiment. Moving forward in time within that range, the device detects the time point t6 at which the energy falls below the third threshold without thereafter exceeding the second threshold again, and outputs the data signal corresponding to t6 as the end of the voice section. As with start-point detection in step 26, the predetermined time is set rather long so that the end of the voice section is always contained within it. The voice section V1 (t5 to t6) is detected in this way, and it closely matches the voice section judged from the plot of amplitude against time in FIG. 3(a).
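Steps 24 to 27 can be combined into one compact sketch operating on a list of scaled short-time energy values. This is an illustration under assumptions, not the program stored in ROM 5: the function name, the third-threshold value 0.002, the 0-based inclusive frame indices, and the rounding of 120 ms and 140 ms to 15 and 17 frames are all choices made here.

```python
def detect_voice_section(energy, thr1=0.1, thr2=0.004, thr3=0.002,
                         min_high_frames=15, window_frames=17):
    """Return (t5, t6), the start and end frames of the voice section,
    or None when no high-energy section is found."""
    n = len(energy)

    # Step 24: first run above thr1 lasting at least min_high_frames.
    t1 = t2 = run_start = None
    for j in range(n + 1):
        above = j < n and energy[j] > thr1
        if above and run_start is None:
            run_start = j
        elif not above and run_start is not None:
            if j - run_start >= min_high_frames:
                t1, t2 = run_start, j - 1
                break
            run_start = None
    if t1 is None:
        return None

    def scan(origin, step):
        # Step 25: move away from the high-energy section to the first
        # frame below thr2 (t3 when step is -1, t4 when step is +1).
        j = origin
        while 0 <= j < n and energy[j] >= thr2:
            j += step
        if not 0 <= j < n:
            return None
        # Steps 26 and 27: within the window, keep the first frame below
        # thr3 that is not followed by a frame exceeding thr2 again.
        boundary = None
        for _ in range(window_frames + 1):
            if not 0 <= j < n:
                break
            if energy[j] > thr2:
                boundary = None
            elif boundary is None and energy[j] < thr3:
                boundary = j
            j += step
        return boundary

    return scan(t1, -1), scan(t2, +1)


# Hypothetical contour: noise, burst, dip, 16 voiced frames, decay, noise.
energy = ([0.001] * 6 + [0.02, 0.0015, 0.003] + [0.5] * 16
          + [0.003] + [0.001] * 6)
print(detect_voice_section(energy))  # (5, 26)
```

Multiplying a frame index by 8 ms converts it to a time offset from the first frame.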

FIGS. 4 and 5 show the results of applying the same processing to voices pronounced as the Japanese syllables "ば" and "ば". In these examples too, voice sections V2 and V3 are detected in the same way as in the case shown in FIG. 3.

[Effects of the Invention] As detailed above, the voice start-point detection device according to the present invention can, with a simple configuration, suppress misidentification of the start of the voice section even when the short-time energy, obtained by expressing the energy of the voice as a representative value for each predetermined time, temporarily drops within the voice section to the level of a noise-only section, as with a burst. In addition, sections in which energy at or above the predetermined level that voice normally has does not persist for at least the predetermined time, in other words sections in which the noise level has merely risen temporarily, are excluded from processing from the outset as not being voice sections, and the range over which the start-point detection processing runs is limited. This not only prevents the processing time from growing through running start-point detection on sections unrelated to the voice section, but also prevents errors in start-point detection caused by fluctuations in the noise level outside the voice section. Furthermore, the first to third thresholds needed for the processing are all set in advance in consideration of the general characteristics of voice, so the start of the voice section can be detected by simple processing and the increase in processing time is kept down, making the device very effective as one stage of speech recognition.

[Brief Description of the Drawings]

FIG. 1 is a flowchart of an embodiment of the present invention, and FIG. 2 is its block diagram. FIGS. 3(a) and (b) show the results of applying each process to a voice pronounced "が" (ga), FIGS. 4(a) and (b) show the results of applying each process to a voice pronounced "ば", and FIGS. 5(a) and (b) show the results of applying each process to a voice pronounced "ば". In the figures, 1 is the microphone, 2 is the amplifier, 3 is the A/D converter, 4 is the CPU, 5 is the ROM, and 6 is the RAM.

Claims

[Scope of Claims] 1. A voice start-point detection device comprising: sampling means for sampling the amplitude changes of a voice over time and converting them into a sequence of values; square conversion means for converting the sequence of values into a sequence of their squares; representative value conversion means for dividing the sequence of squared values into a plurality of groups and converting it into a sequence of representative values, one for each group; high-energy section detection means for detecting a high-energy section in which the sequence of representative values exceeds a first threshold; means for detecting the time point at which the sequence of representative values falls below a second threshold; means for setting, as a start-point detection section, the span from that time point back to a time point a predetermined time earlier; and start-point output means for detecting, backward in time within the start-point detection section, the time point at which the sequence of representative values falls below a third threshold set at or below the second threshold without again exceeding the second threshold, and outputting the data signal of the detected time point as the start of the voice section.