JPH0619498A

JPH0619498A - Speech detector

Info

Publication number: JPH0619498A
Application number: JP4173599A
Authority: JP
Inventors: Hideaki Yamada; 英明山田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-07-01
Filing date: 1992-07-01
Publication date: 1994-01-28

Abstract

PURPOSE: To provide speech that does not sound unnatural by detecting the head and tail of the speech, in a speech detector that detects speech contained in pulse-code-modulated speech. CONSTITUTION: Input pulse-code-modulated speech is passed through a high-pass filter 1 that cuts off its DC component. The output of the high-pass filter 1 is applied to a zero-crossing counter 2, an input power averaging unit 3, and a predicted gain variation rate unit 4, which calculate their levels frame by frame, and the results are applied to a determination unit 5 that detects whether speech is present. The speech detector also contains a sample power monitoring unit 6 which, once a silence decision has been made, monitors the speech in sample units rather than frame units, thereby detecting the head and tail of the pulse-code-modulated speech.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech detector that detects speech contained in pulse-code-modulated speech.

[0002] In recent voice communication, to make data transmission more efficient, a scheme is required that compresses out the silent portions of speech and transmits only the speech data. A speech detector is therefore needed that accurately detects the head of speech, where speech begins, and the tail of speech, where it ends.

[0003]

[Prior Art] A conventional example is described below with reference to FIGS. 4 and 5. FIG. 4 shows the configuration of a conventional example circuit. FIG. 5 shows the detection timings of that circuit and illustrates the operation of FIG. 4.

[0004] In FIG. 4, 1 is a high-pass filter, which cuts off the DC component contained in the input pulse-code-modulated speech (PCM speech) and outputs the signal of the i-th harmonic component Si (i is the sample period number, where i = 1, 2, ..., n).
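The text does not say how the high-pass filter 1 is built. As a minimal sketch, assuming a first-order DC-blocking filter (the recurrence, the function name `dc_block`, and the pole value `r = 0.95` are illustrative assumptions, not taken from the patent), the DC component of a PCM sample sequence could be removed like this:

```python
def dc_block(samples, r=0.95):
    """Assumed first-order DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    out = []
    prev_x = 0.0  # previous input sample
    prev_y = 0.0  # previous output sample
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

With a constant (pure DC) input the output decays toward zero, which is the behaviour the later per-frame measurements rely on.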

[0005] 2 is a zero-crossing counter, which calculates the number of zero crossings per frame of the i-th harmonic component Si output from the high-pass filter 1. 3 is an input power averaging unit, which calculates the average power per frame of the i-th harmonic component Si output from the high-pass filter 1.

[0006] 4 is a predicted gain variation rate unit, which calculates the predicted gain variation rate per frame of the i-th harmonic component Si output from the high-pass filter 1, that is, the value of Σ Si × Si taken over all sample periods.
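The three per-frame quantities described above can be sketched as follows. The function names are hypothetical, and the zero-crossing rule (a sign change between consecutive samples, with zero treated as non-negative) is an assumption, since the text does not define it. The third quantity is computed literally as Σ Si × Si, as the text states.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples (zero counts as non-negative)."""
    count = 0
    for a, b in zip(frame, frame[1:]):
        if (a >= 0) != (b >= 0):
            count += 1
    return count


def average_power(frame):
    """Mean of the squared samples over one frame."""
    return sum(s * s for s in frame) / len(frame)


def gain_variation(frame):
    """Sum of Si * Si over all samples of the frame, as stated in the text."""
    return sum(s * s for s in frame)


def frame_features(frame):
    """Return (zero crossings, average power, sum of squares) for one frame."""
    return zero_crossings(frame), average_power(frame), gain_variation(frame)
```

For example, `frame_features([1, -1, 1, -1])` gives three zero crossings, an average power of 1.0, and a sum of squares of 4.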

[0007] 5 is a determination unit. When the three output conditions of the zero-crossing counter 2, the input power averaging unit 3, and the predicted gain variation rate unit 4 are all met, that is, when all three values reach or exceed their predetermined thresholds, it judges the PCM speech to contain speech. When even one of them does not reach its threshold, it judges the PCM speech to be silent.
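The decision rule of the determination unit 5 amounts to a logical AND over three threshold comparisons. A minimal sketch follows; the function and parameter names are hypothetical, and the threshold values are left to the caller because the text does not give them.

```python
def frame_has_speech(zc, avg_power, gain_var, zc_min, power_min, gain_min):
    """Speech only if all three measurements reach their thresholds."""
    return zc >= zc_min and avg_power >= power_min and gain_var >= gain_min
```

A single measurement below its threshold is enough to make the frame count as silent.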

[0008] 10 is a speech detector comprising the high-pass filter 1, the zero-crossing counter 2, the input power averaging unit 3, the predicted gain variation rate unit 4, and the determination unit 5 described above. As shown in FIGS. 4 and 5, suppose that, within one section, the signal Si is at zero level in the region up to t1 and approximately at the threshold in the region from t1 onward.

[0009] In this case, over the one frame, at least one of the three outputs of the zero-crossing counter 2, the input power averaging unit 3, and the predicted gain variation rate unit 4 is at or below its threshold, so at the point where the frame ends the PCM speech is judged to be silent.

[0010] Suppose also that in two other sections the signal Si has a certain constant level. In this case the number of zero crossings, the average power, and the predicted gain variation rate are at or above their predetermined thresholds, so at the points where those frames end the PCM speech is judged to contain speech.

[0011] Further, suppose that within another section the signal Si is approximately at the threshold in the region up to t2 and at zero level from t2 onward. In this case, over the one frame, at least one of the three outputs of the zero-crossing counter 2, the input power averaging unit 3, and the predicted gain variation rate unit 4 is at or below its threshold, so at the point where the frame ends the PCM speech is judged to be silent.

[0012] As described above, the conventional speech detector 10 calculates the number of zero crossings, the average power, and the predicted gain variation rate of the PCM speech one frame at a time. Consequently, even if speech begins toward the end of a frame, the determination unit 5 may be unable to judge that the PCM speech contains speech.
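A small numeric example makes the limitation concrete. All values here (a 160-sample frame, a power threshold of 250000, and the sample amplitudes) are assumptions chosen for illustration; the patent gives no figures.

```python
POWER_MIN = 250_000.0  # assumed power threshold

# Assumed 160-sample frame: silence, then speech in the last 10 samples only.
frame = [0] * 150 + [1000, -1000] * 5

# Frame-level view: the late burst is averaged over the whole frame.
avg_power = sum(s * s for s in frame) / len(frame)
frame_decision = avg_power >= POWER_MIN

# Sample-level view: each sample's power is compared individually.
sample_decision = any(s * s >= POWER_MIN for s in frame)
```

The frame average works out to 62500, below the assumed threshold, so the frame-level decision is "silent" even though the individual samples at the end of the frame clearly exceed it.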

[0013]

[Problem to Be Solved by the Invention] The conventional method of detecting the head and tail of speech therefore has the problem that speech whose head or tail could not be detected sounds unnatural to the listener.

[0014] An object of the present invention is to provide speech that does not sound unnatural, by detecting the head and tail of speech.

[0015]

[Means for Solving the Problem] To achieve the above object, as shown in FIG. 1, the invention concerns a speech detector in which input pulse-code-modulated speech is passed through a high-pass filter 1 that cuts off its DC component, the output of the high-pass filter 1 is applied to a zero-crossing counter 2, an input power averaging unit 3, and a predicted gain variation rate unit 4 to calculate each level frame by frame, and the calculation results are applied to a determination unit 5 to detect the presence or absence of speech. In this speech detector, a sample power monitoring unit 6 is provided which, once silence has been determined, monitors the speech sample by sample rather than frame by frame, so that the head and tail of the pulse-code-modulated speech are detected.

[0016]

[Operation] As shown in FIGS. 1 and 2, in the present invention, after the sample power monitoring unit 6 has judged the input PCM speech to contain speech, the zero-crossing counter 2, the input power averaging unit 3, the predicted gain variation rate unit 4, and the determination unit 5 are used to detect speech at the head of speech.

[0017] At the tail of speech as well, even if the one-frame monitoring result from the zero-crossing counter 2, the input power averaging unit 3, and the predicted gain variation rate unit 4 is judged to be silence, the frame is regarded as containing speech if the sample power monitoring unit 6 indicates speech.
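The tail rule can be read as an override: a frame that the frame-level measurements call silent still counts as speech when the sample power monitoring unit 6 reports speech. The sketch below assumes that unit 6 reports speech when any single sample's power reaches a threshold; that criterion, the function name, and the parameter names are assumptions, since the text does not define how unit 6 decides.

```python
def final_decision(frame, frame_decision, sample_power_min):
    """Combine the frame-level result with the assumed per-sample power check."""
    monitor_says_speech = any(s * s >= sample_power_min for s in frame)
    # The per-sample monitor can only promote a frame to speech, never demote it.
    return frame_decision or monitor_says_speech
```

For a frame whose first few samples still carry speech, the frame-level result may be `False` while `final_decision` returns `True`, so the tail is kept.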

[0018] Using this speech/silence determination result therefore makes it possible to detect the head and tail of speech in a way that reduces the unnaturalness of the speech.

[0019]

[Embodiment] An embodiment of the present invention is described in detail below with reference to FIGS. 2 and 3. FIG. 2 shows the configuration of a circuit according to one embodiment of the present invention, namely a high-efficiency speech codec for asynchronous transfer mode (ATM). FIG. 3 shows the detection timings of that circuit and illustrates the operation of FIG. 2.

[0020] In FIG. 2, 10 is a speech detector comprising the high-pass filter 1, the zero-crossing counter 2, the input power averaging unit 3, the predicted gain variation rate unit 4, and the determination unit 5, which have the same configuration as in the conventional example, together with the sample power monitoring unit 6 of the present invention. 11 is a high-efficiency speech encoding unit, and 12 is an ATM cell generation unit.

[0021] As shown in FIG. 3, within a section the sample power monitoring unit 6 monitors the PCM speech in each sample period. If the sample power monitoring unit 6 judges the PCM speech to contain speech in the region from t1 onward, the PCM speech of that section is stored in the determination unit 5. If, in the next section, the calculated number of zero crossings, average power, and predicted gain variation rate are at or above their predetermined thresholds, indicating speech, the PCM speech stored for the earlier section is sent from the speech detector 10 to the high-efficiency speech encoding unit 11.
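The head-of-speech procedure just described (hold a frame when the per-sample monitor fires, release it once the next frame passes the frame-level test) can be sketched as a small loop. This covers the head only, not the tail rule. The names are hypothetical, the frame-level test is passed in as a callable, and the per-sample criterion (any sample whose power reaches a threshold) is an assumption.

```python
def sample_monitor_fires(frame, sample_power_min):
    """Assumed rule for unit 6: any single sample whose power reaches the threshold."""
    return any(s * s >= sample_power_min for s in frame)


def detect_heads(frames, frame_is_speech, sample_power_min):
    """Return the indices of the frames forwarded to the encoder."""
    forwarded = []
    held = None  # index of a frame buffered after the sample monitor fired
    for k, frame in enumerate(frames):
        if frame_is_speech(frame):
            if held is not None:
                forwarded.append(held)  # release the buffered head of speech
            forwarded.append(k)
            held = None
        elif sample_monitor_fires(frame, sample_power_min):
            held = k  # buffer this frame until the next frame is judged
        else:
            held = None  # not confirmed by the next frame: discard the buffer
    return forwarded
```

In this sketch a frame in which speech starts late is forwarded only if the following frame confirms speech, so an isolated click in otherwise silent input is not sent to the encoder.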

[0022] In two further sections the signal Si is at or above a certain threshold, so the speech detector 10 judges the PCM speech to contain speech. In another section, the signal Si is at or above a certain threshold in the region from 0 to t2 but is zero in the subsequent region from t2 onward. The speech detector 10 therefore judges the region from 0 to t2 to contain speech and judges the signal to be silent from t2 onward.

[0023] As described above, after the sample power monitoring unit 6 has judged the PCM speech to contain speech, the zero-crossing counter 2, the input power averaging unit 3, and the predicted gain variation rate unit 4 are used so that speech and silence are detected reliably.

[0024] At the tail of speech as well, even if a frame viewed as a whole is judged to be silent, it is regarded as containing speech if the sample power monitoring unit 6 indicates speech. The high-efficiency speech encoding unit 11 performs high-efficiency speech encoding only on data that the speech detector 10 has judged to contain speech. The ATM cell generation unit 12 then assembles the encoded speech data from the high-efficiency speech encoding unit 11 into cells, which are fixed-length packets, for output to the ATM network, and sends them out as the cell output.
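The assembly of encoded data into fixed-length cells can be illustrated with a short sketch. A standard ATM cell carries a 48-byte payload behind a 5-byte header. Everything else here is an assumption: the text does not describe the header contents, the adaptation layer, or how a partly filled cell is handled, so the sketch produces payloads only and pads the last one with a hypothetical fill byte.

```python
ATM_PAYLOAD_BYTES = 48  # payload size of a standard 53-byte ATM cell


def to_cell_payloads(encoded, pad=0x00):
    """Split encoded speech bytes into 48-byte payloads, padding the last one."""
    payloads = []
    for start in range(0, len(encoded), ATM_PAYLOAD_BYTES):
        chunk = encoded[start:start + ATM_PAYLOAD_BYTES]
        if len(chunk) < ATM_PAYLOAD_BYTES:
            # Assumed padding policy; the patent does not specify one.
            chunk = chunk + bytes([pad]) * (ATM_PAYLOAD_BYTES - len(chunk))
        payloads.append(chunk)
    return payloads
```

Because only frames judged to contain speech reach the encoder, silent stretches produce no payloads at all, which is where the bandwidth saving comes from.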

[0025]

[Effect of the Invention] As is clear from the above description, the present invention makes it possible to detect heads and tails of speech that a conventional speech detector sometimes judged to be silent, and thus reduces the unnaturalness heard when the speech is decoded.

[Brief description of drawings]

FIG. 1 is a diagram showing the circuit of the principle configuration of the present invention.

FIG. 2 is a diagram showing the configuration of a circuit according to one embodiment of the present invention.

FIG. 3 is a diagram showing the detection timings of the circuit according to one embodiment of the present invention.

FIG. 4 is a diagram showing the configuration of a conventional example circuit.

FIG. 5 is a diagram showing the detection timings of the conventional example circuit.

[Explanation of symbols]

1: high-pass filter
2: zero-crossing counter
3: input power averaging unit
4: predicted gain variation rate unit
5: determination unit
6: sample power monitoring unit
10: speech detector

Claims

[Claims]

1. A speech detector (10) in which input pulse-code-modulated speech is passed through a high-pass filter (1) that cuts off its DC component, the output of the high-pass filter (1) is applied to a zero-crossing counter (2), an input power averaging unit (3), and a predicted gain variation rate unit (4) to calculate each level frame by frame, and the calculation results are applied to a determination unit (5) to detect the presence or absence of speech, characterized in that a sample power monitoring unit (6) is provided in the speech detector (10) which, once silence has been determined, monitors the speech sample by sample rather than frame by frame, so as to detect the head and tail of the input speech.