JPH0619498A - Speech detector

Speech detector

Info

Publication number
JPH0619498A
Authority
JP
Japan
Prior art keywords
speech
voice
frame
unit
pass filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
JP4173599A
Other languages
Japanese (ja)
Inventor
Hideaki Yamada
英明 山田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to JP4173599A
Publication of JPH0619498A
Current legal status: Withdrawn

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

PURPOSE: To provide speech that does not sound unnatural by detecting the head and tail of the speech, in a speech detector that detects speech contained in pulse-code-modulated speech. CONSTITUTION: The input pulse-code-modulated speech is passed through a high-pass filter 1 that cuts off the DC component, and the output of the high-pass filter 1 is applied to a zero-cross quantity counter 2, an input electric power averaging part 3, and a predicted gain variation rate part 4, which calculate their levels frame by frame; the calculation results are applied to a decision part 5 to detect whether speech is present. The speech detector is internally provided with a sample power monitor part 6 that, once a no-sound decision has been made, monitors the speech in sample units rather than frame units, thereby detecting the head and tail of the pulse-code-modulated speech.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a voice detector that detects voice contained in pulse code modulated voice.

[0002] In recent voice communication, a scheme that compresses the silent portions of voice and transmits only voiced data is required in order to make data transmission more efficient. This calls for a voice detector that accurately detects the talk head, where voice begins, and the talk tail, where voice ends.

[0003]

[Prior Art] A conventional example is described below with reference to FIGS. 4 and 5. FIG. 4 shows the configuration of a conventional example circuit. FIG. 5 shows the detection timings of the conventional example circuit and illustrates the operation of FIG. 4.

[0004] In FIG. 4, reference numeral 1 denotes a high-pass filter, which cuts off the DC component contained in the input pulse code modulated voice (PCM voice) and outputs a signal of the i-th harmonic component Si (where i is the sample period number, i = 1, 2, ..., n).
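The text does not say how the high-pass filter 1 is realized. As a minimal sketch only, assuming a first-order DC-blocking filter (the function name `dc_block` and the coefficient `r` are illustrative and not taken from the patent), the DC removal could look like this:

```python
def dc_block(samples, r=0.995):
    """Assumed first-order DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1].

    `r` (slightly below 1) is a hypothetical coefficient; the patent does
    not specify the filter order or its parameters.
    """
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

With this choice a constant (DC) input decays toward zero at the output, while a rapidly alternating input passes almost unchanged.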

[0005] Reference numeral 2 denotes a zero-crossing counter, which calculates the number of zero crossings per frame of the i-th harmonic component Si output from the high-pass filter 1. Reference numeral 3 denotes an input power averaging unit, which calculates the average power per frame of the i-th harmonic component Si output from the high-pass filter 1.
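The two per-frame quantities described above can be sketched as follows. This is an illustration under assumptions: a frame is taken to be a plain list of filtered samples, a sample equal to zero is treated as non-negative, and the function names are hypothetical.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples of one frame."""
    count = 0
    for a, b in zip(frame, frame[1:]):
        # A crossing occurs when the two samples lie on opposite sides of zero.
        if (a >= 0) != (b >= 0):
            count += 1
    return count


def average_power(frame):
    """Mean of the squared samples of one frame."""
    return sum(s * s for s in frame) / len(frame)
```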

[0006] Reference numeral 4 denotes a predicted gain variation rate unit, which calculates the predicted gain variation rate per frame of the i-th harmonic component Si output from the high-pass filter 1, that is, the value of ΣSi × Si over all sample periods.

[0007] Reference numeral 5 denotes a decision unit. If the outputs of the zero-crossing counter 2, the input power averaging unit 3, and the predicted gain variation rate unit 4 all satisfy their conditions, that is, if all three values reach or exceed their predetermined thresholds, the decision unit judges the PCM voice to be voiced; if even one of them fails to reach its threshold, it judges the PCM voice to be silent.
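The decision rule of the decision unit 5 is a logical AND over three threshold comparisons. A minimal sketch follows; the `Thresholds` container and the function name `is_voiced` are hypothetical, and the patent gives no threshold values.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical container for the three per-frame thresholds."""
    zero_crossings: float
    average_power: float
    gain_variation: float


def is_voiced(zc, power, gain_var, th):
    """Voiced only if all three values reach their thresholds; otherwise silent."""
    return (
        zc >= th.zero_crossings
        and power >= th.average_power
        and gain_var >= th.gain_variation
    )
```

A single value below its threshold is enough to produce a silent decision, which is what makes the frame-level detector conservative at the edges of speech.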

[0008] Reference numeral 10 denotes a voice detector comprising the high-pass filter 1, the zero-crossing counter 2, the input power averaging unit 3, the predicted gain variation rate unit 4, and the decision unit 5 described above. As shown in FIGS. 4 and 5, assume that in the region ~ t1 of the section ~ the level of the signal Si is zero, and that in the region t1 ~ the signal Si is approximately at the threshold.

[0009] In this case, within the one frame, one of the three outputs of the zero-crossing counter 2, the input power averaging unit 3, and the predicted gain variation rate unit 4 falls to or below its threshold, so the PCM voice is judged to be silent at the point where the frame ends.

[0010] Also assume that the signal Si has a certain constant level in the section ~ and the section ~. In this case the number of zero crossings, the average power, and the predicted gain variation rate are all at or above their predetermined thresholds, so the PCM voice is judged to be voiced at the points where those frames end.

[0011] Further assume that the signal Si is approximately at the threshold in the region ~ t2 of the section ~ and that its level is zero in the region t2 ~. In this case, within the one frame, one of the three outputs of the zero-crossing counter 2, the input power averaging unit 3, and the predicted gain variation rate unit 4 falls to or below its threshold, so the PCM voice is judged to be silent at the point where the frame ends.

[0012] As described above, the conventional voice detector 10 calculates the number of zero crossings, the average power, and the predicted gain variation rate of the PCM voice in units of one frame. Consequently, even if the PCM voice becomes voiced near the end of a frame, the decision unit 5 may fail to judge the PCM voice to be voiced.
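A small numeric illustration of this limitation, using made-up values: the frame length, the threshold, and the sample amplitudes below are all assumptions chosen for the example, not figures from the patent. Speech that starts in the last tenth of a frame has its power diluted by the preceding silence, so the per-frame average stays below the threshold.

```python
FRAME_LEN = 160          # hypothetical frame length in samples
POWER_THRESHOLD = 2.0e5  # hypothetical average-power threshold


def average_power(frame):
    """Mean of the squared samples."""
    return sum(s * s for s in frame) / len(frame)


# Silence for most of the frame, then speech begins in the last 16 samples.
frame = [0] * (FRAME_LEN - 16) + [1000, -1000] * 8

frame_power = average_power(frame)       # diluted over the whole frame
tail_power = average_power(frame[-16:])  # the speech portion on its own

frame_is_voiced = frame_power >= POWER_THRESHOLD  # frame-level view
tail_is_voiced = tail_power >= POWER_THRESHOLD    # what the frame average hides
```

Here the frame as a whole falls below the threshold even though its final samples are well above it, which is the missed talk head the text describes.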

[0013]

[Problems to be Solved by the Invention] Accordingly, with the conventional talk head and talk tail detection scheme, voice whose talk head or talk tail could not be detected sounds unnatural to the listener.

[0014] An object of the present invention is to provide voice that does not sound unnatural by detecting the talk head and talk tail.

[0015]

[Means for Solving the Problems] To achieve the above object, as shown in FIG. 1, in a voice detector in which input pulse code modulated voice is passed through a high-pass filter 1 that cuts off its DC component, the output of the high-pass filter 1 is applied to a zero-crossing counter 2, an input power averaging unit 3, and a predicted gain variation rate unit 4 to calculate each level frame by frame, and the calculation results are applied to a decision unit 5 to detect the presence or absence of voice, a sample power monitoring unit 6 is provided in the voice detector which, once silence has been determined, monitors the voice sample by sample rather than frame by frame, so that the talk head and talk tail of the pulse code modulated voice are detected.

[0016]

[Operation] In the present invention, as shown in FIGS. 1 and 2, after the sample power monitoring unit 6 judges the input PCM voice to be voiced, voice detection at the talk head is performed using the zero-crossing counter 2, the input power averaging unit 3, the predicted gain variation rate unit 4, and the decision unit 5.

[0017] At the talk tail as well, even if the result of monitoring one frame with the zero-crossing counter 2, the input power averaging unit 3, and the predicted gain variation rate unit 4 is judged to be silent, the voice is regarded as voiced if the sample power monitoring unit 6 indicates voiced.

[0018] Using this voiced/silent decision result therefore makes it possible to detect the talk head and talk tail in a way that reduces the unnatural impression of the voice.

[0019]

[Embodiment] An embodiment of the present invention is described in detail below with reference to FIGS. 2 and 3. FIG. 2 shows the configuration of a circuit according to one embodiment of the present invention, namely a high-efficiency voice codec for asynchronous transfer mode (ATM). FIG. 3 shows the detection timings of the circuit of the embodiment and illustrates the operation of FIG. 2.

[0020] In FIG. 2, reference numeral 10 denotes a voice detector, which comprises the high-pass filter 1, the zero-crossing counter 2, the input power averaging unit 3, the predicted gain variation rate unit 4, and the decision unit 5, all configured as in the conventional example, together with the sample power monitoring unit 6 of the present invention. Reference numeral 11 denotes a high-efficiency voice coding unit, and 12 denotes an ATM cell generation unit.

[0021] As shown in FIG. 3, in the section ~, the sample power monitoring unit 6 monitors the PCM voice in each sample period. If the sample power monitoring unit 6 judges the PCM voice to be voiced in the region t1 ~, the PCM voice of the section ~ is stored in the decision unit 5. If the number of zero crossings, the average power, and the predicted gain variation rate calculated in the next section ~ are at or above their predetermined thresholds, that is, voiced, the PCM voice stored for the section ~ is sent from the voice detector 10 to the high-efficiency voice coding unit 11.
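One possible reading of this hold-and-confirm behavior, combined with the talk-tail rule of paragraph [0017], is sketched below. It is an interpretation under assumptions, not the patented circuit: the function name `select_frames`, the per-sample power threshold, and the use of frame indices are hypothetical, and the frame-level decision is passed in as a callable.

```python
def select_frames(frames, frame_voiced, sample_power_threshold):
    """Return the indices of frames that would be passed on for encoding.

    frames: list of frames, each a list of samples
    frame_voiced: callable(frame) -> bool, the frame-level decision
    sample_power_threshold: assumed threshold on a single squared sample
    """
    sent = []
    held = None        # index of a frame stored while awaiting confirmation
    in_speech = False
    for idx, frame in enumerate(frames):
        sample_hit = any(s * s >= sample_power_threshold for s in frame)
        voiced = frame_voiced(frame)
        if not in_speech:
            if voiced:
                # Talk head confirmed: release the stored frame first.
                if held is not None:
                    sent.append(held)
                sent.append(idx)
                held = None
                in_speech = True
            elif sample_hit:
                held = idx   # possible talk head; keep it for one frame
            else:
                held = None  # nothing followed, so discard the stored frame
        else:
            if voiced:
                sent.append(idx)
            else:
                # Talk tail: a frame-level silent frame is still sent if the
                # per-sample monitor saw voice in it.
                if sample_hit:
                    sent.append(idx)
                in_speech = False
    return sent
```

For example, with 8-sample frames in which speech starts in the last two samples of frame 1 and stops after the first two samples of frame 3, this sketch selects frames 1, 2, and 3, whereas the frame-level decision alone would select only frame 2.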

[0022] In the section ~ and the section ~, the signal Si is at or above a certain threshold, so the voice detector 10 judges the PCM voice to be voiced. In the region 0 to t2 of the section ~, the signal Si is at or above a certain threshold, but in the subsequent region t2 ~ the signal Si is zero. The voice detector 10 therefore judges the region 0 to t2 to be voiced and judges the voice to be silent after t2.

[0023] As described above, after the sample power monitoring unit 6 judges the PCM voice to be voiced, the zero-crossing counter 2, the input power averaging unit 3, and the predicted gain variation rate unit 4 are used so that voiced/silent detection is performed reliably.

[0024] At the talk tail as well, even if a frame taken as a whole is judged to be silent, it is regarded as voiced if the sample power monitoring unit 6 indicates voiced. The high-efficiency voice coding unit 11 performs high-efficiency voice coding only on the data judged to be voiced by the voice detector 10. The ATM cell generation unit 12 then assembles the voiced coded data from the high-efficiency voice coding unit 11 into cells consisting of fixed-length packets for output to the ATM network, and sends them out as the cell output.
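The cell assembly step can be sketched as splitting the coded byte stream into fixed-length payloads. The patent says only that cells are fixed-length packets; the 5-byte header and 48-byte payload used below are the standard ATM cell layout, while the all-zero placeholder header, the zero padding, and the function name `build_cells` are assumptions made for illustration (real headers carry routing fields and a header checksum, and an adaptation layer would normally sit in between).

```python
CELL_PAYLOAD = 48  # payload bytes in a standard ATM cell
HEADER_LEN = 5     # header bytes in a standard ATM cell


def build_cells(coded, header=bytes(HEADER_LEN)):
    """Split coded voice bytes into 53-byte cells with a placeholder header."""
    if len(header) != HEADER_LEN:
        raise ValueError("header must be exactly 5 bytes")
    cells = []
    for off in range(0, len(coded), CELL_PAYLOAD):
        payload = coded[off:off + CELL_PAYLOAD]
        # Pad the final payload with zero bytes (an assumed padding policy).
        payload = payload.ljust(CELL_PAYLOAD, b"\x00")
        cells.append(header + payload)
    return cells
```

Because silent frames never reach the coder, no cells are produced for them, which is where the bandwidth saving described in paragraph [0002] comes from.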

[0025]

[Effects of the Invention] As is clear from the above description, the present invention makes it possible to detect talk heads and talk tails that a conventional voice detector sometimes judged to be silent, and thus reduces the unnatural impression when the voice is decoded.

[Brief description of drawings]

FIG. 1 is a diagram showing a circuit of the principle configuration of the present invention.

FIG. 2 is a diagram showing the configuration of a circuit according to an embodiment of the present invention.

FIG. 3 is a diagram showing the detection timings of the circuit according to the embodiment of the present invention.

FIG. 4 is a diagram showing the configuration of a conventional example circuit.

FIG. 5 is a diagram showing the detection timings of the conventional example circuit.

[Explanation of symbols]

1: high-pass filter
2: zero-crossing counter
3: input power averaging unit
4: predicted gain variation rate unit
5: decision unit
6: sample power monitoring unit
10: voice detector

Claims (1)

[Claims]

1. A voice detector (10) in which input pulse code modulated voice is passed through a high-pass filter (1) that cuts off its DC component, the output of the high-pass filter (1) is applied to a zero-crossing counter (2), an input power averaging unit (3), and a predicted gain variation rate unit (4) to calculate each level frame by frame, and the calculation results are applied to a decision unit (5) to detect the presence or absence of voice, characterized in that a sample power monitoring unit (6) which, once silence has been determined, monitors the voice sample by sample rather than frame by frame is provided in the voice detector (10), so that the talk head and talk tail of the voice input are detected.
JP4173599A 1992-07-01 1992-07-01 Speech detector Withdrawn JPH0619498A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4173599A JPH0619498A (en) 1992-07-01 1992-07-01 Speech detector

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4173599A JPH0619498A (en) 1992-07-01 1992-07-01 Speech detector

Publications (1)

Publication Number Publication Date
JPH0619498A true JPH0619498A (en) 1994-01-28

Family

ID=15963593

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4173599A Withdrawn JPH0619498A (en) 1992-07-01 1992-07-01 Speech detector

Country Status (1)

Country Link
JP (1) JPH0619498A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000132177A (en) * 1998-10-20 2000-05-12 Canon Inc Device and method for processing voice


Similar Documents

Publication Publication Date Title
JP4146489B2 (en) Audio packet reproduction method, audio packet reproduction apparatus, audio packet reproduction program, and recording medium
JP4851578B2 (en) Method and apparatus for performing reduced rate, variable rate speech analysis synthesis
KR100455225B1 (en) Method and apparatus for adding hangover frames to a plurality of frames encoded by a vocoder
EP1861847A2 (en) Adaptive noise state update for a voice activity detector
EP1229520A2 (en) Silence insertion descriptor (sid) frame detection with human auditory perception compensation
JP2004177978A (en) Method of generating comfortable noise of digital speech transmission system
WO1998049673A1 (en) Method and device for detecting voice sections, and speech velocity conversion method and device utilizing said method and device
JP2006189907A (en) Method of detecting voice activity of signal and voice signal coder including device for implementing method
WO2000046789A1 (en) Sound presence detector and sound presence/absence detecting method
JPH0713586A (en) Speech decision device and acoustic reproduction device
JP2000172283A (en) System and method for detecting sound
US20100106490A1 (en) Method and Speech Encoder with Length Adjustment of DTX Hangover Period
US20100054454A1 (en) Method and apparatus for the detection and suppression of echo in packet based communication networks using frame energy estimation
JP3240832B2 (en) Packet voice decoding method
JP2861889B2 (en) Voice packet transmission system
JPH0619498A (en) Speech detector
JP2656069B2 (en) Voice detection device
JPH0236628A (en) Transmission system and transmission/reception system for voice signal
JPH08202394A (en) Voice detector
JP2900987B2 (en) Silence compressed speech coding / decoding device
JP3496618B2 (en) Apparatus and method for speech encoding / decoding including speechless encoding operating at multiple rates
JP3001584B2 (en) Audio signal transmission method
JP3055608B2 (en) Voice coding method and apparatus
Hatamian Enhanced speech activity detection for mobile telephony
JPH10301593A (en) Method and device detecting voice section

Legal Events

Date Code Title Description
A300 Application deemed to be withdrawn because no request for examination was validly filed

Free format text: JAPANESE INTERMEDIATE CODE: A300

Effective date: 19991005