EP1551006B1 - Apparatus and method for voice activity detection - Google Patents

Apparatus and method for voice activity detection

Info

Publication number
EP1551006B1
Authority
EP
European Patent Office
Prior art keywords
noise
input signal
unit
time
active
Prior art date
Legal status
Expired - Fee Related
Application number
EP20040030697
Other versions
EP1551006A1 (de)
Inventor
Nobuhiko Naka (NTT DoCoMo Inc.)
Tomoyuki Ohya (NTT DoCoMo Inc.)
Current Assignee
NTT Docomo Inc
Original Assignee
NTT Docomo Inc
Priority date
2003-12-25
Filing date
2004-12-23
Publication date
2006-09-27
Priority claimed from JP2003430973A (external priority; JP4490090B2)
Priority claimed from JP2004020351A (external priority; JP4601970B2)
Application filed by NTT Docomo Inc
Publication of EP1551006A1
Application granted
Publication of EP1551006B1
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Definitions

  • The present invention relates to a voice activity detection apparatus and a voice activity detection method.
  • Discontinuous transmission is a technology commonly used in mobile telephony services and Internet telephony services to reduce transmission power or save transmission bandwidth.
  • Inactive period: a period of an input signal containing, for example, silence or background noise.
  • VAD: voice activity detection.
  • US-B1-6453285 discloses a transition to an inactive VAD reset state after a predetermined time in an active state.
  • The voice activity detection apparatus described in Non-patent Document 1 estimates the background noise from the input signal by a predetermined noise estimating method and uses the ratio of the input signal to the estimated background noise (S/N ratio: signal-to-noise ratio) for activity detection.
  • S/N ratio: signal-to-noise ratio.
  • The conventional voice activity detection apparatus described above has the following problem.
  • When the characteristics of the noise signal are not stationary, the performance of the noise estimation may degrade over time. This degradation is especially likely when an active period continues for a long time: during such a period the input signal contains more than just the background noise, so it is difficult to estimate the characteristics of the noise signal correctly.
  • Making the activity decision with an estimated background noise that no longer matches the actual noise therefore degrades the accuracy of the activity detection over time (especially when the active period continues for a long time).
  • As a result, the conventional voice activity detection apparatus may decide that an active period is inactive as time passes (especially when the active interval continues for a long time).
  • The objective of the present invention is therefore to provide a voice activity detection apparatus and a voice activity detection method that can decide the activity of the input signal accurately regardless of the passage of time.
  • This objective is achieved by a voice activity detection apparatus as set forth in claim 1 and a voice activity detection method as set forth in claim 3.
  • A preferred embodiment is set out in claim 2.
  • FIG. 1 is a block diagram of the voice activity detection apparatus according to this embodiment.
  • Physically, the voice activity detection apparatus 10 is configured as a computer system comprising a CPU (central processing unit), a memory, input devices such as a mouse and a keyboard, a display device, a storage device such as a hard disk, a radio communication unit that performs data communication with external equipment via radio, and the like.
  • Functionally, the voice activity detection apparatus 10 is provided with an autocorrelation calculating unit 11, a delay calculating unit 12, a noise deciding unit 13, a noise estimating unit 14, an activity decision unit 15, and a sound interval detecting unit 16 (time measurement means).
  • The autocorrelation calculating unit 11, the delay calculating unit 12, the noise deciding unit 13, the noise estimating unit 14, and the activity decision unit 15 together constitute a voice activity detection means 17.
  • The autocorrelation calculating unit 11 calculates autocorrelation values of the input signal. More specifically, it calculates an autocorrelation value c(t) for each delay t of an input signal x(n) according to equation (1).
  • The autocorrelation values c(t) are obtained as discrete values at a fixed interval (e.g., 1/8000 sec) over a fixed time span (e.g., 18 msec).
  • The autocorrelation calculating unit 11 calculates the autocorrelation value strictly in accordance with equation (1) above.
  • Alternatively, the autocorrelation calculating unit 11 can be designed to calculate the autocorrelation value from the perceptually weighted input signal, as is widely done in speech encoders.
  • The noise deciding unit 13 decides whether or not the input signal is noise based on the delay calculated by the delay calculating unit 12.
  • The noise deciding unit 13 may also decide whether or not the input signal is noise by a procedure other than the above-mentioned procedure.
  • The noise estimating unit 14 estimates a noise from the input signal. More specifically, the noise estimating unit 14 estimates the noise, for example, by equation (3):
  • $$\mathrm{noise}_{m+1}(n) = (1-\alpha)\cdot\mathrm{noise}_{m}(n) + \alpha\cdot\mathrm{input}_{m-1}(n) \qquad (3)$$
  • noise_m(n) is the estimated noise
  • input_m(n) is the input signal
  • n denotes the frequency band
  • m denotes the time (frame)
  • α is a coefficient
  • noise_m(n) represents the estimated noise of the n-th frequency band at time (frame) m.
  • The noise estimating unit 14 changes the coefficient α in (3) in accordance with the result of the decision by the noise deciding unit 13.
  • When the noise deciding unit 13 decides that the input signal is not noise, the noise estimating unit 14 sets the coefficient α in (3) to 0 or to a value α1 close to 0, so that the power of the estimated noise does not increase.
  • When the noise deciding unit 13 decides that the input signal is noise, the noise estimating unit 14 sets the coefficient α in (3) to 1 or to a value α2 close to 1 (α2 > α1), so that the estimated noise approaches the input signal.
  • The noise estimating unit 14 may also estimate the noise from the input signal by a procedure other than the one described above.
  • The activity decision unit 15 decides the activity on the basis of the result of the decision by the noise deciding unit 13, the input signal, and the noise estimated by the noise estimating unit 14. More specifically, the activity decision unit 15, for example, calculates an S/N ratio from the noise estimated by the noise estimating unit 14 and the input signal (more precisely, an integrated value or an average value of the S/N ratios of the individual frequency bands). The activity decision unit 15 then compares the calculated S/N ratio with a threshold value: it decides that the input signal is active when the S/N ratio is larger than the threshold value, and inactive when the S/N ratio is equal to or less than the threshold value.
  • The threshold may be adapted according to the result of the decision by the noise deciding unit 13.
  • For example, the threshold value used when the noise deciding unit 13 decides that the input signal is not noise is set smaller than the threshold value used when it decides that the input signal is noise.
  • This increases the possibility that signals with small S/N ratios (i.e., signals buried in the noise) are detected as active.
  • The activity decision unit 15 may also decide the activity of the input signal by a procedure other than the one described above.
  • For example, the activity decision unit 15 may decide the activity of the input signal on the basis of the input signal and the noise estimated by the noise estimating unit 14. It is also possible for the activity decision unit 15 to decide whether or not the input signal is active by using additional information about the input signal (its power, spectral envelope, number of zero crossings, and the like).
  • "Inactive" refers to meaningless sound, such as silence and background noise.
  • "Active" refers to sound containing human voice, music, or tones.
  • The sound interval detecting unit 16 measures the time duration of the active interval based on the result of the decision by the activity decision unit 15. Specifically, the sound interval detecting unit 16 measures the time duration of the active interval directly from the result of the activity decision unit 15. Alternatively, the sound interval detecting unit 16 can measure the time duration of the active interval by measuring how long the speech encoding unit (not shown) has been encoding at an encoding rate equal to or higher than a fixed threshold value (in the case of AMR, an encoding rate of 4.75 kbps or more). This works because, when the activity decision unit 15 decides that the input signal is active, the speech encoding unit encodes the input signal at a higher bit rate.
  • When the time duration of the active interval measured by the sound interval detecting unit 16 reaches or exceeds a predetermined period of time, the noise estimating unit 14 changes the noise estimating method so that the input signal is more likely to be decided as active. More specifically, when this condition is met, the noise estimating unit 14 sets the estimated noise noise_m(n) of one unit time before (one frame before) in (3) to the initial value noise_0(n). Because the initial value noise_0(n) has been set sufficiently small compared with the input signal of an active interval, this reset makes the estimated noise small. The input signal is therefore more likely to be decided as active by the activity decision unit 15.
  • FIG. 2 is a flowchart showing the operation of the voice activity detection apparatus according to this embodiment.
  • First, the autocorrelation calculating unit 11 calculates the autocorrelation values of the input signal (step S11). More specifically, each autocorrelation value c(t) for delay t of the input signal x(n) is calculated by (1).
  • Next, the delay calculating unit 12 calculates the delay corresponding to the maximum autocorrelation value among the autocorrelation values calculated by the autocorrelation calculating unit 11 over the predetermined delay range (step S12).
  • In step S13, the noise deciding unit 13 decides whether or not the input signal is noise based on the delay calculated by the delay calculating unit 12. More specifically, the noise deciding unit 13 decides that the input signal is not noise when the condition given by (2) is met for a predetermined period of time. Conversely, it decides that the input signal is noise when the condition given by (2) is not met within the predetermined period of time.
  • In step S14, the noise estimating unit 14 estimates the noise from the input signal. More specifically, the noise is estimated by (3), where the coefficient α is adapted according to the result of the decision by the noise deciding unit 13.
  • When the input signal is decided as not noise, the coefficient α is set to 0 or to a coefficient α1 close to 0, so that the level of the estimated noise does not increase.
  • When the input signal is decided as noise, the coefficient α is set to 1 or to a coefficient α2 close to 1 (α2 > α1), so that the level of the estimated noise approaches that of the input signal.
  • In step S15, the activity decision unit 15 decides the activity of the input signal based on the result of the decision by the noise deciding unit 13, the input signal, and the noise estimated by the noise estimating unit 14. For example, an S/N ratio is calculated from the estimated noise and the input signal and compared with a predetermined threshold value. The input signal is decided to be active when the S/N ratio is larger than the threshold value, and inactive when the S/N ratio is equal to or less than the threshold value.
  • In step S16, the sound interval detecting unit 16 measures the time duration of the active interval, either directly from the result of the decision by the activity decision unit 15 or, alternatively, as the time during which the bit rate used in the speech encoding unit (not shown) is higher than a certain threshold.
  • When the time duration of the active interval measured by the sound interval detecting unit 16 reaches or exceeds the predetermined period of time, the noise estimating unit 14 changes the noise estimating method so that the input signal is more likely to be decided as active (step S17). More specifically, the estimated noise noise_m(n) of one unit time before (one frame before) in (3) is set to the initial value noise_0(n).
  • Setting noise_m(n) to the initial value noise_0(n) makes the estimated noise small, so the activity decision unit 15 is more likely to decide that the input signal is active.
  • In this way, the voice activity detection apparatus 10 measures the time duration of the active interval with the sound interval detecting unit 16, and when that duration reaches or exceeds a predetermined period of time, the noise estimating unit 14 changes the noise estimating method (by setting noise_m(n) of one frame before in (3) to the initial value noise_0(n)) so that the input signal is more likely to be decided as active. This reduces the number of erroneous decisions, i.e., active periods of the input signal decided as inactive, even when the accuracy of the noise estimation deteriorates over time. As a result, the activity of the input signal can be decided correctly regardless of the passage of time.
  • In the embodiment described above, the noise estimating method in the noise estimating unit 14 is changed so that the input signal is more likely to be decided as active when the time duration of the active interval reaches or exceeds a predetermined period of time.
  • Several modified embodiments can also be conceived within the technical idea of the present invention, in which the condition for deciding whether or not the input signal is active is relaxed so that the input signal is more likely to be decided as active.
  • For example, the autocorrelation calculating method in the autocorrelation calculating unit 11, the delay calculating method in the delay calculating unit 12, the noise deciding method in the noise deciding unit 13, and the activity deciding method in the activity decision unit 15 can be changed. More specifically, when the time duration of the active interval measured by the sound interval detecting unit 16 reaches or exceeds a predetermined period of time, the way the parameters used for activity detection (such as the autocorrelation values, the spectral envelope, the delay, the estimated noise power, and the S/N ratio) are used may be changed, or these parameters may be reset to their initial values.
  • The present invention is applicable to a voice activity detection apparatus that decides whether an input signal is active (including human voice) or inactive (containing no information that needs to be transmitted), as typically used in mobile telephony services and Internet telephony services.
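Steps S11 and S12 described above (autocorrelation of the input signal, then selection of the delay with the largest autocorrelation) can be sketched as follows. Equation (1) itself is not reproduced in this text, so the sketch assumes the common unnormalized form c(t) = Σ x(n)·x(n − t); the helper names and the minimum lag of 20 samples are illustrative assumptions, while the 8 kHz spacing and 18 ms span follow the example values given above.

```python
import math


def autocorrelation(x, max_lag):
    """Return [c(0), ..., c(max_lag)] for the sequence x.

    Assumes the unnormalized form c(t) = sum_n x(n) * x(n - t); the exact
    equation (1) of the patent is not reproduced here.
    """
    return [sum(x[n] * x[n - t] for n in range(t, len(x)))
            for t in range(max_lag + 1)]


def delay_of_max(c, min_lag, max_lag):
    """Step S12 sketch: the lag in [min_lag, max_lag] with the largest c(t).

    min_lag skips the trivial peak at t = 0 (an assumed design choice).
    """
    return max(range(min_lag, max_lag + 1), key=lambda t: c[t])


SAMPLE_RATE_HZ = 8000                      # 1/8000 s spacing, as in the example above
MAX_LAG = 18 * SAMPLE_RATE_HZ // 1000      # 18 ms of delays -> 144 samples

# Usage: a 200 Hz tone (period of 40 samples) over a 40 ms frame.
frame = [math.sin(2 * math.pi * n / 40) for n in range(320)]
c = autocorrelation(frame, MAX_LAG)
print(delay_of_max(c, min_lag=20, max_lag=MAX_LAG))  # prints 40
```

For a periodic input the selected delay is the pitch period; with this finite-window form, c(t) at multiples of the period shrinks because fewer products overlap, so the first period wins.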
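The decision rule of step S13 (not noise once condition (2) has held for a predetermined period, noise otherwise) can be sketched as a run-length counter. Condition (2) is not reproduced in this text, so the hypothetical `NoiseDecider` below takes its per-frame truth value as an input; measuring the "predetermined period of time" in frames is also an assumption.

```python
class NoiseDecider:
    """Hypothetical step S13 sketch.

    The caller evaluates condition (2) for each frame (its exact form is not
    given here) and passes the boolean result to decide().
    """

    def __init__(self, hold_frames):
        self.hold_frames = hold_frames   # "predetermined period of time", in frames (assumed)
        self.run = 0                     # consecutive frames in which condition (2) held

    def decide(self, condition_2_met):
        """Return True if the current frame is decided to be noise."""
        self.run = self.run + 1 if condition_2_met else 0
        return self.run < self.hold_frames


decider = NoiseDecider(hold_frames=3)
print([decider.decide(c) for c in [True, True, True, True, False]])
# prints [True, True, False, False, True]
```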
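Equation (3) with the coefficient switched by the noise decision (step S14 above) is a per-band first-order recursive average. A minimal sketch, assuming the estimate and the input are lists of per-band powers; the defaults α1 = 0.0 and α2 = 0.9 are illustrative placeholders for "0 or close to 0" and "1 or close to 1", not values from the patent, and the function name is hypothetical.

```python
def update_noise(noise_prev, band_input, is_noise, alpha1=0.0, alpha2=0.9):
    """One application of equation (3) for every frequency band n.

    noise_prev: estimated noise per band, noise_m(n)
    band_input: input-signal band values that appear in equation (3)
    is_noise:   result of the noise deciding unit for this frame
    """
    assert 0.0 <= alpha1 < alpha2 <= 1.0
    # Input decided as noise: let the estimate track the input (alpha near 1).
    # Otherwise: keep the estimate from growing (alpha near 0).
    alpha = alpha2 if is_noise else alpha1
    return [(1.0 - alpha) * nz + alpha * x for nz, x in zip(noise_prev, band_input)]


print(update_noise([1.0, 2.0], [5.0, 6.0], is_noise=False))  # prints [1.0, 2.0]
print(update_noise([1.0, 2.0], [5.0, 6.0], is_noise=True))   # close to [4.6, 5.6]
```

With α1 = 0 the estimate is frozen during frames judged to be speech, which is what keeps speech energy from leaking into the noise estimate.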
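The S/N-based decision of step S15, including the smaller threshold used when the input is decided as not noise, might look like the sketch below. Averaging the per-band S/N ratios in dB and the two threshold values (6 dB and 3 dB) are assumptions chosen for illustration; the text above only says that an integrated or average per-band S/N ratio is compared with a threshold.

```python
import math


def average_snr_db(band_input, band_noise, eps=1e-12):
    """Average of the per-band S/N ratios, in dB (dB averaging is an assumption)."""
    ratios = [10.0 * math.log10((x + eps) / (nz + eps))
              for x, nz in zip(band_input, band_noise)]
    return sum(ratios) / len(ratios)


def decide_active(band_input, band_noise, is_noise,
                  threshold_noise_db=6.0, threshold_not_noise_db=3.0):
    """Active if the S/N ratio exceeds the threshold; inactive if equal or below.

    The threshold for frames decided as 'not noise' is smaller, so weak
    signals buried in the noise are more likely to be detected as active.
    """
    threshold = threshold_noise_db if is_noise else threshold_not_noise_db
    return average_snr_db(band_input, band_noise) > threshold


# About 4.8 dB of S/N: inactive under the 6 dB threshold, active under 3 dB.
print(decide_active([3.0, 3.0], [1.0, 1.0], is_noise=True))   # prints False
print(decide_active([3.0, 3.0], [1.0, 1.0], is_noise=False))  # prints True
```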
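Step S16 offers two ways to measure the duration of the active interval: counting consecutive frames decided as active, or timing how long the speech encoder runs at or above a rate threshold (4.75 kbps in the AMR example). A minimal sketch of both, assuming a fixed 20 ms frame and that the count restarts whenever a frame is not active; the frame length, class name, and function name are hypothetical.

```python
FRAME_MS = 20                     # assumed frame length, for converting counts to time
AMR_RATE_THRESHOLD_KBPS = 4.75    # threshold from the AMR example in the text


class ActiveIntervalTimer:
    """Counts consecutive active frames (hypothetical step S16 helper)."""

    def __init__(self):
        self.frames = 0

    def update(self, active):
        """Feed one frame decision; return the current active-interval length in ms."""
        self.frames = self.frames + 1 if active else 0
        return self.frames * FRAME_MS


def active_from_encoder_rate(rate_kbps, threshold_kbps=AMR_RATE_THRESHOLD_KBPS):
    """Alternative input for the timer: treat a frame as active when the
    encoder used a rate equal to or above the threshold."""
    return rate_kbps >= threshold_kbps


timer = ActiveIntervalTimer()
rates = [12.2, 4.75, 12.2, 0.0, 12.2]
print([timer.update(active_from_encoder_rate(r)) for r in rates])
# prints [20, 40, 60, 0, 20]
```

Either source of per-frame activity can drive the same timer, which is why the sketch keeps the rate test separate from the counter.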
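Putting steps S14 to S17 together, the core idea (reset the noise estimate to its initial value noise_0(n) once the active interval has lasted long enough) can be simulated end to end. This is a simplified sketch, not the patent's implementation: the noise decision is supplied by the caller, a single 3 dB threshold replaces the adaptive one, the same frame's band values feed both equation (3) and the S/N ratio, and the reset is applied on every frame while the interval is at or beyond the limit. The class name and all parameter values are hypothetical.

```python
import math


class LongActiveResetVAD:
    """Hypothetical sketch of steps S14-S17 with the noise-estimate reset."""

    def __init__(self, noise_0, limit_frames, alpha1=0.0, alpha2=0.9, threshold_db=3.0):
        self.noise_0 = list(noise_0)        # initial value noise_0(n), assumed small
        self.noise = list(noise_0)          # current estimate noise_m(n)
        self.limit_frames = limit_frames    # "predetermined period of time", in frames
        self.alpha1, self.alpha2 = alpha1, alpha2
        self.threshold_db = threshold_db
        self.active_frames = 0

    def process(self, band_input, is_noise):
        # S14: equation (3), alpha chosen from the noise decision.
        alpha = self.alpha2 if is_noise else self.alpha1
        self.noise = [(1.0 - alpha) * nz + alpha * x
                      for nz, x in zip(self.noise, band_input)]
        # S15: average per-band S/N ratio in dB against a fixed threshold.
        snr_db = sum(10.0 * math.log10(x / nz)
                     for x, nz in zip(band_input, self.noise)) / len(band_input)
        active = snr_db > self.threshold_db
        # S16: length of the current active interval, in frames.
        self.active_frames = self.active_frames + 1 if active else 0
        # S17: long active interval -> reset the estimate to noise_0(n).
        if self.active_frames >= self.limit_frames:
            self.noise = list(self.noise_0)
        return active


def run(limit_frames, n_frames=20):
    # alpha1 = 0.05 lets the estimate creep upward during speech, mimicking
    # the gradual drift of the noise estimate described in the text.
    vad = LongActiveResetVAD([1.0], limit_frames, alpha1=0.05)
    return [vad.process([100.0], is_noise=False) for _ in range(n_frames)]


print(sum(run(limit_frames=10**6)))  # no effective reset; prints 13
print(sum(run(limit_frames=10)))     # reset after 10 active frames; prints 20
```

Without the reset, the drifting estimate eventually pushes a continuous active signal below the threshold (frames 14 to 20 are decided inactive); with it, every frame stays active.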

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Claims (3)

  1. A voice activity detection apparatus (10) comprising an activity decision means (15) for deciding whether or not an input signal is active according to a predetermined decision condition, and a time measuring means (16) configured to measure a time duration of the active interval on the basis of the decision result of the activity decision means (15), characterized in that the activity decision means (15) is configured to change the decision condition such that the input signal is likely to be decided as active when the time duration of the sound interval measured by the time measuring means (16) becomes equal to or longer than a predetermined period of time.
  2. The voice activity detection apparatus (10) according to claim 1, characterized in that the activity decision means (15) is configured to decide the activity of the input signal on the basis of a noise determined by a predetermined noise estimating method, wherein the activity decision means (15) is configured to change the noise estimating method such that the input signal is likely to be decided as active when the time duration of the sound interval measured by the time measuring means (16) becomes equal to or longer than a predetermined period of time.
  3. A voice activity detection method for deciding the activity of an input signal according to a predetermined decision condition, characterized in that a step of changing the decision condition (S17) is executed within the method such that the input signal is likely to be decided as active when the time duration of the active interval becomes equal to or longer than a predetermined period of time (S16).
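Claims 1 and 3 cover changing the decision condition itself, and the modified embodiments in the description mention relaxing that condition directly. One hypothetical instance is to lower the S/N threshold once the measured active interval reaches the predetermined period; the class name and the 6 dB / 2 dB thresholds below are illustrative assumptions, not the claimed implementation.

```python
class EasedThresholdDecision:
    """Hypothetical sketch: relax the decision condition after a long active interval."""

    def __init__(self, normal_db, eased_db, limit_frames):
        assert eased_db < normal_db, "the eased threshold must be lower"
        self.normal_db = normal_db
        self.eased_db = eased_db
        self.limit_frames = limit_frames   # predetermined period, in frames (assumed)
        self.active_frames = 0             # time measurement: current active run

    def decide(self, snr_db):
        """Return True (active) if snr_db exceeds the current threshold."""
        long_interval = self.active_frames >= self.limit_frames
        threshold = self.eased_db if long_interval else self.normal_db
        active = snr_db > threshold
        self.active_frames = self.active_frames + 1 if active else 0
        return active


dec = EasedThresholdDecision(normal_db=6.0, eased_db=2.0, limit_frames=3)
print([dec.decide(s) for s in [10.0, 10.0, 10.0, 4.0, 4.0]])
# prints [True, True, True, True, True]
fresh = EasedThresholdDecision(normal_db=6.0, eased_db=2.0, limit_frames=3)
print(fresh.decide(4.0))  # prints False: no long interval yet, so 6 dB applies
```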

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2003430973A (JP4490090B2) 2003-12-25 2003-12-25 Sound/silence determination apparatus and sound/silence determination method
JP2004020351A (JP4601970B2) 2004-01-28 2004-01-28 Sound/silence determination apparatus and sound/silence determination method

Publications (2)

Publication Number Publication Date
EP1551006A1 (de) 2005-07-06
EP1551006B1 (de) 2006-09-27

Family

ID=34576005

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20040030697 Expired - Fee Related EP1551006B1 (de) 2003-12-25 2004-12-23 Apparatus and method for voice activity detection

Country Status (2)

Country Link
EP (1) EP1551006B1 (de)
DE (1) DE602004002553T2 (de)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6453285B1 (en) * 1998-08-21 2002-09-17 Polycom, Inc. Speech activity detector for use in noise reduction system, and methods therefor

Also Published As

Publication number Publication date
DE602004002553T2 (de) 2007-08-23
DE602004002553D1 (de) 2006-11-09
EP1551006A1 (de) 2005-07-06

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20041223

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR LV MK YU

AKX Designation fees paid

Designated state(s): DE GB

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE GB

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REF Corresponds to:

Ref document number: 602004002553

Country of ref document: DE

Date of ref document: 20061109

Kind code of ref document: P

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20070628

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20131218

Year of fee payment: 10

Ref country code: GB

Payment date: 20131218

Year of fee payment: 10

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 602004002553

Country of ref document: DE

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20141223

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20141223

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20150701