EP3078027A1 - Voice detection method - Google Patents

Voice detection method

Info

Publication number
EP3078027A1
Authority
EP
European Patent Office
Prior art keywords
frame
subframe
value
detection
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP14814978.4A
Other languages
English (en)
French (fr)
Other versions
EP3078027B1 (de)
Inventor
Karim Maouche
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Adeunis RF SA
Original Assignee
Adeunis RF SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Adeunis RF SA filed Critical Adeunis RF SA
Publication of EP3078027A1 publication Critical patent/EP3078027A1/de
Application granted granted Critical
Publication of EP3078027B1 publication Critical patent/EP3078027B1/de
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • G10L25/84Detection of presence or absence of voice signals for discriminating voice from noise
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • G10L2025/783Detection of presence or absence of voice signals based on threshold decision
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • G10L2025/783Detection of presence or absence of voice signals based on threshold decision
    • G10L2025/786Adaptive threshold

Definitions

  • the present invention relates to a voice detection method for detecting the presence of speech signals in a noisy acoustic signal from a microphone.
  • VAD Voice Activity Detection
  • the invention finds a preferred, but not limiting, application in a multi-user wireless audio communication system, of the time-division or full-duplex type, between several autonomous communication terminals, that is to say without connection to a transmission base or network, and easy to use, that is to say without the intervention of a technician to establish communication.
  • Such a communication system, known in particular from documents WO10149864 A1, WO10149875 A1 and EP1843326 A1, is conventionally used in a noisy or even very noisy environment, for example in a marine environment, at a sports show or event, indoors or outdoors, on a construction site, etc.
  • the detection of voice activity generally consists in delimiting, by means of quantifiable criteria, the beginning and end of words and / or sentences in a noisy acoustic signal, in other words in a given audio stream.
  • Such detection finds applications in areas such as speech coding, noise reduction or speech recognition.
  • a method for detecting the voice in the processing chain of an audio communication system makes it possible in particular not to transmit an acoustic or audio signal during periods of silence. Therefore, the surrounding noise will not be transmitted during these periods, for the sake of improving the audio rendering of the communication or to reduce the transmission rate.
  • it is known to employ voice activity detection to encode the audio signal fully only when the "VAD" method indicates activity. Therefore, when there is no speech and the signal is in a period of silence, the coding rate drops significantly, which, on average over the entire signal, makes it possible to achieve lower bit rates.
  • the signal indeed has a so-called fundamental frequency, generally called "pitch", which corresponds to the frequency of vibration of the vocal cords of the speaker, and which generally extends between 70 and 400 Hertz.
  • the evolution of this fundamental frequency determines the melody of speech and its extent depends on the speaker, his habits but also his physical and mental state.
  • a first method of detecting the fundamental frequency implements the search for the maximum of the autocorrelation function R(τ) defined by the following relation:
  • R(τ) = Σn x(n)·x(n + τ), with 0 ≤ τ ≤ τmax.
  • This first method employing the autocorrelation function does not give satisfaction, however, when there is presence of relatively large noise. Moreover, the autocorrelation function suffers from the presence of maxima that do not correspond to the fundamental frequency or its multiples, but to sub-multiples of it.
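By way of illustration, this first maximum-of-autocorrelation search can be sketched in Python as follows (a sketch under the assumption of a clean signal; the function name and parameters are illustrative, not from the patent):

```python
import numpy as np

def autocorr_pitch(x, fs, fmin=70.0, fmax=400.0):
    # Search the maximum of the autocorrelation R(tau) over lags
    # corresponding to the 70-400 Hz pitch range mentioned above.
    tau_min = int(fs / fmax)                  # shortest period (highest pitch)
    tau_max = int(fs / fmin)                  # longest period (lowest pitch)
    n = len(x)
    r = [np.dot(x[:n - tau], x[tau:]) for tau in range(tau_max + 1)]
    best = tau_min + int(np.argmax(r[tau_min:tau_max + 1]))
    return fs / best                          # convert lag to frequency

# A clean 200 Hz sine at Fe = 8 kHz is recovered exactly (lag 40 samples).
fs = 8000
t = np.arange(0, 0.1, 1 / fs)
f0 = autocorr_pitch(np.sin(2 * np.pi * 200 * t), fs)
```

On a noisy signal, spurious maxima appear at sub-multiple lags, which is the weakness noted in the paragraph below.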
  • a second method of detecting the fundamental frequency implements the search for the minimum of the difference function D(τ) defined by the following relation:
  • D(τ) = Σn |x(n) − x(n + τ)|, where | · | is the absolute-value operator, this difference function being minimal in the vicinity of the fundamental period and its multiples; this minimum is then compared with a threshold to deduce the decision of presence of voice or not.
  • Compared to the autocorrelation function R(τ), the difference function D(τ) has the advantage of a lower computational load, making this second method more attractive for real-time applications. However, this second method does not give complete satisfaction in the presence of noise.
  • a third method of detecting the fundamental frequency implements the calculation, considering a processing window of length H where H ≤ N, of the squared difference function dt(τ) defined by the relation:
  • dt(τ) = Σn=t+1..t+H (x(n) − x(n + τ))²;
  • this third method then consists in normalizing the squared difference function dt(τ) by calculating a normalized squared difference function dt'(τ), of the cumulative-mean type: dt'(0) = 1 and dt'(τ) = dt(τ) / [(1/τ)·Σj=1..τ dt(j)] for τ ≠ 0.
  • this third method nevertheless has limitations in terms of voice detection, particularly in areas of low SNR (signal-to-noise ratio), characteristic of a very noisy environment.
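The squared-difference computation of this third method, with a YIN-style cumulative-mean normalization (one plausible reading of the normalization described above; the patent's exact normalizer may differ), can be sketched as:

```python
import numpy as np

def squared_difference(x, t0, H, tau_max):
    # d_t(tau) over a processing window of length H starting at sample t0.
    return np.array([np.sum((x[t0:t0 + H] - x[t0 + tau:t0 + tau + H]) ** 2)
                     for tau in range(tau_max + 1)])

def cmnd(d):
    # Cumulative-mean normalization: dn(0) = 1,
    # dn(tau) = d(tau) / ((1/tau) * sum_{j=1..tau} d(j)).
    dn = np.ones(len(d))
    csum = np.cumsum(d[1:])
    for tau in range(1, len(d)):
        dn[tau] = d[tau] * tau / csum[tau - 1] if csum[tau - 1] > 0 else 1.0
    return dn

fs = 8000
t = np.arange(0, 0.2, 1 / fs)
d = squared_difference(np.sin(2 * np.pi * 200 * t), 0, 256, 120)
dn = cmnd(d)
tau_star = 20 + int(np.argmin(dn[20:]))
```

The normalization removes the systematic decrease of dt(τ) at small lags, so the minimum can be compared with a threshold independent of τ.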
  • with a fixed threshold, the threshold would be the same in all situations, without changing according to the noise level. This can lead to cuts at the beginning of sentences, or even to non-detection of the voice when the signal to be detected is a voice, especially in a context where the noise is a diffuse audience noise that does not resemble a speech signal at all.
  • the present invention aims to provide a voice detection method that offers reliable detection of the speech signals contained in a noisy acoustic signal, particularly in noisy or very noisy environments.
  • a voice detection method for detecting the presence of speech signals in a noisy acoustic signal x (t) from a microphone, comprising the following successive steps:
  • a prior sampling step comprising a division of the acoustic signal x(t) into a discrete acoustic signal {xi} composed of a sequence of vectors associated with time frames i of length N, N corresponding to the number of sampling points, where each vector translates the acoustic content of the associated frame i and is composed of the N samples x(i−1)N+1, x(i−1)N+2, …, xiN−1, xiN, i being a positive integer;
  • this step of calculating the detection function FD(τ) consists of a calculation of a discrete detection function FDi(τ) associated with the frames i;
  • a threshold adaptation step in said current interval as a function of values calculated from the acoustic signal x(t) established in said current interval, and in particular maximum values of said acoustic signal x(t), where this step of adapting the threshold consists in, for each frame i, adapting a threshold Θi specific to the frame i as a function of reference values calculated from the values of the samples of the discrete acoustic signal {xi} in said frame i;
  • this step of finding the minimum of the detection function FD(τ) and comparing this minimum with a threshold is performed by searching, on each frame i, for the minimum rr(i) of the discrete detection function FDi(τ) and comparing this minimum rr(i) with the threshold Θi specific to the frame i;
  • the step of adapting the thresholds Θi for each frame i comprises the following steps:
  • the samples of the sub-frame of index j of the frame i are x(i−1)N+(j−1)L+1, x(i−1)N+(j−1)L+2, …, x(i−1)N+jL;
  • this method is based on the principle of an adaptive threshold, which will be relatively low during periods of noise or silence and relatively high during speech periods. As a result, false detections will be minimized and speech will be detected correctly with a minimum of cuts at the beginning and end of words.
  • the detection function FD(τ) corresponds to the difference function D(τ).
  • the detection function FD(τ) corresponds to the normalized difference function DN(τ) calculated from the difference function D(τ) as follows:
  • DN(τ) = (1/τ)·D(τ) if τ ≠ 0;
  • the calculation of the normalized difference function DN(τ) consists of a calculation of a discrete normalized difference function DNi(τ) associated with the frames i, where:
  • DNi(τ) = (1/τ)·Di(τ) if τ ≠ 0.
  • the discrete difference function Di(τ) relating to the frame i is calculated as follows:
  • the discrete difference function Di(τ) relative to the frame i is calculated as the sum of the difference functions ddp(τ) of the sub-frames of index p of the frame i, namely: Di(τ) = Σp=1..K ddp(τ).
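The per-frame summation over sub-frames can be sketched as follows (Python; letting each sub-frame read past its end into the following samples is a simplifying assumption on edge handling):

```python
import numpy as np

def frame_difference(frame, K, H, tau_max):
    # Di(tau) as the sum over the K sub-frames (index p, length H) of the
    # per-sub-frame difference functions dd_p(tau).  Each sub-frame reads
    # tau_max samples past its end, a simplifying assumption.
    Di = np.zeros(tau_max + 1)
    for p in range(K):
        sub = frame[p * H: p * H + H + tau_max]
        for tau in range(tau_max + 1):
            Di[tau] += np.sum(np.abs(sub[:H] - sub[tau:tau + H]))
    return Di

fs = 8000
n = 4 * 60 + 50                                # K*H samples plus lookahead
x = np.sin(2 * np.pi * 200 * np.arange(n) / fs)
Di = frame_difference(x, K=4, H=60, tau_max=50)
tau_star = 20 + int(np.argmin(Di[20:]))        # pitch period = 40 samples
```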
  • in step c) the following sub-steps are carried out on each frame i:
  • mi,j = λ·mi,j−1 + (1 − λ)·mddi,j, where λ is a predefined factor between 0 and 1;
  • the smoothed-envelope variation signals established in the sub-frames j are considered in order to make the decision (voice or no voice) on the entire frame i, making the detection of speech (or voice) reliable.
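The recursive smoothing of the sub-step above can be sketched as follows (λ = 0.9 and the zero initialization are illustrative choices, not from the patent):

```python
import numpy as np

def smoothed_envelope(mdd, lam=0.9):
    # Recursive smoothing m_j = lam * m_{j-1} + (1 - lam) * mdd_j of a
    # per-sub-frame measure; the initial value is taken as 0 here.
    m = np.empty(len(mdd))
    prev = 0.0
    for j, v in enumerate(mdd):
        prev = lam * prev + (1 - lam) * v
        m[j] = prev
    return m

# A step input rises gradually toward 1, showing the smoothing effect.
env = smoothed_envelope(np.array([0.0] * 25 + [1.0] * 25), lam=0.9)
```

A λ close to 1 gives a slowly varying envelope, so abrupt onsets of speech stand out as large variations against it.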
  • in step c), following sub-step c2), the following sub-steps are performed on each frame i:
  • the variation maxima si,j are calculated in each sub-frame of index j of the frame i, where si,j corresponds to the maximum of the variation signal calculated on a sliding window of length Lm prior to said sub-frame j, said length Lm being variable according to whether the sub-frame j of the frame i corresponds to a period of silence or of speech presence;
  • the variation differences δi,j are deduced from the variation signals Δi,j and the variation maxima si,j;
  • the variation signals Δi,j and the variation differences δi,j established in the sub-frames j are considered together to choose the value of the adaptive threshold Θi and thus make the decision (voice or absence of voice) on the entire frame i, reinforcing the detection of speech.
  • the pair (Δ'i,j, δ'i,j) is studied to determine the value of the adaptive threshold Θi.
  • the normalized variation signals Δ'i,j and the normalized variation differences δ'i,j each constitute a main reference value Refi,j, so that, in step d), the value of the threshold Θi adapted to the frame i is set according to the pair (Δ'i,j, δ'i,j) of normalized variation signals and normalized variation differences in the sub-frames j of the frame i.
  • the thresholds Θi chosen from these normalized signals Δ'i,j and δ'i,j will be independent of the level of the discrete acoustic signal {xi}.
  • the value of the threshold Θi is established by partitioning the space defined by the values of the pair (Δ'i, δ'i), and by examining the value of the pair (Δ'i, δ'i) over one or more (e.g. one to three) successive sub-frames according to the zone in which the value of the pair falls.
  • the Θi threshold calculation procedure is based on an experimental partitioning of the space defined by the values of the pair (Δ'i, δ'i).
  • a decision mechanism scrutinizes the value of the pair (Δ'i, δ'i) over one, two or more successive sub-frames according to the zone of the value of the pair.
  • the conditions of the positioning tests on the value of the pair (Δ'i, δ'i) depend mainly on the speech detection during the previous frame, and the scanning mechanism over one, two or more successive sub-frames also uses an experimental partitioning.
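A zone-based threshold selection of this kind can be sketched as follows; the zone boundaries and the candidate threshold values below are illustrative placeholders (the patent determines them experimentally), not the patent's actual partition:

```python
def pick_threshold(delta_n, d_n):
    # Select the adaptive threshold from the zone of the (delta', d') plane.
    # Zone boundaries and candidate values are illustrative placeholders.
    Qa, Qb, Qc, Qd = 0.1, 0.25, 0.45, 0.7
    if delta_n > 0.8 and d_n > 0.5:   # strong, fast-varying: likely speech
        return Qd
    if delta_n > 0.8:
        return Qc
    if d_n > 0.5:
        return Qb
    return Qa                         # quiet zone: keep the threshold low
```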
  • the length Lm of the sliding window corresponds to the following equations:
  • - Lm = L0 = k0·L if the sub-frame j of the frame i corresponds to a period of silence, and Lm = k1·L, with k1 < k0, if it corresponds to a period of speech presence;
  • the sliding window of length Lm is delayed by Mm frames of length N with respect to said sub-frame j.
  • each maximum normalized variation is calculated according to a minimization method comprising the following iterative steps:
  • in step c4) the normalized variation differences δ'i,j in each sub-frame of index j of the frame i are calculated as follows:
  • a sub-step c6) is performed in which the maxima of maxima qi,j are calculated in each sub-frame of index j of the frame i, where qi,j corresponds to the maximum of the maximum values msi,j calculated on a sliding window of fixed length Lq prior to said sub-frame j, where the sliding window of length Lq is delayed by Mq frames of length N with respect to said sub-frame j, and where another, so-called secondary, reference value MRefi,j per sub-frame j corresponds to said maximum of maxima qi,j in the sub-frame j of the frame i.
  • the ⁇ threshold fit for frame i is divided into several sub-thresholds ⁇ ,,, specific to each sub-frame of frame i j, and the value each sub-threshold ⁇ ,, is at least determined on the basis of the refi j or reference values, MRefi j calculated in the subframe j of the corresponding frame i.
  • the value of each threshold ⁇ ⁇ specific to the sub-frame j of the frame i is determined by comparing the values of the pair (A'i j , 5 ', j ) with several pairs of fixed thresholds, the value of each threshold Q i j being selected from among several fixed values as a function of the comparisons of the torque (A ', j , 5', j ) with said pairs of fixed thresholds.
  • pairs of fixed thresholds are for example determined experimentally by a distribution of the space of the values (A ', j , 5', j ) into decision zones. Complementarily, it sets the value of each Qi j threshold specific to the sub-frame to frame j i also performing a comparison of the torque (A'i, ⁇ ',,,) on one or more successive sub-frames according to the initial zone of the torque ( ⁇ ', ⁇ ,,',,).
  • the decision mechanism based on comparing the torque (A'i, O'I) with pairs of fixed thresholds is completed through another decision mechanism based on the comparison of q, with other fixed thresholds .
  • in step d) a so-called decision procedure is carried out comprising the following sub-steps, for each frame i:
  • a decision index DECi(j) is established which occupies either a state "1" of detection of a speech signal or a state "0" of non-detection of a speech signal;
  • a temporary decision VAD(i) is established based on the combination of the decision indices DECi(j) with logical "OR" operators, so that the temporary decision VAD(i) occupies a state "1" of detection of a speech signal if at least one of said decision indices DECi(j) occupies this state "1" of detection of a speech signal.
  • the final decision (voice or no voice) is taken following this decision procedure, based on the temporary decision VAD(i), which is itself taken on the entire frame i by implementing a logical "OR" operator on the decisions taken in the sub-frames j, and preferably in successive sub-frames j over a short and finite horizon from the beginning of the frame i.
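The OR-combination of the sub-frame decision indices described above reduces to a one-line check:

```python
def frame_decision(dec):
    # VAD(i) = DEC_i(1) OR DEC_i(2) OR ...: the frame is declared to contain
    # speech as soon as one sub-frame decision index is in state "1".
    return 1 if any(d == 1 for d in dec) else 0
```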
  • the following sub-steps can be performed for each frame i:
  • a maximum threshold value Lastmax is stored which corresponds to the variable value of a comparison threshold for the amplitude of the discrete acoustic signal {xi} below which it is considered that the acoustic signal does not comprise a speech signal, this variable value being determined during the last frame of index k preceding said frame i and in which the temporary decision VAD(k) had a state "1" of detection of a speech signal;
  • an average maximum value Āi,j is stored which corresponds to the average maximum value of the discrete acoustic signal {xi} in the sub-frame j of the frame i, calculated as follows:
  • ai,j corresponds to the maximum of the discrete acoustic signal {xi} contained in a frame k formed by the sub-frame j of the frame i and by at least one or more successive sub-frames that precede said sub-frame j;
  • a predefined coefficient between 0 and 1 is used in this calculation;
  • each sub-threshold Θi,j is established as a function of the comparison between said maximum threshold value Lastmax and the average maximum values Āi,j and Āi,j−1 considered on two successive sub-frames j and j−1.
  • this decision procedure aims to further eliminate bad detections by storing the maximum threshold value Lastmax of the speech signal updated in the last activation period, and the average maximum values Āi,j and Āi,j−1, which correspond to the average maximum value of the discrete acoustic signal {xi} in the sub-frames j and j−1 of the frame i. Taking these values (Lastmax, Āi,j and Āi,j−1) into account adds a condition at the level of the establishment of the adaptive threshold Θi.
  • the maximum threshold value Lastmax is updated each time the process has considered that a sub-frame p of a frame k contains a speech signal, by implementing the following procedure:
  • Lastmax takes the updated value [a·(Āk,p + Lastmax)], where a is a predefined coefficient between 0 and 1, for example between 0.2 and 0.7;
  • Lastmax takes the updated value Āk,p if Āk,p > Lastmax.
  • the Lastmax value is updated only during the activation periods of the process (i.e. the voice detection periods). In a speech detection situation, the value Lastmax will take the value Āk,p when Āk,p > Lastmax. However, it is important that this update is done as follows when activating the first sub-frame p following a zone of silence: the value Lastmax will take the value [a·(Āk,p + Lastmax)].
  • this mechanism of updating the maximum threshold value Lastmax allows the process to detect the user's voice even if the user has reduced the intensity of his voice (i.e. is speaking less loudly) compared to the last time the process detected that he had spoken.
  • the maximum threshold value Lastmax is variable and is compared with the average maximum values Āi,j and Āi,j−1 of the discrete acoustic signal.
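The Lastmax update rule described above can be sketched directly (a = 0.5 is one value in the stated 0.2–0.7 range; the argument names are illustrative):

```python
def update_lastmax(lastmax, A_kp, first_after_silence, a=0.5):
    # On the first active sub-frame after a silence zone,
    # Lastmax <- a * (A_kp + Lastmax); otherwise Lastmax is raised to A_kp
    # whenever A_kp exceeds it, and kept unchanged otherwise.
    if first_after_silence:
        return a * (A_kp + lastmax)
    return A_kp if A_kp > lastmax else lastmax
```

The averaging branch pulls Lastmax down toward the new level when speech resumes more quietly, which is what allows a softer voice to still be detected.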
  • Kp is a fixed weighting coefficient between 1 and 2.
  • the method further comprises a so-called blocking phase comprising a step of switching from a state of non-detection of a speech signal to a state of detection of a speech signal after having detected the presence of a speech signal on NP successive time frames i.
  • the method implements a hangover-type step configured such that the transition from a situation without voice to a situation with presence of voice is made only after NP successive frames with presence of voice.
  • the method further comprises a so-called blocking phase comprising a step of switching from a state of detection of a speech signal to a state of non-detection of a speech signal after having detected no presence of a voiced speech signal on NA successive time frames i.
  • the method implements a hangover-type step configured in such a way that the transition from a situation with presence of voice to a situation without voice is made only after NA successive frames without voice.
  • without such a phase, the process might punctually cut the acoustic signal during sentences, or even in the middle of spoken words.
  • these switching steps implement a blocking or hangover step over a given series of frames.
  • the method comprises a step of interrupting the blocking phase in decision zones occurring at the end of words and in a non-noisy situation, said decision zones being detected by analyzing the minimum rr(i) of the discrete detection function FDi(τ).
  • the blocking phase is interrupted at the end of a sentence or word upon a particular detection in the decision space. This interruption occurs only in a situation with little or no noise.
  • the method provides for isolating a particular decision zone that occurs only at the end of words and in a non-noisy situation.
  • the method also uses the minimum rr(i) of the discrete detection function FDi(τ), where the discrete detection function FDi(τ) corresponds either to the discrete difference function Di(τ) or to the discrete normalized difference function DNi(τ).
  • the voice will thus be cut more quickly at the end of speech, giving the system better audio quality.
  • the invention also relates to a computer program comprising code instructions able to control the execution of the steps of the voice detection method as defined above when executed by a processor.
  • the invention further relates to a data recording medium on which is stored a computer program as defined above.
  • another object of the invention is to make a computer program as defined above available on a telecommunication network with a view to its download.
  • FIG. 1 is a block diagram of the method according to the invention.
  • FIG. 2 is a schematic view of a limiting loop implemented by a so-called hangover-type decision-blocking step;
  • FIG. 3 illustrates the result of a voice detection method using a fixed threshold with, at the top, a representation of the curve of the minimum rr(i) of the detection function and of the fixed threshold line Qfix and, at the bottom, a representation of the discrete acoustic signal {xi} and of the output signal DFi;
  • FIG. 4 illustrates the result of a voice detection method according to the invention using an adaptive threshold with, at the top, a representation of the curve of the minimum rr(i) of the detection function and the adaptive threshold line Θi and, at the bottom, a representation of the discrete acoustic signal {xi} and of the output signal DFi.
  • FIG. 1 schematically illustrates the succession of the different steps necessary for detecting the presence of speech (or voice) signals in a noisy acoustic signal x (t) originating from a single microphone operating in a noisy environment.
  • the method starts with a preliminary sampling step 101 comprising a division of the acoustic signal x(t) into a discrete acoustic signal {xi} composed of a sequence of vectors associated with time frames of length N, N corresponding to the number of sampling points, where each vector translates the acoustic content of the associated frame i and is composed of the N samples x(i−1)N+1, x(i−1)N+2, …, xiN−1, xiN, i being a positive integer:
  • the noisy acoustic signal x(t) is divided into frames of 240 or 256 samples, which at a sampling frequency Fe of 8 kHz corresponds to time frames of 30 or 32 milliseconds.
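This framing step can be sketched as follows (a minimal sketch; dropping the incomplete tail frame is a simplifying assumption):

```python
import numpy as np

def frame_signal(x, N):
    # Split the sampled signal into non-overlapping frames of N samples,
    # dropping any incomplete tail frame.
    n_frames = len(x) // N
    return x[:n_frames * N].reshape(n_frames, N)

fs = 8000
x = np.random.randn(fs)          # one second of signal at Fe = 8 kHz
frames = frame_signal(x, 240)    # 240 samples per frame = 30 ms
```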
  • the method continues with a step 102 of calculating a discrete difference function Di(τ) relative to the frame i, calculated as follows:
  • each frame i is subdivided into K sub-frames of length H, with the following relation: N = K·H;
  • the samples of the discrete acoustic signal {xi} in a sub-frame of index p of the frame i comprise the following H samples:
  • the discrete difference function Di(τ) relative to the frame i is calculated as the sum of the difference functions ddp(τ) of the sub-frames of index p of the frame i, namely: Di(τ) = Σp=1..K ddp(τ);
  • step 102 also includes calculating a discrete normalized difference function DNi(τ) from the discrete difference function Di(τ), as follows:
  • DNi(τ) = (1/τ)·Di(τ) if τ ≠ 0.
  • the method continues with a step 103 in which, for each frame i:
  • the samples of the sub-frame of index j of the frame i are x(i−1)N+(j−1)L+1, x(i−1)N+(j−1)L+2, …, x(i−1)N+jL;
  • mi,j = λ·mi,j−1 + (1 − λ)·mddi,j, where λ is a predefined coefficient between 0 and 1.
  • the variation signals Δi,j are calculated in each sub-frame of index j of the frame i, defined by:
  • the variation maxima si,j are calculated in each sub-frame of index j of the frame i, where si,j corresponds to the maximum of the variation signal calculated on a sliding window of length Lm prior to said sub-frame j.
  • the length Lm is variable according to whether the sub-frame j of the frame i corresponds to a period of silence or of speech presence, with: Lm = k0·L in a period of silence and Lm = k1·L in a period of speech presence;
  • L being the length of the sub-frames of index j, and k0 and k1 being positive integers with k1 < k0.
  • the sliding window of length Lm is delayed by Mm frames of length N with respect to said sub-frame j.
  • the normalized variation maxima are also calculated in each sub-frame of index j of the frame i, where:
  • in a step 109 the maxima of maxima qi,j are calculated in each sub-frame of index j of the frame i, where qi,j corresponds to the maximum of the maximum value msi,j calculated on a sliding window of fixed length Lq prior to said sub-frame j, where the sliding window of length Lq is delayed by Mq frames of length N with respect to said sub-frame j.
  • Lq > L0, and in particular Lq = kq·L with kq a positive integer and kq > k0.
  • Mq > Mm.
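A delayed sliding-window maximum of the kind used for si,j and qi,j can be sketched as follows (the exact indexing convention is one plausible reading of the description, not taken verbatim from the patent):

```python
import numpy as np

def window_max(values, j, Lm, Mm=0):
    # Maximum over a sliding window of length Lm that ends Mm entries before
    # index j, i.e. a window "prior to" sub-frame j, delayed by Mm.
    end = max(j - Mm, 0)
    start = max(end - Lm, 0)
    w = values[start:end]
    return float(np.max(w)) if len(w) else 0.0

v = np.array([0.1, 0.5, 0.2, 0.9, 0.3, 0.4])
m = window_max(v, j=5, Lm=3, Mm=1)   # window covers v[1:4]
```

Passing a larger Lm in silence and a smaller Lm during speech reproduces the variable-length behavior described above.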
  • each threshold Θi or sub-threshold Θi,j takes a fixed value chosen from six fixed values Qa, Qb, Qc, Qd, Qe, Qf, these fixed values being for example between 0.05 and 1, and in particular between 0.1 and 0.7.
  • each threshold Θi or sub-threshold Θi,j is fixed at one of the fixed values Qa, Qb, Qc, Qd, Qe, Qf by the implementation of two analyses; first analysis: the comparison of the values of the pair (Δ'i,j, δ'i,j) in the sub-frame of index j of the frame i with several pairs of fixed thresholds;
  • This decision procedure includes the following substeps, for each frame i:
  • a decision index DECi (j) is established which occupies either a state "1" of detection of a speech signal or a state "0" of non-detection of a speech signal;
  • a temporary decision VAD(i) is established based on the combination of the decision indices DECi(j) with logical "OR" operators, so that the temporary decision VAD(i) occupies a state "1" of detection of a speech signal if at least one of said decision indices DECi(j) occupies this state "1" of detection of a speech signal; in other words, the following relation holds:
  • VAD(i) = DECi(1) + DECi(2) + ... + DECi(T), where "+" is the "OR" operator.
  • the threshold ⁇ is fixed at one of the fixed values Qa, Qb, Qc, Qd , Qe, Qf and we deduce the final decision by comparing the minimum rr (i) with the threshold ⁇ , fixed at one of its fixed values (see description below).
  • the false detections arrive with an amplitude lower than that of the speech signal, the microphone being located next to the mouth of the user.
  • the maximum threshold value Lastmax is deduced from the speech signal in the last activation period of the "VAD", and a condition based on this maximum threshold value Lastmax is added in the process.
  • step 109 also comprises the storage of the maximum threshold value Lastmax, which corresponds to the variable (or updated) value of a comparison threshold for the amplitude of the discrete acoustic signal {xi} below which it is considered that the acoustic signal does not comprise a speech signal, this variable value being determined during the last frame of index k which precedes said frame i and in which the temporary decision VAD(k) had a state "1" of detection of a speech signal.
  • an average maximum value is also stored which corresponds to the average maximum value of the discrete acoustic signal {xi} in the sub-frame j of the frame i, calculated as follows:
  • ai,j is the maximum of the discrete acoustic signal {xi} contained in the theoretical frame k formed by the sub-frame j of the frame i and by at least one or more successive sub-frames that precede said sub-frame j; and a predefined coefficient between 0 and 1 is used in this calculation.
  • Lastmax takes the updated value [a·(Āk,p + Lastmax)], where a is a predefined coefficient between 0 and 1, for example between 0.2 and 0.7;
  • Lastmax takes the updated value Āk,p if Āk,p > Lastmax.
  • this condition is based on the comparison between:
  • Kp is a fixed weighting coefficient between 1 and 2.
  • in a step 111, for each current frame i, the minimum rr(i) of a discrete detection function FDi(τ) is calculated, in which the discrete detection function FDi(τ) corresponds either to the discrete difference function Di(τ) or to the discrete normalized difference function DNi(τ).
  • this minimum rr(i) is compared with the threshold Θi specific to the frame i, to detect the presence or absence of a speech signal (voiced signal or not), with:
  • if rr(i) < Θi, the frame i is considered as presenting a speech signal and the method delivers an output signal DFi taking the value "1" (in other words, the final decision for the frame i is "presence of voice in the frame i");
  • otherwise, the frame i is considered as not presenting a speech signal and the method delivers an output signal DFi taking the value "0" (in other words, the final decision for the frame i is "absence of voice in the frame i").
  • a decision-blocking stage 113 reinforces the decision of presence/absence of voice by implementing the following two steps:
  • this blocking step 113 makes it possible to output a voice detection decision signal Dv which takes the value "1", corresponding to a decision of detection of the voice, and the value "0", corresponding to a decision of non-detection of the voice, where:
  • the voice detection decision signal Dv switches from a state "1" to a state "0" if and only if the output signal DFi takes the value "0" over NA successive time frames i; and the voice detection decision signal Dv switches from a state "0" to a state "1" if and only if the output signal DFi takes the value "1" over NP successive time frames i.
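The two switching rules above form a small state machine, which can be sketched as follows (n_p = 3 and n_a = 5 are illustrative values for NP and NA):

```python
def apply_hangover(df, n_p=3, n_a=5):
    # Decision blocking: the output state switches 0 -> 1 only after n_p
    # consecutive frames with DF = 1, and 1 -> 0 only after n_a consecutive
    # frames with DF = 0.
    state, run = 0, 0
    out = []
    for d in df:
        if d != state:
            run += 1                          # count frames contradicting state
            if run >= (n_p if state == 0 else n_a):
                state, run = d, 0             # enough evidence: switch state
        else:
            run = 0                           # evidence interrupted: reset
        out.append(state)
    return out
```

A brief burst of 1s shorter than n_p is suppressed, and short pauses inside a sentence shorter than n_a do not cut the voice.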
  • the remainder of the description relates to two voice detection results obtained with a conventional method using a fixed threshold (FIG. 3) and with the method according to the invention using an adaptive threshold (FIG. 4).
  • in FIG. 3 the minimum function rr(i) is compared to a fixed threshold Qfix optimally selected for voice detection.
  • in Figure 3 (bottom) we note the form of the output signal DFi, which occupies a state "1" if rr(i) < Qfix and a state "0" if rr(i) > Qfix.
  • in FIG. 4 the minimum function rr(i) is compared with an adaptive threshold Θi calculated according to the steps previously described with reference to Figure 1.
  • in Figure 4 (bottom) we note the shape of the output signal DFi, which occupies a state "1" if rr(i) < Θi and a state "0" if rr(i) > Θi.
  • with the conventional method, the voice is detected in the speech presence zone "PAR" with the output signal DFi occupying a state "1", but this same output signal DFi also occupies a state "1" several times in other areas where speech is nevertheless absent, which corresponds to unwanted false detections with the conventional method.
  • the method in accordance with the invention allows optimal detection of the voice in the speech presence zone "PAR", with the output signal DFi occupying a state "1", while this same output signal DFi occupies a state "0" in the other areas where speech is absent.
  • the method according to the invention provides voice detection with a sharp reduction in the number of false detections.
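The thresholding and blocking steps described above can be sketched as follows. This is an illustrative reconstruction, not the patented implementation: the function names and the list-based signal representation are our own, while rr(i), Q, DF, D v, N A and N P follow the description.

```python
def frame_detection(rr, q):
    """Frame-wise detection signal DF: 1 when rr(i) <= Q(i), else 0.

    `rr` is the minimum function per time frame; `q` is the
    (fixed or adaptive) threshold per time frame.
    """
    return [1 if r <= t else 0 for r, t in zip(rr, q)]


def voice_decision(df, n_a, n_p):
    """Blocking step: turn DF into the decision signal D_v.

    D_v switches from "1" to "0" only after N_A successive frames
    with DF = 0, and from "0" to "1" only after N_P successive
    frames with DF = 1.
    """
    d_v = []
    state = 0   # current decision: 0 = voice absent, 1 = voice present
    run = 0     # length of the current run of frames contradicting `state`
    for value in df:
        if value != state:
            run += 1
            # N_P frames of "1" are needed to switch up,
            # N_A frames of "0" to switch down
            if run >= (n_p if state == 0 else n_a):
                state = value
                run = 0
        else:
            run = 0
        d_v.append(state)
    return d_v
```

With, say, N P = 3 and N A = 2, isolated DF = 1 frames (false detections) never drive D v to state "1", while a run of three successive detections does, which is the false-detection suppression the blocking step provides.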

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Mobile Radio Communication Systems (AREA)
EP14814978.4A 2013-12-02 2014-11-27 Stimmendetektionsverfahren Active EP3078027B1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR1361922A FR3014237B1 (fr) 2013-12-02 2013-12-02 Procede de detection de la voix
PCT/FR2014/053065 WO2015082807A1 (fr) 2013-12-02 2014-11-27 Procédé de détection de la voix

Publications (2)

Publication Number Publication Date
EP3078027A1 true EP3078027A1 (de) 2016-10-12
EP3078027B1 EP3078027B1 (de) 2018-05-23

Family

ID=50482942

Family Applications (1)

Application Number Title Priority Date Filing Date
EP14814978.4A Active EP3078027B1 (de) 2013-12-02 2014-11-27 Stimmendetektionsverfahren

Country Status (7)

Country Link
US (1) US9905250B2 (de)
EP (1) EP3078027B1 (de)
CN (1) CN105900172A (de)
CA (1) CA2932449A1 (de)
ES (1) ES2684604T3 (de)
FR (1) FR3014237B1 (de)
WO (1) WO2015082807A1 (de)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3014237B1 (fr) * 2013-12-02 2016-01-08 Adeunis R F Procede de detection de la voix
US10621980B2 (en) * 2017-03-21 2020-04-14 Harman International Industries, Inc. Execution of voice commands in a multi-device system
CN107248046A (zh) * 2017-08-01 2017-10-13 中州大学 一种思想政治课课堂教学质量评价装置及方法
JP6904198B2 (ja) * 2017-09-25 2021-07-14 富士通株式会社 音声処理プログラム、音声処理方法および音声処理装置
CN111161749B (zh) * 2019-12-26 2023-05-23 佳禾智能科技股份有限公司 可变帧长的拾音方法、电子设备、计算机可读存储介质
CN111261197B (zh) * 2020-01-13 2022-11-25 中航华东光电(上海)有限公司 一种复杂噪声场景下的实时语音段落追踪方法

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2825505B1 (fr) 2001-06-01 2003-09-05 France Telecom Procede d'extraction de la frequence fondamentale d'un signal sonore au moyen d'un dispositif mettant en oeuvre un algorithme d'autocorrelation
FR2899372B1 (fr) 2006-04-03 2008-07-18 Adeunis Rf Sa Systeme de communication audio sans fil
KR100930584B1 (ko) * 2007-09-19 2009-12-09 한국전자통신연구원 인간 음성의 유성음 특징을 이용한 음성 판별 방법 및 장치
WO2010070840A1 (ja) * 2008-12-17 2010-06-24 日本電気株式会社 音声検出装置、音声検出プログラムおよびパラメータ調整方法
FR2947122B1 (fr) 2009-06-23 2011-07-22 Adeunis Rf Dispositif d'amelioration de l'intelligibilite de la parole dans un systeme de communication multi utilisateurs
FR2947124B1 (fr) 2009-06-23 2012-01-27 Adeunis Rf Procede de communication par multiplexage temporel
US8949118B2 (en) * 2012-03-19 2015-02-03 Vocalzoom Systems Ltd. System and method for robust estimation and tracking the fundamental frequency of pseudo periodic signals in the presence of noise
FR2988894B1 (fr) * 2012-03-30 2014-03-21 Adeunis R F Procede de detection de la voix
FR3014237B1 (fr) * 2013-12-02 2016-01-08 Adeunis R F Procede de detection de la voix

Also Published As

Publication number Publication date
US20160284364A1 (en) 2016-09-29
US9905250B2 (en) 2018-02-27
EP3078027B1 (de) 2018-05-23
FR3014237B1 (fr) 2016-01-08
ES2684604T3 (es) 2018-10-03
FR3014237A1 (fr) 2015-06-05
CA2932449A1 (fr) 2015-06-11
WO2015082807A1 (fr) 2015-06-11
CN105900172A (zh) 2016-08-24

Similar Documents

Publication Publication Date Title
EP3078027B1 (de) Stimmendetektionsverfahren
JP6694426B2 (ja) ランニング範囲正規化を利用したニューラルネットワーク音声活動検出
US10412488B2 (en) Microphone array signal processing system
KR100636317B1 (ko) 분산 음성 인식 시스템 및 그 방법
EP1320087B1 (de) Synthese eines Anregungssignales zur Verwendung in einem Generator von Komfortrauschen
US20020165713A1 (en) Detection of sound activity
EP2772916B1 (de) Verfahren zur Geräuschdämpfung eines Audiosignals mit Hilfe eines Algorithmus mit variabler Spektralverstärkung mit dynamisch modulierbarer Härte
EP0867856A1 (de) Verfahren und Vorrichtung zur Sprachdetektion
US9467790B2 (en) Reverberation estimator
CN110827795A (zh) 语音输入结束判断方法、装置、设备、系统以及存储介质
JP2002508891A (ja) 特に補聴器における雑音を低減する装置および方法
US20110238417A1 (en) Speech detection apparatus
JP2008058983A (ja) 音声コーディングにおける雑音のロバストな分類のための方法
US8694311B2 (en) Method for processing noisy speech signal, apparatus for same and computer-readable recording medium
EP3807878B1 (de) Auf tiefem neuronalem netz basierte sprachverbesserung
KR101317813B1 (ko) 노이지 음성 신호의 처리 방법과 이를 위한 장치 및 컴퓨터판독 가능한 기록매체
EP1451548A2 (de) Einrichtung zur sprachdetektion in einem audiosignal bei lauter umgebung
US20080215318A1 (en) Event recognition
CN113192535B (zh) 一种语音关键词检索方法、系统和电子装置
KR20090104558A (ko) 노이지 음성 신호의 처리 방법과 이를 위한 장치 및 컴퓨터판독 가능한 기록매체
EP3192073B1 (de) Unterscheidung und dämpfung von vorechos in einem digitalen audiosignal
EP3627510A1 (de) Filterung eines tonsignals, das durch ein stimmerkennungssystem erfasst wurde
Martin et al. Robust speech/non-speech detection based on LDA-derived parameter and voicing parameter for speech recognition in noisy environments
Chelloug et al. An efficient VAD algorithm based on constant False Acceptance rate for highly noisy environments
US20240046927A1 (en) Methods and systems for voice control

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20160628

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAX Request for extension of the european patent (deleted)
17Q First examination report despatched

Effective date: 20170627

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 25/78 20130101ALN20171124BHEP

Ipc: G10L 25/84 20130101AFI20171124BHEP

INTG Intention to grant announced

Effective date: 20171221

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: LANGUAGE OF EP DOCUMENT: FRENCH

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602014025940

Country of ref document: DE

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 1002143

Country of ref document: AT

Kind code of ref document: T

Effective date: 20180615

REG Reference to a national code

Ref country code: SE

Ref legal event code: TRGR

REG Reference to a national code

Ref country code: CH

Ref legal event code: NV

Representative=s name: CABINET GERMAIN AND MAUREAU, CH

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20180523

REG Reference to a national code

Ref country code: ES

Ref legal event code: FG2A

Ref document number: 2684604

Country of ref document: ES

Kind code of ref document: T3

Effective date: 20181003

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 5

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180823

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180823

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180824

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 1002143

Country of ref document: AT

Kind code of ref document: T

Effective date: 20180523

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602014025940

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20190226

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181127

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20181130

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181130

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180523

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO

Effective date: 20141127

Ref country code: MK

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180523

REG Reference to a national code

Ref country code: ES

Ref legal event code: PC2A

Owner name: VOGO

Effective date: 20200629

Ref country code: DE

Ref legal event code: R082

Ref document number: 602014025940

Country of ref document: DE

Representative=s name: HOEFER & PARTNER PATENTANWAELTE MBB, DE

Ref country code: DE

Ref legal event code: R081

Ref document number: 602014025940

Country of ref document: DE

Owner name: VOGO, FR

Free format text: FORMER OWNER: ADEUNIS RF, CROLLES, FR

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180923

REG Reference to a national code

Ref country code: CH

Ref legal event code: PUE

Owner name: VOGO, FR

Free format text: FORMER OWNER: ADEUNIS RF, FR

REG Reference to a national code

Ref country code: GB

Ref legal event code: 732E

Free format text: REGISTERED BETWEEN 20200723 AND 20200729

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20230914

Year of fee payment: 10

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20230816

Year of fee payment: 10

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: ES

Payment date: 20231218

Year of fee payment: 10

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: SE

Payment date: 20231117

Year of fee payment: 10

Ref country code: IE

Payment date: 20231019

Year of fee payment: 10

Ref country code: DE

Payment date: 20230915

Year of fee payment: 10

Ref country code: CH

Payment date: 20231201

Year of fee payment: 10