CN108039182B - Voice activation detection method - Google Patents

Publication number
CN108039182B
Authority
CN
China
Prior art keywords
frame
audio
noise
sample
audio sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711407711.0A
Other languages
Chinese (zh)
Other versions
CN108039182A (en)
Inventor
张亦希
陈晨
王陈春
王业芳
常浩宇
王蕴
舒敏
王琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Fenghuo Communication Group Co Ltd
Original Assignee
Shaanxi Fenghuo Communication Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-12-22
Filing date
2017-12-22
Publication date
2021-10-08
Application filed by Shaanxi Fenghuo Communication Group Co Ltd
Priority to CN201711407711.0A
Publication of CN108039182A
Application granted
Publication of CN108039182B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision

Abstract

The invention belongs to the technical field of voice signal processing and discloses a voice activation detection method. The method performs voice activation detection by exploiting the fact that a voice signal has strong autocorrelation while noise has weak autocorrelation. It achieves low missed-detection and false-detection probabilities in a strong-noise environment, has low computational complexity, and is easy to implement on various embedded platforms.

Description

Voice activation detection method
Technical Field
The invention belongs to the technical field of voice signal processing, and particularly relates to a voice activation detection method.
Background
A radio station can generally perform only half-duplex voice communication, whereas the voice signal from an IP network is generally full-duplex. A radio IP gateway must therefore convert between full duplex and half duplex. When the audio signal from the IP network contains no voice and only noise, the radio station is placed in the receiving state and the audio it receives is sent to the IP network. When the audio signal from the IP network contains voice, the radio station is placed in the sending state and transmits that voice signal.
The radio IP gateway therefore needs a voice activation detection algorithm to detect whether the audio signal from the IP network contains voice. The requirements on this algorithm generally include: (1) Low complexity. The radio IP gateway usually runs on an embedded platform (such as one of the various ARM platforms) with a Linux operating system that handles the various protocols, so the algorithm must be simple enough to run on various embedded Linux platforms. (2) Strong noise immunity. Voice signals sent from different places over an IP network often contain noise of different amplitudes, so the algorithm must achieve low missed-detection and false-detection probabilities in a strong-noise environment.
Currently, the voice activation detection algorithm most widely used on embedded Linux platforms is the short-time energy and zero-crossing rate algorithm. It compares the calculated short-time energy and zero-crossing rate with preset thresholds: if both exceed their thresholds, the current frame is judged to be a voice frame; if both, or either one, fall below a second set of thresholds, the current frame is judged to be noise.
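For orientation, the baseline can be sketched in a few lines of Python. This is only an illustrative reading of the description above, not code from the patent: the function names, the threshold tuples, and the choice to return None for frames that match neither rule are assumptions, and the noise rule is interpreted as "either feature falls below its second threshold".

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def baseline_decision(frame, speech_thresholds, noise_thresholds):
    """Return "speech", "noise", or None (undecided).

    speech_thresholds and noise_thresholds are hypothetical
    (energy, zero_crossing_rate) pairs chosen by the caller.
    """
    energy = short_time_energy(frame)
    zcr = zero_crossing_rate(frame)
    # Speech: both features exceed the first set of thresholds.
    if energy > speech_thresholds[0] and zcr > speech_thresholds[1]:
        return "speech"
    # Noise: at least one feature falls below the second set (assumed reading).
    if energy < noise_thresholds[0] or zcr < noise_thresholds[1]:
        return "noise"
    return None
```

Both features depend directly on signal amplitude or sign changes, so additive noise shifts them for speech and noise frames alike.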
Disclosure of Invention
In view of the foregoing problems, an object of the present invention is to provide a voice activation detection method that achieves low missed-detection and false-detection probabilities in a strong-noise environment, has low computational complexity, and is easy to implement on various embedded platforms.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme.
A voice activity detection method, the voice activity detection method comprising:
Step 1: acquire an audio signal sample stream and divide it into consecutive frames of audio samples;
Step 2: set a voice threshold and a noise threshold, and calculate the autocorrelation degree of the i-th frame of audio samples, where 1 ≤ i ≤ M and M is the total number of audio sample frames contained in the audio signal sample stream;
Step 3: when the autocorrelation degree of the i-th frame of audio samples is greater than the voice threshold, judge the i-th frame to be a voice frame;
when the autocorrelation degree of the i-th frame of audio samples is smaller than the noise threshold, judge the i-th frame to be a noise frame;
otherwise, when i = 1, judge the 1st frame to be a noise frame;
and when i > 1, give the i-th frame the same judgment as the (i-1)-th frame.
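The framing of step 1 and the decision rule of step 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, frames are taken as non-overlapping with any trailing partial frame dropped (the text does not say how the stream is cut), and the per-frame autocorrelation degrees are supplied by the caller.

```python
def split_into_frames(samples, frame_len):
    """Step 1: cut the stream into consecutive, non-overlapping frames.

    Dropping a trailing partial frame is an assumption of this sketch.
    """
    count = len(samples) // frame_len
    return [samples[n * frame_len:(n + 1) * frame_len] for n in range(count)]


def classify_frames(degrees, voice_threshold, noise_threshold):
    """Step 3: map per-frame autocorrelation degrees to decisions.

    Returns a list of booleans, True for a voice frame, False for noise.
    """
    decisions = []
    for degree in degrees:
        if degree > voice_threshold:
            decisions.append(True)
        elif degree < noise_threshold:
            decisions.append(False)
        elif not decisions:
            # First frame with a value between the thresholds: noise.
            decisions.append(False)
        else:
            # Later frames between the thresholds keep the previous decision.
            decisions.append(decisions[-1])
    return decisions
```

Carrying the previous decision through the band between the two thresholds acts as hysteresis, so the output does not toggle on frames whose autocorrelation degree sits near a single cut-off.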
The technical scheme of the invention has the following features and further improvements:
(1) In step 2, the autocorrelation degree R_i of the i-th frame of audio samples is calculated as follows:
[Formula image BDA0001520670560000021 not reproduced]
where N represents the total number of sampling points contained in the i-th frame of audio samples, x_i(k) represents the k-th sampling point in the i-th frame, x_i(k+1) represents the (k+1)-th sampling point in the i-th frame, sgn(·) represents the sign function, and C represents a preset constant greater than zero.
(2) The 1st frame of audio samples is set as a noise frame, the noise energy E of the 1st frame is calculated, and the constant C is determined from the noise energy E:
[Formula image BDA0001520670560000031 not reproduced]
The method of the invention performs voice activation detection by exploiting the fact that a voice signal has strong autocorrelation while noise has weak autocorrelation. It achieves low missed-detection and false-detection probabilities in a strong-noise environment, has low computational complexity, and is easy to implement on various embedded platforms.
Drawings
To illustrate the embodiments of the present invention and the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a voice activation detection method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram comparing a probability distribution function with the standard normal distribution function according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of simulation results of the conventional method and the method of the present invention according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a voice activation detection method which, as shown in Fig. 1, includes:
Step 1: acquire an audio signal sample stream and divide it into consecutive frames of audio samples.
Step 2: set a voice threshold and a noise threshold, and calculate the autocorrelation degree of the i-th frame of audio samples, where 1 ≤ i ≤ M and M is the total number of audio sample frames contained in the audio signal sample stream.
In step 2, the autocorrelation degree R_i of the i-th frame of audio samples is calculated as follows:
[Formula image BDA0001520670560000041 not reproduced]
where N represents the total number of sampling points contained in the i-th frame of audio samples, x_i(k) represents the k-th sampling point in the i-th frame, x_i(k+1) represents the (k+1)-th sampling point in the i-th frame, sgn(·) represents the sign function, and C represents a preset constant greater than zero.
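Because the formula itself survives only as an image reference, the following Python sketch shows one plausible form built solely from the symbols defined above: a sum, over adjacent sample pairs, of the sign of the product x_i(k)·x_i(k+1) plus an offset. The exact expression, its normalization, and the sign with which C enters are assumptions, so the function takes a generic `offset` argument rather than C itself; both function names are hypothetical.

```python
def sgn(value):
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (value > 0) - (value < 0)


def autocorrelation_degree(frame, offset):
    """Assumed form: sum of sgn(x(k) * x(k+1) + offset) over adjacent pairs.

    `offset` stands in for the patent's constant C; pass a negative value
    to require each product to exceed a positive margin. This is a sketch,
    not the formula from the patent.
    """
    return sum(sgn(a * b + offset) for a, b in zip(frame, frame[1:]))
```

Under this assumed form, a frame of N samples costs N - 1 multiplications plus comparisons, and with integer PCM samples the whole computation stays in integer arithmetic.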
Step 3: when the autocorrelation degree of the i-th frame of audio samples is greater than the voice threshold, judge the i-th frame to be a voice frame;
when the autocorrelation degree of the i-th frame of audio samples is smaller than the noise threshold, judge the i-th frame to be a noise frame;
otherwise, when i = 1, judge the 1st frame to be a noise frame;
and when i > 1, give the i-th frame the same judgment as the (i-1)-th frame.
Further, the 1st frame of audio samples is set as a noise frame, the noise energy E of the 1st frame is calculated, and the constant C is determined from the noise energy E as follows:
[Formula image BDA0001520670560000042 not reproduced]
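The mapping from E to C is given only in the formula image, so the sketch below stops at what the text states and marks the rest as a placeholder. It assumes E is the mean squared amplitude of the first frame (a sum would serve equally well) and uses a hypothetical linear scale factor to turn E into a constant; neither the function names nor the linear mapping come from the patent.

```python
def noise_energy(first_frame):
    """Mean squared amplitude of the first frame, which is treated as noise.

    Using the mean rather than the sum is an assumption of this sketch.
    """
    if not first_frame:
        raise ValueError("first frame must contain at least one sample")
    return sum(s * s for s in first_frame) / len(first_frame)


def constant_from_noise_energy(energy, scale):
    """Hypothetical placeholder mapping: C = scale * E.

    The patent's actual mapping from E to C is not reproduced here.
    """
    return scale * energy
```

Whatever the exact mapping, tying the constant to the measured noise level lets the same thresholds be used as the noise amplitude changes.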
It should be noted that, when the audio samples contain no voice signal, X(k) can be assumed to be an additive white Gaussian noise signal that obeys the normal distribution
[Formula image BDA0001520670560000051 not reproduced]
Let:
Z=X(k)X(k+1) (1)
The probability distribution function of Z can be shown to be:
[Formula image BDA0001520670560000052 not reproduced]
A comparison of the probability distribution function f(z) with the standard normal distribution function is shown in Fig. 2. Then let:
U=Z+C (3)
The probability that U is greater than or equal to 0 can be expressed as:
[Formula image BDA0001520670560000053 not reproduced]
As can be seen from the shaded part of Fig. 2, a reasonable choice of C makes the probability P{U ≥ 0} decrease rapidly, whereas a voice signal usually has strong correlation, so the invention can significantly improve the noise immunity of the voice activation detection algorithm. When C = 0, P{U ≥ 0} is larger, so the short-time energy and zero-crossing rate algorithm cannot distinguish the voice signal from the noise signal in a strong-noise environment.
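The behaviour of Z for pure noise can be checked numerically. The sketch below is a Monte Carlo estimate, not the closed-form expression from the patent: it draws independent zero-mean Gaussian pairs, forms their product as in formula (1), and reports the fraction of products at or above a threshold. The function name, sample count, and seed are assumptions; since P{U ≥ 0} with U = Z + C equals P{Z ≥ -C}, the threshold argument corresponds to -C under the sign convention of formula (3).

```python
import random


def tail_fraction(threshold, sigma=1.0, pairs=100_000, seed=0):
    """Estimate P{X(k) * X(k+1) >= threshold} for white Gaussian noise.

    Adjacent white-noise samples are independent, so each pair is drawn
    independently from N(0, sigma^2).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(pairs):
        z = rng.gauss(0.0, sigma) * rng.gauss(0.0, sigma)
        if z >= threshold:
            hits += 1
    return hits / pairs


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0):
        print(f"threshold={t:.1f}  fraction={tail_fraction(t):.4f}")
```

At a threshold of zero about half of the noise products qualify, because the product of two independent zero-mean samples is symmetric about zero; the fraction falls as the threshold rises because the distribution of Z is concentrated near zero.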
Computer simulation results also demonstrate the effectiveness and superiority of the method. The original voice signal, the noisy voice signal, and the short-time energy, zero-crossing rate and autocorrelation degree of each frame are shown in Fig. 3, where Fig. 3(a) is the original voice time-domain signal, Fig. 3(b) is the noisy voice time-domain signal, Fig. 3(c) shows the detection result of the existing short-time energy and zero-crossing rate method, and Fig. 3(d) shows the detection result of the autocorrelation-degree-based method of the present invention. As can be seen from Fig. 3, at a signal-to-noise ratio of 2 dB it is already difficult to distinguish voice from noise using short-time energy and zero-crossing rate, but the two can still be distinguished effectively using the autocorrelation degree. The autocorrelation-degree-based voice activation detection algorithm therefore improves noise immunity at the cost of only N additional integer multiplications per frame, and can run on various embedded Linux platforms.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disk.
The above describes only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (2)

1. A voice activity detection method, characterized in that the voice activity detection method comprises:
step 1, acquiring an audio signal sample stream, and dividing the audio signal sample stream into consecutive frames of audio samples;
step 2, setting a voice threshold and a noise threshold, and calculating the autocorrelation degree of the i-th frame of audio samples, wherein 1 ≤ i ≤ M and M is the total number of audio sample frames contained in the audio signal sample stream;
the autocorrelation degree R_i of the i-th frame of audio samples being calculated as follows:
[Formula image FDA0003162208580000011 not reproduced]
wherein N represents the total number of sampling points contained in the i-th frame of audio samples, x_i(k) represents the k-th sampling point in the i-th frame of audio samples, x_i(k+1) represents the (k+1)-th sampling point in the i-th frame of audio samples, sgn(·) represents the sign function, and C represents a preset constant greater than zero;
step 3, when the autocorrelation degree of the i-th frame of audio samples is greater than the voice threshold, judging the i-th frame of audio samples to be a voice frame; when the autocorrelation degree of the i-th frame of audio samples is smaller than the noise threshold, judging the i-th frame of audio samples to be a noise frame;
otherwise,
when i = 1, judging the 1st frame of audio samples to be a noise frame; and when i > 1, giving the i-th frame of audio samples the same judgment as the (i-1)-th frame of audio samples.
2. The voice activity detection method as claimed in claim 1, characterized by
setting the 1st frame of audio samples as a noise frame, calculating the noise energy E of the 1st frame of audio samples, and determining the constant C from the noise energy E:
[Formula image FDA0003162208580000012 not reproduced]
CN201711407711.0A 2017-12-22 2017-12-22 Voice activation detection method Active CN108039182B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711407711.0A CN108039182B (en) 2017-12-22 2017-12-22 Voice activation detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711407711.0A CN108039182B (en) 2017-12-22 2017-12-22 Voice activation detection method

Publications (2)

Publication Number Publication Date
CN108039182A CN108039182A (en) 2018-05-15
CN108039182B CN108039182B (en) 2021-10-08

Family

ID=62100806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711407711.0A Active CN108039182B (en) 2017-12-22 2017-12-22 Voice activation detection method

Country Status (1)

Country Link
CN (1) CN108039182B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109785866A (en) * 2019-03-07 2019-05-21 上海电力学院 The method of broadcasting speech and noise measuring based on correlation function maximum value
CN111651135B (en) * 2020-04-27 2021-05-25 珠海格力电器股份有限公司 Sound awakening method and device, storage medium and electrical equipment
CN115699173A (en) * 2020-06-16 2023-02-03 华为技术有限公司 Voice activity detection method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101010722A (en) * 2004-08-30 2007-08-01 诺基亚公司 Detection of voice activity in an audio signal
CN102044242A (en) * 2009-10-15 2011-05-04 华为技术有限公司 Method, device and electronic equipment for voice activity detection
CN102194452A (en) * 2011-04-14 2011-09-21 西安烽火电子科技有限责任公司 Voice activity detection method in complex background noise
US20120158401A1 (en) * 2010-12-20 2012-06-21 Lsi Corporation Music detection using spectral peak analysis
CN107045870A (en) * 2017-05-23 2017-08-15 南京理工大学 A kind of the Method of Speech Endpoint Detection of feature based value coding
CN107134277A (en) * 2017-06-15 2017-09-05 深圳市潮流网络技术有限公司 A kind of voice-activation detecting method based on GMM model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Auto-Correlation Property of Speech and its Application in Voice Activity Detection";Z. Shuyin;《2009 First International Workshop on Education Technology and Computer Science》;20090526;full text *
"Long-term auto-correlation statistics based voice activity detection for strong noisy speech";W. Shi;《2014 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP)》;20140904;full text *
"Research on voice endpoint detection methods based on long-term features";冯璐;《http://d.wanfangdata.com.cn/thesis/ChJUaGVzaXNOZXdTMjAyMTAyMDESCFkyNjA0MjQ4GghueXo2M2phdg%3D%3D》;20141106;pp. 1, 3, 13, 16, 19-23, 28 *
"Research on optimized voice activation detection algorithms";曹云;《China Master's Theses Full-text Database (Information Science and Technology)》;20130315;full text *

Also Published As

Publication number Publication date
CN108039182A (en) 2018-05-15

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant