CN108039182B - Voice activation detection method - Google Patents
- Publication number: CN108039182B
- Application number: CN201711407711.0A
- Authority: CN (China)
- Prior art keywords: frame, audio, noise, sample, audio sample
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
Abstract
The invention belongs to the technical field of voice signal processing and discloses a voice activation detection method. The method performs voice activation detection by exploiting the fact that a voice signal has relatively strong autocorrelation while noise has relatively weak autocorrelation. It achieves low missed-detection and false-detection probabilities in a relatively strong noise environment, has low computational complexity, and is easy to implement on various embedded platforms.
Description
Technical Field
The invention belongs to the technical field of voice signal processing, and particularly relates to a voice activation detection method.
Background
A radio station can generally carry only half-duplex voice communication, whereas the voice signal arriving from an IP network is generally full-duplex. A radio station IP gateway must therefore be able to convert between full duplex and half duplex. When the audio signal from the IP network contains only noise and no voice, the radio station is placed in the receiving state and the audio signal it receives is sent to the IP network; when the audio signal from the IP network contains a voice signal, the radio station is placed in the sending state and the voice signal from the IP network is sent out through the radio station.
The radio station IP gateway therefore needs a voice activation detection algorithm to detect whether the audio signal from the IP network contains voice. The requirements on such an algorithm generally include: (1) low algorithmic complexity: the radio station IP gateway usually runs on an embedded platform (such as one of the various ARM platforms) and uses a Linux operating system to process various protocols, so the algorithm must be simple enough to run on various embedded Linux platforms; (2) strong anti-noise performance: voice signals sent from different places through an IP network often contain noise signals of different amplitudes, so the algorithm must achieve low missed-detection and false-detection probabilities in a strong noise environment.
Currently, the voice activation detection algorithm most widely used on embedded Linux platforms is the short-time energy and zero-crossing rate algorithm. It compares the calculated short-time energy and zero-crossing rate of the current frame with preset thresholds: if both exceed their thresholds at the same time, the current frame is judged to be a speech frame; if both, or either one of them, fall below a second set of thresholds, the current frame is judged to be noise.
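The baseline described above can be sketched as follows. This is a minimal illustration, not the implementation used on any particular platform: the function names, the `undecided` label for frames that satisfy neither condition, and the threshold values in the usage note are assumptions introduced here.

```python
from typing import Sequence, Tuple


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(
    frame: Sequence[float],
    high: Tuple[float, float],
    low: Tuple[float, float],
) -> str:
    """Dual-threshold decision; `high` and `low` are (energy, zcr) pairs.

    Both threshold pairs are hypothetical tuning values.
    """
    energy = short_time_energy(frame)
    zcr = zero_crossing_rate(frame)
    # Speech: both features exceed the first set of thresholds.
    if energy > high[0] and zcr > high[1]:
        return "speech"
    # Noise: both features, or either one, fall below the second set.
    if energy < low[0] or zcr < low[1]:
        return "noise"
    # Neither condition holds (label assumed here, not from the source).
    return "undecided"
```

For example, `classify_frame(frame, high=(2.0, 0.5), low=(0.1, 0.1))` labels a frame using illustrative thresholds; in practice both pairs would have to be tuned to the input level.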
Disclosure of Invention
In view of the foregoing problems, an object of the present invention is to provide a voice activation detection method that achieves low missed-detection and false-detection probabilities in a relatively strong noise environment, has low computational complexity, and is easy to implement on various embedded platforms.
To achieve this object, the invention adopts the following technical solution.
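A voice activity detection method, comprising:
step 1, acquiring an audio signal sample stream, and dividing the audio signal sample stream into consecutive frames of audio samples;
step 2, setting a speech threshold and a noise threshold, and calculating the autocorrelation degree of the i-th frame of audio samples, where 1 ≤ i ≤ M and M is the total number of audio sample frames contained in the audio signal sample stream;
step 3, when the autocorrelation degree of the i-th frame of audio samples is greater than the speech threshold, judging the i-th frame of audio samples to be a speech frame; when the autocorrelation degree of the i-th frame of audio samples is smaller than the noise threshold, judging the i-th frame of audio samples to be a noise frame;
otherwise, when i is equal to 1, judging the 1st frame of audio samples to be a noise frame;
and when i is greater than 1, giving the i-th frame of audio samples the same judgment result as the (i-1)-th frame of audio samples.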
The technical solution of the invention has the following features and further improvements:
(1) In step 2, the autocorrelation degree R_i of the i-th frame of audio samples is calculated as follows:
where N is the total number of sampling points contained in the i-th frame of audio samples, x_i(k) is the k-th sampling point in the i-th frame of audio samples, x_i(k+1) is the (k+1)-th sampling point in the i-th frame of audio samples, sgn(·) is the sign function, and C is a set constant greater than zero.
(2) The 1st frame of audio samples is taken to be a noise frame, the noise energy E of the 1st frame of audio samples is calculated, and the constant C is determined from the noise energy E:
The method of the invention performs voice activation detection by exploiting the fact that a voice signal has relatively strong autocorrelation while noise has relatively weak autocorrelation. It achieves low missed-detection and false-detection probabilities in a relatively strong noise environment, has low computational complexity, and is easy to implement on various embedded platforms.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a voice activity detection method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram comparing a probability distribution function with the standard normal distribution function according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of simulation results of the conventional method and the method of the present invention according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a voice activation detection method. As shown in Fig. 1, the voice activation detection method includes:
Step 1, acquiring an audio signal sample stream, and dividing the audio signal sample stream into consecutive frames of audio samples.
Step 2, setting a speech threshold and a noise threshold, and calculating the autocorrelation degree of the i-th frame of audio samples, where 1 ≤ i ≤ M and M is the total number of audio sample frames contained in the audio signal sample stream.
In step 2, the autocorrelation degree R_i of the i-th frame of audio samples is calculated as follows:
where N is the total number of sampling points contained in the i-th frame of audio samples, x_i(k) is the k-th sampling point in the i-th frame of audio samples, x_i(k+1) is the (k+1)-th sampling point in the i-th frame of audio samples, sgn(·) is the sign function, and C is a set constant greater than zero.
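A minimal sketch of an adjacent-sample autocorrelation degree, written under a loudly stated assumption: the formula itself is not reproduced in this text, so the form used here, the sum of sgn(x_i(k)·x_i(k+1) − C) over adjacent sample pairs, is only inferred from the symbol definitions above. It may differ from the patented formula, in particular in the sign with which C enters. The function names are hypothetical.

```python
from typing import Sequence


def sgn(value: float) -> int:
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (value > 0) - (value < 0)


def autocorrelation_degree(frame: Sequence[float], c: float) -> int:
    """ASSUMED form: sum over adjacent pairs of sgn(x(k) * x(k+1) - c).

    Strongly correlated samples give products above c and push the
    score up; uncorrelated low-level samples push it down.
    """
    return sum(sgn(a * b - c) for a, b in zip(frame, frame[1:]))
```

Under this assumed form, a frame of N integer samples needs one integer multiplication per adjacent pair.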
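Step 3, when the autocorrelation degree of the i-th frame of audio samples is greater than the speech threshold, judging the i-th frame of audio samples to be a speech frame; when the autocorrelation degree of the i-th frame of audio samples is smaller than the noise threshold, judging the i-th frame of audio samples to be a noise frame;
otherwise, when i is equal to 1, judging the 1st frame of audio samples to be a noise frame;
and when i is greater than 1, giving the i-th frame of audio samples the same judgment result as the (i-1)-th frame of audio samples.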
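The frame decision rule can be sketched as below, including the speech-threshold branch stated in step 3 of claim 1. The function name, the string labels and any threshold values passed in are assumptions for illustration.

```python
from typing import List, Sequence


def classify_frames(
    scores: Sequence[float],
    speech_threshold: float,
    noise_threshold: float,
) -> List[str]:
    """Label each frame from its autocorrelation degree.

    scores[0] belongs to frame 1. A score between the two thresholds
    keeps the previous frame's label; frame 1 defaults to noise.
    """
    labels: List[str] = []
    for index, score in enumerate(scores):
        if score > speech_threshold:
            labels.append("speech")
        elif score < noise_threshold:
            labels.append("noise")
        elif index == 0:
            labels.append("noise")       # first frame, no earlier decision
        else:
            labels.append(labels[-1])    # hold the previous decision
    return labels
```

Holding the previous decision between the two thresholds gives the detector hysteresis, so a score hovering near a single threshold does not make the output flip from frame to frame.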
Further, the 1st frame of audio samples is taken to be a noise frame, the noise energy E of the 1st frame of audio samples is calculated, and the constant C is determined from the noise energy E as follows:
It should be noted that, when the audio samples contain no speech signal, X(k) can be assumed to be an additive white Gaussian noise signal that obeys a normal distribution. Let:
Z=X(k)X(k+1) (1)
The probability distribution function f(z) of Z can be shown to be:
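For reference, and as an assumption about the noise model rather than a reproduction of equation (2): if X(k) and X(k+1) are taken to be independent, zero-mean Gaussian variables with a common variance σ² (σ is a symbol introduced here), the density of their product has the standard closed form below, where K_0 is the modified Bessel function of the second kind of order zero.

```latex
f(z) = \frac{1}{\pi\sigma^{2}}\, K_{0}\!\left(\frac{|z|}{\sigma^{2}}\right), \qquad z \neq 0
```

This density is symmetric about zero and sharply peaked at z = 0.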
A comparison of the probability distribution function f(z) with the standard normal distribution function is shown in Fig. 2. Next, let:
U=Z+C (3)
The probability that U is greater than or equal to 0 can be expressed as:
As the shaded part of Fig. 2 shows, a suitable choice of C makes the probability P{U ≥ 0} decrease rapidly for noise, whereas a speech signal usually has relatively strong correlation; the invention can therefore significantly improve the anti-noise performance of the voice activation detection algorithm. When C is equal to 0, P{U ≥ 0} is large, which is why the short-time energy and zero-crossing rate voice activation detection algorithm cannot distinguish the speech signal from the noise signal in a strong noise environment.
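The effect of the offset on white noise can be checked numerically. The sketch below estimates, by Monte Carlo simulation, the fraction of adjacent-sample products of independent zero-mean Gaussian noise that reach an offset c, that is, P{Z ≥ c}. Treating c as the amount by which the product must exceed zero is an assumption about the sign convention of C; the function name, sample count and seed are also hypothetical.

```python
import random


def exceed_probability(
    c: float,
    sigma: float = 1.0,
    pairs: int = 100_000,
    seed: int = 0,
) -> float:
    """Estimate P{X(k) * X(k+1) >= c} for independent N(0, sigma^2) noise."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = 0
    for _ in range(pairs):
        x0 = rng.gauss(0.0, sigma)
        x1 = rng.gauss(0.0, sigma)
        if x0 * x1 >= c:
            hits += 1
    return hits / pairs


if __name__ == "__main__":
    for offset in (0.0, 0.5, 1.0, 2.0):
        print(offset, exceed_probability(offset))
```

With c = 0 about half of the noise products pass, since the product is symmetric about zero; the fraction shrinks as c grows.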
Computer simulation results also confirm the effectiveness and superiority of the method. The original speech signal, the noisy speech signal, and the short-time energy, zero-crossing rate and autocorrelation degree of each frame are shown in Fig. 3, where Fig. 3(a) is the original speech time-domain signal, Fig. 3(b) is the noisy speech time-domain signal, Fig. 3(c) shows the detection result of the existing short-time energy and short-time zero-crossing rate method, and Fig. 3(d) shows the detection result of the autocorrelation-based method of the present invention. As Fig. 3 shows, at a signal-to-noise ratio of 2 dB it is already difficult to distinguish speech from noise using the short-time energy and zero-crossing rate indicators, but the two can still be distinguished effectively using the autocorrelation degree. The autocorrelation-based voice activation detection algorithm therefore improves anti-noise performance effectively at the cost of N additional integer multiplications, and can run on various embedded Linux platforms.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
The above describes only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope of the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (2)
1. A voice activity detection method, characterized in that the voice activity detection method comprises:
step 1, acquiring an audio signal sample stream, and dividing the audio signal sample stream into continuous multi-frame audio samples;
step 2, setting a voice threshold and a noise threshold, and calculating the autocorrelation of the ith frame of audio sample, wherein i is more than or equal to 1 and less than or equal to M, and M is the total number of audio sample frames contained in the audio signal sample stream;
calculating the autocorrelation degree R_i of the i-th frame of audio samples specifically as follows:
wherein N represents the total number of sampling points contained in the i-th frame of audio samples, x_i(k) represents the k-th sampling point in the i-th frame of audio samples, x_i(k+1) represents the (k+1)-th sampling point in the i-th frame of audio samples, sgn(·) represents the sign function, and C represents a set constant greater than zero;
step 3, when the autocorrelation degree of the ith frame audio sample is greater than the speech threshold, judging that the ith frame audio sample is a speech frame; when the autocorrelation of the ith frame of audio sample is smaller than the noise threshold, judging that the ith frame of audio sample is a noise frame;
otherwise,
when i is equal to 1, judging the 1st frame of audio samples to be a noise frame; and when i is greater than 1, the judgment result of the i-th frame of audio samples is the same as that of the (i-1)-th frame of audio samples.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711407711.0A CN108039182B (en) | 2017-12-22 | 2017-12-22 | Voice activation detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711407711.0A CN108039182B (en) | 2017-12-22 | 2017-12-22 | Voice activation detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108039182A | 2018-05-15 |
CN108039182B | 2021-10-08 |
Family
ID=62100806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711407711.0A Active CN108039182B (en) | 2017-12-22 | 2017-12-22 | Voice activation detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108039182B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109785866A (en) * | 2019-03-07 | 2019-05-21 | 上海电力学院 | The method of broadcasting speech and noise measuring based on correlation function maximum value |
CN111651135B (en) * | 2020-04-27 | 2021-05-25 | 珠海格力电器股份有限公司 | Sound awakening method and device, storage medium and electrical equipment |
CN115699173A (en) * | 2020-06-16 | 2023-02-03 | 华为技术有限公司 | Voice activity detection method and device |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101010722A (en) * | 2004-08-30 | 2007-08-01 | 诺基亚公司 | Detection of voice activity in an audio signal |
CN102044242A (en) * | 2009-10-15 | 2011-05-04 | 华为技术有限公司 | Method, device and electronic equipment for voice activity detection |
US20120158401A1 (en) * | 2010-12-20 | 2012-06-21 | Lsi Corporation | Music detection using spectral peak analysis |
CN102194452A (en) * | 2011-04-14 | 2011-09-21 | 西安烽火电子科技有限责任公司 | Voice activity detection method in complex background noise |
CN107045870A (en) * | 2017-05-23 | 2017-08-15 | 南京理工大学 | A kind of the Method of Speech Endpoint Detection of feature based value coding |
CN107134277A (en) * | 2017-06-15 | 2017-09-05 | 深圳市潮流网络技术有限公司 | A kind of voice-activation detecting method based on GMM model |
Non-Patent Citations (4)
Title |
---|
"Auto-Correlation Property of Speech and its Application in Voice Activity Detection"; Z. Shuyin; 2009 First International Workshop on Education Technology and Computer Science; 2009-05-26; full text * |
"Long-term auto-correlation statistics based voice activity detection for strong noisy speech"; W. Shi; 2014 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP); 2014-09-04; full text * |
"基于长时特征的语音端点检测方法研究" (Research on speech endpoint detection methods based on long-term features); 冯璐 (Feng Lu); http://d.wanfangdata.com.cn/thesis/ChJUaGVzaXNOZXdTMjAyMTAyMDESCFkyNjA0MjQ4GghueXo2M2phdg%3D%3D; 2014-11-06; pp. 1, 3, 13, 16, 19-23, 28 * |
"话音激活检测优化算法研究" (Research on optimization algorithms for voice activation detection); 曹云 (Cao Yun); 中国优秀硕士学位论文全文数据库(信息科技辑) (China Master's Theses Full-text Database, Information Science and Technology); 2013-03-15; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN108039182A (en) | 2018-05-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||