CN108039182A - A voice activity detection method - Google Patents

A voice activity detection method

Info

Publication number
CN108039182A
Authority
CN
China
Prior art keywords
frame
voice
audio sample
Prior art date
2017-12-22
Legal status
Granted
Application number
CN201711407711.0A
Other languages
Chinese (zh)
Other versions
CN108039182B (en)
Inventor
张亦希
陈晨
王陈春
王业芳
常浩宇
王蕴
舒敏
王琼
Current Assignee
Shaanxi Fenghuo Electronics Co Ltd
Shaanxi Fenghuo Communication Group Co Ltd
Original Assignee
Shaanxi Fenghuo Communication Group Co Ltd
Priority date
2017-12-22
Filing date
2017-12-22
Publication date
2018-05-15
Application filed by Shaanxi Fenghuo Communication Group Co Ltd
Priority to CN201711407711.0A
Publication of CN108039182A
Application granted
Publication of CN108039182B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being power information
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise

Abstract

The invention belongs to the field of speech processing technology and discloses a voice activity detection method. Voice activity detection is performed by exploiting the fact that speech signals have strong correlation whereas noise has weak correlation. The method not only achieves low miss-detection and false-alarm probabilities in strong noise environments but also has low computational complexity, making it easy to implement on various embedded platforms.

Description

A voice activity detection method
Technical field
The invention belongs to the field of speech processing technology, and in particular relates to a voice activity detection method.
Background
For a radio station IP gateway, the radio can generally carry out only half-duplex voice communication, whereas the voice signal from the IP network is usually full duplex. The gateway must therefore convert between full duplex and half duplex: when it finds that the audio signal from the IP network contains no speech, only noise, it puts the radio in the receive state and forwards the audio received by the radio to the IP network; when the audio signal from the IP network contains speech, it puts the radio in the transmit state and transmits the speech from the IP network over the radio.
The radio station IP gateway therefore needs a voice activity detection (VAD) algorithm to detect whether the audio signal from the IP network contains speech. The requirements on the VAD algorithm generally include: (1) low complexity: radio station IP gateways generally use embedded platforms (e.g., various ARM platforms) and handle the various protocols under a Linux operating system, so the VAD algorithm must have low complexity in order to run on various embedded Linux platforms; (2) strong noise robustness: speech sent from different locations over the IP network usually contains noise of varying amplitude, so the VAD algorithm must achieve low miss-detection and false-alarm probabilities in strong noise environments.
At present, the most widely used VAD on embedded Linux platforms is the short-time energy and zero-crossing rate algorithm. It compares the computed short-time energy and zero-crossing rate with preset thresholds: if both exceed their thresholds, the current frame is judged to be speech; if either of them falls below a second set of thresholds, the current frame is judged to be noise. This algorithm is too simple, so its noise robustness is poor; that is, it has high miss-detection and false-alarm probabilities in strong noise environments.
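The conventional short-time energy and zero-crossing rate detector described above can be sketched as follows. This is a minimal Python/NumPy illustration, not code from the patent: the function names and the four thresholds (`e_hi`, `z_hi`, `e_lo`, `z_lo`) are hypothetical, and because the text does not say what happens when a frame meets neither condition, the sketch assumes such a frame keeps the previous frame's label.

```python
import numpy as np


def short_time_features(frame):
    """Return (short-time energy, zero-crossing count) for one frame."""
    x = np.asarray(frame, dtype=np.float64)
    energy = float(np.sum(x * x))
    # A zero crossing is an adjacent pair of samples with different signs.
    zcr = int(np.count_nonzero(np.signbit(x[:-1]) != np.signbit(x[1:])))
    return energy, zcr


def energy_zcr_vad(frames, e_hi, z_hi, e_lo, z_lo):
    """Classic two-feature VAD; all thresholds are hypothetical tuning values.

    Speech if energy AND zero-crossing count both exceed the upper thresholds;
    noise if either falls below the lower thresholds; otherwise (assumption)
    the frame keeps the previous frame's label, starting from noise.
    """
    labels = []
    prev = False  # False = noise, True = speech
    for frame in frames:
        energy, zcr = short_time_features(frame)
        if energy > e_hi and zcr > z_hi:
            prev = True
        elif energy < e_lo or zcr < z_lo:
            prev = False
        labels.append(prev)
    return labels


# Usage: a silent frame and a loud, rapidly alternating frame.
silent = np.zeros(160)
loud = 1000.0 * (-1.0) ** np.arange(160)
print(energy_zcr_vad([silent, loud], e_hi=1e6, z_hi=10, e_lo=1e5, z_lo=5))
# prints [False, True]
```

Every decision rests on two fixed, hand-set thresholds, which is why the text describes the method as fragile once noise energy and noise zero-crossings approach those of speech.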
Summary of the invention
In view of the above problems, an object of the invention is to provide a voice activity detection method that not only achieves low miss-detection and false-alarm probabilities in strong noise environments but also has low computational complexity and is easy to implement on various embedded platforms.
To achieve the above object, the invention adopts the following technical solution.
A voice activity detection method, comprising:
Step 1: obtain a sampled audio signal stream and divide it into consecutive frames of audio samples;
Step 2: set a speech threshold and a noise threshold, and calculate the correlation of the i-th frame of audio samples, where 1 ≤ i ≤ M and M is the total number of audio sample frames contained in the sampled audio signal stream;
Step 3: when the correlation of the i-th frame is greater than the speech threshold, judge the i-th frame to be a speech frame;
when the correlation of the i-th frame is less than the noise threshold, judge the i-th frame to be a noise frame;
otherwise, if i = 1, judge the 1st frame to be a noise frame;
if i > 1, give the i-th frame the same decision as the (i−1)-th frame.
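Steps 1 and 3 can be sketched as below, as a minimal Python/NumPy illustration. The frame length `frame_len`, the use of non-overlapping frames, the dropping of a trailing partial frame, and the function names are all assumptions; `scores` stands for the per-frame correlations computed in Step 2.

```python
import numpy as np


def split_frames(stream, frame_len):
    """Step 1: cut the sampled stream into consecutive, non-overlapping frames.

    Returns an array of shape (M, frame_len); a trailing partial frame is
    dropped (an assumption, the text does not say how it is handled).
    """
    x = np.asarray(stream)
    m = len(x) // frame_len  # M, the total number of frames
    return x[: m * frame_len].reshape(m, frame_len)


def dual_threshold_decision(scores, speech_thr, noise_thr):
    """Step 3: label each frame from its correlation score (True = speech)."""
    labels = []
    for i, r in enumerate(scores):  # i = 0 here is the patent's frame 1
        if r > speech_thr:
            labels.append(True)         # above the speech threshold: speech
        elif r < noise_thr:
            labels.append(False)        # below the noise threshold: noise
        elif i == 0:
            labels.append(False)        # undecided first frame: noise
        else:
            labels.append(labels[-1])   # undecided: copy frame i-1's decision
    return labels


frames = split_frames(np.arange(10), 4)   # shape (2, 4); samples 8 and 9 dropped
print(dual_threshold_decision([5, 20, 12, 3, 12], speech_thr=15, noise_thr=8))
# prints [False, True, True, False, False]
```

The "otherwise" branch acts as hysteresis: a frame whose correlation lies between the two thresholds inherits the previous decision, which tends to keep the label from toggling on borderline frames.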
Further features and improvements of the technical solution of the invention are as follows:
(1) In step 2, the correlation R_i of the i-th frame of audio samples is calculated as:
$$R_i = \sum_{k=0}^{N-2} \left\{ \frac{1}{2} - \frac{1}{2} \operatorname{sgn}\left[ x_i(k)\, x_i(k+1) + C \right] \right\}$$
where N is the total number of sample points in the i-th frame, x_i(k) is the k-th sample point of the i-th frame, x_i(k+1) is the (k+1)-th sample point of the i-th frame, sgn(·) is the sign function, and C is a preset constant greater than zero.
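(2) The 1st frame is set as a noise frame; the noise energy E of the 1st frame is calculated, and the constant C is determined from E:
$$C = \begin{cases} 0.020 \times E, & E \geq 320000 \\ 0.015 \times E, & 36000 \leq E < 320000 \\ 0.010 \times E, & E < 36000 \end{cases}$$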
The method of the invention performs voice activity detection by exploiting the strong correlation of speech signals and the weak correlation of noise. It not only achieves low miss-detection and false-alarm probabilities in strong noise environments but also has low computational complexity, and is easy to implement on various embedded platforms.
Brief description of the drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the voice activity detection method provided by an embodiment of the invention;
Fig. 2 is a schematic comparison of the probability distribution function and the standard normal distribution function, according to an embodiment of the invention;
Fig. 3 is a schematic diagram of simulation results of the existing method and the method of the invention, according to an embodiment of the invention.
Embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
An embodiment of the invention provides a voice activity detection method which, as shown in Fig. 1, comprises:
Step 1: obtain a sampled audio signal stream and divide it into consecutive frames of audio samples.
Step 2: set a speech threshold and a noise threshold, and calculate the correlation of the i-th frame of audio samples, where 1 ≤ i ≤ M and M is the total number of audio sample frames contained in the sampled audio signal stream.
In step 2, the correlation R_i of the i-th frame of audio samples is calculated as:
$$R_i = \sum_{k=0}^{N-2} \left\{ \frac{1}{2} - \frac{1}{2} \operatorname{sgn}\left[ x_i(k)\, x_i(k+1) + C \right] \right\}$$
where N is the total number of sample points in the i-th frame, x_i(k) is the k-th sample point of the i-th frame, x_i(k+1) is the (k+1)-th sample point of the i-th frame, sgn(·) is the sign function, and C is a preset constant greater than zero.
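The R_i formula translates almost directly into code. The sketch below is a hedged NumPy transcription; `frame_correlation` is a hypothetical name, and it assumes integer PCM samples (for example 16-bit) that are widened to 64-bit so the products cannot overflow. Since 1/2 − 1/2·sgn(u) equals 1 when u < 0, 0 when u > 0 and 1/2 when u = 0, R_i counts the adjacent sample pairs whose product falls below −C.

```python
import numpy as np


def frame_correlation(frame, c):
    """R_i = sum_{k=0}^{N-2} {1/2 - 1/2 * sgn[x_i(k) * x_i(k+1) + C]}.

    Each term is 1 if x(k)x(k+1) + C < 0, 0 if it is > 0, and 1/2 if it is 0.
    """
    x = np.asarray(frame, dtype=np.int64)   # integer samples, integer products
    u = x[:-1] * x[1:] + c                  # N - 1 adjacent-pair products plus C
    return float(np.sum(0.5 - 0.5 * np.sign(u)))


print(frame_correlation([1, -1, 1, -1], c=0))   # prints 3.0
print(frame_correlation([1, -1, 1, -1], c=2))   # prints 0.0
```

With C = 0 the sum reduces to the number of sign changes in the frame, i.e. an ordinary zero-crossing count; a positive C stops sign changes between small-amplitude samples from being counted, which is where the noise suppression comes from.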
Step 3: when the correlation of the i-th frame is greater than the speech threshold, judge the i-th frame to be a speech frame;
when the correlation of the i-th frame is less than the noise threshold, judge the i-th frame to be a noise frame;
otherwise, if i = 1, judge the 1st frame to be a noise frame;
if i > 1, give the i-th frame the same decision as the (i−1)-th frame.
Further, if the 1st frame is a noise frame, its noise energy E is calculated and the constant C is determined from E as follows:
$$C = \begin{cases} 0.020 \times E, & E \geq 320000 \\ 0.015 \times E, & 36000 \leq E < 320000 \\ 0.010 \times E, & E < 36000 \end{cases}$$
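The piecewise rule for C (C = 0.020E for E ≥ 320000, 0.015E for 36000 ≤ E < 320000, 0.010E for E < 36000) can be sketched as follows. The patent does not define how the noise energy E is normalised; this sketch assumes E is the sum of squared sample values over the 1st frame, and `noise_energy` and `constant_c` are hypothetical names. The thresholds 36000 and 320000 presumably depend on the sample scale and frame length the authors used.

```python
import numpy as np


def noise_energy(frame):
    """Noise energy E, assumed here to be the sum of squared samples of frame 1."""
    x = np.asarray(frame, dtype=np.float64)
    return float(np.sum(x * x))


def constant_c(e):
    """Piecewise rule: C = 0.020 E, 0.015 E or 0.010 E depending on the range of E."""
    if e >= 320000:
        return 0.020 * e
    if e >= 36000:          # 36000 <= E < 320000
        return 0.015 * e
    return 0.010 * e        # E < 36000


first_frame = np.full(160, 20)          # hypothetical noise-only frame 1
e = noise_energy(first_frame)            # 160 * 20**2 = 64000.0
print(e, constant_c(e))                  # middle branch: C = 0.015 * E
```

Because C scales with the measured noise energy, the dead zone of the R_i count grows with the noise level, so the same speech and noise thresholds can be reused across inputs with different noise amplitudes.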
Note that when the audio samples contain no speech signal, x(k) can be assumed to be additive white Gaussian noise following a normal distribution. Let:
Z=X (k) X (k+1) (1)
It can then be shown that the probability distribution function of Z is:
Fig. 2 compares the probability distribution function F(z) with the standard normal distribution function. Further, let:
U=Z+C (3)
Then the probability that U is greater than or equal to 0 can be expressed as P{U ≥ 0} = P{Z ≥ −C}.
As the shaded area in Fig. 2 shows, choosing C appropriately makes the probability that a pure-noise sample pair satisfies U < 0, that is, P{U < 0} = 1 − P{U ≥ 0}, fall off rapidly, while speech signals usually have strong correlation; the invention can therefore significantly improve the noise robustness of the VAD algorithm. When C = 0 this probability is large (for zero-mean white Gaussian noise it equals 1/2), so the short-time energy and zero-crossing rate algorithm cannot distinguish speech from noise in strong noise environments.
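To see the effect of C numerically, the Monte Carlo sketch below (an illustration, not taken from the patent) draws zero-mean white Gaussian noise and estimates, for several values of C, the probability that one adjacent pair contributes to R_i. Because each term of R_i is 1 exactly when U < 0 (and 0 when U > 0), the quantity estimated is P{U < 0} = 1 − P{U ≥ 0}. The noise level `sigma`, the sample count, the seed and the C values tried are arbitrary assumptions.

```python
import numpy as np


def prob_pair_counted(x, c):
    """Empirical P{U < 0} with U = x(k) x(k+1) + C over all adjacent pairs,
    i.e. the chance that one pair adds 1 to R_i."""
    u = x[:-1] * x[1:] + c
    return float(np.mean(u < 0))


rng = np.random.default_rng(0)
sigma = 1.0                                    # noise standard deviation (assumed)
noise = rng.normal(0.0, sigma, 200_000)        # zero-mean white Gaussian noise

for c in (0.0, 0.5, 1.0, 2.0, 4.0):
    p = prob_pair_counted(noise, c * sigma**2)
    print(f"C = {c:.1f} * sigma^2   P(U < 0) ~ {p:.4f}")
```

For C = 0 the estimate should sit close to 1/2, since Z = X(k)X(k+1) is symmetric about zero; it can only shrink as C grows, and it does so quickly once C is comparable to σ², which is the mechanism the paragraph above relies on.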
Computer simulation results also demonstrate the effectiveness and advantages of the method. Additive white Gaussian noise was added to the original speech signal until the signal-to-noise ratio dropped to 2 dB. The noisy speech signal and its per-frame short-time energy, zero-crossing rate and correlation are shown in Fig. 3, where Fig. 3(a) is the original speech time-domain signal, Fig. 3(b) is the noisy speech time-domain signal, Fig. 3(c) is the detection result of the existing short-time energy and short-time zero-crossing rate method, and Fig. 3(d) is the detection result of the correlation-based method of the invention. As Fig. 3 shows, at an SNR of 2 dB it is already difficult to distinguish speech from noise using the short-time energy and zero-crossing rate measures, whereas the correlation still distinguishes them effectively. Therefore, at the cost of only N additional integer multiplications, the correlation-based VAD algorithm of the invention effectively improves noise robustness while still being able to run on various embedded Linux platforms.
A person of ordinary skill in the art will appreciate that all or some of the steps of the above method embodiment can be performed by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiment. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks or optical discs.
The above are merely specific embodiments of the invention, and the scope of protection of the invention is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be determined by the scope of the claims.

Claims (3)

1. A voice activity detection method, characterized in that the method comprises:
Step 1: obtaining a sampled audio signal stream and dividing the sampled audio signal stream into consecutive frames of audio samples;
Step 2: setting a speech threshold and a noise threshold, and calculating the correlation of the i-th frame of audio samples, where 1 ≤ i ≤ M and M is the total number of audio sample frames contained in the sampled audio signal stream;
Step 3: when the correlation of the i-th frame of audio samples is greater than the speech threshold, judging the i-th frame to be a speech frame; when the correlation of the i-th frame of audio samples is less than the noise threshold, judging the i-th frame to be a noise frame;
otherwise,
when i = 1, judging the 1st frame of audio samples to be a noise frame; when i > 1, giving the i-th frame of audio samples the same decision result as the (i−1)-th frame.
2. The voice activity detection method according to claim 1, characterized in that, in step 2, the correlation R_i of the i-th frame of audio samples is calculated as:
$$R_i = \sum_{k=0}^{N-2} \left\{ \frac{1}{2} - \frac{1}{2} \operatorname{sgn}\left[ x_i(k)\, x_i(k+1) + C \right] \right\}$$
where N is the total number of sample points contained in the i-th frame of audio samples, x_i(k) is the k-th sample point in the i-th frame, x_i(k+1) is the (k+1)-th sample point in the i-th frame, sgn(·) is the sign function, and C is a preset constant greater than zero.
3. The voice activity detection method according to claim 2, characterized in that
if the 1st frame of audio samples is a noise frame, the noise energy E of the 1st frame is calculated and the constant C is determined from the noise energy E:
$$C = \begin{cases} 0.020 \times E, & E \geq 320000 \\ 0.015 \times E, & 36000 \leq E < 320000 \\ 0.010 \times E, & E < 36000 \end{cases}.$$

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711407711.0A CN108039182B (en) 2017-12-22 2017-12-22 Voice activation detection method

Publications (2)

Publication Number Publication Date
CN108039182A true CN108039182A (en) 2018-05-15
CN108039182B CN108039182B (en) 2021-10-08

Family

ID=62100806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711407711.0A Active CN108039182B (en) 2017-12-22 2017-12-22 Voice activation detection method

Country Status (1)

Country Link
CN (1) CN108039182B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109785866A (en) * 2019-03-07 2019-05-21 上海电力学院 Broadcast speech and noise detection method based on the maximum value of the correlation function
CN111651135A (en) * 2020-04-27 2020-09-11 珠海格力电器股份有限公司 Sound wake-up method and device, storage medium and electrical equipment
WO2021253235A1 (en) * 2020-06-16 2021-12-23 华为技术有限公司 Voice activity detection method and apparatus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101010722A (en) * 2004-08-30 2007-08-01 诺基亚公司 Detection of voice activity in an audio signal
CN102044242A (en) * 2009-10-15 2011-05-04 华为技术有限公司 Method, device and electronic equipment for voice activity detection
CN102194452A (en) * 2011-04-14 2011-09-21 西安烽火电子科技有限责任公司 Voice activity detection method in complex background noise
US20120158401A1 (en) * 2010-12-20 2012-06-21 Lsi Corporation Music detection using spectral peak analysis
CN107045870A (en) * 2017-05-23 2017-08-15 南京理工大学 A speech endpoint detection method based on feature value coding
CN107134277A (en) * 2017-06-15 2017-09-05 深圳市潮流网络技术有限公司 A voice activity detection method based on a GMM model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
W. SHI: "Long-term auto-correlation statistics based voice activity detection for strong noisy speech", 2014 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP) *
Z. SHUYIN: "Auto-Correlation Property of Speech and its Application in Voice Activity Detection", 2009 First International Workshop on Education Technology and Computer Science *
冯璐: "Research on speech endpoint detection methods based on long-term features" (基于长时特征的语音端点检测方法研究), HTTP://D.WANFANGDATA.COM.CN/THESIS/CHJUAGVZAXNOZXDTMJAYMTAYMDESCFKYNJA0MJQ4GGHUEXO2M2PHDG%3D%3D *
曹云: "Research on optimized voice activity detection algorithms" (话音激活检测优化算法研究), China Master's Theses Full-text Database (Information Science and Technology) *

Also Published As

Publication number Publication date
CN108039182B (en) 2021-10-08

Similar Documents

Publication Publication Date Title
CN108039182A (en) A voice activity detection method
CN101430882B (en) Method and apparatus for restraining wind noise
Tanyer et al. Voice activity detection in nonstationary noise
CN103346845B (en) Blind spectrum sensing method and device based on fast Fourier transform
CN102842305B (en) Method and device for detecting keynote
CN105190746A (en) Method and apparatus for detecting a target keyword
CN102915753A (en) Method for intelligently controlling volume of electronic device and implementation device of method
CN108564966A (en) The method and its equipment of tone testing, the device with store function
CN109767776B (en) Deception voice detection method based on dense neural network
CN109040940A (en) A loudspeaker detection method and device
CN106898353A (en) A smart home voice control system and its speech recognition method
CN103632681B (en) A spectral envelope silence detection method
CN101494508A (en) Frequency spectrum detection method based on characteristic cyclic frequency
CN105845149A (en) Predominant pitch acquisition method in acoustical signal and system thereof
CN108877809A (en) A speaker's audio recognition method and device
CN105848052A (en) Microphone switching method and terminal
CN108010536A (en) Echo cancellation method, device, system and storage medium
CN107293287A (en) The method and apparatus for detecting audio signal
CN110111811A (en) Audio signal detection method, device and storage medium
CN107742516A (en) Intelligent identification Method, robot and computer-readable recording medium
CN103581447B (en) A signal processing method and device, and electronic equipment
CN109087657A (en) A speech enhancement method applied to ultrashort-wave radio sets
CN101814291B (en) Method and device for improving signal-to-noise ratio of voice signals in time domain
CN105261363A (en) Voice recognition method, device and terminal
CN109377982A (en) An efficient speech acquisition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant