CN108039182A - Voice activation detection method
- Publication number
- CN108039182A (application CN201711407711.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
Abstract
The invention belongs to the technical field of speech processing and discloses a voice activation detection method. The method performs voice activation detection by exploiting the fact that speech signals are strongly correlated whereas noise is only weakly correlated. It achieves low missed-detection and false-alarm probabilities even in strong noise, has low computational complexity, and is easy to implement on a variety of embedded platforms.
Description
Technical field
The invention belongs to the technical field of speech processing, and in particular relates to a voice activation detection method.
Background
A radio station can generally carry out only half-duplex voice communication, whereas the voice signal arriving from an IP network is usually full-duplex. A radio IP gateway must therefore convert between full duplex and half duplex. When the audio signal from the IP network contains only noise and no speech, the gateway keeps the radio in the receive state and forwards the audio received by the radio to the IP network. When the audio signal from the IP network contains speech, the gateway switches the radio to the transmit state and sends the speech from the IP network out through the radio.
The radio IP gateway therefore needs a voice activation detection algorithm to detect whether the audio signal from the IP network contains speech. The requirements on the voice activation detection algorithm generally include: (1) low complexity: radio IP gateways generally use embedded platforms (for example, various ARM platforms) and handle the various protocols under the Linux operating system, so the algorithm must have low complexity in order to run on various embedded Linux platforms; (2) strong noise robustness: speech signals sent from different locations over the IP network usually contain noise of differing amplitude, so the algorithm must achieve low missed-detection and false-alarm probabilities in strong noise.
At present, the most commonly used voice activation detection algorithm on embedded Linux platforms is the short-time energy and zero-crossing rate algorithm. It compares the computed energy and zero-crossing rate with preset thresholds: if both exceed their thresholds, the current frame is judged to be a speech frame; if both, or either one, fall below a second set of thresholds, the current frame is judged to be noise. The algorithm is too simple, which makes its noise robustness poor: in strong noise it has high missed-detection and false-alarm probabilities.
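The prior-art rule described above can be sketched roughly as follows. This is only an illustrative reading of the description, not code from the patent: the function names, the use of a sum of squares as the short-time energy, the sign-change count as the zero-crossing measure, and the `None` result for frames that satisfy neither condition are all assumptions.

```python
from typing import Optional, Sequence


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples in the frame (assumed energy definition)."""
    return float(sum(s * s for s in frame))


def zero_crossing_count(frame: Sequence[float]) -> int:
    """Number of adjacent sample pairs with opposite signs."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


def classify_energy_zcr(
    frame: Sequence[float],
    speech_energy: float,  # hypothetical speech threshold on energy
    speech_zcr: float,     # hypothetical speech threshold on zero crossings
    noise_energy: float,   # hypothetical noise threshold on energy
    noise_zcr: float,      # hypothetical noise threshold on zero crossings
) -> Optional[str]:
    """Return "speech", "noise", or None when neither rule applies."""
    energy = short_time_energy(frame)
    crossings = zero_crossing_count(frame)
    # Speech only if both measures exceed their speech thresholds.
    if energy > speech_energy and crossings > speech_zcr:
        return "speech"
    # Noise if either measure (or both) is below the second threshold set.
    if energy < noise_energy or crossings < noise_zcr:
        return "noise"
    # The description does not say what happens otherwise.
    return None
```

Because both measures are compared with fixed thresholds, the choice of those thresholds carries the whole decision, which is the weakness the background section points out.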
Summary of the invention
In view of the above problems, the object of the invention is to provide a voice activation detection method that achieves low missed-detection and false-alarm probabilities in strong noise, has low computational complexity, and is easy to implement on various embedded platforms.
To achieve this object, the invention adopts the following technical solution.
A voice activation detection method, comprising:
Step 1: obtain a stream of sampled audio signal and divide it into multiple consecutive frames of audio samples;
Step 2: set a speech threshold and a noise threshold, and calculate the correlation of the i-th frame of audio samples, where 1 ≤ i ≤ M and M is the total number of frames of audio samples contained in the sampled audio signal stream;
Step 3: when the correlation of the i-th frame of audio samples is greater than the speech threshold, judge the i-th frame to be a speech frame;
when the correlation of the i-th frame of audio samples is less than the noise threshold, judge the i-th frame to be a noise frame;
otherwise, when i = 1, judge the 1st frame to be a noise frame;
when i > 1, give the i-th frame the same judgment result as the (i-1)-th frame.
The technical solution of the invention has the following features and further improvements:
(1) In step 2, the correlation $R_i$ of the i-th frame of audio samples is calculated as:
$$R_i = \sum_{k=0}^{N-2} \left\{ \frac{1}{2} - \frac{1}{2}\,\operatorname{sgn}\left[ x_i(k)\, x_i(k+1) + C \right] \right\}$$
where N is the total number of sampling points in the i-th frame of audio samples, $x_i(k)$ is the k-th sampling point in the i-th frame, $x_i(k+1)$ is the (k+1)-th sampling point in the i-th frame, sgn(·) is the sign function, and C is a set constant greater than zero.
(2) The 1st frame of audio samples is taken to be a noise frame, the noise energy E of the 1st frame is calculated, and the constant C is determined from the noise energy E:
$$C = \begin{cases} 0.020 \times E, & E \ge 320000 \\ 0.015 \times E, & 36000 \le E < 320000 \\ 0.010 \times E, & E < 36000 \end{cases}$$
The method of the invention performs voice activation detection by exploiting the fact that speech signals are strongly correlated whereas noise is only weakly correlated. It achieves low missed-detection and false-alarm probabilities in strong noise, has low computational complexity, and is easy to implement on various embedded platforms.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a voice activation detection method provided by an embodiment of the invention;
Fig. 2 is a schematic comparison of the probability distribution function provided by an embodiment of the invention with the standard normal distribution function;
Fig. 3 is a schematic diagram of simulation results for the existing method and the method of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
An embodiment of the invention provides a voice activation detection method which, as shown in Fig. 1, comprises:
Step 1: obtain a stream of sampled audio signal and divide it into multiple consecutive frames of audio samples.
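Step 1 can be illustrated with a minimal framing sketch. The patent does not give a frame length or say whether frames overlap, so the non-overlapping frames, the dropped trailing partial frame, and the name `split_into_frames` are assumptions made only for this example.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[int], frame_len: int) -> List[List[int]]:
    """Split a sample stream into consecutive, non-overlapping frames.

    Assumption: a trailing partial frame is dropped, since the source
    does not specify how an incomplete last frame should be handled.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    frame_count = len(samples) // frame_len  # this is M in the description
    return [
        list(samples[i * frame_len:(i + 1) * frame_len])
        for i in range(frame_count)
    ]
```

For instance, a caller working with 8 kHz audio might pick `frame_len=160` for 20 ms frames; that figure is a common choice in speech processing and is not taken from the patent.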
Step 2: set a speech threshold and a noise threshold, and calculate the correlation of the i-th frame of audio samples, where 1 ≤ i ≤ M and M is the total number of frames of audio samples contained in the sampled audio signal stream.
In step 2, the correlation $R_i$ of the i-th frame of audio samples is calculated as:
$$R_i = \sum_{k=0}^{N-2} \left\{ \frac{1}{2} - \frac{1}{2}\,\operatorname{sgn}\left[ x_i(k)\, x_i(k+1) + C \right] \right\}$$
where N is the total number of sampling points in the i-th frame of audio samples, $x_i(k)$ is the k-th sampling point in the i-th frame, $x_i(k+1)$ is the (k+1)-th sampling point in the i-th frame, sgn(·) is the sign function, and C is a set constant greater than zero.
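The per-frame statistic can be sketched as below. The code transcribes the expression literally as it appears in claim 2 of this text, a sum over k = 0 … N-2 of 1/2 - 1/2·sgn[x(k)·x(k+1) + C]. The function names are hypothetical, and the signs are copied from the extracted claim text as-is; if the published formula image differs, the expression would need adjusting.

```python
from typing import Sequence


def sgn(value: float) -> int:
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (value > 0) - (value < 0)


def frame_correlation(frame: Sequence[float], c: float) -> float:
    """Evaluate R_i for one frame, following the claim-2 expression literally.

    Each adjacent pair contributes 0 when x(k) * x(k+1) + c is positive,
    1 when it is negative, and 0.5 when it is exactly zero.
    """
    return sum(
        0.5 - 0.5 * sgn(frame[k] * frame[k + 1] + c)
        for k in range(len(frame) - 1)
    )
```

With integer samples and an integer C, each term needs only one multiplication, one addition and a sign test, which fits the low-complexity goal stated in the description.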
Step 3: when the correlation of the i-th frame of audio samples is greater than the speech threshold, judge the i-th frame to be a speech frame;
when the correlation of the i-th frame of audio samples is less than the noise threshold, judge the i-th frame to be a noise frame;
otherwise, when i = 1, judge the 1st frame to be a noise frame;
when i > 1, give the i-th frame the same judgment result as the (i-1)-th frame.
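The decision rule of step 3 amounts to a two-threshold test with carry-over of the previous result. A minimal sketch follows, assuming the per-frame correlation values are already available and that the speech threshold is not lower than the noise threshold; the function name and the string labels are illustrative only.

```python
from typing import Iterable, List


def classify_frames(
    correlations: Iterable[float],
    speech_threshold: float,
    noise_threshold: float,
) -> List[str]:
    """Label each frame "speech" or "noise" from its correlation value."""
    decisions: List[str] = []
    for value in correlations:
        if value > speech_threshold:
            decisions.append("speech")
        elif value < noise_threshold:
            decisions.append("noise")
        elif not decisions:
            # First frame falls between the thresholds: treated as noise.
            decisions.append("noise")
        else:
            # Between the thresholds: repeat the previous frame's result.
            decisions.append(decisions[-1])
    return decisions
```

Carrying the previous decision forward keeps the output from flipping on every frame whose correlation lies between the two thresholds.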
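Further, if the 1st frame of audio samples is a noise frame, the noise energy E of the 1st frame is calculated, and the constant C is determined from the noise energy E as follows:
$$C = \begin{cases} 0.020 \times E, & E \ge 320000 \\ 0.015 \times E, & 36000 \le E < 320000 \\ 0.010 \times E, & E < 36000 \end{cases}$$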
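The choice of C can be sketched as a small lookup using the piecewise rule given in claim 3. The text does not say how the noise energy E is computed, so the sum-of-squares definition in `frame_energy` is an assumption, and both function names are hypothetical.

```python
from typing import Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Assumed energy definition: sum of squared samples in the frame."""
    return float(sum(s * s for s in frame))


def select_constant_c(noise_energy: float) -> float:
    """Pick C from the noise energy E using the thresholds of claim 3."""
    if noise_energy >= 320000:
        return 0.020 * noise_energy
    if noise_energy >= 36000:
        return 0.015 * noise_energy
    return 0.010 * noise_energy
```

A caller would typically compute `select_constant_c(frame_energy(first_frame))` once, after the first frame has been taken to be noise, and reuse the result for later frames.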
It should be noted that when the audio samples contain no speech, x(k) is assumed to be an additive white Gaussian noise signal that follows a normal distribution. Let:
$$Z = X(k)\,X(k+1) \tag{1}$$
It can then be proved that Z has a probability distribution function F(z), given by equation (2).
A comparison of the probability distribution function F(z) with the standard normal distribution function is shown in Fig. 2. Further let:
$$U = Z + C \tag{3}$$
The probability that U is greater than or equal to 0, P{U ≥ 0}, can then be expressed as equation (4).
As the shaded area in Fig. 2 shows, a suitable choice of C makes the probability P{U ≥ 0} decrease rapidly, whereas speech signals usually have a strong degree of correlation, so the invention can significantly improve the noise robustness of the voice activation detection algorithm. When C = 0, P{U ≥ 0} is relatively large, which is why the short-time energy and zero-crossing rate voice activation algorithm cannot distinguish speech from noise in strong noise.
Computer simulation results also confirm the effectiveness and superiority of the method of the invention. Additive white Gaussian noise is added to the original speech signal. When the signal-to-noise ratio is reduced to 2 dB, the noisy speech signal and the short-time energy, zero-crossing rate and correlation of each of its frames are as shown in Fig. 3, where Fig. 3(a) is the original speech time-domain signal, Fig. 3(b) is the speech time-domain signal after noise is added, Fig. 3(c) shows the detection result of the existing short-time energy and short-time zero-crossing rate method, and Fig. 3(d) shows the detection result of the correlation-based method of the invention. As Fig. 3 shows, at a signal-to-noise ratio of 2 dB it is already difficult to distinguish speech from noise using the short-time energy and zero-crossing rate, but the correlation still separates them effectively. The correlation-based voice activation detection algorithm of the invention therefore improves noise robustness at the cost of N additional integer multiplications, and can still run on various embedded Linux platforms.
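The noise-injection step of such a simulation can be sketched as follows. This is not the authors' simulation code: it only shows the standard way of scaling white Gaussian noise so that a clean signal reaches a target signal-to-noise ratio, and the function name and parameters are assumptions.

```python
import math
import random
from typing import List, Sequence


def add_white_gaussian_noise(
    signal: Sequence[float], snr_db: float, rng: random.Random
) -> List[float]:
    """Return the signal plus white Gaussian noise at the requested SNR."""
    if not signal:
        raise ValueError("signal must not be empty")
    # Average power of the clean signal.
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]
```

Calling it with `snr_db=2.0` and a seeded `random.Random` instance gives a repeatable noisy signal near the 2 dB condition described above; the achieved SNR varies slightly from run to run because the noise is random.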
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiment may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiment. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
The above is only a specific embodiment of the invention, and the scope of protection of the invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.
Claims (3)
1. A voice activation detection method, characterized in that the voice activation detection method comprises:
Step 1: obtaining a stream of sampled audio signal and dividing it into multiple consecutive frames of audio samples;
Step 2: setting a speech threshold and a noise threshold, and calculating the correlation of the i-th frame of audio samples, where 1 ≤ i ≤ M and M is the total number of frames of audio samples contained in the sampled audio signal stream;
Step 3: when the correlation of the i-th frame of audio samples is greater than the speech threshold, judging the i-th frame to be a speech frame; when the correlation of the i-th frame of audio samples is less than the noise threshold, judging the i-th frame to be a noise frame;
otherwise,
when i = 1, judging the 1st frame to be a noise frame; when i > 1, giving the i-th frame the same judgment result as the (i-1)-th frame.
2. The voice activation detection method according to claim 1, characterized in that in step 2 the correlation $R_i$ of the i-th frame of audio samples is calculated as:
$$R_i = \sum_{k=0}^{N-2} \left\{ \frac{1}{2} - \frac{1}{2}\,\operatorname{sgn}\left[ x_i(k)\, x_i(k+1) + C \right] \right\}$$
where N is the total number of sampling points in the i-th frame of audio samples, $x_i(k)$ is the k-th sampling point in the i-th frame, $x_i(k+1)$ is the (k+1)-th sampling point in the i-th frame, sgn(·) is the sign function, and C is a set constant greater than zero.
3. The voice activation detection method according to claim 2, characterized in that
if the 1st frame of audio samples is a noise frame, the noise energy E of the 1st frame is calculated, and the constant C is determined from the noise energy E:
$$C = \begin{cases} 0.020 \times E, & E \ge 320000 \\ 0.015 \times E, & 36000 \le E < 320000 \\ 0.010 \times E, & E < 36000 \end{cases}$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711407711.0A CN108039182B (en) | 2017-12-22 | 2017-12-22 | Voice activation detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108039182A (en) | 2018-05-15 |
CN108039182B (en) | 2021-10-08 |
Family
ID=62100806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711407711.0A (Active; CN108039182B (en)) | Voice activation detection method | 2017-12-22 | 2017-12-22 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108039182B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109785866A (en) * | 2019-03-07 | 2019-05-21 | 上海电力学院 | The method of broadcasting speech and noise measuring based on correlation function maximum value |
CN111651135A (en) * | 2020-04-27 | 2020-09-11 | 珠海格力电器股份有限公司 | Sound awakening method and device, storage medium and electrical equipment |
WO2021253235A1 (en) * | 2020-06-16 | 2021-12-23 | 华为技术有限公司 | Voice activity detection method and apparatus |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101010722A (en) * | 2004-08-30 | 2007-08-01 | 诺基亚公司 | Detection of voice activity in an audio signal |
CN102044242A (en) * | 2009-10-15 | 2011-05-04 | 华为技术有限公司 | Method, device and electronic equipment for voice activity detection |
CN102194452A (en) * | 2011-04-14 | 2011-09-21 | 西安烽火电子科技有限责任公司 | Voice activity detection method in complex background noise |
US20120158401A1 (en) * | 2010-12-20 | 2012-06-21 | Lsi Corporation | Music detection using spectral peak analysis |
CN107045870A (en) * | 2017-05-23 | 2017-08-15 | 南京理工大学 | A kind of the Method of Speech Endpoint Detection of feature based value coding |
CN107134277A (en) * | 2017-06-15 | 2017-09-05 | 深圳市潮流网络技术有限公司 | A kind of voice-activation detecting method based on GMM model |
Non-Patent Citations (4)
Title |
---|
W. SHI: ""Long-term auto-correlation statistics based voice activity detection for strong noisy speech"", 《2014 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (CHINASIP)》 * |
Z. SHUYIN: ""Auto-Correlation Property of Speech and its Application in Voice Activity Detection"", 《2009 FIRST INTERNATIONAL WORKSHOP ON EDUCATION TECHNOLOGY AND COMPUTER SCIENCE》 * |
冯璐: ""基于长时特征的语音端点检测方法研究"", 《HTTP://D.WANFANGDATA.COM.CN/THESIS/CHJUAGVZAXNOZXDTMJAYMTAYMDESCFKYNJA0MJQ4GGHUEXO2M2PHDG%3D%3D》 * |
曹云: ""话音激活检测优化算法研究"", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 * |
Also Published As
Publication number | Publication date |
---|---|
CN108039182B (en) | 2021-10-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108039182A (en) | A kind of voice-activation detecting method | |
CN101430882B (en) | Method and apparatus for restraining wind noise | |
Tanyer et al. | Voice activity detection in nonstationary noise | |
CN103346845B (en) | Based on blind frequency spectrum sensing method and the device of fast Fourier transform | |
CN102842305B (en) | Method and device for detecting keynote | |
CN105190746A (en) | Method and apparatus for detecting a target keyword | |
CN102915753A (en) | Method for intelligently controlling volume of electronic device and implementation device of method | |
CN108564966A (en) | The method and its equipment of tone testing, the device with store function | |
CN109767776B (en) | Deception voice detection method based on dense neural network | |
CN109040940A (en) | A kind of detection method and device of loudspeaker | |
CN106898353A (en) | A kind of Intelligent household voice control system and its audio recognition method | |
CN103632681B (en) | A kind of spectral envelope silence detection method | |
CN101494508A (en) | Frequency spectrum detection method based on characteristic cyclic frequency | |
CN105845149A (en) | Predominant pitch acquisition method in acoustical signal and system thereof | |
CN108877809A (en) | A kind of speaker's audio recognition method and device | |
CN105848052A (en) | Microphone switching method and terminal | |
CN108010536A (en) | Echo cancel method, device, system and storage medium | |
CN107293287A (en) | The method and apparatus for detecting audio signal | |
CN110111811A (en) | Audio signal detection method, device and storage medium | |
CN107742516A (en) | Intelligent identification Method, robot and computer-readable recording medium | |
CN103581447B (en) | A kind of method of signal transacting, device and electronic equipment | |
CN109087657A (en) | A kind of sound enhancement method applied to ultrashort wave radio set | |
CN101814291B (en) | Method and device for improving signal-to-noise ratio of voice signals in time domain | |
CN105261363A (en) | Voice recognition method, device and terminal | |
CN109377982A (en) | A kind of efficient voice acquisition methods |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||