CN108039182A

CN108039182A - Voice activation detection method

Info

Publication number: CN108039182A
Application number: CN201711407711.0A
Authority: CN
Inventors: 张亦希; 陈晨; 王陈春; 王业芳; 常浩宇; 王蕴; 舒敏; 王琼
Original assignee: Shaanxi Fenghuo Communication Group Co Ltd
Current assignee: Shaanxi Fenghuo Electronics Co Ltd; Shaanxi Fenghuo Communication Group Co Ltd
Priority date: 2017-12-22
Filing date: 2017-12-22
Publication date: 2018-05-15
Anticipated expiration: 2037-12-22
Also published as: CN108039182B

Abstract

The invention belongs to the field of speech processing and discloses a voice activation detection method. The method performs voice activation detection by exploiting the fact that speech signals are strongly correlated whereas noise is only weakly correlated. It achieves low missed-detection and false-detection probabilities even in strong noise, and its low computational complexity makes it easy to implement on a variety of embedded platforms.

Description

A voice activation detection method

Technical field

The invention belongs to the field of speech processing, and in particular relates to a voice activation detection method.

Background art

A radio can generally carry out only half-duplex voice communication, whereas voice signals from an IP network are usually full duplex. A radio IP gateway must therefore convert between full duplex and half duplex. When the audio signal from the IP network contains only noise and no speech, the gateway places the radio in the receive state and forwards the audio received by the radio to the IP network. When the audio signal from the IP network contains speech, the gateway places the radio in the transmit state and the speech from the IP network is transmitted by the radio.

The radio IP gateway therefore needs a voice activation detection algorithm to detect whether the audio signal from the IP network contains speech. The requirements on such an algorithm generally include: (1) Low complexity. A radio IP gateway usually runs on an embedded platform (for example, one of the various ARM platforms) and uses the Linux operating system to handle the various protocols, so the algorithm must be simple enough to run on a variety of embedded Linux platforms. (2) Strong noise robustness. Speech sent over the IP network from different locations usually carries noise of differing amplitude, so the algorithm must achieve low missed-detection and false-detection probabilities even in strong noise.

At present, the voice activation detection algorithm most commonly used on embedded Linux platforms is based on short-time energy and zero-crossing rate. It compares the computed energy and zero-crossing rate of each frame with preset thresholds. If both exceed their thresholds, the current frame is judged to be a speech frame; if both, or either one, fall below a second set of thresholds, the current frame is judged to be noise. The algorithm is too simple, so its noise robustness is poor: in strong noise it has high missed-detection and false-detection probabilities.
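For comparison, the baseline decision rule described above can be sketched roughly as follows. This is only an illustrative Python sketch: the function names and threshold parameters are hypothetical, the short-time energy is assumed to be the sum of squared samples in a frame, and the second set of thresholds mentioned in the text is omitted for brevity.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame (assumed energy definition)."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def baseline_is_speech(frame, energy_threshold, zcr_threshold):
    """Speech only when both features exceed their (hypothetical) thresholds."""
    return (short_time_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) > zcr_threshold)
```

Because the rule looks only at two per-frame scalars, strong noise that raises both features can be hard to separate from speech, which is the weakness the text points out.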

Summary of the invention

In view of the above problems, the object of the invention is to provide a voice activation detection method that achieves low missed-detection and false-detection probabilities in strong noise and has low computational complexity, so that it is easy to implement on a variety of embedded platforms.

To achieve this object, the invention adopts the following technical solution.

A voice activation detection method, comprising:

Step 1: obtain a sampled audio signal stream and divide it into multiple consecutive frames of audio samples.

Step 2: set a speech threshold and a noise threshold, and calculate the correlation of the i-th frame of audio samples, where 1 ≤ i ≤ M and M is the total number of frames in the sampled audio signal stream.

Step 3: when the correlation of the i-th frame is greater than the speech threshold, judge the i-th frame to be a speech frame.

When the correlation of the i-th frame is less than the noise threshold, judge the i-th frame to be a noise frame.

Otherwise, when i = 1, judge the 1st frame to be a noise frame.

When i > 1, give the i-th frame the same decision as the (i-1)-th frame.
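The decision rule of Step 3 can be sketched as below. This is a minimal Python illustration, not the patent's implementation: the function name `classify_frames` and its parameters are hypothetical, the per-frame correlations are assumed to have been computed already, and the speech threshold is assumed to be no smaller than the noise threshold.

```python
def classify_frames(correlations, speech_threshold, noise_threshold):
    """Return one label per frame: True for speech, False for noise."""
    labels = []
    for i, r in enumerate(correlations):
        if r > speech_threshold:
            labels.append(True)            # clearly above the speech threshold
        elif r < noise_threshold:
            labels.append(False)           # clearly below the noise threshold
        elif i == 0:
            labels.append(False)           # undecided first frame counts as noise
        else:
            labels.append(labels[-1])      # undecided: repeat previous decision
    return labels
```

With two thresholds, a frame whose correlation falls between them simply inherits the previous decision, so the output does not flip back and forth on borderline frames.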

The technical solution is further characterized and improved as follows:

(1) In Step 2, the correlation R_i of the i-th frame of audio samples is calculated as:

$$R_i = \sum_{k=0}^{N-2} \left\{ \frac{1}{2} - \frac{1}{2}\,\operatorname{sgn}\left[ x_i(k)\,x_i(k+1) + C \right] \right\}$$

where N is the total number of sampling points in the i-th frame, x_i(k) is the k-th sampling point of the i-th frame, x_i(k+1) is the (k+1)-th sampling point of the i-th frame, sgn(·) is the sign function, and C is a preset constant greater than zero.

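(2) The 1st frame of audio samples is taken to be a noise frame, its noise energy E is calculated, and the constant C is determined from E:

$$C = \begin{cases} 0.020 \times E, & E \ge 320000 \\ 0.015 \times E, & 36000 \le E < 320000 \\ 0.010 \times E, & E < 36000 \end{cases}$$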
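Items (1) and (2) can be sketched in Python as follows, using the correlation formula of claim 2 and the piecewise constants of claim 3. The function names are hypothetical. Two points are assumptions, because the text does not specify them: the noise energy E is taken to be the sum of squared samples of the frame, and a value of exactly zero inside sgn is treated as positive so that R_i stays an integer.

```python
def frame_correlation(frame, c):
    """R_i = sum over k = 0..N-2 of 1/2 - 1/2 * sgn(x(k) * x(k+1) + C).

    Each term is 1 when x(k) * x(k+1) + C is negative and 0 when it is
    positive, so R_i is a count. Assumption: exactly zero counts as positive.
    """
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b + c < 0)


def noise_energy(frame):
    """Assumed definition of E: sum of squared samples of the frame."""
    return sum(s * s for s in frame)


def select_c(energy):
    """Piecewise constant C as a function of the noise energy E (claim 3)."""
    if energy >= 320000:
        return 0.020 * energy
    if energy >= 36000:
        return 0.015 * energy
    return 0.010 * energy
```

Each term needs one multiplication, one addition and one sign test, which is consistent with the low complexity the method aims for.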

The method of the invention performs voice activation detection by exploiting the fact that speech signals are strongly correlated whereas noise is only weakly correlated. It achieves low missed-detection and false-detection probabilities even in strong noise, and its low computational complexity makes it easy to implement on a variety of embedded platforms.

Brief description of the drawings

To explain the embodiments of the invention or the prior-art solutions more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow diagram of a voice activation detection method provided by an embodiment of the invention;

Fig. 2 is a schematic comparison of the probability distribution function provided by an embodiment of the invention with the standard normal distribution function;

Fig. 3 is a schematic diagram of simulation results for the existing method and the method of the invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

An embodiment of the invention provides a voice activation detection method which, as shown in Fig. 1, comprises:

Step 1: obtain a sampled audio signal stream and divide it into multiple consecutive frames of audio samples.

Step 2: set a speech threshold and a noise threshold, and calculate the correlation of the i-th frame of audio samples, where 1 ≤ i ≤ M and M is the total number of frames in the sampled audio signal stream.

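In Step 2, the correlation R_i of the i-th frame of audio samples is calculated as:

$$R_i = \sum_{k=0}^{N-2} \left\{ \frac{1}{2} - \frac{1}{2}\,\operatorname{sgn}\left[ x_i(k)\,x_i(k+1) + C \right] \right\}$$

where N is the total number of sampling points in the i-th frame, x_i(k) is the k-th sampling point of the i-th frame, x_i(k+1) is the (k+1)-th sampling point of the i-th frame, sgn(·) is the sign function, and C is a preset constant greater than zero.

Step 3: when the correlation of the i-th frame is greater than the speech threshold, judge the i-th frame to be a speech frame; when the correlation of the i-th frame is less than the noise threshold, judge the i-th frame to be a noise frame.

Otherwise, when i = 1, judge the 1st frame to be a noise frame; when i > 1, give the i-th frame the same decision as the (i-1)-th frame.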

Further, if the 1st frame of audio samples is a noise frame, its noise energy E is calculated and the constant C is determined from E as follows:

$$C = \begin{cases} 0.020 \times E, & E \ge 320000 \\ 0.015 \times E, & 36000 \le E < 320000 \\ 0.010 \times E, & E < 36000 \end{cases}$$
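Putting Steps 1 to 3 together, one possible end-to-end arrangement is sketched below in Python. All names are hypothetical, and several choices are assumptions rather than statements of the source: trailing samples that do not fill a frame are dropped, the 1st frame is always used as the noise reference for C, E is the sum of squared samples, and an argument of exactly zero inside sgn counts as positive.

```python
def split_frames(samples, frame_len):
    """Step 1: cut the sample stream into consecutive full frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def select_c(energy):
    """Piecewise constant C from the noise energy E (claim 3)."""
    if energy >= 320000:
        return 0.020 * energy
    if energy >= 36000:
        return 0.015 * energy
    return 0.010 * energy


def frame_correlation(frame, c):
    """Count adjacent pairs with x(k) * x(k+1) + C < 0 (claim 2)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b + c < 0)


def detect_speech_frames(samples, frame_len, speech_threshold, noise_threshold):
    """Return one label per frame: True for speech, False for noise."""
    frames = split_frames(samples, frame_len)
    if not frames:
        return []
    # Assumption: frame 1 is noise, so its energy calibrates C.
    c = select_c(sum(s * s for s in frames[0]))
    labels = []
    for i, frame in enumerate(frames):
        r = frame_correlation(frame, c)          # Step 2
        if r > speech_threshold:                 # Step 3
            label = True
        elif r < noise_threshold:
            label = False
        elif i == 0:
            label = False
        else:
            label = labels[-1]
        labels.append(label)
    return labels
```

The frame length and both thresholds are left as parameters because the text gives no values for them.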

Note that when the audio samples contain no speech signal, x(k) is assumed to be an additive white Gaussian noise signal that follows a normal distribution. Let:

$$Z = X(k)\,X(k+1) \tag{1}$$

It can then be shown that the probability distribution function of Z is:

Fig. 2 compares the probability distribution function F(z) with the standard normal distribution function. Further let:

$$U = Z + C \tag{3}$$

The probability that U is greater than or equal to 0 can then be expressed as:

As the shaded area in Fig. 2 shows, a reasonable choice of C makes the probability P{U ≥ 0} that U is greater than or equal to 0 fall rapidly, whereas speech signals usually have a stronger degree of correlation; the invention can therefore significantly improve the noise robustness of the voice activation detection algorithm. When C = 0, P{U ≥ 0} is relatively large, which is why the short-time energy and zero-crossing rate voice activation algorithm cannot distinguish speech from noise in strong noise.
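The influence of C on noise-only frames can be explored numerically. The Python sketch below estimates how often a pair of adjacent noise samples satisfies x(k)·x(k+1) + C < 0, that is, how often such a pair adds 1 to R_i under the formula in claim 2. It is an illustration under stated assumptions, not the patent's simulation: the samples are independent zero-mean Gaussian values with standard deviation `sigma`, and the function name, sample count, seed and tested values of C are all hypothetical.

```python
import random


def fraction_counted(c, sigma=1.0, n=20000, seed=0):
    """Estimate P{x(k) * x(k+1) + C < 0} for white Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigma) for _ in range(n + 1)]
    hits = sum(1 for a, b in zip(x, x[1:]) if a * b + c < 0)
    return hits / n


# Print the estimated fraction for a few illustrative values of C.
for c in (0.0, 0.5, 1.0, 2.0):
    print(c, fraction_counted(c))
```

With C = 0 the estimate is close to one half, since the product of two independent zero-mean samples is equally likely to be positive or negative; raising C can only reduce the fraction of noise pairs that are counted.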

Computer simulation results also demonstrate the validity and superiority of the method of the invention. Additive white Gaussian noise is added to the original speech signal. With the signal-to-noise ratio reduced to 2 dB, the noisy speech signal and the short-time energy, zero-crossing rate and correlation of each of its frames are shown in Fig. 3, where Fig. 3(a) is the original speech time-domain signal, Fig. 3(b) is the speech time-domain signal after noise is added, Fig. 3(c) is the detection result of the existing short-time energy and short-time zero-crossing rate method, and Fig. 3(d) is the detection result of the correlation-based method of the invention. As Fig. 3 shows, at a signal-to-noise ratio of 2 dB it is already difficult to distinguish speech from noise using the short-time energy and zero-crossing rate, but the correlation still separates them effectively. The correlation-based voice activation detection algorithm of the invention therefore improves noise robustness effectively at the cost of only N additional integer multiplications, and can still run on a variety of embedded Linux platforms.
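A test signal at a chosen signal-to-noise ratio, such as the 2 dB case above, might be prepared along the following lines. The source does not describe its simulation setup, so this Python sketch is an assumption throughout: the function name `add_white_noise` is hypothetical, and the noise is scaled so that the ratio of measured signal power to measured noise power matches the requested value.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise scaled to the target SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(v * v for v in noise) / len(noise)
    # Scale so that 10 * log10(signal_power / scaled_noise_power) == snr_db.
    scale = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * v for s, v in zip(signal, noise)]
```

For example, `add_white_noise(speech, 2.0)` returns a noisy copy of a non-empty, non-silent sample list `speech` at 2 dB SNR, which can then be framed and classified.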

A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiment can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiment. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.

The above is merely a specific embodiment of the invention, and the scope of protection of the invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention falls within the scope of protection of the invention. The scope of protection of the invention is therefore defined by the claims.

Claims

1. A voice activation detection method, characterized in that the voice activation detection method comprises:

Step 1: obtaining a sampled audio signal stream and dividing it into multiple consecutive frames of audio samples;

Step 2: setting a speech threshold and a noise threshold, and calculating the correlation of the i-th frame of audio samples, where 1 ≤ i ≤ M and M is the total number of frames in the sampled audio signal stream;

Step 3: when the correlation of the i-th frame is greater than the speech threshold, judging the i-th frame to be a speech frame; when the correlation of the i-th frame is less than the noise threshold, judging the i-th frame to be a noise frame;

otherwise,

when i = 1, judging the 1st frame to be a noise frame; when i > 1, giving the i-th frame the same decision as the (i-1)-th frame.

2. The voice activation detection method according to claim 1, characterized in that in Step 2 the correlation R_i of the i-th frame of audio samples is calculated as:

$$R_i = \sum_{k=0}^{N-2} \left\{ \frac{1}{2} - \frac{1}{2}\,\operatorname{sgn}\left[ x_i(k)\,x_i(k+1) + C \right] \right\}$$

where N is the total number of sampling points in the i-th frame, x_i(k) is the k-th sampling point of the i-th frame, x_i(k+1) is the (k+1)-th sampling point of the i-th frame, sgn(·) is the sign function, and C is a preset constant greater than zero.

3. The voice activation detection method according to claim 2, characterized in that

if the 1st frame of audio samples is a noise frame, the noise energy E of the 1st frame is calculated and the constant C is determined from E:

$$C = \begin{cases} 0.020 \times E, & E \ge 320000 \\ 0.015 \times E, & 36000 \le E < 320000 \\ 0.010 \times E, & E < 36000 \end{cases}$$