CN105261376A

CN105261376A - Voice signal detection method of digital audio system

Info

Publication number: CN105261376A
Application number: CN201510565648.8A
Authority: CN
Inventors: 李帅; 余方桃; 汤远峰; 王德勇; 姜黎; 向平
Original assignee: Hunan Goke Microelectronics Co Ltd
Current assignee: Hunan Goke Microelectronics Co Ltd
Priority date: 2015-09-08
Filing date: 2015-09-08
Publication date: 2016-01-20
Anticipated expiration: 2035-09-08
Also published as: CN105261376B

Abstract

The invention discloses a voice signal detection method for a digital audio system. In the method, a sample absolute-value mean is obtained from each frame of the signal, and a mean absolute deviation is calculated from the sample absolute-value means of multiple frames. If the mean absolute deviation stays below a noise threshold throughout a set noise detection time, the detected signal is determined to be noise; if the mean absolute deviation stays above the noise threshold throughout a set voice detection time, the detected signal is determined to be a voice signal; in all other cases, the detected signal is determined to be in an intermediate state. The invention uses the mean absolute deviation as the detection decision metric and exploits the marked differences between voice signals and noise in energy and stationarity. Computational complexity and hardware requirements are thereby reduced while good detection performance is maintained.

Description

Voice signal detection method for a digital audio system

Technical field

The present invention relates to signal input in digital audio systems, and in particular to a voice signal detection method.

Background art

In a digital audio system, voice signal processing is based on activity detection, which requires distinguishing the voice signal from noise.

Current detection methods mainly use statistical properties of the voice signal, such as amplitude, energy, zero-crossing rate, quasi-periodicity, frequency characteristics, and correlation, and make the decision according to the maximum-likelihood criterion. All of these methods extract or transform characteristic parameters that can distinguish voice from noise so as to obtain clearly different results, and thereby find the boundary between the two.

At a high signal-to-noise ratio, the energy of the voice signal is always greater than that of the background noise, and the energy detection method performs well. When the signal-to-noise ratio deteriorates, however, noise may be judged to be signal, and the probability of false detection is high.

A voice signal is continuous and strongly correlated over short time spans, whereas noise is randomly distributed, so the marked difference in zero-crossing rate can be used to detect voice signals. However, the method is affected by mixed speech; in particular, the zero-crossing rate of unvoiced sounds is comparable to that of noise, so in some voice environments the probability of correct detection with the zero-crossing-rate method is low.

In voice signal detection, methods based on amplitude, energy, or zero-crossing rate are simple to implement but perform poorly, whereas methods based on quasi-periodicity, correlation, or frequency characteristics can achieve good detection performance but are computationally expensive and place higher demands on hardware. Existing detection methods are often based on a binary decision criterion: the decision relies on a single threshold variable, so the setting of that single decision threshold is critical, and the decision result is either voice signal or noise, which leads to high false-alarm and missed-detection probabilities.

Summary of the invention

The technical problem to be solved by the present invention is to provide, in view of the deficiencies of the prior art, a voice signal detection method for a digital audio system.

To solve the above technical problem, the technical solution adopted by the present invention is a voice signal detection method for a digital audio system, mainly implemented as follows: a sample absolute-value mean is obtained from each frame of the digital audio signal, and the mean absolute deviation of the digital audio signal is calculated from the sample absolute-value means of multiple frames; if the mean absolute deviation remains below a preset noise threshold throughout the set noise detection time, the detected digital audio signal is judged to be noise; if the mean absolute deviation remains above the noise threshold throughout the set voice signal detection time, the detected digital audio signal is judged to be a voice signal; in all other cases, the result is an intermediate state.
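The decision metric described above can be illustrated with a minimal Python sketch. The function names `frame_abs_mean` and `mean_absolute_deviation` and the toy sample values are assumptions introduced for illustration only; they do not appear in the patent.

```python
from typing import Sequence


def frame_abs_mean(frame: Sequence[float]) -> float:
    """Mean of the absolute sample values of one frame (hypothetical helper)."""
    return sum(abs(s) for s in frame) / len(frame)


def mean_absolute_deviation(abs_means: Sequence[float]) -> float:
    """Mean absolute deviation of several per-frame absolute-value means."""
    centre = sum(abs_means) / len(abs_means)
    return sum(abs(x - centre) for x in abs_means) / len(abs_means)


# Toy data: four frames with a steady level (noise-like) and four frames
# whose level swings from frame to frame (voice-like).
steady = [[1, -1, 1, -1]] * 4
swinging = [[0, 0, 0, 0], [8, -8, 8, -8], [0, 0, 0, 0], [8, -8, 8, -8]]

d_steady = mean_absolute_deviation([frame_abs_mean(f) for f in steady])
d_swinging = mean_absolute_deviation([frame_abs_mean(f) for f in swinging])
```

In this toy case the steady frames give a deviation of zero, while the swinging frames give a clearly larger value, which is the property the threshold comparison relies on.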

Compared with the prior art, the present invention has the following beneficial effects: it exploits the marked differences between voice signals and noise in energy and stationarity, and uses the mean absolute deviation as the detection decision metric to distinguish noise from voice signals, which reduces computational complexity and hardware requirements while maintaining good detection performance; in addition, the voice signal detection frame count and the noise detection frame count can be configured flexibly for different application environments, and the detection result is divided into three states, which improves the probability of correct detection.

Brief description of the drawings

Fig. 1 is a flowchart of the voice signal detection method of the present invention.

Detailed description of the embodiments

As shown in Fig. 1, the method of the present invention is implemented through the following steps:

1) calculate the absolute-value mean X[n] of the n-th frame of the digital audio signal;

2) using the absolute-value means of N frames of the digital audio signal, calculate the mean absolute deviation $d = \frac{1}{N}\sum_{n=1}^{N}\left|X[n]-\bar{X}\right|$ (denoted d[n] for the n-th frame in step 3), where n = 1, 2, ..., N, Σ denotes summation, and $\bar{X}$ is the mean of X[1] to X[N]; to further improve detection accuracy, the value of N in the present invention depends on the sampling rate of the digital audio signal: N is 16 when the sampling rate is 176.4 kHz; N is 8 when the sampling rate is 96 kHz or 88.2 kHz; and N is 4 when the sampling rate is 48 kHz or 32 kHz;

3) compare the mean absolute deviation d[n] of the n-th frame of the digital audio signal with the preset noise threshold Th; the mean absolute deviation measured when the digital audio system has no input signal can be used as the preset noise threshold; when d[n] is greater than or equal to Th, set the noise frame counter noisecnt = 0 and the voice signal frame counter signalcnt = signalcnt + 1; otherwise, set noisecnt = noisecnt + 1 and signalcnt = 0;

4) if noisecnt is less than the preset noise detection frame count and signalcnt is less than the preset voice signal detection frame count, the detection result for the n-th frame is the intermediate state; if signalcnt is greater than or equal to the preset voice signal detection frame count (corresponding to the voice signal detection time), the n-th frame is a voice signal and signalcnt is reset to 0; if noisecnt is greater than or equal to the preset noise detection frame count (corresponding to the noise detection time), the n-th frame is noise and noisecnt is reset to 0; in the intermediate state, the detection result for the n-th frame is the same as that of the previous frame;

5) save signalcnt, noisecnt, and the detection result for the n-th frame, then return to step 1) and continue detecting the digital audio signal to be detected.
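Steps 1) to 5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `VoiceDetector`, the string labels, the result reported before the first decision (assumed to be noise), and the use of a partially filled window during the first N-1 frames are all choices made here for illustration. The threshold and frame counts in the usage lines are toy values.

```python
from collections import deque

NOISE, VOICE = "noise", "voice"

# Window length N for each sampling rate in Hz, as listed in step 2).
N_BY_RATE = {176400: 16, 96000: 8, 88200: 8, 48000: 4, 32000: 4}


class VoiceDetector:
    """Hypothetical frame-by-frame detector following steps 1) to 5)."""

    def __init__(self, sample_rate, threshold, signal_frames, noise_frames,
                 initial=NOISE):
        self.window = deque(maxlen=N_BY_RATE[sample_rate])
        self.threshold = threshold          # Th
        self.signal_frames = signal_frames  # voice signal detection frame count
        self.noise_frames = noise_frames    # noise detection frame count
        self.signalcnt = 0
        self.noisecnt = 0
        self.result = initial               # assumption: result before any decision

    def process(self, frame):
        # Step 1: absolute-value mean X[n] of the current frame.
        x = sum(abs(s) for s in frame) / len(frame)
        self.window.append(x)
        # Step 2: mean absolute deviation over the most recent frames
        # (fewer than N while the window is still filling: an assumption).
        mean = sum(self.window) / len(self.window)
        d = sum(abs(v - mean) for v in self.window) / len(self.window)
        # Step 3: update the two counters against the threshold.
        if d >= self.threshold:
            self.noisecnt = 0
            self.signalcnt += 1
        else:
            self.noisecnt += 1
            self.signalcnt = 0
        # Step 4: decide; in the intermediate state the previous result is kept.
        if self.signalcnt >= self.signal_frames:
            self.result = VOICE
            self.signalcnt = 0
        elif self.noisecnt >= self.noise_frames:
            self.result = NOISE
            self.noisecnt = 0
        # Step 5: counters and result persist on the object for the next frame.
        return self.result


quiet = [1, -1, 1, -1]
loud = [9, -9, 9, -9]
frames = [quiet] * 3 + [loud, quiet, loud, quiet] + [quiet] * 5

det = VoiceDetector(48000, threshold=1.0, signal_frames=2, noise_frames=3)
results = [det.process(f) for f in frames]
```

With these toy values the result switches to voice one frame after the level starts to fluctuate, and returns to noise only after three consecutive low-deviation frames, which shows the hysteresis provided by the intermediate state.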

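In the present invention, voice signal detection frame count = signal detection time × sampling rate / 1000 / frame length, and noise detection frame count = noise detection time × sampling rate / 1000 / frame length. The signal detection time and the noise detection time can be set flexibly according to the requirements of the application environment. The signal detection time is stated in seconds and the sampling rate in 1/second; note, however, that the division by 1000 is dimensionally consistent only when the times are entered in milliseconds.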

For a household application scenario, the parameters of the method may be set as follows:

Parameter	Value
Th	0x200
Frame length	256
Samplerate (sampling rate)	48000 Hz (48 kHz)
Noise detection time	3 s
Noise detection frame count	Noise detection time × Samplerate / 1000 / frame length
Signal detection time	50 ms
Signal detection frame count	Signal detection time × Samplerate / 1000 / frame length
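As a worked example of the two frame-count formulas with the parameter values listed above, the following Python sketch may help. It assumes that the times are entered in milliseconds (so that dividing by 1000 yields seconds) and that fractional results are rounded up; the text specifies neither the rounding rule nor the function name `detection_frames`, both of which are assumptions made here.

```python
import math


def detection_frames(time_ms: float, sample_rate_hz: int, frame_len: int) -> int:
    """Frame count = time x sampling rate / 1000 / frame length.

    Rounding up is an assumption; the source does not state a rounding rule.
    """
    return math.ceil(time_ms * sample_rate_hz / 1000 / frame_len)


# 3 s noise detection time: 3000 * 48000 / 1000 / 256 = 562.5 frames.
noise_frames = detection_frames(3000, 48000, 256)
# 50 ms signal detection time: 50 * 48000 / 1000 / 256 = 9.375 frames.
signal_frames = detection_frames(50, 48000, 256)
```

Under these assumptions the detector needs roughly ten consecutive above-threshold frames to declare a voice signal, but several hundred consecutive below-threshold frames to declare noise.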

Each frame of data yields one absolute-value mean; multiple absolute-value means are used to calculate the mean absolute deviation, which serves as the detection decision metric. The mean absolute deviation is compared with the preset noise threshold and combined with the voice signal detection frame count and the noise detection frame count to obtain one of three detection results: voice signal, noise, or intermediate state.

Claims

1. A voice signal detection method for a digital audio system, characterized in that the method is mainly implemented as follows: a sample absolute-value mean is obtained from each frame of the digital audio signal, and the mean absolute deviation of the digital audio signal is calculated from the sample absolute-value means of multiple frames; if the mean absolute deviation remains below a preset noise threshold throughout the set noise detection frame count, the detected digital audio signal is judged to be noise; if the mean absolute deviation remains above the preset noise threshold throughout the set voice signal detection frame count, the detected digital audio signal is judged to be a voice signal; in all other cases, the result is an intermediate state.

2. The voice signal detection method for a digital audio system according to claim 1, characterized in that the method comprises the following steps:

1) calculating the absolute-value mean X[n] of the n-th frame of the digital audio signal;

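2) using the absolute-value means of N frames of the digital audio signal, calculating the mean absolute deviation $d = \frac{1}{N}\sum_{n=1}^{N}\left|X[n]-\bar{X}\right|$ (denoted d[n] for the n-th frame in step 3), where n = 1, 2, ..., N, Σ denotes summation, and $\bar{X}$ is the mean of X[1] to X[N];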

3) taking the mean absolute deviation measured when the digital audio system has no input signal as the preset noise threshold Th, and comparing the mean absolute deviation d[n] with the preset noise threshold Th; when d[n] is greater than or equal to Th, setting the noise frame counter noisecnt = 0 and the voice signal frame counter signalcnt = signalcnt + 1; otherwise, setting noisecnt = noisecnt + 1 and signalcnt = 0;

4) if noisecnt is less than the preset noise detection frame count and signalcnt is less than the preset voice signal detection frame count, judging the detection result for the n-th frame of the digital audio signal to be the intermediate state; if signalcnt is greater than or equal to the preset voice signal detection frame count, judging the n-th frame of the digital audio signal to be a voice signal and resetting signalcnt to 0; if noisecnt is greater than or equal to the preset noise detection frame count, judging the n-th frame of the digital audio signal to be noise and resetting noisecnt to 0;

5) saving signalcnt, noisecnt, and the detection result for the n-th frame of the digital audio signal, then returning to step 1) and continuing to detect the digital audio signal to be detected.

3. The voice signal detection method for a digital audio system according to claim 2, characterized in that the value of N depends on the sampling rate of the digital audio signal: N is 16 when the sampling rate is 176.4 kHz; N is 8 when the sampling rate is 96 kHz or 88.2 kHz; and N is 4 when the sampling rate is 48 kHz or 32 kHz.

4. The voice signal detection method for a digital audio system according to claim 3, characterized in that the voice signal detection frame count = signal detection time × sampling rate / 1000 / frame length.

5. The voice signal detection method for a digital audio system according to claim 3, characterized in that the noise detection frame count = noise detection time × sampling rate / 1000 / frame length.