CN106356071A - Noise detection method and device

Info

Publication number: CN106356071A
Application number: CN201610769237.5A
Authority: CN
Inventors: 刘运
Original assignee: Guangzhou Baiguoyuan Network Technology Co Ltd
Current assignee: Bigo Technology Singapore Pte Ltd
Priority date: 2016-08-30
Filing date: 2016-08-30
Publication date: 2017-01-25
Anticipated expiration: 2036-08-30
Also published as: CN106356071B

Abstract

An embodiment of the invention discloses a noise detection method and device. The method comprises: obtaining a to-be-processed audio signal and computing the power spectrum spectrum(ω) of an audio frame in the audio signal, where ω is 2π times the frequency of the power spectrum; computing an autocorrelation-like spectrum corr(τ) from the power spectrum of the audio frame, where τ is a time value; computing an enhanced correlation spectrum ecorr(τ) from corr(τ); and obtaining the maximum value max(ecorr) of ecorr(τ), the audio frame being determined to be noise if max(ecorr) of each of a predetermined number of consecutive audio frames is smaller than a first threshold; or obtaining the τ corresponding to max(ecorr), the audio frame being determined to be noise if that τ is not within a preset threshold range. Because noise is identified on the basis of the enhanced correlation spectrum ecorr(τ), it can be distinguished from music and the human voice, which provides a basis for noise reduction.

Description

Noise detection method and device

Technical field

The present invention relates to the field of computer technology, and in particular to a noise detection method and device.

Background

Live-streaming applications on mobile phones are becoming increasingly popular, but the audio signal in a live stream differs considerably from that in a voice call. A phone call transmits only speech data, whereas a live stream does not simply transmit speech data: the streamer may sing or perform during the stream, and there may also be musical accompaniment or on-site accompaniment at the same time.

Web real-time communication (WebRTC) technology uses noise reduction, specifically as follows: WebRTC uses a spectral flatness parameter and the degree of spectral change between adjacent frames to calculate a speech/noise probability for each frequency bin, then updates the noise spectrum, and finally removes the noise by Wiener filtering.

However, WebRTC performs noise reduction for speech. When there is music in the background sound, especially a musical passage whose spectrum is essentially unchanging (such as a long note on a bowed string instrument), the noise spectrum is updated incorrectly and the passage is suppressed, which damages the music. Moreover, although ordinary autocorrelation detection can detect the correlation peaks of music, most environmental noise is pink noise, and the correlation peaks of music are not prominent in the autocorrelation spectrum of pink noise, so the autocorrelation spectrum is rarely used to distinguish music from noise.

There is therefore an urgent need for a noise detection scheme that remains accurate when the audio signal contains several types of audio data, such as voice and music, for example in live-streaming scenarios, so as to provide a basis for noise reduction.

Summary of the invention

Embodiments of the present invention provide a noise detection method and device for accurately identifying noise.

In a first aspect, an embodiment of the present invention provides a noise detection method, comprising:

obtaining a to-be-processed audio signal, and calculating the power spectrum spectrum(ω) of an audio frame in the audio signal, where ω is 2π times the frequency of the power spectrum;

calculating an autocorrelation-like spectrum corr(τ) from the power spectrum of the audio frame, where τ is a time value;

calculating an enhanced correlation spectrum ecorr(τ) from the autocorrelation-like spectrum corr(τ);

obtaining the maximum value max(ecorr) of ecorr(τ), and determining that the audio frame is noise if max(ecorr) of each of a predetermined number of consecutive audio frames is smaller than a first threshold, the first threshold being a threshold on the enhanced correlation spectrum;

or obtaining the τ corresponding to max(ecorr), and determining that the audio frame is noise if that τ is not within a preset threshold range, the preset threshold range being a preset time range.

In an optional implementation, calculating the enhanced correlation spectrum ecorr(τ) from the autocorrelation-like spectrum corr(τ) includes:

setting values of corr(τ) that are less than 0 to 0, and then calculating an enhanced spectrum ecorr(τ);

setting values of ecorr(τ) that are less than 0 to 0, to obtain the enhanced correlation spectrum ecorr(τ).

In an optional implementation, calculating the enhanced spectrum ecorr(τ) includes:

calculating ecorr(τ) according to ecorr(τ) = corr(τ) - corr(τ/2), where, if τ is odd, corr(τ/2) is obtained by interpolating neighboring values.

In an optional implementation, calculating the autocorrelation-like spectrum corr(τ) from the power spectrum of the audio frame includes:

calculating the cube root of each frequency bin of spectrum(ω), applying a fast Fourier transform to the cube roots of the frequency bins of spectrum(ω), and taking the real part, to obtain corr(τ).

In an optional implementation, before determining that the audio frame is noise, the method further includes:

calculating the average distance d between the amplitude spectrum s of the audio frame and the amplitude spectrum of the noise spectrum n, d = 20(log10(s) - log10(n)); and determining that the audio frame is noise if d is smaller than a second threshold and max(ecorr) of each of the predetermined number of consecutive audio frames is smaller than the first threshold, or if d is smaller than the second threshold and the τ corresponding to max(ecorr) is not within the preset threshold range, the preset threshold range being a preset time range.

In an optional implementation, the method further includes:

if the audio frame is determined to be noise, determining a new noise spectrum by window averaging.

In an optional implementation, after determining the new noise spectrum, the method further includes:

performing Wiener filtering on the audio frames of the audio signal using the new noise spectrum.

In an optional implementation, the method further includes:

if the audio frame is not determined to be noise, determining that the audio frame is voice or music.

In an optional implementation, before determining that the audio frame is voice or music, the method further includes:

if d is greater than the second threshold and the audio frame is not determined to be noise, determining that the audio frame is voice or music.

In an optional implementation, the method further includes:

if the audio frame is not determined to be voice or music, updating the first threshold by window averaging using the ecorr(τ) of the audio frame.

In a second aspect, an embodiment of the present invention further provides a noise detection device, comprising:

a signal acquiring unit, configured to obtain a to-be-processed audio signal;

a computing unit, configured to: calculate the power spectrum spectrum(ω) of an audio frame in the audio signal, where ω is 2π times the frequency of the power spectrum; calculate an autocorrelation-like spectrum corr(τ) from the power spectrum of the audio frame, where τ is a time value; and calculate an enhanced correlation spectrum ecorr(τ) from the autocorrelation-like spectrum corr(τ);

a signal determining unit, configured to: obtain the maximum value max(ecorr) of ecorr(τ), and determine that the audio frame is noise if max(ecorr) of each of a predetermined number of consecutive audio frames is smaller than a first threshold, the first threshold being a threshold on the enhanced correlation spectrum; or obtain the τ corresponding to max(ecorr), and determine that the audio frame is noise if that τ is not within a preset threshold range, the preset threshold range being a preset time range.

In an optional implementation, the computing unit is specifically configured to set values of corr(τ) that are less than 0 to 0 and then calculate an enhanced spectrum ecorr(τ), and to set values of ecorr(τ) that are less than 0 to 0, to obtain the enhanced correlation spectrum ecorr(τ).

In an optional implementation, the computing unit is specifically configured to calculate ecorr(τ) according to ecorr(τ) = corr(τ) - corr(τ/2), where, if τ is odd, corr(τ/2) is obtained by interpolating neighboring values.

In an optional implementation, the computing unit is specifically configured to calculate the cube root of each frequency bin of spectrum(ω), apply a fast Fourier transform to the cube roots of the frequency bins of spectrum(ω), and take the real part, to obtain corr(τ).

In an optional implementation, the computing unit is further configured to calculate, before the signal determining unit determines that the audio frame is noise, the average distance d between the amplitude spectrum s of the audio frame and the amplitude spectrum of the noise spectrum n, d = 20(log10(s) - log10(n));

the signal determining unit is specifically configured to determine that the audio frame is noise if d is smaller than a second threshold and max(ecorr) of each of the predetermined number of consecutive audio frames is smaller than the first threshold, or if d is smaller than the second threshold and the τ corresponding to max(ecorr) is not within the preset threshold range, the preset threshold range being a preset time range.

In an optional implementation, the device further includes:

a noise spectrum updating unit, configured to determine a new noise spectrum n by window averaging if the signal determining unit determines that the audio frame is noise.

In an optional implementation, the device further includes:

a filter unit, configured to perform Wiener filtering on the audio frames of the audio signal using the new noise spectrum.

In an optional implementation, the signal determining unit is further configured to determine that the audio frame is voice or music if the audio frame is not determined to be noise.

In an optional implementation, the signal determining unit is further configured to determine, before determining that the audio frame is voice or music, that the audio frame is voice or music if d is greater than the second threshold and the audio frame is not determined to be noise.

In an optional implementation, the device further includes:

a threshold updating unit, configured to update the first threshold by window averaging using the ecorr(τ) of the audio frame if the audio frame is not determined to be voice or music.

As can be seen from the above technical solutions, the embodiments of the present invention have the following advantage: noise is accurately identified on the basis of the enhanced correlation spectrum ecorr(τ) and can be distinguished from music and voice, which provides a basis for noise reduction.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a method according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of a method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a device according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a device according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a device according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a device according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

An embodiment of the present invention provides a noise detection method which, as shown in Fig. 1, comprises:

101: obtain a to-be-processed audio signal, and calculate the power spectrum spectrum(ω) of an audio frame in the audio signal, where ω is 2π times the frequency of the power spectrum;

Here spectrum is the function name of the power spectrum, and ω is the argument of the power spectrum function.

102: calculate an autocorrelation-like spectrum corr(τ) from the power spectrum of the audio frame, where τ is a time value;

Here corr is the function name of the autocorrelation-like spectrum, and τ is the argument of that function.

103: calculate an enhanced correlation spectrum ecorr(τ) from the autocorrelation-like spectrum corr(τ);

The embodiment of the present invention does not uniquely limit how corr(τ) is enhanced once it has been determined. Optional implementations are given in the subsequent embodiments.

104: obtain the maximum value max(ecorr) of ecorr(τ), and determine that the audio frame is noise if max(ecorr) of each of a predetermined number of consecutive audio frames is smaller than a first threshold, the first threshold being a threshold on the enhanced correlation spectrum;

or obtain the τ corresponding to max(ecorr), and determine that the audio frame is noise if that τ is not within a preset threshold range, the preset threshold range being a preset time range.

The embodiment of the present invention accurately identifies noise on the basis of the enhanced correlation spectrum ecorr(τ) and can distinguish noise from music and voice, which provides a basis for noise reduction.
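The decision rule of step 104 can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `is_noise_frame`, its parameter names, and the choice to express the preset time range as a pair of lag indices are assumptions made for the example, and the two alternative conditions are simply combined with a logical "or".

```python
def is_noise_frame(ecorr_history, first_threshold, tau_range, run_length):
    """Apply the two alternative noise conditions of step 104.

    ecorr_history: list of ecorr(tau) sequences, oldest first, newest last.
    first_threshold: threshold on max(ecorr) (hypothetical value chosen by the caller).
    tau_range: (tau_lo, tau_hi) lag indices treated as the preset time range.
    run_length: the predetermined number of consecutive frames.
    """
    recent = ecorr_history[-run_length:]
    # Condition 1: max(ecorr) is below the first threshold for every one of
    # the last `run_length` frames.
    all_weak = len(recent) == run_length and all(
        max(frame) < first_threshold for frame in recent
    )
    # Condition 2: the lag of the current frame's maximum is outside the range.
    current = ecorr_history[-1]
    peak_tau = max(range(len(current)), key=lambda t: current[t])
    tau_lo, tau_hi = tau_range
    out_of_range = not (tau_lo <= peak_tau <= tau_hi)
    return all_weak or out_of_range


weak_frames = [[0.0, 0.1, 0.05, 0.0]] * 3    # no clear correlation peak
tonal_frames = [[0.0, 0.0, 2.0, 0.0]] * 3    # strong peak at lag 2
print(is_noise_frame(weak_frames, 0.5, (1, 3), 3))
print(is_noise_frame(tonal_frames, 0.5, (1, 3), 3))
```

Keeping a short history of ecorr(τ) sequences lets the caller evaluate the "consecutive frames" condition without storing any state inside the function.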

Optionally, as one optional implementation, an embodiment of the present invention provides a scheme for calculating ecorr(τ) from corr(τ). It should be noted that performing the enhancement by other means does not affect the implementation of the embodiments of the present invention, which impose no unique limitation in this respect. Calculating the enhanced correlation spectrum ecorr(τ) from the autocorrelation-like spectrum corr(τ) includes:

setting values of corr(τ) that are less than 0 to 0, and then calculating an enhanced spectrum ecorr(τ);

setting values of ecorr(τ) that are less than 0 to 0, to obtain the enhanced correlation spectrum ecorr(τ).

The above way of enhancing corr(τ) requires little computation and can serve as a preferred implementation.

Optionally, an embodiment of the present invention further provides a scheme for calculating ecorr(τ) that improves the quality of the ecorr(τ) obtained in the subsequent calculation, specifically as follows. Calculating the enhanced spectrum ecorr(τ) includes:

calculating ecorr(τ) according to ecorr(τ) = corr(τ) - corr(τ/2), where, if τ is odd, corr(τ/2) is obtained by interpolating neighboring values.

This scheme for calculating ecorr(τ) improves the accuracy of ecorr(τ) while requiring little computation, which suits application scenarios such as live streaming where a large amount of data must be processed.
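The enhancement described above (clip, subtract corr(τ/2), clip again) can be sketched as follows. The function name `enhanced_correlation` is hypothetical, τ is treated as an integer lag index, and the value of corr(τ/2) at odd τ is taken as the mean of the two neighboring samples; linear interpolation is an assumption here, since the text does not specify the interpolation used.

```python
def enhanced_correlation(corr):
    """Return ecorr(tau) = max(clip(corr)(tau) - clip(corr)(tau / 2), 0)."""
    # First clipping: negative correlation values are set to 0.
    clipped = [max(c, 0.0) for c in corr]
    ecorr = []
    for tau in range(len(clipped)):
        half = tau // 2
        if tau % 2 == 0:
            stretched = clipped[half]
        else:
            # Odd tau: tau / 2 lies halfway between two samples.
            # Assumption: use the mean of the two neighbors.
            stretched = 0.5 * (clipped[half] + clipped[half + 1])
        # Second clipping: negative differences are set to 0.
        ecorr.append(max(clipped[tau] - stretched, 0.0))
    return ecorr


corr = [4.0, -1.0, 2.0, 0.0, 3.0, 0.0, 1.0, 0.0]
print(enhanced_correlation(corr))
```

Subtracting corr(τ/2) cancels the broad peak around τ = 0 and the repeated peaks at multiples of a period, so the remaining maxima are easier to compare against a threshold.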

Optionally, an embodiment of the present invention further provides a preferred implementation for calculating corr(τ), specifically as follows. Calculating the autocorrelation-like spectrum corr(τ) from the power spectrum of the audio frame includes:

calculating the cube root of each frequency bin of spectrum(ω), applying a fast Fourier transform to the cube roots of the frequency bins of spectrum(ω), and taking the real part, to obtain corr(τ).
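As a rough illustration of this calculation, the sketch below compresses each power-spectrum bin with a cube root, transforms the result, and keeps the real part. The function name `autocorrelation_like_spectrum` is hypothetical, and a naive O(N^2) discrete Fourier transform stands in for the fast Fourier transform so that the example needs only the standard library; a real implementation would use an FFT routine.

```python
import cmath
import math


def autocorrelation_like_spectrum(spectrum):
    """Return corr(tau) = Re(DFT(spectrum ** (1/3))) for tau = 0 .. N-1."""
    n = len(spectrum)
    # Cube-root compression of every frequency bin (power values are >= 0;
    # tiny negative rounding errors are clamped to 0).
    compressed = [max(p, 0.0) ** (1.0 / 3.0) for p in spectrum]
    corr = []
    for tau in range(n):
        acc = sum(
            compressed[k] * cmath.exp(-2j * math.pi * k * tau / n)
            for k in range(n)
        )
        corr.append(acc.real)  # keep only the real part
    return corr


# Power spectrum of a tone with 4 cycles per 32-sample frame:
# energy only in bin 4 and its mirror bin 28.
spectrum = [0.0] * 32
spectrum[4] = spectrum[28] = 256.0
corr = autocorrelation_like_spectrum(spectrum)
print(round(corr[0], 6), round(corr[8], 6))  # lag 8 is one period of the tone
```

Because the input spectrum is real and symmetric, the transform is real up to rounding, which is why taking the real part loses no information in this example.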

Further, an embodiment of the present invention provides a scheme that additionally uses the amplitude spectrum s of the audio frame and the amplitude spectrum of the noise spectrum n as reference values for determining noise, which can further improve the accuracy of noise determination, specifically as follows. Before determining that the audio frame is noise, the method further includes:

calculating the average distance d between the amplitude spectrum s of the audio frame and the amplitude spectrum of the noise spectrum n, d = 20(log10(s) - log10(n)); and determining that the audio frame is noise if d is smaller than a second threshold and max(ecorr) of each of the predetermined number of consecutive audio frames is smaller than the first threshold, or if d is smaller than the second threshold and the τ corresponding to max(ecorr) is not within the preset threshold range, the preset threshold range being a preset time range.
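The distance d can be sketched as below, under the assumption that "average" means the mean of the per-bin values 20(log10(s) - log10(n)) over all frequency bins. The function name `average_distance_db` and the small `floor` that guards against log10(0) are additions made for the example and are not part of the described method.

```python
import math


def average_distance_db(frame_amplitude, noise_amplitude, floor=1e-12):
    """Mean over bins of 20 * (log10(s) - log10(n)), i.e. a distance in dB."""
    total = 0.0
    for s, n in zip(frame_amplitude, noise_amplitude):
        # `floor` avoids taking the logarithm of zero (assumption).
        total += 20.0 * (math.log10(max(s, floor)) - math.log10(max(n, floor)))
    return total / len(frame_amplitude)


# A frame whose amplitude is 10 times the noise estimate in every bin
# lies 20 dB above the noise spectrum.
print(average_distance_db([10.0, 10.0, 10.0], [1.0, 1.0, 1.0]))
```

A small d means the frame is close to the current noise estimate, which is why d is compared with the second threshold before a frame is accepted as noise.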

Further, an embodiment of the present invention provides an implementation for updating the noise spectrum. Updating the noise spectrum makes the next noise determination more accurate and also provides an accurate basis for subsequent noise reduction, specifically as follows. The method further includes:

if the audio frame is determined to be noise, determining a new noise spectrum by window averaging.

Window averaging means computing an average using a window as the reference. For example, suppose the window size is 8 and the audio frames previously determined to be noise are numbered 1 to 8 in time order. If the current audio frame is also noise, it is numbered 9; audio frames 2 to 9 then form the new noise window, and the average of the noise spectra of audio frames 2 to 9 is calculated.
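The window-averaging example above can be reproduced with a bounded queue. The class name `WindowAverage` and its `update` method are hypothetical; the sketch only illustrates the sliding mean over the most recent noise frames.

```python
from collections import deque


class WindowAverage:
    """Per-bin mean over the most recent `size` noise spectra."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=size)

    def update(self, spectrum):
        """Add a newly detected noise frame and return the averaged spectrum."""
        self.frames.append(list(spectrum))
        count = len(self.frames)
        return [sum(column) / count for column in zip(*self.frames)]


# Frames 1..9 with a window of 8: after frame 9 the window holds frames 2..9.
window = WindowAverage(8)
for index in range(1, 10):
    noise_spectrum = window.update([float(index), 2.0 * index])
print(noise_spectrum)
```

With frame i represented by the two-bin spectrum [i, 2i], the final average covers frames 2 to 9 only, matching the example in the text.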

Further, an embodiment of the present invention provides a specific implementation of noise reduction, as follows. After determining the new noise spectrum, the method further includes:

performing Wiener filtering on the audio frames of the audio signal using the new noise spectrum.

Wiener filtering is a commonly used means of noise reduction. Combined with the accurate new noise spectrum of the embodiments of the present invention, it achieves better noise reduction, the noise reduction does not damage music or voice, the quality of the audio signal is improved, and the scheme suits complex application scenarios, such as live streaming, that contain music, voice, and noise.
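The text does not give the form of the Wiener filter. As one common textbook variant, and purely as an assumption, the sketch below derives a per-bin gain G = max((P - N) / P, floor) from the frame power P and the noise power estimate N; the gain would then be multiplied onto the frame's complex spectrum before the inverse transform. The function name `wiener_gains` and the `gain_floor` parameter are hypothetical.

```python
def wiener_gains(frame_power, noise_power, gain_floor=0.0):
    """Per-bin suppression gains from frame power and noise power estimates."""
    gains = []
    for p, n in zip(frame_power, noise_power):
        if p <= 0.0:
            # Empty bin: nothing to keep.
            gains.append(gain_floor)
            continue
        # Estimated clean-signal share of the bin, never below the floor.
        gains.append(max((p - n) / p, gain_floor))
    return gains


# Bins well above the noise estimate are kept, bins at or below it are removed.
print(wiener_gains([4.0, 1.0, 0.5], [1.0, 1.0, 1.0]))
```

A non-zero `gain_floor` is often used in practice to limit audible artifacts, at the cost of leaving some residual noise.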

Further, an embodiment of the present invention covers the case in which a frame is determined not to be noise, specifically as follows. The method further includes:

if the audio frame is not determined to be noise, determining that the audio frame is voice or music.

In this embodiment, "not determined to be noise" covers every case in which the conditions for determining noise are not satisfied, that is, every case in which the embodiment of the present invention does not determine the audio frame to be noise.

Further, to improve the accuracy of determining that an audio frame is voice or music, an embodiment of the present invention provides an implementation that also takes into account the average distance d between the amplitude spectrum s of the audio frame and the amplitude spectrum of the noise spectrum n, specifically as follows. Before determining that the audio frame is voice or music, the method further includes:

if d is greater than the second threshold and the audio frame is not determined to be noise, determining that the audio frame is voice or music.

In this embodiment, if d is not greater than the second threshold, it may be considered that the audio frame can be determined neither to be noise nor to be voice or music.

Further, given that voice or music has already been determined accurately, an embodiment of the present invention provides a scheme for updating the threshold, which can further improve the accuracy of determining the type of subsequent audio frames, specifically as follows. The method further includes:

if the audio frame is not determined to be voice or music, updating the first threshold by window averaging using the ecorr(τ) of the audio frame.

The application scenarios of the embodiments of the present invention mainly concern real-time high-definition speech processing. With the growth of Internet services such as live video streaming, detecting music and reducing noise around it is becoming a new requirement. Besides enhancing voice, noise reduction technology must also be able to detect music in the environment and minimize the damage done to music during noise reduction. The embodiments of the present invention use an enhanced autocorrelation to remove the influence of pink noise and improve the ability to detect musical sound. Based on this enhanced autocorrelation, a strategy for updating the noise spectrum is proposed so that, after Wiener filtering, most background noise is filtered out while the music is protected from damage. Because Wiener filtering itself requires a transform to the frequency domain, the enhanced autocorrelation can be calculated from the existing spectral data without significantly increasing the computation of the filtering, and the scheme can run smoothly on a handheld device. The details are shown in Fig. 2.

The technical solution of the embodiments of the present invention is divided into two parts: enhanced autocorrelation and noise spectrum update. The steps for calculating the enhanced autocorrelation are:

201: calculate the power spectrum of the current frame using a fast Fourier transform (FFT), obtaining spectrum(ω);

The current frame is the audio frame currently extracted from the audio signal.

202: take the cube root of each frequency bin, obtaining (spectrum(ω))^(1/3);

203: apply an FFT to (spectrum(ω))^(1/3) and take the real part, obtaining the autocorrelation-like spectrum corr(τ);

204: set values of corr(τ) that are less than 0 to 0, and then calculate the enhanced spectrum ecorr(τ);

ecorr(τ) = corr(τ) - corr(τ/2); when τ is odd, corr(τ/2) is obtained by interpolating neighboring values.

205: set values of ecorr(τ) that are less than 0 to 0, which gives the final enhanced correlation spectrum ecorr(τ);
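Step 201, on which the later steps build, can be sketched as follows. The function name `power_spectrum` is hypothetical, no analysis window is applied, and a naive O(N^2) discrete Fourier transform is used in place of an FFT so that the example stays within the standard library; these are simplifications for illustration only.

```python
import cmath
import math


def power_spectrum(frame):
    """Return |X[k]|^2 for every DFT bin k of a real-valued audio frame."""
    n = len(frame)
    spectrum = []
    for k in range(n):
        acc = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


# A pure tone with 4 cycles per 32-sample frame concentrates its power in bin 4.
frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
spectrum = power_spectrum(frame)
peak_bin = max(range(len(spectrum) // 2), key=lambda k: spectrum[k])
print(peak_bin)
```

The resulting list plays the role of spectrum(ω) in steps 202 to 205.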

The steps for updating the noise spectrum are:

206: detect the maximum value max(ecorr) in ecorr(τ) and its corresponding τ;

207: if the maxima of consecutive frames are all smaller than the first threshold, or τ is not within the set threshold range, the frame is judged to be noise; otherwise it is judged to be music/voice. The first threshold is a threshold on the enhanced correlation spectrum;

208: calculate the average distance d between the amplitude spectrum s of the current frame and the amplitude spectrum of the noise spectrum n;

where d = 20(log10(s) - log10(n));

209: if d is smaller than the second threshold and step 207 judged the frame to be noise, the current frame is determined to be noise;

210: if d is greater than the second threshold and step 207 judged the frame to be music or voice, the current frame is determined to be music or voice;

211: if neither 209 nor 210 applies, the sound type is judged to be uncertain.

212: if the frame was judged to be noise in 209 or to be of uncertain type in 211, the enhanced correlation spectrum ecorr(τ) of the current frame is used to update the second threshold used in steps 209 and 210. The update may use window averaging.

213: if the frame was judged to be noise in 209, the noise spectrum is updated using the result of 209. The update may use window averaging.

214: Wiener filtering is performed on the input audio frame using the updated noise spectrum, giving the denoised audio signal.
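The three-way decision of steps 209 to 211 and the update routing of steps 212 and 213 can be summarized in a short sketch. The function names, the string labels, and the dictionary keys are hypothetical; the outcome of step 207 is passed in as a boolean, and the sketch only records which updates fire for each label, not how the threshold itself is recomputed.

```python
def classify_frame(d, second_threshold, step_207_says_noise):
    """Combine the distance d with the step 207 result (steps 209 to 211)."""
    if d < second_threshold and step_207_says_noise:
        return "noise"            # step 209
    if d > second_threshold and not step_207_says_noise:
        return "music_or_voice"   # step 210
    return "uncertain"            # step 211


def updates_for(label):
    """Which updates follow from the label (steps 212 and 213)."""
    return {
        # Step 212: threshold update for noise and uncertain frames.
        "update_threshold": label in ("noise", "uncertain"),
        # Step 213: noise spectrum update only for noise frames.
        "update_noise_spectrum": label == "noise",
    }


for d, flag in [(1.0, True), (9.0, False), (9.0, True)]:
    label = classify_frame(d, 6.0, flag)
    print(label, updates_for(label))
```

Requiring both indicators to agree before labelling a frame keeps ambiguous frames out of the noise spectrum estimate, so a sustained musical note that happens to be quiet is less likely to be learned as noise.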

The embodiments of the present invention can clearly distinguish music from pink noise without significantly increasing the amount of computation, reducing the damage that noise reduction does to the music in noisy music.

An embodiment of the present invention further provides a noise detection device which, as shown in Fig. 3, comprises:

a signal acquiring unit 301, configured to obtain a to-be-processed audio signal;

a computing unit 302, configured to: calculate the power spectrum spectrum(ω) of an audio frame in the audio signal, where ω is 2π times the frequency of the power spectrum; calculate an autocorrelation-like spectrum corr(τ) from the power spectrum of the audio frame, where τ is a time value; and calculate an enhanced correlation spectrum ecorr(τ) from the autocorrelation-like spectrum corr(τ);

a signal determining unit 303, configured to: obtain the maximum value max(ecorr) of ecorr(τ), and determine that the audio frame is noise if max(ecorr) of each of a predetermined number of consecutive audio frames is smaller than a first threshold, the first threshold being a threshold on the enhanced correlation spectrum; or obtain the τ corresponding to max(ecorr), and determine that the audio frame is noise if that τ is not within a preset threshold range, the preset threshold range being a preset time range.

Optionally, as one optional implementation, an embodiment of the present invention provides a scheme for calculating ecorr(τ) from corr(τ). It should be noted that performing the enhancement by other means does not affect the implementation of the embodiments of the present invention, which impose no unique limitation in this respect. The computing unit 302 is specifically configured to set values of corr(τ) that are less than 0 to 0 and then calculate an enhanced spectrum ecorr(τ), and to set values of ecorr(τ) that are less than 0 to 0, to obtain the enhanced correlation spectrum ecorr(τ).

Optionally, an embodiment of the present invention further provides a scheme for calculating ecorr(τ) that improves the quality of the ecorr(τ) obtained in the subsequent calculation, specifically as follows. The computing unit 302 is specifically configured to calculate ecorr(τ) according to ecorr(τ) = corr(τ) - corr(τ/2), where, if τ is odd, corr(τ/2) is obtained by interpolating neighboring values.

Optionally, an embodiment of the present invention further provides a preferred implementation for calculating corr(τ), specifically as follows. The computing unit 302 is specifically configured to calculate the cube root of each frequency bin of spectrum(ω), apply a fast Fourier transform to the cube roots of the frequency bins of spectrum(ω), and take the real part, to obtain corr(τ).

Further, an embodiment of the present invention provides a scheme that additionally uses the amplitude spectrum s of the audio frame and the amplitude spectrum of the noise spectrum n as reference values for determining noise, which can further improve the accuracy of noise determination, specifically as follows. The computing unit 302 is further configured to calculate, before the signal determining unit 303 determines that the audio frame is noise, the average distance d between the amplitude spectrum s of the audio frame and the amplitude spectrum of the noise spectrum n, d = 20(log10(s) - log10(n));

the signal determining unit 303 is specifically configured to determine that the audio frame is noise if d is smaller than a second threshold and max(ecorr) of each of the predetermined number of consecutive audio frames is smaller than the first threshold, or if d is smaller than the second threshold and the τ corresponding to max(ecorr) is not within the preset threshold range, the preset threshold range being a preset time range.

Further, the embodiment of the present invention additionally provides the implementation updating noise spectrum, updates noise spectrum later permissible To make to determine more accurate during noise next time, also provides accurate foundation for subsequently carrying out noise reduction process, specific as follows: as Fig. 4 institute Show, said apparatus also include:

Noise spectrum updating block 401, if determining that above-mentioned audio frame is noise for above-mentioned signal determining unit 303, adopts Determine new noise spectrum n with the average mode of window.

Further, the embodiment of the present invention additionally provides the specific implementation of noise reduction process, as follows: as shown in figure 5, on State device also to include:

Filter unit 501, for carrying out Wiener filtering using above-mentioned new noise spectrum to the audio frame of above-mentioned audio signal.

Further, the embodiment of the present invention additionally provides the application scenarios determining non-noise, specific as follows: above-mentioned signal is true Order unit 303, if be additionally operable to above-mentioned audio frame not to be defined as noise it is determined that above-mentioned audio frame is voice or music.

Further, in order to improve the accuracy determining that audio frame is voice or music, the embodiment of the present invention also provides With reference to the implementation of the average distance d of the amplitude spectrum of the amplitude spectrum s and noise spectrum n of above-mentioned audio frame, specific as follows: above-mentioned Signal determining unit 303, was additionally operable to before the above-mentioned audio frame of above-mentioned determination is voice or music, if above-mentioned d is more than above-mentioned If Second Threshold and above-mentioned audio frame are not defined as noise it is determined that above-mentioned audio frame is voice or music.

Further, once speech or music has been accurately determined, the embodiment of the present invention also provides a scheme for updating the threshold, which can further improve the accuracy of determining the type of subsequent audio frames. Specifically, as shown in Fig. 6, the device further includes:

A threshold updating unit 601, configured to update the first threshold using ecorr(τ) of the audio frame by window averaging if the audio frame is determined to be speech or music.
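The text states only that the first threshold is updated from ecorr(τ) by window averaging. The sketch below assumes one possible reading: keep the peak max(ecorr) of recent speech or music frames and set the threshold to a fixed fraction of their mean. The class name `FirstThresholdTracker` and the `ratio` parameter are hypothetical.

```python
from collections import deque
from typing import Deque, Sequence


class FirstThresholdTracker:
    """Adapts the first threshold to recent speech or music frames."""

    def __init__(self, window_size: int = 16, ratio: float = 0.5,
                 initial: float = 0.0) -> None:
        self.peaks: Deque[float] = deque(maxlen=window_size)
        self.ratio = ratio          # assumed fraction, not given in the source
        self.threshold = initial

    def update(self, ecorr: Sequence[float]) -> float:
        """Call with ecorr of a frame classified as speech or music."""
        self.peaks.append(max(ecorr))
        window_mean = sum(self.peaks) / len(self.peaks)
        self.threshold = self.ratio * window_mean
        return self.threshold
```

With this reading the threshold follows the typical peak height of periodic content, so that quieter or louder recordings do not need a hand-tuned constant.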

The embodiment of the present invention also provides a terminal device which, as shown in Fig. 7, includes an input/output device 701, a processor 702, and a memory 703. The memory 703 can be used to store data input through the input/output device 701 or data to be output through the input/output device 701, and can also provide the cache that the processor 702 needs for data processing.

The processor 702 is configured to obtain an audio signal to be processed and calculate the power spectrum spectrum(ω) of an audio frame in the audio signal, where ω is 2π times the frequency of the power spectrum;

calculate an autocorrelation-like spectrum corr(τ) from the power spectrum of the audio frame, where τ is a time value;

calculate an enhanced correlation spectrum ecorr(τ) from the autocorrelation spectrum corr(τ);

obtain the maximum value max(ecorr) of ecorr(τ), and determine that the audio frame is noise if max(ecorr) of a predetermined number of consecutive audio frames is less than a first threshold, the first threshold being a threshold on the enhanced correlation spectrum; or obtain the τ corresponding to max(ecorr), and determine that the audio frame is noise if that τ is not within a preset threshold range, the preset threshold range being a preset time range.
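The "predetermined number of consecutive audio frames" condition can be tracked frame by frame with a simple run counter. This is a sketch only; the class name `ConsecutiveLowPeakDetector` and its parameters are illustrative assumptions.

```python
from typing import Sequence


class ConsecutiveLowPeakDetector:
    """Counts how many frames in a row had max(ecorr) below the threshold."""

    def __init__(self, first_threshold: float, required_frames: int) -> None:
        self.first_threshold = first_threshold
        self.required_frames = required_frames
        self.run_length = 0

    def push(self, ecorr: Sequence[float]) -> bool:
        """Feed ecorr of the next frame; True means the run is long enough."""
        if max(ecorr) < self.first_threshold:
            self.run_length += 1
        else:
            # A single strong peak breaks the run of noise-like frames.
            self.run_length = 0
        return self.run_length >= self.required_frames
```

Requiring several consecutive frames avoids flagging a short pause in speech or music as noise.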

Optionally, as one optional implementation, the embodiment of the present invention provides a scheme for calculating ecorr(τ) from corr(τ). Note that performing the enhancement in other ways does not affect the implementation of the embodiment, and the embodiment imposes no unique limitation. The processor 702 being configured to calculate the enhanced correlation spectrum ecorr(τ) from the autocorrelation spectrum corr(τ) includes: setting the values less than 0 in corr(τ) to 0 and then calculating an enhancement spectrum ecorr(τ); and setting the values less than 0 in ecorr(τ) to 0 to obtain the enhanced correlation spectrum ecorr(τ).

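Optionally, the embodiment of the present invention also provides a scheme for calculating ecorr(τ) that improves the result of the subsequent ecorr(τ) calculation. Specifically, the processor 702 being configured to calculate the enhancement spectrum ecorr(τ) includes: calculating ecorr(τ) according to ecorr(τ) = corr(τ) - corr(τ/2), where, if τ is odd, corr(τ/2) is obtained by interpolating between neighboring values.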
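The enhancement step stated in claims 12 and 13 (clip negative values, subtract corr(τ/2), clip again) can be sketched as follows. The use of linear interpolation between the two neighboring integer lags for odd τ is an assumption, since the text only says that the value is obtained from neighboring values; the function name is also hypothetical.

```python
from typing import List


def enhanced_correlation(corr: List[float]) -> List[float]:
    """ecorr(tau) = corr(tau) - corr(tau/2), with negatives set to 0."""
    # Step 1: set values of corr below 0 to 0.
    clipped = [max(0.0, c) for c in corr]

    ecorr = []
    for tau in range(len(clipped)):
        if tau % 2 == 0:
            half = clipped[tau // 2]
        else:
            # tau/2 falls between two integer lags; average the neighbors
            # (assumed linear interpolation).
            half = 0.5 * (clipped[tau // 2] + clipped[tau // 2 + 1])
        # Step 2: subtract the stretched spectrum, then clip at 0 again.
        ecorr.append(max(0.0, clipped[tau] - half))
    return ecorr
```

Subtracting corr(τ/2) removes the copy of each peak that reappears at twice its lag, so the remaining maximum is more likely to sit at the true period.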

Optionally, the embodiment of the present invention also provides a preferred implementation for calculating corr(τ). Specifically, the processor 702 being configured to calculate the autocorrelation-like spectrum corr(τ) from the power spectrum of the audio frame includes: calculating the cube root of spectrum(ω) at each frequency, applying a fast Fourier transform to the cube root, and taking the real part to obtain corr(τ).
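The calculation of corr(τ) described in claim 4 (cube root of the power spectrum, Fourier transform, real part) can be illustrated with the sketch below. To stay self-contained it uses a direct O(n^2) discrete Fourier transform from the standard library; a practical implementation would use an FFT routine instead. The function name and the assumption that the power spectrum values are non-negative are illustrative.

```python
import cmath
from typing import List


def autocorrelation_like_spectrum(power_spectrum: List[float]) -> List[float]:
    """Real part of the DFT of the cube-root-compressed power spectrum."""
    n = len(power_spectrum)
    # Cube-root compression; values are assumed to be non-negative.
    compressed = [p ** (1.0 / 3.0) for p in power_spectrum]

    corr = []
    for tau in range(n):
        acc = 0j
        for k, value in enumerate(compressed):
            # Direct DFT term; an FFT computes the same sum faster.
            acc += value * cmath.exp(-2j * cmath.pi * k * tau / n)
        corr.append(acc.real)
    return corr
```

The cube root flattens the spectrum before the transform, so a few very strong spectral lines do not dominate the resulting correlation.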

Further, the embodiment of the present invention also provides a scheme that uses the amplitude spectrum s of the audio frame and the amplitude spectrum of the noise spectrum n as a reference value when determining noise, which can further improve the accuracy of noise determination. Specifically, the processor 702 is further configured to, before determining that the audio frame is noise, calculate the average distance d between the amplitude spectrum s of the audio frame and the amplitude spectrum of the noise spectrum n, d = 20 (log10(s) - log10(n)); and to determine that the audio frame is noise if d is less than a second threshold and max(ecorr) of a predetermined number of consecutive audio frames is less than the first threshold, or if d is less than the second threshold and the τ corresponding to max(ecorr) is not within a preset threshold range, the preset threshold range being a preset time range.
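A minimal sketch of the distance d = 20 (log10(s) - log10(n)) follows. It assumes that "average" means the mean over frequency bins, which the text does not state explicitly; the function name and the small `eps` guard against log10(0) are likewise assumptions.

```python
import math
from typing import List


def average_log_distance(frame_amp: List[float],
                         noise_amp: List[float],
                         eps: float = 1e-12) -> float:
    """Mean over bins of 20 * (log10(s) - log10(n)), in decibels."""
    total = 0.0
    for s, n in zip(frame_amp, noise_amp):
        # eps avoids taking the logarithm of zero in empty bins.
        total += 20.0 * (math.log10(max(s, eps)) - math.log10(max(n, eps)))
    return total / len(frame_amp)
```

A value of d near 0 means the frame sits at the level of the estimated noise, while a large d indicates energy well above the noise floor.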

Further, the embodiment of the present invention also provides an implementation for updating the noise spectrum. An updated noise spectrum makes the next noise determination more accurate and also provides an accurate basis for subsequent noise reduction. Specifically, the processor 702 is further configured to determine a new noise spectrum by window averaging if the audio frame is determined to be noise.

Further, the embodiment of the present invention also provides a specific implementation of noise reduction: the processor 702 is further configured to, after the new noise spectrum is determined, perform Wiener filtering on the audio frames of the audio signal using the new noise spectrum.

Further, the embodiment of the present invention also provides an application scenario for determining non-noise. Specifically, the processor 702 is further configured to determine that the audio frame is speech or music if the audio frame is not determined to be noise.

Further, to improve the accuracy of determining that an audio frame is speech or music, the embodiment of the present invention also provides an implementation that refers to the average distance d between the amplitude spectrum s of the audio frame and the amplitude spectrum of the noise spectrum n. Specifically, the processor 702 is further configured to, before determining that the audio frame is speech or music, check that d is greater than the second threshold; if d is greater than the second threshold and the audio frame is not determined to be noise, the audio frame is determined to be speech or music.

Further, once speech or music has been accurately determined, the embodiment of the present invention also provides a scheme for updating the threshold, which can further improve the accuracy of determining the type of subsequent audio frames. Specifically, the processor 702 is further configured to update the first threshold using ecorr(τ) of the audio frame by window averaging if the audio frame is determined to be speech or music.

The embodiment of the present invention also provides another terminal device, as shown in Fig. 8. For ease of description, only the parts related to the embodiment of the present invention are shown; for technical details that are not disclosed, refer to the method part of the embodiments of the present invention. The terminal device may be any terminal device such as a mobile phone, a tablet computer, a PDA (personal digital assistant), a POS (point of sales) terminal, or a vehicle-mounted computer. The following takes a mobile phone as an example:

Fig. 8 is a block diagram of part of the structure of a mobile phone related to the terminal device provided by the embodiment of the present invention. Referring to Fig. 8, the mobile phone includes components such as a radio frequency (RF) circuit 810, a memory 820, an input unit 830, a display unit 840, a sensor 850, an audio circuit 860, a wireless fidelity (WiFi) module 870, a processor 880, and a power supply 890. Those skilled in the art will understand that the structure shown in Fig. 8 does not limit the mobile phone, which may include more or fewer components than shown, combine some components, or arrange the components differently.

The components of the mobile phone are described in detail below with reference to Fig. 8:

The RF circuit 810 can be used to receive and send signals while information is being sent and received or during a call. In particular, it receives downlink information from the base station and passes it to the processor 880 for processing, and it sends uplink data to the base station. Typically, the RF circuit 810 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 810 can communicate with networks and other devices by wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, and Short Messaging Service (SMS).

The memory 820 can be used to store software programs and modules. The processor 880 executes the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 820. The memory 820 may mainly include a program storage area and a data storage area. The program storage area can store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function). The data storage area can store data created through use of the mobile phone (such as audio data or a phone book). In addition, the memory 820 may include high-speed random access memory and may also include nonvolatile memory, for example at least one disk storage device, a flash memory device, or another volatile solid-state storage device.

The input unit 830 can be used to receive input numeric or character information and to generate key signal inputs related to the user settings and function control of the mobile phone. Specifically, the input unit 830 may include a touch panel 831 and other input devices 832. The touch panel 831, also called a touch screen, can collect touch operations by the user on or near it (for example, operations performed on or near the touch panel 831 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program. Optionally, the touch panel 831 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and sends the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 880; it can also receive and execute commands sent by the processor 880. The touch panel 831 can be implemented in several types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 831, the input unit 830 may also include other input devices 832. Specifically, the other input devices 832 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys or a power key), a trackball, a mouse, a joystick, and the like.

The display unit 840 can be used to display information input by the user, information provided to the user, and the various menus of the mobile phone. The display unit 840 may include a display panel 841, which may optionally be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 831 may cover the display panel 841. When the touch panel 831 detects a touch operation on or near it, it passes the operation to the processor 880 to determine the type of the touch event, and the processor 880 then provides the corresponding visual output on the display panel 841 according to the type of the touch event. Although in Fig. 8 the touch panel 831 and the display panel 841 are two independent components that implement the input and output functions of the mobile phone, in some embodiments the touch panel 831 and the display panel 841 may be integrated to implement the input and output functions of the mobile phone.

The mobile phone may also include at least one sensor 850, such as an optical sensor, a motion sensor, or other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 841 according to the ambient light level, and the proximity sensor can turn off the display panel 841 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that can also be configured on the mobile phone, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described here.

The audio circuit 860, a speaker 861, and a microphone 862 can provide an audio interface between the user and the mobile phone. The audio circuit 860 can convert received audio data into an electrical signal and transmit it to the speaker 861, which converts it into a sound signal for output. Conversely, the microphone 862 converts a collected sound signal into an electrical signal, which the audio circuit 860 receives and converts into audio data. The audio data is then output to the processor 880 for processing and sent through the RF circuit 810 to, for example, another mobile phone, or output to the memory 820 for further processing.

WiFi is a short-range wireless transmission technology. Through the WiFi module 870, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. Although Fig. 8 shows the WiFi module 870, it is understood that the module is not an essential part of the mobile phone and can be omitted as needed without changing the essence of the invention.

The processor 880 is the control center of the mobile phone. It connects all parts of the mobile phone through various interfaces and lines, and it performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 820 and calling the data stored in the memory 820, thereby monitoring the mobile phone as a whole. Optionally, the processor 880 may include one or more processing units. Preferably, the processor 880 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 880.

The mobile phone also includes a power supply 890 (such as a battery) that powers the components. Preferably, the power supply is logically connected to the processor 880 through a power management system, so that functions such as charging, discharging, and power consumption management are handled through the power management system.

Although not shown, the mobile phone may also include a camera, a Bluetooth module, and the like, which are not described here.

In the embodiment of the present invention, the function of the processor 880 included in this terminal device may correspond to that of the processor 702 in the foregoing embodiment. The audio circuit 860 may serve as the input/output device for collecting the audio signal.

It should be noted that, in the device embodiments above, the units are divided only according to functional logic. The division is not limited to the one described, as long as the corresponding functions can be implemented. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.

In addition, those of ordinary skill in the art will understand that all or part of the steps in the method embodiments above can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.

The above are only preferred specific embodiments of the present invention, and the protection scope of the present invention is not limited to them. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the embodiments of the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.

Claims

1. A noise detection method, characterized by comprising:

obtaining an audio signal to be processed, and calculating the power spectrum spectrum(ω) of an audio frame in the audio signal, where ω is 2π times the frequency of the power spectrum;

obtaining the maximum value max(ecorr) of said ecorr(τ), and determining that the audio frame is noise if max(ecorr) of a predetermined number of consecutive audio frames is less than a first threshold, the first threshold being a threshold on the enhanced correlation spectrum;

or, obtaining the τ corresponding to said max(ecorr), and determining that the audio frame is noise if the τ corresponding to said max(ecorr) is not within a preset threshold range, the preset threshold range being a preset time range.

2. The method according to claim 1, characterized in that calculating the enhanced correlation spectrum ecorr(τ) from said autocorrelation spectrum corr(τ) comprises:

3. The method according to claim 2, characterized in that calculating the enhancement spectrum ecorr(τ) comprises:

calculating said ecorr(τ) according to ecorr(τ) = corr(τ) - corr(τ/2), where, if τ is odd, said corr(τ/2) is obtained by interpolating between neighboring values.

4. The method according to claim 1, characterized in that calculating the autocorrelation-like spectrum corr(τ) from the power spectrum of the audio frame comprises:

calculating the cube root of said spectrum(ω) at each frequency, applying a fast Fourier transform to the cube root, and taking the real part to obtain said corr(τ).

5. The method according to any one of claims 1 to 4, characterized in that, before determining that the audio frame is noise, the method further comprises:

calculating the average distance d between the amplitude spectrum s of the audio frame and the amplitude spectrum of the noise spectrum n, d = 20 (log10(s) - log10(n)); and determining that the audio frame is noise if d is less than a second threshold and max(ecorr) of a predetermined number of consecutive audio frames is less than the first threshold, or if d is less than the second threshold and the τ corresponding to said max(ecorr) is not within a preset threshold range, the preset threshold range being a preset time range.

6. The method according to claim 5, characterized in that the method further comprises:

7. The method according to claim 6, characterized in that, after determining the new noise spectrum, the method further comprises:

8. The method according to any one of claims 1 to 4, characterized in that the method further comprises:

9. The method according to claim 8, characterized in that, before determining that the audio frame is speech or music, the method further comprises:

10. The method according to claim 9, characterized in that the method further comprises:

11. A noise detection device, characterized by comprising:

a signal acquiring unit, configured to obtain an audio signal to be processed;

a computing unit, configured to calculate the power spectrum spectrum(ω) of an audio frame in the audio signal, where ω is 2π times the frequency of the power spectrum; calculate an autocorrelation-like spectrum corr(τ) from the power spectrum of the audio frame, where τ is a time value; and calculate an enhanced correlation spectrum ecorr(τ) from said autocorrelation spectrum corr(τ);

a signal determining unit, configured to obtain the maximum value max(ecorr) of said ecorr(τ), and determine that the audio frame is noise if max(ecorr) of a predetermined number of consecutive audio frames is less than a first threshold, the first threshold being a threshold on the enhanced correlation spectrum; or to obtain the τ corresponding to said max(ecorr), and determine that the audio frame is noise if the τ corresponding to said max(ecorr) is not within a preset threshold range, the preset threshold range being a preset time range.

12. The device according to claim 11, characterized in that

the computing unit is specifically configured to set the values less than 0 in said corr(τ) to 0 and then calculate an enhancement spectrum ecorr(τ), and to set the values less than 0 in said ecorr(τ) to 0 to obtain the enhanced correlation spectrum ecorr(τ).

13. The device according to claim 12, characterized in that

the computing unit is specifically configured to calculate said ecorr(τ) according to ecorr(τ) = corr(τ) - corr(τ/2), where, if τ is odd, said corr(τ/2) is obtained by interpolating between neighboring values.

14. The device according to claim 11, characterized in that

the computing unit is specifically configured to calculate the cube root of said spectrum(ω) at each frequency, apply a fast Fourier transform to the cube root, and take the real part to obtain said corr(τ).

15. The device according to any one of claims 11 to 14, characterized in that

the computing unit is further configured to, before the signal determining unit determines that the audio frame is noise, calculate the average distance d between the amplitude spectrum s of the audio frame and the amplitude spectrum of the noise spectrum n, d = 20 (log10(s) - log10(n));

the signal determining unit is specifically configured to determine that the audio frame is noise if d is less than a second threshold and max(ecorr) of a predetermined number of consecutive audio frames is less than the first threshold, or if d is less than the second threshold and the τ corresponding to said max(ecorr) is not within a preset threshold range, the preset threshold range being a preset time range.

16. The device according to claim 15, characterized in that the device further comprises:

a noise spectrum updating unit, configured to determine a new noise spectrum n by window averaging if the signal determining unit determines that the audio frame is noise.

17. The device according to claim 16, characterized in that the device further comprises:

18. The device according to any one of claims 11 to 14, characterized in that

the signal determining unit is further configured to determine that the audio frame is speech or music if the audio frame is not determined to be noise.

19. The device according to claim 18, characterized in that

the signal determining unit is further configured to, before determining that the audio frame is speech or music, check that d is greater than the second threshold; if d is greater than the second threshold and the audio frame is not determined to be noise, the audio frame is determined to be speech or music.

20. The device according to claim 19, characterized in that the device further comprises:

a threshold updating unit, configured to update the first threshold using ecorr(τ) of the audio frame by window averaging if the audio frame is determined to be speech or music.