CN115762547A

CN115762547A - Method, device, coding method, medium and equipment for detecting and eliminating noise

Info

Publication number: CN115762547A
Application number: CN202211212786.4A
Authority: CN
Inventors: 李强; 王尧; 叶东翔; 朱勇
Original assignee: Barrot Wireless Co Ltd
Current assignee: Barrot Wireless Co Ltd
Priority date: 2022-09-29
Filing date: 2022-09-29
Publication date: 2023-03-07

Abstract

The application discloses a method, a device, an encoding method, a medium and equipment for detecting and eliminating noise, belonging to the technical field of Bluetooth. The method comprises: acquiring the original frequency-domain spectral coefficients corresponding to the current frame audio while a codec with modified discrete cosine transform (MDCT) processing encodes or decodes the audio; calculating a pseudo-spectral flatness from the original frequency-domain spectral coefficients; when the pseudo-spectral flatness is greater than a preset threshold, setting a noise update speed parameter according to the pseudo-spectral flatness and calculating the noise energy of the current frame audio; and calculating updated spectral coefficients of the current frame audio from the original frequency-domain spectral coefficients and the noise energy, and performing the subsequent encoding or decoding process according to the updated spectral coefficients. The method reuses the existing encoding process of the encoder and sets the noise update speed parameter from the pseudo-spectral flatness so as to smooth the transition between frames, which reduces power consumption and delay and improves the user experience.

Description

Method, device, coding method, medium and equipment for detecting and eliminating noise

Technical Field

The present application relates to the field of Bluetooth technology, and in particular, to a method, an apparatus, an encoding method, a medium, and a device for detecting and eliminating noise.

Background

Hiss noise is a stationary additive noise spanning the full frequency band that sounds like a continuous hissing. It is often introduced when old recordings are stored digitally, and is also easily introduced by recording equipment and the like during live broadcasting, which degrades the user experience. To eliminate such noise, the prior art uses the following processing flow for suppressing Hiss noise: 1. inputting the audio signal to be processed (PCM format); 2. analysis window; 3. Fourier transform; 4. frame type identification; 5. Hiss noise estimation and update; 6. spectral gain; 7. spectrum smoothing; 8. multiplication; 9. inverse Fourier transform; 10. synthesis window; 11. overlap-add; 12. outputting the audio signal with the Hiss noise suppressed. Bluetooth applications require high sound quality, low power consumption and low latency. First, this flow has many complex processing steps, which increases the power consumption of the Bluetooth device. Second, it relies on overlap-add to keep the signal smooth between frames, which adds delay, fails to meet the low-latency requirement of Bluetooth and degrades the user experience.

Disclosure of Invention

The application provides a method, a device, an encoding method, a medium and equipment for detecting and eliminating noise, aiming at the problems of complex flow, high power consumption and high delay when Hiss noise is detected and eliminated.

In a first aspect, the present application provides a method for detecting and eliminating noise, including: acquiring the original frequency-domain spectral coefficients corresponding to the current frame audio in the process of encoding or decoding the audio by a codec with modified discrete cosine transform processing; calculating the pseudo-spectral flatness of the current frame audio according to the original frequency-domain spectral coefficients; setting a noise update speed parameter according to the pseudo-spectral flatness under the condition that the pseudo-spectral flatness is greater than a preset threshold; calculating the noise energy of the current frame audio according to the noise update speed parameter; and calculating updated spectral coefficients of the current frame audio according to the original frequency-domain spectral coefficients and the noise energy, and performing the subsequent encoding or decoding process according to the updated spectral coefficients.

Optionally, acquiring the original frequency-domain spectral coefficients corresponding to the current frame audio in the process of encoding or decoding the audio by the codec with modified discrete cosine transform processing includes: in the standard encoding or decoding process of the codec, performing the modified discrete cosine transform on the current frame audio to obtain the original frequency-domain spectral coefficients.

Optionally, calculating the pseudo-spectral flatness of the current frame audio according to the original frequency domain spectral coefficients, including: calculating a pseudo spectrum corresponding to the current frame audio according to the original frequency domain spectral coefficient; calculating and obtaining a pseudo-spectrum geometric mean value and a pseudo-spectrum arithmetic mean value corresponding to the current frame audio according to the pseudo-spectrum; and calculating to obtain the pseudospectral flatness according to the pseudospectral geometric mean and the pseudospectral arithmetic mean.

Optionally, setting a noise update speed parameter according to the pseudospectral flatness under the condition that the pseudospectral flatness is greater than a preset threshold, including: and setting a noise updating speed parameter according to the size of the pseudospectral flatness, wherein the size of the noise updating speed parameter is in direct proportion to the size of the pseudospectral flatness.

Optionally, setting a noise update speed parameter according to the pseudo-spectral flatness under the condition that the pseudo-spectral flatness is greater than a preset threshold further includes: under the condition that the pseudo-spectral flatness is greater than the preset threshold, determining whether any audio frame among a preset number of frames preceding the current frame audio has a pseudo-spectral flatness smaller than the preset threshold; if not, setting the noise update speed parameter according to the magnitude of the pseudo-spectral flatness corresponding to the current frame audio.

Optionally, calculating the noise energy of the current frame audio according to the noise update speed parameter includes: selecting a non-speech frequency band of the current frame audio, and calculating a pseudo-spectral median corresponding to the current frame audio; and calculating to obtain the current frame moving average energy corresponding to the current frame audio according to the pseudo-spectral median, the noise updating speed parameter and the moving average energy of the previous frame audio, and taking the current frame moving average energy as the noise energy of the current frame audio.

In a second aspect, the present application provides an apparatus for detecting and eliminating noise, comprising: a frequency-domain spectral coefficient acquisition module, used for acquiring the original frequency-domain spectral coefficients corresponding to the current frame audio in the process of encoding or decoding the audio by a codec with modified discrete cosine transform processing; a pseudo-spectral flatness calculation module, used for calculating the pseudo-spectral flatness of the current frame audio according to the original frequency-domain spectral coefficients; a noise update speed parameter determination module, used for setting a noise update speed parameter according to the pseudo-spectral flatness under the condition that the pseudo-spectral flatness is greater than a preset threshold; a noise energy calculation module, used for calculating the noise energy of the current frame audio according to the noise update speed parameter; and a spectral coefficient update module, used for calculating updated spectral coefficients of the current frame audio according to the original frequency-domain spectral coefficients and the noise energy, and performing the subsequent encoding or decoding process according to the updated spectral coefficients.

In a third aspect, the present application provides an audio encoding method, including: acquiring the original frequency-domain spectral coefficients corresponding to the current frame audio in the process of encoding the audio by an encoder with modified discrete cosine transform processing; calculating the pseudo-spectral flatness of the current frame audio according to the original frequency-domain spectral coefficients; setting a noise update speed parameter according to the pseudo-spectral flatness under the condition that the pseudo-spectral flatness is greater than a preset threshold; calculating the noise energy of the current frame audio according to the noise update speed parameter; calculating updated spectral coefficients of the current frame audio according to the original frequency-domain spectral coefficients and the noise energy; and updating the other encoding modules in the encoder according to the updated spectral coefficients, and continuing to encode the current frame audio using the updated encoding modules.

In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program, wherein the computer program is operated to perform the method for detecting and eliminating noise of the first aspect or the audio encoding method of the third aspect.

In a fifth aspect, the present application provides a computer apparatus comprising a processor and a memory, the memory storing a computer program, wherein the processor operates the computer program to perform the method for detecting and eliminating noise of the first aspect or the audio encoding method of the third aspect.

The method for detecting and eliminating the noise encodes the audio data by utilizing the existing encoding process in the encoder, sets the noise updating speed parameter by utilizing the pseudospectral flatness, performs transition among multiple frames, has the characteristics of low power consumption and low delay, and improves the use experience of users.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below illustrate some embodiments of the present application by way of example.

FIG. 1 is a schematic diagram illustrating one embodiment of a method of detecting and canceling noise according to the present application;

FIG. 2 is a diagram illustrating a section of noise and its corresponding noise flatness according to the present application;

FIG. 3 is a schematic diagram of a section of a human voice and its corresponding flatness of the human voice spectrum of the present application;

FIG. 4 is a schematic diagram illustrating one embodiment of an apparatus for detecting and canceling noise according to the present disclosure;

FIG. 5 is a schematic diagram illustrating an embodiment of an audio encoding method of the present application.

With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.

Detailed Description

The following detailed description of the preferred embodiments of the present application, taken in conjunction with the accompanying drawings, will provide those skilled in the art with a better understanding of the advantages and features of the present application, and will make the scope of the present application more clear and definite.

It should be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.

Hiss noise is a stationary additive noise spanning the full frequency band that sounds like a continuous hissing. It is often introduced when old recordings are stored digitally, and is also easily introduced by recording equipment and the like during live broadcasting, which degrades the user experience. To eliminate such noise, the prior art uses the following processing flow for suppressing Hiss noise: 1. inputting the audio signal to be processed (PCM format); 2. analysis window; 3. Fourier transform; 4. frame type identification; 5. Hiss noise estimation and update; 6. spectral gain; 7. spectrum smoothing; 8. multiplication; 9. inverse Fourier transform; 10. synthesis window; 11. overlap-add; 12. outputting the audio signal with the Hiss noise suppressed. Bluetooth applications require high sound quality, low power consumption and low delay. First, this flow has many complex processing steps, which increases the power consumption of the Bluetooth device. Second, it relies on overlap-add to keep the signal smooth between frames, which adds delay, fails to meet the low-delay requirement of Bluetooth and degrades the user experience.

In view of the above problems, the present application provides a method, an apparatus, an encoding method, a medium, and a device for detecting and eliminating noise, where the method includes: acquiring the original frequency-domain spectral coefficients corresponding to the current frame audio in the process of encoding or decoding the audio by a codec with modified discrete cosine transform processing; calculating the pseudo-spectral flatness of the current frame audio according to the original frequency-domain spectral coefficients; setting a noise update speed parameter according to the pseudo-spectral flatness under the condition that the pseudo-spectral flatness is greater than a preset threshold; calculating the noise energy of the current frame audio according to the noise update speed parameter; and calculating updated spectral coefficients of the current frame audio according to the original frequency-domain spectral coefficients and the noise energy, and performing the subsequent encoding or decoding process according to the updated spectral coefficients. The technical scheme of the application is mainly aimed at Hiss noise. Noise comes in many types, including stationary and non-stationary, additive and multiplicative; the application is also effective for other additive stationary noise.

The method for detecting and eliminating noise reuses the existing encoding or decoding process of a codec with modified discrete cosine transform processing to obtain the original frequency-domain spectral coefficients, derives the pseudo-spectral flatness from them, and sets the noise update speed parameter from the pseudo-spectral flatness to smooth the transition between frames. It therefore has low power consumption and low delay and improves the user experience.

The following describes the technical solutions of the present application and how to solve the above technical problems with specific examples. The specific embodiments described below can be combined with each other to form new embodiments. The same or similar ideas or processes described in one embodiment may not be repeated in other embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram illustrating one embodiment of a method for detecting and canceling noise according to the present application.

In the embodiment shown in fig. 1, the method for detecting and removing noise of the present application includes a process S101, where in the process of encoding or decoding the audio by the codec with the modified discrete cosine transform process, the original frequency-domain spectral coefficients corresponding to the current frame audio are obtained.

In this embodiment, the method obtains the original frequency-domain spectral coefficients of the current frame audio from the modified discrete cosine transform step of the codec's existing standard encoding or decoding process, and then performs the subsequent noise detection using these coefficients.

Optionally, acquiring the original frequency-domain spectral coefficients corresponding to the current frame audio in the process of encoding or decoding the audio by the codec with modified discrete cosine transform processing includes: in the standard encoding or decoding process of the codec, performing the modified discrete cosine transform on the current frame audio to obtain the original frequency-domain spectral coefficients.

In this alternative embodiment, for convenience of description, the technical solution is described by taking audio encoding with the LC3 codec as an example of a codec with modified discrete cosine transform processing. The process is similar for other codecs with modified discrete cosine transform processing.

The existing modified discrete cosine transform module in the LC3 encoder is used to transform the current frame audio and obtain its original frequency-domain spectral coefficients, which are then used to detect Hiss noise in the audio data. As described in the background, the prior-art flow for eliminating Hiss noise includes an analysis window, a Fourier transform, an inverse Fourier transform and similar steps, and therefore suffers from high power consumption, high delay and poor user experience. In the present method these steps are omitted by relying on the existing encoding modules of the encoder, which reduces power consumption and lowers the storage and computing requirements of the whole encoding system. In addition, the Fourier transform, inverse transform, synthesis window and analysis window of the prior-art scheme each introduce certain errors; because these modules are omitted here, their impact on sound quality and the associated high delay are avoided. The technical scheme of the application also requires less computing power, which helps preserve the battery life of Bluetooth embedded devices.

Specifically, the acquisition of the original frequency-domain spectral coefficients is described by taking as an example an LC3 encoder that encodes audio data with a frame length of 10 ms and a sampling rate of 48 kHz. Following the standard encoding process of the LC3 encoder, the low-delay modified discrete cosine transform (LD-MDCT) is computed on the input audio data with a frame length of 10 ms to obtain the corresponding original frequency-domain spectral coefficients, where $x_s(n)$, $n = 0, 1, 2, \ldots, N_F - 1$, is the audio data of the current frame:

$$t(n) = x_s(Z - N_F + n), \quad \text{for } n = 0 \ldots 2N_F - 1 - Z$$

$$t(2N_F - Z + n) = 0, \quad \text{for } n = 0 \ldots Z - 1$$

In the above formulas, based on the LC3 standard specification, $N_F$ is 480, $Z$ is 180, $w_{N_{ms}\_N_F}(n)$ is the low-delay MDCT window, and $X(k)$ is the frequency-domain spectral coefficient corresponding to the current-frame time-domain audio data $x_s(n)$.
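For reference, the transform that maps the buffer $t(n)$ to $X(k)$ can be written as below. This is the low-delay MDCT analysis formula as recalled from the LC3 specification rather than a formula recovered from this text, so it should be checked against the specification before being relied upon:

```latex
X(k) = \sqrt{\frac{2}{N_F}} \sum_{n=0}^{2N_F-1} w_{N_{ms}\_N_F}(n)\, t(n)\,
       \cos\!\left[\frac{\pi}{N_F}\left(n + \frac{1}{2} + \frac{N_F}{2}\right)\left(k + \frac{1}{2}\right)\right],
       \quad k = 0 \ldots N_F - 1
```

With $N_F = 480$ and a 48 kHz sampling rate, each index $k$ covers 50 Hz of the 0 to 24 kHz band.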

In the embodiment shown in fig. 1, the method for detecting and removing noise of the present application includes a process S102 of calculating a pseudo-spectral flatness of the audio of the current frame according to the original frequency-domain spectral coefficients.

Optionally, calculating the pseudo-spectral flatness of the current frame audio according to the original frequency-domain spectral coefficients includes: calculating a pseudo spectrum corresponding to the current frame audio according to the original frequency-domain spectral coefficients; calculating a pseudo-spectral geometric mean and a pseudo-spectral arithmetic mean corresponding to the current frame audio according to the pseudo spectrum; and calculating the pseudo-spectral flatness according to the pseudo-spectral geometric mean and the pseudo-spectral arithmetic mean.

Specifically, the process of calculating the pseudo-spectral flatness of the current frame audio data according to the original frequency domain spectral coefficients is as follows:

Calculating the pseudo spectrum:

where $X(k) = 0$ when $k = -1$ or $k = N_F$.

Calculating the pseudo-spectral geometric mean:

Calculating the pseudo-spectral arithmetic mean:

Calculating the pseudo-spectral flatness:
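The three quantities named above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the formulas of the application: the pseudo spectrum is assumed to take the commonly used MDCT form sqrt(X(k)^2 + (X(k+1) - X(k-1))^2), which is consistent with the boundary condition X(-1) = X(N_F) = 0, and the flatness is assumed to be the ratio of the geometric mean to the arithmetic mean. The function names and the small constant `eps` are hypothetical.

```python
import math

def pseudo_spectrum(X):
    """Pseudo spectrum of MDCT coefficients X[0..N-1].

    ASSUMPTION: uses sqrt(X(k)^2 + (X(k+1) - X(k-1))^2),
    with X(-1) = X(N) = 0 as the boundary condition.
    """
    n = len(X)
    padded = [0.0] + list(X) + [0.0]  # padded[k + 1] == X(k)
    return [
        math.sqrt(padded[k + 1] ** 2 + (padded[k + 2] - padded[k]) ** 2)
        for k in range(n)
    ]

def pseudo_spectral_flatness(X, eps=1e-12):
    """Geometric mean of the pseudo spectrum divided by its arithmetic mean.

    ASSUMPTION: eps is added only to avoid log(0) and division by zero.
    """
    ps = pseudo_spectrum(X)
    n = len(ps)
    # Geometric mean computed in the log domain for numerical stability.
    geo_mean = math.exp(sum(math.log(p + eps) for p in ps) / n)
    arith_mean = sum(ps) / n
    return geo_mean / (arith_mean + eps)
```

Under these assumptions a flat, noise-like spectrum yields a flatness close to 1, while a spectrum dominated by a single tonal peak yields a value close to 0.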

In the embodiment shown in FIG. 1, the method for detecting and eliminating noise of the present application includes a process S103, where a noise update speed parameter is set according to the pseudo-spectral flatness if the pseudo-spectral flatness is greater than a preset threshold.

In this embodiment, because noise removal alters the data in each audio frame, the degree of noise removal may differ between two consecutive frames, causing originally smooth audio to fluctuate noticeably and degrading the user experience. To keep the transition between frames smooth after noise processing, a noise update speed parameter is set during noise elimination to adjust the degree of noise removal across audio frames, so that noise is eliminated while the sound quality is preserved.

Specifically, the preset threshold is set to 0.15 and is used to distinguish whether the current frame audio is a tone frame (pitch frame) of normal audio or a frame containing noise. If the pseudo-spectral flatness is less than the preset threshold, the current frame is a tonal signal, that is, a tone frame, and no noise elimination is needed; if the pseudo-spectral flatness is greater than or equal to the preset threshold, the frame is treated as a frame containing noise, and noise elimination is required.

Optionally, under the condition that the pseudo-spectral flatness is greater than the preset threshold, setting a noise update speed parameter according to the pseudo-spectral flatness, including: and setting a noise updating speed parameter according to the size of the pseudospectral flatness, wherein the size of the noise updating speed parameter is in direct proportion to the size of the pseudospectral flatness.

In this alternative embodiment, the noise update speed parameter is set according to the magnitude of the pseudo-spectral flatness, the two being in direct proportion: the larger the pseudo-spectral flatness of the current frame audio, the larger the noise update speed parameter; the smaller the pseudo-spectral flatness, the smaller the noise update speed parameter.

Optionally, setting a noise update speed parameter according to the pseudo-spectral flatness under the condition that the pseudo-spectral flatness is greater than a preset threshold further includes: under the condition that the pseudo-spectral flatness is greater than the preset threshold, determining whether any audio frame among a preset number of frames preceding the current frame audio has a pseudo-spectral flatness smaller than the preset threshold; if not, setting the noise update speed parameter according to the magnitude of the pseudo-spectral flatness corresponding to the current frame audio.

In this alternative embodiment, a further check is applied when the pseudo-spectral flatness of the current frame audio is greater than or equal to the preset threshold and the frame would be judged to contain noise. For speech, the unvoiced sound that follows a voiced sound generally has a noise-like spectral flatness. For more accurate noise processing, a tone frame delay factor is therefore set: if the current frame is a tone frame, the following frames are also regarded as tone frames regardless of their spectral flatness. When a frame is a tone frame, Hiss noise is absent or its energy is very small, so the spectral coefficients are not updated, which avoids damaging the sound quality; the noise calculation is still updated, so that a reasonable noise estimate is available when a later frame needs its spectral coefficients updated. Specifically, the preset number of frames may be set to 5. The preset threshold and the preset number of frames may be adjusted reasonably according to the actual processing requirements, and the present application is not particularly limited in this respect.
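The threshold test combined with the tone frame delay can be sketched as follows. This is a minimal illustration: the function name `classify_frames` and the choice to treat the start of the stream as having no preceding tone frame are assumptions, while the default threshold of 0.15 and the delay of 5 frames are the example values given in the description.

```python
def classify_frames(flatness_values, threshold=0.15, hangover=5):
    """Return one boolean per frame: True if noise removal is applied.

    A frame is processed only if its flatness is at or above the threshold
    AND none of the `hangover` frames before it fell below the threshold.
    """
    decisions = []
    # ASSUMPTION: no tone frame is considered to precede the first frame.
    frames_since_tone = hangover
    for flatness in flatness_values:
        if flatness < threshold:
            # Tone frame: never processed, and it restarts the delay counter.
            frames_since_tone = 0
            decisions.append(False)
        elif frames_since_tone < hangover:
            # Noise-like flatness, but still inside the delay window.
            frames_since_tone += 1
            decisions.append(False)
        else:
            decisions.append(True)
    return decisions
```

For example, a single tone frame followed by noise-like frames suppresses processing for the next five frames, and processing resumes on the sixth.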

The noise update speed α is set according to the spectral flatness: the lower the spectral flatness, the stronger the tonal component and the slower the noise update speed, and vice versa. Specifically, the update speed α may be set to 0.05 when the spectral flatness is 0.1 and to 0.25 when the spectral flatness is 0.8; other values follow from the linear relationship through these two points:

$$\alpha = 0.05 + \frac{0.25 - 0.05}{0.8 - 0.1}\,(F - 0.1)$$

where $F$ denotes the pseudo-spectral flatness of the current frame.
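The two example points given in the description (flatness 0.1 with α = 0.05, and flatness 0.8 with α = 0.25) determine a straight line, which can be evaluated as in the sketch below. The function name and parameter names are hypothetical, and no clamping is applied outside the two points because the description does not specify any.

```python
def noise_update_speed(flatness, f_lo=0.1, a_lo=0.05, f_hi=0.8, a_hi=0.25):
    """Linear interpolation of the update speed alpha from the flatness.

    The default end points are the example values from the description;
    values outside [f_lo, f_hi] are extrapolated (an assumption).
    """
    slope = (a_hi - a_lo) / (f_hi - f_lo)  # 0.2 / 0.7, about 0.286
    return a_lo + slope * (flatness - f_lo)
```

With these end points, a flatness of 0.45, midway between 0.1 and 0.8, gives an update speed of about 0.15.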

Fig. 2 shows a schematic diagram of a section of noise and its corresponding noise flatness according to the present application. Fig. 3 shows a schematic diagram of a section of human voice and its corresponding human voice spectrum flatness.

In the embodiment shown in fig. 1, the method for detecting and removing noise of the present application includes a process S104 of calculating noise energy of the current frame audio according to the noise update speed parameter.

Specifically, a non-speech frequency band is selected and the median of the pseudo spectrum over that band is calculated. The main energy of speech is usually concentrated in the range of 300 Hz to 3400 Hz. Taking the currently configured sampling rate of 48 kHz as an example, the effective bandwidth is 24 kHz, and the band from 8 kHz to 20 kHz can be selected for calculating and detecting Hiss noise; the corresponding spectral coefficient index k ranges from 160 to 400.
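The index range follows from the bin width: with 480 spectral coefficients covering 0 to 24 kHz, each coefficient spans 50 Hz. A small sketch of the conversion, with a hypothetical function name and the example configuration as defaults:

```python
def frequency_to_bin(freq_hz, sample_rate_hz=48000, num_bins=480):
    """Map a frequency in Hz to the nearest spectral coefficient index k."""
    bin_width_hz = (sample_rate_hz / 2) / num_bins  # 50 Hz for the example
    return int(round(freq_hz / bin_width_hz))
```

Calling `frequency_to_bin(8000)` and `frequency_to_bin(20000)` gives the band edges 160 and 400 used in the description.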

The median of the non-speech-band pseudo spectrum of the current frame is:

$$X_{pseudo\text{-}Med} = \mathrm{median}\big(X_{pseudo}(k)\big), \quad k = 160 \ldots 400$$

where median() denotes taking the median value.

The moving average energy of the current frame is:

$$P_{ma} = \alpha \cdot P_{ma\text{-}last} + (1 - \alpha) \cdot X_{pseudo\text{-}Med}$$

where $P_{ma\text{-}last}$ is the moving average energy of the previous frame, and α is the moving average factor, which controls the noise update speed and avoids abrupt changes in the noise estimate between frames; it typically takes a value of 0.95.
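The noise energy update can be sketched as follows, implementing the moving average exactly as the formula is written, so that α weights the previous frame's energy. The function name is hypothetical, and treating the index range 160 to 400 as inclusive at both ends is an assumption.

```python
from statistics import median

def update_noise_energy(pseudo_spec, prev_energy, alpha, k_lo=160, k_hi=400):
    """Moving-average noise energy from the non-speech-band median.

    Implements P_ma = alpha * P_ma_last + (1 - alpha) * X_pseudo_Med.
    ASSUMPTION: the band k_lo..k_hi is inclusive at both ends.
    """
    band = pseudo_spec[k_lo:k_hi + 1]
    band_median = median(band)
    return alpha * prev_energy + (1.0 - alpha) * band_median
```

Using the median rather than the mean keeps isolated strong components in the band from dominating the estimate; with α = 0.95 the estimate moves only 5% of the way toward the new median on each frame.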

In the embodiment shown in fig. 1, the method for detecting and removing noise of the present application includes a process S105, calculating updated spectral coefficients of the current frame audio according to the original frequency domain spectral coefficients and the noise energy, and performing a subsequent encoding or decoding process according to the updated spectral coefficients.

Specifically, as mentioned above, when the spectral flatness is greater than 0.15, which indicates that there is noise in the current frame audio, the noise processing is started, and the noise is removed by updating the spectral coefficients:

The spectral coefficients are updated as shown in the above formula. After the new spectral coefficients are obtained, the encoder continues with the encoding process of its remaining modules. The remaining encoding modules are completed according to the LC3 standard specification, including transform domain noise shaping, time domain noise shaping, quantization and noise level estimation, arithmetic and residual coding, and code stream packaging.
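Purely as an illustration of how a noise energy estimate can be used to attenuate spectral coefficients, the sketch below applies a spectral-subtraction-style gain per coefficient. This is an assumed, generic technique and not the update formula of the application; the function name, the gain rule and the `floor` parameter are all hypothetical.

```python
def update_spectral_coefficients(X, pseudo_spec, noise_energy, floor=0.1):
    """Scale each coefficient by a gain derived from the noise estimate.

    ASSUMPTION (not from the application):
        gain(k) = max(1 - noise_energy / pseudo_spec[k], floor)
    The floor limits attenuation to reduce audible artifacts.
    """
    updated = []
    for coeff, magnitude in zip(X, pseudo_spec):
        if magnitude <= 0.0:
            gain = floor  # nothing above the noise estimate in this bin
        else:
            gain = max(1.0 - noise_energy / magnitude, floor)
        updated.append(coeff * gain)
    return updated
```

Coefficients well above the noise estimate pass almost unchanged, while coefficients near or below it are reduced to the floor gain.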

The method for detecting and eliminating noise can be applied in the encoding process of an encoder or the decoding process of a decoder with modified discrete cosine transform processing to detect and eliminate Hiss noise. In addition, the method reuses the existing encoding process of the encoder and sets the noise update speed parameter from the pseudo-spectral flatness to smooth the transition between frames, so it has low power consumption and low delay and improves the user experience.

Fig. 4 shows an embodiment of the apparatus for detecting and removing noise according to the present application.

In the embodiment shown in fig. 4, the apparatus for detecting and eliminating noise of the present application includes: a frequency-domain spectral coefficient obtaining module 401, which obtains an original frequency-domain spectral coefficient corresponding to the current frame audio in the process of encoding or decoding the audio by the codec with the modified discrete cosine transform processing; a pseudo-spectral flatness calculation module 402, which calculates the pseudo-spectral flatness of the current frame audio according to the original frequency domain spectral coefficients; a noise update speed parameter determination module 403, configured to set a noise update speed parameter according to the pseudo-spectrum flatness if the pseudo-spectrum flatness is greater than a preset threshold; a noise energy calculating module 404, which calculates the noise energy of the current frame audio according to the noise update speed parameter; and a spectral coefficient updating module 405, which calculates an updated spectral coefficient of the current frame audio according to the original frequency domain spectral coefficient and the noise energy, and performs a subsequent encoding or decoding process according to the updated spectral coefficient.

Optionally, the frequency-domain spectral coefficient obtaining module 401 performs the modified discrete cosine transform on the current frame audio during the standard encoding or decoding process of the codec to obtain the original frequency-domain spectral coefficients.

Optionally, the pseudo-spectral flatness calculation module 402 calculates a pseudo-spectrum corresponding to the current frame audio according to the original frequency-domain spectral coefficients; calculates a pseudo-spectral geometric mean and a pseudo-spectral arithmetic mean corresponding to the current frame audio according to the pseudo-spectrum; and calculates the pseudo-spectral flatness according to the pseudo-spectral geometric mean and the pseudo-spectral arithmetic mean.
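The flatness calculation described above can be sketched as follows. This is a minimal illustration, not the formula of the present application: the pseudo-spectrum form used in `pseudo_spectrum` (squared coefficient plus squared difference of its neighbours) and the small constant `eps` are assumptions, and both function names are hypothetical. The flatness itself is the ratio of the geometric mean to the arithmetic mean, as stated in the text.

```python
import math


def pseudo_spectrum(mdct):
    """Assumed pseudo-spectrum: X[k]^2 + (X[k+1] - X[k-1])^2.

    Coefficients outside the frame are treated as zero (an assumption).
    """
    n = len(mdct)
    ps = []
    for k in range(n):
        prev_c = mdct[k - 1] if k > 0 else 0.0
        next_c = mdct[k + 1] if k < n - 1 else 0.0
        ps.append(mdct[k] ** 2 + (next_c - prev_c) ** 2)
    return ps


def pseudo_spectral_flatness(ps, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the pseudo-spectrum.

    The result is close to 1 for a flat (noise-like) spectrum and close
    to 0 for a peaky (tonal) one. `eps` avoids log(0) and is an assumption.
    """
    if not ps:
        raise ValueError("pseudo-spectrum must not be empty")
    # Compute the geometric mean in the log domain to avoid overflow.
    log_sum = sum(math.log(p + eps) for p in ps)
    geometric_mean = math.exp(log_sum / len(ps))
    arithmetic_mean = sum(p + eps for p in ps) / len(ps)
    return geometric_mean / arithmetic_mean
```

Because the geometric mean never exceeds the arithmetic mean, the returned value always lies between 0 and 1, which makes it convenient to compare against a fixed preset threshold.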

Optionally, the noise update speed parameter determination module 403 sets the noise update speed parameter according to the magnitude of the pseudo-spectral flatness, where the noise update speed parameter is directly proportional to the pseudo-spectral flatness.

Optionally, when the pseudo-spectral flatness is greater than the preset threshold, the noise update speed parameter determination module 403 determines whether any audio frame among a preset number of frames preceding the current frame audio has a pseudo-spectral flatness less than the preset threshold; if not, the module sets the noise update speed parameter according to the magnitude of the pseudo-spectral flatness corresponding to the current frame audio.
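The threshold test and the check over preceding frames can be sketched as below. The class name `UpdateSpeedSelector`, the default values of `threshold`, `history_frames` and `max_speed`, and the choice to return 0 (no update) when the conditions are not met are all assumptions made for illustration; the text only states that the parameter is proportional to the flatness when the conditions hold.

```python
from collections import deque


class UpdateSpeedSelector:
    """Chooses a noise update speed from the pseudo-spectral flatness.

    All default values are illustrative assumptions, not values from the
    present application.
    """

    def __init__(self, threshold=0.5, history_frames=5, max_speed=0.2):
        self.threshold = threshold
        self.max_speed = max_speed
        # Flatness values of the most recent `history_frames` frames.
        self.history = deque(maxlen=history_frames)

    def select(self, flatness):
        """Return the update speed for this frame (0.0 means no update)."""
        # Was any of the preceding frames below the threshold?
        recent_below = any(f < self.threshold for f in self.history)
        self.history.append(flatness)
        if flatness <= self.threshold or recent_below:
            return 0.0  # assumed behaviour when the conditions are not met
        # Directly proportional to the flatness of the current frame.
        return self.max_speed * flatness
```

With this arrangement a single noise-like frame that follows speech or music does not immediately change the noise estimate; the flatness has to stay above the threshold for the whole history window first.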

Optionally, the noise energy calculation module 404 selects a non-speech frequency band of the current frame audio and calculates the pseudo-spectral median corresponding to the current frame audio; it then calculates the current-frame moving average energy according to the pseudo-spectral median, the noise update speed parameter, and the moving average energy of the previous frame audio, and takes the current-frame moving average energy as the noise energy of the current frame audio.
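A minimal sketch of this noise energy update is given below. It assumes a first-order exponential moving average and a non-speech band given as a slice of bin indices; the function name `update_noise_energy` and the parameters `band_start` and `band_stop` are hypothetical, and the exact averaging formula of the present application is not reproduced here.

```python
import statistics


def update_noise_energy(pseudo_spectrum, prev_energy, update_speed,
                        band_start, band_stop):
    """Moving-average noise energy from the median of a non-speech band.

    `band_start:band_stop` selects the assumed non-speech bins.
    `update_speed` is the noise update speed parameter in [0, 1].
    """
    band = pseudo_spectrum[band_start:band_stop]
    if not band:
        raise ValueError("non-speech band is empty")
    # The median is robust to a few strong bins inside the band.
    median = statistics.median(band)
    # Assumed first-order recursion: a larger speed follows the median faster.
    return (1.0 - update_speed) * prev_energy + update_speed * median
```

For example, with a band whose values are 1, 2, 3 and 100, the median is 2.5, so the single outlier of 100 barely influences the estimate, whereas an arithmetic mean over the same band would be dominated by it.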

The apparatus for detecting and eliminating noise encodes the audio data using the encoding process already present in the encoder and sets the noise update speed parameter according to the pseudo-spectral flatness, thereby providing a transition across multiple frames. It therefore has low power consumption and low delay, and improves the user experience.

Fig. 5 shows an embodiment of the audio encoding method of the present application.

In the embodiment shown in Fig. 5, the audio encoding method of the present application includes: a process S501 of obtaining the original frequency-domain spectral coefficients corresponding to the current frame audio in the process of encoding the audio by an encoder with modified discrete cosine transform (MDCT) processing; a process S502 of calculating the pseudo-spectral flatness of the current frame audio according to the original frequency-domain spectral coefficients; a process S503 of setting a noise update speed parameter according to the pseudo-spectral flatness if the pseudo-spectral flatness is greater than a preset threshold; a process S504 of calculating the noise energy of the current frame audio according to the noise update speed parameter; a process S505 of calculating the updated spectral coefficients of the current frame audio according to the original frequency-domain spectral coefficients and the noise energy; and a process S506 of updating the other coding modules in the encoder according to the updated spectral coefficients and continuing to encode the current frame audio using the updated coding modules.
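The per-frame flow of processes S501 to S505 can be sketched end to end as follows. This is an illustration under stated assumptions rather than the implementation of the present application: the class name `HissSuppressorSketch`, every default constant, the pseudo-spectrum form, the moving-average recursion, and in particular the spectral-subtraction gain used for S505 are stand-ins, since the actual update formula is given elsewhere in the description. The check over preceding frames is omitted for brevity.

```python
import math
import statistics


class HissSuppressorSketch:
    """Per-frame flow of S501-S505; all constants are assumptions."""

    def __init__(self, threshold=0.5, speed_scale=0.2, band_start=0):
        self.threshold = threshold
        self.speed_scale = speed_scale
        self.band_start = band_start  # first bin of the assumed non-speech band
        self.noise_energy = 0.0       # moving-average energy of previous frame

    def process_frame(self, mdct):
        # S501: `mdct` holds the original MDCT coefficients of this frame.
        n = len(mdct)
        # S502: pseudo-spectrum (assumed form) and its flatness.
        ps = []
        for k in range(n):
            prev_c = mdct[k - 1] if k > 0 else 0.0
            next_c = mdct[k + 1] if k + 1 < n else 0.0
            ps.append(mdct[k] ** 2 + (next_c - prev_c) ** 2 + 1e-12)
        geometric_mean = math.exp(sum(math.log(p) for p in ps) / n)
        flatness = geometric_mean / (sum(ps) / n)
        # S503/S504: only noise-like frames update the noise energy.
        if flatness > self.threshold:
            alpha = self.speed_scale * flatness
            median = statistics.median(ps[self.band_start:])
            self.noise_energy = ((1.0 - alpha) * self.noise_energy
                                 + alpha * median)
        # S505: stand-in update rule (power spectral subtraction gain).
        updated = []
        for x, p in zip(mdct, ps):
            gain = math.sqrt(max(p - self.noise_energy, 0.0) / p)
            updated.append(x * gain)
        # S506 would pass `updated` to the remaining encoder stages.
        return updated
```

Because the sketch operates on coefficients the encoder has already computed, it adds no extra transform and no look-ahead, which is consistent with the low power consumption and low delay described for the method.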

The audio encoding method of the present application encodes the audio data using the encoding process already present in the encoder and sets the noise update speed parameter according to the pseudo-spectral flatness, thereby providing a transition across multiple frames. It therefore has low power consumption and low delay, and improves the user experience.

In one embodiment of the present application, a computer-readable storage medium stores computer instructions, wherein the computer instructions are operable to perform the method for detecting and eliminating noise or the audio encoding method described in any of the embodiments. The storage medium may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two.

A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.

The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), another programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

In one embodiment of the present application, a computer device includes a processor and a memory, the memory storing computer instructions, wherein the processor executes the computer instructions to perform the method for detecting and eliminating noise or the audio encoding method described in any of the embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a logical division, and an actual implementation may use a different division: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

The above embodiments are merely examples and are not intended to limit the scope of the present application. All equivalent structural changes made using the contents of the specification and the drawings, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the present application.

Claims

1. A method for detecting and eliminating noise, comprising:

acquiring an original frequency domain spectral coefficient corresponding to the current frame audio in the process of encoding or decoding the audio by a codec with modified discrete cosine transform processing;

calculating the pseudo-spectral flatness of the current frame audio according to the original frequency domain spectral coefficient;

setting a noise updating speed parameter according to the pseudo-spectrum flatness under the condition that the pseudo-spectrum flatness is larger than a preset threshold;

calculating the noise energy of the current frame audio according to the noise updating speed parameter; and

calculating an updated spectral coefficient of the current frame audio according to the original frequency domain spectral coefficient and the noise energy, and performing the subsequent encoding or decoding process according to the updated spectral coefficient.

2. The method for detecting and eliminating noise according to claim 1, wherein the acquiring of the original frequency domain spectral coefficient corresponding to the current frame audio in the process of encoding or decoding the audio by the codec with modified discrete cosine transform processing comprises:

in the standard encoding or decoding process of the codec, performing the modified discrete cosine transform on the current frame audio to obtain the original frequency domain spectral coefficient.

3. The method for detecting and eliminating noise according to claim 1, wherein the calculating of the pseudo-spectral flatness of the current frame audio according to the original frequency domain spectral coefficient comprises:

calculating a pseudo spectrum corresponding to the current frame audio according to the original frequency domain spectral coefficient;

calculating and obtaining a pseudo-spectrum geometric mean value and a pseudo-spectrum arithmetic mean value corresponding to the current frame audio according to the pseudo-spectrum;

and calculating to obtain the pseudospectral flatness according to the pseudospectral geometric mean and the pseudospectral arithmetic mean.

4. The method for detecting and eliminating noise according to claim 1, wherein the setting a noise update speed parameter according to the pseudo-spectral flatness if the pseudo-spectral flatness is greater than a preset threshold comprises:

and setting the noise updating speed parameter according to the size of the pseudo-spectrum flatness, wherein the size of the noise updating speed parameter is in direct proportion to the size of the pseudo-spectrum flatness.

5. The method for detecting and eliminating noise according to claim 4, wherein the setting of the noise update speed parameter according to the pseudo-spectral flatness if the pseudo-spectral flatness is greater than a preset threshold further comprises:

under the condition that the pseudo-spectral flatness is greater than the preset threshold, judging whether any audio frame among a preset number of frames of audio before the current frame audio has a pseudo-spectral flatness smaller than the preset threshold;

if not, setting the noise updating speed parameter according to the size of the pseudo-spectral flatness corresponding to the current frame audio.

6. The method for detecting and eliminating noise according to claim 1, wherein the calculating of the noise energy of the current frame audio according to the noise update speed parameter comprises:

selecting a non-voice frequency band of the current frame audio, and calculating a pseudo-spectral median corresponding to the current frame audio;

and calculating to obtain the current frame moving average energy corresponding to the current frame audio according to the pseudo-spectral median, the noise updating speed parameter and the moving average energy of the previous frame audio, and using the current frame moving average energy as the noise energy of the current frame audio.

7. An apparatus for detecting and eliminating noise, comprising:

the frequency domain spectral coefficient acquisition module is used for acquiring an original frequency domain spectral coefficient corresponding to the current frame audio in the process of encoding or decoding the audio by a codec with modified discrete cosine transform processing;

the pseudo-spectral flatness calculation module is used for calculating the pseudo-spectral flatness of the current frame audio according to the original frequency domain spectral coefficients;

the noise updating speed parameter determining module is used for setting a noise updating speed parameter according to the pseudo-spectrum flatness under the condition that the pseudo-spectrum flatness is larger than a preset threshold;

the noise energy calculation module is used for calculating the noise energy of the current frame audio according to the noise updating speed parameter; and

the spectral coefficient updating module is used for calculating an updated spectral coefficient of the current frame audio according to the original frequency domain spectral coefficient and the noise energy, and performing the subsequent encoding or decoding process according to the updated spectral coefficient.

8. An audio encoding method, comprising:

in the process of encoding the audio by an encoder with modified discrete cosine transform processing, acquiring an original frequency domain spectral coefficient corresponding to the current frame audio;

calculating the pseudo-spectral flatness of the current frame audio according to the original frequency domain spectral coefficients;

calculating to obtain an updated spectral coefficient of the current frame audio according to the original frequency domain spectral coefficient and the noise energy;

and updating other encoding modules in the encoder according to the updated spectral coefficients, and continuously encoding the current frame audio by using the updated encoding modules.

9. A computer-readable storage medium storing a computer program, wherein the computer program is operable to perform the method for detecting and eliminating noise of any one of claims 1 to 6 or the audio encoding method of claim 8.

10. A computer device comprising a processor and a memory, the memory storing a computer program, wherein the processor executes the computer program to perform the method for detecting and eliminating noise of any one of claims 1 to 6 or the audio encoding method of claim 8.