CN112259116B - Noise reduction method and device for audio data, electronic equipment and storage medium


Info

Publication number
CN112259116B
Authority
CN
China
Prior art keywords
sub-band
noise reduction
domain information
Prior art date
Legal status
Active
Application number
CN202011098018.1A
Other languages
Chinese (zh)
Other versions
CN112259116A (en
Inventor
吴威麒
张金亮
高华
许一峰
Current Assignee
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Priority to CN202011098018.1A
Publication of CN112259116A
Application granted
Publication of CN112259116B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L21/0224 Processing in the time domain
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The embodiments of the disclosure disclose a noise reduction method and device for audio data, an electronic device and a storage medium, wherein the method comprises the following steps: performing sub-band processing on the audio frame data, wherein the frequency intervals of the sub-bands are different, and the sub-band corresponding to the frequency interval with the smallest interval maximum value is used as a first sub-band; inputting the frequency domain information of the first sub-band into a noise reduction model so that the noise reduction model outputs the noise-reduced frequency domain information of the first sub-band and the gain of the first sub-band; determining, based on the gain of the first sub-band, the gains of the sub-bands other than the first sub-band, and performing noise reduction processing on the other sub-bands according to their respective gains; and determining the noise-reduced audio frame data according to the noise-reduced frequency domain information of the first sub-band and the noise reduction processing results of the other sub-bands. The noise reduction model is used to reduce noise in the first sub-band, and gain mapping is used to reduce noise in the other sub-bands, which improves both the training efficiency of the model and the noise reduction efficiency for the audio data.

Description

Noise reduction method and device for audio data, electronic equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a noise reduction method and device for audio data, electronic equipment and a storage medium.
Background
In recent years, noise reduction of audio data using deep learning models has been one of the main trends in noise reduction. A model trained on audio data samples with a low sampling rate can be applied to noise reduction of audio data with a low sampling rate, and a model trained on audio data samples with a high sampling rate can be applied to noise reduction of audio data with a high sampling rate.
Existing noise reduction methods have at least the following technical problems: when the sampling rate of the audio data samples is high, model training requires more sample points, more model parameters and a more complex model structure, so the model training efficiency is low; moreover, when noise reduction is performed on audio data based on such a model, the noise reduction process is correspondingly time-consuming, so the noise reduction efficiency for the audio data is low.
Disclosure of Invention
The embodiment of the disclosure provides a noise reduction method, device, electronic equipment and storage medium for audio data, which can not only improve model training efficiency, but also improve noise reduction efficiency for the audio data.
In a first aspect, an embodiment of the present disclosure provides a noise reduction method for audio data, including:
performing sub-band processing on the audio frame data, wherein the frequency intervals of the sub-bands are different, and the sub-band corresponding to the frequency interval with the smallest interval maximum value is used as a first sub-band;
inputting the frequency domain information of the first sub-band into a noise reduction model so that the noise reduction model outputs the frequency domain information of the first sub-band after noise reduction and the gain of the first sub-band;
based on the gain of the first sub-band, determining the gains of the sub-bands other than the first sub-band, and respectively performing noise reduction processing on the other sub-bands according to the gains of the other sub-bands;
and determining the noise-reduced audio frame data according to the frequency domain information after noise reduction of the first sub-band and the noise reduction processing result of other sub-bands.
In a second aspect, an embodiment of the present disclosure further provides a noise reduction apparatus for audio data, including:
the sub-band module is used for carrying out sub-band processing on the audio frame data, wherein the frequency intervals of all the sub-bands are different, and the sub-band corresponding to the frequency interval with the smallest interval maximum value is used as a first sub-band;
the first noise reduction module is used for inputting the frequency domain information of the first sub-band into a noise reduction model so that the noise reduction model outputs the frequency domain information of the first sub-band after noise reduction and the gain of the first sub-band;
The second noise reduction module is used for determining the gains of other sub-bands except the first sub-band based on the gain of the first sub-band and respectively carrying out noise reduction processing on the other sub-bands according to the gains of the other sub-bands;
the noise reduction data determining module is used for determining noise-reduced audio frame data according to the frequency domain information after noise reduction of the first sub-band and the noise reduction processing result of other sub-bands.
In a third aspect, embodiments of the present disclosure further provide an electronic device, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method of noise reduction of audio data as described in any of the embodiments of the present disclosure.
In a fourth aspect, the presently disclosed embodiments also provide a storage medium containing computer-executable instructions that, when executed by a computer processor, are used to perform a method of noise reduction of audio data as described in any of the presently disclosed embodiments.
According to the technical scheme of the embodiments of the disclosure, the audio frame data are subjected to sub-band processing, wherein the frequency intervals of the sub-bands are different, and the sub-band corresponding to the frequency interval with the smallest interval maximum value is used as a first sub-band; the frequency domain information of the first sub-band is input into a noise reduction model so that the noise reduction model outputs the noise-reduced frequency domain information of the first sub-band and the gain of the first sub-band; based on the gain of the first sub-band, the gains of the sub-bands other than the first sub-band are determined, and noise reduction processing is performed on the other sub-bands according to their respective gains; and the noise-reduced audio frame data are determined according to the noise-reduced frequency domain information of the first sub-band and the noise reduction processing results of the other sub-bands.
The sound frequency range to which the human ear is most sensitive is generally considered to be the lower frequency range. Based on this, in the technical solution of the embodiment of the disclosure, the audio data after framing is divided into each sub-band, only the first sub-band corresponding to the low frequency section is denoised by the denoising model, and the gains of other sub-bands are mapped according to the gain of the first sub-band output by the denoising model, so as to denoise the other sub-bands. Compared with the traditional method for reducing the noise of the whole frequency band by using the noise reduction model, the method reduces the overall noise reduction time consumption and improves the noise reduction efficiency.
In addition, the noise reduction model is only applied to noise reduction of the first sub-band rather than full-band noise reduction, and can be trained using only sample data of the first sub-band, so the model training efficiency can be improved. Especially in the case of model training for high-sampling-rate audio data, training on only the sample data of the first sub-band greatly reduces the model parameters and model complexity and improves the model training efficiency.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
Fig. 1 is a flowchart of a method for noise reduction of audio data according to an embodiment of the disclosure;
fig. 2 is a flowchart of a noise reduction method for audio data according to a second embodiment of the disclosure;
fig. 3 is a flowchart of a noise reduction method for audio data according to a third embodiment of the disclosure;
fig. 4 is a schematic structural diagram of a noise reduction device for audio data according to a fifth embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to a sixth embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, which are instead provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one", "a plurality" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that "one or more" is intended to be understood as "one or more" unless the context clearly indicates otherwise.
Example 1
Fig. 1 is a flowchart of a noise reduction method for audio data according to an embodiment of the present disclosure, where the embodiment of the present disclosure is applicable to noise reduction of audio data, and is particularly applicable to real-time noise reduction of audio data with high sampling rate. The method may be performed by a noise reduction device for audio data, which may be implemented in software and/or hardware, which may be configured in an electronic device, such as a computer.
As shown in fig. 1, the noise reduction method for audio data provided in this embodiment includes:
s110, carrying out sub-band processing on the audio frame data, wherein the frequency intervals of all the sub-bands are different, and the sub-band corresponding to the frequency interval with the smallest interval maximum value is used as the first sub-band.
In the embodiment of the disclosure, the audio data to be noise-reduced may be an audio data stream collected in real time or an audio data file after collection.
Generally, before noise reduction is performed on audio data, framing processing may be performed on the audio data to obtain audio frame data. Framing is to be understood as processing the sampling points in the audio data segment by segment, and may be based on existing Matlab framing code or other framing methods, which are not exhaustively listed here. Framing the audio data not only facilitates real-time processing of an audio data stream and noise reduction after the audio data acquisition is completed, but also allows all audio frame data in an audio data file to be noise-reduced simultaneously so as to improve the noise reduction efficiency.
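To make the framing step concrete, the following is a minimal sketch; the 48 kHz sampling rate and 10 ms non-overlapping frames are illustrative assumptions, as the patent does not prescribe a particular frame length or overlap.

    # Minimal framing sketch; frame length and overlap are illustrative assumptions.
    import numpy as np

    def frame_audio(samples, sample_rate=48000, frame_ms=10.0):
        """Split a 1-D audio signal into consecutive, non-overlapping frames."""
        frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
        n_frames = len(samples) // frame_len             # drop the incomplete tail
        return samples[:n_frames * frame_len].reshape(n_frames, frame_len)

    frames = frame_audio(np.random.randn(2 * 48000))     # two seconds of test audio
    print(frames.shape)                                  # (200, 480)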
When audio frame data is acquired, sub-band processing may be performed on each frame of data. Wherein different sub-bands of the audio frame data may be considered as data portions of different frequency bins contained in the audio frame data. The audio frame data may be parsed into data portions of different frequency bins based on a time-domain and/or frequency-domain processing method.
The frequency intervals of the sub-bands of the audio data being different can be understood as the end point values of the frequency intervals of the sub-bands not being completely identical. The frequency intervals of the sub-bands may overlap, be sequentially adjacent, or be non-adjacent, etc. For example, for audio frame data with a sampling rate of 48 kHz and a frequency range of 0 Hz to 24 kHz, the frequency intervals corresponding to the sub-bands may overlap, for example 0 Hz-8 kHz, 7 kHz-16 kHz and 15 kHz-24 kHz; they may also be sequentially adjacent, for example 0 Hz-8 kHz, 8 kHz-16 kHz and 16 kHz-24 kHz; or they may be non-adjacent, for example 0 Hz-8 kHz, 9 kHz-16 kHz and 17 kHz-24 kHz, etc.
In some preferred implementations of the embodiments of the present disclosure, the frequency intervals of the sub-bands of the audio frame data are sequentially adjacent, which can be understood as the right end point value of the frequency interval of a preceding sub-band being equal to the left end point value of the frequency interval of the following sub-band. In these preferred implementations, because the frequency intervals of the sub-bands are arranged adjacently, repeated noise reduction of audio frame data in overlapping frequency regions is avoided and no frequency region is omitted during noise reduction, so that full-band noise reduction of the audio frame data can be completed and the noise reduction effect can be improved to a certain extent.
The frequency interval with the smallest interval maximum value can be regarded as the relatively low frequency interval among the frequency intervals corresponding to the sub-bands. The relatively low frequency interval may also be described as the frequency interval with the smallest interval minimum value, the frequency interval with the smallest interval median, or the like, which are not exhaustively listed here.
The sound frequency range to which the human ear is most sensitive is 200 Hz to 800 Hz, which is generally considered a lower frequency range relative to the audible range of 20 Hz to 20000 Hz. Therefore, after the audio frame data are divided into sub-bands, the sub-band corresponding to the frequency interval with the smallest interval maximum value is used as the first sub-band, so that the first sub-band, which corresponds to a relatively low frequency interval and contains the frequency range to which the human ear is sensitive, is selected from the sub-bands, laying a foundation for the subsequent noise reduction of the first sub-band by the noise reduction model.
S120, inputting the frequency domain information of the first sub-band into a noise reduction model so that the noise reduction model outputs the frequency domain information after noise reduction of the first sub-band and the gain of the first sub-band.
In the embodiments of the disclosure, the noise reduction model may be, for example, a recurrent neural network noise suppression (Recurrent Neural Network Noise, RNNoise) model, a convolutional neural network (Convolutional Neural Networks, CNN) model, or the like; other audio noise reduction models are also applicable, which are not exhaustively listed here. The noise reduction model can be obtained by pre-training; based on the noise reduction model, noise reduction of the first sub-band of the audio frame data can be achieved, and the gain of the first sub-band can be output at the same time.
For each frame of audio data, the gain of the first sub-band may be regarded as a set of gains, one for each frequency point in the frequency interval corresponding to the first sub-band, where the gain corresponding to each frequency point is a gain value. The gain value may be, for example, the ratio of the noise-reduced frequency domain information to the frequency domain information before noise reduction. The gain generally lies in the range 0-1; the closer the gain is to 0, the more the audio frame data are considered to be noise that needs to be eliminated.
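As a small illustration of this definition (a sketch, not the patented model itself): the per-bin gain can be read as the ratio of the noise-reduced magnitude to the noisy magnitude, clipped to [0, 1], and applying a gain vector to a spectrum is an element-wise product. The small constant eps below is an assumption added to avoid division by zero.

    # Illustration of the per-bin gain described above.
    import numpy as np

    def band_gain(noisy_spec, denoised_spec, eps=1e-8):
        gain = np.abs(denoised_spec) / (np.abs(noisy_spec) + eps)
        return np.clip(gain, 0.0, 1.0)      # ~0: mostly noise, ~1: keep the bin

    def apply_gain(noisy_spec, gain):
        return noisy_spec * gain            # element-wise suppression of noisy bins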
Compared with the traditional noise reduction processing of the full-band data of the audio data, the noise reduction processing is carried out only on the first sub-band by utilizing the noise reduction model, so that the overall noise reduction time consumption of the audio data can be reduced, the noise reduction efficiency is improved, and the advantages are very obvious in the application of real-time noise reduction of the audio data.
In addition, the higher the sampling rate, the more complex the structure of a traditional noise reduction model and the more computing resources the noise reduction process consumes, so that noise reduction of audio data is difficult to deploy on terminals with limited computing resources such as laptops or mobile phones, which is not conducive to practical engineering applications. When reducing noise in audio data with a high sampling rate, the noise reduction model provided by the embodiments of the disclosure only needs to reduce noise in the first sub-band of the low frequency interval, so the complexity of the model and the computing resources consumed are greatly reduced, which makes it easier to deploy the model on terminals with limited computing resources and benefits practical engineering applications.
In some optional implementations of embodiments of the present disclosure, a training manner of a noise reduction model includes: acquiring sample frame data which are the same as the frequency interval of the first sub-band and target frequency domain information of the sample frame data; and training the noise reduction model by utilizing the frequency domain information of the sample frame data and the target frequency domain information until the noise reduction model converges.
When the noise reduction model is trained, the sample data used may have the same frequency range as the audio data to be noise-reduced, or a different one; it only needs to be ensured that, after the sample data are framed and/or divided into sub-bands, the sample frame data and the first sub-band have the same frequency interval.
Typically, the sampling rate of audio data is twice its maximum frequency in order to meet the acquisition requirements. Most existing open-source audio data sets have a low sampling rate, e.g., 16 kHz. When the maximum frequency of the audio data to be noise-reduced is high, the sampling rate is high, for example 32 kHz or 48 kHz; if a traditional noise reduction model is trained on full-band sample data, not only is there no open-source audio data set to rely on, but a great deal of time is also required to collect sample data at a high sampling rate, and the training process requires more sample points, more model parameters and a more complex model structure, so the model training efficiency is low.
In these alternative embodiments of the present disclosure, an existing open-source audio data set with a lower sampling rate may be used: the sample data in the data set are framed and/or divided into sub-bands so that the sample frame data and the first sub-band have the same frequency interval, and the noise reduction model can then be trained and applied to noise reduction of audio data with a high sampling rate. Therefore, no time needs to be spent collecting sample data at a high sampling rate, and training is performed only on sample frame data in the frequency interval of the first sub-band, which greatly reduces the model parameters and model complexity and improves the model training efficiency.
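A minimal training sketch under stated assumptions: a small GRU that maps noisy first-sub-band magnitude spectra to clean ones, trained with an MSE loss. The network size, loss, optimizer and tensor shapes are illustrative choices, not taken from the patent, which only names RNNoise- or CNN-style models as options.

    # Minimal sketch of training a first-sub-band noise reduction model
    # (PyTorch; sizes, loss and optimizer are assumptions for illustration).
    import torch
    import torch.nn as nn

    class FirstSubbandDenoiser(nn.Module):
        def __init__(self, n_bins=241, hidden=128):
            super().__init__()
            self.rnn = nn.GRU(n_bins, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_bins)

        def forward(self, noisy_mag):              # (batch, frames, n_bins)
            h, _ = self.rnn(noisy_mag)
            gain = torch.sigmoid(self.out(h))      # per-bin gain in [0, 1]
            return noisy_mag * gain, gain          # denoised magnitude + gain

    model = FirstSubbandDenoiser()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    # noisy/clean: first-sub-band magnitude spectra of the sample frame data
    noisy = torch.rand(8, 100, 241)
    clean = torch.rand(8, 100, 241)
    for _ in range(10):                            # iterate until convergence
        denoised, _ = model(noisy)
        loss = loss_fn(denoised, clean)
        optimizer.zero_grad(); loss.backward(); optimizer.step()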
S130, based on the gain of the first sub-band, determining the gains of other sub-bands except the first sub-band, and respectively carrying out noise reduction processing on the other sub-bands according to the gains of the other sub-bands.
In the embodiment of the disclosure, since the frequency intervals of each subband are sequentially adjacent, and the correlation of the gains of adjacent frequency points is strong, the gains of each frequency point in the frequency intervals corresponding to other subbands can be determined based on the gain of the first subband. The method for determining the gains of the other subbands except the first subband based on the gain of the first subband may be, for example, sequentially determining the gains of the frequency points of the other subbands according to a gain mapping relationship between adjacent frequency points, or determining the gains of the other subbands according to the gain of the frequency point having a larger frequency value in the frequency interval of the first subband, or the like.
In some optional implementations of the embodiments of the present disclosure, determining the gains of the sub-bands other than the first sub-band based on the gain of the first sub-band includes: combining the average gain from a preset frequency point to the interval maximum within the frequency interval of the first sub-band with the decision probability, output by the noise reduction model, that the first sub-band belongs to a preset category, to determine the gains of the sub-bands other than the first sub-band.
The average gain from the preset frequency point to the interval maximum can be regarded as the average of the gains of the frequency points in the range from the preset frequency point to the interval maximum. Because the gains of the higher-frequency points in the frequency interval corresponding to the first sub-band are strongly correlated with the gains of the other sub-bands, which correspond to relatively high frequency intervals, the average gain from the preset frequency point to the interval maximum of the first sub-band can be used as one of the factors for determining the gains of the other sub-bands.
Besides reducing noise in the frequency information of the first sub-band, the noise reduction model may also detect the effective sound type of the audio frame data, where the effective sound type is, for example, a human voice type or an instrument sound type. When the decision probability for the preset category is larger, the probability that the audio frame data are noise can be considered smaller, and the corresponding gain is larger. Because, for the same audio frame data, the decision probabilities that the individual sub-bands belong to the preset category are strongly correlated, the decision probability that the first sub-band belongs to the preset category can be used as another factor for determining the gains of the other sub-bands.
In these optional embodiments, by combining the average gain from the preset frequency point to the interval maximum within the frequency interval of the first sub-band with the decision probability, output by the noise reduction model, that the first sub-band belongs to the preset category, gains of the other sub-bands that yield a better noise reduction effect can be determined.
S140, according to the frequency domain information after noise reduction of the first sub-band and the noise reduction processing result of other sub-bands, determining the noise-reduced audio frame data.
In the embodiment of the disclosure, the time domain information of the noise-reduced audio frame data can be obtained according to the frequency domain information after the noise reduction of the first sub-band and the noise reduction processing result of other sub-bands. In addition, the noise-reduced audio frame data can be synthesized to be restored into the noise-reduced audio data stream or audio data file, so that the time-domain audio data stream or audio file can be played conveniently.
In some optional implementations of the embodiments of the present disclosure, the noise reduction method of audio data is applied to noise reduction of voice data, and the corresponding frequency interval with the smallest interval maximum value includes a human voice frequency interval.
In these alternative embodiments, the noise reduction method may be applied to noise reduction of voice data; for example, in a noisy communication environment, noise reduction may be performed in real time on the voice stream input by a speaker, or on a recorded voice file, etc. In addition, when the method is applied to noise reduction of voice data, the frequency interval with the smallest interval maximum value needs to contain the human voice frequency range of the voice data, so as to improve the noise reduction effect on the voice data.
According to the technical scheme, the audio data after framing is divided into the sub-bands, only the first sub-band corresponding to the low frequency interval is subjected to noise reduction through the noise reduction model, and the gains of other sub-bands are mapped according to the gain of the first sub-band output by the noise reduction model so as to reduce the noise of the other sub-bands. Compared with the traditional method for reducing the noise of the whole frequency band by using the noise reduction model, the method reduces the overall noise reduction time consumption and improves the noise reduction efficiency.
In addition, the noise reduction model is only applied to noise reduction of the first sub-band rather than full-band noise reduction, and can be trained using only sample data of the first sub-band, so the model training efficiency can be improved. Especially in the case of model training for high-sampling-rate audio data, training on only the sample data of the first sub-band greatly reduces the model parameters and model complexity and improves the model training efficiency.
Example two
The embodiments of the present disclosure may be combined with each of the alternatives in the noise reduction method for audio data provided in the above embodiments. The noise reduction method for audio data provided by this embodiment refines the steps of performing sub-band processing in the time domain and correspondingly synthesizing the noise-reduced audio frame data in the time domain, etc., and enriches the sub-band processing modes. In addition, performing downsampling reduces the amount of computation of the noise reduction model and improves the noise reduction efficiency to a certain extent, while performing upsampling preserves the sound quality of the audio frame data.
Fig. 2 is a flowchart of a noise reduction method for audio data according to a second embodiment of the disclosure. Referring to fig. 2, the noise reduction method for audio data provided in this embodiment includes:
s210, carrying out sub-band processing on the audio frame data in a time domain, wherein the frequency intervals of all the sub-bands are different, and the sub-band corresponding to the frequency interval with the smallest interval maximum value is taken as a first sub-band.
Performing sub-band processing on the audio frame data in the time domain may include: performing sub-band processing on the audio frame data through an analysis filter bank, wherein the pass bands of the individual filters in the analysis filter bank are different.
The filters in the analysis filter bank may be quadrature mirror filters, discrete cosine modulated filters, or other filters applicable to subband processing, which is not intended to be exhaustive. The number of filters in the filter bank may be equal to the number of subbands required by the user, and the pass band of each filter may correspond to a frequency interval required by each subband, for example, may be respectively equal to a frequency interval required by each subband. By respectively convolving the audio frame data of the time domain with each filter, the time domain information of different sub-bands can be obtained, so as to realize the division of different sub-bands.
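A sketch of this time-domain split, assuming three adjacent sub-bands of a 48 kHz signal and plain FIR band filters designed with scipy.signal.firwin as a stand-in for the quadrature mirror or cosine modulated filters mentioned above:

    # Time-domain sub-band analysis by convolving the frame with a bank of
    # band filters; the FIR designs below are illustrative stand-ins.
    import numpy as np
    from scipy.signal import firwin, lfilter

    FS = 48000
    EDGES = [(0, 8000), (8000, 16000), (16000, 24000)]   # adjacent sub-band intervals

    def design_analysis_bank(num_taps=65):
        bank = []
        for lo, hi in EDGES:
            if lo == 0:
                bank.append(firwin(num_taps, hi, fs=FS))                   # low-pass
            elif hi >= FS / 2:
                bank.append(firwin(num_taps, lo, fs=FS, pass_zero=False))  # high-pass
            else:
                bank.append(firwin(num_taps, [lo, hi], fs=FS, pass_zero=False))
        return bank

    def split_subbands(frame, bank):
        # One time series per sub-band; the first entry is the first sub-band.
        return [lfilter(h, 1.0, frame) for h in bank]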
S220, inputting the frequency domain information of the first sub-band into a noise reduction model so that the noise reduction model outputs the frequency domain information after noise reduction of the first sub-band and the gain of the first sub-band.
In the embodiment of the disclosure, after obtaining the time domain information of each sub-band, only the time domain information of the first sub-band may be subjected to time-frequency transformation processing, without transforming the time domain data of other sub-bands. The time domain information of the first sub-band may be fourier transformed to obtain frequency domain information of the first sub-band. By using the noise reduction model, the noise reduction of the frequency domain information of the first sub-band can be realized, and the gains corresponding to the frequency points in the frequency domain section corresponding to the first sub-band are output.
S230, based on the gain of the first sub-band, the gains of other sub-bands except the first sub-band are determined, and the time domain information of the other sub-bands is subjected to noise reduction processing according to the gains of the other sub-bands.
After obtaining the gains of other sub-bands, the corresponding gains can be directly multiplied by the time domain information of the other sub-bands, so as to realize noise reduction processing on the time domain information of the other sub-bands.
S240, converting the frequency domain information after noise reduction of the first sub-band into time domain information, and synthesizing the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of other sub-bands to obtain the time domain information of the audio frame data after noise reduction.
The frequency domain information after the noise reduction of the first sub-band may be converted into time domain information by inverse fourier transform. The time domain information of the noise-reduced audio frame data can be obtained by synthesizing the time domain information after noise reduction of each sub-band.
Synthesizing the noise-reduced time domain information of the first sub-band with the noise-reduced time domain information of the other sub-bands may include: synthesizing the noise-reduced time domain information of the first sub-band and the noise-reduced time domain information of the other sub-bands through a synthesis filter bank. Each filter in the synthesis filter bank needs to correspond to a filter in the analysis filter bank in order to accurately synthesize the sub-banded audio frame data.
Some further implementations of the embodiments of the present disclosure further include: performing downsampling processing on the audio frame data before the audio frame data are sub-band processed by the analysis filter bank; or performing downsampling processing on the time domain information of the first sub-band after the audio frame data are sub-band processed by the analysis filter bank;
correspondingly, before the noise-reduced time domain information of the first sub-band and the noise-reduced time domain information of the other sub-bands are synthesized by the synthesis filter bank, performing upsampling processing on the noise-reduced time domain information of the first sub-band and of the other sub-bands; or, after the noise-reduced time domain information of the first sub-band and of the other sub-bands has been synthesized by the synthesis filter bank, performing upsampling processing on the time domain information of the noise-reduced audio frame data.
According to the Noble identities of multirate signal processing, downsampling the audio frame data first and then sub-band processing them through the analysis filter bank can be equivalent to performing the downsampling on the time domain information of the first sub-band after the analysis filtering. Similarly, upsampling the noise-reduced time domain information of the first sub-band and of the other sub-bands and then synthesizing them through the synthesis filter bank can be equivalent to upsampling the synthesized time domain information after the noise-reduced time domain information of the sub-bands has been combined by the synthesis filter bank.
In these further implementations, by the downsampling operation, the amount of data to be calculated by the noise reduction model can be reduced, so that the filtering efficiency of the noise reduction model can be improved to some extent. Through the up-sampling operation, the tone quality of the audio frame data can be restored to a certain extent, and the hearing experience of a user when the audio frame data is played is ensured.
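A small sketch of the down- and up-sampling steps referred to here; the indexing convention is an assumption, since the patent's own expressions are not reproduced above.

    # Down-/up-sampling sketch around the analysis/synthesis steps; M is the
    # number of sub-bands.
    import numpy as np

    def polyphase_split(x, M):
        """M downsampled groups: dec[m, n] corresponds to x[n*M + m]."""
        usable = (len(x) // M) * M
        return x[:usable].reshape(-1, M).T        # shape (M, usable // M)

    def upsample(x, M):
        """Zero-insertion upsampling; normally followed by a synthesis filter."""
        y = np.zeros(len(x) * M, dtype=float)
        y[::M] = x
        return y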
For example, when the number of sub-bands required by the user is 3, the process of noise-reducing the audio frame data x(n), n = 1, 2, ..., L (where L is the total number of sampling points) may be:
First, the filter bank may be designed for the number of subbands, and for the frequency range of the first subband.
For example, 3 discrete cosine modulated filters may be used to sub-band the audio frame data: h_0(n) can represent a prototype low-pass filter, and h_1(n), h_2(n) and h_3(n) may represent the 3 discrete cosine modulated filters obtained by modulating it; N may be equal to the number of filters, and the filter length may be len.
At the same time, audio frame data may be downsampled.
For example, the audio frame data may be downsampled by a factor of M into M groups, where dec(m, n) can represent the time domain information of the nth sampling point of the mth group after downsampling, and M may be equal to the number of sub-bands.
Second, the filters h_1(n), h_2(n) and h_3(n) can be used to perform sub-band processing on each group of downsampled audio frame data. The sub-band processing can be expressed as:

x(m, n) = dec(m, n) × h_m(n);

where x(m, n) may represent the time domain information of the nth point of the mth sub-band. The time domain information of each sub-band can be obtained by convolving each group of downsampled time domain information in the time domain with the corresponding filter in the analysis filter bank.
Again, a fast Fourier transform (FFT) may be applied only to the time domain information of the first sub-band; the other sub-bands do not undergo this processing. The FFT of the time domain information of the first sub-band can be simplified as:

S(1, k) = fft(x(1, n));

where S(1, k) may represent the frequency domain information corresponding to the kth point in the first sub-band, and x(1, n) may represent the time domain information of the nth point of the 1st sub-band (i.e., the first sub-band).
Then, S (1, k) is input into a noise reduction model, so that the noise reduction model outputs the frequency domain information after noise reduction of the first sub-band and the gain of the first sub-band; based on the gain of the first sub-band, determining the gains of other sub-bands except the first sub-band, and respectively carrying out noise reduction processing on the time domain information of the other sub-bands according to the gains of the other sub-bands; and converting the frequency domain information after noise reduction of the first sub-band into time domain information.
And then, synthesizing the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of other sub-bands through a comprehensive filter bank.
Each filter in the synthesis filter bank needs to correspond to a filter in the analysis filter bank. For example, 3 discrete cosine modulated filters may be used as the filters in the synthesis filter bank, where the filters f_1(n), f_2(n) and f_3(n) in the synthesis filter bank correspond to h_1(n), h_2(n) and h_3(n), respectively. The noise-reduced time domain information of each sub-band is filtered by f_1(n), f_2(n) and f_3(n), and the filtering may be expressed as:
syn(m, n) = out(m, n) × f_m(n);
where out(m, n) may represent the noise-reduced time domain information of the nth sampling point of the mth sub-band, and syn(m, n) may represent the time domain information of the nth sampling point of the mth sub-band after filtering. Convolving the time domain information of each sub-band with the corresponding sub-band synthesis filter lays a foundation for synthesizing the noise-reduced audio frame data.
Finally, syn(m, n) is upsampled by a factor of M, up(m, n) = upsample(syn(m, n)), and the final noise-reduced, enhanced audio frame data are obtained by synthesizing (summing) the upsampled sub-bands:

enh(n) = up(1, n) + up(2, n) + ... + up(M, n);

where enh(n) may represent the final enhanced audio data.
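Pulling this example together, a compact end-to-end sketch of the time-domain variant might look as follows. It reuses the illustrative split/design helpers sketched earlier, stubs out the noise reduction model as model_gain_fn, and uses time-reversed analysis filters for synthesis and a simple geometric-mean gain mapping; these are assumptions for illustration, not the patent's exact filters or formulas.

    # End-to-end sketch of the time-domain variant (all concrete choices are
    # illustrative; model_gain_fn stands in for the trained noise reduction model).
    import numpy as np
    from scipy.signal import lfilter

    def denoise_frame_time_domain(frame, analysis_bank, model_gain_fn):
        # 1. Time-domain sub-band split; subbands[0] is the first (low) sub-band.
        subbands = [lfilter(h, 1.0, frame) for h in analysis_bank]

        # 2. FFT only the first sub-band and denoise it with the model.
        S1 = np.fft.rfft(subbands[0])
        gain1, vad = model_gain_fn(S1)            # per-bin gains + speech probability
        S1_denoised = S1 * gain1

        # 3. Map the first-sub-band gain to one gain for the remaining sub-bands.
        bw = len(gain1) // 4
        gain_other = np.sqrt(gain1[-bw:].mean() * vad)   # assumed fusion rule

        # 4. Noise-reduce the other sub-bands directly in the time domain.
        out = [np.fft.irfft(S1_denoised, n=len(subbands[0]))]
        out += [sb * gain_other for sb in subbands[1:]]

        # 5. Synthesis: filter each band again (time-reversed filters) and sum.
        synthesis_bank = [h[::-1] for h in analysis_bank]
        return sum(lfilter(f, 1.0, o) for f, o in zip(synthesis_bank, out))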
According to the technical scheme of this embodiment, the steps of performing sub-band processing in the time domain and correspondingly synthesizing the noise-reduced audio frame data in the time domain, etc. are refined, and the sub-band processing modes are enriched. In addition, performing downsampling reduces the amount of computation of the noise reduction model and improves the noise reduction efficiency to a certain extent, while performing upsampling preserves the sound quality of the audio frame data. The noise reduction method for audio data provided by this embodiment of the present disclosure belongs to the same technical concept as the noise reduction methods provided by the above embodiments; technical details not described in detail in this embodiment can be found in the above embodiments, and the same technical features have the same beneficial effects in this embodiment as in the above embodiments.
Example III
The embodiments of the present disclosure may be combined with each of the alternatives in the noise reduction method for audio data provided in the above embodiments. The noise reduction method for audio data provided by this embodiment refines the steps of performing sub-band processing in the frequency domain and correspondingly synthesizing the noise-reduced audio frame data in the frequency domain, etc., and enriches the sub-band processing modes.
Fig. 3 is a flowchart of a noise reduction method for audio data according to a third embodiment of the disclosure. Referring to fig. 3, the noise reduction method for audio data provided in this embodiment includes:
s310, after converting the audio frame data into frequency domain information, carrying out sub-band processing on the frequency domain information of the audio frame data, wherein the frequency intervals of all sub-bands are different, and the sub-band corresponding to the frequency interval with the smallest interval maximum value is used as a first sub-band.
The time domain information of the audio frame data can be directly subjected to fourier transformation to obtain the frequency domain information of the full-band audio frame data. Further, performing the subband processing on the frequency domain information of the audio frame data may include: and grouping the frequency domain information of the audio frame data according to the frequency interval to obtain the frequency domain information of each sub-band.
The frequency domain information of the audio frame data may include the component information of each frequency point corresponding to each sampling point after time-frequency transformation of the current audio frame data, for example, how large a component is present at frequency points of 1 kHz, 2 kHz, 3 kHz, ..., 24 kHz, and so on. Grouping the frequency domain information of the audio frame data by frequency interval can be understood as dividing the frequency range into intervals by frequency points and using the frequency domain information in each interval as the frequency domain information of a different sub-band.
Illustratively, the frequency intervals are divided at frequency points: for example, if the frequency range of the full band is 0 Hz to 24 kHz, the two frequency points 8 kHz and 16 kHz can be used as division points, and 0 Hz to 24 kHz can be divided into the three frequency intervals 0 Hz-8 kHz, 8 kHz-16 kHz and 16 kHz-24 kHz. The frequency domain information in each frequency interval can be regarded as the frequency domain information of a different sub-band.
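A sketch of this grouping step; the frequency edges and the handling of the Nyquist bin are illustrative assumptions.

    # Frequency-domain sub-band grouping: one FFT of the whole frame, then the
    # bins are sliced by the chosen frequency intervals (values illustrative).
    import numpy as np

    FS = 48000
    EDGES_HZ = [0, 8000, 16000, 24000]        # adjacent intervals; first = lowest

    def split_spectrum(frame):
        spec = np.fft.rfft(frame)
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / FS)
        bands = []
        for i, (lo, hi) in enumerate(zip(EDGES_HZ[:-1], EDGES_HZ[1:])):
            if i == len(EDGES_HZ) - 2:
                idx = freqs >= lo             # last band keeps the Nyquist bin
            else:
                idx = (freqs >= lo) & (freqs < hi)
            bands.append(spec[idx])           # frequency domain info per sub-band
        return bands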
S320, inputting the frequency domain information of the first sub-band into a noise reduction model so that the noise reduction model outputs the frequency domain information after noise reduction of the first sub-band and the gain of the first sub-band.
Wherein after determining the frequency domain information of the first sub-band, only the frequency domain information of the first sub-band may be input to the noise reduction model without inputting the frequency domain data of the other sub-bands to the noise reduction model.
S330, based on the gain of the first sub-band, the gains of other sub-bands except the first sub-band are determined, and the noise reduction processing is performed on the frequency domain information of the other sub-bands according to the gains of the other sub-bands.
After obtaining the gains of other sub-bands, the frequency domain information of the other sub-bands can be directly utilized to multiply the corresponding gains so as to realize the noise reduction processing of the frequency domain information of the other sub-bands.
And S340, splicing the frequency domain information after noise reduction of the first sub-band and the frequency domain information after noise reduction of other sub-bands to obtain the frequency domain information after noise reduction of the audio frame data.
The method for splicing the frequency domain information after noise reduction of the first sub-band and the frequency domain information after noise reduction of other sub-bands comprises the following steps: and splicing the frequency domain information after noise reduction of the first sub-band and the frequency domain information after noise reduction of other sub-bands according to the frequency interval.
Since the sub-band division is performed over frequency intervals, splicing the noise-reduced frequency domain information of the sub-bands back together according to the divided frequency intervals yields the noise-reduced full-band frequency domain information of the audio frame data.
S350, converting the frequency domain information after the noise reduction of the audio frame data into time domain information, and obtaining the time domain information of the frame data of the noise reduced audio data.
The inverse fourier transform may be used to transform the frequency domain information of the audio frame data after noise reduction into time domain information, so as to obtain the time domain information of the audio frame data after noise reduction.
Illustratively, when the number of sub-bands required by the user is 3, the process of noise-reducing the audio frame data x(n), n = 1, 2, ..., L (where L is the total number of sampling points) may also be:
First, an L-point FFT is performed on the audio frame data x(n), which can be simply expressed as:

X(k) = fft(x(n));

where X(k) is the frequency domain information of the kth point in the audio frame data.
Next, X(k) is grouped according to the frequency intervals to obtain the frequency domain information of each sub-band, where M may represent the number of sub-bands and S(m, k) may represent the frequency domain information corresponding to the kth point in the mth sub-band.
Thirdly, S (1, k) can be input into the noise reduction model, so that the noise reduction model outputs the frequency domain information after noise reduction of the first sub-band and the gain of the first sub-band; and determining the gains of other sub-bands except the first sub-band based on the gain of the first sub-band, and respectively carrying out noise reduction processing on the frequency domain information of the other sub-bands according to the gains of the other sub-bands.
Then, all the sub-bands S (m, k) can be spliced together to obtain full-band frequency domain information S (k), and the time domain information of the frame data of the noise-reduced audio data can be obtained through inverse FFT, and the inverse FFT can be simply expressed as:
enh(n) = ifft(S(k));
Where enh (n) may represent the final enhanced audio data.
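A compact sketch of this frequency-domain variant, reusing the illustrative split_spectrum helper above; model_gain_fn stands in for the trained model, and the gain mapping is the same assumed fusion rule as in the earlier sketch, not the patent's exact formula.

    # End-to-end sketch of the frequency-domain variant (assumptions as noted above).
    import numpy as np

    def denoise_frame_freq_domain(frame, model_gain_fn):
        bands = split_spectrum(frame)                   # bands[0] is the first sub-band
        gain1, vad = model_gain_fn(bands[0])            # model sees only the first band
        bands[0] = bands[0] * gain1

        bw = len(gain1) // 4
        gain_other = np.sqrt(gain1[-bw:].mean() * vad)  # assumed gain mapping
        bands[1:] = [b * gain_other for b in bands[1:]]

        full_spec = np.concatenate(bands)               # splice sub-bands by interval
        return np.fft.irfft(full_spec, n=len(frame))    # noise-reduced time-domain frame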
According to the technical scheme of this embodiment, the steps of performing sub-band processing in the frequency domain and correspondingly synthesizing the noise-reduced audio frame data in the frequency domain, etc. are refined, and the sub-band processing modes are enriched. In addition, the noise reduction method for audio data provided by this embodiment of the present disclosure belongs to the same technical concept as the noise reduction methods provided by the above embodiments; technical details not described in detail in this embodiment can be found in the above embodiments, and the same technical features have the same beneficial effects in this embodiment as in the above embodiments.
Example IV
Embodiments of the present disclosure may be combined with each of the alternatives in the noise reduction method of audio data provided in the above embodiments. According to the noise reduction method for the audio data, the step of determining the gains of other sub-bands is optimized, mapping from the gain of the first sub-band to the gain of the other sub-bands can be achieved, and a foundation is laid for noise reduction processing of the other sub-bands.
In some optional implementations of the embodiments of the present disclosure, combining the average gain from the preset frequency point to the interval maximum within the frequency interval of the first sub-band with the decision probability, output by the noise reduction model, that the first sub-band belongs to the preset category, to determine the gains of the sub-bands other than the first sub-band, includes:
taking the average gain from the preset frequency point to the interval maximum within the frequency interval of the first sub-band as a first gain factor; determining, from the first gain factor and the decision probability output by the noise reduction model that the first sub-band belongs to the preset category, the decision probability that the sub-bands other than the first sub-band belong to the preset category; determining a second gain factor of the other sub-bands from the decision probability that the other sub-bands belong to the preset category; and combining the first gain factor, the second gain factor and the decision probability that the other sub-bands belong to the preset category to determine the gains of the sub-bands other than the first sub-band.
Illustratively, it is assumed that in the current audio frame data, the gain of the kth frequency of the mth subband is denoted as G (m, k), and the gain G (1, k) of the first subband may be simply denoted as G (k) for convenience of description.
Because the gains of the higher-frequency points in the frequency interval corresponding to the first sub-band are strongly correlated with the gains of the other sub-bands, which correspond to relatively high frequency intervals, the average gain from the preset frequency point to the interval maximum within the frequency interval of the first sub-band can be taken as the first gain factor for determining the gains of the other sub-bands, which helps ensure a smooth transition between the sub-bands after noise reduction according to the gains.
The first gain factor can be computed as the average of the first-sub-band gains G(k) over the frequency points from the preset frequency point to the interval maximum. Here avgGainH may represent the first gain factor; nFFT is a constant value and can represent the total number of frequency points after time-frequency transformation; bw is also a constant value and can represent the number of frequency points from the preset frequency point to the interval maximum, and is typically an empirical value, for example in the range of 1/4 to 1/3 of nFFT.
The decision probability of the first sub-band estimated by the noise reduction model belonging to the preset category may be considered as the decision probability of the current audio frame data belonging to the preset category, and may be denoted as vad.
The decision probability that the sub-bands other than the first sub-band belong to the preset category may be determined from the first gain factor avgGainH and the decision probability vad, output by the noise reduction model, that the first sub-band belongs to the preset category, for example as the square root of their product:

avgProbH = sqrt(avgGainH × vad);

where avgProbH may represent the decision probability that the sub-bands other than the first sub-band belong to the preset category. Besides the square root of avgGainH and vad, avgProbH may also be obtained as a weighted sum of avgGainH and vad, and other ways of fusing avgGainH and vad to determine the decision probability that the other sub-bands belong to the preset category are also applicable here, which is not specifically limited.
The second gain factor of the other sub-bands is determined from the decision probability avgProbH that the other sub-bands belong to the preset category, for example by applying the hyperbolic tangent tanh to it. Here gainH may represent the second gain factor of the other sub-bands; gainH needs to be positively correlated with avgProbH, and its value needs to lie in the range [0, 1]. Besides determining gainH by the hyperbolic tangent tanh, other ways of determining gainH that keep it positively correlated with avgProbH and within the range [0, 1] are also applicable here, which is not specifically limited.
The gain of the sub-bands other than the first sub-band is determined by combining the first gain factor avgGainH, the second gain factor gainH and the decision probability avgProbH that the other sub-bands belong to the preset category, where gain may represent the gain of the sub-bands other than the first sub-band. Because the relatively high frequency intervals corresponding to the sub-bands other than the first sub-band contain less component information of effective sound, setting the gains of all frequency points in the other sub-bands to the same gain value can meet their noise reduction requirements.
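The exact formulas for avgGainH, avgProbH, gainH and gain are not reproduced above; the sketch below follows the described structure (an average of the top-of-band gains, a square-root fusion with the speech probability, a tanh-shaped second factor, and one combined gain for the high bands), with the precise combinations being assumptions rather than the patented equations.

    # Sketch of the gain mapping of this example; the concrete combinations
    # (geometric mean, tanh scaling, final product) are assumptions.
    import numpy as np

    def map_high_band_gain(gain_first, vad, bw):
        avg_gain_h = gain_first[-bw:].mean()       # first gain factor (avgGainH)
        avg_prob_h = np.sqrt(avg_gain_h * vad)     # high-band decision prob. (avgProbH)
        gain_h = np.tanh(3.0 * avg_prob_h)         # second gain factor in [0, 1] (gainH)
        gain = avg_gain_h * gain_h * avg_prob_h    # one gain for all high-band bins
        return float(np.clip(gain, 0.0, 1.0))

    g1 = np.linspace(0.2, 0.9, 241)                # example first-sub-band gains
    print(map_high_band_gain(g1, vad=0.8, bw=60))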
The above technical solution optimizes the step of determining the gains of the other sub-bands: it realizes the mapping from the gain of the first sub-band to the gains of the other sub-bands and lays the foundation for noise reduction processing of the other sub-bands. In addition, the noise reduction method of audio data provided by this embodiment of the present disclosure belongs to the same technical concept as the noise reduction method of audio data provided by the above embodiments; technical details not described in detail in this embodiment can be found in the above embodiments, and the same technical features have the same beneficial effects in this embodiment as in those embodiments.
Example five
Fig. 4 is a schematic structural diagram of a noise reduction device for audio data according to a fifth embodiment of the present disclosure. The noise reduction device for audio data provided by this embodiment is suitable for noise reduction of audio data, and is particularly suitable for real-time noise reduction of audio data with a high sampling rate.
As shown in fig. 4, the noise reduction apparatus of audio data includes:
a subband module 410, configured to perform subband processing on the audio frame data, where frequency intervals of each subband are different, and a subband corresponding to a frequency interval with a minimum interval maximum value is used as a first subband;
the first noise reduction module 420 is configured to input the frequency domain information of the first subband into a noise reduction model, so that the noise reduction model outputs the frequency domain information after noise reduction of the first subband and the gain of the first subband;
the second noise reduction module 430 is configured to determine gains of other subbands except the first subband based on the gain of the first subband, and perform noise reduction processing on the other subbands according to the gains of the other subbands, respectively;
the noise reduction data determining module 440 is configured to determine noise-reduced audio frame data according to the frequency domain information after noise reduction of the first sub-band and the result of noise reduction processing on other sub-bands.
In some optional implementations of embodiments of the present disclosure, the sub-band module includes:
a time-domain sub-band sub-module, configured to perform sub-band processing on the audio frame data in the time domain;
correspondingly, the second noise reduction module is used for respectively carrying out noise reduction treatment on the time domain information of other sub-bands according to the gains of the other sub-bands;
the noise reduction data determining module is used for converting the frequency domain information after noise reduction of the first sub-band into time domain information, and synthesizing the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of other sub-bands to obtain the time domain information of the noise reduction audio frame data.
In some further implementations of embodiments of the present disclosure, the time-domain sub-band sub-module is specifically configured to perform sub-band processing on the audio frame data through an analysis filter bank, where the pass bands of the filters in the analysis filter bank are different from one another;
correspondingly, the noise reduction data determining module is specifically configured to synthesize, through the synthesis filter bank, the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of the other sub-bands.
In some further implementations of embodiments of the present disclosure, the time domain molecular band sub-module is further configured to downsample the audio frame data before sub-band processing the audio frame data by the analysis filter bank; or, after the audio frame data is subjected to the sub-band processing through the analysis filter bank, the time domain information of the first sub-band is subjected to the downsampling processing;
Correspondingly, the noise reduction data determining module is further configured to upsample the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of the other sub-bands before synthesizing the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of the other sub-bands through the synthesis filter bank; or after the time domain information after the noise reduction of the first sub-band and the time domain information after the noise reduction of other sub-bands are synthesized through the synthesis filter bank, the up-sampling processing is performed on the time domain information of the noise-reduced audio frame data.
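Illustratively, a two-band time-domain split with downsampling of the first sub-band, and the corresponding synthesis with upsampling, can be sketched as follows; the Butterworth filters, the 8 kHz cut-off, and the factor-of-2 resampling are assumptions for illustration (a practical analysis/synthesis pair, e.g. a QMF bank, would be designed for near-perfect reconstruction):

```python
import numpy as np
from scipy.signal import butter, sosfilt, resample_poly

def split_two_bands(frame, fs, cutoff_hz=8000.0):
    """Two-band analysis sketch: a low band (first sub-band) and a high band,
    followed by 2x downsampling of the low band so the noise reduction model
    can run at a lower sampling rate."""
    sos_low = butter(8, cutoff_hz, btype="low", fs=fs, output="sos")
    sos_high = butter(8, cutoff_hz, btype="high", fs=fs, output="sos")
    low = sosfilt(sos_low, frame)
    high = sosfilt(sos_high, frame)
    low_ds = resample_poly(low, up=1, down=2)   # downsample the first sub-band
    return low_ds, high

def synthesize_two_bands(low_denoised_ds, high_denoised):
    """Synthesis sketch: upsample the denoised first sub-band back to the
    original rate and add the bands to rebuild the time-domain frame."""
    low_us = resample_poly(low_denoised_ds, up=2, down=1)
    n = min(len(low_us), len(high_denoised))
    return low_us[:n] + high_denoised[:n]
```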
In some optional implementations of embodiments of the present disclosure, the sub-band module includes:
a frequency-domain sub-band sub-module, configured to convert the audio frame data into frequency domain information and then perform sub-band processing on the frequency domain information of the audio frame data;
correspondingly, the second noise reduction module is used for respectively carrying out noise reduction treatment on the frequency domain information of other sub-bands according to the gains of the other sub-bands;
the noise reduction data determining module is used for splicing the frequency domain information after noise reduction of the first sub-band and the frequency domain information after noise reduction of the other sub-bands to obtain the frequency domain information of the noise-reduced audio frame data, and converting that frequency domain information into time domain information to obtain the time domain information of the noise-reduced audio frame data.
In some further implementations of the embodiments of the present disclosure, the frequency-domain sub-band sub-module is specifically configured to group the frequency domain information of the audio frame data according to frequency intervals to obtain the frequency domain information of each sub-band;
correspondingly, the noise reduction data determining module is specifically configured to splice the frequency domain information after noise reduction of the first sub-band and the frequency domain information after noise reduction of other sub-bands according to the frequency interval.
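Illustratively, grouping FFT bins by frequency interval and splicing them back in frequency order after noise reduction can be sketched as follows; the single low/high split and the bin count are assumptions for illustration:

```python
import numpy as np

def split_bins_by_interval(frame, num_low_bins):
    """Frequency-domain sub-band sketch: FFT the frame and group bins by
    frequency interval; the lowest num_low_bins bins form the first sub-band."""
    spectrum = np.fft.rfft(frame)
    first_band = spectrum[:num_low_bins]
    other_bands = spectrum[num_low_bins:]
    return first_band, other_bands

def splice_and_invert(first_band_denoised, other_bands_denoised, frame_len):
    """Splice the denoised bins back according to the frequency interval and
    convert the result to the time domain."""
    spectrum = np.concatenate([first_band_denoised, other_bands_denoised])
    return np.fft.irfft(spectrum, n=frame_len)
```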
In some optional implementations of embodiments of the present disclosure, a training manner of a noise reduction model includes: acquiring sample frame data which are the same as the frequency interval of the first sub-band and target frequency domain information of the sample frame data; and training the noise reduction model by utilizing the frequency domain information of the sample frame data and the target frequency domain information until the noise reduction model converges.
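Illustratively, training such a first-sub-band noise reduction model on pairs of noisy and clean (target) frequency-domain information might be sketched as below; the small MLP architecture, the sigmoid heads, the MSE loss, and the Adam optimizer are assumptions, since this disclosure does not fix the model structure or loss:

```python
import torch
from torch import nn

class FirstBandDenoiser(nn.Module):
    """Hypothetical stand-in model: maps noisy first-sub-band magnitudes to
    denoised magnitudes, per-bin gains, and a decision probability."""

    def __init__(self, num_bins):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(num_bins, 256), nn.ReLU(),
                                      nn.Linear(256, num_bins + 1))

    def forward(self, noisy_mag):
        out = self.backbone(noisy_mag)
        gains = torch.sigmoid(out[..., :-1])   # per-bin gains in [0, 1]
        vad = torch.sigmoid(out[..., -1])      # decision probability
        return noisy_mag * gains, gains, vad

def train(model, loader, epochs=10, lr=1e-3):
    """Train on (noisy, clean) first-sub-band magnitude pairs until convergence."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for noisy_mag, clean_mag in loader:
            denoised, _, _ = model(noisy_mag)
            loss = loss_fn(denoised, clean_mag)   # clean_mag = target frequency-domain info
            opt.zero_grad()
            loss.backward()
            opt.step()
```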
In some optional implementations of embodiments of the present disclosure, the second noise reduction module includes:
a gain mapping sub-module, configured to determine the gains of the sub-bands other than the first sub-band by combining the average gain from the preset frequency point value to the interval maximum within the frequency interval of the first sub-band with the decision probability, output by the noise reduction model, that the first sub-band belongs to the preset category.
In some further implementations of the embodiments of the present disclosure, the gain mapping sub-module is specifically configured to: take the average gain from a preset frequency point value to the interval maximum within the frequency interval of the first sub-band as a first gain factor; determine the decision probability that the sub-bands other than the first sub-band belong to a preset category according to the first gain factor and the decision probability, output by the noise reduction model, that the first sub-band belongs to the preset category; determine a second gain factor of the other sub-bands according to the decision probability that the other sub-bands belong to the preset category; and combine the first gain factor, the second gain factor, and the decision probability that the other sub-bands belong to the preset category to determine the gains of the sub-bands other than the first sub-band.
In some optional implementations of the embodiments of the present disclosure, the frequency intervals of the sub-bands are adjacent in sequence.
In some optional implementations of the embodiments of the present disclosure, the noise reduction device for audio data is applied to noise reduction for voice data, and the corresponding frequency interval with the smallest interval maximum value includes a human voice frequency interval.
The noise reduction device for the audio data provided by the embodiment of the disclosure can execute the noise reduction method for the audio data provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
It should be noted that each unit and module included in the above apparatus are only divided according to the functional logic, but not limited to the above division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for convenience of distinguishing from each other, and are not used to limit the protection scope of the embodiments of the present disclosure.
Example six
Referring now to fig. 5, a schematic diagram of an electronic device (e.g., a terminal device or server in fig. 5) 500 suitable for implementing a method of noise reduction of audio data in accordance with an embodiment of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 5 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 5, the electronic device 500 may include a processing means (e.g., a central processor, a graphics processor, etc.) 501 that may perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 508 including, for example, magnetic tape, hard disk, etc.; and communication means 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 shows an electronic device 500 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or from the storage means 508, or from the ROM 502. When the computer program is executed by the processing device 501, the above-described functions defined in the noise reduction method of audio data of the embodiment of the present disclosure are performed.
The electronic device provided by the embodiment of the present disclosure and the noise reduction method of audio data provided by the above embodiment belong to the same disclosure concept, and technical details not described in detail in the embodiment of the present disclosure may be referred to the above embodiment, and the present embodiment has the same beneficial effects as the above embodiment.
Example seven
The present disclosure provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method of noise reduction of audio data provided by the above embodiments.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (EPROM) or FLASH Memory (FLASH), an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
carrying out sub-band processing on the audio frame data, wherein the frequency intervals of the sub-bands are different, and the sub-band corresponding to the frequency interval with the smallest interval maximum value is used as a first sub-band; inputting the frequency domain information of the first sub-band into a noise reduction model so that the noise reduction model outputs the frequency domain information of the first sub-band after noise reduction and the gain of the first sub-band; based on the gain of the first sub-band, determining the gains of other sub-bands except the first sub-band, and respectively carrying out noise reduction processing on the other sub-bands according to the gains of the other sub-bands; and determining the noise-reduced audio frame data according to the frequency domain information after noise reduction of the first sub-band and the noise reduction processing result of the other sub-bands.
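Illustratively, the four operations listed above can be strung together per frame as follows; split_subbands, denoise_model, map_gain, and merge are hypothetical helpers standing in for the operations named in this disclosure:

```python
def noise_reduce_frame(frame, split_subbands, denoise_model, map_gain, merge):
    """One frame through the pipeline: split into sub-bands, denoise the first
    (lowest) sub-band with the model, map its gain to the remaining sub-bands,
    then recombine into the noise-reduced frame."""
    first_band, other_bands = split_subbands(frame)
    first_denoised, first_gain, vad = denoise_model(first_band)
    other_gain = map_gain(first_gain, vad)
    others_denoised = [band * other_gain for band in other_bands]
    return merge(first_denoised, others_denoised, len(frame))
```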
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The names of the units and modules do not in some cases limit the units and modules themselves, and the first noise reduction module may be described as a "first subband noise reduction module", for example.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a field programmable gate array (Field Programmable Gate Array, FPGA), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a special standard product (Application Specific Standard Parts, ASSP), a System On Chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a noise reduction method of audio data, the method comprising:
carrying out sub-band processing on the audio frame data, wherein the frequency intervals of the sub-bands are different, and the sub-band corresponding to the frequency interval with the smallest interval maximum value is used as a first sub-band;
inputting the frequency domain information of the first sub-band into a noise reduction model so that the noise reduction model outputs the frequency domain information of the first sub-band after noise reduction and the gain of the first sub-band;
based on the gain of the first sub-band, determining the gains of other sub-bands except the first sub-band, and respectively carrying out noise reduction treatment on the other sub-bands according to the gains of the other sub-bands;
and determining the noise-reduced audio frame data according to the frequency domain information after noise reduction of the first sub-band and the noise reduction processing result of other sub-bands.
According to one or more embodiments of the present disclosure, there is provided a noise reduction method of audio data [ example two ], further comprising:
in some optional implementations of the embodiments of the present disclosure, the performing sub-band processing on the audio frame data includes:
carrying out sub-band processing on the audio frame data in the time domain;
Correspondingly, the noise reduction processing is performed on other sub-bands according to the gains of the other sub-bands, including: respectively carrying out noise reduction treatment on time domain information of other sub-bands according to gains of the other sub-bands;
the determining the noise-reduced audio frame data according to the frequency domain information after the noise reduction of the first sub-band and the noise reduction processing result of other sub-bands includes:
converting the frequency domain information after noise reduction of the first sub-band into time domain information, and synthesizing the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of other sub-bands to obtain the time domain information of the audio frame data after noise reduction.
According to one or more embodiments of the present disclosure, there is provided a noise reduction method of audio data [ example three ], further comprising:
in some optional implementations of the embodiments of the present disclosure, the performing the subband processing on the audio frame data in the time domain includes:
carrying out sub-band processing on the audio frame data through an analysis filter bank, wherein the pass bands of the filters in the analysis filter bank are different;
correspondingly, the synthesizing the time domain information after the noise reduction of the first sub-band and the time domain information after noise reduction of other sub-bands includes: synthesizing the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of the other sub-bands through a synthesis filter bank.
According to one or more embodiments of the present disclosure, there is provided a noise reduction method of audio data [ example four ], further comprising:
downsampling the audio frame data before the audio frame data is subjected to sub-band processing by the analysis filter bank; or, after the audio frame data is subjected to sub-band processing through the analysis filter bank, carrying out downsampling processing on the time domain information of the first sub-band;
correspondingly, before the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of other sub-bands are synthesized by the synthesis filter bank, up-sampling the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of the other sub-bands; or, after the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of the other sub-bands are synthesized by the synthesis filter bank, carrying out up-sampling processing on the time domain information of the noise-reduced audio frame data.
According to one or more embodiments of the present disclosure, there is provided a noise reduction method of audio data [ example five ], further comprising:
in some optional implementations of the embodiments of the present disclosure, the performing sub-band processing on the audio frame data includes:
after converting the audio frame data into frequency domain information, carrying out sub-band processing on the frequency domain information of the audio frame data;
correspondingly, the noise reduction processing is performed on other sub-bands according to the gains of the other sub-bands, including: respectively carrying out noise reduction treatment on the frequency domain information of other sub-bands according to the gains of the other sub-bands;
the determining the noise-reduced audio frame data according to the frequency domain information after the noise reduction of the first sub-band and the noise reduction processing result of other sub-bands includes:
splicing the frequency domain information after noise reduction of the first sub-band and the frequency domain information after noise reduction of other sub-bands to obtain the frequency domain information after noise reduction of the audio frame data; and converting the frequency domain information after the noise reduction of the audio frame data into time domain information to obtain the time domain information of the noise-reduced audio frame data.
According to one or more embodiments of the present disclosure, there is provided a noise reduction method of audio data [ example six ], further comprising:
in some optional implementations of the embodiments of the present disclosure, the performing a subband processing on the frequency domain information of the audio frame data includes:
grouping the frequency domain information of the audio frame data according to frequency intervals to obtain the frequency domain information of each sub-band;
Correspondingly, the splicing the frequency domain information after the noise reduction of the first sub-band and the frequency domain information after the noise reduction of other sub-bands includes:
and splicing the frequency domain information after noise reduction of the first sub-band and the frequency domain information after noise reduction of other sub-bands according to a frequency interval.
According to one or more embodiments of the present disclosure, there is provided a noise reduction method of audio data [ example seventh ], further comprising:
in some optional implementations of the embodiments of the present disclosure, the training manner of the noise reduction model includes:
acquiring sample frame data which are the same as the frequency interval of the first sub-band and target frequency domain information of the sample frame data;
and training the noise reduction model by utilizing the frequency domain information of the sample frame data and the target frequency domain information until the noise reduction model converges.
According to one or more embodiments of the present disclosure, there is provided a noise reduction method of audio data [ example eight ], further comprising:
in some optional implementations of the embodiments of the present disclosure, the determining, based on the gain of the first sub-band, the gain of the sub-band other than the first sub-band includes:
and integrating the average gain from a preset frequency point value to the maximum value of the interval in the frequency interval of the first sub-band and the judgment probability that the first sub-band output by the noise reduction model belongs to a preset category, and determining the gains of other sub-bands except the first sub-band.
According to one or more embodiments of the present disclosure, there is provided a noise reduction method of audio data [ example nine ], further comprising:
in some optional implementations of the embodiments of the present disclosure, the synthesizing, in a frequency interval of the first subband, an average gain from a preset frequency point value to the interval maximum value, and a decision probability that the first subband output by the noise reduction model belongs to a preset class, determining gains of other subbands except the first subband includes:
taking the average gain from a preset frequency point value to the interval maximum within the frequency interval of the first sub-band as a first gain factor;
determining the judgment probability that other sub-bands except the first sub-band belong to a preset category according to the first gain factor and the judgment probability that the first sub-band output by the noise reduction model belongs to the preset category;
determining a second gain factor of other sub-bands according to the judgment probability that the other sub-bands belong to the preset category;
and integrating the first gain factor, the second gain factor and the judgment probability that other subbands belong to a preset category to determine the gains of other subbands except the first subband.
According to one or more embodiments of the present disclosure, there is provided a noise reduction method of audio data [ example ten ], further comprising:
in some optional implementations of the embodiments of the disclosure, the frequency intervals of the sub-bands are adjacent in sequence.
According to one or more embodiments of the present disclosure, there is provided a noise reduction method of audio data [ example eleven ], further comprising:
in some optional implementations of the embodiments of the present disclosure, the noise reduction method of the audio data is applied to noise reduction of the voice data, and accordingly, the frequency interval with the smallest maximum value of the interval includes a voice frequency interval.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to herein is not limited to technical solutions formed by the specific combinations of features described above, but also covers other technical solutions formed by any combination of the features described above or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (14)

1. A method of noise reduction of audio data, comprising:
carrying out sub-band processing on the audio frame data, wherein the frequency intervals of the sub-bands are different, and the sub-band corresponding to the frequency interval with the smallest interval maximum value is used as a first sub-band;
Inputting the frequency domain information of the first sub-band into a noise reduction model so that the noise reduction model outputs the frequency domain information of the first sub-band after noise reduction and the gain of the first sub-band;
based on the gain of the first sub-band, determining the gains of other sub-bands except the first sub-band, and respectively carrying out noise reduction treatment on the other sub-bands according to the gains of the other sub-bands;
determining the noise-reduced audio frame data according to the frequency domain information after noise reduction of the first sub-band and the noise reduction processing result of other sub-bands;
wherein, the frequency intervals of all the sub-bands are adjacent in sequence;
the determining the gain of the other sub-bands except the first sub-band based on the gain of the first sub-band includes:
taking the average gain from a preset frequency point value to the interval maximum within the frequency interval of the first sub-band as a first gain factor;
determining the judgment probability that other sub-bands except the first sub-band belong to a preset category according to the first gain factor and the judgment probability that the first sub-band output by the noise reduction model belongs to the preset category;
determining a second gain factor of other sub-bands according to the judgment probability that the other sub-bands belong to the preset category;
And integrating the first gain factor, the second gain factor and the judgment probability that other subbands belong to a preset category to determine the gains of other subbands except the first subband.
2. The method of claim 1, wherein the performing sub-band processing on the audio frame data comprises:
carrying out sub-band processing on the audio frame data in the time domain;
correspondingly, the noise reduction processing is performed on other sub-bands according to the gains of the other sub-bands, including: respectively carrying out noise reduction treatment on time domain information of other sub-bands according to gains of the other sub-bands;
the determining the noise-reduced audio frame data according to the frequency domain information after the noise reduction of the first sub-band and the noise reduction processing result of other sub-bands includes:
converting the frequency domain information after noise reduction of the first sub-band into time domain information, and synthesizing the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of other sub-bands to obtain the time domain information of the audio frame data after noise reduction.
3. The method of claim 2, wherein the sub-band processing of the audio frame data in the time domain comprises:
carrying out sub-band processing on the audio frame data through an analysis filter bank, wherein the pass bands of the filters in the analysis filter bank are different;
Correspondingly, the synthesizing the time domain information after the noise reduction of the first sub-band and the time domain information after the noise reduction of other sub-bands includes: synthesizing the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of the other sub-bands through a synthesis filter bank.
4. A method according to claim 3, further comprising:
downsampling the audio frame data before the audio frame data is subjected to sub-band processing by the analysis filter bank; or, after the audio frame data is subjected to sub-band processing through the analysis filter bank, carrying out downsampling processing on the time domain information of the first sub-band;
correspondingly, before the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of other sub-bands are synthesized by the synthesis filter bank, up-sampling the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of the other sub-bands; or, after the time domain information after noise reduction of the first sub-band and the time domain information after noise reduction of the other sub-bands are synthesized by the synthesis filter bank, carrying out up-sampling processing on the time domain information of the noise-reduced audio frame data.
5. The method of claim 1, wherein the performing sub-band processing on the audio frame data comprises:
after converting the audio frame data into frequency domain information, carrying out sub-band processing on the frequency domain information of the audio frame data;
correspondingly, the noise reduction processing is performed on other sub-bands according to the gains of the other sub-bands, including: respectively carrying out noise reduction treatment on the frequency domain information of other sub-bands according to the gains of the other sub-bands;
the determining the noise-reduced audio frame data according to the frequency domain information after the noise reduction of the first sub-band and the noise reduction processing result of other sub-bands includes:
splicing the frequency domain information after noise reduction of the first sub-band and the frequency domain information after noise reduction of other sub-bands to obtain the frequency domain information after noise reduction of the audio frame data; and converting the frequency domain information after the noise reduction of the audio frame data into time domain information to obtain the time domain information of the noise-reduced audio frame data.
6. The method of claim 5, wherein said sub-band processing of the frequency domain information of the audio frame data comprises:
grouping the frequency domain information of the audio frame data according to frequency intervals to obtain the frequency domain information of each sub-band;
Correspondingly, the splicing the frequency domain information after the noise reduction of the first sub-band and the frequency domain information after the noise reduction of other sub-bands includes:
and splicing the frequency domain information after noise reduction of the first sub-band and the frequency domain information after noise reduction of other sub-bands according to a frequency interval.
7. The method of claim 1, wherein the training mode of the noise reduction model comprises:
acquiring sample frame data which are the same as the frequency interval of the first sub-band and target frequency domain information of the sample frame data;
and training the noise reduction model by utilizing the frequency domain information of the sample frame data and the target frequency domain information until the noise reduction model converges.
8. The method of claim 1, wherein the formula of the decision probability that the other sub-bands except the first sub-band belong to the preset class is:
wherein avgProbH is the judgment probability that other sub-bands except the first sub-band belong to a preset category; vad is the judgment probability that the first sub-band output by the noise reduction model belongs to a preset category; avgGainH is the first gain factor.
9. The method of claim 1, wherein the second gain factor for the other subband is formulated as:
Wherein gainH is a second gain factor of other subbands; avgProbH is the judgment probability that other sub-bands except the first sub-band belong to a preset category; tanh is a hyperbolic tangent function.
10. The method of claim 1, wherein the formula for the gain of the other subbands than the first subband is:
wherein gain is the gain of the other sub-bands except the first sub-band; gainH is the second gain factor for the other sub-band; avgGainH is a first gain factor; avgProbH is a decision probability that other subbands than the first subband belong to a preset class.
11. The method according to any one of claims 1-10, applied to noise reduction of voice data, wherein the frequency interval with the smallest interval maximum value comprises a human voice frequency interval.
12. A noise reduction device for audio data, comprising:
the sub-band module is used for carrying out sub-band processing on the audio frame data, wherein the frequency intervals of all the sub-bands are different, and the sub-band corresponding to the frequency interval with the smallest interval maximum value is used as a first sub-band;
the first noise reduction module is used for inputting the frequency domain information of the first sub-band into a noise reduction model so that the noise reduction model outputs the frequency domain information of the first sub-band after noise reduction and the gain of the first sub-band;
The second noise reduction module is used for determining the gains of other sub-bands except the first sub-band based on the gain of the first sub-band and respectively carrying out noise reduction processing on the other sub-bands according to the gains of the other sub-bands;
the noise reduction data determining module is used for determining noise-reduced audio frame data according to the frequency domain information after noise reduction of the first sub-band and the noise reduction processing result of other sub-bands;
wherein, the frequency intervals of all the sub-bands are adjacent in sequence;
the second noise reduction module includes:
the gain mapping sub-module is used for taking the average gain from a preset frequency point value to the interval maximum within the frequency interval of the first sub-band as a first gain factor; determining the judgment probability that other sub-bands except the first sub-band belong to a preset category according to the first gain factor and the judgment probability that the first sub-band output by the noise reduction model belongs to the preset category; determining a second gain factor of other sub-bands according to the judgment probability that the other sub-bands belong to the preset category; and integrating the first gain factor, the second gain factor and the judgment probability that other sub-bands belong to a preset category to determine the gains of other sub-bands except the first sub-band.
13. An electronic device, the electronic device comprising:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of noise reduction of audio data as recited in any of claims 1-11.
14. A storage medium containing computer executable instructions which, when executed by a computer processor, are for performing the method of noise reduction of audio data as claimed in any one of claims 1 to 11.
CN202011098018.1A 2020-10-14 2020-10-14 Noise reduction method and device for audio data, electronic equipment and storage medium Active CN112259116B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011098018.1A CN112259116B (en) 2020-10-14 2020-10-14 Noise reduction method and device for audio data, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011098018.1A CN112259116B (en) 2020-10-14 2020-10-14 Noise reduction method and device for audio data, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112259116A CN112259116A (en) 2021-01-22
CN112259116B true CN112259116B (en) 2024-03-15

Family

ID=74243667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011098018.1A Active CN112259116B (en) 2020-10-14 2020-10-14 Noise reduction method and device for audio data, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112259116B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160847A (en) * 2021-03-02 2021-07-23 广州朗国电子科技有限公司 Audio noise reduction model establishing method and audio noise reduction circuit
EP4334935A1 (en) * 2021-05-08 2024-03-13 Cerence Operating Company Noise reduction based on dynamic neural networks
CN113936698B (en) * 2021-09-26 2023-04-28 度小满科技(北京)有限公司 Audio data processing method and device and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5490233A (en) * 1992-11-30 1996-02-06 At&T Ipm Corp. Method and apparatus for reducing correlated errors in subband coding systems with quantizers
CN1684143A (en) * 2004-04-14 2005-10-19 华为技术有限公司 Method for strengthening sound
CN101083640A (en) * 2006-04-26 2007-12-05 扎尔林克半导体股份有限公司 Low complexity noise reduction method
CN101477800A (en) * 2008-12-31 2009-07-08 瑞声声学科技(深圳)有限公司 Voice enhancing process
CN104919523A (en) * 2013-01-08 2015-09-16 杜比国际公司 Model based prediction in a critically sampled filterbank
CN110335620A (en) * 2019-07-08 2019-10-15 广州欢聊网络科技有限公司 A kind of noise suppressing method, device and mobile terminal
CN111477237A (en) * 2019-01-04 2020-07-31 北京京东尚科信息技术有限公司 Audio noise reduction method and device and electronic equipment
CN111554315A (en) * 2020-05-29 2020-08-18 展讯通信(天津)有限公司 Single-channel voice enhancement method and device, storage medium and terminal

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7146316B2 (en) * 2002-10-17 2006-12-05 Clarity Technologies, Inc. Noise reduction in subbanded speech signals
US20120263317A1 (en) * 2011-04-13 2012-10-18 Qualcomm Incorporated Systems, methods, apparatus, and computer readable media for equalization
US9721584B2 (en) * 2014-07-14 2017-08-01 Intel IP Corporation Wind noise reduction for audio reception


Also Published As

Publication number Publication date
CN112259116A (en) 2021-01-22

Similar Documents

Publication Publication Date Title
CN112259116B (en) Noise reduction method and device for audio data, electronic equipment and storage medium
CN107068161B (en) Speech noise reduction method and device based on artificial intelligence and computer equipment
CN112634928B (en) Sound signal processing method and device and electronic equipment
CN108564963A (en) Method and apparatus for enhancing voice
CN111724807B (en) Audio separation method, device, electronic equipment and computer readable storage medium
CN111370019A (en) Sound source separation method and device, and model training method and device of neural network
CN111508519B (en) Method and device for enhancing voice of audio signal
CN113611324A (en) Method and device for inhibiting environmental noise in live broadcast, electronic equipment and storage medium
WO2022161475A1 (en) Audio processing method and apparatus, and electronic device
CN114898762A (en) Real-time voice noise reduction method and device based on target person and electronic equipment
CN116913258B (en) Speech signal recognition method, device, electronic equipment and computer readable medium
CN116403594B (en) Speech enhancement method and device based on noise update factor
WO2022227932A1 (en) Method and apparatus for processing sound signals, and electronic device
CN114783455A (en) Method, apparatus, electronic device and computer readable medium for voice noise reduction
CN113763976A (en) Method and device for reducing noise of audio signal, readable medium and electronic equipment
CN116137153A (en) Training method of voice noise reduction model and voice enhancement method
CN113096679A (en) Audio data processing method and device
CN112309425A (en) Sound tone changing method, electronic equipment and computer readable storage medium
CN113823312A (en) Speech enhancement model generation method and device and speech enhancement method and device
CN115588437B (en) Speech enhancement method, apparatus, device and storage medium
CN117316160B (en) Silent speech recognition method, silent speech recognition apparatus, electronic device, and computer-readable medium
CN113611321B (en) Voice enhancement method and system
CN112634930A (en) Multi-channel sound enhancement method and device and electronic equipment
Kulkarni et al. A Deep Learning Model for Stationary Audio Noise Reduction
Vanambathina et al. Real time speech enhancement using densely connected neural networks and Squeezed temporal convolutional modules

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant