CN111477237B - Audio noise reduction method and device and electronic equipment - Google Patents


Info

Publication number
CN111477237B
Authority
CN
China
Prior art keywords
audio
audio frame
noise reduction
amplitude spectrum
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910010479.XA
Other languages
Chinese (zh)
Other versions
CN111477237A (en)
Inventor
刘鲁鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201910010479.XA
Publication of CN111477237A
Application granted
Publication of CN111477237B
Legal status: Active
Anticipated expiration

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 — Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 — Noise filtering
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 — Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30 — Speech or voice analysis techniques characterised by the analysis technique using neural networks

Abstract

The invention provides an audio noise reduction method comprising: obtaining first audio data comprising a sequence of audio frames; extracting a first magnitude spectrum of each audio frame; sequentially processing the first magnitude spectrum of each audio frame to obtain a noise-reduced second magnitude spectrum; and constructing noise-reduced second audio data based on the second magnitude spectrum of each audio frame. Processing the first magnitude spectrum of an audio frame comprises inputting that first magnitude spectrum, together with the second magnitude spectrum of at least one preceding audio frame, into a noise reduction model to obtain the second magnitude spectrum of the audio frame. The present disclosure also provides an audio noise reduction apparatus, an electronic device, and a computer-readable storage medium.

Description

Audio noise reduction method and device and electronic equipment
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to an audio noise reduction method and apparatus, and an electronic device.
Background
With the continuous development of deep learning, techniques represented by deep neural networks are increasingly applied in the field of audio signal processing. Audio noise reduction algorithms based on deep neural networks have strong noise suppression capability by virtue of their good nonlinear modeling and generalization capability. However, the inventor has found that the effect of existing noise reduction techniques is still unsatisfactory; how to improve the noise reduction effect therefore remains a problem to be solved.
Disclosure of Invention
In view of this, the present disclosure provides an audio noise reduction method, an audio noise reduction device and an electronic device.
One aspect of the present disclosure provides an audio noise reduction method, including obtaining first audio data including a sequence of audio frames, extracting a first amplitude spectrum of each audio frame, sequentially processing the first amplitude spectrum of each audio frame to obtain a noise-reduced second amplitude spectrum, and constructing noise-reduced second audio data based on the second amplitude spectrum of each audio frame, where processing the first amplitude spectrum of each audio frame includes inputting the first amplitude spectrum of the audio frame and a second amplitude spectrum of at least one audio frame preceding the audio frame into a noise reduction model together to obtain the second amplitude spectrum of the audio frame.
According to an embodiment of the present disclosure, the jointly inputting the first magnitude spectrum of the audio frame and the second magnitude spectrum of at least one audio frame preceding the audio frame into the noise reduction model to obtain the second magnitude spectrum of the audio frame includes determining the second magnitude spectra of a predetermined number of consecutive audio frames preceding and adjacent to the audio frame, merging the first magnitude spectra of the audio frame and the second magnitude spectra of the predetermined number of audio frames to obtain input data, and inputting the input data into the noise reduction model to obtain the second magnitude spectrum of the audio frame.
According to an embodiment of the present disclosure, the method further includes randomly determining at least one time period from the clean audio data, adding noise data to each of the time periods according to a randomly determined signal-to-noise ratio to obtain noisy audio data, and training the noise reduction model using the noisy audio data.
According to an embodiment of the present disclosure, the noise reduction model is a neural network that uses a linear rectification function as an activation function.
Another aspect of the present disclosure provides an audio noise reduction apparatus including an obtaining module, an extracting module, a processing module, and a constructing module. The obtaining module is configured to obtain first audio data comprising a sequence of audio frames. The extracting module is configured to extract a first magnitude spectrum of each audio frame. The processing module is configured to sequentially process the first magnitude spectrum of each audio frame to obtain a noise-reduced second magnitude spectrum, where processing the first magnitude spectrum of an audio frame includes inputting that first magnitude spectrum, together with the second magnitude spectrum of at least one preceding audio frame, into a noise reduction model to obtain the second magnitude spectrum of the audio frame. The constructing module is configured to construct noise-reduced second audio data based on the second magnitude spectrum of each audio frame.
According to an embodiment of the present disclosure, the processing module includes a determining submodule, a merging submodule, and a processing submodule. The determining submodule is configured to determine the second magnitude spectra of a predetermined number of consecutive audio frames preceding and adjacent to the audio frame. The merging submodule is configured to merge the first magnitude spectrum of the audio frame with the second magnitude spectra of the predetermined number of audio frames to obtain input data. The processing submodule is configured to input the input data into the noise reduction model to obtain the second magnitude spectrum of the audio frame.
According to an embodiment of the present disclosure, the apparatus further includes a determining module, a preparing module, and a training module. The determining module is configured to randomly determine at least one time period from clean audio data. The preparing module is configured to add noise data to each time period according to a randomly determined signal-to-noise ratio to obtain noisy audio data. The training module is configured to train the noise reduction model using the noisy audio data.
According to an embodiment of the present disclosure, the noise reduction model is a neural network that uses a linear rectification function as an activation function.
Another aspect of the disclosure provides an electronic device comprising at least one processor and at least one memory storing one or more computer-readable instructions, wherein the one or more computer-readable instructions, when executed by the at least one processor, cause the processor to perform the method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.
The method of the embodiment of the present disclosure enhances the noise reduction effect of the current frame by using the noise reduction results of several past frames.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates an application scenario of an audio noise reduction method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of an audio noise reduction method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart for processing a first magnitude spectrum of each audio frame to obtain a noise-reduced second magnitude spectrum according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a flow diagram for training a noise reduction model according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of an audio noise reduction apparatus according to an embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of a processing module according to an embodiment of the disclosure;
fig. 7 schematically shows a block diagram of an audio noise reduction apparatus according to another embodiment of the present disclosure; and
FIG. 8 schematically illustrates a block diagram of a computer system suitable for implementing an audio noise reduction apparatus according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, such a construction is in general intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together). The same applies to constructions analogous to "at least one of A, B, or C, etc." (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together).
The embodiment of the disclosure provides an audio noise reduction method, which includes obtaining first audio data including an audio frame sequence, extracting a first amplitude spectrum of each audio frame, sequentially processing the first amplitude spectrum of each audio frame to obtain a noise-reduced second amplitude spectrum, and constructing the noise-reduced second audio data based on the second amplitude spectrum of each audio frame, wherein the processing of the first amplitude spectrum of each audio frame includes inputting the first amplitude spectrum of the audio frame and the second amplitude spectrum of at least one audio frame before the audio frame into a noise reduction model together to obtain the second amplitude spectrum of the audio frame.
Fig. 1 schematically illustrates an application scenario of an audio noise reduction method according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of an application scenario in which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the method of the embodiment of the present disclosure processes first audio data 101 to be processed through a noise reduction model 110 to obtain noise-reduced second audio data 111. The noise reduction model 110 may be, for example, a neural network comprising an input layer, hidden layers, and an output layer. An activation function is applied between layers so that the neural network adapts well to scenarios with complex patterns.
Fig. 2 schematically shows a flow chart of an audio noise reduction method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S240.
In operation S210, first audio data including a sequence of audio frames is acquired.
In operation S220, a first magnitude spectrum of each audio frame is extracted. Each magnitude spectrum used by the method according to embodiments of the present disclosure may be a log magnitude spectrum.
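As a non-authoritative illustration, the per-frame log magnitude spectra of operation S220 might be computed as follows; the frame length, hop size, and Hann window here are illustrative assumptions, not values specified by the disclosure:

```python
import numpy as np

def log_magnitude_spectra(audio, frame_len=512, hop=256, eps=1e-8):
    """Split audio into overlapping frames and return the log magnitude
    spectrum of each frame (frame_len and hop are illustrative choices)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(audio) - frame_len) // hop
    spectra = []
    for i in range(n_frames):
        frame = audio[i * hop: i * hop + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))       # first (noisy) magnitude spectrum
        spectra.append(np.log(mag + eps))      # log compresses the dynamic range
    return np.stack(spectra)                   # shape: (n_frames, frame_len // 2 + 1)
```

The phase of each frame would be kept alongside these spectra, since operation S240 reuses it during reconstruction.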
In operation S230, the first amplitude spectrum of each audio frame is sequentially processed to obtain a second amplitude spectrum after noise reduction, where the processing of the first amplitude spectrum of each audio frame includes inputting the first amplitude spectrum of the audio frame and the second amplitude spectrum of at least one audio frame before the audio frame into a noise reduction model together to obtain the second amplitude spectrum of the audio frame.
Operation S230 of the embodiment of the present disclosure is explained below with reference to fig. 3.
Fig. 3 schematically shows a flowchart for processing a first magnitude spectrum of each audio frame to obtain a noise-reduced second magnitude spectrum according to an embodiment of the present disclosure.
As shown in fig. 3, the method includes operations S310 to S330.
In operation S310, a second magnitude spectrum of a predetermined number of consecutive audio frames preceding and adjacent to the audio frame is determined.
In operation S320, the first magnitude spectra of the audio frames and the second magnitude spectra of the predetermined number of audio frames are combined to obtain input data.
In operation S330, the input data is input into a noise reduction model, and a second magnitude spectrum of the audio frame is obtained.
For example, for the 10th audio frame, the 5 preceding audio frames (the 5th to 9th frames) may be determined along with their second magnitude spectra, where a second magnitude spectrum is the magnitude spectrum of a frame after noise reduction. The second magnitude spectra of the 5th to 9th frames may then be merged with the first magnitude spectrum of the current (10th) frame. Specifically, when the noise reduction model takes input in vector form, the feature vectors corresponding to the magnitude spectra of these frames may be spliced into one vector. Inputting the merged data into the noise reduction model yields the second magnitude spectrum of the 10th frame, i.e., the noise reduction result for the 10th frame. By analogy, when the 11th frame is processed, the second magnitude spectra of the 6th to 10th frames and the first magnitude spectrum of the 11th frame are input into the noise reduction model together to obtain the second magnitude spectrum of the 11th frame, and so on.
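The recurrence described above can be sketched as follows. Here `denoise_model` stands in for the trained noise reduction model, and the zero-valued history used for the first few frames is an assumption, since the disclosure does not specify how frames without enough predecessors are initialized:

```python
import numpy as np

def denoise_sequence(first_spectra, denoise_model, context=5):
    """first_spectra: array of shape (n_frames, N) holding the noisy first
    magnitude spectra. Returns the (n_frames, N) denoised second spectra,
    feeding each frame's context of previously DENOISED frames back in."""
    n_frames, n_bins = first_spectra.shape
    second_spectra = np.zeros((n_frames, n_bins))
    for t in range(n_frames):
        # History of the `context` preceding second spectra; zeros where no
        # predecessor exists yet (initialization is an assumption).
        history = np.zeros((context, n_bins))
        avail = min(context, t)
        if avail:
            history[context - avail:] = second_spectra[t - avail:t]
        # Merge: splice history and the current first spectrum into one vector.
        x = np.concatenate([history.ravel(), first_spectra[t]])
        second_spectra[t] = denoise_model(x)
    return second_spectra
```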
Referring back to fig. 2: in operation S240, noise-reduced second audio data is constructed based on the second magnitude spectrum of each audio frame. The noise-reduced second audio data is reconstructed from the second magnitude spectrum of each audio frame combined with the original phase of that frame.
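A minimal sketch of this reconstruction step, pairing each denoised log magnitude spectrum with the frame's original phase; the inverse-FFT and overlap-add details (and the absence of a synthesis window) are assumptions beyond what the text specifies:

```python
import numpy as np

def reconstruct(second_log_mags, phases, frame_len=512, hop=256, eps=1e-8):
    """Rebuild a waveform by combining each denoised log magnitude spectrum
    with the corresponding frame's ORIGINAL phase, then overlap-adding."""
    n_frames = second_log_mags.shape[0]
    out = np.zeros(frame_len + hop * (n_frames - 1))
    for i, (log_mag, phase) in enumerate(zip(second_log_mags, phases)):
        # Undo the log compression, reattach the original phase.
        spectrum = (np.exp(log_mag) - eps) * np.exp(1j * phase)
        frame = np.fft.irfft(spectrum, n=frame_len)
        out[i * hop: i * hop + frame_len] += frame
    return out
```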
The method inputs the noise reduction results of a plurality of previous audio frames and the amplitude spectrum of the current audio frame into the noise reduction model together, and can improve the noise reduction effect of the current frame.
According to an embodiment of the present disclosure, the noise reduction model is a neural network that uses a linear rectification function (ReLU) as the activation function. In the prior art, saturating nonlinear activation functions such as sigmoid or tanh are generally used for audio processing, because they are commonly believed to perform better when features are more complicated. The inventor has found that, in audio noise reduction, using the linear rectification function simplifies the training process of the model without degrading the noise reduction effect.
According to an embodiment of the present disclosure, the neural network may include, for example, one fully connected input layer, two fully connected hidden layers, and one fully connected output layer. The fully connected input layer may have 1024 nodes with input dimension M × N, where N is the dimension of a single-frame magnitude spectrum and M is the total number of frames of magnitude spectra contained in the input vector; each fully connected hidden layer may have 1024 nodes; the fully connected output layer may have N nodes. The input layer and the hidden layers use the ReLU activation function, and the output layer has no activation function.
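A minimal, untrained NumPy sketch of the described topology; only the layer sizes and the placement of the ReLU activations come from the text, while the He-style weight initialization is an illustrative assumption:

```python
import numpy as np

def relu(x):
    """Linear rectification function: max(x, 0)."""
    return np.maximum(x, 0.0)

def make_mlp(m_frames, n_bins, rng=np.random.default_rng(0)):
    """Layer sizes follow the description: M*N inputs -> 1024 -> 1024 ->
    1024 -> N, with ReLU on every layer except the (linear) output layer."""
    sizes = [m_frames * n_bins, 1024, 1024, 1024, n_bins]
    params = [(rng.normal(0.0, np.sqrt(2.0 / a), (a, b)), np.zeros(b))
              for a, b in zip(sizes[:-1], sizes[1:])]

    def forward(x):
        for i, (W, b) in enumerate(params):
            x = x @ W + b
            if i < len(params) - 1:    # no activation on the output layer
                x = relu(x)
        return x
    return forward
```

With a context of 5 previous frames plus the current frame, M would be 6, matching the merged input vector of the processing step.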
Before the noise reduction model is used, it needs to be trained on training data. Traditionally, training data are obtained by superimposing noise data of the same duration onto a piece of original audio data; the superimposed audio serves as the training input, and the original audio serves as the noise reduction target. The inventor has found that noise reduction models trained in this way perform poorly in real scenarios. The embodiment of the present disclosure therefore provides a method for training a noise reduction model that is expected to improve the noise reduction effect in real scenarios.
FIG. 4 schematically shows a flow diagram for training a noise reduction model according to an embodiment of the disclosure.
As shown in fig. 4, the method includes operations S410 to S430.
In operation S410, at least one time period is randomly determined from the clean audio data. For example, for a 5-minute piece of clean audio data, the time periods 00:38-00:46, 01:59-02:17, 02:26-02:31, and 03:30-04:52 may be determined. It should be understood that the number, length, and location of the time periods may all be random, or at least one of the number, length, and location may be predetermined or determined according to a preset rule while the other parameters remain random.
In operation S420, noise data is added to each of the time periods according to a randomly determined signal-to-noise ratio to obtain noisy audio data. The signal-to-noise ratio may, for example, be determined randomly within a certain range, or chosen from a number of alternatives, for example from -5 dB, 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB. The signal-to-noise ratio used in different time periods may differ.
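Operations S410-S420 might be sketched as follows; the candidate SNR list comes from the example above, while the segment length, segment count, and the definition of SNR as a power ratio in decibels are assumptions beyond what the text states:

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that 10*log10(P_clean / P_noise) equals snr_db,
    then add it to `clean` (equal-length arrays assumed)."""
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def make_noisy(clean, noise, rng=np.random.default_rng(0),
               snr_choices=(-5, 0, 5, 10, 15, 20), n_segments=3, seg_len=1600):
    """Corrupt randomly placed segments of `clean` at randomly chosen SNRs,
    leaving the rest of the signal untouched."""
    noisy = clean.copy()
    for _ in range(n_segments):
        start = rng.integers(0, len(clean) - seg_len)
        snr = rng.choice(snr_choices)
        seg = slice(start, start + seg_len)
        noisy[seg] = add_noise_at_snr(clean[seg], noise[:seg_len], snr)
    return noisy
```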
In operation S430, the noise reduction model is trained with the noisy audio data.
By randomly selecting time periods within clean audio and adding noise to them, the trained noise reduction model gains a certain adaptability to transient, non-stationary noise, which can improve its noise reduction performance in real noise environments.
Based on the same inventive concept, the present disclosure also provides an audio noise reduction device, and the audio noise reduction device according to the embodiment of the present disclosure is described below with reference to fig. 5 to 7.
Fig. 5 schematically shows a block diagram of an audio noise reduction apparatus 500 according to an embodiment of the present disclosure.
As shown in fig. 5, the audio noise reduction apparatus 500 includes an obtaining module 510, an extracting module 520, a processing module 530, and a constructing module 540. The audio noise reduction apparatus 500 may perform the various methods described above.
The obtaining module 510, for example performing operation S210 described above with reference to fig. 2, is configured to obtain first audio data comprising a sequence of audio frames.
The extraction module 520, for example performing operation S220 described above with reference to fig. 2, is configured to extract a first magnitude spectrum of each audio frame.
The processing module 530, for example, performs operation S230 described above with reference to fig. 2, and is configured to sequentially process the first magnitude spectrum of each audio frame to obtain a noise-reduced second magnitude spectrum.
The construction module 540, for example, performs operation S240 described above with reference to fig. 2, for constructing noise-reduced second audio data based on the second magnitude spectrum of each audio frame.
Here, processing the first magnitude spectrum of an audio frame includes inputting that first magnitude spectrum, together with the second magnitude spectrum of at least one preceding audio frame, into the noise reduction model to obtain the second magnitude spectrum of the audio frame.
Fig. 6 schematically illustrates a block diagram of a processing module 530 according to an embodiment of the disclosure.
As shown in fig. 6, the processing module 530 includes a determination sub-module 610, a merging sub-module 620, and a processing sub-module 630.
The determining sub-module 610, for example performing operation S310 described above with reference to fig. 3, is configured to determine a second magnitude spectrum of a predetermined number of consecutive audio frames preceding and adjacent to the audio frame.
The combining sub-module 620, for example performing operation S320 described above with reference to fig. 3, is configured to combine the first magnitude spectrum of the audio frames and the second magnitude spectrum of the predetermined number of audio frames to obtain input data.
The processing sub-module 630, for example performing operation S330 described above with reference to fig. 3, is configured to input the input data into a noise reduction model, resulting in a second magnitude spectrum of the audio frame.
Fig. 7 schematically shows a block diagram of an audio noise reduction apparatus 700 according to another embodiment of the present disclosure.
As shown in fig. 7, building on the foregoing embodiments, the audio noise reduction apparatus 700 further includes a determining module 710, a preparing module 720, and a training module 730.
The determining module 710, for example performing operation S410 described above with reference to fig. 4, is configured to randomly determine at least one time period from the clean audio data.
The preparation module 720, for example performing operation S420 described above with reference to fig. 4, is configured to add the noise data to each of the time segments according to the randomly determined signal-to-noise ratio, so as to obtain the noise-containing audio data.
A training module 730, for example performing operation S430 described above with reference to fig. 4, for training the noise reduction model using the noisy audio data.
According to an embodiment of the present disclosure, the noise reduction model is a neural network that uses a linear rectification function as an activation function.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any of the obtaining module 510, the extracting module 520, the processing module 530, the constructing module 540, the determining sub-module 610, the combining sub-module 620, the processing sub-module 630, the determining module 710, the preparing module 720, and the training module 730 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the obtaining module 510, the extracting module 520, the processing module 530, the constructing module 540, the determining sub-module 610, the combining sub-module 620, the processing sub-module 630, the determining module 710, the preparing module 720, and the training module 730 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or an appropriate combination of any of them. Alternatively, at least one of the obtaining module 510, the extracting module 520, the processing module 530, the constructing module 540, the determining sub-module 610, the combining sub-module 620, the processing sub-module 630, the determining module 710, the preparing module 720 and the training module 730 may be at least partially implemented as a computer program module which, when executed, may perform a corresponding function.
Fig. 8 schematically illustrates a block diagram of a computer system 800 suitable for implementing the above-described method according to an embodiment of the present disclosure. The computer system illustrated in FIG. 8 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the disclosure.
As shown in fig. 8, a computer system 800 according to an embodiment of the present disclosure includes a processor 801, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special-purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)). The processor 801 may also include onboard memory for caching. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 803, various programs and data necessary for the operation of the system 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 806. The processor 801 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and RAM 803. The processor 801 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the system 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 806. The system 800 may also include one or more of the following components connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read therefrom can be installed into the storage section 808 as necessary.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program, when executed by the processor 801, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 802 and/or RAM 803 described above and/or one or more memories other than the ROM 802 and RAM 803.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or sub-combinations of the features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or sub-combinations are not expressly recited in the present disclosure. In particular, various combinations and/or sub-combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or sub-combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (10)

1. An audio noise reduction method comprising:
obtaining first audio data comprising a sequence of audio frames;
extracting a first amplitude spectrum of each audio frame;
sequentially processing the first amplitude spectrum of each audio frame to obtain a second amplitude spectrum after noise reduction; and
constructing noise-reduced second audio data based on the second amplitude spectrum of each audio frame in combination with the original phase of each audio frame,
wherein the processing of the first amplitude spectrum of each audio frame comprises: inputting the first amplitude spectrum of the audio frame, together with the second amplitude spectrum of at least one audio frame preceding the audio frame, into a noise reduction model to obtain the second amplitude spectrum of the audio frame.
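The per-frame loop of claim 1 can be sketched as follows. This is an illustrative reading only: `denoise_model`, the STFT helper, the frame length, hop size, and context length are all assumptions, not values taken from the patent.

```python
import numpy as np

def stft_frames(signal, frame_len=512, hop=256):
    """Split a 1-D signal into windowed frames and return their complex spectra."""
    window = np.hanning(frame_len)
    n = 1 + (len(signal) - frame_len) // hop
    return np.array([np.fft.rfft(window * signal[i*hop:i*hop+frame_len])
                     for i in range(n)])

def denoise_audio(signal, denoise_model, context=2, frame_len=512, hop=256):
    spectra = stft_frames(signal, frame_len, hop)
    # first amplitude spectra and original phases of each audio frame
    mags, phases = np.abs(spectra), np.angle(spectra)
    denoised = []  # second (noise-reduced) amplitude spectra
    for t, mag in enumerate(mags):
        # second amplitude spectra of the preceding frames as context;
        # zero-pad at the start where no preceding frames exist
        prev = denoised[max(0, t - context):t]
        prev = [np.zeros_like(mag)] * (context - len(prev)) + prev
        x = np.concatenate(prev + [mag])       # merged input data
        denoised.append(denoise_model(x))      # second amplitude spectrum
    # reconstruct second audio data from denoised magnitudes + original phases
    out = np.zeros(len(signal))
    window = np.hanning(frame_len)
    for t, (m, p) in enumerate(zip(denoised, phases)):
        out[t*hop:t*hop+frame_len] += window * np.fft.irfft(m * np.exp(1j * p), frame_len)
    return out
```

The key point the claim makes is that the model's own previous outputs (not the noisy previous inputs) are fed back as context for the current frame.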
2. The method according to claim 1, wherein inputting the first amplitude spectrum of the audio frame and the second amplitude spectrum of at least one audio frame preceding the audio frame into the noise reduction model together to obtain the second amplitude spectrum of the audio frame comprises:
determining a second amplitude spectrum of a predetermined number of consecutive audio frames preceding and adjacent to the audio frame;
merging the first amplitude spectrum of the audio frame and the second amplitude spectra of the predetermined number of audio frames to obtain input data; and
inputting the input data into a noise reduction model to obtain the second amplitude spectrum of the audio frame.
3. The method of claim 1, further comprising:
randomly determining at least one time period from the clean audio data;
adding noise data to each time period according to a randomly determined signal-to-noise ratio to obtain noisy audio data; and
training the noise reduction model using the noisy audio data.
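The training-data preparation in claim 3 can be sketched as below: random time periods are chosen within clean audio, and noise is scaled to a randomly drawn signal-to-noise ratio before being added. The function name, segment-length bounds, and SNR range are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise_segments(clean, noise, n_segments=3, snr_db_range=(0.0, 20.0)):
    """Mix noise into randomly chosen time periods of `clean` at random SNRs.

    Assumes `noise` is at least as long as the longest segment drawn.
    """
    noisy = clean.copy()
    for _ in range(n_segments):
        # randomly determine a time period within the clean audio
        seg_len = int(rng.integers(len(clean) // 8, len(clean) // 4))
        start = int(rng.integers(0, len(clean) - seg_len))
        seg = clean[start:start + seg_len]
        nseg = noise[:seg_len]
        # scale the noise so the segment has the randomly chosen SNR (in dB)
        snr_db = rng.uniform(*snr_db_range)
        p_sig = np.mean(seg ** 2) + 1e-12
        p_noise = np.mean(nseg ** 2) + 1e-12
        scale = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
        noisy[start:start + seg_len] += scale * nseg
    return noisy
```

The resulting noisy/clean pairs would then serve as model input and regression target during training.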
4. The method of claim 1, wherein the noise reduction model is a neural network that uses a linear rectification function as an activation function.
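A minimal sketch of a noise reduction model matching claim 4, i.e. a feed-forward neural network using the linear rectification function (ReLU) as its activation. The layer sizes and the two-layer architecture are assumptions for illustration; the patent does not specify them.

```python
import numpy as np

def relu(x):
    """Linear rectification function: max(0, x)."""
    return np.maximum(0.0, x)

class TinyDenoiser:
    """Untrained placeholder: maps merged amplitude spectra to one denoised frame."""
    def __init__(self, in_dim, hidden, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.01, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.01, (hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, x):
        h = relu(x @ self.w1 + self.b1)
        # ReLU on the output keeps predicted magnitudes non-negative
        return relu(h @ self.w2 + self.b2)
```

A ReLU output layer is a natural fit here, since amplitude spectra are non-negative by definition.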
5. An audio noise reduction apparatus comprising:
an acquisition module for acquiring first audio data comprising a sequence of audio frames;
an extraction module for extracting a first amplitude spectrum of each audio frame;
a processing module for sequentially processing the first amplitude spectrum of each audio frame to obtain a noise-reduced second amplitude spectrum; and
a construction module for constructing noise-reduced second audio data based on the second amplitude spectrum of each audio frame in combination with the original phase of each audio frame,
wherein the processing of the first amplitude spectrum of each audio frame comprises: inputting the first amplitude spectrum of the audio frame, together with the second amplitude spectrum of at least one audio frame preceding the audio frame, into a noise reduction model to obtain the second amplitude spectrum of the audio frame.
6. The apparatus of claim 5, wherein the processing module comprises:
a determining sub-module for determining a second amplitude spectrum of a predetermined number of consecutive audio frames preceding and adjacent to the audio frame;
a merging sub-module for merging the first amplitude spectrum of the audio frame and the second amplitude spectra of the predetermined number of audio frames to obtain input data; and
a processing sub-module for inputting the input data into a noise reduction model to obtain the second amplitude spectrum of the audio frame.
7. The apparatus of claim 5, further comprising:
a determination module for randomly determining at least one time period from clean audio data;
a preparation module for adding noise data to each time period according to a randomly determined signal-to-noise ratio to obtain noisy audio data; and
a training module for training the noise reduction model using the noisy audio data.
8. The apparatus of claim 5, wherein the noise reduction model is a neural network that uses a linear rectification function as an activation function.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more computer programs,
wherein the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1 to 4.
10. A computer readable medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 4.
CN201910010479.XA 2019-01-04 2019-01-04 Audio noise reduction method and device and electronic equipment Active CN111477237B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910010479.XA CN111477237B (en) 2019-01-04 2019-01-04 Audio noise reduction method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111477237A CN111477237A (en) 2020-07-31
CN111477237B (en) 2022-01-07

Family

ID=71743168




Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1140869A (en) * 1995-02-17 1997-01-22 Sony Corporation Method for noise reduction
US6266633B1 (en) * 1998-12-22 2001-07-24 Itt Manufacturing Enterprises Noise suppression and channel equalization preprocessor for speech and speaker recognizers: method and apparatus
FR3002679A1 (en) * 2013-02-28 2014-08-29 Parrot Method for denoising an audio signal by a variable spectral gain algorithm with dynamically modulable hardness
CN108231086A (en) * 2017-12-24 2018-06-29 Space Star Technology Co., Ltd. Deep learning speech enhancer and method based on FPGA
CN109119093A (en) * 2018-10-30 2019-01-01 Oppo广东移动通信有限公司 Voice de-noising method, device, storage medium and mobile terminal

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1192358C (en) * 1997-12-08 2005-03-09 Mitsubishi Electric Corporation Sound signal processing method and sound signal processing device
AUPQ952700A0 (en) * 2000-08-21 2000-09-14 University Of Melbourne, The Sound-processing strategy for cochlear implants
CN105845150B (en) * 2016-03-21 2019-09-27 Fuzhou Rockchip Electronics Co., Ltd. Speech enhancement method and system using cepstrum correction
JP6668995B2 (en) * 2016-07-27 2020-03-18 富士通株式会社 Noise suppression device, noise suppression method, and computer program for noise suppression
CN108735229B (en) * 2018-06-12 2020-06-19 华南理工大学 Amplitude and phase joint compensation anti-noise voice enhancement method based on signal-to-noise ratio weighting


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Noise reduction for dual-microphone mobile phones exploiting power level differences; Marco Jeub et al.; ICASSP 2012; 2012-12-31; pp. 1693-1696 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant