CN111477239A - Noise removing method and system based on GRU neural network - Google Patents

Noise removing method and system based on GRU neural network

Info

Publication number
CN111477239A
Authority
CN
China
Prior art keywords
audio
neural network
noise
gru neural
spectrogram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010242904.0A
Other languages
Chinese (zh)
Other versions
CN111477239B (en)
Inventor
曾志先
肖龙源
李稀敏
叶志坚
刘晓葳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd filed Critical Xiamen Kuaishangtong Technology Co Ltd
Priority to CN202010242904.0A priority Critical patent/CN111477239B/en
Publication of CN111477239A publication Critical patent/CN111477239A/en
Application granted granted Critical
Publication of CN111477239B publication Critical patent/CN111477239B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention discloses a noise removing method based on a GRU neural network, a server, and a storage medium. The method comprises: building a GRU neural network model; training the GRU neural network model; converting a noisy audio file into spectrogram data through the Librosa library of Python; feeding the spectrogram data into the trained GRU neural network and outputting denoised audio spectrogram data; converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network; and saving the audio data as an audio file.

Description

Noise removing method and system based on GRU neural network
Technical Field
The invention relates to the technical field of noise removal, in particular to a noise removal method and a noise removal system based on a GRU neural network.
Background
In current speech preprocessing systems, removing noise from audio is an essential step: residual noise degrades the performance of speech recognition and other downstream applications, so the noise reduction algorithm is a critical part of audio preprocessing.
Conventional noise reduction methods commonly include spectral subtraction and statistical approaches. Spectral subtraction removes noise by subtracting an estimated noise spectrum from the power spectrum; statistical methods recover clean speech by probabilistic estimation on the speech spectrum. Both approaches depend on hand-crafted assumptions derived from observing speech, generalize poorly to practical situations, and tend to introduce musical noise.
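For concreteness, the following is a minimal sketch of the spectral subtraction idea described above, shown only for contrast with the learned approach; it is not taken from the patent. It assumes the leading STFT frames of the recording contain only noise, and the FFT settings and function name are illustrative.

```python
# Minimal spectral-subtraction sketch (for contrast with the learned approach below).
# Assumes the first `noise_frames` STFT frames contain only noise; parameters are illustrative.
import numpy as np
import librosa

def spectral_subtraction(noisy, n_fft=512, hop=128, noise_frames=10):
    stft = librosa.stft(noisy, n_fft=n_fft, hop_length=hop)
    mag, phase = np.abs(stft), np.angle(stft)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)  # estimated noise spectrum
    clean_mag = np.maximum(mag - noise_mag, 0.0)                   # subtract and floor at zero
    return librosa.istft(clean_mag * np.exp(1j * phase), hop_length=hop)
```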
Disclosure of Invention
The invention aims to overcome the problems in the prior art and provides a method and a system for removing noise based on a GRU neural network. The invention removes noise from audio with an RNN-based approach, converting the original audio into denoised audio in an end-to-end deep learning manner: the original noisy audio is taken as input, and the corresponding denoised audio is produced as output.
The invention provides a noise removing method based on a GRU neural network, which comprises the following steps:
Step one, building a GRU neural network model, wherein the GRU neural network model comprises three GRU network layers and a fully connected network layer;
Step two, training the GRU neural network model;
Step three, converting the noisy audio file into spectrogram data through the Librosa library of Python;
Step four, feeding the spectrogram data into the trained GRU neural network and outputting the denoised audio spectrogram data, wherein the spectrogram data is a two-dimensional matrix whose first dimension is the number of audio frames and whose second dimension is the feature dimension of the spectrogram;
Step five, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network;
And step six, saving the audio data as an audio file.
Further, in an embodiment of the present invention, before training the GRU neural network model in step two, a first number of clean noiseless audio clips are collected; a second number of noise audio clips are respectively added to the noiseless audio to produce noise-added audio, and the noise audio comprises at least one of office environment noise, canteen environment noise, school environment noise and outdoor environment noise.
Further, the noise audio is added to the noiseless audio by point-to-point summation of the noiseless audio and the noise audio.
Further, AiShell audio is used as the clean noiseless audio.
Further, in the embodiment of the present invention, the step of training the GRU neural network model includes:
Inputting spectrogram feature data of the noise-added audio into the GRU neural network model to obtain a predicted clean audio spectrogram;
Calculating the loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noise-added audio;
And carrying out iterative training to obtain a trained GRU neural network model.
In an embodiment of the invention, iterative training on the loss value is performed with the Adam optimizer in TensorFlow.
In the embodiment of the invention, the loss value is calculated by means of the Euclidean distance.
In the embodiment of the present invention, the spectrogram feature data of the noise-added audio is used as the input data of the GRU neural network, and the spectrogram data of the noiseless audio is used as the label data of the GRU neural network.
In one embodiment of the present invention, a server is provided, which includes at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the above-described GRU neural network-based denoising method steps.
An embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the above-mentioned noise removing method based on the GRU neural network.
Compared with the existing spectral subtraction and statistical methods for noise reduction, the noise removing method and system based on the GRU neural network have the following beneficial effects:
(1) Removing noise from audio with a GRU deep learning neural network realizes an end-to-end network model that can remove noise from different scenes, with high adaptability and high stability.
(2) The GRU network has a simple structure and consumes few resources, so the overall noise removal is efficient, performs well, and can be applied in real environments. In speech recognition, for example, it can denoise speech in real time, reducing the influence of noise on recognition, improving recognition accuracy, and lowering the probability of misrecognition.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the invention without limiting it. In the drawings:
Fig. 1 is a schematic flowchart of a method for removing noise based on a GRU neural network according to embodiment 1 of the present invention.
Detailed Description
In order to make the technical problems to be solved, the technical solutions, and the advantageous effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it. All other embodiments obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention fall within the scope of the present invention.
In the present invention, "a plurality" includes one or more, and two or more.
Example 1
The invention provides a noise removing method based on a GRU neural network, which comprises the following steps,
Step S1, collecting a first number of clean noiseless audio clips; preferably, in the embodiment of the present invention, AiShell audio is used as the clean noiseless audio, and this noiseless audio, also called the original audio data, will serve as the label data for GRU neural network model training;
Step S2, respectively adding a second number of noise audio clips to the noiseless audio to produce noise-added audio; the noise-added audio is used as the input data of the training model, and the original clean noiseless audio corresponding to each noise-added clip is used as its label data;
The noise audio comprises at least one of office environment noise, dining room environment noise, school environment noise and outdoor environment noise;
The noise audio is added to the noiseless audio by point-to-point summation of the noiseless audio and the noise audio;
In the embodiment of the present invention, preferably, the second number is 1000;
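A minimal sketch of the point-to-point summation described in steps S1 and S2 follows, assuming both clips share the same sampling rate; the file paths, the 16 kHz rate, and the use of the soundfile package for writing are illustrative assumptions, not details from the patent.

```python
# Point-to-point summation of a clean clip and a noise clip (illustrative sketch).
import numpy as np
import librosa
import soundfile as sf

clean, sr = librosa.load("clean.wav", sr=16000)   # hypothetical file paths
noise, _ = librosa.load("noise.wav", sr=16000)

noise = np.resize(noise, clean.shape)             # repeat or trim the noise to the clean length
noisy = clean + noise                             # element-wise (point-to-point) sum
sf.write("noisy.wav", noisy, sr)
```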
Step S3, converting the noiseless audio and the produced noise-added audio into spectrograms through an FFT algorithm using the Librosa library in Python, wherein the converted spectrogram data is a two-dimensional matrix whose first dimension is the number of audio frames and whose second dimension is the feature dimension of the spectrogram;
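A minimal sketch of this conversion with Librosa, returning a (frames, feature dimension) matrix as described; the FFT size and hop length are illustrative and are not specified in the patent.

```python
# Convert an audio file into spectrogram data of shape (number of frames, feature dimension).
import numpy as np
import librosa

def audio_to_spectrogram(path, sr=16000, n_fft=512, hop=128):
    y, _ = librosa.load(path, sr=sr)
    mag = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))  # (freq_bins, frames)
    return mag.T                                                # (frames, feature_dim)
```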
Step S4, building a GRU neural network model;
The GRU network model comprises three GRU network layers and a fully connected network layer, wherein each GRU network layer contains 300 neurons; in the embodiment of the invention, the fully connected network layer has a 300 × 1024 structure;
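A minimal tf.keras sketch of the stated structure (three GRU layers of 300 neurons followed by a fully connected layer). The 1024-unit output mirrors the stated 300 × 1024 layer, but in practice the output width should equal the spectrogram feature dimension, so the default value here is an assumption.

```python
# Sketch of the described model: three GRU layers (300 units each) plus one dense output layer.
import tensorflow as tf

def build_gru_denoiser(feature_dim=1024):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(None, feature_dim)),        # (frames, feature_dim) spectrograms
        tf.keras.layers.GRU(300, return_sequences=True),
        tf.keras.layers.GRU(300, return_sequences=True),
        tf.keras.layers.GRU(300, return_sequences=True),
        tf.keras.layers.Dense(feature_dim),               # predicted clean spectrogram per frame
    ])
```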
Step S5, inputting the spectrogram feature data of the noise-added audio into the GRU neural network model to obtain a predicted clean audio spectrogram;
Step S6, calculating the loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noise-added audio; in the embodiment of the invention, the loss value is calculated as a Euclidean distance: the difference between the clean audio spectrogram predicted by the GRU neural network and the original clean audio spectrogram is computed, and the result is the loss value of the current network prediction;
Step S7, carrying out iterative training to obtain a trained GRU neural network model; in the embodiment of the invention, iterative training on the loss value is performed with the Adam optimizer in TensorFlow, and preferably 20,000 batches are trained, with 64 audio spectrograms fed into each batch;
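A minimal training sketch under the stated settings (Adam optimizer, 20,000 batches of 64 spectrograms). The Euclidean-distance loss is realized here as a mean squared error over spectrogram bins, which is an assumption, and `noisy_specs`/`clean_specs` are assumed to be pre-built, equally shaped arrays of training spectrograms.

```python
# Training sketch: Adam optimizer, squared-error (Euclidean-style) loss,
# 20,000 batches of 64 spectrograms. `noisy_specs` and `clean_specs` are assumed
# to be arrays of shape (num_clips, frames, feature_dim).
import tensorflow as tf

model = build_gru_denoiser(feature_dim=noisy_specs.shape[-1])   # sketch from step S4
optimizer = tf.keras.optimizers.Adam(1e-3)

dataset = (tf.data.Dataset.from_tensor_slices((noisy_specs, clean_specs))
           .shuffle(1024).repeat().batch(64))

for step, (noisy, clean) in enumerate(dataset.take(20000)):
    with tf.GradientTape() as tape:
        pred = model(noisy, training=True)
        loss = tf.reduce_mean(tf.square(pred - clean))   # squared Euclidean distance per bin
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```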
Step S8, converting the noisy audio file into spectrogram data through the Librosa library of Python;
Step S9, feeding the spectrogram data into the trained GRU neural network and outputting the denoised audio spectrogram data, wherein the spectrogram data is a two-dimensional matrix whose first dimension is the number of audio frames and whose second dimension is the feature dimension of the spectrogram;
And step S10, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network, and saving the audio data as an audio file.
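A minimal end-to-end inference sketch for steps S8 to S10. The patent does not name the open-source vocoder, so Librosa's Griffin-Lim reconstruction is used here purely as a stand-in, and the STFT settings and helper function follow the earlier sketches.

```python
# Inference sketch: noisy file -> spectrogram -> trained GRU model -> waveform -> output file.
import numpy as np
import librosa
import soundfile as sf

def denoise_file(model, in_path, out_path, sr=16000, n_fft=512, hop=128):
    spec = audio_to_spectrogram(in_path, sr=sr, n_fft=n_fft, hop=hop)   # (frames, feature_dim)
    clean_spec = model.predict(spec[np.newaxis, ...])[0]                # add/remove batch dimension
    clean_mag = np.maximum(clean_spec, 0.0).T                           # back to (freq_bins, frames)
    audio = librosa.griffinlim(clean_mag, n_fft=n_fft, hop_length=hop)  # stand-in for the vocoder
    sf.write(out_path, audio, sr)
```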
Example 2
The embodiment of the invention provides a server, which comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the GRU neural network based denoising method steps. The steps of the method for removing noise based on the GRU neural network in this embodiment are the same as those in embodiment 1, and are not described again.
Example 3
In an embodiment of the present invention, a computer-readable storage medium is provided, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the noise removing method steps based on the GRU neural network. The steps of the method for removing noise based on the GRU neural network in this embodiment are the same as those in embodiment 1, and are not described again.
The noise removing method based on a GRU neural network provided by the invention may, if implemented in the form of a software functional module and sold or used as an independent product, be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which is stored in a storage medium and includes several instructions to enable a server (which may be a personal computer, a cloud server, a network device, or a device including a processor, etc.) to execute all or part of the methods described in the embodiments of the present invention. The computer-readable storage medium includes, but is not limited to, read-only memory (ROM), random access memory (RAM), a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, and various other media capable of storing program code. Embodiments of the invention are not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and serve only to illustrate its technical solutions. It should be understood that the present invention is not limited to the above embodiments; modifications made by persons skilled in the art based on the known or prior art and common knowledge, or equivalent replacement of some or all of the technical features, should also fall within the protection scope of the present invention.

Claims (10)

1. A method for removing noise based on a GRU neural network is characterized by comprising the following steps:
Step one, building a GRU neural network model, wherein the GRU neural network model comprises three GRU network layers and a fully connected network layer;
Step two, training the GRU neural network model;
Step three, converting the noisy audio file into spectrogram data through the Librosa library of Python;
Step four, feeding the spectrogram data into the trained GRU neural network and outputting the denoised audio spectrogram data, wherein the spectrogram data is a two-dimensional matrix whose first dimension is the number of audio frames and whose second dimension is the feature dimension of the spectrogram;
Step five, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network;
And step six, saving the audio data as an audio file.
2. The method for denoising based on a GRU neural network of claim 1, wherein training the GRU neural network model in step two further comprises:
Collecting a first quantity of clean noiseless audio;
Respectively adding a second number of noise audio clips to the noiseless audio to produce noise-added audio, wherein the noise audio comprises at least one of office environment noise, canteen environment noise, school environment noise and outdoor environment noise.
3. The GRU neural network-based denoising method of claim 2,
The noise audio is added to the noiseless audio by point-to-point summation of the noiseless audio and the noise audio.
4. The GRU neural network-based denoising method of claim 2,
AiShell audio is used as clean noiseless audio.
5. The GRU neural network-based denoising method of claim 2,
The step of training the GRU neural network model comprises the following steps:
Inputting spectrogram feature data of the noise-added audio into the GRU neural network model to obtain a predicted clean audio spectrogram;
Calculating the loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noise-added audio;
And carrying out iterative training to obtain a trained GRU neural network model.
6. The GRU neural network-based denoising method of claim 5,
Iterative training on the loss value is performed with the Adam optimizer in TensorFlow.
7. The GRU neural network-based denoising method of claim 5,
The loss value is calculated by means of the Euclidean distance.
8. The GRU neural network-based denoising method of claim 5,
The spectrogram feature data of the noise-added audio is used as input data of the GRU neural network, and the spectrogram data of the noiseless audio is used as label data of the GRU neural network.
9. A server, characterized in that the server comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method steps of any of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, which when executed by a processor implements the method steps of the GRU neural network-based denoising method of any of claims 1-8.
CN202010242904.0A 2020-03-31 2020-03-31 Noise removing method and system based on GRU neural network Active CN111477239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010242904.0A CN111477239B (en) 2020-03-31 2020-03-31 Noise removing method and system based on GRU neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010242904.0A CN111477239B (en) 2020-03-31 2020-03-31 Noise removing method and system based on GRU neural network

Publications (2)

Publication Number Publication Date
CN111477239A (en) 2020-07-31
CN111477239B (en) 2023-05-09

Family

ID=71750358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010242904.0A Active CN111477239B (en) 2020-03-31 2020-03-31 Noise removing method and system based on GRU neural network

Country Status (1)

Country Link
CN (1) CN111477239B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109273021A (en) * 2018-08-09 2019-01-25 厦门亿联网络技术股份有限公司 A kind of real-time conferencing noise-reduction method and device based on RNN
CN109671433A (en) * 2019-01-10 2019-04-23 腾讯科技(深圳)有限公司 A kind of detection method and relevant apparatus of keyword
CN109785850A (en) * 2019-01-18 2019-05-21 腾讯音乐娱乐科技(深圳)有限公司 A kind of noise detecting method, device and storage medium
CN110120225A (en) * 2019-04-01 2019-08-13 西安电子科技大学 A kind of audio defeat system and method for the structure based on GRU network
CN110491404A (en) * 2019-08-15 2019-11-22 广州华多网络科技有限公司 Method of speech processing, device, terminal device and storage medium
CN110534118A (en) * 2019-07-29 2019-12-03 安徽继远软件有限公司 Transformer/reactor method for diagnosing faults based on Application on Voiceprint Recognition and neural network
CN110647980A (en) * 2019-09-18 2020-01-03 成都理工大学 Time sequence prediction method based on GRU neural network
CN110751945A (en) * 2019-10-17 2020-02-04 成都三零凯天通信实业有限公司 End-to-end voice recognition method


Also Published As

Publication number Publication date
CN111477239B (en) 2023-05-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant