CN111477239B

CN111477239B - Noise removing method and system based on GRU neural network

Info

Publication number: CN111477239B
Application number: CN202010242904.0A
Authority: CN
Inventors: 曾志先; 肖龙源; 李稀敏; 叶志坚; 刘晓葳
Original assignee: Xiamen Kuaishangtong Technology Co Ltd
Current assignee: Xiamen Kuaishangtong Technology Co Ltd
Priority date: 2020-03-31
Filing date: 2020-03-31
Publication date: 2023-05-09
Anticipated expiration: 2040-03-31
Also published as: CN111477239A

Abstract

The invention discloses a noise removal method based on a GRU neural network, a server and a storage medium. The method comprises the following steps: building a GRU neural network model and training it; converting a noisy audio file into spectrogram data through a Python library; feeding the spectrogram data into the trained GRU neural network, which outputs denoised audio spectrogram data; converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network; and saving the audio data as an audio file. By using a GRU deep learning neural network to remove noise from audio, the method realizes an end-to-end network model that can remove noise from different scenes with higher adaptability and stability, improves the accuracy of speech recognition, and reduces the probability of misrecognition.

Description

Noise removing method and system based on GRU neural network

Technical Field

The invention relates to the technical field of noise removal, in particular to a GRU neural network-based noise removal method and system.

Background

In current speech preprocessing systems, removing noise from the sound is a key part of preprocessing, because noise degrades the audio used in speech recognition and other applications; noise reduction algorithms are therefore an important component of audio preprocessing.

At present, the most common traditional noise reduction methods are spectral subtraction and statistical methods. Spectral subtraction removes noise by subtracting a noise spectrum from the power spectrum of the noisy speech; statistical methods extract clean speech by probabilistically estimating the speech spectrum. Both approaches rely on empirical observations of speech to perform noise reduction, so they do not work well in practical situations, and they also suffer from introduced musical noise.
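For reference, the spectral subtraction idea described above can be sketched in a few lines of NumPy. This is a minimal illustration of the prior-art technique, not part of the invention, assuming a power spectrogram laid out as (frames, bins) and a noise power estimate averaged over a speech-free segment; the function name and the `floor` parameter are hypothetical.

```python
import numpy as np


def spectral_subtraction(noisy_power: np.ndarray,
                         noise_power: np.ndarray,
                         floor: float = 0.0) -> np.ndarray:
    """Estimate the clean power spectrum by subtracting a noise estimate.

    noisy_power: (frames, bins) power spectrogram of the noisy speech.
    noise_power: (bins,) average noise power, e.g. from a speech-free segment.
    floor: lower bound applied after subtraction (0.0 = half-wave rectification).
    """
    # Subtract the same noise estimate from every frame (broadcast over frames).
    clean = noisy_power - noise_power[np.newaxis, :]
    # Bins that go negative are clipped to the floor. Bins where the actual
    # noise exceeded the average estimate keep isolated residual peaks, which
    # are heard as the "musical noise" mentioned in the background.
    return np.maximum(clean, floor)
```

Because the noise estimate is a single average, it cannot follow frame-to-frame noise fluctuations, which is one concrete way the "observation experience" dependency shows up in practice.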

Disclosure of Invention

The invention aims to overcome the problems in the prior art and provides a GRU neural network-based noise removal method and system. The invention removes noise from audio with a recurrent neural network (RNN), namely a GRU, and converts the original noisy audio into denoised audio in an end-to-end deep learning manner: noisy audio is fed in and denoised audio comes out.

The invention provides a noise removing method based on a GRU neural network, which comprises the following steps:

step one, building a GRU neural network model, wherein the GRU network model comprises three GRU network layers and one fully connected network layer;

step two, training the GRU neural network model;

step three, converting the noisy audio file into spectrogram data through the Librosa library of Python;

step four, feeding the spectrogram data into the trained GRU neural network, which outputs the denoised audio spectrogram data,

wherein the spectrogram data is a two-dimensional matrix whose first dimension is the number of audio frames and whose second dimension is the feature dimension of the spectrogram;

step five, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network;

and step six, storing the audio data in the form of an audio file.
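The inference steps three to six can be wired together as in the following Python sketch. It is a minimal illustration, assuming the spectrogram transform, the trained model and the vocoder are passed in as callables (the patent names Librosa and an unspecified open-source vocoder, so neither is reproduced here); the function name `denoise_file`, the 16 kHz sample rate and the 16-bit mono WAV output written with the standard `wave` module are assumptions, not details from the patent.

```python
import wave
from typing import Callable

import numpy as np

Array = np.ndarray


def denoise_file(noisy: Array,
                 to_spectrogram: Callable[[Array], Array],
                 model: Callable[[Array], Array],
                 vocoder: Callable[[Array], Array],
                 out_path: str,
                 sample_rate: int = 16000) -> Array:
    """Hypothetical driver for steps three to six of the method."""
    spec = to_spectrogram(noisy)           # step three: (frames, feature_dim) matrix
    if spec.ndim != 2:
        raise ValueError("expected a (frames, feature_dim) spectrogram")
    clean_spec = model(spec)               # step four: trained GRU network
    audio = vocoder(clean_spec)            # step five: spectrogram -> waveform
    # Step six: store as 16-bit mono PCM (assumed output format).
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(out_path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
    return audio
```

Keeping each stage behind a callable lets the same driver be used with a Librosa front end and whichever vocoder is chosen; the only contract it enforces is the two-dimensional (frames, feature dimension) layout of the spectrogram.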

Further, in the embodiment of the present invention, before the GRU neural network model is trained in step two, a first number of clean, noise-free audio clips is collected, and a second number of noise audio clips is added to each noise-free clip to produce noisy audio, wherein the noise audio comprises at least one of office environment noise, canteen environment noise, school environment noise and outdoor environment noise.

Further, the noise audio is added to the noise-free audio by summing the two signals point by point.
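Point-to-point summation is a sample-wise addition of the two waveforms. The sketch below assumes both signals share one sample rate and are float arrays; looping a short noise clip to cover the whole utterance is a hypothetical choice, since the patent does not say how length mismatches are handled. The patent also does not mention any SNR scaling, so none is applied.

```python
import numpy as np


def add_noise(clean: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Mix noise into clean speech by point-to-point (sample-wise) summation."""
    if len(noise) < len(clean):
        # Assumption: repeat the noise clip until it covers the clean clip.
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    # Trim the noise to the clean length, then add sample by sample.
    return clean + noise[:len(clean)]
```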

Further, AiShell audio is used as the clean, noise-free audio.

Further, in an embodiment of the present invention, the step of training the GRU neural network model includes:

inputting the spectrogram feature data of the noisy audio into the GRU neural network model to obtain a predicted clean audio spectrogram;

calculating a loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noisy audio;

and performing iterative training to obtain a trained GRU neural network model.

In the embodiment of the invention, the iterative training on the loss value is performed by the AdamOptimizer in TensorFlow.
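For readers unfamiliar with the optimizer, the Adam update can be sketched in NumPy as follows. This is the standard textbook form of Adam with bias correction; the defaults match those of TensorFlow's `tf.compat.v1.train.AdamOptimizer` (learning rate 0.001, beta1 0.9, beta2 0.999, epsilon 1e-8), but the patent does not state which values were used, and TensorFlow applies epsilon at a slightly different point in the formula. The class name is hypothetical.

```python
import numpy as np


class Adam:
    """Minimal Adam optimizer for a single parameter array."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment (mean) estimate of the gradient
        self.v = None  # second-moment (uncentred variance) estimate
        self.t = 0     # step counter used for bias correction

    def step(self, params: np.ndarray, grads: np.ndarray) -> np.ndarray:
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grads
        self.v = self.beta2 * self.v + (1 - self.beta2) * grads ** 2
        # Correct the bias towards zero caused by the zero initialisation.
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly the learning rate regardless of the gradient's magnitude.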

In the embodiment of the invention, the loss value is calculated by means of Euclidean distance.
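One reading of the Euclidean-distance loss is the L2 norm of the difference between the predicted and original clean spectrograms. The sketch below assumes that reading and also gives its gradient with respect to the prediction, which is what the optimizer consumes. The function names are hypothetical, and a squared distance or a per-frame average would be equally plausible readings; the patent does not say which was used.

```python
import numpy as np


def euclidean_loss(predicted: np.ndarray, target: np.ndarray) -> float:
    """L2 distance between predicted and original clean spectrograms."""
    return float(np.sqrt(np.sum((predicted - target) ** 2)))


def euclidean_loss_grad(predicted: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Gradient of the L2 distance with respect to the prediction."""
    d = euclidean_loss(predicted, target)
    # d/dp ||p - t|| = (p - t) / ||p - t||, undefined at d == 0 (use zero there).
    return (predicted - target) / d if d > 0 else np.zeros_like(predicted)
```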

In the embodiment of the invention, the spectrogram feature data of the noisy audio are used as the input data of the GRU neural network, and the spectrogram data of the noise-free audio are used as the label data of the GRU neural network.

An embodiment of the present invention provides a server including at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the GRU neural network-based noise removal method described above.

An embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the GRU neural network-based noise removal method described above.

Compared with the existing spectral subtraction and statistical method for noise reduction, the GRU neural network-based noise removal method and system have the following beneficial effects:

(1) The invention uses a GRU deep learning neural network to build an end-to-end model that removes noise from audio; the method can remove noise from different scenes and offers higher adaptability and stability.

(2) Because the GRU network has a simple structure and occupies few resources, noise removal as a whole is efficient and performs well, so the method can be applied in real environments. For example, in speech recognition, denoising the speech in real time reduces the influence of noise on recognition, improves recognition accuracy, and reduces the probability of misrecognition.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention. In the drawings:

Fig. 1 is a schematic flow chart of the GRU neural network-based noise removal method in embodiment 1 of the present invention.

Detailed Description

In order to make the technical problems to be solved, the technical solutions and the beneficial effects clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments which can be made by those skilled in the art based on the embodiments of the invention without inventive effort are intended to be within the scope of the invention.

It should be noted that, in the present invention, "several" includes one or more, and "a plurality of" includes two or more.

Example 1

The invention provides a noise removal method based on a GRU neural network, which comprises the following steps:

step S1, collecting a first number of clean, noise-free audio clips; in the embodiment of the invention, AiShell audio is preferably used as the clean, noise-free audio, which is also called the original audio data and serves as the label data for training the GRU neural network model;

step S2, adding a second number of noise audio clips to each noise-free clip to produce noisy audio; the noisy audio is used as the input data for training the model, and the original clean, noise-free audio corresponding to each noisy clip is used as its label data;

the noise audio comprises at least one of office environmental noise, canteen environmental noise, school environmental noise and outdoor environmental noise;

the noise audio is added to the noise-free audio by summing the two point by point;

in the embodiment of the present invention, the second number is preferably 1000 clips;

step S3, converting the noise-free audio and the generated noisy audio into spectrograms with an FFT algorithm using a Python library; in the embodiment of the present invention, a size of 1024 is preferably used, that is, each frame of audio data is represented by 1024 values. These spectrogram data are used to train the model: the spectrogram feature data of the noisy audio serve as the input data of the GRU neural network, and the spectrogram data of the noise-free audio serve as its label data;
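Step S3 can be sketched with NumPy's FFT as below, producing the (frames, 1024) matrix used as network input and label. The sketch assumes a Hann window and a hop of 256 samples (both hypothetical; the patent gives neither) and keeps all 1024 magnitude bins of a full FFT, which is one way to obtain exactly 1024 values per frame; a one-sided real FFT of the same size yields 513 bins. With Librosa, `librosa.stft(y, n_fft=1024)` likewise returns a (513, frames) complex matrix that would need transposing into this layout.

```python
import numpy as np

N_FFT = 1024  # "1024 values for each frame of audio data" (Example 1)
HOP = 256     # hypothetical hop length; not given in the patent


def spectrogram(audio: np.ndarray) -> np.ndarray:
    """Return a (frames, 1024) magnitude spectrogram of a 1-D waveform."""
    if len(audio) < N_FFT:
        # Zero-pad very short clips so that at least one frame exists.
        audio = np.pad(audio, (0, N_FFT - len(audio)))
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    window = np.hanning(N_FFT)
    frames = np.stack([audio[i * HOP:i * HOP + N_FFT] * window
                       for i in range(n_frames)])
    # All N_FFT bins of the full FFT give 1024 values per frame;
    # np.fft.rfft would give N_FFT // 2 + 1 = 513 instead.
    return np.abs(np.fft.fft(frames, axis=1))


def training_pair(clean: np.ndarray, noisy: np.ndarray):
    """Input = noisy-audio spectrogram, label = clean-audio spectrogram."""
    return spectrogram(noisy), spectrogram(clean)
```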

step S4, building the GRU neural network model;

the GRU network model comprises three GRU network layers and one fully connected network layer, wherein each GRU network layer has 300 neurons, and in the embodiment of the invention the fully connected layer has a 300 x 1024 structure, mapping each 300-dimensional GRU output to the 1024 spectrogram values of a frame;
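The architecture of step S4 can be written out as a NumPy forward pass to make the shapes concrete: three GRU layers of 300 units each, followed by a 300 x 1024 fully connected layer applied to every frame. This sketch assumes the standard GRU gate equations in the "reset-after" form used by PyTorch and by Keras with `reset_after=True`; the class names, the uniform initialisation and the frame-by-frame loop are illustrative choices, not details from the patent. In TensorFlow the same stack could be expressed as three `tf.keras.layers.GRU(300, return_sequences=True)` layers followed by `tf.keras.layers.Dense(1024)`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRULayer:
    """One GRU layer with update gate z, reset gate r and candidate state n."""

    def __init__(self, in_dim, units, rng):
        scale = 1.0 / np.sqrt(units)
        # Input and recurrent weights for the three gates, stacked as [z | r | n].
        self.W = rng.uniform(-scale, scale, (in_dim, 3 * units))
        self.U = rng.uniform(-scale, scale, (units, 3 * units))
        self.b = np.zeros(3 * units)
        self.units = units

    def __call__(self, x):  # x: (frames, in_dim) -> (frames, units)
        h = np.zeros(self.units)
        xw = x @ self.W + self.b  # input projections for all frames at once
        outputs = []
        for t in range(x.shape[0]):
            xz, xr, xn = np.split(xw[t], 3)
            hz, hr, hn = np.split(h @ self.U, 3)
            z = sigmoid(xz + hz)
            r = sigmoid(xr + hr)
            n = np.tanh(xn + r * hn)   # reset gate applied after U ("reset-after")
            h = (1 - z) * n + z * h    # blend candidate and previous state
            outputs.append(h)
        return np.stack(outputs)


class DenoiseGRU:
    """Three stacked GRU layers plus a per-frame fully connected layer."""

    def __init__(self, feat_dim=1024, units=300, layers=3, seed=0):
        rng = np.random.default_rng(seed)
        dims = [feat_dim] + [units] * layers
        self.grus = [GRULayer(dims[i], units, rng) for i in range(layers)]
        self.W_out = rng.uniform(-0.05, 0.05, (units, feat_dim))  # 300 x 1024
        self.b_out = np.zeros(feat_dim)

    def __call__(self, spec):  # spec: (frames, 1024) -> (frames, 1024)
        h = spec
        for gru in self.grus:
            h = gru(h)
        return h @ self.W_out + self.b_out
```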

step S5, inputting the spectrogram feature data of the noisy audio into the GRU neural network model to obtain a predicted clean audio spectrogram;

step S6, calculating a loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noisy audio; in the embodiment of the invention, the loss value is calculated as a Euclidean distance: the difference between the clean audio spectrogram predicted by the GRU neural network and the original clean audio spectrogram is computed, and the result is the loss value of the current network prediction;

step S7, performing iterative training to obtain a trained GRU neural network model; in the embodiment of the invention, iterative training on the loss value is carried out by the AdamOptimizer in TensorFlow; preferably, 20,000 batches are trained, each batch containing 64 audio spectrograms;
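A batch sampler and training loop for step S7 might look like the sketch below. Because the clips differ in length while a batch of 64 spectrograms must share one shape, it assumes random fixed-length crops (`CROP_FRAMES`, a hypothetical value) with zero padding for short clips; `train_step` is a hypothetical callable that would run the forward pass, the Euclidean loss and the Adam update.

```python
import numpy as np

BATCH_SIZE = 64        # 64 audio spectrograms per batch (Example 1)
NUM_BATCHES = 20_000   # 20,000 training batches (Example 1)
CROP_FRAMES = 100      # hypothetical fixed crop length in frames


def sample_batch(pairs, rng, batch_size=BATCH_SIZE, crop=CROP_FRAMES):
    """Draw a batch of aligned (noisy, clean) spectrogram crops.

    pairs: list of (noisy_spec, clean_spec), each of shape (frames, features).
    Returns two arrays of shape (batch_size, crop, features).
    """
    idx = rng.integers(len(pairs), size=batch_size)
    inputs, labels = [], []
    for i in idx:
        noisy, clean = pairs[i]
        n = noisy.shape[0]
        if n < crop:
            # Zero-pad short clips along the frame axis only.
            pad = ((0, crop - n), (0, 0))
            noisy, clean = np.pad(noisy, pad), np.pad(clean, pad)
            n = crop
        start = rng.integers(n - crop + 1)  # same window for input and label
        inputs.append(noisy[start:start + crop])
        labels.append(clean[start:start + crop])
    return np.stack(inputs), np.stack(labels)


def train(pairs, train_step, num_batches=NUM_BATCHES, seed=0):
    """Feed num_batches random batches to a (hypothetical) train_step."""
    rng = np.random.default_rng(seed)
    for _ in range(num_batches):
        x, y = sample_batch(pairs, rng)
        train_step(x, y)
```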

step S8, converting the noisy audio file into spectrogram data through the Librosa library of Python;

step S9, feeding the spectrogram data into the trained GRU neural network, which outputs the denoised audio spectrogram data;

and step S10, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network, and storing the audio data as an audio file.

Example 2

The embodiment of the invention provides a server, which comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the GRU neural network-based noise removal method. The steps of the method in this embodiment are the same as those in embodiment 1 and are not described again.

Example 3

Embodiments of the present invention provide a computer-readable storage medium storing a computer program that, when executed by a processor, performs the GRU neural network-based noise removal method. The steps of the method in this embodiment are the same as those in embodiment 1 and are not described again.

If the GRU neural network-based noise removal method provided by the invention is implemented as a software functional module and sold or used as a stand-alone product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied as a software product stored in a storage medium, comprising several instructions for causing a server (which may be a personal computer, a cloud server, a network device, or another device comprising a processor) to perform all or part of the methods described in the embodiments of the present invention. The computer-readable storage medium includes, but is not limited to, a read-only memory (ROM), a random access memory (RAM), a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code. Embodiments of the invention are not limited to any specific combination of hardware and software.

The foregoing is merely illustrative of the preferred embodiments of the present invention, and it should be understood that the present invention is not limited to the above embodiments, and modifications in combination with known or existing technology and knowledge by those skilled in the art, or equivalent substitution of some or all of the technical features thereof, should be considered as being within the scope of the present invention.

Claims

1. A method for removing noise based on a GRU neural network, the method comprising:

training a GRU neural network model;

step six, storing the audio data in the form of an audio file;

the step of training the GRU neural network model comprises the following steps:

calculating a loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noisy audio;

and performing iterative training to obtain a trained GRU neural network model.

2. The method for removing noise based on a GRU neural network according to claim 1, wherein, before the GRU neural network model is trained in step two, the method further comprises:

collecting a first number of clean, noise-free audio clips;

and adding a second number of noise audio clips to each noise-free clip to produce noisy audio, wherein the noise audio comprises at least one of office environment noise, canteen environment noise, school environment noise and outdoor environment noise.

3. The method for removing noise based on GRU neural network according to claim 2, wherein,

the noise audio is added to the noise-free audio by summing the noise-free audio and the noise audio point by point.

4. The method for removing noise based on GRU neural network according to claim 2, wherein,

the AiShell audio is used as clean noiseless audio.

5. The method for removing noise based on GRU neural network of claim 1,

the iterative training on the loss value is performed by the AdamOptimizer in TensorFlow.

6. The method for removing noise based on GRU neural network of claim 1,

the loss value is calculated as a Euclidean distance.

7. The method for removing noise based on GRU neural network of claim 1,

the spectrogram feature data of the noisy audio are taken as the input data of the GRU neural network, and the spectrogram data of the noise-free audio are taken as the label data of the GRU neural network.

8. A server, wherein the server comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the GRU neural network based denoising method steps of any one of claims 1 to 7.

9. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the noise-removing method steps based on a GRU neural network as claimed in any one of claims 1 to 7.