CN111477239A - Noise removing method and system based on GRU neural network - Google Patents

Noise removing method and system based on GRU neural network

Info

Publication number
CN111477239A
Authority
CN
China
Prior art keywords
audio
neural network
noise
gru neural
spectrogram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010242904.0A
Other languages
Chinese (zh)
Other versions
CN111477239B (en)
Inventor
曾志先
肖龙源
李稀敏
叶志坚
刘晓葳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd filed Critical Xiamen Kuaishangtong Technology Co Ltd
Priority to CN202010242904.0A priority Critical patent/CN111477239B/en
Publication of CN111477239A publication Critical patent/CN111477239A/en
Application granted granted Critical
Publication of CN111477239B publication Critical patent/CN111477239B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention discloses a noise removing method based on a GRU neural network, a server, and a storage medium. The method comprises: building a GRU neural network model; training the GRU neural network model; converting a noisy audio file into spectrogram data through the Librosa library of Python; feeding the spectrogram data into the trained GRU neural network and outputting denoised audio spectrogram data; converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network; and saving the audio data as an audio file.

Description

Noise removing method and system based on GRU neural network
Technical Field
The invention relates to the technical field of noise removal, in particular to a noise removal method and a noise removal system based on a GRU neural network.
Background
In current speech preprocessing systems, removing noise from audio is an essential step: residual noise degrades the performance of speech recognition and other downstream applications, so the noise reduction algorithm is a critical part of audio preprocessing.
Conventional noise reduction methods commonly include spectral subtraction and statistical approaches. Spectral subtraction removes noise by subtracting an estimated noise spectrum from the power spectrum; statistical methods recover clean speech by probabilistic estimation on the speech spectrum. Both approaches depend on hand-crafted assumptions derived from observing speech, generalize poorly to practical situations, and tend to introduce musical noise.
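For concreteness, the following is a minimal sketch of the spectral subtraction idea described above, shown only for contrast with the learned approach; it is not taken from the patent. It assumes the leading STFT frames of the recording contain only noise, and the FFT settings and function name are illustrative.

```python
# Minimal spectral-subtraction sketch (for contrast with the learned approach below).
# Assumes the first `noise_frames` STFT frames contain only noise; parameters are illustrative.
import numpy as np
import librosa

def spectral_subtraction(noisy, n_fft=512, hop=128, noise_frames=10):
    stft = librosa.stft(noisy, n_fft=n_fft, hop_length=hop)
    mag, phase = np.abs(stft), np.angle(stft)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)  # estimated noise spectrum
    clean_mag = np.maximum(mag - noise_mag, 0.0)                   # subtract and floor at zero
    return librosa.istft(clean_mag * np.exp(1j * phase), hop_length=hop)
```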
Disclosure of Invention
The invention aims to overcome the problems in the prior art and provides a method and a system for removing noise based on a GRU neural network. The invention removes noise from audio with an RNN-based approach, converting the original audio into denoised audio in an end-to-end deep learning manner: the original noisy audio is taken as input, and the corresponding denoised audio is produced as output.
The invention provides a noise removing method based on a GRU neural network, which comprises the following steps:
Step one, building a GRU neural network model, wherein the GRU neural network model comprises three GRU network layers and a fully connected network layer;
Step two, training the GRU neural network model;
Step three, converting the noisy audio file into spectrogram data through the Librosa library of Python;
Step four, feeding the spectrogram data into the trained GRU neural network and outputting the denoised audio spectrogram data, wherein the spectrogram data is a two-dimensional matrix whose first dimension is the number of audio frames and whose second dimension is the feature dimension of the spectrogram;
Step five, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network;
And step six, saving the audio data as an audio file.
Further, in an embodiment of the present invention, before training the GRU neural network model in step two, a first number of clean noiseless audio clips are collected; a second number of noise audio clips are respectively added to the noiseless audio to produce noise-added audio, and the noise audio comprises at least one of office environment noise, canteen environment noise, school environment noise and outdoor environment noise.
Further, the noise audio is added to the noiseless audio by point-to-point summation of the noiseless audio and the noise audio.
Further, AiShell audio is used as the clean noiseless audio.
Further, in the embodiment of the present invention, the step of training the GRU neural network model includes:
Inputting spectrogram feature data of the noise-added audio into the GRU neural network model to obtain a predicted clean audio spectrogram;
Calculating the loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noise-added audio;
And carrying out iterative training to obtain a trained GRU neural network model.
In an embodiment of the invention, iterative training on the loss value is performed with the Adam optimizer in TensorFlow.
In the embodiment of the invention, the loss value is calculated by means of the Euclidean distance.
In the embodiment of the present invention, the spectrogram feature data of the noise-added audio is used as the input data of the GRU neural network, and the spectrogram data of the noiseless audio is used as the label data of the GRU neural network.
In one embodiment of the present invention, a server is provided, which includes at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the above-described GRU neural network-based denoising method steps.
An embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the above-mentioned noise removing method based on the GRU neural network.
Compared with the existing spectral subtraction and statistical methods for noise reduction, the noise removing method and system based on the GRU neural network have the following beneficial effects:
(1) Removing noise from audio with a GRU deep learning neural network realizes an end-to-end network model that can remove noise from different scenes, with high adaptability and high stability.
(2) The GRU network has a simple structure and consumes few resources, so the overall noise removal is efficient, performs well, and can be applied in real environments. In speech recognition, for example, it can denoise speech in real time, reducing the influence of noise on recognition, improving recognition accuracy, and lowering the probability of misrecognition.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the invention without limiting it. In the drawings:
Fig. 1 is a schematic flowchart of a method for removing noise based on a GRU neural network according to embodiment 1 of the present invention.
Detailed Description
In order to make the technical problems to be solved, the technical solutions, and the advantageous effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it. All other embodiments obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention fall within the scope of the present invention.
In the present invention, "a plurality" includes one or more, and two or more.
Example 1
The invention provides a noise removing method based on a GRU neural network, which comprises the following steps,
Step S1, collecting a first number of clean noiseless audio clips; preferably, in the embodiment of the present invention, AiShell audio is used as the clean noiseless audio, and this noiseless audio, also called the original audio data, will serve as the label data for GRU neural network model training;
Step S2, respectively adding a second number of noise audio clips to the noiseless audio to produce noise-added audio; the noise-added audio is used as the input data of the training model, and the original clean noiseless audio corresponding to each noise-added clip is used as its label data;
The noise audio comprises at least one of office environment noise, dining room environment noise, school environment noise and outdoor environment noise;
The noise audio is added to the noiseless audio by point-to-point summation of the noiseless audio and the noise audio;
In the embodiment of the present invention, preferably, the second number is 1000;
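A minimal sketch of the point-to-point summation described in steps S1 and S2 follows, assuming both clips share the same sampling rate; the file paths, the 16 kHz rate, and the use of the soundfile package for writing are illustrative assumptions, not details from the patent.

```python
# Point-to-point summation of a clean clip and a noise clip (illustrative sketch).
import numpy as np
import librosa
import soundfile as sf

clean, sr = librosa.load("clean.wav", sr=16000)   # hypothetical file paths
noise, _ = librosa.load("noise.wav", sr=16000)

noise = np.resize(noise, clean.shape)             # repeat or trim the noise to the clean length
noisy = clean + noise                             # element-wise (point-to-point) sum
sf.write("noisy.wav", noisy, sr)
```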
Step S3, converting the noiseless audio and the produced noise-added audio into spectrograms through an FFT algorithm using the Librosa library in Python, wherein the converted spectrogram data is a two-dimensional matrix whose first dimension is the number of audio frames and whose second dimension is the feature dimension of the spectrogram;
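A minimal sketch of this conversion with Librosa, returning a (frames, feature dimension) matrix as described; the FFT size and hop length are illustrative and are not specified in the patent.

```python
# Convert an audio file into spectrogram data of shape (number of frames, feature dimension).
import numpy as np
import librosa

def audio_to_spectrogram(path, sr=16000, n_fft=512, hop=128):
    y, _ = librosa.load(path, sr=sr)
    mag = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))  # (freq_bins, frames)
    return mag.T                                                # (frames, feature_dim)
```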
Step S4, building a GRU neural network model;
The GRU network model comprises three GRU network layers and a fully connected network layer, wherein each GRU network layer contains 300 neurons; in the embodiment of the invention, the fully connected network layer has a 300 × 1024 structure;
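A minimal tf.keras sketch of the stated structure (three GRU layers of 300 neurons followed by a fully connected layer). The 1024-unit output mirrors the stated 300 × 1024 layer, but in practice the output width should equal the spectrogram feature dimension, so the default value here is an assumption.

```python
# Sketch of the described model: three GRU layers (300 units each) plus one dense output layer.
import tensorflow as tf

def build_gru_denoiser(feature_dim=1024):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(None, feature_dim)),        # (frames, feature_dim) spectrograms
        tf.keras.layers.GRU(300, return_sequences=True),
        tf.keras.layers.GRU(300, return_sequences=True),
        tf.keras.layers.GRU(300, return_sequences=True),
        tf.keras.layers.Dense(feature_dim),               # predicted clean spectrogram per frame
    ])
```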
Step S5, inputting the spectrogram feature data of the noise-added audio into the GRU neural network model to obtain a predicted clean audio spectrogram;
Step S6, calculating the loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noise-added audio; in the embodiment of the invention, the loss value is calculated as a Euclidean distance: the difference between the clean audio spectrogram predicted by the GRU neural network and the original clean audio spectrogram is computed, and the result is the loss value of the current network prediction;
Step S7, carrying out iterative training to obtain a trained GRU neural network model; in the embodiment of the invention, iterative training on the loss value is performed with the Adam optimizer in TensorFlow, and preferably 20,000 batches are trained, with 64 audio spectrograms fed into each batch;
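A minimal training sketch under the stated settings (Adam optimizer, 20,000 batches of 64 spectrograms). The Euclidean-distance loss is realized here as a mean squared error over spectrogram bins, which is an assumption, and `noisy_specs`/`clean_specs` are assumed to be pre-built, equally shaped arrays of training spectrograms.

```python
# Training sketch: Adam optimizer, squared-error (Euclidean-style) loss,
# 20,000 batches of 64 spectrograms. `noisy_specs` and `clean_specs` are assumed
# to be arrays of shape (num_clips, frames, feature_dim).
import tensorflow as tf

model = build_gru_denoiser(feature_dim=noisy_specs.shape[-1])   # sketch from step S4
optimizer = tf.keras.optimizers.Adam(1e-3)

dataset = (tf.data.Dataset.from_tensor_slices((noisy_specs, clean_specs))
           .shuffle(1024).repeat().batch(64))

for step, (noisy, clean) in enumerate(dataset.take(20000)):
    with tf.GradientTape() as tape:
        pred = model(noisy, training=True)
        loss = tf.reduce_mean(tf.square(pred - clean))   # squared Euclidean distance per bin
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```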
Step S8, converting the noisy audio file into spectrogram data through the Librosa library of Python;
Step S9, feeding the spectrogram data into the trained GRU neural network and outputting the denoised audio spectrogram data, wherein the spectrogram data is a two-dimensional matrix whose first dimension is the number of audio frames and whose second dimension is the feature dimension of the spectrogram;
And step S10, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network, and saving the audio data as an audio file.
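A minimal end-to-end inference sketch for steps S8 to S10. The patent does not name the open-source vocoder, so Librosa's Griffin-Lim reconstruction is used here purely as a stand-in, and the STFT settings and helper function follow the earlier sketches.

```python
# Inference sketch: noisy file -> spectrogram -> trained GRU model -> waveform -> output file.
import numpy as np
import librosa
import soundfile as sf

def denoise_file(model, in_path, out_path, sr=16000, n_fft=512, hop=128):
    spec = audio_to_spectrogram(in_path, sr=sr, n_fft=n_fft, hop=hop)   # (frames, feature_dim)
    clean_spec = model.predict(spec[np.newaxis, ...])[0]                # add/remove batch dimension
    clean_mag = np.maximum(clean_spec, 0.0).T                           # back to (freq_bins, frames)
    audio = librosa.griffinlim(clean_mag, n_fft=n_fft, hop_length=hop)  # stand-in for the vocoder
    sf.write(out_path, audio, sr)
```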
Example 2
The embodiment of the invention provides a server, which comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the GRU neural network based denoising method steps. The steps of the method for removing noise based on the GRU neural network in this embodiment are the same as those in embodiment 1, and are not described again.
Example 3
In an embodiment of the present invention, a computer-readable storage medium is provided, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the noise removing method steps based on the GRU neural network. The steps of the method for removing noise based on the GRU neural network in this embodiment are the same as those in embodiment 1, and are not described again.
The noise removing method based on a GRU neural network provided by the invention may, if implemented in the form of a software functional module and sold or used as an independent product, be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which is stored in a storage medium and includes several instructions to enable a server (which may be a personal computer, a cloud server, a network device, or a device including a processor, etc.) to execute all or part of the methods described in the embodiments of the present invention. The computer-readable storage medium includes, but is not limited to, read-only memory (ROM), random access memory (RAM), a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, and various other media capable of storing program code. Embodiments of the invention are not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and serve only to illustrate its technical solutions. It should be understood that the present invention is not limited to the above embodiments; modifications made by persons skilled in the art based on the known or prior art and common knowledge, or equivalent replacement of some or all of the technical features, should also fall within the protection scope of the present invention.

Claims (10)

1. A method for removing noise based on a GRU neural network is characterized by comprising the following steps:
Step one, building a GRU neural network model, wherein the GRU neural network model comprises three GRU network layers and a fully connected network layer;
Step two, training the GRU neural network model;
Step three, converting the noisy audio file into spectrogram data through the Librosa library of Python;
Step four, feeding the spectrogram data into the trained GRU neural network and outputting the denoised audio spectrogram data, wherein the spectrogram data is a two-dimensional matrix whose first dimension is the number of audio frames and whose second dimension is the feature dimension of the spectrogram;
Step five, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network;
And step six, saving the audio data as an audio file.
2. The method for denoising based on a GRU neural network of claim 1, wherein training the GRU neural network model in step two further comprises:
Collecting a first quantity of clean noiseless audio;
Respectively adding a second number of noise audio clips to the noiseless audio to produce noise-added audio, wherein the noise audio comprises at least one of office environment noise, canteen environment noise, school environment noise and outdoor environment noise.
3. The GRU neural network-based denoising method of claim 2,
The noise audio is added to the noiseless audio by point-to-point summation of the noiseless audio and the noise audio.
4. The GRU neural network-based denoising method of claim 2,
AiShell audio is used as clean noiseless audio.
5. The GRU neural network-based denoising method of claim 2,
The step of training the GRU neural network model comprises the following steps:
Inputting spectrogram feature data of the noise-added audio into the GRU neural network model to obtain a predicted clean audio spectrogram;
Calculating the loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noise-added audio;
And carrying out iterative training to obtain a trained GRU neural network model.
6. The GRU neural network-based denoising method of claim 5,
Iterative training on the loss value is performed with the Adam optimizer in TensorFlow.
7. The GRU neural network-based denoising method of claim 5,
The loss value is calculated by means of the Euclidean distance.
8. The GRU neural network-based denoising method of claim 5,
The spectrogram feature data of the noise-added audio is used as input data of the GRU neural network, and the spectrogram data of the noiseless audio is used as label data of the GRU neural network.
9. A server, characterized in that the server comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method steps of any of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, which when executed by a processor implements the method steps of the GRU neural network-based denoising method of any of claims 1-8.
CN202010242904.0A 2020-03-31 2020-03-31 Noise removing method and system based on GRU neural network Active CN111477239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010242904.0A CN111477239B (en) 2020-03-31 2020-03-31 Noise removing method and system based on GRU neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010242904.0A CN111477239B (en) 2020-03-31 2020-03-31 Noise removing method and system based on GRU neural network

Publications (2)

Publication Number Publication Date
CN111477239A (en) 2020-07-31
CN111477239B (en) 2023-05-09

Family

ID=71750358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010242904.0A Active CN111477239B (en) 2020-03-31 2020-03-31 Noise removing method and system based on GRU neural network

Country Status (1)

Country Link
CN (1) CN111477239B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109273021A (en) * 2018-08-09 2019-01-25 厦门亿联网络技术股份有限公司 A kind of real-time conferencing noise-reduction method and device based on RNN
CN109671433A (en) * 2019-01-10 2019-04-23 腾讯科技(深圳)有限公司 A kind of detection method and relevant apparatus of keyword
CN109785850A (en) * 2019-01-18 2019-05-21 腾讯音乐娱乐科技(深圳)有限公司 A kind of noise detecting method, device and storage medium
CN110120225A (en) * 2019-04-01 2019-08-13 西安电子科技大学 A kind of audio defeat system and method for the structure based on GRU network
CN110491404A (en) * 2019-08-15 2019-11-22 广州华多网络科技有限公司 Method of speech processing, device, terminal device and storage medium
CN110534118A (en) * 2019-07-29 2019-12-03 安徽继远软件有限公司 Transformer/reactor method for diagnosing faults based on Application on Voiceprint Recognition and neural network
CN110647980A (en) * 2019-09-18 2020-01-03 成都理工大学 Time sequence prediction method based on GRU neural network
CN110751945A (en) * 2019-10-17 2020-02-04 成都三零凯天通信实业有限公司 End-to-end voice recognition method


Also Published As

Publication number Publication date
CN111477239B (en) 2023-05-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant