CN111477239B - Noise removing method and system based on GRU neural network - Google Patents
Noise removing method and system based on GRU neural network
- Publication number
- CN111477239B (application CN202010242904.0A)
- Authority
- CN
- China
- Prior art keywords
- noise
- audio
- neural network
- gru neural
- spectrogram
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Circuit For Audible Band Transducer (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
Abstract
The invention discloses a noise removing method based on a GRU neural network, a server and a storage medium, wherein the method comprises the following steps: building a GRU neural network model and training it; converting a noisy audio file into spectrogram data through the Librosa library of Python; passing the spectrogram data to the trained GRU neural network, which outputs denoised audio spectrogram data; converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network; and saving the audio data as an audio file. By using a GRU deep learning neural network as an end-to-end model to remove noise from audio, the method can remove noise in different scenes, offers high adaptability and stability, improves the accuracy of speech recognition, and reduces the probability of misrecognition.
Description
Technical Field
The invention relates to the technical field of noise removal, in particular to a GRU neural network-based noise removal method and system.
Background
In current speech preprocessing systems, removing noise from the sound is an important part of the preprocessing pipeline. Noise degrades the audio quality in speech recognition and other applications, so noise reduction algorithms are an important part of audio preprocessing.
At present, the common traditional noise reduction methods are spectral subtraction and statistical methods. Spectral subtraction removes noise by subtracting an estimated noise spectrum from the power spectrum of the noisy signal; statistical methods extract clean speech by probabilistic estimation of the speech spectrum. Both approaches rely on prior observations of the speech to perform noise reduction, so they do not work well in practical situations, and they also tend to introduce musical noise.
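For readers unfamiliar with the baseline being criticised, the following is a minimal sketch of power-spectrum subtraction. It is illustrative only: the function name, the `floor` parameter and the toy values are assumptions and do not come from the patent.

```python
import numpy as np

def spectral_subtraction(noisy_power, noise_power, floor=0.0):
    """Subtract an estimated noise power spectrum from every frame.

    noisy_power: array of shape (frames, bins), power spectrum of noisy speech.
    noise_power: array of shape (bins,), estimated noise power spectrum.
    floor: lower bound applied after subtraction (hypothetical parameter).
    """
    cleaned = noisy_power - noise_power[np.newaxis, :]
    # Bins where the noise estimate exceeds the observed power would go
    # negative, so they are clipped to the floor.
    return np.maximum(cleaned, floor)

# Toy values, chosen only to make the arithmetic easy to follow.
noisy = np.array([[4.0, 1.0, 9.0],
                  [2.0, 3.0, 0.5]])
noise = np.array([1.0, 2.0, 1.0])
cleaned = spectral_subtraction(noisy, noise)
print(cleaned)
```

The clipping step is one place where such methods depend on a good noise estimate: isolated bins that survive the subtraction are commonly associated with the musical noise mentioned above.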
Disclosure of Invention
The invention aims to overcome the problems in the prior art and provides a GRU neural network-based noise removing method and system. The invention removes noise from audio by using a recurrent neural network (RNN) method, converting the original audio into denoised audio in a deep learning end-to-end manner: the input is the original noisy audio, and the output is the audio with the noise removed.
The invention provides a noise removing method based on a GRU neural network, which comprises the following steps:
step one, building a GRU neural network model, wherein the GRU network model comprises three GRU network layers and a fully connected network layer;
step two, training the GRU neural network model;
step three, converting the noisy audio file into spectrogram data through the Librosa library of Python;
step four, passing the spectrogram data to the trained GRU neural network and outputting the denoised audio spectrogram data,
the spectrogram data being a two-dimensional matrix structure, wherein the first dimension of the matrix structure is the number of audio frames and the second dimension is the feature dimension of the spectrogram;
step five, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network;
and step six, storing the audio data in the form of an audio file.
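The two-dimensional spectrogram layout used in steps three and four (frames along the first axis, features along the second) can be sketched as follows. The patent performs this conversion with Librosa; the sketch below uses plain NumPy as a stand-in, and the frame length, hop length, Hann window and 16 kHz sample rate are all assumptions made for illustration.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_length=1024, hop_length=512):
    """Return a magnitude spectrogram of shape (frames, frame_length // 2 + 1)."""
    if len(signal) < frame_length:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; rows are frames, columns are frequency features.
    return np.abs(np.fft.rfft(frames, axis=1))

sample_rate = 16000  # assumed sample rate
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)  # one second of a 1 kHz test tone
spec = magnitude_spectrogram(tone)
print(spec.shape)  # (30, 513)
```

With these assumed settings the feature dimension is 513; the feature dimension in the patent's pipeline depends on its own spectrogram settings, which this sketch does not try to reproduce.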
Further, in the embodiment of the present invention, before the GRU neural network model is trained in step two, a first number of clean, noiseless audio recordings is collected, and a second number of noise audio recordings is added to the noiseless audio to produce noisy audio, wherein the noise audio comprises at least one of office environment noise, canteen environment noise, school environment noise and outdoor environment noise.
Further, the adding of the noise audio to the noiseless audio is performed by summing the noiseless audio and the noise audio point-to-point.
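A minimal sketch of this point-to-point summation is shown below. The patent does not say how clips of different lengths are aligned, so the tiling and truncation of the noise clip here is an assumption, as are the function name and the toy sample values.

```python
import numpy as np

def add_noise(clean, noise):
    """Sum a clean waveform and a noise waveform sample by sample."""
    # Assumption: the noise clip is repeated or truncated so that it has
    # exactly as many samples as the clean clip.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[:len(clean)]
    return clean + noise

clean = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
noise = np.array([0.01, -0.01])
noisy = add_noise(clean, noise)
print(noisy)
```

Each noisy clip produced this way keeps a one-to-one correspondence with its clean source, which is what allows the clean clip to serve as the training label.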
Further, AiShell audio is used as the clean, noiseless audio.
Further, in an embodiment of the present invention, the step of training the GRU neural network model includes:
inputting the spectrogram feature data of the noisy audio into the GRU neural network model to obtain a predicted clean audio spectrogram;
calculating a loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noisy audio;
and performing iterative training to obtain a trained GRU neural network model.
In the embodiment of the invention, the iterative training on the loss value is performed by the AdamOptimizer in TensorFlow.
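To illustrate what an Adam update does, the sketch below implements the standard Adam rule in NumPy and applies it to a toy problem. It is not the TensorFlow implementation used in the patent; the learning rate, the three-element "spectrogram" and the number of steps are assumptions chosen only for illustration.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy stand-in for training: pull a prediction toward a fixed target by
# minimising the squared Euclidean distance between them.
target = np.array([1.0, -2.0, 0.5])
pred = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
initial_loss = float(np.sum((pred - target) ** 2))
for t in range(1, 2001):
    grad = 2.0 * (pred - target)
    pred, m, v = adam_step(pred, grad, m, v, t, lr=0.01)
final_loss = float(np.sum((pred - target) ** 2))
print(initial_loss, final_loss)
```

In the patent's setting the parameters being updated would be the GRU and fully connected layer weights rather than the prediction itself.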
In the embodiment of the invention, the loss value is calculated by means of Euclidean distance.
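One possible reading of this loss is sketched below. The patent does not state whether the distance is squared, averaged per frame or summed over a batch, so the plain L2 distance over the whole spectrogram is an assumption, as are the function name and the toy values.

```python
import numpy as np

def euclidean_loss(predicted, target):
    """Euclidean (L2) distance between two spectrograms of equal shape."""
    diff = predicted - target
    return float(np.sqrt(np.sum(diff ** 2)))

# Toy 2 x 2 "spectrograms": rows are frames, columns are features.
predicted = np.array([[1.0, 2.0],
                      [3.0, 4.0]])
target = np.array([[1.0, 0.0],
                   [0.0, 4.0]])
loss = euclidean_loss(predicted, target)
print(loss)  # sqrt(2**2 + 3**2) = sqrt(13)
```

The loss is zero only when the predicted spectrogram matches the clean spectrogram in every frame and every feature.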
In the embodiment of the invention, the spectrogram characteristic data of the noise-added audio is used as input data of the GRU neural network, and the spectrogram data of the noise-free audio is used as label data of the GRU neural network.
An embodiment of the present invention provides a server including at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the above-described GRU neural network-based noise removal method steps.
An embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the noise removal method steps based on a GRU neural network as described above.
Compared with the existing spectral subtraction and statistical method for noise reduction, the GRU neural network-based noise removal method and system have the following beneficial effects:
(1) The invention uses a GRU deep learning neural network as an end-to-end model to remove noise from audio; the method can remove noise in different scenes and offers high adaptability and stability.
(2) Because the GRU network has a simple structure and occupies few resources, the overall noise removal is efficient and performs well, and the method can be applied in real environments. For example, in speech recognition, denoising the speech in real time reduces the influence of noise on recognition, improves the accuracy of speech recognition, and reduces the probability of misrecognition.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention. In the drawings:
fig. 1 is a schematic flow chart of a noise removing method based on a GRU neural network in embodiment 1 of the present invention.
Detailed Description
In order to make the technical problems to be solved, the technical solutions and the beneficial effects clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments that can be obtained by those skilled in the art based on the embodiments of the invention without inventive effort are intended to be within the scope of the invention.
It should be noted that the present invention includes one or more, two or more.
Example 1
The invention provides a noise removing method based on a GRU neural network, which comprises the following steps:
step S1, collecting a first number of clean, noiseless audio recordings; in the embodiment of the invention, AiShell audio is preferably used as the clean, noiseless audio, which is also called the original audio data and serves as label data for training the GRU neural network model;
step S2, adding a second number of noise audio recordings to the noiseless audio to produce noisy audio; the noisy audio is used as input data of the training model, and the original clean, noiseless audio corresponding to each noisy audio is used as its label data;
the noise audio comprises at least one of office environment noise, canteen environment noise, school environment noise and outdoor environment noise;
the noise audio is added to the noiseless audio by summing the noiseless audio and the noise audio point by point;
in the embodiment of the present invention, the second number is preferably 1000;
step S3, converting the noiseless audio and the produced noisy audio into spectrograms through an FFT algorithm by using a library in Python; in the embodiment of the present invention, 1024 is preferably used, that is, each frame of audio data is represented by 1024 values; these spectrogram data are used as data for training the model, with the spectrogram feature data of the noisy audio as input data of the GRU neural network and the spectrogram data of the noiseless audio as label data of the GRU neural network;
step S4, building a GRU neural network model;
the GRU network model comprises three GRU network layers and a fully connected network layer, wherein each GRU network layer has 300 neurons, and in the embodiment of the invention the fully connected network layer has a structure of 300 x 1024;
step S5, inputting the spectrogram feature data of the noisy audio into the GRU neural network model to obtain a predicted clean audio spectrogram;
step S6, calculating a loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noisy audio; in the embodiment of the invention, the loss value is calculated as a Euclidean distance: the difference between the clean audio spectrogram predicted by the GRU neural network and the original clean audio spectrogram is computed, and the result is the loss value of the current network prediction;
step S7, performing iterative training to obtain a trained GRU neural network model; in the embodiment of the invention, the iterative training on the loss value is performed by the AdamOptimizer in TensorFlow, and further, preferably, 20,000 batches are trained, with 64 audio spectrograms passed in per batch;
step S8, converting the noisy audio file into spectrogram data through the Librosa library of Python;
step S9, passing the spectrogram data to the trained GRU neural network and outputting the denoised audio spectrogram data,
the spectrogram data being a two-dimensional matrix structure, wherein the first dimension of the matrix structure is the number of audio frames and the second dimension is the feature dimension of the spectrogram;
and step S10, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network, and storing the audio data in the form of an audio file.
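The network shape described in step S4 (three GRU layers of 300 units followed by a 300 x 1024 fully connected layer) can be sketched as a plain NumPy forward pass. This is an untrained illustration, not the TensorFlow model of the embodiment: the weights are random, the class and function names are hypothetical, the input feature dimension of 1024 is inferred from step S3, and the gate equations follow one common GRU formulation that may differ in convention from the TensorFlow cell.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRULayer:
    """A single GRU layer with randomly initialised (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        # Index 0: update gate, 1: reset gate, 2: candidate state.
        self.W = rng.normal(0.0, 0.1, (3, input_dim, hidden_dim))
        self.U = rng.normal(0.0, 0.1, (3, hidden_dim, hidden_dim))
        self.b = np.zeros((3, hidden_dim))
        self.hidden_dim = hidden_dim

    def forward(self, x):
        """Map x of shape (frames, input_dim) to (frames, hidden_dim)."""
        h = np.zeros(self.hidden_dim)
        outputs = []
        for x_t in x:
            z = sigmoid(x_t @ self.W[0] + h @ self.U[0] + self.b[0])
            r = sigmoid(x_t @ self.W[1] + h @ self.U[1] + self.b[1])
            h_cand = np.tanh(x_t @ self.W[2] + (r * h) @ self.U[2] + self.b[2])
            h = (1.0 - z) * h + z * h_cand
            outputs.append(h)
        return np.stack(outputs)

def build_model(feature_dim=1024, hidden_dim=300, seed=0):
    """Three stacked GRU layers plus a hidden_dim x feature_dim dense layer."""
    rng = np.random.default_rng(seed)
    layers = [GRULayer(feature_dim, hidden_dim, rng),
              GRULayer(hidden_dim, hidden_dim, rng),
              GRULayer(hidden_dim, hidden_dim, rng)]
    dense_w = rng.normal(0.0, 0.1, (hidden_dim, feature_dim))
    dense_b = np.zeros(feature_dim)
    return layers, dense_w, dense_b

def denoise(spectrogram, model):
    """Run a (frames, feature_dim) spectrogram through the stacked network."""
    layers, dense_w, dense_b = model
    h = spectrogram
    for layer in layers:
        h = layer.forward(h)
    # The dense layer projects each 300-dimensional state back to 1024 values.
    return h @ dense_w + dense_b

model = build_model()
noisy_spec = np.random.default_rng(1).random((5, 1024))  # 5 dummy frames
out = denoise(noisy_spec, model)
print(out.shape)  # (5, 1024)
```

The output has the same two-dimensional layout as the input, one row per audio frame, which is what lets it be compared with the clean spectrogram during training and handed to the vocoder at inference time.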
Example 2
The embodiment of the invention provides a server, which comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the GRU neural network-based noise removal method. The steps of the noise removing method based on the GRU neural network in this embodiment are the same as those in embodiment 1, and will not be described again.
Example 3
Embodiments of the present invention provide a computer readable storage medium storing a computer program that, when executed by a processor, performs a method for removing noise based on a GRU neural network. The steps of the noise removing method based on the GRU neural network in this embodiment are the same as those in embodiment 1, and will not be described again.
If the GRU neural network-based noise removal method provided by the invention is implemented in the form of a software functional module and sold or used as a stand-alone product, it can be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a server (which may be a personal computer, a cloud server, a network device, or a device comprising a processor, etc.) to perform all or part of the methods described in the embodiments of the present invention. The computer-readable storage medium includes, but is not limited to, various media in which program code can be stored, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. Embodiments of the invention are not limited to any specific combination of hardware and software.
The foregoing is merely illustrative of the preferred embodiments of the present invention, and it should be understood that the present invention is not limited to the above embodiments, and modifications in combination with known or existing technology and knowledge by those skilled in the art, or equivalent substitution of some or all of the technical features thereof, should be considered as being within the scope of the present invention.
Claims (9)
1. A method for removing noise based on a GRU neural network, the method comprising:
step one, building a GRU neural network model, wherein the GRU network model comprises three GRU network layers and a fully connected network layer;
step two, training the GRU neural network model;
step three, converting the noisy audio file into spectrogram data through the Librosa library of Python;
step four, passing the spectrogram data to the trained GRU neural network and outputting the denoised audio spectrogram data,
the spectrogram data being a two-dimensional matrix structure, wherein the first dimension of the matrix structure is the number of audio frames and the second dimension is the feature dimension of the spectrogram;
step five, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network;
step six, storing the audio data in the form of an audio file;
the step of training the GRU neural network model comprises the following steps:
inputting the spectrogram feature data of the noisy audio into the GRU neural network model to obtain a predicted clean audio spectrogram;
calculating a loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noisy audio;
and performing iterative training to obtain a trained GRU neural network model.
2. The method of removing noise based on a GRU neural network according to claim 1, wherein, before the training of the GRU neural network model in step two, the method further comprises:
collecting a first number of clean, noiseless audio recordings;
and adding a second number of noise audio recordings to the noiseless audio to produce noisy audio, wherein the noise audio comprises at least one of office environment noise, canteen environment noise, school environment noise and outdoor environment noise.
3. The method for removing noise based on GRU neural network according to claim 2, wherein,
the addition of noise audio to noise-free audio is by point-to-point summation of noise-free audio and noise audio.
4. The method for removing noise based on GRU neural network according to claim 2, wherein,
the AiShell audio is used as clean noiseless audio.
5. The method for removing noise based on GRU neural network of claim 1,
the iterative training on the loss value is performed by the AdamOptimizer in TensorFlow.
6. The method for removing noise based on GRU neural network of claim 1,
the loss value is calculated by means of the euclidean distance.
7. The method for removing noise based on GRU neural network of claim 1,
and taking the spectrogram characteristic data of the noise-added audio as input data of the GRU neural network, and taking the spectrogram data of the noise-free audio as tag data of the GRU neural network.
8. A server, wherein the server comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the GRU neural network based denoising method steps of any one of claims 1 to 7.
9. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the noise-removing method steps based on a GRU neural network as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010242904.0A CN111477239B (en) | 2020-03-31 | 2020-03-31 | Noise removing method and system based on GRU neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010242904.0A CN111477239B (en) | 2020-03-31 | 2020-03-31 | Noise removing method and system based on GRU neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111477239A (en) | 2020-07-31 |
CN111477239B (en) | 2023-05-09 |
Family
ID=71750358
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010242904.0A Active CN111477239B (en) | 2020-03-31 | 2020-03-31 | Noise removing method and system based on GRU neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111477239B (en) |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109273021B (en) * | 2018-08-09 | 2021-11-30 | 厦门亿联网络技术股份有限公司 | RNN-based real-time conference noise reduction method and device |
CN109671433B (en) * | 2019-01-10 | 2023-06-16 | 腾讯科技(深圳)有限公司 | Keyword detection method and related device |
CN109785850A (en) * | 2019-01-18 | 2019-05-21 | 腾讯音乐娱乐科技(深圳)有限公司 | A kind of noise detecting method, device and storage medium |
CN110120225A (en) * | 2019-04-01 | 2019-08-13 | 西安电子科技大学 | A kind of audio defeat system and method for the structure based on GRU network |
CN110534118B (en) * | 2019-07-29 | 2021-10-08 | 安徽继远软件有限公司 | Transformer/reactor fault diagnosis method based on voiceprint recognition and neural network |
CN110491404B (en) * | 2019-08-15 | 2020-12-22 | 广州华多网络科技有限公司 | Voice processing method, device, terminal equipment and storage medium |
CN110647980A (en) * | 2019-09-18 | 2020-01-03 | 成都理工大学 | Time sequence prediction method based on GRU neural network |
CN110751945A (en) * | 2019-10-17 | 2020-02-04 | 成都三零凯天通信实业有限公司 | End-to-end voice recognition method |
Also Published As
Publication number | Publication date |
---|---|
CN111477239A (en) | 2020-07-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109841226B (en) | Single-channel real-time noise reduction method based on convolution recurrent neural network | |
EP3926623A1 (en) | Speech recognition method and apparatus, and neural network training method and apparatus | |
CN110600017B (en) | Training method of voice processing model, voice recognition method, system and device | |
CN112735460B (en) | Beam forming method and system based on time-frequency masking value estimation | |
CN112767959B (en) | Voice enhancement method, device, equipment and medium | |
CN113345460B (en) | Audio signal processing method, device, equipment and storage medium | |
CN112309411A (en) | Phase-sensitive gated multi-scale void convolutional network speech enhancement method and system | |
CN113808602A (en) | Speech enhancement method, model training method and related equipment | |
KR102026226B1 (en) | Method for extracting signal unit features using variational inference model based deep learning and system thereof | |
CN117174105A (en) | Speech noise reduction and dereverberation method based on improved deep convolutional network | |
JP7329393B2 (en) | Audio signal processing device, audio signal processing method, audio signal processing program, learning device, learning method and learning program | |
JP2004310091A (en) | Method and apparatus for formant tracking using residual model | |
CN112331232B (en) | Voice emotion recognition method combining CGAN spectrogram denoising and bilateral filtering spectrogram enhancement | |
Ram et al. | Speech enhancement through improvised conditional generative adversarial networks | |
Lodagala et al. | Ccc-wav2vec 2.0: Clustering aided cross contrastive self-supervised learning of speech representations | |
CN114360571A (en) | Reference-based speech enhancement method | |
Chantas et al. | Sparse audio inpainting with variational bayesian inference | |
Zhou et al. | Speech denoising using Bayesian NMF with online base update | |
CN111653270B (en) | Voice processing method and device, computer readable storage medium and electronic equipment | |
CN111477239B (en) | Noise removing method and system based on GRU neural network | |
Raj et al. | Multilayered convolutional neural network-based auto-CODEC for audio signal denoising using mel-frequency cepstral coefficients | |
Yang et al. | Approaching optimal embedding in audio steganography with GAN | |
JP2020095732A (en) | Dialogue action estimation method, dialogue action estimation device and program | |
WO2022213825A1 (en) | Neural network-based end-to-end speech enhancement method and apparatus | |
CN113707172B (en) | Single-channel voice separation method, system and computer equipment of sparse orthogonal network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |