CN111477239B - Noise removing method and system based on GRU neural network - Google Patents

Noise removing method and system based on GRU neural network

Info

Publication number
CN111477239B
Authority
CN
China
Prior art keywords
noise
audio
neural network
gru neural
spectrogram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010242904.0A
Other languages
Chinese (zh)
Other versions
CN111477239A (en)
Inventor
曾志先
肖龙源
李稀敏
叶志坚
刘晓葳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd
Priority to CN202010242904.0A
Publication of CN111477239A
Application granted
Publication of CN111477239B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The invention discloses a noise removing method based on a GRU neural network, a server and a storage medium, wherein the method comprises the following steps: building a GRU neural network model, and training the GRU neural network model; converting the audio file with noise into spectrogram data through the Librosa library of Python; transmitting the spectrogram data to the trained GRU neural network, and outputting the audio spectrogram data with the added noise removed; converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network; and saving the audio data as an audio file. By using a GRU deep learning neural network as an end-to-end model for removing noise from audio, the method can remove noise in different scenes, has higher adaptability and stability, improves the accuracy of speech recognition, and reduces the probability of false recognition.

Description

Noise removing method and system based on GRU neural network
Technical Field
The invention relates to the technical field of noise removal, in particular to a GRU neural network-based noise removal method and system.
Background
In current speech preprocessing systems, removing noise from the audio is a very important part of preprocessing: noise degrades the audio in speech recognition and other applications, so noise reduction algorithms are a key component of audio preprocessing.
At present, the common traditional noise reduction methods are spectral subtraction and statistical methods. Spectral subtraction removes noise by subtracting a noise spectrum from the power spectrum of the signal; statistical methods extract clean speech through probability estimation of the speech spectrum. Both approaches rely on empirical observations about speech to perform noise reduction, so they do not work well in practical situations, and they also introduce musical noise.
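As a point of comparison, the spectral subtraction described above can be sketched in a few lines of NumPy. This is only an illustration of the general idea, not a method taken from this patent: the function name `spectral_subtract`, the `floor` parameter, and the toy values are assumptions made for the example.

```python
import numpy as np

def spectral_subtract(noisy_power, noise_power, floor=0.0):
    """Subtract an estimated noise power spectrum from every frame.

    noisy_power: array of shape (frames, bins)
    noise_power: array of shape (bins,), broadcast over all frames
    """
    cleaned = noisy_power - noise_power
    # Negative power is not meaningful, so clamp the result to a floor.
    # Isolated bins that survive this clamping are a commonly cited
    # source of the "musical noise" artifact.
    return np.maximum(cleaned, floor)

# Toy power spectrogram (2 frames, 3 bins) and a flat noise estimate.
noisy = np.array([[4.0, 1.0, 9.0],
                  [2.0, 0.5, 3.0]])
noise = np.array([1.0, 1.0, 1.0])
cleaned = spectral_subtract(noisy, noise)
print(cleaned)
```

The quality of the result depends entirely on how well the noise spectrum is estimated in advance, which is the reliance on prior observation that the background section criticizes.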
Disclosure of Invention
The invention aims to overcome the problems in the prior art and provides a GRU neural network-based noise removing method and system. The invention removes noise from audio using a recurrent neural network (RNN) method: the original audio is converted into denoised audio in a deep learning end-to-end manner, so that the input is the original noisy audio and the output is the audio with the noise removed.
The invention provides a noise removing method based on a GRU neural network, which comprises the following steps:
step one, building a GRU neural network model, wherein the GRU network model comprises three GRU network layers and a fully connected network layer;
step two, training the GRU neural network model;
step three, converting the audio file with noise into spectrogram data through the Librosa library of Python;
step four, transmitting the spectrogram data to the trained GRU neural network, and outputting the audio spectrogram data with the added noise removed,
wherein the spectrogram data has a two-dimensional matrix structure, the first dimension of which is the number of audio frames and the second dimension of which is the feature dimension of the spectrogram;
step five, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network;
and step six, storing the audio data in the form of an audio file.
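The two-dimensional layout of the spectrogram data (frames along the first axis, spectral features along the second) can be illustrated with a short sketch. The patent names Librosa for this step; the version below deliberately substitutes plain NumPy so that it has no extra dependencies, and the function name `magnitude_spectrogram`, the FFT size, the hop length and the Hann window are all assumptions, since the text does not specify them.

```python
import numpy as np

def magnitude_spectrogram(samples, n_fft=1024, hop=256):
    """Return a (frames, n_fft // 2 + 1) magnitude spectrogram."""
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < n_fft:
        raise ValueError("signal is shorter than one FFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    # Slice overlapping frames and apply the window to each one.
    frames = np.stack([
        samples[i * hop:i * hop + n_fft] * window
        for i in range(n_frames)
    ])
    # One-sided FFT per frame: rows are frames, columns are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of a 1 kHz tone at 16 kHz stands in for a noisy audio file.
sr = 16000
t = np.arange(sr) / sr
spec = magnitude_spectrogram(np.sin(2 * np.pi * 1000 * t))
print(spec.shape)  # (59, 513)
```

With these assumed settings the feature dimension is 513; a different FFT size or feature type would change the second axis, while the first axis always tracks the length of the audio.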
Further, in the embodiment of the present invention, before training the GRU neural network model in step two, the method further comprises: collecting a first amount of clean, noiseless audio; and adding a second amount of noise audio to the noiseless audio respectively to produce noisy audio, wherein the noise audio comprises at least one of office environment noise, canteen environment noise, school environment noise and outdoor environment noise.
Further, the adding of the noise audio to the noiseless audio is performed by summing the noiseless audio and the noise audio point-to-point.
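Point-to-point summation amounts to adding the two waveforms sample by sample. The sketch below shows one way this might look; the function name `mix_point_to_point` is hypothetical, and the handling of a shorter noise clip (looping it) and of overflow (clipping to [-1, 1]) are assumptions, because the text does not say how length mismatches or amplitude limits are treated.

```python
import numpy as np

def mix_point_to_point(clean, noise):
    """Sum clean speech and noise sample by sample."""
    clean = np.asarray(clean, dtype=np.float32)
    noise = np.asarray(noise, dtype=np.float32)
    if len(noise) == 0:
        raise ValueError("noise clip is empty")
    # Assumption: loop the noise if it is shorter than the speech.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noisy = clean + noise[:len(clean)]
    # Assumption: clip so the sum stays a valid floating-point waveform.
    return np.clip(noisy, -1.0, 1.0)

clean = [0.1, 0.2, 0.3, 0.4, 0.5]
noise = [0.05, -0.05]
noisy = mix_point_to_point(clean, noise)
print(noisy)
```

Each resulting noisy waveform would be paired with the clean waveform it was built from, giving the input and label for one training example.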
Further, AiShell audio is used as the clean, noiseless audio.
Further, in an embodiment of the present invention, the step of training the GRU neural network model includes:
inputting the spectrogram feature data of the noisy audio into the GRU neural network model to obtain a predicted clean audio spectrogram;
calculating a loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noisy audio;
and performing iterative training to obtain a trained GRU neural network model.
In the embodiment of the invention, the iterative training on the loss value is performed by AdamOptimizer in TensorFlow.
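To make the optimizer step concrete, here is a minimal sketch of a single Adam update written in plain NumPy rather than TensorFlow. The hyperparameter defaults are the values suggested in the original Adam paper; the patent does not state which settings were used, and the function name `adam_step` and the toy quadratic objective are assumptions for illustration only.

```python
import numpy as np

def adam_step(param, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new (param, m, v)."""
    # Exponential moving averages of the gradient and squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for the zero initialisation of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective: minimise ||w - target||^2 (stands in for the network loss).
target = np.array([1.0, -2.0])
w = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 2001):
    grad = 2 * (w - target)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
print(w)
```

In the actual training described here, the gradient would come from backpropagating the loss through the GRU network, and the framework's optimizer would apply this kind of update to every weight.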
In the embodiment of the invention, the loss value is calculated by means of Euclidean distance.
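A Euclidean-distance loss between two spectrograms can be sketched as follows. The function name `euclidean_loss` and the toy matrices are assumptions; the text does not say whether the distance is squared, averaged over frames, or averaged over a batch, so the plain L2 distance over all entries is used here.

```python
import numpy as np

def euclidean_loss(predicted, target):
    """Euclidean (L2) distance between two spectrograms of equal shape."""
    diff = np.asarray(predicted, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    return float(np.sqrt(np.sum(diff ** 2)))

# Toy 2 x 2 "spectrograms": the entries differ by 3 and 4.
predicted = np.array([[1.0, 5.0],
                      [7.0, 4.0]])
target = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
loss = euclidean_loss(predicted, target)
print(loss)  # 5.0
```

A loss of zero means the predicted spectrogram matches the clean one exactly, so driving this value down is what pushes the network toward removing the noise.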
In the embodiment of the invention, the spectrogram feature data of the noisy audio is used as input data of the GRU neural network, and the spectrogram data of the noiseless audio is used as label data of the GRU neural network.
An embodiment of the present invention provides a server including at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the above-described GRU neural network-based noise removal method steps.
An embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the noise removal method steps based on a GRU neural network as described above.
Compared with the existing spectral subtraction and statistical method for noise reduction, the GRU neural network-based noise removal method and system have the following beneficial effects:
(1) The invention uses a GRU deep learning neural network as an end-to-end model to remove noise from audio; the method can remove noise in different scenes, and has higher adaptability and stability.
(2) Because the GRU network has a simple structure and occupies few resources, the whole noise removal process is efficient and performs well, and can be used in real environments. For example, in speech recognition, denoising the speech in real time reduces the influence of noise on recognition, improves the accuracy of speech recognition, and reduces the probability of misrecognition.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention. In the drawings:
fig. 1 is a schematic flow chart of a noise removing method based on a GRU neural network in embodiment 1 of the present invention.
Detailed Description
In order to make the technical problems, technical schemes and beneficial effects to be solved more clear and obvious, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the present invention includes one or more, two or more.
Example 1
The invention provides a noise removing method based on a GRU neural network, which comprises the following steps:
step S1, collecting a first amount of clean, noiseless audio, wherein in the embodiment of the invention, AiShell audio is preferably used as the clean, noiseless audio, which is also called the original audio data and serves as label data for training the GRU neural network model;
step S2, adding a second amount of noise audio to the noiseless audio respectively to produce noisy audio; the noisy audio is used as input data of the training model, and the original clean, noiseless audio corresponding to each noisy audio is used as its label data;
the noise audio comprises at least one of office environment noise, canteen environment noise, school environment noise and outdoor environment noise;
the noise audio is added to the noiseless audio by summing the noiseless audio and the noise audio point to point;
in the embodiment of the present invention, preferably, the second amount is 1000 pieces;
step S3, converting the noiseless audio and the produced noisy audio into spectrograms through an FFT algorithm by using the Librosa library in Python; in the embodiment of the present invention, 1024 is preferably used, that is, each frame of audio data is represented by 1024 values; these spectrogram data are used as the data for training the model, with the spectrogram feature data of the noisy audio as input data of the GRU neural network and the spectrogram data of the noiseless audio as label data of the GRU neural network;
step S4, building a GRU neural network model;
the GRU network model comprises three GRU network layers and a fully connected network layer, wherein each GRU network layer has 300 neurons, and in the embodiment of the invention, the fully connected network layer has a structure of 300 × 1024;
step S5, inputting the spectrogram feature data of the noisy audio into the GRU neural network model to obtain a predicted clean audio spectrogram;
step S6, calculating a loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noisy audio; in the embodiment of the invention, the loss value is calculated as a Euclidean distance: the difference between the clean audio spectrogram predicted by the GRU neural network and the original clean audio spectrogram is computed, and the result is the loss value of the current network prediction;
step S7, performing iterative training to obtain a trained GRU neural network model; in the embodiment of the invention, iterative training on the loss value is carried out by AdamOptimizer in TensorFlow, and further, preferably, 20,000 batches are trained, with 64 audio spectrograms fed into each batch;
step S8, converting the audio file with noise into spectrogram data through the Librosa library of Python;
step S9, transmitting the spectrogram data to the trained GRU neural network, and outputting the audio spectrogram data with the added noise removed,
wherein the spectrogram data has a two-dimensional matrix structure, the first dimension of which is the number of audio frames and the second dimension of which is the feature dimension of the spectrogram;
and step S10, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network, and storing the audio data in the form of an audio file.
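The network shape given in step S4 (three GRU layers of 300 units followed by a 300 × 1024 fully connected layer) can be sketched as a forward pass in plain NumPy. This is not the TensorFlow implementation the embodiment refers to: the weights below are random and untrained, and the class and function names, the initialisation scale, the bias terms and the exact gate convention are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRULayer:
    """A single GRU layer with randomly initialised (untrained) weights."""

    def __init__(self, input_dim, units, rng):
        # Index 0: update gate z, 1: reset gate r, 2: candidate state.
        self.W = rng.normal(0.0, 0.1, (3, input_dim, units))
        self.U = rng.normal(0.0, 0.1, (3, units, units))
        self.b = np.zeros((3, units))
        self.units = units

    def forward(self, x):
        """Map a (frames, input_dim) sequence to (frames, units)."""
        h = np.zeros(self.units)
        outputs = []
        for x_t in x:
            z = sigmoid(x_t @ self.W[0] + h @ self.U[0] + self.b[0])
            r = sigmoid(x_t @ self.W[1] + h @ self.U[1] + self.b[1])
            h_cand = np.tanh(x_t @ self.W[2] + (r * h) @ self.U[2] + self.b[2])
            # One common convention; some libraries swap the roles of z and 1 - z.
            h = (1.0 - z) * h + z * h_cand
            outputs.append(h)
        return np.stack(outputs)

def build_denoiser(feature_dim=1024, units=300, seed=0):
    """Three stacked GRU layers plus a units x feature_dim output layer."""
    rng = np.random.default_rng(seed)
    layers = [GRULayer(feature_dim, units, rng),
              GRULayer(units, units, rng),
              GRULayer(units, units, rng)]
    W_out = rng.normal(0.0, 0.1, (units, feature_dim))
    b_out = np.zeros(feature_dim)
    return layers, W_out, b_out

def denoise(spectrogram, model):
    """Run a (frames, feature_dim) spectrogram through the stack."""
    layers, W_out, b_out = model
    h = spectrogram
    for layer in layers:
        h = layer.forward(h)
    return h @ W_out + b_out

model = build_denoiser()
noisy_spec = np.random.default_rng(1).random((20, 1024))
denoised_spec = denoise(noisy_spec, model)
print(denoised_spec.shape)  # (20, 1024)
```

The output has the same frames-by-features shape as the input, which is what allows the predicted spectrogram to be compared entry by entry with the clean label during training and passed to a vocoder afterwards.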
Example 2
The embodiment of the invention provides a server, which comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a GRU neural network based noise removal method step. The steps of the noise removing method based on the GRU neural network in this embodiment are the same as those in embodiment 1, and will not be described again.
Example 3
Embodiments of the present invention provide a computer readable storage medium storing a computer program that, when executed by a processor, performs a method for removing noise based on a GRU neural network. The steps of the noise removing method based on the GRU neural network in this embodiment are the same as those in embodiment 1, and will not be described again.
If the GRU neural network-based noise removal method provided by the invention is implemented in the form of a software functional module and sold or used as a stand-alone product, it can be stored in a computer readable storage medium. Based on such understanding, the technical solution of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a server (which may be a personal computer, a cloud server, a network device, or a device comprising a processor, etc.) to perform all or part of the methods described in the embodiments of the present invention. The computer readable storage medium includes, but is not limited to, various media in which program code can be stored, such as a read-only memory (ROM), a random access memory (RAM), a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. Embodiments of the invention are not limited to any specific combination of hardware and software.
The foregoing is merely illustrative of the preferred embodiments of the present invention, and it should be understood that the present invention is not limited to the above embodiments, and modifications in combination with known or existing technology and knowledge by those skilled in the art, or equivalent substitution of some or all of the technical features thereof, should be considered as being within the scope of the present invention.

Claims (9)

1. A method for removing noise based on a GRU neural network, the method comprising:
step one, building a GRU neural network model, wherein the GRU network model comprises three GRU network layers and a fully connected network layer;
step two, training the GRU neural network model;
step three, converting the audio file with noise into spectrogram data through the Librosa library of Python;
step four, transmitting the spectrogram data to the trained GRU neural network, and outputting the audio spectrogram data with the added noise removed,
wherein the spectrogram data has a two-dimensional matrix structure, the first dimension of which is the number of audio frames and the second dimension of which is the feature dimension of the spectrogram;
step five, converting the denoised audio spectrogram data into audio data through an open-source vocoder decoding network;
step six, storing the audio data in the form of an audio file;
the step of training the GRU neural network model comprises the following steps:
inputting the spectrogram feature data of the noisy audio into the GRU neural network model to obtain a predicted clean audio spectrogram;
calculating a loss value between the predicted clean audio spectrogram and the original clean audio spectrogram corresponding to the noisy audio;
and performing iterative training to obtain a trained GRU neural network model.
2. The method of removing noise based on a GRU neural network according to claim 1, wherein, before the training of the GRU neural network model in step two, the method further comprises:
collecting a first amount of clean, noiseless audio;
and adding a second amount of noise audio to the noiseless audio respectively to produce noisy audio, wherein the noise audio comprises at least one of office environment noise, canteen environment noise, school environment noise and outdoor environment noise.
3. The method for removing noise based on GRU neural network according to claim 2, wherein,
the noise audio is added to the noiseless audio by point-to-point summation of the noiseless audio and the noise audio.
4. The method for removing noise based on GRU neural network according to claim 2, wherein,
the AiShell audio is used as the clean, noiseless audio.
5. The method for removing noise based on GRU neural network of claim 1, wherein,
iterative training on the loss value is performed by AdamOptimizer in TensorFlow.
6. The method for removing noise based on GRU neural network of claim 1, wherein,
the loss value is calculated by means of the Euclidean distance.
7. The method for removing noise based on GRU neural network of claim 1, wherein,
the spectrogram feature data of the noisy audio is taken as input data of the GRU neural network, and the spectrogram data of the noiseless audio is taken as label data of the GRU neural network.
8. A server, wherein the server comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the GRU neural network based denoising method steps of any one of claims 1 to 7.
9. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the noise-removing method steps based on a GRU neural network as claimed in any one of claims 1 to 7.
CN202010242904.0A 2020-03-31 2020-03-31 Noise removing method and system based on GRU neural network Active CN111477239B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010242904.0A, CN111477239B (en) | 2020-03-31 | 2020-03-31 | Noise removing method and system based on GRU neural network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202010242904.0A, CN111477239B (en) | 2020-03-31 | 2020-03-31 | Noise removing method and system based on GRU neural network

Publications (2)

Publication Number | Publication Date
CN111477239A (en) | 2020-07-31
CN111477239B (en) | 2023-05-09

Family

ID=71750358

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010242904.0A, Active, CN111477239B (en) | 2020-03-31 | 2020-03-31 | Noise removing method and system based on GRU neural network

Country Status (1)

Country | Link
CN (1) | CN111477239B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109273021B (en) * 2018-08-09 2021-11-30 厦门亿联网络技术股份有限公司 RNN-based real-time conference noise reduction method and device
CN109671433B (en) * 2019-01-10 2023-06-16 腾讯科技(深圳)有限公司 Keyword detection method and related device
CN109785850A (en) * 2019-01-18 2019-05-21 腾讯音乐娱乐科技(深圳)有限公司 A kind of noise detecting method, device and storage medium
CN110120225A (en) * 2019-04-01 2019-08-13 西安电子科技大学 A kind of audio defeat system and method for the structure based on GRU network
CN110534118B (en) * 2019-07-29 2021-10-08 安徽继远软件有限公司 Transformer/reactor fault diagnosis method based on voiceprint recognition and neural network
CN110491404B (en) * 2019-08-15 2020-12-22 广州华多网络科技有限公司 Voice processing method, device, terminal equipment and storage medium
CN110647980A (en) * 2019-09-18 2020-01-03 成都理工大学 Time sequence prediction method based on GRU neural network
CN110751945A (en) * 2019-10-17 2020-02-04 成都三零凯天通信实业有限公司 End-to-end voice recognition method

Also Published As

Publication number Publication date
CN111477239A (en) 2020-07-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant