CN113421581B - Real-time voice noise reduction method for jump network - Google Patents

Real-time voice noise reduction method for jump network

Info

Publication number
CN113421581B
CN113421581B CN202110971215.8A
Authority
CN
China
Prior art keywords
signal
noise
audio
voice
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110971215.8A
Other languages
Chinese (zh)
Other versions
CN113421581A (en)
Inventor
黄祥康
吴庆耀
白剑
黄海亮
梁瑛玮
张海林
鲁和平
李长杰
陈焕然
李乐
王浩
洪行健
冷冬
丁一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yifang Information Technology Co ltd
Original Assignee
Guangzhou Easefun Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Easefun Information Technology Co ltd filed Critical Guangzhou Easefun Information Technology Co ltd
Priority to CN202110971215.8A priority Critical patent/CN113421581B/en
Publication of CN113421581A publication Critical patent/CN113421581A/en
Application granted granted Critical
Publication of CN113421581B publication Critical patent/CN113421581B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/14Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0264Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

Abstract

The invention discloses a real-time voice noise reduction method for a skip-connection network, based on a multi-layer short-time Fourier transform loss function, comprising the following steps: constructing an audio training set for network training using band-masking and signal-reverberation data augmentation; constructing a lightweight U-Net structure with skip connections; and training the model with a multi-layer short-time Fourier transform loss function and denoising with the trained model. The invention adopts a skip-connection U-Net structure to keep the model lightweight, and a loss function based on multi-layer short-time Fourier transforms, combined with data augmentation such as noise shifting and signal reverberation, greatly improves the model's generalization across different noise types.

Description

Real-time voice noise reduction method for jump network
Technical Field
The present invention relates to voice noise reduction methods, and more particularly to a voice noise reduction method based on a skip-connection network.
Background
Speech enhancement has long been an active research field with great practical value, for example in video conferencing and voice calls, where enhancement and noise reduction can greatly improve call quality. Traditional speech noise reduction relies mainly on spectral subtraction and statistical-model-based methods, which cannot achieve good results on non-stationary noise. Classical approaches such as Wiener filtering struggle with non-stationary noise or overlapping speakers; later deep-neural-network denoisers improved on them, but their processing speed is too slow to be effective in practical applications.
In recent years, with the development of deep learning, deep networks have also been applied to audio noise reduction with good results. However, common deep neural networks have large parameter counts and complex structures, so processing audio takes a long time.
Disclosure of Invention
In view of the above technical problems, an object of the present invention is to provide a voice denoising method based on a skip-connection network, which adopts a lighter network structure, feeds noisy audio to the input layer as the network's input, and uses the corresponding clean, noise-free audio as the output target for supervised training.
The technical scheme of the invention is as follows:
A real-time voice noise reduction method for a skip-connection network, based on a multi-layer short-time Fourier transform loss function, characterized by comprising the following steps:
S1: constructing an audio training set for network training using band-masking and signal-reverberation data augmentation, wherein band masking passes the audio through a band-stop filter to remove part of its frequency content, and signal reverberation adds continuously attenuated and delayed copies of the audio back into the original;
S2: constructing a lightweight U-Net structure with skip connections, obtaining features with different channel counts through convolution and transposed convolution, and connecting and adding features with the same channel count, so that the network learns the relationship between high-level and low-level features simultaneously;
S3: using the multi-layer short-time Fourier transform loss L_stft and the mean absolute error L_1 as the loss function of the model, training the model with the Adam optimization algorithm, and performing noise reduction with the trained model.
In a preferred embodiment of the present invention, constructing the audio training set in step S1 comprises the following steps:
S101: acquiring clean speech signals and noise signals as training data for the model from the Valentini dataset and the DNS2020 benchmark dataset;
S102: superimposing multiple noise signals to obtain a mixed noise signal;
S103: randomly cropping the mixed noise signal and combining it with a speech signal to obtain a speech signal with mixed noise;
S104: delaying and attenuating the speech signal and the original noise signal and adding them to the speech signal with mixed noise to obtain a reverberant noisy speech signal.
In a preferred embodiment of the present invention, step S2 specifically comprises:
S201: constructing an encoding module: the audio signal passes through a one-dimensional convolution module, values below zero are set to zero by a relu activation function, convolution continues with a kernel that doubles the channel count, and the encoded signal is finally obtained through a gated linear unit;
S202: processing the encoded signal with an LSTM signal processing module, which is built from a unidirectional or bidirectional LSTM network;
S203: constructing a decoding module: after the encoded signal has been processed by the LSTM signal processing module, the channel count is reduced by a one-dimensional convolution module, the signal is processed by a gated linear unit, and the speech-enhanced audio is finally obtained through a one-dimensional transposed convolution module;
S204: connecting each encoding module to the decoding module whose input channel count equals the encoding module's output channel count, constructing the lightweight skip-connection U-Net structure.
In a preferred embodiment of the present invention, step S3 specifically comprises:
S301: constructing the L_1 loss between the input noise signal and the clean audio signal;
S302: constructing the L_stft loss between the input noise signal and the clean audio signal by applying short-time Fourier transforms with different parameters;
s303: inputting the model parameters of the coding module, the decoding module and the LSTM into an Adam optimizer for optimization learning, and training a final model;
s304: and directly inputting the voice signal with noise into the trained final model to obtain the voice enhanced voice signal.
In a preferred embodiment of the present invention, the formula of the gated linear unit is as follows:
GLU(X) = (XW + b) ⊗ σ(XV + c)
where X is the output of the convolution module, W, b, V and c are learnable parameters, ⊗ is the element-wise product, and σ(·) is the sigmoid function.
In a preferred embodiment of the present invention, the formula of the L_1 loss function is as follows:
L_1(s, ŝ) = (1/T) Σ_{t=1}^{T} |s_t − ŝ_t|
where s is the clean speech signal, ŝ is the enhanced speech signal, and T is the audio length; the value of T varies from sample to sample.
In a preferred embodiment of the present invention, the formula of the L_stft loss function is as follows:
L_stft(s, ŝ) = Σ_{i=1}^{M} [ L_sc^{(i)}(s, ŝ) + L_mag^{(i)}(s, ŝ) ]
L_sc^{(i)}(s, ŝ) = ‖ |STFT_i(s)| − |STFT_i(ŝ)| ‖_F / ‖ |STFT_i(s)| ‖_F
L_mag^{(i)}(s, ŝ) = (1/T) ‖ log|STFT_i(s)| − log|STFT_i(ŝ)| ‖_1
where STFT_i(·) is the short-time Fourier transform at the i-th of the M resolutions, s is the clean speech signal, ŝ is the enhanced speech signal, and T is the audio length, which varies from sample to sample. The numbers of Fourier transform points are chosen as 512, 1024 and 2048, with corresponding frame shifts of 50, 120 and 240 and window lengths of 240, 600 and 1200.
Compared with the prior art, the invention has the following beneficial effects:
the invention can obtain better voice noise reduction effect, and has the advantages of small distortion, strong generalization capability and good noise reduction effect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic diagram of the voice denoising method of the skip-connection network according to an embodiment of the present invention;
Fig. 2 is a flowchart of the voice noise reduction method of the skip-connection network according to embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the skip-connection denoising network;
Fig. 4 is a schematic diagram of an LSTM network.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.
As shown in figs. 1-2, the speech noise reduction method according to the embodiment of the present invention is based on a multi-layer short-time Fourier transform loss function and comprises:
S1: constructing an audio training set for network training using band-masking and signal-reverberation data augmentation, where band masking passes the audio through a band-stop filter to remove part of its frequency content, and signal reverberation adds continuously attenuated and delayed copies of the audio back into the original;
S2: constructing a lightweight U-Net structure with skip connections, as shown in fig. 3: features with different channel counts are obtained through convolution and transposed convolution, and features with the same channel count are connected and added, so that the model learns the relationship between high-level and low-level features simultaneously and obtains better results;
S3: taking the multi-layer short-time Fourier transform loss and the mean absolute error as the loss function of the model, training the model with the Adam optimization algorithm, and using the trained model for noise reduction.
The audio training set for constructing the network training in step S1 includes the following steps:
s101: acquiring pure voice signals and noise signals as training data of a model through a Valentini data set and a DNS2020 reference data set;
Valentini is a training dataset for speech enhancement and speech synthesis algorithms provided by the Centre for Speech Technology Research at the University of Edinburgh, and DNS2020 is from the Deep Noise Suppression challenge held by Microsoft, which provides a large amount of clean speech and noise signals.
S102: superposing various noise signals to obtain a mixed noise signal;
s103: randomly intercepting and synthesizing the mixed noise signal and the voice signal to obtain a voice signal with mixed noise;
S104: the speech signal and the original noise signal are delayed and attenuated and then added to the speech signal with mixed noise, yielding a reverberant noisy speech signal.
The audio training set in step S1 thus contains audio data and noise data, from which noisy audio of various types and the corresponding clean audio used for supervision are synthesized.
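Steps S102-S104 can be sketched as below; this is a toy numpy version under stated assumptions (white noise stands in for real recordings, and reverberation is approximated by two delayed, attenuated copies — the delays and gains are hypothetical, not values from the patent).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(x, length):
    """S103 helper: take a random segment of the given length."""
    start = rng.integers(0, len(x) - length + 1)
    return x[start:start + length]

def add_reverb(x, sr, delays=(0.03, 0.06), gains=(0.5, 0.25)):
    """S104 (toy): add delayed, attenuated copies of x to itself."""
    out = x.copy()
    for d, g in zip(delays, gains):
        n = int(d * sr)
        out[n:] += g * x[:-n]
    return out

sr = 16000
speech = rng.standard_normal(2 * sr)        # stand-in for a clean utterance
noise_a = rng.standard_normal(5 * sr)
noise_b = rng.standard_normal(5 * sr)
mixed_noise = noise_a + noise_b             # S102: overlay noise sources
noisy = speech + random_crop(mixed_noise, len(speech))   # S103
noisy_reverb = add_reverb(noisy, sr)        # S104
```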
Step S2 specifically includes:
S201: construct an encoding module: the audio signal passes through a one-dimensional convolution module, is processed by a relu activation function, is further convolved with a kernel that doubles the channel count, and finally passes through a gated linear unit to obtain the encoded signal;
The relu activation function is:
relu(x) = max(0, x)
The formula of the gated linear unit is:
GLU(X) = (XW + b) ⊗ σ(XV + c)
where X is the output of the convolution module, W, b, V and c are learnable parameters, ⊗ is the element-wise product, and σ(·) is the sigmoid function.
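A minimal numpy sketch of the gated linear unit formula above, assuming a dense parameterization of W and V for brevity (the patent applies the GLU after convolution modules):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu(X, W, b, V, c):
    """GLU(X) = (XW + b) * sigmoid(XV + c), element-wise product."""
    return (X @ W + b) * sigmoid(X @ V + c)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))      # 4 time steps, 8 channels
W, V = rng.standard_normal((2, 8, 8))
b, c = np.zeros(8), np.zeros(8)
Y = glu(X, W, b, V, c)
```

Because the sigmoid gate lies in (0, 1), the GLU attenuates (never amplifies) each linear output element.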
S202: construct the encoded-signal processing module from a unidirectional or bidirectional LSTM network and process the encoded signal with it. The Long Short-Term Memory (LSTM) network is a special RNN whose structural design allows it to avoid the long-term dependency problem. In fig. 4, σ denotes the sigmoid function, tanh the tanh function, and + element-wise vector addition. On the left, the input x and the previous hidden state h feed a sigmoid that controls how much of the previous cell state is forgotten; in the middle, x and h pass through sigmoid and tanh functions to decide which new input information is retained and used to update the cell state; on the right, x, h and the updated cell state are combined to produce the unit output h.
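The gate arithmetic described for fig. 4 can be sketched as a single LSTM cell step in numpy; the weight shapes and concatenated-input parameterization are standard textbook choices, not taken from the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, cstate, params):
    """One LSTM step: forget gate f, input gate i with candidate g, output gate o."""
    Wf, Wi, Wg, Wo, bf, bi, bg, bo = params
    z = np.concatenate([x, h])
    f = sigmoid(Wf @ z + bf)          # how much of the old cell state to keep
    i = sigmoid(Wi @ z + bi)          # how much new information to write
    g = np.tanh(Wg @ z + bg)          # candidate cell update
    o = sigmoid(Wo @ z + bo)          # how much of the cell state to expose
    cstate = f * cstate + i * g       # element-wise "+" from fig. 4
    h = o * np.tanh(cstate)
    return h, cstate

rng = np.random.default_rng(0)
nx, nh = 3, 5
params = ([rng.standard_normal((nh, nx + nh)) for _ in range(4)]
          + [np.zeros(nh) for _ in range(4)])
h, c = np.zeros(nh), np.zeros(nh)
h, c = lstm_step(rng.standard_normal(nx), h, c, params)
```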
S203: construct a decoding module, described as follows: after the encoded signal has been processed by the LSTM signal processing module, the channel count is first reduced by a one-dimensional convolution module, the signal is then processed by a gated linear unit, and the speech-enhanced audio is finally obtained through a one-dimensional transposed convolution module;
S204: connect each encoding module to the decoding module whose input channel count equals the encoding module's output channel count, constructing the skip-connection network structure.
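As a purely illustrative view of S204's channel matching, the toy sketch below walks features back out of a bottleneck, adding the encoder feature with the same channel count at each stage; the channel-halving `mean` is a stand-in for the transposed convolution, not the invention's actual layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy encoder activations with channel counts 8, 16, 32 (innermost last).
enc_feats = [rng.standard_normal((c, 100)) for c in (8, 16, 32)]

def decode(enc_feats):
    """Walk back out of the bottleneck, adding the encoder feature whose
    channel count matches the decoder input (the skip connection)."""
    x = enc_feats[-1]
    for skip in reversed(enc_feats[:-1]):
        # stand-in for a transposed convolution halving the channel count
        x = x.reshape(2, skip.shape[0], -1).mean(axis=0)
        x = x + skip          # skip connection: same channel count, so add
    return x

out = decode(enc_feats)
```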
Step S3 specifically includes:
S301: constructing the L_1 loss between the enhanced speech signal and the clean speech signal;
The formula of the L_1 loss function is as follows:
L_1(s, ŝ) = (1/T) Σ_{t=1}^{T} |s_t − ŝ_t|
where s is the clean speech signal, ŝ is the enhanced speech signal, and T is the audio length; the value of T varies from sample to sample.
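A direct numpy transcription of this L_1 loss for one clip:

```python
import numpy as np

def l1_loss(s, s_hat):
    """Mean absolute error over the T samples of one audio clip."""
    return np.mean(np.abs(s - s_hat))

s = np.array([0.0, 1.0, -1.0, 0.5])       # clean signal
s_hat = np.array([0.0, 0.5, -1.0, 0.0])   # enhanced signal
loss = l1_loss(s, s_hat)   # (0 + 0.5 + 0 + 0.5) / 4 = 0.25
```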
S302: constructing the L_stft loss between the enhanced speech signal and the clean speech signal by applying short-time Fourier transforms with different parameters;
The formula of the L_stft loss function is as follows:
L_stft(s, ŝ) = Σ_{i=1}^{M} [ L_sc^{(i)}(s, ŝ) + L_mag^{(i)}(s, ŝ) ]
L_sc^{(i)}(s, ŝ) = ‖ |STFT_i(s)| − |STFT_i(ŝ)| ‖_F / ‖ |STFT_i(s)| ‖_F
L_mag^{(i)}(s, ŝ) = (1/T) ‖ log|STFT_i(s)| − log|STFT_i(ŝ)| ‖_1
where STFT_i(·) is the short-time Fourier transform at the i-th of the M resolutions, s is the clean speech signal, ŝ is the enhanced speech signal, and T is the audio length, which varies from sample to sample. The numbers of Fourier transform points are chosen as 512, 1024 and 2048, with corresponding frame shifts of 50, 120 and 240 and window lengths of 240, 600 and 1200.
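A numpy sketch of a multi-resolution STFT loss with the quoted FFT sizes, frame shifts and window lengths. The spectral-convergence plus log-magnitude decomposition is the common form of this loss and is assumed here, since the original formula images are not reproduced in the text:

```python
import numpy as np

def stft_mag(x, n_fft, hop, win_len):
    """Magnitude STFT with a Hann window, zero-padded to n_fft."""
    win = np.hanning(win_len)
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        frames.append(np.fft.rfft(x[start:start + win_len] * win, n=n_fft))
    return np.abs(np.array(frames))

def multi_res_stft_loss(s, s_hat, resolutions=((512, 50, 240),
                                               (1024, 120, 600),
                                               (2048, 240, 1200))):
    """Sum over resolutions of spectral-convergence + log-magnitude terms."""
    total = 0.0
    for n_fft, hop, win_len in resolutions:
        S = stft_mag(s, n_fft, hop, win_len)
        S_hat = stft_mag(s_hat, n_fft, hop, win_len)
        sc = np.linalg.norm(S - S_hat) / (np.linalg.norm(S) + 1e-8)
        mag = np.mean(np.abs(np.log(S + 1e-8) - np.log(S_hat + 1e-8)))
        total += sc + mag
    return total

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)
zero_loss = multi_res_stft_loss(s, s)                      # identical signals
pos_loss = multi_res_stft_loss(s, rng.standard_normal(16000))
```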
S303: inputting the model parameters of the coding module, the decoding module and the LSTM into an Adam optimizer for optimization learning, and training a final model;
S304: directly input the noisy speech signal into the trained network to obtain the speech-enhanced signal.
The target sampling rate of the audio data in the dataset constructed in the present invention is 16 kHz. Audio signals at other sampling rates are first resampled to the target rate and can then be fed directly into the network to obtain the speech-enhanced audio. Through this specific implementation, the invention achieves a good speech noise reduction effect, with low distortion, strong generalization ability and good denoising quality.
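Resampling to the 16 kHz target can be sketched with naive linear interpolation; a production system would presumably use a proper low-pass (polyphase) resampler, and the function below is only illustrative:

```python
import numpy as np

def resample(x, sr_in, sr_out=16000):
    """Naive linear-interpolation resampling to the 16 kHz target rate."""
    n_out = int(round(len(x) * sr_out / sr_in))
    t_in = np.arange(len(x)) / sr_in
    t_out = np.arange(n_out) / sr_out
    return np.interp(t_out, t_in, x)

# 1 second of a 440 Hz tone at 48 kHz, brought down to 16 kHz.
x_48k = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)
x_16k = resample(x_48k, 48000)
```

Since 48 kHz is an integer multiple of 16 kHz, the output here coincides with taking every third input sample.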
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (7)

1. A real-time voice noise reduction method for a skip-connection network, based on a multi-layer short-time Fourier transform loss function, characterized by comprising the following steps:
S1: constructing an audio training set for network training using band-masking and signal-reverberation data augmentation, wherein band masking passes the audio through a band-stop filter to remove part of its frequency content, and signal reverberation adds continuously attenuated and delayed copies of the audio back into the original;
S2: constructing a lightweight U-Net structure with skip connections, obtaining features with different channel counts through convolution and transposed convolution, and connecting and adding features with the same channel count, so that the network learns the relationship between high-level and low-level features simultaneously;
S3: using the multi-layer short-time Fourier transform loss L_stft and the mean absolute error L_1 as the loss function of the model, training the model with the Adam optimization algorithm, and performing noise reduction with the trained model.
2. The real-time speech noise reduction method according to claim 1, wherein constructing the audio training set in step S1 comprises the following steps:
S101: acquiring clean speech signals and noise signals as training data for the model from the Valentini dataset and the DNS2020 benchmark dataset;
S102: superimposing multiple noise signals to obtain a mixed noise signal;
S103: randomly cropping the mixed noise signal and combining it with a speech signal to obtain a speech signal with mixed noise;
S104: delaying and attenuating the speech signal and the original noise signal and adding them to the speech signal with mixed noise to obtain a reverberant noisy speech signal.
3. The real-time speech noise reduction method according to claim 2, wherein step S2 specifically comprises:
S201: constructing an encoding module: the audio signal passes through a one-dimensional convolution module, values below zero are set to zero by a relu activation function, convolution continues with a kernel that doubles the channel count, and the encoded signal is finally obtained through a gated linear unit;
S202: processing the encoded signal with an LSTM signal processing module, which is built from a unidirectional or bidirectional LSTM network;
S203: constructing a decoding module: after the encoded signal has been processed by the LSTM signal processing module, the channel count is reduced by a one-dimensional convolution module, the signal is processed by a gated linear unit, and the speech-enhanced audio is finally obtained through a one-dimensional transposed convolution module;
S204: connecting each encoding module to the decoding module whose input channel count equals the encoding module's output channel count, constructing the lightweight skip-connection U-Net structure.
4. The real-time speech noise reduction method according to claim 3, wherein step S3 specifically comprises:
S301: constructing the L_1 loss between the input noise signal and the clean audio signal;
S302: constructing the L_stft loss between the input noise signal and the clean audio signal by applying short-time Fourier transforms with different parameters;
s303: inputting the model parameters of the coding module, the decoding module and the LSTM into an Adam optimizer for optimization learning, and training a final model;
s304: and directly inputting the voice signal with noise into the trained final model to obtain the voice enhanced voice signal.
5. The real-time speech noise reduction method of claim 3,
the formula of the gated linear unit is as follows:
GLU(X) = (XW + b) ⊗ σ(XV + c)
where X is the output of the convolution module, W, b, V and c are learnable parameters, ⊗ is the element-wise product, and σ(·) is the sigmoid function.
6. The real-time speech noise reduction method of claim 4,
the formula of the L_1 loss function is as follows:
L_1(s, ŝ) = (1/T) Σ_{t=1}^{T} |s_t − ŝ_t|
where s is the clean speech signal, ŝ is the enhanced speech signal, and T is the audio length; the value of T varies from sample to sample.
7. The real-time speech noise reduction method of claim 6,
the formula of the above-mentioned L_stft loss function is as follows:
L_stft(s, ŝ) = Σ_{i=1}^{M} [ L_sc^{(i)}(s, ŝ) + L_mag^{(i)}(s, ŝ) ]
L_sc^{(i)}(s, ŝ) = ‖ |STFT_i(s)| − |STFT_i(ŝ)| ‖_F / ‖ |STFT_i(s)| ‖_F
L_mag^{(i)}(s, ŝ) = (1/T) ‖ log|STFT_i(s)| − log|STFT_i(ŝ)| ‖_1
where STFT_i(·) is the short-time Fourier transform at the i-th of the M resolutions, s is the clean speech signal, ŝ is the enhanced speech signal, and T is the audio length, which varies from sample to sample. The numbers of Fourier transform points are chosen as 512, 1024 and 2048, with corresponding frame shifts of 50, 120 and 240 and window lengths of 240, 600 and 1200.
CN202110971215.8A 2021-08-24 2021-08-24 Real-time voice noise reduction method for jump network Active CN113421581B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110971215.8A CN113421581B (en) 2021-08-24 2021-08-24 Real-time voice noise reduction method for jump network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110971215.8A CN113421581B (en) 2021-08-24 2021-08-24 Real-time voice noise reduction method for jump network

Publications (2)

Publication Number Publication Date
CN113421581A CN113421581A (en) 2021-09-21
CN113421581B true CN113421581B (en) 2021-11-02

Family

ID=77719525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110971215.8A Active CN113421581B (en) 2021-08-24 2021-08-24 Real-time voice noise reduction method for jump network

Country Status (1)

Country Link
CN (1) CN113421581B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109949821A (en) * 2019-03-15 2019-06-28 慧言科技(天津)有限公司 A method of far field speech dereverbcration is carried out using the U-NET structure of CNN
CN111757172A (en) * 2019-03-29 2020-10-09 Tcl集团股份有限公司 HDR video acquisition method, HDR video acquisition device and terminal equipment
CN112151059A (en) * 2020-09-25 2020-12-29 南京工程学院 Microphone array-oriented channel attention weighted speech enhancement method
CN113011093A (en) * 2021-03-15 2021-06-22 哈尔滨工程大学 Ship navigation noise simulation generation method based on LCWaveGAN

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190385282A1 (en) * 2018-06-18 2019-12-19 Drvision Technologies Llc Robust methods for deep image transformation, integration and prediction
US10923141B2 (en) * 2018-08-06 2021-02-16 Spotify Ab Singing voice separation with deep u-net convolutional networks


Also Published As

Publication number Publication date
CN113421581A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN110491404B (en) Voice processing method, device, terminal equipment and storage medium
CN111564160B (en) Voice noise reduction method based on AEWGAN
CN111081268A (en) Phase-correlated shared deep convolutional neural network speech enhancement method
CN111653285B (en) Packet loss compensation method and device
Lin et al. Speech enhancement using multi-stage self-attentive temporal convolutional networks
CN111524530A (en) Voice noise reduction method based on expansion causal convolution
CN116486826A (en) Voice enhancement method based on converged network
Lim et al. Harmonic and percussive source separation using a convolutional auto encoder
CN113782044B (en) Voice enhancement method and device
CN113421581B (en) Real-time voice noise reduction method for jump network
CN117174105A (en) Speech noise reduction and dereverberation method based on improved deep convolutional network
CN113936680B (en) Single-channel voice enhancement method based on multi-scale information perception convolutional neural network
CN111916060A (en) Deep learning voice endpoint detection method and system based on spectral subtraction
CN115331690A (en) Method for eliminating noise of call voice in real time
Xiang et al. Joint waveform and magnitude processing for monaural speech enhancement
Hou et al. A real-time speech enhancement algorithm based on convolutional recurrent network and Wiener filter
Ullah et al. Semi-supervised transient noise suppression using OMLSA and SNMF algorithms
CN115116451A (en) Audio decoding method, audio encoding method, audio decoding device, audio encoding device, electronic equipment and storage medium
CN114822569A (en) Audio signal processing method, device, equipment and computer readable storage medium
CN110751958A (en) Noise reduction method based on RCED network
Pastor-Naranjo et al. Conditional Generative Adversarial Networks for Acoustic Echo Cancellation
WO2024055751A1 (en) Audio data processing method and apparatus, device, storage medium, and program product
CN113903355B (en) Voice acquisition method and device, electronic equipment and storage medium
Lee et al. Stacked U-Net with high-level feature transfer for parameter efficient speech enhancement
US20240096332A1 (en) Audio signal processing method, audio signal processing apparatus, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 402, No. 66, North Street, University Town Center, Panyu District, Guangzhou City, Guangdong Province, 510006

Patentee after: Yifang Information Technology Co.,Ltd.

Address before: 510006 Room 601, 603, 605, science museum, Guangdong University of technology, 100 Waihuan West Road, Xiaoguwei street, Panyu District, Guangzhou City, Guangdong Province

Patentee before: GUANGZHOU EASEFUN INFORMATION TECHNOLOGY Co.,Ltd.
