CN114495960A - Audio noise reduction filtering method, noise reduction filtering device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114495960A
CN114495960A
Authority
CN
China
Prior art keywords
signal
audio
neural network
calculating
input signal
Prior art date
Legal status
Pending
Application number
CN202111605349.4A
Other languages
Chinese (zh)
Inventor
黄景标
陈庭威
林聚财
殷俊
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202111605349.4A
Publication of CN114495960A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 — Noise filtering
    • G10L 21/0216 — Noise filtering characterised by the method used for estimating noise
    • G10L 21/0224 — Processing in the time domain
    • G10L 25/30 — Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L 2021/02087 — Noise filtering, the noise being separate speech, e.g. cocktail party

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The application discloses an audio noise reduction filtering method, a noise reduction filtering device, an electronic device, and a computer storage medium, relating to the technical field of audio signal processing. The method comprises the following steps: acquiring characteristic parameters of an audio input signal by using a preset neural network; calculating a filtering weight coefficient based on the characteristic parameters; processing the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal; calculating a cost value based on the filtered audio signal and the real signal; and training the preset neural network with the cost value. In this way, the audio noise reduction filtering method can effectively reduce noise in an audio system and improve voice quality.

Description

Audio noise reduction filtering method, noise reduction filtering device, electronic equipment and storage medium
Technical Field
The present application relates to the field of audio signal processing technologies, and in particular, to an audio noise reduction filtering method, a noise reduction filtering apparatus, an electronic device, and a computer storage medium.
Background
In real life, when people place a hands-free call on a mobile terminal such as a mobile phone, or start a video conference through a video conference terminal, various noises exist in the surrounding environment. Besides the target signal, the microphone also picks up this environmental noise, so the noise needs to be suppressed by a filtering technique.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide an audio noise reduction filtering method, a noise reduction filtering device, an electronic device, and a computer storage medium, so as to reduce noise and improve voice quality.
In order to solve the technical problem, the application adopts a technical scheme that: an audio noise reduction filtering method is provided. The method comprises the following steps:
acquiring characteristic parameters of the audio input signal by using a preset neural network; calculating a filtering weight coefficient based on the characteristic parameters; processing the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal; calculating a cost value based on the filtered audio signal and the real signal; and training the preset neural network by using the cost value.
In order to solve the above technical problem, another technical solution adopted by the present application is: provided is a noise reduction filter device. The noise reduction filter device includes:
the preset neural network module is used for acquiring characteristic parameters of the audio input signal; the calculation module is connected with the preset neural network module and used for calculating a filtering weight coefficient based on the characteristic parameters; the filter module is connected with the calculation module and used for processing the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal; the calculation module is further used for calculating a cost value from the filtered audio signal and the real signal, sending the cost value to the preset neural network module, and training with the cost value.
In order to solve the above technical problem, another technical solution adopted by the present application is: the electronic equipment comprises a processor and a memory connected with the processor, wherein program data are stored in the memory, and the processor executes the program data stored in the memory to realize the following steps: acquiring characteristic parameters of the audio input signal by using a preset neural network; calculating a filtering weight coefficient based on the characteristic parameters; processing the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal; calculating a cost value based on the filtered audio signal and the real signal; and training the preset neural network by using the cost value.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided a computer storage medium having stored therein program instructions that are executed to implement: acquiring characteristic parameters of the audio input signal by using a preset neural network; calculating a filtering weight coefficient based on the characteristic parameters; processing the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal; calculating a cost value based on the filtered audio signal and the real signal; and training the preset neural network by using the cost value.
The beneficial effects of this application are: different from the prior art, the audio noise reduction filtering method processes audio by combining a preset neural network with traditional signal processing, exploiting the advantage that a preset neural network can better estimate the characteristic parameters that are difficult to estimate with traditional signal processing alone. The preset neural network is trained to obtain a final preset neural network model, from which the filtering weight coefficient is obtained and applied to the audio through traditional signal processing, so that voice quality is maintained stably. By making the preset neural network and traditional signal processing complement each other, the noise of the audio input signal can be reduced even when the signal-to-noise ratio becomes low, improving the voice quality of the final audio output.
Drawings
FIG. 1 is a schematic structural diagram of an embodiment of a noise reduction filter apparatus according to the present application;
FIG. 2 is a schematic flow chart diagram illustrating an embodiment of an audio denoising filtering method according to the present application;
FIG. 3 is a specific flowchart of step S101 in FIG. 2;
FIG. 4 is a specific flowchart of step S101 in FIG. 2;
FIG. 5 is a specific flowchart of step S102 in FIG. 2;
FIG. 6 is a specific flowchart of step S401 in FIG. 5;
FIG. 7 is a specific flowchart of step S402 in FIG. 5;
FIG. 8 is a schematic flow chart diagram illustrating another embodiment of the audio denoising and filtering method of the present application;
FIG. 9 is a schematic diagram illustrating an embodiment of an audio noise reduction filtering method according to the present application;
FIG. 10 is a schematic structural diagram of an embodiment of an electronic device of the present application;
FIG. 11 is a schematic structural diagram of an embodiment of a computer storage medium according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The present application firstly proposes a noise reduction filter device 100, as shown in fig. 1, fig. 1 is a schematic structural diagram of an embodiment of the noise reduction filter device of the present application, and the noise reduction filter device 100 of the present embodiment includes:
the preset neural network module 110 is configured to obtain a characteristic parameter of the audio input signal; a calculating module 120, connected to the preset neural network module 110, is configured to calculate a filtering weight coefficient based on the characteristic parameter; a filter module 130, connected to the calculating module 120, is configured to process the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal; the calculating module 120 is further configured to calculate a cost value from the filtered audio signal and the real signal, send the cost value to the preset neural network module 110, and perform training with the cost value.
The preset neural network module 110 is configured to be trained on characteristic parameters of the audio input signal: the characteristic parameters to be processed are input into the preset neural network module 110 for training, yielding three processed characteristic parameters, namely a noise covariance matrix, a received signal covariance matrix, and a prior signal-to-noise ratio.
Optionally, the preset neural network module 110 may adopt any of various common neural networks, such as a recurrent neural network (RNN), a convolutional neural network (CNN), or a convolutional recurrent neural network (CRNN).
Taking a CNN as the preset neural network module as an example: the CNN can be regarded as an end-to-end black box, with an input layer at one end, an output layer at the other end, and hidden layers in between that may include convolutional layers and pooling layers. When the characteristic parameters to be processed of the audio input signal reach the input layer, they are normalized for convenient calculation. The convolutional layers in the hidden layers perform feature extraction on the characteristic parameters, enhancing the features of the original signal and reducing noise; the pooling layers reduce the amount of data as much as possible while retaining useful information. Finally, the output layer outputs the three processed characteristic parameters of that training pass, and the parameters in the hidden layers may be updated during each training iteration.
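As an illustration of the hidden-layer operations described above, the following sketch (a toy NumPy example, not the patent's actual network) runs one normalization, convolution, and pooling stage over a small feature vector:

```python
import numpy as np

def normalize(x):
    # Input layer: normalization makes the calculation easier, as described above.
    return (x - x.mean()) / (x.std() + 1e-8)

def conv1d(x, kernel):
    # Convolutional layer: extracts local features ("valid" convolution).
    return np.convolve(x, kernel, mode="valid")

def max_pool(x, size=2):
    # Pooling layer: reduces the amount of data while keeping the strongest responses.
    n = len(x) // size
    return x[: n * size].reshape(n, size).max(axis=1)

# One normalize -> convolve -> pool stage over a toy feature vector.
features = normalize(np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7]))
hidden = max_pool(conv1d(features, np.array([0.5, -0.5])))
```

A real network would learn the kernel weights during training; the fixed kernel here only shows the data flow.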
One end of the calculating module 120 is connected to the preset neural network module 110, from which it acquires the noise covariance matrix, the received signal covariance matrix, and the prior signal-to-noise ratio, and from these it calculates the inter-frame correlation coefficient and the filtering weight coefficient. The other end is connected to the filter module 130; the calculating module inputs the audio signal filtered by the filter module 130 and the real signal into the cost function to calculate a cost value, which is output back to the preset neural network module 110 so that training continues until the final preset neural network module 110 is obtained.
The filter module 130 is connected to the preset neural network module 110. The preset neural network module 110 is trained to obtain the three processed characteristic parameters, a filtering weight coefficient is calculated from them by formula, and this coefficient is input into the filter module 130, which processes the audio input signal to obtain the filtered audio signal. The filter module 130 may be any common audio filter, which is not limited here.
The preset neural network module 110 and the filter module 130 are used for processing the audio input signal in a combined manner, so that the problem that some filtering weight coefficients are difficult to estimate in the traditional signal processing method is solved, the noise reduction effect and the voice quality are effectively balanced, and the final effect of voice processing is improved.
The present application further provides an audio noise reduction filtering method, as shown in fig. 2, fig. 2 is a schematic flowchart of an embodiment of the audio noise reduction filtering method according to the present application, and the method may be applied to the noise reduction filtering apparatus 100, and specifically includes steps S101 to S105:
step S101: and acquiring the characteristic parameters of the audio input signal by using a preset neural network.
Sample audio is obtained, the characteristic parameters to be processed of the audio input signal are extracted and input into a preset neural network for training, and the processed characteristic parameters of the audio input signal are obtained from the preset neural network. The processed characteristic parameters include: a noise covariance matrix, a received signal covariance matrix, and a prior signal-to-noise ratio.
Optionally, in this embodiment, step S101 may be implemented by the method shown in fig. 3, and the specific implementation steps include step S201 to step S202:
step S201: based on the audio input signal, a real part and an imaginary part of the audio input signal are obtained.
Model Y for receiving signal by microphonek.l=Xk,l+Nk,lFor example, wherein Xk,lRepresenting the target signal, Nk,lRepresenting a noise signal, Yk.lRepresenting the audio input signal of the microphone, k representing a frequency point and l a time frame, we omit the notation of the frequency point in the following since the operation of each frequency point is the same.
For noise reduction algorithms, all methods can be regarded as calculating a weight for the microphone audio input signal and recovering the target signal through that weight, that is:

$$\hat{X}_l = w_l Y_l \tag{1}$$

where $w_l$ is the filter weight coefficient and $\hat{X}_l$ is the filtered audio signal.
In the multi-frame algorithm, equation (1) may be modified as follows:

$$\hat{X}_l = \mathbf{w}_l^H \mathbf{y}_l \tag{2}$$

wherein:

$$\mathbf{y}_l = [Y_l, Y_{l-1}, \ldots, Y_{l-N+1}]^T \tag{3}$$

where $^T$ represents the transpose of a matrix and $^H$ the conjugate transpose of a matrix; $\mathbf{w}_l$ is represented in the same manner as above. $N$ is typically taken to be 4, indicating that 4 historical frames are used.
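Formulas (2) and (3) can be sketched in NumPy as follows; the toy signal values and the pass-through filter are illustrative assumptions:

```python
import numpy as np

def stack_frames(Y, l, N=4):
    # y_l = [Y_l, Y_{l-1}, ..., Y_{l-N+1}]^T   (formula (3))
    return np.array([Y[l - i] for i in range(N)])

def apply_filter(w, y):
    # X_hat_l = w_l^H y_l                      (formula (2))
    return np.vdot(w, y)  # np.vdot conjugates its first argument

Y = np.array([1 + 1j, 2 + 0j, 0 + 1j, 1 - 1j, 3 + 2j])  # toy STFT bin over 5 frames
y_l = stack_frames(Y, l=4)                 # current frame plus 3 historical frames
w = np.array([1, 0, 0, 0], dtype=complex)  # pass-through filter for illustration
X_hat = apply_filter(w, y_l)               # equals the current frame Y[4]
```

A trained filter would use the weight coefficients of formula (12) instead of this pass-through example.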
Assuming now that, in the signal received by the microphone, the target signal is not coherent with the noise signal:

$$\Phi_{y,l} = \Phi_{x,l} + \Phi_{n,l} \tag{4}$$

where $\Phi_{x,l}$ represents the covariance matrix of the target signal, $\Phi_{n,l}$ the noise covariance matrix, and $\Phi_{y,l}$ the received signal covariance matrix.
For the noise covariance matrix $\Phi_{n,l}$ and the received signal covariance matrix $\Phi_{y,l}$, the real part and the imaginary part of the audio input signal $Y_l$ are taken using formula (5):

$$\mathbf{y}_{c,l} = [\mathrm{Real}(Y_l), \mathrm{Imag}(Y_l)]^T \tag{5}$$

where $\mathrm{Real}(Y_l)$ represents taking the real part of the audio input signal $Y_l$, $\mathrm{Imag}(Y_l)$ taking its imaginary part, $\mathbf{y}_{c,l}$ the matrix of the real and imaginary parts of $Y_l$, and $^T$ the transpose of a matrix.
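A minimal sketch of formula (5), stacking the real and imaginary parts into a real-valued feature vector (the toy two-frame input is an assumption):

```python
import numpy as np

def real_imag_feature(y_l):
    # y_c,l = [Real(Y_l), Imag(Y_l)]^T  (formula (5)): the complex frames become
    # a real-valued vector that a neural network can take as input.
    return np.concatenate([y_l.real, y_l.imag])

y_c = real_imag_feature(np.array([1 + 2j, 3 - 1j]))  # toy two-frame input
```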
Step S202: and obtaining a noise covariance matrix and a received signal covariance matrix based on the real part and the imaginary part.
Model Y for receiving signal by microphonek.l=Xk,l+Nk,lFor example, based on the audio input signal Y described abovelThe real part and the imaginary part can be arranged according to the Hermite matrix by adopting a mapping mode of a preset neural network to obtain a noise covariance matrix and a received signal covariance matrix, and the estimated values of the noise covariance matrix and the received signal covariance matrix can be obtained by adopting a formula (6) and a formula (7):
Figure BDA0003433891310000061
Figure BDA0003433891310000062
wherein Hermitian {. denotes arranging the values in parentheses in the format of the Hermitian matrix,
Figure BDA0003433891310000063
expressed as an estimate of the received signal covariance matrix,
Figure BDA0003433891310000064
an estimate, y, representing a noise covariance matrixc,lRepresenting an audio input signal YlA matrix of the real part and the imaginary part,
Figure BDA0003433891310000065
different mapping modes of the adopted preset neural network are represented, and the preset neural network can adopt various common neural networks, such as RNN, CNN, CRNN and the like.
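The Hermitian arrangement in formulas (6) and (7) can be sketched as below; since the patent does not spell out how the network outputs are packed, the diagonal/upper-triangle layout used here is an assumption:

```python
import numpy as np

def hermitian_from_values(diag, upper):
    # Hermitian{.}: place real values on the diagonal and complex values in the
    # upper triangle, then mirror their conjugates below it, so Phi == Phi^H.
    n = len(diag)
    phi = np.diag(diag.astype(complex))
    idx = 0
    for i in range(n):
        for j in range(i + 1, n):
            phi[i, j] = upper[idx]
            phi[j, i] = np.conj(upper[idx])
            idx += 1
    return phi

# Toy "network outputs": 2 diagonal values and 1 upper-triangle value.
phi_hat = hermitian_from_values(np.array([2.0, 1.0]), np.array([0.5 + 0.3j]))
```

Whatever the exact packing, the point is that the constraint $\Phi = \Phi^H$ is enforced by construction rather than learned.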
Optionally, in this embodiment, step S101 may be implemented by the method shown in fig. 4, and the specific implementation steps include step S301 to step S302:
step S301: the absolute value of the audio input signal is obtained and the base 10 logarithm of the absolute value is calculated.
Model Y for receiving signal by microphonek.l=Xk,l+Nk,lFor example, based on the audio input signal YlObtaining a log of the logarithm10|YlThe value of | is given.
Step S302: the prior signal-to-noise ratio is obtained based on the logarithm.
Based on the logarithm, the prior signal-to-noise ratio is obtained through the mapping of the preset neural network; its estimated value is given by formula (8):

$$\hat{\xi}_l = \mathcal{F}_\xi(\log_{10}|Y_l|) \tag{8}$$

where $\hat{\xi}_l$ represents the estimate of the prior signal-to-noise ratio and $\mathcal{F}_\xi$ the mapping of the adopted neural network; the preset neural network may be any of various common neural networks such as an RNN, CNN, or CRNN.
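A minimal sketch of the log-magnitude feature that feeds the prior-SNR mapping of formula (8); the small numerical floor is an added safeguard, not part of the patent:

```python
import numpy as np

def log_magnitude(Y_l, floor=1e-12):
    # log10|Y_l|: the input feature for the prior-SNR mapping (formula (8)).
    # The floor (an assumption) avoids log10(0) on silent frequency bins.
    return np.log10(np.abs(Y_l) + floor)

feat = log_magnitude(np.array([10 + 0j, 0j]))  # toy bin values
```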
Step S102: and calculating a filtering weight coefficient based on the characteristic parameters.
The preset neural network module calculates a filtering weight coefficient through a formula based on the characteristic parameters of the audio input signal, wherein the characteristic parameters of the audio input signal comprise: a noise covariance matrix, a received signal covariance matrix, and a prior signal-to-noise ratio.
Optionally, in this embodiment, step S102 may be implemented by the method shown in fig. 5, and the specific implementation steps include step S401 to step S402:
step S401: and calculating the interframe correlation coefficient based on the noise covariance matrix, the received signal covariance matrix and the prior signal-to-noise ratio.
Model Y for receiving signal by microphonek.l=Xk,l+Nk,lFor example, it is assumed that the multi-frame target signal can be decomposed as follows:
xl=γx,lXl+x′l (9)
wherein, γx,lXlX 'representing a correlation component existing between signals in a multi-frame signal'lRepresenting non-correlated components, gamma, in a multi-frame signalx,lRepresenting the inter-frame correlation coefficient.
For voice signals, preserving the correlated components between the signals preserves the quality of the voice signal.
The calculating module calculates the inter-frame correlation coefficient by treating the estimates of the noise covariance matrix, the received signal covariance matrix, and the prior signal-to-noise ratio acquired from the preset neural network as the true values.
Optionally, in this embodiment, step S401 may be implemented by the method shown in fig. 6, and the specific implementation steps include step S501 to step S506:
step S501: and acquiring a sum of the prior signal-to-noise ratio and the reciprocal of the prior signal-to-noise ratio, and acquiring a first product of the sum, the received signal covariance matrix and a preset matrix.
Step S502: and acquiring the transposition of the preset matrix, the covariance matrix of the received signals and a second product of the preset matrix.
Step S503: and acquiring a third product of the reciprocal of the prior signal-to-noise ratio, the noise covariance matrix and the preset matrix.
Step S504: and acquiring the transposition of the preset matrix, the noise covariance matrix and a fourth product of the preset matrix.
Step S505: a first quotient value of the first product and the second product is obtained, and a second quotient value of the third product and the fourth product is obtained.
Step S506: and obtaining a difference value between the first quotient value and the second quotient value to obtain an interframe correlation coefficient, wherein the preset matrix is e ═ 1,0, …,0 ^ T.
Model Y for receiving signal by microphonek.l=Xk,l+Nk,lFor example, steps S501 to S506 can be implemented by using equation (10):
Figure BDA0003433891310000081
wherein, γx,lExpressed as inter-frame correlation coefficient, ξlRepresenting the a priori signal-to-noise ratio, gammax,lRepresenting a correlation coefficient, phi, between said frame signalsy,lRepresenting the covariance matrix of the received signal, phix,lRepresenting the covariance matrix of the target received signal, phin.lRepresenting the noise covariance matrix, and presetting the matrix as e ═ 1,0, …,0]^T,eTRepresenting the transpose of the predetermined matrix e.
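Steps S501 to S506 can be sketched in NumPy as follows, reading the leading factor as $(1 + 1/\xi_l)$; the toy covariance matrices are illustrative assumptions:

```python
import numpy as np

def interframe_correlation(phi_y, phi_n, xi, N=4):
    # gamma_x,l = (1 + 1/xi) * (Phi_y e)/(e^T Phi_y e)
    #           -     (1/xi) * (Phi_n e)/(e^T Phi_n e)
    e = np.zeros(N)
    e[0] = 1.0                          # preset matrix e = [1, 0, ..., 0]^T
    q1 = (phi_y @ e) / (e @ phi_y @ e)  # first quotient  (steps S501, S502, S505)
    q2 = (phi_n @ e) / (e @ phi_n @ e)  # second quotient (steps S503, S504, S505)
    return (1 + 1 / xi) * q1 - (1 / xi) * q2  # difference (step S506)

# Consistency check with formula (4): if Phi_y = Phi_x + Phi_n and
# xi = (e^T Phi_x e)/(e^T Phi_n e), the result equals Phi_x e/(e^T Phi_x e).
phi_n = np.eye(2)
phi_x = np.array([[2.0, 1.0], [1.0, 1.0]])  # toy target-signal covariance
gamma = interframe_correlation(phi_x + phi_n, phi_n, xi=2.0, N=2)
```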
Step S402: and calculating a filtering weight coefficient based on the interframe correlation coefficient and the noise covariance matrix.
Model Y for receiving signal by microphonek.l=Xk,l+Nk,lFor example, the calculation module calculates the filtering weight coefficient based on the inter-frame correlation coefficient obtained above and an estimated value of the noise covariance matrix obtained by the preset neural network as a true value.
Optionally, in this embodiment, step S402 may be implemented by the method shown in fig. 7, and the specific implementation steps include step S601 to step S603:
step S601: a fifth product of an inverse of the noise covariance matrix and the inter-frame correlation coefficients is obtained.
Step S602: and acquiring a sixth product of the conjugate transpose of the interframe correlation coefficient, the inverse matrix of the noise covariance matrix and the interframe correlation coefficient.
Step S603: and acquiring a third quotient of the fifth product and the sixth product to obtain a filtering weight coefficient.
Model Y for receiving signal by microphonek.l=Xk,l+Nk,lFor example, according to the definition of the minimum variance distortionless response:
Figure BDA0003433891310000091
steps S601 to S603 can be implemented by using equation (12):
Figure BDA0003433891310000092
wherein phin.lRepresenting the covariance of the noise in question,
Figure BDA0003433891310000093
represents phin.lThe inverse of the matrix is the inverse of the matrix,
Figure BDA0003433891310000094
representing estimated values of filter weight coefficients, gammax,lRepresents a correlation coefficient between the frame signals,
Figure BDA0003433891310000095
representing the conjugate transpose of the correlation coefficient between the frame signals.
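Equation (12) can be sketched in NumPy; the toy matrices are illustrative, and `np.linalg.solve` replaces the explicit inverse for numerical stability:

```python
import numpy as np

def mvdr_weights(phi_n, gamma_x):
    # w_l = Phi_n^{-1} gamma_x / (gamma_x^H Phi_n^{-1} gamma_x)   (formula (12))
    num = np.linalg.solve(phi_n, gamma_x)  # fifth product, via solve instead of inv
    den = np.vdot(gamma_x, num)            # sixth product (vdot conjugates gamma_x)
    return num / den                       # third quotient

phi_n = np.array([[2.0, 0.5], [0.5, 1.0]])  # toy noise covariance matrix
gamma_x = np.array([1.0, 0.3])              # toy inter-frame correlation coefficient
w = mvdr_weights(phi_n, gamma_x)
```

By construction the result satisfies the distortionless constraint of (11), i.e. $\mathbf{w}^H \gamma_{x,l} = 1$.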
Step S103: and processing the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal.
The filter module processes the audio input signal based on the filtering weight coefficient calculated by the calculation module to obtain a filtered audio signal.
Step S104: a cost value is calculated based on the filtered audio signal and the real signal.
The calculation module calculates a cost value based on the filtered audio signal and the real signal.
Step S105: and training the preset neural network by using the cost value.
The preset neural network module trains the preset neural network with the cost value until the network converges or a preset number of training iterations is reached; the trained preset neural network module then processes subsequent audio input signals.
The present application further provides an audio noise reduction filtering method, as shown in fig. 8, fig. 8 is a schematic flow chart of another embodiment of the audio noise reduction filtering method of the present application, and the specific implementation steps include steps S701 to S706:
step S701: and acquiring the characteristic parameters of the audio input signal by using a preset neural network.
Step S701 is identical to step S101, and is not described again.
Step S702: and calculating a filtering weight coefficient based on the characteristic parameters.
Step S702 is the same as step S102, and is not repeated.
Step S703: and processing the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal.
Step S703 is identical to step S103, and is not described again.
Step S704: and constructing a cost function for the preset neural network.
Wherein, the cost function adopts formula (13):

$$J = \sum_l \left| X_l - \hat{X}_l \right|^2 \tag{13}$$

where $X_l$ represents the real signal and $\hat{X}_l$ represents the filtered audio signal.
For a neural network, a training target is needed during network training; having three separate training targets for the three characteristic parameters would be unfriendly to network training. In this scheme, therefore, the three characteristic parameters are combined through formula (12) into a weight coefficient, the weight coefficient is multiplied with the unprocessed signal to obtain the processed signal, and the real signal serves as the single training target.
Step S705: and calculating the cost values of the filtered audio signal and the real signal by using the cost function.
And inputting the filtered audio signal and the real signal into a cost function so as to obtain a cost value.
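The cost value of step S705 can be sketched as a mean squared error between the real signal and the filtered audio signal (the specific loss form is an assumption for illustration):

```python
import numpy as np

def cost_value(real_signal, filtered_signal):
    # Mean squared error between the real signal and the filtered audio signal;
    # the exact loss used by the patent is not reproduced, so this form is assumed.
    return np.mean(np.abs(real_signal - filtered_signal) ** 2)

cost = cost_value(np.array([1 + 0j, 2 + 0j]), np.array([1 + 0j, 2 + 1j]))
```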
Step S706: and training the preset neural network by using the cost value.
Step S706 is identical to step S105 and will not be described again.
Optionally, the audio noise reduction filtering method of this embodiment further includes step S707:
step S707: and responding to the convergence of a preset neural network, and processing the audio input signal by using the corresponding filtering weight coefficient to obtain a target signal.
When the preset neural network converges or the preset number of training iterations is reached, the cost value is at its current minimum and the trained preset neural network model is obtained. The audio input signal is then processed with the filtering weight coefficient produced by this model, yielding the target signal.
In an application scenario, as shown in fig. 9, fig. 9 is a schematic diagram of an implementation of an embodiment of the audio noise reduction filtering method according to the present application. The dotted line in the figure represents the flow that is only needed during training of the preset neural network; this part is not needed during actual inference.
As shown in fig. 9, the audio input signal is sent to the preset neural network module 110 for training, yielding three different processed characteristic parameters: a noise covariance matrix, an input signal covariance matrix, and a prior signal-to-noise ratio. The calculating module 120 computes the inter-frame correlation coefficient from these processed characteristic parameters according to the formula, then calculates the filtering weight coefficient from the inter-frame correlation coefficient and the noise covariance matrix. Based on the filtering weight coefficient, the filter module 130 filters the signal and outputs it; the filtered signal and the real signal are sent to the cost function in the calculating module 120 to calculate the cost value, which is propagated back to the preset neural network module 110.
Optionally, the present application further proposes an electronic device 200. As shown in fig. 10, fig. 10 is a schematic structural diagram of an embodiment of an electronic device 200 of the present application, where the electronic device 200 includes a processor 201 and a memory 202 connected to the processor 201.
The processor 201 may also be referred to as a CPU (Central Processing Unit). The processor 201 may be an integrated circuit chip with signal processing capability. The processor 201 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 202 is used for storing program data required for the processor 201 to operate.
The processor 201 executes the program data stored in the memory 202 to: acquire characteristic parameters of the audio input signal using a preset neural network; calculate a filtering weight coefficient based on the characteristic parameters; process the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal; calculate a cost value based on the filtered audio signal and the real signal; and train the preset neural network using the cost value.
Optionally, the present application further proposes a computer storage medium 300. As shown in fig. 11, fig. 11 is a schematic structural diagram of an embodiment of a computer storage medium 300 according to the present application.
The computer storage medium 300 of the embodiment of the present application stores program instructions 310, and the program instructions 310, when executed, implement: acquiring characteristic parameters of the audio input signal using a preset neural network; calculating a filtering weight coefficient based on the characteristic parameters; processing the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal; calculating a cost value based on the filtered audio signal and the real signal; and training the preset neural network using the cost value.
The program instructions 310 may form a program file stored in the storage medium as a software product, enabling a computer device (a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, as well as terminal devices such as computers, servers, mobile phones, and tablets.
In contrast to the prior art, the audio noise reduction filtering method processes audio by combining a preset neural network with traditional signal processing. It exploits the neural network's strength at estimating characteristic parameters that are difficult to estimate with traditional signal processing alone: the network is trained to obtain a final preset neural network model, from which the filtering weight coefficient is derived for traditional signal processing of the audio, stably ensuring voice quality. By making the preset neural network and traditional signal processing complement each other, the method can still reduce the noise of the audio input signal when the signal-to-noise ratio drops, improving the voice quality of the final audio output.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.

Claims (11)

1. An audio noise reduction filtering method, comprising:
acquiring characteristic parameters of the audio input signal by using a preset neural network;
calculating a filtering weight coefficient based on the characteristic parameters;
processing the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal;
calculating a cost value based on the filtered audio signal and the real signal;
and training the preset neural network by using the cost value.
2. The audio noise reduction filtering method according to claim 1, wherein the characteristic parameters comprise a noise covariance matrix, a received signal covariance matrix, and a prior signal-to-noise ratio, and the calculating of the filtering weight coefficient based on the characteristic parameters comprises:
calculating an interframe correlation coefficient based on the noise covariance matrix, the received signal covariance matrix and the prior signal-to-noise ratio;
calculating a filtering weight coefficient based on the inter-frame correlation coefficient and the noise covariance matrix.
3. The method of claim 2, wherein the computing inter-frame correlation coefficients based on the noise covariance matrix, the received signal covariance matrix, and the prior signal-to-noise ratio comprises:
acquiring a sum of the prior signal-to-noise ratio and a reciprocal of the prior signal-to-noise ratio, and acquiring a first product of the sum, the received signal covariance matrix and a preset matrix;
acquiring a second product of the transposition of the preset matrix, the covariance matrix of the received signals and the preset matrix;
obtaining a third product of the reciprocal of the prior signal-to-noise ratio, the noise covariance matrix and the preset matrix;
acquiring a transposition of the preset matrix, a fourth product of the noise covariance matrix and the preset matrix;
obtaining a first quotient value of the first product and the second product, and obtaining a second quotient value of the third product and the fourth product;
obtaining a difference value between the first quotient value and the second quotient value to obtain the inter-frame correlation coefficient;
wherein the preset matrix is e = [1, 0, …, 0]^T.
4. The method of claim 3, wherein the calculating the filtering weight coefficient based on the inter-frame correlation coefficient and the noise covariance matrix comprises:
acquiring a fifth product of an inverse matrix of the noise covariance matrix and the inter-frame correlation coefficient;
acquiring a sixth product of the conjugate transpose of the interframe correlation coefficient, the inverse matrix of the noise covariance matrix and the interframe correlation coefficient;
and acquiring a third quotient of the fifth product and the sixth product to obtain the filtering weight coefficient.
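Claims 3 and 4 spell out the arithmetic step by step; a direct NumPy transcription (with toy covariances and function names of our own choosing, not from the patent) looks like this:

```python
import numpy as np

def interframe_correlation(R_y, R_n, xi, L):
    """Claim 3: inter-frame correlation coefficient from the received-signal
    covariance R_y, the noise covariance R_n, and the prior SNR xi."""
    e = np.zeros(L)
    e[0] = 1.0                                  # preset matrix e = [1, 0, ..., 0]^T
    s = xi + 1.0 / xi                           # sum of prior SNR and its reciprocal
    first = s * (R_y @ e)                       # first product
    second = e @ R_y @ e                        # second product (a scalar)
    third = (1.0 / xi) * (R_n @ e)              # third product
    fourth = e @ R_n @ e                        # fourth product (a scalar)
    return first / second - third / fourth      # difference of the two quotients

def filter_weights(R_n, gamma):
    """Claim 4: h = (R_n^{-1} gamma) / (gamma^H R_n^{-1} gamma)."""
    Rn_inv = np.linalg.inv(R_n)
    fifth = Rn_inv @ gamma                      # fifth product
    sixth = np.conj(gamma) @ Rn_inv @ gamma     # sixth product (a scalar)
    return fifth / sixth                        # third quotient

# Toy positive-definite covariances for illustration only.
L = 4
rng = np.random.default_rng(0)
A = rng.normal(size=(L, L))
R_y = A @ A.T + np.eye(L)
R_n = np.eye(L)
gamma = interframe_correlation(R_y, R_n, xi=2.0, L=L)
h = filter_weights(R_n, gamma)
```

One property falls directly out of the claim-4 quotient: applying h to the inter-frame correlation coefficient itself yields exactly 1, the usual distortionless-style constraint of such filters.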
5. The audio noise reduction filtering method according to claim 1, further comprising:
constructing a cost function for the preset neural network;
said calculating a cost value based on said filtered audio signal and a real signal comprising:
calculating a cost value of the cost function based on the filtered audio signal and the real signal;
wherein the cost function adopts the formula shown in Figure FDA0003433891300000021, in which Figure FDA0003433891300000022 represents the real signal and Figure FDA0003433891300000023 represents the filtered audio signal.
6. The audio noise reduction filtering method according to claim 1, further comprising:
in response to convergence of the preset neural network, processing the audio input signal using the corresponding filtering weight coefficient to obtain a target signal.
7. The audio noise reduction filtering method according to claim 2, wherein the obtaining the characteristic parameters of the audio input signal by using the preset neural network comprises:
acquiring a real part and an imaginary part of the audio input signal based on the audio input signal;
and acquiring the noise covariance matrix and the received signal covariance matrix based on the real part and the imaginary part.
8. The audio noise reduction filtering method according to claim 2, wherein the obtaining the characteristic parameters of the audio input signal by using the preset neural network comprises:
acquiring an absolute value of the audio input signal, and calculating a base-10 logarithm of the absolute value;
obtaining the prior signal-to-noise ratio based on the logarithm.
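Claims 7 and 8 describe the two sets of inputs fed to the preset neural network. A minimal sketch, assuming frequency-domain frames and adding a small floor inside the logarithm (our addition, to avoid log10(0); the function and variable names are illustrative, not from the patent):

```python
import numpy as np

def network_inputs(y_fft):
    """Per-frame quantities for the preset neural network:
    - claim 7: real and imaginary parts, used to estimate the noise
      covariance matrix and the received-signal covariance matrix;
    - claim 8: base-10 logarithm of the absolute value, used to
      estimate the prior signal-to-noise ratio."""
    real, imag = np.real(y_fft), np.imag(y_fft)
    log_mag = np.log10(np.abs(y_fft) + 1e-12)   # floor added for numerical safety
    return real, imag, log_mag

# One toy frequency-domain frame.
y_fft = np.array([1 + 1j, 0.5 - 0.2j, -0.3 + 0.7j])
real, imag, log_mag = network_inputs(y_fft)
```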
9. A noise reduction filter apparatus, comprising:
the preset neural network module is used for acquiring characteristic parameters of the audio input signal;
the calculation module is connected with the preset neural network module and used for calculating a filtering weight coefficient based on the characteristic parameters;
the filter module is connected with the calculation module and used for processing the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal;
the calculation module is further configured to calculate a cost value based on the filtered audio signal and a real signal, and to send the cost value to the preset neural network module, which performs training using the cost value.
10. An electronic device, comprising a processor and a memory coupled to the processor, wherein the memory stores program data therein, and the processor executes the program data stored in the memory to perform operations to:
acquiring characteristic parameters of the audio input signal by using a preset neural network;
calculating a filtering weight coefficient based on the characteristic parameters;
processing the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal;
calculating a cost value based on the filtered audio signal and the real signal;
and training the preset neural network by using the cost value.
11. A computer storage medium having stored therein program instructions that are executed to implement:
acquiring characteristic parameters of the audio input signal by using a preset neural network;
calculating a filtering weight coefficient based on the characteristic parameters;
processing the audio input signal based on the filtering weight coefficient to obtain a filtered audio signal;
calculating a cost value based on the filtered audio signal and the real signal;
and training the preset neural network by using the cost value.
CN202111605349.4A 2021-12-25 2021-12-25 Audio noise reduction filtering method, noise reduction filtering device, electronic equipment and storage medium Pending CN114495960A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111605349.4A CN114495960A (en) 2021-12-25 2021-12-25 Audio noise reduction filtering method, noise reduction filtering device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111605349.4A CN114495960A (en) 2021-12-25 2021-12-25 Audio noise reduction filtering method, noise reduction filtering device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114495960A true CN114495960A (en) 2022-05-13

Family

ID=81496570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111605349.4A Pending CN114495960A (en) 2021-12-25 2021-12-25 Audio noise reduction filtering method, noise reduction filtering device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114495960A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116030821A (en) * 2023-03-27 2023-04-28 北京探境科技有限公司 Audio processing method, device, electronic equipment and readable storage medium
WO2023240887A1 (en) * 2022-06-14 2023-12-21 青岛海尔科技有限公司 Dereverberation method and apparatus, device, and storage medium


Similar Documents

Publication Publication Date Title
US8325909B2 (en) Acoustic echo suppression
US20150163587A1 (en) Audio Information Processing Method and Apparatus
CN114495960A (en) Audio noise reduction filtering method, noise reduction filtering device, electronic equipment and storage medium
CN111128221B (en) Audio signal processing method and device, terminal and storage medium
CN110265054B (en) Speech signal processing method, device, computer readable storage medium and computer equipment
CN110289009B (en) Sound signal processing method and device and interactive intelligent equipment
CN110556125B (en) Feature extraction method and device based on voice signal and computer storage medium
CN112489670B (en) Time delay estimation method, device, terminal equipment and computer readable storage medium
WO2013121749A1 (en) Echo canceling apparatus, echo canceling method, and telephone communication apparatus
Gil-Cacho et al. Nonlinear acoustic echo cancellation based on a parallel-cascade kernel affine projection algorithm
CN109215672B (en) Method, device and equipment for processing sound information
CN113744748A (en) Network model training method, echo cancellation method and device
CN112201273A (en) Noise power spectral density calculation method, system, equipment and medium
US8515096B2 (en) Incorporating prior knowledge into independent component analysis
CN110021289B (en) Sound signal processing method, device and storage medium
CN116705045A (en) Echo cancellation method, apparatus, computer device and storage medium
CN112929506A (en) Audio signal processing method and apparatus, computer storage medium, and electronic device
CN111370016A (en) Echo cancellation method and electronic equipment
CN112242145A (en) Voice filtering method, device, medium and electronic equipment
CN111010566A (en) Non-local network-based video compression distortion restoration method and system
US11924367B1 (en) Joint noise and echo suppression for two-way audio communication enhancement
WO2023093292A1 (en) Multi-channel echo cancellation method and related apparatus
CN110931038B (en) Voice enhancement method, device, equipment and storage medium
WO2022247427A1 (en) Signal filtering method and apparatus, storage medium and electronic device
CN112397080B (en) Echo cancellation method and apparatus, voice device, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination