CN110970050A - Voice noise reduction method, device, equipment and medium - Google Patents
Voice noise reduction method, device, equipment and medium
- Publication number
- CN110970050A (application CN201911330762.7A)
- Authority
- CN
- China
- Prior art keywords
- noise
- voice
- denoised
- probability
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques using neural networks
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Abstract
The disclosure provides a voice noise reduction method, apparatus, device, and medium, belonging to the technical field of speech. Gain modulation information for the speech to be denoised is obtained from the probability distribution of the speech to be denoised and the modulation gain function corresponding to each noise type, and the speech is then denoised according to that gain modulation information. Because noise of different types differs in time-domain and frequency-domain characteristics and therefore calls for different noise reduction processing, this method, unlike approaches that apply the same noise reduction algorithm to all speech, accounts for the fact that the speech to be denoised may contain noise of different types. It derives different gain modulation information according to the noise types present, so that the noise reduction process is tailored to each input. Noise is thereby reduced in a more targeted way, the noise reduction effect is improved, and the wake-up rate and recognition rate of the denoised speech are improved.
Description
Technical Field
The present disclosure relates to the field of speech technologies, and in particular, to a method, an apparatus, a device, and a medium for speech noise reduction.
Background
In recent years, with the continuous development of voice technology, intelligent voice interaction systems such as smart speakers and vehicle-mounted voice interaction systems have become widespread. These systems receive a user's voice and recognize it using intelligent speech recognition technology to realize human-computer interaction. In practice, the user's voice received by an intelligent voice interaction system is often mixed with different types of noise; for example, the voice received by a vehicle-mounted system may contain wind noise produced when the vehicle is driven with a window open, or the noise of rain striking the windows. Such noise degrades the wake-up rate and recognition rate of the system, so the voice must first undergo noise reduction processing.
In the related art, the noise reduction processing is generally performed as follows: the speech is adaptively filtered with a high-order filter to complete the noise reduction step.
In this method, the same noise reduction algorithm is applied to all speech, without considering that the speech may contain noise of different types. The noise reduction effect is therefore poor, the wake-up rate and recognition rate of the denoised speech are low, and the accuracy of the intelligent voice interaction system is reduced.
Disclosure of Invention
The embodiments of the present disclosure provide a voice noise reduction method, apparatus, device, and medium, which can solve the problems in the related art that applying the same noise reduction algorithm to all speech yields a poor noise reduction effect and a low wake-up rate and recognition rate for the denoised speech. The technical solution is as follows:
in one aspect, a method for reducing noise in speech is provided, the method comprising:
determining probability distribution of the voice to be subjected to noise reduction according to the frequency domain characteristics of the voice to be subjected to noise reduction, wherein the probability distribution is used for representing the probability that the type of noise in the voice to be subjected to noise reduction is each of a plurality of noise types;
acquiring gain modulation information of the voice to be denoised based on the probability distribution of the voice to be denoised and a modulation gain function corresponding to each noise type;
and denoising the voice to be denoised according to the gain modulation information.
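The three claimed steps can be sketched end to end for a single frame as follows (a minimal single-channel sketch; the list-based interfaces and per-frequency-bin gains are illustrative assumptions, not the patent's actual implementation):

```python
def denoise_pipeline(likelihoods, priors, gain_functions, spectrum):
    """End-to-end sketch of the three claimed steps for one speech frame:
    (1) determine the probability distribution over noise types,
    (2) obtain gain modulation information as a probability-weighted sum of
        the modulation gain functions, and
    (3) denoise by applying the resulting gain to each frequency bin.
    """
    # Step 1: posterior probability of each noise type (priors times
    # per-type likelihoods, normalized to sum to one).
    weighted = [p * m for p, m in zip(priors, likelihoods)]
    total = sum(weighted)
    posteriors = [w / total for w in weighted]
    # Step 2: gain modulation information, one gain value per frequency bin.
    num_bins = len(gain_functions[0])
    gains = [sum(posteriors[k] * gain_functions[k][b]
                 for k in range(len(priors)))
             for b in range(num_bins)]
    # Step 3: denoise by scaling each frequency bin of the spectrum.
    return [g * s for g, s in zip(gains, spectrum)]
```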
In one possible implementation manner, the determining, according to the frequency domain feature of the speech to be noise-reduced, a probability distribution of the speech to be noise-reduced includes:
determining a probability function corresponding to each noise type corresponding to the voice to be denoised based on the frequency domain characteristics of the voice to be denoised;
obtaining the prior probability corresponding to each noise type;
and acquiring the probability that the type of noise in the voice to be denoised is the type of each noise according to the probability function corresponding to each noise type corresponding to the voice to be denoised and the prior probability, and obtaining the probability distribution of the voice to be denoised.
In one possible implementation, the obtaining, according to the probability function corresponding to each noise type corresponding to the speech to be denoised and the prior probability, the probability that the type of noise in the speech to be denoised is each noise type includes:
for each noise type, obtaining the product of a probability function corresponding to the noise type and a prior probability;
acquiring the sum of the products corresponding to all the noise types;
and acquiring a ratio between the product corresponding to the noise type and the sum, and taking the ratio as the probability that the type of the noise in the voice to be denoised is the noise type.
In one possible implementation, the method further includes:
and determining a probability function and a prior probability corresponding to the noise type based on the voice to be subjected to noise reduction and at least one historical voice to be subjected to noise reduction.
In a possible implementation manner, the obtaining gain modulation information of the speech to be noise reduced based on the probability distribution of the speech to be noise reduced and a modulation gain function corresponding to each noise type includes:
for each noise type, acquiring the product of the probability, in the probability distribution, that the type of noise in the voice to be denoised is the noise type, and the modulation gain function corresponding to the noise type;
and acquiring the sum of the products corresponding to all the noise types, and taking the sum as the gain modulation information of the voice to be denoised.
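The weighted-sum step above can be sketched as follows (a minimal sketch; the dictionary-based interface and per-frequency-bin gain arrays are illustrative assumptions):

```python
def gain_modulation_info(posteriors, gain_functions):
    """Probability-weighted combination of per-noise-type modulation gains.

    posteriors: dict mapping noise type -> probability that the noise in the
        speech to be denoised is of that type (the probability distribution)
    gain_functions: dict mapping noise type -> per-frequency-bin gain values
        (the modulation gain function corresponding to that noise type)
    Returns the sum over noise types of probability * gain function, i.e.
    the gain modulation information of the speech to be denoised.
    """
    num_bins = len(next(iter(gain_functions.values())))
    total = [0.0] * num_bins
    for noise_type, prob in posteriors.items():
        for b, g in enumerate(gain_functions[noise_type]):
            total[b] += prob * g
    return total
```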
In one possible implementation manner, the determining, according to the frequency domain characteristics of the voice to be noise-reduced, the probability distribution of the voice to be noise-reduced includes:
inputting the voice to be denoised into a probability distribution determining model, extracting the frequency domain characteristics of the voice to be denoised by the probability distribution determining model, and determining and outputting the probability distribution of the voice to be denoised according to the frequency domain characteristics and different noise frequency domain distribution rules.
In one possible implementation, the obtaining of the probability distribution determination model includes:
training the initial model based on a plurality of voice samples to obtain a candidate probability distribution determination model;
and based on the target quantization parameter, carrying out quantization processing on the model parameter in the candidate probability distribution determination model to obtain the probability distribution determination model.
In a possible implementation manner, the quantizing the model parameters in the candidate probability distribution determining model based on the target quantization parameter to obtain the probability distribution determining model includes:
for each hidden layer of the candidate probability distribution determination model, acquiring the ratio of the weight with the largest absolute value in the hidden layer's weight matrix to a target value as the quantization amplitude;
and obtaining the product of the quantization amplitude and each weight in the weight matrix as the quantized weight corresponding to that weight, to obtain the probability distribution determination model.
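The passage leaves the rounding step implicit, so the sketch below shows one common reading (an assumption, not the patent's exact scheme): the quantization amplitude is the largest absolute weight divided by a target value, weights are stored as small integers, and the product of the amplitude and an integer weight recovers an approximation of the original weight.

```python
def quantize_hidden_layer(weights, target=127):
    """Per-layer linear quantization, one reading of the passage above.

    amplitude = max|w| / target; each weight is then stored as the nearest
    integer to w / amplitude (so integers fit in the range +-target).
    """
    amplitude = max(abs(w) for w in weights) / target
    q = [round(w / amplitude) for w in weights]
    return amplitude, q

def dequantize(amplitude, q):
    # Product of the quantization amplitude and a quantized weight.
    return [amplitude * qi for qi in q]
```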
In one possible implementation manner, the denoising the speech to be denoised according to the gain modulation information includes:
acquiring beamforming filtering information of the voice to be denoised according to the gain modulation information;
and filtering the voice to be denoised according to the beamforming filtering information.
In one aspect, an apparatus for speech noise reduction is provided, the apparatus comprising:
the determining module is used for determining the probability distribution of the voice to be subjected to noise reduction according to the frequency domain characteristics of the voice to be subjected to noise reduction, wherein the probability distribution is used for representing the probability that the type of noise in the voice to be subjected to noise reduction is each of a plurality of noise types;
the acquisition module is used for acquiring gain modulation information of the voice to be denoised based on the probability distribution of the voice to be denoised and a modulation gain function corresponding to each noise type;
and the noise reduction module is used for reducing the noise of the voice to be subjected to noise reduction according to the gain modulation information.
In one possible implementation, the determining module is further configured to:
determining a probability function corresponding to each noise type corresponding to the voice to be denoised based on the frequency domain characteristics of the voice to be denoised;
obtaining the prior probability corresponding to each noise type;
and acquiring the probability that the type of noise in the voice to be denoised is the type of each noise according to the probability function corresponding to each noise type corresponding to the voice to be denoised and the prior probability, and obtaining the probability distribution of the voice to be denoised.
In one possible implementation manner, the obtaining module is further configured to:
for each noise type, obtaining the product of a probability function corresponding to the noise type and a prior probability;
acquiring the sum of the products corresponding to all the noise types;
and acquiring a ratio between the product corresponding to the noise type and the sum, and taking the ratio as the probability that the type of the noise in the voice to be denoised is the noise type.
In one possible implementation, the determining module is further configured to:
and determining a probability function and a prior probability corresponding to the noise type based on the voice to be subjected to noise reduction and at least one historical voice to be subjected to noise reduction.
In one possible implementation manner, the obtaining module is further configured to:
for each noise type, acquiring the product of the probability, in the probability distribution, that the type of noise in the voice to be denoised is the noise type, and the modulation gain function corresponding to the noise type;
and acquiring the sum of the products corresponding to all the noise types, and taking the sum as the gain modulation information of the voice to be denoised.
In one possible implementation, the determining module is further configured to:
inputting the voice to be denoised into a probability distribution determining model, extracting the frequency domain characteristics of the voice to be denoised by the probability distribution determining model, and determining and outputting the probability distribution of the voice to be denoised according to the frequency domain characteristics and different noise frequency domain distribution rules.
In one possible implementation manner, the obtaining module is further configured to:
training the initial model based on a plurality of voice samples to obtain a candidate probability distribution determination model;
and based on the target quantization parameter, carrying out quantization processing on the model parameter in the candidate probability distribution determination model to obtain the probability distribution determination model.
In one possible implementation manner, the obtaining module is further configured to:
for each hidden layer of the candidate probability distribution determination model, acquiring the ratio of the weight with the largest absolute value in the hidden layer's weight matrix to a target value as the quantization amplitude;
and obtaining the product of the quantization amplitude and each weight in the weight matrix as the quantized weight corresponding to that weight, to obtain the probability distribution determination model.
In one possible implementation, the obtaining module is further configured to obtain beamforming filtering information of the voice to be denoised according to the gain modulation information;
the device further comprises a filtering module, configured to perform filtering processing on the speech to be denoised according to the beamforming filtering information.
In one aspect, a computer device is provided and includes one or more processors and one or more memories, where at least one instruction is stored in the one or more memories, and the instruction is loaded and executed by the one or more processors to implement the operations performed by the voice noise reduction method.
In one aspect, a computer-readable storage medium is provided, and at least one instruction is stored in the computer-readable storage medium, and the instruction is loaded and executed by a processor to implement the operations performed by the above-mentioned voice noise reduction method.
The technical solutions provided by the embodiments of the present disclosure can bring at least the following beneficial effects:
according to the technical scheme provided by the embodiment of the disclosure, gain modulation information of the voice to be denoised is obtained according to the probability distribution of the voice to be denoised and the modulation gain function corresponding to each noise type, and then the voice to be denoised is denoised according to the gain modulation information. Because different noise time domain characteristics, frequency domain characteristics and the like of different noise types need different noise reduction processes, compared with the method for reducing the noise of the voice to be reduced by using the same noise reduction algorithm, the voice noise reduction method takes the situation that the voice to be reduced contains different noise types of noise into consideration, and obtains different gain modulation information according to the different noise types of the voice to be reduced containing the noise, thereby distinguishing the noise reduction processes of different voices to be reduced, reducing the noise more specifically, improving the noise reduction effect, and being beneficial to improving the awakening rate and the recognition rate of the voice after the noise reduction.
Drawings
To illustrate the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present disclosure; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a speech noise reduction system provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method for reducing noise in speech according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a noise detection deep neural network model provided by an embodiment of the present disclosure;
FIG. 4 is a flow chart of a noise detection algorithm provided by an embodiment of the present disclosure;
FIG. 5 is a flow chart of a noise suppression array signal processing algorithm provided by an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a speech noise reduction apparatus provided in an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a terminal according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a server according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure more apparent, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a speech noise reduction system provided by an embodiment of the present disclosure, and referring to fig. 1 (a), the speech noise reduction system may include a speech acquisition device 110 and a computer device 120, or may be the computer device 120 alone.
When the voice noise reduction system includes the voice collecting device 110 and the computer device 120, the voice collecting device 110 may be connected to the computer device 120 through a network or a data line. The voice collecting device 110 may have a voice collecting function and may collect speech to be denoised in various environments; specifically, these environments may include the cabin of a moving vehicle. The computer device 120 may have a data processing function and may perform noise reduction processing on the speech to be denoised acquired by the voice collecting device 110.
In one possible implementation, as shown in fig. 1 (b), the voice noise reduction system may be deployed in an in-vehicle environment. The computer device 120 may be a voice smart terminal embedded device installed in the center console, and the voice collecting device 110 may include some or all of: a micro microphone array installed on a vehicle window, a raindrop pressure sensor installed on a vehicle window, a distributed microphone array installed at the driver's seat, or a distributed microphone array installed at a passenger seat, which is not limited by the embodiments of the present disclosure.
When the speech noise reduction system only includes the computer device 120, the computer device 120 may have a speech acquisition function and a data processing function, and the computer device 120 may acquire the speech to be noise reduced in various environments and perform noise reduction processing on the speech to be noise reduced.
In one possible implementation manner, the computer device 120 may be a terminal, and may also be a server, which is not limited in this disclosure.
Fig. 2 is a flowchart of a voice denoising method provided in an embodiment of the present disclosure, and referring to fig. 2, the method includes:
201. The computer device obtains the speech to be denoised.
In the embodiment of the present disclosure, the computer device may be a terminal or a server. The speech to be denoised can be noisy speech acquired in various scenes. For example, in vehicles such as automobiles, ships, and airplanes, the acquired speech may contain wind noise from high-speed driving with the windows open, or rain noise when driving in the rain; in a home environment, it may contain television noise or the rotation noise of a washing machine. The embodiment of the present disclosure does not limit this.
The way of acquiring the speech to be denoised by the computer device can be various, and in one possible implementation, the acquiring process can include any one of the following ways one to three:
in the first mode, the computer equipment directly acquires the voice to be denoised.
The computer device can have a voice collecting function, and the computer device can directly collect the voice to be denoised.
In the second mode, the computer device acquires the speech to be denoised collected by the voice collecting device.
The computer device may be connected to the voice collecting device through a network or a data line to obtain the voice to be denoised, which is collected by the voice collecting device, and the voice collecting device may be any kind of device having a voice collecting function, which is not limited in the embodiments of the present disclosure.
In the third mode, the computer device extracts the speech to be denoised from a database.
In the third mode, the speech to be denoised can be stored in a database, and when the computer device needs to process the speech to be denoised, the speech to be denoised is extracted from the database.
202. The computer device determines the probability distribution of the speech to be denoised according to its frequency domain features.
The probability distribution is used for representing the probability that the type of noise in the speech to be denoised is each of a plurality of noise types. Because the probability distribution accounts for the possibility that the speech contains noise of different types, it makes comprehensive use of the available information.
The computer device may determine the probability distribution of the speech to be denoised in multiple ways. In one possible implementation, the speech to be denoised may contain noise of multiple types. The computer device may determine, based on the frequency domain features of the speech to be denoised, a probability function corresponding to each noise type, obtain the prior probability corresponding to each noise type, and then, from these probability functions and prior probabilities, obtain the probability that the noise in the speech is of each type, yielding the probability distribution of the speech to be denoised. The plurality of noise types may be set by a technician according to requirements, which the embodiment of the present disclosure does not limit.
In a possible implementation, after the computer device determines the probability function corresponding to each noise type based on the frequency domain features of the speech to be denoised and obtains the prior probability corresponding to each noise type, the probability distribution may be determined as follows: for each noise type, the computer device obtains the product of the probability function and the prior probability corresponding to that noise type, obtains the sum of these products over all noise types, and takes the ratio between the product corresponding to the noise type and the sum as the probability that the noise in the speech to be denoised is of that type.
For example, the computer device may implement the above process by the following formula one:
Formula one: r_ik = p_k * M(x_i | m_k, s_k) / sum_j{ p_j * M(x_i | m_j, s_j) }
In formula one, x_i represents the i-th speech to be denoised, where i is the identifier of the speech to be denoised; p_k represents the prior probability of noise type k and p_j the prior probability of noise type j, where k and j are noise type identifiers; M(x_i | m_k, s_k) represents the probability function corresponding to noise type k, where m_k is the first order statistic and s_k the second order statistic of that probability function; correspondingly, M(x_i | m_j, s_j) represents the probability function corresponding to noise type j, where m_j is its first order statistic and s_j its second order statistic; r_ik represents the probability that the noise type of the i-th speech to be denoised is k; / represents division, * represents multiplication, sum represents the summation function, and sum_j indicates summing over all noise types.
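As an illustration, the posterior computation of formula one can be sketched in a few lines of code, under the assumption (not stated in the text) that the probability function M(x_i | m_k, s_k) is a Gaussian density with mean m_k and variance s_k; all names and values below are hypothetical.

```python
import numpy as np

def noise_type_posterior(x, priors, means, variances):
    """Posterior probability that sample x belongs to each noise type
    (formula one), assuming M(x | m_k, s_k) is a Gaussian likelihood
    parameterized by first- and second-order statistics m_k and s_k."""
    likelihoods = np.exp(-0.5 * (x - means) ** 2 / variances) / np.sqrt(2 * np.pi * variances)
    weighted = priors * likelihoods      # p_k * M(x_i | m_k, s_k)
    return weighted / weighted.sum()     # normalize by sum_j p_j * M(x_i | m_j, s_j)

# Three hypothetical noise types with scalar frequency-domain features.
priors = np.array([0.5, 0.3, 0.2])
means = np.array([0.0, 2.0, 5.0])
variances = np.array([1.0, 1.0, 2.0])
r = noise_type_posterior(1.0, priors, means, variances)
print(r)  # probabilities over the three noise types, summing to 1
```

In a real system x would be a vector of frequency domain features rather than a scalar, but the normalization structure is identical.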
The probability function and the prior probability corresponding to different noise types may be set by a technician according to requirements, or may be changed based on the acquired multiple voices to be denoised. In one possible implementation, the computer device may determine a probability function and a prior probability corresponding to the noise type based on the speech to be noise-reduced and at least one historical speech to be noise-reduced. The determining process can adjust the probability function and the prior probability corresponding to the noise type according to the actual situation, so that the probability function and the prior probability corresponding to the noise type are more accurate, and more accurate probability distribution can be obtained.
For example, the computer device may also be implemented in other manners, for example, directly based on a ratio between the prior probability corresponding to each noise type and a sum of the prior probabilities corresponding to all the noise types, and taking the ratio as a probability that the type of noise in the speech to be denoised is the noise type. The embodiment of the present disclosure does not limit what specific implementation manner is adopted.
In a specific possible implementation manner, the probability function and the prior probability corresponding to the noise type may be updated in real time in the process of collecting the speech to be noise-reduced and reducing the noise thereof, and the process of obtaining the distribution probability corresponding to the noise type by the computer device based on the noise type of the speech to be noise-reduced may include the following steps one to three:
step one, for the voice to be denoised acquired at the previous moment, for each noise type, the computer equipment acquires the product of the probability function corresponding to the noise type and the prior probability, acquires the sum of the products corresponding to all the noise types, acquires the ratio between the product corresponding to the noise type and the sum, and takes the ratio as the probability that the type of the noise in the voice to be denoised acquired at the previous moment is the noise type.
And step two, for each noise type, the computer equipment updates the probability function and the prior probability corresponding to the noise type based on the probability of the noise type, the voice to be subjected to noise reduction acquired at the current moment and the voice to be subjected to noise reduction acquired before the current moment.
And step three, for the voice to be denoised acquired at the current moment, for each noise type, the computer equipment acquires the product of the probability function corresponding to the noise type and the prior probability based on the updated probability function and the prior probability of the noise classification, acquires the sum of the products corresponding to all the noise types, acquires the ratio between the product corresponding to the noise type and the sum, and takes the ratio as the probability that the type of noise in the voice to be denoised acquired at the current moment is the noise type.
Specifically, steps one to three can be realized by iterative optimization calculation of formula one above together with formulas two to four below.
Formula two: p_k = sum_i{ r_ik }
Formula three: m_k = sum_i{ r_ik * x_i } / sum_i{ r_ik }
Formula four: s_k = sum_i{ r_ik * (x_i - m_k)^2 } / sum_i{ r_ik }
In formulas two to four, x_i represents the i-th speech to be denoised, where i is the identifier of the speech to be denoised; p_k represents the prior probability of noise type k, where k is the noise type identifier; m_k represents the first order statistic and s_k the second order statistic of the probability function M(x_i | m_k, s_k) corresponding to noise type k; r_ik represents the probability that the noise type of the i-th speech to be denoised is k; / represents division, ( )^2 represents the quadratic term, sum represents the summation function, sum_j indicates summing over all noise types, and sum_i indicates summing the r_ik corresponding to all speech to be denoised.
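A minimal sketch of one round of the statistic updates in formulas two to four; note that the priors are additionally normalized to sum to 1, which is an assumption beyond the literal text of formula two. The feature values and posteriors below are hypothetical.

```python
import numpy as np

def update_noise_statistics(x, r):
    """One update of per-noise-type statistics from formulas two to four.
    x: (N,) features of collected speech frames; r: (N, K) posteriors r_ik.
    Formula two in the text reads p_k = sum_i r_ik; here the priors are
    additionally normalized so they form a valid prior (an assumption)."""
    weights = r.sum(axis=0)                                      # sum_i r_ik
    priors = weights / weights.sum()                             # normalized p_k
    means = (r * x[:, None]).sum(axis=0) / weights               # formula three
    variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / weights  # formula four
    return priors, means, variances

x = np.array([0.1, 0.2, 2.1, 1.9, 5.0])
r = np.array([[0.9, 0.1, 0.0],
              [0.8, 0.2, 0.0],
              [0.1, 0.9, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
p, m, s = update_noise_statistics(x, r)
print(p, m, s)
```

Alternating this update with the posterior of formula one gives the iterative optimization described in steps one to three.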
In a possible implementation manner, the computer device may input the speech to be noise-reduced into the probability distribution determination model, extract the frequency domain feature of the speech to be noise-reduced by the probability distribution determination model, and determine and output the probability distribution of the speech to be noise-reduced according to the frequency domain feature and different noise frequency domain distribution rules.
For example, the probability distribution determination model may perform the calculation processing of the above formula one to formula four on the input speech to be denoised, or may perform other calculations, which is not limited in the embodiment of the present disclosure.
In a possible implementation manner, the probability distribution determination model may be a quantized model, so that when the probability distribution determination model is used to process the speech to be denoised, the computation amount is small, the speed is high, and the processing efficiency is high.
Specifically, before the computer device inputs the speech to be denoised into the probability distribution determination model, the following steps one to two may be performed:
step one, training an initial model by computer equipment based on a plurality of voice samples to obtain a candidate probability distribution determination model.
Wherein, the voice sample comprises noise and voice signal. The plurality of voice samples can be collected in a specific environment, and the probability distribution determining model is trained to perform the probability distribution determining step on the voice to be denoised collected in the specific environment. The plurality of voice samples may also be collected in a plurality of environments, and the probability distribution determination model is obtained through training, so that the step of determining the probability distribution of the voice to be denoised collected in various environments is performed.
In one possible implementation, the training process of the probability distribution determination model may be: the computer equipment inputs the voice samples into an initial model, outputs the prediction probability distribution of each voice sample, adjusts the model parameters of the initial model based on the similarity between the prediction probability distribution and the target probability distribution of each voice sample, and stops adjusting until the target conditions are met to obtain the probability distribution determination model. The target condition may be that the accuracy of the probability distribution output by the model is greater than an accuracy threshold, or may be that the similarity converges, or the number of iterations reaches a target number, which is not limited in the embodiments of the present disclosure.
Of course, the above description only takes the case that the probability distribution determining model has the feature extraction function as an example, and the computer device may also input the frequency domain feature of the signal into the probability distribution determining model, and process the frequency domain feature by the probability distribution determining model to obtain the probability distribution, then the above process may be: the computer equipment extracts the frequency domain characteristics of the voice to be denoised, inputs the frequency domain characteristics into a probability distribution determining model, and determines and outputs the probability distribution of the voice to be denoised according to the frequency domain characteristics and different noise frequency domain distribution rules by the probability distribution determining model. The embodiment of the present disclosure does not limit what specific implementation manner is adopted.
In one possible implementation, the probability distribution determination model may be a deep neural network model; in particular, the deep neural network model may include two fully-connected layers, a Long Short Term Memory (LSTM) hidden layer, and a normalization (softmax) layer. The computer device may construct a training set, which may include the labeled noise and speech-signal frequency domain features of a plurality of speech samples, and train the probability distribution determination model based on the training set. In one possible implementation, the computer device may perform pre-noise reduction on the plurality of noises in the training set; the pre-noise reduction may be performed in various manners, for example an echo cancellation algorithm may be used, which is not limited by the embodiment of the disclosure. For example, fig. 3 is a schematic structural diagram of a noise detection deep neural network model provided by an embodiment of the present disclosure; referring to fig. 3, the model includes two fully-connected layers, an LSTM hidden layer (with temporal modeling), and a softmax layer.
The structure of the deep neural network model can be modified and replaced by a technician according to requirements, for example, the feature input dimension, the number of nodes, the type of hidden layer and the type of activation function of each layer can be changed according to requirements, which is not limited by the embodiment of the disclosure.
And secondly, the computer equipment carries out quantization processing on model parameters in the candidate probability distribution determination model based on the target quantization parameters to obtain the probability distribution determination model.
In the model quantization process, the original model can be compressed by reducing the number of bits required to represent each weight, so that the computing resources consumed are reduced and the computing speed of the model is improved. The target quantization parameter may refer to the target bit number to which the model quantization step intends to quantize the model, for example 16 bits or 8 bits; that is, 16-bit or 8-bit quantization may be performed on the candidate probability distribution determination model, so as to obtain different calculation data types. The computer device may perform step two based on multiple quantization methods, such as the TensorFlow quantization method, which is not limited by the embodiments of the present disclosure.
In one possible implementation, the target quantization method may include: for each hidden layer of the probability distribution determination model, the computer device obtains the ratio of the weight with the maximum absolute value in the weight matrix of the hidden layer to a target value as the quantization amplitude, and obtains the product of the quantization amplitude and any weight in the weight matrix as the quantized weight corresponding to that weight, so as to obtain the probability distribution determination model. The target value may be obtained from the target quantization parameter; for example, if the target quantization parameter is 8 bits, the target value may be 127.
In a specific possible implementation manner, the probability distribution determination model is a deep neural network model, and the computer device in step two may use a virtual Dynamic (Dummy Dynamic) quantization method, that is, the following model quantization is implemented by the following formula five and formula six:
the formula five is as follows: wscale=max{abs(W(i))}/127
Formula six: bscale=Wscale*Fscale
In formula five and formula six, W represents the weight matrix of any hidden layer in the probability distribution determination model, W(i) represents any weight in the weight matrix W, where i is the identifier of the weight; abs represents the absolute value function and max represents the maximum value function; W_scale is the quantization amplitude, b_scale represents the bias (residual vector) scale, and F_scale represents the input scale; * represents multiplication and / represents division.
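The max-abs scaling of formula five can be illustrated as follows. This is a sketch under the assumption that integer weights are obtained by dividing each weight by the quantization amplitude and rounding, with the amplitude kept for later dequantization; the example matrix is hypothetical.

```python
import numpy as np

def quantize_weights(W, target_bits=8):
    """Max-abs weight quantization sketch after formula five: the quantization
    amplitude is the largest-magnitude weight divided by the target value
    (127 for 8 bits); integers times the amplitude approximate the weights."""
    target_value = 2 ** (target_bits - 1) - 1        # 127 for 8 bits
    w_scale = np.abs(W).max() / target_value         # formula five
    W_q = np.round(W / w_scale).astype(np.int8)      # integer weight matrix
    return W_q, w_scale

W = np.array([[0.50, -1.27], [0.02, 0.90]])
W_q, scale = quantize_weights(W)
print(W_q, scale)
# W_q * scale dequantizes back to an approximation of the original weights
```

The per-layer amplitude keeps the largest weight representable exactly at the integer bound (here -127), which is why the ratio uses the maximum absolute weight.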
203. And the computer equipment acquires gain modulation information of the voice to be denoised based on the probability distribution of the voice to be denoised and the modulation gain function corresponding to each noise type.
The frequency domain characteristics of different types of noise are different, and using a corresponding noise reduction method can reduce noise more effectively and accurately. In step 203, when the computer device obtains the gain modulation information of the speech to be noise-reduced, the probability distribution of the speech to be noise-reduced and the modulation gain function corresponding to each noise type are both considered, so that if the types of noise in the speech to be noise-reduced are different, different gain modulation information may be obtained. In one possible implementation, the gain modulation information may include gain modulation values at different frequency points.
In one possible implementation, for each noise type, the computer device may obtain a product of a probability that a type of noise in the speech to be noise-reduced in the probability distribution is the noise type and a modulation gain function corresponding to the noise type, obtain a sum of the products corresponding to all the noise types, and use the sum as the gain modulation information of the speech to be noise-reduced.
In this way, the product is the combination of the probability that the type of noise in the speech to be noise-reduced is the noise type and the modulation gain function corresponding to the noise type, taking into account both, and the sum of the products corresponding to all the noise types comprehensively takes into account the fact that the noise in the speech to be noise-reduced is each of the plurality of noise types, so that the gain modulation information of the speech to be noise-reduced can be comprehensively and accurately represented.
In one possible implementation, the above process may be implemented by the following formula seven:
Formula seven: a_if = sum_k{ r_ik * A_k(f) }
In formula seven, a_if represents the gain modulation value at each frequency point f of the i-th speech to be denoised, where i is the identifier of the speech to be denoised and f is the frequency point identifier; sum represents the summation function; and A_k(f) represents the signal frequency domain modulation gain function for noise type k, where k is the noise type identifier.
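Formula seven is a probability-weighted blend of per-type gain curves; a minimal sketch with hypothetical probabilities and gain tables:

```python
import numpy as np

def mix_gain(r_i, gain_functions):
    """Gain modulation values a_if = sum_k r_ik * A_k(f) (formula seven).
    r_i: (K,) noise-type probabilities for one utterance;
    gain_functions: (K, F) per-type modulation gains at F frequency points."""
    return r_i @ gain_functions  # weighted sum over noise types

r_i = np.array([0.7, 0.3])                # two hypothetical noise types
A = np.array([[1.0, 0.5, 0.2],            # hypothetical A_k(f) tables
              [0.4, 0.8, 1.0]])
a = mix_gain(r_i, A)
print(a)  # per-frequency gains blended by noise-type probability
```

When one noise type dominates the probability distribution, the blended gains approach that type's own modulation gain function, which is the switching behavior described later for raindrop versus open-window noise.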
In one possible implementation, A_k(f) can be obtained by training a decision tree based on a Recurrent Neural Network (RNN), and the specific process can be implemented by the following formulas eight and nine.
The formula eight: g (Q)t,Af)=sumk{nk/Nt*H(Qk(Af))}
The formula is nine: a. thef=argmin(G(Qt,Af))
In formulas eight and nine, Q_t represents the set of signal frequency domain features at a node of the current decision tree; n_k represents the number of data items on branch k of the decision tree, where k is the branch identifier; N_t represents the total data size of the decision tree at time t, where t is the time stamp; A_f represents the gain corresponding to frequency point f, where f is the frequency point identifier; argmin() represents the function that minimizes a given functional; G(Q_t, A_f) represents the objective function of the RNN decision tree; and H() represents the objective function of the decision tree, which may take various forms, for example a data classification impurity function, which is not limited in the embodiment of the present disclosure.
In a specific possible implementation manner, after obtaining the probability distribution of the speech to be noise-reduced through the iterative optimization calculation of the formula one to formula four, the computer device may obtain the gain modulation value at each frequency point of the speech to be noise-reduced in the noise reduction algorithm based on the formula seven.
The above steps 201 to 203 are the steps in which the computer device acquires the speech to be noise-reduced and, based on it, obtains the gain modulation information of the speech to be noise-reduced. In a specific example, the process may be as shown in fig. 4; fig. 4 is a flow chart of a noise detection algorithm provided by the embodiment of the disclosure. In the flow chart, data, namely the speech to be denoised, is acquired by a sensor; the speech to be denoised is then input into a data validity detection model, which outputs a detection result indicating whether the speech to be denoised is valid and usable for identification. When the detection result is yes, noise or speech type detection can be performed and the probability distribution of the speech to be denoised is acquired; finally, the parameters of the noise reduction algorithm, namely the gain modulation information of the speech to be denoised, are updated based on the probability distribution.
The noise detection algorithm can effectively analyze noise types and optimize the noise reduction algorithm parameters, that is, obtain the gain modulation information of the speech to be noise-reduced, thereby enhancing the noise reduction effect. In particular, corresponding gain modulation information is obtained according to the judgment of different noise types, such as the sound of raindrops hitting vehicle windows and the noise when the vehicle windows are opened at high speed, so that the switching of noise reduction algorithm parameters is effectively controlled and the noise reduction effect is enhanced.
204. and the computer equipment carries out noise reduction on the voice to be subjected to noise reduction according to the gain modulation information.
In step 204, the computer device may perform noise reduction on the speech to be noise-reduced according to the gain modulation information obtained in the above step, considering that the speech to be noise-reduced includes different noise types of noise, so as to effectively improve the noise reduction effect.
Specifically, the computer device may perform noise reduction on the speech to be noise reduced in various ways, for example, the computer may use a noise reduction model or a noise reduction algorithm. The computer device implements a noise reduction process on the speech to be noise reduced through a noise reduction algorithm, and there are various noise reduction algorithms that can be used, for example, the LMS adaptive filter noise reduction, the basic spectral subtraction, and the like, which is not limited in this disclosure.
In a possible implementation manner, the computer device may obtain beamforming filtering information of the speech to be noise-reduced according to the gain modulation information, and perform filtering processing on the speech to be noise-reduced according to the beamforming filtering information.
The beamforming filter information may include various contents, for example, parameter information required for filtering processing may be included, and a beamforming filter vector may also be included, which is not limited in this disclosure.
The above process may have multiple implementation manners, and in one possible implementation manner, after the computer device obtains the beamforming filtering information of the speech to be noise-reduced acquired at the previous time according to the gain modulation information, the computer device may update the beamforming filtering information according to the speech to be noise-reduced at the current time and the speech to be noise-reduced acquired at the previous time, and obtain the updated beamforming filtering information, that is, the beamforming filtering information corresponding to the speech to be noise-reduced at the current time. The computer device may perform filtering processing on the voice to be denoised acquired at the current time based on the updated beamforming filtering information until the output of the filtering processing meets a target condition, and output the voice to be denoised after filtering. For example, the target condition may be convergence of an objective function, which is not limited by the embodiment of the present disclosure.
In one possible implementation, the above process may be implemented by a microphone array noise reduction algorithm. For the speech to be noise reduced, the computer device may represent the signal to be noise reduced as the following formula ten. The beamforming filter information may include beamforming iterative optimization parameters, such as S(K), g(K), and P(K), and the beamforming filter vector iteration algorithm shown in formula fourteen below. The process of updating the beamforming filter information according to the speech to be denoised at the current moment and the speech to be denoised acquired at the previous moment can be implemented by the following formulas eleven to thirteen. After the update is completed, the computer device may filter the speech to be denoised based on the updated beamforming filter information using formula fourteen. The target condition may be an objective function sum_k{ u_K^(K-k) * |y_k|^2 }; when the filter output brings the objective function into a convergence state, the computer device can output the denoised time domain audio signal Y(K), which can be used for speech recognition and voice wake-up. Formulas ten to fourteen are specifically:
Formula ten: y(K) = [ y(1,K), ..., y(F,K) ]^T
Formula eleven: g(K) = P(K-1)*y(K) / ( u_K + y(K)^T * P(K-1) * y(K) )
Formula twelve: P(K) = 1/u * ( P(K-1) - g(K)*y(K)^T * P(K-1) )
Formula thirteen: S(K) = [ v_s^T * P(K) * v_s ]^(-1)
Formula fourteen: W_w(K) = S(K)*A(K) / ( u_K * S(K-1) ) * { I - g(K)*y_k^T } * W_w(K-1)
In formulas ten to fourteen, y(K) is the frequency domain representation of the input signal at time K, i.e., the speech to be denoised; [ ]^T represents the transposition operation; K is the time identifier and f, F are frequency point identifiers; S(K), g(K), and P(K) represent the beamforming iterative optimization parameters; v_s represents the beam vector of the desired direction of the input signal; u_K represents the optimal restriction value; [ ]^(-1) represents matrix inversion; y_k represents the k-th number in the vector y(K); u_K^(K-k) denotes the (K-k)-th order term of u_K; A(K) represents the gain modulation information; W_w(K) represents the beamforming filter vector at time K and W_w(K-1) the beamforming filter vector at time (K-1), where K is a positive number.
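The recursive updates of formulas eleven to thirteen resemble a recursive least squares (RLS) iteration; the following is a rough sketch under that reading, with u treated as a forgetting/restriction factor and all dimensions and data hypothetical.

```python
import numpy as np

def update_beamformer(P_prev, y, v_s, u=0.98):
    """One recursive update of the beamforming statistics, following formulas
    eleven to thirteen (an RLS-style sketch; u is the forgetting/restriction
    factor and v_s the beam vector of the desired direction)."""
    g = P_prev @ y / (u + y.conj().T @ P_prev @ y)      # formula eleven
    P = (P_prev - np.outer(g, y.conj()) @ P_prev) / u   # formula twelve
    S = 1.0 / (v_s.conj().T @ P @ v_s)                  # formula thirteen (scalar case)
    return g, P, S

F = 4                           # number of frequency bins (hypothetical)
rng = np.random.default_rng(0)
P = np.eye(F)                   # initial inverse-correlation estimate
v_s = np.ones(F) / np.sqrt(F)   # desired-direction beam vector
for _ in range(5):              # iterate over successive snapshots
    y = rng.standard_normal(F)
    g, P, S = update_beamformer(P, y, v_s)
print(g.shape, P.shape, S)
```

In the full algorithm, g(K), P(K), and S(K) would then feed the filter vector update of formula fourteen, which also incorporates the gain modulation information A(K) from step 203.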
For example, the process may be as shown in fig. 5; fig. 5 is a flowchart of a noise suppression array signal processing algorithm provided by the embodiment of the present disclosure. In the flowchart, the frequency domain representation of the input signal at time K is first input; then, based on the speech to be denoised at multiple moments, the beamforming parameters g(K), P(K), and B(K) are updated; based on the updated beamforming parameters and the beam vector W_w(K-1) at time (K-1), the beam vector W_w(K) at time K is obtained; gain modulation information is then applied based on the beam vector at time K and frequency domain gain control, that is, the gain modulation information obtained in step 203, to obtain the denoised audio signal at time K. When the objective function corresponding to the denoised audio signal at time K is not in a convergence state, the process is repeated until the objective function converges, and the denoised audio signal at time K is output.
In a possible implementation manner, the beamforming algorithm may perform similar optimization calculation in the time domain instead of performing optimization calculation in the signal frequency domain, which is not limited in the embodiment of the present disclosure.
In the above possible specific implementation manner, the computer device can effectively eliminate and suppress the specified noise type by using the microphone array noise reduction algorithm, and meanwhile the gain modulation information can effectively ensure that the speech is not excessively distorted, so that the wake-up rate and the recognition rate of the intelligent speech system can be effectively improved.
In a possible implementation manner, the deep neural network model in step 202 may be a deep neural network model for a vehicle-mounted environment. After obtaining the probability distribution of the speech to be denoised based on the deep neural network model, the computer device performs the above steps 203 to 204, wakes up the intelligent speech system using the speech denoised in step 204, and recognizes the speech using the intelligent speech system. This process can provide an effective noise elimination and suppression method for a vehicle machine carrying the intelligent speech system, effectively improving the wake-up rate and recognition rate of the intelligent speech system. In addition, the computer device can also train the deep neural network model on vehicle-mounted signal data in a specific noise environment, and can continuously add suppression support for new types of special noise, thereby ensuring the continuous iterative performance of the method. The method also has generalization capability and can adapt to more similar noise reduction scenes, such as other vehicles (airplanes, ships) and home environments (television and washing machine noise suppression); the specific method may be to train a corresponding deep neural network model with voice samples collected in those similar noise reduction scenes, and then perform the voice noise reduction process of steps 201 to 204 based on that deep neural network model.
According to the method provided by the embodiment of the disclosure, gain modulation information of the voice to be denoised is obtained according to the probability distribution of the voice to be denoised and the modulation gain function corresponding to each noise type, and then the voice to be denoised is denoised according to the gain modulation information. Because the noise time domain characteristics, the frequency domain characteristics and the like of different noise types are different, different noise reduction processes are needed, compared with the method for reducing the noise of the voice to be reduced containing different types of noise by using the same noise reduction algorithm, the voice noise reduction method distinguishes the noise reduction processes of the voice to be reduced containing different types of noise, reduces noise more pertinently, improves the noise reduction effect, and is beneficial to improving the awakening rate and the recognition rate of the voice after noise reduction.
Fig. 6 is a schematic structural diagram of a speech noise reduction apparatus provided in an embodiment of the present disclosure, and referring to fig. 6, the apparatus includes:
a determining module 601, configured to determine, according to a frequency domain feature of a speech to be noise-reduced, a probability distribution of the speech to be noise-reduced, where the probability distribution is used to indicate a probability that a type of noise in the speech to be noise-reduced is each of multiple noise types;
an obtaining module 602, configured to obtain gain modulation information of the speech to be noise reduced based on the probability distribution of the speech to be noise reduced and a modulation gain function corresponding to each noise type;
and a noise reduction module 603, configured to perform noise reduction on the speech to be noise reduced according to the gain modulation information.
In one possible implementation, the determining module 601 is further configured to:
determining a probability function corresponding to each noise type corresponding to the voice to be denoised based on the frequency domain characteristics of the voice to be denoised;
obtaining the prior probability corresponding to each noise type;
and obtaining the probability that the type of noise in the voice to be denoised is the type of each noise according to the probability function corresponding to each noise type corresponding to the voice to be denoised and the prior probability, and obtaining the probability distribution of the voice to be denoised.
In one possible implementation, the obtaining module 602 is further configured to:
for each noise type, obtaining the product of a probability function corresponding to the noise type and a prior probability;
acquiring the sum of the products corresponding to all the noise types;
and acquiring the ratio between the product corresponding to the noise type and the sum, and taking the ratio as the probability that the type of the noise in the voice to be denoised is the noise type.
In one possible implementation, the determining module 601 is further configured to:
and determining a probability function and a prior probability corresponding to the noise type based on the voice to be subjected to noise reduction and at least one historical voice to be subjected to noise reduction.
In one possible implementation, the obtaining module 602 is further configured to:
for each noise type, acquiring the probability that the type of noise in the voice to be denoised in the probability distribution is the product of the probability of the noise type and a modulation gain function corresponding to the noise type;
and acquiring the sum of the products corresponding to all the noise types, and taking the sum as the gain modulation information of the voice to be denoised.
In one possible implementation, the determining module 601 is further configured to;
inputting the voice to be denoised into a probability distribution determining model, extracting the frequency domain characteristics of the voice to be denoised by the probability distribution determining model, and determining and outputting the probability distribution of the voice to be denoised according to the frequency domain characteristics and different noise frequency domain distribution rules.
In one possible implementation, the obtaining module 602 is further configured to:
training the initial model based on a plurality of voice samples to obtain a candidate probability distribution determination model;
and based on the target quantization parameter, carrying out quantization processing on the model parameter in the candidate probability distribution determination model to obtain the probability distribution determination model.
In one possible implementation, the obtaining module 602 is further configured to:
for each hidden layer of the candidate probability distribution determination model, acquiring, as the quantization amplitude, the ratio of the weight with the largest absolute value in the weight matrix of the hidden layer to a target value;
and obtaining the product of the quantization amplitude and each weight in the weight matrix as the quantized weight corresponding to that weight, so as to obtain the probability distribution determination model.
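A sketch of one common reading of this scheme: per-layer max-abs quantization, where the amplitude is the largest absolute weight divided by a target value, and each weight is stored as an integer multiple of that amplitude. The helper names and the target value 127 (the int8 range) are assumptions, not taken from the source:

```python
import numpy as np

def quantize_layer(weights, target=127.0):
    # Quantization amplitude: ratio of the largest-magnitude weight in the
    # layer's weight matrix to the target value.
    amplitude = np.max(np.abs(weights)) / target
    # Store each weight as an integer multiple of the amplitude.
    q = np.round(weights / amplitude).astype(np.int8)
    return q, amplitude

def dequantize(q, amplitude):
    # The product of the quantization amplitude and each stored weight
    # recovers an approximation of the original weight.
    return q.astype(np.float32) * amplitude

q, amp = quantize_layer(np.array([-1.0, 0.5, 0.25]))
```

Storing int8 weights plus one float amplitude per layer shrinks the model roughly fourfold versus float32, which suits on-device deployment of the probability distribution determination model.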
In one possible implementation,
the obtaining module 602 is further configured to obtain beamforming filtering information of the voice to be denoised according to the gain modulation information;
and the device further comprises a filtering module, configured to perform filtering processing on the voice to be denoised according to the beamforming filtering information.
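A single-channel stand-in for this filtering step: multiply the spectrum of a frame by a per-bin gain derived from the gain modulation information and transform back. (Actual beamforming combines multiple microphone channels; that multi-channel step is not reproduced in this sketch.)

```python
import numpy as np

def apply_gain_filter(frame, gain):
    # Per-bin spectral gain: transform the frame to the frequency domain,
    # scale each bin by the gain, and transform back.
    spec = np.fft.rfft(frame)
    return np.fft.irfft(spec * gain, n=len(frame))

# A unit gain leaves the frame unchanged; a zero gain silences it.
frame = np.ones(8)
out = apply_gain_filter(frame, np.ones(5))  # 8-sample frame -> 5 rfft bins
```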
According to the device provided by this embodiment of the disclosure, gain modulation information of the voice to be denoised is obtained according to the probability distribution of the voice to be denoised and the modulation gain function corresponding to each noise type, and the voice is then denoised according to that gain modulation information. Because noises of different types differ in time domain characteristics, frequency domain characteristics, and so on, they call for different noise reduction processing. Compared with applying the same noise reduction algorithm to every voice, this approach takes into account that the voice to be denoised may contain noise of different types, and obtains different gain modulation information accordingly. The noise reduction process is thus tailored to each voice, the noise is reduced in a more targeted way, the noise reduction effect is improved, and the wake-up rate and recognition rate on the denoised voice are improved as well.
It should be noted that the division into the functional modules described above is merely illustrative of how the voice noise reduction device of the above embodiment performs noise reduction. In practical applications, the above functions may be distributed among different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the voice noise reduction device and the voice noise reduction method provided by the above embodiments belong to the same concept; their specific implementation processes are detailed in the method embodiments and are not described here again.
Fig. 7 is a schematic structural diagram of a terminal according to an embodiment of the present disclosure. The terminal 700 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 700 may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal. The terminal may also be an embedded intelligent voice device installed in a central control unit.
In general, terminal 700 includes: one or more processors 701 and one or more memories 702.
The processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 704, a display screen 705, a camera assembly 706, an audio circuit 707, a positioning component 708, and a power source 709.
The peripheral interface 703 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 701 and the memory 702. In some embodiments, processor 701, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 704 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 704 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuits, which are not limited by this disclosure.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 also has the ability to capture touch signals on or over the surface of the display screen 705. The touch signal may be input to the processor 701 as a control signal for processing. At this point, the display 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 705 may be one, providing the front panel of the terminal 700; in other embodiments, the display 705 can be at least two, respectively disposed on different surfaces of the terminal 700 or in a folded design; in still other embodiments, the display 705 may be a flexible display disposed on a curved surface or on a folded surface of the terminal 700. Even more, the display 705 may be arranged in a non-rectangular irregular pattern, i.e. a shaped screen. The Display 705 may be made of LCD (liquid crystal Display), OLED (Organic Light-Emitting Diode), or the like.
The camera assembly 706 is used to capture images or video. Optionally, camera assembly 706 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 706 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing or inputting the electric signals to the radio frequency circuit 704 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 700. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 707 may also include a headphone jack.
The positioning component 708 is used to locate the current geographic position of the terminal 700 to implement navigation or LBS (Location Based Service). The positioning component 708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
In some embodiments, terminal 700 also includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyro sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.
The acceleration sensor 711 can detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the terminal 700. For example, the acceleration sensor 711 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 701 may control the display screen 705 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 711. The acceleration sensor 711 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 712 may detect a body direction and a rotation angle of the terminal 700, and the gyro sensor 712 may cooperate with the acceleration sensor 711 to acquire a 3D motion of the terminal 700 by the user. From the data collected by the gyro sensor 712, the processor 701 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 713 may be disposed on a side frame of terminal 700 and/or underneath display 705. When the pressure sensor 713 is disposed on a side frame of the terminal 700, a user's grip signal on the terminal 700 may be detected, and the processor 701 performs right-left hand recognition or shortcut operation according to the grip signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at a lower layer of the display screen 705, the processor 701 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 705. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 714 is used for collecting a fingerprint of a user, and the processor 701 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 identifies the identity of the user according to the collected fingerprint. When the user identity is identified as a trusted identity, the processor 701 authorizes the user to perform relevant sensitive operations, including unlocking a screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 714 may be disposed on the front, back, or side of the terminal 700. When a physical button or a vendor Logo is provided on the terminal 700, the fingerprint sensor 714 may be integrated with the physical button or the vendor Logo.
The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the display screen 705 based on the ambient light intensity collected by the optical sensor 715. Specifically, when the ambient light intensity is high, the display brightness of the display screen 705 is increased; when the ambient light intensity is low, the display brightness of the display screen 705 is adjusted down. In another embodiment, processor 701 may also dynamically adjust the shooting parameters of camera assembly 706 based on the ambient light intensity collected by optical sensor 715.
A proximity sensor 716, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 700. The proximity sensor 716 is used to measure the distance between the user and the front surface of the terminal 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front surface of the terminal 700 gradually decreases, the processor 701 controls the display 705 to switch from the screen-on state to the screen-off state; when the proximity sensor 716 detects that this distance gradually increases, the processor 701 controls the display 705 to switch from the screen-off state back to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 7 is not intended to be limiting of terminal 700 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present disclosure. The server 800 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 801 and one or more memories 802, where the one or more memories 802 store at least one instruction that is loaded and executed by the one or more processors 801 to implement the voice noise reduction method of the above method embodiments. Of course, the server 800 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and may include other components for implementing device functions, which are not described here again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory, is also provided that includes instructions executable by a processor to perform the speech noise reduction method of the above embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing is considered as illustrative of the embodiments of the disclosure and is not to be construed as limiting thereof, and any modifications, equivalents, improvements and the like made within the spirit and principle of the disclosure are intended to be included within the scope of the disclosure.
Claims (12)
1. A method for speech noise reduction, the method comprising:
determining probability distribution of the voice to be subjected to noise reduction according to the frequency domain characteristics of the voice to be subjected to noise reduction, wherein the probability distribution is used for representing the probability that the type of noise in the voice to be subjected to noise reduction is each of a plurality of noise types;
acquiring gain modulation information of the voice to be denoised based on the probability distribution of the voice to be denoised and a modulation gain function corresponding to each noise type;
and denoising the voice to be denoised according to the gain modulation information.
2. The method according to claim 1, wherein the determining the probability distribution of the speech to be noise-reduced according to the frequency domain characteristics of the speech to be noise-reduced comprises:
determining a probability function corresponding to each noise type corresponding to the voice to be denoised based on the frequency domain characteristics of the voice to be denoised;
obtaining the prior probability corresponding to each noise type;
and acquiring, according to the probability function corresponding to each noise type corresponding to the voice to be denoised and the prior probability, the probability that the noise in the voice to be denoised is of each noise type, so as to obtain the probability distribution of the voice to be denoised.
3. The method according to claim 2, wherein the obtaining, according to the probability function corresponding to each noise type corresponding to the speech to be noise-reduced and the prior probability, the probability that the type of noise in the speech to be noise-reduced is the noise type comprises:
for each noise type, obtaining the product of a probability function corresponding to the noise type and a prior probability;
acquiring the sum of the products corresponding to all the noise types;
and acquiring a ratio between the product corresponding to the noise type and the sum, and taking the ratio as the probability that the type of the noise in the voice to be denoised is the noise type.
4. The method of claim 2, further comprising:
and determining a probability function and a prior probability corresponding to the noise type based on the voice to be subjected to noise reduction and at least one historical voice to be subjected to noise reduction.
5. The method according to claim 1, wherein the obtaining gain modulation information of the speech to be noise-reduced based on the probability distribution of the speech to be noise-reduced and a modulation gain function corresponding to each noise type comprises:
for each noise type, acquiring the product of the probability, in the probability distribution, that the noise in the voice to be denoised is of that noise type and the modulation gain function corresponding to that noise type;
and acquiring the sum of the products corresponding to all the noise types, and taking the sum as the gain modulation information of the voice to be denoised.
6. The method according to claim 1, wherein the determining the probability distribution of the speech to be noise-reduced according to the frequency domain characteristics of the speech to be noise-reduced comprises:
inputting the voice to be denoised into a probability distribution determination model, where the probability distribution determination model extracts the frequency domain characteristics of the voice to be denoised and determines and outputs the probability distribution of the voice to be denoised according to the frequency domain characteristics and the frequency domain distribution rules of different noises.
7. The method of claim 6, wherein the obtaining of the probability distribution determination model comprises:
training the initial model based on a plurality of voice samples to obtain a candidate probability distribution determination model;
and based on the target quantization parameter, carrying out quantization processing on the model parameter in the candidate probability distribution determination model to obtain the probability distribution determination model.
8. The method according to claim 7, wherein the quantizing the model parameters in the candidate probability distribution determination model based on the target quantization parameters to obtain the probability distribution determination model comprises:
for each hidden layer of the candidate probability distribution determination model, acquiring, as the quantization amplitude, the ratio of the weight with the largest absolute value in the weight matrix of the hidden layer to a target value;
and obtaining the product of the quantization amplitude and each weight in the weight matrix as the quantized weight corresponding to that weight, so as to obtain the probability distribution determination model.
9. The method according to claim 1, wherein the denoising the speech to be denoised according to the gain modulation information comprises:
acquiring beamforming filtering information of the voice to be denoised according to the gain modulation information;
and performing filtering processing on the voice to be denoised according to the beamforming filtering information.
10. A speech noise reduction apparatus, characterized in that the apparatus comprises a plurality of functional modules for performing the speech noise reduction method of any one of claims 1 to 9.
11. A computer device comprising one or more processors and one or more memories having stored therein at least one instruction that is loaded and executed by the one or more processors to perform operations performed by the voice noise reduction method of any of claims 1-9.
12. A computer-readable storage medium having stored therein at least one instruction, which is loaded and executed by a processor to perform operations performed by the speech noise reduction method of any of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911330762.7A CN110970050B (en) | 2019-12-20 | 2019-12-20 | Voice noise reduction method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911330762.7A CN110970050B (en) | 2019-12-20 | 2019-12-20 | Voice noise reduction method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110970050A true CN110970050A (en) | 2020-04-07 |
CN110970050B CN110970050B (en) | 2022-07-15 |
Family
ID=70035539
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911330762.7A Active CN110970050B (en) | 2019-12-20 | 2019-12-20 | Voice noise reduction method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110970050B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112365900A (en) * | 2020-10-30 | 2021-02-12 | 北京声智科技有限公司 | Voice signal enhancement method, device, medium and equipment |
CN112700789A (en) * | 2021-03-24 | 2021-04-23 | 深圳市中科蓝讯科技股份有限公司 | Noise detection method, nonvolatile readable storage medium and electronic device |
CN113077803A (en) * | 2021-03-16 | 2021-07-06 | 联想(北京)有限公司 | Voice processing method and device, readable storage medium and electronic equipment |
CN118230767A (en) * | 2024-05-22 | 2024-06-21 | 深圳市创达电子有限公司 | USB audio optimization method and system with self-adaptive sound environment |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1082287A (en) * | 1992-06-29 | 1994-02-16 | 莫托罗拉公司 | Eliminate the method and apparatus of spreadspectrum noise |
CN101399039A (en) * | 2007-09-30 | 2009-04-01 | 华为技术有限公司 | Method and device for determining non-noise audio signal classification |
CN102804260A (en) * | 2009-06-19 | 2012-11-28 | 富士通株式会社 | Audio signal processing device and audio signal processing method |
CN103473231A (en) * | 2012-06-06 | 2013-12-25 | 深圳先进技术研究院 | Classifier building method and system |
CN104021796A (en) * | 2013-02-28 | 2014-09-03 | 华为技术有限公司 | Voice enhancement processing method and device |
CN104966517A (en) * | 2015-06-02 | 2015-10-07 | 华为技术有限公司 | Voice frequency signal enhancement method and device |
CN106611181A (en) * | 2016-05-30 | 2017-05-03 | 四川用联信息技术有限公司 | Method for constructing cost-sensitive two-dimensional decision tree |
CN106611180A (en) * | 2016-05-25 | 2017-05-03 | 四川用联信息技术有限公司 | Decision tree classifier construction method based on test cost |
FR3052319A1 (en) * | 2016-06-02 | 2017-12-08 | Parrot Drones | MICRO / HELMET AUDIO COMBINATION COMPRISING MULTIPLE VOICE ACTIVITY DETECTING MEANS WITH SUPERVISING CLASSIFIER. |
CN109102822A (en) * | 2018-07-25 | 2018-12-28 | 出门问问信息科技有限公司 | A kind of filtering method and device formed based on fixed beam |
CN109902801A (en) * | 2019-01-22 | 2019-06-18 | 华中科技大学 | A kind of flood DATA PROCESSING IN ENSEMBLE PREDICTION SYSTEM method based on variation reasoning Bayesian neural network |
CN110164467A (en) * | 2018-12-18 | 2019-08-23 | 腾讯科技(深圳)有限公司 | The method and apparatus of voice de-noising calculate equipment and computer readable storage medium |
- 2019-12-20: CN201911330762.7A granted as CN110970050B (status: Active)
Non-Patent Citations (2)
Title |
---|
LONG-SHENG CHEN ET AL.: "A study on review manipulation classification using decision tree", 《2013 IEEE》 * |
YANG Jianfeng et al.: "A Survey of Machine Learning Classification Problems and Algorithms", Statistics & Decision * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112365900A (en) * | 2020-10-30 | 2021-02-12 | 北京声智科技有限公司 | Voice signal enhancement method, device, medium and equipment |
CN113077803A (en) * | 2021-03-16 | 2021-07-06 | 联想(北京)有限公司 | Voice processing method and device, readable storage medium and electronic equipment |
CN113077803B (en) * | 2021-03-16 | 2024-01-23 | 联想(北京)有限公司 | Voice processing method and device, readable storage medium and electronic equipment |
CN112700789A (en) * | 2021-03-24 | 2021-04-23 | 深圳市中科蓝讯科技股份有限公司 | Noise detection method, nonvolatile readable storage medium and electronic device |
CN118230767A (en) * | 2024-05-22 | 2024-06-21 | 深圳市创达电子有限公司 | USB audio optimization method and system with self-adaptive sound environment |
CN118230767B (en) * | 2024-05-22 | 2024-08-20 | 深圳市创达电子有限公司 | USB audio optimization method and system with self-adaptive sound environment |
Also Published As
Publication number | Publication date |
---|---|
CN110970050B (en) | 2022-07-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108615526B (en) | Method, device, terminal and storage medium for detecting keywords in voice signal | |
CN110970050B (en) | Voice noise reduction method, device, equipment and medium | |
CN111933112B (en) | Awakening voice determination method, device, equipment and medium | |
CN111696570B (en) | Voice signal processing method, device, equipment and storage medium | |
CN111696532B (en) | Speech recognition method, device, electronic equipment and storage medium | |
CN112633306B (en) | Method and device for generating countermeasure image | |
CN111325699B (en) | Image restoration method and training method of image restoration model | |
CN109887494B (en) | Method and apparatus for reconstructing a speech signal | |
CN108288032B (en) | Action characteristic acquisition method, device and storage medium | |
CN112233689B (en) | Audio noise reduction method, device, equipment and medium | |
CN111063342A (en) | Speech recognition method, speech recognition device, computer equipment and storage medium | |
CN110991457B (en) | Two-dimensional code processing method and device, electronic equipment and storage medium | |
CN110503160B (en) | Image recognition method and device, electronic equipment and storage medium | |
CN111738365B (en) | Image classification model training method and device, computer equipment and storage medium | |
CN112233688B (en) | Audio noise reduction method, device, equipment and medium | |
CN111341307A (en) | Voice recognition method and device, electronic equipment and storage medium | |
CN111613213A (en) | Method, device, equipment and storage medium for audio classification | |
CN113343709B (en) | Method for training intention recognition model, method, device and equipment for intention recognition | |
CN118135255A (en) | Training method of image matching model, image matching method and computer equipment | |
CN114897158A (en) | Training method of data processing model, data processing method, device and equipment | |
CN113920979A (en) | Voice data acquisition method, device, equipment and computer readable storage medium | |
CN111916105B (en) | Voice signal processing method, device, electronic equipment and storage medium | |
CN112837552A (en) | Voice broadcasting method and device and computer readable storage medium | |
CN114283827B (en) | Audio dereverberation method, device, equipment and storage medium | |
CN115166633B (en) | Sound source direction determining method, device, terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||