CN111883164B - Model training method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111883164B
Authority
CN
China
Prior art keywords
characteristic information
audio data
model
threshold value
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010575643.4A
Other languages
Chinese (zh)
Other versions
CN111883164A (en)
Inventor
张旭
郑羲光
张晨
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010575643.4A
Publication of CN111883164A
Application granted
Publication of CN111883164B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L21/0232: Processing in the frequency domain
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present disclosure relates to a model training method, apparatus, electronic device, and storage medium. The training method includes obtaining a plurality of sample data and, for each sample, determining first characteristic information and amplitude characteristic information of the noisy audio data at each sampling point from the original audio data and the noisy audio data in the sample. The first characteristic information is adjusted to obtain target characteristic information, and the model to be trained is trained on the target characteristic information and the corresponding amplitude characteristic information to obtain a trained model. When the trained model denoises audio data, its denoising intensity is enhanced in a lower signal-to-noise-ratio range and reduced in a higher signal-to-noise-ratio range, so that the trained model achieves different denoising effects for audio data in different signal-to-noise-ratio ranges.

Description

Model training method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular relates to a model training method, a device, electronic equipment and a storage medium.
Background
With the development of computer technology, neural networks are increasingly used to process audio data, and they often achieve better results and performance than traditional algorithms. To denoise audio data, a model to be trained is first trained to obtain a trained model, and the audio data is then denoised by the trained model to obtain audio data from which the noise has been removed.
During training of the model to be trained, the sample data are random: a sample may be audio data with a low signal-to-noise ratio or audio data with a high signal-to-noise ratio. A model trained on such random sample data cannot denoise audio data in different signal-to-noise-ratio ranges with different denoising intensities.
Disclosure of Invention
The disclosure provides a model training method, apparatus, electronic device, and storage medium to at least solve the problem that, when denoising audio data, a model cannot apply different denoising intensities to audio data in different signal-to-noise-ratio ranges.
The technical scheme of the present disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a model training method, including:
Acquiring a plurality of sample data, wherein each sample data comprises original audio data and noisy audio data;
according to the original audio data and the noisy audio data, determining first characteristic information and amplitude characteristic information of the noisy audio data in each sample data at each sampling point, wherein the first characteristic information is used for representing signal-to-noise ratio information of the noisy audio data at the corresponding sampling point;
adjusting the first characteristic information to obtain target characteristic information, which comprises reducing the first characteristic information when it is less than or equal to a first threshold value and increasing the first characteristic information when it is greater than or equal to a second threshold value, wherein the first threshold value is smaller than the second threshold value;
inputting the amplitude characteristic information into a model to be trained to obtain second characteristic information output by the model to be trained;
acquiring a loss value of the model to be trained according to the second characteristic information and the target characteristic information;
and adjusting model parameters of the model to be trained according to the loss value, and taking the model to be trained as a trained model when the loss value is smaller than or equal to a preset threshold value.
Optionally, the adjusting the first feature information to obtain target feature information includes:
when the first characteristic information is smaller than or equal to the first threshold value, the first characteristic information is reduced to be lower than a third threshold value;
and when the first characteristic information is larger than or equal to the second threshold value, the first characteristic information is increased to be larger than a fourth threshold value.
Optionally, the method further comprises: when the first characteristic information is greater than the first threshold value and less than the second threshold value, adjusting the first characteristic information to a value between a fifth threshold value and a sixth threshold value, where the fifth threshold value is smaller than the sixth threshold value.
Optionally, the adjusting the first feature information to obtain target feature information includes: and adjusting the first characteristic information through a mapping function to obtain the target characteristic information.
Optionally, the first characteristic information is a ratio between an amplitude value of the original audio data corresponding to the sampling point and an amplitude value of the noisy audio data, and the first characteristic information is less than or equal to 1.
Optionally, the determining, according to the original audio data and the noisy audio data, the first feature information and the amplitude feature information of the noisy audio data in each sample data at each sampling point includes:
Converting original audio data in target sample data into a first frequency domain signal, and converting noisy audio data in the target sample data into a second frequency domain signal; the target sample data is any one sample data of the plurality of sample data;
and determining first characteristic information and amplitude characteristic information of the noisy audio data in the target sample data at each sampling point according to the first frequency domain signal and the second frequency domain signal.
According to a second aspect of embodiments of the present disclosure, there is provided a model training apparatus, comprising:
a first acquisition module configured to acquire a plurality of sample data, each of the sample data including original audio data and noisy audio data;
the determining module is configured to determine first characteristic information and amplitude characteristic information of the noisy audio data in each sample data at each sampling point according to the original audio data and the noisy audio data, wherein the first characteristic information is used for representing signal-to-noise ratio information of the noisy audio data at the corresponding sampling point;
the first adjustment module is configured to adjust the first characteristic information to obtain target characteristic information, wherein the first characteristic information is reduced when it is less than or equal to a first threshold value and increased when it is greater than or equal to a second threshold value, the first threshold value being smaller than the second threshold value;
the input module is configured to input the amplitude characteristic information into a model to be trained to obtain second characteristic information output by the model to be trained;
the second acquisition module is configured to acquire a loss value of the model to be trained according to the second characteristic information and the target characteristic information;
and the second adjustment module is configured to adjust the model parameters of the model to be trained according to the loss value until the loss value is less than or equal to a preset threshold value, at which point the model to be trained is taken as the trained model.
Optionally, the first adjustment module is specifically configured to reduce the first characteristic information to be below a third threshold when the first characteristic information is less than or equal to the first threshold; and when the first characteristic information is larger than or equal to the second threshold value, the first characteristic information is increased to be larger than a fourth threshold value.
Optionally, the first adjustment module is further configured to adjust the first characteristic information to a value between a fifth threshold value and a sixth threshold value when the first characteristic information is greater than the first threshold value and less than the second threshold value, where the fifth threshold value is less than the sixth threshold value.
Optionally, the first adjustment module is specifically configured to adjust the first feature information through a mapping function, so as to obtain the target feature information.
Optionally, the first characteristic information is a ratio between an amplitude value of the original audio data corresponding to the sampling point and an amplitude value of the noisy audio data, and the first characteristic information is less than or equal to 1.
Optionally, the determining module is specifically configured to convert the original audio data in the target sample data into a first frequency domain signal, and convert the noisy audio data in the target sample data into a second frequency domain signal, the target sample data being any one sample data of the plurality of sample data; and determine first characteristic information and amplitude characteristic information of the noisy audio data in the target sample data at each sampling point according to the first frequency domain signal and the second frequency domain signal.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the model training method as provided in the first aspect of the embodiments of the present disclosure described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the model training method as provided in the first aspect of embodiments of the present disclosure described above.
According to a fifth aspect of embodiments of the present disclosure there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the model training method as provided in the first aspect of embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
in this embodiment, a plurality of sample data are acquired, and first characteristic information and amplitude characteristic information of noisy audio data in each sample data at each sampling point are determined according to original audio data and noisy audio data in the sample data. And adjusting the first characteristic information to obtain target characteristic information, inputting the amplitude characteristic information into the model to be trained to obtain second characteristic information output by the model to be trained, and obtaining a loss value of the model to be trained according to the second characteristic information and the target characteristic information. And adjusting model parameters of the model to be trained according to the loss value, and taking the model to be trained as a trained model when the loss value is smaller than or equal to a preset threshold value. When the trained model is used for denoising the audio data, the denoising intensity of the model can be enhanced in a lower signal-to-noise ratio range, the denoising intensity of the model can be reduced in a higher signal-to-noise ratio range, and the trained model can obtain different denoising effects for the audio data in different signal-to-noise ratio ranges.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a flow chart illustrating a model training method according to an exemplary embodiment;
FIG. 2 is a flow chart illustrating another model training method according to an exemplary embodiment;
FIG. 3 is a graph of a first mapping function;
FIG. 4 is a graph of a second mapping function;
FIG. 5 is a block diagram of a model training apparatus, according to an example embodiment;
FIG. 6 is a block diagram of an electronic device, according to an exemplary embodiment;
FIG. 7 is a block diagram of yet another electronic device, according to an exemplary embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Fig. 1 is a flowchart of a model training method according to an exemplary embodiment. Referring to Fig. 1, the model training method provided in this embodiment is applicable to denoising audio data, so that the trained model applies different denoising intensities to audio data in different signal-to-noise-ratio ranges. The method may be executed by a model training apparatus, typically implemented in software and/or hardware and disposed in an electronic device, and may include:
step 101, a plurality of sample data are acquired.
In this embodiment, each sample data includes original audio data and noisy audio data, where the noisy audio data is synthesized from the original audio data and noise data. For the specific process of acquiring sample data, reference may be made to the prior art, which is not limited in this embodiment.
Step 102, according to the original audio data and the noisy audio data, determining first characteristic information and amplitude characteristic information of the noisy audio data in each sample data at each sampling point.
The first characteristic information is used for representing signal-to-noise ratio information of the noisy audio data at the corresponding sampling point, and the amplitude characteristic information is an amplitude value of the noisy audio data corresponding to the sampling point.
In this embodiment, after the sample data is obtained, the first characteristic information and the corresponding amplitude characteristic information of the noisy audio data at each sampling point may be determined from the original audio data and the noisy audio data included in the sample data; that is, the signal-to-noise ratio and the amplitude value of the noisy audio data at each sampling point may be determined. The first characteristic information may be the ratio between the amplitude value of the original audio data and the amplitude value of the noisy audio data at the corresponding sampling point, and may be set to be less than or equal to 1.
By way of example, step 102 may be implemented as follows:
converting the original audio data in the target sample data into a first frequency domain signal, and converting the noisy audio data in the target sample data into a second frequency domain signal; the target sample data is any one sample data among a plurality of sample data;
and determining first characteristic information and amplitude characteristic information of the noisy audio data in the target sample data at each sampling point according to the first frequency domain signal and the second frequency domain signal.
In this embodiment, the same Fourier transform may be applied to the original audio data and the noisy audio data included in the sample data, converting each from a time-domain signal to a frequency-domain signal and thereby obtaining information such as their amplitude values and frequency values. For example, the original audio data may be converted into a first frequency-domain signal containing sinusoidal components at 3 kHz, 5 kHz, 7 kHz, and 9 kHz, and the noisy audio data into a second frequency-domain signal containing sinusoidal components at the same frequencies. The ratio between the amplitude value of the first frequency-domain signal and that of the second frequency-domain signal is then calculated at each common frequency point (sampling point) to obtain the corresponding first characteristic information, and the amplitude value of the second frequency-domain signal is taken as the corresponding amplitude characteristic information. At the 3 kHz sampling point, for instance, the ratio between the two amplitude values may yield first characteristic information of 0.3, and the amplitude value of the second frequency-domain signal at 3 kHz is the amplitude characteristic information corresponding to that first characteristic information. Similarly, first characteristic information of 0.4, 0.6, and 0.9 may be calculated for the sampling points at 5 kHz, 7 kHz, and 9 kHz, respectively, together with the amplitude characteristic information corresponding to each.
When performing the Fourier transform on the original audio data and the noisy audio data, the transform parameters may be selected as required; for the specific Fourier transform process, reference may be made to the prior art, which is not limited in this embodiment.
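As a concrete illustration of this feature-extraction step, the following minimal NumPy sketch applies the same FFT to a clean frame and a noisy frame and computes the per-bin ratio (the first characteristic information, capped at 1 as in the disclosure) together with the noisy amplitude values (the amplitude characteristic information). The function name, FFT size, tone frequency, and noise level are illustrative assumptions, not values taken from the disclosure.

```python
import numpy as np

def extract_features(clean, noisy, n_fft=256):
    """Per-bin first characteristic information (ratio) and amplitude
    characteristic information for one frame of clean/noisy audio."""
    # Same Fourier transform applied to both signals, giving the first
    # and second frequency-domain signals of the text.
    clean_mag = np.abs(np.fft.rfft(clean, n_fft))
    noisy_mag = np.abs(np.fft.rfft(noisy, n_fft))
    # First characteristic information: ratio of clean to noisy amplitude
    # per sampling point, capped at 1 (eps avoids division by zero).
    eps = 1e-12
    ratio = np.minimum(clean_mag / (noisy_mag + eps), 1.0)
    return ratio, noisy_mag  # (first characteristic info, amplitude info)

# Toy example: a 3 kHz tone plus additive noise, sampled at 16 kHz.
fs = 16000
t = np.arange(256) / fs
clean = np.sin(2 * np.pi * 3000 * t)
rng = np.random.default_rng(0)
noisy = clean + 0.3 * rng.standard_normal(t.size)
ratio, amp = extract_features(clean, noisy)
```

Gathered over many frames and samples, these per-bin (ratio, amplitude) pairs form the training data described in steps 101 to 102.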
Step 103, the first characteristic information is adjusted to obtain target characteristic information.
Specifically, the first characteristic information is reduced when it is less than or equal to a first threshold value and increased when it is greater than or equal to a second threshold value, where the first threshold value is smaller than the second threshold value.
In this embodiment, after the first characteristic information and the amplitude characteristic information of each sampling point are determined, the first characteristic information may be adjusted to obtain the corresponding target characteristic information. The first threshold value and the second threshold value correspond to different signal-to-noise ratios. In combination with steps 101 to 102, the first characteristic information represents the signal-to-noise ratio of the noisy audio data: the larger the first characteristic information, the higher the signal-to-noise ratio, and the smaller the first characteristic information, the lower the signal-to-noise ratio. Thus, when the first characteristic information is less than or equal to the first threshold value, the signal-to-noise ratio of the noisy audio data is low, and when it is greater than or equal to the second threshold value, the signal-to-noise ratio is high.
For example, the first threshold value may be set to 0.3 (the signal-to-noise ratio of the noisy audio data is low); when the first characteristic information is 0.3, it may be reduced so that the reduced value (i.e., the target characteristic information) is 0.2. Similarly, the second threshold value may be set to 0.9 (the signal-to-noise ratio is high); when the first characteristic information is 0.9, it is increased so that the increased value (i.e., the target characteristic information) is 0.95. These are merely examples: the specific values of the first and second threshold values, and the methods of decreasing and increasing the first characteristic information, may be set as required and are not limited in this embodiment.
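The adjustment can be written as a small piecewise mapping. The proportional scaling below is one possible choice that reproduces the 0.3 -> 0.2 and 0.9 -> 0.95 example above; the thresholds, targets, and the linear scaling itself are assumptions, since the disclosure leaves the decrease/increase method open.

```python
def adjust_first_characteristic(x, low=0.3, high=0.9,
                                low_target=0.2, high_target=0.95):
    """Map first characteristic information to target characteristic info.

    Values at or below `low` are reduced, values at or above `high` are
    increased, and everything in between passes through unchanged. The
    threshold and target values are placeholders, not mandated values.
    """
    if x <= low:
        return x * (low_target / low)               # e.g. 0.3 -> 0.2
    if x >= high:
        return min(x * (high_target / high), 1.0)   # e.g. 0.9 -> 0.95
    return x
```

The same role could be filled by any monotone mapping function, as the later optional embodiment (adjustment "through a mapping function") suggests.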
And 104, inputting the amplitude characteristic information into the model to be trained to obtain second characteristic information output by the model to be trained.
And 105, acquiring a loss value of the model to be trained according to the second characteristic information and the target characteristic information.
And 106, adjusting model parameters of the model to be trained according to the loss value, and taking the model to be trained as a trained model when the loss value is smaller than or equal to a preset threshold value.
For steps 104 to 106, when training the model to be trained according to the target feature information and the corresponding amplitude feature information, the amplitude feature information of each sampling point may be used as an input of the model to be trained, and the target feature information corresponding to each sampling point (i.e., the target feature information obtained after the adjustment of the first feature information of each sampling point) may be used as a target of the model to be trained, so as to train the model to be trained.
For example, the amplitude characteristic information corresponding to the 3 kHz sampling point may be input into the model to be trained to obtain the second characteristic information that the model outputs for that sampling point. The loss value of the model is then calculated from the target characteristic information and the second characteristic information corresponding to the 3 kHz sampling point, and the model parameters are adjusted according to the loss value, completing one training iteration. The model is trained in the same way on the amplitude characteristic information and target characteristic information corresponding to the 5 kHz, 7 kHz, and 9 kHz sampling points, and more generally on the target characteristic information and amplitude characteristic information of every sampling point in every sample, until the loss value is less than or equal to a preset threshold value, at which point training is complete and the model to be trained is taken as the trained model. The specific value of the preset threshold may be set as required; for the calculation of the loss value and the adjustment of the model parameters, reference may be made to the prior art, which is not limited in this embodiment.
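The per-sampling-point training loop of steps 104 to 106 can be sketched as follows. To keep the example self-contained, a toy linear model and synthetic (amplitude, target) pairs stand in for the neural network and the real training set; only the structure of the loop (forward pass, loss against the target characteristic information, parameter adjustment, stop at a preset threshold) mirrors the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training pairs: amplitude characteristic information as
# input, target characteristic information (adjusted first characteristic
# information) as label, one value per sampling point.
amps = rng.uniform(0.0, 2.0, size=200)
targets = np.clip(1.0 - 0.4 * amps, 0.05, 0.95)  # synthetic stand-in labels

# "Model to be trained": a toy linear model standing in for the network.
w, b = 0.0, 0.0
lr, preset_threshold = 0.2, 1e-3

for step in range(5000):
    pred = w * amps + b                       # second characteristic info
    loss = np.mean((pred - targets) ** 2)     # loss value (MSE)
    if loss <= preset_threshold:              # loss small enough: trained
        break
    # Adjust model parameters according to the loss (gradient descent).
    w -= lr * np.mean(2 * (pred - targets) * amps)
    b -= lr * np.mean(2 * (pred - targets))
```

In practice the model would be a neural network trained with a deep-learning framework; the stopping rule (loss less than or equal to a preset threshold) is the part taken directly from the text.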
In this embodiment, after the trained model is obtained, audio data may be denoised by the trained model to obtain audio data from which the noise has been removed. In combination with the example above, during denoising a Fourier transform may be performed on the audio data to obtain its frequency-domain signal, and the amplitude value (amplitude characteristic information) and phase value corresponding to each frequency value (sampling point) are determined. The amplitude value corresponding to each frequency value is then used as input to the trained model, which outputs the second characteristic information corresponding to each frequency value.
In combination with step 102, the first characteristic information is the ratio between the amplitude value of the first frequency-domain signal and that of the second frequency-domain signal in the sample data, so the amplitude value of the first frequency-domain signal is the product of the first characteristic information and the amplitude value of the second frequency-domain signal. It follows that the amplitude value of the frequency-domain signal of the denoised audio data is the product of the second characteristic information and the amplitude value of the frequency-domain signal of the audio data. Therefore, after the second characteristic information corresponding to each frequency value is determined, the product of the second characteristic information and the corresponding amplitude value gives the denoised amplitude value at each frequency value.
Meanwhile, because the same Fourier transform is applied to the original audio data and the noisy audio data in the sample data, the phase values of the first and second frequency-domain signals are identical at the same frequency value (sampling point). Likewise, the phase value at each frequency value of the frequency-domain signal of the denoised audio data is the same as the phase value of the audio data itself. The phase values of the audio data are therefore used as the phase values of the denoised audio data, and an inverse Fourier transform is performed on the frequency values of the denoised frequency-domain signal, together with their corresponding amplitude and phase values, to obtain the audio data with the noise removed.
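Putting the inference steps together: transform the noisy audio, obtain the second characteristic information from the model, multiply it by the noisy amplitude values, re-use the noisy phase, and inverse-transform. The sketch below does exactly this with a placeholder model argument; passing an all-ones mask recovers the input frame, which checks the transform round trip. The function name and FFT size are illustrative assumptions.

```python
import numpy as np

def denoise_frame(noisy_frame, model, n_fft=256):
    """Denoise one frame: FFT -> mask from model -> scale -> inverse FFT.

    `model` is any callable mapping per-bin amplitude values to second
    characteristic information in [0, 1]; here it is a placeholder.
    """
    spec = np.fft.rfft(noisy_frame, n_fft)
    mag, phase = np.abs(spec), np.angle(spec)
    mask = model(mag)                    # second characteristic information
    clean_mag = mask * mag               # product gives denoised amplitude
    # Re-use the noisy phase, since the phases are unchanged by denoising.
    clean_spec = clean_mag * np.exp(1j * phase)
    return np.fft.irfft(clean_spec, n_fft)

# Sanity check: an identity mask must reproduce the input frame exactly.
frame = np.sin(2 * np.pi * np.arange(256) / 16)
recovered = denoise_frame(frame, model=lambda mag: np.ones_like(mag))
```

In a real system the frame would be windowed and overlap-added across frames, but those details are outside the scope of this sketch.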
In practical application, the first characteristic information smaller than or equal to the first threshold value is reduced, and the first characteristic information larger than or equal to the second threshold value is increased, so that the signal-to-noise ratio range of the noisy frequency data can be adjusted. Training the model to be trained through the adjusted first characteristic information (namely the target characteristic information) to obtain a trained model, so that different denoising intensities can be obtained when the trained model denoises the audio data in different signal-to-noise ratio ranges. In combination with the above example, if the first feature information is 0.3 (less than or equal to the first threshold value 0.3), the model obtained by training the first feature information 0.3 obtains second feature information that is close to or equal to 0.3 when denoising the audio data with the first feature information of 0.3. When the model obtained through training the target characteristic information 0.2 performs denoising on the audio data with the first characteristic information of 0.3, the obtained second characteristic information is close to or equal to 0.2, the amplitude value obtained through calculating the second characteristic information 0.2 is lower than the amplitude value obtained through calculating the second characteristic information 0.3, the amplitude value of the audio data after removing the noise data is reduced, and the denoising strength of the trained model is increased. Similarly, if the first characteristic information is 0.9 (greater than or equal to the second threshold value 0.9), the model obtained by training the first characteristic information 0.9 obtains second characteristic information which is close to or equal to 0.9 when denoising the audio data with the first characteristic information of 0.9. 
When a model trained on the target characteristic information 0.95 denoises audio data whose first characteristic information is 0.9, the second characteristic information obtained is close or equal to 0.95; the amplitude value calculated from the second characteristic information 0.95 is higher than that calculated from 0.9, so the amplitude value of the audio data after the noise data is removed increases and the denoising strength of the trained model decreases. That is, the denoising strength of the model can be enhanced in a lower signal-to-noise ratio range and reduced in a higher signal-to-noise ratio range.
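The amplitude comparison in this example is simple arithmetic; the sketch below uses a made-up amplitude value of 10.0 for one sampling point purely to illustrate why a smaller mask means stronger denoising:

```python
# Hypothetical amplitude at one sampling point of the noisy audio data.
noisy_amplitude = 10.0

# A model trained on the unadjusted mask predicts roughly 0.3 here ...
baseline = 0.3 * noisy_amplitude
# ... while a model trained on the reduced target mask predicts roughly 0.2.
adjusted = 0.2 * noisy_amplitude

# Lower reconstructed amplitude means stronger denoising.
assert adjusted < baseline
```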
In summary, in this embodiment, a plurality of sample data are obtained, and according to the original audio data and the noisy audio data in the sample data, the first characteristic information and the amplitude characteristic information of the noisy audio data in each sample data at each sampling point are determined. And adjusting the first characteristic information to obtain target characteristic information, inputting the amplitude characteristic information into the model to be trained to obtain second characteristic information output by the model to be trained, and obtaining a loss value of the model to be trained according to the second characteristic information and the target characteristic information. And adjusting model parameters of the model to be trained according to the loss value, and taking the model to be trained as a trained model when the loss value is smaller than or equal to a preset threshold value. When the trained model is used for denoising the audio data, the denoising intensity of the model can be enhanced in a lower signal-to-noise ratio range, the denoising intensity of the model can be reduced in a higher signal-to-noise ratio range, and the trained model can obtain different denoising effects for the audio data in different signal-to-noise ratio ranges.
FIG. 2 is a flow chart illustrating another model training method according to an exemplary embodiment. Referring to FIG. 2, the method may include the following steps.
Step 201, obtaining a plurality of original audio data.
In this embodiment, in the process of acquiring a plurality of sample data, a plurality of original audio data may be acquired first. In particular, the electronic device may directly receive a plurality of original audio data input by a user, and the original audio data may be, for example, fixed-length music or voice. The method for obtaining the original audio data and the specific type of the original audio data may be set according to requirements, which is not limited in this embodiment.
Step 202, adding noise data to the target audio data according to a preset rule to obtain noisy audio data corresponding to the target audio data, and taking the target audio data and the noisy audio data corresponding to the target audio data as sample data.
Wherein the target audio data is any one of a plurality of original audio data.
In this embodiment, after the original audio data is obtained, the original audio data may be processed to obtain noisy audio data corresponding to the original audio data. Specifically, noise data may be added to the original audio data according to a preset rule to obtain noisy audio data corresponding to the original audio data, and the original audio data and the noisy audio data are used together as sample data. The noise data may be, for example, fixed-length speech. For example, after music (original audio data) is acquired, speech (noise data) may be synthesized into the music to obtain noisy audio data, and during synthesis the signal-to-noise ratio of the noisy audio data may be brought to a preset value (for example, 20 dB) according to the preset rule. The specific process of adding noise data to the original audio data may be set according to requirements, which is not limited in this embodiment. Adding noise data to the original audio data according to a preset rule yields noisy audio data and, in turn, sample data that meets the training requirements, improving training efficiency.
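A common way to realize such a preset rule is to scale the noise so the mixture hits the target signal-to-noise ratio before adding it; the sketch below assumes this interpretation (the function name and array shapes are illustrative):

```python
import numpy as np

def mix_at_snr(original, noise, snr_db=20.0):
    """Add `noise` to `original` after scaling it so that the mixture
    has the preset signal-to-noise ratio (in dB)."""
    signal_power = np.mean(original ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise power.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return original + scaled_noise
```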
Step 203, determining first characteristic information and amplitude characteristic information of the noisy audio data in each sample data at each sampling point according to the original audio data and the noisy audio data.
Step 204, adjusting the first characteristic information through a mapping function to obtain target characteristic information.
In this embodiment, the first feature information may be adjusted by a mapping function, so as to obtain target feature information corresponding to the first feature information.
Optionally, when the first characteristic information is equal to or less than the first threshold value, the first characteristic information may be reduced to below the third threshold value;
when the first characteristic information is equal to or greater than the second threshold value, the first characteristic information may be increased to be equal to or greater than the fourth threshold value.
In this embodiment, the first characteristic information that is less than or equal to the first threshold value may be directly reduced to below the third threshold value, so that the trained model has a stronger denoising effect on audio data with a lower signal-to-noise ratio. Likewise, the first characteristic information that is greater than or equal to the second threshold value is increased to above the fourth threshold value, so that the trained model has a weaker denoising effect on audio data with a higher signal-to-noise ratio.
For example, the first characteristic information may be adjusted by a first mapping function. As shown in fig. 3, fig. 3 is a graph of the first mapping function, by which the first characteristic information smaller than 0.6 (the first threshold value) can be reduced to 0.1 (the third threshold value) or less, and the first characteristic information larger than 0.8 (the second threshold value) can be increased to 0.9 (the fourth threshold value) or more.
For another example, the first characteristic information may be adjusted by a second mapping function. As shown in fig. 4, fig. 4 is a graph of the second mapping function, by which the first characteristic information of 0.5 (the first threshold value) or less can be adjusted to 0, and the first characteristic information of 0.9 (the second threshold value) or more can be adjusted to 1.
In this embodiment, when the first characteristic information is less than or equal to the first threshold value, it is directly reduced to below the third threshold value, so that smaller first characteristic information (target characteristic information) is obtained at lower signal-to-noise ratios. For example, the first characteristic information less than or equal to 0.5 is adjusted to 0, and the model to be trained is trained on this smaller target characteristic information; when denoising audio data with a lower signal-to-noise ratio, the trained model then suppresses the noisy audio data more strongly, increasing the denoising strength and improving the denoising effect. For example, if the first characteristic information is 0.3, a model trained directly on the first characteristic information 0.3 outputs second characteristic information close to 0.3 when denoising audio data whose first characteristic information is 0.3 (low signal-to-noise ratio, large noise data). When a model trained on the adjusted first characteristic information (target characteristic information) 0 denoises the same audio data, the second characteristic information obtained is close to 0, and the amplitude value calculated from the second characteristic information 0 is lower than that calculated from 0.3, so the amplitude of the audio data after the noise data is removed is lower. Therefore, when denoising audio data with a lower signal-to-noise ratio (that is, larger noise data), a larger denoising strength is obtained and the intensity of the audio data after the noise data is removed is reduced.
Similarly, when the first characteristic information is greater than or equal to the second threshold value, it is directly increased to above the fourth threshold value, so that larger target characteristic information is obtained at higher signal-to-noise ratios. For example, the first characteristic information greater than or equal to 0.9 is adjusted to 1, and the model to be trained is trained on this larger target characteristic information; when denoising audio data with a higher signal-to-noise ratio, the trained model then preserves the audio data in the noisy audio data more fully, reducing the denoising strength and improving the denoising effect. For example, if the first characteristic information is 0.9, a model trained directly on the first characteristic information 0.9 outputs second characteristic information close to 0.9 when denoising audio data whose first characteristic information is 0.9 (high signal-to-noise ratio, small noise data). When a model trained on the adjusted first characteristic information (target characteristic information) 1 denoises the same audio data, the second characteristic information obtained is close to 1, and the amplitude value calculated from the second characteristic information 1 is higher than that calculated from 0.9, so the amplitude of the audio data after the noise data is removed is higher. Therefore, when denoising audio data with a higher signal-to-noise ratio (that is, smaller noise data), the denoising strength is reduced and the intensity of the audio data after the noise data is removed is increased.
Optionally, when the first characteristic information is greater than the first threshold and less than the second threshold, the first characteristic information is adjusted between a fifth threshold and a sixth threshold, and the fifth threshold is less than the sixth threshold.
In this embodiment, when the first characteristic information is greater than the first threshold value and less than the second threshold value, the first characteristic information may be adjusted between the fifth threshold value and the sixth threshold value. By adjusting the first characteristic information between the first threshold value and the second threshold value, the overall denoising effect of the model can be adjusted, and the applicability of the model is improved. Referring to fig. 3, the first characteristic information greater than 0.6 (first threshold value) and less than 0.8 (second threshold value) may be adjusted to between 0.1 (fifth threshold value) and 0.9 (sixth threshold value). And in connection with fig. 4, the first characteristic information greater than 0.5 (first threshold value) and less than 0.9 (second threshold value) may be adjusted to between 0 (fifth threshold value) and 1 (sixth threshold value).
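A piecewise mapping in the spirit of fig. 4 can be written as follows; the thresholds and the linear stretch of the middle band are assumptions made for illustration, since the embodiment describes its mapping functions only graphically:

```python
import numpy as np

def adjust_mask(mask, t1=0.5, t2=0.9):
    """Piecewise mapping akin to the second mapping function (fig. 4):
    masks <= t1 (first threshold) go to 0, masks >= t2 (second threshold)
    go to 1, and values in between are stretched linearly across (0, 1).
    Threshold values here are illustrative."""
    mask = np.asarray(mask, dtype=float)
    out = (mask - t1) / (t2 - t1)   # linear stretch of the middle band
    return np.clip(out, 0.0, 1.0)   # saturate the two outer bands
```

Because the mapping is monotonic, the ordering of masks (and hence of denoising strengths) across sampling points is preserved.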
In practical application, adjusting the first characteristic information through a mapping function allows the first characteristic information over the whole range to be adjusted quickly, improving adjustment efficiency. It should be noted that the method for adjusting the first characteristic information may include, but is not limited to, the mapping function approach.
Step 205, inputting the amplitude characteristic information into the model to be trained to obtain second characteristic information output by the model to be trained.
Step 206, acquiring a loss value of the model to be trained according to the second characteristic information and the target characteristic information.
Step 207, adjusting model parameters of the model to be trained according to the loss value, and taking the model to be trained as the trained model when the loss value is less than or equal to a preset threshold value.
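Steps 205-207 amount to a standard supervised loop: predict second characteristic information from amplitude features, compute a loss against the target characteristic information, and adjust parameters until the loss falls below a preset threshold. The toy sketch below uses a one-layer sigmoid model and synthetic data purely for illustration; the embodiment does not specify the model architecture or loss function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (shapes and values are illustrative): amplitude features per
# sampling point, and target masks produced by some "true" relationship.
mag = rng.random((256, 8))                       # amplitude characteristic information
w_true = rng.standard_normal((8, 1))
target = 1.0 / (1.0 + np.exp(-(mag @ w_true)))   # target characteristic information

w = np.zeros((8, 1))                             # parameters of a toy linear model
preset_threshold = 1e-3
losses = []
for step in range(5000):
    pred = 1.0 / (1.0 + np.exp(-(mag @ w)))      # second characteristic information in (0, 1)
    loss = np.mean((pred - target) ** 2)         # loss from second vs. target features
    losses.append(loss)
    if loss <= preset_threshold:                 # loss small enough: model is "trained"
        break
    dz = 2.0 * (pred - target) * pred * (1.0 - pred) / pred.size
    w -= 1.0 * (mag.T @ dz)                      # adjust model parameters by the gradient
```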
In summary, in this embodiment, the first feature information in different signal-to-noise ratio ranges may be adjusted differently to obtain the target feature information in different ranges, and the model to be trained is trained by the target feature information in different ranges, so that the trained model can obtain different denoising effects for the audio data in different signal-to-noise ratio ranges.
Optionally, when the first feature information is greater than or equal to 1, the first feature information is adjusted to 1.
In this embodiment, after the first characteristic information is determined, if the first characteristic information is 1 or more, it may be adjusted to 1. In combination with step 102, Mask is the first characteristic information, MagX is the amplitude value in the first frequency domain signal, and MagY is the amplitude value in the second frequency domain signal. When noisy audio data is generated from the original audio data and the noise data, phase offsets and other effects may make an amplitude in the second frequency domain signal smaller than the corresponding amplitude in the first frequency domain signal, causing the first characteristic information to exceed 1. Adjusting any first characteristic information of 1 or more down to 1 avoids this situation, and further avoids the problem that an overly large target (target characteristic information) appears during model training when the first characteristic information is greater than 1, which would cause poor convergence and long training time.
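Combining step 102 with the clipping described here, the first characteristic information can be computed as below; this is a numpy sketch, and the frame-wise framing and epsilon guard are assumptions:

```python
import numpy as np

def compute_mask(original_frame, noisy_frame):
    """First characteristic information per sampling point: the ratio of
    the original amplitude (MagX, first frequency domain signal) to the
    noisy amplitude (MagY, second frequency domain signal), with any
    value of 1 or more adjusted down to 1."""
    mag_x = np.abs(np.fft.rfft(original_frame))
    mag_y = np.abs(np.fft.rfft(noisy_frame))
    mask = mag_x / np.maximum(mag_y, 1e-12)  # epsilon guard against division by zero
    return np.minimum(mask, 1.0)             # clip masks above 1 (phase offset etc.)
```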
Referring to fig. 5, fig. 5 is a block diagram of a model training apparatus, according to an exemplary embodiment. The model training apparatus 500 may be applied to denoising of audio data, and may include: a first acquisition module 501, a determination module 502, a first adjustment module 503, an input module 504, a second acquisition module 505, and a second adjustment module 506.
The first acquisition module 501 is configured to acquire a plurality of sample data, each sample data including raw audio data and noisy audio data.
The determining module 502 is configured to determine, according to the original audio data and the noisy audio data, first characteristic information and amplitude characteristic information of the noisy audio data in each sample data at respective sampling points, where the first characteristic information is used to represent signal-to-noise ratio information of the noisy audio data at the corresponding sampling points.
The first adjustment module 503 is configured to adjust the first feature information to obtain the target feature information, where the first feature information is reduced when the first feature information is equal to or less than a first threshold value, and the first feature information is increased when the first feature information is equal to or greater than a second threshold value, and the first threshold value is less than the second threshold value.
The input module 504 is configured to input the amplitude characteristic information into the model to be trained, and obtain second characteristic information output by the model to be trained.
The second obtaining module 505 is configured to obtain a loss value of the model to be trained according to the second feature information and the target feature information;
the second adjustment module 506 is configured to adjust model parameters of the model to be trained according to the loss value, and to take the model to be trained as the trained model when the loss value is less than or equal to a preset threshold value.
Optionally, the first adjustment module 503 is specifically configured to reduce the first characteristic information to below a third threshold when the first characteristic information is less than or equal to the first threshold; and when the first characteristic information is larger than or equal to the second threshold value, increasing the first characteristic information to be larger than or equal to the fourth threshold value.
Optionally, the first adjusting module 503 is further specifically configured to adjust the first characteristic information to be between a fifth threshold value and a sixth threshold value when the first characteristic information is greater than the first threshold value and smaller than the second threshold value, and the fifth threshold value is smaller than the sixth threshold value.
Optionally, the first adjustment module 503 is specifically configured to adjust the first feature information through a mapping function to obtain the target feature information.
Optionally, the first characteristic information is a ratio between an amplitude value of the original audio data corresponding to the sampling point and an amplitude value of the noisy audio data, and the first characteristic information is less than or equal to 1.
Optionally, the determining module 502 is specifically configured to convert the original audio data in the target sample data into a first frequency domain signal, and convert the noisy audio data in the target sample data into a second frequency domain signal; the target sample data is any one sample data among a plurality of sample data; and determining first characteristic information and amplitude characteristic information of the noisy audio data in the target sample data at each sampling point according to the first frequency domain signal and the second frequency domain signal.
In summary, in this embodiment, a plurality of sample data are obtained, and according to the original audio data and the noisy audio data in the sample data, the first characteristic information and the amplitude characteristic information of the noisy audio data in each sample data at each sampling point are determined. And adjusting the first characteristic information to obtain target characteristic information, inputting the amplitude characteristic information into the model to be trained to obtain second characteristic information output by the model to be trained, and obtaining a loss value of the model to be trained according to the second characteristic information and the target characteristic information. And adjusting model parameters of the model to be trained according to the loss value, and taking the model to be trained as a trained model when the loss value is smaller than or equal to a preset threshold value. When the trained model is used for denoising the audio data, the denoising intensity of the model can be enhanced in a lower signal-to-noise ratio range, the denoising intensity of the model can be reduced in a higher signal-to-noise ratio range, and the trained model can obtain different denoising effects for the audio data in different signal-to-noise ratio ranges.
Referring to fig. 6, fig. 6 is a block diagram of an electronic device, according to an example embodiment. The electronic device 600 includes:
a processor 601.
A memory 602 for storing instructions executable by the processor 601.
Wherein the processor 601 is configured to execute executable instructions stored in the memory 602 to implement the model training method in the embodiment shown in fig. 1 or fig. 2.
In an exemplary embodiment, a storage medium is also provided, such as the memory 602, including instructions executable by the processor 601 of the electronic device 600 to perform the model training method of the embodiment shown in fig. 1 or fig. 2.
Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product comprising instructions which, when run on a computer, cause the computer to perform the model training method in the embodiment as shown in fig. 1 or fig. 2 is also provided.
Referring to fig. 7, fig. 7 is a block diagram of yet another electronic device, shown according to an example embodiment, the electronic device 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 713, a sensor component 714, and a communication component 716.
The processing component 702 generally controls overall operation of the apparatus 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to perform all or part of the steps of the model training method described above. Further, the processing component 702 can include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
The memory 704 is configured to store various types of data to support operations at the apparatus 700. Examples of such data include instructions for any application or method operating on the apparatus 700, contact data, phonebook data, messages, pictures, videos, and the like. The memory 704 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 706 provides power to the various components of the device 700. The power components 706 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 700.
The multimedia component 708 includes a screen that provides an output interface between the device 700 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front-facing camera and/or a rear-facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the apparatus 700 is in an operational mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 includes a Microphone (MIC) configured to receive external audio signals when the device 700 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 704 or transmitted via the communication component 716. In some embodiments, the audio component 710 further includes a speaker for outputting audio signals.
The I/O interface 713 provides an interface between the processing component 702 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 714 includes one or more sensors for providing status assessment of various aspects of the apparatus 700. For example, the sensor assembly 714 may detect an on/off state of the device 700, a relative positioning of the components, such as a display and keypad of the device 700, a change in position of the device 700 or a component of the device 700, the presence or absence of user contact with the device 700, an orientation or acceleration/deceleration of the device 700, and a change in temperature of the device 700. The sensor assembly 714 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 714 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 716 is configured to facilitate communication between the apparatus 700 and other devices in a wired or wireless manner. The apparatus 700 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 716 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the model training methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 704, including instructions executable by processor 720 of apparatus 700 to perform the model training method described above. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., Solid State Disk (SSD)), and the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. A method of model training, comprising:
acquiring a plurality of sample data, wherein each sample data comprises original audio data and noisy audio data;
according to the original audio data and the noisy audio data, determining first characteristic information and amplitude characteristic information of the noisy audio data in each sample data at each sampling point, wherein the first characteristic information is used for representing signal-to-noise ratio information of the noisy audio data at the corresponding sampling point;
adjusting the first characteristic information to obtain target characteristic information, wherein the adjusting comprises: reducing the first characteristic information when the first characteristic information is smaller than or equal to a first threshold value, and increasing the first characteristic information when the first characteristic information is larger than or equal to a second threshold value, the first threshold value being smaller than the second threshold value;
inputting the amplitude characteristic information into a model to be trained to obtain second characteristic information output by the model to be trained;
acquiring a loss value of the model to be trained according to the second characteristic information and the target characteristic information;
and adjusting model parameters of the model to be trained according to the loss value, and taking the model to be trained as a trained model when the loss value is smaller than or equal to a preset threshold value.
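As a rough, non-authoritative sketch of the training step described in claim 1 (the threshold values, the toy one-weight "model", and all function names below are hypothetical, not the claimed implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def first_characteristic(clean_amp, noisy_amp, eps=1e-8):
    # Per-sampling-point ratio |clean| / |noisy|, capped at 1 (cf. claim 5).
    return np.minimum(clean_amp / (noisy_amp + eps), 1.0)

def to_target(info, t1=0.3, t2=0.7):
    # Threshold adjustment (cf. claims 2-3; t1 and t2 are made-up values):
    # push ratios at or below t1 down to 0, ratios at or above t2 up to 1,
    # and leave the middle band unchanged in this sketch.
    out = info.copy()
    out[info <= t1] = 0.0
    out[info >= t2] = 1.0
    return out

# Toy stand-in for the model to be trained: a single weight mapping the
# amplitude feature to the predicted (second) characteristic information.
w = 0.0
clean = np.abs(rng.normal(size=64))                      # original amplitudes
noisy = clean + np.abs(rng.normal(scale=0.5, size=64))   # noisy amplitudes
x = noisy                                                # amplitude feature info
y = to_target(first_characteristic(clean, noisy))        # target feature info

for _ in range(200):             # adjust model parameters from the loss value
    pred = w * x                 # second characteristic information
    grad = np.mean(2.0 * (pred - y) * x)
    w -= 0.05 * grad

loss = float(np.mean((w * x - y) ** 2))                  # final MSE loss
```

In this sketch, training would stop once `loss` falls at or below a preset threshold, at which point the toy model is taken as trained.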
2. The method of claim 1, wherein said adjusting said first characteristic information to obtain target characteristic information comprises:
when the first characteristic information is smaller than or equal to the first threshold value, the first characteristic information is reduced to be lower than a third threshold value;
and when the first characteristic information is larger than or equal to the second threshold value, the first characteristic information is increased to be larger than a fourth threshold value.
3. The method as recited in claim 2, further comprising:
when the first characteristic information is larger than the first threshold value and smaller than the second threshold value, the first characteristic information is adjusted to be between a fifth threshold value and a sixth threshold value, and the fifth threshold value is smaller than the sixth threshold value.
4. The method of claim 1, wherein said adjusting said first characteristic information to obtain target characteristic information comprises:
and adjusting the first characteristic information through a mapping function to obtain the target characteristic information.
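One hypothetical instance of the mapping function in claim 4 (the sigmoid form, `center`, and `steepness` are assumptions for illustration, not taken from the patent):

```python
import numpy as np

def sharpen(info, center=0.5, steepness=20.0):
    # A steep sigmoid that smoothly compresses ratios below `center`
    # toward 0 and ratios above it toward 1, without hard thresholds.
    return 1.0 / (1.0 + np.exp(-steepness * (np.asarray(info) - center)))
```

A smooth mapping like this achieves an effect comparable to the thresholded adjustment of claims 2-3 while remaining differentiable everywhere.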
5. The method of claim 1, wherein the first characteristic information is a ratio between an amplitude value of the original audio data and an amplitude value of the noisy audio data at the corresponding sampling point, and the first characteristic information is smaller than or equal to 1.
6. The method of claim 5, wherein determining the first characteristic information and the amplitude characteristic information of the noisy audio data in each sample data at each sampling point based on the original audio data and the noisy audio data comprises:
converting original audio data in target sample data into a first frequency domain signal, and converting noisy audio data in the target sample data into a second frequency domain signal; the target sample data is any one sample data of the plurality of sample data;
and determining the first characteristic information and the amplitude characteristic information of the noisy audio data in the target sample data at each sampling point according to the first frequency domain signal and the second frequency domain signal.
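The frequency-domain conversion of claim 6 might be sketched as follows; the frame length, hop size, Hann window, and function names are all illustrative assumptions:

```python
import numpy as np

def stft_mag(x, frame=256, hop=128):
    # Magnitude spectrogram: Hann-windowed frames followed by a real FFT.
    win = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    frames = np.stack([x[i:i + frame] * win for i in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

def features(clean, noisy, eps=1e-8):
    # First and amplitude characteristic information per time-frequency
    # "sampling point", derived from the two frequency-domain signals.
    clean_mag = stft_mag(clean)
    noisy_mag = stft_mag(noisy)
    ratio = np.minimum(clean_mag / (noisy_mag + eps), 1.0)
    return ratio, noisy_mag
```

Here each time-frequency bin of the spectrogram plays the role of a "sampling point": `noisy_mag` is the amplitude characteristic information fed to the model, and `ratio` is the SNR-like first characteristic information.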
7. A model training device, comprising:
a first acquisition module configured to acquire a plurality of sample data, each of the sample data including original audio data and noisy audio data;
a determining module configured to determine first characteristic information and amplitude characteristic information of the noisy audio data in each sample data at each sampling point according to the original audio data and the noisy audio data, wherein the first characteristic information is used for representing signal-to-noise ratio information of the noisy audio data at the corresponding sampling point;
a first adjusting module configured to adjust the first characteristic information to obtain target characteristic information, wherein the adjusting comprises: reducing the first characteristic information when the first characteristic information is smaller than or equal to a first threshold value, and increasing the first characteristic information when the first characteristic information is larger than or equal to a second threshold value, the first threshold value being smaller than the second threshold value;
an input module configured to input the amplitude characteristic information into a model to be trained to obtain second characteristic information output by the model to be trained;
a second acquisition module configured to acquire a loss value of the model to be trained according to the second characteristic information and the target characteristic information;
and a second adjusting module configured to adjust model parameters of the model to be trained according to the loss value, and to take the model to be trained as a trained model when the loss value is smaller than or equal to a preset threshold value.
8. The apparatus according to claim 7, wherein the first adjustment module is specifically configured to reduce the first characteristic information below a third threshold when the first characteristic information is less than or equal to the first threshold; and when the first characteristic information is larger than or equal to the second threshold value, the first characteristic information is increased to be larger than a fourth threshold value.
9. The apparatus of claim 8, wherein the first adjustment module is further specifically configured to adjust the first characteristic information to between a fifth threshold and a sixth threshold when the first characteristic information is greater than the first threshold and less than the second threshold, the fifth threshold being less than the sixth threshold.
10. The apparatus according to claim 7, wherein the first adjustment module is specifically configured to adjust the first feature information by a mapping function to obtain the target feature information.
11. The apparatus of claim 7, wherein the first characteristic information is a ratio between an amplitude value of the original audio data and an amplitude value of the noisy audio data at the corresponding sampling point, and the first characteristic information is smaller than or equal to 1.
12. The apparatus according to claim 11, wherein the determining module is specifically configured to convert the original audio data in the target sample data into a first frequency domain signal and to convert the noisy audio data in the target sample data into a second frequency domain signal, the target sample data being any one sample data of the plurality of sample data; and to determine the first characteristic information and the amplitude characteristic information of the noisy audio data in the target sample data at each sampling point according to the first frequency domain signal and the second frequency domain signal.
13. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the model training method of any of claims 1-6.
14. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the model training method of any one of claims 1-6.
CN202010575643.4A 2020-06-22 2020-06-22 Model training method and device, electronic equipment and storage medium Active CN111883164B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010575643.4A CN111883164B (en) 2020-06-22 2020-06-22 Model training method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111883164A CN111883164A (en) 2020-11-03
CN111883164B true CN111883164B (en) 2023-11-03

Family

ID=73158104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010575643.4A Active CN111883164B (en) 2020-06-22 2020-06-22 Model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111883164B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112447183A (en) * 2020-11-16 2021-03-05 北京达佳互联信息技术有限公司 Training method and device for audio processing model, audio denoising method and device, and electronic equipment
CN112908288B (en) * 2021-01-25 2023-11-21 北京达佳互联信息技术有限公司 Beat detection method, beat detection device, electronic equipment and storage medium
CN112951259A (en) * 2021-03-01 2021-06-11 杭州网易云音乐科技有限公司 Audio noise reduction method and device, electronic equipment and computer readable storage medium
WO2023279366A1 (en) * 2021-07-09 2023-01-12 Oppo广东移动通信有限公司 Noise reduction method based on transfer learning, terminal device, network device and storage medium
CN114267368A (en) * 2021-12-22 2022-04-01 北京百度网讯科技有限公司 Training method of audio noise reduction model, and audio noise reduction method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101089952A (en) * 2006-06-15 2007-12-19 株式会社东芝 Method and device for controlling noise, smoothing speech manual, extracting speech characteristic, phonetic recognition and training phonetic mould
CN108269567A (en) * 2018-01-23 2018-07-10 北京百度网讯科技有限公司 For generating the method, apparatus of far field voice data, computing device and computer readable storage medium
CN109378013A (en) * 2018-11-19 2019-02-22 南瑞集团有限公司 A kind of voice de-noising method
CN109741760A (en) * 2018-12-18 2019-05-10 科大讯飞股份有限公司 Noise estimation method and system
CN109767782A (en) * 2018-12-28 2019-05-17 中国科学院声学研究所 A kind of sound enhancement method improving DNN model generalization performance
CN110931019A (en) * 2019-12-06 2020-03-27 广州国音智能科技有限公司 Public security voice data acquisition method, device, equipment and computer storage medium



Similar Documents

Publication Publication Date Title
CN111883164B (en) Model training method and device, electronic equipment and storage medium
CN111009256B (en) Audio signal processing method and device, terminal and storage medium
CN109361828B (en) Echo cancellation method and device, electronic equipment and storage medium
CN111009257B (en) Audio signal processing method, device, terminal and storage medium
CN107833579B (en) Noise elimination method, device and computer readable storage medium
CN111968662A (en) Audio signal processing method and device and storage medium
CN108845787B (en) Audio adjusting method, device, terminal and storage medium
CN111179960B (en) Audio signal processing method and device and storage medium
CN111583142A (en) Image noise reduction method and device, electronic equipment and storage medium
CN106060707B (en) Reverberation processing method and device
CN112201267A (en) Audio processing method and device, electronic equipment and storage medium
CN111292761B (en) Voice enhancement method and device
CN107633490B (en) Image processing method, device and storage medium
CN112951262B (en) Audio recording method and device, electronic equipment and storage medium
CN111667842B (en) Audio signal processing method and device
CN111583144A (en) Image noise reduction method and device, electronic equipment and storage medium
CN111986693B (en) Audio signal processing method and device, terminal equipment and storage medium
CN111292249B (en) Image processing method and device
CN113345461B (en) Voice processing method and device for voice processing
CN111292250B (en) Image processing method and device
CN113345456B (en) Echo separation method, device and storage medium
CN111294473A (en) Signal processing method and device
CN113113036B (en) Audio signal processing method and device, terminal and storage medium
CN113362841B (en) Audio signal processing method, device and storage medium
CN117880732A (en) Spatial audio recording method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant