CN117238307A

CN117238307A - Audio optimization processing method and system based on deep learning

Info

Publication number: CN117238307A
Application number: CN202311500231.4A
Authority: CN
Inventors: 刘耀明; 翟立志
Original assignee: Shenzhen Cloudwinner Network Technology Co ltd
Current assignee: Shenzhen Cloudwinner Network Technology Co ltd
Priority date: 2023-11-13
Filing date: 2023-11-13
Publication date: 2023-12-15
Anticipated expiration: 2043-11-13
Also published as: CN117238307B

Abstract

The invention discloses an audio optimization processing method and system based on deep learning, relating to the technical field of audio data processing. The method comprises the following steps: acquiring an audio signal; establishing a background noise model and an audio optimization model based on deep learning; dividing noise into easily identifiable noise and difficult-to-identify noise by using the background noise model, and reducing the audio intensity of the subordinate easily identifiable noise to a preset intensity; reducing the audio intensity of a first sinusoidal audio signal to the preset intensity; performing disturbance processing on a second sinusoidal audio signal to obtain a third sinusoidal audio signal; extracting audio key frames from the third sinusoidal audio signal and changing the abrupt-change parts of the third sinusoidal audio signal intercepted by the audio key frames; and outputting the optimized audio signal. By providing a deep learning module, a noise classification module, a noise processing module and an audio optimization module, noise reduction is performed more accurately and the audio can be optimized in a targeted manner, thereby improving audio quality.

Description

Audio optimization processing method and system based on deep learning

Technical Field

The invention relates to the technical field of audio data processing, in particular to an audio optimization processing method and system based on deep learning.

Background

With the development of internet technology, internet applications are developing rapidly. They include, but are not limited to, instant messaging applications, social networking service applications and voice communication applications. Such applications can be installed on terminals such as notebook computers, mobile phones and tablets (PADs), and a user can use them to make audio calls, such as voice calls and audio chats, with other users. Sound quality is an important factor affecting audio calls.

In practice, it has been found that when an audio system plays audio, the loudspeaker output is sometimes accompanied by plosives, pitch changes and distortion. Repeated experiments traced this to the data processing module: its optimization is carried out in a fixed, preset mode, i.e., the same noise reduction measures are applied indiscriminately to all received audio data, so some audio data plays back worse after noise reduction optimization.

Disclosure of Invention

To solve the above technical problem, this technical scheme provides an audio optimization processing method and system based on deep learning. It addresses the problem described in the background art: applying the same noise reduction measures indiscriminately to all received audio data makes some audio data play back worse after noise reduction optimization.

In order to achieve the above purpose, the invention adopts the following technical scheme:

the audio optimization processing method based on deep learning comprises the following steps:

acquiring an audio signal and digitizing the continuous signal into an audio digital signal, the digitizing comprising sampling, quantizing and encoding, and decomposing the audio digital signal into at least one sinusoidal audio signal using a Fourier transform;

establishing a background noise model and an audio optimization model based on deep learning;

obtaining noise characteristics by using the background noise model, dividing the noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics, capturing subordinate easily identifiable noise in the at least one sinusoidal audio signal, and reducing the audio intensity of the subordinate easily identifiable noise to a preset intensity, the preset intensity being a level below which the human ear cannot perceive audio;

for the difficult-to-identify noise, decomposing it into at least one sinusoidal noise signal by using the Fourier transform, capturing, in the at least one sinusoidal audio signal, a first sinusoidal audio signal whose difference from the sinusoidal noise signal is within a preset range, and reducing the audio intensity of the first sinusoidal audio signal to the preset intensity, wherein the difference between the first sinusoidal audio signal and the sinusoidal noise signal is calculated by integrating the absolute value of their difference over their common definition domain;

acquiring at least one characteristic optimization audio signal by using the audio optimization model, capturing, in the at least one sinusoidal audio signal, a second sinusoidal audio signal whose difference from the characteristic optimization audio signal is within a preset range, and performing disturbance processing on the second sinusoidal audio signal to obtain a third sinusoidal audio signal, wherein the difference between the second sinusoidal audio signal and the characteristic optimization audio signal is calculated by integrating the absolute value of their difference over the definition domains of both signals;

extracting audio key frames from the third sinusoidal audio signal, and changing the abrupt-change parts of the third sinusoidal audio signal that are intercepted by the audio key frames;

and carrying out inverse Fourier transform on the at least one adjusted sinusoidal audio signal to obtain an optimized audio digital signal, and reconverting the audio digital signal into an optimized audio signal to output the optimized audio signal.
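The decomposition into sinusoidal audio signals and the final reconstruction can be sketched in Python as follows. This is a minimal sketch under stated assumptions. A naive O(N²) discrete Fourier transform stands in for the Fourier transform; a practical system would use an FFT. Each sinusoidal audio signal is represented as a hypothetical (frequency bin, amplitude, phase) triple, a representation the disclosure does not specify.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_total = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_total) for n, x in enumerate(samples))
        for k in range(n_total)
    ]

def idft(spectrum):
    """Inverse DFT: x[n] = (1/N) * sum_k X[k] * exp(2j*pi*k*n/N); returns the real parts."""
    n_total = len(spectrum)
    return [
        (sum(X * cmath.exp(2j * math.pi * k * n / n_total) for k, X in enumerate(spectrum)) / n_total).real
        for n in range(n_total)
    ]

def sinusoidal_components(samples, min_magnitude=1e-9):
    """Split a real signal into (frequency_bin, amplitude, phase) sinusoids.

    Only bins 0..N/2 are kept, because for a real signal the upper half mirrors them.
    """
    n_total = len(samples)
    spectrum = dft(samples)
    components = []
    for k in range(n_total // 2 + 1):
        magnitude = abs(spectrum[k])
        if magnitude < min_magnitude:
            continue  # treat near-zero bins as absent
        # DC and Nyquist bins have no mirror image; every other bin carries half its energy.
        scale = 1 if k == 0 or 2 * k == n_total else 2
        components.append((k, scale * magnitude / n_total, cmath.phase(spectrum[k])))
    return components

def synthesize(components, n_total):
    """Rebuild the signal as a sum of cosines A*cos(2*pi*k*n/N + phase)."""
    return [
        sum(a * math.cos(2 * math.pi * k * n / n_total + ph) for k, a, ph in components)
        for n in range(n_total)
    ]
```

For example, a 32-sample frame containing 0.5·cos at bin 3 and 0.2·sin at bin 5 decomposes into exactly those two components. Either `synthesize` or `idft(dft(...))` then rebuilds the frame to within rounding error.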

Preferably, the sampling, quantizing and encoding comprises the steps of:

plotting the audio signal as a continuous graph, with time on the horizontal axis and audio signal intensity on the vertical axis;

taking a finite number of time points to complete sampling;

collecting the corresponding audio signal intensity at the time point to finish quantization;

the resulting quantized data is encoded and represented in a digital format recognizable by a computer.
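The three steps can be illustrated with a short Python sketch. The sample rate, the 16-bit word length and the little-endian linear PCM byte layout are assumptions chosen for illustration. The disclosure only requires a digital format that a computer can recognize.

```python
import math
import struct

def sample(signal, sample_rate, duration):
    """Sampling: evaluate a continuous-time signal at a finite set of time points."""
    n_samples = int(round(sample_rate * duration))
    return [signal(n / sample_rate) for n in range(n_samples)]

def quantize(samples, bits=16):
    """Quantization: map each sample in [-1.0, 1.0] to the nearest integer level."""
    max_level = 2 ** (bits - 1) - 1
    levels = []
    for value in samples:
        clipped = max(-1.0, min(1.0, value))  # keep within the representable range
        levels.append(int(round(clipped * max_level)))
    return levels

def encode(levels):
    """Encoding: pack 16-bit levels as little-endian signed integers (PCM bytes).

    Assumes quantize() was called with bits=16.
    """
    return struct.pack("<%dh" % len(levels), *levels)
```

For instance, `encode(quantize(sample(tone, 8000, 0.01)))` turns 10 ms of a 440 Hz tone into 80 samples, which is 160 bytes of PCM data.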

Preferably, the establishing the background noise model based on the deep learning includes the following steps:

acquiring various actual noises from big data and removing noises that cannot occur in audio signals, to obtain sample noises;

sampling, quantizing and encoding the sample noise to obtain a noise digital signal.

Preferably, the building of the audio optimization model includes the following steps:

acquiring various standard audio signals from big data;

and sampling, quantizing and encoding various standard audio signals to obtain standard audio digital signals.

Preferably, classifying the noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics comprises the following steps:

acquiring all noise digital signals in the background noise model;

integrating the absolute value of the difference between the noise digital signal and the audio digital signal over their common definition domain to obtain an integral difference value;

if the integral difference value is larger than the preset difference value, the noise digital signal is easily identifiable noise;

if the integral difference value does not exceed the preset difference value, the noise digital signal is difficult-to-identify noise.
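The integral difference and the threshold test can be sketched as follows. The integral is approximated by a Riemann sum with step `dt` over the overlapping samples, which this sketch takes as the common definition domain. The template dictionary and the `preset_difference` value are hypothetical, because the disclosure gives no numeric threshold.

```python
def integral_difference(signal_a, signal_b, dt):
    """Approximate the integral of |a(t) - b(t)| over the shared domain (Riemann sum)."""
    overlap = min(len(signal_a), len(signal_b))  # common definition domain
    return sum(abs(signal_a[i] - signal_b[i]) for i in range(overlap)) * dt

def classify_noise(noise_templates, audio, dt, preset_difference):
    """Split noise templates into easily identifiable and difficult-to-identify names."""
    easy, difficult = [], []
    for name, noise in noise_templates.items():
        if integral_difference(noise, audio, dt) > preset_difference:
            easy.append(name)       # far from the audio: easy to tell apart
        else:
            difficult.append(name)  # close to the audio: hard to tell apart
    return easy, difficult
```

With `audio = [0.1, 0.2, 0.1, 0.0]`, a template identical to the audio scores 0. An alternating ±1 click pattern scores about 1.0 at `dt = 0.25`. A threshold of 0.5 therefore puts the click in the easily identifiable class.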

Preferably, the Fourier transform is specifically as follows:

wherein F(x) is the signal after the Fourier transform, i is the imaginary unit, e is the natural constant, and F(t) is the signal before the Fourier transform;

the inverse Fourier transform is specifically as follows:

wherein G(x) is the signal before the inverse Fourier transform, i is the imaginary unit, e is the natural constant, and G(t) is the signal after the inverse Fourier transform.
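The formulas themselves were lost in extraction. For reference only, the standard continuous-time Fourier pair in the angular-frequency convention is given below, with x as the transform variable as in the text. Lowercase f and g denote the time-domain signals. The 1/(2π) normalization is an assumption, because the disclosure's constants cannot be recovered. A digital implementation would use the discrete pair on the second line, where s[n] are the N samples of the audio digital signal and S[k] its spectrum.

```latex
F(x) = \int_{-\infty}^{+\infty} f(t)\, e^{-ixt}\,\mathrm{d}t,
\qquad
g(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} G(x)\, e^{ixt}\,\mathrm{d}x,
\\[4pt]
S[k] = \sum_{n=0}^{N-1} s[n]\, e^{-2\pi i k n / N},
\qquad
s[n] = \frac{1}{N}\sum_{k=0}^{N-1} S[k]\, e^{2\pi i k n / N}.
```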

Preferably, capturing the subordinate easily identifiable noise in the at least one sinusoidal audio signal and reducing its audio intensity to the preset intensity comprises the following steps:

acquiring the easily identifiable noise, and searching the at least one sinusoidal audio signal for subordinate easily identifiable noise whose difference from the easily identifiable noise is within a preset range;

obtaining a first fitting function of the subordinate easily identifiable noise, and adding to the at least one sinusoidal audio signal a noise inverse signal whose function image is the negative of the first fitting function;

superimposing the subordinate easily identifiable noise and the noise inverse signal to obtain a noise offset signal, and reducing the audio intensity of the noise offset signal to the preset intensity;

reducing the audio intensity of the first sinusoidal audio signal to the preset intensity comprises the following steps:

obtaining a second fitting function of the first sinusoidal audio signal, and adding to the at least one sinusoidal audio signal a sinusoidal inverse signal whose function image is the negative of the second fitting function;

superimposing the first sinusoidal audio signal and the sinusoidal inverse signal to obtain a noise elimination signal, and reducing the audio intensity of the noise elimination signal to the preset intensity.
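Both cancellation procedures (the noise inverse signal and the sinusoidal inverse signal) follow the same pattern, sketched below. The disclosure does not say how the fitting functions are obtained. This sketch therefore assumes the component's frequency is known and fits a·cos(ωn) + b·sin(ωn) by linear least squares. It also assumes that "reducing the audio intensity to the preset intensity" means capping the peak amplitude of whatever remains. Both choices are illustrative.

```python
import math

def fit_sinusoid(signal, freq, sample_rate):
    """Least-squares fit of a*cos(wn) + b*sin(wn) at a known frequency.

    Returns the fitted samples. freq must lie strictly between 0 and the
    Nyquist frequency, otherwise the 2x2 system below is singular.
    """
    w = 2 * math.pi * freq / sample_rate
    c = [math.cos(w * n) for n in range(len(signal))]
    s = [math.sin(w * n) for n in range(len(signal))]
    # Normal equations: [cc cs; cs ss] [a; b] = [yc; ys], solved by Cramer's rule.
    cc = sum(x * x for x in c)
    ss = sum(x * x for x in s)
    cs = sum(x * y for x, y in zip(c, s))
    yc = sum(y * x for y, x in zip(signal, c))
    ys = sum(y * x for y, x in zip(signal, s))
    det = cc * ss - cs * cs
    a = (yc * ss - ys * cs) / det
    b = (ys * cc - yc * cs) / det
    return [a * ci + b * si for ci, si in zip(c, s)]

def cancel_with_inverse(signal, fitted, preset_peak):
    """Superimpose the inverse (negated) fit, then cap what remains at preset_peak."""
    residual = [x + (-f) for x, f in zip(signal, fitted)]  # noise offset signal
    peak = max((abs(r) for r in residual), default=0.0)
    if peak <= preset_peak:
        return residual
    gain = preset_peak / peak  # attenuate so the residual stays at the preset intensity
    return [r * gain for r in residual]
```

When the noise really is a pure sinusoid at the assumed frequency, the fit matches it and the superposition leaves essentially nothing. Any remainder is only attenuated, not removed.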

Preferably, the disturbance processing of the second sinusoidal audio signal comprises the following steps:

obtaining a sine function fitting the second sinusoidal audio signal, and obtaining a characteristic function fitting the characteristic optimization audio signal;

subtracting the characteristic function from the sine function to obtain a compensation function, and adding the compensating inverse function (the negative of the compensation function) to the second sinusoidal audio signal to obtain the third sinusoidal audio signal, completing the disturbance.
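In sampled form the disturbance reduces to third = second − (sine function − characteristic function). The sketch below assumes both fitting functions have already been evaluated at the same sample times as the second sinusoidal audio signal. The helper `tone` is hypothetical and only builds test signals.

```python
import math

def tone(amplitude, freq, sample_rate, n_samples):
    """Hypothetical helper: a sampled sine wave used as test input."""
    return [amplitude * math.sin(2 * math.pi * freq * n / sample_rate) for n in range(n_samples)]

def perturb(second, sine_fit, feature_fit):
    """Disturbance: add the compensating inverse function to the second signal.

    compensation = sine_fit - feature_fit; third = second + (-compensation).
    """
    compensation = [s - c for s, c in zip(sine_fit, feature_fit)]
    return [x - comp for x, comp in zip(second, compensation)]  # third sinusoidal audio signal
```

If the sine function fits the second signal exactly, the third signal equals the characteristic function. Any feature the smooth fit does not capture, such as a one-sample click, passes into the third signal unchanged.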

Preferably, extracting audio key frames from the third sinusoidal audio signal and changing the abrupt-change parts of the third sinusoidal audio signal intercepted by the audio key frames comprises the following steps:

plotting the third sinusoidal audio signal and the characteristic optimization audio signal in the same coordinate system, with time on the horizontal axis and audio signal intensity on the vertical axis;

searching the coordinate system for at least one key time point at which the difference between the third sinusoidal audio signal and the characteristic optimization audio signal exceeds a preset range;

intercepting the third sinusoidal audio signal at each key time point as an audio key frame, and intercepting the characteristic optimization audio signal at the same key time point as a comparison key frame;

acquiring the value p of the third sinusoidal audio signal in the audio key frame and the value q of the characteristic optimization audio signal in the comparison key frame;

constructing a mutation cancellation signal whose value at each key time point is q − p and whose value is 0 at all points other than the key time points;

superimposing the mutation cancellation signal on the third sinusoidal audio signal and outputting the result, completing the correction.
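The key-frame correction can be sketched as below. The formula for the mutation cancellation signal did not survive extraction. This sketch therefore assumes it takes the value q − p at each key time point, which makes the superimposed output equal the characteristic optimization audio signal there. `preset_range` is a hypothetical threshold.

```python
def correct_key_frames(third, feature, preset_range):
    """Find key time points and superimpose a mutation cancellation signal h.

    Assumed form: h[n] = q - p at key points (p from the third signal,
    q from the characteristic optimization signal) and 0 elsewhere.
    """
    key_points = [n for n, (p, q) in enumerate(zip(third, feature)) if abs(p - q) > preset_range]
    h = [0.0] * len(third)
    for n in key_points:
        h[n] = feature[n] - third[n]  # q - p
    corrected = [p + d for p, d in zip(third, h)]
    return corrected, key_points
```

Only samples that deviate beyond `preset_range` are touched. Everywhere else h is zero, so the rest of the third signal passes through unchanged.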

The deep learning-based audio optimization processing system is used to implement the deep learning-based audio optimization processing method and comprises the following modules:

the audio processing module acquires the audio signal, digitizes the continuous signal into an audio digital signal, obtains the optimized audio digital signal, converts it back into an optimized audio signal, and outputs the optimized audio signal;

the Fourier processing module performs Fourier transformation and inverse Fourier transformation;

the deep learning module is used for establishing a background noise model and an audio optimization model based on deep learning;

the noise classification module is used for classifying the noise into easily-identified noise and difficultly-identified noise according to the noise characteristics;

the noise processing module reduces the audio intensity of the subordinate easy-to-recognize noise to a preset intensity, and reduces the audio intensity of the first sinusoidal audio signal to the preset intensity;

the audio optimization module performs disturbance processing on the second sinusoidal audio signal to obtain a third sinusoidal audio signal, and performs audio key frame extraction on the third sinusoidal audio signal to change the abrupt part of the third sinusoidal audio signal, which is intercepted by the audio key frame.

Compared with the prior art, the invention has the beneficial effects that:

by providing the deep learning module, the noise classification module, the noise processing module and the audio optimization module, the noise is classified into easily identifiable noise and difficult-to-identify noise, and two different noise reduction methods are applied to the two classes respectively. Noise reduction is therefore more accurate and also fast, and does not degrade the audio data. After the audio is decomposed by the Fourier transform, the decomposed signals are optimized using the standard audio in the audio optimization model, and decomposed signals that individually contain plosive points are further optimized by audio key frame interception. The audio can thus be optimized in a targeted manner, improving the audio quality.

Drawings

FIG. 1 is a schematic flow chart of an audio optimization processing method based on deep learning;

FIG. 2 is a schematic diagram of the sampling, quantization and encoding flow of the present invention;

FIG. 3 is a schematic flow chart of the background noise model establishment based on deep learning;

FIG. 4 is a schematic flow chart of the audio optimization model establishment method;

FIG. 5 is a schematic flow chart of classifying noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics in the present invention;

FIG. 6 is a schematic flow chart of capturing subordinate easily identifiable noise in the at least one sinusoidal audio signal and reducing its audio intensity to a preset intensity;

FIG. 7 is a schematic diagram of a disturbance processing flow for a second sinusoidal audio signal according to the present invention;

FIG. 8 is a schematic flow chart of extracting audio key frames from the third sinusoidal audio signal and changing the abrupt-change parts of the third sinusoidal audio signal intercepted by the audio key frames.

Detailed Description

The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art.

Referring to fig. 1, the audio optimization processing method based on deep learning includes:

acquiring at least one characteristic optimization audio signal by using the audio optimization model, and capturing, in the at least one sinusoidal audio signal, a second sinusoidal audio signal whose difference from the characteristic optimization audio signal is within a preset range. The second sinusoidal audio signal is necessarily the same kind of signal as the characteristic optimization audio signal, but differs from it. Since the characteristic optimization audio signal is a standard signal, the second sinusoidal audio signal is optimized on its basis: disturbance processing is performed on the second sinusoidal audio signal to obtain a third sinusoidal audio signal. The difference between the two signals is calculated by integrating the absolute value of their difference over their common definition domain;

Referring to fig. 2, the sampling, quantization and encoding includes the steps of:

taking a finite number of time points to complete sampling;

Referring to fig. 3, establishing the background noise model based on deep learning comprises the following steps:

sampling, quantizing and encoding the sample noise to obtain a noise digital signal;

that is, all noise that may be present in audio signals is collected as samples, so that during denoising the audio signal can be compared against them to find the noise it contains.

Referring to fig. 4, establishing the audio optimization model comprises the following steps:

acquiring various standard audio signals from big data;

sampling, quantizing and encoding the various standard audio signals to obtain standard audio digital signals;

during optimization, the audio digital signal is decomposed into at least one sinusoidal audio signal. Since various standard audio signals are stored in the audio optimization model in advance, the sinusoidal audio signals are optimized using these standard audio signals.

Referring to fig. 5, classifying the noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics comprises the following steps:

acquiring all noise digital signals in the background noise model;

if the integral difference value is larger than the preset difference value, the noise digital signal differs greatly from the audio digital signal and is easily identifiable noise;

if the integral difference value does not exceed the preset difference value, the difference between the noise digital signal and the audio digital signal is not obvious, and the noise digital signal is difficult-to-identify noise.

The Fourier transform is specifically as follows:

the inverse Fourier transform is specifically as follows:

Referring to fig. 6, capturing the subordinate easily identifiable noise in the at least one sinusoidal audio signal and reducing its audio intensity to the preset intensity comprises the following steps:

noise is present in the audio digital signal, so it is also present in the at least one sinusoidal audio signal obtained by decomposing the audio digital signal;

acquiring the easily identifiable noise, and searching the at least one sinusoidal audio signal for subordinate easily identifiable noise whose difference from the easily identifiable noise is within a preset range. Because the subordinate easily identifiable noise is close to the easily identifiable noise, it is necessarily noise;

obtaining a first fitting function of the subordinate easily identifiable noise, and adding to the at least one sinusoidal audio signal a noise inverse signal whose function image is the negative of the first fitting function. Since the noise inverse signal is in antiphase with the subordinate easily identifiable noise, the two cancel each other when superimposed;

Referring to fig. 7, the disturbance processing of the second sinusoidal audio signal comprises the following steps:

subtracting the characteristic function from the sine function to obtain a compensation function, and adding the compensating inverse function (the negative of the compensation function) to the second sinusoidal audio signal to obtain the third sinusoidal audio signal, completing the disturbance;

the disturbed third sinusoidal audio signal is still not exactly identical to the characteristic optimization audio signal, and it may contain a plosive, i.e., an abrupt change, at some time points. This is because the sine function is only an approximate fitting function of the second sinusoidal audio signal, and the characteristic function is only an approximate fitting function of the characteristic optimization audio signal. Neither function contains abrupt changes, so a disturbance built from them cannot remove an abrupt change present in the second sinusoidal audio signal, and that abrupt change remains in the resulting third sinusoidal audio signal.

Therefore, audio key frames are extracted for the abrupt changes in the third sinusoidal audio signal, and the abrupt changes are cancelled within the audio key frames.
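The argument above can be checked numerically. The sketch assumes a click, modeled as a single-sample spike, added to a sinusoid. It fits the sine function by least squares at the known frequency and then applies the disturbance. The frequencies, amplitudes and spike position are arbitrary illustrative values.

```python
import math

SR, N, FREQ = 6400, 64, 400  # 64 samples = exactly 4 periods of 400 Hz
W = 2 * math.pi * FREQ / SR

def least_squares_sine_fit(signal):
    """Fit a*cos(Wn) + b*sin(Wn).

    Over whole periods cos and sin are orthogonal, so least squares
    reduces to two independent projections.
    """
    c = [math.cos(W * n) for n in range(len(signal))]
    s = [math.sin(W * n) for n in range(len(signal))]
    a = sum(y * x for y, x in zip(signal, c)) / sum(x * x for x in c)
    b = sum(y * x for y, x in zip(signal, s)) / sum(x * x for x in s)
    return [a * ci + b * si for ci, si in zip(c, s)]

feature = [math.sin(W * n) for n in range(N)]         # smooth reference signal
second = [0.8 * math.sin(W * n) for n in range(N)]
second[17] += 0.6                                     # plosive: one-sample spike
sine_fit = least_squares_sine_fit(second)             # smooth, so it cannot model the spike
third = [x - (s - c) for x, s, c in zip(second, sine_fit, feature)]
deviation = [t - f for t, f in zip(third, feature)]   # what the disturbance left behind
```

Almost all of the 0.6 spike (about 0.581) survives in the third signal, while every other sample stays within about 0.019 of the reference. This is why a separate key-frame step is needed.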

Referring to fig. 8, extracting audio key frames from the third sinusoidal audio signal and changing the abrupt-change parts of the third sinusoidal audio signal intercepted by the audio key frames comprises the following steps:

The working process of the audio optimization processing system based on deep learning is as follows:

step one: the audio processing module acquires an audio signal, digitizes the continuous signal into an audio digital signal, the digitizing comprises sampling, quantizing and encoding, and the Fourier processing module decomposes the audio digital signal into at least one sinusoidal audio signal by using Fourier transformation;

step two: the deep learning module establishes a background noise model and an audio optimization model based on deep learning;

step three: the noise classification module obtains noise characteristics by using the background noise model and divides the noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics; the noise processing module captures subordinate easily identifiable noise in the at least one sinusoidal audio signal and reduces its audio intensity to the preset intensity, below which the human ear cannot perceive audio;

step four: for the difficult-to-identify noise, the Fourier processing module decomposes it into at least one sinusoidal noise signal by using the Fourier transform; the noise processing module captures, in the at least one sinusoidal audio signal, a first sinusoidal audio signal whose difference from the sinusoidal noise signal is within a preset range, and reduces the audio intensity of the first sinusoidal audio signal to the preset intensity;

step five: the audio optimization module acquires at least one characteristic optimization audio signal by using an audio optimization model, captures a second sinusoidal audio signal with a difference from the characteristic optimization audio signal within a preset range in the at least one sinusoidal audio signal, and performs disturbance processing on the second sinusoidal audio signal to obtain a third sinusoidal audio signal;

step six: the audio optimization module extracts audio key frames of the third sinusoidal audio signal and changes abrupt parts, intercepted by the audio key frames, in the third sinusoidal audio signal;

step seven: the Fourier processing module performs inverse Fourier transformation on the at least one adjusted sinusoidal audio signal, the audio processing module obtains an optimized audio digital signal, and the audio digital signal is reconverted into an optimized audio signal to output the optimized audio signal.
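The correspondence between modules and steps can be expressed as a simple stage pipeline. This is a hypothetical architecture sketch. The class and method names are not from the disclosure, and the example stages are trivial placeholders standing in for the real processing of each module.

```python
from typing import Callable, List, Tuple

Signal = List[float]
Stage = Callable[[Signal], Signal]

class AudioOptimizationSystem:
    """Runs registered stages in order and records which module handled each step.

    Hypothetical structure: the disclosure describes the modules only functionally.
    """

    def __init__(self) -> None:
        self.stages: List[Tuple[str, str, Stage]] = []

    def register(self, step: str, module: str, stage: Stage) -> "AudioOptimizationSystem":
        self.stages.append((step, module, stage))
        return self  # allows chained registration

    def run(self, signal: Signal) -> Tuple[Signal, List[str]]:
        trace: List[str] = []
        for step, module, stage in self.stages:
            signal = stage(signal)
            trace.append(f"{step}: {module}")
        return signal, trace

# Example wiring with placeholder stages. Step two (model building) is assumed
# to have been done beforehand, so it is not a signal stage here.
system = (
    AudioOptimizationSystem()
    .register("step one", "audio processing module", lambda s: [round(x, 2) for x in s])
    .register("step three", "noise processing module", lambda s: [0.0 if abs(x) < 0.05 else x for x in s])
    .register("step six", "audio optimization module", lambda s: [max(-1.0, min(1.0, x)) for x in s])
)
output, trace = system.run([0.013, 0.5, 1.7, -0.04])
```

Keeping each module behind the same signal-in, signal-out interface lets the stages be reordered or replaced independently. The trace records which module handled each step.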

Still further, the present solution also proposes a storage medium having a computer readable program stored thereon, the computer readable program executing the above-described deep learning-based audio optimization processing method when called.

It is understood that the storage medium may be a magnetic medium, e.g., a floppy disk, hard disk or magnetic tape; an optical medium such as a DVD; or a semiconductor medium such as a solid state disk (SSD).

In summary, the advantages of the invention are as follows. With the deep learning module, the noise classification module, the noise processing module and the audio optimization module, noise is classified into easily identifiable and difficult-to-identify noise, and each class is reduced by its own method. This makes noise reduction more accurate and fast without degrading the audio data. After Fourier decomposition, the decomposed signals are optimized against the standard audio in the audio optimization model, and decomposed signals that individually contain plosive points are corrected by audio key frame interception. The audio is thus optimized in a targeted manner and audio quality is improved.

The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made therein without departing from the spirit and scope of the invention, which is defined by the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. The audio optimization processing method based on deep learning is characterized by comprising the following steps of:

2. The deep learning based audio optimization processing method according to claim 1, wherein the sampling, quantizing and encoding comprises the steps of:

taking a finite number of time points to complete sampling;

3. The method for optimizing audio based on deep learning according to claim 2, wherein the step of establishing a background noise model based on the deep learning comprises the steps of:

4. The method for deep learning based audio optimization processing according to claim 3, wherein the step of creating the audio optimization model comprises the steps of:

acquiring various standard audio signals from big data;

5. The method for optimizing audio processing based on deep learning according to claim 4, wherein the classifying noise into easily identifiable noise and difficult-to-identify noise according to noise characteristics comprises the steps of:

acquiring all noise digital signals in a background noise model;

6. The method for optimizing audio based on deep learning according to claim 5, wherein the Fourier transform is specifically as follows:


the inverse Fourier transform is specifically as follows:


7. The deep learning based audio optimization processing method according to claim 6, wherein capturing the subordinate easily identifiable noise in the at least one sinusoidal audio signal and reducing its audio intensity to a preset intensity comprises the following steps:

8. The method of deep learning based audio optimization processing according to claim 7, wherein the perturbing the second sinusoidal audio signal comprises the steps of:

obtaining a sine function fitting the second sinusoidal audio signal, and obtaining a characteristic function fitting the characteristic optimization audio signal;

9. The method for deep learning based audio optimization processing of claim 8, wherein the performing audio key frame extraction on the third sinusoidal audio signal, and changing the abrupt part of the third sinusoidal audio signal that is intercepted by the audio key frame comprises the steps of:

10. A deep learning-based audio optimization processing system for implementing the deep learning-based audio optimization processing method according to any one of claims 1 to 9, comprising: