CN117238307A - Audio optimization processing method and system based on deep learning - Google Patents

Audio optimization processing method and system based on deep learning

Info

Publication number
CN117238307A
Authority
CN
China
Prior art keywords
audio
noise
signal
sinusoidal
audio signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311500231.4A
Other languages
Chinese (zh)
Other versions
CN117238307B (en
Inventor
刘耀明
翟立志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Cloudwinner Network Technology Co ltd
Original Assignee
Shenzhen Cloudwinner Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Cloudwinner Network Technology Co ltd filed Critical Shenzhen Cloudwinner Network Technology Co ltd
Priority to CN202311500231.4A priority Critical patent/CN117238307B/en
Publication of CN117238307A publication Critical patent/CN117238307A/en
Application granted granted Critical
Publication of CN117238307B publication Critical patent/CN117238307B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses an audio optimization processing method and system based on deep learning, relating to the technical field of audio data processing. The method comprises the following steps: acquiring an audio signal; establishing a background noise model and an audio optimization model based on deep learning; dividing noise into easily identifiable noise and hard-to-identify noise by using the background noise model, and reducing the audio intensity of the subordinate easily identifiable noise to a preset intensity; reducing the audio intensity of the first sinusoidal audio signal to the preset intensity; performing disturbance processing on the second sinusoidal audio signal to obtain a third sinusoidal audio signal; extracting an audio key frame from the third sinusoidal audio signal, and changing the abrupt part of the third sinusoidal audio signal that is intercepted by the audio key frame; and outputting the optimized audio signal. By providing a deep learning module, a noise classification module, a noise processing module and an audio optimization module, noise reduction becomes more accurate and the audio can be optimized in a targeted manner, thereby improving the audio effect.

Description

Audio optimization processing method and system based on deep learning
Technical Field
The invention relates to the technical field of audio data processing, in particular to an audio optimization processing method and system based on deep learning.
Background
With the development of internet technology, internet applications are developing rapidly. Such applications may include, but are not limited to, instant messaging applications, social networking service applications, voice communication applications, and the like. An internet application can be installed on a terminal such as a notebook computer, a mobile phone or a PAD, and a user on the terminal side can use it to conduct audio calls, such as voice calls and audio chats, with other users. Sound quality is an important factor affecting audio calls.
In practice, it has been found that when an audio system plays audio, the loudspeaker output is sometimes mixed with plosive sounds, pitch changes or distortion. Repeated experiments revealed the cause: the data processing module performs its optimization in a fixed, preset mode, that is, the same noise reduction optimization measures are applied indiscriminately to all received audio data, so that some audio data plays back worse after noise reduction optimization.
Disclosure of Invention
To solve the above technical problem, this technical solution provides an audio optimization processing method and system based on deep learning, which solves the problem identified in the background: because the same noise reduction optimization measures are applied indiscriminately to all received audio data, some audio data plays back worse after noise reduction optimization.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the audio optimization processing method based on deep learning comprises the following steps:
acquiring an audio signal and digitizing the continuous signal into an audio digital signal, the digitizing comprising sampling, quantizing and encoding, and decomposing the audio digital signal into at least one sinusoidal audio signal using a Fourier transform;
establishing a background noise model and an audio optimization model based on deep learning;
obtaining noise characteristics by using the background noise model, dividing the noise into easily identifiable noise and hard-to-identify noise according to the noise characteristics, capturing subordinate easily identifiable noise in the at least one sinusoidal audio signal, and reducing the audio intensity of the subordinate easily identifiable noise to a preset intensity, the human ear being unable to perceive audio whose intensity is below the preset intensity;
for the hard-to-identify noise, decomposing it into at least one sinusoidal noise signal using a Fourier transform, capturing, in the at least one sinusoidal audio signal, a first sinusoidal audio signal whose difference from the sinusoidal noise signal is within a preset range, and reducing the audio intensity of the first sinusoidal audio signal to the preset intensity, wherein the difference between the first sinusoidal audio signal and the sinusoidal noise signal is calculated by integrating the absolute value of their difference over the domain of definition of the two signals;
acquiring at least one feature optimization audio signal by using the audio optimization model, capturing, in the at least one sinusoidal audio signal, a second sinusoidal audio signal whose difference from the feature optimization audio signal is within a preset range, and performing disturbance processing on the second sinusoidal audio signal to obtain a third sinusoidal audio signal, wherein the difference between the second sinusoidal audio signal and the feature optimization audio signal is calculated by integrating the absolute value of their difference over the domain of definition of the two signals;
extracting an audio key frame from the third sinusoidal audio signal, and changing the abrupt part of the third sinusoidal audio signal that is intercepted by the audio key frame;
performing an inverse Fourier transform on the at least one adjusted sinusoidal audio signal to obtain an optimized audio digital signal, reconverting the audio digital signal into an optimized audio signal, and outputting the optimized audio signal.
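The difference measure used in the two capture steps above can be written compactly. The symbols below are assumptions introduced only for illustration and do not appear in the original text: s(t) is a sinusoidal audio signal, r(t) is the reference it is compared with (a sinusoidal noise signal or a feature optimization audio signal), Ω is their shared domain of definition, and ε is the preset range. Reading "within a preset range" as an upper bound is likewise an assumption.

```latex
D(s, r) = \int_{\Omega} \lvert s(t) - r(t) \rvert \, \mathrm{d}t ,
\qquad
s \text{ is captured when } D(s, r) \le \varepsilon .
```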
Preferably, the sampling, quantizing and encoding comprises the steps of:
plotting the audio signal as a continuous curve, with time on the horizontal axis and audio signal intensity on the vertical axis;
selecting a finite number of time points to complete the sampling;
collecting the audio signal intensity at each of those time points to complete the quantization;
encoding the resulting quantized data in a digital format recognizable by a computer.
Preferably, establishing the background noise model based on deep learning comprises the following steps:
acquiring various kinds of actual noise from big data and removing noise that cannot occur in audio signals, to obtain sample noise;
sampling, quantizing and encoding the sample noise to obtain a noise digital signal.
Preferably, establishing the audio optimization model comprises the following steps:
acquiring various standard audio signals from big data;
sampling, quantizing and encoding the standard audio signals to obtain standard audio digital signals.
Preferably, dividing the noise into easily identifiable noise and hard-to-identify noise according to the noise characteristics comprises the following steps:
acquiring all noise digital signals in the background noise model;
integrating the absolute value of the difference between the noise digital signal and the audio digital signal over the domain of definition of the two signals to obtain an integrated difference;
if the integrated difference is greater than a preset difference, the noise digital signal is easily identifiable noise;
if the integrated difference does not exceed the preset difference, the noise digital signal is hard-to-identify noise.
Preferably, the Fourier transform is specifically as follows:
where F(x) is the signal after the Fourier transform, i is the imaginary unit, e is the natural constant, and F(t) is the signal before the Fourier transform;
the inverse Fourier transform is specifically as follows:
where G(t) is the signal before the inverse Fourier transform, i is the imaginary unit, e is the natural constant, and G(x) is the signal after the inverse Fourier transform.
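The formulas themselves do not appear in this text. One standard transform pair that is consistent with the symbol definitions above is sketched here. The sign of the exponent, the 1/(2π) normalization, the infinite integration limits, and the lowercase names f and g (used only to separate each input or output from its capitalized counterpart) are assumptions, not the patent's stated form.

```latex
F(x) = \int_{-\infty}^{\infty} f(t)\, e^{-i x t}\, \mathrm{d}t ,
\qquad
g(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} G(t)\, e^{i x t}\, \mathrm{d}t .
```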
Preferably, capturing the subordinate easily identifiable noise in the at least one sinusoidal audio signal and reducing its audio intensity to the preset intensity comprises the following steps:
acquiring the easily identifiable noise, and searching the at least one sinusoidal audio signal for subordinate easily identifiable noise whose difference from the easily identifiable noise is within a preset range;
obtaining a first fitting function of the subordinate easily identifiable noise, and adding to the at least one sinusoidal audio signal a noise inverse signal whose function image is the negative of the first fitting function;
superposing the subordinate easily identifiable noise and the noise inverse signal to obtain a noise offset signal, the audio intensity of the noise offset signal being reduced to the preset intensity;
reducing the audio intensity of the first sinusoidal audio signal to the preset intensity comprises the following steps:
obtaining a second fitting function of the first sinusoidal audio signal, and adding to the at least one sinusoidal audio signal a sinusoidal inverse signal whose function image is the negative of the second fitting function;
superposing the first sinusoidal audio signal and the sinusoidal inverse signal to obtain a noise elimination signal, the audio intensity of the noise elimination signal being reduced to the preset intensity.
Preferably, the disturbance processing of the second sinusoidal audio signal comprises the following steps:
obtaining a sine function that fits the second sinusoidal audio signal, and obtaining a characteristic function that fits the feature optimization audio signal;
taking the difference between the sine function and the characteristic function to obtain a compensation function, and adding the compensating inverse function to the second sinusoidal audio signal to obtain the third sinusoidal audio signal, thereby completing the disturbance.
Preferably, extracting an audio key frame from the third sinusoidal audio signal and changing the abrupt part of the third sinusoidal audio signal that is intercepted by the audio key frame comprises the following steps:
plotting the third sinusoidal audio signal and the feature optimization audio signal in the same coordinate system, with time on the horizontal axis and audio signal intensity on the vertical axis;
searching the coordinate system for at least one key time point, at which the difference between the third sinusoidal audio signal and the feature optimization audio signal exceeds a preset range;
intercepting the third sinusoidal audio signal at the key time point as an audio key frame, and intercepting the feature optimization audio signal at the key time point as a comparison key frame;
acquiring the value p of the third sinusoidal audio signal in the audio key frame, and acquiring the value q of the feature optimization audio signal in the comparison key frame;
constructing the function of a mutation cancellation signal, which takes a value set from p and q at the key time point and the value 0 at every other point;
superposing the mutation cancellation signal and the third sinusoidal audio signal and outputting the result, thereby completing the correction.
The audio optimization processing system based on deep learning is used to implement the above audio optimization processing method based on deep learning, and comprises:
an audio processing module, which acquires the audio signal, digitizes the continuous signal into an audio digital signal, obtains the optimized audio digital signal, reconverts it into an optimized audio signal, and outputs the optimized audio signal;
a Fourier processing module, which performs the Fourier transform and the inverse Fourier transform;
a deep learning module, which establishes the background noise model and the audio optimization model based on deep learning;
a noise classification module, which divides the noise into easily identifiable noise and hard-to-identify noise according to the noise characteristics;
a noise processing module, which reduces the audio intensity of the subordinate easily identifiable noise to the preset intensity and reduces the audio intensity of the first sinusoidal audio signal to the preset intensity;
an audio optimization module, which performs disturbance processing on the second sinusoidal audio signal to obtain the third sinusoidal audio signal, extracts an audio key frame from the third sinusoidal audio signal, and changes the abrupt part of the third sinusoidal audio signal that is intercepted by the audio key frame.
Compared with the prior art, the invention has the beneficial effects that:
By providing the deep learning module, the noise classification module, the noise processing module and the audio optimization module, noise is classified into easily identifiable noise and hard-to-identify noise, and a separate noise reduction method is applied to each class. Noise reduction is therefore more accurate and also fast, and it does not affect the audio data. After the audio is decomposed by Fourier transform, the decomposed signals are optimized using the standard audio in the audio optimization model; for individual decomposed signals that contain plosive points, audio key frame interception is used for the optimization. The audio can thus be optimized in a targeted manner, improving the audio effect.
Drawings
FIG. 1 is a schematic flow chart of an audio optimization processing method based on deep learning;
FIG. 2 is a schematic diagram of the sampling, quantization and encoding flow of the present invention;
FIG. 3 is a schematic flow chart of the background noise model establishment based on deep learning;
FIG. 4 is a schematic flow chart of the audio optimization model establishment method;
FIG. 5 is a schematic flow chart of dividing noise into easily identifiable noise and hard-to-identify noise according to the noise characteristics;
FIG. 6 is a schematic flow chart of capturing subordinate easily identifiable noise in at least one sinusoidal audio signal and reducing its audio intensity to a preset intensity;
FIG. 7 is a schematic flow chart of the disturbance processing of the second sinusoidal audio signal;
FIG. 8 is a schematic flow chart of extracting an audio key frame from the third sinusoidal audio signal and changing the abrupt part of the third sinusoidal audio signal that is intercepted by the audio key frame.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art.
Referring to FIG. 1, the audio optimization processing method based on deep learning includes:
acquiring an audio signal and digitizing the continuous signal into an audio digital signal, the digitizing comprising sampling, quantizing and encoding, and decomposing the audio digital signal into at least one sinusoidal audio signal using a Fourier transform;
establishing a background noise model and an audio optimization model based on deep learning;
obtaining noise characteristics by using the background noise model, dividing the noise into easily identifiable noise and hard-to-identify noise according to the noise characteristics, capturing subordinate easily identifiable noise in the at least one sinusoidal audio signal, and reducing the audio intensity of the subordinate easily identifiable noise to a preset intensity, the human ear being unable to perceive audio whose intensity is below the preset intensity;
for the hard-to-identify noise, decomposing it into at least one sinusoidal noise signal using a Fourier transform, capturing, in the at least one sinusoidal audio signal, a first sinusoidal audio signal whose difference from the sinusoidal noise signal is within a preset range, and reducing the audio intensity of the first sinusoidal audio signal to the preset intensity, wherein the difference between the first sinusoidal audio signal and the sinusoidal noise signal is calculated by integrating the absolute value of their difference over the domain of definition of the two signals;
acquiring at least one feature optimization audio signal by using the audio optimization model, and capturing, in the at least one sinusoidal audio signal, a second sinusoidal audio signal whose difference from the feature optimization audio signal is within a preset range; the second sinusoidal audio signal is necessarily the same kind of signal as the feature optimization audio signal but differs from it, and because the feature optimization audio signal is a standard signal, the second sinusoidal audio signal is optimized on the basis of it: disturbance processing is performed on the second sinusoidal audio signal to obtain a third sinusoidal audio signal, wherein the difference between the second sinusoidal audio signal and the feature optimization audio signal is calculated by integrating the absolute value of their difference over the domain of definition of the two signals;
extracting an audio key frame from the third sinusoidal audio signal, and changing the abrupt part of the third sinusoidal audio signal that is intercepted by the audio key frame;
performing an inverse Fourier transform on the at least one adjusted sinusoidal audio signal to obtain an optimized audio digital signal, reconverting the audio digital signal into an optimized audio signal, and outputting the optimized audio signal.
Referring to FIG. 2, the sampling, quantizing and encoding include the following steps:
plotting the audio signal as a continuous curve, with time on the horizontal axis and audio signal intensity on the vertical axis;
selecting a finite number of time points to complete the sampling;
collecting the audio signal intensity at each of those time points to complete the quantization;
encoding the resulting quantized data in a digital format recognizable by a computer.
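The three digitizing steps can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 8000 Hz sampling rate, the 16-bit signed PCM format, the clipping to [-1.0, 1.0] and the 440 Hz test tone are all assumptions chosen for the example, and the function names are hypothetical.

```python
import math
import struct

def sample(signal, duration_s, rate_hz):
    """Evaluate a continuous signal at a finite set of evenly spaced time points."""
    count = round(duration_s * rate_hz)
    return [signal(n / rate_hz) for n in range(count)]

def quantize(samples, bits=16):
    """Clip each sample to [-1.0, 1.0] and map it to the nearest signed integer level."""
    peak = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * peak) for s in samples]

def encode(levels):
    """Pack 16-bit quantized levels as little-endian signed PCM bytes (assumed format)."""
    return struct.pack("<%dh" % len(levels), *levels)

def tone(t):
    """Hypothetical input: a 440 Hz sine at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 440.0 * t)

samples = sample(tone, duration_s=0.01, rate_hz=8000)
levels = quantize(samples)
pcm = encode(levels)
print(len(samples), len(pcm))
```

Each sample becomes two bytes here, so the encoded length is twice the number of time points; a different bit depth would need a different `struct` format character.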
Referring to FIG. 3, establishing the background noise model based on deep learning includes the following steps:
acquiring various kinds of actual noise from big data and removing noise that cannot occur in audio signals, to obtain sample noise;
sampling, quantizing and encoding the sample noise to obtain a noise digital signal;
that is, every noise that may be present in an audio signal is collected as a sample, so that noise in the audio signal can be found by comparison when the audio signal is denoised.
Referring to FIG. 4, establishing the audio optimization model includes the following steps:
acquiring various standard audio signals from big data;
sampling, quantizing and encoding the standard audio signals to obtain standard audio digital signals;
during optimization, the audio digital signal is decomposed into at least one sinusoidal audio signal; the standard audio signals are stored in the audio optimization model in advance, and the sinusoidal audio signals are optimized using these standard audio signals.
Referring to FIG. 5, dividing the noise into easily identifiable noise and hard-to-identify noise according to the noise characteristics includes the following steps:
acquiring all noise digital signals in the background noise model;
integrating the absolute value of the difference between the noise digital signal and the audio digital signal over the domain of definition of the two signals to obtain an integrated difference;
if the integrated difference is greater than a preset difference, the noise digital signal differs greatly from the audio digital signal and is easily identifiable noise;
if the integrated difference does not exceed the preset difference, the difference between the noise digital signal and the audio digital signal is not obvious, and the noise digital signal is hard-to-identify noise.
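The classification rule above can be illustrated with a short sketch. It approximates the integral by a Riemann sum over equally spaced samples, which is an assumption about how the integration would be carried out on digital signals; the function names, the toy signals and the threshold value are hypothetical.

```python
def integrated_difference(a, b, dt):
    """Riemann-sum approximation of the integral of |a(t) - b(t)| over the shared domain."""
    n = min(len(a), len(b))
    return sum(abs(a[k] - b[k]) for k in range(n)) * dt

def classify_noise(noise, audio, dt, preset_difference):
    """Label a noise digital signal by comparing its integrated difference with the preset."""
    if integrated_difference(noise, audio, dt) > preset_difference:
        return "easily identifiable"
    return "hard to identify"

audio = [0.0, 1.0, 0.0, -1.0]
similar_noise = [0.0, 0.9, 0.0, -0.9]    # close to the audio signal
distinct_noise = [1.0, -1.0, 1.0, 1.0]   # far from the audio signal
labels = [classify_noise(n, audio, dt=1.0, preset_difference=1.0)
          for n in (similar_noise, distinct_noise)]
print(labels)
```

Noise that closely tracks the audio yields a small integrated difference and is therefore the harder case, which matches the text's reasoning that such noise is not obviously different from the audio digital signal.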
The Fourier transform is specifically as follows:
where F(x) is the signal after the Fourier transform, i is the imaginary unit, e is the natural constant, and F(t) is the signal before the Fourier transform;
the inverse Fourier transform is specifically as follows:
where G(t) is the signal before the inverse Fourier transform, i is the imaginary unit, e is the natural constant, and G(x) is the signal after the inverse Fourier transform.
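For a sampled audio digital signal, the decomposition into sinusoidal components is usually done with the discrete Fourier transform rather than the continuous integral. The sketch below is an assumption about how that step might look, written as a direct O(n²) sum for clarity; the function names, the amplitude floor and the two-tone test signal are hypothetical, and a real system would use an FFT.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def components(x, floor=1e-9):
    """List (bin, amplitude, phase) for each sinusoid above the floor, skipping DC and Nyquist."""
    n = len(x)
    spectrum = dft(x)
    found = []
    for k in range(1, n // 2):
        amplitude = 2 * abs(spectrum[k]) / n
        if amplitude > floor:
            found.append((k, amplitude, cmath.phase(spectrum[k])))
    return found

n = 32
x = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 5 * t / n)
     for t in range(n)]
parts = components(x)
rebuilt = idft(dft(x))
print([k for k, _, _ in parts])
```

Each listed component corresponds to one "sinusoidal audio signal" in the sense of the method, and applying the inverse transform to the full spectrum reproduces the original samples up to rounding error.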
Referring to FIG. 6, capturing the subordinate easily identifiable noise in the at least one sinusoidal audio signal and reducing its audio intensity to the preset intensity includes the following steps:
noise is present in the audio digital signal, and therefore also in the at least one sinusoidal audio signal obtained by decomposing it;
acquiring the easily identifiable noise, and searching the at least one sinusoidal audio signal for subordinate easily identifiable noise whose difference from the easily identifiable noise is within a preset range; since the subordinate easily identifiable noise is close to the easily identifiable noise, it is necessarily noise;
obtaining a first fitting function of the subordinate easily identifiable noise, and adding to the at least one sinusoidal audio signal a noise inverse signal whose function image is the negative of the first fitting function; the noise inverse signal is opposite to the subordinate easily identifiable noise, so the two cancel each other when superposed;
superposing the subordinate easily identifiable noise and the noise inverse signal to obtain a noise offset signal, the audio intensity of the noise offset signal being reduced to the preset intensity;
reducing the audio intensity of the first sinusoidal audio signal to the preset intensity includes the following steps:
obtaining a second fitting function of the first sinusoidal audio signal, and adding to the at least one sinusoidal audio signal a sinusoidal inverse signal whose function image is the negative of the second fitting function;
superposing the first sinusoidal audio signal and the sinusoidal inverse signal to obtain a noise elimination signal, the audio intensity of the noise elimination signal being reduced to the preset intensity.
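The fit-and-cancel idea in these steps can be sketched as below. The text does not say how the fitting function is obtained, so the least-squares projection onto a sine and a cosine at a known frequency is an assumption, and it is exact only when the window spans a whole number of periods. The frequency, sampling rate, phase and preset intensity are hypothetical values.

```python
import math

def fit_sinusoid(samples, freq_hz, rate_hz):
    """Fit a*sin + b*cos at a known frequency by projecting onto the two basis signals."""
    n = len(samples)
    sin_b = [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]
    cos_b = [math.cos(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]
    a = 2.0 / n * sum(x * s for x, s in zip(samples, sin_b))
    b = 2.0 / n * sum(x * c for x, c in zip(samples, cos_b))
    return [a * s + b * c for s, c in zip(sin_b, cos_b)]

def cancel(samples, fitted):
    """Superpose the signal with the inverse (negated) fitting function."""
    inverse = [-v for v in fitted]
    return [x + inv for x, inv in zip(samples, inverse)]

rate_hz, freq_hz = 8000, 100.0
# Hypothetical noise component: 10 full periods of a 100 Hz sinusoid.
noise = [0.3 * math.sin(2 * math.pi * freq_hz * k / rate_hz + 0.7) for k in range(800)]
fitted = fit_sinusoid(noise, freq_hz, rate_hz)
residual = cancel(noise, fitted)
preset_intensity = 1e-6   # assumed threshold, for illustration only
peak = max(abs(v) for v in residual)
print(peak < preset_intensity)
```

The same two functions apply to the first sinusoidal audio signal: only the input changes, which is why the text describes the two cancellations in parallel.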
Referring to FIG. 7, the disturbance processing of the second sinusoidal audio signal includes the following steps:
obtaining a sine function that fits the second sinusoidal audio signal, and obtaining a characteristic function that fits the feature optimization audio signal;
taking the difference between the sine function and the characteristic function to obtain a compensation function, and adding the compensating inverse function to the second sinusoidal audio signal to obtain the third sinusoidal audio signal, thereby completing the disturbance;
the disturbed third sinusoidal audio signal is still not exactly identical to the feature optimization audio signal, and it may contain a plosive, i.e. an abrupt change, at some point. This is because the sine function is only an approximate fit of the second sinusoidal audio signal and the characteristic function is only an approximate fit of the feature optimization audio signal; neither function contains any abrupt sound, so a disturbance built from them cannot eliminate an abrupt sound in the second sinusoidal audio signal, and the abrupt sound remains in the resulting third sinusoidal audio signal.
Therefore, for an abrupt sound in the third sinusoidal audio signal, audio key frame extraction is performed and the abrupt sound is cancelled within the audio key frame.
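The disturbance step and its limitation can be made concrete with a small sketch. It assumes that the "compensating inverse function" is the compensation function with its sign reversed and that both fitted functions are already available as sample lists; the amplitudes, the spike position and the spike size are hypothetical.

```python
import math

def disturb(second, sine_fit, feature_fit):
    """Add the inverse of (sine_fit - feature_fit) to the second sinusoidal audio signal."""
    compensation = [s - c for s, c in zip(sine_fit, feature_fit)]
    inverse = [-v for v in compensation]   # assumed meaning of "compensating inverse function"
    return [x + inv for x, inv in zip(second, inverse)]

n = 64
sine_fit = [0.8 * math.sin(2 * math.pi * 4 * k / n) for k in range(n)]
feature_fit = [1.0 * math.sin(2 * math.pi * 4 * k / n) for k in range(n)]

# The second signal follows its smooth fit except for one abrupt spike (a plosive).
second = list(sine_fit)
second[20] += 0.5

third = disturb(second, sine_fit, feature_fit)
gap = [abs(t - f) for t, f in zip(third, feature_fit)]
print(round(gap[20], 6), max(g for k, g in enumerate(gap) if k != 20) < 1e-9)
```

Away from the spike the third signal matches the feature function, but the spike passes through unchanged because neither smooth fit contains it, which is the reason a separate key frame correction is needed.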
Referring to FIG. 8, extracting an audio key frame from the third sinusoidal audio signal and changing the abrupt part of the third sinusoidal audio signal that is intercepted by the audio key frame includes the following steps:
plotting the third sinusoidal audio signal and the feature optimization audio signal in the same coordinate system, with time on the horizontal axis and audio signal intensity on the vertical axis;
searching the coordinate system for at least one key time point, at which the difference between the third sinusoidal audio signal and the feature optimization audio signal exceeds a preset range;
intercepting the third sinusoidal audio signal at the key time point as an audio key frame, and intercepting the feature optimization audio signal at the key time point as a comparison key frame;
acquiring the value p of the third sinusoidal audio signal in the audio key frame, and acquiring the value q of the feature optimization audio signal in the comparison key frame;
constructing the function of a mutation cancellation signal, which takes a value set from p and q at the key time point and the value 0 at every other point;
superposing the mutation cancellation signal and the third sinusoidal audio signal and outputting the result, thereby completing the correction.
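The key frame correction can be sketched as follows. The value of the mutation cancellation signal at a key time point is not legible in this text; the sketch assumes it is q - p, so that superposing it with the third signal yields q at that point. That choice, the function names and the numeric values are all assumptions for illustration.

```python
import math

def find_key_points(third, feature, preset_range):
    """Indices where the third signal deviates from the feature signal by more than the preset."""
    return [k for k, (p, q) in enumerate(zip(third, feature)) if abs(p - q) > preset_range]

def mutation_cancellation(third, feature, key_points):
    """Signal that is q - p (assumed) at each key time point and 0 everywhere else."""
    signal = [0.0] * len(third)
    for k in key_points:
        p, q = third[k], feature[k]
        signal[k] = q - p
    return signal

def correct(third, feature, preset_range):
    """Superpose the mutation cancellation signal with the third signal."""
    keys = find_key_points(third, feature, preset_range)
    cancel_signal = mutation_cancellation(third, feature, keys)
    return [t + c for t, c in zip(third, cancel_signal)], keys

n = 64
feature = [math.sin(2 * math.pi * 4 * k / n) for k in range(n)]
third = list(feature)
third[20] += 0.5   # hypothetical plosive left over after the disturbance step

corrected, keys = correct(third, feature, preset_range=0.1)
print(keys)
```

Because the cancellation signal is zero away from the key time points, the rest of the third sinusoidal audio signal is left untouched and only the abrupt samples are changed.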
The audio optimization processing system based on deep learning is used to implement the above audio optimization processing method based on deep learning, and comprises:
an audio processing module, which acquires the audio signal, digitizes the continuous signal into an audio digital signal, obtains the optimized audio digital signal, reconverts it into an optimized audio signal, and outputs the optimized audio signal;
a Fourier processing module, which performs the Fourier transform and the inverse Fourier transform;
a deep learning module, which establishes the background noise model and the audio optimization model based on deep learning;
a noise classification module, which divides the noise into easily identifiable noise and hard-to-identify noise according to the noise characteristics;
a noise processing module, which reduces the audio intensity of the subordinate easily identifiable noise to the preset intensity and reduces the audio intensity of the first sinusoidal audio signal to the preset intensity;
an audio optimization module, which performs disturbance processing on the second sinusoidal audio signal to obtain the third sinusoidal audio signal, extracts an audio key frame from the third sinusoidal audio signal, and changes the abrupt part of the third sinusoidal audio signal that is intercepted by the audio key frame.
The working process of the audio optimization processing system based on deep learning is as follows:
Step one: the audio processing module acquires an audio signal and digitizes the continuous signal into an audio digital signal, the digitizing comprising sampling, quantizing and encoding, and the Fourier processing module decomposes the audio digital signal into at least one sinusoidal audio signal using a Fourier transform;
Step two: the deep learning module establishes the background noise model and the audio optimization model based on deep learning;
Step three: the noise classification module obtains noise characteristics by using the background noise model and divides the noise into easily identifiable noise and hard-to-identify noise according to the noise characteristics; the noise processing module captures subordinate easily identifiable noise in the at least one sinusoidal audio signal and reduces its audio intensity to the preset intensity, the human ear being unable to perceive audio whose intensity is below the preset intensity;
Step four: for the hard-to-identify noise, the Fourier processing module decomposes it into at least one sinusoidal noise signal using a Fourier transform, and the noise processing module captures, in the at least one sinusoidal audio signal, a first sinusoidal audio signal whose difference from the sinusoidal noise signal is within a preset range, and reduces the audio intensity of the first sinusoidal audio signal to the preset intensity;
Step five: the audio optimization module acquires at least one feature optimization audio signal by using the audio optimization model, captures, in the at least one sinusoidal audio signal, a second sinusoidal audio signal whose difference from the feature optimization audio signal is within a preset range, and performs disturbance processing on the second sinusoidal audio signal to obtain a third sinusoidal audio signal;
Step six: the audio optimization module extracts an audio key frame from the third sinusoidal audio signal and changes the abrupt part of the third sinusoidal audio signal that is intercepted by the audio key frame;
Step seven: the Fourier processing module performs an inverse Fourier transform on the at least one adjusted sinusoidal audio signal; the audio processing module obtains the optimized audio digital signal, reconverts it into an optimized audio signal, and outputs the optimized audio signal.
Further, this solution also provides a storage medium storing a computer-readable program which, when invoked, executes the deep-learning-based audio optimization processing method described above.
It is understood that the storage medium may be a magnetic medium, such as a floppy disk, hard disk, or magnetic tape; an optical medium, such as a DVD; or a semiconductor medium, such as a solid state disk (SSD).
In summary, the invention has the following advantages. By providing the deep learning module, the noise classification module, the noise processing module and the audio optimization module, the noise is classified into easily identifiable noise and difficult-to-identify noise, and a separate noise reduction method is applied to each class, so that noise reduction is more accurate and also fast, without affecting the audio data. After the audio is decomposed by Fourier transform, the decomposed signals are optimized against the standard audio in the audio optimization model; for individual decomposed signals that contain burst points, audio key frame interception is used, so that the audio is optimized in a targeted manner and the audio effect is improved.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above; the embodiments and descriptions merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A deep-learning-based audio optimization processing method, characterized by comprising the following steps:
acquiring an audio signal and digitizing the continuous signal into an audio digital signal, the digitizing comprising sampling, quantizing and encoding; decomposing the audio digital signal into at least one sinusoidal audio signal using a Fourier transform;
establishing a background noise model and an audio optimization model based on deep learning;
obtaining noise characteristics using the background noise model, classifying the noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics, capturing subordinate easily identifiable noise in the at least one sinusoidal audio signal, and reducing the audio intensity of the subordinate easily identifiable noise to a preset intensity, wherein the human ear cannot perceive audio whose intensity is below the preset intensity;
for the difficult-to-identify noise, decomposing it into at least one sinusoidal noise signal using the Fourier transform, capturing, in the at least one sinusoidal audio signal, a first sinusoidal audio signal whose difference from the sinusoidal noise signal is within a preset range, and reducing the audio intensity of the first sinusoidal audio signal to the preset intensity, wherein the difference between the first sinusoidal audio signal and the sinusoidal noise signal is calculated by integrating the absolute value of their difference over the domain of definition of the two signals;
acquiring at least one feature-optimized audio signal using the audio optimization model, capturing, in the at least one sinusoidal audio signal, a second sinusoidal audio signal whose difference from the feature-optimized audio signal is within a preset range, and applying perturbation processing to the second sinusoidal audio signal to obtain a third sinusoidal audio signal, wherein the difference between the second sinusoidal audio signal and the feature-optimized audio signal is calculated by integrating the absolute value of their difference over the domain of definition of the two signals;
extracting audio key frames from the third sinusoidal audio signal, and modifying the abrupt-change portions of the third sinusoidal audio signal captured by the audio key frames;
and performing an inverse Fourier transform on the at least one adjusted sinusoidal audio signal to obtain an optimized audio digital signal, converting the optimized audio digital signal back into an optimized audio signal, and outputting the optimized audio signal.
2. The deep-learning-based audio optimization processing method according to claim 1, wherein the sampling, quantizing and encoding comprise the following steps:
plotting the audio signal as a continuous curve, with time on the horizontal axis and audio signal intensity on the vertical axis;
selecting a finite number of time points to complete the sampling;
collecting the audio signal intensity at each of those time points to complete the quantization;
encoding the resulting quantized data and representing it in a computer-readable digital format.
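The sampling, quantizing and encoding steps of claim 2 can be illustrated with a short sketch. The 16-bit signed little-endian PCM encoding, the sample rate, and the function names `sample`, `quantize` and `encode` are assumptions chosen for the example; the claim itself does not fix a bit depth or byte format.

```python
import math
import struct

def sample(signal, duration_s, rate_hz):
    """Evaluate a continuous signal at a finite set of evenly spaced time points."""
    count = round(duration_s * rate_hz)
    return [signal(i / rate_hz) for i in range(count)]

def quantize(values, bits=16):
    """Map intensities in [-1.0, 1.0] to signed integer levels, clipping overshoot."""
    peak = 2 ** (bits - 1) - 1
    return [max(-peak, min(peak, round(v * peak))) for v in values]

def encode(levels):
    """Pack the integer levels as 16-bit little-endian PCM bytes."""
    return struct.pack("<%dh" % len(levels), *levels)

# Hypothetical input: 10 ms of a 440 Hz tone sampled at 8 kHz.
pcm = encode(quantize(sample(lambda t: math.sin(2 * math.pi * 440 * t), 0.01, 8000)))
```

Each sample occupies two bytes here, so the 80 samples of this example yield 160 bytes.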
3. The deep-learning-based audio optimization processing method according to claim 2, wherein establishing the background noise model based on deep learning comprises the following steps:
acquiring various actual noises from big data, and removing noises that cannot occur in audio signals, to obtain sample noise;
sampling, quantizing and encoding the sample noise to obtain noise digital signals.
4. The deep-learning-based audio optimization processing method according to claim 3, wherein establishing the audio optimization model comprises the following steps:
acquiring various standard audio signals from big data;
sampling, quantizing and encoding the standard audio signals to obtain standard audio digital signals.
5. The deep-learning-based audio optimization processing method according to claim 4, wherein classifying the noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics comprises the following steps:
acquiring all noise digital signals in the background noise model;
integrating the absolute value of the difference between the noise digital signal and the audio digital signal over the domain of definition of the two signals to obtain an integral difference value;
if the integral difference value is greater than a preset difference value, the noise digital signal is easily identifiable noise;
if the integral difference value does not exceed the preset difference value, the noise digital signal is difficult-to-identify noise.
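The classification rule of claim 5 can be sketched numerically. The trapezoidal approximation of the integral, the restriction to the overlapping samples of the two signals, and the names `integrated_difference` and `classify_noise` are assumptions of this example rather than details given in the claim.

```python
def integrated_difference(a, b, dt):
    """Trapezoidal approximation of the integral of |a(t) - b(t)| over the shared samples."""
    n = min(len(a), len(b))
    gaps = [abs(a[i] - b[i]) for i in range(n)]
    return sum((gaps[i] + gaps[i + 1]) * dt / 2 for i in range(n - 1))

def classify_noise(noise, audio, dt, preset_difference):
    """Label the noise according to whether the integral difference exceeds the preset value."""
    difference = integrated_difference(noise, audio, dt)
    if difference > preset_difference:
        return "easily identifiable"
    return "difficult to identify"

# Hypothetical signals sampled every 0.25 s.
audio = [0.0, 0.0, 0.0, 0.0, 0.0]
loud_noise = [1.0, 1.0, 1.0, 1.0, 1.0]
faint_noise = [0.1, 0.1, 0.1, 0.1, 0.1]
labels = (classify_noise(loud_noise, audio, 0.25, 0.5),
          classify_noise(faint_noise, audio, 0.25, 0.5))
```

Noise that differs strongly from the audio is labeled easily identifiable, while noise that stays close to the audio falls into the difficult-to-identify class and is handled by the Fourier-based path.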
6. The deep-learning-based audio optimization processing method according to claim 5, wherein the Fourier transform is specifically as follows:
wherein F(x) is the signal after the Fourier transform, i is the imaginary unit, e is the natural constant, and F(t) is the signal before the Fourier transform;
the inverse Fourier transform is specifically as follows:
wherein G(t) is the signal before the inverse Fourier transform, i is the imaginary unit, e is the natural constant, and G(x) is the signal after the inverse Fourier transform.
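Claim 6 names the symbols of the transform pair without displaying the formulas. One standard convention consistent with those symbol definitions is given below as an assumption, not as the patent's exact notation: the pre-transform signal is written in lower case as f(t), the post-inverse signal as g(x), and the 1/(2π) normalization is placed on the inverse transform.

```latex
F(x) = \int_{-\infty}^{\infty} f(t)\, e^{-ixt}\, \mathrm{d}t
\qquad
g(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} G(t)\, e^{ixt}\, \mathrm{d}t
```

Under this convention, applying the inverse transform to the output of the forward transform recovers the original signal.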
7. The deep-learning-based audio optimization processing method according to claim 6, wherein capturing the subordinate easily identifiable noise in the at least one sinusoidal audio signal and reducing its audio intensity to the preset intensity comprises the following steps:
acquiring the easily identifiable noise, and searching the at least one sinusoidal audio signal for subordinate easily identifiable noise whose difference from the easily identifiable noise is within a preset range;
obtaining a first fitting function of the subordinate easily identifiable noise, and adding to the at least one sinusoidal audio signal a noise inverse signal whose function image is the inverse of the first fitting function;
superimposing the subordinate easily identifiable noise and the noise inverse signal to obtain a noise offset signal, and reducing the audio intensity of the noise offset signal to the preset intensity;
and wherein reducing the audio intensity of the first sinusoidal audio signal to the preset intensity comprises the following steps:
obtaining a second fitting function of the first sinusoidal audio signal, and adding to the at least one sinusoidal audio signal a sinusoidal inverse signal whose function image is the inverse of the second fitting function;
superimposing the first sinusoidal audio signal and the sinusoidal inverse signal to obtain a noise elimination signal, and reducing the audio intensity of the noise elimination signal to the preset intensity.
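The fit-and-cancel idea of claim 7 can be sketched as follows, assuming the inverse signal is the sign-inverted fitting function and that the noise frequency is known in advance. The projection-based fit in `fit_sinusoid` is a least-squares fit only when the window holds a whole number of periods; both helper names are hypothetical.

```python
import math

def fit_sinusoid(samples, freq_hz, rate_hz):
    """Fit a*sin(wt) + b*cos(wt) at a known frequency by projecting onto sin and cos."""
    n = len(samples)
    w = 2 * math.pi * freq_hz / rate_hz
    a = 2 / n * sum(s * math.sin(w * i) for i, s in enumerate(samples))
    b = 2 / n * sum(s * math.cos(w * i) for i, s in enumerate(samples))
    return [a * math.sin(w * i) + b * math.cos(w * i) for i in range(n)]

def cancel(samples, fitted):
    """Superimpose the sign-inverted fit on the samples and return the residual."""
    inverse = [-v for v in fitted]
    return [s + v for s, v in zip(samples, inverse)]

# Hypothetical noise: a 5 Hz sinusoid with a phase offset, 1 s at 100 Hz (5 whole periods).
rate_hz, freq_hz = 100, 5
noise = [0.8 * math.sin(2 * math.pi * freq_hz * i / rate_hz + 0.3) for i in range(100)]
residual = cancel(noise, fit_sinusoid(noise, freq_hz, rate_hz))
peak = max(abs(v) for v in residual)
```

The residual peak can then be compared with the preset intensity; for a pure sinusoid the residual is at the level of floating-point error.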
8. The deep-learning-based audio optimization processing method according to claim 7, wherein applying perturbation processing to the second sinusoidal audio signal comprises the following steps:
obtaining a sine function fitted to the second sinusoidal audio signal, and obtaining a feature function fitted to the feature-optimized audio signal;
taking the difference between the sine function and the feature function to obtain a compensation function, and adding the inverse of the compensation function to the second sinusoidal audio signal to obtain the third sinusoidal audio signal, thereby completing the perturbation.
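The perturbation of claim 8 reduces to simple arithmetic on sampled functions. The sketch below assumes the compensation function is the sine fit minus the feature fit and that its inverse is the sign-inverted compensation; the name `perturb` and the sample values are illustrative only.

```python
def perturb(second, sine_fit, feature_fit):
    """Add the sign-inverted compensation (sine fit minus feature fit) to the second signal."""
    compensation = [s - c for s, c in zip(sine_fit, feature_fit)]
    return [x - k for x, k in zip(second, compensation)]

# Hypothetical samples: the sine fit matches the second signal exactly.
second = [0.0, 0.5, 1.0, 0.5]
sine_fit = [0.0, 0.5, 1.0, 0.5]
feature_fit = [0.0, 0.25, 0.75, 0.25]
third = perturb(second, sine_fit, feature_fit)
```

When the sine fit reproduces the second sinusoidal audio signal, the resulting third signal coincides with the feature function, which is the intended effect of the compensation.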
9. The deep-learning-based audio optimization processing method according to claim 8, wherein extracting audio key frames from the third sinusoidal audio signal and modifying the abrupt-change portions of the third sinusoidal audio signal captured by the audio key frames comprises the following steps:
plotting the third sinusoidal audio signal and the feature-optimized audio signal in the same coordinate system, with time on the horizontal axis and audio signal intensity on the vertical axis;
searching the coordinate system for at least one key time point, at which the difference between the third sinusoidal audio signal and the feature-optimized audio signal exceeds a preset range;
intercepting the third sinusoidal audio signal at the key time point as an audio key frame, and intercepting the feature-optimized audio signal at the key time point as a comparison key frame;
acquiring the value p of the third sinusoidal audio signal in the audio key frame and the value q of the feature-optimized audio signal in the comparison key frame;
constructing a function of a mutation elimination signal, which takes a value determined from p and q at the key time point and the value 0 at all points other than the key time point;
superimposing the mutation elimination signal on the third sinusoidal audio signal and outputting the result to complete the correction.
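The key-frame correction of claim 9 can be sketched on sampled signals. The value of the mutation elimination signal at a key time point is assumed here to be q - p, so that superposition moves the third signal onto the feature-optimized signal at that point; this choice, the name `correct_mutations`, and the sample values are assumptions of the example.

```python
def correct_mutations(third, feature, preset_range):
    """Find key time points and superimpose an elimination signal that is zero elsewhere."""
    key_points = [i for i, (p, q) in enumerate(zip(third, feature))
                  if abs(p - q) > preset_range]
    elimination = [0.0] * len(third)
    for i in key_points:
        elimination[i] = feature[i] - third[i]  # assumed value q - p at the key time point
    corrected = [p + e for p, e in zip(third, elimination)]
    return key_points, corrected

# Hypothetical samples with one abrupt spike at index 2.
third = [0.0, 0.25, 4.0, 0.5]
feature = [0.0, 0.25, 0.5, 0.25]
key_points, corrected = correct_mutations(third, feature, preset_range=1.0)
```

Only the spike is altered; samples whose deviation stays within the preset range pass through unchanged.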
10. A deep-learning-based audio optimization processing system for implementing the deep-learning-based audio optimization processing method according to any one of claims 1 to 9, comprising:
an audio processing module, which acquires the audio signal, digitizes the continuous signal into an audio digital signal, obtains the optimized audio digital signal, converts it back into an optimized audio signal, and outputs the optimized audio signal;
a Fourier processing module, which performs the Fourier transform and the inverse Fourier transform;
a deep learning module, which establishes the background noise model and the audio optimization model based on deep learning;
a noise classification module, which classifies the noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics;
a noise processing module, which reduces the audio intensity of the subordinate easily identifiable noise to the preset intensity and reduces the audio intensity of the first sinusoidal audio signal to the preset intensity;
an audio optimization module, which applies perturbation processing to the second sinusoidal audio signal to obtain the third sinusoidal audio signal, extracts audio key frames from the third sinusoidal audio signal, and modifies the abrupt-change portions of the third sinusoidal audio signal captured by the audio key frames.
CN202311500231.4A 2023-11-13 2023-11-13 Audio optimization processing method and system based on deep learning Active CN117238307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311500231.4A CN117238307B (en) 2023-11-13 2023-11-13 Audio optimization processing method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311500231.4A CN117238307B (en) 2023-11-13 2023-11-13 Audio optimization processing method and system based on deep learning

Publications (2)

Publication Number Publication Date
CN117238307A true CN117238307A (en) 2023-12-15
CN117238307B CN117238307B (en) 2024-02-09

Family

ID=89082901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311500231.4A Active CN117238307B (en) 2023-11-13 2023-11-13 Audio optimization processing method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN117238307B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627416A (en) * 2020-05-13 2020-09-04 广州国音智能科技有限公司 Audio noise elimination method, device, equipment and storage medium
CN112017630A (en) * 2020-08-19 2020-12-01 北京字节跳动网络技术有限公司 Language identification method and device, electronic equipment and storage medium
CN112509592A (en) * 2020-11-18 2021-03-16 广东美的白色家电技术创新中心有限公司 Electrical equipment, noise processing method and readable storage medium
CN112567458A (en) * 2018-08-16 2021-03-26 三菱电机株式会社 Audio signal processing system, audio signal processing method, and computer-readable storage medium
US11146607B1 (en) * 2019-05-31 2021-10-12 Dialpad, Inc. Smart noise cancellation
CN115134711A (en) * 2021-03-26 2022-09-30 Oppo广东移动通信有限公司 Noise reduction method and device for audio playing equipment, electronic equipment and storage medium
CN116543731A (en) * 2022-01-25 2023-08-04 深圳市理邦精密仪器股份有限公司 Noise reduction method and medical equipment
CN116567474A (en) * 2023-05-30 2023-08-08 深圳华钜芯半导体有限公司 Bluetooth headset intelligence judgement system of making an uproar falls based on sound data discernment
CN116801157A (en) * 2023-08-28 2023-09-22 深圳市鑫正宇科技有限公司 Wireless earphone assembly and signal processing method thereof
CN116825123A (en) * 2023-06-19 2023-09-29 广东保伦电子股份有限公司 Tone quality optimization method and system based on audio push
CN116884429A (en) * 2023-09-05 2023-10-13 深圳市极客空间科技有限公司 Audio processing method based on signal enhancement
CN116994552A (en) * 2023-09-28 2023-11-03 深圳市齐奥通信技术有限公司 Audio noise reduction method and system based on deep learning

Also Published As

Publication number Publication date
CN117238307B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
US8325909B2 (en) Acoustic echo suppression
CN110769111A (en) Noise reduction method, system, storage medium and terminal
CN106098078A (en) A kind of audio recognition method that may filter that speaker noise and system thereof
CN111107284B (en) Real-time generation system and generation method for video subtitles
TWI573133B (en) Audio signal processing system and method
CN107592600B (en) Pickup screening method and pickup device based on distributed microphones
US20230335148A1 (en) Speech Separation Method, Electronic Device, Chip, and Computer-Readable Storage Medium
CN117238307B (en) Audio optimization processing method and system based on deep learning
CN112185405B (en) Bone conduction voice enhancement method based on differential operation and combined dictionary learning
WO2024017110A1 (en) Voice noise reduction method, model training method, apparatus, device, medium, and product
CN112750426B (en) Voice analysis system of mobile terminal
CN113012073A (en) Training method and device for video quality improvement model
KR20110024969A (en) Apparatus for filtering noise by using statistical model in voice signal and method thereof
CN113674752A (en) Method and device for reducing noise of audio signal, readable medium and electronic equipment
CN110599406B (en) Image enhancement method and device
WO2022166738A1 (en) Speech enhancement method and apparatus, and device and storage medium
CN111968627B (en) Bone conduction voice enhancement method based on joint dictionary learning and sparse representation
CN115440240A (en) Training method for voice noise reduction, voice noise reduction system and voice noise reduction method
CN115083440A (en) Audio signal noise reduction method, electronic device, and storage medium
CN112750452A (en) Voice processing method, device and system, intelligent terminal and electronic equipment
CN108899041B (en) Voice signal noise adding method, device and storage medium
CN112562712A (en) Recording data processing method and system, electronic equipment and storage medium
CN113033767A (en) Knowledge distillation-based data compression recovery method and system for neural network
CN110958417A (en) Method for removing compression noise of video call video based on voice clue
CN117854514B (en) Wireless earphone communication decoding optimization method and system for sound quality fidelity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant