CN117238307A - Audio optimization processing method and system based on deep learning - Google Patents
- Publication number
- CN117238307A
- Application number
- CN202311500231.4A
- Authority
- CN
- China
- Prior art keywords
- audio
- noise
- signal
- sinusoidal
- audio signal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses an audio optimization processing method and system based on deep learning, relating to the technical field of audio data processing. The method comprises: acquiring an audio signal; establishing a background noise model and an audio optimization model based on deep learning; using the background noise model to divide noise into easily identifiable noise and difficult-to-identify noise, and reducing the audio intensity of the subordinate easily identifiable noise to a preset intensity; reducing the audio intensity of the first sinusoidal audio signal to the preset intensity; performing disturbance processing on the second sinusoidal audio signal to obtain a third sinusoidal audio signal; extracting audio key frames from the third sinusoidal audio signal and correcting the abrupt-change portions of the third sinusoidal audio signal captured by those key frames; and outputting the optimized audio signal. By providing a deep learning module, a noise classification module, a noise processing module and an audio optimization module, the invention makes noise reduction more accurate and optimizes audio in a targeted way, thereby improving audio quality.
Description
Technical Field
The invention relates to the technical field of audio data processing, in particular to an audio optimization processing method and system based on deep learning.
Background
With the development of internet technology, internet applications are developing rapidly. Internet applications may include, but are not limited to, instant messaging applications, social networking service applications and voice communication applications. They can be installed on terminals such as notebook computers, mobile phones and tablets (PADs), and a user can use them to make audio calls, such as voice calls and audio chats, with other users. Sound quality is an important factor affecting audio calls.
In practice, it has been found that when an audio system plays audio, the speaker output is sometimes accompanied by plosive sounds, pitch shifts or distortion. Repeated experiments traced this to the data processing module optimizing audio in a fixed, preset way: the same noise reduction measures are applied indiscriminately to all received audio data, so some audio data plays back worse after noise reduction optimization.
Disclosure of Invention
To solve this technical problem, the present technical solution provides an audio optimization processing method and system based on deep learning, which addresses the problem described in the background: because the same noise reduction measures are applied indiscriminately to all received audio data, some audio data plays back worse after noise reduction optimization.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the deep-learning-based audio optimization processing method comprises the following steps:
acquiring an audio signal and digitizing the continuous signal into an audio digital signal, the digitizing comprising sampling, quantizing and encoding; and decomposing the audio digital signal into at least one sinusoidal audio signal using a Fourier transform;
establishing a background noise model and an audio optimization model based on deep learning;
obtaining noise characteristics using the background noise model; dividing the noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics; capturing subordinate easily identifiable noise in the at least one sinusoidal audio signal; and reducing the audio intensity of the subordinate easily identifiable noise to a preset intensity, below which the human ear cannot perceive audio;
for the difficult-to-identify noise, decomposing it into at least one sinusoidal noise signal using the Fourier transform; capturing, among the at least one sinusoidal audio signal, a first sinusoidal audio signal whose difference from the sinusoidal noise signal is within a preset range; and reducing the audio intensity of the first sinusoidal audio signal to the preset intensity, where the difference between the first sinusoidal audio signal and the sinusoidal noise signal is computed by integrating the absolute value of their difference over the domain on which both are defined;
acquiring at least one feature-optimized audio signal using the audio optimization model; capturing, among the at least one sinusoidal audio signal, a second sinusoidal audio signal whose difference from the feature-optimized audio signal is within a preset range; and performing disturbance processing on the second sinusoidal audio signal to obtain a third sinusoidal audio signal, where the difference between the second sinusoidal audio signal and the feature-optimized audio signal is computed by integrating the absolute value of their difference over the domain on which both are defined;
extracting audio key frames from the third sinusoidal audio signal, and correcting the abrupt-change portions of the third sinusoidal audio signal captured by the audio key frames;
and performing an inverse Fourier transform on the adjusted at least one sinusoidal audio signal to obtain an optimized audio digital signal, then converting the optimized audio digital signal back into an optimized audio signal and outputting it.
Preferably, the sampling, quantizing and encoding comprise the following steps:
plotting the audio signal as a continuous curve, with time on the horizontal axis and audio signal intensity on the vertical axis;
taking a finite set of time points to complete sampling;
recording the audio signal intensity at each of those time points to complete quantization;
encoding the resulting quantized data and representing it in a digital format that a computer can recognize.
Preferably, establishing the background noise model based on deep learning comprises the following steps:
acquiring various real-world noises from big data and removing noises that cannot occur in audio signals, to obtain sample noises;
sampling, quantizing and encoding the sample noises to obtain noise digital signals.
Preferably, establishing the audio optimization model comprises the following steps:
acquiring various standard audio signals from big data;
and sampling, quantizing and encoding the standard audio signals to obtain standard audio digital signals.
Preferably, dividing the noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics comprises the following steps:
acquiring all noise digital signals in the background noise model;
integrating the absolute value of the difference between each noise digital signal and the audio digital signal over the domain on which both are defined, to obtain an integrated difference;
if the integrated difference is greater than a preset difference, the noise digital signal is easily identifiable noise;
if the integrated difference does not exceed the preset difference, the noise digital signal is difficult-to-identify noise.
Preferably, the Fourier transform is specifically as follows:
where F(x) is the signal after the Fourier transform, i is the imaginary unit, e is the natural constant, and f(t) is the signal before the Fourier transform;
the inverse Fourier transform is specifically as follows:
where G(t) is the signal before the inverse Fourier transform, i is the imaginary unit, e is the natural constant, and G(x) is the signal after the inverse Fourier transform.
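The formula images for both transforms are missing from this text. As a hedged reconstruction, assuming the common angular-frequency convention with the factor 1/(2π) placed on the inverse (the patent's actual normalization cannot be recovered here), the pair would read as follows. For readability the inverse is written with the same f/F symbols as the forward transform rather than with the G labels used above.

```latex
F(x) = \int_{-\infty}^{\infty} f(t)\, e^{-ixt}\, \mathrm{d}t,
\qquad
f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(x)\, e^{ixt}\, \mathrm{d}x
```

With this convention, applying the inverse to F(x) returns the original f(t), which is the property the method relies on when it reassembles the adjusted sinusoidal audio signals.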
Preferably, capturing the subordinate easily identifiable noise in the at least one sinusoidal audio signal and reducing its audio intensity to the preset intensity comprises the following steps:
acquiring the easily identifiable noise, and searching the at least one sinusoidal audio signal for subordinate easily identifiable noise whose difference from the easily identifiable noise is within a preset range;
obtaining a first fitting function of the subordinate easily identifiable noise, and adding to the at least one sinusoidal audio signal a noise inverse signal whose function image is the negative of the first fitting function;
superposing the subordinate easily identifiable noise and the noise inverse signal to obtain a noise offset signal, and reducing the audio intensity of the noise offset signal to the preset intensity;
reducing the audio intensity of the first sinusoidal audio signal to the preset intensity comprises the following steps:
obtaining a second fitting function of the first sinusoidal audio signal, and adding to the at least one sinusoidal audio signal a sinusoidal inverse signal whose function image is the negative of the second fitting function;
and superposing the first sinusoidal audio signal and the sinusoidal inverse signal to obtain a noise elimination signal, and reducing the audio intensity of the noise elimination signal to the preset intensity.
Preferably, performing disturbance processing on the second sinusoidal audio signal comprises the following steps:
obtaining a sine function that fits the second sinusoidal audio signal, and obtaining a characteristic function that fits the feature-optimized audio signal;
taking the difference between the sine function and the characteristic function to obtain a compensation function, and adding the inverse compensation function (the negative of the compensation function) to the second sinusoidal audio signal to obtain the third sinusoidal audio signal, completing the disturbance.
Preferably, extracting audio key frames from the third sinusoidal audio signal and correcting the abrupt-change portions of the third sinusoidal audio signal captured by the audio key frames comprises the following steps:
plotting the third sinusoidal audio signal and the feature-optimized audio signal in the same coordinate system, with time on the horizontal axis and audio signal intensity on the vertical axis;
searching the coordinate system for at least one key time point, at which the difference between the third sinusoidal audio signal and the feature-optimized audio signal exceeds a preset range;
intercepting the third sinusoidal audio signal at each key time point as an audio key frame, and intercepting the feature-optimized audio signal at the same key time point as a comparison key frame;
acquiring the value p of the third sinusoidal audio signal in the audio key frame, and the value q of the feature-optimized audio signal in the comparison key frame;
constructing an abrupt-change cancellation signal whose value is q - p at each key time point and 0 at all other time points;
and superposing the abrupt-change cancellation signal on the third sinusoidal audio signal and outputting the result, completing the correction.
The deep-learning-based audio optimization processing system is used to implement the deep-learning-based audio optimization processing method described above, and comprises:
an audio processing module, which acquires the audio signal, digitizes the continuous signal into an audio digital signal, obtains the optimized audio digital signal, converts it back into an optimized audio signal, and outputs the optimized audio signal;
a Fourier processing module, which performs the Fourier transform and the inverse Fourier transform;
a deep learning module, which establishes the background noise model and the audio optimization model based on deep learning;
a noise classification module, which divides the noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics;
a noise processing module, which reduces the audio intensity of the subordinate easily identifiable noise to the preset intensity, and reduces the audio intensity of the first sinusoidal audio signal to the preset intensity;
an audio optimization module, which performs disturbance processing on the second sinusoidal audio signal to obtain the third sinusoidal audio signal, and extracts audio key frames from the third sinusoidal audio signal to correct the abrupt-change portions captured by the key frames.
Compared with the prior art, the invention has the following beneficial effects:
by providing the deep learning module, the noise classification module, the noise processing module and the audio optimization module, the invention classifies noise into easily identifiable noise and difficult-to-identify noise and reduces each with its own noise reduction method. Noise reduction is therefore more accurate, is also fast, and does not affect the audio data. After the audio is decomposed by the Fourier transform, the decomposed signals are optimized using the standard audio in the audio optimization model, and decomposed signals that individually contain plosive points are further optimized by audio key frame interception. The audio can thus be optimized in a targeted way, improving audio quality.
Drawings
FIG. 1 is a schematic flow chart of the deep-learning-based audio optimization processing method;
FIG. 2 is a schematic flow chart of sampling, quantization and encoding according to the present invention;
FIG. 3 is a schematic flow chart of establishing the background noise model based on deep learning;
FIG. 4 is a schematic flow chart of establishing the audio optimization model;
FIG. 5 is a flow chart of dividing noise into easily identifiable noise and difficult-to-identify noise according to noise characteristics in the present invention;
FIG. 6 is a flow chart of capturing subordinate easily identifiable noise in the at least one sinusoidal audio signal and reducing its audio intensity to the preset intensity;
FIG. 7 is a schematic flow chart of disturbance processing of the second sinusoidal audio signal according to the present invention;
FIG. 8 is a schematic flow chart of extracting audio key frames from the third sinusoidal audio signal and correcting the abrupt-change portions of the third sinusoidal audio signal captured by the key frames.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art.
Referring to FIG. 1, the deep-learning-based audio optimization processing method comprises:
acquiring an audio signal and digitizing the continuous signal into an audio digital signal, the digitizing comprising sampling, quantizing and encoding; and decomposing the audio digital signal into at least one sinusoidal audio signal using a Fourier transform;
establishing a background noise model and an audio optimization model based on deep learning;
obtaining noise characteristics using the background noise model; dividing the noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics; capturing subordinate easily identifiable noise in the at least one sinusoidal audio signal; and reducing the audio intensity of the subordinate easily identifiable noise to a preset intensity, below which the human ear cannot perceive audio;
for the difficult-to-identify noise, decomposing it into at least one sinusoidal noise signal using the Fourier transform; capturing, among the at least one sinusoidal audio signal, a first sinusoidal audio signal whose difference from the sinusoidal noise signal is within a preset range; and reducing the audio intensity of the first sinusoidal audio signal to the preset intensity, where the difference between the first sinusoidal audio signal and the sinusoidal noise signal is computed by integrating the absolute value of their difference over the domain on which both are defined;
acquiring at least one feature-optimized audio signal using the audio optimization model, and capturing, among the at least one sinusoidal audio signal, a second sinusoidal audio signal whose difference from the feature-optimized audio signal is within a preset range. The second sinusoidal audio signal is necessarily the same kind of signal as the feature-optimized audio signal but differs from it; because the feature-optimized audio signal is a standard signal, the second sinusoidal audio signal is optimized against it by performing disturbance processing on the second sinusoidal audio signal to obtain a third sinusoidal audio signal. The difference between the second sinusoidal audio signal and the feature-optimized audio signal is computed by integrating the absolute value of their difference over the domain on which both are defined;
extracting audio key frames from the third sinusoidal audio signal, and correcting the abrupt-change portions of the third sinusoidal audio signal captured by the audio key frames;
and performing an inverse Fourier transform on the adjusted at least one sinusoidal audio signal to obtain an optimized audio digital signal, then converting the optimized audio digital signal back into an optimized audio signal and outputting it.
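The capture operations in these steps (of the first sinusoidal audio signal against the sinusoidal noise signals, and of the second against the feature-optimized audio signals) share one rule: keep the components whose integrated absolute difference from a reference lies within a preset range. A minimal sketch of that selection follows, assuming uniformly sampled components of equal length and a rectangle-rule integral; `capture`, `preset_range` and the sample data are illustrative, not taken from the patent.

```python
def integrated_abs_difference(a, b, dt):
    """Rectangle-rule approximation of the integral of |a(t) - b(t)| over shared samples."""
    return dt * sum(abs(u - v) for u, v in zip(a, b))  # zip stops at the shorter signal


def capture(components, references, dt, preset_range):
    """Indices of components whose difference from any reference is within preset_range."""
    return [
        i
        for i, comp in enumerate(components)
        if any(integrated_abs_difference(comp, ref, dt) <= preset_range for ref in references)
    ]


# Hypothetical sinusoidal audio signals and one sinusoidal noise signal, sampled every 0.25 s.
components = [
    [0.0, 1.0, 0.0, -1.0],
    [0.0, 0.2, 0.0, -0.2],
    [1.0, 1.0, 1.0, 1.0],
]
sinusoidal_noise = [[0.0, 0.25, 0.0, -0.25]]
first_signals = capture(components, sinusoidal_noise, dt=0.25, preset_range=0.1)
```

Only component 1 lies close enough to the noise to be captured here. Passing feature-optimized audio signals as `references` would select second sinusoidal audio signals in the same way.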
Referring to FIG. 2, the sampling, quantization and encoding comprise the following steps:
plotting the audio signal as a continuous curve, with time on the horizontal axis and audio signal intensity on the vertical axis;
taking a finite set of time points to complete sampling;
recording the audio signal intensity at each of those time points to complete quantization;
encoding the resulting quantized data and representing it in a digital format that a computer can recognize.
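A minimal sketch of the three digitizing steps, assuming 16-bit linear PCM and intensities normalized to [-1.0, 1.0]. The patent specifies no bit depth, sample rate or encoding format, so `MAX_LEVEL`, the 8 kHz rate and the 440 Hz test tone are illustrative choices.

```python
import math
import struct

MAX_LEVEL = 2 ** 15 - 1  # largest positive 16-bit signed sample value (32767)


def sample(signal, n_samples, rate_hz):
    """Sampling: evaluate the continuous signal at a finite set of time points."""
    return [signal(k / rate_hz) for k in range(n_samples)]


def quantize(samples):
    """Quantization: map each intensity in [-1.0, 1.0] to a 16-bit integer level."""
    return [max(-MAX_LEVEL, min(MAX_LEVEL, round(x * MAX_LEVEL))) for x in samples]


def encode(levels):
    """Encoding: pack the integer levels as little-endian 16-bit PCM bytes."""
    return struct.pack("<%dh" % len(levels), *levels)


def tone(t):
    # Hypothetical 440 Hz test tone at half of full scale.
    return 0.5 * math.sin(2 * math.pi * 440.0 * t)


samples = sample(tone, n_samples=80, rate_hz=8000)
levels = quantize(samples)
pcm = encode(levels)
```

Eighty samples at two bytes each give a 160-byte buffer; clamping in `quantize` keeps out-of-range intensities from overflowing the 16-bit format.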
Referring to FIG. 3, the background noise model is established based on deep learning through the following steps:
acquiring various real-world noises from big data and removing noises that cannot occur in audio signals, to obtain sample noises;
sampling, quantizing and encoding the sample noises to obtain noise digital signals;
that is, all noise that may be present in the audio signal is collected as samples, which are used for comparison to locate the noise in the audio signal when the audio signal is denoised.
Referring to FIG. 4, the audio optimization model is established through the following steps:
acquiring various standard audio signals from big data;
sampling, quantizing and encoding the standard audio signals to obtain standard audio digital signals;
during optimization, the audio digital signal is decomposed into at least one sinusoidal audio signal; because various standard audio signals are stored in the audio optimization model in advance, the sinusoidal audio signals are optimized using these standard audio signals.
Referring to FIG. 5, dividing the noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics comprises the following steps:
acquiring all noise digital signals in the background noise model;
integrating the absolute value of the difference between each noise digital signal and the audio digital signal over the domain on which both are defined, to obtain an integrated difference;
if the integrated difference is greater than a preset difference, the noise digital signal differs greatly from the audio digital signal and is easily identifiable noise;
if the integrated difference does not exceed the preset difference, the difference between the noise digital signal and the audio digital signal is not obvious, and the noise digital signal is difficult-to-identify noise.
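The classification rule can be sketched as follows, approximating the integral of the absolute difference with the trapezoidal rule over the samples both signals share. The sample spacing `dt`, the threshold `preset_difference` and the example noise names are hypothetical, since the patent gives no values.

```python
def integrated_difference(a, b, dt):
    """Trapezoidal approximation of the integral of |a(t) - b(t)| over the shared domain."""
    n = min(len(a), len(b))  # only the overlapping part of both signals is compared
    if n < 2:
        return 0.0
    diffs = [abs(a[i] - b[i]) for i in range(n)]
    return dt * (sum(diffs) - 0.5 * (diffs[0] + diffs[-1]))


def classify_noise(noise_library, audio, dt, preset_difference):
    """Split the noise model's samples into easily identifiable and difficult-to-identify noise."""
    easy, hard = [], []
    for name, noise in noise_library.items():
        if integrated_difference(noise, audio, dt) > preset_difference:
            easy.append(name)  # differs strongly from the audio, so easy to identify
        else:
            hard.append(name)  # close to the audio, so difficult to identify
    return easy, hard


audio = [0.0, 0.5, 1.0, 0.5, 0.0]
library = {"click": [1.0] * 5, "similar": [0.0, 0.5, 0.9, 0.5, 0.0]}  # hypothetical noise samples
easy, hard = classify_noise(library, audio, dt=1.0, preset_difference=1.0)
```

Here `easy` is `["click"]` (integrated difference 2.0) and `hard` is `["similar"]` (about 0.1): the noise that resembles the audio is the one that is hard to pick out.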
The Fourier transform is specifically as follows:
where F(x) is the signal after the Fourier transform, i is the imaginary unit, e is the natural constant, and f(t) is the signal before the Fourier transform;
the inverse Fourier transform is specifically as follows:
where G(t) is the signal before the inverse Fourier transform, i is the imaginary unit, e is the natural constant, and G(x) is the signal after the inverse Fourier transform.
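Because the signal has already been digitized, the transform actually applied is the discrete counterpart of the continuous pair. The following is a minimal pure-Python sketch, assuming a naive O(N^2) discrete Fourier transform (a real system would use an FFT library) and assuming that "decomposing into sinusoidal audio signals" means taking one real sinusoid per frequency bin; `sinusoidal_components` and the test signal are hypothetical.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_total = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total) for n in range(n_total))
            for k in range(n_total)]


def idft(spectrum):
    """Inverse DFT: x[n] = (1/N) * sum_k X[k] * exp(+2*pi*i*k*n/N)."""
    n_total = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_total) for k in range(n_total)) / n_total
            for n in range(n_total)]


def sinusoidal_components(x):
    """Split a real signal into one real sinusoid per DFT bin; their sum reproduces x."""
    n_total = len(x)
    spectrum = dft(x)
    components = []
    for k in range(n_total):
        single_bin = [0j] * n_total
        single_bin[k] = spectrum[k]
        # The real part of a single-bin inverse DFT is a real sinusoid at bin frequency k.
        components.append([v.real for v in idft(single_bin)])
    return components


signal = [math.sin(2 * math.pi * 2 * n / 16) + 0.3 * math.cos(2 * math.pi * 5 * n / 16)
          for n in range(16)]
restored = [v.real for v in idft(dft(signal))]
components = sinusoidal_components(signal)
```

The round trip `idft(dft(signal))` recovers the signal up to floating-point error, and summing the per-bin components does the same, which is what lets the adjusted components be reassembled into an optimized signal.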
Referring to FIG. 6, capturing the subordinate easily identifiable noise in the at least one sinusoidal audio signal and reducing its audio intensity to the preset intensity comprises the following steps:
noise is present in the audio digital signal, and therefore also in the at least one sinusoidal audio signal obtained by decomposing it;
acquiring the easily identifiable noise, and searching the at least one sinusoidal audio signal for subordinate easily identifiable noise whose difference from the easily identifiable noise is within a preset range; because the subordinate easily identifiable noise is close to the easily identifiable noise, it must itself be noise;
obtaining a first fitting function of the subordinate easily identifiable noise, and adding to the at least one sinusoidal audio signal a noise inverse signal whose function image is the negative of the first fitting function; the noise inverse signal is opposite in phase to the subordinate easily identifiable noise, so the two cancel each other when superposed;
superposing the subordinate easily identifiable noise and the noise inverse signal to obtain a noise offset signal, and reducing the audio intensity of the noise offset signal to the preset intensity;
reducing the audio intensity of the first sinusoidal audio signal to the preset intensity comprises the following steps:
obtaining a second fitting function of the first sinusoidal audio signal, and adding to the at least one sinusoidal audio signal a sinusoidal inverse signal whose function image is the negative of the second fitting function;
and superposing the first sinusoidal audio signal and the sinusoidal inverse signal to obtain a noise elimination signal, and reducing the audio intensity of the noise elimination signal to the preset intensity.
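A minimal sketch of inverse-signal cancellation followed by attenuation, applicable to both the subordinate easily identifiable noise and the first sinusoidal audio signal. It assumes that the fitting function is a least-squares sinusoid at a known frequency and that "reducing to the preset intensity" means scaling the residual so its peak does not exceed `preset_intensity`. The patent specifies neither, and all names and values here are hypothetical.

```python
import math


def fit_sinusoid(x, freq_hz, rate_hz):
    """Least-squares fit of a*sin(w*n) + b*cos(w*n) at an assumed, known frequency."""
    w = 2 * math.pi * freq_hz / rate_hz
    s = [math.sin(w * n) for n in range(len(x))]
    c = [math.cos(w * n) for n in range(len(x))]
    sss = sum(v * v for v in s)
    scc = sum(v * v for v in c)
    ssc = sum(u * v for u, v in zip(s, c))
    sxs = sum(u * v for u, v in zip(x, s))
    sxc = sum(u * v for u, v in zip(x, c))
    det = sss * scc - ssc * ssc  # zero for degenerate frequencies such as 0 or Nyquist
    a = (sxs * scc - sxc * ssc) / det
    b = (sxc * sss - sxs * ssc) / det
    return [a * u + b * v for u, v in zip(s, c)]


def cancel_and_attenuate(component, fitted, preset_intensity):
    """Add the inverse signal -fitted, then scale the residual so its peak <= preset_intensity."""
    residual = [x - f for x, f in zip(component, fitted)]  # superposition with -fitted
    peak = max(abs(v) for v in residual)
    if peak <= preset_intensity:
        return residual
    return [v * preset_intensity / peak for v in residual]


RATE = 1000  # hypothetical sample rate in Hz
noise = [0.8 * math.sin(2 * math.pi * 50 * n / RATE + 0.4) for n in range(RATE)]
component = [v + 0.2 for v in noise]  # the noise plus a hypothetical constant offset
out = cancel_and_attenuate(component, fit_sinusoid(component, 50, RATE), preset_intensity=0.05)
```

The inverse signal removes the fitted 50 Hz sinusoid almost exactly. The constant offset it cannot model is then scaled down to the preset intensity of 0.05.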
Referring to FIG. 7, disturbance processing of the second sinusoidal audio signal comprises the following steps:
obtaining a sine function that fits the second sinusoidal audio signal, and obtaining a characteristic function that fits the feature-optimized audio signal;
taking the difference between the sine function and the characteristic function to obtain a compensation function, and adding the inverse compensation function (the negative of the compensation function) to the second sinusoidal audio signal to obtain the third sinusoidal audio signal, completing the disturbance;
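A minimal sketch of the disturbance step. It assumes that both fits are single sinusoids at a known frequency completing a whole number of periods in the analysis window; the patent does not say how the sine and characteristic functions are fitted. `fit_sine`, `perturb` and the test signals are hypothetical. Adding the negated compensation gives third = second - (sine fit - characteristic fit), which pulls the second signal toward the feature-optimized one. The injected spike shows why an abrupt change survives: a smooth sinusoidal fit barely registers it.

```python
import math


def fit_sine(x, cycles):
    """Fit a*sin + b*cos completing `cycles` periods over the window (0 < cycles < len(x)/2)."""
    n_total = len(x)
    w = 2 * math.pi * cycles / n_total
    # Over a whole number of periods sin and cos are orthogonal, so projection is the LS fit.
    a = 2 / n_total * sum(v * math.sin(w * n) for n, v in enumerate(x))
    b = 2 / n_total * sum(v * math.cos(w * n) for n, v in enumerate(x))
    return [a * math.sin(w * n) + b * math.cos(w * n) for n in range(n_total)]


def perturb(second, feature, cycles):
    """Disturbance: add the negated compensation function to the second sinusoidal signal."""
    sine_fn = fit_sine(second, cycles)             # sine function fitted to the second signal
    characteristic_fn = fit_sine(feature, cycles)  # characteristic function fitted to the feature signal
    compensation = [s - c for s, c in zip(sine_fn, characteristic_fn)]
    return [x - comp for x, comp in zip(second, compensation)]  # third sinusoidal audio signal


n_samples = 64
feature = [math.sin(2 * math.pi * 4 * n / n_samples) for n in range(n_samples)]
second = [1.3 * math.sin(2 * math.pi * 4 * n / n_samples + 0.2) for n in range(n_samples)]
second[10] += 0.5  # hypothetical abrupt spike (plosive) in the second signal
third = perturb(second, feature, cycles=4)
```

Away from the spike, `third` matches `feature` to within 1/64. At sample 10 the difference is still 0.5 - 1/64, so the abrupt change has to be handled separately.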
The disturbed third sinusoidal audio signal is still not exactly identical to the feature-optimized audio signal, and may contain a plosive, i.e. an abrupt change, at some points. This is because the sine function is only an approximate fit of the second sinusoidal audio signal and the characteristic function is only an approximate fit of the feature-optimized audio signal. Neither function contains the abrupt sound, so a disturbance built from them cannot remove an abrupt sound present in the second sinusoidal audio signal, and that abrupt sound remains in the resulting third sinusoidal audio signal.
Therefore, audio key frames are extracted around the abrupt sounds in the third sinusoidal audio signal, and the abrupt sounds are cancelled within those key frames.
Referring to FIG. 8, extracting audio key frames from the third sinusoidal audio signal and correcting the abrupt-change portions of the third sinusoidal audio signal captured by the audio key frames comprises the following steps:
plotting the third sinusoidal audio signal and the feature-optimized audio signal in the same coordinate system, with time on the horizontal axis and audio signal intensity on the vertical axis;
searching the coordinate system for at least one key time point, at which the difference between the third sinusoidal audio signal and the feature-optimized audio signal exceeds a preset range;
intercepting the third sinusoidal audio signal at each key time point as an audio key frame, and intercepting the feature-optimized audio signal at the same key time point as a comparison key frame;
acquiring the value p of the third sinusoidal audio signal in the audio key frame, and the value q of the feature-optimized audio signal in the comparison key frame;
constructing an abrupt-change cancellation signal whose value is q - p at each key time point and 0 at all other time points;
and superposing the abrupt-change cancellation signal on the third sinusoidal audio signal and outputting the result, completing the correction.
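A minimal sketch of the key-frame correction. It assumes equal-length sampled signals, single-sample key frames, and a cancellation value of q - p, which makes the superposed output equal the feature-optimized value at each key time point. The function name and data are hypothetical.

```python
def correct_abrupt_changes(third, feature, preset_range):
    """Superpose an abrupt-change cancellation signal: q - p at key time points, 0 elsewhere."""
    key_points = [n for n, (p, q) in enumerate(zip(third, feature)) if abs(p - q) > preset_range]
    cancellation = [0.0] * len(third)  # zero at every non-key time point
    for n in key_points:
        p, q = third[n], feature[n]  # audio key frame value p, comparison key frame value q
        cancellation[n] = q - p
    corrected = [p + d for p, d in zip(third, cancellation)]
    return corrected, key_points


third = [0.0, 0.1, 0.9, 0.1, 0.0]    # hypothetical third sinusoidal audio signal with a spike
feature = [0.0, 0.1, 0.2, 0.1, 0.0]  # hypothetical feature-optimized audio signal
corrected, key_points = correct_abrupt_changes(third, feature, preset_range=0.3)
```

Only sample 2 exceeds the preset range, so it alone is replaced by the feature-optimized value. Every other sample passes through unchanged, which keeps the correction local to the abrupt change.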
The deep-learning-based audio optimization processing system is used to implement the deep-learning-based audio optimization processing method described above, and comprises:
an audio processing module, which acquires the audio signal, digitizes the continuous signal into an audio digital signal, obtains the optimized audio digital signal, converts it back into an optimized audio signal, and outputs the optimized audio signal;
a Fourier processing module, which performs the Fourier transform and the inverse Fourier transform;
a deep learning module, which establishes the background noise model and the audio optimization model based on deep learning;
a noise classification module, which divides the noise into easily identifiable noise and difficult-to-identify noise according to the noise characteristics;
a noise processing module, which reduces the audio intensity of the subordinate easily identifiable noise to the preset intensity, and reduces the audio intensity of the first sinusoidal audio signal to the preset intensity;
an audio optimization module, which performs disturbance processing on the second sinusoidal audio signal to obtain the third sinusoidal audio signal, and extracts audio key frames from the third sinusoidal audio signal to correct the abrupt-change portions captured by the key frames.
The deep-learning-based audio optimization processing system works as follows:
Step one: the audio processing module acquires an audio signal and digitizes the continuous signal into an audio digital signal, the digitizing comprising sampling, quantizing and encoding; the Fourier processing module decomposes the audio digital signal into at least one sinusoidal audio signal using the Fourier transform;
Step two: the deep learning module establishes the background noise model and the audio optimization model based on deep learning;
Step three: the noise classification module obtains noise characteristics using the background noise model and divides the noise into easily identifiable noise and difficult-to-identify noise accordingly; the noise processing module captures subordinate easily identifiable noise in the at least one sinusoidal audio signal and reduces its audio intensity to the preset intensity, below which the human ear cannot perceive audio;
Step four: for the difficult-to-identify noise, the Fourier processing module decomposes it into at least one sinusoidal noise signal using the Fourier transform; the noise processing module captures, among the at least one sinusoidal audio signal, a first sinusoidal audio signal whose difference from the sinusoidal noise signal is within a preset range, and reduces its audio intensity to the preset intensity;
Step five: the audio optimization module acquires at least one feature-optimized audio signal using the audio optimization model, captures, among the at least one sinusoidal audio signal, a second sinusoidal audio signal whose difference from the feature-optimized audio signal is within a preset range, and performs disturbance processing on it to obtain a third sinusoidal audio signal;
Step six: the audio optimization module extracts audio key frames from the third sinusoidal audio signal and corrects the abrupt-change portions of the third sinusoidal audio signal captured by the key frames;
Step seven: the Fourier processing module performs an inverse Fourier transform on the adjusted at least one sinusoidal audio signal, and the audio processing module obtains the optimized audio digital signal, converts it back into an optimized audio signal, and outputs it.
Further, the present solution also provides a storage medium storing a computer-readable program which, when called, executes the deep learning-based audio optimization processing method described above.
It is understood that the storage medium may be a magnetic medium such as a floppy disk, hard disk or magnetic tape; an optical medium such as a DVD; or a semiconductor medium such as a solid-state disk (SSD).
In summary, the invention offers the following advantages. By providing a deep learning module, a noise classification module, a noise processing module and an audio optimization module, noise is classified into easy-to-identify noise and hard-to-identify noise, and each class is reduced with its own noise reduction method, so noise reduction is more accurate while remaining fast and leaving the audio data unaffected. After the audio is decomposed by the Fourier transform, the decomposed signals are optimized against the standard audio in the audio optimization model, and decomposed signals that individually contain burst points are optimized by intercepting audio key frames. The audio can thus be optimized in a targeted way, improving the audio effect.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above, which, together with the description, merely illustrate its principles, and that various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.
Claims (10)
1. A deep learning-based audio optimization processing method, characterized by comprising the following steps:
acquiring an audio signal, digitizing the continuous signal into an audio digital signal, the digitizing comprising sampling, quantization and encoding, and decomposing the audio digital signal into at least one sinusoidal audio signal using a Fourier transform;
establishing, based on deep learning, a background noise model and an audio optimization model;
obtaining noise characteristics using the background noise model, classifying noise as easy-to-identify noise or hard-to-identify noise according to the noise characteristics, capturing subordinate easy-to-identify noise in the at least one sinusoidal audio signal, and reducing the audio intensity of the subordinate easy-to-identify noise to a preset intensity, the human ear being unable to perceive audio whose intensity is below the preset intensity;
for the hard-to-identify noise, decomposing it into at least one sinusoidal noise signal using the Fourier transform, capturing, among the at least one sinusoidal audio signal, a first sinusoidal audio signal whose difference from the sinusoidal noise signal lies within a preset range, and reducing the audio intensity of the first sinusoidal audio signal to the preset intensity, wherein the difference between the first sinusoidal audio signal and the sinusoidal noise signal is calculated by integrating the absolute value of their difference over the common domain of the two signals;
acquiring at least one feature-optimized audio signal using the audio optimization model, capturing, among the at least one sinusoidal audio signal, a second sinusoidal audio signal whose difference from the feature-optimized audio signal lies within a preset range, and perturbing the second sinusoidal audio signal to obtain a third sinusoidal audio signal, wherein the difference between the second sinusoidal audio signal and the feature-optimized audio signal is calculated by integrating the absolute value of their difference over the common domain of the two signals;
extracting audio key frames from the third sinusoidal audio signal, and correcting the abrupt portions of the third sinusoidal audio signal captured by the audio key frames;
and performing an inverse Fourier transform on the at least one adjusted sinusoidal audio signal to obtain an optimized audio digital signal, and converting the optimized audio digital signal back into an optimized audio signal for output.
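Claim 1 measures the "difference" between two signals as the integral of the absolute value of their difference over their common domain. A minimal sketch of that measure for sampled signals follows. It assumes both signals share one uniform time grid with step `dt`, approximates the integral with the trapezoidal rule, and treats the "preset range" as an upper bound on the result; the function names are illustrative and do not come from the patent.

```python
import math


def integral_abs_difference(a, b, dt):
    """Approximate the integral of |a(t) - b(t)| over the shared domain.

    Assumption: a and b are sampled on the same uniform grid with step dt,
    and the shared domain is the overlap of the two sample ranges.
    """
    n = min(len(a), len(b))
    if n < 2:
        return 0.0
    diffs = [abs(a[k] - b[k]) for k in range(n)]
    # Trapezoidal rule: interior samples weigh 1, the two end samples weigh 1/2.
    return dt * (sum(diffs) - 0.5 * (diffs[0] + diffs[-1]))


def within_preset_range(a, b, dt, preset):
    """True when the integral difference does not exceed the preset bound (assumed reading)."""
    return integral_abs_difference(a, b, dt) <= preset


if __name__ == "__main__":
    dt = 0.001
    t = [k * dt for k in range(1001)]  # one second of samples
    tone = [math.sin(2 * math.pi * 5 * x) for x in t]
    shifted = [v + 0.01 for v in tone]  # same tone with a small DC offset
    print(integral_abs_difference(tone, shifted, dt))  # approximately 0.01
    print(within_preset_range(tone, shifted, dt, preset=0.05))
```

The trapezoidal rule is one reasonable discretization; with a fine enough grid any standard quadrature gives nearly the same value.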
2. The deep learning-based audio optimization processing method according to claim 1, wherein the sampling, quantization and encoding comprise the following steps:
plotting the audio signal as a continuous curve, with time on the horizontal axis and audio signal intensity on the vertical axis;
taking a finite number of time points to complete sampling;
collecting the audio signal intensity at each of those time points to complete quantization;
encoding the resulting quantized data into a digital format that a computer can recognize.
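The three steps of claim 2 correspond to ordinary PCM digitization. The sketch below is illustrative only: it assumes a caller-chosen sampling rate, amplitudes normalized to [-1.0, 1.0], and 16-bit signed little-endian encoding, none of which the claim specifies. The input tone is a hypothetical example.

```python
import math
import struct

MAX_16 = 2 ** 15 - 1  # largest positive 16-bit signed level (assumed bit depth)


def sample(signal, fs, duration):
    """Sampling: evaluate a continuous-time function at a finite set of time points."""
    n = int(round(fs * duration))
    return [signal(k / fs) for k in range(n)]


def quantize(samples):
    """Quantization: map each amplitude in [-1.0, 1.0] to a 16-bit signed integer level."""
    levels = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip out-of-range amplitudes first
        levels.append(int(round(x * MAX_16)))
    return levels


def encode(levels):
    """Encoding: pack the levels as little-endian 16-bit PCM bytes."""
    return struct.pack("<%dh" % len(levels), *levels)


def tone(t):
    """Hypothetical input: a 440 Hz tone at half amplitude."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)


if __name__ == "__main__":
    pcm = encode(quantize(sample(tone, fs=8000, duration=0.01)))
    print(len(pcm))  # 80 samples x 2 bytes = 160
```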
3. The deep learning-based audio optimization processing method according to claim 2, wherein establishing the background noise model based on deep learning comprises the following steps:
acquiring various real-world noises from big data and removing noises that cannot occur in audio signals to obtain sample noise;
sampling, quantizing and encoding the sample noise to obtain noise digital signals.
4. The deep learning-based audio optimization processing method according to claim 3, wherein establishing the audio optimization model comprises the following steps:
acquiring various standard audio signals from big data;
and sampling, quantizing and encoding the standard audio signals to obtain standard audio digital signals.
5. The deep learning-based audio optimization processing method according to claim 4, wherein classifying noise as easy-to-identify noise or hard-to-identify noise according to the noise characteristics comprises the following steps:
acquiring all noise digital signals in the background noise model;
integrating the absolute value of the difference between each noise digital signal and the audio digital signal over their common domain to obtain an integral difference;
if the integral difference is greater than a preset difference, the noise digital signal is easy-to-identify noise;
if the integral difference does not exceed the preset difference, the noise digital signal is hard-to-identify noise.
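Claim 5 classifies each stored noise by comparing its integral difference against the audio with a preset difference. A minimal sketch follows, assuming the noise bank is a dictionary of sampled signals on the same grid as the audio and approximating the integral with the trapezoidal rule; `noise_bank`, `preset_difference` and the example noises are hypothetical. As the claim is worded, a larger integral difference marks a noise as easy to identify.

```python
import math


def integral_abs_difference(a, b, dt):
    """Trapezoidal approximation of the integral of |a - b| over the shared samples."""
    n = min(len(a), len(b))
    if n < 2:
        return 0.0
    diffs = [abs(a[k] - b[k]) for k in range(n)]
    return dt * (sum(diffs) - 0.5 * (diffs[0] + diffs[-1]))


def classify_noise(noise_bank, audio, dt, preset_difference):
    """Split noise names into (easy_to_identify, hard_to_identify) lists.

    Assumption: every noise in noise_bank is sampled on the same grid as audio.
    """
    easy, hard = [], []
    for name, noise in noise_bank.items():
        d = integral_abs_difference(noise, audio, dt)
        if d > preset_difference:
            easy.append(name)  # integral difference above the preset difference
        else:
            hard.append(name)  # integral difference does not exceed it
    return easy, hard


if __name__ == "__main__":
    dt = 0.01
    audio = [math.sin(2 * math.pi * 5 * k * dt) for k in range(101)]
    bank = {
        "near_copy": [v + 0.01 for v in audio],  # hypothetical noise close to the audio
        "flat_hum": [1.0] * 101,                 # hypothetical noise far from the audio
    }
    print(classify_noise(bank, audio, dt, preset_difference=0.5))
    # (['flat_hum'], ['near_copy'])
```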
6. The deep learning-based audio optimization processing method according to claim 5, wherein the Fourier transform is specifically:
$$F(x)=\int_{-\infty}^{+\infty} f(t)\,e^{-ixt}\,\mathrm{d}t,$$
where F(x) is the signal after the Fourier transform, i is the imaginary unit, e is the natural constant, and f(t) is the signal before the Fourier transform;
the inverse Fourier transform is specifically:
$$g(t)=\frac{1}{2\pi}\int_{-\infty}^{+\infty} G(x)\,e^{ixt}\,\mathrm{d}x,$$
where G(x) is the signal before the inverse Fourier transform, i is the imaginary unit, e is the natural constant, and g(t) is the signal after the inverse Fourier transform.
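For sampled audio, a continuous transform pair like the one in claim 6 is normally replaced by the discrete Fourier transform (DFT). The sketch below is an assumption, not the patent's implementation: it shows how a real signal can be decomposed into sinusoidal components (frequency bin, amplitude, phase) and rebuilt from them, which is the role the Fourier processing module plays at the start and end of the method. It uses a direct O(N^2) DFT for readability; a practical system would use an FFT.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def sinusoidal_components(x):
    """Decompose a real signal into (bin k, amplitude, phase) cosine components."""
    n = len(x)
    spectrum = dft(x)
    components = []
    for k in range(n // 2 + 1):
        # DC and (for even N) the Nyquist bin have no mirror partner; others pair with N-k.
        unpaired = k == 0 or (n % 2 == 0 and k == n // 2)
        scale = 1.0 if unpaired else 2.0
        components.append((k, scale * abs(spectrum[k]) / n, cmath.phase(spectrum[k])))
    return components


def synthesize(components, n):
    """Inverse step: sum the cosine components back into n samples."""
    return [sum(amp * math.cos(2 * math.pi * k * t / n + phase)
                for k, amp, phase in components)
            for t in range(n)]


if __name__ == "__main__":
    n = 32
    x = [0.5 + math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
    comps = sinusoidal_components(x)
    strongest = max(comps[1:], key=lambda c: c[1])
    print(strongest[0], round(strongest[1], 6))  # bin 3, amplitude 1.0
    y = synthesize(comps, n)
    print(max(abs(a - b) for a, b in zip(x, y)) < 1e-9)  # True
```

Each returned component is one "sinusoidal audio signal" in the claim's sense under this reading; adjusting or zeroing components and calling `synthesize` plays the part of the inverse transform.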
7. The deep learning-based audio optimization processing method according to claim 6, wherein capturing the subordinate easy-to-identify noise in the at least one sinusoidal audio signal and reducing its audio intensity to the preset intensity comprises the following steps:
acquiring the easy-to-identify noise, and searching the at least one sinusoidal audio signal for subordinate easy-to-identify noise whose difference from the easy-to-identify noise lies within a preset range;
obtaining a first fitting function of the subordinate easy-to-identify noise, and adding to the at least one sinusoidal audio signal a noise inverse signal whose function image is the negative of the first fitting function;
superposing the subordinate easy-to-identify noise and the noise inverse signal to obtain a noise offset signal, and reducing the audio intensity of the noise offset signal to the preset intensity;
reducing the audio intensity of the first sinusoidal audio signal to the preset intensity comprises the following steps:
obtaining a second fitting function of the first sinusoidal audio signal, and adding to the at least one sinusoidal audio signal a sinusoidal inverse signal whose function image is the negative of the second fitting function;
superposing the first sinusoidal audio signal and the sinusoidal inverse signal to obtain a noise elimination signal, and reducing the audio intensity of the noise elimination signal to the preset intensity.
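Claim 7 cancels a component by adding its inverse (the negated fitting function) and then bringing the residual down to the preset intensity. The sketch below rests on several assumptions: the component is a sinusoid of known frequency spanning a whole number of periods, so projecting onto sine and cosine yields its least-squares fit; "audio intensity" means peak absolute amplitude; and "reducing to the preset intensity" means scaling the residual down only when its peak exceeds the preset. All names and parameter values are hypothetical.

```python
import math


def fit_sinusoid(x, freq, fs):
    """Fit a*sin + b*cos at a known frequency by projection.

    Assumption: the window holds a whole number of periods and freq is neither
    0 nor the Nyquist frequency, so the projection equals the least-squares fit.
    """
    n = len(x)
    s = [math.sin(2 * math.pi * freq * k / fs) for k in range(n)]
    c = [math.cos(2 * math.pi * freq * k / fs) for k in range(n)]
    a = 2.0 / n * sum(xi * si for xi, si in zip(x, s))
    b = 2.0 / n * sum(xi * ci for xi, ci in zip(x, c))
    return [a * si + b * ci for si, ci in zip(s, c)]


def cancel_and_limit(x, fit, preset):
    """Superpose x with the inverse of fit, then cap the residual peak at preset."""
    inverse = [-v for v in fit]                         # the inverse signal
    residual = [xi + vi for xi, vi in zip(x, inverse)]  # superposition
    peak = max((abs(v) for v in residual), default=0.0)
    if peak > preset:  # assumption: intensity = peak absolute amplitude
        scale = preset / peak
        residual = [v * scale for v in residual]
    return residual


if __name__ == "__main__":
    fs, freq, n = 1000, 50, 200  # 200 samples = 10 whole periods of 50 Hz
    hum = [0.8 * math.sin(2 * math.pi * freq * k / fs + 0.3) for k in range(n)]
    residual = cancel_and_limit(hum, fit_sinusoid(hum, freq, fs), preset=0.01)
    print(max(abs(v) for v in residual) <= 0.01)  # True
```

The same two functions serve both halves of the claim: once with the subordinate noise (noise inverse signal) and once with the first sinusoidal audio signal (sinusoidal inverse signal).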
8. The deep learning-based audio optimization processing method according to claim 7, wherein perturbing the second sinusoidal audio signal comprises the following steps:
obtaining a sine function that fits the second sinusoidal audio signal, and obtaining a characteristic function that fits the feature-optimized audio signal;
subtracting the characteristic function from the sine function to obtain a compensation function, and adding the inverse of the compensation function to the second sinusoidal audio signal to obtain the third sinusoidal audio signal, completing the perturbation.
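The perturbation in claim 8 reduces to arithmetic on the fitted functions: with s the sine fit of the second signal and c the characteristic function of the feature-optimized signal, the compensation is d = s - c, and the third signal is the second signal plus -d. The sketch below works on sampled values and assumes both fits are already available; the signals in the example are hypothetical. When the second signal equals its sine fit, the result equals c; any residual the fit missed is carried over unchanged.

```python
import math


def perturb(second, sine_fit, characteristic):
    """Return second + (-(sine_fit - characteristic)), sample by sample.

    Assumption: all three sequences are sampled on the same time grid.
    """
    compensation = [s - c for s, c in zip(sine_fit, characteristic)]
    return [x - d for x, d in zip(second, compensation)]


if __name__ == "__main__":
    t = [k / 100 for k in range(100)]
    sine_fit = [math.sin(2 * math.pi * 5 * x) for x in t]
    # Hypothetical feature-optimized target: slightly quieter, slightly delayed.
    characteristic = [0.9 * math.sin(2 * math.pi * 5 * x - 0.1) for x in t]
    second = [v + 0.02 for v in sine_fit]  # second signal = fit + small residual
    third = perturb(second, sine_fit, characteristic)
    print(max(abs(a - (b + 0.02)) for a, b in zip(third, characteristic)) < 1e-12)  # True
```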
9. The deep learning-based audio optimization processing method according to claim 8, wherein extracting audio key frames from the third sinusoidal audio signal and correcting the abrupt portions of the third sinusoidal audio signal captured by the audio key frames comprises the following steps:
plotting the third sinusoidal audio signal and the feature-optimized audio signal in the same coordinate system, with time on the horizontal axis and audio signal intensity on the vertical axis;
searching the coordinate system for at least one key time point at which the difference between the third sinusoidal audio signal and the feature-optimized audio signal exceeds a preset range;
intercepting the third sinusoidal audio signal at each key time point as an audio key frame, and intercepting the feature-optimized audio signal at the key time point as a comparison key frame;
acquiring the value p of the third sinusoidal audio signal in the audio key frame and the value q of the feature-optimized audio signal in the comparison key frame;
constructing a mutation cancellation signal whose value is q - p at the key time points and 0 at all other points;
superposing the mutation cancellation signal on the third sinusoidal audio signal and outputting the result to complete the correction.
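The correction in claim 9 can be sketched on sampled signals as follows. This is one illustrative reading, assuming both signals share one time grid, that a "key time point" is any sample where the absolute difference exceeds a tolerance, and that each key frame is a single sample; the function names and example values are hypothetical. The cancellation signal takes the value q - p at key points and 0 elsewhere, so adding it to the third signal replaces p with q only where the abrupt change occurs.

```python
def find_key_points(third, feature, tolerance):
    """Indices where the third signal departs from the feature signal by more than tolerance."""
    return [k for k, (p, q) in enumerate(zip(third, feature)) if abs(p - q) > tolerance]


def mutation_cancellation(third, feature, key_points):
    """Signal equal to q - p at each key point and 0 at every other point."""
    h = [0.0] * len(third)
    for k in key_points:
        h[k] = feature[k] - third[k]
    return h


def correct(third, feature, tolerance):
    """Superpose the cancellation signal on the third signal; return (output, key points)."""
    keys = find_key_points(third, feature, tolerance)
    h = mutation_cancellation(third, feature, keys)
    return [p + d for p, d in zip(third, h)], keys


if __name__ == "__main__":
    feature = [0.0, 0.5, 1.0, 0.5, 0.0]
    third = [0.0, 0.5, 3.0, 0.45, 0.0]  # hypothetical burst at index 2
    print(correct(third, feature, tolerance=0.1))
    # ([0.0, 0.5, 1.0, 0.45, 0.0], [2])
```

Samples inside the tolerance (such as index 3 above) are left untouched, which matches the claim's requirement that the cancellation signal be zero away from key time points.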
10. A deep learning-based audio optimization processing system for implementing the deep learning-based audio optimization processing method according to any one of claims 1 to 9, comprising:
the audio processing module, which acquires the audio signal, digitizes the continuous signal into an audio digital signal, obtains the optimized audio digital signal, converts it back into an optimized audio signal, and outputs the optimized audio signal;
the Fourier processing module, which performs the Fourier transform and the inverse Fourier transform;
the deep learning module, which establishes the background noise model and the audio optimization model based on deep learning;
the noise classification module, which classifies noise as easy-to-identify noise or hard-to-identify noise according to the noise characteristics;
the noise processing module, which reduces the audio intensity of the subordinate easy-to-identify noise to the preset intensity and reduces the audio intensity of the first sinusoidal audio signal to the preset intensity;
the audio optimization module, which perturbs the second sinusoidal audio signal to obtain the third sinusoidal audio signal, extracts audio key frames from the third sinusoidal audio signal, and corrects the abrupt portions of the third sinusoidal audio signal captured by those key frames.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311500231.4A CN117238307B (en) | 2023-11-13 | 2023-11-13 | Audio optimization processing method and system based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117238307A true CN117238307A (en) | 2023-12-15 |
CN117238307B CN117238307B (en) | 2024-02-09 |
Family
ID=89082901
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311500231.4A Active CN117238307B (en) | 2023-11-13 | 2023-11-13 | Audio optimization processing method and system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117238307B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111627416A (en) * | 2020-05-13 | 2020-09-04 | 广州国音智能科技有限公司 | Audio noise elimination method, device, equipment and storage medium |
CN112017630A (en) * | 2020-08-19 | 2020-12-01 | 北京字节跳动网络技术有限公司 | Language identification method and device, electronic equipment and storage medium |
CN112509592A (en) * | 2020-11-18 | 2021-03-16 | 广东美的白色家电技术创新中心有限公司 | Electrical equipment, noise processing method and readable storage medium |
CN112567458A (en) * | 2018-08-16 | 2021-03-26 | 三菱电机株式会社 | Audio signal processing system, audio signal processing method, and computer-readable storage medium |
US11146607B1 (en) * | 2019-05-31 | 2021-10-12 | Dialpad, Inc. | Smart noise cancellation |
CN115134711A (en) * | 2021-03-26 | 2022-09-30 | Oppo广东移动通信有限公司 | Noise reduction method and device for audio playing equipment, electronic equipment and storage medium |
CN116543731A (en) * | 2022-01-25 | 2023-08-04 | 深圳市理邦精密仪器股份有限公司 | Noise reduction method and medical equipment |
CN116567474A * | 2023-05-30 | 2023-08-08 | 深圳华钜芯半导体有限公司 | Intelligent noise reduction judgment system for Bluetooth headsets based on sound data recognition |
CN116801157A (en) * | 2023-08-28 | 2023-09-22 | 深圳市鑫正宇科技有限公司 | Wireless earphone assembly and signal processing method thereof |
CN116825123A (en) * | 2023-06-19 | 2023-09-29 | 广东保伦电子股份有限公司 | Tone quality optimization method and system based on audio push |
CN116884429A (en) * | 2023-09-05 | 2023-10-13 | 深圳市极客空间科技有限公司 | Audio processing method based on signal enhancement |
CN116994552A (en) * | 2023-09-28 | 2023-11-03 | 深圳市齐奥通信技术有限公司 | Audio noise reduction method and system based on deep learning |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |