CN117153192B - Audio enhancement method, device, electronic equipment and storage medium

Info

Publication number
CN117153192B
CN117153192B CN202311413048.0A CN202311413048A CN117153192B CN 117153192 B CN117153192 B CN 117153192B CN 202311413048 A CN202311413048 A CN 202311413048A CN 117153192 B CN117153192 B CN 117153192B
Authority
CN
China
Prior art keywords
audio
enhancement
interval
enhanced
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311413048.0A
Other languages
Chinese (zh)
Other versions
CN117153192A (en)
Inventor
赵力
马峰
高建清
朱志鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Iflytek Suzhou Technology Co Ltd
Original Assignee
Iflytek Suzhou Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Iflytek Suzhou Technology Co Ltd
Priority to CN202311413048.0A
Publication of CN117153192A
Application granted
Publication of CN117153192B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/15 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being formant information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Tone Control, Compression And Expansion, Limiting Amplitude (AREA)

Abstract

The invention provides an audio enhancement method, an audio enhancement device, electronic equipment and a storage medium, and relates to the technical field of audio processing. The method comprises: performing coherent sound extraction on an audio signal to be enhanced to obtain a coherent sound signal and an ambient sound signal; determining a target audio enhancement threshold based on the ambient sound signal when the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is less than a preset correlation coefficient threshold; determining the target audio enhancement threshold based on a preset audio enhancement threshold when the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is greater than or equal to the preset correlation coefficient threshold; and enhancing the audio signal to be enhanced based on at least one audio enhancement parameter and the target audio enhancement threshold to obtain an enhanced audio signal. Because the target audio enhancement threshold is updated dynamically and in real time as the audio signal to be enhanced changes, the enhancement effect of each component of the audio source can be controlled better, which improves what the user hears and ultimately the user experience.

Description

Audio enhancement method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of audio processing technologies, and in particular, to an audio enhancement method, an audio enhancement device, an electronic device, and a storage medium.
Background
With the rapid development of audio-visual and data transmission technologies, people's requirements for audio keep rising. To improve the listening experience, the sounding effect of the audio source can be restored, or the original audio source can be enhanced, for example through bass enhancement or treble expansion, so that the audio output effect is more varied.
At present, a complete piece of audio is enhanced through fixed audio control parameters. For example, a music platform offers sound effect modes such as clear vocals and subwoofer; that is, one or more fixed EQs (equalizers) control the sound effect of the input audio source. However, because audio data changes constantly, enhancement with fixed audio control parameters cannot enhance the audio data accurately, which degrades the user's listening experience.
Disclosure of Invention
The invention provides an audio enhancement method, an audio enhancement device, electronic equipment and a storage medium, which are used for solving the defect of low audio enhancement accuracy in the prior art and realizing real-time high-accuracy audio enhancement processing.
The invention provides an audio enhancement method, which comprises the following steps:
performing coherent sound extraction on the audio signal to be enhanced to obtain a coherent sound signal and an ambient sound signal;
determining an initial audio enhancement threshold based on the ambient sound signal and determining the initial audio enhancement threshold as a target audio enhancement threshold under the condition that the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is smaller than a preset correlation coefficient threshold;
determining a target audio enhancement threshold based on a preset audio enhancement threshold under the condition that the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is greater than or equal to the preset correlation coefficient threshold, wherein the absolute value of the preset audio enhancement threshold is smaller than or equal to the absolute value of the initial audio enhancement threshold;
performing enhancement processing on the audio signal to be enhanced based on at least one audio enhancement parameter and the target audio enhancement threshold value to obtain an enhanced audio signal;
the target audio enhancement threshold is used for determining a part of audio signals from the audio signals to be enhanced, wherein the part of audio signals are signals which are required to be enhanced and correspond to at least one enhancement parameter in the at least one audio enhancement parameter.
According to the audio enhancement method provided by the invention, the audio signal to be enhanced is determined based on the following steps:
formant detection is carried out on the audio data to be enhanced to obtain a plurality of formants, and a frequency point set corresponding to the formants is determined, wherein the frequency point set comprises a plurality of frequency points corresponding to the formants;
determining a second target frequency point from the frequency point set based on a first target frequency point with the largest power in the frequency point set, wherein the second target frequency point is the frequency point in a first target frequency point set whose frequency differs least from that of the first target frequency point, and the first target frequency point set comprises the frequency points in the frequency point set whose power differs from that of the first target frequency point by more than a preset power difference value;
determining a frequency range of a current interval based on the frequency interval between the first target frequency point and the second target frequency point, and determining an interval midpoint of the current interval based on the first target frequency point;
determining a second target frequency point set corresponding to the current interval from the frequency point sets based on the frequency interval and the interval midpoint;
removing the frequency points of the second target frequency point set from the frequency point set, and returning to the step of determining the second target frequency point from the frequency point set based on the first target frequency point with the largest power in the frequency point set, until a preset condition is met, wherein the preset condition comprises that no frequency point remains in the frequency point set, or that the number of intervals determined so far reaches a preset number of intervals;
dividing the frequency of the audio data to be enhanced based on the middle point of each interval and the frequency range of each interval to obtain a plurality of subband sequence sets, determining any one of the subband sequence sets as the audio signal to be enhanced, respectively determining the target audio enhancement threshold corresponding to each subband sequence set, and respectively carrying out enhancement processing on each subband sequence set based on the target audio enhancement threshold corresponding to each subband sequence set.
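The iterative interval search described in the steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `partition_intervals` and the parameters `min_power_gap`, `max_intervals` and `default_width` are hypothetical, the interval midpoint is simplified to the first target frequency point itself, points on the interval boundary are treated as inside the interval, and a fallback width is assumed for the case (not covered by the text) in which no point differs enough in power.

```python
def partition_intervals(points, min_power_gap, max_intervals, default_width=100.0):
    """Sketch of the iterative interval search.

    points: list of (frequency_hz, power) pairs from formant detection.
    Returns a list of (midpoint, spacing) pairs, one per interval.
    """
    remaining = list(points)
    intervals = []
    # Stop when no points remain or the preset number of intervals is reached.
    while remaining and len(intervals) < max_intervals:
        # First target frequency point: largest power among remaining points.
        f1, p1 = max(remaining, key=lambda fp: fp[1])
        # First target set: points whose power differs from p1 by more than the gap.
        candidates = [(f, p) for f, p in remaining if p1 - p > min_power_gap]
        if candidates:
            # Second target frequency point: closest in frequency to f1.
            f2, _ = min(candidates, key=lambda fp: abs(fp[0] - f1))
            spacing = abs(f2 - f1)
        else:
            # Assumption: the text does not cover this case; use a fallback width.
            spacing = default_width
        midpoint = f1  # Simplification: midpoint taken as the first target point.
        intervals.append((midpoint, spacing))
        # Second target set: points inside the interval; remove them and repeat.
        remaining = [(f, p) for f, p in remaining if abs(f - midpoint) > spacing]
    return intervals


formants = [(500, 10), (700, 3), (520, 9), (2000, 8), (2300, 2)]
print(partition_intervals(formants, min_power_gap=5, max_intervals=4))
```

Each returned pair would then define one subband whose signal is enhanced with its own target audio enhancement threshold.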
According to the audio enhancement method provided by the invention, the determining the interval midpoint of the current interval based on the first target frequency point comprises the following steps:
determining a first interval endpoint of the current interval based on a difference value between the first target frequency point and the frequency interval, and determining a second interval endpoint of the current interval based on a sum value of the first target frequency point and the frequency interval;
taking, as a third interval endpoint of the current interval, whichever of the second interval endpoint of the previous interval and the first interval endpoint of the current interval has the larger frequency, wherein the previous interval is the interval determined before the current interval is determined, and if the current interval is the first determined interval, the second interval endpoint of the previous interval is 0;
and determining an interval midpoint of the current interval based on an average value of a third interval endpoint of the current interval and a second interval endpoint of the current interval.
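The endpoint and midpoint rules above amount to a few lines of arithmetic. The sketch below is illustrative only; the function name `interval_bounds` and its parameter names are hypothetical.

```python
def interval_bounds(first_target_freq, spacing, prev_upper=0.0):
    """Return (third_endpoint, second_endpoint, midpoint) for the current interval.

    prev_upper is the second interval endpoint of the previously determined
    interval, or 0 when the current interval is the first one.
    """
    first_endpoint = first_target_freq - spacing    # difference value
    second_endpoint = first_target_freq + spacing   # sum value
    # The interval starts at the larger of the two candidate endpoints.
    third_endpoint = max(prev_upper, first_endpoint)
    midpoint = (third_endpoint + second_endpoint) / 2.0
    return third_endpoint, second_endpoint, midpoint


# First interval: no previous interval, so prev_upper defaults to 0.
print(interval_bounds(500, 200))
# Later interval whose lower side is clipped by the previous upper endpoint.
print(interval_bounds(800, 200, prev_upper=700))
```

When the previous interval's upper endpoint is the larger value, the midpoint shifts upward away from the first target frequency point, as in the second call.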
According to the audio enhancement method provided by the invention, the determining of the second target frequency point from the frequency point set based on the first target frequency point with the largest power in the frequency point set further comprises:
and removing frequency points with power smaller than preset power in the frequency point set.
According to the audio enhancement method provided by the invention, the initial audio enhancement threshold value is determined based on the environmental sound signal, and the method comprises the following steps:
carrying out logarithmic calculation on the absolute value of the environmental sound signal to obtain a logarithmic value;
the initial audio enhancement threshold is determined based on the product of the logarithmic value and a preset value.
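As a rough illustration of the two steps above, the sketch below computes a per-frame threshold as a preset value times the logarithm of the absolute ambient level. Several details are assumptions that the text does not fix: the base-10 logarithm, the preset value of 20 (which gives a dB-like scale), the reduction of a frame to its mean absolute amplitude, and the small `eps` guard against log(0). The function name is hypothetical.

```python
import numpy as np


def initial_enhancement_threshold(ambient_frame, preset_value=20.0, eps=1e-12):
    """Sketch: threshold = preset_value * log10(mean |ambient|)."""
    frame = np.asarray(ambient_frame, dtype=float)
    # Assumption: summarise the frame by its mean absolute amplitude.
    level = np.mean(np.abs(frame))
    # eps avoids taking the logarithm of zero for a silent frame.
    return preset_value * np.log10(level + eps)


frame = np.full(256, 0.1)
print(initial_enhancement_threshold(frame))  # close to -20 for a level of 0.1
```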
According to the audio enhancement method provided by the invention, the audio signal to be enhanced comprises a plurality of frames of audio signals, the coherent sound signals comprise a plurality of frames of coherent sounds corresponding to the audio signals, the ambient sound signals comprise a plurality of frames of ambient sounds corresponding to the audio signals, and the target audio enhancement threshold comprises enhancement thresholds corresponding to a plurality of frames of the audio signals;
the enhancing the audio signal to be enhanced based on the at least one audio enhancement parameter and the target audio enhancement threshold value to obtain an enhanced audio signal includes:
smoothing the multi-frame enhancement threshold to obtain a processed multi-frame enhancement threshold;
performing enhancement processing on the audio signal to be enhanced based on at least one audio enhancement parameter and the processed multi-frame enhancement threshold value to obtain an enhanced audio signal;
wherein, any frame enhancement threshold after processing is determined based on the following steps:
taking any one frame as the current frame, performing weighted aggregation of the processed enhancement threshold corresponding to the previous frame and the enhancement threshold corresponding to the current frame, based on a first weight corresponding to the previous frame and a second weight corresponding to the current frame, to obtain the processed enhancement threshold corresponding to the current frame, wherein the sum of the first weight and the second weight is 1.
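The frame-by-frame smoothing described above is a first-order recursive weighted average. A minimal sketch follows; the function name `smooth_thresholds`, the default weight of 0.9 and the choice to pass the first frame through unchanged are assumptions, since the text does not specify them.

```python
def smooth_thresholds(thresholds, prev_weight=0.9):
    """Recursively smooth per-frame enhancement thresholds.

    prev_weight is the first weight (previous frame); the second weight
    (current frame) is 1 - prev_weight so that the two sum to 1.
    """
    cur_weight = 1.0 - prev_weight
    smoothed = []
    for i, t in enumerate(thresholds):
        if i == 0:
            # Assumption: the first frame has no predecessor and is kept as is.
            smoothed.append(float(t))
        else:
            smoothed.append(prev_weight * smoothed[-1] + cur_weight * t)
    return smoothed


print(smooth_thresholds([0.0, -10.0, -10.0], prev_weight=0.5))
```

A larger first weight makes the threshold change more slowly between frames, which avoids abrupt jumps when consecutive frames switch between the ambient-based threshold and the preset threshold.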
According to the audio enhancement method provided by the invention, after the audio signal to be enhanced is enhanced based on at least one audio enhancement parameter and the target audio enhancement threshold value to obtain an enhanced audio signal, the method further comprises the following steps:
and processing the enhanced audio signal based on preset audio control parameters to obtain a processed enhanced audio signal.
The present invention also provides an audio enhancement apparatus comprising:
the audio extraction module is used for carrying out coherent sound extraction on the audio signal to be enhanced to obtain a coherent sound signal and an environmental sound signal;
a first determining module, configured to determine an initial audio enhancement threshold based on the ambient sound signal and determine the initial audio enhancement threshold as a target audio enhancement threshold when a cross-correlation coefficient of the coherent sound signal and the ambient sound signal is less than a preset correlation coefficient threshold;
a second determining module, configured to determine a target audio enhancement threshold based on a preset audio enhancement threshold when a cross-correlation coefficient of the coherent sound signal and the ambient sound signal is greater than or equal to the preset correlation coefficient threshold, where an absolute value of the preset audio enhancement threshold is less than or equal to an absolute value of the initial audio enhancement threshold;
The audio enhancement module is used for enhancing the audio signal to be enhanced based on at least one audio enhancement parameter and the target audio enhancement threshold value to obtain an enhanced audio signal;
the target audio enhancement threshold is used for determining a part of audio signals from the audio signals to be enhanced, wherein the part of audio signals are signals which are required to be enhanced and correspond to at least one enhancement parameter in the at least one audio enhancement parameter.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing any of the above-described audio enhancement methods when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the audio enhancement method as described in any of the above.
With the audio enhancement method, the device, the electronic equipment and the storage medium provided by the invention, coherent sound extraction is performed on the audio signal to be enhanced to obtain a coherent sound signal and an ambient sound signal. When the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is smaller than the preset correlation coefficient threshold, the target audio enhancement threshold is determined based on the ambient sound signal; when the cross-correlation coefficient is greater than or equal to the preset correlation coefficient threshold, the target audio enhancement threshold is determined based on the preset audio enhancement threshold. The target audio enhancement threshold is thus set dynamically and in real time according to the degree of difference between the coherent sound and the ambient sound, and, when it is determined from the ambient sound signal, it changes as the audio signal to be enhanced changes. Compared with enhancing a complete piece of audio with fixed parameters, the target audio enhancement threshold is updated in real time as the audio signal to be enhanced changes, so that the part of the audio signal that needs to be enhanced for at least one audio enhancement parameter is determined more accurately. Therefore, the enhancement effect of each component of the audio source can be controlled better and a better rendering effect is achieved, which improves what the user hears and ultimately the user experience.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an audio enhancement method according to the present invention;
FIG. 2 is a second flow chart of the audio enhancement method according to the present invention;
FIG. 3 is a third flow chart of the audio enhancement method according to the present invention;
FIG. 4 is a schematic structural diagram of an audio enhancement device according to the present invention;
FIG. 5 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
With the rapid development of audio-video and data transmission technologies, people demand more of audio quality and the listening experience, and pay more attention to audio-visual products such as high-quality home cinemas, in-car surround sound and smart headphones for their own enjoyment. To improve the listening experience, the sounding effect of the audio source can be restored, or the original audio source can be enhanced, for example through bass enhancement or treble expansion, so that the audio output is more varied, offers more playability and immerses the user in the atmosphere presented by the audio source.
At present, a complete piece of audio is enhanced through fixed audio control parameters. For example, a music platform offers sound effect modes such as clear vocals and super bass; that is, one or more fixed EQs (equalizers) control the sound effect of the input audio source, so that the performance of the voice or the instruments in a song can be adjusted to some effect. However, because audio data changes constantly, enhancement with fixed audio control parameters cannot enhance the audio data accurately, which degrades the user's listening experience.
In addition, music platforms offer scene modes that can be switched among, for example, concert, theater and scene. Such a mode only renders the spatial sense of the music, making the perceived space wider, and does not provide more comprehensive sound effect control over the audio.
In addition, audio enhancement can also be performed with deep learning. For example, the input audio is segmented and fed to a trained classification network that classifies it into voice, music, noise and so on; each of the three types is then enhanced with a trained enhancement network, and the results are finally merged into the output. As another example, an equalization feature sequence and a reverberation feature sequence are obtained from the input audio through an inference model, and the audio is processed and output in combination with preset sound effect information. However, the performance of such data-driven models depends heavily on the sample data, and preparing that data is a very large undertaking that also involves copyright issues. Moreover, the computing power and storage that the model requires must be considered; the effect of a model is generally bounded by a minimum parameter count, so results on low-compute platforms may be unsatisfactory.
In addition, consider an actual scene such as in-car playback. Stereo audio streams obtained from the major music platforms are generally played through the car's various high-, mid- and low-frequency speakers, whose frequency response characteristics are fixed when the vehicle is produced. The musical performance or rendering effect is therefore fixed and the listening effect is not necessarily optimal, so the audio source needs to be enhanced to improve the listening effect.
In view of the above problems, the present invention proposes the following embodiments. FIG. 1 is a schematic flow chart of an audio enhancement method according to the present invention. As shown in FIG. 1, the audio enhancement method includes:
Step 110, coherent sound extraction is performed on the audio signal to be enhanced, so as to obtain a coherent sound signal and an ambient sound signal.
Here, the audio signal to be enhanced is the audio signal on which enhancement processing is to be performed. In some embodiments, the audio signal to be enhanced comprises a multi-frame audio signal, i.e. the audio signal to be enhanced is a sequence; correspondingly, the coherent sound signal comprises coherent sound corresponding to a plurality of frames of audio signals, and the ambient sound signal comprises ambient sound corresponding to a plurality of frames of audio signals.
Here, the coherent sound extraction algorithm may include, but is not limited to: LS (least squares), PCA (principal component analysis), APES (ambient phase estimation with a sparsity constraint), MWF (multichannel Wiener filter), and the like. The embodiment of the invention can therefore select the coherent sound extraction algorithm as required, for example according to the computing power of the platform. The method thus has a degree of universality and can achieve the best listening effect for the corresponding platform, which improves the accuracy of audio enhancement and hence the listening effect and the user experience.
It will be appreciated that spatial sound consists mainly of two components of different nature, one of which is a sound component with directivity, called coherent sound; the other is a sound component having diffusivity and unable to distinguish the direction, which is called ambient sound.
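To make the coherent/ambient split concrete, the following sketch applies a basic PCA-style decomposition to a stereo frame: the two channels are projected onto their principal direction to estimate the coherent part, and the residual is treated as ambient sound. This is a generic textbook-style illustration of one of the algorithm families named in this description, not the specific extraction used by the invention; the function name `pca_coherent_ambient` is hypothetical.

```python
import numpy as np


def pca_coherent_ambient(left, right):
    """Split a stereo frame into coherent and ambient parts via PCA.

    Returns two arrays of shape (2, n): the coherent (directional) estimate
    and the ambient (residual) estimate for the left and right channels.
    """
    x = np.stack([left, right]).astype(float)   # shape (2, n)
    cov = x @ x.T / x.shape[1]                   # 2x2 channel covariance
    _, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    u = eigvecs[:, -1]                           # principal direction
    coherent = np.outer(u, u @ x)                # projection onto that direction
    ambient = x - coherent                       # what the projection leaves out
    return coherent, ambient


rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
noise_l, noise_r = 0.1 * rng.standard_normal((2, 1024))
coherent, ambient = pca_coherent_ambient(source + noise_l, 0.5 * source + noise_r)
print(np.mean(coherent**2), np.mean(ambient**2))
```

In this example most of the energy ends up in the coherent estimate because both channels share the same directional source, while the uncorrelated noise dominates the ambient estimate.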
In some embodiments, because different intervals of the audio data to be enhanced require different enhancement modes, the audio data to be enhanced is divided into a plurality of subband sequence sets, and any one of the subband sequence sets is determined as the audio signal to be enhanced. In other words, each subband sequence set needs to be enhanced; correspondingly, after the enhanced audio signal corresponding to each subband sequence set is obtained, the enhanced audio signals are superimposed to obtain the enhanced audio data corresponding to the audio data to be enhanced.
In an embodiment, the audio data to be enhanced may be complete audio source data, on which frequency division is performed so that the audio enhancement parameters corresponding to each component of the audio source are determined separately. The enhancement effect of each component of the audio source can then be adjusted better, which improves what the user hears and ultimately the user experience.
In an embodiment, the audio data to be enhanced is stereo sound, i.e. stereo audio source data. The audio data to be enhanced may be obtained from a music platform or device.
Step 120, determining an initial audio enhancement threshold based on the ambient sound signal and determining the initial audio enhancement threshold as a target audio enhancement threshold in the case that the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is smaller than a preset correlation coefficient threshold.
Specifically, the ambient sound signal is subjected to mathematical operation processing to obtain an initial audio enhancement threshold. In some embodiments, the absolute value of the ambient sound signal is logarithmically calculated to obtain a logarithm value; an initial audio enhancement threshold is determined based on the logarithmic value.
In some embodiments, in a case where the coherent sound signal includes a coherent sound corresponding to a plurality of frames of audio signals, and the ambient sound signal includes an ambient sound corresponding to a plurality of frames of audio signals, for any one of the frames of audio signals, determining an initial audio enhancement threshold corresponding to the frame of audio signals based on the ambient sound corresponding to the frame of audio signals if a cross-correlation coefficient of the coherent sound corresponding to the frame of audio signals and the ambient sound corresponding to the frame of audio signals is less than a preset correlation coefficient threshold. At this time, the initial audio enhancement threshold includes an enhancement threshold corresponding to the multi-frame audio signal.
Illustratively, the cross-correlation coefficient of the coherent sound and the ambient sound is calculated frame by frame using a formula whose rendering is not preserved in this text. Its terms are: the frame index; the coherent sound corresponding to that frame of the audio signal; the ambient sound corresponding to that frame of the audio signal; the cross-correlation coefficient of that coherent sound and that ambient sound; and the mathematical expectation operator.
Here, the preset correlation coefficient threshold may be set according to actual needs, for example, 0.25, which is not limited in particular in the embodiment of the present invention.
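For orientation, a commonly used normalized cross-correlation coefficient between two frames can be sketched as below. This is an assumed, generic form (expectations replaced by sample means, magnitude taken so the result lies between 0 and 1) and is not necessarily the exact formula of this embodiment; the function name and the `eps` guard are hypothetical.

```python
import numpy as np


def cross_correlation_coefficient(coherent_frame, ambient_frame, eps=1e-12):
    """Normalized cross-correlation magnitude of two real-valued frames."""
    p = np.asarray(coherent_frame, dtype=float)
    a = np.asarray(ambient_frame, dtype=float)
    # Sample means stand in for the mathematical expectation.
    numerator = np.mean(p * a)
    denominator = np.sqrt(np.mean(p**2) * np.mean(a**2)) + eps
    # Assumption: use the magnitude so the value can be compared with a
    # non-negative threshold such as 0.25.
    return abs(numerator) / denominator


p = np.array([1.0, -1.0, 1.0, -1.0])
a = np.array([1.0, 1.0, -1.0, -1.0])
print(cross_correlation_coefficient(p, p) >= 0.25)  # identical frames
print(cross_correlation_coefficient(p, a) < 0.25)   # orthogonal frames
```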
Step 130, determining a target audio enhancement threshold based on a preset audio enhancement threshold in the case that the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is greater than or equal to the preset correlation coefficient threshold, wherein the absolute value of the preset audio enhancement threshold is smaller than or equal to the absolute value of the initial audio enhancement threshold.
Specifically, the preset audio enhancement threshold may be directly determined as the target audio enhancement threshold, or the preset audio enhancement threshold may be further processed to obtain the target audio enhancement threshold.
In some embodiments, in a case where the coherent sound signal includes a coherent sound corresponding to a multi-frame audio signal and the ambient sound signal includes an ambient sound corresponding to a multi-frame audio signal, for any one of the frame audio signals, the target audio enhancement threshold corresponding to the frame audio signal is determined based on the preset audio enhancement threshold in a case where a cross-correlation coefficient of the coherent sound corresponding to the frame audio signal and the ambient sound corresponding to the frame audio signal is greater than or equal to a preset correlation coefficient threshold. At this time, the target audio enhancement threshold includes an enhancement threshold corresponding to the multi-frame audio signal.
It will be appreciated that where the cross-correlation coefficient of the coherent sound and the ambient sound is large, the absolute value of the corresponding target audio enhancement threshold should be smaller than the absolute value of the target audio enhancement threshold (initial audio enhancement threshold) determined when the cross-correlation coefficient of the coherent sound and the ambient sound is small. In an embodiment, the preset audio enhancement threshold may be 0, such that it may be ensured that the absolute value of the preset audio enhancement threshold is less than or equal to the absolute value of any initial audio enhancement threshold determined based on the ambient sound signal.
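The per-frame choice between the initial audio enhancement threshold and the preset audio enhancement threshold can be sketched as follows. The function name `select_target_thresholds` and the list-based interface are hypothetical; the default values 0.25 and 0 are the examples given in this description.

```python
def select_target_thresholds(correlations, initial_thresholds,
                             corr_threshold=0.25, preset_threshold=0.0):
    """Return one target audio enhancement threshold per frame.

    A frame whose cross-correlation coefficient is below corr_threshold
    keeps its ambient-derived initial threshold; any other frame receives
    the preset threshold.
    """
    if len(correlations) != len(initial_thresholds):
        raise ValueError("one correlation and one threshold per frame expected")
    return [
        initial if rho < corr_threshold else preset_threshold
        for rho, initial in zip(correlations, initial_thresholds)
    ]
```

For instance, `select_target_thresholds([0.1, 0.25, 0.9], [-30.0, -20.0, -10.0])` keeps the first frame's threshold and replaces the other two with the preset value.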
Step 140, performing enhancement processing on the audio signal to be enhanced based on at least one audio enhancement parameter and the target audio enhancement threshold to obtain an enhanced audio signal.
The target audio enhancement threshold is used for determining a part of audio signals from the audio signals to be enhanced, wherein the part of audio signals are signals which are required to be enhanced and correspond to at least one enhancement parameter in the at least one audio enhancement parameter.
Specifically, based on at least one audio enhancement parameter and a target audio enhancement threshold, dynamic range control is performed on an audio signal to be enhanced to obtain an enhanced audio signal.
Considering that the target audio enhancement threshold determined in real time only defines the range of action of the audio enhancement parameters, it is also necessary to acquire at least one audio enhancement parameter. The audio enhancement parameters are used for enhancement processing of the audio signal to be enhanced, i.e. for parameter control processing of the audio signal to be enhanced, such as sound effect parameter control and the like. The at least one audio enhancement parameter may include, but is not limited to, at least one of: gain (Gain), compression/expansion ratio (Ratio), attack time (Attack Time), release time (Release Time), hold time (Hold Time), etc. Illustratively, the at least one audio enhancement parameter includes a compression/expansion ratio, and a compression (Compressor)/expansion (Expander) operation may be performed on the audio signal to be enhanced based on the compression/expansion ratio.
For example, coherent or ambient sound in the audio signal to be enhanced may be controlled, e.g., compressed or expanded, to achieve different listening effects, depending on the target audio enhancement threshold. For example, the at least one audio enhancement parameter includes a compression/expansion ratio and a gain, and in a high frequency band (such as around 6 kHz), a portion of the audio signal to be enhanced that is higher than the target audio enhancement threshold is compressed, and other portions of the audio signal to be enhanced that are not higher than the target audio enhancement threshold are not compressed. It will be appreciated that the target audio enhancement threshold is merely a range of action defining the compression/expansion ratio and is not a range of action defining the gain, i.e. the portion of the audio signal is the signal of the desired enhancement corresponding to the compression/expansion ratio of the at least one audio enhancement parameter.
It will be appreciated that the partial audio signal may be determined from the audio signal to be enhanced based on the target audio enhancement threshold, so that the application range of certain enhancement parameters among the at least one audio enhancement parameter is limited to the partial audio signal. In other words, the partial audio signal is enhanced based on those enhancement parameters, while the audio signal to be enhanced as a whole is enhanced based on the remaining enhancement parameters of the at least one audio enhancement parameter.
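A minimal sketch of this division of roles, assuming a static (memoryless) downward compressor: the compression ratio acts only on the part of the signal above the target audio enhancement threshold, while the gain acts on the whole signal. The function names, the dB-domain formulation, and the omission of attack, release, and hold times are simplifying assumptions rather than the embodiment's implementation.

```python
import math


def compress_sample(x, threshold_db, ratio, gain_db=0.0):
    """Apply a static compression curve and a gain to one sample."""
    if x == 0.0:
        return 0.0
    level_db = 20.0 * math.log10(abs(x))
    # The ratio only affects the portion of the level above the threshold.
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    # The gain is applied regardless of the threshold.
    level_db += gain_db
    return math.copysign(10.0 ** (level_db / 20.0), x)


def compress_frame(frame, threshold_db, ratio, gain_db=0.0):
    """Apply compress_sample to every sample of a frame."""
    return [compress_sample(x, threshold_db, ratio, gain_db) for x in frame]
```

With a threshold of -20 dB and a ratio of 2, a full-scale sample (0 dB) is reduced to -10 dB, while a sample at -40 dB passes through unchanged apart from the gain.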
In some embodiments, if the target audio enhancement threshold includes an enhancement threshold corresponding to a multi-frame audio signal, smoothing the multi-frame enhancement threshold to obtain a processed multi-frame enhancement threshold; and carrying out enhancement processing on the audio signal to be enhanced based on the at least one audio enhancement parameter and the processed multi-frame enhancement threshold value to obtain an enhanced audio signal. Based on the method, the accuracy of determining the enhancement threshold value can be improved through smoothing processing, so that the accuracy of audio enhancement is improved, and finally the listening effect is improved, so that the user experience is improved.
In some embodiments, parameter adjustment instructions are obtained, and at least one audio enhancement parameter is determined based on the parameter adjustment instructions. More specifically, when the audio enhancement parameters indicated by the parameter adjustment instruction are only partial parameters of at least one audio enhancement parameter, other parameters than the partial parameters in the at least one audio enhancement parameter are determined as preset parameter values. Based on the method, the user can select and set the audio enhancement parameters by himself, namely, the user can adjust the audio enhancement parameters according to the actual needs of the user, so as to improve the listening effect, improve the individuation level of the audio enhancement and finally improve the experience of the user. Further, at least one audio enhancement parameter may be provided to the user for selection, or a portion of the enhancement parameters may be selected for selection by the user.
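The handling of a partial parameter adjustment instruction can be sketched as a merge with preset values. The parameter names and the preset values in `DEFAULT_PARAMS` are hypothetical placeholders chosen for illustration; the description does not specify them.

```python
# Hypothetical preset parameter values, used when the user does not set them.
DEFAULT_PARAMS = {
    "gain_db": 0.0,
    "ratio": 2.0,
    "attack_ms": 10.0,
    "release_ms": 100.0,
    "hold_ms": 0.0,
}


def resolve_params(user_params, defaults=DEFAULT_PARAMS):
    """Combine user-selected parameters with preset values for the rest."""
    unknown = set(user_params) - set(defaults)
    if unknown:
        raise ValueError(f"unknown audio enhancement parameters: {sorted(unknown)}")
    merged = dict(defaults)
    merged.update(user_params)
    return merged
```

Calling `resolve_params({"ratio": 4.0})` overrides only the ratio and leaves every other parameter at its preset value.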
It can be understood that, in the embodiment of the invention, the difference degree of coherent sound and environmental sound is determined by means of coherent sound extraction, so that the target audio enhancement threshold required by audio dynamic range control is dynamically set in real time based on the difference degree of the coherent sound and the environmental sound. Meanwhile, the embodiment of the invention does not need to carry out audio enhancement processing based on deep learning, so that a large number of data sets are not required to be processed, and a better audio enhancement effect can be achieved on each platform, particularly on a low-power platform, so that the hearing of a user is improved, and the experience of the user is finally improved.
The audio enhancement method provided by the embodiment of the invention performs coherent sound extraction on the audio signal to be enhanced to obtain a coherent sound signal and an ambient sound signal; determines the target audio enhancement threshold based on the ambient sound signal when the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is smaller than the preset correlation coefficient threshold; and determines the target audio enhancement threshold based on the preset audio enhancement threshold when the cross-correlation coefficient is greater than or equal to the preset correlation coefficient threshold. In this way, the degree of difference between the coherent sound and the ambient sound is determined through coherent sound extraction, and the target audio enhancement threshold required for audio dynamic range control is set dynamically and in real time based on that degree of difference. Compared with enhancing the whole audio with fixed audio control parameters, the method can update the corresponding audio enhancement parameters in real time as the audio signal to be enhanced changes, so that the audio signal to be enhanced is enhanced more accurately based on the at least one audio enhancement parameter and the target audio enhancement threshold, and a more accurate enhanced audio signal is obtained, that is, the accuracy of audio enhancement is improved. In addition, the target audio enhancement threshold is used for determining a partial audio signal from the audio signal to be enhanced, the partial audio signal being the signal to be enhanced that corresponds to at least one enhancement parameter of the at least one audio enhancement parameter. Because the target audio enhancement threshold is dynamically updated in real time as the audio signal to be enhanced changes, the enhancement effect of each component of the audio source can be better controlled, so that a better rendering effect is achieved, the hearing of the user is further improved, and finally the experience of the user is improved.
Based on any of the above embodiments, Fig. 2 is a second schematic flow chart of the audio enhancement method provided by the present invention. As shown in Fig. 2, the audio signal to be enhanced is determined based on the following steps:
Step 210, performing formant detection on the audio data to be enhanced to obtain a plurality of formants, and determining a frequency point set corresponding to the formants, wherein the frequency point set comprises a plurality of frequency points corresponding to the formants.
In an embodiment, the audio data to be enhanced may be complete audio source data, so as to perform crossover processing on the complete audio source data, thereby respectively determining audio enhancement parameters corresponding to each component of the audio source.
In an embodiment, the audio data to be enhanced is stereo sound, i.e. stereo audio source data. The audio data to be enhanced may be obtained from a music platform or device.
Here, algorithms for formant detection may include, but are not limited to: the cepstrum method, the LPC (Linear Predictive Coding) method, the Hilbert-Huang transform method, and the like. Considering that the processing delay and memory consumption of these algorithms differ, the embodiment of the invention can select the formant detection algorithm as needed, so that different algorithms can be chosen according to the computing power of the platform. The method therefore has a certain universality and can achieve the best listening effect available on the corresponding platform, which improves the accuracy of audio enhancement and, in turn, the listening effect and the user experience.
It will be appreciated that the audio data to be enhanced is typically time domain data, and that the plurality of formants are maxima of the spectral envelope of the audio data to be enhanced.
In a specific embodiment, subscripts corresponding to the plurality of formants are determined, and a plurality of corresponding frequency points are determined based on the subscripts to obtain a frequency point set. Further, the index may be the number of frames of audio data to be enhanced.
Step 220, determining a second target frequency point from the frequency point set based on a first target frequency point with the largest power in the frequency point set, wherein the second target frequency point is a frequency point with the smallest frequency difference value with the first target frequency point in the first target frequency point set, and the first target frequency point set comprises frequency points with the power difference value larger than a preset power difference value with the first target frequency point in the frequency point set.
For ease of understanding, the unit of power at a frequency point may be set to dB here.
Here, the preset power difference may be set according to actual needs, for example, 6 dB. For example, the frequency point with the largest power in the frequency point set is denoted as the first target frequency point; among the frequency points in the frequency point set whose power is more than 6 dB lower than that of the first target frequency point, the one closest in frequency to the first target frequency point is denoted as the second target frequency point.
Step 230, determining a frequency range of the current interval based on the frequency interval between the first target frequency point and the second target frequency point, and determining an interval midpoint of the current interval based on the first target frequency point.
Here, the frequency interval is determined based on a difference between the frequency of the first target frequency point and the frequency of the second target frequency point. Illustratively, frequency interval = |frequency of the second target frequency point - frequency of the first target frequency point|.
Specifically, the frequency interval may be multiplied by 2 to obtain the frequency range of the current interval, or the frequency interval may be directly determined as the frequency range of the current interval, since subsequent steps only need half of the frequency range of the current interval.
Specifically, the first target frequency point may be directly determined as the middle point of the current interval, or the first target frequency point may be further adjusted to obtain the middle point of the current interval.
Step 240, determining a second target frequency point set corresponding to the current interval from the frequency point sets based on the frequency interval and the interval midpoint.
It will be appreciated that the frequency range of the current interval may be determined based on the interval midpoint of the current interval, as well as the frequency interval of the current interval. For example, the mid-point of the interval is F, the frequency interval is Q, and the frequency range of the current interval is [ F-Q, F+Q ]. And further, based on the frequency range of the current interval, determining a second target frequency point set corresponding to the current interval from the frequency point sets, namely, the frequency of each frequency point in the second target frequency point set is in the frequency range of the current interval.
Step 250, removing frequency points in the second target frequency point set in the frequency point set, returning to the step of determining the second target frequency point from the frequency point set based on the first target frequency point with the largest power in the frequency point set until a preset condition is met, wherein the preset condition comprises that no frequency point exists in the frequency point set, or the number of current intervals reaches the number of preset intervals.
Specifically, the frequency points in the second target frequency point set in the frequency point set are removed, that is, the frequency points in the frequency range of the current interval in the frequency point set are removed, in other words, other frequency points outside the current interval in the frequency point set are reserved.
It will be appreciated that the above-described step 220 is returned, i.e., steps 220-250 are repeated to determine the interval midpoint and frequency range of the next interval. It should be understood that, when returning to step 220, the frequency bin set is the most recently updated set, i.e., the removed frequency bin set.
The loop termination conditions of steps 220-250 are preset conditions. The preset condition is that no frequency point exists in the frequency point set, namely, no frequency point exists in the frequency point set after final removal, in other words, all frequency points in the frequency point set are detected; or, the preset condition is that the number of the current intervals reaches the number of preset intervals, and each time step 220-step 250 is executed, one interval is obtained, that is, the number of the loops (the execution times) of step 220-step 250 reaches the number of preset intervals.
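The loop of steps 220 to 250 can be sketched as follows, with each frequency point represented as a `(frequency_hz, power_db)` pair. The function name `find_intervals`, the default of 8 preset intervals, and the decision to stop when no frequency point satisfies the preset power difference are assumptions; the description does not state what happens in that last case.

```python
def find_intervals(points, power_diff_db=6.0, max_intervals=8):
    """Return a list of (interval_midpoint_hz, frequency_interval_hz) pairs.

    points: iterable of (frequency_hz, power_db) tuples from formant detection.
    """
    remaining = list(points)
    intervals = []
    while remaining and len(intervals) < max_intervals:
        # Step 220: the first target frequency point has the largest power.
        first = max(remaining, key=lambda p: p[1])
        candidates = [p for p in remaining if first[1] - p[1] > power_diff_db]
        if not candidates:
            break  # assumption: no valid second target point ends the loop
        second = min(candidates, key=lambda p: abs(p[0] - first[0]))
        # Step 230: frequency interval Q and interval midpoint F.
        q = abs(second[0] - first[0])
        f = first[0]
        intervals.append((f, q))
        # Steps 240-250: remove the points inside [F - Q, F + Q].
        remaining = [p for p in remaining if not (f - q <= p[0] <= f + q)]
    return intervals
```

Each iteration removes at least the first target frequency point, so the loop terminates even when the preset number of intervals is never reached.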
Step 260, dividing the frequency of the audio data to be enhanced based on the mid-point of each interval and the frequency range of each interval to obtain a plurality of subband sequence sets, determining any one of the subband sequence sets as the audio signal to be enhanced, so as to respectively determine the target audio enhancement threshold corresponding to each subband sequence set, and respectively enhancing each subband sequence set based on the target audio enhancement threshold corresponding to each subband sequence set.
It will be appreciated that each time steps 220-250 are performed, the mid-point and frequency range of an interval are obtained.
In a specific embodiment, the midpoints of all intervals are recorded as a frequency point set, the frequency ranges of all intervals are recorded as a frequency range set, the frequency point set and the frequency range set are used as input parameters of a frequency divider, and the frequency divider divides the audio data to be enhanced to obtain a plurality of subband sequence sets. The number of intervals is the same as the number of subband sequence sets, that is, the frequency center point of each subband sequence set is the interval midpoint of the corresponding interval, and the frequency range of each subband sequence set is determined based on the frequency range of the corresponding interval. The frequency divider can be set according to actual needs; for example, the frequency divider is implemented by a Linkwitz-Riley crossover filter.
It should be noted that, each subband sequence set in the plurality of subband sequence sets needs to be enhanced, and correspondingly, after the enhanced audio signals corresponding to each subband sequence set are obtained, each enhanced audio signal is overlapped, so as to obtain enhanced audio data corresponding to the audio data to be enhanced.
According to the audio enhancement method provided by the embodiment of the invention, it is considered that different intervals in the audio data to be enhanced require different enhancement modes, and the audio data to be enhanced therefore needs to be frequency-divided. Through the formant detection method and the peak judgment method, that is, the method for determining the interval midpoint and the frequency range of each interval, the frequency points (interval midpoints) and frequency ranges to be divided can be obtained in real time, so that the accuracy of frequency division is improved; in other words, the frequency points and frequency ranges to be divided change as the audio data to be enhanced changes.
Based on any one of the foregoing embodiments, in the method, in step 230, determining an interval midpoint of the current interval based on the first target frequency point includes:
determining a first interval endpoint of the current interval based on a difference value between the first target frequency point and the frequency interval, and determining a second interval endpoint of the current interval based on a sum value of the first target frequency point and the frequency interval;
taking the endpoint with the largest frequency in the second interval endpoint of the previous interval and the first interval endpoint of the current interval as the third interval endpoint of the current interval, wherein the previous interval is the interval determined before the current interval is determined, and if the current interval is the first determined interval, the second interval endpoint of the previous interval is 0;
and determining an interval midpoint of the current interval based on an average value of a third interval endpoint of the current interval and a second interval endpoint of the current interval.
Specifically, the difference value between the first target frequency point and the frequency interval can be directly determined as a first interval endpoint of the current interval, and the sum value of the first target frequency point and the frequency interval can be directly determined as a second interval endpoint of the current interval; the difference may be further processed to obtain a first interval endpoint, and the sum may be further processed to obtain a second interval endpoint.
In order to avoid overlapping areas of the divided sections, that is, to avoid overlapping areas of the multiple subband sequence sets for frequency division, it is necessary to detect whether the overlapping areas exist or not, so as to update the first section endpoint of the current section. It should be understood that if the endpoint with the largest frequency in the second interval endpoint of the previous interval and the first interval endpoint of the current interval is the first interval endpoint of the current interval, the overlapping area does not exist, and correspondingly, the interval midpoint of the current interval is the first target frequency point; if the endpoint with the largest frequency in the second interval endpoint of the previous interval and the first interval endpoint of the current interval is the second interval endpoint of the previous interval, the existence of the overlapping area is indicated, and correspondingly, the interval midpoint of the current interval is determined by the updated third interval endpoint and the second interval endpoint.
It should be noted that, each time step 220-step 250 is performed, a section is obtained, and based on this, the previous section is the section obtained by the previous step 220-step 250. If the current section is the first determined section, that is, the step 220-250 is executed for the first time, there is no previous section, at this time, the end point of the second section of the previous section is set to 0, so as to prevent the end point of the current section from being less than 0, ensure the dividing accuracy of the current section, and further improve the frequency dividing accuracy of the audio data.
Specifically, the average value of the third interval endpoint of the current interval and the second interval endpoint of the current interval may be directly determined as the interval midpoint of the current interval, or the average value may be further adjusted to obtain the interval midpoint of the current interval.
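The endpoint adjustment described above can be sketched as a small helper. The function name `adjust_interval` and its return layout are hypothetical; the computation follows the direct variant in which the difference, the sum, and the average are used without further processing.

```python
def adjust_interval(first_target_hz, freq_interval_hz, prev_upper_hz=0.0):
    """Return (third_endpoint, second_endpoint, midpoint) of the current interval.

    prev_upper_hz is the second interval endpoint of the previous interval,
    or 0 when the current interval is the first one determined.
    """
    lower = first_target_hz - freq_interval_hz  # first interval endpoint
    upper = first_target_hz + freq_interval_hz  # second interval endpoint
    # The larger of the two candidates becomes the third interval endpoint,
    # which removes any overlap with the previous interval.
    third = max(prev_upper_hz, lower)
    midpoint = (third + upper) / 2.0
    return third, upper, midpoint
```

When no overlap exists the midpoint equals the first target frequency point, as in `adjust_interval(1000.0, 200.0)`; when the lower endpoint is clipped, as in `adjust_interval(100.0, 200.0)`, the midpoint moves upward.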
In order to facilitate understanding of the foregoing embodiments, a specific embodiment is described herein. The method comprises the following specific steps:
Step one, the frequency point with the largest power in the frequency point set is denoted as $f_1$; among the frequency points in the frequency point set near $f_1$ whose power is 6 dB lower than that of $f_1$, the one closest in frequency is denoted as $f_1'$; the frequency interval between $f_1$ and $f_1'$ is calculated and denoted as $Q_1$.
Step two, the frequency points in the frequency point set that lie within the interval $[f_1 - Q_1, f_1 + Q_1]$ are removed, and when $f_1$ is not the midpoint of the current interval, the interval midpoint is updated to the average of the third interval endpoint and the second interval endpoint of the current interval.
Step three, the frequency point with the largest power in the frequency point set after removal is denoted as $f_2$; among the frequency points in that set near $f_2$ whose power is 6 dB lower than that of $f_2$, the one closest in frequency is denoted as $f_2'$; the frequency interval between $f_2$ and $f_2'$ is calculated and denoted as $Q_2$.
Step four, the frequency points in the frequency point set after removal that lie within the interval $[f_2 - Q_2, f_2 + Q_2]$ are removed, and when $f_2$ is not the midpoint of the interval, the interval midpoint is updated to the average of the third interval endpoint and the second interval endpoint of that interval.
Step five, step three and step four are repeated until no frequency point exists in the frequency point set, or the number of current intervals reaches the number of preset intervals.
According to the audio enhancement method provided by the embodiment of the invention, the possible overlapping area of each section is considered, and based on the possible overlapping area, the end point of each section is required to be readjusted, and the section midpoint of each section is required to be redetermined. By adopting the mode, the endpoint with the largest frequency in the second interval endpoint of the previous interval and the first interval endpoint of the current interval is used as the third interval endpoint of the current interval, the endpoint of the interval can be accurately adjusted, and further, the midpoint of the interval of the current interval can be accurately determined based on the average value of the third interval endpoint of the current interval and the second interval endpoint of the current interval, so that the overlapping area of each divided interval is avoided, the overlapping area of a plurality of sub-band sequence sets of frequency division is avoided, the dividing accuracy of the plurality of sub-band sequence sets is ensured, the audio enhancement accuracy is further improved, the enhancement effect of each component of an audio source can be better controlled, the better rendering effect is achieved, the user hearing is further improved, and finally the user experience is improved.
Based on any of the above embodiments, prior to step 220 above, the method further comprises:
and removing frequency points with power smaller than preset power in the frequency point set.
Here, the preset power may be set according to actual needs, for example, -60dB.
It should be noted that, only the frequency points with the power smaller than the preset power in the frequency point set need to be removed before the step 220 is performed for the first time.
According to the audio enhancement method provided by the embodiment of the invention, the frequency points with the power smaller than the preset power in the frequency point set are removed, so that the frequency points corresponding to smaller sound are removed, the number of frequency points required to be detected is reduced, the execution times of the steps 220-250 are reduced, the frequency division efficiency is further improved, and finally the audio enhancement efficiency is improved.
Based on any one of the above embodiments, in the method, in step 120, determining the initial audio enhancement threshold based on the ambient sound signal includes:
carrying out logarithmic calculation on the absolute value of the environmental sound signal to obtain a logarithmic value;
the initial audio enhancement threshold is determined based on the product of the logarithmic value and a preset value.
The logarithmic calculation may be a base-10 logarithmic calculation, or may be a logarithmic calculation of another base.
Here, the preset value may be set according to actual needs, for example, 20. The product of the logarithmic value and the preset value can be directly determined as an initial audio enhancement threshold, and the product can be further processed to obtain the initial audio enhancement threshold.
In some embodiments, in a case where the coherent sound signal includes coherent sound corresponding to a plurality of frames of audio signals, and the ambient sound signal includes ambient sound corresponding to a plurality of frames of audio signals, performing logarithmic calculation on an absolute value of the ambient sound corresponding to the frame of audio signals for any frame of audio signals, to obtain a logarithmic value corresponding to the frame of audio signals, and determining an initial audio enhancement threshold corresponding to the frame of audio signals based on a product of the logarithmic value and a preset value. At this time, the initial audio enhancement threshold includes an enhancement threshold corresponding to the multi-frame audio signal.
Illustratively, the calculation formula for the initial audio enhancement threshold is as follows:
$$T_n = 20 \log_{10} \left| a_n \right|$$
In the formula, $T_n$ denotes the initial audio enhancement threshold corresponding to the $n$-th frame audio signal, and $a_n$ denotes the ambient sound corresponding to the $n$-th frame audio signal.
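As a sketch of the product of the logarithmic value and the preset value described above, the helper below computes a per-frame threshold in dB. Reducing a frame to a single level by its mean absolute amplitude, and flooring that level to avoid the logarithm of zero, are assumptions; the description does not say how a multi-sample frame is reduced to one value.

```python
import math


def initial_threshold_db(ambient_frame, scale=20.0, floor=1e-10):
    """Return scale * log10(level) for one frame of ambient sound.

    level is the mean absolute amplitude of the frame (an assumption),
    floored so that a silent frame does not produce log10(0).
    """
    if not ambient_frame:
        raise ValueError("frame must be non-empty")
    level = sum(abs(a) for a in ambient_frame) / len(ambient_frame)
    return scale * math.log10(max(level, floor))
```

A frame with a mean absolute amplitude of 0.1 yields a threshold of about -20 dB, so the threshold rises and falls with the ambient sound level of each frame.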
The audio enhancement method provided by the embodiment of the invention provides support for determining the initial audio enhancement threshold value, and the mode is to determine the initial audio enhancement threshold value based on the environmental sound signal so that the initial audio enhancement threshold value changes along with the change of the audio signal to be enhanced.
Based on any of the foregoing embodiments, the audio signal to be enhanced includes a plurality of frames of audio signals, the coherent sound signal includes a plurality of frames of coherent sounds corresponding to the audio signals, the ambient sound signal includes a plurality of frames of ambient sounds corresponding to the audio signals, and the target audio enhancement threshold includes an enhancement threshold corresponding to the audio signals. Correspondingly, the step 140 includes:
smoothing the multi-frame enhancement threshold to obtain a processed multi-frame enhancement threshold;
and carrying out enhancement processing on the audio signal to be enhanced based on at least one audio enhancement parameter and the processed multi-frame enhancement threshold value to obtain an enhanced audio signal.
Wherein, any frame enhancement threshold after processing is determined based on the following steps:
taking any frame as the current frame, performing weighted aggregation processing on the processed enhancement threshold corresponding to the previous frame and the enhancement threshold corresponding to the current frame, based on a first weight corresponding to the previous frame and a second weight corresponding to the current frame, to obtain the processed enhancement threshold corresponding to the current frame, wherein the sum of the first weight and the second weight is 1.
It will be appreciated that, through the steps 110 to 130, the enhancement threshold corresponding to each frame of audio signal is determined in sequence, so that the weighted aggregation processing can be performed on the enhancement threshold corresponding to the previous frame and the enhancement threshold corresponding to the current frame before processing based on the first weight corresponding to the previous frame and the second weight corresponding to the current frame.
Here, the first weight and the second weight may be set in advance according to actual needs, for example, the first weight is 0.9, and the second weight is 0.1.
Illustratively, the calculation formula of the processed enhancement threshold corresponding to the current frame is as follows:
$$\hat{T}_n = \alpha \hat{T}_{n-1} + \beta T_n, \quad \alpha + \beta = 1$$
In the formula, $\hat{T}_n$ denotes the processed enhancement threshold corresponding to the current frame, $\alpha$ denotes the first weight, $\beta$ denotes the second weight, $\hat{T}_{n-1}$ denotes the processed enhancement threshold corresponding to the previous frame, and $T_n$ denotes the pre-processing enhancement threshold corresponding to the current frame.
According to the audio enhancement method provided by the embodiment of the invention, the multi-frame enhancement threshold can be subjected to smooth processing, so that the accuracy of determining the enhancement threshold can be improved, the accuracy of audio enhancement is further improved, and finally the listening effect is improved, so that the user experience is improved.
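The weighted aggregation described above behaves like a first-order recursive (exponential) smoother. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the function name `smooth_thresholds`, the `initial` argument, and the choice to seed the first frame with its own raw threshold are all hypothetical, since the text does not say how the first frame is handled.

```python
def smooth_thresholds(raw_thresholds, first_weight=0.9, initial=None):
    """Recursively smooth per-frame enhancement thresholds.

    Each processed value is
        first_weight * processed_previous + second_weight * raw_current,
    with first_weight + second_weight == 1, as the description requires.
    """
    if not 0.0 <= first_weight <= 1.0:
        raise ValueError("first_weight must lie in [0, 1]")
    second_weight = 1.0 - first_weight

    processed = []
    # Assumption: the first frame has no predecessor, so it is seeded with
    # its own raw threshold unless the caller supplies `initial`.
    previous = raw_thresholds[0] if (initial is None and raw_thresholds) else initial
    for raw in raw_thresholds:
        current = first_weight * previous + second_weight * raw
        processed.append(current)
        previous = current
    return processed
```

With the example weights 0.9 and 0.1, a sudden jump in the raw threshold moves the processed threshold by only one tenth of the jump per frame, which is what keeps the threshold from fluctuating abruptly between frames.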
Based on any of the above embodiments, after step 140, the method further includes:
And processing the enhanced audio signal based on preset audio control parameters to obtain a processed enhanced audio signal.
Here, the preset audio control parameters are used for performing parameter control processing on the enhanced audio signal, so as to prevent the enhanced audio signal from exceeding the required enhancement range. The preset audio control parameters are parameters preset according to actual needs.
For example, the preset audio control parameters include an amplitude control parameter to prevent the enhanced audio signal from clipping, i.e., to ensure that the processed enhanced audio signal is not clipped. In a specific embodiment, the amplitude control parameter is used as an input parameter of a compressor or a limiter, and the enhanced audio signal is processed by the compressor or the limiter to prevent clipping.
In addition, the preset audio control parameters may be set with reference to at least one audio enhancement parameter, which will not be described in detail herein.
According to the audio enhancement method provided by the embodiment of the invention, the enhanced audio signal is processed based on the preset audio control parameters to obtain the processed enhanced audio signal. This prevents the enhanced audio signal from being an abnormal audio signal, that is, it prevents the audio parameters of the enhanced audio signal from falling outside the normal audio parameter range, which further ensures the accuracy and reliability of audio enhancement.
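As an illustration of how an amplitude control parameter can keep an enhanced signal from clipping, the sketch below applies a memoryless soft limiter in Python. It is a swapped-in, simplified technique and not the compressor or limiter of the embodiment, whose design is not specified; the names `soft_limit`, `ceiling`, and `knee` and their default values are assumptions made for this example.

```python
import math


def soft_limit(samples, ceiling=0.98, knee=0.8):
    """Keep every sample magnitude at or below `ceiling`.

    Samples whose magnitude is at most `knee` pass through unchanged.
    Larger magnitudes are compressed smoothly with tanh so that they
    approach, but never exceed, `ceiling`.
    """
    if not 0.0 < knee < ceiling:
        raise ValueError("expected 0 < knee < ceiling")
    headroom = ceiling - knee

    limited = []
    for x in samples:
        magnitude = abs(x)
        if magnitude > knee:
            magnitude = knee + headroom * math.tanh((magnitude - knee) / headroom)
            # Guard against floating-point rounding pushing past the ceiling.
            magnitude = min(magnitude, ceiling)
        # Restore the original sign of the sample.
        limited.append(math.copysign(magnitude, x))
    return limited
```

A production limiter would normally add attack and release smoothing of the gain; this sketch only shows the amplitude bound that prevents clipping.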
In order to facilitate understanding of the above embodiments, a specific embodiment is described here. As shown in fig. 3, formant detection and peak judgment are first performed on the input stereo audio (the audio data to be enhanced) to obtain a frequency point set and a frequency range set. The frequency point set and the frequency range set are then used as input parameters of a frequency divider, which divides the input stereo audio into a plurality of subband sequence sets. Coherent sound extraction is performed on each subband sequence set to obtain coherent sound and ambient sound, and threshold determination based on the coherent sound and the ambient sound yields the target audio enhancement threshold of each subband sequence set. Each subband sequence set is then enhanced based on its target audio enhancement threshold and the audio enhancement parameters. Next, the preset audio control parameters are used as input parameters of a compressor/limiter, which compresses/limits each enhanced subband sequence set. Finally, the compressed/limited data corresponding to each subband are superposed to obtain the output stereo audio (the enhanced audio data).
The audio enhancement device provided by the invention is described below; the audio enhancement device described below and the audio enhancement method described above may be read with reference to each other.
Fig. 4 is a schematic structural diagram of an audio enhancement device according to the present invention, as shown in fig. 4, where the audio enhancement device includes:
an audio extraction module 410, configured to perform coherent sound extraction on an audio signal to be enhanced, so as to obtain a coherent sound signal and an ambient sound signal;
a first determining module 420, configured to determine an initial audio enhancement threshold based on the ambient sound signal and determine the initial audio enhancement threshold as a target audio enhancement threshold when a cross-correlation coefficient of the coherent sound signal and the ambient sound signal is less than a preset correlation coefficient threshold;
a second determining module 430, configured to determine a target audio enhancement threshold based on a preset audio enhancement threshold when the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is greater than or equal to the preset correlation coefficient threshold, where an absolute value of the preset audio enhancement threshold is less than or equal to an absolute value of the initial audio enhancement threshold;
an audio enhancement module 440, configured to perform enhancement processing on the audio signal to be enhanced based on at least one audio enhancement parameter and the target audio enhancement threshold, so as to obtain an enhanced audio signal;
The target audio enhancement threshold is used for determining a part of audio signals from the audio signals to be enhanced, wherein the part of audio signals are signals which are required to be enhanced and correspond to at least one enhancement parameter in the at least one audio enhancement parameter.
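The threshold selection performed by the first and second determining modules can be sketched in Python as follows. This is a hedged illustration only. The function names, the default values `corr_limit=0.5` and `preset_db=-6.0`, the use of a zero-lag Pearson coefficient as the cross-correlation coefficient, the use of the frame's mean absolute amplitude with a base-10 logarithm scaled by 20, and the clamping of the preset threshold are all assumptions made for this example; the text itself only states that the initial threshold is the product of a logarithm of the absolute ambient sound value and a preset value, and that the absolute value of the preset threshold does not exceed that of the initial threshold.

```python
import math


def correlation_coefficient(x, y):
    """Zero-lag Pearson correlation of two equal-length frames (assumed measure)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("frames must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant frame carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def initial_threshold_db(ambient, scale=20.0, floor=1e-9):
    """Product of a preset value and the log of the ambient level (assumed form)."""
    level = sum(abs(a) for a in ambient) / len(ambient)
    return scale * math.log10(max(level, floor))


def target_threshold_db(coherent, ambient, corr_limit=0.5, preset_db=-6.0):
    """Pick the target threshold from the correlation of coherent and ambient sound."""
    initial = initial_threshold_db(ambient)
    if correlation_coefficient(coherent, ambient) < corr_limit:
        # Weakly correlated: the ambient-derived initial threshold is the target.
        return initial
    # Strongly correlated: fall back to the preset threshold. Assumption: the
    # preset magnitude is clamped so that |preset| <= |initial| always holds.
    if abs(preset_db) > abs(initial):
        return math.copysign(abs(initial), preset_db)
    return preset_db
```

For a frame whose ambient part has a mean absolute amplitude of 0.1 and is uncorrelated with the coherent part, this sketch returns roughly -20 dB; for a strongly correlated frame it returns the preset value instead.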
According to the audio enhancement device provided by the embodiment of the invention, coherent sound extraction is performed on the audio signal to be enhanced to obtain a coherent sound signal and an ambient sound signal. When the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is smaller than the preset correlation coefficient threshold, the target audio enhancement threshold is determined based on the ambient sound signal; when the cross-correlation coefficient is greater than or equal to the preset correlation coefficient threshold, the target audio enhancement threshold is determined based on the preset audio enhancement threshold. The degree of difference between the coherent sound and the ambient sound is thus determined by means of coherent sound extraction, and the target audio enhancement threshold required for audio dynamic range control is set dynamically based on that degree of difference. Because a target audio enhancement threshold determined from the ambient sound signal changes as the audio signal to be enhanced changes, the audio control parameters can be updated in real time, in contrast to enhancing a piece of audio with fixed audio control parameters. The part of the audio signal to be enhanced that corresponds to at least one audio enhancement parameter is therefore determined more accurately, and the enhancement effect of each component of the audio source can be better controlled, so that a better rendering effect is achieved, the listening experience of the user is further improved, and finally the user experience is improved.
Fig. 5 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 5, the electronic device may include: a processor 510, a communication interface 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with one another through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform an audio enhancement method comprising: performing coherent sound extraction on the audio signal to be enhanced to obtain a coherent sound signal and an ambient sound signal; determining an initial audio enhancement threshold based on the ambient sound signal and determining the initial audio enhancement threshold as a target audio enhancement threshold under the condition that the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is smaller than a preset correlation coefficient threshold; determining a target audio enhancement threshold based on a preset audio enhancement threshold under the condition that the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is greater than or equal to the preset correlation coefficient threshold, wherein the absolute value of the preset audio enhancement threshold is smaller than or equal to the absolute value of the initial audio enhancement threshold; performing enhancement processing on the audio signal to be enhanced based on at least one audio enhancement parameter and the target audio enhancement threshold to obtain an enhanced audio signal; the target audio enhancement threshold is used for determining a part of audio signals from the audio signals to be enhanced, wherein the part of audio signals are signals which are required to be enhanced and correspond to at least one enhancement parameter in the at least one audio enhancement parameter.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the audio enhancement method provided above, the method comprising: performing coherent sound extraction on the audio signal to be enhanced to obtain a coherent sound signal and an ambient sound signal; determining an initial audio enhancement threshold based on the ambient sound signal and determining the initial audio enhancement threshold as a target audio enhancement threshold under the condition that the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is smaller than a preset correlation coefficient threshold; determining a target audio enhancement threshold based on a preset audio enhancement threshold under the condition that the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is greater than or equal to the preset correlation coefficient threshold, wherein the absolute value of the preset audio enhancement threshold is smaller than or equal to the absolute value of the initial audio enhancement threshold; performing enhancement processing on the audio signal to be enhanced based on at least one audio enhancement parameter and the target audio enhancement threshold to obtain an enhanced audio signal; the target audio enhancement threshold is used for determining a part of audio signals from the audio signals to be enhanced, wherein the part of audio signals are signals which are required to be enhanced and correspond to at least one enhancement parameter in the at least one audio enhancement parameter.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of audio enhancement, comprising:
performing coherent sound extraction on the audio signal to be enhanced to obtain a coherent sound signal and an ambient sound signal;
determining an initial audio enhancement threshold based on the ambient sound signal and determining the initial audio enhancement threshold as a target audio enhancement threshold under the condition that the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is smaller than a preset correlation coefficient threshold;
determining a target audio enhancement threshold based on a preset audio enhancement threshold under the condition that the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is greater than or equal to the preset correlation coefficient threshold, wherein the absolute value of the preset audio enhancement threshold is smaller than or equal to the absolute value of the initial audio enhancement threshold;
performing enhancement processing on the audio signal to be enhanced based on at least one audio enhancement parameter and the target audio enhancement threshold value to obtain an enhanced audio signal;
the target audio enhancement threshold is used for determining a part of audio signals from the audio signals to be enhanced, wherein the part of audio signals are signals which are required to be enhanced and correspond to at least one enhancement parameter in the at least one audio enhancement parameter.
2. The audio enhancement method according to claim 1, wherein the audio signal to be enhanced is determined based on the steps of:
formant detection is carried out on the audio data to be enhanced to obtain a plurality of formants, and a frequency point set corresponding to the formants is determined, wherein the frequency point set comprises a plurality of frequency points corresponding to the formants;
determining a second target frequency point from the frequency point set based on a first target frequency point with the largest power in the frequency point set, wherein the second target frequency point is a frequency point with the smallest frequency difference value with the first target frequency point in the first target frequency point set, and the first target frequency point set comprises frequency points with the power difference value larger than a preset power difference value with the first target frequency point in the frequency point set;
determining a frequency range of a current interval based on the frequency interval between the first target frequency point and the second target frequency point, and determining an interval midpoint of the current interval based on the first target frequency point;
determining a second target frequency point set corresponding to the current interval from the frequency point sets based on the frequency interval and the interval midpoint;
removing the frequency points of the second target frequency point set from the frequency point set, and returning to the step of determining a second target frequency point from the frequency point set based on the first target frequency point with the largest power in the frequency point set, until a preset condition is met, wherein the preset condition comprises that no frequency point remains in the frequency point set, or that the number of determined intervals reaches a preset number of intervals;
dividing the frequency of the audio data to be enhanced based on the middle point of each interval and the frequency range of each interval to obtain a plurality of subband sequence sets, determining any one of the subband sequence sets as the audio signal to be enhanced, respectively determining the target audio enhancement threshold corresponding to each subband sequence set, and respectively carrying out enhancement processing on each subband sequence set based on the target audio enhancement threshold corresponding to each subband sequence set.
3. The audio enhancement method according to claim 2, wherein said determining an interval midpoint of a current interval based on the first target frequency point comprises:
determining a first interval endpoint of the current interval based on a difference value between the first target frequency point and the frequency interval, and determining a second interval endpoint of the current interval based on a sum value of the first target frequency point and the frequency interval;
taking the endpoint with the largest frequency in the second interval endpoint of the previous interval and the first interval endpoint of the current interval as the third interval endpoint of the current interval, wherein the previous interval is the interval determined before the current interval is determined, and if the current interval is the first determined interval, the second interval endpoint of the previous interval is 0;
and determining an interval midpoint of the current interval based on an average value of a third interval endpoint of the current interval and a second interval endpoint of the current interval.
4. The audio enhancement method according to claim 2, wherein the determining a second target frequency point from the set of frequency points based on the first target frequency point with the largest power in the set of frequency points further comprises:
removing frequency points with power smaller than a preset power from the frequency point set.
5. The audio enhancement method of claim 1, wherein the determining an initial audio enhancement threshold based on the ambient sound signal comprises:
carrying out logarithmic calculation on the absolute value of the environmental sound signal to obtain a logarithmic value;
the initial audio enhancement threshold is determined based on the product of the logarithmic value and a preset value.
6. The audio enhancement method according to claim 1, wherein the audio signal to be enhanced comprises a plurality of frames of audio signals, the coherent sound signal comprises a plurality of frames of coherent sounds corresponding to the audio signals, the ambient sound signal comprises a plurality of frames of ambient sounds corresponding to the audio signals, and the target audio enhancement threshold comprises an enhancement threshold corresponding to a plurality of frames of the audio signals;
the enhancing the audio signal to be enhanced based on the at least one audio enhancement parameter and the target audio enhancement threshold value to obtain an enhanced audio signal includes:
smoothing the multi-frame enhancement threshold to obtain a processed multi-frame enhancement threshold;
performing enhancement processing on the audio signal to be enhanced based on at least one audio enhancement parameter and the processed multi-frame enhancement threshold value to obtain an enhanced audio signal;
wherein the processed enhancement threshold of any frame is determined based on the following step:
taking that frame as the current frame, and carrying out weighted aggregation processing on the processed enhancement threshold corresponding to the previous frame and the pre-processing enhancement threshold corresponding to the current frame, based on a first weight corresponding to the previous frame and a second weight corresponding to the current frame, to obtain the processed enhancement threshold corresponding to the current frame, wherein the sum of the first weight and the second weight is 1.
7. The audio enhancement method according to any one of claims 1 to 6, wherein after the enhancing the audio signal to be enhanced based on at least one audio enhancement parameter and the target audio enhancement threshold value to obtain an enhanced audio signal, the method further comprises:
and processing the enhanced audio signal based on preset audio control parameters to obtain a processed enhanced audio signal.
8. An audio enhancement device, comprising:
an audio extraction module, configured to perform coherent sound extraction on the audio signal to be enhanced to obtain a coherent sound signal and an ambient sound signal;
a first determining module, configured to determine an initial audio enhancement threshold based on the ambient sound signal and determine the initial audio enhancement threshold as a target audio enhancement threshold when a cross-correlation coefficient of the coherent sound signal and the ambient sound signal is less than a preset correlation coefficient threshold;
a second determining module, configured to determine a target audio enhancement threshold based on a preset audio enhancement threshold when the cross-correlation coefficient of the coherent sound signal and the ambient sound signal is greater than or equal to the preset correlation coefficient threshold, where an absolute value of the preset audio enhancement threshold is less than or equal to an absolute value of the initial audio enhancement threshold;
the audio enhancement module is used for enhancing the audio signal to be enhanced based on at least one audio enhancement parameter and the target audio enhancement threshold value to obtain an enhanced audio signal;
the target audio enhancement threshold is used for determining a part of audio signals from the audio signals to be enhanced, wherein the part of audio signals are signals which are required to be enhanced and correspond to at least one enhancement parameter in the at least one audio enhancement parameter.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the audio enhancement method of any one of claims 1 to 7 when the program is executed by the processor.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the audio enhancement method according to any one of claims 1 to 7.
CN202311413048.0A 2023-10-30 2023-10-30 Audio enhancement method, device, electronic equipment and storage medium Active CN117153192B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311413048.0A CN117153192B (en) 2023-10-30 2023-10-30 Audio enhancement method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117153192A CN117153192A (en) 2023-12-01
CN117153192B true CN117153192B (en) 2024-02-20

Family

ID=88899085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311413048.0A Active CN117153192B (en) 2023-10-30 2023-10-30 Audio enhancement method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117153192B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101816191A (en) * 2007-09-26 2010-08-25 弗劳恩霍夫应用研究促进协会 Apparatus and method for extracting an ambient signal in an apparatus and method for obtaining weighting coefficients for extracting an ambient signal, and computer program
WO2021248521A1 (en) * 2020-06-12 2021-12-16 瑞声声学科技(深圳)有限公司 Audio signal adjustment method and apparatus, computer device, and storage medium
CN114067817A (en) * 2021-11-08 2022-02-18 易兆微电子(杭州)股份有限公司 Bass enhancement method, bass enhancement device, electronic equipment and storage medium
CN114203163A (en) * 2022-02-16 2022-03-18 荣耀终端有限公司 Audio signal processing method and device
CN114424588A (en) * 2019-09-17 2022-04-29 诺基亚技术有限公司 Direction estimation enhancement for parametric spatial audio capture using wideband estimation
CN114420153A (en) * 2021-12-08 2022-04-29 深圳市东微智能科技股份有限公司 Sound quality adjusting method, device, equipment and storage medium
WO2023284438A1 (en) * 2021-07-16 2023-01-19 RealMe重庆移动通信有限公司 Audio data processing method and apparatus, and electronic device
CN115862657A (en) * 2023-02-22 2023-03-28 科大讯飞(苏州)科技有限公司 Noise-dependent gain method and device, vehicle-mounted system, electronic equipment and storage medium
CN116168719A (en) * 2022-12-26 2023-05-26 杭州爱听科技有限公司 Sound gain adjusting method and system based on context analysis
CN116486833A (en) * 2023-06-21 2023-07-25 北京探境科技有限公司 Audio gain adjustment method and device, storage medium and electronic equipment
CN116645973A (en) * 2023-07-20 2023-08-25 腾讯科技(深圳)有限公司 Directional audio enhancement method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant