CN116095561A

CN116095561A - Audio data processing method, audio data processing device, and storage medium

Info

Publication number: CN116095561A
Application number: CN202310132337.7A
Authority: CN
Inventors: 李建华
Original assignee: Nanjing Goertek Acoustics Technology Co ltd
Current assignee: Nanjing Goertek Acoustics Technology Co ltd
Priority date: 2023-02-17
Filing date: 2023-02-17
Publication date: 2023-05-09

Abstract

The invention discloses an audio data processing method, an audio data processing device, and a storage medium. The method comprises the following steps: acquiring an audio signal; extracting a low-frequency signal from the audio signal; identifying a first characteristic parameter of a human voice signal and a second characteristic parameter of an accompaniment sound signal in the audio signal, and determining a control parameter for controlling harmonic generation according to the first characteristic parameter and the second characteristic parameter; and generating an enhanced harmonic signal from the low-frequency signal according to the control parameter. The invention aims to improve the clarity of the human voice in the played sound while achieving bass enhancement.

Description

Audio data processing method, audio data processing device, and storage medium

Technical Field

The present invention relates to the field of intelligent audio technologies, and in particular, to a method for processing audio data, a device for processing audio data, and a storage medium.

Background

During data processing, a sound playing device (especially a device with a smaller speaker size) generally uses a bass enhancement algorithm to process an audio signal, and replaces low-frequency components in the audio signal with higher harmonics generated by the low-frequency components.

However, the control parameters for forming harmonics in the low frequency signal are generally fixed parameters set in advance, which may result in a reduction in the clarity of human voice in the sound played after bass enhancement of some audio data.

Disclosure of Invention

The invention mainly aims to provide a processing method of audio data, a processing device of the audio data and a storage medium, aiming at enhancing the bass and improving the voice definition in playing sound.

To achieve the above object, the present invention provides an audio data processing method including the steps of:

acquiring an audio signal;

extracting a low frequency signal from the audio signal;

identifying a first characteristic parameter of a human voice signal and a second characteristic parameter of an accompaniment sound signal in the audio signal, and determining a control parameter for controlling harmonic generation according to the first characteristic parameter and the second characteristic parameter;

generating an enhanced harmonic signal from the low-frequency signal according to the control parameter.

Optionally, the first characteristic parameter includes a first duty ratio of the human voice signal in the audio signal, the second characteristic parameter includes a second duty ratio of the accompaniment sound signal in the audio signal, and the step of determining the control parameter for controlling the harmonic generation according to the first characteristic parameter and the second characteristic parameter includes:

and determining a harmonic generation amount and/or a harmonic generation proportion according to the first duty ratio and the second duty ratio, wherein the control parameter comprises the harmonic generation amount and/or the harmonic generation proportion.

Optionally, the harmonic generation amount is inversely related to the first duty ratio, and/or the harmonic generation ratio is inversely related to the first duty ratio;

the harmonic generation amount is positively correlated with the second duty ratio, and/or the harmonic generation ratio is positively correlated with the second duty ratio.

Optionally, the step of determining the harmonic generation amount and/or the harmonic generation ratio according to the first and second duty ratios includes:

when the first duty ratio is larger than a first preset duty ratio and the second duty ratio is smaller than a second preset duty ratio, determining a first harmonic quantity as the harmonic generation quantity and/or determining a first proportion as the harmonic generation proportion;

when the first duty ratio and the second duty ratio are both smaller than or equal to the first preset duty ratio and both larger than or equal to the second preset duty ratio, determining a second harmonic quantity as the harmonic generation quantity and/or determining a second proportion as the harmonic generation proportion;

when the second duty ratio is larger than the first preset duty ratio and the first duty ratio is smaller than the second preset duty ratio, determining a third harmonic quantity as the harmonic generation quantity and/or determining a third proportion as the harmonic generation proportion;

wherein the first harmonic quantity is smaller than the second harmonic quantity, and the second harmonic quantity is smaller than the third harmonic quantity; the first ratio is less than the second ratio, which is less than the third ratio.

Optionally, before the step of determining the second harmonic quantity as the harmonic generation quantity and/or determining the second proportion as the harmonic generation proportion, the method further includes:

determining a relationship value of the first duty ratio and the second duty ratio when the first duty ratio and the second duty ratio are both smaller than or equal to the first preset duty ratio and both larger than or equal to the second preset duty ratio;

adjusting the third harmonic quantity according to the relation value to obtain the second harmonic quantity; and/or adjusting the third proportion according to the relation value to obtain the second proportion.

Optionally, before the step of determining the third harmonic quantity as the harmonic generation quantity and/or determining the third proportion as the harmonic generation proportion, the method further includes:

identifying the frequency amplitude of the accompaniment sound signal in a preset frequency band;

and determining the third harmonic quantity and/or the third proportion according to the frequency amplitude.

Optionally, the step of identifying the first characteristic parameter of the human voice signal and the second characteristic parameter of the accompaniment sound signal in the audio signal includes:

extracting frequency domain features and time domain features of the audio signal;

and determining the first characteristic parameter and the second characteristic parameter according to the frequency domain features and the time domain features.

Optionally, the time domain features include at least one of the following parameters: short-time energy, loudness, glottal excitation pulses;

the frequency domain features include at least one of the following parameters: cepstral coefficients, spectral entropy, line spectrum pairs.

In addition, in order to achieve the above object, the present application also proposes an audio data processing device comprising: a memory, a processor, and an audio data processing program stored on said memory and executable on said processor, said audio data processing program implementing the steps of the audio data processing method described in any one of the above when executed by said processor.

In addition, in order to achieve the above object, the present application also proposes a storage medium having stored thereon a processing program of audio data, which when executed by a processor, implements the steps of the processing method of audio data as set forth in any one of the above.

According to the audio data processing method, the enhanced harmonic signal generated from the low-frequency signal extracted from the audio signal is regulated based on the first characteristic parameter of the human voice signal and the second characteristic parameter of the accompaniment sound signal identified in the audio signal. On this basis, the control parameter for harmonic generation in the bass enhancement process is no longer a preset fixed parameter but is determined by combining the characteristics of the human voice and the accompaniment sound in the audio signal, so that the clarity of the human voice when the bass-enhanced audio data is played is effectively improved.

Drawings

FIG. 1 is a schematic diagram of a hardware structure involved in an embodiment of an audio data processing apparatus according to the present invention;

FIG. 2 is a flow chart illustrating an embodiment of a method for processing audio data according to the present invention;

FIG. 3 is a flowchart illustrating another embodiment of a method for processing audio data according to the present invention;

FIG. 4 is a flowchart illustrating a processing method of audio data according to another embodiment of the present invention;

FIG. 5 is a flowchart illustrating a processing method of audio data according to another embodiment of the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The embodiment of the invention provides an audio data processing device 1.

In the present embodiment, the audio data processing apparatus 1 may be built into any device having an audio playing function. For example, such devices may include, but are not limited to, mobile phones, head-mounted display devices (e.g., virtual reality devices or augmented reality devices), smart audio glasses, neck speakers, open headphones, stereos, tablets, televisions, and the like.

In this embodiment, the physical volume of the loudspeaker in the device where the audio data processing apparatus 1 is located is smaller than a preset volume, and the cutoff frequency of a loudspeaker of that volume is higher than a preset frequency.

In an embodiment of the present invention, referring to FIG. 1, the audio data processing apparatus 1 includes: a processor 1001 (e.g., a CPU), a memory 1002, a timer 1003, and the like. The components of the processing apparatus are connected through a communication bus. The memory 1002 may be a high-speed RAM or a non-volatile memory, such as a disk memory. Alternatively, the memory 1002 may be a storage device separate from the processor 1001.

It will be appreciated by those skilled in the art that the device structure shown in FIG. 1 does not limit the device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.

As shown in fig. 1, a processing program of audio data may be included in a memory 1002 as a storage medium. In the apparatus shown in fig. 1, a processor 1001 may be used to call a processing program of audio data stored in a memory 1002 and perform the relevant step operations of the processing method of audio data in the following embodiments.

The embodiment of the invention also provides a processing method of the audio data, which is applied to the processing device of the audio data.

Referring to fig. 2, an embodiment of a method for processing audio data is provided. In this embodiment, the method for processing audio data includes:

Step S10, acquiring an audio signal;

The audio signal may be read from a local memory, captured in real time through a microphone, or obtained over a network connection.

Step S20, extracting a low-frequency signal in the audio signal;

Specifically, the cutoff frequency of the loudspeaker used to play the audio signal can be obtained, and the part of the audio signal below the cutoff frequency is used as the low-frequency signal.
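The band split described for step S20 can be illustrated with a minimal sketch. The text does not say which filter is used, so the brick-wall split on a naive DFT below is an assumption chosen for brevity, and the names `split_low_band`, `sample_rate`, and `cutoff_hz` are hypothetical.

```python
import cmath
import math


def split_low_band(samples, sample_rate, cutoff_hz):
    """Return (low, rest): the part of `samples` below cutoff_hz and the remainder.

    Assumption: a brick-wall split on a naive O(n^2) DFT, which is only
    practical for a short illustrative frame.
    """
    n = len(samples)
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    low_spectrum = []
    for k, value in enumerate(spectrum):
        # Bins k and n - k describe the same physical frequency.
        freq = min(k, n - k) * sample_rate / n
        low_spectrum.append(value if freq < cutoff_hz else 0j)
    low = [
        (sum(low_spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]
    rest = [s - l for s, l in zip(samples, low)]
    return low, rest
```

In this sketch `low` plays the role of the low-frequency signal and `rest` is what remains of the audio signal once that band has been taken out.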

Step S30, identifying a first characteristic parameter of a human voice signal and a second characteristic parameter of an accompaniment sound signal in the audio signal, and determining a control parameter for controlling harmonic generation according to the first characteristic parameter and the second characteristic parameter;

The human voice signal may be any human voice signal, or a signal corresponding to a human voice that satisfies a preset condition (for example, the voice of a person with a specified identity).

The accompaniment sound signal may be all signals other than the human voice signal in the audio signal. Alternatively, the accompaniment sound signal may be a signal satisfying a preset melody condition other than the human sound signal in the audio signal.

Specifically, a voice recognition model and an accompaniment recognition model can be preset. The audio signal is input into the voice recognition model, and the recognition result output by that model is used as the first characteristic parameter; the audio signal is input into the accompaniment recognition model, and the recognition result output by that model is used as the second characteristic parameter. Both models may be machine learning models. The voice recognition model may be adjusted according to a first error of the accompaniment recognition model when it recognizes standard data containing human voice and accompaniment sound, and the accompaniment recognition model may be adjusted according to a second error of the voice recognition model when it recognizes standard data containing human voice and accompaniment sound, which helps improve recognition accuracy.

The first characteristic parameter may be any parameter that characterizes the human voice signal in the audio signal (e.g., duty ratio, number of voices, and/or frequency amplitude). The second characteristic parameter may be any parameter that characterizes the accompaniment sound signal in the audio signal (e.g., duty ratio, sound type, and/or frequency amplitude).

The control parameter is any parameter used for regulating and controlling the gain of the harmonic signal in the low-frequency signal. In the present embodiment, the control parameters may include a harmonic generation amount and/or a harmonic generation ratio, etc.

Different first characteristic parameters and different second characteristic parameters correspond to different control parameters. Specifically, the correspondence among the first characteristic parameter, the second characteristic parameter, and the control parameter may be established in advance, for example as a mapping table or a calculation formula. Based on the correspondence, the control parameter can be obtained by looking up the first and second characteristic parameters in the table, or calculated by substituting the first and second characteristic parameters into the formula, and so on.
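The table look-up form of the correspondence can be sketched as follows. This is only an illustration: it assumes both characteristic parameters are shares expressed in the range 0 to 1, and the bucket edges `EDGES`, the table values, and the function name `lookup_control` are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical bucket edges for a share expressed in [0, 1].
EDGES = [0.25, 0.5, 0.75]

# Hypothetical table indexed as TABLE[voice_bucket][accomp_bucket]; each
# entry is (harmonic generation amount, harmonic generation proportion).
TABLE = [
    [(4, 1.0), (5, 1.2), (6, 1.4), (6, 1.6)],
    [(3, 0.9), (4, 1.0), (5, 1.2), (6, 1.4)],
    [(2, 0.7), (3, 0.9), (4, 1.0), (5, 1.2)],
    [(2, 0.5), (2, 0.7), (3, 0.9), (4, 1.0)],
]


def lookup_control(voice_share, accomp_share):
    """Quantize both shares into buckets and read the control parameters."""
    row = bisect_right(EDGES, voice_share)
    col = bisect_right(EDGES, accomp_share)
    return TABLE[row][col]
```

A table keeps the mapping easy to tune by hand, at the cost of step changes at the bucket edges; a formula gives a smoother response.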

Step S40, generating the enhanced harmonic signal in the low frequency signal according to the control parameter.

In this embodiment, the enhanced harmonic signal is specifically a higher harmonic signal.

Here, generating the enhanced harmonic signal in the low-frequency signal according to the control parameter specifically means generating the enhanced harmonic signal from the low-frequency signal according to the control parameter, and replacing the original low-frequency component of the audio signal with the enhanced harmonic signal.

When the control parameter includes the harmonic generation amount, initial harmonics are generated from the low-frequency signal, the number of initial harmonic signals given by the harmonic generation amount is taken as the enhanced harmonic signal, and the original low-frequency signal in the audio signal is replaced with the enhanced harmonic signal. When the control parameter includes the harmonic generation proportion, initial harmonics are generated from the low-frequency signal and amplified according to the harmonic generation proportion to obtain the enhanced harmonic signal, and the original low-frequency signal in the audio signal is replaced with the enhanced harmonic signal.
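How the two control parameters act on the generated harmonics can be shown with a small sketch. It models the low-frequency signal as a single tone, which is a simplification, and the 1/order roll-off as well as the names `enhanced_harmonics`, `harmonic_count`, and `harmonic_gain` are assumptions rather than details taken from the text.

```python
import math


def enhanced_harmonics(fundamental_hz, amplitude, sample_rate, n_samples,
                       harmonic_count=3, harmonic_gain=1.0):
    """Sum `harmonic_count` overtones (2f, 3f, ...) of one low-frequency tone.

    harmonic_count stands in for the harmonic generation amount and
    harmonic_gain for the harmonic generation proportion.
    """
    out = [0.0] * n_samples
    for order in range(2, 2 + harmonic_count):
        # Assumed 1/order roll-off so that higher overtones are quieter.
        level = harmonic_gain * amplitude / order
        for t in range(n_samples):
            phase = 2 * math.pi * order * fundamental_hz * t / sample_rate
            out[t] += level * math.sin(phase)
    return out
```

The returned samples would then take the place of the original low-frequency component when the output signal is assembled.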

After the original low-frequency signal has been replaced by the enhanced harmonic signal, the audio signal can be output through the loudspeaker.

According to the audio data processing method of this embodiment, the enhanced harmonic signal generated from the low-frequency signal extracted from the audio signal is regulated based on the first characteristic parameter of the human voice signal and the second characteristic parameter of the accompaniment sound signal identified in the audio signal. The control parameter for harmonic generation in the bass enhancement process is therefore no longer a preset fixed parameter but is determined by combining the characteristics of the human voice and the accompaniment sound in the audio signal, so that the clarity of the human voice when the bass-enhanced audio data is played is effectively improved.

Further, based on the above embodiment, another embodiment of the audio data processing method of the present application is provided. In this embodiment, referring to FIG. 3, the first characteristic parameter includes a first duty ratio of the human voice signal in the audio signal, the second characteristic parameter includes a second duty ratio of the accompaniment sound signal in the audio signal, and the step of determining a control parameter for controlling harmonic generation according to the first characteristic parameter and the second characteristic parameter includes: determining a harmonic generation amount and/or a harmonic generation proportion according to the first duty ratio and the second duty ratio, wherein the control parameter comprises the harmonic generation amount and/or the harmonic generation proportion. Based on this, step S30 includes: step S31, identifying a first characteristic parameter of a human voice signal and a second characteristic parameter of an accompaniment sound signal in the audio signal, and determining a harmonic generation amount and/or a harmonic generation proportion according to the first duty ratio and the second duty ratio, wherein the control parameter comprises the harmonic generation amount and/or the harmonic generation proportion. Step S40 includes: step S41, generating the enhanced harmonic signal in the low-frequency signal according to the harmonic generation amount and/or the harmonic generation proportion.

The harmonic generation amount is specifically a target number of generated harmonics. The harmonic generation ratio is specifically an amplification ratio of a harmonic generated by the low-frequency signal.

Specifically, a relationship value between the first duty ratio and the second duty ratio may be determined, and the harmonic generation amount and/or the harmonic generation ratio may be determined according to the relationship value. In addition, a first section in which the first duty ratio is located and a second section in which the second duty ratio is located may be determined, and the harmonic generation amount and/or the harmonic generation ratio may be determined from the first section and the second section.

Different first duty ratios and different second duty ratios correspond to different harmonic generation amounts and to different harmonic generation ratios. Based on this, a first correspondence (e.g., a mapping relationship, a formula, etc.) among the first duty ratio, the second duty ratio, and the harmonic generation amount may be established in advance, and the harmonic generation amount corresponding to the first duty ratio and the second duty ratio may be determined based on the first correspondence. Likewise, a second correspondence (e.g., a mapping relationship, a formula, etc.) among the first duty ratio, the second duty ratio, and the harmonic generation ratio may be established in advance, and the harmonic generation ratio corresponding to the first duty ratio and the second duty ratio may be determined based on the second correspondence. The harmonic generation amount is inversely related to the first duty ratio, and/or the harmonic generation ratio is inversely related to the first duty ratio; the harmonic generation amount is positively correlated with the second duty ratio, and/or the harmonic generation ratio is positively correlated with the second duty ratio. That is, the larger the share of the human voice signal, the smaller the harmonic generation amount and/or the harmonic generation ratio; the larger the share of the accompaniment sound signal, the larger the harmonic generation amount and/or the harmonic generation ratio. On this basis, the clarity of the human voice in the sound played after bass enhancement can be effectively improved.
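One possible formula with the stated monotonic behaviour is sketched below. The linear weight, the parameter ranges, and the name `control_from_shares` are assumptions made for illustration; the text only requires that the outputs fall as the human voice share rises and grow as the accompaniment share rises.

```python
def control_from_shares(voice_share, accomp_share,
                        count_range=(1, 6), gain_range=(0.5, 1.5)):
    """Map two shares in [0, 1] to (harmonic amount, harmonic ratio)."""
    # Weight in [0, 1]: rises with the accompaniment share and falls with
    # the human voice share (assumed linear form).
    weight = min(1.0, max(0.0, 0.5 * (accomp_share - voice_share) + 0.5))
    count = count_range[0] + round(weight * (count_range[1] - count_range[0]))
    gain = gain_range[0] + weight * (gain_range[1] - gain_range[0])
    return count, gain
```

With these hypothetical ranges, a voice-only input yields the smallest amount and ratio, and an accompaniment-only input yields the largest.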

Further, more than one second correspondence may be preset. One of them is determined as a target correspondence according to the harmonic generation amount, with different harmonic generation amounts corresponding to different target correspondences, and the harmonic generation proportion corresponding to the first duty ratio and the second duty ratio is determined based on the target correspondence.

In this embodiment, the first duty ratio and the second duty ratio can accurately reflect the influence of the audio signal on the clarity of the voice after the bass enhancement, and the harmonic generation amount and the harmonic generation ratio can effectively adjust the quality of the voice after the bass enhancement, so that the harmonic generation ratio and/or the harmonic generation amount are determined by combining the first duty ratio and the second duty ratio, the clarity of the voice can not be excessively weakened while the bass enhancement is ensured, and the clarity of the voice after the audio data after the bass enhancement is played is effectively improved.

Further, based on any one of the above embodiments, a further embodiment of the method for processing audio data of the present application is provided. In the present embodiment, referring to fig. 4, determining the harmonic generation amount and/or the harmonic generation ratio from the first and second duty ratios includes:

Step S311, when the first duty ratio is greater than a first preset duty ratio and the second duty ratio is less than a second preset duty ratio, determining a first harmonic quantity as the harmonic generation quantity and/or determining a first proportion as the harmonic generation proportion;

Step S312, when the first duty ratio and the second duty ratio are both smaller than or equal to the first preset duty ratio and both larger than or equal to the second preset duty ratio, determining a second harmonic quantity as the harmonic generation quantity and/or determining a second proportion as the harmonic generation proportion;

Step S313, when the second duty ratio is greater than the first preset duty ratio and the first duty ratio is less than the second preset duty ratio, determining a third harmonic quantity as the harmonic generation quantity and/or determining a third proportion as the harmonic generation proportion.

The first preset duty ratio and the second preset duty ratio may be preset fixed parameters, or may be determined according to the audio type corresponding to the audio signal and a relationship value between the first duty ratio and the second duty ratio. For example, the first preset duty cycle is 90% and the second preset duty cycle is 10%. Alternatively, the first preset duty ratio and the second preset duty ratio may be set to other values according to actual conditions.

The first harmonic quantity, the second harmonic quantity, the third harmonic quantity, the first proportion, the second proportion, and/or the third proportion may be fixed values set in advance, or may be values determined according to the audio signal, the human voice signal, and/or the accompaniment sound signal.

For example, when the first duty ratio is 100% and the second duty ratio is 0%, the first harmonic quantity is determined as the harmonic generation quantity and the first proportion as the harmonic generation proportion; when the first duty ratio is 0% and the second duty ratio is 100%, the third harmonic quantity is determined as the harmonic generation quantity and the third proportion as the harmonic generation proportion; when the first and second duty ratios are both between 0% and 100%, the second harmonic quantity may be determined as the harmonic generation quantity and the second proportion as the harmonic generation proportion.
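The three branches of steps S311 to S313 can be sketched as a small selection function. The thresholds 0.9 and 0.1 follow the 90% and 10% example given for the preset duty ratios; the three `levels` tuples and the fallback for combinations the text does not cover are hypothetical.

```python
def select_control(voice_share, accomp_share, upper=0.9, lower=0.1,
                   levels=((2, 0.5), (4, 1.0), (6, 1.5))):
    """Pick (harmonic quantity, harmonic proportion) from three preset levels.

    `levels` holds the first, second and third (quantity, proportion)
    pairs in increasing order; the values are placeholders.
    """
    first, second, third = levels
    if voice_share > upper and accomp_share < lower:
        return first   # step S311: human voice dominates
    if accomp_share > upper and voice_share < lower:
        return third   # step S313: accompaniment dominates
    # Step S312 covers shares that both lie within [lower, upper]. Any
    # other combination is not specified in the text; falling back to the
    # middle level here is an assumption.
    return second
```

For instance, `select_control(1.0, 0.0)` takes the S311 branch and `select_control(0.0, 1.0)` takes the S313 branch.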

In this embodiment, based on the difference between the duty ratio intervals where the first duty ratio and the second duty ratio are located, different harmonic generation amounts and/or harmonic generation ratios are correspondingly adopted as control parameters, and based on this, the influence of the recognition errors of the first duty ratio and the second duty ratio on the accuracy of subsequent control parameters can be effectively reduced, so that the accuracy of the control parameters is improved, and the voice clarity after the audio data with enhanced bass is played is further improved.

Further, in this embodiment, before the step of determining the second harmonic quantity as the harmonic generation quantity and/or determining the second proportion as the harmonic generation proportion, the method further includes: determining a relationship value of the first duty ratio and the second duty ratio when the first duty ratio and the second duty ratio are both smaller than or equal to the first preset duty ratio and both larger than or equal to the second preset duty ratio; adjusting the third harmonic quantity according to the relationship value to obtain the second harmonic quantity; and/or adjusting the third proportion according to the relationship value to obtain the second proportion.

The relationship values may include differences and/or ratios, etc. Specifically, a first adjustment amplitude or a first adjustment proportion of the third harmonic quantity can be determined according to the relation value, and the third harmonic quantity is reduced according to the first adjustment amplitude or the first adjustment proportion to obtain a second harmonic quantity; in addition, a second adjustment amplitude or a second adjustment proportion of the third proportion can be determined according to the relation value, and the third proportion is reduced according to the second adjustment amplitude or the second adjustment proportion to obtain the second proportion.

Specifically, in this embodiment, the relationship value is the ratio of the first duty ratio to the second duty ratio, and both the second harmonic quantity and the second proportion are inversely related to this ratio. That is, the smaller the ratio, the larger the share of the accompaniment sound relative to the human voice, and the larger the corresponding second harmonic quantity and/or second proportion may be.
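Deriving the second level by reducing the third level can be sketched as below. The scaling `1 / (1 + ratio)` is only one assumed mapping that shrinks as the ratio grows; the text does not give the actual adjustment, and the name `second_from_third` is hypothetical.

```python
def second_from_third(third_count, third_gain, voice_share, accomp_share):
    """Reduce the third (quantity, proportion) pair to obtain the second pair.

    Assumes accomp_share > 0, which holds in the branch where both shares
    are at least the second preset duty ratio.
    """
    ratio = voice_share / accomp_share   # relationship value
    # Assumed mapping: close to 1 for small ratios, shrinking as the
    # human voice gains weight relative to the accompaniment.
    scale = 1.0 / (1.0 + ratio)
    second_count = max(1, round(third_count * scale))
    second_gain = third_gain * scale
    return second_count, second_gain
```

Because the scale never exceeds 1, the second proportion computed this way stays below the third proportion whenever any human voice is present.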

Further, the first harmonic quantity may be determined based on the second harmonic quantity, and the first ratio may be determined based on the second ratio.

In this embodiment, the third harmonic quantity is adjusted based on the relation value between the first duty ratio and the second duty ratio to obtain the second harmonic quantity and/or the third proportion is adjusted to obtain the second proportion, so that accuracy of the second harmonic quantity and/or the second proportion determined when the deviation between the first duty ratio and the second duty ratio is not too large is further ensured, and the definition of human voice is further improved while bass enhancement is further realized.

Further, in this embodiment, before the step of determining that the third harmonic amount is the harmonic generation amount and/or determining that the third proportion is the harmonic generation proportion, the method further includes: identifying the frequency amplitude of the accompaniment sound signal in a preset frequency band; and determining the third harmonic quantity and/or the third proportion according to the frequency amplitude.

The preset frequency band lies within the frequency range below the cutoff frequency; that is, the upper limit of the preset frequency band is lower than the cutoff frequency.

Different frequency amplitudes correspond to different third harmonic quantities and/or third proportions. The third harmonic quantity and/or the third proportion is positively correlated with the frequency amplitude; that is, the larger the frequency amplitude, the larger the third harmonic quantity and/or the third proportion. Specifically, the third harmonic quantity and/or the third proportion can be calculated by substituting the frequency amplitude into a formula; alternatively, the interval in which the frequency amplitude lies may be determined, and the corresponding third harmonic quantity and/or third proportion may be determined based on that interval.
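The formula variant can be sketched in two steps: read the accompaniment amplitude inside the preset band, then map it to the third level. Taking the peak magnitude as the band amplitude, the linear mapping, the value ranges, and the names `band_peak` and `third_level` are all assumptions.

```python
def band_peak(spectrum, band):
    """Largest magnitude among (freq_hz, magnitude) pairs inside `band`."""
    low_hz, high_hz = band
    return max((mag for freq, mag in spectrum if low_hz <= freq <= high_hz),
               default=0.0)


def third_level(band_amplitude, full_scale=1.0,
                count_range=(3, 8), gain_range=(1.0, 2.0)):
    """Map a band amplitude to (third harmonic quantity, third proportion)."""
    # Clamp to [0, 1] so out-of-range amplitudes stay inside the ranges.
    level = min(1.0, max(0.0, band_amplitude / full_scale))
    count = count_range[0] + round(level * (count_range[1] - count_range[0]))
    gain = gain_range[0] + level * (gain_range[1] - gain_range[0])
    return count, gain
```

The band passed to `band_peak` is expected to end below the loudspeaker cutoff frequency, matching the definition of the preset frequency band.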

The third harmonic quantity and/or the third ratio may be determined in the above manner.

In this embodiment, the above manner is beneficial to improving accuracy of the third harmonic quantity and/or the third ratio, so as to further enhance the bass sound and improve the clarity of the voice.

Further, based on any one of the above embodiments, a further embodiment of the method for processing audio data of the present application is provided. In the present embodiment, referring to fig. 5, step S20 includes:

Step S21, extracting frequency domain features and time domain features of the audio signal;

The time domain features include at least one of the following parameters: short-time energy, loudness, glottal excitation pulses. The frequency domain features include at least one of the following parameters: cepstral coefficients, spectral entropy, line spectrum pairs. The cepstral coefficients may comprise one or more of linear prediction cepstral coefficients, Mel cepstral coefficients, and first-order differential Mel cepstral coefficients.

Specifically, the audio signal may be divided into a plurality of data frames according to a preset rule, and the time domain features of each data frame may be extracted. After the data frames are obtained, each data frame may be windowed, a Fourier transform may be performed on the windowed data frame, and the frequency domain features may be extracted from the result of the Fourier transform.
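The framing, windowing, and transform chain can be illustrated with a minimal sketch covering one time domain feature (short-time energy) and one frequency domain feature (spectral entropy). The Hann window, the naive DFT, the fixed frame length and hop, and all function names are assumptions; the text only refers to a preset framing rule and a windowing step.

```python
import cmath
import math


def frames(samples, frame_len, hop):
    """Split samples into overlapping frames of frame_len, stepping by hop."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Time domain feature: sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def hann(frame):
    """Apply a symmetric Hann window (assumed window type)."""
    n = len(frame)
    return [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


def power_spectrum(frame):
    """Power of the non-negative frequency bins, using a naive DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def spectral_entropy(power):
    """Frequency domain feature: entropy of the normalized power spectrum."""
    total = sum(power)
    if total == 0:
        return 0.0
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A typical per-frame call would be `spectral_entropy(power_spectrum(hann(frame)))`; tonal frames tend to give lower entropy than noise-like or impulsive frames.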

Step S22, determining the first characteristic parameter and the second characteristic parameter according to the frequency domain features and the time domain features.

The signal types in the audio signal are classified based on the frequency domain features and the time domain features to obtain the human voice signal and the accompaniment sound signal. The first characteristic parameter is determined according to the signal characteristics of the human voice signal and of the audio signal, and the second characteristic parameter is determined according to the signal characteristics of the accompaniment sound signal and of the audio signal.
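When the characteristic parameters are duty ratios, one way to obtain them from classified frames is sketched below. It assumes each frame has already been labelled by some classifier and that a duty ratio is the energy share of a class relative to the whole audio signal; both the energy-based definition and the label strings are assumptions, not details given in the text.

```python
def signal_shares(frame_energies, frame_labels):
    """Return (voice share, accompaniment share) of the total frame energy.

    frame_labels holds one hypothetical label per frame, such as "voice"
    or "accompaniment"; any other label counts only toward the total.
    """
    total = sum(frame_energies)
    if total == 0:
        return 0.0, 0.0
    voice = sum(e for e, label in zip(frame_energies, frame_labels)
                if label == "voice")
    accomp = sum(e for e, label in zip(frame_energies, frame_labels)
                 if label == "accompaniment")
    return voice / total, accomp / total
```

The two returned values correspond to the first and second duty ratios under this assumed definition.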

In this embodiment, the frequency domain features and the time domain features of the audio signal are extracted so that the first characteristic parameter and the second characteristic parameter can be determined accurately, which in turn improves the accuracy of the control parameter subsequently determined for the enhanced harmonic signal.

In addition, the embodiment of the invention also provides a storage medium, wherein the storage medium stores a processing program of audio data, and the processing program of the audio data realizes the relevant steps of any embodiment of the processing method of the audio data when being executed by a processor.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.

The serial numbers of the foregoing embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by means of hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium as described above (e.g. ROM/RAM, magnetic disk, optical disk), comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an audio data processing device, a network device, etc.) to perform the methods according to the embodiments of the present invention.

The foregoing describes only preferred embodiments of the present invention and does not limit the scope of the invention. Any equivalent structure or equivalent process based on the disclosure herein, whether applied directly or indirectly in other related technical fields, is likewise covered by the scope of the invention.

Claims

1. An audio data processing method, characterized in that the audio data processing method comprises the steps of:

acquiring an audio signal;

extracting a low frequency signal from the audio signal;

2. The audio data processing method of claim 1, wherein the first characteristic parameter comprises a first duty cycle of the human voice signal in the audio signal, the second characteristic parameter comprises a second duty cycle of the accompaniment sound signal in the audio signal, and the step of determining the control parameter for controlling the harmonic generation according to the first characteristic parameter and the second characteristic parameter comprises:

3. The audio data processing method according to claim 2, wherein the harmonic generation amount is inversely related to the first duty cycle, and/or the harmonic generation ratio is inversely related to the first duty cycle;

4. The audio data processing method according to claim 3, wherein the step of determining the harmonic generation amount and/or the harmonic generation ratio from the first and second duty cycles includes:

5. The audio data processing method according to claim 4, wherein before the step of determining that the second harmonic amount is the harmonic generation amount and/or determining that the second ratio is the harmonic generation ratio, further comprising:

6. The audio data processing method according to claim 4, wherein before the step of determining that the third harmonic amount is the harmonic generation amount and/or determining that the third ratio is the harmonic generation ratio, further comprising:

7. The audio data processing method according to any one of claims 1 to 6, wherein the step of identifying a first characteristic parameter of a human voice signal and a second characteristic parameter of an accompaniment sound signal in the audio signal includes:

8. The audio data processing method of claim 7, wherein the time domain features include at least one of the following parameters: short-time energy, loudness, glottal excitation pulses;

9. An audio data processing device, characterized in that the audio data processing device comprises: a memory, a processor, and a processing program of audio data stored on the memory and executable on the processor, wherein the processing program, when executed by the processor, implements the steps of the audio data processing method according to any one of claims 1 to 8.

10. A storage medium having stored thereon a processing program of audio data, which when executed by a processor, implements the steps of the audio data processing method according to any one of claims 1 to 8.