CN110858487A

CN110858487A - Audio signal scaling processing method and device

Info

Publication number: CN110858487A
Application number: CN201810965842.9A
Authority: CN
Inventors: 高威特; 张楠赓
Original assignee: Canaan Creative Co Ltd
Current assignee: Canaan Bright Sight Co Ltd
Priority date: 2018-08-23
Filing date: 2018-08-23
Publication date: 2020-03-03

Abstract

The embodiment of the invention provides a method and a device for scaling an audio signal. The method comprises: acquiring a current frame audio signal; detecting the threshold crossing rate of the energy amplitude of the current frame audio signal within a preset frequency range; updating an audio scaling processing coefficient according to the threshold crossing rate; and scaling the frame following the current frame audio signal according to the updated audio scaling processing coefficient. By selecting the dynamic range of the audio signal in a targeted manner according to the data in the effective frequency band, the invention further improves the effect of audio signal scaling.

Description

Audio signal scaling processing method and device

Technical Field

The invention relates to the field of data processing, in particular to an audio signal scaling processing method and device.

Background

This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.

Describing a real-valued quantity with the limited resources of a computer (such as memory and bandwidth) involves sampling, and the sampled data must be represented in binary with a fixed bit width. A fixed bit width, however, can only represent a limited range of values. In embedded audio processing devices in particular, this easily causes data overflow, which in turn distorts the audio signal.

In the prior art, when an audio signal is scaled to prevent overflow, the effective frequency band and the irrelevant frequency bands of the signal are not distinguished. For example, in speech recognition, most data underflow is concentrated below 100 Hz, a band that human listeners do not attend to at all. The prior art nevertheless typically treats such underflow as an equally significant factor when determining subsequent audio scaling, which degrades the scaling result.

Disclosure of Invention

The invention provides a method and a device for scaling an audio signal, which can adaptively scale the audio signal according to the actual application scenario and significantly improve the prevention of data overflow.

In a first aspect of an embodiment of the present invention, a method for scaling an audio signal is provided, where the method includes:

acquiring a current frame audio signal;

detecting the threshold crossing rate of the energy amplitude of the current frame audio signal within a preset frequency range;

updating an audio scaling processing coefficient according to the threshold crossing rate;

and scaling the next frame following the current frame audio signal according to the updated audio scaling processing coefficient.

In one embodiment, the method further comprises:

setting a threshold value according to a data overflow critical value;

and obtaining the threshold crossing rate by measuring the portion of the preset frequency range in which the energy amplitude of the current frame audio signal exceeds the threshold value.
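A minimal sketch of how such a threshold crossing rate might be computed from a magnitude spectrum is shown below. It assumes, hypothetically, that the preset frequency range has been mapped to FFT bin indices and that the measured "portion of the range" is simply the fraction of equal-width bins above the threshold; the function names are illustrative, not taken from the patent.

```python
def hz_to_bin(freq_hz, sample_rate, fft_size):
    """Map a frequency in Hz to the nearest FFT bin index."""
    return round(freq_hz * fft_size / sample_rate)


def threshold_crossing_rate(magnitudes, threshold, lo_bin, hi_bin):
    """Fraction of bins in [lo_bin, hi_bin) whose magnitude exceeds threshold.

    Equal-width FFT bins stand in for "frequency range" here (an assumption).
    """
    if not 0 <= lo_bin < hi_bin <= len(magnitudes):
        raise ValueError("bin range must be non-empty and inside the spectrum")
    band = magnitudes[lo_bin:hi_bin]
    over = sum(1 for m in band if m > threshold)
    return over / len(band)
```

For example, with a 16 kHz sample rate and a 512-point FFT, a hypothetical preset range of 300 Hz to 3400 Hz maps to bins 10 to 109.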

In one embodiment, before detecting the threshold crossing rate of the energy amplitude of the current frame audio signal within the preset frequency range, the method further includes:

scaling the current frame audio signal according to the audio scaling processing coefficient before updating.

In one embodiment, the audio scaling processing coefficient comprises a sound gain coefficient and/or a fixed-point fast Fourier transform (FFT) right-shift coefficient.

In one embodiment, the scaling process specifically includes:

scaling the next frame following the current frame audio signal according to the updated sound gain coefficient; and/or

during the Fourier transform, scaling the next frame following the current frame audio signal according to the updated fixed-point FFT right-shift coefficient.

In one embodiment, the fixed-point FFT right-shift coefficient is the number of bits by which the data of the next frame audio signal are right-shifted on entering a butterfly operation during the fast Fourier transform.

In one embodiment, the method further comprises:

setting an upper threshold value according to a data overflow critical value; and/or

setting a lower threshold value according to a data underflow critical value.

In one embodiment, the threshold crossing rate further comprises a first ratio and/or a second ratio, wherein:

the first ratio is the ratio of the portion of the preset frequency range in which the data exceed the upper threshold value to the whole preset frequency range; and

the second ratio is the ratio of the portion of the preset frequency range in which the data fall below the lower threshold value to the whole preset frequency range.

In an embodiment, updating the audio scaling processing coefficient according to the threshold crossing rate specifically includes:

if the first ratio exceeds a first threshold value, reducing the audio scaling coefficient according to the first ratio; and

if the first ratio is smaller than a second threshold value, expanding the audio scaling coefficient according to the first ratio;

wherein the first threshold is greater than or equal to the second threshold.

In an embodiment, updating the audio scaling processing coefficient according to the threshold crossing rate specifically includes:

if the first ratio does not exceed the first threshold and the second ratio exceeds a third threshold, expanding the audio scaling coefficient according to the first ratio and the second ratio.

In a second aspect of the embodiments of the present invention, an apparatus for scaling an audio signal is provided, the apparatus comprising:

the acquisition module is used for acquiring the audio signal of the current frame;

the detection module is used for detecting the threshold crossing rate of the energy amplitude of the current frame audio signal in a preset frequency range;

the updating module is used for updating the audio scaling processing coefficient according to the threshold crossing rate;

and the scaling processing module is used for scaling the next frame following the current frame audio signal according to the updated audio scaling processing coefficient.

In one embodiment, the apparatus further comprises a threshold module configured to:

set a threshold value according to a data overflow critical value.

In one embodiment, the scaling module is further configured to scale the current frame audio signal according to the audio scaling processing coefficient before updating, before the threshold crossing rate of the energy amplitude of the current frame audio signal within the preset frequency range is detected.

In one embodiment, the update module is configured to:

update the sound gain coefficient and/or the fixed-point FFT right-shift coefficient.

In an embodiment, the scaling module is specifically configured to:

scale the next frame following the current frame audio signal according to the updated sound gain coefficient; and/or

during the fast Fourier transform, scale the next frame following the current frame audio signal according to the updated fixed-point FFT right-shift coefficient.

In one embodiment, the fixed-point FFT right-shift coefficient is the number of bits by which the data of the next frame audio signal are right-shifted on entering a butterfly operation during the fast Fourier transform.

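In one embodiment, the threshold module is further configured to:

set an upper threshold value according to a data overflow critical value; and/or set a lower threshold value according to a data underflow critical value.

In an embodiment, the update module is specifically configured to:

reduce the audio scaling coefficient according to the first ratio if the first ratio exceeds a first threshold value; and expand the audio scaling coefficient according to the first ratio if the first ratio is smaller than a second threshold value;

wherein the first threshold is greater than or equal to the second threshold.

In an embodiment, the update module is specifically configured to:

expand the audio scaling coefficient according to the first ratio and the second ratio if the first ratio does not exceed the first threshold and the second ratio exceeds a third threshold.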

According to the technical solution provided by the embodiments of the invention, obtaining the threshold crossing rate within a preset frequency range allows the effective frequency band and the irrelevant frequency bands of the audio signal to be distinguished and processed differently for different application scenarios. The audio signal scaling scheme is then selected in a targeted manner according to how much of the data in the effective frequency band exceeds the preset threshold value, which further improves audio signal scaling.

Drawings

The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:

FIG. 1 shows a flow chart of a method of audio signal scaling processing according to an embodiment of the invention;

FIG. 2 is a flow chart illustrating another method of scaling an audio signal according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an apparatus for scaling an audio signal according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of another apparatus for scaling an audio signal according to an embodiment of the present invention;

FIG. 5A is a diagram illustrating a theoretical spectrum of a current frame audio signal according to an embodiment of the present invention;

FIG. 5B is a diagram illustrating an actual spectrum of a current frame audio signal according to an embodiment of the present invention;

FIG. 5C is a schematic diagram showing a spectrum of an audio signal of a next frame subjected to a scaling process according to an embodiment of the present invention;

FIG. 6 shows a schematic diagram of an apparatus for scaling an audio signal according to an embodiment of the present invention.

In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

Detailed Description

The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Exemplary method

The embodiment of the invention provides a method for scaling an audio signal.

Fig. 1 is a schematic flow chart of a method of scaling an audio signal according to an embodiment of the present invention. As shown in fig. 1, the method includes, but is not limited to, S110 to S140, and specifically, the steps include:

S110: Acquire the current frame audio signal.

S120: Detect the threshold crossing rate of the energy amplitude of the current frame audio signal within a preset frequency range.

It can be understood by those skilled in the art that, before step S120, a fast Fourier transform (FFT) may be performed on the current frame audio signal according to the prior art, converting it from a time-domain signal to a frequency-domain signal so that subsequent steps can distinguish and process it in the frequency domain.

S130: Update the audio scaling processing coefficient according to the threshold crossing rate.

S140: Scale the next frame of the current frame audio signal according to the updated audio scaling processing coefficient.

In traditional digital audio signal processing, selecting the dynamic range and applying the audio signal downstream are two independent processes. The embodiment of the invention links the two and uses different dynamic range selection schemes for different application scenarios, so that the dynamic range can be selected in a more targeted way.

In the embodiment of the application, only the condition that the energy amplitude value in the preset frequency range exceeds the threshold value needs to be considered.

In an exemplary embodiment, in a speech recognition scenario, the frequency segments that tend to cause data underflow generally lie below 100 Hz, a band human listeners do not attend to. Only cases in which the audio energy amplitude within the human voice frequency range exceeds the threshold value need to be considered, so the embodiment of the present invention may set the preset frequency range to the human voice frequency range.

In another exemplary embodiment, in a hearing aid scenario, the patient can only perceive signals in certain frequency ranges, so only cases in which the audio energy amplitude within the perceivable frequency range exceeds the threshold value need to be considered, and the embodiment of the present invention may set the preset frequency range to the operating frequency range of the device.

In still another exemplary embodiment, in special scenarios such as marine animal detection, different marine animals produce calls at different frequencies, so the preset frequency range may be set specifically for each detection target.

The preset frequency range is not particularly limited here; any one or more frequency ranges may be selected as the preset frequency range according to the actual application scenario. The human voice frequency range, the marine animal call frequency range, and the hearing aid operating frequency range are used as examples in this application, without limitation.

It can be understood by those skilled in the art that, regarding step S140, the energy of consecutive audio frames is correlated, so the next frame can be scaled with the coefficient updated from the current frame, as described in step S140. This amounts to adjusting the scaling coefficient of the next frame according to the detection result of the current frame audio signal.
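The per-frame flow of S110 to S140, including scaling the current frame with the pre-update coefficient, can be sketched in Python as below. This is a minimal sketch under several assumptions that are not from the patent: the sound gain coefficient is the only scaling coefficient, a direct DFT stands in for the FFT, the threshold crossing rate counts bins, and the multiplicative step of 0.8 with 10% / 5% thresholds is hypothetical. The function names are illustrative only.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude spectrum (bins 0..N/2) via a direct DFT; stands in for the FFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def process_stream(frames, gain, threshold, lo_bin, hi_bin,
                   high=0.10, low=0.05, step=0.8):
    """Run S110 to S140 over a sequence of frames.

    Returns the gain that was applied to each frame, in order.
    """
    gains_used = []
    for frame in frames:                          # S110: acquire the current frame
        scaled = [gain * x for x in frame]        # scale with the pre-update coefficient
        band = dft_magnitudes(scaled)[lo_bin:hi_bin]
        rate = sum(m > threshold for m in band) / len(band)  # S120
        gains_used.append(gain)
        if rate > high:                           # S130: too many bins near overflow
            gain *= step
        elif rate < low:                          # S130: headroom to spare
            gain /= step
        # S140: the updated gain takes effect when the next frame arrives
    return gains_used
```

Because each frame's gain is decided from the previous frame's spectrum, a steady loud tone drives the gain down over a few frames and then settles within the dead band between the two thresholds.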

Embodiments of the present invention are further described below in conjunction with fig. 2.

In one embodiment, before the fast Fourier transform is performed on the current frame audio signal, the method may further include: scaling the current frame audio signal according to the audio scaling processing coefficient before updating.

Scaling the current frame with the pre-update coefficient avoids the case in which the unprocessed signal crosses the threshold so extensively (for example, across the entire band) that the threshold crossing rates of individual frequency bands can no longer be distinguished. A more accurate threshold crossing rate can therefore be calculated.

In an embodiment, the audio scaling processing coefficient before updating may be the coefficient obtained from the threshold crossing rate of the frame preceding the current frame audio signal, or it may be a preset audio scaling processing coefficient.

Step S130 of the embodiment of the present invention is further described below with reference to fig. 3 and 4.

In a specific embodiment, the audio scaling processing coefficient and the scaling of the audio according to the audio scaling processing coefficient may include:

(1) The audio scaling processing coefficient is a sound gain coefficient.

It will be understood by those skilled in the art that the sound gain coefficient is actually a parameter of the gain multiplier, and the technical solution of performing the audio scaling process by controlling the sound gain coefficient may include: the gain multiplier receives an original audio signal input by the microphone, multiplies the intensity value of the audio signal by a set sound gain coefficient, and outputs the audio signal after gain.
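The gain multiplier described above can be sketched as follows, assuming 16-bit signed PCM samples from the microphone. Saturating the product to the int16 range is an assumption added here to mirror the clipping behaviour the description attributes to overflowing data; the function and constant names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def apply_gain(samples, gain):
    """Gain multiplier: scale int16 samples by gain, saturating to the int16 range.

    Saturation is an illustrative assumption, not a stated part of the method.
    """
    out = []
    for s in samples:
        y = round(s * gain)  # nearest integer after multiplication
        out.append(max(INT16_MIN, min(INT16_MAX, y)))
    return out
```

With a gain of 1.5, a sample of 30000 would need 45000, which 16 bits cannot hold, so it is clipped to 32767; this is exactly the kind of overflow the updated gain coefficient is meant to prevent in the next frame.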

In one embodiment, controlling the sound gain coefficient for audio scaling amounts, in practice, to turning the audio volume up or down.

In one embodiment, a table mapping threshold crossing rates to gain coefficients may be prepared and stored in advance, so that the gain coefficient can be obtained directly by table lookup from the detected threshold crossing rate.
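A lookup table of this kind could be implemented with a sorted list of rate breakpoints and Python's standard `bisect` module, as sketched below. The breakpoints and gain values are hypothetical placeholders, not figures from the patent.

```python
import bisect

# Hypothetical table: upper edges of threshold-crossing-rate intervals ...
RATE_EDGES = [0.05, 0.10, 0.20, 0.40]
# ... and the sound gain coefficient for each interval (one more entry than edges).
GAINS = [1.25, 1.00, 0.80, 0.60, 0.40]


def gain_from_rate(rate):
    """Return the pre-stored gain coefficient for a detected threshold crossing rate."""
    return GAINS[bisect.bisect_right(RATE_EDGES, rate)]
```

A rate of 0.07 falls between 0.05 and 0.10 and maps to a gain of 1.00, while a rate of 0.5 lies past the last edge and maps to 0.40. The lookup costs a binary search rather than any arithmetic on the signal.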

As shown in FIG. 3, in an embodiment, scaling based on the sound gain coefficient specifically includes: scaling the next frame of the current frame audio signal according to the sound gain coefficient before the Fourier transform.

(2) The audio scaling processing coefficient is a fixed-point FFT right shift coefficient.

In a specific embodiment, the fixed-point FFT right-shift coefficient is the number of bits by which the current frame audio signal data are right-shifted in the multiplication step of each butterfly operation during the fixed-point FFT.

Those skilled in the art will appreciate that a right shift moves the bits of a binary operand to the right by the specified number of positions; the bits shifted out are discarded, and the vacated high-order positions are filled either with 0 (logical shift) or with the sign bit (arithmetic shift), depending on the machine. Shifting right by k bits divides the value by 2^k, so the audio signal can be scaled with a shift instead of a multiplication or division, which significantly reduces computation.
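The difference between sign-filling and zero-filling shifts, and a common rounding variant, can be illustrated in Python. Python's `>>` on integers is always arithmetic (it floors toward minus infinity), so the logical shift is emulated here by first masking the value to a 16-bit word; the 16-bit width and the function names are illustrative assumptions.

```python
def shift_floor(value, shift):
    """Arithmetic right shift: divide by 2**shift, rounding toward minus infinity."""
    return value >> shift


def shift_round(value, shift):
    """Right shift with round-half-up, by adding half an LSB before shifting."""
    if shift == 0:
        return value
    return (value + (1 << (shift - 1))) >> shift


def logical_shift16(value, shift):
    """Logical (zero-filling) right shift of a value stored as a 16-bit word."""
    return (value & 0xFFFF) >> shift
```

For example, `shift_floor(-1001, 3)` gives -126 (floor of -125.125), whereas `logical_shift16(-8, 1)` gives 32764, because the sign bit is replaced by 0 and the result is no longer a valid scaled version of -8. This is why signed audio samples are normally scaled with sign-filling shifts.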

As shown in FIG. 4, in an embodiment, scaling based on the fixed-point FFT right-shift coefficient specifically includes: scaling the next frame of the current frame audio signal according to the fixed-point FFT right-shift coefficient while the Fourier transform is performed on it.
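One way such a right-shift coefficient could act inside a fixed-point FFT is sketched below: an integer radix-2 decimation-in-time FFT with Q15 twiddle factors, in which the first `shift` stages halve both butterfly inputs before combining them. This interpretation of the coefficient (one bit per stage, applied at the butterfly inputs) is an assumption; the patent does not specify the FFT structure. With `shift` equal to log2(N), the output equals the DFT divided by N, so values cannot grow beyond the input range.

```python
import math

Q = 15  # twiddle factors stored as integers scaled by 2**15 (Q15); assumption


def bit_reverse_permute(a):
    """Reorder list a in place into bit-reversed index order."""
    n = len(a)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]


def fixed_point_fft(x, shift):
    """Integer radix-2 DIT FFT of a real integer sequence.

    shift: hypothetical right-shift coefficient, the number of leading stages
    in which both butterfly inputs are shifted right by one bit (halved).
    Returns (re, im) as lists of ints.
    """
    n = len(x)
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("length must be a power of two and at least 2")
    re = [int(v) for v in x]
    im = [0] * n
    bit_reverse_permute(re)  # im is all zeros, so it needs no reordering
    size = 2
    for stage in range(stages):
        half = size // 2
        s = 1 if stage < shift else 0
        for k in range(half):
            angle = -2.0 * math.pi * k / size
            wr = round(math.cos(angle) * (1 << Q))
            wi = round(math.sin(angle) * (1 << Q))
            for start in range(0, n, size):
                i, j = start + k, start + k + half
                # Scaling step: right-shift both butterfly inputs by s bits.
                ar, ai = re[i] >> s, im[i] >> s
                br, bi = re[j] >> s, im[j] >> s
                # Complex product b * w, brought back from Q15 by a 15-bit shift.
                tr = (br * wr - bi * wi) >> Q
                ti = (br * wi + bi * wr) >> Q
                re[i], im[i] = ar + tr, ai + ti
                re[j], im[j] = ar - tr, ai - ti
        size *= 2
    return re, im
```

Python integers never overflow, so the sketch shows the arithmetic rather than a real 16-bit overflow; on an int16 datapath, the twiddle value 32768 would be stored as 32767. An 8-point constant input of 1000 yields 8000 in bin 0 with `shift=0` but 1000 with `shift=3`, which is the headroom the coefficient buys.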

In one embodiment, the above-mentioned technical solution of scaling audio according to the sound gain coefficient and the fixed-point FFT right-shift coefficient can be used alone or in combination. Where multiple coefficients are used in combination, further processing may be performed, for example, assigning different weights to the sound gain coefficient and the fixed-point FFT right-shift coefficient.

The type of audio scaling processing coefficient is not particularly limited here; it may be any coefficient that acts on the signal during the FFT of the audio signal or elsewhere in the audio processing flow. The sound gain coefficient and/or the fixed-point FFT right-shift coefficient are taken as examples in the present application, without limitation.

Using the sound gain coefficient and/or the fixed-point FFT right-shift coefficient as the audio scaling processing coefficient reuses parameters already present in the audio processing system. On the one hand, this adds no extra computation; on the other hand, it avoids errors that a separate, second detection pass over the audio could introduce.

Embodiments of the present invention are further described in detail below with reference to fig. 5A, 5B, and 5C.

In one embodiment, as shown in FIG. 5A, the threshold crossing rate can be further divided into a first ratio and a second ratio. The first ratio is the ratio of the portion of the preset frequency range in which the data exceed the upper threshold to the whole preset frequency range; the second ratio is the ratio of the portion of the preset frequency range in which the data fall below the lower threshold to the whole preset frequency range.

Further, the upper threshold is set according to the data overflow critical value, and the lower threshold is set according to the data underflow critical value.

For example, if the overflow critical value of the system is A, the upper threshold may be set to 0.9A.

When setting a threshold according to a data critical value, the embodiment of the present invention may place the threshold anywhere within the critical value according to the actual situation; for example, the threshold may equal the overflow critical value. The 90% figure above is only an example and is not limiting.

As will be understood by those skilled in the art, overflow occurs when an energy value, converted to binary, exceeds the largest value that the bit width of the device can represent. Underflow occurs when an energy value is smaller than the smallest value the bit width can represent. For example, if an energy value of 0.4 must be stored as an integer, it is represented as 0, which also causes distortion.

In summary, the first ratio and the second ratio are two different types of parameter and may even pull in opposite directions. By further dividing the threshold crossing rate into the first ratio and the second ratio, the embodiment of the invention can take both into account and process the audio signal accordingly.

The embodiment of the invention sets thresholds according to the data critical values and controls the audio scaling processing coefficient according to those thresholds. Compared with directly counting the frequency ranges that reach the data overflow critical value, this makes the data easier to collect and observe. Furthermore, the two thresholds are set slightly inside the critical values, leaving a margin between each threshold and its corresponding critical value. Because the audio energy amplitude is kept within the thresholds as far as possible, even if the next frame scaled with the updated coefficient exceeds a threshold, it still stays within the data overflow critical value, which further reduces the data overflow rate.

In an exemplary embodiment, FIG. 5A shows a theoretical spectrogram of the current frame audio, in which the upper dotted line Y2 indicates the data overflow critical value, the lower dotted line Y1 indicates the data underflow critical value, and the preset frequency range is m to n. Within the preset frequency range, the data exceed the overflow critical value.

It can be understood by those skilled in the art that FIG. 5A is only a theoretical spectrogram; in reality, overflowing data cannot be represented. As shown in FIG. 5B, the part of the current frame audio signal exceeding the overflow critical value is actually output as Y2, and all band energy below Y1 is output as 0, resulting in distortion.

Therefore, in this case, the embodiment of the present invention may set an upper threshold Y3 and a lower threshold Y4. The first ratio is calculated by counting all frequency bands within the preset frequency range m to n that exceed the upper threshold Y3, and the second ratio by detecting all frequency bands that fall below the lower threshold Y4.

In an exemplary embodiment of the invention, the threshold crossing rate is calculated using the following formula:

s = (b - a) / (n - m),

where m to n is the preset frequency range and a to b is the segment within it in which the energy amplitude exceeds the upper threshold Y3.

As can be seen from FIG. 5, the frequency bands below the lower threshold in this embodiment lie outside the preset frequency range and are therefore not considered.

In one embodiment, the preset frequency range may be a range or a set of several frequency ranges.
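A discrete version of the formula above, extended to a preset range made of several sub-ranges, might count FFT bins as follows. When the range is a single contiguous interval and the over-threshold bins form one segment a to b, the first ratio reduces to (b - a) / (n - m). The upper threshold Y3 is taken as 0.9 of the overflow critical value, following the 90% example; the 1.1 factor for the lower threshold Y4 and all names are hypothetical.

```python
def crossing_ratios(magnitudes, ranges, overflow_level, underflow_level,
                    upper_frac=0.9, lower_frac=1.1):
    """Return (first_ratio, second_ratio) over a set of bin ranges.

    ranges: list of (lo_bin, hi_bin) half-open intervals, assumed non-overlapping.
    Upper threshold Y3 = upper_frac * overflow_level.
    Lower threshold Y4 = lower_frac * underflow_level (hypothetical margin).
    """
    upper = upper_frac * overflow_level
    lower = lower_frac * underflow_level
    total = over = under = 0
    for lo, hi in ranges:
        for m in magnitudes[lo:hi]:
            total += 1
            if m > upper:       # near overflow: counts toward the first ratio
                over += 1
            elif m < lower:     # near underflow: counts toward the second ratio
                under += 1
    if total == 0:
        raise ValueError("preset frequency range contains no bins")
    return over / total, under / total
```

Bins outside the listed ranges are never examined, so low-frequency underflow outside the preset range cannot inflate the second ratio.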

Further, as shown in FIG. 5C, with the above technical solution, the next frame of the current frame audio signal is scaled according to the calculated threshold crossing rate s, so that most of the band energy falls below Y2.

As will be understood by those skilled in the art, the invention updates the audio scaling coefficient according to the threshold crossing rate of the current frame audio signal within the preset frequency range and scales the next frame with the updated coefficient. Because the audio signal is coherent and the energy within a frequency range rarely changes abruptly between frames, a spectrogram similar to that of the current frame is used as the example spectrogram of the next frame in the embodiment of the present invention.

Specifically, as shown in FIG. 5B, when the threshold crossing rate is calculated, only the segment a to b of the current frame audio signal exceeds the upper threshold within the preset frequency range, and signals below the lower threshold outside the preset frequency range are not considered. The scaling adopted in the embodiment of the present invention is therefore more targeted than one based on a full-band threshold crossing rate.

In a specific embodiment, updating the audio scaling coefficient according to the threshold crossing rate specifically includes:

if the first ratio of the data exceeds a first threshold value, carrying out reduction processing on the audio scaling coefficient according to the first ratio; and

if the first ratio of the data is smaller than a second threshold value, carrying out expansion processing on the audio scaling coefficient according to the first ratio;

wherein the first threshold is greater than or equal to the second threshold.

In one exemplary embodiment, the audio scaling processing coefficient may be updated based only on the first ratio of the audio signal. Specifically, if the first ratio exceeds 10% (the first threshold), the audio scaling coefficient is reduced according to the first ratio; if the first ratio of the current frame audio signal is below 5% (the second threshold), the audio scaling coefficient is expanded according to the first ratio; and if the first ratio is between 5% (the second threshold) and 10% (the first threshold), the coefficient is not updated and the next frame is processed with the current audio scaling processing coefficient. In this way the first ratio of the audio data can be kept between 5% and 10% as far as possible.
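The 10% / 5% rule above is a dead-band controller. A minimal sketch follows; the patent only says the coefficient is reduced or expanded "according to the first ratio", so the specific formulas (divide by 1 + ratio to reduce, multiply by 1 + (second threshold - ratio) to expand) are hypothetical choices that grow stronger as the ratio moves further from the band.

```python
def update_coefficient(coeff, first_ratio, first_threshold=0.10, second_threshold=0.05):
    """Dead-band update of the audio scaling coefficient from the first ratio only."""
    if first_threshold < second_threshold:
        raise ValueError("first_threshold must be >= second_threshold")
    if first_ratio > first_threshold:
        # Too many bins near overflow: shrink, more strongly for larger ratios.
        return coeff / (1.0 + first_ratio)
    if first_ratio < second_threshold:
        # Comfortably below the band: expand, more strongly for smaller ratios.
        return coeff * (1.0 + (second_threshold - first_ratio))
    # Between the two thresholds: keep the current coefficient.
    return coeff
```

The gap between the two thresholds keeps the coefficient from oscillating frame to frame; setting them equal, as the next embodiment allows, removes the dead band.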

In an exemplary embodiment, the first threshold may be equal to the second threshold.

In the embodiment of the present invention, data below the lower threshold correspond to very faint sounds, which generally fall in an energy range human listeners do not attend to. When the application scenario does not require faint sounds to be detected precisely, scaling according to the first ratio of the audio data alone requires less computation and simpler logic.

In one embodiment, updating the audio scaling coefficient according to the threshold crossing rate may include: if the first ratio does not exceed the first threshold and the second ratio exceeds a third threshold, expanding the audio scaling coefficient according to the first ratio and the second ratio.

In one embodiment, the audio scaling processing coefficient is updated based on both the first ratio and the second ratio, with the first ratio taking priority over the second. Specifically, if the first ratio does not exceed 10% (the first threshold), the second ratio is then examined, and if the second ratio exceeds 10% (the third threshold), the audio scaling processing coefficient is expanded according to the first ratio and the second ratio.
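The priority ordering described above can be sketched as a small decision function. The expansion factor 1 + second_ratio * (1 - first_ratio) is a hypothetical choice that grows with the underflow share and is tempered by the overflow share, and it is always at least 1 because both ratios lie between 0 and 1. Falling back to the first-ratio-only rule when neither condition fires is also an assumption, not something the patent states.

```python
def update_coefficient_two_ratios(coeff, first_ratio, second_ratio,
                                  first_threshold=0.10, second_threshold=0.05,
                                  third_threshold=0.10):
    """Update the scaling coefficient, giving the first ratio priority over the second."""
    if first_ratio > first_threshold:
        # Overflow risk dominates: reduce regardless of the second ratio.
        return coeff / (1.0 + first_ratio)
    if second_ratio > third_threshold:
        # Overflow risk is low and many bins sit near underflow: expand.
        return coeff * (1.0 + second_ratio * (1.0 - first_ratio))
    if first_ratio < second_threshold:
        # Hypothetical fallback to the first-ratio-only expansion rule.
        return coeff * (1.0 + (second_threshold - first_ratio))
    return coeff
```

For example, a first ratio of 0.04 and a second ratio of 0.3 would expand the coefficient by a factor of 1.288.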

Here, the manner of updating the audio scaling coefficient according to the threshold crossing rate is not particularly limited, and the audio scaling coefficient may be calculated by performing priority ranking on various types of threshold crossing rates and then controlling the data scaling coefficient through a decision tree model, or by performing weighted combination on the first ratio and the second ratio, or by establishing a mathematical model based on each ratio factor. The present application is only exemplified by the above technical solutions, but not limited thereto.

In the embodiment of the present invention, by adopting the above technical scheme of comprehensively considering the first ratio and the second ratio and calculating a scaling processing coefficient, a more accurate scaling processing technical effect can be achieved.

In summary, according to the technical solution provided by the embodiments of the present invention, obtaining the threshold crossing rate within the preset frequency range allows the effective and irrelevant frequency bands of the audio signal to be distinguished and processed differently for different application scenarios, and the scaling scheme is selected in a targeted manner according to how the data in the effective frequency band exceed the preset threshold, further improving audio signal scaling.

Exemplary device

The embodiment of the invention provides a scaling processing device of an audio signal.

FIG. 6 is a schematic diagram of an apparatus 600 for scaling an audio signal according to an embodiment of the present invention. As shown in FIG. 6, the apparatus 600 includes, but is not limited to:

an obtaining module 610, configured to obtain an audio signal of a current frame;

a detecting module 620, configured to detect a threshold crossing rate of the energy amplitude of the current frame audio signal within a preset frequency range;

Those skilled in the art can understand that, before the detection module performs detection, a fast Fourier transform (FFT) may be performed on the current frame audio signal according to the prior art to convert it from a time-domain signal to a frequency-domain signal, so that subsequent processing can treat different frequency ranges differently.

An updating module 630, configured to update the audio scaling processing coefficient according to the threshold crossing rate;

and a scaling module 640, configured to perform scaling processing on a next frame of the current frame audio signal according to the updated audio scaling processing coefficient.

In this embodiment, only the portions of the signal within the preset frequency range whose energy amplitude exceeds the threshold need to be considered.

Those skilled in the art will understand that, for continuous audio frames, the energy of the current frame and that of the next frame are correlated. The scaling module 640 can therefore scale the next frame of the current frame audio signal with the updated audio scaling coefficient. In effect, the scaling coefficient of the next frame is adjusted according to the detection result of the current frame.

In one embodiment, the scaling module 640 may be further configured to scale the current frame audio signal with the audio scaling processing coefficient as it was before updating, prior to performing the fast Fourier transform on the current frame audio signal.
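The module chain and the one-frame delay described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the class name `ScalingApparatus` and the `detect`/`update` callbacks are hypothetical, the FFT and the threshold crossing rate detection are folded into `detect`, and frames are plain lists of floats.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ScalingApparatus:
    """Hypothetical model of apparatus 600 (names are illustrative only)."""
    detect: Callable[[Sequence[float]], float]  # detecting module 620: FFT + crossing rate
    update: Callable[[float, float], float]     # updating module 630: (coeff, rate) -> coeff
    coeff: float = 1.0                          # audio scaling processing coefficient

    def process(self, frame: Sequence[float]) -> List[float]:
        """Handle one frame delivered by the obtaining module 610."""
        # Scaling module 640: apply the coefficient as it was *before* this update
        scaled = [x * self.coeff for x in frame]
        # Detect the threshold crossing rate on the scaled current frame
        rate = self.detect(scaled)
        # The updated coefficient only takes effect on the next frame
        self.coeff = self.update(self.coeff, rate)
        return scaled
```

For instance, with a toy `detect` that returns the share of samples whose magnitude exceeds 0.5 and an `update` that halves the coefficient when that share exceeds 10%, the frame `[0.8, 0.1, 0.9, 0.2]` passes through unchanged the first time (coefficient 1.0) and is halved the second time.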

The updating module 630 of an embodiment of the present invention is further described below with reference to figs. 3 and 4.

Those skilled in the art will understand that the sound gain coefficient is a parameter of the gain multiplier, and performing the audio scaling processing by controlling the sound gain coefficient may include: modifying the sound gain coefficient in the gain multiplier according to the threshold crossing rate of the current frame audio signal; the gain multiplier then receives the next frame of audio signal input by the microphone, multiplies the intensity value of that frame by the set sound gain coefficient, and outputs the gain-adjusted next frame.
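A gain multiplier of this kind might look like the sketch below. It assumes 16-bit PCM samples and a floating-point gain, and it saturates results to the int16 range instead of letting them wrap; the function name `apply_gain` and the saturation policy are illustrative assumptions, not details given in the source.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def apply_gain(frame: Sequence[int], gain: float) -> List[int]:
    """Multiply each 16-bit sample of the next frame by the sound gain
    coefficient, clamping to the int16 range (assumed saturation policy)."""
    out = []
    for sample in frame:
        y = int(round(sample * gain))
        # Saturate rather than wrap, so overflow does not flip the sign
        out.append(max(INT16_MIN, min(INT16_MAX, y)))
    return out
```

With a gain of 2.0, samples of ±20000 would overflow int16; the clamp returns 32767 and -32768 instead of wrapped values, which is exactly the distortion the scaling coefficient is meant to avoid in the first place.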

In a specific embodiment, the fixed-point FFT right-shift coefficient is the number of bits by which the next frame of the current frame audio signal is right-shifted in the product operations of the butterfly operations during the fixed-point FFT.

As shown in fig. 4, in an embodiment, scaling based on the fixed-point FFT right-shift coefficient specifically includes: modifying the fixed-point FFT right-shift coefficient in the FFT module according to the threshold crossing rate of the current frame audio signal; the FFT module then receives the next frame of audio signal input by the microphone, processes it according to the updated fixed-point FFT right-shift coefficient, and outputs the processed next frame.
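To show where a right-shift coefficient can act inside a fixed-point FFT, here is a minimal integer radix-2 decimation-in-time FFT. It is an illustrative sketch, not the FFT module of the embodiment: the twiddle factors are assumed to be integers scaled by 2^15, and the shift `shift` is assumed to be applied to every butterfly output (the common block-scaling arrangement), so the whole transform is scaled by 2^(-shift·log2 N).

```python
import math
from typing import List, Sequence, Tuple

Q = 15  # twiddle factors are scaled by 2**15 (assumed Q15-style format)


def fixed_point_fft(re: Sequence[int], im: Sequence[int],
                    shift: int) -> Tuple[List[int], List[int]]:
    """Integer radix-2 DIT FFT; every butterfly output is arithmetically
    right-shifted by `shift` bits (stand-in for the right-shift coefficient)."""
    n = len(re)
    if n == 0 or n & (n - 1) or len(im) != n:
        raise ValueError("inputs must have equal power-of-two length")
    xr, xi = list(re), list(im)

    # Bit-reversal permutation of the input order
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            xr[i], xr[j] = xr[j], xr[i]
            xi[i], xi[j] = xi[j], xi[i]

    size = 2
    while size <= n:
        half = size // 2
        for k in range(half):
            # Twiddle factor W = exp(-2*pi*i*k/size), quantised to integers
            angle = -2.0 * math.pi * k / size
            wr = round(math.cos(angle) * (1 << Q))
            wi = round(math.sin(angle) * (1 << Q))
            for start in range(0, n, size):
                a, b = start + k, start + k + half
                # Product W * x[b], rescaled back by Q bits
                tr = (xr[b] * wr - xi[b] * wi) >> Q
                ti = (xr[b] * wi + xi[b] * wr) >> Q
                ar, ai = xr[a], xi[a]
                # Butterfly outputs, right-shifted (Python >> is arithmetic)
                xr[a], xi[a] = (ar + tr) >> shift, (ai + ti) >> shift
                xr[b], xi[b] = (ar - tr) >> shift, (ai - ti) >> shift
        size *= 2
    return xr, xi
```

With `shift = 1`, an 8-point transform of a constant input of 1000 puts 1000 in bin 0 instead of 8000, i.e. the three stages divide the output by 2^3 and leave that much headroom; raising or lowering `shift` for the next frame is the fixed-point counterpart of changing the gain.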

In one embodiment, as shown in fig. 5A, the threshold crossing rate in the embodiment of the present invention can be further divided into a first ratio and a second ratio. The first ratio is the ratio of the part of the preset frequency range in which data overflows to the whole preset frequency range; the second ratio is the ratio of the part of the preset frequency range in which data underflows to the whole preset frequency range.

The first ratio and the second ratio are two different types of parameter, and they even work against each other: reducing the scaling coefficient lowers overflow but raises underflow. By further dividing the threshold crossing rate into the first ratio and the second ratio, the embodiment of the invention can take both into account and thus process audio signals in a differentiated way.

Therefore, in this case, the overflow bandwidth can be calculated by detecting all bands within the preset band (m-n) whose value reaches the critical value Y2, and the underflow bandwidth can be calculated by detecting all bands whose value is 0.

In the embodiment of the invention, an upper threshold Y3 and a lower threshold Y4 can also be set: the first ratio is calculated by counting all frequency bands within the preset band (m-n) that exceed the upper threshold Y3, and the second ratio by counting all frequency bands that fall below the lower threshold Y4.
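Both counting variants can be sketched with one helper over a magnitude spectrum. This is a minimal sketch assuming uniformly spaced FFT bins, so that bin counts stand in for bandwidths; the function name `crossing_ratios` is hypothetical. It uses inclusive comparisons (>= and <=) so that `upper=Y2, lower=0` reproduces the saturated-or-zero counting, while `upper=Y3, lower=Y4` gives the threshold variant; how the boundary values themselves are treated is an assumption.

```python
from typing import Sequence, Tuple


def crossing_ratios(mag: Sequence[float], m: int, n: int,
                    upper: float, lower: float) -> Tuple[float, float]:
    """Return (first_ratio, second_ratio) over bins m..n-1 of a magnitude spectrum.

    first_ratio:  share of bins at or above `upper` (overflow side)
    second_ratio: share of bins at or below `lower` (underflow side)
    """
    if not 0 <= m < n <= len(mag):
        raise ValueError("preset range must satisfy 0 <= m < n <= len(mag)")
    band = mag[m:n]  # only the preset frequency range is examined
    over = sum(1 for v in band if v >= upper)
    under = sum(1 for v in band if v <= lower)
    width = n - m
    return over / width, under / width
```

For the spectrum `[0, 5, 100, 120, 3, 0, 50, 200]` with `upper=100, lower=0`, the full range gives (0.375, 0.25), while restricting to bins 2 to 5 gives (0.5, 0.25): bins outside the preset range no longer affect the result.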

$s = \frac{b - a}{n - m}$,

In one embodiment, the preset frequency range may be a single range or a set of several frequency ranges. Further, as shown in fig. 5C, with the above technical solution, the next frame of the current frame audio signal is scaled according to the calculated threshold crossing rate s (in this embodiment, specifically the first ratio), so that the energy of most bands falls below Y2.

Specifically, as shown in fig. 5B, when the threshold crossing rate is calculated, only the segment from a to b of the current frame audio signal exceeds the upper threshold within the preset frequency range, and signal data outside the preset frequency range is not considered, even where it falls below the lower threshold. The scaling processing device of the embodiment of the present invention is therefore more targeted than one based on a full-band threshold crossing rate.
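As a worked illustration with hypothetical numbers (not taken from the figures), suppose the preset frequency range runs from m = 300 Hz to n = 3400 Hz and, in the current frame, only the segment from a = 1000 Hz to b = 1465 Hz exceeds the upper threshold. Then:

```latex
s = \frac{b - a}{n - m} = \frac{1465 - 1000}{3400 - 300} = \frac{465}{3100} = 0.15
```

With the 10% first threshold used in the exemplary embodiment, s = 15% would call for reducing the scaling coefficient for the next frame, regardless of how much energy lies outside the 300 Hz to 3400 Hz range.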

In a specific embodiment, the processing module 650 is further configured to:

wherein the first threshold is greater than or equal to the second threshold.

In one exemplary embodiment, the audio scaling processing coefficient may be updated based only on the first ratio of the audio signal. Specifically, if the first ratio exceeds 10% (the first threshold), the audio scaling coefficient is reduced according to the first ratio; if the first ratio of the current frame audio signal does not exceed 5% (the second threshold), the audio scaling coefficient is expanded according to the first ratio; if the first ratio is between 5% (the second threshold) and 10% (the first threshold), the audio scaling processing coefficient is not updated and the next frame is processed with the current coefficient. In this way, the first ratio of the audio data is kept between 5% and 10% as far as possible.
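The first-ratio-only rule amounts to a dead band between the two thresholds. The sketch below uses the 10% and 5% values from the text; the step sizes (shrinking by the overflow share, growing by the distance below 5%) are hypothetical choices, since the source only says the coefficient is reduced or expanded "according to the first ratio".

```python
def update_coefficient(coeff: float, first_ratio: float,
                       first_threshold: float = 0.10,
                       second_threshold: float = 0.05) -> float:
    """Update the audio scaling coefficient from the first (overflow) ratio only.

    The proportional step sizes are illustrative assumptions.
    """
    if first_threshold < second_threshold:
        raise ValueError("first threshold must be >= second threshold")
    if first_ratio > first_threshold:
        # Too much of the band overflows: shrink the coefficient
        return coeff * (1.0 - first_ratio)
    if first_ratio <= second_threshold:
        # Headroom is being wasted: grow the coefficient
        return coeff * (1.0 + (second_threshold - first_ratio))
    # Between the thresholds: keep the coefficient for the next frame
    return coeff
```

A ratio of 7% leaves the coefficient unchanged, 20% shrinks a coefficient of 1.0 to 0.8, and 0% grows it to 1.05. The dead band keeps the coefficient from oscillating between frames whose overflow share hovers around a single threshold.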

In the embodiment of the present invention, underflow data actually corresponds to very faint audio components, which generally lie in an energy range that human hearing is not concerned with. Therefore, when the application scenario does not require fine detection of faint sounds, scaling only according to the first ratio of the audio data reduces the amount of computation and simplifies the computation logic.

In one embodiment, the processing module 650 may be further configured to: if the first ratio does not exceed the first threshold and the second ratio exceeds a third threshold, expand the audio scaling coefficient according to the first ratio and the second ratio.
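A sketch of this two-ratio variant follows. The 10% first threshold comes from the exemplary embodiment; the 30% third threshold and the expansion formula (more underflow gives a larger step, overflow close to its limit gives a smaller one) are hypothetical choices made only to show how both ratios can enter the update. Keeping the reduction branch from the first-ratio-only rule when overflow is excessive is likewise an assumption.

```python
def update_with_underflow(coeff: float, first_ratio: float, second_ratio: float,
                          first_threshold: float = 0.10,
                          third_threshold: float = 0.30) -> float:
    """Update the coefficient using both the overflow and underflow ratios.

    third_threshold and both step formulas are illustrative assumptions.
    """
    if first_threshold <= 0:
        raise ValueError("first_threshold must be positive")
    if first_ratio > first_threshold:
        # Overflow dominates: shrink (assumed to follow the first-ratio rule)
        return coeff * (1.0 - first_ratio)
    if second_ratio > third_threshold:
        # Overflow under control but much of the band underflows: expand.
        # headroom is 1 with no overflow and 0 at the first threshold.
        headroom = 1.0 - first_ratio / first_threshold
        return coeff * (1.0 + second_ratio * headroom)
    return coeff
```

For example, an overflow share of 5% and an underflow share of 50% expand a coefficient of 1.0 to 1.25, while a 10% underflow share leaves it unchanged.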

The manner of updating the audio scaling processing coefficient according to the threshold crossing rate is not specifically limited here. For example, the various types of threshold crossing rate may be ranked by priority and the coefficient updated through a decision tree model; the threshold crossing rates may be combined with weights; or a mathematical model may be built on the threshold crossing rate factors and the coefficient calculated from it. The technical solutions above are examples only and do not limit the present application.

In summary, in the technical solution provided by the embodiment of the present invention, the threshold crossing rate (the share of overflowing data) is acquired within a preset frequency range, so that the effective frequency band and the irrelevant frequency bands of the audio signal can be distinguished and processed differently for different application scenarios. The audio signal scaling scheme is then selected in a targeted manner according to whether data overflow in the effective frequency band exceeds the preset threshold, further improving audio signal scaling.

Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.

While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. The division into aspects does not mean that features in these aspects cannot be combined to advantage; the division is for convenience of presentation only. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

1. A method for scaling an audio signal, the method comprising:

acquiring a current frame audio signal;

detecting the threshold crossing rate of the energy amplitude of the current frame audio signal within a preset frequency range;

2. The method of claim 1, further comprising:

setting a threshold value according to a data overflow critical value;

3. The method of claim 1, wherein, prior to detecting the threshold crossing rate of the energy amplitude of the current frame audio signal within the preset frequency range, the method further comprises:

4. The method of claim 1, wherein the audio scaling processing coefficient comprises a sound gain coefficient and/or a fixed-point fast Fourier transform (FFT) right-shift coefficient.

5. The method according to claim 4, wherein the scaling process specifically comprises:

6. The method of claim 4, wherein the fixed-point FFT right-shift coefficient is the number of bits by which the next frame audio signal of the current frame audio signal is right-shifted when entering a butterfly operation in the fast Fourier transform process.

7. The method of claim 2, further comprising:

8. The method of claim 7, wherein the threshold crossing rate further comprises a first ratio and/or a second ratio, wherein:

9. The method of claim 8, wherein the updating the audio scaling factor according to the threshold crossing rate specifically comprises:

wherein the first threshold is greater than or equal to the second threshold.

10. The method according to claim 8, wherein the updating the preset scaling factor according to the threshold crossing rate specifically comprises:

11. An apparatus for scaling an audio signal, the apparatus comprising:

12. The apparatus of claim 11, further comprising a threshold module configured to:

setting a threshold value according to a data overflow critical value;

13. The apparatus of claim 11, wherein the scaling processing module is further configured to: before the threshold crossing rate of the energy amplitude of the current frame audio signal within a preset frequency range is detected, scale the current frame audio signal according to the audio scaling processing coefficient before updating.

14. The apparatus of claim 11, wherein the update module is configured to:

15. The apparatus of claim 14, wherein the scaling module is specifically configured to:

16. The apparatus according to claim 14, wherein the fixed-point FFT right-shift coefficient is the number of bits by which the next frame audio signal of the current frame audio signal is right-shifted when entering a butterfly operation in the fast Fourier transform process.

17. The apparatus of claim 12, wherein the threshold module is further configured to:

18. The apparatus of claim 17, wherein the threshold crossing rate further comprises a first ratio and/or a second ratio, wherein:

19. The apparatus of claim 18, wherein the update module is specifically configured to:

wherein the first threshold is greater than or equal to the second threshold.

20. The apparatus of claim 18, wherein the update module is specifically configured to:

if the first ratio does not exceed the first threshold and the second ratio exceeds a third threshold, performing expansion processing on the audio scaling coefficient according to the first ratio and the second ratio.