CN111405419B - Audio signal processing method, device and readable storage medium


Info

Publication number
CN111405419B
CN111405419B (application number CN202010221446.2A)
Authority
CN
China
Prior art keywords
frequency
audio signal
frequency component
frequency components
masking
Prior art date
Legal status
Active
Application number
CN202010221446.2A
Other languages
Chinese (zh)
Other versions
CN111405419A (en)
Inventor
邢文峰
Current Assignee
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hisense Visual Technology Co Ltd filed Critical Hisense Visual Technology Co Ltd
Priority to CN202010221446.2A priority Critical patent/CN111405419B/en
Publication of CN111405419A publication Critical patent/CN111405419A/en
Application granted granted Critical
Publication of CN111405419B publication Critical patent/CN111405419B/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones

Abstract

The embodiment of the application provides an audio signal processing method, an audio signal processing device and a readable storage medium, wherein the method comprises the following steps: acquiring an audio signal, and performing spectrum analysis on the audio signal to acquire a first spectrum corresponding to the audio signal; determining target frequency components according to the amplitudes of the frequency components in the first frequency spectrum and a masking curve of a target object, wherein the target frequency components are the frequency components of which the amplitudes are smaller than the masking curve in the frequency components; and adjusting the amplitude of the target frequency component according to the masking signal ratio of the target frequency component to acquire a processed audio signal. By suppressing the amplitude of frequency components whose intensity is below the masking curve and whose masking signal ratio is high, the scheme reduces the masking effect that these originally inaudible frequency components exert on other effective frequency components, thereby improving sound clarity and giving the user a better auditory effect.

Description

Audio signal processing method, device and readable storage medium
Technical Field
Embodiments of the present disclosure relate to audio processing technologies, and in particular, to an audio signal processing method and apparatus, and a readable storage medium.
Background
At present, more and more devices provide an audio playing function, such as TVs, smartphones, earphones, and MP3 and MP4 players. For a device with an audio playing function, processing the audio signal to obtain a better hearing effect is a vital task. In order to give different types of objects a better auditory effect, the concept of an 'exclusive human voice effect' has been proposed, which refers to processing audio signals according to the auditory characteristics of different types of objects, so as to obtain a sound effect tailored for each type of object.
In the prior art, a hearing threshold of a target object is generally obtained, and then, according to a masking curve between different frequency bands of an audio signal and the hearing threshold of the target object, a frequency component with an audio signal intensity smaller than the hearing threshold of the target object is compensated, so that the target object can hear an original sound which cannot be heard.
However, although compensating the frequency components whose audio signal intensity is less than the hearing threshold of the target object in the above manner makes those frequency components audible, the auditory masking effect between signals of different frequency components means that the masking effect of the compensated frequency components on other frequency components may be enhanced, so that the signals of those other frequency components can no longer be heard, resulting in a poor auditory effect.
Disclosure of Invention
The embodiment of the application provides an audio signal processing method, an audio signal processing device and a readable storage medium, so as to improve the auditory effect of exclusive human voice sound effect.
In a first aspect, an embodiment of the present application provides an audio signal processing method, including:
acquiring an audio signal, and performing spectrum analysis on the audio signal to acquire a first spectrum corresponding to the audio signal;
determining target frequency components according to the amplitudes of the frequency components in the first frequency spectrum and a masking curve of a target object, wherein the target frequency components are the frequency components of which the amplitudes are smaller than the masking curve in the frequency components;
and adjusting the amplitude of the target frequency component according to the masking signal ratio of the target frequency component to acquire a processed audio signal, wherein the masking signal ratio is used for indicating the strength of the masking effect of the target frequency component on other frequency components, the other frequency components being frequency components whose frequencies are different from the frequency of the target frequency component.
Optionally, the adjusting the amplitude of the target frequency component according to the masking signal ratio of the target frequency component to obtain the processed audio signal includes:
if the masking signal ratio of the target frequency component is greater than or equal to a first preset threshold, obtaining a gain coefficient of the target frequency component according to the masking signal ratio of the target frequency component and the first preset threshold, and adjusting the amplitude of the target frequency component according to the gain coefficient.
Optionally, if the masking signal ratio of the target frequency component is greater than or equal to a first preset threshold, obtaining a gain coefficient of the target frequency component according to the masking signal ratio of the target frequency component and the first preset threshold, and adjusting the amplitude of the target frequency component according to the gain coefficient, includes:
if the masking signal ratio of the target frequency component is greater than or equal to a first preset threshold and smaller than a second preset threshold, acquiring a gain coefficient of the target frequency component according to the masking signal ratio of the target frequency component and the first preset threshold, and adjusting the amplitude of the target frequency component according to the gain coefficient;
the method further comprises the following steps: if the masking signal ratio of the target frequency component is greater than or equal to the second preset threshold, adjusting the amplitude of the target frequency component to be 0; wherein the first preset threshold is smaller than the second preset threshold.
Optionally, the gain factor satisfies the formula:
[Formula shown as an image in the original publication: the gain coefficient S expressed as a function of the masking signal ratio msr and the first preset threshold K]
where S denotes a gain factor, msr denotes a masking signal ratio, and K denotes a first preset threshold.
Optionally, the method further comprises:
according to a preset frequency cut-off interval, deleting frequency components outside the preset frequency cut-off interval in the frequency spectrum of the processed audio signal to obtain a second frequency spectrum;
and performing linear stretching processing on the frequency components in the second frequency spectrum to obtain the audio signal subjected to the linear stretching processing.
Optionally, the performing linear stretching processing on the frequency component in the second frequency spectrum to obtain the audio signal after the linear stretching processing includes:
sampling in the second frequency spectrum according to a preset frequency interval to obtain N sampling frequency components;
mapping the N sampling frequency components to a preset stretching frequency interval to obtain N mapping frequency components;
obtaining the processed audio signal according to the N mapping frequency components, wherein amplitudes of the N mapping frequency components are respectively equal to amplitudes of the N sampling frequency components.
Optionally, the performing linear stretching processing on the frequency component in the second frequency spectrum to obtain a processed audio signal includes:
classifying the frequency components in the second frequency spectrum according to audio objects, and determining the audio object corresponding to each frequency component;
if it is determined that the frequency components in the second frequency spectrum correspond to M audio objects, determining S stretching bandwidths according to a preset stretching frequency interval and a preset frequency cut-off interval, wherein M is an integer greater than or equal to 2, and S is an integer greater than or equal to 1;
and for the M audio objects, inserting the stretching bandwidths between the continuous frequency components respectively included in two adjacent audio objects to obtain the audio signal after the stretching processing.
In a second aspect, an embodiment of the present application provides an audio signal processing apparatus, including:
the acquisition module is used for acquiring an audio signal;
the processing module is used for carrying out spectrum analysis on the audio signal to obtain a first spectrum corresponding to the audio signal; determining the target frequency component according to the amplitude of each frequency component in the first frequency spectrum and a masking curve of a target object, wherein the target frequency component is a frequency component of which the amplitude is smaller than the masking curve in each frequency component;
the processing module is further configured to adjust an amplitude of the target frequency component according to a masking signal ratio of the target frequency component, and acquire a processed audio signal, where the masking signal ratio is used to indicate the strength of the masking effect of the target frequency component on other frequency components, the other frequency components being frequency components whose frequencies are different from the frequency of the target frequency component.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium, comprising: a program;
the program is executed by a processor to perform the audio signal processing method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present application further provides an audio processing apparatus, including: memory, processor, and computer program instructions;
the memory stores the computer program instructions;
the processor executes the computer program instructions to perform the audio signal processing method according to any one of the first aspect.
In a fifth aspect, an embodiment of the present application further provides a display device, including: a display panel, a speaker and an audio signal processing device;
the display panel is used for displaying a user interface;
the loudspeaker is used for outputting the processed audio signal acquired by the audio signal processing device;
the audio signal processing apparatus is configured to perform the audio signal processing method according to any one of the first aspect.
The embodiment of the application provides an audio signal processing method, an audio signal processing device and a readable storage medium, wherein the method comprises the following steps: acquiring an audio signal, and performing spectrum analysis on the audio signal to acquire a first spectrum corresponding to the audio signal; determining target frequency components according to the amplitudes of the frequency components in the first frequency spectrum and a masking curve of a target object, wherein the target frequency components are the frequency components of which the amplitudes are smaller than the masking curve in the frequency components; and adjusting the amplitude of the target frequency component according to the masking signal ratio of the target frequency component to acquire a processed audio signal. By suppressing the amplitude of frequency components whose intensity is below the masking curve and whose masking signal ratio is high, the scheme reduces the masking effect that these originally inaudible frequency components exert on other effective frequency components, thereby improving sound clarity and giving the user a better auditory effect.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1a is a schematic diagram of a masking curve at 4000 Hz for a 20-year-old user as provided herein;
FIG. 1b is a schematic diagram of a masking curve at 4000 Hz for an 80-year-old user as provided herein;
FIG. 1c is a schematic diagram of masking curves of a 20 year old user and an 80 year old user in a preset frequency range, respectively, provided by the present application;
FIG. 1d is a diagram illustrating an audio signal processing method according to the prior art;
fig. 2 is a schematic structural diagram of a display device with an audio playing function suitable for the present application;
fig. 3 is a flowchart of an audio signal processing method according to an embodiment of the present application;
FIG. 4 is a diagram of a user interface provided by the present application;
FIG. 5 is another user interface diagram provided herein;
fig. 6 is a flowchart of an audio signal processing method according to another embodiment of the present application;
FIG. 7 is a diagram of the linear relationship among log(An), log(int(An)) and log(int(An)+1) as shown in the present application;
fig. 8 is a schematic spectrum diagram of a formant containing two audio objects respectively shown in the present application;
fig. 9 is a schematic spectrum diagram of formants corresponding to three audio objects according to the present application;
fig. 10 is a schematic structural diagram of an audio signal processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an audio signal processing apparatus according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
With the continuous development of electronic technology, the variety of electronic devices is more and more abundant, and the audio playing function also becomes one of the basic functions of most electronic devices. For a device with an audio playing function, processing an audio signal to obtain a better hearing effect is a vital task. In order to make different types of objects obtain better auditory effects, people propose the concept of 'exclusive human voice effect', which refers to processing audio signals according to the auditory characteristics of different types of objects, so as to obtain the sound effect formulated for different types of objects. For example, for the elderly, the elderly may not be able to hear the sound of some frequency components in the audio signal output by the device due to hearing loss, and therefore, the "exclusive sound effect" formulated for the elderly means that the audio signal is processed according to the hearing characteristics of the elderly, so that the elderly can hear the sound that cannot be heard originally.
In the prior art, a hearing threshold of a target object is generally obtained, and then, according to a masking curve between different frequency bands of an audio signal and the hearing threshold of the target object, a frequency component with an intensity smaller than the hearing threshold of the target object in the audio signal is compensated, so that the target object can hear an original sound which cannot be heard. However, by compensating the frequency components of the audio signal whose intensity is less than the hearing threshold of the target object in the above manner, although the frequency components can be heard clearly, due to the auditory masking effect of different frequency components, the masking effect of the frequency components after compensation on some other frequency components may be enhanced, and thus some other frequency components that can be heard originally may not be heard, and thus the auditory effect is poor.
For example, fig. 1a shows the hearing masking curve at 4000 Hz for a 20-year-old user. The masking curve represents the masking effect of the 4000 Hz frequency component on other frequency components: if the intensity of another frequency component lies below the masking curve, it is masked by the 4000 Hz component and cannot be perceived by the user. Thus, if the intensity of the frequency component marked "1" in fig. 1a is lower than the value of the masking curve, frequency component 1 cannot be perceived by the user; if the intensity of the frequency component marked "2" in fig. 1a is higher than the value of the masking curve, frequency component 2 can be perceived by the user.
Fig. 1b likewise shows the hearing masking curve at 4000 Hz for an 80-year-old user, representing the masking effect of the 4000 Hz frequency component on other frequency components. Referring to fig. 1b, only the frequency components marked "1" and "2" are perceivable by the 80-year-old user, and the other frequency components are not. Comparing the masking curves in fig. 1a and fig. 1b, it can be seen that the masking effect is stronger, and the frequency selectivity poorer, for the 80-year-old user than for the 20-year-old user: presented with the same frequency components, the elderly user can hear only 2 of them while the young user can hear 5, which clearly shows the poorer frequency selectivity of the elderly user.
In practice, not only do high-intensity frequency components mask low-intensity ones, but low-intensity frequency components also exert a masking effect on high-intensity ones. That is, each frequency component not only masks other frequency components; all other frequency components also exert a masking effect on the current frequency component. The masking effects of all the frequency components together form a masking curve: sound whose signal intensity lies below the masking curve cannot be perceived by the user, and only sound whose signal intensity lies above the masking curve can be perceived. For example, in fig. 1c, curve "a" shows the masking curve of an 80-year-old user in a preset frequency range, and curve "b" shows the masking curve of a 20-year-old user in the same range.
Referring to the curves "a" and "b" in fig. 1c, it can be seen that the frequency selectivity of the user aged 80 is poor, the masking curve in the preset frequency range is relatively flat, and the value of the masking curve is much higher than that of the user aged 20.
With the audio processing method in the prior art, the processing is usually performed on the frequency components below the masking curve: if the intensity of a frequency component is close to the masking curve, its signal intensity is enhanced so that it protrudes above the masking curve, as shown by the frequency components marked "c" and "d" in fig. 1d.
However, although the prior-art method can make previously inaudible sound clearer than before, the increased intensity also changes the masking effect on other frequency components, which may cause the sound of other frequency components that could previously be heard to become inaudible, resulting in a poor auditory effect. This is especially likely for users with relatively flat masking curves, such as the elderly, whose frequency selectivity is poor; for them, adjustment in the prior-art manner makes this outcome more probable.
Therefore, in view of the above problems in the prior art, the present application provides an audio signal processing method, which obtains a first spectrum corresponding to an audio signal by performing spectrum analysis on the audio signal, and then adjusts, according to the masking curve of the target object, the amplitude of each frequency component whose amplitude on the first frequency spectrum meets a preset condition, so as to suppress frequency components with a larger masking signal ratio, reduce their influence on other frequency components, improve the clarity of the output audio signal, and improve the auditory effect.
In the scheme, as the above analysis of the masking effect shows, not only do high-intensity frequency components mask low-intensity ones, but low-intensity frequency components also exert a masking effect on high-intensity ones; and in practice, low-intensity frequency components cannot be perceived by the user anyway. Therefore, by suppressing in advance those frequency components of the audio signal that cannot be heard and that lie far below the masking curve, the scheme does not weaken the intensity of the effective frequency components (the effective frequency components referred to here are the sounds that the user can perceive), but does reduce the influence on them, so that frequency components whose intensity is close to the masking curve protrude above it, thereby improving the clarity of the output audio signal and improving the auditory effect.
Before introducing the audio signal processing method provided by the present application, a detailed description is first given of a scenario in which the present application is applicable:
the audio signal processing method provided by the application is suitable for various electronic devices with audio playing functions, such as televisions, smart phones, IPADs, notebook computers, earphones and the like.
Fig. 2 is a first schematic structural diagram of a display device with an audio playing function suitable for the present application. Referring to fig. 2, the display device 100, for example, a television, includes at least: a display panel 101, a speaker 102, and an audio signal processing device 103.
The display panel 101 is used for displaying a user interface; the user can input a user instruction to the display device by operating the remote control device according to the content displayed on the user interface so as to instruct the display device to execute corresponding operation.
The audio signal processing device 103 is configured to execute the audio signal processing method provided in the embodiment of the present application, and output the processed audio signal to the speaker 102.
And a speaker 102 for outputting the audio signal output by the audio signal processing apparatus 103.
It should be understood that the display device further includes: other modules (not shown in fig. 2) such as a driving board for driving the display panel, a power supply device, a chassis, a housing, and the like; the power supply device is used for supplying electric energy to each module in the display panel.
The following describes the audio signal processing method provided by the present application in detail through several specific embodiments:
Fig. 3 is a flowchart of an audio signal processing method according to an embodiment of the present application. The present embodiment is described in detail by taking the display device as the execution subject. Referring to fig. 3, the method of the present embodiment includes:
and S101, acquiring an audio signal.
S102, carrying out spectrum analysis on the audio signal to obtain a first spectrum corresponding to the audio signal.
In one possible implementation, the audio signal may be converted from a time-domain signal to a frequency-domain signal by performing a fast Fourier transform on the acquired audio signal. The specific implementation of the fast Fourier transform is similar to that in the prior art and is not described here again.
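For illustration only, a minimal sketch of this spectrum-analysis step in Python/NumPy might look as follows; the frame-based processing, window choice and function names are assumptions rather than part of the patent disclosure:

    import numpy as np

    def first_spectrum(frame, sample_rate):
        # Apply a window before the FFT to limit spectral leakage
        # (the window choice is an assumption, not specified by the patent).
        window = np.hanning(len(frame))
        spectrum = np.fft.rfft(frame * window)
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
        # The amplitudes of the frequency components form the "first spectrum".
        return freqs, np.abs(spectrum)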
S103, determining the target frequency component according to the amplitude of each frequency component in the first frequency spectrum and the masking curve of the target object.
The target frequency component is a frequency component, among the frequency components of the first frequency spectrum, whose amplitude is smaller than the masking curve of the target object. Specifically, for the target object, if the amplitude of a certain frequency component in the first frequency spectrum is lower than the masking curve of the target object, the target object cannot hear the sound of that frequency component; if the amplitude of a certain frequency component in the first frequency spectrum is higher than the masking curve of the target object, the target object can hear the sound of that frequency component. Therefore, in the present scheme, the target frequency components that need to be adjusted are determined by comparing the amplitude of each frequency component in the first spectrum with the value of that frequency component in the masking curve of the target object.
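For illustration only, assuming NumPy arrays and a masking curve already evaluated on the analysis frequency grid, this comparison might be sketched as:

    import numpy as np

    def find_target_components(amplitudes, masking_curve_values):
        # amplitudes: amplitude of each frequency component of the first spectrum
        # masking_curve_values: the target object's masking curve at the same frequencies
        # Target frequency components are those whose amplitude lies below the masking curve.
        return np.where(amplitudes < masking_curve_values)[0]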
In one possible implementation manner, the display device may obtain the masking curve of the target object by: the display device can provide a visual user interface through the display, and a user can input a user instruction to the display device according to the content displayed on the user interface, wherein the user instruction is used for indicating the user type selected by the user; and the display device determines the masking curve of the target object according to the user instruction and the corresponding relation between the pre-configured user type and the masking curve.
It should be noted that the user instruction may be input by operating a remote control device, may be input by the user by voice, or may be input by the user through other terminal equipment capable of communicating with the display device; for example, the user may input the instruction to the display device by operating an application installed on a smartphone. The embodiment of the application is not limited to the specific implementation manner in which the user inputs the instruction.
Fig. 4 exemplarily shows a user interface provided by the display device, and a plurality of options of different age groups may be provided in the user interface 400 shown in fig. 4, including, for example: options 4021 "20-30 years old", "30-40 years old", "40-50 years old", "50-60 years old", "60-70 years old", and "over 70 years old". The user can select the corresponding age bracket by operating the remote control device according to the requirement, thereby indicating the type of the user. For example, when the user selects the age group "70 years old or older", the display device determines the masking curve corresponding to "70 years old or older" based on the age group "70 years old or older" and the correspondence relationship between the object of each of the age groups and the masking curve, which is arranged in advance.
Fig. 5 exemplarily shows another user interface provided by the display device. A plurality of options of different user types may be provided in the user interface shown in fig. 5, for example an option "old age". The user can select the corresponding user type by operating the remote control device as required, thereby indicating the user type. For example, if the user selects the option "old age", the display device determines the masking curve corresponding to "old age" according to the user type "old age" and the pre-configured correspondence between each user type and its masking curve.
Of course, the display device may also provide other user interfaces to determine the masking curve corresponding to the target object. For example, the display device provides a user interface including a text input box 402 thereon, and the user can input information indicating the user type or the like in the text input box 402 by operating the remote control device and input the information indicating the user type to the display device by pressing a "confirm" button on the remote control device.
It should be understood that the user interface provided by the display device is not limited to the above description, and the page layout of the user interface can be set according to the requirement. The foregoing is merely an example description and is not intended to limit the present disclosure to particular implementations.
Alternatively, the correspondence between the user type and the masking curve, which is configured in advance in the display device, may be obtained by:
the method comprises the steps of obtaining a large number of masking curves corresponding to objects of different user types in advance, carrying out statistical analysis on the obtained masking curves of the objects of the same user type to obtain a masking curve capable of representing the object of the user type, and carrying out statistical analysis on the masking curves corresponding to the objects of the user types to obtain a masking curve capable of representing the object of the user types. And then, storing the corresponding relation between the user type and the masking curve corresponding to the user type into a display device.
Thereafter, the display device may determine a masking curve of the target object according to the user type selected by the user.
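Purely as an illustration of such a pre-configured correspondence (the user-type keys, frequency grid and curve values below are placeholders, not data from the patent), the lookup could be sketched as:

    import numpy as np

    # Placeholder correspondence between user type and masking curve (illustrative values only).
    CURVE_FREQS = np.array([125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0])
    MASKING_CURVES = {
        "20-30 years old":   np.array([20.0, 15.0, 10.0, 10.0, 12.0, 15.0, 25.0]),
        "over 70 years old": np.array([35.0, 30.0, 28.0, 30.0, 38.0, 50.0, 65.0]),
    }

    def masking_curve_for(user_type, analysis_freqs):
        # Interpolate the stored curve onto the analysis frequency grid.
        return np.interp(analysis_freqs, CURVE_FREQS, MASKING_CURVES[user_type])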
And S104, adjusting the amplitude of the target frequency component according to the masking signal ratio of the target frequency component, and acquiring the processed audio signal.
The purpose of this step is to suppress the amplitude of the target frequency component according to the masking signal ratio of the target frequency component, so as to reduce the influence of the target frequency component on other frequency components, thereby improving the auditory effect.
The masking signal ratio can be obtained by the ratio of the intensity of other frequency components to the intensity of the target frequency component, and can represent the degree of masking effect of the target frequency component; the larger the masking signal ratio, the stronger the masking effect of the frequency component on the other frequency components, and the smaller the masking signal ratio, the weaker the masking effect of the frequency component on the other frequency components. The "other frequency component" referred to herein means a frequency component different from the target frequency component.
For frequency components whose amplitude is smaller than the masking curve of the target object but whose masking signal ratio is large, the target object cannot hear these components, yet they may exert a masking effect on other frequency components, such as those whose amplitude is close to the masking curve of the target object, so that those other components cannot be heard. Here, "close to the masking curve of the target object" may be expressed as the difference between the amplitude and the masking curve of the target object satisfying a preset reference value.
Therefore, according to the scheme, the amplitude of the frequency component which is smaller than the masking curve of the target object and the masking signal ratio of which meets the preset condition is adjusted, so that the influence of the frequency components on other frequency components is reduced, and the frequency component of which the intensity is close to the masking curve of the target object is protruded above the masking curve, so that the target object can be heard clearly.
One possible implementation: if the masking signal ratio of the target frequency component is determined to be greater than or equal to a first preset threshold value, obtaining a gain coefficient of the amplitude of the target frequency component according to the masking signal ratio of the target frequency component; and adjusts the amplitude of the target frequency component according to the obtained gain coefficient. And if the masking signal ratio of the target frequency component is determined to be smaller than the first preset threshold value, keeping the amplitude of the target frequency component unchanged.
Another possible implementation: if the masking signal ratio of the target frequency component is determined to be greater than or equal to a first preset threshold value and smaller than a second preset threshold value, obtaining a gain coefficient of the amplitude of the target frequency component according to the masking signal ratio of the target frequency component; and adjusts the amplitude of the target frequency component according to the obtained gain coefficient. And if the masking signal ratio of the target frequency component is determined to be smaller than the first preset threshold value, keeping the amplitude of the target frequency component unchanged. And if the masking signal ratio of the target frequency component is determined to be greater than or equal to a second preset threshold value, adjusting the amplitude of the target frequency component to be 0. The first preset threshold is smaller than the second preset threshold.
Alternatively, in the above two ways, the gain coefficient of the target frequency component may be determined by equation (1):
[Formula (1), shown as an image in the original publication: the gain coefficient S expressed as a function of the masking signal ratio msr and the first preset threshold K]
where S denotes a gain factor, msr denotes a masking signal ratio, and K denotes a first preset threshold.
As can be seen from the formula (1), the gain coefficient S of the target frequency component is greater than 0 and smaller than 1, that is, the formula (1) compresses the amplitude of the target frequency component.
In practical application, more reference thresholds can be set, and different reference threshold ranges correspond to different adjustment strategies, so that the adjustment accuracy is improved.
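For illustration only, the two-threshold adjustment described above can be sketched as follows in Python. Since formula (1) is published only as an image, the gain law S = K1/msr used here is an assumed stand-in that merely satisfies 0 < S < 1 when msr >= K1, and the threshold values are placeholders:

    K1 = 2.0    # first preset threshold (placeholder value)
    K2 = 10.0   # second preset threshold (placeholder value)

    def adjust_amplitude(amplitude, msr):
        # msr: masking signal ratio of the target frequency component.
        if msr < K1:
            return amplitude          # keep the amplitude unchanged
        if msr >= K2:
            return 0.0                # suppress the component entirely
        gain = K1 / msr               # assumed compressive gain, stand-in for formula (1)
        return amplitude * gain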
In the embodiment, firstly, an audio signal is obtained, and spectrum analysis is performed on the audio signal to obtain a first spectrum corresponding to the audio signal; a target frequency component is determined according to the amplitude of each frequency component in the first frequency spectrum and a masking curve of a target object, wherein the target frequency component is a frequency component whose amplitude is smaller than the masking curve; and the amplitude of the target frequency component is adjusted according to the masking signal ratio of the target frequency component to acquire the processed audio signal. In this embodiment, by suppressing the amplitude of frequency components whose intensity is below the masking curve and whose masking signal ratio is high, the masking effect of these originally inaudible frequency components on other effective frequency components is reduced, the sound clarity is improved, and the user obtains a better auditory effect. In addition, because the audio signal varies dynamically and nearly continuously in the time domain, compressing the amplitude of the target frequency component keeps its variation range in the time domain small, which avoids introducing unnecessary noise.
Fig. 6 is a flowchart of an audio signal processing method according to another embodiment of the present application. Referring to fig. 6, the method of the present embodiment includes:
s201, obtaining an audio signal.
S202, carrying out spectrum analysis on the audio signal to obtain a first spectrum corresponding to the audio signal.
S203, determining the target frequency component according to the amplitude of each frequency component in the first frequency spectrum and the masking curve of the target object.
And S204, adjusting the amplitude of the target frequency component according to the masking signal ratio of the target frequency component, and acquiring the processed audio signal.
Steps S201 to S204 provided in this embodiment are similar to steps S101 to S104 in the embodiment shown in fig. 3, and refer to the detailed description in the embodiment shown in fig. 3, which is not repeated herein.
And S205, deleting frequency components outside the preset frequency cut-off interval in the frequency spectrum of the processed audio signal according to the preset frequency cut-off interval, and acquiring a second frequency spectrum.
In this embodiment, since sounds outside the preset frequency cutoff interval are hardly perceived by people even if they exist, the second spectrum is obtained by deleting frequency components outside the preset frequency cutoff interval from the spectrum corresponding to the processed audio signal obtained in step S204. That is, the amplitude of the frequency component outside the preset cutoff frequency interval in the frequency spectrum corresponding to the processed audio signal obtained in step S204 is adjusted to 0, and the second frequency spectrum is obtained.
Illustratively, the preset frequency cutoff interval is [ KSL, KSH ], where KSL represents a low frequency cutoff frequency of the preset frequency cutoff interval, and KSH represents a high frequency cutoff frequency of the preset frequency cutoff interval, where KSL is smaller than KSH.
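A minimal sketch of this step, assuming NumPy arrays and treating deletion as setting the amplitude to 0 as described above:

    import numpy as np

    def apply_cutoff(freqs, spectrum, ksl, ksh):
        # Zero out (i.e. delete) frequency components outside the preset
        # frequency cut-off interval [KSL, KSH] to obtain the second spectrum.
        out = spectrum.copy()
        out[(freqs < ksl) | (freqs > ksh)] = 0.0
        return out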
And S206, performing linear stretching processing on the frequency component in the second frequency spectrum to obtain the audio signal after the linear stretching processing.
In practical applications, a strong pure tone masks a simultaneous weak pure tone that is close to it in the frequency domain, and the closer the weak pure tone is to the strong one in frequency, the more easily it is masked; a weak pure tone farther away from the strong pure tone in the frequency domain is less easily masked. Therefore, in the present scheme, linear stretching of the frequency components in the second frequency spectrum increases the distance between effective frequency components in the frequency domain, thereby reducing their mutual influence.
In one possible implementation, sampling is performed in the second frequency spectrum at preset intervals to obtain N sampling frequency components; then, mapping the N sampling frequency components to a preset stretching frequency interval to obtain N mapping frequency components, and obtaining the processed audio signal according to the N mapping frequency components, wherein amplitudes of the N mapping frequency components correspond to amplitudes of the N sampling frequency components, respectively.
Alternatively, the preset interval may be varied with a change in frequency, for example, the preset interval may be increased or decreased as the frequency is changed from low to high in the second frequency spectrum.
Illustratively, how the stretching process is performed is explained in detail by a specific example:
wherein, the preset stretching frequency interval is [ KL, KH ], wherein KL represents the low-frequency stretching cut-off frequency of the preset stretching frequency interval, KH represents the high-frequency stretching cut-off frequency of the preset stretching frequency interval, and KL is smaller than KH. Alternatively, KH is greater than KSH and KL is less than KSL.
Assuming that, of the N sampling frequency components, a step size between first frequency indexes respectively corresponding to two adjacent sampling frequency components satisfies formula (2):
[Formula (2), shown as an image in the original publication: the step length stepOld over the preset frequency cut-off interval from KSL to KSH, expressed logarithmically in terms of N]
wherein stepOld in formula (2) denotes the step length between the first frequency indexes respectively corresponding to two adjacent sampling frequency components; N represents the total number of sampling frequency components, N being a positive integer greater than or equal to 2.
It should be noted that, since the sensitivity of the human ear is higher in the low frequency band and lower in the high frequency band, that is, the sensitivity of the human ear to sound decreases with increasing frequency, in formula (2), a logarithmization process (i.e., a log process) is adopted when calculating the step length between the frequency indexes corresponding to two adjacent sampling frequency components.
Then, in the preset stretching frequency interval, the step length between the second frequency indexes respectively corresponding to two adjacent mapping frequency components is:
[Formula (3), shown as an image in the original publication: the step length stepNew over the preset stretching frequency interval from KL to KH, expressed logarithmically in terms of N]
wherein, stepNew in the formula (3) represents the step length between the second frequency indexes respectively corresponding to the two adjacent mapping frequency components; n represents the total number of mapped frequency components, N being a positive integer greater than or equal to 2.
A first frequency index corresponding to an nth sampling frequency component in the second frequency spectrum and a second frequency index corresponding to a corresponding nth mapping frequency component satisfy equation (4):
(log(An) - log(KL)) / (log(KH) - log(KL)) = (log(Anf) - log(KSL)) / (log(KSH) - log(KSL))    formula (4)
wherein log(An) in formula (4) represents the second frequency index corresponding to the nth mapping frequency component, and log(Anf) represents the first frequency index corresponding to the nth sampling frequency component.
Suppose that K = (log(KH) - log(KL)) / (log(KSH) - log(KSL)).
Then equation (5) can be obtained from equation (4):
log(An) - log(KL) = K(log(Anf) - log(KSL))    formula (5)
Converting equation (5) can yield equation (6):
log(An) = K(log(Anf) - log(KSL)) + log(KL)    formula (6)
In the above equations (5) and (6), log(An) may be either an integer or a floating-point number. If log(An) is an integer, the frequency amplitude value of the nth mapping frequency component is determined directly at that index in the preset stretching frequency interval. If log(An) is a floating-point number, then An lies in the interval (int(An), int(An)+1); in that case, log(An) may be re-determined according to the linear relationship among log(An), log(int(An)) and log(int(An)+1), and the frequency amplitude value of the nth mapping frequency component is then determined from the re-determined index, wherein the re-determined value of log(An) is an integer.
Specifically, if log(An) is a floating-point number, the linear relationship among log(An), log(int(An)) and log(int(An)+1) is shown in FIG. 7, which plots these second frequency indexes against their frequency amplitude values: in the coordinate system shown in FIG. 7, the horizontal axis represents the second frequency index in hertz (Hz), and the vertical axis represents the frequency amplitude value corresponding to the second frequency index. Referring to FIG. 7, triangle T1 and triangle T2 are similar triangles, and according to the principle of similar triangles, equation (7) can be determined:
[Formula (7), shown as an image in the original publication: the similar-triangle proportion relating the second frequency indexes log(int(An)), log(An) and log(int(An)+1) to their corresponding frequency amplitude values]
Converting equation (7) yields equation (8):
[Formula (8), shown as an image in the original publication: equation (7) rearranged to give the frequency amplitude value at the second frequency index log(An)]
In the above formulas, the quantity shown as an image denotes the frequency amplitude value of the mapping frequency component whose second frequency index is log(An).
In this way, the N sampling frequency components can be mapped to the preset stretching frequency interval, the sound of each audio object is retained, the spacing of the effective frequency components of the audio signal in the frequency domain is increased, the mutual influence among the effective frequency components in the second frequency spectrum is reduced, and the auditory effect is improved.
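As a rough sketch of this mapping (assuming N log-uniform samples over [KSL, KSH], using the proportional log-domain mapping of formula (6) with the constant K reconstructed as above, and omitting the rounding of non-integer indices described in the preceding paragraphs):

    import numpy as np

    def stretch_mapping(freqs, amplitudes, ksl, ksh, kl, kh, n):
        # Sample N frequency components from the second spectrum over [KSL, KSH]
        # (log-uniform spacing is an assumption; the patent allows a varying interval).
        sample_freqs = np.logspace(np.log10(ksl), np.log10(ksh), n)
        sample_amps = np.interp(sample_freqs, freqs, amplitudes)
        # Map each sampled frequency into the stretching interval [KL, KH]:
        # log(An) = K * (log(Anf) - log(KSL)) + log(KL).
        k = (np.log10(kh) - np.log10(kl)) / (np.log10(ksh) - np.log10(ksl))
        mapped_freqs = 10.0 ** (k * (np.log10(sample_freqs) - np.log10(ksl)) + np.log10(kl))
        # The amplitudes of the mapped components equal those of the sampled components.
        return mapped_freqs, sample_amps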
In another possible implementation manner, the frequency components in the second frequency spectrum may be analyzed to expand the intervals of the frequency components corresponding to the plurality of audio objects in the frequency domain, so as to reduce the mutual influence between the frequency components of the plurality of audio objects. The method specifically comprises the following steps:
step one, classifying the frequency components in the second frequency spectrum according to the audio objects, and determining the audio object corresponding to each frequency component. For example, classification can be made by: referring to fig. 8, assuming two adjacent formants P1 and P2 in the second spectrum, the bandwidth and amplitude of P1 are W1 and a1, respectively, the bandwidth and amplitude of P2 are W2 and a2, if the frequency between the two formants P1 and P2, and if the continuous bandwidth W12 of min (a1, a2) × 0.5 is greater than max (W1, W2), the two formants P1 and P2 are considered to belong to different audio objects, as in fig. 8, the bandwidth of W12 is greater than W1, and W1 is greater than W2, it is determined that P1 and P2 belong to different audio objects. All formants in the second frequency spectrum are analyzed in the above manner, so as to determine the audio object corresponding to each frequency component in the second frequency spectrum.
And step two, if it is determined that the frequency components in the second frequency spectrum correspond to M audio objects, determining S stretching bandwidths according to the preset stretching frequency interval and the preset frequency cut-off interval.
Specifically, if it is determined that the frequency component in the second spectrum corresponds to one audio object, no processing needs to be performed; if it is determined that the frequency components in the second spectrum correspond to two or more audio objects, S stretching bandwidths are determined according to a preset stretching frequency interval and a preset frequency cut-off interval, the S stretching bandwidths are bandwidths to be inserted into the frequency components of two adjacent audio objects, and the amplitudes of the S stretching bandwidths are both 0.
How to determine the S stretching bandwidths is described in detail below:
First, a total stretching bandwidth, i.e. the sum of the S stretching bandwidths, is determined according to the preset stretching frequency interval and the preset frequency cut-off interval; for example, the total stretching bandwidth may be calculated according to the formula Wtotal = log(KH) - log(KSH) + log(KSL) - log(KL), where Wtotal represents the total stretching bandwidth.
Then, the number S of 0-value stretching bandwidths to be inserted is determined according to the continuous frequency components respectively corresponding to the M audio objects, and the bandwidth of each stretching bandwidth is determined from the total stretching bandwidth and the number S of 0-value stretching bandwidths to be inserted. In one possible implementation, each stretching bandwidth is determined as the ratio of the total stretching bandwidth Wtotal to S, in which case the S stretching bandwidths are all the same; in another possible implementation, the S stretching bandwidths are determined from the total stretching bandwidth Wtotal in proportion to the bandwidths respectively corresponding to the pairs of adjacent audio objects between which a stretching bandwidth is to be inserted.
And step three, inserting the S stretching bandwidths between the continuous frequency components respectively included by the two adjacent audio objects meeting the preset condition, thereby obtaining the audio signal after stretching processing.
It should be noted that, since the signal strength of the inserted S stretching bandwidths is 0, the stretching bandwidths may also be other names such as 0-value stretching bandwidths, 0-value bandwidths, and the like, and the embodiment of the present application is not limited by this.
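For illustration, the criterion of step one could be sketched as follows, assuming the formants have already been located and are described by (center frequency, bandwidth, amplitude) tuples over NumPy arrays; reading "the continuous bandwidth W12 at the level min(A1, A2)*0.5" as the width of the region between the two peaks whose amplitude stays below that level is an assumption:

    import numpy as np

    def different_audio_objects(p1, p2, freqs, amplitudes):
        # p1, p2: (center_hz, bandwidth_hz, amplitude) of two adjacent formants.
        c1, w1, a1 = p1
        c2, w2, a2 = p2
        level = min(a1, a2) * 0.5
        # Region between the two formants whose amplitude stays below the level.
        between = (freqs > c1) & (freqs < c2) & (amplitudes < level)
        w12 = freqs[between].max() - freqs[between].min() if np.any(between) else 0.0
        # P1 and P2 are taken to belong to different audio objects if W12 > max(W1, W2).
        return w12 > max(w1, w2)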
As shown in fig. 9, the three formants P1, P2 and P3 are identified as the formants corresponding to three audio objects. The (center frequency, bandwidth) of the three formants P1, P2 and P3 are (500 Hz, 400 Hz), (1500 Hz, 300 Hz) and (2500 Hz, 200 Hz), respectively, and the total stretching bandwidth is Wtotal, where Wtotal may be determined from the preset stretching frequency interval and the preset frequency cut-off interval in the manner described above.
If the first implementation is followed, the stretching bandwidth inserted between P1 and P2 and the stretching bandwidth inserted between P2 and P3 are both Wtotal*0.5。
If the second implementation manner is adopted, the following steps are executed:
step one, calculating log bandwidths corresponding to W12 and W23 respectively;
the log bandwidth of W12 is:
WL12=log(W12)=log(1500-300*0.5)-log(500+400*0.5)=0.29
the log bandwidth of W23 is:
WL23=log(W23)=log(2500-200*0.5)-log(1500+300*0.5)=0.16
where W12 is the bandwidth of the continuous frequency components between P1 and P2 at the level min(A1, A2)*0.5, and W23 is the bandwidth of the continuous frequency components between P2 and P3 at the level min(A2, A3)*0.5.
Step two, according to the log bandwidths respectively corresponding to W12 and W23 and the total stretching bandwidth Wtotal, determining the 0-value log stretching bandwidth inserted between P1 and P2 and the 0-value log stretching bandwidth inserted between P2 and P3;
wherein the 0-valued log stretching bandwidth inserted between P1 and P2 is B1, and the 0-valued log stretching bandwidth inserted between P2 and P3 is B2:
B1=Wtotal*WL12/(WL12+WL23)=Wtotal*0.64
B2=Wtotal*WL23/(WL12+WL23)=Wtotal*0.36
further, by converting the above B1 and B2, the actual 0-value log stretching bandwidth inserted between P1 and P2 is X1, and the 0-value log stretching bandwidth inserted between P2 and P3 is X2:
X1 = 2Wtotal*0.64; X1 is inserted at (500+1500)*0.5 = 1000 Hz, i.e. X1 is inserted at the center frequency of W12;
X2 = 2Wtotal*0.36; X2 is inserted at (1500+2500)*0.5 = 2000 Hz, i.e. X2 is inserted at the center frequency of W23.
In practical applications, 0-value log stretching bandwidths X1 and X2 may be inserted at the center frequencies of W12 and W23, respectively, as described above, or may be inserted at other frequencies of W12 and W23, and are not limited to the center frequencies.
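The proportional allocation in the example above can be reproduced with a short sketch (the formant figures are taken from the example; Wtotal is left as a placeholder, and the final conversion of B1 and B2 to the inserted bandwidths X1 and X2 is not reproduced here):

    import numpy as np

    # Formants from the example: (center frequency in Hz, bandwidth in Hz) for P1, P2, P3.
    formants = [(500.0, 400.0), (1500.0, 300.0), (2500.0, 200.0)]
    w_total = 1.0   # total stretching bandwidth (placeholder value)

    def log_valley_width(lower, upper):
        # Log bandwidth between the upper edge of the lower formant and the
        # lower edge of the upper formant, as in WL12 and WL23 above.
        (c1, w1), (c2, w2) = lower, upper
        return np.log10(c2 - w2 * 0.5) - np.log10(c1 + w1 * 0.5)

    wl12 = log_valley_width(formants[0], formants[1])   # ~0.29
    wl23 = log_valley_width(formants[1], formants[2])   # ~0.16

    # Allocate the total stretching bandwidth in proportion to the log valley widths.
    b1 = w_total * wl12 / (wl12 + wl23)   # ~0.64 * Wtotal, centred near (500+1500)/2 = 1000 Hz
    b2 = w_total * wl23 / (wl12 + wl23)   # ~0.36 * Wtotal, centred near (1500+2500)/2 = 2000 Hz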
In this embodiment, by performing the stretching processing on the second frequency spectrum in any one of the above manners, the intervals between the effective frequency components in the second frequency spectrum in the frequency domain are increased, the sound of each audio object is retained, and the mutual influence between the effective frequency components in the second frequency spectrum is reduced, thereby improving the auditory effect.
Fig. 10 is a schematic structural diagram of an audio signal processing apparatus according to an embodiment of the present application. Referring to fig. 10, the apparatus 700 provided in this embodiment includes: an acquisition module 701 and a processing module 702.
The acquiring module 701 is configured to acquire an audio signal;
a processing module 702, configured to perform spectrum analysis on the audio signal to obtain a first spectrum corresponding to the audio signal; determining the target frequency component according to the amplitude of each frequency component in the first frequency spectrum and a masking curve of a target object, wherein the target frequency component is a frequency component of which the amplitude is smaller than the masking curve in each frequency component;
the processing module 702 is further configured to adjust the amplitude of the target frequency component according to a masking signal ratio of the target frequency component, and acquire a processed audio signal, where the masking signal ratio is used to indicate a degree of masking effect of the target frequency component, and the frequencies of the other frequency components are different from the frequency of the target frequency component.
The apparatus provided in this embodiment may be used to implement the technical solution in any of the method embodiments, and the implementation principle and technical effect thereof are similar to those in the embodiments described above, and are not described herein again.
Optionally, the processing module 702 is specifically configured to determine whether a masking signal ratio of the target frequency component is greater than or equal to a first preset threshold; and if the masking signal ratio of the target frequency component is determined to be greater than or equal to a first preset threshold, acquiring a gain coefficient of the target frequency component according to the masking signal ratio of the target frequency component and the first preset threshold, and adjusting the amplitude of the target frequency component according to the gain coefficient.
Optionally, the processing module 702 is specifically configured to determine whether a masking signal ratio of the target frequency component is greater than or equal to a first preset threshold and is less than a second preset threshold; if the masking signal ratio of the target frequency component is determined to be greater than or equal to a first preset threshold and smaller than a second preset threshold, acquiring a gain coefficient of the target frequency component according to the masking signal ratio of the target frequency component and the first preset threshold, and adjusting the amplitude of the target frequency component according to the gain coefficient;
in addition, the processing module 702 is further configured to adjust the amplitude of the target frequency component to be 0 if the masking signal ratio of the target frequency component is greater than or equal to the second preset threshold; wherein the first preset threshold is smaller than the second preset threshold.
Optionally, the gain factor satisfies the formula:
[Formula shown as an image in the original publication: the gain coefficient S expressed as a function of the masking signal ratio msr and the first preset threshold K]
where S denotes a gain factor, msr denotes a masking signal ratio, and K denotes a first preset threshold.
Optionally, the processing module 702 is further configured to delete, according to a preset frequency cut-off interval, a frequency component, which is outside the preset frequency cut-off interval, in the frequency spectrum of the processed audio signal, and obtain a second frequency spectrum; and performing linear stretching processing on the frequency components in the second frequency spectrum to obtain the audio signal subjected to the linear stretching processing.
Optionally, the processing module 702 is specifically configured to sample in the second frequency spectrum according to a preset frequency interval to obtain N sampling frequency components; map the N sampling frequency components to a preset stretching frequency interval to obtain N mapping frequency components; and obtain the processed audio signal according to the N mapping frequency components, wherein amplitudes of the N mapping frequency components are respectively equal to amplitudes of the N sampling frequency components.
Optionally, the processing module 702 is specifically configured to classify the frequency components in the second frequency spectrum by audio object and determine the audio object corresponding to each frequency component; if the frequency components in the second frequency spectrum are determined to correspond to M audio objects, determine S stretching frequency widths according to the preset stretching frequency interval and the preset frequency cut-off interval, where M is an integer greater than or equal to 2 and S is an integer greater than or equal to 1; and, for the M audio objects, insert a stretching frequency width between the consecutive frequency components of each pair of adjacent audio objects to obtain the stretched audio signal, as pictured in the sketch below.
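The per-audio-object variant can be illustrated with the following sketch, which simply pushes each successive object's band upward by an equal gap so that adjacent objects no longer abut. Splitting the extra bandwidth evenly over the M-1 gaps is an assumption made for illustration, as are all names used here.

def insert_object_gaps(objects, stretch, cutoff):
    """Insert a stretching frequency width between adjacent audio objects.

    objects : list of (freqs, amps) pairs, one per audio object, ordered by
              increasing frequency
    stretch : (low, high) preset stretching frequency interval
    cutoff  : (low, high) preset frequency cut-off interval
    """
    m = len(objects)
    if m < 2:
        return objects
    # Extra room gained by the stretching interval, split over the M-1 gaps (assumed).
    extra = (stretch[1] - stretch[0]) - (cutoff[1] - cutoff[0])
    gap = max(extra, 0.0) / (m - 1)

    shifted, offset = [], 0.0
    for freqs, amps in objects:
        shifted.append(([f + offset for f in freqs], amps))
        offset += gap
    return shifted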
Fig. 11 is a schematic structural diagram of an audio signal processing apparatus according to another embodiment of the present application. Referring to fig. 11, the apparatus 800 of the present embodiment includes: a memory 801 and a processor 802;
the memory 801 may be a separate physical unit connected to the processor 802 via a bus, or the memory 801 and the processor 802 may be integrated together, for example implemented in hardware.
The memory 801 is used to store program instructions that are called by the processor 802 to perform the operations of any of the above method embodiments.
Alternatively, when some or all of the methods of the above embodiments are implemented in software, the apparatus 800 may include only the processor 802. In that case, the memory 801 for storing the program is located outside the apparatus 800, and the processor 802 is connected to the memory through circuits or wires to read and execute the program stored in the memory.
The Processor 802 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of a CPU and an NP.
The processor 802 may further include a hardware chip. The hardware chip may be an Application-Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a Field-Programmable Gate Array (FPGA), Generic Array Logic (GAL), or any combination thereof.
The Memory 801 may include a Volatile Memory (Volatile Memory), such as a Random-Access Memory (RAM); the Memory may also include a Non-volatile Memory (Non-volatile Memory), such as a Flash Memory (Flash Memory), a Hard Disk Drive (HDD) or a Solid-state Drive (SSD); the memory may also comprise a combination of memories of the kind described above.
The present invention also provides a computer-readable storage medium, wherein the computer-readable storage medium includes a program which, when executed by a processor, performs the method of any one of the above embodiments.
Embodiments of the present invention also provide a program product, which includes a computer program stored in a readable storage medium, from which the computer program can be read by at least one processor of the audio signal processing apparatus, and the at least one processor executes the computer program to make the audio signal processing apparatus execute the operations of any one of the above method embodiments.
Those of ordinary skill in the art will understand that all or a portion of the steps of the above-described method embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments described above; the aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. An audio signal processing method, comprising:
acquiring an audio signal, and performing spectrum analysis on the audio signal to acquire a first spectrum corresponding to the audio signal;
determining target frequency components according to the amplitudes of the frequency components in the first frequency spectrum and a masking curve of a target object, wherein the target frequency components are the frequency components of which the amplitudes are smaller than the masking curve in the frequency components;
if the masking signal ratio of the target frequency component is greater than or equal to a first preset threshold, obtaining a gain coefficient of the target frequency component according to the masking signal ratio of the target frequency component and the first preset threshold, and adjusting the amplitude of the target frequency component according to the gain coefficient, wherein the masking signal ratio is used for indicating the masking effect strength of the target frequency component, and can be obtained by the ratio of the strength of other frequency components to the strength of the target frequency component, and the frequencies of the other frequency components are different from the frequency of the target frequency component.
2. The method according to claim 1, wherein if the masking signal ratio of the target frequency component is greater than or equal to a first preset threshold, obtaining a gain coefficient of the target frequency component according to the masking signal ratio of the target frequency component and the first preset threshold, and adjusting the amplitude of the target frequency component according to the gain coefficient, comprises:
if the masking signal ratio of the target frequency component is greater than or equal to a first preset threshold and smaller than a second preset threshold, acquiring a gain coefficient of the target frequency component according to the masking signal ratio of the target frequency component and the first preset threshold, and adjusting the amplitude of the target frequency component according to the gain coefficient;
the method further comprises the following steps: if the masking signal ratio of the target frequency component is greater than or equal to the second preset threshold, adjusting the amplitude of the target frequency component to be 0; wherein the first preset threshold is smaller than the second preset threshold.
3. The method of claim 1 or 2, wherein the gain coefficient satisfies the formula:
(formula reproduced only as an image, Figure FDA0003415558940000011, in the original publication)
wherein S represents the gain coefficient, msr represents the masking signal ratio, and K represents the first preset threshold.
4. The method according to claim 1 or 2, characterized in that the method further comprises:
according to a preset frequency cut-off interval, deleting frequency components outside the preset frequency cut-off interval in the frequency spectrum of the processed audio signal to obtain a second frequency spectrum;
and performing linear stretching processing on the frequency components in the second frequency spectrum to obtain the audio signal subjected to the linear stretching processing.
5. The method according to claim 4, wherein the performing linear stretching processing on the frequency components in the second frequency spectrum to obtain a linear-stretched audio signal comprises:
sampling in the second frequency spectrum according to a preset frequency interval to obtain N sampling frequency components;
mapping the N sampling frequency components onto a preset stretching frequency interval to obtain N mapping frequency components;
obtaining the processed audio signal according to the N mapping frequency components, wherein amplitudes of the N mapping frequency components are respectively equal to amplitudes of the N sampling frequency components.
6. The method of claim 4, wherein the performing linear stretching processing on the frequency components in the second frequency spectrum to obtain the processed audio signal comprises:
classifying the frequency components in the second frequency spectrum according to audio objects, and determining the audio object corresponding to each frequency component;
if the frequency components in the second frequency spectrum are determined to correspond to the M audio objects, S stretching frequency widths are determined according to a preset stretching frequency interval and a preset frequency cut-off interval, wherein M is an integer larger than or equal to 2, and S is an integer larger than or equal to 1;
and, for the M audio objects, inserting a stretching frequency width between the consecutive frequency components respectively included in two adjacent audio objects, to obtain the stretched audio signal.
7. An audio signal processing apparatus, comprising:
the acquisition module is used for acquiring an audio signal;
the processing module is used for carrying out spectrum analysis on the audio signal to obtain a first spectrum corresponding to the audio signal; determining target frequency components according to the amplitudes of the frequency components in the first frequency spectrum and a masking curve of a target object, wherein the target frequency components are the frequency components of which the amplitudes are smaller than the masking curve in the frequency components;
the processing module is further used for determining whether the masking signal ratio of the target frequency component is greater than or equal to a first preset threshold value; and if it is determined that the masking signal ratio of the target frequency component is greater than or equal to a first preset threshold, acquiring a gain coefficient of the target frequency component according to the masking signal ratio of the target frequency component and the first preset threshold, and adjusting the amplitude of the target frequency component according to the gain coefficient, wherein the masking signal ratio is used for indicating the masking effect strength of the target frequency component and can be obtained by the ratio of the strength of other frequency components to the strength of the target frequency component, and the frequencies of the other frequency components are different from the frequency of the target frequency component.
8. A computer-readable storage medium, comprising: a program;
the program is executed by a processor to perform the audio signal processing method of any one of claims 1 to 6.
9. A display device, comprising: a display panel and a speaker and the audio signal processing apparatus according to claim 7;
the display panel is used for displaying a user interface;
and the loudspeaker is used for outputting the processed audio signal acquired by the audio signal processing device.
CN202010221446.2A 2020-03-26 2020-03-26 Audio signal processing method, device and readable storage medium Active CN111405419B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010221446.2A CN111405419B (en) 2020-03-26 2020-03-26 Audio signal processing method, device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010221446.2A CN111405419B (en) 2020-03-26 2020-03-26 Audio signal processing method, device and readable storage medium

Publications (2)

Publication Number Publication Date
CN111405419A CN111405419A (en) 2020-07-10
CN111405419B (en) 2022-02-15

Family

ID=71436663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010221446.2A Active CN111405419B (en) 2020-03-26 2020-03-26 Audio signal processing method, device and readable storage medium

Country Status (1)

Country Link
CN (1) CN111405419B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1938758A (en) * 2004-03-01 2007-03-28 Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung e.V. Method and apparatus for determining an estimate
CN101465625A (en) * 2007-12-20 2009-06-24 Realtek Semiconductor Corp. Device and method for automatically adjusting gain
CN104299613A (en) * 2014-09-16 2015-01-21 Institute of Information Engineering, Chinese Academy of Sciences Sound masking signal generating method and system
CN106575510A (en) * 2014-07-01 2017-04-19 Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung e.V. Calculator and method for determining phase correction data for an audio signal
US10349194B1 (en) * 2018-09-26 2019-07-09 Facebook Technologies, Llc Auditory masking for a coherence-controlled calibration system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009084918A1 (en) * 2007-12-31 2009-07-09 Lg Electronics Inc. A method and an apparatus for processing an audio signal
CN103546849B (en) * 2011-12-30 2017-04-26 Gn瑞声达A/S Frequency-no-masking hearing-aid for double ears


Also Published As

Publication number Publication date
CN111405419A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
US11812208B2 (en) Wireless earphone noise reduction method and device, wireless earphone, and storage medium
US10475434B2 (en) Electronic device and control method of earphone device
US20180139564A1 (en) Method and apparatus for processing audio signal based on speaker location information
US20220246161A1 (en) Sound modification based on frequency composition
US11956608B2 (en) System and method for adjusting audio parameters for a user
KR20190012003A (en) Electronic device and method for adjusting gain of digital audio signal based on hearing recognition characteristics
EP3010146A1 (en) Audio signal amplitude suppression device
TWI662544B (en) Method for detecting ambient noise to change the playing voice frequency and sound playing device thereof
CN111970609B (en) Sound quality adjusting method, sound quality adjusting system and computer readable storage medium
US20210326099A1 (en) Systems and methods for providing content-specific, personalized audio replay on consumer devices
CN112005210A (en) Spatial characteristics of multi-channel source audio
CN111405419B (en) Audio signal processing method, device and readable storage medium
CN102576560B (en) electronic audio device
CN108899041B (en) Voice signal noise adding method, device and storage medium
CN111508510A (en) Audio processing method and device, storage medium and electronic equipment
CN114067817A (en) Bass enhancement method, bass enhancement device, electronic equipment and storage medium
US9514765B2 (en) Method for reducing noise and computer program thereof and electronic device
CN110022514B (en) Method, device and system for reducing noise of audio signal and computer storage medium
Hoffmann et al. Smart Virtual Bass Synthesis algorithm based on music genre classification
CN111726730A (en) Sound playing device and method for adjusting output sound
CN113038349B (en) Audio equipment
CN116546126B (en) Noise suppression method and electronic equipment
WO2022141364A1 (en) Audio generation method and system
CN111048107B (en) Audio processing method and device
CN110570875A (en) Method for detecting environmental noise to change playing voice frequency and voice playing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant