CN113299313B - Audio processing method and device and electronic equipment - Google Patents


Info

Publication number
CN113299313B
Authority
CN
China
Prior art keywords
audio signal
subband signals
frequency
audio
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110121348.6A
Other languages
Chinese (zh)
Other versions
CN113299313A (en)
Inventor
张勇
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202110121348.6A priority Critical patent/CN113299313B/en
Publication of CN113299313A publication Critical patent/CN113299313A/en
Priority to PCT/CN2022/074795 priority patent/WO2022161475A1/en
Application granted granted Critical
Publication of CN113299313B publication Critical patent/CN113299313B/en


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Abstract

The application discloses an audio processing method, an audio processing apparatus, and an electronic device, belonging to the field of signal processing; it can solve the problem of the poor playback effect of wideband/full-band non-speech signals. The method comprises the following steps: performing resolution-enhancement processing on a first audio signal to obtain a second audio signal; performing low-pass filtering on the second audio signal to obtain a processed second audio signal; performing signal processing on the processed second audio signal to obtain Y first subband signals of equal bandwidth; generating M high-frequency subband signals from the low-frequency subband signals among the Y first subband signals; performing spectrum adjustment on the M high-frequency subband signals based on high-frequency characteristic information of the first audio signal to obtain M target high-frequency subband signals; and synthesizing the M target high-frequency subband signals to obtain a target audio signal; where Y and M are positive integers. The embodiments of the application apply to audio processing scenarios.

Description

Audio processing method and device and electronic equipment
Technical Field
The application belongs to the field of signal processing, and particularly relates to an audio processing method, an audio processing device and electronic equipment.
Background
With the progress of electronic technology, the performance of electronic devices has continuously improved: high-definition televisions, headphones, sound boxes, mobile phones, and the like can support the playback of high-definition audio, so the demand for high-fidelity, highly expressive high-definition audio has become urgent.
Audio signals typically include speech signals and non-speech signals (e.g., music signals). In the related art, an electronic device may expand a narrowband speech signal into a wideband speech signal based on a speech signal generation model, thereby reducing the loss of sound information and improving the fidelity of the speech signal.
However, the spectral characteristics of non-speech signals differ from those of speech signals, and because the speech signal generation model in the electronic device is built from the spectral characteristics of speech, it can only process audio signals with the same spectral characteristics as speech. The model is therefore not applicable to non-speech signals (e.g., music signals or sounds occurring in nature), so the electronic device cannot process them and their playback effect is poor.
Disclosure of Invention
The embodiments of the application aim to provide an audio processing method that can solve the problem of the poor playback effect of wideband/full-band non-speech signals.
In order to solve the technical problems, the application is realized as follows:
in a first aspect, an embodiment of the present application provides an audio processing method, including: performing resolution enhancement processing on the first audio signal to obtain a second audio signal; performing low-pass filtering processing on the second audio signal to obtain a processed second audio signal; performing signal processing on the processed second audio signal to obtain Y first sub-band signals with the same bandwidth; generating M high-frequency subband signals according to the low-frequency subband signals in the Y first subband signals; based on the high-frequency characteristic information of the first audio signal, performing spectrum adjustment on the M high-frequency subband signals to obtain M target high-frequency subband signals; synthesizing the M target high-frequency subband signals to obtain target audio signals; wherein Y, M is a positive integer.
In a second aspect, embodiments of the present application provide an audio processing apparatus, the apparatus including: the device comprises a processing module, a generating module and a synthesizing module, wherein:
The processing module is used for carrying out resolution improvement processing on the first audio signal to obtain a second audio signal; the processing module is further configured to perform low-pass filtering processing on the second audio signal to obtain a processed second audio signal; the processing module is further configured to perform signal processing on the processed second audio signal to obtain Y first subband signals with the same bandwidth; the generating module is used for generating M high-frequency subband signals according to the low-frequency subband signals in the Y first subband signals obtained by the processing module; the processing module is further configured to perform spectrum adjustment on the M high-frequency subband signals generated by the generating module based on the high-frequency characteristic information of the first audio signal, so as to obtain M target high-frequency subband signals; the synthesizing module is used for synthesizing the M target high-frequency subband signals obtained by the processing module to obtain target audio signals; wherein Y, M is a positive integer.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor, a memory and a program or instruction stored on the memory and executable on the processor, the program or instruction implementing the steps of the method according to the first aspect when executed by the processor.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a non-volatile storage medium, the program product being executable by at least one processor to implement the method according to the first aspect.
In the embodiments of the present application, the electronic device may perform resolution-enhancement processing on a low-resolution first audio signal (e.g., a wideband/full-band non-speech signal) to obtain a high-resolution second audio signal, and perform low-pass filtering on the second audio signal to filter out its high-frequency components. It then performs signal processing on the processed second audio signal to obtain Y first subband signals of equal bandwidth, and generates M high-frequency subband signals from the low-frequency subband signals among the Y first subband signals. Finally, based on the high-frequency spectral information of the low-resolution first audio signal, it performs spectrum adjustment on the M high-frequency subband signals to obtain M target high-frequency subband signals, and synthesizes these to obtain a target audio signal in which the harmonic characteristics of the high-frequency portion are well reconstructed, thereby obtaining a high-fidelity, highly expressive high-definition audio signal and enhancing the playback effect of non-speech signals.
Drawings
Fig. 1 is a flowchart of an audio processing method according to an embodiment of the present application;
fig. 2 is the first schematic waveform diagram of an audio signal according to an embodiment of the present application;
FIG. 3 is the second schematic waveform diagram of an audio signal according to an embodiment of the present application;
FIG. 4 is a schematic diagram of spectral duplication/flip provided by an embodiment of the present application;
fig. 5 is a schematic diagram of a neural network topology provided in an embodiment of the present application;
FIG. 6 is a graph of the amplitude-frequency response of a lowpass prototype filter and a PQMF analysis filter bank provided by an embodiment of the present application;
fig. 7 is a schematic diagram of a PQMF subband analysis/synthesis filter bank provided in an embodiment of the present application;
FIG. 8 is a block diagram of a high definition audio generation system provided by an embodiment of the present application;
fig. 9 is a schematic structural diagram of an audio processing device according to an embodiment of the present application;
FIG. 10 is the second schematic structural diagram of an audio processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application are described below with reference to the accompanying drawings; clearly, the embodiments described are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of the present application.
The terms "first", "second", and the like in the description and claims distinguish between similar objects and do not necessarily describe a particular sequence or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that embodiments of the present application may be implemented in sequences other than those illustrated or described herein. Objects identified by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The audio processing method provided by the embodiment of the application is described in detail below by means of specific embodiments and application scenes thereof with reference to the accompanying drawings.
The embodiments of the application provide an audio processing method, which may be applied to an audio processing apparatus. As shown in the flowchart of fig. 1, the audio processing method provided by the embodiments of the application may include the following steps 101 to 106:
Step 101: and performing resolution improvement processing on the first audio signal to obtain a second audio signal.
In an embodiment of the present application, the first audio signal includes at least one of: wideband audio (16 kHz sampling), ultra-wideband audio (32 kHz sampling), and full-band audio (44.1 kHz or 48 kHz sampling).
In an embodiment of the present application, the resolution of the first audio signal is smaller than the resolution of the second audio signal.
It should be noted that the resolution of an audio signal is determined by its sampling rate (Sample rate) and bit depth (Bit Depth); for two audio signals with the same bit depth, the one with the higher sampling rate has the higher resolution, so the resolution of an audio signal can be improved by raising its sampling rate. That is, the sampling rate of the first audio signal is lower than that of the second audio signal. For example, the sampling rate of the second audio signal may be 96 kHz.
In the embodiments of the present application, the first audio signal is generally wideband/ultra-wideband/full-band audio with a poor playback effect, so it needs to be adjusted to high-definition audio; however, producing a high-definition audio source places high demands on the software and hardware environment. Therefore, the sampling rate of the first audio signal can be increased, without changing the sampling rate or coding format of the digital audio source and without increasing the network transmission bandwidth, so that wideband/ultra-wideband/full-band audio can be adjusted to high-definition audio (96 kHz sampling).
In general, both up-sampling and down-sampling are resampling operations on a digital signal: the new sampling rate is compared with the rate at which the digital signal was originally obtained (e.g., sampled from an analog signal). If the new rate is higher than the original rate, the operation is up-sampling; otherwise it is down-sampling.
It is to be understood that the resolution improvement process described above can be regarded as: the first audio signal is up-sampled. That is, the step 101 may include the following step 101a:
step 101a: and carrying out L times up-sampling on the first audio signal to obtain a second audio signal with a preset sampling rate. Wherein L is greater than 0.
By way of example, assuming the first audio signal is a full-band audio signal with a 48 kHz sampling rate, up-sampling it by a factor of 2 (i.e., resampling) converts its 48 kHz sampling rate to the 96 kHz sampling rate of high-definition audio.
Example 1 illustrates the implementation of step 101a, taking the generation of 96 kHz-sampled high-definition audio from 48 kHz-sampled full-band audio (i.e., the first audio signal) as an example. The time-domain waveform of the full-band audio is shown in fig. 2 (a), and its spectrogram in fig. 2 (b). Assuming the full-band audio has a sampling rate of 48 kHz and an effective bandwidth of 24 kHz, the audio processing apparatus up-samples the full-band audio input by a factor of 2 to obtain a 96 kHz-sampled signal (i.e., the second audio signal).
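As a minimal sketch of the up-sampling step (the patent does not specify how the resampling is implemented, so the zero-insertion method and the function name below are assumptions for illustration), L-fold up-sampling is commonly done by inserting zeros between samples, which raises the sampling rate but introduces image components that the next step's low-pass filter removes:

```python
def upsample_zero_insert(x, L=2):
    """Insert L-1 zeros between consecutive samples, multiplying the
    sampling rate by L. The zeros create spectral images in the high band,
    which is why the result is low-pass filtered in step 102."""
    y = [0.0] * (len(x) * L)
    y[::L] = x  # original samples land on every L-th position
    return y

# 2-fold up-sampling: a 48 kHz signal becomes a 96 kHz signal with twice the samples.
y = upsample_zero_insert([0.5, -0.25, 1.0], L=2)  # -> [0.5, 0.0, -0.25, 0.0, 1.0, 0.0]
```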
It should be noted that, since the up-sampling process of the audio signal increases the bandwidth of the audio signal, the bandwidth of the second audio signal is larger than that of the first audio signal.
Step 102: and performing low-pass filtering processing on the second audio signal to obtain a processed second audio signal.
In the embodiments of the present application, the audio processing apparatus may filter out the high-frequency components (i.e., the high-frequency signal) of the second audio signal through a low-pass filter, retaining only its low-frequency components (i.e., the low-frequency signal). Low-pass filtering can be understood simply as follows: a cut-off frequency is set; frequency components above the cut-off cannot pass and are set to 0, while components below it are retained.
Example 2 describes the low-pass filtering of the second audio signal, continuing example 1 above. After obtaining the 96 kHz-sampled signal, the audio processing apparatus may filter it with a low-pass filter having a cut-off frequency of 24 kHz to remove the image frequency components introduced in the high-frequency portion by up-sampling. The waveform and spectrogram of the audio signal after up-sampling and low-pass filtering are shown in fig. 3 (a) and fig. 3 (b), respectively.
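The patent does not disclose the filter design, so as a hedged illustration, a generic windowed-sinc FIR low-pass with its cut-off at one quarter of the new sampling rate (24 kHz at 96 kHz) might be designed as follows; the tap count and the Hamming window are arbitrary choices, not the patent's:

```python
import math

def design_lowpass(num_taps=63, cutoff=0.25):
    """Windowed-sinc low-pass FIR. `cutoff` is fc/fs, e.g. 24 kHz / 96 kHz = 0.25."""
    M = num_taps - 1
    h = []
    for n in range(num_taps):
        k = n - M / 2
        # ideal low-pass impulse response (sinc), with the centre tap handled separately
        v = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        # Hamming window to limit the ripple caused by truncation
        v *= 0.54 - 0.46 * math.cos(2 * math.pi * n / M)
        h.append(v)
    s = sum(h)
    return [v / s for v in h]  # normalise to unity gain at DC

h = design_lowpass()
```

Convolving the up-sampled signal with `h` removes the image components above 24 kHz.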
The bandwidth of the first audio signal is the same as that of the processed second audio signal; for example, referring to fig. 3, the effective bandwidth of the processed audio signal remains 24 kHz at a sampling rate of 96 kHz.
It should be noted that the bandwidth of an audio signal is defined as its frequency range. According to the Nyquist law, the sampling frequency (i.e., sampling rate) of a signal is twice its bandwidth; that is, the bandwidth is 1/2 of the sampling frequency. Assuming the first audio signal has a sampling rate of 48 kHz, its bandwidth is 48 kHz / 2 = 24 kHz.
Step 103: and performing signal processing on the processed second audio signal to obtain Y first sub-band signals with the same bandwidth.
Wherein Y is a positive integer.
In the embodiment of the present application, the Y first subband signals include a high frequency subband signal and a low frequency subband signal.
Optionally, in an embodiment of the present application, the signal processing performed on the processed second audio signal may be: and filtering and downsampling the processed second audio signal.
Illustratively, the signal processing includes a PQMF subband filtering process and a downsampling process. Further, the audio processing apparatus may divide the input processed second audio signal into Y subband signals with equal bandwidths through the PQMF subband filter bank, and then obtain Y first subband signals by downsampling each subband signal.
The PQMF subband analysis performs time-frequency transformation on the original signal, and the purpose of the PQMF subband analysis is to obtain a plurality of subband signals which reflect high-low frequency correlation, have good harmonic characteristics, and are easy to analyze. At the analysis end, the input time domain signal is divided into a plurality of sub-band signals with equal bandwidth through a PQMF analysis filter bank, and then each sub-band signal is downsampled. At the synthesis end, each sub-band signal is first up-sampled, and then the up-sampled sub-band signal is converted into a time domain signal through a PQMF synthesis filter bank.
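To make the PQMF structure above concrete, here is a hedged sketch using the standard pseudo-QMF cosine modulation; the prototype filter design (which fig. 6 refers to) is assumed to be given, a toy prototype stands in for it, and the naive FIR convolution is for clarity rather than efficiency:

```python
import math

def pqmf_analysis_filters(prototype, K):
    """Cosine-modulate a low-pass prototype p[n] into K subband filters:
    h_k[n] = p[n] * cos((2k+1) * pi/(2K) * (n - N/2) + (-1)**k * pi/4)."""
    N = len(prototype) - 1
    return [[p * math.cos((2 * k + 1) * math.pi / (2 * K) * (n - N / 2)
                          + (-1) ** k * math.pi / 4)
             for n, p in enumerate(prototype)]
            for k in range(K)]

def pqmf_analyze(x, filters, K):
    """Filter the frame with each subband filter, then critically downsample by K."""
    def fir(sig, h):
        return [sum(h[j] * sig[n - j] for j in range(len(h)) if 0 <= n - j)
                for n in range(len(sig))]
    return [fir(x, h)[::K] for h in filters]

# 4 subbands from a toy 16-tap prototype applied to a 16-sample frame
filters = pqmf_analysis_filters([1.0 / 16] * 16, K=4)
subbands = pqmf_analyze([1.0] * 16, filters, K=4)
```

At the synthesis end the operations are mirrored: each subband is up-sampled by K and passed through the corresponding synthesis filter, and the results are summed.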
The division of the high-frequency subband signals and the low-frequency subband signals in the Y first subband signals is determined according to the frequency ranges of the high-frequency component and the low-frequency component of the processed second audio signal. That is, the first subband signal having a signal frequency in the frequency range of the low frequency component is a low frequency subband signal; the first subband signal having a signal frequency in the frequency range of the high frequency component is a high frequency subband signal.
Step 104: m high-frequency subband signals are generated from the low-frequency subband signals in the Y first subband signals.
In the embodiment of the present application, the audio processing apparatus may generate one or more high-frequency subband signals according to one low-frequency subband signal, that is, each low-frequency subband signal in the Y first subband signals corresponds to one or more high-frequency subband signals, where Y is less than or equal to M.
In the embodiment of the present application, a high-frequency generator may be used to generate a high-frequency subband signal spectrum from the spectrum of the low-frequency subband signal in the Y first subband signals, so as to generate a high-frequency subband signal.
By way of example, the method of generating M high frequency subband signals by the audio processing apparatus may comprise any of the 4 methods shown in table 1.
Table 1: High-frequency subband spectrum generation methods
Referring to table 1 above, the spectrum processing type corresponding to the above method 1 and method 2 is spectrum duplication, the spectrum processing type corresponding to the above method 3 and method 4 is spectrum inversion, and the difference between the spectrum duplication and the spectrum inversion can be shown with reference to fig. 4.
Step 105: and carrying out frequency spectrum adjustment on the M high-frequency subband signals based on the high-frequency characteristic information of the first audio signal to obtain M target high-frequency subband signals.
In the embodiment of the present application, the high-frequency characteristic information may be signal gains of the M high-frequency subband signals.
In the embodiment of the present application, the audio processing apparatus may adjust the amplitude of the M high-frequency subband signals by using an envelope adjuster, so as to obtain M reconstructed high-frequency subband signals (i.e., the M target high-frequency subband signals).
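A minimal sketch of what the envelope adjuster does, assuming (as the surrounding text suggests) that the high-frequency characteristic information reduces to one gain per subband; the function and variable names are illustrative, not the patent's:

```python
def adjust_envelope(high_subbands, gains):
    """Scale each generated high-frequency subband by its predicted gain G,
    so the reconstructed high band follows the target spectral envelope."""
    assert len(high_subbands) == len(gains)
    return [[g * sample for sample in band]
            for band, g in zip(high_subbands, gains)]

adjusted = adjust_envelope([[1.0, -2.0], [0.5, 0.5]], [0.5, 2.0])
# -> [[0.5, -1.0], [1.0, 1.0]]
```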
Step 106: and synthesizing the M target frequency subband signals to obtain a target audio signal.
Wherein M is a positive integer.
In this embodiment of the present application, the audio processing apparatus may synthesize the above-mentioned M target high-frequency subband signals through a PQMF synthesis filter bank, to obtain a target audio signal.
In the audio processing method provided by the embodiments of the present application, the electronic device may perform resolution-enhancement processing on a low-resolution first audio signal (e.g., a wideband/full-band non-speech signal) to obtain a high-resolution second audio signal, and perform low-pass filtering on the second audio signal to filter out its high-frequency components. It then performs signal processing on the processed second audio signal to obtain Y first subband signals of equal bandwidth, and generates M high-frequency subband signals from the low-frequency subband signals among the Y first subband signals. Finally, based on the high-frequency spectral information of the low-resolution first audio signal, it performs spectrum adjustment on the M high-frequency subband signals to obtain M target high-frequency subband signals, and synthesizes these to obtain a target audio signal in which the harmonic characteristics of the high-frequency portion are well reconstructed, thereby obtaining a high-fidelity, highly expressive high-definition non-speech signal.
Alternatively, in the embodiment of the present application, since there is a correlation between the high-frequency subband signal and the low-frequency subband signal of the audio signal, the corresponding high-frequency subband signal may be generated according to the low-frequency subband signal in the processed second audio signal.
For example, the step 104 may include the following step 104a:
step 104a: and performing spectrum copying on all low-frequency subband signals in the Y subband signals to generate M high-frequency subband signals.
For example, the audio processing apparatus may generate the spectra of the M high-frequency subband signals using the spectrum-replication methods in table 1 above; e.g., it may copy the upper half of the spectrum of a low-frequency subband signal multiple times to generate the spectra of the M high-frequency subband signals.
In this way, the audio processing apparatus may obtain the high frequency component in the processed second audio signal based on the low frequency component in the processed second audio signal, thereby preliminarily obtaining the frequency spectrum of the processed second audio signal.
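Since Table 1's four methods are not reproduced in this text, the following is only a hedged sketch of the basic spectrum-replication idea described above, operating on a magnitude spectrum represented as a plain list:

```python
def replicate_upper_half(low_spectrum, m):
    """Copy the upper half of a low-band spectrum m times, yielding the
    spectra of m generated high-frequency subbands (before gain adjustment)."""
    upper = low_spectrum[len(low_spectrum) // 2:]
    return [list(upper) for _ in range(m)]

high = replicate_upper_half([0.9, 0.7, 0.4, 0.2], m=3)
# -> [[0.4, 0.2], [0.4, 0.2], [0.4, 0.2]]
```

Spectral flip (methods 3 and 4, fig. 4) would instead mirror the copied band, e.g. reverse `upper` before duplicating it.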
Optionally, in the embodiment of the present application, the audio processing apparatus may extract the low-frequency characteristic of the original audio signal based on a strong correlation between the low-frequency characteristic and the high-frequency spectrum envelope of the audio signal, so as to predict the high-frequency characteristic of the audio signal according to the low-frequency characteristic.
Illustratively, before the step 105, the audio processing method provided in the embodiment of the present application further includes the following step A1 and step A2:
step A1: and extracting the characteristics of the first audio signal to obtain the low-frequency characteristic information of the first audio signal.
Step A2: inputting the low-frequency characteristic information into a preset neural network model to predict the high-frequency characteristic information of the first audio signal.
Illustratively, the low-frequency characteristic information includes at least one of: the normalized autocorrelation coefficient of the first audio signal (x_acf), the gradient index (x_gi), and the subband spectral flatness (x_sfm).
It should be noted that the above low-frequency characteristic information may be regarded as a characteristic parameter of the first audio signal, and the following three principles need to be considered for selecting the characteristic parameter:
(1) The low-frequency characteristic parameter has stronger correlation with the high-frequency spectrum envelope;
(2) Good independence exists among the characteristic components;
(3) The feature component is easy to calculate.
Based on these principles, the embodiments of the application select the above three characteristic parameters, describing the audio characteristics from the time-domain and frequency-domain perspectives respectively. In practical applications, other feasible characteristic parameters may be selected; the embodiments of the application do not limit this.
The three low-frequency characteristic parameters are detailed below.
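This chunk names the parameters but does not give their formulas, so the following are common textbook definitions of two of them (the normalized autocorrelation coefficient and spectral flatness), offered as a hedged sketch; the gradient index is omitted because its exact definition here is not stated:

```python
import math

def normalized_autocorrelation(x, lag):
    """r[lag] / r[0]: correlation of the signal with a delayed copy of itself,
    normalised by the signal energy (a time-domain feature)."""
    energy = sum(v * v for v in x)
    num = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
    return num / energy if energy else 0.0

def spectral_flatness(power_spectrum):
    """Geometric mean over arithmetic mean of the power spectrum:
    1.0 for a perfectly flat (noise-like) spectrum, lower for tonal signals."""
    n = len(power_spectrum)
    geo = math.exp(sum(math.log(p) for p in power_spectrum) / n)
    arith = sum(power_spectrum) / n
    return geo / arith
```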
The preset neural network may be a DNN, for example. It should be noted that a DNN is a unidirectional multi-layer feed-forward network that can abstract and model complex data efficiently. The DNN topology, shown in fig. 5, comprises three layer types: an input layer, hidden layers, and an output layer. Typically the first layer is the input layer, the last layer is the output layer, and all middle layers are hidden layers. Neurons in adjacent layers are fully connected, while neurons within the same layer are not connected.
Illustratively, the DNN neural network is configured to establish a nonlinear mapping from low frequency characteristics of the first audio signal to a high frequency spectral envelope of the first audio signal.
Illustratively, the input of the DNN is the low-frequency characteristic information of the first audio signal, including the normalized autocorrelation coefficients, the gradient index, and the subband spectral flatness, and the output of the DNN is the signal gain (denoted by G) of the high-frequency subband signals of the first audio signal.
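The patent does not disclose the network's layer sizes or weights, so the following is only a structural sketch of a fully connected forward pass from the three features to a subband gain; every weight and feature value below is invented for illustration:

```python
def dense(x, w, b, act=None):
    """One fully connected layer: y = act(W x + b), with optional ReLU."""
    y = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [max(0.0, v) for v in y] if act == "relu" else y

# Illustrative 3 -> 4 -> 1 network: low-frequency features in, subband gain out.
features = [0.8, 0.1, 0.3]                      # x_acf, x_gi, x_sfm (example values)
w1 = [[0.2, -0.1, 0.4] for _ in range(4)]       # hidden-layer weights (invented)
b1 = [0.05] * 4
w2 = [[0.25, 0.25, 0.25, 0.25]]                 # output-layer weights (invented)
b2 = [0.0]

hidden = dense(features, w1, b1, act="relu")
gain = dense(hidden, w2, b2)[0]                 # gain ≈ 0.32 for these toy weights
```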
In this way, the audio processing apparatus may predict the high-frequency characteristic information of the first audio signal based on the low-frequency characteristic information of the first audio signal, so as to adjust the frequency spectrum (i.e., the spectral envelope) of the processed second audio signal by the high-frequency characteristic information.
Optionally, in the embodiments of the present application, the audio processing apparatus may frame the processed second audio signal and then process each audio signal frame separately, so as to reduce the impact of the overall audio signal's non-stationarity and time variation.
Illustratively, the step 103 may include the following steps 103a and 103b:
step 103a: and framing the processed second audio signal to obtain X audio signal frames.
Step 103b: and sequentially carrying out filtering and downsampling processing on each audio signal frame to obtain N first sub-band signals corresponding to each audio signal frame.
Wherein the Y first subband signals include: n first sub-band signals corresponding to each audio signal frame.
Illustratively, each audio signal frame includes a first predetermined number of sample points. For example, it may be preset that each signal frame includes 2048 sample points.
Illustratively, X of the X audio signal frames is determined based on the sampling rate of the second audio signal and the number of sample points included in each audio signal frame.
For example, the audio processing apparatus may number the X audio signal frames obtained, each frame corresponding to a sequence number; for instance, if the processed second audio signal includes X audio signal frames, they may be numbered from 1 to X.
By way of example, consider generating 96 kHz-sampled high-definition audio from full-band audio sampled at 48 kHz (i.e., the first audio signal). In combination with examples 1 and 2 above, after upsampling and low-pass filtering the first audio signal to obtain the processed second audio signal, the processed second audio signal has a sampling rate of 96 kHz and may be divided into audio signal frames (i.e., the X audio signal frames) of 2048 sample points per frame; a one-second signal at 96 kHz, for example, yields 46 such frames.
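The frame count X follows directly from the sample count and the 2048-sample frame length. A minimal sketch (the helper name and the treatment of a partial tail frame are illustrative assumptions, not stated in the text):

```python
# Illustrative sketch: X is the number of complete 2048-sample frames in the
# upsampled signal; e.g. one second of 96 kHz audio yields 46 full frames.
def num_frames(duration_s: float, sample_rate: int = 96_000, frame_len: int = 2048) -> int:
    total_samples = int(duration_s * sample_rate)
    return total_samples // frame_len  # any partial tail frame is ignored in this sketch
```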
For example, the audio processing apparatus may sequentially perform the above-described filtering process and the downsampling process for each of the audio signal frames in accordance with the timing information of the X audio signal frames.
Illustratively, each of the N first subband signals has an index, one index corresponding to each first subband signal.
Illustratively, the N first subband signals include P low frequency subband signals and Q high frequency subband signals. Wherein P and Q are positive integers.
The number of subband signals (i.e., N) corresponding to each audio signal frame is preset, and further, the number of subband signals is determined according to parameters set for the PQMF subband filter bank. For example, the number of subbands in the PQMF subband filter bank is set to 64, and after each audio signal frame is processed by the PQMF subband filter bank, 64 subband signals corresponding to each audio signal frame can be obtained.
For example, in step 103b, the audio processing apparatus may perform PQMF filtering processing on each audio signal frame to obtain N subband signals corresponding to each audio signal frame, and then downsample the N subband signals to obtain the N first subband signals. Further, the downsampling process may be an N-fold downsampling process.
Illustratively, each of the N first subband signals includes a second predetermined number of sample points. Further, the second predetermined number is determined based on a sampling multiple of the downsampling.
Illustratively, the second predetermined number of sample points in each of the first subband signals are arranged in time order within the frequency range in which the first subband signal is located.
Example 3, take as an example the generation of 96 kHz-sampled high-definition audio from 48 kHz-sampled full-band audio (i.e., the first audio signal). After framing the processed second audio signal, assuming that each signal frame includes 2048 sample points, the signal frame is filtered by the PQMF analysis filter bank to obtain 64 subband signals, and each subband signal is downsampled 64-fold to obtain 64 first subband signals, where each first subband signal includes 32 sample points. Among them, the 0th to 31st subband signals are low-frequency subband signals, and the 32nd to 63rd subband signals are high-frequency subband signals.
It should be noted that the N first subband signals corresponding to each audio signal frame respectively belong to N different frequency ranges (i.e., frequency bands) of the second audio signal. For example, assuming that each audio signal frame corresponds to 64 first subband signals, the second audio signal is divided into 64 frequency ranges according to signal frequency, and each first subband signal belongs to one of the 64 frequency ranges. The N first subband signals thus obtained can reflect the frequency characteristics of the signal and have good harmonic characteristics.
For ease of description, the output signal of the PQMF analysis filter bank, i.e., the N first subband signals described above, is denoted as x_l[k][n], where k is the subband number (0 ≤ k ≤ 63), n is the index of the time-sequence sample point within each subband (0 ≤ n ≤ 31), and l is the sequence number of the current audio signal frame.
It should be noted that, for each of the X audio signal frames, the subband signals output by the PQMF analysis filter bank (i.e., the first subband signals) form a matrix x[k][n], where k is the transformed subband number (the number of the first subband signal) and n is the index of the transformed time-sequence sample point (i.e., the time-sequence sample point of the first subband signal). x[k][n] has dual time-frequency resolution, exhibiting both the frequency-distribution characteristics of the frequency domain and the waveform characteristics of the time domain.
For ease of understanding, the following description will be given of the expressions of the PQMF analysis filter bank and the synthesis filter bank.
Illustratively, the mathematical expressions of the PQMF analysis filter bank and synthesis filter bank used in the embodiments of the present application are as follows:
analysis filter:

h_k(n) = 2·p(n)·cos( (π/N)·(k + 1/2)·(n − (M−1)/2) + (−1)^k·π/4 )    (1)

synthesis filter:

f_k(n) = 2·p(n)·cos( (π/N)·(k + 1/2)·(n − (M−1)/2) − (−1)^k·π/4 )    (2)

where N in formulas (1) and (2) is the number of the first subband signals, p(n) is a low-pass prototype filter with normalized cut-off frequency π/(2N) and filter length M (M = LN, with L any positive integer), k = 0, 1, …, N−1 is the subband number, and n denotes the index of the transformed subband sample point.
For example, the number of subbands in the PQMF subband filter bank may be set to N = 64, the order of the low-pass prototype filter p(n) may be set to M = 768, and the filter stop-band attenuation is designed to be −90 dB.
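The cosine modulation in formulas (1) and (2) can be sketched in numpy; the standard PQMF phase term is assumed, and `np.hanning` below is only a stand-in for the designed prototype p(n), not the patent's actual filter:

```python
import numpy as np

def pqmf_analysis_bank(p: np.ndarray, n_bands: int = 64) -> np.ndarray:
    """Build the N cosine-modulated analysis filters h_k(n) from a low-pass
    prototype p(n). The phase term (-1)^k * pi/4 is the standard PQMF choice
    and is assumed to match formula (1)."""
    M = len(p)
    n = np.arange(M)
    H = np.empty((n_bands, M))
    for k in range(n_bands):
        phase = (-1) ** k * np.pi / 4
        H[k] = 2 * p * np.cos(np.pi / n_bands * (k + 0.5) * (n - (M - 1) / 2) + phase)
    return H

# Example: a 64-band bank from a length-768 stand-in prototype.
H = pqmf_analysis_bank(np.hanning(768), 64)
```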
Fig. 6 (a) shows the amplitude-frequency response curve of the low-pass prototype filter p (n), and fig. 6 (b) shows the amplitude-frequency response curve of the PQMF analysis filter bank.
FIG. 7 is a schematic diagram of the PQMF subband analysis/synthesis filter bank; in FIG. 7, H_k(z) is the Z-transform of h_k(n), and F_k(z) is the Z-transform of f_k(n).
The analysis filter bank is used for dividing the input time domain signal into N subband signals, and the synthesis filter bank is used for synthesizing the N subband signals into one time domain signal.
Further alternatively, in combination with the step 103b, the step 104a may include the following step 104a1:
step 104a1: at least one high frequency subband signal is generated from the low frequency subband signal of the N first subband signals of each audio signal frame.
Illustratively, the number of high frequency subband signals that are ultimately generated per audio signal frame is the same.
In example 4, in combination with example 3 above, after obtaining the 64 first subband signals corresponding to each audio signal frame, the audio processing apparatus may select the 16 low-frequency subband signals with subband numbers 15-30 (i.e., the low-frequency source subband numbers in Table 2) and copy the spectral coefficients of each of these low-frequency subband signals twice, generating the spectral coefficients of 32 high-frequency subbands (i.e., the high-frequency target subband numbers in Table 2); the correspondence used in the band replication is shown in Table 2.
Low-frequency source subband    High-frequency target subbands
15                              32, 48
16                              33, 49
17                              34, 50
18                              35, 51
19                              36, 52
20                              37, 53
21                              38, 54
22                              39, 55
23                              40, 56
24                              41, 57
25                              42, 58
26                              43, 59
27                              44, 60
28                              45, 61
29                              46, 62
30                              47, 63

Table 2: correspondence between low- and high-frequency subbands during band replication

In Table 2, the "low-frequency source subband" column gives the numbers of the low-frequency subband signals, and the "high-frequency target subbands" column gives the numbers of the high-frequency subband signals generated from them.
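Table 2 is a regular mapping: each source subband k in 15-30 feeds targets k+17 and k+33. A one-line sketch (the function name is illustrative):

```python
# Sketch of the Table 2 mapping: low-frequency source subbands 15-30 are each
# copied to two high-frequency target subbands, at offsets +17 and +33.
def band_replication_map() -> dict:
    return {src: (src + 17, src + 33) for src in range(15, 31)}
```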
Further alternatively, in the embodiment of the present application, the step A1 includes the following step B1:
step B1: and extracting the characteristics of the P low-frequency subband signals in the N first subband signals in each audio signal frame to obtain the low-frequency characteristic information of each audio signal frame.
For example, the audio processing device may calculate the normalized autocorrelation coefficient and the gradient index of the first audio signal from the number of samples of the first audio signal and the order of the autocorrelation function.
The definition of the low frequency characteristic information is described in detail below:
(1) The normalized autocorrelation coefficients are used to describe the correlation of the signal in the time domain. Let x(n) be the input audio signal, N the number of sample points per signal frame, and m the autocorrelation lag (m = 1, 2, …, M, with M the maximum autocorrelation order); the normalized autocorrelation coefficients are calculated as follows:

R(m) = [ Σ_{n=m}^{N−1} x(n)·x(n−m) ] / [ Σ_{n=0}^{N−1} x²(n) ]
(2) The gradient index is used to distinguish between the harmonic and noise characteristics of an audio signal, and is defined as the normalized sum of the gradient magnitudes of the audio signal at each change of direction, namely:

GI = [ Σ_{n=1}^{N−1} ψ(n)·|x(n) − x(n−1)| ] / √E
where the variable ψ(n) is an indication function of the signal change direction:

ψ(n) = (1/2)·| sign( x(n) − x(n−1) ) − sign( x(n−1) − x(n−2) ) |
where sign(x) is the sign function, defined as:

sign(x) = 1 for x ≥ 0, and sign(x) = −1 for x < 0
and E is the total energy of the input signal of the current frame:

E = Σ_{n=0}^{N−1} x²(n)
(3) The subband spectral flatness described above is used to distinguish the tonal and noise characteristics of an audio signal within a subband. The greater the subband spectral flatness, the more noise-like the subband spectrum; conversely, the smaller it is, the more tonal components the subband spectrum contains. It is defined as the ratio of the geometric mean to the arithmetic mean of all spectral values (MDCT spectral coefficients) within each low-frequency PQMF subband.
The definition of the low frequency characteristic information is described below in conjunction with a specific example.
For example, the audio processing apparatus may acquire a spectral coefficient of each of the P low frequency subband signals to calculate the subband spectral flatness of each of the low frequency subband signals.
The low-frequency characteristic information of the first audio signal may be organized as a 64-dimensional feature vector for each audio signal frame.
Example 5, take as an example the generation of 96kHz sampled high definition audio by 48kHz sampled full band audio (i.e., the first audio signal). Assuming that each of the above-described signal frames corresponds to 64 subband signals (first subband signals), the audio processing apparatus may acquire spectral coefficients of 0 to 31 subband signals therein and calculate subband spectral flatness of each of the 0 to 31 subband signals.
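Two of the features above are simple to sketch in numpy. The epsilon terms and the energy normalization of the autocorrelation are assumptions for numerical safety, not taken from the text:

```python
import numpy as np

def subband_spectral_flatness(coeffs: np.ndarray) -> float:
    """Geometric mean over arithmetic mean of the |MDCT| coefficients of one
    low-frequency subband (feature (3)); values near 1 indicate a flat,
    noise-like spectrum, small values indicate tonal content."""
    mag = np.abs(coeffs) + 1e-12
    return float(np.exp(np.mean(np.log(mag))) / np.mean(mag))

def normalized_autocorrelation(x: np.ndarray, m: int) -> float:
    """Autocorrelation at lag m, normalized by the frame energy (feature (1);
    the exact normalization is an assumed reading of the definition)."""
    num = float(np.dot(x[m:], x[: len(x) - m])) if m > 0 else float(np.dot(x, x))
    return num / (float(np.dot(x, x)) + 1e-12)
```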
It should be noted that, in the feature extraction, the maximum autocorrelation order of the normalized autocorrelation coefficients may be set to M = 31; the feature dimensions in the embodiment of the present application are set as shown in Table 3 (31 autocorrelation coefficients, 1 gradient index, and 32 subband spectral flatness values, for 64 dimensions in total).
TABLE 3 feature names and dimensions
Further alternatively, in an embodiment of the present application, in combination with the step B1, the step A2 includes the following step B2:
step B2: and inputting the low-frequency characteristic information of each audio signal frame into a preset neural network model, and predicting the high-frequency characteristic information of each audio signal frame.
For example, the high frequency characteristic information of each audio signal frame may be signal gains of the H high frequency subband signals.
By way of example, assuming that the k-th high-frequency subband signal of the M high-frequency subband signals is generated from the j-th low-frequency subband signal, the subband gain G[k] of the k-th high-frequency subband is defined as:

G[k] = √( En_k / En_j )    (9)
en in formula (9) k For the k-th high frequency subband spectral coefficient total energy, en j The total energy for the low frequency jth PQMF subband MDCT spectral coefficients.
It should be noted that an audio signal is time-sequential "serialized" data, and adjacent signal segments are correlated. In order to fully exploit this context correlation, the DNN neural network (i.e., the DNN model) employs frame splicing to take the influence of context-related information on the current frame into account. Specifically, assume that the feature parameter vector extracted from the current frame signal is F_l; when frames are spliced, m frames are selected before and after the current frame to form a super-frame feature vector F̃_l = [F_{l−m}, …, F_l, …, F_{l+m}], which serves as the input to the DNN model.
Illustratively, in order to make full use of the context correlation of the audio signal (i.e., the correlation between a plurality of consecutive audio signal frames), the audio processing apparatus may adopt the frame-splicing strategy after obtaining the low-frequency characteristic information of each audio signal frame and input a plurality of audio signal frames into the DNN neural network. For example, 3 frames are selected before and 3 frames after the current frame, so that 7 frames of feature vectors, including the current frame's features, form a super-frame feature vector F̃_l = [F_{l−3}, …, F_l, …, F_{l+3}] as the input to the DNN model, with dimension 64 × 7 = 448.
example 6, take as an example the generation of 96kHz sampled high definition audio by 48kHz sampled full band audio (i.e., the first audio signal). Assuming that each audio signal frame corresponds to 64 subband signals (first subband signals), wherein subbands 32-63 are high frequency subband signals, after processing each audio signal frame through the DNN neural network, the signal gain of the output high frequency subband signalsIs a 32-dimensional feature vector, and the mathematical expression is as follows: />
Exemplary, the above-described hyper-parameter settings for the DNN neural networks are shown in table 4.
TABLE 4 super parameters of DNN neural model
Further alternatively, in the embodiment of the present application, the step 105 includes the following step 105a:
step 105a: and according to the high-frequency characteristic information of each audio signal frame, carrying out frequency spectrum adjustment on the H high-frequency subband signals in each audio signal frame to obtain H target high-frequency subband signals.
Wherein the M target high frequency subband signals include the H target high frequency subband signals for each audio signal frame.
Illustratively, let the k-th high-frequency subband signal of the H high-frequency subband signals obtained by the high-frequency generator be X̃[k][m], with total energy Ẽn_k; let the k-th high-frequency subband gain obtained by the envelope predictor be G[k]. Then the k-th reconstructed high-frequency subband signal (i.e., the target high-frequency subband signal) obtained by the envelope adjuster is X[k][m]:

X[k][m] = G[k] · X̃[k][m] · √( N / Ẽn_k ), k_l ≤ k ≤ k_h, 0 ≤ m ≤ N−1

where N is the frame length of one frame of MDCT coefficients of a PQMF subband, and k_l and k_h are the start and end indices of the high-frequency PQMF subbands, respectively.
Example 7, take as an example the generation of 96 kHz-sampled high-definition audio from 48 kHz-sampled full-band audio (i.e., the first audio signal). Assume that each audio signal frame corresponds to 64 subband signals (first subband signals), of which subbands 32-63 are the high-frequency subband signals. Let the k-th high-frequency subband signal obtained by the high-frequency generator be X̃[k][m] with total energy Ẽn_k, and let the k-th high-frequency subband gain obtained by the envelope predictor be G[k]; then the envelope adjuster yields the k-th reconstructed high-frequency subband signal X[k][m], for 32 ≤ k ≤ 63.
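Under the assumption that the envelope adjuster first normalizes each generated subband to unit average energy and then applies the predicted gain (one plausible reading of the step; the function and variable names are illustrative), the adjustment can be sketched as:

```python
import numpy as np

def adjust_envelope(X_gen: np.ndarray, gains: np.ndarray) -> np.ndarray:
    """Scale each generated high-frequency subband (one row of N MDCT
    coefficients per subband) so its energy follows the predicted gain G[k].
    After scaling, a row's total energy is G[k]^2 * N."""
    N = X_gen.shape[1]
    norm = np.sqrt(N / (np.sum(X_gen ** 2, axis=1, keepdims=True) + 1e-12))
    return gains[:, None] * X_gen * norm

# Example: two copied subbands of 32 coefficients, gains 1.0 and 2.0.
out = adjust_envelope(np.ones((2, 32)), np.array([1.0, 2.0]))
```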
Further optionally, in the embodiment of the present application, after the processed second audio signal is framed, the audio signal at the boundary of two adjacent frames may have a large amplitude difference, making the audio signal discontinuous and producing audible noise. To eliminate such noise, the X audio signal frames described above may be subjected to denoising processing.
Exemplary, after the step 103a, the signal processing method provided in the embodiment of the present application further includes the following step C1:
step C1: and performing signal processing on N first sub-band signals in two adjacent audio signal frames in the X audio signal frames to obtain N processed first sub-band signals.
Illustratively, the processed first subband signal comprises the low frequency subband signal in each of the audio signal frames.
The signal processing described above may include an MDCT transform. Further, when the MDCT transform is performed, two first subband signals having the same frequency in the two adjacent audio signal frames may be acquired in sequence, and the two first subband signals are then windowed and MDCT-transformed to obtain one first subband signal having MDCT spectral coefficients (i.e., a spectrum).
For convenience of the following description, the two first subband signals having the same frequency in the above-mentioned adjacent two audio signal frames are denoted as related two subband signals.
Further, each subband signal includes N sample points. When the MDCT transform is performed, the N sample points of the current frame's input sequence x(n) and the N sample points of the adjacent frame's input sequence are concatenated to form 2N sample points; the 2N-sample signal is then windowed, and the MDCT transform is applied to the windowed signal to obtain N MDCT spectral coefficients.
The expression of the MDCT is as follows:

X(m) = Σ_{n=0}^{2N−1} w(n)·x(n)·cos( (π/N)·(n + 1/2 + N/2)·(m + 1/2) ), m = 0, 1, …, N−1
Illustratively, when windowing the signal, a sine window is selected as the window function, defined as:

w(n) = sin( (π/(2N))·(n + 1/2) ), 0 ≤ n ≤ 2N−1
example 8, take as an example the generation of 96kHz sampled high definition audio by 48kHz sampled full band audio (i.e., the first audio signal). Assuming that each audio signal frame corresponds to 64 subband signals (first subband signals), wherein each subband signal comprises 32 sample points, after windowing and MDCT transforming the above-mentioned related two subband signals, each subband signal obtains MDCT spectral coefficients of 32 sample points, denoted X l [k][m]Where k denotes a subband sequence number, whose range is 0.ltoreq.k.ltoreq.63, m denotes an MDCT spectrum sequence number, whose range is 0.ltoreq.m.ltoreq.31, and l denotes an audio signal frame sequence number.
Further optionally, in an embodiment of the present application, in combination with the step 103a, the step 106 includes the following steps 106a and 106b:
step 106a: and synthesizing the H target high-frequency subband signals in each audio signal frame to obtain a fourth audio signal corresponding to each audio signal frame.
Step 106b: and synthesizing the fourth audio signal corresponding to each audio signal frame to obtain the target audio signal.
The audio processing apparatus may synthesize the H target high frequency subband signals in each audio signal frame through up-sampling and filtering processes, to obtain a fourth audio signal corresponding to each audio signal frame.
Further, in the case of synthesizing the above-described H target high-frequency subband signals in each audio signal frame, the audio processing apparatus first up-samples each subband signal N times, and then converts the up-sampled subband signals into time-domain signals by a PQMF synthesis filter bank.
The mathematical expression of the PQMF synthesis filter bank used in the embodiments of the present application has been described above, and will not be described here again.
Further alternatively, in the embodiment of the present application, in the case of performing MDCT transform on the above-described N first subband signals, the audio processing apparatus may perform MDCT inverse transform (i.e., IMDCT) on the spectrum-adjusted H high-frequency subband signals to restore the subband signals in each audio signal frame.
In combination with the step 103a and the step C1, after the spectrum adjustment is performed on the H high-frequency subband signals in each audio signal frame in the step 105a, the audio signal processing method provided in the embodiment of the present application further includes the following step D1:
step D1: and performing IMDCT (inverse discrete cosine transform) on the H high-frequency subband signals subjected to the frequency spectrum adjustment to obtain subband reconstruction signals corresponding to each high-frequency subband signal.
Wherein the H target high frequency subband signals include the subband reconstruction signals.
Illustratively, for the processed first subband signals, the audio processing apparatus performs the IMDCT transform and an overlap-add operation on the MDCT spectral coefficients of each subband to obtain the N subband reconstruction signals x'_l[k][n] of the current frame, where k is the subband number (0 ≤ k ≤ 63), n is the index of the time-sequence sample point within each subband (0 ≤ n ≤ 31), and l is the audio signal frame number.
The expression of the IMDCT is as follows:

x̂_l(n) = (2/N)·Σ_{m=0}^{N−1} X(m)·cos( (π/N)·(n + 1/2 + N/2)·(m + 1/2) ), 0 ≤ n ≤ 2N−1

where w(n) is the window function. An overlap-add operation is performed on the windowed output signal w(n)·x̂_l(n) after the IMDCT to obtain the subband reconstruction signal x'_l(n) of the current frame l, namely:

x'_l(n) = w(n)·x̂_l(n) + w(n + N)·x̂_{l−1}(n + N), 0 ≤ n ≤ N−1
it should be noted that, an overall flow block diagram of the audio processing method provided in the embodiment of the present application is shown in fig. 8.
It should be noted that, in the audio processing method provided in the embodiment of the present application, the execution body may be an audio processing apparatus, or a control module in the audio processing apparatus for executing the audio processing method. In the embodiments of the present application, the audio processing apparatus executing the audio processing method is taken as an example to describe the audio processing apparatus provided in the embodiments of the present application.
An embodiment of the present application provides an audio processing apparatus, as shown in fig. 9, including: a processing module 801, a generating module 802, and a synthesizing module 803, wherein:
the processing module 801 is configured to perform resolution enhancement processing on the first audio signal to obtain a second audio signal; the processing module 801 is further configured to perform low-pass filtering processing on the second audio signal to obtain a processed second audio signal; the processing module 801 is further configured to perform filtering processing and downsampling processing on the processed second audio signal to obtain Y first subband signals with the same bandwidth; the generating module 802 is configured to generate M high-frequency subband signals according to a low-frequency subband signal in the Y first subband signals obtained by the processing module 801; the processing module 801 is further configured to perform spectrum adjustment on the M high-frequency subband signals generated by the generating module 802 based on the high-frequency characteristic information of the first audio signal, to obtain M target high-frequency subband signals; the synthesizing module 803 is configured to synthesize the M target high-frequency subband signals obtained by the processing module 801 to obtain a target audio signal; wherein Y, M is a positive integer.
In the audio processing apparatus provided in the embodiment of the present application, the electronic device may perform resolution enhancement processing on a low-resolution first audio signal (e.g., a wideband/full-band non-speech signal) to obtain a high-resolution second audio signal, and perform low-pass filtering processing on the second audio signal to filter out the high-frequency signal in the second audio signal. The processed second audio signal is then subjected to signal processing to obtain Y first subband signals with the same bandwidth, and M high-frequency subband signals are generated from the low-frequency subband signals among the Y first subband signals. Finally, spectrum adjustment is performed on the M high-frequency subband signals based on the high-frequency characteristic information of the low-resolution first audio signal to obtain M target high-frequency subband signals, and the M target high-frequency subband signals are synthesized to obtain a target audio signal whose high-frequency portion has well-reconstructed harmonic characteristics, thereby obtaining a high-fidelity, highly expressive high-definition non-speech signal.
Optionally, in this embodiment of the present application, the generating module 802 is specifically configured to perform spectral duplication on all low-frequency subband signals in the Y subband signals to generate M high-frequency subband signals, where one low-frequency subband signal corresponds to at least one high-frequency subband signal, and Y is less than or equal to M.
Optionally, in an embodiment of the present application, the audio processing apparatus further includes: an extraction module 804 and a prediction module 805;
the extracting module 804 is configured to perform feature extraction on the first audio signal to obtain low-frequency feature information of the first audio signal; the prediction module 805 is configured to input the low-frequency characteristic information extracted by the extraction module into a preset neural network model to predict high-frequency characteristic information of the first audio signal.
Optionally, in this embodiment of the present application, the processing module 801 is specifically configured to upsample the first audio signal by L times to obtain a second audio signal with a predetermined sampling rate, where the bandwidth of the first audio signal is the same as that of the second audio signal.
Optionally, in this embodiment of the present application, the processing module 801 is further configured to frame a low frequency component of the second audio signal to obtain X audio signal frames, where each audio signal frame includes a predetermined number of sample points; the processing module 801 is specifically configured to sequentially perform filtering and downsampling processing on each audio signal frame to obtain N first subband signals corresponding to each audio signal frame; wherein the Y first subband signals include: n first sub-band signals corresponding to each audio signal frame.
Optionally, in this embodiment of the present application, the processing module 801 is specifically configured to perform signal processing on N first subband signals of the first audio signal frame and N first subband signals in the second audio signal frame, to obtain processed N first subband signals; wherein the first audio signal frame and the second audio signal frame are adjacent audio signal frames of the X audio signal frames.
The audio processing device in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (personal digital assistant, PDA), etc., and the non-mobile electronic device may be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (personal computer, PC), a Television (TV), a teller machine or a self-service machine, etc., and the embodiments of the present application are not particularly limited.
The audio processing device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an ios operating system, or other possible operating systems, which are not specifically limited in the embodiments of the present application.
The audio processing device provided in the embodiment of the present application can implement each process implemented by the embodiments of the methods of fig. 1 to 8, and in order to avoid repetition, a description is omitted here.
Optionally, as shown in fig. 10, the embodiment of the present application further provides an electronic device 900, including a processor 901, a memory 902, and a program or an instruction stored in the memory 902 and capable of running on the processor 901, where the program or the instruction is executed by the processor 901 to implement each process of the foregoing embodiment of the audio processing method, and the process may achieve the same technical effect, and for avoiding repetition, a description is omitted herein.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 11 is a schematic hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 100 includes, but is not limited to: radio frequency unit 101, network module 102, audio output unit 103, input unit 104, sensor 105, display unit 106, user input unit 107, interface unit 108, memory 109, and processor 110.
Those skilled in the art will appreciate that the electronic device 100 may further include a power source (e.g., a battery) for powering the various components, and that the power source may be logically coupled to the processor 110 via a power management system such that charge, discharge, and power consumption management functions are performed by the power management system. The electronic device structure shown in fig. 11 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown in the drawings, or may combine some components, or may be arranged in different components, which will not be described in detail herein.
The processor 110 is configured to perform resolution enhancement processing on the first audio signal to obtain a second audio signal; the processor 110 is further configured to perform low-pass filtering processing on the second audio signal to obtain a processed second audio signal, perform filtering processing and downsampling processing on the processed second audio signal to obtain Y first subband signals with the same bandwidth, and generate M high-frequency subband signals according to a low-frequency subband signal in the Y first subband signals; the processor 110 is further configured to perform spectrum adjustment on the generated M high-frequency subband signals based on the high-frequency characteristic information of the first audio signal to obtain M target high-frequency subband signals, and synthesize the M target high-frequency subband signals to obtain a target audio signal; wherein Y, M is a positive integer.
In the electronic device provided in the embodiment of the present application, the electronic device may perform resolution enhancement processing on a low-resolution first audio signal (e.g., a wideband/full-band non-speech signal) to obtain a high-resolution second audio signal, and perform low-pass filtering processing on the second audio signal to filter out the high-frequency signal in the second audio signal. The processed second audio signal is then subjected to signal processing to obtain Y first subband signals with the same bandwidth, and M high-frequency subband signals are generated from the low-frequency subband signals among the Y first subband signals. Finally, spectrum adjustment is performed on the M high-frequency subband signals based on the high-frequency characteristic information of the low-resolution first audio signal to obtain M target high-frequency subband signals, and the M target high-frequency subband signals are synthesized to obtain a target audio signal whose high-frequency portion has well-reconstructed harmonic characteristics, thereby obtaining a high-fidelity, highly expressive high-definition audio signal and enhancing the playback effect of the non-speech signal.
Optionally, in this embodiment of the present application, the processor 110 is specifically configured to perform spectral duplication on all low-frequency subband signals in the Y subband signals to generate M high-frequency subband signals, where one low-frequency subband signal corresponds to at least one high-frequency subband signal, and Y is less than or equal to M.
Optionally, in this embodiment of the present application, the processor 110 is further configured to perform feature extraction on the first audio signal to obtain low-frequency feature information of the first audio signal, input the low-frequency feature information into a preset neural network model, and predict high-frequency feature information of the first audio signal.
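A hypothetical sketch of this feature-extraction-plus-prediction step: the log band energies used as low-frequency features and the untrained one-hidden-layer network standing in for the "preset neural network model" are both assumptions, chosen only to show the data flow:

```python
import numpy as np

rng = np.random.default_rng(1)

def extract_low_frequency_features(frame):
    """Toy low-frequency features: log-energies of four equal FFT bands."""
    p = np.abs(np.fft.rfft(frame)) ** 2
    return np.log(np.array([b.sum() for b in np.array_split(p, 4)]) + 1e-9)

class TinyMLP:
    """Untrained stand-in for the 'preset neural network model':
    one hidden layer mapping low-band features to high-band gains."""
    def __init__(self, n_in=4, n_hidden=8, n_out=4):
        self.w1 = rng.standard_normal((n_in, n_hidden)) * 0.1
        self.w2 = rng.standard_normal((n_hidden, n_out)) * 0.1

    def predict(self, x):
        return np.exp(np.tanh(x @ self.w1) @ self.w2)  # exp keeps gains positive

frame = rng.standard_normal(256)
features = extract_low_frequency_features(frame)
predicted_gains = TinyMLP().predict(features)
```

In practice such a model would be trained offline on pairs of low-frequency features and measured high-frequency characteristics; here the weights are random placeholders.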
Optionally, in this embodiment of the present application, the processor 110 is specifically configured to upsample the first audio signal by L times to obtain a second audio signal with a predetermined sampling rate, where the bandwidth of the first audio signal is the same as that of the second audio signal.
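L-times upsampling that raises the sampling rate while leaving the occupied bandwidth unchanged can be sketched as zero-stuffing followed by an interpolation low-pass filter. The 65-tap Hamming-windowed sinc below is an assumed filter design; the patent does not specify one:

```python
import numpy as np

def upsample(x, l, taps=65):
    """L-times upsampling: insert l-1 zeros between samples, then
    interpolate with a Hamming-windowed sinc low-pass (cutoff pi/l).
    The sinc is 1 at lag 0 and 0 at other multiples of l, so the
    original samples pass through unchanged."""
    stuffed = np.zeros(len(x) * l)
    stuffed[::l] = x
    n = np.arange(taps) - taps // 2
    h = np.sinc(n / l) * np.hamming(taps)
    return np.convolve(stuffed, h, mode="same")

x = np.sin(2 * np.pi * 0.05 * np.arange(200))  # tone well inside the band
y = upsample(x, l=2)
```

After upsampling by 2, the tone occupies half the new Nyquist range, which is exactly the "same bandwidth, higher sampling rate" condition the paragraph describes.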
Optionally, in this embodiment of the present application, the processor 110 is further configured to frame the low frequency component of the second audio signal to obtain X audio signal frames, where each audio signal frame includes a predetermined number of sample points, and sequentially perform filtering and downsampling processing on each audio signal frame to obtain N first subband signals corresponding to each audio signal frame; wherein the Y first subband signals include: n first sub-band signals corresponding to each audio signal frame.
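The framing step can be sketched as follows; dropping a trailing partial frame is an assumption here, since the text does not say how remainders are handled:

```python
import numpy as np

def frame_signal(x, frame_len):
    """Split a signal into consecutive non-overlapping frames of
    frame_len samples each; a trailing partial frame is dropped."""
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)

x = np.arange(1000, dtype=float)
frames = frame_signal(x, frame_len=256)   # X = 3 frames of 256 samples each
```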
Optionally, in this embodiment of the present application, the processor 110 is specifically configured to perform signal processing on the N first subband signals of a first audio signal frame and the N first subband signals of a second audio signal frame to obtain N processed first subband signals, where the first audio signal frame and the second audio signal frame are adjacent audio signal frames among the X audio signal frames.
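The text does not specify what this "signal processing" across adjacent frames' subbands is; one plausible reading is a short cross-fade at the frame boundary to suppress discontinuities. A hypothetical sketch, where the overlap length and linear weighting are assumptions:

```python
import numpy as np

def smooth_boundary(prev_band, cur_band, overlap=16):
    """Cross-fade the start of the current frame's subband with the end
    of the previous frame's subband to avoid boundary discontinuities."""
    out = cur_band.copy()
    w = np.linspace(0.0, 1.0, overlap)          # fade-in weights 0 -> 1
    out[:overlap] = (1.0 - w) * prev_band[-overlap:] + w * cur_band[:overlap]
    return out

prev_band = np.ones(64)    # previous frame's subband (constant 1)
cur_band = np.zeros(64)    # current frame's subband (constant 0)
smoothed = smooth_boundary(prev_band, cur_band)
```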
It should be appreciated that, in embodiments of the present application, the input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of still pictures or videos obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071 is also referred to as a touch screen, and may include two parts: a touch detection device and a touch controller. Other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail here. The memory 109 may be used to store software programs as well as various data, including but not limited to application programs and an operating system. The processor 110 may integrate an application processor, which primarily handles the operating system, user interfaces, and application programs, with a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor may alternatively not be integrated into the processor 110.
The embodiment of the present application further provides a readable storage medium. A program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the processes of the above audio processing method embodiments are implemented and the same technical effects can be achieved. To avoid repetition, details are not described here again.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the present application further provides a chip. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the processes of the above audio processing method embodiments and achieve the same technical effects. To avoid repetition, details are not described here again.
It should be understood that the chip referred to in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
Embodiments of the present application provide a computer program product stored in a non-volatile storage medium, the program product being executed by at least one processor to implement a method as described in the first aspect.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; depending on the functions involved, they may also be performed in a substantially simultaneous manner or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, or by hardware alone, although in many cases the former is the preferred implementation. Based on such understanding, the technical solutions of the present application, or the part thereof contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk), including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Those of ordinary skill in the art, under the teaching of the present application, may derive many other forms without departing from the spirit of the present application and the scope of protection of the claims, all of which fall within the protection of the present application.

Claims (14)

1. A method of audio processing, the method comprising:
performing resolution enhancement processing on the first audio signal to obtain a second audio signal;
performing low-pass filtering processing on the second audio signal to obtain a processed second audio signal;
filtering the processed second audio signal to obtain Y subband signals with the same bandwidth, and performing downsampling on the Y subband signals with the same bandwidth to obtain Y first subband signals with the same bandwidth;
generating M high-frequency subband signals from the low-frequency subband signals in the Y first subband signals by spectral replication or spectral inversion, wherein each low-frequency subband signal in the Y first subband signals corresponds to one or more high-frequency subband signals;
performing spectral adjustment on the M high-frequency subband signals based on high-frequency characteristic information of the first audio signal to obtain M target high-frequency subband signals, wherein the high-frequency characteristic information is the signal gains of the M high-frequency subband signals, and the signal gains are determined by the normalized autocorrelation coefficient, the gradient index, and the subband spectral flatness of the first audio signal;
synthesizing the M target high-frequency subband signals to obtain target audio signals;
wherein the first audio signal comprises at least one of: wideband audio, ultra-wideband audio, or full-band audio; and Y and M are positive integers.
2. The method according to claim 1, wherein generating M high frequency subband signals from the low frequency subband signals in the Y first subband signals by spectral replication or spectral inversion comprises:
performing spectral replication on all of the low-frequency subband signals in the Y first subband signals to generate the M high-frequency subband signals, wherein one low-frequency subband signal corresponds to at least one high-frequency subband signal, and Y is less than or equal to M.
3. The method of claim 1, wherein before the performing spectral adjustment on the M high-frequency subband signals based on the high-frequency characteristic information of the first audio signal to obtain M target high-frequency subband signals, the method further comprises:
performing feature extraction on the first audio signal to obtain low-frequency characteristic information of the first audio signal;
inputting the low-frequency characteristic information into a preset neural network model to predict the high-frequency characteristic information of the first audio signal.
4. The method of claim 1, wherein performing resolution enhancement processing on the first audio signal to obtain a second audio signal comprises:
performing L-times upsampling on the first audio signal to obtain a second audio signal with a predetermined sampling rate, wherein the bandwidth of the first audio signal is the same as that of the second audio signal.
5. The method of claim 1, wherein filtering the processed second audio signal to obtain Y subband signals with the same bandwidth, and downsampling the Y subband signals with the same bandwidth to obtain Y first subband signals with the same bandwidth, comprises:
framing the low-frequency component of the second audio signal to obtain X audio signal frames, wherein each audio signal frame comprises a preset number of sample points;
sequentially carrying out filtering and downsampling on each audio signal frame to obtain N first sub-band signals corresponding to each audio signal frame;
wherein the Y first subband signals comprise: the N first subband signals corresponding to each audio signal frame.
6. The method of claim 5, wherein after framing the low frequency component of the second audio signal to obtain X audio signal frames, the method further comprises:
performing signal processing on N first sub-band signals of the first audio signal frame and N first sub-band signals in the second audio signal frame to obtain N processed first sub-band signals; wherein the first audio signal frame and the second audio signal frame are adjacent ones of the X audio signal frames.
7. An audio processing apparatus, comprising a processing module, a generating module, and a synthesizing module, wherein:
the processing module is used for carrying out resolution improvement processing on the first audio signal to obtain a second audio signal;
the processing module is further used for performing low-pass filtering processing on the second audio signal to obtain a processed second audio signal;
the processing module is further configured to perform filtering processing on the processed second audio signal to obtain Y subband signals with the same bandwidth, and perform downsampling processing on the Y subband signals with the same bandwidth to obtain Y first subband signals with the same bandwidth;
the generating module is configured to generate M high-frequency subband signals from the low-frequency subband signals in the Y first subband signals obtained by the processing module by spectral replication or spectral inversion, wherein each low-frequency subband signal in the Y first subband signals corresponds to one or more high-frequency subband signals;
the processing module is further configured to perform spectral adjustment on the M high-frequency subband signals generated by the generating module based on the high-frequency characteristic information of the first audio signal to obtain M target high-frequency subband signals, wherein the high-frequency characteristic information is the signal gains of the M high-frequency subband signals, and the signal gains are determined by the normalized autocorrelation coefficient, the gradient index, and the subband spectral flatness of the first audio signal;
the synthesizing module is used for synthesizing the M target high-frequency subband signals obtained by the processing module to obtain target audio signals;
wherein the first audio signal comprises at least one of: wideband audio, ultra-wideband audio, or full-band audio; and Y and M are positive integers.
8. The apparatus of claim 7, wherein:
the generating module is specifically configured to perform spectrum duplication on all low-frequency subband signals in the Y first subband signals to generate M high-frequency subband signals, where one low-frequency subband signal corresponds to at least one high-frequency subband signal, and Y is smaller than or equal to M.
9. The apparatus of claim 7, wherein the audio processing apparatus further comprises an extraction module and a prediction module;
the extraction module is used for extracting the characteristics of the first audio signal to obtain low-frequency characteristic information of the first audio signal;
the prediction module is used for inputting the low-frequency characteristic information extracted by the extraction module into a preset neural network model to predict the high-frequency characteristic information of the first audio signal.
10. The apparatus of claim 7, wherein:
the processing module is specifically configured to perform L times up-sampling on the first audio signal to obtain a second audio signal with a predetermined sampling rate, where the bandwidth of the first audio signal is the same as that of the second audio signal.
11. The apparatus of claim 7, wherein:
the processing module is further configured to frame the low-frequency component of the second audio signal to obtain X audio signal frames, where each audio signal frame includes a predetermined number of sample points;
the processing module is specifically configured to sequentially perform filtering and downsampling processing on each audio signal frame to obtain N first subband signals corresponding to each audio signal frame;
Wherein the Y first subband signals include: n first sub-band signals corresponding to each audio signal frame.
12. The apparatus of claim 11, wherein:
the processing module is specifically configured to perform signal processing on N first subband signals of the first audio signal frame and N first subband signals in the second audio signal frame, so as to obtain N processed first subband signals; wherein the first audio signal frame and the second audio signal frame are adjacent ones of the X audio signal frames.
13. An electronic device comprising a processor, a memory and a program or instruction stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the audio processing method according to any of claims 1-6.
14. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the audio processing method according to any of claims 1-6.
CN202110121348.6A 2021-01-28 2021-01-28 Audio processing method and device and electronic equipment Active CN113299313B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110121348.6A CN113299313B (en) 2021-01-28 2021-01-28 Audio processing method and device and electronic equipment
PCT/CN2022/074795 WO2022161475A1 (en) 2021-01-28 2022-01-28 Audio processing method and apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110121348.6A CN113299313B (en) 2021-01-28 2021-01-28 Audio processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113299313A CN113299313A (en) 2021-08-24
CN113299313B true CN113299313B (en) 2024-03-26

Family

ID=77318871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110121348.6A Active CN113299313B (en) 2021-01-28 2021-01-28 Audio processing method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN113299313B (en)
WO (1) WO2022161475A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113299313B (en) * 2021-01-28 2024-03-26 维沃移动通信有限公司 Audio processing method and device and electronic equipment
CN115547350A (en) * 2022-09-23 2022-12-30 维沃移动通信有限公司 Audio signal processing method and device, electronic equipment and readable storage medium

Citations (7)

Publication number Priority date Publication date Assignee Title
JP2004053940A (en) * 2002-07-19 2004-02-19 Matsushita Electric Ind Co Ltd Audio decoding device and method
CN1606687A (en) * 2002-09-19 2005-04-13 松下电器产业株式会社 Audio decoding apparatus and method
CN101471072A (en) * 2007-12-27 2009-07-01 华为技术有限公司 High-frequency reconstruction method, encoding module and decoding module
CN106057220A (en) * 2016-05-19 2016-10-26 Tcl集团股份有限公司 Audio signal high frequency expansion method and audio frequency player
CN107221334A (en) * 2016-11-01 2017-09-29 武汉大学深圳研究院 The method and expanding unit of a kind of audio bandwidth expansion
CN107393552A (en) * 2013-09-10 2017-11-24 华为技术有限公司 Adaptive bandwidth extended method and its device
CN110556121A (en) * 2019-09-18 2019-12-10 腾讯科技(深圳)有限公司 Frequency band extension method, device, electronic equipment and computer readable storage medium

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
WO2012091464A1 (en) * 2010-12-29 2012-07-05 삼성전자 주식회사 Apparatus and method for encoding/decoding for high-frequency bandwidth extension
WO2015079946A1 (en) * 2013-11-29 2015-06-04 ソニー株式会社 Device, method, and program for expanding frequency band
CN105280189B (en) * 2015-09-16 2019-01-08 深圳广晟信源技术有限公司 The method and apparatus that bandwidth extension encoding and decoding medium-high frequency generate
CN105513601A (en) * 2016-01-27 2016-04-20 武汉大学 Method and device for frequency band reproduction in audio coding bandwidth extension
EP3382703A1 (en) * 2017-03-31 2018-10-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and methods for processing an audio signal
CN113299313B (en) * 2021-01-28 2024-03-26 维沃移动通信有限公司 Audio processing method and device and electronic equipment

Patent Citations (8)

Publication number Priority date Publication date Assignee Title
JP2004053940A (en) * 2002-07-19 2004-02-19 Matsushita Electric Ind Co Ltd Audio decoding device and method
CN1606687A (en) * 2002-09-19 2005-04-13 松下电器产业株式会社 Audio decoding apparatus and method
CN101471072A (en) * 2007-12-27 2009-07-01 华为技术有限公司 High-frequency reconstruction method, encoding module and decoding module
WO2009089728A1 (en) * 2007-12-27 2009-07-23 Huawei Technologies Co., Ltd. Method for high frequency band replication, coder and decoder thereof
CN107393552A (en) * 2013-09-10 2017-11-24 华为技术有限公司 Adaptive bandwidth extended method and its device
CN106057220A (en) * 2016-05-19 2016-10-26 Tcl集团股份有限公司 Audio signal high frequency expansion method and audio frequency player
CN107221334A (en) * 2016-11-01 2017-09-29 武汉大学深圳研究院 The method and expanding unit of a kind of audio bandwidth expansion
CN110556121A (en) * 2019-09-18 2019-12-10 腾讯科技(深圳)有限公司 Frequency band extension method, device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
WO2022161475A1 (en) 2022-08-04
CN113299313A (en) 2021-08-24

Similar Documents

Publication Publication Date Title
TWI556227B (en) Systems and methods for generating a high frequency component of a signal from a low frequency component of the signal, a set-top box, a computer program product and storage medium thereof
EP2491557B1 (en) Oversampling in a combined transposer filter bank
US8971551B2 (en) Virtual bass synthesis using harmonic transposition
RU2517315C2 (en) Method and device for audio signal processing
CN104318930B (en) Sub-band processing unit and the method for generating synthesized subband signal
CN113299313B (en) Audio processing method and device and electronic equipment
CN112259116B (en) Noise reduction method and device for audio data, electronic equipment and storage medium
EP2720477B1 (en) Virtual bass synthesis using harmonic transposition
Nakamura et al. Time-domain audio source separation based on Wave-U-Net combined with discrete wavelet transform
Raj et al. Multilayered convolutional neural network-based auto-CODEC for audio signal denoising using mel-frequency cepstral coefficients
CN112908351A (en) Audio tone changing method, device, equipment and storage medium
Si et al. Multi‐scale audio super resolution via deep pyramid wavelet convolutional neural network
Lan et al. Research on improved DNN and MultiResU_Net network speech enhancement effect
Srinivasarao Speech signal analysis and enhancement using combined wavelet Fourier transform with stacked deep learning architecture
CN113611321B (en) Voice enhancement method and system
Lan et al. Speech Enhancement Algorithm Combining Cochlear Features and Deep Neural Network with Skip Connections
Yecchuri et al. Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement
CN117935826A (en) Audio up-sampling method, device, equipment and storage medium
Smyth The design of low-complexity wavelet-based audio filter banks suitable for embedded platforms
CN115602183A (en) Audio enhancement method and device, electronic equipment and storage medium
CN117079623A (en) Audio noise reduction model training method, singing work processing equipment and medium
CN117012221A (en) Audio noise reduction method, computer device and storage medium
CN113611321A (en) Voice enhancement method and system
Hsu et al. Novel Decimation-Whitening Filter in Spectral Band Replication
Srinija Lankala Design and Implementation of Energy-Efficient Floating Point MFCC Extraction Architecture for Speech Recognition Systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant