CN113299313A - Audio processing method and device and electronic equipment

Info

Publication number: CN113299313A (granted as CN113299313B)
Application number: CN202110121348.6A
Authority: CN (China)
Inventor: 张勇 (Zhang Yong)
Assignee: Vivo Mobile Communication Co., Ltd.
Other languages: Chinese (zh)
Legal status: Active (granted)
Related application: PCT/CN2022/074795 (published as WO2022161475A1)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/18 characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G10L 25/27 characterised by the analysis technique
    • G10L 25/48 specially adapted for particular use


Abstract

The application discloses an audio processing method, an audio processing apparatus, and an electronic device, belonging to the field of signal processing, and capable of solving the problem of the poor playing effect of wideband/full-band non-speech signals. The method includes: performing resolution enhancement processing on a first audio signal to obtain a second audio signal; performing low-pass filtering processing on the second audio signal to obtain a processed second audio signal; performing signal processing on the processed second audio signal to obtain Y first subband signals of equal bandwidth; generating M high-frequency subband signals from the low-frequency subband signals among the Y first subband signals; performing spectrum adjustment on the M high-frequency subband signals based on high-frequency feature information of the first audio signal to obtain M target high-frequency subband signals; and synthesizing the M target high-frequency subband signals to obtain a target audio signal, where Y and M are positive integers. The embodiments of the application apply to audio processing scenarios.

Description

Audio processing method and device and electronic equipment
Technical Field
The application belongs to the field of signal processing, and particularly relates to an audio processing method and device and electronic equipment.
Background
With advances in electronic technology, the performance of electronic devices keeps improving: high-definition televisions, earphones, speakers, mobile phones, and the like can now play high-definition audio, and the demand for high-definition audio with high fidelity and high expressiveness has become more urgent.
Audio signals generally include speech signals and non-speech signals (e.g., music signals). In the related art, an electronic device can expand a narrowband speech signal into a wideband speech signal based on a speech signal generation model, thereby reducing the loss of sound information in the speech signal and improving its fidelity.
However, the spectral characteristics of non-speech signals differ from those of speech signals, and the speech signal generation model in the electronic device is built from the spectral characteristics of speech, so it can only process audio signals whose spectral characteristics match those of speech. The model therefore cannot be applied to non-speech signals (e.g., music signals or sounds occurring in nature), the electronic device cannot process them, and the playing effect of non-speech signals is poor.
Disclosure of Invention
An object of the embodiments of the present application is to provide an audio processing method that can solve the problem of the poor playing effect of wideband/full-band non-speech signals.
In order to solve the technical problem, the present application is implemented as follows:
In a first aspect, an embodiment of the present application provides an audio processing method, the method including: performing resolution enhancement processing on a first audio signal to obtain a second audio signal; performing low-pass filtering processing on the second audio signal to obtain a processed second audio signal; performing signal processing on the processed second audio signal to obtain Y first subband signals of equal bandwidth; generating M high-frequency subband signals from the low-frequency subband signals among the Y first subband signals; performing spectrum adjustment on the M high-frequency subband signals based on high-frequency feature information of the first audio signal to obtain M target high-frequency subband signals; and synthesizing the M target high-frequency subband signals to obtain a target audio signal, where Y and M are positive integers.
In a second aspect, an embodiment of the present application provides an audio processing apparatus, including a processing module, a generating module, and a synthesizing module, where:
the processing module is configured to perform resolution enhancement processing on a first audio signal to obtain a second audio signal; the processing module is further configured to perform low-pass filtering processing on the second audio signal to obtain a processed second audio signal; the processing module is further configured to perform signal processing on the processed second audio signal to obtain Y first subband signals of equal bandwidth; the generating module is configured to generate M high-frequency subband signals from the low-frequency subband signals among the Y first subband signals obtained by the processing module; the processing module is further configured to perform spectrum adjustment on the M high-frequency subband signals generated by the generating module based on high-frequency feature information of the first audio signal to obtain M target high-frequency subband signals; and the synthesizing module is configured to synthesize the M target high-frequency subband signals obtained by the processing module to obtain a target audio signal, where Y and M are positive integers.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In a sixth aspect, the present application provides a computer program product stored on a non-volatile storage medium, the program product being executed by at least one processor to implement the method according to the first aspect.
In the embodiments of the present application, the electronic device may perform resolution enhancement processing on a low-resolution first audio signal (e.g., a wideband/full-band non-speech signal) to obtain a high-resolution second audio signal, and perform low-pass filtering processing on the second audio signal to filter out its high-frequency components. It may then perform signal processing on the processed second audio signal to obtain Y first subband signals of equal bandwidth, generate M high-frequency subband signals from the low-frequency subband signals among the Y first subband signals, perform spectrum adjustment on the M high-frequency subband signals based on the high-frequency feature information of the low-resolution first audio signal to obtain M target high-frequency subband signals, and synthesize the M target high-frequency subband signals to obtain an audio signal whose high-frequency harmonic characteristics are well reconstructed. A high-definition audio signal with high fidelity and high expressiveness can thus be obtained, improving the playing effect of non-speech signals.
Drawings
Fig. 1 is a flowchart of an audio processing method provided in an embodiment of the present application;
Fig. 2 is a first schematic diagram of the waveform of an audio signal provided by an embodiment of the present application;
Fig. 3 is a second schematic diagram of the waveform of an audio signal provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of spectrum replication/flipping provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of a neural network topology provided by an embodiment of the present application;
Fig. 6 shows the amplitude-frequency response curves of a low-pass prototype filter and a PQMF analysis filter bank provided in an embodiment of the present application;
Fig. 7 is a schematic diagram of a PQMF subband analysis/synthesis filter bank according to an embodiment of the present application;
Fig. 8 is a block diagram of a high-definition audio generation system provided by an embodiment of the present application;
Fig. 9 is a first schematic structural diagram of an audio processing apparatus according to an embodiment of the present application;
Fig. 10 is a second schematic structural diagram of an audio processing apparatus according to an embodiment of the present application;
Fig. 11 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. It should be understood that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from the embodiments given herein without inventive effort shall fall within the scope of protection of the present application.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It should be understood that data so used may be interchanged under appropriate circumstances, so that the embodiments of the application can be practiced in orders other than those shown or described herein. Objects distinguished by "first", "second", and the like are usually of one type, and the number of such objects is not limited; for example, the first object may be one object or a plurality of objects. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects before and after it.
The audio processing method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
An embodiment of the present application provides an audio processing method, which may be applied to an audio processing apparatus, and illustrates a flowchart of the audio processing method provided in the embodiment of the present application, and as shown in fig. 1, the audio processing method provided in the embodiment of the present application may include the following steps 101 to 106:
step 101: and carrying out resolution enhancement processing on the first audio signal to obtain a second audio signal.
In an embodiment of the present application, the first audio signal includes at least one of: wideband audio (16 kHz sampling), ultra-wideband audio (32 kHz sampling), and full-band audio (44.1 kHz or 48 kHz sampling).
In an embodiment of the present application, a resolution of the first audio signal is smaller than a resolution of the second audio signal.
It should be noted that the resolution of an audio signal is determined by its sampling rate (sample rate) and bit depth. For two audio signals of the same bit depth, the signal with the higher sampling rate has the higher resolution, so the resolution of an audio signal can be increased by increasing its sampling rate. That is, the sampling rate of the first audio signal is lower than that of the second audio signal. For example, the sampling rate of the second audio signal may be 96 kHz.
In the embodiments of the present application, the first audio signal is generally wideband/ultra-wideband/full-band audio with a relatively poor playing effect, so it needs to be converted into high-definition audio; however, the requirements for producing genuine high-definition audio source material are high. Therefore, without changing the sampling rate or coding format of the digital audio source, and without increasing the network transmission bandwidth, the sampling rate of the first audio signal can be raised to that of high-definition audio, so that wideband/ultra-wideband/full-band audio can be converted into high-definition audio (96 kHz sampling).
In general, both up-sampling and down-sampling resample a digital signal. Specifically, the resampling rate is compared with the rate at which the digital signal was originally obtained (e.g., by sampling an analog signal): if the resampling rate is higher, the operation is up-sampling; otherwise, it is down-sampling.
It is to be understood that the resolution enhancement processing described above can be considered as up-sampling the first audio signal. That is, the step 101 may include the following step 101a:
step 101 a: and performing L-time upsampling on the first audio signal to obtain a second audio signal with a preset sampling rate. Wherein L is greater than 0.
Illustratively, assuming that the first audio signal is full-band audio with a sampling rate of 48 kHz, up-sampling it by a factor of 2 (i.e., resampling) converts its sampling rate (48 kHz) to the sampling rate of high-definition audio (96 kHz).
Example 1: a specific implementation of step 101a is described by taking the generation of 96 kHz sampled high-definition audio from 48 kHz sampled full-band audio (i.e., the first audio signal) as an example. The time-domain waveform of the full-band audio signal is shown in (a) of Fig. 2, and its spectrogram in (b) of Fig. 2. Assuming the full-band audio has a sampling rate of 48 kHz and an effective bandwidth of 24 kHz, the audio processing apparatus may up-sample the full-band audio input by a factor of 2 to obtain a 96 kHz sampled signal (i.e., the second audio signal).
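A minimal numpy sketch of step 101a follows, assuming the resolution enhancement is implemented as plain zero-insertion up-sampling; the patent does not prescribe a specific resampler, and all names and sizes here are illustrative:

```python
# Illustrative sketch: 2x up-sampling of a 48 kHz full-band signal to a 96 kHz
# sample grid by zero insertion. The spectral images this creates around the
# original band are removed by the low-pass filter of step 102.
import numpy as np

def upsample_zero_insert(x: np.ndarray, L: int = 2) -> np.ndarray:
    """Insert L-1 zeros between consecutive samples (L-fold up-sampling)."""
    y = np.zeros(len(x) * L, dtype=x.dtype)
    y[::L] = x
    return y

fs_in = 48_000                    # sampling rate of the first audio signal
x = np.random.randn(fs_in)        # stand-in for one second of full-band audio
y = upsample_zero_insert(x, L=2)  # second audio signal on a 96 kHz grid
```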
It should be noted that, since the up-sampling process of the audio signal increases the bandwidth of the audio signal, the bandwidth of the second audio signal is larger than that of the first audio signal.
Step 102: and carrying out low-pass filtering processing on the second audio signal to obtain a processed second audio signal.
In the embodiments of the present application, the signal processing apparatus may filter out the high-frequency components (i.e., the high-frequency signal) in the second audio signal with a low-pass filter, retaining only the low-frequency components (i.e., the low-frequency signal) of the second audio signal. Low-pass filtering can be understood simply as follows: a frequency point (i.e., a cutoff frequency) is set such that signal components whose frequency is above it cannot pass, i.e., all frequency-domain values above the cutoff frequency are set to 0.
Example 2 describes the above signal processing of the second audio signal, continuing Example 1. After obtaining the 96 kHz sampled signal, the audio processing apparatus may filter the audio signal with a low-pass filter whose cutoff frequency is 24 kHz, to remove the image frequency components introduced into the high-frequency part by up-sampling. The waveform and spectrogram of the up-sampled and low-pass filtered audio signal are shown in (a) and (b) of Fig. 3, respectively.
It should be noted that the bandwidth of the first audio signal is the same as that of the processed second audio signal. For example, referring to Fig. 3, the effective bandwidth of the processed audio signal remains 24 kHz while its sampling rate is 96 kHz.
It should be noted that the bandwidth of an audio signal is defined as the frequency range the signal occupies. According to the Nyquist sampling theorem, the sampling frequency (i.e., sampling rate) of a signal is 2 times its bandwidth; that is, the bandwidth of a signal is 1/2 of its sampling frequency. Assuming the first audio signal has a sampling rate of 48 kHz, its bandwidth is 48 kHz / 2, i.e., 24 kHz.
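As a rough sketch of step 102, a linear-phase FIR low-pass filter with a 24 kHz cutoff can be applied to the 96 kHz signal; the filter design below (scipy's firwin, 255 taps) is an assumption, since the patent does not fix the design of this filter:

```python
# Sketch: remove the up-sampling images above 24 kHz from a 96 kHz signal.
import numpy as np
from scipy.signal import firwin, lfilter

fs = 96_000
y = np.zeros(fs)
y[::2] = np.random.randn(fs // 2)   # stand-in 2x zero-inserted signal
taps = firwin(255, 24_000, fs=fs)   # FIR low-pass, cutoff 24 kHz
y_lp = 2.0 * lfilter(taps, 1.0, y)  # factor 2 compensates the amplitude
                                    # loss caused by zero insertion
```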
Step 103: and performing signal processing on the processed second audio signal to obtain a first sub-band signal with the same Y bandwidth.
Wherein Y is a positive integer.
In the embodiment of the present application, the Y first subband signals include a high frequency subband signal and a low frequency subband signal.
Optionally, in this embodiment of the application, the signal processing on the processed second audio signal may be: and carrying out filtering processing and downsampling processing on the processed second audio signal.
Illustratively, the signal processing includes PQMF subband filtering processing and down-sampling processing. Specifically, the audio processing apparatus may first divide the input processed second audio signal into Y subband signals of equal bandwidth through a PQMF subband filter bank, and then down-sample each subband signal to obtain the Y first subband signals.
It should be noted that PQMF subband analysis applies a time-frequency transformation to the original signal; its purpose is to obtain a set of subband signals that reflect the correlation between high and low frequencies, have good harmonic characteristics, and are convenient to analyze. At the analysis end, the input time-domain signal is divided into several subband signals of equal bandwidth by the PQMF analysis filter bank, and each subband signal is then down-sampled. At the synthesis end, each subband signal is first up-sampled, and the up-sampled subband signals are then converted into a time-domain signal by the PQMF synthesis filter bank.
It should be noted that the division of the high-frequency subband signal and the low-frequency subband signal in the Y first subband signals is determined according to the frequency range of the high-frequency component and the low-frequency component of the processed second audio signal. That is, the first subband signal having a signal frequency within the frequency range of the low frequency component is a low frequency subband signal; the first subband signal having a signal frequency in the frequency range of the high frequency component is a high frequency subband signal.
Step 104: and generating M high-frequency subband signals according to the low-frequency subband signals in the Y first subband signals.
In the embodiments of the present application, the audio processing apparatus may generate one or more high-frequency subband signals from one low-frequency subband signal; that is, each low-frequency subband signal among the Y first subband signals corresponds to one or more high-frequency subband signals, and Y is less than or equal to M.
In the embodiments of the present application, a high-frequency generator may generate the spectrum of a high-frequency subband signal from the spectrum of a low-frequency subband signal among the Y first subband signals, thereby generating the high-frequency subband signal.
For example, the method for generating M high frequency subband signals by the audio processing apparatus may include any one of the 4 methods shown in table 1.
TABLE 1 High-frequency subband spectrum generation methods (rendered as an image in the original document)
As can be seen from Table 1, the spectrum processing type corresponding to methods 1 and 2 is spectrum replication, and that corresponding to methods 3 and 4 is spectrum inversion; the difference between spectrum replication and spectrum inversion is illustrated in Fig. 4.
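The two spectrum-processing types can be illustrated with a toy numpy sketch (array contents and sizes are placeholders, not the patent's data):

```python
# Spectrum replication vs. spectrum inversion on one block of subband
# spectral coefficients.
import numpy as np

low = np.random.randn(32)     # spectral coefficients of one low-frequency subband
replicated = low.copy()       # spectrum replication: direct copy into a high band
inverted = low[::-1].copy()   # spectrum inversion: frequency-reversed copy
```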
Step 105: and performing spectrum adjustment on the M high-frequency subband signals based on the high-frequency characteristic information of the first audio signal to obtain M target high-frequency subband signals.
In this embodiment, the high frequency characteristic information may be signal gains of the M high frequency subband signals.
In this embodiment, the audio processing apparatus may adjust the amplitudes of the M high-frequency subband signals by using an envelope adjuster, so as to obtain M reconstructed high-frequency subband signals (i.e., the M target high-frequency subband signals).
Step 106: and synthesizing the M target high-frequency subband signals to obtain a target audio signal.
Wherein M is a positive integer.
In this embodiment, the audio processing apparatus may synthesize the M target high-frequency subband signals by using a PQMF synthesis filter bank, so as to obtain a target audio signal.
With the audio processing method provided in the embodiments of the application, the electronic device may perform resolution enhancement processing on a low-resolution first audio signal (e.g., a wideband/full-band non-speech signal) to obtain a high-resolution second audio signal, perform low-pass filtering processing on the second audio signal to filter out its high-frequency components, and perform signal processing on the processed second audio signal to obtain Y first subband signals of equal bandwidth. It may then generate M high-frequency subband signals from the low-frequency subband signals among the Y first subband signals, perform spectrum adjustment on the M high-frequency subband signals based on the high-frequency feature information of the low-resolution first audio signal to obtain M target high-frequency subband signals, and synthesize the M target high-frequency subband signals to obtain an audio signal whose high-frequency harmonic characteristics are well reconstructed. A high-definition audio signal with high fidelity and high expressiveness can thus be obtained, improving the playing effect of non-speech signals.
Optionally, in this embodiment of the present application, since there is a correlation between the high frequency subband signal and the low frequency subband signal of the audio signal, the corresponding high frequency subband signal may be generated according to the low frequency subband signal in the processed second audio signal.
Illustratively, the step 104 may include the following step 104a:
Step 104a: Performing spectrum replication on all the low-frequency subband signals among the Y first subband signals to generate M high-frequency subband signals.
For example, the audio processing apparatus may generate the spectra of the M high-frequency subband signals using the spectrum replication methods in Table 1; e.g., it may copy the upper half of the spectrum of a low-frequency subband signal multiple times to generate the spectra of the M high-frequency subband signals, thereby generating the M high-frequency subband signals.
In this way, the audio processing apparatus may obtain the high frequency component in the processed second audio signal based on the low frequency component in the processed second audio signal, so as to preliminarily obtain the frequency spectrum of the processed second audio signal.
Optionally, in this embodiment of the present application, the audio processing apparatus may extract the low-frequency features of the original audio signal based on a strong correlation between the low-frequency features and the high-frequency spectral envelope of the audio signal, so as to predict the high-frequency features of the audio signal according to the low-frequency features.
Before the step 105, the audio processing method provided by the embodiment of the present application further includes the following step a1 and step a 2:
step A1: and performing feature extraction on the first audio signal to obtain low-frequency feature information of the first audio signal.
Step A2: and inputting the low-frequency characteristic information into a preset neural network model to predict the high-frequency characteristic information of the first audio signal.
Illustratively, the low-frequency feature information includes at least one of: the normalized autocorrelation coefficients (x_acf) of the first audio signal, the gradient index (x_gi), and the subband spectral flatness (x_sfm).
It should be noted that the low-frequency feature information can be regarded as the feature parameters of the first audio signal, and the following three principles should be considered when selecting them:
(1) the low-frequency feature parameters should be strongly correlated with the high-frequency spectral envelope;
(2) there should be good independence between the feature components;
(3) the feature components should be easy to compute.
Based on the above principles, the embodiments of the present application select the above 3 feature parameters to describe the audio characteristics from the time-domain and frequency-domain perspectives, respectively. In practical applications, other feasible feature parameters may also be selected, which is not limited in the embodiments of the present application.
The three low-frequency feature parameters (i.e., characteristic parameters) are described in further detail below.
For example, the preset neural network may be a DNN neural network. The DNN neural network is a feed-forward multi-layer network with one-way propagation, which can efficiently abstract and model complex data. The DNN network topology, shown in Fig. 5, comprises three types of layers: an input layer, hidden layers, and an output layer. Typically, the first layer is the input layer, the last layer is the output layer, and all intermediate layers are hidden layers. Neurons in adjacent layers are fully connected, and there are no connections among neurons within the same layer.
Illustratively, the DNN neural network described above is used to establish a non-linear mapping from low frequency features of the first audio signal to a high frequency spectral envelope of the first audio signal.
For example, the input of the DNN neural network is the low-frequency feature information of the first audio signal, including the normalized autocorrelation coefficients, the gradient index, and the subband spectral flatness, and the output of the DNN neural network is the signal gain (denoted by G) of the high-frequency subband signals of the first audio signal.
In this way, the audio processing apparatus may predict the high-frequency feature information of the first audio signal based on the low-frequency feature information of the first audio signal, and adjust the spectrum (i.e., the spectral envelope) of the processed second audio signal by the high-frequency feature information.
Optionally, in the embodiments of the application, the audio processing apparatus may frame the processed second audio signal and then process the audio on a per-frame basis, so as to reduce the influence of the overall non-stationarity and time variation of the signal.
Illustratively, the step 103 may include the following steps 103a and 103b:
Step 103a: Framing the processed second audio signal to obtain X audio signal frames.
Step 103b: Sequentially filtering and down-sampling each audio signal frame to obtain N first subband signals corresponding to each audio signal frame.
Wherein the Y first subband signals include: and N first sub-band signals corresponding to each audio signal frame.
Illustratively, each audio signal frame includes a first predetermined number of sample points. For example, it may be preset that each signal frame includes 2048 sample points.
Illustratively, the number X of audio signal frames is determined by the sampling rate of the second audio signal and the number of sample points included in each audio signal frame.
For example, the audio processing apparatus may number the X audio signal frames so that each frame corresponds to a sequence number; e.g., if the processed second audio signal comprises l audio signal frames, they may be numbered from 1 to l.
For example, consider generating 96 kHz sampled high-definition audio from 48 kHz sampled full-band audio (i.e., the first audio signal). Following Examples 1 and 2, after the first audio signal is up-sampled and low-pass filtered to obtain the processed second audio signal, the sampling rate of the processed second audio signal is 96 kHz, and it may be divided into 46 audio signal frames (i.e., the X audio signal frames) of 2048 sample points each.
For example, the audio processing apparatus may sequentially perform the filtering process and the down-sampling process on each audio signal frame according to the timing information of the X audio signal frames.
Illustratively, each of the N first subband signals corresponds to one index.
Illustratively, the N first subband signals include P low frequency subband signals and Q high frequency subband signals. Wherein P and Q are positive integers.
Illustratively, the number of sub-band signals (i.e., N) corresponding to each audio signal frame is predetermined, and further, the number of sub-band signals is determined according to parameters set for the PQMF sub-band filter bank. For example, the number of subbands in the PQMF subband filter bank is set to 64, and after each audio signal frame is processed by the PQMF subband filter bank, 64 subband signals corresponding to each audio signal frame can be obtained.
For example, with respect to step 103b, the audio processing apparatus may first perform PQMF filtering processing on each audio signal frame to obtain N sub-band signals corresponding to each audio signal frame, and then perform downsampling on the N sub-band signals to obtain N first sub-band signals. Further, the downsampling process may be an N-fold downsampling process.
Illustratively, each of the N first subband signals comprises a second predetermined number of sample points. Further, the second predetermined number is determined according to a sampling multiple of the down-sampling.
Illustratively, the second predetermined number of sample points in each of the first subband signals is arranged in time sequence in the frequency range in which the first subband signal is located.
Example 3: 96 kHz sampled high-definition audio is generated from 48 kHz sampled full-band audio (i.e., the first audio signal). After the processed second audio signal is framed, assuming each signal frame includes 2048 sample points, 64 subband signals are obtained after filtering by the PQMF analysis filter bank, and each subband signal is then down-sampled by a factor of 64 to obtain 64 first subband signals, each comprising 32 sample points. Subbands 0 to 31 are low-frequency subband signals, and subbands 32 to 63 are high-frequency subband signals.
It should be noted that the N first subband signals corresponding to each audio signal frame belong to N different frequency ranges (i.e., frequency bands) of the second audio signal. For example, if each audio signal frame corresponds to 64 first subband signals, the second audio signal is divided into 64 frequency ranges by signal frequency, and each first subband signal occupies one of them. The N first subband signals thus obtained reflect the frequency characteristics of the signal and have good harmonic characteristics.
For convenience of description, the output signals of the PQMF analysis filter bank, i.e., the above N first subband signals, are denoted x_l[k][n], where k is the subband index (0 ≤ k ≤ 63), n is the index of the time-series sample point within each subband (0 ≤ n ≤ 31), and l is the index of the current audio signal frame.
It should be noted that, for each of the X audio signal frames, the subband signals output after filtering by the PQMF analysis filter bank (i.e., the first subband signals) form a matrix x[k][n], where k is the transformed subband index (the index of the first subband signal) and n is the index of the transformed subband time-series sample point (i.e., the time-series sample point of the first subband signal). x[k][n] has dual time and frequency resolution: it exhibits both the frequency-distribution characteristics of the frequency domain and the waveform characteristics of the time domain.
For ease of understanding, the expressions of the PQMF analysis filter bank and the synthesis filter bank are described below.
Illustratively, the mathematical expressions of the PQMF analysis filter bank and the synthesis filter bank used in the embodiments of the present application are as follows:
An analysis filter:

$$h_k(n) = 2\,p(n)\cos\!\left(\frac{\pi}{2N}(2k+1)\left(n-\frac{M-1}{2}\right) + (-1)^k\frac{\pi}{4}\right) \tag{1}$$

A synthesis filter:

$$f_k(n) = 2\,p(n)\cos\!\left(\frac{\pi}{2N}(2k+1)\left(n-\frac{M-1}{2}\right) - (-1)^k\frac{\pi}{4}\right) \tag{2}$$

In equations (1) and (2), N is the number of first subband signals, p(n) is a low-pass prototype filter whose normalized cutoff frequency is π/(2N) and whose length is M, with M = LN for an arbitrary positive integer L; k = 0, 1, …, N−1 is the subband index, and n is the index of the transformed subband time-series sample point.
For example, the number of subbands of the PQMF subband filter bank may be set to N = 64, the order of the low-pass prototype filter p(n) to M = 768, and the filter stop-band attenuation designed to be −90 dB.
Fig. 6(a) shows the amplitude-frequency response curve of the low-pass prototype filter p(n), and Fig. 6(b) shows the amplitude-frequency response curve of the PQMF analysis filter bank.
Fig. 7 is a schematic diagram of the PQMF subband analysis/synthesis filter bank; in Fig. 7, H_k(z) is the Z-transform of h_k(n), and F_k(z) is the Z-transform of f_k(n).
It should be noted that the analysis filter bank is configured to divide an input time-domain signal into N subband signals, and the synthesis filter bank is configured to synthesize the N subband signals into one time-domain signal.
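A sketch of the filter-bank construction implied by equations (1) and (2) is given below; the windowed-sinc prototype from scipy's firwin is an assumption, as the patent's actual prototype (stop-band attenuation −90 dB) would come from a dedicated filter-design procedure:

```python
# Build the N-channel PQMF analysis/synthesis filters of eqs. (1) and (2).
import numpy as np
from scipy.signal import firwin

def pqmf_filters(N: int = 64, M: int = 768):
    p = firwin(M, 1.0 / (2 * N))  # low-pass prototype p(n): cutoff pi/(2N), length M
    n = np.arange(M)
    h = np.empty((N, M))
    f = np.empty((N, M))
    for k in range(N):
        phase = (-1) ** k * np.pi / 4
        arg = np.pi / (2 * N) * (2 * k + 1) * (n - (M - 1) / 2)
        h[k] = 2 * p * np.cos(arg + phase)  # analysis filter h_k(n), eq. (1)
        f[k] = 2 * p * np.cos(arg - phase)  # synthesis filter f_k(n), eq. (2)
    return h, f

h, f = pqmf_filters()  # 64 subbands, 768-tap prototype, as in the embodiment
```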
Further optionally, in combination with the step 103b, the step 104a may include the following step 104a1:
Step 104a1: Generating at least one high-frequency subband signal from the low-frequency subband signals among the N first subband signals of each audio signal frame.
Illustratively, the number of high frequency subband signals that are ultimately generated for each frame of audio signal is the same.
Example 4: following Example 3, after the audio processing apparatus obtains the 64 first subband signals corresponding to each audio signal frame, it may, when copying, select the 16 low-frequency subband signals with subband indices 15 to 30 (i.e., the low-frequency source subband numbers in Table 2), copy their spectral coefficients twice, and thereby generate the spectral coefficients of 32 high-frequency subbands (i.e., the high-frequency target subband numbers in Table 2). The correspondence used when copying the bands is shown in Table 2.
Low-frequency source subband number | High-frequency target subband numbers
15 | 32, 48
16 | 33, 49
17 | 34, 50
18 | 35, 51
19 | 36, 52
20 | 37, 53
21 | 38, 54
22 | 39, 55
23 | 40, 56
24 | 41, 57
25 | 42, 58
26 | 43, 59
27 | 44, 60
28 | 45, 61
29 | 46, 62
30 | 47, 63

TABLE 2 High/low frequency band copy correspondence
In Table 2, the "low-frequency source subband number" is the index of the low-frequency subband signal, and the "high-frequency target subband numbers" are the indices of the high-frequency subband signals generated from it.
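The Table 2 mapping is a fixed index offset, which a short sketch makes explicit:

```python
# Table 2 as code: low subbands 15..30 are each copied into two high-frequency
# target subbands, at offsets +17 and +33.
mapping = {low: (low + 17, low + 33) for low in range(15, 31)}
assert mapping[15] == (32, 48) and mapping[30] == (47, 63)
```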
Further optionally, in an embodiment of the present application, the step a1 includes the following step B1:
step B1: and performing feature extraction on P low-frequency subband signals in the N first subband signals in each audio signal frame to obtain low-frequency feature information of each audio signal frame.
For example, the audio processing apparatus may calculate a normalized autocorrelation coefficient and a gradient index of the first audio signal according to the number of samples of the first audio signal and the order of the autocorrelation function.
The following describes the definition of the low frequency feature information in detail:
(1) The normalized autocorrelation coefficients describe the correlation of the signal in the time domain. Let x(n) be the input audio signal, N the number of samples per frame, and m the autocorrelation order (m = 1, 2, …, M, where M is the maximum autocorrelation order); the normalized autocorrelation coefficients are then calculated as follows:

$$x_{acf}(m) = \frac{\sum_{n=m}^{N-1} x(n)\,x(n-m)}{\sum_{n=0}^{N-1} x^{2}(n)} \tag{3}$$
(2) The gradient index distinguishes the harmonic and noise characteristics of the audio signal. It is defined as the energy-normalized sum of the gradient magnitudes of the audio signal at each change of direction, i.e.:

$$x_{gi} = \frac{1}{\sqrt{E}}\sum_{n=1}^{N-1} \psi(n)\,\lvert x(n)-x(n-1)\rvert \tag{4}$$

where the variable ψ(n) indicates a change in the direction of the signal:

$$\psi(n) = \frac{1}{2}\,\bigl\lvert \operatorname{sign}\bigl(x(n)-x(n-1)\bigr) - \operatorname{sign}\bigl(x(n-1)-x(n-2)\bigr) \bigr\rvert \tag{5}$$

where sign(x) is the sign function, defined as:

$$\operatorname{sign}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases} \tag{6}$$

and E is the total energy of the input signal of the current frame:

$$E = \sum_{n=0}^{N-1} x^{2}(n) \tag{7}$$
(3) The subband spectral flatness distinguishes the tonal and noise characteristics of the in-band audio signal. The flatter the subband spectrum, the more noise-like the subband is; the less flat the subband spectrum, the more tonal components it contains. It is defined as the ratio of the geometric mean to the arithmetic mean of all spectral values (MDCT spectral coefficients) within each low-frequency PQMF subband.
The low-frequency feature information is further illustrated below with a specific example, in combination with the definitions above.
For example, the audio processing apparatus may obtain the spectral coefficients of each of the P low-frequency subband signals to calculate the subband spectral flatness of each low-frequency subband signal.
For example, the low-frequency feature information of the first audio signal may be, for each audio signal frame, a 64-dimensional feature vector:

$$\vec{x} = \bigl[x_{acf}(1), \ldots, x_{acf}(31),\; x_{gi},\; x_{sfm}(0), \ldots, x_{sfm}(31)\bigr]^{T} \tag{8}$$
Example 5: 96 kHz sampled high-definition audio is generated from 48 kHz sampled full-band audio (i.e., the first audio signal). Assuming each signal frame corresponds to 64 subband signals (first subband signals), the audio processing apparatus may obtain the spectral coefficients of subband signals 0 to 31 and calculate the subband spectral flatness of each of them.
Note that, when performing feature extraction, the maximum autocorrelation order of the normalized autocorrelation coefficients may be set to M = 31; the feature dimensions in the embodiments of the present application are set as shown in Table 3.
Feature name | Dimension
Normalized autocorrelation coefficients x_acf | 31
Gradient index x_gi | 1
Subband spectral flatness x_sfm | 32
Total | 64

TABLE 3 Feature names and dimensions
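A minimal numpy sketch of the three features of equations (3)-(7) and Table 3 follows; the exact normalizations match the reconstructed formulas above and should be treated as assumptions, and the random arrays merely stand in for real frame data and subband spectra:

```python
import numpy as np

def normalized_autocorr(x, M=31):
    """x_acf(m), eq. (3): lag-m autocorrelation normalized by frame energy."""
    e = np.sum(x * x) + 1e-12
    return np.array([np.sum(x[m:] * x[:-m]) / e for m in range(1, M + 1)])

def gradient_index(x):
    """x_gi, eqs. (4)-(7): energy-normalized gradient sum at direction changes."""
    d = np.diff(x)                            # x(n) - x(n-1)
    psi = 0.5 * np.abs(np.diff(np.sign(d)))   # 1 where the direction changes
    return np.sum(psi * np.abs(d[1:])) / np.sqrt(np.sum(x * x) + 1e-12)

def spectral_flatness(spec):
    """x_sfm: geometric mean / arithmetic mean of subband spectral magnitudes."""
    a = np.abs(spec) + 1e-12
    return np.exp(np.mean(np.log(a))) / np.mean(a)

frame = np.random.randn(2048)                  # one audio signal frame
x_acf = normalized_autocorr(frame)             # 31 dims
x_gi = gradient_index(frame)                   # 1 dim
x_sfm = np.array([spectral_flatness(np.random.randn(32))
                  for _ in range(32)])         # 32 low subbands -> 32 dims
feat = np.concatenate([x_acf, [x_gi], x_sfm])  # 64-dim vector, cf. eq. (8)
```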
Further optionally, in this embodiment of the present application, in combination with the step B1, the step a2 includes the following step B2:
step B2: and inputting the low-frequency characteristic information of each audio signal frame into a preset neural network model, and predicting the high-frequency characteristic information of each audio signal frame.
For example, the high frequency characteristic information of each audio signal frame may be signal gains of the H high frequency subband signals.
For example, assuming that the kth high-frequency subband signal among the M high-frequency subband signals is generated from the jth low-frequency subband signal, the subband gain G[k] of the kth high-frequency subband is defined as:

$$G[k] = \sqrt{\frac{En_k}{En_j}} \tag{9}$$

In formula (9), En_k is the total energy of the spectral coefficients of the kth high-frequency subband, and En_j is the total energy of the MDCT spectral coefficients of the jth low-frequency PQMF subband.
It should be noted that an audio signal is time-sequential, "serialized" data: adjacent signals are correlated. To exploit this context correlation, the DNN neural network (i.e., the DNN model) uses frame splicing to take the influence of context-related information on the current frame into account. Specifically, assume the feature parameter vector extracted from the current frame is $\vec{x}_l$. When splicing, m frames are selected on each side of the current frame to form a superframe feature vector $\vec{X}_l$ as the input of the DNN model, where $\vec{X}_l$ is represented as follows:

$$\vec{X}_l = \bigl[\vec{x}_{l-m}^{\,T}, \ldots, \vec{x}_{l-1}^{\,T}, \vec{x}_{l}^{\,T}, \vec{x}_{l+1}^{\,T}, \ldots, \vec{x}_{l+m}^{\,T}\bigr]^{T} \tag{10}$$
for example, in order to fully utilize the context correlation of the audio signal (i.e., the correlation between a plurality of consecutive audio signal frames), the audio processing apparatus may adopt a frame splicing strategy to input a plurality of audio signal frames in the DNN neural network after obtaining the low-frequency feature information of each audio signal frame. E.g. when framing it is forward andselecting 3 frames backward, and forming a superframe feature vector by a total of 7 frame feature vectors including the current frame feature
Figure BDA0002922375680000175
As input to the DNN model, its dimension is 64 × 7 — 448, i.e.:
Figure BDA0002922375680000176
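A sketch of the splicing of equation (11), with assumed edge handling (frames near the signal boundary are padded by repetition, which the patent does not specify):

```python
# Build a 448-dim superframe from per-frame 64-dim feature vectors.
import numpy as np

feats = np.random.randn(46, 64)   # stand-in per-frame feature vectors

def superframe(feats, l, m=3):
    idx = np.clip(np.arange(l - m, l + m + 1), 0, len(feats) - 1)  # edge padding (assumed)
    return feats[idx].reshape(-1)  # 7 x 64 = 448-dim DNN input

X_l = superframe(feats, l=10)
assert X_l.shape == (448,)
```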
example 6, a 96kHz sampled high definition audio is generated by a 48kHz sampled full band audio (i.e., the first audio signal). Assuming that each audio signal frame corresponds to 64 subband signals (first subband signals), wherein the subbands 32-63 are high-frequency subband signals, after each audio signal frame is processed by the DNN neural network, the signal gain of the output high-frequency subband signal is obtained
Figure BDA0002922375680000177
Is a 32-dimensional feature vector, and the mathematical expression thereof is as follows:
Figure BDA0002922375680000178
exemplarily, the hyper-parameter settings of the DNN neural network described above are shown in table 4.
Figure BDA0002922375680000179
TABLE 4 hyper-parameters of the DNN neural model
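Since Table 4 survives only as an image, the following PyTorch sketch of the envelope-predictor DNN uses assumed hidden sizes and activations; only the input dimension (448) and output dimension (32) follow from the text:

```python
import torch
import torch.nn as nn

class EnvelopePredictor(nn.Module):
    """Hypothetical DNN: 448-dim spliced features in, 32 subband gains out."""
    def __init__(self, in_dim=448, hidden=512, out_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),   # predicted gains G[32]..G[63]
        )

    def forward(self, x):
        return self.net(x)

model = EnvelopePredictor()
G = model(torch.randn(1, 448))   # gains for one superframe
```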
Further optionally, in the embodiments of the present application, the step 105 includes the following step 105a:
Step 105a: According to the high-frequency feature information of each audio signal frame, performing spectrum adjustment on the H high-frequency subband signals in each audio signal frame to obtain H target high-frequency subband signals.
Wherein the M target high frequency subband signals include H target high frequency subband signals of each of the audio signal frames.
Exemplarily, let the kth high-frequency subband signal among the H high-frequency subband signals obtained by the high-frequency generator be $\tilde{X}[k][m]$, with total energy $\widetilde{En}_k$, and let the kth high-frequency subband gain obtained by the envelope predictor be G[k]. The kth reconstructed high-frequency subband signal (i.e., the target high-frequency subband signal) obtained by the envelope adjuster is X[k][m], where:

$$X[k][m] = G[k]\,\tilde{X}[k][m], \quad k_l \le k \le k_h,\; 0 \le m \le N-1 \tag{13}$$

where N is the frame length of the MDCT coefficients of one PQMF subband frame, and k_l and k_h are the start index and end index of the high-frequency PQMF subbands, respectively.
Example 7: 96 kHz sampled high-definition audio is generated from 48 kHz sampled full-band audio (i.e., the first audio signal). Assuming each audio signal frame corresponds to 64 subband signals (first subband signals), of which subbands 32 to 63 are high-frequency subband signals, let the kth high-frequency subband signal obtained by the high-frequency generator be $\tilde{X}[k][m]$, with total energy $\widetilde{En}_k$, and let the kth high-frequency subband gain obtained by the envelope predictor be G[k]; the kth reconstructed high-frequency subband signal obtained by the envelope adjuster is then:

$$X[k][m] = G[k]\,\tilde{X}[k][m], \quad 32 \le k \le 63,\; 0 \le m \le 31$$
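Under the reconstruction of formula (13), the envelope adjuster reduces to a per-subband scaling, sketched below with placeholder arrays:

```python
# Scale each generated high-frequency subband spectrum by its predicted gain.
import numpy as np

X_gen = np.random.randn(32, 32)   # generated subbands 32..63, 32 MDCT bins each
G = np.abs(np.random.randn(32))   # predicted gains G[32]..G[63]
X_target = G[:, None] * X_gen     # target high-frequency subband spectra
```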
further optionally, in this embodiment of the application, after the processed second audio signal is framed, the audio signal at the boundary between two adjacent frames may generate a large amplitude difference, so that the audio signal is discontinuous, and further noise is generated. To eliminate this noise, the X audio signal frames may be denoised.
After step 103a, the signal processing method according to the embodiments of the present application further includes the following step C1:
Step C1: Performing signal processing on the N first subband signals in two adjacent audio signal frames among the X audio signal frames to obtain the N processed first subband signals.
Illustratively, the processed first subband signal includes a low frequency subband signal in each of the audio signal frames.
Illustratively, the signal processing may include an MDCT transform. Specifically, when performing the MDCT transform, the two first subband signals occupying the same frequency band in the two adjacent audio signal frames may be obtained in sequence, and windowing and the MDCT transform may then be applied to these two first subband signals to obtain one first subband signal with MDCT spectral coefficients (i.e., a spectrum).
For convenience of subsequent description, the two first subband signals with the same frequency band in the two adjacent audio signal frames are denoted as two related subband signals.
Furthermore, each subband signal includes N sample points. When performing the MDCT transform, the N sample points of the current frame's input sequence (i.e., x(n)) are combined with the N sample points of the adjacent frame's input sequence to form 2N sample points; the 2N-point signal is windowed, and the MDCT transform is then applied to the windowed signal to obtain N MDCT spectral coefficients.
The expression for the MDCT is as follows:

$$X[m] = \sum_{n=0}^{2N-1} w(n)\,x(n)\cos\!\left(\frac{\pi}{N}\left(n+\frac{1}{2}+\frac{N}{2}\right)\!\left(m+\frac{1}{2}\right)\right), \quad m = 0, 1, \ldots, N-1 \tag{14}$$
illustratively, when windowing the signal, the window function selects a sine window, which is defined as:
Figure BDA0002922375680000192
example 8, a 96kHz sampled high definition audio is generated by a 48kHz sampled full band audio (i.e., the first audio signal). Assuming that each audio signal frame corresponds to 64 subband signals (first subband signals), wherein each subband signal comprises 32 sample points, after windowing and MDCT transforming the above-mentioned related two subband signals, each subband signal obtains MDCT spectral coefficients of 32 sample points, denoted Xl[k][m]Where k denotes a subband number in the range 0. ltoreq. k.ltoreq.63, m denotes an MDCT spectrum number in the range 0. ltoreq. m.ltoreq.31, and l denotes an audio signal frame number.
Further optionally, in the embodiments of the application, in combination with the step 103a, the step 106 includes the following steps 106a and 106b:
Step 106a: Synthesizing the H target high-frequency subband signals in each audio signal frame to obtain a fourth audio signal corresponding to each audio signal frame.
Step 106b: Synthesizing the fourth audio signals corresponding to the audio signal frames to obtain the target audio signal.
For example, the audio processing apparatus may synthesize H target high-frequency subband signals in each audio signal frame through upsampling and filtering processing, so as to obtain a fourth audio signal corresponding to each audio signal frame.
Further, in the case of synthesizing the H target high frequency subband signals in each of the audio signal frames, the audio processing apparatus first up-samples each subband signal by N times, and then converts the up-sampled subband signals into time-domain signals by the PQMF synthesis filter bank.
The mathematical expressions of the PQMF synthesis filter bank used in the embodiments of the present application have been described above, and are not described herein again.
Further optionally, in the embodiments of the present application, in the case where the MDCT transform has been performed on the above N first subband signals, the audio processing apparatus may perform an inverse MDCT transform (i.e., IMDCT) on the spectrally adjusted H high-frequency subband signals to restore the subband signals in each audio signal frame.
With reference to the step 103a and the step C1, after performing spectrum adjustment on the H high-frequency subband signals in each audio signal frame in the step 105a, the audio signal processing method according to the embodiment of the present application further includes the following step D1:
step D1: and performing IMDCT (inverse discrete cosine transform) on the H high-frequency sub-band signals subjected to the frequency spectrum adjustment to obtain a sub-band reconstruction signal corresponding to each high-frequency sub-band signal.
Wherein the H target high frequency subband signals include the subband reconstruction signal.
For example, when performing the IMDCT transform on the processed first subband signals, the audio processing apparatus performs the IMDCT transform and an overlap-add operation on the MDCT spectral coefficients of each subband to obtain the N subband reconstruction signals x'_l[k][n] of the current frame l, where k is the subband index (0 ≤ k ≤ 63), n is the index of the time-series sample point in each subband (0 ≤ n ≤ 31), and l is the audio signal frame index.
The expression for the IMDCT is as follows:

$$\hat{x}(n) = \frac{2}{N}\sum_{m=0}^{N-1} X[m]\cos\!\left(\frac{\pi}{N}\left(n+\frac{1}{2}+\frac{N}{2}\right)\!\left(m+\frac{1}{2}\right)\right), \quad n = 0, 1, \ldots, 2N-1 \tag{16}$$

where w(n) is the window function. An overlap-add operation is performed on the windowed IMDCT output signal $\hat{x}_l(n)$ to obtain the subband reconstruction signal x'_l(n) of the current frame l:

$$x'_l(n) = w(n)\,\hat{x}_l(n) + w(n+N)\,\hat{x}_{l-1}(n+N), \quad n = 0, 1, \ldots, N-1 \tag{17}$$
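A matching sketch of the IMDCT of equation (16) and the overlap-add of equation (17); the normalization follows the reconstruction above and is an assumption:

```python
import numpy as np

def imdct(X):
    """IMDCT of N coefficients into 2N time samples, eq. (16)."""
    N = len(X)
    n = np.arange(2 * N)[:, None]
    m = np.arange(N)[None, :]
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (m + 0.5))
    return (2.0 / N) * (basis * X[None, :]).sum(axis=1)

N = 32
w = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))   # sine window, eq. (15)
xhat_cur = imdct(np.random.randn(N))                     # current frame l
xhat_prev = imdct(np.random.randn(N))                    # previous frame l-1
# eq. (17): overlap-add of windowed halves yields N reconstructed samples
x_rec = w[:N] * xhat_cur[:N] + w[N:] * xhat_prev[N:]
```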
it should be noted that, an overall flow chart of the audio processing method provided in the embodiment of the present application is shown in fig. 8.
It should be noted that, in the audio processing method provided in the embodiments of the present application, the execution body may be an audio processing apparatus, or a control module in the audio processing apparatus for executing the audio processing method. In the embodiments of the present application, an audio processing apparatus executing the audio processing method is taken as an example to describe the audio processing apparatus provided in the embodiments of the present application.
An embodiment of the present application provides an audio processing apparatus, as shown in fig. 9, the apparatus includes: a processing module 801, a generating module 802 and a synthesizing module 803, wherein:
the processing module 801 is configured to perform resolution enhancement processing on the first audio signal to obtain a second audio signal; the processing module 801 is further configured to perform low-pass filtering processing on the second audio signal to obtain a processed second audio signal; the processing module 801 is further configured to perform filtering processing and downsampling processing on the processed second audio signal to obtain a first sub-band signal with the same Y bandwidth; the generating module 802 is configured to generate M high-frequency subband signals according to a low-frequency subband signal of the Y first subband signals obtained by the processing module 801; the processing module 801 is further configured to perform spectrum adjustment on the M high-frequency subband signals generated by the generating module 802 based on the high-frequency feature information of the first audio signal, so as to obtain M target high-frequency subband signals; the synthesis module 803 is configured to synthesize the M target high-frequency subband signals obtained by the processing module 801 to obtain a target audio signal; wherein Y, M is a positive integer.
In the audio processing apparatus provided in this embodiment of the application, the apparatus may perform resolution enhancement processing on a first audio signal with low resolution (e.g., a wideband/full-band non-speech signal) to obtain a second audio signal with high resolution; perform low-pass filtering processing on the second audio signal to filter out the high-frequency signal in the second audio signal; perform signal processing on the processed second audio signal to obtain Y first subband signals with the same bandwidth; generate M high-frequency subband signals according to a low-frequency subband signal in the Y first subband signals; perform spectrum adjustment on the M high-frequency subband signals based on the high-frequency feature information of the low-resolution first audio signal to obtain M target high-frequency subband signals that retain the harmonic characteristics of the high-frequency part; and synthesize the M target high-frequency subband signals into a well-reconstructed signal. In this way, a high-definition audio signal with high fidelity and high expressive force can be obtained, and the playing effect of non-speech signals is improved.
Optionally, in this embodiment of the application, the generating module 802 is specifically configured to perform spectrum replication on all low-frequency subband signals in the Y first subband signals to generate the M high-frequency subband signals, where one low-frequency subband signal corresponds to at least one high-frequency subband signal, and Y is less than or equal to M.
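A minimal sketch of one possible replication rule follows; the cyclic low-to-high mapping is an assumption that merely satisfies the stated constraint that each low-frequency subband corresponds to at least one high-frequency subband, with Y less than or equal to M.

import numpy as np

def replicate_spectrum(low_bands, M):
    """Map the Y low-frequency subbands onto M high-frequency subbands by
    cyclic repetition (assumed rule; Y <= M, each low band used at least once)."""
    Y = len(low_bands)
    return [low_bands[i % Y].copy() for i in range(M)]

# Usage: four low bands replicated onto eight high bands (1,2,3,4,1,2,3,4).
low = [np.ones(16) * (i + 1) for i in range(4)]
high = replicate_spectrum(low, 8)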
Optionally, in this embodiment of the application, the audio processing apparatus further includes: an extraction module 804 and a prediction module 805;
the extracting module 804 is configured to perform feature extraction on the first audio signal to obtain low-frequency feature information of the first audio signal; the prediction module 805 is configured to input the low-frequency feature information extracted by the extraction module into a preset neural network model, and predict high-frequency feature information of the first audio signal.
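As a hedged illustration of the extraction and prediction steps, the sketch below uses per-band log-energies as the low-frequency feature information and a two-layer MLP standing in for the preset neural network model; the feature definition and the weights (W1, b1, W2, b2, here random) are assumptions, since the embodiment does not disclose the network architecture.

import numpy as np

def extract_low_freq_features(frame, n_bands=8):
    """Feature extraction: per-band log-energies of the low spectrum
    (one plausible choice; the embodiment does not fix the feature set)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(spec, n_bands)
    return np.log1p([b.mean() for b in bands])

def predict_high_freq_features(feats, W1, b1, W2, b2):
    """Preset neural network model, sketched as a two-layer MLP whose
    weights are assumed to be pre-trained."""
    h = np.maximum(0, feats @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2                   # predicted high-band gains/envelope

# Demo with random (untrained) weights, shapes only for illustration.
rng = np.random.default_rng(0)
frame = rng.standard_normal(2048)
feats = extract_low_freq_features(frame)
W1, b1 = rng.standard_normal((8, 16)), np.zeros(16)
W2, b2 = rng.standard_normal((16, 8)), np.zeros(8)
gains = predict_high_freq_features(feats, W1, b1, W2, b2)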
Optionally, in this embodiment of the application, the processing module 801 is specifically configured to perform L-fold upsampling on the first audio signal to obtain a second audio signal with a predetermined sampling rate, where the bandwidths of the first audio signal and the second audio signal are the same.
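A minimal sketch of L-fold upsampling that leaves the bandwidth unchanged: zero insertion raises the sampling rate, and a low-pass filter at the original Nyquist frequency removes the resulting spectral images, so only the sampling rate (not the bandwidth) changes. The zero-insertion-plus-FIR realization and the filter length are assumptions.

import numpy as np
from scipy import signal

def upsample_L(x, L=2):
    """L-fold upsampling: insert L-1 zeros between samples, then low-pass
    at the original Nyquist so the bandwidth stays the same."""
    y = np.zeros(len(x) * L)
    y[::L] = x
    fir = signal.firwin(101, 1.0 / L) * L   # gain L compensates zero insertion
    return signal.lfilter(fir, 1.0, y)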
Optionally, in this embodiment of the application, the processing module 801 is further configured to perform framing on the low-frequency component of the second audio signal to obtain X audio signal frames, where each audio signal frame includes a predetermined number of sample points; the processing module 801 is specifically configured to perform filtering and downsampling on each audio signal frame in sequence to obtain N first subband signals corresponding to each audio signal frame; wherein the Y first subband signals include: n first subband signals corresponding to each audio signal frame.
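The framing step can be sketched as follows. The frame length of 2048 sample points is an assumed value, chosen because 64 subbands × 32 samples matches the index ranges 0 ≤ k ≤ 63 and 0 ≤ n ≤ 31 used in the IMDCT example above; the FFT-based split stands in for the embodiment's filtering and downsampling.

import numpy as np

def frame_signal(x, frame_len=2048):
    """Framing: split the low-frequency component into X audio signal
    frames of a predetermined number of sample points."""
    X = len(x) // frame_len
    return x[:X * frame_len].reshape(X, frame_len)

def analyze_frame(frame, N=64):
    """Split one frame into N subband signals. Sketched as an even split
    of the FFT spectrum; the embodiment uses filtering + downsampling."""
    spec = np.fft.rfft(frame)[:-1]   # drop the Nyquist bin for an even split
    return np.split(spec, N)         # N equal-width subbands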
Optionally, in this embodiment of the application, the processing module 801 is specifically configured to perform signal processing on N first subband signals of a first audio signal frame and N first subband signals of a second audio signal frame, so as to obtain N processed first subband signals; wherein the first audio signal frame and the second audio signal frame are adjacent audio signal frames in the X audio signal frames.
The audio processing apparatus in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The apparatus may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a Personal Digital Assistant (PDA), and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine, a self-service machine, or the like, which is not specifically limited in the embodiments of the present application.
The audio processing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The audio processing apparatus provided in the embodiment of the present application can implement each process implemented by the method embodiments in fig. 1 to fig. 8, and is not described herein again to avoid repetition.
Optionally, as shown in fig. 10, an electronic device 900 is further provided in this embodiment of the present application, and includes a processor 901, a memory 902, and a program or an instruction stored in the memory 902 and executable on the processor 901, where the program or the instruction is executed by the processor 901 to implement each process of the foregoing audio processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 11 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 100 includes, but is not limited to: a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.
Those skilled in the art will appreciate that the electronic device 100 may further include a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to the processor 110 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 11 does not constitute a limitation of the electronic device, and the electronic device may include more or fewer components than those shown, or combine some components, or arrange different components, and thus, the description thereof is omitted.
The processor 110 is configured to perform resolution enhancement processing on the first audio signal to obtain a second audio signal; the processor 110 is further configured to perform low-pass filtering processing on the second audio signal to obtain a processed second audio signal, perform filtering processing and downsampling processing on the processed second audio signal to obtain Y first subband signals with the same bandwidth, and generate M high-frequency subband signals according to a low-frequency subband signal in the Y first subband signals; the processor 110 is further configured to perform spectrum adjustment on the generated M high-frequency subband signals based on the high-frequency feature information of the first audio signal to obtain M target high-frequency subband signals, and synthesize the M target high-frequency subband signals to obtain a target audio signal; wherein Y and M are positive integers.
In the electronic device provided in this embodiment of the application, the electronic device may perform resolution enhancement processing on a first audio signal with low resolution (e.g., a wideband/full-band non-speech signal) to obtain a second audio signal with high resolution; perform low-pass filtering processing on the second audio signal to filter out the high-frequency signal in the second audio signal; perform signal processing on the processed second audio signal to obtain Y first subband signals with the same bandwidth; generate M high-frequency subband signals according to a low-frequency subband signal in the Y first subband signals; perform spectrum adjustment on the M high-frequency subband signals based on the high-frequency feature information of the low-resolution first audio signal to obtain M target high-frequency subband signals that retain the harmonic characteristics of the high-frequency part; and synthesize the M target high-frequency subband signals into a well-reconstructed signal. In this way, a high-definition audio signal with high fidelity and high expressive force can be obtained, and the playing effect of non-speech signals is improved.
Optionally, in this embodiment of the application, the processor 110 is specifically configured to perform spectrum replication on all low-frequency subband signals in the Y first subband signals to generate the M high-frequency subband signals, where one low-frequency subband signal corresponds to at least one high-frequency subband signal, and Y is less than or equal to M.
Optionally, in this embodiment of the application, the processor 110 is further configured to perform feature extraction on the first audio signal to obtain low-frequency feature information of the first audio signal, input the low-frequency feature information into a preset neural network model, and predict high-frequency feature information of the first audio signal.
Optionally, in this embodiment of the application, the processor 110 is specifically configured to perform L-fold upsampling on the first audio signal to obtain a second audio signal with a predetermined sampling rate, where the bandwidths of the first audio signal and the second audio signal are the same.
Optionally, in this embodiment of the application, the processor 110 is further configured to perform framing on the low-frequency component of the second audio signal to obtain X audio signal frames, where each audio signal frame includes a predetermined number of sample points, and sequentially perform filtering and downsampling on each audio signal frame to obtain N first subband signals corresponding to each audio signal frame; wherein the Y first subband signals include: n first subband signals corresponding to each audio signal frame.
Optionally, in this embodiment of the application, the processor 110 is specifically configured to perform signal processing on N first subband signals of a first audio signal frame and N first subband signals of a second audio signal frame, so as to obtain N processed first subband signals; wherein the first audio signal frame and the second audio signal frame are adjacent audio signal frames in the X audio signal frames.
It should be understood that, in the embodiment of the present application, the input unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042, and the Graphics Processing Unit 1041 processes image data of a still picture or a video obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071 is also referred to as a touch screen. The touch panel 1071 may include two parts: a touch detection device and a touch controller. Other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. The memory 109 may be used to store software programs as well as various data including, but not limited to, application programs and an operating system. The processor 110 may integrate an application processor, which mainly handles operating systems, user interfaces, applications, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
An embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements the processes of the foregoing audio processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the above-mentioned audio processing method embodiment, and can achieve the same technical effect, and is not described herein again to avoid repetition.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a chip system, or a system-on-chip.
Embodiments of the application provide a computer program product stored on a non-volatile storage medium for execution by at least one processor to implement a method as described in the first aspect.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed; depending on the functions involved, the functions may also be performed in a substantially simultaneous manner or in a reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly can also be implemented by hardware, but in many cases the former is the preferred implementation. Based on such understanding, the technical solutions of the present application may be substantially or partially embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk), including instructions for enabling a terminal (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the scope of the invention as defined by the appended claims.

Claims (14)

1. A method of audio processing, the method comprising:
carrying out resolution enhancement processing on a first audio signal to obtain a second audio signal;
carrying out low-pass filtering processing on the second audio signal to obtain a processed second audio signal;
carrying out filtering processing and down-sampling processing on the processed second audio signal to obtain Y first sub-band signals with the same bandwidth;
generating M high-frequency subband signals according to the low-frequency subband signals in the Y first subband signals;
performing spectrum adjustment on the M high-frequency sub-band signals based on the high-frequency characteristic information of the first audio signal to obtain M target high-frequency sub-band signals;
synthesizing the M target high-frequency sub-band signals to obtain target audio signals;
wherein Y and M are positive integers.
2. The method of claim 1, wherein the generating M high-frequency subband signals according to the low-frequency subband signals in the Y first subband signals comprises:
performing spectrum replication on all low-frequency subband signals in the Y first subband signals to generate the M high-frequency subband signals, wherein one low-frequency subband signal corresponds to at least one high-frequency subband signal, and Y is less than or equal to M.
3. The method according to claim 1, wherein before performing spectral adjustment on the M high-frequency subband signals based on the high-frequency feature information of the first audio signal to obtain M target high-frequency subband signals, the method further comprises:
extracting the characteristics of the first audio signal to obtain low-frequency characteristic information of the first audio signal;
inputting the low-frequency characteristic information into a preset neural network model to predict the high-frequency characteristic information of the first audio signal.
4. The method of claim 1, wherein performing resolution enhancement processing on the first audio signal to obtain a second audio signal comprises:
performing L-fold upsampling on the first audio signal to obtain the second audio signal with a predetermined sampling rate, wherein the bandwidth of the first audio signal is the same as that of the second audio signal.
5. The method of claim 1, wherein performing signal processing on the low-frequency component of the second audio signal to obtain the Y first subband signals with the same bandwidth comprises:
framing the low-frequency component of the second audio signal to obtain X audio signal frames, wherein each audio signal frame comprises a preset number of sample points;
sequentially filtering and downsampling each audio signal frame to obtain N first sub-band signals corresponding to each audio signal frame;
wherein the Y first subband signals comprise: n first subband signals corresponding to each audio signal frame.
6. The method of claim 5, wherein after the framing the low frequency component of the second audio signal to obtain X audio signal frames, the method further comprises:
carrying out signal processing on N first sub-band signals in a first audio signal frame and N first sub-band signals in a second audio signal frame to obtain processed N first sub-band signals; wherein the first audio signal frame and the second audio signal frame are adjacent audio signal frames of the X audio signal frames.
7. An audio processing apparatus, characterized in that the apparatus comprises: a processing module, a generating module and a synthesizing module, wherein:
the processing module is used for carrying out resolution enhancement processing on the first audio signal to obtain a second audio signal;
the processing module is further configured to perform low-pass filtering processing on the second audio signal to obtain a processed second audio signal;
the processing module is further configured to perform filtering processing and downsampling processing on the processed second audio signal to obtain Y first subband signals with the same bandwidth;
the generating module is configured to generate M high-frequency subband signals according to a low-frequency subband signal in the Y first subband signals obtained by the processing module;
the processing module is further configured to perform spectrum adjustment on the M high-frequency subband signals generated by the generating module based on the high-frequency feature information of the first audio signal to obtain M target high-frequency subband signals;
the synthesis module is used for synthesizing the M target high-frequency subband signals obtained by the processing module to obtain target audio signals;
wherein Y and M are positive integers.
8. The apparatus of claim 7,
the generating module is specifically configured to perform spectrum replication on all low-frequency subband signals in the Y first subband signals to generate the M high-frequency subband signals, where one low-frequency subband signal corresponds to at least one high-frequency subband signal, and Y is less than or equal to M.
9. The apparatus of claim 7, wherein the audio processing apparatus further comprises: an extraction module and a prediction module;
the extraction module is used for extracting the characteristics of the first audio signal to obtain low-frequency characteristic information of the first audio signal;
the prediction module is configured to input the low-frequency feature information extracted by the extraction module into a preset neural network model, and predict high-frequency feature information of the first audio signal.
10. The apparatus of claim 7,
the processing module is specifically configured to perform L-fold upsampling on the first audio signal to obtain a second audio signal with a predetermined sampling rate, where bandwidths of the first audio signal and the second audio signal are the same.
11. The apparatus of claim 7,
the processing module is further configured to frame the low-frequency component of the second audio signal to obtain X audio signal frames, where each audio signal frame includes a predetermined number of sample points;
the processing module is specifically configured to perform filtering and downsampling processing on each audio signal frame in sequence to obtain N first subband signals corresponding to each audio signal frame;
wherein the Y first subband signals comprise: n first subband signals corresponding to each audio signal frame.
12. The apparatus of claim 11,
the processing module is specifically configured to perform signal processing on N first subband signals of a first audio signal frame and N first subband signals of a second audio signal frame to obtain N processed first subband signals; wherein the first audio signal frame and the second audio signal frame are adjacent audio signal frames of the X audio signal frames.
13. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the audio processing method of any of claims 1-6.
14. A readable storage medium, characterized in that it stores thereon a program or instructions which, when executed by a processor, implement the steps of the audio processing method according to any one of claims 1 to 6.
CN202110121348.6A 2021-01-28 2021-01-28 Audio processing method and device and electronic equipment Active CN113299313B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110121348.6A CN113299313B (en) 2021-01-28 2021-01-28 Audio processing method and device and electronic equipment
PCT/CN2022/074795 WO2022161475A1 (en) 2021-01-28 2022-01-28 Audio processing method and apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110121348.6A CN113299313B (en) 2021-01-28 2021-01-28 Audio processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113299313A true CN113299313A (en) 2021-08-24
CN113299313B CN113299313B (en) 2024-03-26

Family

ID=77318871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110121348.6A Active CN113299313B (en) 2021-01-28 2021-01-28 Audio processing method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN113299313B (en)
WO (1) WO2022161475A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022161475A1 (en) * 2021-01-28 2022-08-04 维沃移动通信有限公司 Audio processing method and apparatus, and electronic device
WO2024061286A1 (en) * 2022-09-23 2024-03-28 维沃移动通信有限公司 Audio signal processing method and apparatus, electronic device, and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004053940A (en) * 2002-07-19 2004-02-19 Matsushita Electric Ind Co Ltd Audio decoding device and method
CN1606687A (en) * 2002-09-19 2005-04-13 松下电器产业株式会社 Audio decoding apparatus and method
CN101471072A (en) * 2007-12-27 2009-07-01 华为技术有限公司 High-frequency reconstruction method, encoding module and decoding module
CN106057220A (en) * 2016-05-19 2016-10-26 Tcl集团股份有限公司 Audio signal high frequency expansion method and audio frequency player
CN107221334A (en) * 2016-11-01 2017-09-29 武汉大学深圳研究院 The method and expanding unit of a kind of audio bandwidth expansion
CN107393552A (en) * 2013-09-10 2017-11-24 华为技术有限公司 Adaptive bandwidth extended method and its device
CN110556121A (en) * 2019-09-18 2019-12-10 腾讯科技(深圳)有限公司 Frequency band extension method, device, electronic equipment and computer readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2981539C (en) * 2010-12-29 2020-08-25 Samsung Electronics Co., Ltd. Apparatus and method for encoding/decoding for high-frequency bandwidth extension
US9922660B2 (en) * 2013-11-29 2018-03-20 Sony Corporation Device for expanding frequency band of input signal via up-sampling
CN105280189B (en) * 2015-09-16 2019-01-08 深圳广晟信源技术有限公司 The method and apparatus that bandwidth extension encoding and decoding medium-high frequency generate
CN105513601A (en) * 2016-01-27 2016-04-20 武汉大学 Method and device for frequency band reproduction in audio coding bandwidth extension
EP3382704A1 (en) * 2017-03-31 2018-10-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for determining a predetermined characteristic related to a spectral enhancement processing of an audio signal
CN113299313B (en) * 2021-01-28 2024-03-26 维沃移动通信有限公司 Audio processing method and device and electronic equipment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004053940A (en) * 2002-07-19 2004-02-19 Matsushita Electric Ind Co Ltd Audio decoding device and method
CN1606687A (en) * 2002-09-19 2005-04-13 松下电器产业株式会社 Audio decoding apparatus and method
CN101471072A (en) * 2007-12-27 2009-07-01 华为技术有限公司 High-frequency reconstruction method, encoding module and decoding module
WO2009089728A1 (en) * 2007-12-27 2009-07-23 Huawei Technologies Co., Ltd. Method for high frequency band replication, coder and decoder thereof
CN107393552A (en) * 2013-09-10 2017-11-24 华为技术有限公司 Adaptive bandwidth extended method and its device
CN106057220A (en) * 2016-05-19 2016-10-26 Tcl集团股份有限公司 Audio signal high frequency expansion method and audio frequency player
CN107221334A (en) * 2016-11-01 2017-09-29 武汉大学深圳研究院 The method and expanding unit of a kind of audio bandwidth expansion
CN110556121A (en) * 2019-09-18 2019-12-10 腾讯科技(深圳)有限公司 Frequency band extension method, device, electronic equipment and computer readable storage medium


Also Published As

Publication number Publication date
CN113299313B (en) 2024-03-26
WO2022161475A1 (en) 2022-08-04

Similar Documents

Publication Publication Date Title
US20230160015A1 (en) Oversampling in a combined transposer filterbank
US8971551B2 (en) Virtual bass synthesis using harmonic transposition
EP3989223B1 (en) Efficient combined harmonic transposition
CN104318930B (en) Sub-band processing unit and the method for generating synthesized subband signal
WO2022161475A1 (en) Audio processing method and apparatus, and electronic device
CN112259116B (en) Noise reduction method and device for audio data, electronic equipment and storage medium
AU2013286049A1 (en) Device, method and computer program for freely selectable frequency shifts in the sub-band domain
EP2720477B1 (en) Virtual bass synthesis using harmonic transposition
Nakamura et al. Time-domain audio source separation based on Wave-U-Net combined with discrete wavelet transform
Goodwin et al. Frequency-domain algorithms for audio signal enhancement based on transient modification
CN112309425A (en) Sound tone changing method, electronic equipment and computer readable storage medium
CN113611321B (en) Voice enhancement method and system
Sueur et al. Introduction to Frequency Analysis: The Fourier Transformation
Yecchuri et al. Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant