CN111210831A - Bandwidth extension audio coding and decoding method and device based on spectrum stretching - Google Patents

Publication number
CN111210831A
Authority
CN
China
Prior art keywords
frequency
domain grid
frequency domain
spectrum
bandwidth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811397265.4A
Other languages
Chinese (zh)
Inventor
闫建新
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digital Rise Technology Co Ltd
Original Assignee
Digital Rise Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digital Rise Technology Co Ltd
Priority to CN201811397265.4A
Publication of CN111210831A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032 Quantisation or dequantisation of spectral components
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26 Pre-filtering or post-filtering

Abstract

The invention relates to a bandwidth extension audio coding and decoding method and device based on spectrum stretching, and in particular to an audio coding method and device for bandwidth extension. The audio has a low-frequency part and a high-frequency part, and the method comprises the steps of: performing frequency domain grid division on the high-frequency part to obtain frequency domain grid regions; selecting, in the low-frequency part, copy spectra respectively matched with M of the frequency domain grid regions, where M is a natural number; determining a stretching factor, the stretching factor being a linear stretching factor relating a frequency domain grid region among the M frequency domain grid regions to its matched copy spectrum; and transmitting stretching information related to the stretching factor to a decoding end. The invention also relates to an audio decoding method and device for bandwidth extension.

Description

Bandwidth extension audio coding and decoding method and device based on spectrum stretching
Technical Field
Embodiments of the present invention relate to digital audio encoding and decoding technologies, and in particular, to an audio encoding method and apparatus for bandwidth extension, and an audio decoding method and apparatus for bandwidth extension.
Background
Conventional perceptual audio coding techniques (DRA, AAC, MP3, etc.) typically operate at stereo bit rates of 96-128 kbps; below 64 kbps for stereo, the coded audio shows significant subjectively perceptible distortion. Typical coding rates for FM broadcast applications are 48 kbps to 64 kbps for stereo, a range in which the subjective sound quality of conventional perceptual audio coding cannot meet FM broadcast requirements.
To address this, BandWidth Extension (BWE) coding techniques for digital audio signals have been proposed. As shown in FIG. 1, the low-frequency part of a full-band mono audio signal is encoded using conventional perceptual audio coding (such as AAC or DRA), and the high-frequency part is parametrically encoded using BWE, thereby achieving low-rate audio coding.
Many bandwidth extension coding techniques exist, and their performance varies. The bandwidth extension coding techniques that have been published and adopted in international standards mainly comprise two coding algorithms:
The first bandwidth extension coding technique is Spectral Band Replication (SBR) coding, as described in ISO/IEC 14496-3 (MPEG-4). FIG. 2 shows a detailed functional block diagram of SBR encoding. SBR is a frequency-domain processing algorithm whose coding principle is as follows: each frame of the signal is passed through a 64-subband Quadrature Mirror Filter (QMF) bank to obtain 64 uniform subbands, each containing 32 samples; a suitable time-frequency grid is then chosen according to the transient characteristics of the current signal, and an energy value is computed for each grid cell and Huffman coded. The algorithm also includes tonality detection and the transmission of parameters for individual sinusoidal signals.
FIG. 3 shows a detailed functional block diagram of SBR decoding. The decoding principle of SBR is as follows: the decoded PCM signal output by the core decoder (AAC) is passed through a 32-subband QMF to obtain 32 uniform subbands, each containing 32 samples; high-frequency generation is performed according to the control parameters output by SBR demultiplexing; the generated high-frequency part is then adjusted according to the control parameters and envelope data; finally, the output of the low-frequency 32-subband QMF and the adjusted high-frequency QMF subbands are fed together into a 64-band QMF synthesis, which outputs a full-band PCM audio signal.
A schematic diagram of the generation of the high-frequency subband m from the low-frequency subband k in SBR is shown in FIG. 4. In SBR, the high-frequency subband m is generated from the low-frequency subband k according to:
$$x[m][n] = x[k][n] + bw(k)\cdot a_0\cdot x[k][n-1] + (bw(k))^2\cdot a_1\cdot x[k][n-2] \tag{1}$$
where $a_0$ and $a_1$ are prediction coefficients, and $bw(k)$ is a bending factor in the range 0 to 0.98 whose specific value is determined by a control parameter: $bw(k)$ tends toward 0 when the high-frequency part is strongly tonal, and toward 0.98 when the tonality of the high-frequency part is weak or even noise-like.
As can be seen from formula (1):
when bw(k) is 0, the high-frequency subband m is generated by directly copying the low-frequency subband k;
when bw(k) is 0.98, the high-frequency subband m is generated from the prediction residual of the low-frequency subband k.
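The behaviour of formula (1) can be illustrated with a minimal sketch. It applies the formula sample by sample to one low-frequency subband; the function name `generate_high_subband` and the choice to treat samples before the start of the frame as zero are assumptions made for illustration, not details taken from the SBR standard.

```python
def generate_high_subband(x_low, bw, a0, a1):
    """Apply formula (1) to one low-frequency subband signal.

    x_low : list of subband samples x[k][n] (real or complex)
    bw    : bending factor bw(k), between 0 and 0.98
    a0,a1 : prediction coefficients
    Samples before n = 0 are taken as zero (assumption of this sketch).
    """
    out = []
    for n in range(len(x_low)):
        prev1 = x_low[n - 1] if n >= 1 else 0.0
        prev2 = x_low[n - 2] if n >= 2 else 0.0
        out.append(x_low[n] + bw * a0 * prev1 + bw ** 2 * a1 * prev2)
    return out


# With bw = 0 the two prediction terms vanish and the subband is copied unchanged.
copied = generate_high_subband([1.0, 2.0, 3.0], bw=0.0, a0=-1.0, a1=0.5)
```

With a non-zero `bw`, the prediction terms are mixed in, which is the "residual" end of the range described above.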
The main problem with high-frequency generation in SBR is therefore that the high-frequency details are obtained as a copy of the low-frequency part or of its residual. This approach works poorly when the low-frequency and high-frequency parts of the audio signal differ greatly; because the high-frequency details are recovered only coarsely, it is difficult to restore the whole high-frequency part with high quality.
The second bandwidth extension coding technique is the simple bandwidth extension technique included in the 3GPP AMR-WB+ coding method. It is a time-domain processing algorithm whose main coding principle is as follows: the input signal is divided into a low-frequency time-domain signal and a high-frequency time-domain signal of equal bandwidth; the low-frequency (LF) part is passed through Linear Predictive Coding (LPC) analysis filtering to obtain the residual of the low-frequency signal, and a high-frequency detail signal is simulated by high-frequency LPC synthesis filtering; this simulated signal is then compared with the actual high-frequency signal $S_{HF}(n)$ to obtain a gain vector (one gain value per subframe) for the high-frequency envelope (energy); finally, the gain vector is further corrected for gain consistency at the junction between the low-frequency and high-frequency parts, and the corrected gain vector is encoded. The information transmitted to the decoding end thus includes the corrected gain vector and the high-frequency LPC coefficients. The high-frequency decoding process of AMR-WB+ is essentially the inverse of encoding.
A schematic diagram of the high-frequency generation method in AMR-WB+ is shown in FIG. 5, where Fs denotes the signal sampling rate after resampling. Briefly, high-frequency generation in AMR-WB+ proceeds as follows: the resampled signal, with sampling rate Fs, is low-pass filtered and downsampled by a factor of 2 to obtain a low-frequency signal with sampling rate Fs/2; linear prediction is applied to the low-frequency signal to obtain a low-frequency residual signal; the spectrum of the residual signal is inverted and used to excite a high-frequency prediction filter, which generates the high-frequency signal.
In the AMR-WB+ bandwidth extension technique, the starting frequency of high-frequency generation is fixed and can only be Fs/4, which reduces the flexibility of the technique. For most signals, tonality is stronger toward low frequencies and weaker, even noise-like, toward high frequencies. However, as can be seen from FIG. 5, the highest-frequency portion of the bandwidth-extended signal is generated from the lowest-frequency portion of the core encoder signal, so for most signals this copying gives the high-frequency part of the AMR-WB+ bandwidth-extended signal strong tonality, which greatly reduces subjective quality.
In the SBR bandwidth extension codec algorithm, the details of the high-frequency signal are obtained during reconstruction either by copying the low-frequency part or by applying a simple second-order filter to it. Because the content of the high-frequency part being replaced is not taken into account, the resulting high-frequency detail either has the same envelope shape as the low-frequency part or, after filtering, is a flat spectrum close to white noise.
In addition, the AMR-WB+ bandwidth extension technique obtains the spectral envelope of the high-frequency part by applying LPC to it, but the LPC calculation adds computational complexity and coding the prediction coefficients consumes considerable bit rate. Since BWE is generally applied to low-rate audio coding, the bits spent on the LPC coefficients may leave too few bits for the low-frequency part, causing excessive low-frequency quantization distortion and degrading the overall subjective sound quality.
The general BWE decoding process is as follows: the high-frequency detail spectral coefficients are generated by copying from the low-frequency part, followed by filtering or spectral envelope shape adjustment (e.g., SBR or AMR-WB+ bandwidth extension), and finally gain adjustment (reconstructing the total energy of the high-frequency part).
The bandwidth (or number of spectral lines) of the low frequency part that is usually selected for copying is the same as the bandwidth (or number of spectral lines) of the high frequency part details of the target to be replaced.
When the audio coding rate is low, the upper frequency limit of the low-frequency coded part (usually coded by perceptual audio coding such as AAC or DRA) is low, i.e., the band coded by the core coder is narrow. When the high-frequency part to be coded by BWE is correspondingly wide, the low-frequency part may have to be copied two or more times in succession (as in SBR). The details of the reconstructed high-frequency spectral coefficients then usually deviate greatly from those of the original high-frequency spectral coefficients, which impairs the high-frequency reconstruction and ultimately reduces the overall subjective sound quality.
In addition, strongly harmonic audio signals contain rich higher harmonic components (overtones) besides the fundamental, which make the signal sound full, smooth, and bright (timbre). In BWE coding of such signals, the high-frequency part contains a large number of sinusoidal components, and coding these sinusoids individually requires a large amount of side information that cannot be afforded at low bit rates. How the low-frequency part is copied to the high-frequency part to reconstruct the high-frequency details is therefore important. Simple copying usually cannot ensure that the fundamental and lower harmonics in the low-frequency spectral lines land exactly on the positions of the higher harmonics in the high-frequency part of the original audio signal, so the timbre changes and high-frequency distortion results.
Disclosure of Invention
The present invention has been made to mitigate or solve at least one of the above-mentioned problems.
According to an aspect of an embodiment of the present invention, there is provided an audio encoding method for bandwidth extension, the audio having a low frequency part and a high frequency part, the method including the steps of: performing frequency domain grid division on the high-frequency part to obtain a frequency domain grid region; selecting copy spectrums correspondingly matched with M frequency domain grid regions in the frequency domain grid regions at the low-frequency part, wherein M is a natural number; determining a stretching factor, wherein the stretching factor is a linear stretching factor related to a frequency domain grid region in the M frequency domain grid regions and the matched corresponding copy frequency spectrum; and transmitting the stretch information related to the stretch factor to a decoding end.
According to another aspect of embodiments of the present invention, there is provided an audio decoding method for bandwidth extension, the audio having a low frequency part and a high frequency part, the method comprising the steps of: acquiring stretching information related to a stretching factor and determining the stretching factor, wherein the stretching factor is a linear stretching factor related to a frequency domain grid region in M frequency domain grid regions of a high-frequency part and a corresponding matched copy frequency spectrum in a low-frequency part, and M is a natural number; determining a matched copy spectrum from the low frequency portion based on the stretching information; stretching the matched copy frequency spectrum by the corresponding stretching factor times to obtain a stretched frequency spectrum; and copying the stretched spectrum to a corresponding position of a corresponding one of the M frequency domain grid regions.
According to still another aspect of embodiments of the present invention, there is provided an audio encoding apparatus for bandwidth extension, the audio having a low frequency part and a high frequency part, the apparatus including: the frequency domain grid division module is used for carrying out frequency domain grid division on the high-frequency part to obtain a frequency domain grid region; the matching module is used for selecting copy spectrums correspondingly matched with M frequency domain grid regions in the frequency domain grid regions at a low-frequency part, wherein M is a natural number; a stretching factor determination module for determining a stretching factor, which is a linear stretching factor related to a frequency domain grid region of the M frequency domain grid regions and the matched corresponding copy spectrum; and a transmitting module for transmitting the stretching information related to the stretching factor to the decoding end.
According to still another aspect of embodiments of the present invention, there is provided an audio decoding apparatus for bandwidth extension, the audio having a low frequency part and a high frequency part, the apparatus including: a stretching factor determining module, configured to obtain stretching information related to a stretching factor and determine the stretching factor, where the stretching factor is a linear stretching factor related to a frequency domain grid region among M frequency domain grid regions of the high frequency part and the correspondingly matched copy spectrum in the low frequency part, and M is a natural number; a copy spectrum determination module for determining the matched copy spectrum from the low frequency part; a stretching module for stretching the matched copy spectrum by the corresponding stretching factor to obtain a stretched spectrum; and a copying module for copying the stretched spectrum to the corresponding position of the corresponding frequency domain grid region among the M frequency domain grid regions.
Drawings
These and other features and advantages of the various embodiments of the disclosed invention will be better understood from the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a diagram of a low-rate audio coding framework based on bandwidth expansion in the prior art;
FIG. 2 is a schematic block diagram of SBR encoding in the prior art;
FIG. 3 is a schematic block diagram of SBR decoding in the prior art;
FIG. 4 is a schematic diagram of the generation of a high frequency subband m using a low frequency subband k in SBR;
FIG. 5 is a schematic diagram of a high frequency generation method in AMR-WB + in the prior art;
FIG. 6 is a flowchart of an audio encoding method for bandwidth extension according to an exemplary embodiment of the present invention;
FIG. 7 is a schematic block diagram of BWE encoding applying a high-spectrum envelope template according to an exemplary embodiment of the present invention;
FIG. 8 is a flowchart of an audio decoding method for bandwidth extension according to an exemplary embodiment of the present invention;
FIG. 9 is a schematic block diagram of BWE decoding applying a high-spectrum envelope template according to an exemplary embodiment of the present invention;
FIG. 10 is a schematic diagram of high frequency detail reconstruction of a generic audio signal according to an exemplary embodiment of the present invention;
FIG. 11 is a diagram illustrating grouping of stretch factors in the time-frequency direction according to an exemplary embodiment of the present invention;
FIG. 12 is a schematic diagram of high frequency detail reconstruction of a strong harmonic audio signal according to an exemplary embodiment of the present invention;
FIG. 13 is a schematic block diagram of spectral stretch based SBR encoding applied in SBR techniques in accordance with an exemplary embodiment of the present invention;
FIG. 14 is a schematic block diagram of spectral stretch based SBR decoding applied in SBR technology according to an exemplary embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further specifically described by the following embodiments and the accompanying drawings. In the specification, the same or similar reference numerals denote the same or similar components. The following description of the embodiments of the present invention with reference to the accompanying drawings is intended to explain the general inventive concept of the present invention and should not be construed as limiting the invention.
In the invention, the stretch-based coding method mainly comprises the following: at the BWE encoding end, based on the grid division, the signal characteristics of the low-frequency part and of the high-frequency part to be BWE-coded are compared, and a stretch factor α is determined (or a group of stretch factors, generally depending on the number of time grids in a frame); the stretch factor α is coded into α parameters (for example, the differences along the time direction may be quantized and then linearly coded, or the group of α values in a frame may be vector quantized and coded); and the α parameters are then packed into the BWE bitstream as stretch parameter information.
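One way to picture the differential option mentioned above is the following minimal sketch. It is not the patent's coding scheme: the uniform quantization step of 0.05 and the function names are hypothetical, and the sketch quantizes each α first and then takes differences of the quantizer indices along time (rather than quantizing the differences themselves), which keeps the decoder free of accumulated rounding drift.

```python
def encode_stretch_factors(alphas, step=0.05):
    """Quantize the stretch factors of one frame and difference them along time.

    alphas : stretch factors of consecutive time grids
    step   : uniform quantization step (hypothetical value)
    Returns the first quantizer index followed by index differences.
    """
    indices = [round(a / step) for a in alphas]
    return [indices[0]] + [indices[i] - indices[i - 1] for i in range(1, len(indices))]


def decode_stretch_factors(deltas, step=0.05):
    """Invert encode_stretch_factors: accumulate the differences and rescale."""
    alphas, index = [], 0
    for delta in deltas:
        index += delta
        alphas.append(index * step)
    return alphas


deltas = encode_stretch_factors([1.6, 1.6, 1.7, 1.5])
restored = decode_stretch_factors(deltas)
```

When the stretch factor changes slowly from one time grid to the next, most differences are zero or small, which is what makes them cheap to code.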
Based on the above, as shown in fig. 6, the present invention proposes an audio encoding method for bandwidth extension, the audio having a low frequency part and a high frequency part, the method comprising the steps of:
performing frequency domain grid division on the high-frequency part to obtain a frequency domain grid region;
selecting copy spectrums correspondingly matched with M frequency domain grid regions in the frequency domain grid regions at the low-frequency part, wherein M is a natural number;
determining a stretching factor, wherein the stretching factor is a linear stretching factor related to a frequency domain grid region in the M frequency domain grid regions and the matched corresponding copy frequency spectrum; and
transmitting stretching information related to the stretching factor to a decoding end.
Fig. 7 is a schematic block diagram of BWE encoding applying a high-spectrum envelope template according to an exemplary embodiment of the present invention.
In the present invention, "linear" in "linear stretching factor" means that, when stretching is performed, the copy spectrum is stretched directly and proportionally along the frequency axis by a multiple equal to the stretching factor.
In the present invention, M may be 1, which is within the scope of the present invention. In the case of M being 1, if there is no further grid division in the time domain, the one frequency domain grid region corresponds to one copy spectrum and also to one stretch factor. In the case that M is 1, if the time domain is further grid-divided into a plurality of time domain grid regions, the one frequency domain grid region may correspond to one copy spectrum (the copy spectrum may be simultaneously applied to the plurality of time domain grid regions) or a plurality of copy spectra, and a plurality of stretching factors exist simultaneously, but the stretching factors may be the same in value or different in value.
It should also be noted that, in the frequency domain division, the high frequency part may have other frequency domain grid regions besides the M frequency domain grid regions.
In the present invention, M may be larger than 1. At this time, the stretching factors may include M stretching factors respectively corresponding to the M frequency domain grid regions. It will also be appreciated that the M stretch factors may be equal in value to each other or may be different.
In the case that M is greater than 1, optionally, M copy spectrums of the low-frequency portions matched with the M frequency domain grid regions do not overlap.
In the present invention, overlap between copy spectra includes partial overlap in the frequency domain as well as the case where two copy spectra are identical; both are within the protection scope of the present invention.
In the case that M is greater than 1, optionally, there is an overlap of at least two copy spectrums of the M copy spectrums of the low-frequency portion that match the M frequency domain grid regions.
In the case that M is greater than 1, optionally, bandwidths of at least two copy spectrums of the M copy spectrums matched with the M frequency domain grid regions are different from each other; and/or bandwidths of at least two of the M frequency domain grid regions are different from each other.
In the case that M is greater than 1, optionally, bandwidths of the M frequency domain grid regions are the same as each other.
FIG. 10 is a schematic diagram of high-frequency detail reconstruction for a general audio signal according to an exemplary embodiment of the present invention. A general audio signal here means an audio signal that is not strongly harmonic. In this case, the high-frequency part contains no strong harmonic or higher harmonic signals, and when the coding rate is low the highest spectral line $F_L$ of the low-frequency coded part is relatively low (so that prior-art methods need to copy the low-frequency part two or more times). Suppose the bandwidth of the low-frequency part to be copied is determined to be $BW_L = F_L - F_S$, where $F_S$ is the starting spectral line. ($BW_L$ may also be chosen to end at a spectral line $F_E$ instead of $F_L$, in which case $BW_L = F_E - F_S$; the indices of the starting and ending spectral lines then need to be transmitted as parameters to the receiving end, i.e., the decoding end.) If the bandwidth of the high-frequency part to be reconstructed is $BW_H = F_H - F_L$, a stretch factor $\alpha = BW_H / BW_L$ is defined. The details of the high-frequency spectral coefficients can thus be obtained by a copy-and-stretch process, in which the linear stretching can be implemented by $\alpha$-fold resampling.
Based on the above, the stretching factor may be a ratio of the bandwidth of the frequency domain grid region to the bandwidth of the matched copy spectrum when there is no strong harmonic signal or higher harmonic signal in the high frequency part. For example, in fig. 10, the stretch factor is 1.6.
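The copy-and-stretch step can be sketched as follows, under the assumption that the α-fold resampling is done by plain linear interpolation between neighbouring spectral coefficients. The function name `stretch_spectrum` and the interpolation choice are illustrative only; the text does not prescribe a particular resampler.

```python
def stretch_spectrum(copy_spectrum, target_len):
    """Stretch a copy spectrum to target_len spectral lines by linear interpolation.

    The first and last lines of the copy spectrum map onto the first and last
    lines of the stretched spectrum, so relative positions are preserved.
    """
    src_len = len(copy_spectrum)
    if src_len == 0 or target_len <= 0:
        return []
    if src_len == 1 or target_len == 1:
        return [copy_spectrum[0]] * target_len
    scale = (src_len - 1) / (target_len - 1)
    stretched = []
    for i in range(target_len):
        pos = i * scale                      # fractional position in the source
        lo = min(int(pos), src_len - 2)      # left neighbour, clamped at the end
        frac = pos - lo
        stretched.append((1 - frac) * copy_spectrum[lo] + frac * copy_spectrum[lo + 1])
    return stretched


# A copy spectrum of 5 lines stretched by alpha = 1.6 fills a region of 8 lines.
alpha = 1.6
low_band = [0.0, 1.0, 2.0, 3.0, 4.0]
high_band = stretch_spectrum(low_band, round(alpha * len(low_band)))
```

A single peak in the middle of the copy spectrum stays in the middle of the stretched region, which is the property that distinguishes stretching from repeated copying.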
The copy spectrum has corresponding low frequency onset and low frequency termination spectral lines.
As shown in fig. 10, the low-frequency termination line of a copy spectrum is the highest line of the low-frequency part.
FIG. 11 is a diagram illustrating grouping of stretch factors in the time-frequency direction according to an exemplary embodiment of the invention.
As shown in FIG. 11, the high-frequency part coded by BWE may be combined, according to the grid division in the frequency direction, into M stretching regions $BW_{H1}, BW_{H2}, \ldots, BW_{HM}$ (with the requirement $BW_H = BW_{H1} + BW_{H2} + \cdots + BW_{HM}$). Each region may be matched with a correlated spectrum segment in the low-frequency part; that is, for each region the most correlated segment of the low-frequency part is found, giving $BW_{L1}, BW_{L2}, \ldots, BW_{LM}$ (these segments may overlap). This yields a set of stretch factors: $\alpha_1 = BW_{H1}/BW_{L1}$, $\alpha_2 = BW_{H2}/BW_{L2}$, ..., $\alpha_M = BW_{HM}/BW_{LM}$.
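The text does not say how "most correlated" is measured, so the following is only a sketch under stated assumptions: each candidate low-frequency segment is stretched to the width of the high-frequency region by linear interpolation, and the candidate with the highest normalized correlation of spectral magnitudes is selected. The names `stretch`, `correlation`, `match_region` and the exhaustive search over `candidate_widths` are all hypothetical.

```python
import math


def stretch(segment, target_len):
    """Linearly resample segment to target_len points (endpoints preserved)."""
    if len(segment) == 1 or target_len == 1:
        return [segment[0]] * target_len
    scale = (len(segment) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale
        lo = min(int(pos), len(segment) - 2)
        frac = pos - lo
        out.append((1 - frac) * segment[lo] + frac * segment[lo + 1])
    return out


def correlation(a, b):
    """Normalized correlation of two equal-length sequences (0.0 if one is all zero)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def match_region(low_mag, region_mag, candidate_widths):
    """Find the low-frequency segment that best matches one high-frequency region.

    Returns (start line, width BW_L, stretch factor BW_H / BW_L).
    """
    best, best_score = None, -math.inf
    for width in candidate_widths:
        for start in range(len(low_mag) - width + 1):
            candidate = stretch(low_mag[start:start + width], len(region_mag))
            score = correlation(candidate, region_mag)
            if score > best_score:
                best_score = score
                best = (start, width, len(region_mag) / width)
    return best
```

Running `match_region` once per stretching region gives the start line, the bandwidth, and the stretch factor of each matched segment; because every region is searched independently, the selected low-frequency segments are free to overlap, as the text allows.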
As also shown in FIG. 11, grid division may be performed in the time domain as well: one frame may contain several time grids, which may be combined into N time segments for calculating the stretch factors (the principle of the division being that different time segments require different stretch factors and different correlated low-frequency spectrum segments). A two-dimensional array of stretch factors $\alpha_{M,N}$ is thus obtained.
Correspondingly, the method further comprises performing grid division in the time domain on each of the M frequency domain grid regions to obtain N time domain grid regions, where N is a natural number. The stretching factor is then $\alpha_{ij} = BW_{Hij}/BW_{Lij}$, where $BW_{Hij}$ is the bandwidth of the i-th frequency domain grid region in the j-th time domain grid region, $BW_{Lij}$ is the bandwidth of the copy spectrum in the low-frequency part matched with the i-th frequency domain grid region in the j-th time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N.
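The two-dimensional array of stretch factors described above amounts to an element-wise ratio of two M-by-N bandwidth tables. A minimal sketch, with a hypothetical function name and made-up bandwidth values (in spectral lines), might look like this:

```python
def stretch_factor_grid(bw_high, bw_low):
    """Compute alpha[i][j] = BW_H[i][j] / BW_L[i][j] for an M x N grid.

    bw_high[i][j] : bandwidth of frequency region i in time region j
    bw_low[i][j]  : bandwidth of the matched copy spectrum for that cell
    """
    if len(bw_high) != len(bw_low):
        raise ValueError("both tables need the same number of frequency regions")
    grid = []
    for row_high, row_low in zip(bw_high, bw_low):
        if len(row_high) != len(row_low):
            raise ValueError("both tables need the same number of time regions")
        grid.append([high / low for high, low in zip(row_high, row_low)])
    return grid


# M = 2 frequency regions, N = 2 time regions (illustrative numbers only).
alphas = stretch_factor_grid([[16, 16], [24, 24]], [[10, 8], [16, 12]])
```

Each row of `alphas` holds the stretch factors of one frequency domain grid region across the time segments of the frame.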
FIG. 12 is a schematic diagram of high frequency detail reconstruction of a strong harmonic audio signal according to an exemplary embodiment of the present invention.
For audio signals whose high-frequency part is strongly harmonic, the higher harmonics usually lie at integer multiples of the fundamental frequency, so the fundamental and the lower harmonics of the low-frequency part, or the lower harmonics alone, can be copied to the high-frequency part to replace the original high-frequency details.
Thus, when the high-frequency part contains higher harmonics, the stretch factor α can be calculated as the ratio of the spacing (bandwidth) between harmonics in the high-frequency part to the spacing (bandwidth) between the fundamental and a lower harmonic, or between lower harmonics, in the low-frequency part. In an alternative embodiment, the stretch factor α is the ratio of the spacing between the two lowest-frequency higher harmonic spectral lines in the frequency domain grid region to the spacing between the fundamental spectral line and the lowest-frequency lower harmonic spectral line, or between the two lowest-frequency lower harmonic spectral lines, in the low-frequency part.
In a particular embodiment, as shown in FIG. 12, once the stretch factor $\alpha$ is determined, the low-frequency starting spectral line $F_S$ can be determined from the BWE coding start frequency $F_L$ and the distance $BW_{HS}$ to the first high-frequency harmonic spectral line, taking into account that $BW_{LS} = BW_{HS}/\alpha$.
Accordingly, in the present invention, the high frequency portion has a strong harmonic audio signal, each of the M frequency domain grid regions has at least two higher harmonics; the stretching factor is the ratio of the high-frequency corresponding bandwidth to the corresponding low-frequency corresponding bandwidth, and the high-frequency corresponding bandwidth is the bandwidth between two higher harmonic spectral lines with the lowest frequency of the corresponding frequency domain grid region in the M frequency domain grid regions; and the low frequency corresponds to a bandwidth between a low frequency initial spectral line and a fundamental spectral line in the copy spectrum in the case where the corresponding copy spectrum has a fundamental wave, and is a bandwidth between two low-order harmonic spectral lines having the lowest frequency in the copy spectrum in the case where the corresponding copy spectrum does not have the fundamental wave. Optionally, the low-frequency end spectral line of one copy spectrum is the highest spectral line of the low-frequency part. As shown in fig. 12, the further M frequency domain grid regions include a frequency domain grid region having the highest spectral line of the low frequency part as the high frequency starting spectral line.
Although not shown, in the case that M is greater than 1, in an adjacent frequency-domain grid region of the M frequency-domain grid regions, the terminating spectral line of the former frequency-domain grid region is the starting spectral line of the latter frequency-domain grid region.
In order to determine the start spectral line of the copy spectrum, for a copy spectrum and frequency domain grid region that correspond to each other, the ratio of the bandwidth between the high-frequency start spectral line and the lowest-frequency higher-harmonic spectral line to the bandwidth between the low-frequency start spectral line and the low-frequency reference spectral line of the corresponding copy spectrum may be made equal to the stretch factor, wherein the low-frequency reference spectral line is the fundamental spectral line when the corresponding copy spectrum contains the fundamental, and is the lowest-frequency lower-harmonic spectral line of the copy spectrum when it does not.
In addition, the high-frequency details can also be generated by a hybrid approach, typically applying strong-harmonic stretching followed by non-strong-harmonic stretching so that the two together construct the high-frequency details. On this basis, although not shown, the high-frequency part may be divided by frequency domain grid division into M+1 frequency domain grid regions, where the (M+1)th frequency domain grid region is a frequency domain grid region outside the M frequency domain grid regions; the method further includes transmitting auxiliary stretching information representing an auxiliary stretch factor to the decoding end, wherein the auxiliary stretch factor is the ratio of the bandwidth of the (M+1)th frequency domain grid region to the bandwidth of the auxiliary copy spectrum matched to the (M+1)th frequency domain grid region. The (M+1)th region may lie on the right side or the left side of the M frequency domain grid regions, and both cases fall within the protection scope of the present invention.
In the embodiment shown in Fig. 12, the grid frequency direction and the grid time direction may be divided into M frequency segments and N time segments, respectively, and the array of stretch factors $\alpha_{M,N}$ may be calculated. Accordingly, in one embodiment, the method further comprises the step of performing grid division in the time domain on at least one of the M frequency domain grid regions to obtain N time domain grid regions, where N is a natural number, and the stretch factor comprises $\alpha_{ij}$, where $\alpha_{ij} = BW_{Hij}/BW_{Lij}$, $BW_{Hij}$ is the high-frequency corresponding bandwidth of the ith frequency domain grid region in the jth time domain grid region, $BW_{Lij}$ is the low-frequency corresponding bandwidth of the copy spectrum in the low-frequency part matched with the ith frequency domain grid region in the jth time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N.
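The condition above fixes the low-frequency start spectral line once a low-frequency reference spectral line has been chosen. The following Python sketch rearranges it under stated assumptions: the reference line (fundamental or lowest lower harmonic) is supplied by the caller, all quantities share one unit (Hz or spectral-line index), and the function name is hypothetical.

```python
def low_freq_start_line(f_ref, bw_hs, alpha):
    """Illustrative only: solve bw_hs / (f_ref - f_start) == alpha for f_start.

    f_ref: low-frequency reference spectral line (assumed already chosen).
    bw_hs: bandwidth between the high-frequency start line and the
           lowest-frequency higher-harmonic line.
    alpha: stretch factor.
    """
    if alpha <= 0:
        raise ValueError("stretch factor must be positive")
    bw_ls = bw_hs / alpha  # required low-frequency bandwidth
    return f_ref - bw_ls   # the start line lies bw_ls below the reference line


f_start = low_freq_start_line(f_ref=200.0, bw_hs=30.0, alpha=1.5)
```

With a reference line at 200, a high-frequency bandwidth of 30 and a stretch factor of 1.5, the required low-frequency bandwidth is 20 and the start line falls at 180.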
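As an illustration of the stretch factor array, the sketch below computes one factor per time-frequency grid cell from two M-by-N tables of bandwidths. It is written in Python; the nested-list layout (row i for the frequency segment, column j for the time segment) and the function name are assumptions made for the example.

```python
def stretch_factor_grid(bw_high, bw_low):
    """Illustrative only: alpha[i][j] = bw_high[i][j] / bw_low[i][j].

    bw_high[i][j]: bandwidth of frequency grid region i in time grid region j.
    bw_low[i][j]:  bandwidth of the matched copy spectrum for that cell.
    """
    if len(bw_high) != len(bw_low):
        raise ValueError("both tables must have M rows")
    grid = []
    for row_h, row_l in zip(bw_high, bw_low):
        if len(row_h) != len(row_l):
            raise ValueError("both tables must have N columns")
        grid.append([h / l for h, l in zip(row_h, row_l)])
    return grid


# M = 2 frequency segments, N = 2 time segments
alphas = stretch_factor_grid([[40.0, 30.0], [20.0, 50.0]],
                             [[20.0, 20.0], [10.0, 40.0]])
```

Setting N to 1 reduces the table to a single column, which corresponds to using one stretch factor per frequency segment for the whole frame.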
In the case where higher harmonics are present in the high frequency part, when the low frequency part matches the copy spectrum corresponding to the frequency domain grid region, the copy spectrum may be selected based on the higher harmonic lines in the frequency domain grid region, as shown in the example in fig. 12.
In the stretching-based decoding method, at the BWE decoding end, stretching parameters such as the stretch factor α are extracted from the BWE code stream: either the stretch factor information is decoded directly, or the stretching information related to it is decoded and used to calculate the stretch factor α. The high-frequency detail generation module then reconstructs the high-frequency spectrum by linearly stretching the low-frequency part by a factor of α.
Based on this and the above audio encoding method, as shown in fig. 8, the present invention provides an audio decoding method for bandwidth extension, wherein the audio has a low frequency part and a high frequency part, the method comprises the steps of:
acquiring stretching information related to a stretching factor and determining the stretching factor, wherein the stretching factor is a linear stretching factor related to a frequency domain grid region in M frequency domain grid regions of a high-frequency part and a corresponding matched copy frequency spectrum in a low-frequency part, and M is a natural number;
determining a matched copy spectrum from the low frequency portion based on the stretching information;
stretching the matched copy frequency spectrum by the corresponding stretching factor times to obtain a stretched frequency spectrum; and
copying the stretched spectrum to the corresponding position of the corresponding frequency domain grid region of the M frequency domain grid regions.
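The last two steps can be sketched in Python as follows. The description does not state how spectral values between the original lines are obtained, so linear interpolation is used here purely as an assumption; the spectrum is modelled as a list of real-valued coefficients, and all function and parameter names are hypothetical.

```python
def stretch_spectrum(copy, alpha):
    """Illustrative only: linearly stretch a list of coefficients by alpha.

    Output bin k is read from source position k / alpha, using linear
    interpolation between neighbouring source bins (an assumption).
    """
    if alpha <= 0:
        raise ValueError("stretch factor must be positive")
    n_out = int(round(len(copy) * alpha))
    out = []
    for k in range(n_out):
        pos = k / alpha
        i = int(pos)
        if i >= len(copy) - 1:
            out.append(copy[-1])  # hold the last value at the upper edge
        else:
            frac = pos - i
            out.append((1 - frac) * copy[i] + frac * copy[i + 1])
    return out


def decode_region(spectrum, low_start, low_end, high_start, alpha):
    """Copy spectrum[low_start:low_end], stretch it, and write it at high_start."""
    stretched = stretch_spectrum(spectrum[low_start:low_end], alpha)
    out = list(spectrum)
    end = min(len(out), high_start + len(stretched))  # clip to spectrum length
    out[high_start:end] = stretched[:end - high_start]
    return out


decoded = decode_region([1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0],
                        low_start=2, low_end=4, high_start=4, alpha=2.0)
```

Here the two-line copy spectrum is stretched to four lines and fills the empty upper half of the spectrum. A real decoder would follow this with envelope shaping and gain adjustment.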
Fig. 9 is a schematic block diagram of BWE decoding using a high-spectrum envelope template according to an exemplary embodiment of the present invention.
Optionally, in the audio decoding method, the stretch factor is a ratio of a bandwidth of the frequency domain grid region to a bandwidth of the matched copy spectrum.
Optionally, in the audio decoding method, when the matched copy spectrum is selected from the low frequency part, the method includes the steps of: determining low frequency onset and low frequency termination spectral lines of the copy spectrum. Further, the low-frequency terminating line of a copy spectrum is the highest line of the low-frequency part.
Optionally, in the audio decoding method, each of the M frequency domain grid regions is divided in the time domain into N time domain grid regions, where N is a natural number, and the stretch factor is $\alpha_{ij}$, where $\alpha_{ij} = BW_{Hij}/BW_{Lij}$, $BW_{Hij}$ is the bandwidth of the ith frequency domain grid region in the jth time domain grid region, $BW_{Lij}$ is the bandwidth of the copy spectrum in the low-frequency part that matches the ith frequency domain grid region in the jth time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N; and "copying the stretched spectrum to a corresponding position of a corresponding one of the M frequency domain grid regions" includes the step of: replacing the jth time domain grid region of the ith frequency domain grid region of the M frequency domain grid regions of the high-frequency part with the stretched spectrum obtained by stretching the matched copy spectrum by a factor of $\alpha_{ij}$.
Optionally, in the audio decoding method, the stretch factor is the ratio of the bandwidth of a frequency domain grid region to the bandwidth of the matched copy spectrum; the high-frequency part contains a strongly harmonic audio signal, and each of the M frequency domain grid regions has at least two higher harmonics. The stretch factor is the ratio of the high-frequency corresponding bandwidth to the respective low-frequency corresponding bandwidth. The high-frequency corresponding bandwidth is the bandwidth between the two lowest-frequency higher-harmonic spectral lines of the corresponding frequency domain grid region of the M frequency domain grid regions. The low-frequency corresponding bandwidth is the bandwidth between the low-frequency start spectral line and the fundamental spectral line of the copy spectrum when the corresponding copy spectrum contains the fundamental, and the bandwidth between the two lowest-frequency lower-harmonic spectral lines of the copy spectrum when it does not.
Optionally, in the audio decoding method, the low-frequency terminating spectral line of one copy spectrum is the highest spectral line of the low-frequency part.
Optionally, in the audio decoding method, the M frequency domain grid regions include a frequency domain grid region in which a highest spectral line of the low frequency part is used as a high frequency starting spectral line.
Optionally, in the audio decoding method, M is greater than 1, and in an adjacent frequency domain grid region of the M frequency domain grid regions, a termination spectral line of a previous frequency domain grid region is an initial spectral line of a next frequency domain grid region.
Optionally, in the audio decoding method, in the copy spectrum and frequency domain grid region corresponding to each other, a ratio of a bandwidth between the high-frequency initial line and the higher harmonic line with the lowest frequency to a bandwidth between the low-frequency initial line and the low-frequency reference line is equal to the stretching factor, wherein the low-frequency reference line is a fundamental line in a case where a fundamental wave is present in the corresponding copy spectrum, and the low-frequency reference line is a lower harmonic line with the lowest frequency in the copy spectrum in a case where the fundamental wave is not present in the corresponding copy spectrum.
Optionally, in the audio decoding method, the high-frequency part frequency domain grid is divided into M +1 frequency domain grid regions, where the M +1 th frequency domain grid region is a frequency domain grid region outside the M frequency domain grid regions; the method further comprises the steps of: obtaining auxiliary stretching information representing an auxiliary stretching factor and determining the auxiliary stretching factor, wherein the auxiliary stretching factor is the ratio of the bandwidth of the M +1 th frequency domain grid region to the bandwidth of the matched auxiliary copy frequency spectrum; the method further comprises the steps of: and stretching the auxiliary copy frequency spectrum matched with the M +1 th frequency domain grid region by the auxiliary stretching factor to obtain an auxiliary stretching frequency spectrum, and copying the auxiliary stretching frequency spectrum to a corresponding position in the M +1 th frequency domain grid region.
Optionally, in the audio decoding method, each of the M frequency domain grid regions is divided in the time domain into N time domain grid regions, where N is a natural number, and the stretch factor is $\alpha_{ij}$, where $\alpha_{ij} = BW_{Hij}/BW_{Lij}$, $BW_{Hij}$ is the high-frequency corresponding bandwidth of the ith frequency domain grid region in the jth time domain grid region, $BW_{Lij}$ is the low-frequency corresponding bandwidth of the copy spectrum in the low-frequency part that matches the ith frequency domain grid region in the jth time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N; and "copying the stretched spectrum to the corresponding position of the corresponding frequency domain grid region of the M frequency domain grid regions" includes the step of: replacing the jth time domain grid region of the ith frequency domain grid region of the M frequency domain grid regions of the high-frequency part with the stretched spectrum obtained by stretching the matched copy spectrum by a factor of $\alpha_{ij}$.
Based on the embodiment described later with reference to Fig. 14, optionally, the audio decoding method further includes the step of: performing envelope shaping on the stretched spectrum copied to the high-frequency part.
Based on the embodiment described later with reference to Fig. 14, optionally, the audio decoding method further includes the step of: performing gain adjustment on the stretched spectrum that has been, or is to be, envelope shaped.
Accordingly, an embodiment of the present invention also proposes an audio encoding apparatus for bandwidth extension, the audio having a low frequency part and a high frequency part, the apparatus comprising:
a frequency domain grid division module for performing frequency domain grid division on the high-frequency part to obtain frequency domain grid regions;
a matching module for selecting, in the low-frequency part, copy spectra correspondingly matched with M frequency domain grid regions of the frequency domain grid regions, where M is a natural number;
a stretching factor determination module for determining a stretching factor, which is a linear stretching factor related to a frequency domain grid region of the M frequency domain grid regions and the matched corresponding copy spectrum; and
a sending module for transmitting stretching information related to the stretching factor to the decoding end.
Optionally, in the encoding apparatus, M is greater than 1; the stretching factors include M stretching factors respectively corresponding to the M frequency domain grid regions.
Optionally, in the encoding apparatus, the stretching factor is a ratio of a bandwidth of the frequency domain grid region to a bandwidth of the matched copy spectrum.
Optionally, in the encoding apparatus, the copy spectrum has corresponding low-frequency start spectral lines and low-frequency end spectral lines. Optionally, the low-frequency terminating spectral line of one copy spectrum is the highest spectral line of the low-frequency part.
Optionally, the encoding apparatus further includes a time domain division module configured to perform time domain grid division on each of the M frequency domain grid regions to obtain N time domain grid regions, where N is a natural number, and the stretch factor is $\alpha_{ij}$, where $\alpha_{ij} = BW_{Hij}/BW_{Lij}$, $BW_{Hij}$ is the bandwidth of the ith frequency domain grid region in the jth time domain grid region, $BW_{Lij}$ is the bandwidth of the copy spectrum in the low-frequency part that matches the ith frequency domain grid region in the jth time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N.
Optionally, in the encoding apparatus, the high-frequency part contains a strongly harmonic audio signal, and each of the M frequency domain grid regions has at least two higher harmonics. The stretch factor is the ratio of the high-frequency corresponding bandwidth to the respective low-frequency corresponding bandwidth. The high-frequency corresponding bandwidth is the bandwidth between the two lowest-frequency higher-harmonic spectral lines of the corresponding frequency domain grid region of the M frequency domain grid regions. The low-frequency corresponding bandwidth is the bandwidth between the low-frequency start spectral line and the fundamental spectral line of the copy spectrum when the corresponding copy spectrum contains the fundamental, and the bandwidth between the two lowest-frequency lower-harmonic spectral lines of the copy spectrum when it does not.
Optionally, in the encoding device, the low-frequency end spectral line of one copy spectrum is the highest spectral line of the low-frequency part.
Optionally, in the encoding apparatus, the M frequency domain grid regions include a frequency domain grid region having the highest spectral line of the low frequency part as a high frequency starting spectral line. Further optionally, in the encoding apparatus, M is greater than 1, and in an adjacent frequency-domain grid region of the M frequency-domain grid regions, the termination spectral line of the previous frequency-domain grid region is the start spectral line of the next frequency-domain grid region.
Optionally, in the encoding apparatus, for a copy spectrum and frequency domain grid region that correspond to each other, the ratio of the bandwidth between the high-frequency start spectral line and the lowest-frequency higher-harmonic spectral line to the bandwidth between the low-frequency start spectral line and the low-frequency reference spectral line is equal to the stretch factor, wherein the low-frequency reference spectral line is the fundamental spectral line when the corresponding copy spectrum contains the fundamental, and is the lowest-frequency lower-harmonic spectral line of the copy spectrum when it does not.
Optionally, in the encoding apparatus, frequency domain grid division is performed on the high-frequency part to obtain M +1 frequency domain grid regions, where the M +1 th frequency domain grid region is a frequency domain grid region outside the M frequency domain grid regions; the device also comprises an auxiliary stretching factor determining module, which is used for determining an auxiliary stretching factor, wherein the auxiliary stretching factor is the ratio of the bandwidth of the M +1 th frequency domain grid region to the bandwidth of the matched auxiliary copy frequency spectrum; the sending module transmits the auxiliary stretching information representing the auxiliary stretching factor to a decoding end.
Optionally, the encoding apparatus further includes a time domain division module configured to perform time domain grid division on at least one of the M frequency domain grid regions to obtain N time domain grid regions, where N is a natural number, and the stretch factor includes $\alpha_{ij}$, where $\alpha_{ij} = BW_{Hij}/BW_{Lij}$, $BW_{Hij}$ is the high-frequency corresponding bandwidth of the ith frequency domain grid region in the jth time domain grid region, $BW_{Lij}$ is the low-frequency corresponding bandwidth of the copy spectrum in the low-frequency part that matches the ith frequency domain grid region in the jth time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N.
Accordingly, an embodiment of the present invention also proposes an audio decoding apparatus for bandwidth extension, the audio having a low frequency part and a high frequency part, the apparatus comprising:
a stretching factor determining module configured to obtain stretching information related to a stretching factor and determine the stretching factor, where the stretching factor is a linear stretching factor related to a frequency domain grid region of the M frequency domain grid regions of the high-frequency part and the correspondingly matched copy spectrum in the low-frequency part, and M is a natural number;
a copy spectrum determination module for determining the matched corresponding copy spectrum from the low-frequency part;
a stretching module for stretching the matched copy spectrum by the corresponding stretching factor to obtain a stretched spectrum; and
a copying module for copying the stretched spectrum to the corresponding position of the corresponding frequency domain grid region of the M frequency domain grid regions.
Optionally, in the decoding apparatus, the stretching factor is a ratio of a bandwidth of the frequency domain grid region to a bandwidth of the matched copy spectrum.
Optionally, in the decoding apparatus, the copy spectrum has low-frequency start spectral lines and low-frequency end spectral lines. Optionally, the low-frequency terminating spectral line of one copy spectrum is the highest spectral line of the low-frequency part.
Optionally, the decoding apparatus further includes a time domain grid dividing module configured to divide each of the M frequency domain grid regions into N time domain grid regions in the time domain, where N is a natural number, and the stretch factor is $\alpha_{ij}$, where $\alpha_{ij} = BW_{Hij}/BW_{Lij}$, $BW_{Hij}$ is the bandwidth of the ith frequency domain grid region in the jth time domain grid region, $BW_{Lij}$ is the bandwidth of the copy spectrum in the low-frequency part that matches the ith frequency domain grid region in the jth time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N; the stretching module stretches the matched copy spectrum by a factor of $\alpha_{ij}$ to obtain the stretched spectrum; and the copy module replaces the jth time domain grid region of the ith frequency domain grid region of the M frequency domain grid regions of the high-frequency part with the stretched spectrum.
Optionally, in the decoding apparatus, the stretch factor is the ratio of the bandwidth of a frequency domain grid region to the bandwidth of the matched copy spectrum; the high-frequency part contains a strongly harmonic audio signal, and each of the M frequency domain grid regions has at least two higher harmonics. The stretch factor is the ratio of the high-frequency corresponding bandwidth to the respective low-frequency corresponding bandwidth. The high-frequency corresponding bandwidth is the bandwidth between the two lowest-frequency higher-harmonic spectral lines of the corresponding frequency domain grid region of the M frequency domain grid regions. The low-frequency corresponding bandwidth is the bandwidth between the low-frequency start spectral line and the fundamental spectral line of the copy spectrum when the corresponding copy spectrum contains the fundamental, and the bandwidth between the two lowest-frequency lower-harmonic spectral lines of the copy spectrum when it does not.
Optionally, in the decoding apparatus, the low-frequency end spectral line of one copy spectrum is the highest spectral line of the low-frequency part.
Optionally, in the decoding apparatus, the M frequency domain grid regions include a frequency domain grid region in which the highest spectral line of the low frequency part is used as the high frequency starting spectral line.
Optionally, in the decoding apparatus, M is greater than 1, and in an adjacent frequency domain grid region of the M frequency domain grid regions, a termination spectral line of a previous frequency domain grid region is a start spectral line of a next frequency domain grid region.
Optionally, in the decoding apparatus, in the copy spectrum and frequency domain grid region corresponding to each other, a ratio of a bandwidth between the high-frequency start line and the higher harmonic line with the lowest frequency to a bandwidth between the low-frequency start line and the low-frequency reference line is equal to the stretching factor, wherein the low-frequency reference line is a fundamental line in a case where a fundamental wave is present in the corresponding copy spectrum, and the low-frequency reference line is a lower harmonic line with the lowest frequency in the copy spectrum in a case where the fundamental wave is not present in the corresponding copy spectrum.
Optionally, in the decoding apparatus, the high-frequency part frequency domain grid is divided into M +1 frequency domain grid regions, where the M +1 th frequency domain grid region is a frequency domain grid region outside the M frequency domain grid regions; the device further comprises an auxiliary stretching factor determining module, which is used for obtaining auxiliary stretching information representing the auxiliary stretching factor and determining the auxiliary stretching factor, wherein the auxiliary stretching factor is the ratio of the bandwidth of the M +1 th frequency domain grid region to the bandwidth of the matched auxiliary copy frequency spectrum; the apparatus further comprises an auxiliary copy spectrum determination module for determining an auxiliary copy spectrum in the low frequency part based on the auxiliary stretching information; the device also comprises an auxiliary stretching module, which is used for stretching the auxiliary copy frequency spectrum matched with the M +1 th frequency domain grid region by the auxiliary stretching factor to obtain the auxiliary stretching frequency spectrum; the apparatus further includes an auxiliary copy module for copying the auxiliary stretched spectrum to a corresponding location in the M +1 th frequency-domain grid region.
Optionally, the decoding apparatus further includes a time domain grid dividing module configured to divide each of the M frequency domain grid regions into N time domain grid regions in the time domain, where N is a natural number, and the stretch factor is $\alpha_{ij}$, where $\alpha_{ij} = BW_{Hij}/BW_{Lij}$, $BW_{Hij}$ is the high-frequency corresponding bandwidth of the ith frequency domain grid region in the jth time domain grid region, $BW_{Lij}$ is the low-frequency corresponding bandwidth of the copy spectrum in the low-frequency part that matches the ith frequency domain grid region in the jth time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N; the stretching module stretches the matched copy spectrum by a factor of $\alpha_{ij}$ to obtain the stretched spectrum; and the copy module replaces the jth time domain grid region of the ith frequency domain grid region of the M frequency domain grid regions of the high-frequency part with the stretched spectrum.
Optionally, based on an embodiment described later with reference to Fig. 14, the decoding apparatus further includes an envelope shaping module for performing envelope shaping on the stretched spectrum copied to the high-frequency part.
Optionally, based on an embodiment described later with reference to Fig. 14, the decoding apparatus further includes a gain adjusting module for performing gain adjustment on the stretched spectrum that has been, or is to be, envelope shaped.
The stretch-based SBR encoding method and the SBR decoding method are exemplarily described below based on fig. 13 and 14.
The processing steps of the stretching module in SBR encoding are as follows:
(1) the calculation methods of the other modules in SBR are unchanged and are not described in detail here;
(2) the QMF in SBR divides the input audio signal into 64 sub-bands, each sub-band having 32 sample points, and the signal is divided into grids of different time-frequency resolutions according to the transient characteristics of the audio signal;
(3) the input to the stretch factor α calculation module is the time-frequency grid parameters, which divide the high-frequency part of SBR coding into M frequency segments and N time segments; to reduce the transmission of additional information (the start and end spectral lines of each segment, etc.), N is set to 1, i.e. each frame uses the same low-frequency spectral segment and corresponding stretch factor at all times, and M is set to 2, i.e. there are 2 stretch factors in the frequency direction;
(4) the low-frequency start and end spectral lines are determined; again for simplicity, the highest spectral line $F_L$ of the low-frequency part is selected as the end spectral line. The high-frequency spectral lines at which BWE coding starts are analyzed to decide whether they form a strong harmonic signal; if so, the stretch factor α1 is calculated in the strong harmonic mode, otherwise α1 is calculated in the non-strong harmonic mode;
(5) the start spectral line $F_S$ of the low-frequency part is determined from the stretch factor α1 and the high-frequency part, and is packed as a parameter into the SBR code stream;
(6) given the highest coding frequency required, if the high-frequency part obtained by stretching the low-frequency part by α1 cannot cover the whole BWE high-frequency part, a new stretch factor α2 is calculated in the non-strong harmonic mode for the remaining highest spectral segment (the second segment, with M = 2);
(7) α1 and α2 are vector-quantization coded, and the resulting codebook index is packed into the SBR code stream;
(8) M and N are likewise packed into the SBR code stream as stretching parameters, and the SBR code stream is finally transmitted to the decoding end.
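Step (7) and the matching decoder-side lookup can be illustrated with a small Python sketch of nearest-neighbour vector quantization. The codebook contents, its size and the squared-error distance are all assumptions chosen for the example; the description does not specify them.

```python
# Hypothetical codebook of (alpha1, alpha2) pairs shared by encoder and decoder.
CODEBOOK = [(1.0, 1.0), (1.5, 1.0), (1.5, 2.0), (2.0, 2.0)]


def vq_encode(alphas, codebook=CODEBOOK):
    """Return the index of the codebook entry closest to (alpha1, alpha2)."""
    def sq_dist(entry):
        return sum((a - e) ** 2 for a, e in zip(alphas, entry))
    return min(range(len(codebook)), key=lambda idx: sq_dist(codebook[idx]))


def vq_decode(index, codebook=CODEBOOK):
    """Recover the quantized (alpha1, alpha2) pair from the transmitted index."""
    return codebook[index]


index = vq_encode((1.4, 1.9))       # only this index is packed into the stream
alpha1, alpha2 = vq_decode(index)   # decoder side
```

Only the index is transmitted, so the decoder works with the quantized pair rather than the exact values computed at the encoder.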
The processing steps of the stretching module in SBR decoding are as follows:
(1) the SBR code stream is received and the stretching parameter information is parsed from it to obtain the number of segments M in the frequency direction, the number of segments N in the time direction, the low-frequency start spectral line $F_S$, and the stretch factor coding index;
(2) the stretch factor coding index is vector-quantization decoded to obtain the stretch factors α1 and α2;
(3) since N = 1 and M = 2, a segment of spectral lines (from $F_S$ to $F_L$) is copied from the low-frequency part according to the given low-frequency start point $F_S$ and stretched by a factor of α1 to replace the high-frequency segment of width $\alpha_1 (F_L - F_S)$ that begins at $F_L$; the second part of the high frequency is then obtained by stretching the low-frequency spectral lines below $F_L$ by a factor of α2;
(4) steps (2) and (3) complete the recovery of the spectral line details of the SBR high-frequency part, after which the high frequency can be envelope shaped;
(5) finally, SBR usually requires gain adjustment to ensure that the energy of each grid of the high-frequency part is consistent with the energy of the original high-frequency part at the encoding end. The high-frequency part of SBR is thereby recovered and SBR decoding is complete.
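The gain adjustment of step (5) can be sketched in Python as a single scale factor per grid cell. The sketch assumes real-valued coefficients and a target energy that is already available at the decoder (for example from transmitted envelope data); the function name is hypothetical.

```python
import math


def gain_adjust(band, target_energy):
    """Illustrative only: scale a band so its energy equals target_energy.

    band:          reconstructed high-frequency coefficients of one grid cell.
    target_energy: energy of the same cell in the original signal (assumed known).
    """
    energy = sum(x * x for x in band)
    if energy == 0.0:
        return list(band)  # nothing to scale; avoid division by zero
    gain = math.sqrt(target_energy / energy)
    return [gain * x for x in band]


adjusted = gain_adjust([3.0, 4.0], target_energy=100.0)
```

In this example the band energy is 25 and the target is 100, so the gain is 2 and the adjusted band is [6.0, 8.0].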
Although embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments and combinations of elements without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.

Claims (32)

1. An audio encoding method for bandwidth extension, the audio having a low frequency part and a high frequency part, the method comprising the steps of:
performing frequency domain grid division on the high-frequency part to obtain a frequency domain grid region;
selecting copy spectrums correspondingly matched with M frequency domain grid regions in the frequency domain grid regions at the low-frequency part, wherein M is a natural number;
determining a stretching factor, wherein the stretching factor is a linear stretching factor related to a frequency domain grid region in the M frequency domain grid regions and the matched corresponding copy frequency spectrum; and
transmitting stretching information related to the stretching factor to a decoding end.
2. The method of claim 1, wherein:
the stretch factor is the ratio of the bandwidth of the frequency domain grid region to the bandwidth of the matched copy spectrum.
3. The method of claim 2, wherein:
the method further comprises the step of: performing grid division on each frequency domain grid region in the M frequency domain grid regions in a time domain to obtain N time domain grid regions, wherein N is a natural number; and
the stretch factor is $\alpha_{ij}$, where $\alpha_{ij} = BW_{Hij}/BW_{Lij}$, $BW_{Hij}$ is the bandwidth of the ith frequency domain grid region in the jth time domain grid region, $BW_{Lij}$ is the bandwidth of the copy spectrum in the low-frequency part that matches the ith frequency domain grid region in the jth time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N.
4. The method of claim 1, wherein:
the high frequency portion has a strong harmonic audio signal, each of the M frequency domain grid regions having at least two higher harmonics; and
the stretching factor is the ratio of the high-frequency corresponding bandwidth to the respective low-frequency corresponding bandwidth, wherein the high-frequency corresponding bandwidth is the bandwidth between the two lowest-frequency higher-harmonic spectral lines of the corresponding frequency domain grid region in the M frequency domain grid regions; and the low-frequency corresponding bandwidth is the bandwidth between the low-frequency initial spectral line and the fundamental spectral line in the copy spectrum in the case where the corresponding copy spectrum has a fundamental wave, and is the bandwidth between the two lowest-frequency lower-harmonic spectral lines in the copy spectrum in the case where the corresponding copy spectrum does not have the fundamental wave.
5. The method of claim 1, wherein:
the high frequency portion has a strong harmonic audio signal, each of the M frequency domain grid regions having at least two higher harmonics;
the M frequency domain grid regions comprise frequency domain grid regions taking the highest spectral line of the low-frequency part as a high-frequency initial spectral line;
M is greater than 1, and among adjacent frequency domain grid regions of the M frequency domain grid regions, the termination spectral line of the former frequency domain grid region is the initial spectral line of the latter frequency domain grid region;
for each copy spectrum and frequency domain grid region that correspond to each other, the ratio of the bandwidth between the high-frequency initial spectral line and the lowest-frequency higher harmonic spectral line to the bandwidth between the low-frequency initial spectral line and a low-frequency reference spectral line is equal to the stretching factor, wherein the low-frequency reference spectral line is the fundamental spectral line in the case where the corresponding copy spectrum has a fundamental wave, and is the lowest-frequency low-order harmonic spectral line in the copy spectrum in the case where the corresponding copy spectrum does not have a fundamental wave.
6. The method of claim 4 or 5, wherein:
frequency domain grid division is performed on the high-frequency part to obtain M+1 frequency domain grid regions, wherein the (M+1)-th frequency domain grid region is a frequency domain grid region other than the M frequency domain grid regions; and
the method further comprises transmitting auxiliary stretching information representing an auxiliary stretching factor to the decoding end, wherein the auxiliary stretching factor is the ratio of the bandwidth of the (M+1)-th frequency domain grid region to the bandwidth of the auxiliary copy spectrum matched with the (M+1)-th frequency domain grid region.
7. The method of claim 4 or 5, wherein:
the method further comprises the step of: performing grid division in the time domain on at least one of the M frequency domain grid regions to obtain N time domain grid regions, wherein N is a natural number; and
the stretching factor comprises α_ij, wherein α_ij = BWH_ij / BWL_ij, BWH_ij is the high-frequency corresponding bandwidth of the i-th frequency domain grid region in the j-th time domain grid region, BWL_ij is the low-frequency corresponding bandwidth of the copy spectrum in the low-frequency part that is matched with the i-th frequency domain grid region in the j-th time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N.
8. An audio decoding method for bandwidth extension, the audio having a low frequency part and a high frequency part, the method comprising the steps of:
acquiring stretching information related to a stretching factor and determining the stretching factor, wherein the stretching factor is a linear stretching factor related to a frequency domain grid region in M frequency domain grid regions of a high-frequency part and a corresponding matched copy frequency spectrum in a low-frequency part, and M is a natural number;
determining a matched copy spectrum from the low frequency portion based on the stretching information;
stretching the matched copy spectrum by a factor equal to the corresponding stretching factor to obtain a stretched spectrum; and
copying the stretched spectrum to a corresponding position of a corresponding one of the M frequency domain grid regions.
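The decoding steps of claim 8 (take the matched copy spectrum from the low-frequency part, stretch it, and copy it into the high-frequency grid region) can be sketched as below. The claim only requires a linear stretching factor; resampling by linear interpolation between neighbouring spectral coefficients, the function names `stretch_spectrum` and `extend_band`, and the flat list-of-coefficients representation are all assumptions made for this example.

```python
def stretch_spectrum(copy_spectrum, alpha):
    """Stretch a list of spectral coefficients to about alpha times its length.

    Linear interpolation is an assumption of this sketch, not something the
    claims specify.
    """
    n_in = len(copy_spectrum)
    if n_in == 0 or alpha <= 0:
        raise ValueError("need a non-empty copy spectrum and a positive factor")
    n_out = max(1, round(alpha * n_in))
    if n_in == 1 or n_out == 1:
        return [copy_spectrum[0]] * n_out
    stretched = []
    for k in range(n_out):
        # Map output line k back onto the input axis, keeping both endpoints.
        pos = k * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        stretched.append((1 - frac) * copy_spectrum[lo] + frac * copy_spectrum[hi])
    return stretched


def extend_band(spectrum, copy_start, copy_len, alpha, dest_start):
    """Copy a stretched low-frequency segment into the high-frequency region.

    copy_start, copy_len: location of the matched copy spectrum (hypothetical
                          form of the decoded stretching information).
    dest_start:           first spectral line of the target grid region.
    Returns a new list; the input spectrum is left unchanged.
    """
    stretched = stretch_spectrum(spectrum[copy_start:copy_start + copy_len], alpha)
    result = list(spectrum)
    end = min(len(result), dest_start + len(stretched))
    result[dest_start:end] = stretched[:end - dest_start]
    return result


# Low-frequency part occupies lines 0..2; lines 3..11 form one grid region.
decoded_low = [0.0, 2.0, 4.0] + [0.0] * 9
extended = extend_band(decoded_low, copy_start=0, copy_len=3, alpha=3.0, dest_start=3)
```

In this example the three-line copy spectrum is stretched by a factor of 3 to fill the nine-line grid region; envelope shaping and gain adjustment (claims 15 and 16) would follow as separate steps.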
9. The method of claim 8, wherein:
the stretch factor is the ratio of the bandwidth of the frequency domain grid region to the bandwidth of the matched copy spectrum.
10. The method of claim 9, wherein:
each of the M frequency domain grid regions is divided in the time domain into N time domain grid regions, wherein N is a natural number, and the stretching factor is α_ij, wherein α_ij = BWH_ij / BWL_ij, BWH_ij is the bandwidth of the i-th frequency domain grid region in the j-th time domain grid region, BWL_ij is the bandwidth of the copy spectrum in the low-frequency part that is matched with the i-th frequency domain grid region in the j-th time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N; and
the step of "copying the stretched spectrum to a corresponding position of a corresponding one of the M frequency domain grid regions" comprises: replacing the j-th time domain grid region of the i-th frequency domain grid region of the M frequency domain grid regions of the high-frequency part with the stretched spectrum obtained by stretching the matched copy spectrum by a factor of α_ij.
11. The method of claim 8, wherein:
the stretching factor is the ratio of the bandwidth of a frequency domain grid region to the bandwidth of the matched copy spectrum, the high-frequency part has a strong harmonic audio signal, and each of the M frequency domain grid regions has at least two higher harmonics; and
the stretching factor is the ratio of a high-frequency corresponding bandwidth to the corresponding low-frequency corresponding bandwidth, wherein the high-frequency corresponding bandwidth is the bandwidth between the two lowest-frequency higher harmonic spectral lines of the corresponding frequency domain grid region among the M frequency domain grid regions; and the low-frequency corresponding bandwidth is the bandwidth between the low-frequency initial spectral line and the fundamental spectral line in the copy spectrum in the case where the corresponding copy spectrum has a fundamental wave, and is the bandwidth between the two lowest-frequency low-order harmonic spectral lines in the copy spectrum in the case where the corresponding copy spectrum does not have a fundamental wave.
12. The method of claim 8, wherein:
the high frequency portion has a strong harmonic audio signal, each of the M frequency domain grid regions having at least two higher harmonics;
the M frequency domain grid regions comprise frequency domain grid regions taking the highest spectral line of the low-frequency part as a high-frequency initial spectral line;
M is greater than 1, and among adjacent frequency domain grid regions of the M frequency domain grid regions, the termination spectral line of the former frequency domain grid region is the initial spectral line of the latter frequency domain grid region;
for each copy spectrum and frequency domain grid region that correspond to each other, the ratio of the bandwidth between the high-frequency initial spectral line and the lowest-frequency higher harmonic spectral line to the bandwidth between the low-frequency initial spectral line and a low-frequency reference spectral line is equal to the stretching factor, wherein the low-frequency reference spectral line is the fundamental spectral line in the case where the corresponding copy spectrum has a fundamental wave, and is the lowest-frequency low-order harmonic spectral line in the copy spectrum in the case where the corresponding copy spectrum does not have a fundamental wave.
13. The method of claim 11 or 12, wherein:
the high-frequency part is divided by frequency domain grid division into M+1 frequency domain grid regions, wherein the (M+1)-th frequency domain grid region is a frequency domain grid region other than the M frequency domain grid regions;
the method further comprises the step of: obtaining auxiliary stretching information representing an auxiliary stretching factor and determining the auxiliary stretching factor, wherein the auxiliary stretching factor is the ratio of the bandwidth of the (M+1)-th frequency domain grid region to the bandwidth of the matched auxiliary copy spectrum; and
the method further comprises the step of: stretching the auxiliary copy spectrum matched with the (M+1)-th frequency domain grid region by a factor equal to the auxiliary stretching factor to obtain an auxiliary stretched spectrum, and copying the auxiliary stretched spectrum to a corresponding position in the (M+1)-th frequency domain grid region.
14. The method of claim 11 or 12, wherein:
each of the M frequency domain grid regions is divided in the time domain into N time domain grid regions, wherein N is a natural number;
the stretch factor is α_ij, wherein α_ij = BWH_ij / BWL_ij, BWH_ij is the high-frequency corresponding bandwidth of the i-th frequency domain grid region in the j-th time domain grid region, BWL_ij is the low-frequency corresponding bandwidth of the copy spectrum in the low-frequency part that is matched with the i-th frequency domain grid region in the j-th time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N; and
the step of "copying the stretched spectrum to a corresponding position of a corresponding one of the M frequency domain grid regions" comprises: replacing the j-th time domain grid region of the i-th frequency domain grid region of the M frequency domain grid regions of the high-frequency part with the stretched spectrum obtained by stretching the matched copy spectrum by a factor of α_ij.
15. The method of claim 8, further comprising the step of:
performing envelope shaping on the stretched spectrum copied to the high-frequency part.
16. The method of claim 15, further comprising the step of:
performing gain adjustment on the stretched spectrum that has been, or is to be, envelope shaped.
17. An audio encoding device for bandwidth extension, the audio having a low frequency portion and a high frequency portion, the device comprising:
a frequency domain grid division module, configured to perform frequency domain grid division on the high-frequency part to obtain frequency domain grid regions;
a matching module, configured to select, in the low-frequency part, copy spectra respectively matched with M frequency domain grid regions among the frequency domain grid regions, wherein M is a natural number;
a stretching factor determination module, configured to determine a stretching factor, wherein the stretching factor is a linear stretching factor related to a frequency domain grid region of the M frequency domain grid regions and the matched corresponding copy spectrum; and
a sending module, configured to transmit stretching information related to the stretching factor to a decoding end.
18. The apparatus of claim 17, wherein:
the stretch factor is the ratio of the bandwidth of the frequency domain grid region to the bandwidth of the matched copy spectrum.
19. The apparatus of claim 18, wherein:
the device further comprises a time domain division module, configured to perform grid division in the time domain on each of the M frequency domain grid regions to obtain N time domain grid regions, wherein N is a natural number;
the stretch factor is α_ij, wherein α_ij = BWH_ij / BWL_ij, BWH_ij is the bandwidth of the i-th frequency domain grid region in the j-th time domain grid region, BWL_ij is the bandwidth of the copy spectrum in the low-frequency part that is matched with the i-th frequency domain grid region in the j-th time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N.
20. The apparatus of claim 17, wherein:
the high-frequency part has a strong harmonic audio signal, each of the M frequency domain grid regions having at least two higher harmonics; and
the stretching factor is the ratio of a high-frequency corresponding bandwidth to the corresponding low-frequency corresponding bandwidth, wherein the high-frequency corresponding bandwidth is the bandwidth between the two lowest-frequency higher harmonic spectral lines of the corresponding frequency domain grid region among the M frequency domain grid regions; and the low-frequency corresponding bandwidth is the bandwidth between the low-frequency initial spectral line and the fundamental spectral line in the copy spectrum in the case where the corresponding copy spectrum has a fundamental wave, and is the bandwidth between the two lowest-frequency low-order harmonic spectral lines in the copy spectrum in the case where the corresponding copy spectrum does not have a fundamental wave.
21. The apparatus of claim 17, wherein:
the M frequency domain grid regions comprise frequency domain grid regions taking the highest spectral line of the low-frequency part as a high-frequency initial spectral line;
M is greater than 1, and among adjacent frequency domain grid regions of the M frequency domain grid regions, the termination spectral line of the former frequency domain grid region is the initial spectral line of the latter frequency domain grid region;
for each copy spectrum and frequency domain grid region that correspond to each other, the ratio of the bandwidth between the high-frequency initial spectral line and the lowest-frequency higher harmonic spectral line to the bandwidth between the low-frequency initial spectral line and a low-frequency reference spectral line is equal to the stretching factor, wherein the low-frequency reference spectral line is the fundamental spectral line in the case where the corresponding copy spectrum has a fundamental wave, and is the lowest-frequency low-order harmonic spectral line in the copy spectrum in the case where the corresponding copy spectrum does not have a fundamental wave.
22. The apparatus of claim 20 or 21, wherein:
frequency domain grid division is performed on the high-frequency part to obtain M+1 frequency domain grid regions, wherein the (M+1)-th frequency domain grid region is a frequency domain grid region other than the M frequency domain grid regions;
the device further comprises an auxiliary stretching factor determination module, configured to determine an auxiliary stretching factor, wherein the auxiliary stretching factor is the ratio of the bandwidth of the (M+1)-th frequency domain grid region to the bandwidth of the matched auxiliary copy spectrum; and
the sending module transmits auxiliary stretching information representing the auxiliary stretching factor to the decoding end.
23. The apparatus of claim 20 or 21, wherein:
the device further comprises a time domain division module, configured to perform grid division in the time domain on at least one of the M frequency domain grid regions to obtain N time domain grid regions, wherein N is a natural number;
the stretch factor comprises α_ij, wherein α_ij = BWH_ij / BWL_ij, BWH_ij is the high-frequency corresponding bandwidth of the i-th frequency domain grid region in the j-th time domain grid region, BWL_ij is the low-frequency corresponding bandwidth of the copy spectrum in the low-frequency part that is matched with the i-th frequency domain grid region in the j-th time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N.
24. An audio decoding apparatus for bandwidth extension, the audio having a low frequency portion and a high frequency portion, the apparatus comprising:
a stretching factor determination module, configured to obtain stretching information related to a stretching factor and determine the stretching factor, wherein the stretching factor is a linear stretching factor related to a frequency domain grid region in M frequency domain grid regions of the high-frequency part and a corresponding matched copy spectrum in the low-frequency part, and M is a natural number;
a copy spectrum determination module, configured to determine the matched corresponding copy spectrum from the low-frequency part;
a stretching module, configured to stretch the matched copy spectrum by a factor equal to the corresponding stretching factor to obtain a stretched spectrum; and
a copy module, configured to copy the stretched spectrum to a corresponding position of a corresponding one of the M frequency domain grid regions.
25. The apparatus of claim 24, wherein:
the stretch factor is the ratio of the bandwidth of the frequency domain grid region to the bandwidth of the matched copy spectrum.
26. The apparatus of claim 24, wherein:
the device further comprises a time domain grid division module, configured to divide each of the M frequency domain grid regions in the time domain into N time domain grid regions, wherein N is a natural number;
the stretch factor is α_ij, wherein α_ij = BWH_ij / BWL_ij, BWH_ij is the bandwidth of the i-th frequency domain grid region in the j-th time domain grid region, BWL_ij is the bandwidth of the copy spectrum in the low-frequency part that is matched with the i-th frequency domain grid region in the j-th time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N;
the stretching module stretches the matched copy spectrum by a factor of α_ij to obtain the stretched spectrum; and
the copy module replaces the j-th time domain grid region of the i-th frequency domain grid region of the M frequency domain grid regions of the high-frequency part with the stretched spectrum.
27. The apparatus of claim 24, wherein:
the stretching factor is the ratio of the bandwidth of a frequency domain grid region to the bandwidth of the matched copy spectrum, the high-frequency part has a strong harmonic audio signal, and each of the M frequency domain grid regions has at least two higher harmonics; and
the stretching factor is the ratio of a high-frequency corresponding bandwidth to the corresponding low-frequency corresponding bandwidth, wherein the high-frequency corresponding bandwidth is the bandwidth between the two lowest-frequency higher harmonic spectral lines of the corresponding frequency domain grid region among the M frequency domain grid regions; and the low-frequency corresponding bandwidth is the bandwidth between the low-frequency initial spectral line and the fundamental spectral line in the copy spectrum in the case where the corresponding copy spectrum has a fundamental wave, and is the bandwidth between the two lowest-frequency low-order harmonic spectral lines in the copy spectrum in the case where the corresponding copy spectrum does not have a fundamental wave.
28. The apparatus of claim 24, wherein:
the M frequency domain grid regions comprise frequency domain grid regions taking the highest spectral line of the low-frequency part as a high-frequency initial spectral line;
M is greater than 1, and among adjacent frequency domain grid regions of the M frequency domain grid regions, the termination spectral line of the former frequency domain grid region is the initial spectral line of the latter frequency domain grid region;
for each copy spectrum and frequency domain grid region that correspond to each other, the ratio of the bandwidth between the high-frequency initial spectral line and the lowest-frequency higher harmonic spectral line to the bandwidth between the low-frequency initial spectral line and a low-frequency reference spectral line is equal to the stretching factor, wherein the low-frequency reference spectral line is the fundamental spectral line in the case where the corresponding copy spectrum has a fundamental wave, and is the lowest-frequency low-order harmonic spectral line in the copy spectrum in the case where the corresponding copy spectrum does not have a fundamental wave.
29. The apparatus of claim 27 or 28, wherein:
the high-frequency part is divided by frequency domain grid division into M+1 frequency domain grid regions, wherein the (M+1)-th frequency domain grid region is a frequency domain grid region other than the M frequency domain grid regions;
the device further comprises an auxiliary stretching factor determination module, configured to obtain auxiliary stretching information representing an auxiliary stretching factor and determine the auxiliary stretching factor, wherein the auxiliary stretching factor is the ratio of the bandwidth of the (M+1)-th frequency domain grid region to the bandwidth of the matched auxiliary copy spectrum;
the device further comprises an auxiliary copy spectrum determination module, configured to determine an auxiliary copy spectrum in the low-frequency part based on the auxiliary stretching information;
the device further comprises an auxiliary stretching module, configured to stretch the auxiliary copy spectrum matched with the (M+1)-th frequency domain grid region by a factor equal to the auxiliary stretching factor to obtain an auxiliary stretched spectrum; and
the device further comprises an auxiliary copy module, configured to copy the auxiliary stretched spectrum to a corresponding position in the (M+1)-th frequency domain grid region.
30. The apparatus of claim 27 or 28, wherein:
the device further comprises a time domain grid division module, configured to divide each of the M frequency domain grid regions in the time domain into N time domain grid regions, wherein N is a natural number;
the stretch factor is α_ij, wherein α_ij = BWH_ij / BWL_ij, BWH_ij is the high-frequency corresponding bandwidth of the i-th frequency domain grid region in the j-th time domain grid region, BWL_ij is the low-frequency corresponding bandwidth of the copy spectrum in the low-frequency part that is matched with the i-th frequency domain grid region in the j-th time domain grid region, i is a natural number not greater than M, and j is a natural number not greater than N;
the stretching module stretches the matched copy spectrum by a factor of α_ij to obtain the stretched spectrum; and
the copy module replaces the j-th time domain grid region of the i-th frequency domain grid region of the M frequency domain grid regions of the high-frequency part with the stretched spectrum.
31. The apparatus of claim 24, further comprising:
an envelope shaping module, configured to perform envelope shaping on the stretched spectrum copied to the high-frequency part.
32. The apparatus of claim 31, further comprising:
a gain adjustment module, configured to perform gain adjustment on the stretched spectrum that has been, or is to be, envelope shaped.
CN201811397265.4A 2018-11-22 2018-11-22 Bandwidth extension audio coding and decoding method and device based on spectrum stretching Pending CN111210831A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811397265.4A CN111210831A (en) 2018-11-22 2018-11-22 Bandwidth extension audio coding and decoding method and device based on spectrum stretching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811397265.4A CN111210831A (en) 2018-11-22 2018-11-22 Bandwidth extension audio coding and decoding method and device based on spectrum stretching

Publications (1)

Publication Number Publication Date
CN111210831A true CN111210831A (en) 2020-05-29

Family

ID=70788038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811397265.4A Pending CN111210831A (en) 2018-11-22 2018-11-22 Bandwidth extension audio coding and decoding method and device based on spectrum stretching

Country Status (1)

Country Link
CN (1) CN111210831A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024050673A1 (en) * 2022-09-05 2024-03-14 北京小米移动软件有限公司 Audio signal frequency band extension method and apparatus, device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5583784A (en) * 1993-05-14 1996-12-10 Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. Frequency analysis method
US20120136670A1 (en) * 2010-06-09 2012-05-31 Tomokazu Ishikawa Bandwidth extension method, bandwidth extension apparatus, program, integrated circuit, and audio decoding apparatus
CN102884574A (en) * 2009-10-20 2013-01-16 弗兰霍菲尔运输应用研究公司 Audio signal encoder, audio signal decoder, method for encoding or decoding an audio signal using an aliasing-cancellation
US20150154971A1 (en) * 2012-07-16 2015-06-04 Thomson Licensing Method and apparatus for encoding multi-channel hoa audio signals for noise reduction, and method and apparatus for decoding multi-channel hoa audio signals for noise reduction
CN105261373A (en) * 2015-09-16 2016-01-20 深圳广晟信源技术有限公司 Self-adaptive grid construction method and device used for bandwidth extended coding
CN105280190A (en) * 2015-09-16 2016-01-27 深圳广晟信源技术有限公司 Bandwidth extension encoding and decoding method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杭波;王毅;康长青;: "移动音频带宽扩展算法计算复杂度优化", 计算机应用, no. 02, pages 516 - 520 *

Similar Documents

Publication Publication Date Title
KR101589942B1 (en) Cross product enhanced harmonic transposition
RU2492530C2 (en) Apparatus and method for encoding/decoding audio signal using aliasing switch scheme
TWI545560B (en) Apparatus and method for encoding or decoding an audio signal with intelligent gap filling in the spectral domain
CA2698031C (en) Method and device for noise filling
CN104575517B (en) Audio Signal Processing during high-frequency reconstruction
JP7391930B2 (en) Apparatus and method for generating enhanced signals with independent noise filling
EP3268958A1 (en) Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal
CN109448741B (en) 3D audio coding and decoding method and device
JP2009515212A (en) Audio compression
CN113963705A (en) Audio encoder and decoder for frequency domain processor and time domain processor
CN105280190A (en) Bandwidth extension encoding and decoding method and device
CN111210832A (en) Bandwidth extension audio coding and decoding method and device based on spectrum envelope template
CN111210831A (en) Bandwidth extension audio coding and decoding method and device based on spectrum stretching
KR101387808B1 (en) Apparatus for high quality multiple audio object coding and decoding using residual coding with variable bitrate
CN105280189B (en) The method and apparatus that bandwidth extension encoding and decoding medium-high frequency generate

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination