CN101471072B - High-frequency reconstruction method, encoding device and decoding module - Google Patents


Publication number
CN101471072B
Authority
CN
China
Legal status
Active
Application number
CN 200710305087
Other languages
Chinese (zh)
Other versions
CN101471072A
Inventor
马鸿飞
郭庆巍
张海波
张波
许丽净
张清
许剑峰
李伟
杜正中
胡晨
杨毅
苗磊
齐峰岩
Current Assignee
Huawei Technologies Co Ltd
Xidian University
Original Assignee
Huawei Technologies Co Ltd
Xidian University
Application filed by Huawei Technologies Co Ltd and Xidian University
Priority to CN 200710305087
Priority to PCT/CN2008/073728 (WO2009089728A1)
Publication of CN101471072A
Application granted
Publication of CN101471072B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An embodiment of the invention discloses a high-frequency reconstruction method comprising: filtering an audio or speech signal to obtain low-frequency subbands and high-frequency subbands; determining a band replication strategy; obtaining the correlation between the low-frequency and high-frequency subbands according to that strategy; selecting the low-frequency subband with the higher correlation as the optimal replication band for each high-frequency subband; and outputting high-frequency reconstruction parameter information that includes the correspondence between the selected bands. The invention further provides a high-frequency reconstruction method comprising: receiving high-frequency reconstruction parameter information that includes the correspondence between the selected bands, namely the correspondence between each high-frequency subband and the highly correlated low-frequency subband chosen for it; and, in the high band, copying low-frequency subbands as high-frequency subbands according to that information. Correspondingly, embodiments of the invention provide an encoding module and a decoding module. The technical solutions provided by the embodiments of the invention perform high-frequency reconstruction more accurately.

Description

High-frequency reconstruction method, encoding device and decoding device
Technical Field
The invention relates to the technical field of communication, in particular to a high-frequency reconstruction method, an encoding module and a decoding module.
Background
High-frequency reconstruction is a key technique in audio and speech processing. High-frequency compression and recovery technology, represented by Spectral Band Replication (SBR), is so far one of the most effective high-frequency reconstruction methods: it copies the waveform of the low band to the high band and then repairs the copied high band using the energy adjustment and harmonic adjustment parameters extracted during encoding.
The prior art offers two main methods of reconstructing the high band from low-band signals, described below.
First prior art:
The low-frequency audio or speech signal is passed through a digital filter bank to obtain a group of low-frequency subband signals, and this group is then used as one integral block to replicate the high-frequency signal. The high band is divided, from low to high frequency, into several segments, each with approximately the same bandwidth as the whole low-frequency signal, and the entire low-frequency subband group is copied contiguously into each segment. The whole low-frequency subband group is thus reused periodically across the high band until the entire high band to be restored has been filled. There are two specific modes: 1) the whole low-frequency subband group is shifted to the corresponding high band (see FIG. 1, a schematic diagram of whole-block translational copying of the low-frequency subbands in the prior art); 2) the whole low-frequency subband group is first folded, that is, the order of its subbands is reversed, and then shifted to the corresponding high band (see FIG. 2, a schematic diagram of whole-block folded translation of the low-frequency subbands in the prior art). Modes 1) and 2) may be used alternately during replication.
Please refer to FIG. 4, which shows the energy waveforms of the original audio and its subband signals in the prior art (for ease of visual comparison, only the waveforms of the first 29 subbands are shown). FIG. 5 is a three-dimensional graph of the subband energies of the original audio in the prior art. FIG. 6 shows the energy waveforms of the subband signals after high-frequency reconstruction with mode 1) of the first prior art, and FIG. 7 is the corresponding three-dimensional subband energy graph. FIG. 8 shows the energy waveforms of the subband signals after high-frequency reconstruction with mode 2) of the first prior art, and FIG. 9 is the corresponding three-dimensional subband energy graph. In the energy waveform diagrams, the lowest waveform is the original audio waveform; the curves labeled 0 through 8 are the low-frequency subband waveforms used to replicate the high-frequency subbands; curves 8 and 9 mark the boundary between the low and high frequencies; and all subbands from curve 9 upward form the range for high-frequency reconstruction and processing. The axes of the three-dimensional energy graphs are energy amplitude, audio frame number (30 frames) and subband number (29 subbands), matching the 29 processed subbands shown in the energy waveform diagrams; the subbands above the 9th subband form the high-frequency processing part.
Second prior art:
The low-frequency signal is passed through a low-pass filter bank to obtain a group of low-frequency subbands. Unlike the first prior art, the high-frequency portion to be restored is not filled continuously, in whole segments, with the selected low-frequency subband group as one block; instead, individual subbands of the low-frequency group are used to restore a number of discretely distributed high-frequency subbands.
If the high-frequency part is rich in harmonics, the frequencies of those harmonic components are integer multiples of their fundamental frequencies. Following this idea, the second prior art proposes that if the indices of certain high-frequency subbands are integer multiples (2, 3, 4, 5 and so on) of the indices of low-frequency subbands, that is, if there is a multiple relationship between certain high-frequency and low-frequency subbands, then those high-frequency subbands are likely to be rich in harmonics and should be restored with priority.
Please refer to FIG. 3, a schematic diagram of the discrete replication of low-frequency subbands in the second prior art. Subband filtering divides the entire band of the audio signal into thirty-three subbands, numbered 0, 1, 2, ..., 31, 32. The low-frequency subband group comprises the eight subbands 0 through 7, and the high-frequency subband group to be restored comprises the twenty-five subbands 8 through 32. The low-frequency subband group provides at most four consecutive subbands at a time for replication.
Replication process II comes first. Because the indices of the 8th, 10th, 12th and 14th subbands in the high-frequency group are all integer multiples of 2, the 4th, 5th, 6th and 7th subbands are selected from the low-frequency group and copied to the 8th, 10th, 12th and 14th subbands, respectively.
Next is replication process III. The indices of the 9th, 12th, 15th, 18th and 21st subbands in the high-frequency group are all integer multiples of 3, but the 12th subband has just been copied, and its position also affects the continuity of the subbands; therefore the 3rd, 5th, 6th and 7th subbands are selected from the low-frequency group and copied to the 9th, 15th, 18th and 21st subbands, respectively.
Replication process IV follows. The indices of the 8th, 12th, 16th, 20th, 24th, 28th and further subbands in the high-frequency group are all integer multiples of 4, but the 8th and 12th subbands have already been copied, so the 4th, 5th, 6th and 7th subbands are selected from the low-frequency group and copied to the 16th, 20th, 24th and 28th subbands, respectively.
Finally comes replication process V. The indices of the 10th, 15th, 20th, 25th, 30th and further subbands in the high-frequency group are all integer multiples of 5, but the 10th, 15th and 20th subbands have already been copied, so only the 6th and 7th subbands need to be selected from the low-frequency group to restore the 25th and 30th subbands.
This completes the recovery of discretely distributed high-frequency subbands from a contiguous set of low-frequency subbands. Finally, for the high-frequency subbands that this procedure misses, low-frequency subbands with similar waveforms are selected to restore them, completing the replication of all high-frequency subbands.
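The multiple-relation passes II to V can be approximated by a short loop. This is a simplified sketch under a stated assumption: low subband k is copied to high subband m·k for m = 2, 3, 4, 5, skipping targets that lie outside the high group or were already filled by an earlier pass. The "at most four consecutive subbands" limit, the continuity consideration and the final similar-waveform step are not modeled. `harmonic_copy` is a hypothetical name.

```python
def harmonic_copy(low, high, multipliers=(2, 3, 4, 5)):
    """Simplified multiple-relation replication (hypothetical helper).

    Pass m copies low subband k to high subband m * k when that index
    belongs to the high group and has not been filled by an earlier pass.
    Returns (mapping, leftover): mapping[high_index] = low_index, and the
    high subbands that no multiple relation reached.
    """
    high_set = set(high)
    mapping = {}
    for m in multipliers:
        for k in low:
            target = m * k
            if target in high_set and target not in mapping:
                mapping[target] = k
    leftover = [h for h in high if h not in mapping]
    return mapping, leftover


if __name__ == "__main__":
    mapping, leftover = harmonic_copy(range(0, 8), range(8, 33))
    print(sorted(mapping.items()))
    print("left for similar-waveform matching:", leftover)
```

For low subbands 0 to 7 and high subbands 8 to 32, passes II to IV of this sketch reproduce the assignments described above (for example 4 to 8, 3 to 9, 4 to 16). In pass V the plain multiple rule sends subbands 5 and 6 to subbands 25 and 30, which differs from the text, so the sketch should not be read as reproducing FIG. 3 exactly. The subbands it leaves unfilled (11, 13, 17, 19, 22, 23, 26, 27, 29, 31, 32) are the ones the final similar-waveform step would have to restore.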
In researching and practicing the prior art, the inventors found the following problems:
In the prior art, whether the low-frequency subband group is periodically shifted or folded as a whole block, or copied according to frequency multiples, harmonics are restored mechanically and the diversity and variability of audio and speech signals are ignored. Moreover, low-frequency and high-frequency subbands are extracted and copied in order of subband index, even though their waveforms generally differ; the copied high-frequency subbands may therefore differ substantially in waveform or peak value from the original high-frequency subbands, and the reconstructed high-frequency signal is inaccurate. The waveform diagrams above show that the waveforms reconstructed by the prior-art methods differ considerably from the original waveform, and the energy waveform diagrams show that many high-frequency harmonics are lost after prior-art reconstruction.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a high frequency reconstruction method, an encoding module and a decoding module, which can perform high frequency reconstruction more accurately.
To solve this technical problem, the embodiments of the invention adopt the following technical solutions:
An embodiment of the invention provides a high-frequency reconstruction method comprising: filtering an audio signal to obtain low-frequency subbands and high-frequency subbands; analyzing the short-time characteristics of the audio signal and selecting a band replication strategy according to the result; and, according to the selected strategy, obtaining the correlation between the low-frequency and high-frequency subbands, selecting the low-frequency subband with high correlation as the optimal replication band for each high-frequency subband, and outputting high-frequency reconstruction parameter information that includes the correspondence between the selected bands.
The short-time characteristic analysis and strategy selection specifically comprise: calculating the mean energy of the low-frequency part, the mean energy of the high-frequency part, and the energy of each low-frequency and each high-frequency subband; then comparing the energy of each low-frequency subband with a first weighted value of the low-frequency mean energy.
If the energy of some low-frequency subbands is less than or equal to the first weighted value, the selected strategy is: select the low-frequency subbands in which the energy is concentrated, and copy them into a selected high band that is highly correlated with them.
If the energies of all low-frequency subbands are greater than the first weighted value, the mean energy of the high-frequency part is further compared with a second weighted value of the low-frequency mean energy. If the high-frequency mean energy is less than or equal to the second weighted value, the selected strategy is: select the whole low-frequency band and copy it into a selected high band that is highly correlated with it. If the high-frequency mean energy is greater than the second weighted value, the selected strategy is: divide the high frequencies into several copy bands and, for each copy band, select a highly correlated low-frequency subband to copy.
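The strategy selection rules can be expressed as a small decision function. In this sketch the energy of a subband is assumed to be the sum of squares of its samples in the current frame, the "mean energy" of a part is the mean of its subband energies, and the weights `w1` and `w2` are placeholder values because the text does not specify them. Function names and the returned labels are hypothetical.

```python
from statistics import mean


def subband_energy(samples):
    """Energy of one subband in the current frame (sum of squared samples)."""
    return sum(x * x for x in samples)


def select_strategy(low_bands, high_bands, w1=0.5, w2=0.1):
    """Choose a band replication strategy from short-time energies.

    low_bands, high_bands: per-subband sample lists for one frame.
    w1, w2: first and second weights; placeholder values, not from the text.
    """
    low_e = [subband_energy(b) for b in low_bands]
    high_e = [subband_energy(b) for b in high_bands]
    low_mean = mean(low_e)
    high_mean = mean(high_e)

    # Some low subbands are weak: energy is concentrated in the others.
    if any(e <= w1 * low_mean for e in low_e):
        return "energy_concentrated_low_band"
    # All low subbands are strong, but the high part is weak.
    if high_mean <= w2 * low_mean:
        return "whole_low_band"
    # Both parts are strong: segment the high band and match each segment.
    return "segmented_high_band"
```

For example, if one low-frequency subband is nearly silent while the others are strong, the first test fires and the energy-concentrated strategy is chosen before the high-frequency energy is even considered.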
An embodiment of the invention provides an encoding module comprising an analysis filter module, a short-time characteristic analysis module and a band selection module. The analysis filter module filters the audio signal to obtain low-frequency and high-frequency subbands. The short-time characteristic analysis module analyzes the short-time characteristics of the audio signal, specifically by calculating the mean energy of the low-frequency part, the mean energy of the high-frequency part, and the energy of each low-frequency and each high-frequency subband. The band selection module comprises a replication strategy selection module and an optimal band selection module. The replication strategy selection module selects a band replication strategy according to the analysis result, as follows: the energy of each low-frequency subband is compared with a first weighted value of the low-frequency mean energy. If the energy of some low-frequency subbands is less than or equal to the first weighted value, the energy-concentrated low-frequency subbands are selected and copied into a selected high band highly correlated with them. If the energies of all low-frequency subbands exceed the first weighted value, the high-frequency mean energy is compared with a second weighted value of the low-frequency mean energy: if it is less than or equal to the second weighted value, the whole low-frequency band is selected and copied into a selected high band highly correlated with it; otherwise, the high frequencies are divided into several copy bands and a highly correlated low-frequency subband is selected for each copy band. The optimal band selection module obtains the correlation between the low-frequency and high-frequency subbands according to the selected strategy, selects the low-frequency subband with high correlation as the optimal replication band for each high-frequency subband, and outputs high-frequency reconstruction parameter information including the correspondence between the selected bands.
As the above technical solutions show, the embodiments of the present invention take full account of the correlation between low-frequency and high-frequency subbands: the correlation is obtained according to the selected band replication strategy, the low-frequency subband with large correlation is selected as the optimal replication band for each high-frequency subband, and high-frequency reconstruction parameter information including the correspondence between the selected bands is output, so that high-frequency reconstruction can be performed according to this information. The energy waveform diagrams and three-dimensional energy graphs of the subband signals after reconstruction show that the high-frequency part reconstructed by this method is closer to the original audio signal than that of the prior art, so the high frequencies are reconstructed more accurately.
Drawings
FIG. 1 is a diagram of a prior art global translational replication of a low frequency subband;
FIG. 2 is a diagram of a prior art global folding translation of a low frequency subband;
FIG. 3 is a schematic diagram of the discrete replication of low-frequency subbands in the second prior art;
FIG. 4 is an energy waveform diagram of the original audio and its subband signals in the prior art;
FIG. 5 is a three-dimensional graph of the subband energies of the original audio in the prior art;
FIG. 6 is an energy waveform diagram of each subband signal obtained after high-frequency reconstruction with mode 1) of the first prior art;
FIG. 7 is a three-dimensional graph of the energy of each subband obtained after high-frequency reconstruction with mode 1) of the first prior art;
FIG. 8 is an energy waveform diagram of each subband signal obtained after high-frequency reconstruction with mode 2) of the first prior art;
FIG. 9 is a three-dimensional graph of the energy of each subband obtained after high-frequency reconstruction with mode 2) of the first prior art;
FIG. 10 is a schematic and block diagram of high-frequency reconstruction according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of the high-frequency segmentation low-frequency matching replication strategy according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of the low-frequency pilot band high-frequency matching replication strategy according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of the low-frequency band high-frequency matching replication strategy according to an embodiment of the present invention;
FIG. 14 is a diagram illustrating an extended copy policy of an embodiment of the present invention;
FIG. 15(a) is a block diagram of an adaptive spectral band replication method at an encoding end according to an embodiment of the present invention;
FIG. 15(b) is a block diagram of the fixed-end fixed-bandwidth copy method according to the embodiment of the present invention;
FIG. 16 is a flowchart of a high frequency reconstruction method of an adaptive spectral band replication method according to an embodiment of the present invention;
FIG. 17 is a flowchart of band replication strategy selection according to an embodiment of the present invention;
FIG. 18 is a flow chart of optimal band selection according to an embodiment of the present invention;
FIG. 19 is an algorithmic flow chart of time varying property detection according to an embodiment of the present invention;
FIG. 20 is a diagram illustrating a decoding side performing high frequency reconstruction according to parameter information of an encoding side according to an embodiment of the present invention;
FIG. 21 is a flowchart of the decoder side high frequency generator algorithm according to an embodiment of the present invention;
FIG. 22 is a diagram of an energy waveform after an original audio signal is restored by the method of the embodiment of the invention;
FIG. 23 is a three-dimensional graph of the energy of an original audio signal after recovery by a method of an embodiment of the invention;
FIG. 24 is a block diagram of an encoding module according to an embodiment of the present invention;
FIG. 25 is a block diagram of a second encoding module according to an embodiment of the present invention;
FIG. 26 is a block diagram of a decoding module according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a high-frequency reconstruction method, which can more accurately perform high-frequency reconstruction.
FIG. 10 shows the principle and structure of high-frequency reconstruction according to an embodiment of the present invention.
As shown in FIG. 10, the upper half contains the high-frequency processing blocks at the encoding end, and the lower half contains those at the decoding end.
At the encoding end, an analysis filter module converts the original audio signal into subband signals distributed over different frequency bands, comprising low-frequency subbands and high-frequency subbands. The low-frequency subbands are encoded by a core encoder and transmitted to the decoding end; in addition, the low-frequency subbands are processed to obtain parameter information that guides high-frequency reconstruction. Specifically, at the encoding end the low-frequency subbands pass through an analysis and detection module, whose detection result is input to the band selection module to guide its analysis strategy. The band selection module selects a suitable replication strategy according to this guidance, uses a maximum-correlation criterion to select a matching low-frequency subband for each high-frequency subband or each segment of high-frequency subbands, extracts envelope parameters, and finally outputs the high-frequency reconstruction parameter information.
At the decoding end, a core decoder decodes and restores the low-frequency subband signals from the received coding information, copies the high-frequency subbands according to the high-frequency reconstruction parameter information from the encoding end, and then performs envelope adjustment to obtain the reconstructed high-frequency subbands. Finally, a synthesis filter combines the low-frequency and high-frequency subband signals to recover the full-band audio or speech signal.
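The encoder/decoder split of FIG. 10 can be illustrated with a toy model of the high-frequency path only. Assumptions, none of which come from the text: each subband is a list of real samples for one frame; the "maximum correlation criterion" is taken as zero-lag normalized correlation; the envelope parameter is a single RMS gain per high subband; the core codec, filter banks and quantization are omitted. The names `encode_hf_params` and `decode_hf` are hypothetical.

```python
import math


def norm_corr(a, b):
    """Zero-lag normalized correlation (the assumed correlation measure)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def encode_hf_params(low_bands, high_bands):
    """Encoder: best-correlated low subband plus an RMS gain per high subband."""
    params = []
    for target in high_bands:
        src = max(range(len(low_bands)),
                  key=lambda k: norm_corr(low_bands[k], target))
        src_rms = rms(low_bands[src])
        gain = rms(target) / src_rms if src_rms > 0 else 0.0
        params.append((src, gain))
    return params


def decode_hf(low_bands, params):
    """Decoder: copy each indicated low subband and apply its envelope gain."""
    return [[gain * v for v in low_bands[src]] for src, gain in params]


if __name__ == "__main__":
    low = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
    high = [[0.0, 2.0, 0.0, -2.0], [3.0, 0.0, -3.0, 0.0]]
    params = encode_hf_params(low, high)
    print(params)
    print(decode_hf(low, params))
```

In this arrangement only the `(source index, gain)` pairs, i.e. the band correspondence and the envelope, would need to be transmitted, since the decoder already holds the low band from the core decoder.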
The band replication strategy of the embodiment of the present invention is described first below:
Conventional band replication methods usually select a low band within a certain range as the fundamental and then copy it to the high-frequency part according to a fixed rule, such as translational copying, frequency-multiple copying or folded copying; such methods limit the accuracy of high-frequency reconstruction. Although the high-frequency and low-frequency components are correlated to some extent, translational and folded copying cannot guarantee that correlated bands are paired, so the correlation between the low-frequency subband used for copying and the high-frequency subband being reconstructed may be weak or even poor, which can introduce noise or change the timbre. Frequency-multiple copying exploits harmonic periodicity, but not every subband is a harmonic component, and where the correlation is poor it likewise introduces noise or changes the timbre. In addition, because speech and audio signals come from diverse sound sources, a fixed replication method is all the more likely to use the wrong replication bands.
The band replication strategies provided by the embodiments of the invention take full account of the correlation between low-frequency and high-frequency subbands, adapt to the short-time and time-varying characteristics of audio and speech signals, offer flexible band selection, and ensure that the source band and the band being reconstructed have optimal correlation. The embodiments provide the following three band replication strategies and their extensions:
(1) high frequency segmentation low frequency matching replication strategy (strategy 1):
This strategy divides the high-frequency component into several copy bands, which can be done in different ways. For example, the copy bands may follow the Bark scale bands, and within the Bark bands subbands may be grouped into copy bands at different resolutions. Resolution here refers to the number of subbands in a copy band: the fewer subbands, the higher the resolution; the more subbands, the lower the resolution. When copy bands are divided within Bark bands, the resolution decreases as frequency increases. Alternatively, the high-frequency component may be divided evenly into copy bands of the same resolution, and the most correlated low band is then selected for each high-frequency copy band.
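To illustrate why Bark-scale copy bands lose resolution as frequency rises, the sketch below groups high subbands by integer Bark index. It assumes uniform subbands of a fixed width (the 500 Hz width and 32-subband layout in the usage are hypothetical, not taken from the patent) and uses the Zwicker and Terhardt approximation of the Bark scale; the patent does not say which Bark formula, if any, it uses. `bark_copy_bands` is a hypothetical name.

```python
import math
from itertools import groupby


def bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_copy_bands(first_high, num_subbands, band_width_hz):
    """Group high subbands first_high..num_subbands-1 into copy bands by Bark band.

    Assumes uniform subbands, subband k centred at (k + 0.5) * band_width_hz.
    Because the Bark scale is monotonic, each group is a contiguous run.
    """
    def bark_index(k):
        return int(bark((k + 0.5) * band_width_hz))

    high = range(first_high, num_subbands)
    return [list(group) for _, group in groupby(high, key=bark_index)]


if __name__ == "__main__":
    for band in bark_copy_bands(8, 32, 500.0):
        print(band)
```

With `bark_copy_bands(8, 32, 500.0)` the lowest high subband forms a copy band on its own, while near the top of the range a single Bark band spans several subbands, i.e. a lower resolution.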
FIG. 11 is a schematic diagram of the high-frequency segmentation low-frequency matching replication strategy according to an embodiment of the present invention. B0 is the end subband of the low-frequency subband signal, B1 is the end subband of the high-frequency processed signal, and B1, B2, B3 are the copy band partition boundaries. After the copy bands are divided, this strategy selects the most correlated low-frequency subband for each copy band; a low-frequency subband may be used repeatedly, as long as it has the maximum correlation with the copy band it is copied to.
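A minimal sketch of strategy 1 with equal-resolution copy bands follows. It assumes each subband is represented by a short feature vector (for instance its energies over recent frames), that a copy band of `width` subbands is matched against contiguous low-frequency windows of the same width, and that zero-lag normalized correlation stands in for the correlation measure; none of these choices is specified in the text, and `segment_match` is a hypothetical name. As the strategy permits, the same low window may be reused for several copy bands.

```python
import math


def norm_corr(a, b):
    """Zero-lag normalized correlation of two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def flatten(bands):
    return [v for band in bands for v in band]


def segment_match(low, high, width):
    """Strategy-1 sketch: match each high copy band to the best low window.

    low, high: lists of per-subband feature vectors (equal lengths).
    Copy bands are consecutive runs of `width` high subbands (the last may
    be shorter). Returns (first_high_subband, first_low_subband) pairs,
    indexed within each list. Requires len(low) >= width.
    """
    pairs = []
    for start in range(0, len(high), width):
        target = flatten(high[start:start + width])
        w = min(width, len(high) - start)
        best = max(range(len(low) - w + 1),
                   key=lambda s: norm_corr(flatten(low[s:s + w]), target))
        pairs.append((start, best))
    return pairs
```

Because each copy band is matched independently, nothing forces neighbouring copy bands to use neighbouring low windows; this is the difference from the whole-block copying of the first prior art.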
The replication strategy makes full use of the correlation between high frequency and low frequency, is suitable for the condition that the spectral envelope is relatively stable, and has good spectral envelope energy at both high frequency and low frequency, because the high frequency component at the moment has important audio frequency components, especially the high frequency band signal close to the low frequency part, if the high frequency replication generates distortion, noise is introduced, the tone quality is influenced, the replication strategy is selected in a segmentation mode to select the low frequency band with the maximum correlation for each high frequency band for replication, the correlation correspondence between the frequency bands is ensured, and the distortion caused by the frequency band misutilization can be avoided.
Compared with the prior art, the replication strategy has the following differences: in the prior art, a whole block of low-frequency signals is used for continuously and repeatedly copying high-frequency signals, when the correlation of the high-frequency signals and the low-frequency signals is poor, the low-frequency subbands with large differences are used for copying the high-frequency subbands, and large distortion is introduced.
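Below is a minimal sketch of strategy 1. It assumes the copy bands are already divided and uses zero-lag normalized cross-correlation as a simplified stand-in for the fuller matching measure of steps 1803 to 1805 (sample shifts plus envelope comparison). `band_corr` and `strategy1` are hypothetical names. Subbands are lists of real or complex samples, shortened in the example to four samples instead of 32.

```python
import math


def band_corr(a, b):
    """Normalized correlation between two equal-length lists of subbands
    (each subband is a list of real or complex samples)."""
    num = sum((x * complex(y).conjugate()).real
              for sa, sb in zip(a, b) for x, y in zip(sa, sb))
    ea = sum(abs(x) ** 2 for sa in a for x in sa)
    eb = sum(abs(y) ** 2 for sb in b for y in sb)
    return num / math.sqrt(ea * eb) if ea > 0 and eb > 0 else 0.0


def strategy1(low, high, copy_bands, k0):
    """For each copy band (a list of absolute high-subband indices starting at
    k0), return the start position, in `low`, of the equally wide low segment
    with the highest correlation. A low segment may serve several copy bands."""
    choices = []
    for band in copy_bands:
        w = len(band)
        target = [high[k - k0] for k in band]
        best = max(range(len(low) - w + 1),
                   key=lambda s: band_corr(target, low[s:s + w]))
        choices.append(best)
    return choices


low = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
high = [[0, 0, 1, 0], [1, 0, 0, 0]]
print(strategy1(low, high, [[10], [11]], k0=10))  # [2, 0]
```

Each copy band searches the whole low band independently, so the same low segment can be chosen more than once. This is the reuse that the strategy allows.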
(2) Low-frequency dominant-band-guided high-frequency matching copy strategy (strategy 2):
This strategy first selects the band in which the low-frequency energy is concentrated as the pilot band. It then finds the high-frequency segments most correlated with that band and copies the pilot band into them as high-frequency subbands. For the small leftover high-frequency bands that have not yet been copied, the closest (most correlated) low-frequency band is selected for copying. In other words, the bands with good harmonic structure are handled first and the scattered remaining bands are matched individually afterwards. The low frequencies are used to pick the best-correlated high-frequency harmonics, and the non-harmonic high-frequency subbands are then used to pick their most correlated low-frequency subbands.
Fig. 12 is a schematic diagram of the low-frequency pilot band high-frequency matching copy strategy according to an embodiment of the present invention. A low-frequency segment with concentrated energy is selected first. The high-frequency parts that correlate best with it are then selected and filled by copying it. For example, high-frequency bands i and j are the copy bands selected by the low-frequency signal. Suitable low-frequency bands are then selected for the scattered high-frequency bands outside i and j.
This strategy exploits the frequency-domain harmonic structure of the signal. Similar fundamental-frequency content in the low band is mapped to high-frequency harmonics of different orders, which makes the strategy suitable for speech and audio signals with strong harmonic characteristics. In general, the spacing between harmonics in the high-frequency part gradually decreases while the band each harmonic covers gradually widens. As a result, the conventional frequency-multiplication copying of harmonics in the prior art produces high-frequency distortion.
Compared with the prior art, this strategy differs as follows. With frequency-multiplication copying, low-frequency subband signals are copied to high frequencies at multiples, so the resulting high-frequency harmonics mix harmonics of different low-frequency components and their continuity is broken. This strategy instead consistently maps similar fundamental-frequency content onto the high-frequency harmonics, which preserves harmonic continuity and avoids high-frequency distortion.
(3) Whole-low-band high-frequency matching copy strategy (strategy 3):
This strategy treats the entire low-frequency band as one reference band and then selectively copies it onto the high-frequency harmonics according to a best-match criterion.
Fig. 13 is a schematic diagram of the whole-low-band high-frequency matching copy strategy according to an embodiment of the present invention. High-frequency bands i and j are the selected harmonic components that correlate well with the low-frequency signal, and the whole low-frequency band is copied at those positions. The scattered subbands elsewhere in the high band are still assigned copy sources by the maximum-correlation band selection method.
This harmonic selection and copying strategy suits audio signals with a stable spectral envelope, a steadily decreasing high-frequency energy, and low overall high-frequency energy. For such signals the energy of high-frequency harmonics generally decays exponentially with harmonic order, and the weak high band can be regarded as a mixture of harmonics and noise, so the whole low-frequency signal can be copied selectively into the high band. At low bit rates, however, accurate copying of the high-frequency subbands adjacent to the low band is critical, and their source bands must be selected carefully.
Compared with the prior art, this strategy differs as follows. The prior art copies a whole block of low-frequency signal contiguously into the high band. This strategy instead treats the low band as one unit, selects from the high-frequency components the harmonics most correlated with it, and copies the whole low band at those positions (i and j in Fig. 13). Transition bands are allowed between the harmonics, and a suitable low-frequency band is chosen for each transition band by the optimal band selection method, which prevents harmonic shift.
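Strategies 2 and 3 share one placement step. A reference block of low subbands is copied as a unit to the best-matching high-frequency positions, and the leftover scattered or transition subbands are then matched one at a time. The sketch below illustrates that step under two stated assumptions:

- The reference is either the pilot run (for example, subbands n to n+q-1 found in step 1706) for strategy 2, or the whole low band for strategy 3.
- The number of block positions, `num_positions`, is a hypothetical caller-chosen parameter, because the text does not say how many harmonic positions are selected.

Matching again uses zero-lag normalized correlation as a simplification, and all names are hypothetical.

```python
import math


def band_corr(a, b):
    """Normalized correlation between two equal-length lists of subbands."""
    num = sum((x * complex(y).conjugate()).real
              for sa, sb in zip(a, b) for x, y in zip(sa, sb))
    ea = sum(abs(x) ** 2 for sa in a for x in sa)
    eb = sum(abs(y) ** 2 for sb in b for y in sb)
    return num / math.sqrt(ea * eb) if ea > 0 and eb > 0 else 0.0


def harmonic_match(reference, low, high, k0, num_positions):
    """Place `reference` at up to `num_positions` non-overlapping high positions
    with the highest correlation, then give each uncovered high subband its
    most correlated single low subband.
    Returns {high subband k: ("block", offset in reference) or
    ("single", index in low)}."""
    w = len(reference)
    scores = sorted(((band_corr(reference, high[s:s + w]), s)
                     for s in range(len(high) - w + 1)), reverse=True)
    mapping, covered, placed = {}, set(), 0
    for _, s in scores:
        if placed == num_positions:
            break
        if any(i in covered for i in range(s, s + w)):
            continue  # keep block placements from overlapping
        for offset, i in enumerate(range(s, s + w)):
            mapping[k0 + i] = ("block", offset)
            covered.add(i)
        placed += 1
    for i in range(len(high)):
        if i not in covered:  # scattered or transition subband
            best = max(range(len(low)),
                       key=lambda n: band_corr([high[i]], [low[n]]))
            mapping[k0 + i] = ("single", best)
    return mapping


low = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
high = [[0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
print(sorted(harmonic_match(low[0:2], low, high, k0=10, num_positions=1).items()))
# [(10, ('single', 0)), (11, ('block', 0)), (12, ('block', 1)), (13, ('single', 2))]
```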
(4) Extension of the band replication strategies (strategies 1, 2, and 3):
The extension strategy uses the lower-frequency high-frequency subbands that have already been obtained by high-frequency copying as a source for copying the higher frequency bands.
At low bit rates the low-frequency signal may not cover a complete harmonic, so the band selection range is expanded. The low-frequency signal and a few adjacent high-frequency subbands are treated as a single whole, and the copy bands are then determined with strategy 1, 2, or 3. Each of the added high-frequency subbands is itself copied from its most correlated low-frequency subband. Band selection proceeds in two stages:
First, the range of the extension band (the high-frequency subbands used as a copy source during recovery) is determined from the coding bit rate and the harmonic integrity, and a low-frequency source subband is chosen for each extension subband by the maximum-correlation criterion. Because the extension band requires the highest reconstruction accuracy, the highest band resolution is used at this stage (a single subband per copy band).
Second, the extension band is combined with the low-frequency subbands into one copy source, and strategy 1, 2, or 3 is applied to select copy bands for the remaining high-frequency subbands.
Fig. 14 is a schematic diagram of the extended copy strategy according to an embodiment of the present invention. Once replica band 1 has been reconstructed, it is combined with the low-frequency band into one continuous band. That band then serves as the source for reconstructing replica band 2 and the bands above it.
This strategy suits low bit rates. At low rates the low band processed by the core codec is narrow and may not cover all overtones of the fundamental. Moreover, the overtones in the middle band resemble the high-frequency overtones more closely than the low-frequency ones do. Once the lower part of the high band has been reconstructed at high resolution, it can therefore be used to copy the higher bands. This lets the harmonics be represented completely and helps extend the range of high-frequency reconstruction.
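Here is a minimal sketch of the extension strategy. It assumes the extension band is the first `num_ext` high subbands; the text derives this range from the bit rate and harmonic integrity, which the sketch does not model. Each extension subband gets its single best low subband, chosen from the original low band only, and the copies are appended to the low band to form the enlarged copy source. Envelope adjustment of the copies is omitted, and all names are hypothetical.

```python
import math


def band_corr(a, b):
    """Normalized correlation between two equal-length lists of subbands."""
    num = sum((x * complex(y).conjugate()).real
              for sa, sb in zip(a, b) for x, y in zip(sa, sb))
    ea = sum(abs(x) ** 2 for sa in a for x in sa)
    eb = sum(abs(y) ** 2 for sb in b for y in sb)
    return num / math.sqrt(ea * eb) if ea > 0 and eb > 0 else 0.0


def extend_source(low, high, num_ext):
    """Match each of the first `num_ext` high subbands to one low subband
    (highest resolution) and append the copies to the low band.
    Returns (picks, source); `source` is the enlarged copy source that
    strategy 1, 2 or 3 would use for the remaining high subbands."""
    source = [list(b) for b in low]
    picks = []
    for i in range(min(num_ext, len(high))):
        best = max(range(len(low)),
                   key=lambda n: band_corr([high[i]], [low[n]]))
        picks.append(best)
        source.append(list(low[best]))  # reconstructed extension subband
    return picks, source


picks, source = extend_source([[1, 0, 0, 0], [0, 1, 0, 0]],
                              [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0]], 2)
print(picks, len(source))  # [1, 0] 4
```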
The high frequency reconstruction method according to an embodiment of the present invention is described in further detail below.
The high-frequency reconstruction method of the embodiment can operate in two modes: an adaptive band replication mode and a fixed band replication mode.
(1) Adaptive band replication mode: Fig. 15(a) is a block diagram of the encoder side of the adaptive spectral band replication method according to an embodiment of the present invention. The method detects the characteristics of the audio signal by energy spectrum analysis and estimation. The detection result provides guidance information that selects the copy strategy, which in turn guides the optimal band selection. Speech and audio characteristics usually stay the same over a certain period (they are quasi-stationary), so band selection need not be repeated for every frame. A time-varying characteristic detection is therefore introduced, and band selection is redone only when the time-varying characteristic variable exceeds a tolerance.
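The text does not define the time-varying characteristic variable, so the following Python sketch uses a hypothetical one: the mean absolute change of per-subband log energies between the frame of the last band selection and the current frame. It only illustrates the control logic, in which the previous selection is reused until the variable exceeds the tolerance.

```python
import math


def needs_reselection(ref_energies, cur_energies, tolerance):
    """Hypothetical time-variation measure: mean absolute change (in decades)
    of per-subband energies since the last band selection."""
    eps = 1e-12  # avoids log10(0) for silent subbands
    diffs = [abs(math.log10(c + eps) - math.log10(r + eps))
             for r, c in zip(ref_energies, cur_energies)]
    return sum(diffs) / len(diffs) > tolerance


def reselection_schedule(frames, tolerance):
    """True where strategy and band selection are redone, False where the
    previous selection is reused (quasi-stationary frames)."""
    reference, decisions = None, []
    for energies in frames:
        if reference is None or needs_reselection(reference, energies, tolerance):
            reference = energies
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


print(reselection_schedule([[1, 1], [1.1, 1], [100, 100]], tolerance=0.5))
# [True, False, True]
```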
(2) Fixed band replication mode: Fig. 15(b) is a block diagram of the fixed band replication method according to an embodiment of the present invention. Here a copy method is chosen in advance according to actual needs. One of the band copy strategies proposed above (strategy 1, 2, 3, or their extensions) is fixed and kept unchanged throughout the audio processing, combined with a suitable optimal band selection. In this mode the copy strategy does not need to be guided by the short-term characteristic analysis. It is specified by a configuration parameter, so the short-term characteristic analysis module is not needed.
In both the adaptive and the fixed band replication modes, time-varying characteristic detection is optional.
The following describes a high-frequency reconstruction method using an adaptive spectral band replication method.
Please refer to fig. 16, which is a flowchart illustrating a high frequency reconstruction method in an adaptive spectral band replication method according to an embodiment of the present invention, including the steps of:
step 1601, performing short-time characteristic analysis on the sub-band signal obtained by the analysis filter module;
step 1602, selecting a frequency band replication strategy according to the result of the short-time characteristic analysis;
step 1603, selecting an optimal frequency band according to the selected frequency band replication strategy;
step 1604, performing band replication according to the optimal band.
The steps are specifically described below.
Step 1601, performing short-time characteristic analysis on the sub-band signal obtained by the analysis filter module;
The original audio signal is converted by the analysis filter module into subband signals in different frequency bands, and short-time characteristic analysis is then performed on these subband signals.
Short-term characteristic analysis is the groundwork for selecting an appropriate band replication strategy. The audio or speech signal is first transformed to the time-frequency domain. The energy distribution of the harmonics, the low-frequency part, and the high-frequency part is then analyzed, and the resulting parameters determine the band replication strategy.
Many algorithms can implement the short-term characteristic analysis. The embodiment of the present invention uses the following one but is not limited to it.
Let the low-frequency subband samples be $X_{Low}(n,l)$, where $n$ is the low-frequency subband index, $1 \le n < k_0$, and $k_0$ is the index of the first subband processed as high frequency; $l$ is the sample index within a subband, $0 \le l < 32$. Let the high-frequency subband samples be $X_{High}(k,l)$, where $k$ is the high-frequency subband index, $k_0 \le k \le k_e$, and $k_e$ is the last subband of high-frequency processing.
First, calculate the energy of each low-frequency subband:
$$E(n) = \sum_{l=0}^{31} X_{Low}^2(n,l)$$
Second, calculate the mean energy of the whole low-frequency part:
$$E_{Low} = \frac{1}{k_0-1}\sum_{n=1}^{k_0-1} E(n)$$
Third, calculate the energy of each high-frequency subband:
$$E(k) = \sum_{l=0}^{31} X_{High}^2(k,l)$$
Fourth, calculate the mean energy of the whole high-frequency part:
$$E_{High} = \frac{1}{k_e-k_0+1}\sum_{k=k_0}^{k_e} E(k)$$
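The four analysis steps translate directly into code. This sketch makes three assumptions:

- Each subband is one list of samples, and list index 0 is left unused so that indices match n and k in the text.
- Energies use `abs(x) ** 2`, which equals the squared sample for real input and the squared magnitude for complex QMF samples. The complex case is an extension beyond the real-valued formulas.
- The high-band mean averages over all ke - k0 + 1 subbands in the range k0 ≤ k ≤ ke.

```python
def short_time_analysis(X, k0, ke):
    """X[k] holds the samples of subband k; X[0] is unused so that low
    subbands are n = 1 .. k0-1 and high subbands k = k0 .. ke, as in the text.
    Returns (E, E_low, E_high)."""
    E = [0.0] + [sum(abs(x) ** 2 for x in X[k]) for k in range(1, ke + 1)]
    low = E[1:k0]        # E(n), 1 <= n < k0
    high = E[k0:ke + 1]  # E(k), k0 <= k <= ke
    return E, sum(low) / len(low), sum(high) / len(high)


X = [[0, 0, 0, 0], [1, 1, 0, 0], [2, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
E, E_low, E_high = short_time_analysis(X, k0=3, ke=4)
print(E, E_low, E_high)  # [0.0, 2, 4, 1, 0] 3.0 0.5
```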
After these calculations the short-term characteristic analysis is complete, and the resulting parameters are passed to the band replication strategy selection stage.
Step 1602, selecting a frequency band replication strategy according to the result of the short-time characteristic analysis;
Four band replication strategies were described above, and one of them is selected based on the result of the short-term characteristic analysis. Once the strategy is determined, a strategy flag and strategy information are output to guide the subsequent optimal band selection.
Please refer to fig. 17, which is a flowchart illustrating selection of a band replication policy according to an embodiment of the present invention.
The energy of each low-frequency subband obtained from the short-time analysis is compared with the low-frequency mean energy, and the high-frequency mean energy is then compared with the low-frequency mean energy. The strategy is chosen as follows:
- If some low-frequency subbands have energy far below the mean while other, contiguous low-frequency subbands are close to or above the mean, strategy 2 is selected.
- If all low-frequency subbands have similar energy (so the low-frequency energy curve is continuous and smooth) and the high-frequency mean energy is far below the low-frequency mean energy, strategy 3 is selected.
- Otherwise, strategy 1 is selected.
Strategies 2, 3, and 1 form the main body of the decision, and the band replication extension strategy is auxiliary. The extension strategy is mainly used when the low band available during high-frequency recovery is narrow: it widens the low band available for copying and improves the completeness of the fundamental. When the coding rate is low and only a few low-frequency subbands are available, it keeps the high-frequency part from being overly restricted in its choice of low-frequency subbands. It binds several mid-to-high-frequency subbands that need band copying to the low-frequency part, forming a new low-frequency part. Source bands are then selected from this new part, and copy parameters extracted, for most of the high-frequency subbands. At the same time, for each of those mid-to-high-frequency subbands inside the new low-frequency part, the best-matching low-frequency subband is selected from the original low-frequency part, and their copy parameters are extracted in turn.
The band replication extension strategy is an extension of strategies 2, 3, and 1. When the band copy strategy selection flow outputs extended_flag, the extension strategy is used, and the selected strategy 2, 3, or 1 becomes extended strategy 2, 3, or 1, respectively.
The specific flow in fig. 17 is as follows:
step 1701, completing the time-frequency transformation and inputting the QMF (Quadrature Mirror Filter) subbands;
step 1702, determining whether the codec is in a low coding rate mode; if so, proceeding to step 1703, otherwise to step 1705;
step 1703, expanding the range of the low-frequency part that takes part in copying to form a new low-frequency part, and proceeding to step 1704;
step 1704, outputting the flag extended_flag, which indicates the band replication extension strategy, and proceeding to step 1705;
step 1705, judging whether any low-frequency subband has too low an energy; if not, proceeding to step 1708, and if so, to step 1706;
The energy $E(n)$ of each low-frequency subband is compared with the low-frequency mean energy $E_{Low}$. If some low-frequency subband satisfies $E(n) \le \delta_1 E_{Low}$, the subband energy drops steeply somewhere in the low band and the fundamental-frequency energy is discontinuous, so the flow proceeds to step 1706; otherwise it proceeds to step 1708.
Here $0 < \delta_1 < 1$. The threshold δ1 is an empirical value obtained by observing the waveforms corresponding to the copy strategy and can be set as required.
Step 1706, searching for contiguous low-frequency subbands with high energy and selecting strategy 2;
This step searches for a subband interval in which the low-frequency energy is continuously distributed, to serve as the fundamental-frequency part of strategy 2. The decision rule is as follows. Suppose
$$E(n) > \delta_2 E_{Low},\; E(n+1) > \delta_2 E_{Low},\; \ldots,\; E(n+q-1) > \delta_2 E_{Low},\; E(n+q) < \delta_2 E_{Low},$$
where $q \ge 1$, $1 \le n < k_0$, and $0 < \delta_1 < \delta_2 < 1$. The threshold δ2 is an empirical value obtained by observing the waveforms corresponding to the copy strategy and can be set as required.
If this holds, strategy 2 is adopted, and the subband index n and the interval length q are recorded;
step 1707, outputting the flag Flag corresponding to strategy 2, together with the subband index n and the interval length q. Note that if extended_flag is output as well, the current strategy is extended strategy 2.
Step 1708, comparing the high-frequency mean energy with the low-frequency mean energy to judge whether the high-frequency mean energy is too low; if not, proceeding to step 1709, and if so, to step 1710;
If step 1705 finds that every low-frequency subband satisfies $E(n) > \delta_1 E_{Low}$, the analysis turns to the energy relationship between the low and high parts.
$E_{High}$ is compared with $\lambda E_{Low}$, where $0 < \lambda < 1$. The factor λ is an empirical value obtained by observing the relevant waveforms and can be set as required.
If $E_{High} \le \lambda E_{Low}$, strategy 3 is adopted and the flow proceeds to step 1710; otherwise ($E_{High} > \lambda E_{Low}$), strategy 1 is adopted and the flow proceeds to step 1709;
step 1709, outputting the flag Flag corresponding to strategy 1. Note that if extended_flag is output as well, the current strategy is extended strategy 1.
Step 1710, outputting the flag Flag corresponding to strategy 3. Note that if extended_flag is output as well, the current strategy is extended strategy 3.
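The Fig. 17 flow can be sketched as a single function. The thresholds `delta1`, `delta2` and `lam` are the empirical values δ1, δ2 and λ and must be supplied by the caller. Three details are assumptions of this sketch rather than of the text:

- When several high-energy runs exist in step 1706, the longest one is kept.
- A run is allowed to end at the top of the low band.
- An all-zero low band, which yields no run, falls through to step 1708.

```python
def select_strategy(E, E_low, E_high, k0, low_rate, delta1, delta2, lam):
    """Fig. 17 decision flow (sketch).

    E[n] for n = 1 .. k0-1 are the low subband energies (E[0] unused).
    Returns (strategy, extended, info); info holds n and q for strategy 2.
    """
    if not (0 < delta1 < delta2 < 1 and 0 < lam < 1):
        raise ValueError("need 0 < delta1 < delta2 < 1 and 0 < lam < 1")
    extended = bool(low_rate)  # steps 1702-1704: extended_flag
    # Step 1705: is any low subband far below the low-band mean?
    if any(E[n] <= delta1 * E_low for n in range(1, k0)):
        # Step 1706: runs with E > delta2 * E_low; keep the longest run
        thr, best_n, best_q, n = delta2 * E_low, None, 0, 1
        while n < k0:
            q = 0
            while n + q < k0 and E[n + q] > thr:
                q += 1
            if q > best_q:
                best_n, best_q = n, q
            n += q + 1  # skip the run and the subband that ended it
        if best_n is not None:
            return 2, extended, {"n": best_n, "q": best_q}  # step 1707
    # Step 1708: is the high-band mean energy too low?
    if E_high <= lam * E_low:
        return 3, extended, {}  # step 1710
    return 1, extended, {}      # step 1709


E = [0, 10, 0.1, 8, 9, 7, 0.2]
print(select_strategy(E, sum(E[1:]) / 6, 1.0, 7, False, 0.1, 0.5, 0.5))
# (2, False, {'n': 3, 'q': 3})
```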
Step 1603, selecting an optimal frequency band according to the selected frequency band replication strategy;
The optimal band selection module uses maximum correlation as its criterion to flexibly search for the best-matching band to copy for each reference band. This keeps the copied bands correlated, so the copied high-frequency signal approximates the original without excessive adjustment.
Guided by the chosen copy strategy and its strategy information (including the corresponding initialized band table), the optimal correspondence between high- and low-frequency signals is selected. The copy strategy determines whether band selection picks a low-frequency source for each high-frequency band or picks high-frequency harmonics for a low-frequency band. For example, under strategy 1 an optimal low-frequency source is selected for each high-frequency copy band. Under strategy 2, the high-frequency harmonics that can be copied from the fundamental-like low-frequency signal are selected first. The initialized band table specifies the bandwidth estimated for band selection and the range of bands to search.
Optimal band selection compares two parameters: the correlation of the high- and low-frequency signals, and the similarity of their envelope characteristics. It then determines the best high-to-low matching from both jointly. To keep the comparison from being skewed by differences in signal amplitude, the signals are normalized over the ranges in the initialized band table before estimation. Matching then depends mainly on the similarity of signal characteristics, and the energy difference is compensated later, during reconstruction.
For convenience, the general optimal band selection algorithm is described below for strategy 1, taking a single subband (the maximum frequency resolution) as the copy band.
Please refer to fig. 18, which is a flowchart illustrating an optimal band selection process according to an embodiment of the present invention, including the steps of:
step 1801, dividing the copy bands and candidate bands according to the initialized band table in the band copy strategy information, and splitting the input subband signal into high and low frequencies according to the same table;
Let the number of copy bands be nb. Let the low-frequency subband samples be $X_{Low}(n,l)$, where $1 \le n < k_0$ is the low-frequency subband index, $k_0$ is the index of the first high-frequency subband, and $0 \le l < 32$ is the sample index within the subband. Let the high-frequency subband samples be $X_{High}(k,l)$, where $k_0 \le k \le k_e$ is the high-frequency subband index and $k_e$ is the last subband of high-frequency processing.
Step 1802, for each copy band defined in the initialized band table, normalizing the high- and low-frequency signals within the copy band length. Assuming that a copy band equals one transform subband:
$$X'_{Low}(n,l) = \frac{X_{Low}(n,l)}{\max\big(X_{Low}(n,l)\big)}, \qquad X'_{High}(k,l) = \frac{X_{High}(k,l)}{\max\big(X_{High}(k,l)\big)}$$
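Step 1802 can be written as a small function. The formula divides by max(X); this sketch takes the maximum magnitude instead, an assumption that keeps negative and complex samples from producing a negative or undefined divisor. The name `normalize` is hypothetical.

```python
def normalize(samples):
    """Scale one subband (or copy band) by its largest sample magnitude.
    An all-zero subband is returned unchanged."""
    peak = max(abs(x) for x in samples)
    return [x / peak for x in samples] if peak > 0 else list(samples)


print(normalize([2, -4, 1]))  # [0.5, -1.0, 0.25]
```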
step 1803, calculating the correlation function between each high-frequency band and each candidate low-frequency band;
To allow for sample offsets and find the low-frequency band whose waveform best approximates the high-frequency band, the low-frequency samples are shifted before the correlation is computed:
$$r_k^m[n] = \sum_{l=1}^{31} X_{High}(k,l)\,X_{Low}(n,l-m), \qquad m = 0,1,2,$$
where m is the number of shifted samples and $r_k^m[n]$ is the correlation between high band k and low band n after the shift.
If the time-frequency transform is complex, the subband samples are complex, and the real part of the correlation can be analyzed:
$$r_k^m[n] = \mathrm{Re}\Big[\sum_{l=1}^{31} X_{High}(k,l)\,X_{Low}^*(n,l-m)\Big], \qquad m = 0,1,2.$$
For each low band n, the largest of its shifted correlations with high band k is kept:
$$r_k^{max}[n] = \max_m\big(r_k^m[n]\big).$$
Performing this for all high bands k yields the values $r_k^{max}[n]$. These form the maximum correlation matrix $R_{max}[k][n]$, which records the maximum correlation between every high and low band.
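The shifted correlation and the maximum correlation matrix R_max can be sketched as follows. The sketch assumes samples are already normalized per step 1802. It also assumes that terms whose index l - m would fall before the first sample are dropped, since the text does not say how that boundary is handled. The conjugate and real part let the same code serve real and complex subbands, and the function names are hypothetical.

```python
def shifted_corr(xh, xl, m):
    """r_k^m[n] = Re sum_{l=1}^{len-1} X_High(k,l) X_Low*(n,l-m) for one pair
    of subbands; terms with l - m < 0 are skipped (treated as zero)."""
    return sum((xh[l] * complex(xl[l - m]).conjugate()).real
               for l in range(1, len(xh)) if l - m >= 0)


def max_corr_matrix(high, low, max_shift=2):
    """R_max[k][n] = max over m = 0..max_shift of r_k^m[n]; rows are high
    subbands, columns low subbands (list positions, not absolute indices)."""
    return [[max(shifted_corr(xh, xl, m) for m in range(max_shift + 1))
             for xl in low] for xh in high]


print(max_corr_matrix([[0, 1, 0, 0]], [[1, 0, 0, 0], [0, 1, 0, 0]]))  # [[1.0, 1.0]]
```

In the example the first low subband only matches after a one-sample shift, which is the case the shift search is meant to catch.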
step 1804, estimating the envelope variation characteristics of the high and low bands and calculating the difference between them;
The envelope characteristic is estimated in three stages. The samples within the copy band length are treated as a sequence, and its autocorrelation is computed up to second order. The difference between the high- and low-frequency envelope characteristics is then obtained from the mean square error between their autocorrelation functions.
First, compute the autocorrelation functions up to second order of the high and low components over the copy band length:
$$r_k^m[l] = \mathrm{Re}\big[X_{High}(k,l)\,X_{High}^*(k,l-m)\big], \qquad r_n^m[l] = \mathrm{Re}\big[X_{Low}(n,l)\,X_{Low}^*(n,l-m)\big], \qquad m = 0,1,2,\quad m \le l \le 31,$$
where $r_k^m[l]$ and $r_n^m[l]$ are the autocorrelation functions of high band k and low band n, respectively, and m is the autocorrelation lag.
Then, the envelope difference between the high and low frequency subbands is calculated:
$$e(k,n) = \rho_1 \sum_{l=1}^{31}\big(r_k^1[l]-r_n^1[l]\big)^2 + \rho_2 \sum_{l=2}^{31}\big(r_k^2[l]-r_n^2[l]\big)^2,$$
where $\rho_1$ and $\rho_2$ are two coefficients with $\rho_1 + \rho_2 = 1$, and $e(k,n)$ is the envelope difference between high band k and low band n.
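Here is a sketch of the envelope difference e(k, n). It reads r^m[l] as the lag-m product sequence Re[X(l) X*(l - m)] for l ≥ m. This is the reading that the l-indexed sums in e(k, n) (starting at l = 1 for m = 1 and at l = 2 for m = 2) and the mean of r^0[l] used later require. The default ρ1 = 0.5 is an illustrative choice, since the text only requires ρ1 + ρ2 = 1, and the function names are hypothetical.

```python
def lag_products(x, m):
    """Lag-m product sequence Re[x(l) x*(l-m)] for l = m .. len(x)-1."""
    return [(x[l] * complex(x[l - m]).conjugate()).real for l in range(m, len(x))]


def envelope_difference(xh, xl, rho1=0.5):
    """e(k, n) = rho1 * sum_l (r_k^1 - r_n^1)^2 + rho2 * sum_l (r_k^2 - r_n^2)^2,
    with rho2 = 1 - rho1; smaller means more similar envelopes."""
    rho2 = 1.0 - rho1
    d1 = sum((a - b) ** 2 for a, b in zip(lag_products(xh, 1), lag_products(xl, 1)))
    d2 = sum((a - b) ** 2 for a, b in zip(lag_products(xh, 2), lag_products(xl, 2)))
    return rho1 * d1 + rho2 * d2


print(envelope_difference([1, 1, 1], [1, 0, 1]))  # 1.0
```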
Step 1805, jointly compare the correlation function value and e(k, n) to select the optimal low-band source for each copy band.
A larger correlation function value indicates a stronger correlation between the high and low frequency bands, while a smaller e(k, n) indicates more similar envelopes. To find the best match between high and low bands, the two measures must be combined into a single parameter for comparison, so the following transformation is performed:
Figure GDA0000095502450000172
where α and β are weight coefficients, and the result is the band selection factor for high band k and low band n.
<math><mrow> <mi>&beta;</mi> <mo>=</mo> <mfrac> <mrow> <mi>min</mi> <mrow> <mo>(</mo> <mi>mean</mi> <mrow> <mo>(</mo> <msubsup> <mi>r</mi> <mi>k</mi> <mn>0</mn> </msubsup> <mo>[</mo> <mi>l</mi> <mo>]</mo> <mo>)</mo> </mrow> <mo>,</mo> <mi>mean</mi> <mrow> <mo>(</mo> <msubsup> <mi>r</mi> <mi>n</mi> <mn>0</mn> </msubsup> <mo>[</mo> <mi>l</mi> <mo>]</mo> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> <mrow> <mi>max</mi> <mrow> <mo>(</mo> <mi>mean</mi> <mrow> <mo>(</mo> <msubsup> <mi>r</mi> <mi>k</mi> <mn>0</mn> </msubsup> <mo>[</mo> <mi>l</mi> <mo>]</mo> <mo>)</mo> </mrow> <mo>,</mo> <mi>mean</mi> <mrow> <mo>(</mo> <msubsup> <mi>r</mi> <mi>n</mi> <mn>0</mn> </msubsup> <mo>[</mo> <mi>l</mi> <mo>]</mo> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>,</mo> </mrow></math> where mean(r_k^0[l]) and mean(r_n^0[l]) are the mean values of the arrays r_k^0[l] and r_n^0[l], respectively, and α = 1 - β.
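A minimal Python sketch of the weight computation follows, assuming r^0[l] = |X(l)|², i.e. the per-slot reading of the autocorrelation at lag 0. The band selection factor itself appears only as an image in the original, so the sketch stops at α and β; the function name and the handling of two all-zero bands are hypothetical conventions.

```python
from __future__ import annotations

import numpy as np


def band_weights(x_high: np.ndarray, x_low: np.ndarray) -> tuple[float, float]:
    """Return (alpha, beta) with beta = min(means) / max(means) and alpha = 1 - beta."""
    mean_high = float(np.mean(np.abs(x_high) ** 2))  # mean(r_k^0[l]), r^0[l] = |X(l)|^2
    mean_low = float(np.mean(np.abs(x_low) ** 2))    # mean(r_n^0[l])
    largest = max(mean_high, mean_low)
    if largest == 0.0:
        beta = 0.0  # hypothetical convention for two silent bands; not specified in the text
    else:
        beta = min(mean_high, mean_low) / largest
    return 1.0 - beta, beta
```

β approaches 1 when the two bands carry similar mean energy and approaches 0 when one band is much weaker, and α always takes the complementary share.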
Step 1806, create the band selection table F_Table, which indicates the optimal band to be used for replication.
For each high-frequency subband k, the value of n that maximizes the band selection factor is selected. When the band replication extension strategy is selected, the reference bands are not limited to the low-frequency subbands but may include all bands below the band being examined. This generates the band selection table F_Table[k_e - k_0 + 1], which indicates the optimal band required for replication.
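Building the table then reduces to an argmax for each high subband. The sketch below assumes the band selection factor has already been evaluated into a matrix factor[k, n] (its exact formula is not reproduced here) and uses 0-based absolute subband indices. The function name build_band_table and the extension switch are hypothetical.

```python
from __future__ import annotations

import numpy as np


def build_band_table(factor: np.ndarray, k0: int, ke: int,
                     extension: bool = False) -> list[int]:
    """F_Table[i] = source band n maximizing factor[k, n] for high band k = k0 + i."""
    table = []
    for k in range(k0, ke + 1):
        # Candidate sources: low subbands 0 .. k0-1, or, under the extension
        # strategy, every band below the band k currently being examined.
        upper = k if extension else k0
        table.append(int(np.argmax(factor[k, :upper])))
    return table
```

The returned list has ke - k0 + 1 entries, one for each high-frequency subband, matching the size of F_Table.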
After the optimal band has been selected in step 1603 according to the chosen band replication strategy, that band may be used continuously; alternatively, time-varying characteristic detection may additionally be performed and the band reselected according to the detection result.
Time-varying characteristic detection is described in detail below.
Audio and speech signals generally keep the same characteristics over a period of time (quasi-stationarity), so the same high-frequency replication strategy may be used for several consecutive frames. As long as the replication strategy does not change, the same band selection table may be reused across consecutive frames by exploiting the temporal periodicity of audio signals, so optimal band selection need not be repeated for every frame. Reusing one band selection table over multiple frames saves computation and transmission bit rate and preserves continuity between frames. To decide whether the band selection table of the previous frame can be reused, a time-varying characteristic detection module is introduced. It judges whether the current frame can use the band selection table of the previous frame: if the difference in audio characteristics between adjacent frames exceeds a threshold, the band selection table is refreshed and the bands are reselected; otherwise, the band selection table remains unchanged.
The time-varying characteristic detection method estimates the change in audio characteristics between the low-frequency signals of the current frame and the previous frame; specifically, an envelope difference comparison can be used. If the envelope difference is small, then, given the correlation between high and low frequencies, the difference between the high-frequency signals is also small, and the band selection table generated for the previous frame can be reused. If the envelope difference between the current and previous low-frequency signals is within tolerance but there is a frequency offset greater than 5% of the critical band, the optimal band must be reselected and the band selection table refreshed. The reason is that, according to the theory of tonal dissonance, two sets of partials within a critical band whose frequency difference lies between 5% and 50% are perceived as dissonant, which produces an audible difference.
Please refer to fig. 19, which is a flowchart of the time-varying characteristic detection algorithm according to an embodiment of the present invention, including the following steps:
Step 1901, calculate the low-frequency subband energy mean square error E_error between the current frame and the previous frame:
<math><mrow> <msub> <mi>E</mi> <mi>error</mi> </msub> <mo>=</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>n</mi> <mo>=</mo> <mn>1</mn> </mrow> <mrow> <msub> <mi>k</mi> <mn>0</mn> </msub> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <msup> <mrow> <mo>(</mo> <mfrac> <mrow> <mi>E</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msup> <mi>E</mi> <mo>&prime;</mo> </msup> <mrow> <mo>(</mo> <mi>n</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>-</mo> <mfrac> <mrow> <mi>E</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>-</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> <mrow> <msup> <mi>E</mi> <mo>&prime;</mo> </msup> <mrow> <mo>(</mo> <mi>n</mi> <mo>-</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>)</mo> </mrow> <mn>2</mn> </msup> <mo>,</mo> </mrow></math> where E(n) is the energy of low-frequency subband n in the current frame and E′(n) is the energy of low-frequency subband n in the previous frame;
Step 1902, determine whether the low-frequency subband energy mean square error E_error is smaller than the threshold Thr; if so, go to step 1903; otherwise, go to step 1906;
Step 1903, estimate the frequency offset Δf of the low-frequency band in which the energy is concentrated:
select the frequency band with the highest energy and denote its center frequency f_c:
Figure GDA0000095502450000182
where f_l and f_h are the lower and upper bounds of the highest-energy band; the frequency offset is then Δf = f_c - f′_c, where f′_c is the corresponding center frequency in the previous frame;
Step 1904, determine whether the frequency offset Δf is less than 5% of the bandwidth of the current critical band; if so, go to step 1905; otherwise, go to step 1906;
Step 1905, reuse the band selection table generated for the previous frame;
Step 1906, reselect the optimal frequency band.
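Steps 1901 to 1906 can be sketched as a single predicate. This is a hypothetical sketch with the following assumptions: the center frequencies f_c and f′_c are computed elsewhere (their formula is an image in the original), |Δf| is compared so that offsets in either direction count, and Thr and the critical bandwidth are supplied by the caller. Because E_error compares the current-to-previous energy ratios of neighboring subbands, a uniform gain change across all low subbands yields E_error = 0.

```python
import numpy as np


def reuse_previous_table(E_cur, E_prev, fc_cur: float, fc_prev: float,
                         critical_bw: float, thr: float) -> bool:
    """True -> step 1905 (reuse previous table); False -> step 1906 (reselect bands)."""
    # Step 1901: E_error = sum_{n=1}^{k0-1} (E(n)/E'(n) - E(n-1)/E'(n-1))^2;
    # np.diff yields ratio[n] - ratio[n-1] for n = 1 .. k0-1.
    ratio = np.asarray(E_cur, dtype=float) / np.asarray(E_prev, dtype=float)
    e_error = float(np.sum(np.diff(ratio) ** 2))
    if e_error >= thr:                        # step 1902 fails -> step 1906
        return False
    delta_f = fc_cur - fc_prev                # step 1903
    return abs(delta_f) < 0.05 * critical_bw  # step 1904
```

For example, with unchanged low-band energies, a 10 Hz shift inside a 300 Hz critical band (limit 15 Hz) keeps the previous table, while a 20 Hz shift triggers reselection.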
Note that the fixed-bandwidth replication method differs from the adaptive-bandwidth replication method only in that the subband signals produced by the analysis filter module are not subjected to short-time characteristic analysis, and no band replication strategy is selected on the basis of such analysis; the optimal band selection procedure and the time-varying characteristic detection procedure are the same.
Step 1604, perform band replication according to the optimal band.
After the decoder obtains the optimal band, it can replicate bands accordingly. Please refer to fig. 20, which is a diagram of the decoder performing high-frequency reconstruction according to parameter information from the encoder, according to an embodiment of the present invention.
Compared with a prior-art decoder, the decoder in this embodiment leaves the functions and interactions of most modules unchanged; only the high-frequency subband replication strategy of the high-frequency generator module is modified. Three parameters carrying the high-frequency reconstruction guidance information are added to the SBR bitstream input to the high-frequency generator: a "new algorithm use flag", a "band selection table replacement flag", and a "band selection table".
The "band selection table" is the key parameter: for each frame, it records which low-frequency subband is copied to restore each high-frequency subband.
The "new algorithm use flag" determines whether, during decoding, high-frequency reconstruction is guided by the new algorithm from the encoder or by the standard SBR method. Here, the new algorithm refers to the algorithm with which the encoder in the embodiment described above determines the high-frequency reconstruction parameters. If the flag bit is "1", the high frequency is reconstructed according to the new algorithm; if it is "0", it is reconstructed according to the standard SBR method. This design leaves an interface for compatibility between the new algorithm and the standard SBR method in future work.
The "band selection table replacement flag" determines how the current signal obtains the band selection table when restoring the high-frequency subbands. If the flag bit is "0", the current signal directly uses the high/low-frequency subband correspondence of the previous frame to guide high-frequency replication; if it is "1", high-frequency replication is performed according to the refreshed "band selection table" parameter read from the SBR bitstream. The main purpose of this flag is to reduce the amount of data transmitted to the decoder: when the "band selection table replacement flag" is 0, the "band selection table" parameter is not included in the transmitted SBR bitstream.
The bitstream information received by the decoder is described in detail below.
Table 1 describes the header file data structure used for initialization when the decoder starts operating.
Figure GDA0000095502450000191
Figure GDA0000095502450000201
Table 1 HeaderFile data structure
ENV_DATA is the data structure that describes the SBR information of each frame. The "new algorithm use flag", "band selection table replacement flag", and "band selection table" parameters are all added to the ENV_DATA structure.
A dedicated structure variable is defined to store these three parameters, as follows:
Figure GDA0000095502450000211
Since the "new algorithm use flag" and the "band selection table replacement flag" each take only the value "0" or "1", two char variables, flag_1 and flag_2, are used to hold them, respectively.
The "band selection table" stores, for each high-frequency subband to be restored, the index of its corresponding low-frequency subband, in the array FreTable[28]. The number of high-frequency subbands to be restored differs between coding modes: the highest-bit-rate coding mode restores 28 high-frequency subbands, and the number decreases as the coding bit rate decreases.
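For illustration, a Python analogue of the three-parameter structure is sketched below. The original C structure appears only as an image, so the class name HfReconParams and the validation rules are hypothetical. The field names follow flag_1, flag_2, and FreTable[28] from the text, and the last check encodes the rule that no table is transmitted when the replacement flag is 0.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_HF_SUBBANDS = 28  # FreTable[28]: the highest-bit-rate mode restores 28 high subbands


@dataclass
class HfReconParams:  # hypothetical name
    flag_1: int = 0  # "new algorithm use flag": 1 = new algorithm, 0 = standard SBR
    flag_2: int = 0  # "band selection table replacement flag": 1 = refreshed table sent
    FreTable: list[int] = field(default_factory=list)  # source subband per high subband

    def __post_init__(self) -> None:
        if self.flag_1 not in (0, 1) or self.flag_2 not in (0, 1):
            raise ValueError("flags must be 0 or 1")
        if len(self.FreTable) > MAX_HF_SUBBANDS:
            raise ValueError("at most 28 high-frequency subbands can be restored")
        if self.flag_2 == 0 and self.FreTable:
            raise ValueError("no band selection table is transmitted when flag_2 == 0")
```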
The position of the structure variable index Vector within the ENV_DATA structure is shown in Table 2.
Figure GDA0000095502450000212
Figure GDA0000095502450000221
Table 2 ENV_DATA structure definition
The algorithm flow of the high-frequency generator is described below. Please refer to fig. 21, which is a flowchart of the algorithm flow of the high-frequency generator at the decoder according to an embodiment of the present invention, including the following steps:
Step 2101, receive the "new algorithm use flag", the "band selection table replacement flag", and the "band selection table";
Step 2102, check the "new algorithm use flag": if it is 0, go to step 2103; if it is 1, go to step 2104;
Step 2103, decode according to the standard SBR method;
Step 2104, check the "band selection table replacement flag": if it is 0, go to step 2105; if it is 1, go to step 2106;
Step 2105, since the flag bit is "0", the current signal directly uses the high/low-frequency subband correspondence of the previous frame to guide high-frequency replication;
Note that after the high/low-frequency subband correspondence has been determined for each frame, the band selection table of the current frame is backed up in a buffer, from which the next frame can retrieve it if needed.
Step 2106, since the flag bit is "1", high-frequency replication is guided by the "band selection table" parameter read from the SBR bitstream;
Step 2107, complete the preliminary high-frequency replication.
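The decoder-side logic of steps 2101 to 2107, including the buffer that backs up each frame's table, can be sketched as follows. This is a hypothetical sketch: the class name is invented, standard SBR decoding is represented only by a None return, and table entries are assumed to be absolute subband indices. Under the extension strategy, a source may therefore be a high subband copied earlier in the same frame.

```python
from __future__ import annotations

import numpy as np


class HighFrequencyGeneratorSketch:
    """Hypothetical sketch of steps 2101-2107 (preliminary copy only)."""

    def __init__(self) -> None:
        self.cached_table: list[int] | None = None  # buffer with the previous frame's table

    def select_table(self, flag_1: int, flag_2: int, fre_table: list[int] | None):
        if flag_1 == 0:
            return None  # step 2103: standard SBR decoding (not modeled here)
        if flag_2 == 0:
            # Step 2105: reuse the previous frame's table from the buffer.
            if self.cached_table is None:
                raise ValueError("no previous band selection table in the buffer")
            table = self.cached_table
        else:
            table = list(fre_table)  # step 2106: refreshed table read from the bitstream
        self.cached_table = table    # back up the current frame's table
        return table

    @staticmethod
    def preliminary_copy(low_subbands, table):
        """Step 2107: high subband i is a copy of band table[i] (absolute index)."""
        bands = [np.asarray(b).copy() for b in low_subbands]
        for src in table:
            bands.append(bands[src].copy())  # sources may include bands copied earlier
        return bands[len(low_subbands):]
```

Envelope adjustment and harmonic addition, which follow the preliminary copy, are outside the scope of this sketch.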
After preliminary replication, the high-frequency subbands are processed by the envelope adjustment module, the harmonic component addition module, and other modules, which completes the high-frequency replication.
The method described above reconstructs the high-frequency signal accurately. Please refer to fig. 22, which is an energy waveform diagram of an original audio signal as recovered by the method of the embodiment of the present invention; fig. 23 is a three-dimensional energy diagram of the same recovery. Comparing these two figures with the prior-art figure shows that the high-frequency reconstruction of the embodiment of the present invention is better than that of the prior art. The method can therefore reconstruct the high-frequency signal more accurately from the information of a few low-frequency subbands, which benefits audio compression: it can substantially improve the compression efficiency of audio and speech coders while also improving audio quality, and it effectively reduces the distortion and noise introduced by low-bit-rate compression coding of audio and speech signals. Moreover, because several band replication strategies are provided for different audio characteristics, the method offers adaptive high-frequency reconstruction for various audio and speech signals and increases the flexibility of audio and speech signal processing.
The high-frequency reconstruction method according to an embodiment of the present invention has been described in detail above. Correspondingly, embodiments of the present invention also provide an encoding module and a decoding module.
Please refer to fig. 24, which is a schematic structural diagram of an encoding module according to an embodiment of the present invention.
The encoding module includes an analysis filter module 241 and a band selection module 242.
The analysis filter module 241 is configured to filter the audio or speech signal to obtain low-frequency subbands and high-frequency subbands.
The band selection module 242 is configured to determine a band replication strategy, obtain the correlation between the low-frequency and high-frequency subbands according to the determined strategy, select a highly correlated low-frequency subband as the optimal replication band for each high-frequency subband, and output high-frequency reconstruction parameter information including the correspondence of the selected bands.
The encoding module further includes a short-time characteristic analysis module 243, configured to analyze the short-time characteristics of the audio or speech signal.
The band selection module 242 includes a copy strategy selection module 2421 and an optimal band selection module 2422.
The copy strategy selection module 2421 is configured to select a band replication strategy according to the analysis result of the short-time characteristic analysis module 243.
The optimal band selection module 2422 is configured to obtain the correlation between the low-frequency and high-frequency subbands according to the determined band replication strategy, select a highly correlated low-frequency subband as the optimal replication band for each high-frequency subband, and output high-frequency reconstruction parameter information including the correspondence of the selected bands.
The encoding module further comprises: a time-varying characteristic detection module 244, configured to perform time-varying characteristic detection on the filtered audio or voice signal; accordingly, the optimal band selection module 2422 also selects an optimal copy band according to the result of the detection by the time-varying characteristic detection module 244.
The short-time characteristic analysis module 243 performs short-time characteristic analysis on the audio or speech signal as follows: it calculates the mean energy of the low-frequency part, the mean energy of the high-frequency part, the energy of each low-frequency subband, and the energy of each high-frequency subband. The copy strategy selection module 2421 then selects a band replication strategy according to this analysis result, specifically: the energy of each low-frequency subband is compared with a first weighted value of the low-frequency mean energy. If the energies of some low-frequency subbands are less than or equal to the first weighted value, the selected strategy is to select the low-frequency subbands in which the energy is concentrated and copy them into the selected high-frequency bands that are highly correlated with them. If the energies of all low-frequency subbands are greater than the first weighted value, the high-frequency mean energy is further compared with a second weighted value of the low-frequency mean energy. If the high-frequency mean energy is less than or equal to the second weighted value, the selected strategy is to select the whole low-frequency band and copy it into the selected high-frequency bands that are highly correlated with it. If the high-frequency mean energy is greater than the second weighted value, the selected strategy is to divide the high-frequency range into a plurality of copy bands and, for each copy band, select a highly correlated low-frequency subband to copy.
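The strategy decision made by the copy strategy selection module 2421 can be sketched as follows. The first and second weighted values are modeled as w1 and w2 times the low-frequency mean energy, with w1 = w2 = 0.5 as hypothetical defaults; the text does not give the weights. The enum and function names are also hypothetical.

```python
from enum import Enum

import numpy as np


class CopyStrategy(Enum):
    CONCENTRATED_LOW = 1  # copy the energy-concentrated low subbands
    WHOLE_LOW = 2         # copy the whole low band
    PER_COPY_BAND = 3     # split the high range into copy bands, pick a source for each


def select_copy_strategy(low_energies, high_energies,
                         w1: float = 0.5, w2: float = 0.5) -> CopyStrategy:
    low = np.asarray(low_energies, dtype=float)
    high = np.asarray(high_energies, dtype=float)
    mean_low, mean_high = low.mean(), high.mean()
    # Some low subbands at or below the first weighted value -> energy is concentrated.
    if np.any(low <= w1 * mean_low):
        return CopyStrategy.CONCENTRATED_LOW
    # All low subbands strong, and the high part weak relative to the second weighted value.
    if mean_high <= w2 * mean_low:
        return CopyStrategy.WHOLE_LOW
    return CopyStrategy.PER_COPY_BAND
```

For instance, low-subband energies [4, 4, 4, 4] with high-subband energies [1, 1] select WHOLE_LOW, while raising the high-subband energies to [3, 3] selects PER_COPY_BAND.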
The strategies that the copy strategy selection module 2421 can select further include an extension of each of the above strategies: during copying, the high-frequency subbands adjacent to the low-frequency band are also used as copy sources together with the selected low-frequency subbands, and a highly correlated subband is selected from these sources for copying.
Please refer to fig. 25, which is a schematic structural diagram of an encoding module according to an embodiment of the present invention.
The encoding module includes an analysis filter module 241 and a band selection module 242.
The analysis filter module 241 is configured to receive an audio or speech signal and filter it to obtain low-frequency subbands and high-frequency subbands.
The band selection module 242 is configured to determine a band replication strategy, obtain the correlation between the low-frequency and high-frequency subbands according to the determined strategy, select a highly correlated low-frequency subband as the optimal band for each high-frequency subband, and output high-frequency reconstruction parameter information including the correspondence of the selected bands.
The band selection module 242 includes a copy strategy setting module 2423 and an optimal band selection module 2422.
The copy strategy setting module 2423 is configured to determine a single band replication strategy according to preset parameters; this strategy is one of the strategies described with reference to fig. 24 or its corresponding extension strategy.
The optimal band selection module 2422 is configured to obtain the correlation between the low-frequency and high-frequency subbands according to the determined band replication strategy, select a highly correlated low-frequency subband as the optimal band for each high-frequency subband, and output high-frequency reconstruction parameter information including the correspondence of the selected bands.
The encoding module further comprises: a time-varying characteristic detection module 244, configured to perform time-varying characteristic detection on the filtered audio or voice signal; correspondingly, the optimal frequency band selection module also selects the optimal copy frequency band according to the result detected by the time-varying characteristic detection module.
Please refer to fig. 26, which is a block diagram illustrating a decoding module according to an embodiment of the present invention.
The decoding module has a high-frequency generator module 261, which comprises a receiving unit 2611 and a reconstruction unit 2612.
The receiving unit 2611 is configured to receive high-frequency reconstruction parameter information including the correspondence of the selected bands, namely the correspondence between each high-frequency subband and the low-frequency subband highly correlated with it.
The reconstruction unit 2612 is configured to copy low-frequency subbands into the high-frequency band as high-frequency subbands according to the high-frequency reconstruction parameter information including the correspondence of the selected bands.
The parameter information received by the receiving unit 2611 further includes a new algorithm use flag and a band selection table replacement flag; the reconstructing unit 2612 determines an algorithm used in the copy process according to the new algorithm use flag, determines a band selection table used in the copy process according to the band selection table replacement flag, and copies the low-frequency subband in the correspondence relationship as a high-frequency subband in a high-frequency band according to the determined algorithm and the band selection table.
In summary, the embodiments of the present invention fully consider the correlation between low-frequency and high-frequency subbands: the correlation is obtained according to the determined band replication strategy, a highly correlated low-frequency subband is selected as the optimal replication band for each high-frequency subband, and high-frequency reconstruction parameter information including the correspondence of the selected bands is output, so that high-frequency reconstruction can be performed according to this information. The energy waveform diagrams and three-dimensional energy diagrams of the subband signals obtained after high-frequency reconstruction show that the high-frequency part reconstructed by this method is closer to the original audio signal than with the prior art, so the method reconstructs the high frequencies more accurately.
Furthermore, the scheme of the embodiment of the invention can comprise an adaptive frequency band replication mode and a fixed frequency band replication mode, and has a flexible frequency band selection function.
Furthermore, the technical solution of the embodiments of the present invention can also detect the time-varying characteristics of the audio or speech signal and make adjustments according to the detection result.
The high frequency reconstruction method, the encoding module and the decoding module provided by the embodiments of the present invention are described in detail above, and persons skilled in the art may change the embodiments and the application scope according to the idea of the embodiments of the present invention.

Claims (8)

1. A high frequency reconstruction method, comprising:
filtering the audio signal to obtain a low-frequency sub-band and a high-frequency sub-band;
analyzing the short-time characteristics of the audio signal, and correspondingly selecting different frequency band replication strategies according to the result of analyzing the short-time characteristics;
obtaining the correlation between the low-frequency sub-band and the high-frequency sub-band according to the determined frequency band replication strategy, selecting the low-frequency sub-band with high correlation as an optimal replication frequency band for the high-frequency sub-band, and outputting high-frequency reconstruction parameter information comprising the corresponding relation of the selected frequency band;
the analyzing of the short-time characteristics of the audio signal and the corresponding selecting of different frequency band replication strategies according to the result of the short-time characteristic analysis specifically comprise: calculating the energy mean value of the low-frequency part, the energy mean value of the high-frequency part, the energy value of each subband of the low-frequency part, and the energy value of each subband of the high-frequency part of the audio signal; comparing the energy value of each subband of the low-frequency part with a first weighted value of the energy mean value of the low-frequency part; if the energy values of some subbands of the low-frequency part are less than or equal to the first weighted value, selecting the strategy of selecting the energy-concentrated low-frequency sub-bands and copying them into selected high-frequency bands highly correlated with them; if the energy values of all subbands of the low-frequency part are greater than the first weighted value, further comparing the energy mean value of the high-frequency part with a second weighted value of the energy mean value of the low-frequency part; if the energy mean value of the high-frequency part is less than or equal to the second weighted value, selecting the strategy of selecting the whole low-frequency band and copying it into selected high-frequency bands highly correlated with it; and if the energy mean value of the high-frequency part is greater than the second weighted value, selecting the strategy of dividing the high frequencies into a plurality of copy frequency bands and selecting, for each copy frequency band, a highly correlated low-frequency sub-band to be copied.
2. The high-frequency reconstruction method according to claim 1, characterized in that:
after the step of filtering the audio signal, the method further comprises:
detecting the time-varying characteristics of the filtered audio signal; and, correspondingly,
further selecting an optimal copy frequency band according to the result of the time-varying characteristic detection.
3. The high-frequency reconstruction method according to claim 1, characterized in that:
the selected policy further comprises:
during copying, further taking the high-frequency sub-bands adjacent to the low-frequency sub-bands, together with the selected low-frequency sub-bands, as copy sources, and selecting from these sources, for copying, the sub-band having high correlation with the high-frequency sub-band.
4. The high-frequency reconstruction method according to claim 2, characterized in that:
the time-varying characteristic detection of the filtered audio signal, and further selecting an optimal copy frequency band in combination with the result of the time-varying characteristic detection specifically include:
calculating the energy mean square error of the low-frequency sub-bands between the current frame and the previous frame; if the energy mean square error is less than a judgment threshold and the offset frequency of the energy-concentrated low-frequency sub-band is greater than a preset value, re-selecting the optimal copy frequency band; otherwise, continuing to use the previously selected optimal copy frequency band.
5. The high-frequency reconstruction method according to claim 1, characterized in that:
the obtaining of the correlation between the low-frequency subband and the high-frequency subband according to the determined band replication strategy, and selecting the low-frequency subband with high correlation as the optimal replication band for the high-frequency subband specifically include:
calculating correlation function values of the high-frequency sub-band and the low-frequency sub-band;
calculating an envelope difference value between the high-frequency sub-band and the low-frequency sub-band according to the autocorrelation function values of the high-frequency sub-band and the low-frequency sub-band;
and selecting a low-frequency sub-band with high correlation as an optimal copy frequency band for the high-frequency sub-band according to the correlation function value and the envelope difference value.
6. An encoding apparatus comprising an analysis filter module, a short-time characteristic analysis module, and a band selection module;
the analysis filter module is used for filtering the audio signal to obtain a low-frequency sub-band and a high-frequency sub-band;
the short-time characteristic analysis module is used for carrying out short-time characteristic analysis on the audio signal;
the frequency band selection module comprises a copy strategy selection module and an optimal frequency band selection module;
the copy strategy selection module is used for correspondingly selecting different frequency band copy strategies according to the analysis result of the short-time characteristic analysis module;
the optimal frequency band selection module is used for acquiring the correlation between the low-frequency sub-band and the high-frequency sub-band according to the determined frequency band replication strategy, selecting the low-frequency sub-band with high correlation as an optimal replication frequency band for the high-frequency sub-band, and outputting high-frequency reconstruction parameter information comprising the corresponding relation of the selected frequency band;
the short-time characteristic analysis module specifically performs short-time characteristic analysis on the audio signal as follows: calculating the energy mean value of the low-frequency part, the energy mean value of the high-frequency part, the energy values of each subband of the low-frequency part and the energy values of each subband of the high-frequency part of the audio signal;
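The four quantities computed by the short-time characteristic analysis module can be sketched as follows. The claim does not say how sub-band energy or the energy mean of each part is defined. This sketch therefore assumes that a sub-band's energy is the sum of its squared coefficients, and that each part's mean is taken over its sub-bands within one frame. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ShortTimeAnalysis:
    low_mean: float             # energy mean value of the low-frequency part
    high_mean: float            # energy mean value of the high-frequency part
    low_energies: List[float]   # energy value of each low-frequency sub-band
    high_energies: List[float]  # energy value of each high-frequency sub-band


def subband_energy(band: Sequence[float]) -> float:
    """Energy of one sub-band as the sum of squared coefficients (assumed definition)."""
    return sum(x * x for x in band)


def short_time_analysis(low_bands: Sequence[Sequence[float]],
                        high_bands: Sequence[Sequence[float]]) -> ShortTimeAnalysis:
    """Compute the four quantities listed in the claim for one frame."""
    low_e = [subband_energy(b) for b in low_bands]
    high_e = [subband_energy(b) for b in high_bands]
    # Assumption: a part's "energy mean value" is the mean of its sub-band energies.
    return ShortTimeAnalysis(
        low_mean=sum(low_e) / len(low_e),
        high_mean=sum(high_e) / len(high_e),
        low_energies=low_e,
        high_energies=high_e,
    )
```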
the copy strategy selection module selects the band replication strategy according to the analysis result of the short-time characteristic analysis module, specifically as follows: the energy value of each sub-band of the low-frequency part is compared with a first weighted value of the energy mean value of the low-frequency part; if the energy values of some of the low-frequency sub-bands are less than or equal to the first weighted value of the low-frequency energy mean value, the selected strategy is: select the low-frequency sub-bands in the energy-concentrated region, and copy them into selected high-frequency bands that have high correlation with those sub-bands; if the energy values of all low-frequency sub-bands are greater than the first weighted value of the low-frequency energy mean value, the energy mean value of the high-frequency part is further compared with a second weighted value of the low-frequency energy mean value; if the high-frequency energy mean value is less than or equal to the second weighted value, the selected strategy is: select the whole low-frequency band, and copy it into selected high-frequency bands that have high correlation with the whole low-frequency band; if the high-frequency energy mean value is greater than the second weighted value, the selected strategy is: divide the high-frequency part into a plurality of copy bands, and for each copy band select a low-frequency sub-band with high correlation to be copied.
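The three-way decision above can be sketched as a small decision function. The claim does not give values for the first and second weights. The `w1` and `w2` defaults of 0.5 are placeholders, and the enum names are hypothetical.

```python
from enum import Enum
from typing import Sequence


class CopyStrategy(Enum):
    # Some low sub-bands are weak: copy only from the energy-concentrated low sub-bands.
    ENERGY_CONCENTRATED = 1
    # High band is weak relative to the low band: copy from the whole low band.
    WHOLE_LOW_BAND = 2
    # Otherwise: split the high band into copy bands, pick a matching low sub-band for each.
    PER_COPY_BAND = 3


def select_copy_strategy(low_energies: Sequence[float],
                         low_mean: float,
                         high_mean: float,
                         w1: float = 0.5,
                         w2: float = 0.5) -> CopyStrategy:
    """Decision tree described in the claim; w1 and w2 are hypothetical weights."""
    first_weighted = w1 * low_mean
    second_weighted = w2 * low_mean
    # Any low sub-band at or below the first weighted value -> energy-concentrated strategy.
    if any(e <= first_weighted for e in low_energies):
        return CopyStrategy.ENERGY_CONCENTRATED
    # All low sub-bands are above it: compare the high-band mean with the second weighted value.
    if high_mean <= second_weighted:
        return CopyStrategy.WHOLE_LOW_BAND
    return CopyStrategy.PER_COPY_BAND
```

For example, with `low_energies = [4, 4, 4]` and `low_mean = 4.0`, a high-band mean of 1.0 selects `WHOLE_LOW_BAND`, while 3.0 selects `PER_COPY_BAND`.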
7. The encoding apparatus according to claim 6, wherein the encoding apparatus further comprises:
a time-varying characteristic detection module, used for detecting the time-varying characteristic of the filtered audio signal; correspondingly, the optimal frequency band selection module also selects the optimal replication band according to the detection result of the time-varying characteristic detection module.
8. The encoding apparatus according to claim 6, wherein
the strategy selected by the copy strategy selection module further comprises: during copying, also taking the high-frequency sub-band adjacent to the low-frequency band, together with the selected low-frequency sub-bands, as a copy source, and selecting, from the adjacent high-frequency sub-band and the low-frequency sub-bands, the sub-band having high correlation with the high-frequency sub-band to be copied.
CN 200710305087 2007-12-27 2007-12-27 High-frequency reconstruction method, encoding device and decoding module Active CN101471072B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN 200710305087 CN101471072B (en) 2007-12-27 2007-12-27 High-frequency reconstruction method, encoding device and decoding module
PCT/CN2008/073728 WO2009089728A1 (en) 2007-12-27 2008-12-25 Method for high frequency band replication, coder and decoder thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200710305087 CN101471072B (en) 2007-12-27 2007-12-27 High-frequency reconstruction method, encoding device and decoding module

Publications (2)

Publication Number Publication Date
CN101471072A (en) 2009-07-01
CN101471072B (en) 2012-01-25

Family

ID=40828487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200710305087 Active CN101471072B (en) 2007-12-27 2007-12-27 High-frequency reconstruction method, encoding device and decoding module

Country Status (2)

Country Link
CN (1) CN101471072B (en)
WO (1) WO2009089728A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12094480B2 (en) 2017-03-23 2024-09-17 Dolby International Ab Backward-compatible integration of harmonic transposer for high frequency reconstruction of audio signals

Families Citing this family (15)

Publication number Priority date Publication date Assignee Title
CA2792449C (en) 2010-03-09 2017-12-05 Dolby International Ab Device and method for improved magnitude response and temporal alignment in a phase vocoder based bandwidth extension method for audio signals
CA2792452C (en) 2010-03-09 2018-01-16 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for processing an input audio signal using cascaded filterbanks
RU2591012C2 (en) 2010-03-09 2016-07-10 Фраунхофер-Гезелльшафт цур Фёрдерунг дер ангевандтен Форшунг Е.Ф. Apparatus and method for handling transient sound events in audio signals when changing replay speed or pitch
JP5714180B2 (en) 2011-05-19 2015-05-07 ドルビー ラボラトリーズ ライセンシング コーポレイション Detecting parametric audio coding schemes
JP6010539B2 (en) * 2011-09-09 2016-10-19 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America Encoding device, decoding device, encoding method, and decoding method
CN107993673B (en) * 2012-02-23 2022-09-27 杜比国际公司 Method, system, encoder, decoder and medium for determining a noise mixing factor
US9489959B2 (en) * 2013-06-11 2016-11-08 Panasonic Intellectual Property Corporation Of America Device and method for bandwidth extension for audio signals
CN105513601A (en) * 2016-01-27 2016-04-20 武汉大学 Method and device for frequency band reproduction in audio coding bandwidth extension
CN107221334B (en) * 2016-11-01 2020-12-29 武汉大学深圳研究院 Audio bandwidth extension method and extension device
CN106507113B (en) * 2016-11-28 2019-03-29 河海大学 Three-description lattice vector quantization prediction side coding/decoding method
CN108489596B (en) * 2018-03-20 2020-04-21 南京凯奥思数据技术有限公司 Continuous scanning laser quick vibration measuring method and system thereof
CN108682413B (en) * 2018-04-24 2020-09-29 上海师范大学 Emotion persuasion system based on voice conversion
CN109036457B (en) * 2018-09-10 2021-10-08 广州酷狗计算机科技有限公司 Method and apparatus for restoring audio signal
CN110556122B (en) * 2019-09-18 2024-01-19 腾讯科技(深圳)有限公司 Band expansion method, device, electronic equipment and computer readable storage medium
CN113299313B (en) * 2021-01-28 2024-03-26 维沃移动通信有限公司 Audio processing method and device and electronic equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1465137A (en) * 2001-07-13 2003-12-31 松下电器产业株式会社 Audio signal decoding device and audio signal encoding device
CN1496559A (en) * 2001-01-12 2004-05-12 艾利森电话股份有限公司 Speech bandwidth extension
CN1527995A (en) * 2001-11-14 2004-09-08 松下电器产业株式会社 Encoding device and decoding device
CN1784020A (en) * 2004-12-01 2006-06-07 三星电子株式会社 Apparatus, method,and medium for processing audio signal using correlation between bands

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US6978236B1 (en) * 1999-10-01 2005-12-20 Coding Technologies Ab Efficient spectral envelope coding using variable time/frequency resolution and time/frequency switching

Also Published As

Publication number Publication date
CN101471072A (en) 2009-07-01
WO2009089728A1 (en) 2009-07-23

Similar Documents

Publication Publication Date Title
CN101471072B (en) High-frequency reconstruction method, encoding device and decoding module
US10373623B2 (en) Apparatus and method for processing an audio signal to obtain a processed audio signal using a target time-domain envelope
CN1766993B (en) Enhancing perceptual performance of high frequency reconstruction coding methods by adaptive filtering
CN107945811B (en) Generative adversarial network training method for frequency band extension, and audio encoding and decoding method
RU2676416C2 (en) Audio processor and method for processing audio signal using horizontal phase correction
CN101046964B (en) Error concealment frame reconstruction method based on lapped transform compression coding
CN101297356B (en) Audio compression
US6708145B1 (en) Enhancing perceptual performance of sbr and related hfr coding methods by adaptive noise-floor addition and noise substitution limiting
KR100707174B1 (en) High band Speech coding and decoding apparatus in the wide-band speech coding/decoding system, and method thereof
EP2056294B1 (en) Apparatus, Medium and Method to Encode and Decode High Frequency Signal
Ravelli et al. Union of MDCT bases for audio coding
CN105518777A (en) Apparatus and method for encoding or decoding an audio signal with intelligent gap filling in the spectral domain
JP2009524100A (en) Encoding / decoding apparatus and method
KR20140023389A (en) Forensic detection of parametric audio coding schemes
CN104718571A (en) Method and apparatus for concealing frame error and method and apparatus for audio decoding
KR101035104B1 (en) Processing of multi-channel signals
CN109247069B (en) Encoding for reconstructing phase information by using structure tensor on audio spectrogram
CN101436407B (en) Method for encoding and decoding audio
JP2014531623A (en) Audio signal encoding method, audio signal decoding method, and apparatus using the same
CN103155035B (en) Audio signal bandwidth extension in CELP-based speech coder
CN101604524A (en) Stereo encoding method and device thereof, stereo decoding method and device thereof
RU2409874C9 (en) Audio signal compression
RU2414009C2 (en) Signal encoding and decoding device and method
Radfar et al. Performance evaluation of three features for model-based single channel speech separation problem.
JP3230782B2 (en) Wideband audio signal restoration method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant