CN109448745B

CN109448745B - Coding mode switching method and device and decoding mode switching method and device

Info

Publication number: CN109448745B
Application number: CN201811418613.1A
Authority: CN
Inventors: Huang Dongmei (黄冬梅); Guo Yiqin (郭轶芹); Yuan Hao (袁浩)
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2013-01-07
Filing date: 2013-01-07
Publication date: 2021-09-07
Anticipated expiration: 2033-01-07
Also published as: CN109448745A; CN103915100B; CN103915100A

Abstract

The embodiment of the invention provides a coding mode switching method. When the ith frame is in the MDCT (modified discrete cosine transform) coding mode and the (i+1)th frame is in the ACELP (algebraic code-excited linear prediction) coding mode, the ith frame is MDCT-encoded with a predefined window type to obtain MDCT coding information. The coding information of the ith frame, or of the ith frame and the frames before it, is decoded to obtain a decoded signal. The history states of the filters required in the ACELP coding mode are established and updated from the decoded signal, part of the input signal of the ith frame, and part of the input signal of the (i+1)th frame, after which ACELP coding is performed on one frame length of the subsequent input signal. The embodiment of the invention also provides a device for switching the coding mode from MDCT to ACELP, and a method and a device for switching the decoding mode from MDCT to ACELP. The embodiment of the invention further provides a method and a device for switching the coding mode from ACELP to MDCT, and a corresponding method and device for switching the decoding mode.

Description

Coding mode switching method and device and decoding mode switching method and device

This application is a divisional of patent application 201310005140.3, whose filing date is January 7, 2013.

Technical Field

The present invention relates to the field of audio encoding and decoding, and in particular, to a method and an apparatus for switching encoding modes and a method and an apparatus for switching decoding modes.

Background

Coding techniques for audio signals can be divided into two broad categories, time-domain coding and frequency-domain coding.

In the prior art, known frequency-domain coding schemes include MP3 (Moving Picture Experts Group Audio Layer III), AAC (Advanced Audio Coding), and the like. These schemes apply a time-to-frequency transform and then quantize and encode the frequency-domain coefficients. In the quantization stage, a psychoacoustic model controls the quantization error; in the encoding stage, the quantized spectral information and the corresponding side information are entropy-coded using code tables.

Known time-domain coding schemes include AMR-WB (Adaptive Multi-Rate Wideband). Such speech coding schemes are based on linear prediction (LP) filtering of the time-domain signal. The LP filter is obtained by linear prediction analysis of the input time-domain signal, and the resulting LP filter coefficients are then encoded and transmitted; this is called linear predictive coding (LPC). After the input signal is filtered with the LP filter, the result is encoded using ACELP (Algebraic Code-Excited Linear Prediction).
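The LP analysis step can be illustrated with the autocorrelation method and the Levinson-Durbin recursion. The following is a minimal sketch under simplifying assumptions (no analysis window, no lag window, no bandwidth expansion, plain floating point); it is not the AMR-WB procedure, and the function names are hypothetical. It computes A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and the residual obtained by filtering the input through A(z).

```python
def autocorrelation(x, order):
    """Autocorrelation r[0..order] of a frame x (no windowing, no lag window)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (a, err): a[0] == 1.0, err is the final prediction-error power.
    """
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # error power shrinks at each stage
    return a, err


def lp_residual(x, a):
    """Filter x through the analysis filter A(z); samples before x[0] count as zero."""
    p = len(a) - 1
    return [x[n] + sum(a[j] * x[n - j] for j in range(1, p + 1) if n - j >= 0)
            for n in range(len(x))]
```

For the AR(1) autocorrelation [1, 0.9, 0.81], the recursion returns A(z) close to 1 - 0.9 z^-1 with a zero second coefficient, and filtering the sequence 0.9^n through it leaves only the initial impulse.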

Frequency-domain coding schemes often use the MDCT (Modified Discrete Cosine Transform). The input signal is first transformed by the MDCT to obtain an MDCT spectrum. The spectrum is then quantized and encoded under the constraint of the total bit rate according to a psychoacoustic model or other methods, and transmitted to the decoder for decoding. The MDCT can be decomposed into windowing, fold-and-add, and a type-IV DCT (discrete cosine transform). The fold of the windowed signal consists of two parts: an odd fold and an even fold. The MDCT is widely used in modern audio encoders and performs well at high bit rates.
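The decomposition of the MDCT into windowing, folding, and a type-IV DCT can be checked numerically. The sketch below assumes a sine window and the common MDCT definition X[k] = sum over n of x[n] cos(pi/N (n + 1/2 + N/2)(k + 1/2)). With the windowed block split into quarters (a, b, c, d), the fold is (-c_R - d, a - b_R), where _R denotes time reversal. Direct O(N^2) sums are used for clarity; a practical coder would use an FFT-based DCT-IV. The function names are illustrative only.

```python
import math


def sine_window(length):
    """w[n] = sin(pi (n + 1/2) / length); satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct_direct(x):
    """Textbook MDCT: 2N (already windowed) inputs -> N coefficients."""
    N = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


def fold(x):
    """Fold 2N windowed samples (a, b, c, d) into N samples (-c_R - d, a - b_R)."""
    q = len(x) // 4
    a, b, c, d = x[:q], x[q:2 * q], x[2 * q:3 * q], x[3 * q:]
    first = [-c[q - 1 - i] - d[i] for i in range(q)]   # even fold of the right half
    second = [a[i] - b[q - 1 - i] for i in range(q)]   # odd fold of the left half
    return first + second


def dct_iv(u):
    """Direct type-IV DCT of N samples."""
    N = len(u)
    return [sum(u[n] * math.cos(math.pi / N * (n + 0.5) * (k + 0.5)) for n in range(N))
            for k in range(N)]


def mdct_via_fold(frame, window):
    """Windowing, then folding, then DCT-IV (frame length must be a multiple of 4)."""
    windowed = [s * w for s, w in zip(frame, window)]
    return dct_iv(fold(windowed))
```

Comparing `mdct_via_fold(frame, w)` against `mdct_direct` on the same windowed frame gives identical coefficients up to rounding error.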

Time-domain coding schemes often use ACELP. ACELP first performs LP analysis on the input signal to obtain the LP filter coefficients, and then LP-filters the input signal to obtain a prediction residual. Correlation analysis between the current prediction residual and the excitation signal of the previous frame yields the pitch lag and the pitch gain. The past excitation, delayed by one pitch lag and scaled by the pitch gain, is subtracted from the current residual to obtain a new residual; this step is called long-term prediction (LTP). With the new residual as the target, the best-matching codevector is searched in a given algebraic codebook to obtain its codebook index, and the corresponding codebook gain is computed. Finally, the LP filter coefficients, pitch lag, pitch gain, codebook index, and codebook gain are quantized, encoded, and transmitted to the decoder. ACELP is widely used in speech coders and codes speech signals well.
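The long-term prediction step can be sketched as an integer-lag search that maximizes the normalized correlation between the current residual (the target) and the delayed past excitation. This is a simplified illustration that rests on three assumptions: lags are integers only, every lag is at least one subframe long so the delayed segment lies entirely in the past excitation, and there is no fractional-lag interpolation or closed-loop refinement of the kind real ACELP coders use. The function name `ltp_search` is hypothetical.

```python
def ltp_search(target, past_exc, t_min, t_max):
    """Integer-lag long-term prediction.

    Picks the lag T in [t_min, t_max] that maximizes corr^2 / energy between the
    target and the excitation delayed by T, then subtracts the gain-scaled
    delayed excitation. Returns (T, gain, new_residual).
    """
    L = len(target)
    if not (L <= t_min <= t_max <= len(past_exc)):
        raise ValueError("need len(target) <= t_min <= t_max <= len(past_exc)")

    def delayed(T):
        # Excitation T samples before each target sample (entirely in the past).
        start = len(past_exc) - T
        return past_exc[start:start + L]

    best_T, best_score = None, -1.0
    for T in range(t_min, t_max + 1):
        seg = delayed(T)
        energy = sum(v * v for v in seg)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, seg))
        score = corr * corr / energy        # target energy explained at this lag
        if score > best_score:              # strict: ties keep the shorter lag
            best_T, best_score = T, score
    if best_T is None:
        raise ValueError("past excitation is all zeros in the lag range")

    seg = delayed(best_T)
    gain = sum(t * v for t, v in zip(target, seg)) / sum(v * v for v in seg)
    new_residual = [t - gain * v for t, v in zip(target, seg)]
    return best_T, gain, new_residual
```

For an excitation that repeats every 20 samples and a target equal to half of its continuation, the search returns lag 20, gain 0.5, and a new residual of zeros.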

Frequency-domain coding has the advantage that it can code music signals with high quality at high bit rates, but its quality is poor for audio signals at low bit rates. Time-domain coding can code speech signals with high quality at low bit rates. Compared with frequency-domain coding, it gives higher speech quality at a similar bit rate, or a significantly lower bit rate at the same quality, but it codes music signals poorly. In general, at medium bit rates, frequency-domain coding gives better quality for music signals and time-domain coding gives better quality for speech signals.

To code both speech and music with high quality, one solution is to combine a time-domain coding mode suited to speech with a frequency-domain coding mode suited to music into a hybrid coding scheme. An example is combining the MDCT commonly used in frequency-domain coding with the ACELP commonly used in time-domain coding. The problem to be solved is then how to switch seamlessly or smoothly from one coding mode to the other at a medium bit rate, without increasing the bit rate or the delay, and with low computational complexity.

In the prior art, one method for switching between the ACELP-based and MDCT-based coding modes uses a pre-encoding technique, which has very high computational complexity. Another method either additionally encodes the signal before and after the switch or relies on a variable-rate MDCT coder. This increases the bit rate during switching, or, at a constant bit rate, increases the switching delay, and thus places higher demands on the transmission channel. Other methods, such as obtaining the signal at the switching position by signal extrapolation, do not achieve a good smooth-switching effect.

In summary, the published methods for switching between the ACELP-based and MDCT-based coding modes either switch poorly, have high computational complexity, or require additional bit rate or delay.

Disclosure of Invention

The invention aims to provide a coding mode switching method and device and a decoding mode switching method and device that achieve seamless or smooth switching between the two modes.

In order to solve the above problem, the present invention provides a method for switching coding modes, comprising:

when the coding mode of the kth frame is determined to be the algebraic code-excited linear prediction (ACELP) coding mode, the preceding frame, i.e. the (k-1)th frame, is in the ACELP coding mode, and the following frame, i.e. the (k+1)th frame, is in the modified discrete cosine transform (MDCT) coding mode, then:

one frame length of input signal spanning the kth and (k+1)th frames is down-sampled to obtain a signal s_d at the ACELP core operating frequency, and s_d is processed with the high-pass filter used in ACELP coding to obtain a signal s_dHP. Here, the kth-frame portion of this one-frame-length input signal is the remainder of the kth frame left after part of the kth frame was input to the preceding ACELP coding;

high-pass filtering nonlinear phase shift compensation is applied to s_dHP to obtain a compensated signal s_dHPc, and subsequent ACELP coding is performed on s_dHPc to obtain the ACELP code stream of the kth frame;

the (k+1)th frame signal is MDCT-encoded using a predefined window type. The predefined window type makes part of the signal reconstructed when decoding the (k+1)th-frame code stream overlap the signal reconstructed when decoding the kth-frame ACELP code stream, and compensates for the delay difference between ACELP coding/decoding before the coding mode switch and MDCT coding/decoding after it.

The invention also provides a decoding mode switching method, which comprises the following steps:

when the kth frame is an algebraic code-excited linear prediction (ACELP) code stream, the preceding frame, i.e. the (k-1)th frame, is an ACELP code stream, and the following frame, i.e. the (k+1)th frame, is a modified discrete cosine transform (MDCT) code stream, then:

ACELP decoding is performed on the kth-frame code stream. Post high-pass filtering nonlinear phase shift compensation is applied, using the input signal s_2ddp and the output signal s_2ddpHP of the post high-pass filter in the ACELP decoding process, to obtain s_2ddpHPc. Subsequent ACELP decoding is then performed on s_2ddpHPc to obtain the ACELP decoded signal of the kth frame and the ACELP decoded signal of the (k+1)th frame;

MDCT decoding is performed on the (k+1)th-frame code stream using a predefined window type to obtain an MDCT decoded signal. The predefined window type makes part of the signal reconstructed by MDCT decoding of the (k+1)th-frame code stream overlap the (k+1)th-frame signal reconstructed by ACELP decoding. It also compensates for the delay difference between ACELP coding/decoding before the decoding mode switch and MDCT coding/decoding after it;

the ACELP decoded signal and the MDCT decoded signal of the (k+1)th frame are then combined to obtain the final decoded signal of the (k+1)th frame.
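The final combination step can be illustrated with a simple cross-fade. The text above does not specify the weighting, so the sketch below assumes complementary cos^2/sin^2 weights over an overlap region whose start and length are parameters. The function name `crossfade` and its arguments are hypothetical.

```python
import math


def crossfade(acelp, mdct, overlap_start, overlap_len):
    """Combine two time-aligned decoded signals covering the same sample positions.

    Before overlap_start the ACELP output is used; inside the overlap the two are
    weighted with fade-out/fade-in weights that always sum to 1; after the
    overlap the MDCT output is used.
    """
    if len(acelp) != len(mdct):
        raise ValueError("signals must be aligned and equally long")
    out = []
    for n in range(len(acelp)):
        if n < overlap_start:
            out.append(acelp[n])
        elif n < overlap_start + overlap_len:
            m = n - overlap_start
            fade_in = math.sin(0.5 * math.pi * (m + 0.5) / overlap_len) ** 2
            out.append((1.0 - fade_in) * acelp[n] + fade_in * mdct[n])
        else:
            out.append(mdct[n])
    return out
```

Because the weights sum to 1, two identical inputs pass through unchanged, and a step from one decoder's output to the other's is spread smoothly across the overlap.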

An embodiment of the present invention further provides a coding mode switching apparatus. It encodes the kth and (k+1)th frames when the coding mode of the kth frame is the algebraic code-excited linear prediction (ACELP) coding mode, the preceding frame, i.e. the (k-1)th frame, is in the ACELP coding mode, and the following frame, i.e. the (k+1)th frame, is in the modified discrete cosine transform (MDCT) coding mode. The apparatus includes:

a third encoding module, configured to down-sample one frame length of input signal spanning the kth and (k+1)th frames to obtain a signal s_d at the ACELP core operating frequency. It processes s_d with the high-pass filter used in ACELP coding to obtain s_dHP, applies high-pass filtering nonlinear phase shift compensation to s_dHP to obtain a compensated signal s_dHPc, and performs subsequent ACELP coding on s_dHPc to obtain the ACELP code stream of the kth frame. Here, the kth-frame portion of this one-frame-length input signal is the remainder of the kth frame left after part of the kth frame was input to the preceding ACELP coding;

a fourth encoding module, configured to MDCT-encode the (k+1)th frame signal using a predefined window type. The predefined window type makes part of the signal reconstructed when decoding the (k+1)th-frame code stream overlap the signal reconstructed when decoding the kth-frame ACELP code stream, and compensates for the delay difference between ACELP coding/decoding before the coding mode switch and MDCT coding/decoding after it.

An embodiment of the present invention further provides a decoding mode switching apparatus. It decodes the kth-frame and (k+1)th-frame code streams when the kth frame is an algebraic code-excited linear prediction (ACELP) code stream, the preceding frame, i.e. the (k-1)th frame, is an ACELP code stream, and the following frame, i.e. the (k+1)th frame, is a modified discrete cosine transform (MDCT) code stream. The apparatus includes:

a third decoding module, configured to ACELP-decode the kth-frame code stream. It applies post high-pass filtering nonlinear phase shift compensation, using the input signal s_2ddp and the output signal s_2ddpHP of the post high-pass filter in the ACELP decoding process, to obtain s_2ddpHPc. It then performs subsequent ACELP decoding on s_2ddpHPc to obtain the ACELP decoded signal of the kth frame and the ACELP decoded signal of the (k+1)th frame;

a fourth decoding module, configured to MDCT-decode the (k+1)th-frame code stream using a predefined window type to obtain an MDCT decoded signal. The predefined window type makes part of the signal reconstructed by MDCT decoding of the (k+1)th-frame code stream overlap the (k+1)th-frame signal reconstructed by ACELP decoding. It also compensates for the delay difference between ACELP coding/decoding before the decoding mode switch and MDCT coding/decoding after it;

and a second comprehensive processing module, configured to combine the ACELP decoded signal and the MDCT decoded signal of the (k+1)th frame to obtain the final decoded signal of the (k+1)th frame.

In summary, the method and device of the present invention achieve smooth switching between the MDCT and ACELP coding modes. Compared with the prior art, they do not increase the bit rate or the delay, keep computational complexity low during switching, and give a good switching effect.

Drawings

Fig. 1 is an encoding flowchart of smooth switching from G.722.1 to G.722.2 in Embodiment 1 of the present invention;

Figs. 2-5 show the first through fourth window types in the G.722.1 codec according to an embodiment of the present invention;

Fig. 6 is a flowchart of ACELP preprocessing in Embodiment 1 of the present invention;

Fig. 7 is a flowchart of state establishment and update for G.722.2 encoding in Embodiment 1 of the present invention;

Fig. 8 is a decoding flowchart of smooth switching from G.722.1 to G.722.2 in Embodiment 1 of the present invention;

Fig. 9 is a flowchart of state establishment and update for G.722.2 decoding in Embodiment 1 of the present invention;

Fig. 10 is an encoding flowchart of smooth switching from G.722.2 to G.722.1 in Embodiment 2 of the present invention;

Fig. 11 is a decoding flowchart of smooth switching from G.722.2 to G.722.1 in Embodiment 2 of the present invention;

Fig. 12 is a block diagram of the coding mode switching apparatus for switching from the MDCT mode to the ACELP mode in Embodiment 3 of the present invention;

Fig. 13 is a block diagram of the second encoding module in the coding mode switching apparatus for switching from the MDCT mode to the ACELP mode in Embodiment 3 of the present invention;

Fig. 14 is a block diagram of the decoding mode switching apparatus for switching from the MDCT mode to the ACELP mode in Embodiment 3 of the present invention;

Fig. 15 is a block diagram of the second decoding module in the decoding mode switching apparatus for switching from the MDCT mode to the ACELP mode in Embodiment 3 of the present invention;

Fig. 16 is a block diagram of the coding mode switching apparatus for switching from the ACELP mode to the MDCT mode in Embodiment 3 of the present invention;

Fig. 17 is a block diagram of the decoding mode switching apparatus for switching from the ACELP mode to the MDCT mode in Embodiment 3 of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.

In the present application, a smooth rise or a smooth fall means that the first derivative of the rising or falling function is continuous.
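To illustrate this definition, consider the ramp sin^2(pi t / 2). It rises from 0 to 1 on [0, 1], and its derivative (pi/2) sin(pi t) vanishes at both ends, so it joins the constant 0 and 1 regions with a continuous first derivative. A linear ramp would not qualify, because its slope jumps from 0 to 1 at t = 0. The sketch below shows one example of a qualifying function. It is not necessarily the one used in the embodiments, and the function names are hypothetical.

```python
import math


def smooth_rise(t):
    """0 for t <= 0, sin^2(pi t / 2) on (0, 1), 1 for t >= 1."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    return math.sin(0.5 * math.pi * t) ** 2


def smooth_fall(t):
    """Mirror image: 1 -> 0 with the same continuous-derivative property."""
    return 1.0 - smooth_rise(t)


def smooth_rise_derivative(t):
    """d/dt sin^2(pi t / 2) = (pi / 2) sin(pi t); zero outside (0, 1)."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return 0.5 * math.pi * math.sin(math.pi * t)
```

Because the derivative tends to 0 at t = 0 and t = 1, it matches the zero slope of the flat regions on either side.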

In the present application, the samples of the ith frame are numbered (i-1)N to iN-1, where N is the length of one frame.

The main ideas of the application are as follows. When the coding mode switches from MDCT to ACELP, the last frame coded in the MDCT mode before the switch uses new MDCT encoding and decoding window types. The current encoding and decoding then also reconstruct part of the following signal, which overlaps to some extent the signal processed in the ACELP mode. When the first frame after the switch is coded in the ACELP mode, the history states of the filters in ACELP encoding and decoding are initialized. The filter history states needed by subsequent ACELP encoding and decoding are then established from the input signal and the decoded signal of the MDCT-coded part, and the ACELP encoding and decoding after the switch proceed on this basis; in this process, the effect of the high-pass filter in ACELP encoding and decoding on the mode switch can be compensated. The memory of the filters in the ACELP coding mode then ensures smooth switching to a certain extent. Finally, overlap-adding the overlapping parts of the MDCT decoded signal and the ACELP decoded signal further ensures smooth switching.

When the coding mode switches from ACELP to MDCT, the effect of the high-pass filter in ACELP encoding and decoding on the mode switch is compensated while the last frame before the switch is processed in the ACELP mode. When the first frame after the switch is processed in the MDCT mode, MDCT encoding and decoding are initialized. New MDCT encoding and decoding window types then allow a single encoding/decoding pass to reconstruct the part of the signal adjoining the ACELP-coded signal. This part overlaps to some extent the last frame processed by ACELP before the switch, and the decoded signal following it can be combined with the subsequent MDCT or ACELP decoded signal to reconstruct the signal. Finally, overlap-adding the overlapping parts of the ACELP decoded signal and the MDCT decoded signal ensures smooth switching.

The embodiment of the invention provides a coding mode switching method, which comprises the following steps:

when the coding mode of the ith frame is determined to be the modified discrete cosine transform (MDCT) coding mode and the coding mode of the following frame, i.e. the (i+1)th frame, is determined to be the algebraic code-excited linear prediction (ACELP) coding mode, then:

performing MDCT coding on the input signal of the ith frame with a predefined window type to obtain the coding information of the ith frame, where the predefined window type allows part of the (i+1)th-frame signal to be reconstructed when the ith-frame code stream is decoded;

decoding the coding information of the ith frame, or of the ith frame and the frames before it, to obtain a decoded signal;

establishing and updating the history states of the filters required in the ACELP coding mode from the decoded signal, part of the input signal of the ith frame, and the first part of the input signal of the (i+1)th frame; and, based on the updated history states, performing ACELP coding on one frame length of input signal consisting of the second part of the input signal of the (i+1)th frame and part of the input signal of the (i+2)th frame;

where the input signal of the (i+1)th frame consists of a first part and a second part that do not overlap, and the first part precedes the second part.
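The frame numbering convention stated earlier (frame i covers samples (i-1)N to iN-1) makes the sample ranges in this step concrete. The sketch below assumes that the (i+2)th-frame part has the same length as the first part of the (i+1)th frame, which follows from the ACELP segment being exactly one frame long. The helper names and the value N = 320 in the example are hypothetical.

```python
def frame_range(i, N):
    """Sample indices of the ith frame (frames numbered from 1): (i-1)N .. iN-1."""
    return range((i - 1) * N, i * N)


def acelp_segment_after_switch(i, N, first_part_len):
    """One frame length handled by ACELP after an MDCT -> ACELP switch at frame i.

    Frame i+1 is split into a first part of first_part_len samples (used only to
    build filter history) and a second part. The ACELP segment is that second
    part plus the first first_part_len samples of frame i+2, i.e. exactly N samples.
    """
    if not 0 <= first_part_len < N:
        raise ValueError("first_part_len must lie in [0, N)")
    start = frame_range(i + 1, N).start + first_part_len
    return range(start, start + N)
```

With N = 320, i = 2, and a 64-sample first part, the ACELP segment covers samples 704 to 1023: the last 256 samples of frame 3 and the first 64 samples of frame 4.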

In an alternative of this embodiment, when the coding mode of the frame preceding the ith frame is the MDCT mode, the predefined window type is a second window type that satisfies the following conditions:

the second window type consists of five parts, from left to right: a first zero-value region, a rising window region, a value-1 holding region, a falling window region, and a second zero-value region, where:

the first zero-value region has the value 0, and its length equals that of the value-1 holding region to the left of the window center of the second window type;

the rising window region coincides with the part of the first window type to the left of its window center;

the value-1 holding region has the value 1 and length N_1, with $D_{2f} \le N_1 \le N$;

the falling window region falls smoothly from 1 to 0 and has length N_1f, with $0 < N_{1f} \le N - N_1$;

the second zero-value region has the value 0, and its length equals that of the value-1 holding region to the right of the window center of the second window type;

where N is the number of samples in one frame in the MDCT coding mode; D_2f is the delay introduced by sampling-rate conversion in ACELP, expressed as a number of samples at the input sampling rate; and the first window type is the window type used for MDCT coding of frames not involved in a coding mode switch.

In an alternative of this embodiment, when the coding mode of the frame preceding the ith frame is the ACELP mode, the predefined window type is a fourth window type that satisfies the following conditions:

the fourth window type consists of five parts, from left to right: a first zero-value region, a rising window region, a value-1 holding region, a falling window region, and a second zero-value region, where:

the first zero-value region has the value 0, and its length equals that of the value-1 holding region to the left of the window center of the fourth window type;

the rising window region rises smoothly from 0 to 1 and has length N_2, with $N_2 > 0$;

the value-1 holding region has the value 1 and length N_3, with $N_3 \ge D_1 + D_{2f}$;

the falling window region falls smoothly from 1 to 0 and has length N_1f, with $0 < N_{1f} \le N - D_{2f}$;

the second zero-value region has the value 0, and its length equals that of the value-1 holding region to the right of the window center of the fourth window type;

where D_1 is the delay introduced by frame overlap in the MDCT coding mode and D_2f is the delay introduced by sampling-rate conversion in ACELP, both expressed as numbers of samples at the input sampling rate; and N is the number of samples in one frame in the MDCT coding mode.
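The five-region structure shared by the second and fourth window types can be sketched as follows. This sketch rests on several assumptions. The smooth ramps are half-sine ramps. The rising region of the second window type is taken to be the left half of a sine window of length 2N, because the first window type is not defined in this section. The zero-region lengths are passed in as parameters rather than derived from the window-center condition. The helper names are hypothetical, and the checks encode only the length constraints stated above.

```python
import math


def half_sine_rise(length):
    """Smooth 0 -> 1 ramp: the left half of a sine window of length 2 * length."""
    return [math.sin(math.pi * (n + 0.5) / (2 * length)) for n in range(length)]


def five_region_window(z1, rise, hold, fall_len, z2):
    """zeros | rise | ones | fall | zeros, with the fall a mirrored half-sine ramp."""
    fall = list(reversed(half_sine_rise(fall_len)))
    return [0.0] * z1 + list(rise) + [1.0] * hold + fall + [0.0] * z2


def check_second_type(N, N1, N1f, D2f):
    """Stated constraints: D_2f <= N_1 <= N and 0 < N_1f <= N - N_1."""
    return D2f <= N1 <= N and 0 < N1f <= N - N1


def check_fourth_type(N, N2, N3, N1f, D1, D2f):
    """Stated constraints: N_2 > 0, N_3 >= D_1 + D_2f, 0 < N_1f <= N - D_2f."""
    return N2 > 0 and N3 >= D1 + D2f and 0 < N1f <= N - D2f
```

For a second-type window one would pass `half_sine_rise(N)` as the rising region; for a fourth-type window, `half_sine_rise(N2)` with the holding length N_3.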

In an alternative of this embodiment, the length L of the predefined window pattern_w>2 × N, scaling parameters related to the length of the predefined window type with a scaling factor [ L ] in the course of MDCT encoding of the input signal of the i-th frame using the predefined window type_w/2N]，[·]Indicating that the nearest integer is taken.

In an alternative of this embodiment, decoding the coding information of the ith frame, or of the ith frame and the frames before it, to obtain a decoded signal includes:

decoding the coding information of the ith frame, or of the ith frame and the frames before it, to obtain a decoded signal s_1d containing M samples of the ith and (i+1)th frames, where $M = M_2 + D_{12o} - D_{2f}$. Here M_2 is not less than the signal length required to subsequently establish and update the history states of the required filters in ACELP coding. D_12o is the number of samples in the region where the MDCT decoded signal and the ACELP decoded signal overlap when the coding mode switches from MDCT to ACELP, with $D_{12o} \ge D_{2f}$. D_2f is the delay introduced by sampling-rate conversion in ACELP, expressed as a number of samples at the input sampling rate.
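The length of the decoded signal follows directly from the relation above. Here is a small sketch with hypothetical names that also enforces the stated constraint D_12o >= D_2f. The example values are illustrative only.

```python
def decoded_length(M2, D12o, D2f):
    """Number of samples M of s_1d: M = M_2 + D_12o - D_2f, with D_12o >= D_2f."""
    if D12o < D2f:
        raise ValueError("the overlap D_12o must be at least D_2f")
    return M2 + D12o - D2f
```

For instance, M_2 = 100, D_12o = 20, and D_2f = 5 give M = 115.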

In an alternative of this embodiment, establishing and updating the history states of the required filters in the ACELP coding mode from the decoded signal, part of the input signal of the ith frame, and the first part of the input signal of the (i+1)th frame includes:

down-sampling the decoded signal s_1d as in the ACELP coding mode to obtain a signal s_1dd;

down-sampling part of the input signal of the ith frame and the first part of the input signal of the (i+1)th frame with the down-sampling filter of the ACELP coding mode to obtain a signal s_d, and then saving the state S_d of the down-sampling filter.
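Saving the down-sampling filter state S_d and reusing it later lets ACELP coding continue as if it had processed the preceding samples itself. The sketch below shows the idea with a short FIR low-pass filter followed by integer decimation. The taps, the decimation factor of 2, and the function name are assumptions for illustration; the ACELP resampler itself is not described in this section. Processing a signal in two blocks with the saved state gives the same output as processing it in one pass.

```python
def fir_decimate(x, taps, factor, state=None):
    """FIR low-pass filtering followed by keeping every `factor`-th sample.

    `state` holds the last len(taps) - 1 input samples of the previous call
    (the filter history, oldest first). Returns (output, new_state). Block
    lengths must be multiples of `factor` so the decimation phase is preserved.
    """
    hist_len = len(taps) - 1
    if state is None:
        state = [0.0] * hist_len           # cold start: zero history
    if len(x) % factor:
        raise ValueError("block length must be a multiple of the decimation factor")
    buf = list(state) + list(x)            # x[n] sits at buf[n + hist_len]
    out = []
    for n in range(0, len(x), factor):
        out.append(sum(taps[j] * buf[n + hist_len - j] for j in range(len(taps))))
    new_state = buf[len(buf) - hist_len:]  # last hist_len inputs become the history
    return out, new_state
```

Running the second block with the saved state reproduces the single-pass output, while a cold start (zero history) does not; this is why the state is saved and reused at the switch.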

In an alternative of this embodiment, two steps are performed before the MDCT code stream of the ith frame is output: down-sampling s_1d to obtain s_1dd, and down-sampling part of the ith-frame input signal and the first part of the (i+1)th-frame input signal with the ACELP down-sampling filter to obtain s_d, then saving the down-sampling filter state S_d.

In an alternative of this embodiment, establishing and updating the history states of the required filters in the ACELP coding mode from the decoded signal, part of the input signal of the ith frame, and the first part of the input signal of the (i+1)th frame further includes one or a combination of the following:

pre-emphasizing s_1dd with the pre-emphasis filter of the ACELP coding mode to obtain a pre-emphasized signal s_1ddp at the core operating frequency of the ACELP coding mode;

high-pass filtering the signal s_d with the high-pass filter used in ACELP preprocessing, and then saving the state S_HP of the high-pass filter;

pre-emphasizing the signal s_d with the pre-emphasis filter of the ACELP coding mode to obtain a signal s_dp, taking s_dp as the ACELP preprocessed signal, and saving the state S_p of the pre-emphasis filter;

when down-sampling the input signal s_2 in ACELP coding, using the state S_d as the history state of the down-sampling filter in ACELP coding, where s_2 is the one-frame-length input signal consisting of the second part of the input signal of the (i+1)th frame and part of the input signal of the (i+2)th frame;

when performing ACELP coding on s_2, using the state S_HP as the history state of the high-pass filter in ACELP coding;

when performing ACELP coding on s_2, using the state S_p as the history state of the pre-emphasis filter in ACELP coding;

when a historical pre-emphasized input signal is needed in ACELP coding of s_2, using part of the ACELP preprocessed signal as the historical pre-emphasized input signal at the ACELP core operating frequency;

when historical unquantized immittance spectral pair (ISP) coefficients are needed in ACELP coding of s_2, using the ISP coefficients corresponding to the unquantized linear predictive coding (LPC) coefficients computed in the ACELP coding process as the historical unquantized ISP coefficients;

when a historical perceptually weighted signal is needed in ACELP coding of s_2, filtering the ACELP preprocessed signal with a perceptual weighting filter built from the interpolated first-subframe LPC coefficients computed in the ACELP coding process, and using the resulting perceptually weighted signal as the historical perceptually weighted signal;

when an open-loop pitch search is needed in ACELP coding of s_2, high-pass filtering the perceptually weighted signal with the high-pass filter of the open-loop pitch search. The high-pass-filtered weighted signal is used as the historical buffer needed for the open-loop pitch search gain calculation, and the resulting filter state is used as the history state of that high-pass filter;

when historical quantized ISP coefficients are needed in ACELP coding of s_2, using the quantized ISP coefficients computed in the ACELP coding process as the historical quantized ISP coefficients;

when the history of the LPC synthesis filter is needed in ACELP coding of s_2, using the last M_LPCo samples of s_1ddp as the history state of the LPC synthesis filter for processing the first subframe, where M_LPCo is the LPC order in ACELP coding;

when a historical excitation signal is needed in ACELP coding of s_2, forming an LPC analysis filter from the first-subframe LPC coefficients computed in the ACELP coding process, filtering s_1ddp with it to obtain the LPC analysis residual, and using that residual as the historical excitation signal;

for the input signal s₂, in the ACELP coding process, when a closed-loop pitch search is required, the error between the coded input signal of one frame length preceding the start position of the signal currently being ACELP coded and the MDCT decoded signal at the corresponding positions is calculated; this error is filtered with the perceptual weighting filter of the ACELP coding process, and the resulting state of the perceptual weighting filter is used as the historical state of the perceptual weighting filter when calculating the target signal for the closed-loop pitch search in ACELP coding.
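Two of the histories listed above come straight from s_1ddp: the LPC synthesis filter state (its last M_LPCo samples) and the historical excitation (s_1ddp passed through the analysis filter A(z)). The sketch below illustrates both under stated assumptions: the sign convention A(z) = 1 + a_1 z^-1 + ... + a_M z^-M, a zero initial state for the analysis filter, and the helper names `lpc_histories` and `lpc_synthesis` are all hypothetical, not taken from G.722.2.

```python
from typing import List, Sequence, Tuple


def lpc_histories(s_1ddp: Sequence[float], a: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: derive LPC-related ACELP history from s_1ddp.

    a = [1.0, a_1, ..., a_M] are the analysis filter coefficients of
    A(z) = 1 + a_1 z^-1 + ... + a_M z^-M (sign convention assumed).
    Returns (synthesis_state, residual).
    """
    m = len(a) - 1  # M_LPCo, the LPC order
    if m < 1 or len(s_1ddp) < m:
        raise ValueError("need an LPC order >= 1 and at least M_LPCo samples")
    # History state of the LPC synthesis filter: last M_LPCo samples of s_1ddp.
    synthesis_state = list(s_1ddp[-m:])
    residual: List[float] = []
    for n in range(len(s_1ddp)):
        # FIR analysis filtering; samples before s_1ddp[0] are treated as 0.
        acc = 0.0
        for k in range(m + 1):
            if n - k >= 0:
                acc += a[k] * s_1ddp[n - k]
        residual.append(acc)
    return synthesis_state, residual


def lpc_synthesis(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """Check helper: 1/A(z) with zero initial state undoes the analysis filter."""
    m = len(a) - 1
    out: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, m + 1):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out.append(acc)
    return out
```

With a first-order toy filter, `lpc_histories([1.0, 2.0, 3.0, 4.0], [1.0, -0.5])` returns the state `[4.0]` and the residual `[1.0, 1.5, 2.0, 2.5]`, and feeding that residual through `lpc_synthesis` recovers the original samples. In G.722.2 the LPC order is 16.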

in an alternative of this embodiment, performing ACELP coding, based on the updated historical states of the filters required in the ACELP coding mode, on the input signal of one ordinary frame length that includes the second partial input signal of the (i+1)th frame and the partial input signal of the (i+2)th frame includes:

in the ACELP coding of the input signal of one ordinary frame length that includes the second partial input signal of the (i+1)th frame and the partial input signal of the (i+2)th frame:

when the fixed codebook gain predicted value is needed, if the original ACELP coding calculates it by a prediction method, a non-prediction method is used instead to calculate the fixed codebook gain predicted value. Here, the original ACELP coding is the ACELP coding of a frame that is coded in ACELP mode and is not at an MDCT/ACELP switch, and the non-prediction method predicts the current fixed codebook gain using information of the signal currently being coded;

when the codebook gains of each subframe are to be quantized, the fixed codebook gain predicted value obtained by the prediction method of the original ACELP coding is compared with the one obtained by the non-prediction method, and whichever minimizes the coding error energy of the subframe is selected as the final fixed codebook gain predicted value of that subframe; the choice is recorded in a selection flag bit. The codebook gains of the subframe are then quantized based on the selected predicted value, and the quantized energy prediction error is updated.
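The per-subframe choice described above can be sketched as follows. This is an illustration only: the error-energy function is supplied by the caller (in a real encoder it would come from the gain quantization search), the single non-predicted value per frame reflects the bit allocation described in the next alternative, and the name `select_gain_predictors` is hypothetical. The sketch also ignores the fact that the prediction-method value of later subframes depends on the quantized energy prediction error updated after earlier subframes.

```python
from typing import Callable, List, Sequence, Tuple


def select_gain_predictors(
    predicted: Sequence[float],
    non_predicted: float,
    error_energy: Callable[[int, float], float],
) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: per subframe, keep the fixed codebook gain predicted
    value that gives the lower coding error energy.

    Flag 0 = prediction method (original ACELP), 1 = non-prediction method.
    """
    chosen: List[float] = []
    flags: List[int] = []
    for sf, pred in enumerate(predicted):
        if error_energy(sf, non_predicted) < error_energy(sf, pred):
            chosen.append(non_predicted)
            flags.append(1)
        else:
            chosen.append(pred)
            flags.append(0)
    return chosen, flags
```

With a toy squared-error criterion against per-subframe targets `[1.0, 5.0, 2.0, 9.0]`, predicted values `[1.1, 2.0, 2.1, 3.0]` and a non-predicted value of `5.0`, the selection flags come out as `[0, 1, 0, 1]`.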

In an alternative of this embodiment, the method further includes:

after the high-frequency gains and their indices are calculated, the high-frequency gain of the first subframe is set to its minimum value, and the bits originally used to transmit it are used instead to transmit the fixed codebook gain predicted value obtained by the non-prediction method together with the selection flag bit of the first subframe; the precision of the high-frequency gain indices of the second to fourth subframes is each reduced by 1 bit, and each saved bit carries the selection flag bit of the corresponding subframe (second to fourth).
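The bit reuse above can be illustrated with a small pack/unpack sketch. It assumes a 4-bit high-frequency gain field per subframe, that the minimum gain has index 0, that the flag occupies the most significant bit of each field, and that the non-predicted value is represented by a 3-bit index; none of these layouts is specified in the text, and both helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

FIELD_BITS = 4  # assumed width of each subframe's high-frequency gain field


def pack_hf_fields(hf_gain_idx: Sequence[int], flags: Sequence[int], nonpred_idx: int) -> List[int]:
    """Hypothetical layout: flag in the MSB, payload in the remaining bits."""
    if len(hf_gain_idx) != 4 or len(flags) != 4:
        raise ValueError("expected four subframes")
    payload_mask = (1 << (FIELD_BITS - 1)) - 1
    # Subframe 1: HF gain forced to its minimum and not sent; the field carries
    # the selection flag and an (assumed) index of the non-predicted value.
    fields = [(flags[0] << (FIELD_BITS - 1)) | (nonpred_idx & payload_mask)]
    for sf in range(1, 4):
        coarse = (hf_gain_idx[sf] >> 1) & payload_mask  # HF gain index loses 1 bit
        fields.append((flags[sf] << (FIELD_BITS - 1)) | coarse)
    return fields


def unpack_hf_fields(fields: Sequence[int]) -> Tuple[List[int], int, List[int]]:
    payload_mask = (1 << (FIELD_BITS - 1)) - 1
    flags = [(f >> (FIELD_BITS - 1)) & 1 for f in fields]
    nonpred_idx = fields[0] & payload_mask
    # Subframe 1 gets the minimum index (assumed 0); others are rescaled.
    hf_gain_idx = [0] + [(f & payload_mask) << 1 for f in fields[1:]]
    return flags, nonpred_idx, hf_gain_idx
```

For example, HF gain indices `[9, 15, 6, 1]`, flags `[1, 0, 1, 1]` and a non-predicted index of 5 pack into `[13, 7, 11, 8]`; unpacking returns the flags, the index 5, and the coarser HF gain indices `[0, 14, 6, 0]`. The total number of bits per frame is unchanged, which is the point of the scheme.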

In an alternative of this embodiment, performing ACELP coding, based on the updated historical states of the filters in the ACELP coding mode, on the input signal of one ordinary frame length that includes the second partial input signal of the (i+1)th frame and the partial input signal of the (i+2)th frame includes:

in the ACELP coding of the input signal of one frame length that includes the second partial input signal of the (i+1)th frame and the partial input signal of the (i+2)th frame, the downsampling filter of the ACELP coding mode is used to downsample this input signal to obtain a signal s_d2, and a high-pass filter is applied to s_d2 to obtain s_dHP2;

high-pass-filtering nonlinear phase-shift compensation is performed on s_dHP2 to obtain s_dHPc2, and the subsequent ACELP coding is performed on s_dHPc2.

In an alternative of this embodiment, performing high-pass-filtering nonlinear phase-shift compensation on s_dHP2 to obtain s_dHPc2 includes:

if the coding mode of the (i+2)th frame is the MDCT mode, the output signal of the high-pass filter compensated for the nonlinear phase shift is set equal to the input signal of the high-pass filter, namely s_dHPc2 = s_d2;

if the coding mode of the (i+2)th frame is the ACELP mode:

a first falling window of length L_hpe1, which drops smoothly from 1 to 0, is applied to the input signal s_d2 of the high-pass filter to obtain s_d2w, and a first rising window of length L_hpe1, which rises smoothly from 0 to 1, is applied to the output signal s_dHP2 of the high-pass filter to obtain s_dHP2w. Over the L_hpe1 points of the first rising window, s_dHPc2 is the sum of s_d2w and s_dHP2w; before the first rising window, s_dHPc2 equals s_d2, and after it, s_dHPc2 equals s_dHP2. L_hpe1 is no greater than the length of one frame at the ACELP core operating frequency, and the first falling window and the first rising window sum to 1.

In an alternative of this embodiment, the first descending window is a linear descending window, and the first ascending window is a linear ascending window.
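The compensation above is a cross-fade from the high-pass filter input to its output. A minimal sketch with linear windows follows; the window start position is left as a parameter because the text does not fix it, the half-sample offset in the window values is one possible choice of linear window, and the name `hp_phase_compensate` is hypothetical.

```python
from typing import List, Sequence


def hp_phase_compensate(s_in: Sequence[float], s_hp: Sequence[float], start: int, length: int) -> List[float]:
    """Cross-fade from the high-pass filter input s_in to its output s_hp.

    Before `start` the result equals s_in; over the `length` window points it is
    w_down * s_in + w_up * s_hp with w_down + w_up = 1; afterwards it equals s_hp.
    """
    if len(s_in) != len(s_hp):
        raise ValueError("input and output must have the same length")
    if length <= 0 or start < 0 or start + length > len(s_in):
        raise ValueError("window must lie inside the signal")
    out = list(s_in[:start])
    for j in range(length):
        w_up = (j + 0.5) / length  # linear rise from near 0 to near 1
        w_down = 1.0 - w_up        # complementary linear fall
        n = start + j
        out.append(w_down * s_in[n] + w_up * s_hp[n])
    out.extend(s_hp[start + length:])
    return out
```

For instance, `hp_phase_compensate([1.0] * 6, [0.0] * 6, 1, 4)` gives `[1.0, 0.875, 0.625, 0.375, 0.125, 0.0]`. Swapping the two signal arguments yields the mirrored form used by the fourth and fifth windows (output fading into input), and the decoder-side overlap of MDCT and ACELP decoded signals has the same shape.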

In an alternative of this embodiment, the method further includes:

if the coding mode of the (i+2)th frame is the MDCT coding mode, the (i+2)th frame is coded as follows:

MDCT coding is performed on the (i+2)th frame signal using a preset window type; the preset window type makes part of the signal reconstructed when decoding the coded stream of the (i+2)th frame overlap the signal reconstructed when decoding the ACELP coded stream of the (i+1)th frame, compensating for the delay difference between the ACELP codec before the coding mode switch and the MDCT codec after it.

In an alternative of this embodiment, if the coding mode of the frame following the (i+2)th frame is the MDCT coding mode, then:

the preset window type is a third window type, which consists of 5 parts from left to right: a first zero-value region, a rising window region, a 1-value holding region, a falling window region, and a second zero-value region, wherein:

the first zero-value region has the value 0 and the same length as the part of the 1-value holding region to the left of the window center of the third window type;

the rising window region rises smoothly from 0 to 1 and has a length N₂, with N₂ > 0;

the 1-value holding region has the value 1 and a length N_2c satisfying N_2c ≥ D₁ + D_21o − N + D_2f;

the falling window region is identical to the part of the first window type to the right of its window center;

the second zero-value region has the value 0 and the same length as the part of the 1-value holding region to the right of the window center of the third window type;

where D₁ is the number of samples, at the input sampling rate, corresponding to the delay caused by inter-frame overlap in the MDCT coding mode; D_2f is the number of samples, at the input sampling rate, corresponding to the delay caused by sampling-rate conversion in ACELP; D_21o ≥ 0 is the number of samples in the region where the MDCT decoded signal and the ACELP decoded signal overlap when the coding mode switches from ACELP to MDCT; and N is the number of samples in one frame of the MDCT coding mode;

the first window type is a window type used when MDCT encoding is performed on a frame that is not in encoding mode switching.

In an alternative of this embodiment, if the length L_w3 of the preset window type is greater than 2N, the parameters related to the window length are scaled by a factor of [L_w3/2N] when the input signal of the (i+2)th frame is MDCT coded with the preset window type.

The embodiment of the invention also provides a decoding mode switching method, which comprises the following steps:

when the code stream type of the i-th frame is a modified discrete cosine transform (MDCT) code stream and the code stream type of the following frame, the (i+1)th frame, is an algebraic code-excited linear prediction (ACELP) code stream:

MDCT decoding is performed on the code stream of the i-th frame using a predefined window type to obtain the decoded signal of the i-th frame and an MDCT decoded signal of the (i+1)th frame; the predefined window type allows part of the (i+1)th frame signal to be reconstructed when the code stream of the i-th frame is MDCT decoded;

the history states of the filters required in ACELP decoding are established and updated from the partial decoded signal of the i-th frame and the MDCT decoded signal of the (i+1)th frame, and ACELP decoding is performed on the code stream of the (i+1)th frame based on these updated history states to obtain an ACELP decoded signal;

the MDCT decoded signal and the ACELP decoded signal of the (i+1)th frame are processed to obtain the final decoded signal of the (i+1)th frame.

In an alternative of this embodiment, processing the MDCT decoded signal and the ACELP decoded signal of the (i+1)th frame to obtain the final decoded signal of the (i+1)th frame includes:

a second falling window of length L_maf, which drops smoothly from 1 to 0, is applied to the MDCT decoded signal s_1d of the (i+1)th frame to obtain s_1dw, and a second rising window of length L_maf, which rises smoothly from 0 to 1, is applied to the ACELP decoded signal s_2d to obtain s_2dw. Within the second rising window, the final decoded signal s_fd of the (i+1)th frame equals s_1dw + s_2dw; before the second rising window, s_fd equals s_1d, and after it, s_fd equals s_2d. The second falling window and the second rising window sum to 1, and 0 < L_maf ≤ D_12o, where D_12o is the number of samples in the overlap region of the MDCT decoded signal and the ACELP decoded signal.

In an alternative of this embodiment, when the coding mode of the frame preceding the i-th frame is the MDCT mode, the predefined window type is the second window type.

In an alternative of this embodiment, when the coding mode of the frame preceding the i-th frame is the ACELP mode, the predefined window type is the fourth window type.

In an alternative of this embodiment, establishing and updating the history states of the filters required in ACELP decoding from the partial decoded signal of the i-th frame and the MDCT decoded signal of the (i+1)th frame includes:

downsampling the partial decoded signal of the i-th frame and the MDCT decoded signal of the (i+1)th frame to obtain a signal s_1dd.

In an alternative of this embodiment, the decoded signal of the i-th frame is output before the partial decoded signal of the i-th frame and the MDCT decoded signal of the (i+1)th frame are downsampled to obtain s_1dd.

In an alternative of this embodiment, establishing and updating the history states of the filters required in ACELP decoding from the partial decoded signal of the i-th frame and the MDCT decoded signal of the (i+1)th frame further includes one or a combination of the following:

pre-emphasizing s_1dd with the pre-emphasis filter of the ACELP coding mode to obtain a signal s_1ddp;

in the ACELP decoding of the code stream of the (i+1)th frame, when the quantized ISP coefficients of the previous frame are needed, the quantized ISP coefficients obtained by decoding are used as the previous-frame quantized ISP coefficients required in ACELP decoding;

in the ACELP decoding of the code stream of the (i+1)th frame, when the historical excitation signal of the LPC synthesis filter is needed, the quantized and interpolated LPC coefficients of the first subframe computed in ACELP decoding are used to form a prediction analysis filter, s_1ddp is filtered by this analysis filter to obtain the LPC analysis residual, and this residual is used as the historical excitation signal of the LPC synthesis filter required in ACELP decoding;

in the ACELP decoding of the code stream of the (i+1)th frame, when the history state of the LPC synthesis filter is needed, the last M_LPCo samples of s_1ddp are used as the history state of the LPC synthesis filter in ACELP decoding, where M_LPCo is the LPC order in ACELP coding;

in the ACELP decoding of the code stream of the (i+1)th frame, when the de-emphasis filter is needed, s_1dd is used as the history state of the de-emphasis filter in ACELP decoding;

in the ACELP decoding of the code stream of the (i+1)th frame, when the up-sampling filter is needed, the last D_2fd samples of s_1dd are used as the history state of the up-sampling filter in ACELP decoding, where D_2fd is the number of samples, at the core operating frequency of the ACELP coding mode, corresponding to the delay caused by sampling-rate conversion in ACELP.
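The decoder-side histories listed above can be collected from s_1dd in a few lines. The sketch assumes the G.722.2 (AMR-WB) values of a 12.8 kHz core rate, a first-order pre-emphasis filter 1 − 0.68 z^-1 and an LPC order of 16; D_2fd = 24 follows from the 1.875 ms resampling delay at 12.8 kHz. The helper names are hypothetical, and reading "s_1dd as the de-emphasis history" as its last sample is an interpretation: a de-emphasis filter 1/(1 − μz^-1) remembers its previous output, and s_1dd is the non-emphasized signal.

```python
from typing import Dict, List, Sequence, Tuple

PREEMPH_FAC = 0.68  # assumed G.722.2 / AMR-WB pre-emphasis factor
M_LPCO = 16         # assumed LPC order of G.722.2
D_2FD = 24          # 1.875 ms at the assumed 12.8 kHz core rate


def pre_emphasize(x: Sequence[float], mu: float = PREEMPH_FAC, mem: float = 0.0) -> Tuple[List[float], float]:
    """y[n] = x[n] - mu * x[n-1]; returns the output and the new memory (last input)."""
    y: List[float] = []
    prev = mem
    for v in x:
        y.append(v - mu * prev)
        prev = v
    return y, prev


def decoder_histories(s_1dd: Sequence[float]) -> Dict[str, object]:
    """Hypothetical helper collecting the ACELP decoder histories from s_1dd."""
    if len(s_1dd) < max(M_LPCO, D_2FD):
        raise ValueError("s_1dd is too short")
    s_1ddp, _ = pre_emphasize(s_1dd)
    return {
        "s_1ddp": s_1ddp,
        # last M_LPCo samples of s_1ddp -> LPC synthesis filter state
        "lpc_synthesis_state": s_1ddp[-M_LPCO:],
        # previous de-emphasis output = last non-emphasized sample
        "deemphasis_state": s_1dd[-1],
        # last D_2fd samples of s_1dd -> up-sampling filter state
        "upsampling_state": list(s_1dd[-D_2FD:]),
    }
```

The historical excitation of the LPC synthesis filter is then obtained by analysis-filtering `s_1ddp` with the decoded first-subframe LPC coefficients, as in the encoder.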

In an alternative of this embodiment, performing ACELP decoding on the code stream of the (i+1)th frame based on the updated history states of the filters in ACELP decoding includes:

when the fixed codebook gain of each subframe is needed, if the original ACELP coding calculates the fixed codebook gain predicted value by a prediction method, the fixed codebook gain predicted value obtained by the non-prediction method is parsed, and, according to the selection flag bits of the first to fourth subframes, the corresponding predicted value is selected to calculate the fixed codebook gains of the first to fourth subframes; here, the original ACELP coding is the ACELP coding of a frame that is coded in ACELP mode and is not at an MDCT/ACELP switch.

The fixed codebook gain predicted value and the selection flag bits may be obtained in the following way, among others:

the fixed codebook gain predicted value obtained by the non-prediction method is parsed from the high-frequency gain field of the first subframe obtained by ACELP decoding, and the selection flag bits of the first to fourth subframes are parsed from the high-frequency gain fields of the first to fourth subframes, respectively.

in the ACELP decoding of the code stream of the (i+1)th frame, post-high-pass-filtering nonlinear phase-shift compensation is performed when post high-pass filtering is applied.

In an alternative of this embodiment, performing the post-high-pass-filtering nonlinear phase-shift compensation includes:

if the code stream type of the (i+2)th frame is MDCT, then, with s_2ddp denoting the input signal of the post high-pass filter in ACELP decoding, the output signal s_2ddpHPc of the post high-pass filter compensated for the nonlinear phase shift is set equal to s_2ddp;

if the code stream type received for the (i+2)th frame is ACELP, a third falling window of length L_hpd1, which drops smoothly from 1 to 0, is applied to the input signal s_2ddp of the post high-pass filter in ACELP decoding to obtain a windowed input signal s_2ddpw, and a third rising window of length L_hpd1, which rises smoothly from 0 to 1, is applied to the output signal s_2ddpHP of the post high-pass filter to obtain a windowed output signal s_2ddpHPw;

s_2ddpw and s_2ddpHPw are added to give, over the L_hpd1 points of the third falling window, the output signal s_2ddpHPc of the post high-pass filter compensated for the nonlinear phase shift; before the third falling window, s_2ddpHPc equals s_2ddp, and after it, s_2ddpHPc equals s_2ddpHP. Here 0 ≤ L_hpd1 ≤ N_d, where N_d is the length of one frame at the ACELP core operating frequency, and the third falling window and the third rising window sum to 1.

In an alternative of this embodiment, the third descending window is a linear descending window, and the third ascending window is a linear ascending window.

In an alternative of this embodiment, if the code stream type of the (i+2)th frame is an MDCT code stream, the (i+2)th frame is decoded as follows:

MDCT decoding is performed on the code stream of the (i+2)th frame using a preset window type; the preset window type makes part of the signal reconstructed when decoding the coded stream of the (i+2)th frame overlap the signal reconstructed when decoding the ACELP coded stream of the (i+1)th frame, compensating for the delay difference between the ACELP codec before the mode switch and the MDCT codec after it.

In an alternative of this embodiment, if the code stream type of the frame following the (i+2)th frame is an MDCT code stream, the preset window type is the third window type.

The embodiment of the invention also provides a coding mode switching method, which comprises the following steps:

In an alternative of this embodiment, if the coding mode of the frame following the (k+1)th frame is the MDCT coding mode, the predefined window type is the third window type.

In an alternative of this embodiment, if the coding mode of the frame following the (k+1)th frame is the ACELP coding mode, the predefined window type is the fourth window type.

In an alternative of this embodiment, if the length L_w of the predefined window type is greater than 2N, the parameters related to the window length are scaled by a factor of [L_w/2N] when the input signal of the (k+1)th frame is MDCT coded with the predefined window type.

In an alternative of this embodiment, performing high-pass-filtering nonlinear phase-shift compensation on the signal s_dHP to obtain the compensated signal s_dHPc includes:

a fourth falling window of length L_hpe2, which drops smoothly from 1 to 0, is applied to the output signal s_dHP of the high-pass filter to obtain s_dHPw, and a fourth rising window of length L_hpe2, which rises smoothly from 0 to 1, is applied to the input signal s_d of the high-pass filter to obtain s_dw. Over the L_hpe2 points of the fourth falling window, s_dHPc is the sum of s_dHPw and s_dw; before the fourth falling window, s_dHPc equals s_dHP, and after it, s_dHPc equals s_d. L_hpe2 is greater than 0 and no greater than the length of three subframes at the ACELP core operating frequency minus the length, at the ACELP core operating frequency, of the region where the ACELP decoded signal overlaps the subsequent MDCT decoded signal; the fourth falling window and the fourth rising window sum to 1.

In an alternative of this embodiment, the fourth descending window is a linear descending window, and the fourth ascending window is a linear ascending window.

In an alternative of this embodiment, performing the post-high-pass-filtering nonlinear phase-shift compensation to obtain s_2ddpHPc includes:

a fifth falling window of length L_hpd2, which drops smoothly from 1 to 0, is applied to the output signal s_2ddpHP of the post high-pass filter to obtain s_2ddpHPw, and a fifth rising window of length L_hpd2, which rises smoothly from 0 to 1, is applied to the input signal s_2ddp of the post high-pass filter to obtain s_2ddpw. Over the L_hpd2 points of the fifth falling window, s_2ddpHPc is the sum of s_2ddpHPw and s_2ddpw; before the fifth falling window, s_2ddpHPc equals s_2ddpHP, and after it, s_2ddpHPc equals s_2ddp. Here 0 < L_hpd2 ≤ N_d − D_2fd/2 − D_21od, where N_d is the length of one frame at the ACELP core operating frequency, D_2fd is the number of samples at the ACELP core operating frequency corresponding to the delay of the ACELP sampling-rate conversion between the input sampling frequency and the ACELP core operating frequency, and D_21od is the length, at the ACELP core operating frequency, of the region where the ACELP decoded signal overlaps the subsequent MDCT decoded signal; the fifth falling window and the fifth rising window sum to 1.

In an alternative of this embodiment, the fifth descending window is a linear descending window, and the fifth ascending window is a linear ascending window.
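The length limits on the fourth and fifth windows, and the minimum length N_2c of the 1-value region of the third window type, are simple arithmetic once the rates are fixed. The sketch assumes the G.722.2 framing (12.8 kHz core rate, 20 ms frames, four subframes, so N_d = 256) and the G.722.1 values of Embodiment 1 below at 16 kHz (N = D₁ = 320, D_2f = 30); the overlap lengths D_21od and D_21o are free design choices, and the value used in the example is purely hypothetical. D_2fd/2 is rounded down.

```python
from typing import Dict

CORE_RATE_HZ = 12800  # assumed G.722.2 core operating frequency
FRAME_MS = 20
SUBFRAMES = 4


def window_length_limits(d_21od: int, d_2fd: int = 24) -> Dict[str, int]:
    """Upper bounds for L_hpe2 (fourth window) and L_hpd2 (fifth window)."""
    n_d = CORE_RATE_HZ * FRAME_MS // 1000  # N_d, one frame at the core rate
    subframe = n_d // SUBFRAMES
    limits = {
        "N_d": n_d,
        "L_hpe2_max": 3 * subframe - d_21od,
        "L_hpd2_max": n_d - d_2fd // 2 - d_21od,
    }
    if min(limits.values()) <= 0:
        raise ValueError("overlap too long: no admissible window length")
    return limits


def third_window_min_ones(d_21o: int, d1: int = 320, n: int = 320, d_2f: int = 30) -> int:
    """Lower bound N_2c >= D1 + D_21o - N + D_2f (samples at 16 kHz)."""
    return d1 + d_21o - n + d_2f
```

With a hypothetical overlap of 24 samples at the core rate (30 samples at 16 kHz), this gives L_hpe2 ≤ 168, L_hpd2 ≤ 220 and N_2c ≥ 60.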

In an alternative of this embodiment, if the code stream of the frame following the (k+1)th frame is an MDCT code stream, the predefined window type is the third window type.

In an alternative of this embodiment, if the code stream of the frame following the (k+1)th frame is an ACELP code stream, the predefined window type is the fourth window type.

In an alternative of this embodiment, if the length L_w of the predefined window type is greater than 2N, the parameters related to the window length are scaled by a factor of [L_w/2N] when the (k+1)th frame is MDCT decoded with the predefined window type.

In an alternative of this embodiment, processing the ACELP decoded signal and the MDCT decoded signal of the (k+1)th frame to obtain the final decoded signal of the (k+1)th frame includes:

a sixth falling window of length L_amf, which drops smoothly from 1 to 0, is applied to the ACELP decoded signal s_2d of the (k+1)th frame to obtain s_2dw, and a sixth rising window of length L_amf, which rises smoothly from 0 to 1, is applied to the MDCT decoded signal s_1d to obtain s_1dw. Within the sixth rising window, the final decoded signal s_fd of the (k+1)th frame equals s_1dw + s_2dw; before the sixth rising window, s_fd equals s_2d, and after it, s_fd equals s_1d. The sixth falling window and the sixth rising window sum to 1, and 0 < L_amf ≤ D_21o, where D_21o is the number of samples in the overlap region of the ACELP decoded signal and the MDCT decoded signal.

In the following embodiments, G.722.1 is used as an example of the MDCT coding mode and G.722.2 as an example of the ACELP coding mode, but the invention is not limited to these; other MDCT and ACELP coding modes may be used, such as the MDCT coding mode in AAC or the ACELP coding mode in AMR-WB+ (Extended Adaptive Multi-Rate Wideband codec).

Example 1

This embodiment describes a smooth switching method from the MDCT coding mode to the ACELP coding mode. Since the MLT (Modulated Lapped Transform) is a variant of the MDCT, the G.722.1 audio encoder (MLT-based) is chosen as the MDCT encoder and the G.722.2 speech encoder as the ACELP encoder. After signal classification, the i-th frame and the preceding frames are in MDCT mode and are coded with G.722.1, while the (i+1)th frame is in ACELP mode and is coded with G.722.2. The smooth switching method from G.722.1 to G.722.2 is as follows.

As shown in fig. 1, the method comprises the following steps at the encoding end:

step E101: for the input signals of frames 0 to i−1, for which the previous, current, and next frames are all coded in MDCT mode, G.722.1 coding is performed using the original window type of the G.722.1 codec, called the first window type, and the coded stream of each frame is output together with the coding mode information of the next frame;

the sampling rate of the input signal is 16 kHz and one frame is 20 ms long, i.e., 320 samples;

the 0th frame is not a real input signal: all of its samples are set to 0, and its coding mode is the same as that of the 1st frame;

the first window type is shown in fig. 2, and the specific formula is as follows:

where N = 320 is the number of samples in a frame; when the (i−1)th frame is encoded, the starting position (n_w = 0) of the window h₀(n_w) is taken to coincide with the start position (i−2)N of the (i−1)th frame, so that the window spans the (i−1)th and i-th frames; n_w is the sample index at the 16 kHz sampling rate;

step E102: for the i-th frame, the previous and current frames are coded in MDCT mode and the next frame in ACELP mode; G.722.1 coding is performed on the i-th frame input signal using a predefined second window type, so that part of the (i+1)th frame signal can be reconstructed when the coded stream of the i-th frame is decoded;

the second window type consists of 5 parts from left to right: a first zero-value region, a rising window region, a 1-value holding region, a falling window region, and a second zero-value region, wherein:

the first zero-value region is a window of value 0 whose length equals the part of the 1-value holding region to the left of the window center of the second window type;

the rising window region is identical to the first half of the first window type, so that it matches the first window type for coding and decoding the i-th frame signal;

the 1-value holding region is a window of value 1 with length N₁, where D_2f ≤ N₁ ≤ N; preferably, D_2f ≤ N₁ ≤ min(D₁ + D_12o, N − N_1f), where:

D_2f is the number of samples at the 16 kHz input sampling rate corresponding to the delay (1.875 ms) of the sampling-rate conversion in G.722.2, i.e., 30 samples;

D₁ is the number of samples at the 16 kHz input sampling rate corresponding to the delay caused by inter-frame overlap in G.722.1 (20 ms, excluding the delay of coding one frame itself), i.e., 320 samples;

D_12o is the number of samples in the region where the G.722.1 decoded signal overlaps the G.722.2 decoded signal when the coding mode switches from MDCT to ACELP, with D_12o ≥ D_2f so that an accurate history state can be provided for the up-sampling filter in G.722.2 decoding;

the falling window region is a window that falls smoothly from 1 to 0 with length N_1f, where 0 < N_1f ≤ N − N₁; preferably, a cosine falling window is used;

the second zero-value region is a window of value 0 whose length equals the part of the 1-value holding region to the right of the window center;

the window center of the second window type is its center point; the two folding positions in MDCT coding are the center points of the first half and of the second half of the second window type;

thus, the total length of the second window type is L_w2 = 2 × (160 + N₁ + N_1f/2); preferably, L_w2 ≥ 2N;

if L_w2 = 2N, no other part of the G.722.1 codec needs to be changed;

if L_w2 > 2N, the window-length-related parameters of the G.722.1 codec are scaled or adjusted so that a signal of length L_w2 can be G.722.1 coded and decoded at the original bit rate; preferably, the scaling factor is [L_w2/2N].

In this embodiment, the second window type is as shown in fig. 3, and the specific formula is:

where N₁ = 110 and the length of the first zero-value region is 0; the window length is L_w2 = 2 × 320, so the MDCT coding process of the original G.722.1 is used unchanged; when the i-th frame is encoded, the starting position (n_w = 0) of the window h₁(n_w) coincides with the start position (i−1)N of the i-th frame, and the window spans the i-th and (i+1)th frames;

step E103: G.722.1 local decoding is performed on the G.722.1 coded information of the (i−1)th and i-th frames to obtain the decoded signal s_1d(n), n = iN + N₁ − M, ..., iN + N₁ − 1, i.e., the N₁-th (110th) sample of the (i+1)th frame and the M − 1 samples preceding it;

the window type used in decoding is the same as that used in encoding; M = M₂ + D_12o − D_2f, where M₂ is not less than the signal length needed to subsequently establish and update the history states of the filters in G.722.2 coding; in this example, M₂ = 375 samples, so the decoded signal obtained is s_1d(n), n = (i−1)N + 55, ..., iN + 109;

step E104: according to the coding parameters of G.722.2, ACELP preprocessing (downsampling, pre-emphasis, and high-pass filtering) is applied to the G.722.1 decoded signal s_1d(n), n = (i−1)N + 55, ..., iN + 109, of the i-th and (i+1)th frames and to the input signals of the i-th and (i+1)th frames, yielding the pre-emphasized G.722.1 decoded signal at the G.722.2 core operating frequency, the ACELP preprocessed signal, the state S_d of the downsampling filter, the state S_HP of the high-pass filter, and the state S_p of the pre-emphasis filter;

Of course, the pre-emphasis and high-pass filtering process in step E104 may also be performed after step E105;

step E105: the G.722.1 coded stream of the i-th frame (from step E102) is output together with the coding mode information of the (i+1)th frame;

step E106: starting from the N_2s-th sample of the (i+1)th frame, one frame length of the original signal s(n), n = iN + N_2s, ..., (i+1)N + N_2s − 1, spanning the (i+1)th and (i+2)th frames, is fed to the G.722.2 encoder, and G.722.2 (ACELP mode) coding for a switch from G.722.1 (MDCT mode) to G.722.2 (ACELP mode) is performed, which comprises:

initializing the history state of each filter in G.722.2 coding; establishing and updating the filter history states required in G.722.2 coding from the ACELP preprocessed signal obtained when encoding the ith frame, the pre-emphasized G.722.1 decoded signal at the G.722.2 core operating frequency, the down-sampling filter state S_d, the high-pass filter state S_HP, the pre-emphasis filter state S_p, and parameters from the G.722.2 coding process; and encoding the input signal with G.722.2 based on the updated filter history states to obtain a G.722.2 code stream;

where N_2s = N₁+D₂-D_12o+D_2f/2, and D₂ = 80 samples is the look-ahead delay (5 ms) of G.722.2 (ACELP mode) for a 16 kHz signal; preferably, N_2s is an integer multiple of 5;

in this example N_2s = 175, so the signal input to G.722.2 coding is s(n), n = iN+175, ..., (i+1)*N+174;
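The offset bookkeeping of steps E103 to E106 can be checked with a short sketch. It assumes N = 320 and takes D_12o = 30 and D_2f = 30 as inferred from this example (the decoder overlap region iN+80 to iN+109, and the 1.875 ms resampling delay at 16 kHz); these two values are not stated explicitly in this section. All offsets are relative to iN, the first sample of the (i+1)th frame, and the function name is hypothetical.

```python
def switch_offsets(N=320, N1=110, D2=80, D12o=30, D2f=30, M2=375):
    """Encoder-side offsets for the MDCT -> ACELP switch, relative to iN.

    D12o and D2f are values inferred from the worked example (assumption).
    """
    N2s = N1 + D2 - D12o + D2f // 2          # first sample fed to G.722.2
    M = M2 + D12o - D2f                      # length of the stored G.722.1 output
    return {
        "N2s": N2s,
        "M": M,
        "decoded": (N1 - M, N1 - 1),         # s_1d(n), n = iN+N1-M .. iN+N1-1
        "acelp_in": (N2s, N + N2s - 1),      # s(n), n = iN+N2s .. (i+1)N+N2s-1
    }
```

With the defaults, the decoded range starts at iN-265 = (i-1)N+55 and the ACELP input ends at iN+494 = (i+1)N+174, matching the ranges quoted in the text.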

Step E107: output the code stream obtained by G.722.2 coding together with the coding mode information of the (i+2)th frame;

Step E108: encode the remaining uncoded signal of the (i+2)th frame and the subsequent signals according to the coding mode information obtained by signal classification; end.

Preferably, if the coding mode of the (i+2)th frame is the MDCT mode, the signal of the (i+2)th frame is encoded with the post-switch G.722.1 (MDCT coding mode) processing used when the coding mode switches from the ACELP mode to the MDCT mode, as described in Embodiment 2 below, and subsequent coding then continues according to the coding mode information obtained by classifying the subsequent signals;

preferably, if the coding mode of the (i+2)th frame is the ACELP mode, the signal of the (i+2)th frame is encoded with the original G.722.2 coding method, and subsequent coding then continues according to the coding mode information obtained by classifying the subsequent signals.

The ACELP preprocessing in step E104 of the G.722.1 locally decoded signal of the ith and (i+1)th frames and of the input signal of the ith and (i+1)th frames is described in detail below with reference to Fig. 6. It comprises the following steps:

Step E104a: using the down-sampling filter in G.722.2, down-sample the G.722.1 locally decoded signal of the ith and (i+1)th frames, s_1d(n), n = iN+N₁-M, ..., iN+N₁-D_12o+D_2f-1, from 16 kHz to the G.722.2 core operating frequency, obtaining the G.722.1 decoded signal at the G.722.2 core operating frequency:

s_1dd(n_d), n_d = iN_d+[4*(N₁-M)/5]+D_2fd/2, ..., iN_d+[4*(N₁-D_12o+D_2f-1)/5]-D_2fd/2

The G.722.2 core operating frequency is 12.8 kHz;

where n_d is the signal index at the G.722.2 core operating frequency of 12.8 kHz; N_d = 256 is the length of one frame at the G.722.2 core operating frequency; and D_2fd = 24 samples is the delay (1.875 ms) caused by sampling-rate conversion in G.722.2, expressed in samples of the 12.8 kHz core-rate signal;

in this embodiment, the G.722.1 decoded signal at the G.722.2 core operating frequency obtained in this step is s_1dd(n_d), n_d = (i-1)*N_d+56, ..., iN_d+75;
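The index mapping of step E104a (and of step E104c below) can be sketched as follows. This reads the bracket [x] as rounding down, which is an assumption that reproduces every range quoted in this example; offsets are relative to iN at 16 kHz and to iN_d at 12.8 kHz, and the function name is hypothetical.

```python
def to_core_range(start, end, D2fd=24):
    """Map a 16 kHz index range (offsets from iN) to 12.8 kHz (offsets from iN_d).

    Implements iN_d + [4*start/5] + D_2fd/2 .. iN_d + [4*end/5] - D_2fd/2,
    reading [x] as floor(x) (assumption); Python's // floors negative values too.
    """
    lo = (4 * start) // 5 + D2fd // 2
    hi = (4 * end) // 5 - D2fd // 2
    return lo, hi
```

For step E104a the 16 kHz range is iN-265 to iN+109, giving iN_d-200 = (i-1)N_d+56 to iN_d+75; for step E104c it is iN-270 to iN+174, giving (i-1)N_d+52 to iN_d+127.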

Step E104b: using the pre-emphasis filter in G.722.2 coding, pre-emphasize the G.722.1 decoded signal of the ith and (i+1)th frames at the G.722.2 core operating frequency, s_1dd(n_d), n_d = (i-1)*N_d+56, ..., iN_d+75, and discard the first point, obtaining the pre-emphasized G.722.1 decoded signal at the G.722.2 core operating frequency, s_1ddp(n_d), n_d = (i-1)*N_d+57, ..., iN_d+75;

Step E104c: using the down-sampling filter in G.722.2, down-sample the partial input signal of the ith and (i+1)th frames, s(n), n = iN+N_2s-M_s-D₂, ..., iN+N_2s-1, to obtain the input signal at the 12.8 kHz G.722.2 core operating frequency, s_d(n_d), n_d = iN_d+[4*(N_2s-M_s-D₂)/5]+D_2fd/2, ..., iN_d+[4*(N_2s-1)/5]-D_2fd/2, and store the down-sampling filter state S_d;

the partial input signal of the ith and (i+1)th frames, s(n), n = iN+N_2s-M_s-D₂, ..., iN+N_2s-1, consists of the M_s+D₂ original samples preceding the start position of the signal that will subsequently be input to the G.722.2 encoder (offset N_2s = 175 in the (i+1)th frame), where M_s is not less than the length of original signal required to subsequently establish and update the history states in G.722.2 coding;

in this example, M_s = 365, so the partial input signal of the ith and (i+1)th frames to be down-sampled is s(n), n = (i-1)*N+50, ..., iN+174, and the resulting input signal at the 12.8 kHz G.722.2 core operating frequency is s_d(n_d), n_d = (i-1)*N_d+52, ..., iN_d+127;

Step E104d: using the high-pass filter in G.722.2 coding preprocessing, high-pass filter the input signal at the G.722.2 core operating frequency, s_d(n_d), n_d = (i-1)*N_d+52, ..., iN_d+127, and store the high-pass filter state S_HP;

Step E104e: using the pre-emphasis filter in G.722.2 coding, pre-emphasize the input signal at the G.722.2 core operating frequency, s_d(n_d), n_d = (i-1)*N_d+52, ..., iN_d+127, and discard the first point of the result, obtaining the pre-emphasized input signal at the G.722.2 core operating frequency, s_dp(n_d), n_d = (i-1)*N_d+53, ..., iN_d+127. Save this signal as the ACELP preprocessed signal, store the pre-emphasis filter state S_p, and end.
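Storing the filter states S_d, S_HP and S_p in steps E104c to E104e lets the encoder later resume filtering as if the signal had never been interrupted. The sketch below shows this for the pre-emphasis filter only. It assumes the first-order form 1 - 0.68z^-1 used in AMR-WB/G.722.2 (treat the factor as an assumption), and the function name is hypothetical. The first output sample is computed against the zero initial state, which is presumably why the text discards it.

```python
def pre_emphasis(x, mem=0.0, mu=0.68):
    """Pre-emphasis y[n] = x[n] - mu*x[n-1] (mu = 0.68 assumed, as in AMR-WB).

    mem is x[-1], the last input sample of the previous call (the state S_p).
    Returns (y, new_mem) so the state can be stored and reused later.
    """
    y, prev = [], mem
    for s in x:
        y.append(s - mu * prev)
        prev = s
    return y, prev
```

Filtering a signal in two chunks, passing the stored state into the second call, gives exactly the same output as filtering it in one pass; the same reasoning applies to the down-sampling and high-pass filter states.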

The G.722.2 encoding process of step E106 is described in detail below with reference to Fig. 7. It comprises the following steps:

Step E106a: feed the one-frame-long original signal s(n), n = iN+N_2s, ..., (i+1)*N+N_2s-1, starting at offset N_2s = 175 of the (i+1)th frame, into G.722.2 coding; this one-frame-long signal spans the input signals of the (i+1)th and (i+2)th frames;

Step E106b: because the coding mode of the (i+1)th frame is ACELP and that of the previous frame is MDCT, perform the initialization of G.722.2 coding and initialize the history state of each filter;

Step E106c: using the down-sampling filter state S_d obtained by the ACELP preprocessing of step E104 when encoding the ith frame as the history of the down-sampling filter in G.722.2 coding, down-sample the input signal s(n), n = iN+N_2s, ..., (i+1)*N+N_2s-1, as in G.722.2, obtaining the signal at the G.722.2 core operating frequency s_d(n_d), n_d = iN_d+[4*N_2s/5]-D_2fd/2, ..., (i+1)*N_d+[4*(N_2s-1)/5];

In this embodiment, this yields the signal s_d(n_d), n_d = iN_d+128, ..., (i+1)*N_d+139, at the G.722.2 core operating frequency;

Step E106d: using the high-pass filter state S_HP obtained by the ACELP preprocessing of step E104 when encoding the ith frame as the history of the high-pass filter in G.722.2 preprocessing, apply the high-pass filtering of G.722.2 encoding preprocessing to the signal at the G.722.2 core operating frequency, s_d(n_d), n_d = iN_d+128, ..., (i+1)*N_d+139, obtaining the high-pass filtered signal s_dHP(n_d), n_d = iN_d+128, ..., (i+1)*N_d+139;

Step E106e: compensate for the effect of the nonlinear phase shift of the high-pass filter in G.722.2 coding preprocessing on smooth switching;

preferably, if the coding mode of the (i+2)th frame is the MDCT mode, the first method is used to compensate for the effect of the high-pass filter's nonlinear phase shift in G.722.2 coding preprocessing;

preferably, if the coding mode of the (i+2)th frame is the ACELP mode, the second method is used to compensate for the effect of the high-pass filter's nonlinear phase shift in G.722.2 coding preprocessing;

The first method for compensating for the nonlinear phase shift of the high-pass filter sets the output of the high-pass filter equal to its input; that is, the signal compensated for the nonlinear phase shift of the high-pass filtering, s_dHPc(n_d), is:

s_dHPc(n_d) = s_d(n_d), n_d = iN_d+128, ..., (i+1)*N_d+139 (3)

The second method for compensating for the nonlinear phase shift of the high-pass filter is as follows: apply a falling window w_hpe1(j) to the high-pass filter input signal s_d(n_d), n_d = iN_d+128, ..., (i+1)*N_d+127, to obtain the windowed high-pass filter input signal s_dw(n_d):

s_dw(n_d) = w_hpe1(n_d-(iN_d+128))*s_d(n_d), n_d = iN_d+128, ..., (i+1)*N_d+127 (4)

The falling window w_hpe1(j) may be the following linear falling window:

w_hpe1(j) = (L_hpe1-1-j)/(L_hpe1-1), j = 0, ..., L_hpe1-1 (5)

Of course, the falling window is not limited to the linear falling window of formula (5); any other function that decreases smoothly from 1 to 0 may be used, with window length 0 ≤ L_hpe1 ≤ N_d. In this example, L_hpe1 is the frame length N_d at the G.722.2 core operating frequency, i.e., L_hpe1 = N_d;

Apply a rising window to the high-pass filter output signal s_dHP(n_d), n_d = iN_d+128, ..., (i+1)*N_d+127, to obtain the windowed high-pass filter output signal s_dHPw(n_d):

s_dHPw(n_d) = (1-w_hpe1(n_d-(iN_d+128)))*s_dHP(n_d), n_d = iN_d+128, ..., (i+1)*N_d+127 (6)

The rising window may be the following linear rising window:

1-w_hpe1(j), j = 0, ..., L_hpe1-1 (7)

The rising window need not be linear; it is tied to the falling window in that the two always sum to 1.

Overlap-add the windowed high-pass filter input s_dw(n_d) and the windowed high-pass filter output s_dHPw(n_d) (adding the values at the same n_d) to obtain the signal compensated for the nonlinear phase shift of the high-pass filtering, s_dHPc(n_d):

s_dHPc(n_d) = s_dw(n_d)+s_dHPw(n_d), n_d = iN_d+128, ..., (i+1)*N_d+127 (8)

The high-pass filtered output signal after the overlap region remains unchanged, i.e.:

s_dHPc(n_d) = s_dHP(n_d), n_d = (i+1)*N_d+128, ..., (i+1)*N_d+139 (9)

If the falling and rising windows do not start at the beginning of the high-pass filter output signal, the high-pass filter output before the overlap-add region is set equal to the high-pass filter input.
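A minimal sketch of the two compensation methods of step E106e, assuming both windows start at n_d = iN_d+128 (so the pre-overlap case does not arise) and using the linear windows of formulas (5) and (7). The high-pass filter itself is not modelled: its input and output are passed in as lists, and the function name is hypothetical.

```python
def compensate_hp_phase(hp_in, hp_out, L, method=2):
    """Step E106e: compensate the high-pass filter's phase shift at the switch.

    hp_in, hp_out -- high-pass filter input s_d and output s_dHP, both starting
                     at n_d = iN_d+128 and of equal length
    L             -- window length L_hpe1 (the text uses L_hpe1 = N_d = 256)
    method        -- 1: output = input, eq. (3); 2: linear cross-fade, eqs. (4), (5), (7)-(9)
    """
    if len(hp_in) != len(hp_out):
        raise ValueError("input and output must have the same length")
    if method == 1:
        return list(hp_in)
    if not 2 <= L <= len(hp_in):
        raise ValueError("this sketch needs 2 <= L <= len(hp_in)")
    out = []
    for j, (x, y) in enumerate(zip(hp_in, hp_out)):
        if j < L:
            w = (L - 1 - j) / (L - 1)            # falling window w_hpe1(j), eq. (5)
            out.append(w * x + (1.0 - w) * y)    # overlap-add, eq. (8)
        else:
            out.append(y)                        # unchanged after the overlap, eq. (9)
    return out
```

With 268 samples (iN_d+128 to (i+1)N_d+139) and L = 256, the result starts exactly on the unfiltered input, ends the overlap exactly on the filtered output, and keeps the last 12 filtered samples unchanged.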

If the coding mode of the (i+2)th frame is the ACELP mode, step E106e is optional;

of course, the nonlinear phase shift of the high-pass filter can also be compensated in other ways, for example by designing a filter whose phase response is complementary to that of the high-pass filter.

Step E106f: using the pre-emphasis filter state S_p obtained by the ACELP preprocessing of step E104 when encoding the ith frame as the history of the pre-emphasis filter in G.722.2 coding, pre-emphasize the high-pass filter output compensated for the nonlinear phase shift, s_dHPc(n_d), to obtain the pre-emphasized input signal s_dHPp(n_d), and then continue with the subsequent G.722.2 coding;

Step E106g: when G.722.2 coding needs the historical pre-emphasized input signal, use the last M_2ds samples of the ACELP preprocessed signal obtained by the ACELP preprocessing of step E104 when encoding the ith frame, s_dp(n_d), n_d = iN_d+128-M_2ds, ..., iN_d+127, as the historical pre-emphasized input signal at the G.722.2 core operating frequency (preceding the G.722.2 encoder input) required in G.722.2 coding, s_dHPp(n_d), n_d = iN_d+128-M_2ds, ..., iN_d+127, and then continue with the subsequent G.722.2 coding;

where M_2ds = D_LPC-D_2fd/2, and D_LPC is the number of samples in the additional 10 ms of pre-emphasized input signal needed to compute the LPC (Linear Predictive Coding) coefficients in G.722.2 coding (at the G.722.2 core operating frequency). In this example, D_LPC = 128 and M_2ds = 116, so the last M_2ds samples of the ACELP preprocessed signal used are s_dp(n_d), n_d = iN_d+12, ..., iN_d+127;

Step E106h: when G.722.2 coding needs the historical unquantized ISP (immittance spectral pair) coefficients, use the ISP coefficients corresponding to the unquantized LPC coefficients computed in the current G.722.2 coding process as the historical unquantized ISP coefficients, and then continue with the subsequent G.722.2 coding;

Step E106i: when G.722.2 coding needs the historical perceptually weighted signal, form a perceptual weighting filter from the interpolated LPC coefficients of the first subframe computed in the G.722.2 coding process, filter with it the ACELP preprocessed signal s_dp(n_d), n_d = (i-1)*N_d+53, ..., iN_d+127, obtained in step E104e when encoding the ith frame, use the resulting perceptually weighted signal as the historical perceptually weighted signal required in G.722.2 coding, and then continue with the subsequent G.722.2 coding;

in ACELP coding, one frame of signal is divided evenly into four parts called subframes, each 1/4 of the frame length; in time order they are the first, second, third, and fourth subframes;

Step E106j: when G.722.2 coding performs the open-loop pitch search, update the history buffer data required for the gain calculation in the open-loop pitch search using the perceptually weighted signal obtained in step E106i, and then continue with the subsequent G.722.2 coding;

The history buffer data required for the gain calculation in the open-loop pitch search is updated as follows: high-pass filter the perceptually weighted signal obtained in step E106i with the high-pass filter of the open-loop pitch search in G.722.2 coding, obtaining a high-pass filtered perceptually weighted signal and the filter state; use the former as the historical high-pass filtered perceptually weighted signal buffer data, and the latter as the historical high-pass filter state, both required for the gain calculation in the open-loop pitch search;

of course, this step is an optional step;

Step E106k: when G.722.2 coding needs the historical quantized ISP coefficients, use the quantized ISP coefficients computed in the current G.722.2 coding process as the historical quantized ISP coefficients, and then continue with the subsequent G.722.2 coding;

Step E106l: when G.722.2 coding needs the history state of the LPC synthesis filter, use the last M_LPCo samples of the pre-emphasized G.722.1 decoded signal at the G.722.2 core operating frequency obtained in step E104b when encoding the ith frame, s_1ddp(n_d), n_d = iN_d+75-M_LPCo+1, ..., iN_d+75, as the history state of the LPC synthesis filter required when processing the first subframe in G.722.2 coding, and then continue with the subsequent G.722.2 coding; here M_LPCo = 16 is the LPC order in G.722.2 coding and also the length of the history data required by the LPC synthesis filter;

Step E106m: when G.722.2 coding needs the historical excitation signal, form a prediction analysis filter from the LPC coefficients of the first subframe computed in the G.722.2 coding process, and filter with it the pre-emphasized G.722.1 decoded signal at the G.722.2 core operating frequency obtained in step E104b, s_1ddp(n_d), n_d = (i-1)*N_d+57, ..., iN_d+75, to obtain the LPC analysis-filter residual; use this residual as the historical excitation signal required in G.722.2 coding, and then continue with the subsequent G.722.2 coding;
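Steps E106l and E106m can be illustrated together: the residual of the analysis filter A(z) serves as the excitation history, and the last M_LPCo decoded samples serve as the synthesis-filter memory. The sketch below assumes the convention A(z) = 1 + sum a_k z^-k and uses a toy order p = 2 instead of 16; the function names are hypothetical. Driving the synthesis filter with the residual, seeded with the same preceding samples, reproduces the decoded signal, which is why the two histories are mutually consistent.

```python
def lpc_analysis(s, a, hist):
    """Residual of the analysis filter A(z) = 1 + sum_k a[k-1] z^-k (assumed convention).

    s    -- signal to filter (e.g. the pre-emphasized G.722.1 decoded signal)
    a    -- LPC coefficients a_1..a_p
    hist -- at least p samples that precede s, oldest first
    """
    p = len(a)
    if len(hist) < p:
        raise ValueError("need at least len(a) history samples")
    buf = list(hist[len(hist) - p:]) + list(s)
    return [buf[n + p] + sum(a[k - 1] * buf[n + p - k] for k in range(1, p + 1))
            for n in range(len(s))]


def lpc_synthesis(r, a, mem):
    """Synthesis filter 1/A(z); mem holds the last p output samples, oldest first."""
    p = len(a)
    if len(mem) < p:
        raise ValueError("need at least len(a) memory samples")
    buf = list(mem[len(mem) - p:])
    out = []
    for x in r:
        y = x - sum(a[k - 1] * buf[-k] for k in range(1, p + 1))
        buf.append(y)
        out.append(y)
    return out
```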

Step E106n: when G.722.2 coding performs the closed-loop pitch search, use the one-frame-long pre-emphasized encoder input s_dp(n_d), n_d = iN_d+N_2sp-N_d, ..., iN_d+N_2sp-1, preceding the start position iN_d+N_2sp of the signal processed by the current G.722.2 coding at the G.722.2 core operating frequency, together with the pre-emphasized MDCT decoded signal at the same positions, s_1ddp(n_d), n_d = iN_d+N_2sp-N_d, ..., iN_d+N_2sp-1, to update the history state of the perceptual weighting filter used to compute the target signal in the closed-loop pitch search, and then continue with the subsequent G.722.2 coding;

the start position iN_d+N_2sp of the signal processed by the current G.722.2 coding is the position, at the G.722.2 core operating frequency (12.8 kHz sampling rate), of the sample D₂ samples before the start position iN+N_2s of the G.722.2 encoder input (16 kHz sampling rate); in this example N_2sp = 76;

The history state of the perceptual weighting filter used to compute the target signal in the closed-loop pitch search is updated as follows:

compute the error between the one-frame-long encoder input s_dp(n_d), n_d = iN_d+N_2sp-N_d, ..., iN_d+N_2sp-1, preceding the start position iN_d+N_2sp of the signal processed by the current G.722.2 coding, and the MDCT decoded signal at the same positions, s_1ddp(n_d), n_d = iN_d+N_2sp-N_d, ..., iN_d+N_2sp-1:

e_1d(n_d) = s_dp(n_d)-s_1ddp(n_d), n_d = iN_d+N_2sp-N_d, ..., iN_d+N_2sp-1 (10)

filter the error e_1d(n_d), n_d = iN_d+N_2sp-N_d, ..., iN_d+N_2sp-1, with the perceptual weighting filter of the G.722.2 coding process, and use the resulting filter state as the history state of the perceptual weighting filter used to compute the target signal in the closed-loop pitch search;
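A minimal sketch of the state update built from formula (10). It assumes the AMR-WB style weighting filter W(z) = A(z/γ)/(1 - βz^-1) with γ = 0.92 and β = 0.68, and the convention A(z) = 1 + sum a_k z^-k; these constants and the function names are assumptions for illustration, not taken from this document. Only the final filter state is kept, matching the text, where the filtered error itself is discarded.

```python
def weighting_filter(e, a, state=None, gamma=0.92, beta=0.68):
    """Filter e through W(z) = A(z/gamma) / (1 - beta*z^-1) (assumed form).

    a     -- LPC coefficients a_1..a_p of A(z) = 1 + sum_k a_k z^-k
    state -- (last p inputs oldest first, last output) from a previous call, or None
    Returns (output, new_state).
    """
    p = len(a)
    aw = [a[k] * gamma ** (k + 1) for k in range(p)]   # coefficients of A(z/gamma)
    if state is None:
        x_mem, y_mem = [0.0] * p, 0.0
    else:
        x_mem, y_mem = list(state[0]), state[1]
    out = []
    for x in e:
        v = x + sum(aw[k] * x_mem[-(k + 1)] for k in range(p))   # FIR part A(z/gamma)
        y_mem = v + beta * y_mem                                  # IIR part 1/(1 - beta z^-1)
        out.append(y_mem)
        if p:
            x_mem = x_mem[1:] + [x]                               # shift input history
    return out, (x_mem, y_mem)


def closed_loop_history(s_dp, s_1ddp, a):
    """Error of eq. (10) filtered by W(z); returns the state that seeds the search."""
    err = [u - v for u, v in zip(s_dp, s_1ddp)]
    _, state = weighting_filter(err, a)
    return state
```

If the MDCT decoded signal matched the encoder input exactly, the error and hence the seeded state would be zero, the same as a freshly initialized filter.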

The pre-emphasized encoder input s_dp(n_d) preceding the start position iN_d+N_2sp of the signal processed by the current G.722.2 coding at the G.722.2 core operating frequency, and the pre-emphasized MDCT decoded signal s_1ddp(n_d) at the corresponding positions, need not be exactly one frame long, provided that the resulting history state of the perceptual weighting filter used to compute the closed-loop pitch search target does not differ greatly from the state that would be obtained if the signal were coded by G.722.2 alone;

Step E106o: when G.722.2 coding needs a fixed codebook gain predicted value, compute it with a non-prediction method; there is only one such value per frame;

the non-prediction method for calculating the fixed codebook gain predicted value predicts the current fixed codebook gain using only information from the signal currently being coded and needs no historical signal information; an example is the method for calculating the fixed codebook gain predicted value in AMR-WB+;

of course, the non-prediction method described in this application is not limited to the AMR-WB+ method; other non-prediction methods may also be used;

Step E106p: when G.722.2 coding quantizes the codebook gains of each subframe, compare the fixed codebook gain predicted value obtained by the prediction method of G.722.2 with the value calculated by the non-prediction method, and choose whichever minimizes the coding error energy of the subframe as its final fixed codebook gain predicted value; record the choice for the subframe in a selection flag bit; quantize the subframe's codebook gains based on the selected predicted value and update the quantized energy prediction error; then continue with the subsequent G.722.2 coding;
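The per-subframe choice of step E106p can be sketched as follows. The coding-error-energy computation is not specified here, so it is passed in as a hypothetical callable; the flag encoding (1 = non-prediction value chosen) and the tie-break in favour of the G.722.2 predictor are likewise assumptions for illustration.

```python
from typing import Callable, List, Tuple


def select_gain_predictions(
    predicted: List[float],                       # per-subframe value from G.722.2's predictor
    non_predicted: float,                         # one value per frame (step E106o)
    error_energy: Callable[[int, float], float],  # hypothetical: coding error energy of subframe k
) -> Tuple[List[int], List[float]]:
    """Return (selection flags, chosen predicted gains), one entry per subframe."""
    flags, chosen = [], []
    for k, g_pred in enumerate(predicted):
        use_np = error_energy(k, non_predicted) < error_energy(k, g_pred)
        flags.append(1 if use_np else 0)          # assumed flag meaning: 1 = non-prediction
        chosen.append(non_predicted if use_np else g_pred)
    return flags, chosen
```

Only the four flags and the single non-prediction value need to be transmitted, which is what step E106q makes room for.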

Step E106q: after the high-frequency gains and indexes are calculated in G.722.2 coding, set the high-frequency gain of the first subframe to its minimum value so that it need not be transmitted, and use the saved bits to transmit the frame's fixed codebook gain predicted value calculated by the non-prediction method and the selection flag bit for the first subframe; reduce the precision of each high-frequency gain index of the second to fourth subframes by 1 bit and use each saved bit to transmit the selection flag bit of the corresponding subframe; then continue with the subsequent G.722.2 coding to obtain the G.722.2 code stream, and end;

of course, if the selected ACELP coding mode itself calculates the fixed codebook gain with a non-prediction method, the fixed codebook gain operations of steps E106o, E106p, and E106q may be skipped.

As shown in Fig. 8, the decoding end performs the following steps:

Step D101: sequentially read in the code streams of frames 0 to i-1 sent by the encoding end, together with the coding mode information of the next frame sent with each frame; having determined from the received coding mode information that the current, previous, and next frames all carry MDCT code streams, decode the code streams of frames 0 to i-1 with G.722.1 using the first window type, and output the decoded signals;

the code stream type corresponds to the coding mode, namely the MDCT coding mode corresponds to the G.722.1 code stream (MDCT code stream), and the ACELP coding mode corresponds to the G.722.2 code stream (ACELP code stream).

Step D102: read in the code stream of the ith frame and the coding mode information of the (i+1)th frame sent with it, and determine that the code stream of the (i+1)th frame is of the ACELP type;

Step D103: since the code streams of the (i-1)th to (i+1)th frames are of the MDCT, MDCT, and ACELP types respectively, decode the code stream of the ith frame using the second window type to obtain the decoded signal of the whole ith frame and the first N₁ = 110 samples of the (i+1)th frame, s_1d(n), n = (i-1)*N, ..., iN+109;

Step D104: output the G.722.1 decoded signal of the ith frame, s_1d(n), n = (i-1)*N, ..., iN-1, and store in a buffer the decoded signal ending at the N₁ = 110th sample of the (i+1)th frame together with the M-1 samples before it, s_1d(n), n = iN+N₁-M, ..., iN+N₁-1, i.e., s_1d(n), n = (i-1)*N+55, ..., iN+109;

Step D105: according to the coding parameters of G.722.2, apply the ACELP preprocessing of steps E104a and E104b to the G.722.1 decoded signal of the ith and (i+1)th frames, s_1d(n), n = (i-1)*N+55, ..., iN+109, obtaining the G.722.1 decoded signal at the G.722.2 core operating frequency before pre-emphasis, s_1dd(n_d), n_d = (i-1)*N_d+56, ..., iN_d+75, and after pre-emphasis, s_1ddp(n_d), n_d = (i-1)*N_d+57, ..., iN_d+75;

Of course, the pre-emphasis of step E104b used in step D105 may also be performed after step D106;

Step D106: read in the code stream of the (i+1)th frame and the coding mode information of the (i+2)th frame sent with it;

Step D107: since the code streams of the ith and (i+1)th frames are of the MDCT and ACELP types respectively, perform the G.722.2 (ACELP mode) decoding used when the coding mode switches from G.722.1 (MDCT mode) to G.722.2 (ACELP mode): initialize the history state of each filter in G.722.2 decoding; establish and update the filter history states required in G.722.2 decoding from the G.722.1 decoded signal at the G.722.2 core operating frequency obtained in step D105 before pre-emphasis, s_1dd(n_d), n_d = (i-1)*N_d+56, ..., iN_d+75, the pre-emphasized G.722.1 decoded signal s_1ddp(n_d), n_d = (i-1)*N_d+57, ..., iN_d+75, and parameters from the G.722.2 decoding process; and decode the code stream of the (i+1)th frame with G.722.2 based on the updated filter history states to obtain the G.722.2 decoded signal s_2d(n), n = iN+N_2spu-D_2f/2, ..., (i+1)*N+N_2spu-1-D_2f/2;

Here iN+N_2spu is the position, at the input sampling rate (16 kHz), corresponding to the start position iN_d+N_2sp (12.8 kHz sampling rate) of the signal processed by G.722.2 at the G.722.2 core operating frequency when the (i+1)th frame was encoded; in this example, N_2spu = N₁-D_12o+D_2f/2, so the G.722.2 decoded signal can also be written as:

s_2d(n), n = iN+N₁-D_12o, ..., (i+1)*N+N₁-D_12o-1

in this embodiment, the G.722.2 decoded signal obtained is s_2d(n), n = iN+80, ..., (i+1)*N+79;
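The decoder-side alignment can be checked arithmetically. The sketch assumes, as in the encoder-side values of this example, D_12o = 30 and D_2f = 30 (inferred rather than stated in this section); it confirms that N_2sp = 76 at 12.8 kHz and N_2spu = 95 at 16 kHz denote the same instant (95 × 4/5 = 76) and that the G.722.2 output therefore begins at iN+80. The function name is hypothetical.

```python
def decoder_alignment(N1=110, D12o=30, D2f=30, N2s=175, D2=80):
    """Return (N_2sp, N_2spu, first G.722.2 output offset from iN).

    D12o and D2f are values inferred from the worked example (assumption).
    """
    N2sp = 4 * (N2s - D2) // 5            # ACELP start at 12.8 kHz, offset from iN_d
    N2spu = N1 - D12o + D2f // 2          # same instant at 16 kHz, offset from iN
    first_out = N2spu - D2f // 2          # start of s_2d(n), offset from iN
    return N2sp, N2spu, first_out
```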

Step D108: overlap-add the overlapping parts of the G.722.1 and G.722.2 decoded signals to obtain the final decoded signal;

the G.722.1 and G.722.2 decoded signals overlap in the region iN+N₁-D_12o ≤ n ≤ iN+N₁-1, which in this embodiment is iN+80 ≤ n ≤ iN+109; apply a falling window to the G.722.1 decoded signal in this region to obtain the windowed G.722.1 decoded signal s_1dw(n):

s_1dw(n) = w_maf(n-(iN+80))*s_1d(n), n = iN+80, ..., iN+109 (11)

The falling window w_maf(j) may be a cosine falling window:

w_maf(j) = cos²(jπ/(2*(L_maf-1))), j = 0, ..., L_maf-1 (12)

Of course, the falling window is not limited to the cosine falling window; any other function that decreases smoothly from 1 to 0 may be used, with window length 0 < L_maf ≤ D_12o; in this example, L_maf = D_12o;

Apply a rising window to the G.722.2 decoded signal in this region to obtain the windowed G.722.2 decoded signal s_2dw(n):

s_2dw(n) = (1-w_maf(n-(iN+80)))*s_2d(n), n = iN+80, ..., iN+109 (13)

The rising window may be the cosine rising window 1-w_maf(j), j = 0, ..., L_maf-1;

Add the windowed G.722.1 decoded signal s_1dw(n) and the windowed G.722.2 decoded signal s_2dw(n) to obtain the final decoded signal s_fd(n) in the overlap region:

s_fd(n) = s_1dw(n)+s_2dw(n), n = iN+80, ..., iN+109 (14)

The final decoded signal before the overlap region is the G.722.1 decoded signal:

s_fd(n) = s_1d(n), n = iN, ..., iN+79 (15)

The final decoded signal after the overlap region is the G.722.2 decoded signal:

s_fd(n) = s_2d(n), n = iN+110, ..., (i+1)*N+79 (16)
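Formulas (11) to (16) can be sketched as a single frame-assembly routine, assuming N = 320, N₁ = 110, D_12o = L_maf = 30 and the cosine window of formula (12). It returns the final (i+1)th frame and the 80 leading samples of the (i+2)th frame that step D109 keeps in a buffer; the function name is hypothetical.

```python
import math


def crossfade_switch_frame(s1d, s2d, N=320, N1=110, D12o=30):
    """Assemble frame i+1 from G.722.1 and G.722.2 output, eqs. (11)-(16).

    s1d -- G.722.1 output for n = iN .. iN+N1-1                 (N1 samples)
    s2d -- G.722.2 output for n = iN+N1-D12o .. (i+1)N+N1-D12o-1 (N samples)
    Returns (frame, tail): frame i+1 (N samples) and the first N1-D12o samples
    of frame i+2.
    """
    if len(s1d) != N1 or len(s2d) != N:
        raise ValueError("unexpected input lengths")
    start, L = N1 - D12o, D12o                       # overlap at iN+80 .. iN+109, L_maf = D_12o
    fade = []
    for j in range(L):
        w = math.cos(j * math.pi / (2 * (L - 1))) ** 2   # falling window w_maf(j), eq. (12)
        fade.append(w * s1d[start + j] + (1.0 - w) * s2d[j])  # eqs. (11), (13), (14)
    full = list(s1d[:start]) + fade + list(s2d[L:])  # eqs. (15) and (16)
    return full[:N], full[N:]
```

Because the rising window is defined as 1 - w_maf(j), the two windows sum to 1 at every sample, so a signal that both codecs reproduced identically would pass through the overlap unchanged.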

Step D109: output the finally decoded signal of the (i+1)th frame, and store in a buffer the first N₁-D_12o = 80 samples of the (i+2)th frame obtained by the current decoding;

Step D110: sequentially read in the code streams of the (i+2)th and subsequent frames together with the coding mode information sent with them, perform subsequent decoding according to the received coding mode information, and end;

preferably, if the coding mode of the (i+2)th frame is the MDCT mode, the code stream of the (i+2)th frame is decoded with the post-switch G.722.1 decoding method used when the coding mode switches from the ACELP mode to the MDCT mode, as described in Embodiment 2 below, and subsequent decoding then continues according to the subsequently received coding mode information;

preferably, if the coding mode of the (i+2)th frame is the ACELP mode, the code stream of the (i+2)th frame is decoded with the original G.722.2 decoding method, and subsequent decoding then continues according to the subsequently received coding mode information.

The G.722.2 decoding process of step D107 is described in detail below with reference to Fig. 9. It comprises the following steps:

Step D107a: since the code streams of the ith and (i+1)th frames are of the MDCT and ACELP types respectively, initialize the history state of each filter in G.722.2 decoding;

Step D107b: after decoding the relevant parameters from the G.722.2 code stream of the (i+1)th frame, perform normal G.722.2 decoding; when the quantized ISP coefficients of the previous frame are needed, use the quantized ISP coefficients just decoded as the previous frame's quantized ISP coefficients, and then continue with the subsequent G.722.2 decoding;

Step D107c: when the historical excitation signal of the LPC synthesis filter is needed, form a prediction analysis filter from the quantized and interpolated LPC coefficients of the first subframe calculated in G.722.2 decoding, and filter with it the pre-emphasized G.722.1 decoded signal at the G.722.2 core operating frequency obtained in step D105, s_1ddp(n_d), n_d = (i-1)*N_d+57, ..., iN_d+75, to obtain the LPC analysis-filter residual; use this residual as the historical excitation signal of the LPC synthesis filter required in G.722.2 decoding, and then continue with the subsequent G.722.2 decoding;

Step D107d: when the fixed codebook gain of each subframe is needed, extract the fixed codebook gain predicted value calculated by the non-prediction method from the high-frequency gain field of the first subframe decoded by G.722.2, then extract the selection flag bits of the fixed codebook gain predicted values of the first to fourth subframes from the high-frequency gain fields of the first to fourth subframes, respectively; select the corresponding predicted value to calculate the fixed codebook gain of each of the first to fourth subframes, and then continue with the subsequent G.722.2 decoding;

of course, if the adopted ACELP coding mode itself calculates the fixed codebook gain predicted value with a non-prediction method, this step may be omitted;

Step D107e: when the history state of the LPC synthesis filter is needed, use the last M_LPCo = 16 samples of the pre-emphasized G.722.1 decoded signal at the G.722.2 core operating frequency obtained in step D105, s_1ddp(n_d), n_d = iN_d+60, ..., iN_d+75, as the history state of the LPC synthesis filter in G.722.2 decoding, and then continue with the subsequent G.722.2 decoding;

Step D107f: when the de-emphasis filter is needed, use the last value of the G.722.1 decoded signal at the G.722.2 core operating frequency before pre-emphasis obtained in step D105, s_1dd(n_d), n_d = iN_d+75, as the history state of the de-emphasis filter in G.722.2 decoding, and then continue with the subsequent G.722.2 decoding;
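Why step D107f seeds the de-emphasis filter with the last sample before pre-emphasis: the de-emphasis filter is recursive, so its memory is its previous output, which in a continuous stream would be the de-emphasized (that is, original, non-pre-emphasized) signal. The sketch assumes the first-order pair 1 - 0.68z^-1 and 1/(1 - 0.68z^-1) used in AMR-WB/G.722.2 (treat the factor as an assumption); the function names are hypothetical.

```python
def pre_emphasis(x, mem=0.0, mu=0.68):
    """y[n] = x[n] - mu*x[n-1]; mem is the last input sample before x."""
    y, prev = [], mem
    for s in x:
        y.append(s - mu * prev)
        prev = s
    return y


def de_emphasis(x, mem=0.0, mu=0.68):
    """y[n] = x[n] + mu*y[n-1]; mem is the last *output* sample before x."""
    y, prev = [], mem
    for s in x:
        prev = s + mu * prev
        y.append(prev)
    return y
```

Seeding de_emphasis with the last non-pre-emphasized sample (the role played by s_1dd(iN_d+75)) recovers the continuation exactly, while a zero memory leaves an error of about 0.68 times that sample at the switch point.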

step D107g: when G.722.2 decoding reaches the post high-pass filter, the effect of the non-linear phase shift of the post high-pass filter on smooth switching is compensated; this gives a compensated post high-pass filter output signal, on which subsequent G.722.2 decoding continues;

the input signal of the post high-pass filter in G.722.2 decoding is s_2ddp(n_d), n_d = i*N_d+N_2sp, ..., (i+1)*N_d+N_2sp-1. In this example, N_2sp = 76, so the input signal of the post high-pass filter is s_2ddp(n_d), n_d = i*N_d+76, ..., (i+1)*N_d+75, and the post high-pass filter output in normal G.722.2 decoding is s_2ddpHP(n_d), n_d = i*N_d+76, ..., (i+1)*N_d+75. The effect of the non-linear phase shift of the post high-pass filter on smooth switching is compensated as follows:

preferably, if the code stream received for the (i+2)th frame is of the MDCT type, the first method is used to compensate the effect of the non-linear phase shift of the post high-pass filter in G.722.2 decoding;

the first method for compensating the non-linear phase shift effect of the post high-pass filter is as follows: the compensated post high-pass filter output s_2ddpHPc(n_d) is set equal to the post high-pass filter input s_2ddp(n_d), i.e.:

s_2ddpHPc(n_d)＝s_2ddp(n_d),n_d＝iN_d+76,...,(i+1)*N_d+75 (17)

preferably, if the code stream received for the (i+2)th frame is of the ACELP type, the second method is used to compensate the effect of the non-linear phase shift of the post high-pass filter in G.722.2 decoding;

the second method for compensating the non-linear phase shift effect of the post high-pass filter is as follows: a falling window is applied to the post high-pass filter input s_2ddp(n_d), n_d = i*N_d+76, ..., (i+1)*N_d+75, to obtain the windowed high-pass filter input s_2ddpw(n_d):

where the falling window w_hpd1(j) may be a linear falling window:

w_hpd1(j)＝(L_hpd1-1-j)/(L_hpd1-1),j＝0,...,L_hpd1-1 (19)

of course, the falling window in the present invention is not limited to the linear falling window above; any other function that falls smoothly from 1 to 0 may be used, with window length 0 ≤ L_hpd1 ≤ N_d. In this example, L_hpd1 = N_d;

a rising window is applied to the post high-pass filter output s_2ddpHP(n_d), n_d = i*N_d+76, ..., (i+1)*N_d+75, to obtain the windowed high-pass filter output s_2ddpHPw(n_d):

where the rising window may be a linear rising window:

1-w_hpd1(j),j＝0,...,L_hpd1-1

the windowed high-pass filter input s_2ddpw(n_d) and the windowed high-pass filter output s_2ddpHPw(n_d) are overlap-added to obtain the signal s_2ddpHPc(n_d), which is compensated for the non-linear phase shift of the high-pass filter:

s_2ddpHPc(n_d)＝s_2ddpw(n_d)+s_2ddpHPw(n_d),n_d＝iN_d+76,...,(i+1)*N_d+75 (21)
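The second method, equations (19) to (21), is a linear cross-fade from the unfiltered input to the high-pass output. Below is a minimal sketch over the frame of samples starting at i*N_d+76. It assumes that the window starts at the first sample of that frame and that, when L_hpd1 < N_d, the high-pass output is used alone after the window. The function name is hypothetical.

```python
def crossfade_hp(hp_in, hp_out, length):
    """Blend from the high-pass filter input to its output (eqs. (19)-(21)).

    Falling window w(j) = (L-1-j)/(L-1) on the input and rising window
    1 - w(j) on the output; beyond the window the output is used unchanged.
    """
    if len(hp_in) != len(hp_out):
        raise ValueError("input and output must have the same length")
    if not 1 < length <= len(hp_in):
        raise ValueError("window length must satisfy 1 < L <= frame length")
    out = []
    for j, (x, y) in enumerate(zip(hp_in, hp_out)):
        w = (length - 1 - j) / (length - 1) if j < length else 0.0
        out.append(w * x + (1.0 - w) * y)
    return out


# Unfiltered input of ones, filtered output of zeros, window of 3 samples.
faded = crossfade_hp([1.0] * 4, [0.0] * 4, 3)
```

The weights sum to 1 at every sample. The compensated signal therefore starts out equal to the unfiltered input, which is what the preceding MDCT output resembles, and ends equal to the normal high-pass output.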

if the falling and rising windows are shorter than one frame at the G.722.2 core operating frequency, the compensated signal before the windowed overlap-add region is the high-pass filter input, and the compensated signal after that region is the high-pass filter output;

where, if the code stream received for the (i+2)th frame is of the ACELP type, step D107g is optional;

step D107h: when the up-sampling filter is needed, the last D_2fd = 24 samples of the G.722.1 decoded signal before pre-emphasis at the G.722.2 core operating frequency obtained in step D105, s_1dd(n_d), n_d = i*N_d+52, ..., i*N_d+75, are used as the history state of the up-sampling filter in G.722.2 decoding; subsequent G.722.2 decoding then continues to obtain the G.722.2 decoded signal s_2d(n), n = i*N+80, ..., (i+1)*N+79; end.

Example 2

This embodiment describes a smooth switching method from an ACELP-based coding mode to an MDCT-based coding mode. As in embodiment 1, the G.722.1 audio coder is used as the MDCT coder and the G.722.2 speech coder as the ACELP coder. After signal classification, the kth frame and the frames before it are in the ACELP mode and are encoded with G.722.2, and the (k+1)th frame is in the MDCT mode and is encoded with G.722.1. The smooth switching method from G.722.2 to G.722.1 is as follows.

As shown in fig. 10, the method comprises the following steps at the encoding end:

step E201: for frames 0 to k-1, the coding modes of the previous, current and next frames are all ACELP; signals are read starting from position n = -N_2s = -175 and G.722.2-encoded, and the encoded code stream of each frame is output together with the coding mode information of the next frame;

the samples n = -175, ..., 0 belong to frame 0, which is not a real input signal and is set to all zeros;

step E202: for the kth frame, the previous and current frames are coded in ACELP mode and the next frame in MDCT mode. The input signal s(n), n = (k-1)*N+175, ..., k*N+174, which starts at the N_2s = 175th sample and spans one frame length across the kth and (k+1)th frames, is read in and down-sampled as in G.722.2 encoding to obtain the signal at the G.722.2 core operating frequency, s_d(n_d), n_d = (k-1)*N_d+[4N_s/5]-D_2fd/2, ..., k*N_d+[4(N_s-1)/5], i.e. s_d(n_d), n_d = (k-1)*N_d+128, ..., k*N_d+139. This signal is then high-pass filtered as in G.722.2 encoding preprocessing to obtain the high-pass filtered signal s_dHP(n_d), n_d = (k-1)*N_d+128, ..., k*N_d+139;
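The core-rate sample range in step E202 follows from the 4/5 rate change from the input rate to the G.722.2 core rate. The check below reads [·] as rounding down, which is an assumption, but it reproduces the values 128 and 139 quoted in the text. The function name is hypothetical.

```python
N = 320            # samples per frame at the input rate
N_D = 4 * N // 5   # 256 samples per frame at the G.722.2 core rate
N_S = 175          # N_s (= N_2s): offset of the ACELP input within the frame
D_2FD = 24         # resampling-filter delay at the core rate, in samples


def core_rate_span(k):
    """First and last core-rate indices n_d of s_d(n_d) read for frame k."""
    first = (k - 1) * N_D + (4 * N_S) // 5 - D_2FD // 2
    last = k * N_D + (4 * (N_S - 1)) // 5
    return first, last


print(core_rate_span(1))  # (128, 395), i.e. 0*N_d+128 and 1*N_d+139
```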

Step E203: compensating the influence of high-pass filtering nonlinear phase shift in G.722.2 coding preprocessing on smooth switching by using a third method;

a linearly falling window is applied to the high-pass filter output s_dHP(n_d), n_d = k*N_d+N_dHPfs, ..., k*N_d+N_dHPfe, to obtain the windowed high-pass filter output s_dHPw(n_d):

where N_dHPfe is the end position of the windowed portion, N_dHPfe ≤ [4(N_s-1)/5]-D_2d-D_21od-D_2fd/2. D_2d is the number of samples, at the 12.8 kHz sampling rate, that corresponds to the 5 ms look-ahead delay of G.722.2 (ACELP mode), so D_2d = 64. D_21od is the number of samples, at the G.722.2 core operating frequency, in the region where the G.722.2 decoded signal overlaps the subsequent G.722.1 decoded signal; in this embodiment D_21od = 32 and N_dHPfe = 31;

N_dHPfs is the start position of the windowed portion, N_dHPfs = N_dHPfe-L_hpe2 ≥ [4(N_s-1)/5]-N_d-D_2fd/2;

L_hpe2 is the length of the falling window w_hpe2(j), which is:

w_hpe2(j)＝(L_hpe2-1-j)/(L_hpe2-1),j＝0,...,L_hpe2-1 (23)

the window length L_hpe2 is greater than 0 and at most the length of three subframes at the G.722.2 core operating frequency minus D_21od, the length of the region at that frequency where the G.722.2 decoded signal overlaps the subsequent G.722.1 decoded signal; that is, 0 < L_hpe2 ≤ 3*N_d/4-D_21od. In this example, D_21od = 32 and L_hpe2 = 3*N_d/4-D_21od = 160;
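The window limits of step E203 can be checked numerically with the example values N_s = 175, N_d = 256, D_2d = 64, D_21od = 32 and D_2fd = 24, reading [·] as rounding down. The sketch only reproduces the arithmetic in the text, and the function name is hypothetical.

```python
def encoder_hp_window(n_s=175, n_d=256, d_2d=64, d_21od=32, d_2fd=24):
    """Return (N_dHPfe, L_hpe2, N_dHPfs, lower bound on N_dHPfs).

    Positions are relative to k*N_d; N_dHPfe and L_hpe2 take their maximum values.
    """
    core_end = (4 * (n_s - 1)) // 5               # [4(N_s-1)/5] = 139
    fe = core_end - d_2d - d_21od - d_2fd // 2    # N_dHPfe
    length = 3 * n_d // 4 - d_21od                # L_hpe2
    fs = fe - length                              # N_dHPfs as defined in the text
    lower = core_end - n_d - d_2fd // 2           # [4(N_s-1)/5] - N_d - D_2fd/2
    return fe, length, fs, lower


fe, length, fs, lower = encoder_hp_window()
print(fe, length, fs, lower)  # 31 160 -129 -129
```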

Of course, the falling window in the present embodiment is not limited to the above-mentioned linear falling window, and other functions having a characteristic of smoothly falling from 1 to 0 may be employed.

a linearly rising window is applied to the high-pass filter input s_d(n_d), n_d = k*N_d+N_dHPfs, ..., k*N_d+N_dHPfe, to obtain the windowed high-pass filter input s_dw(n_d):

where the rising window is 1-w_hpe2(j), j = 0, ..., L_hpe2-1;

the windowed high-pass filter input s_dw(n_d) and the windowed high-pass filter output s_dHPw(n_d) are overlap-added to obtain the signal s_dHPc(n_d), which is compensated for the non-linear phase shift of the high-pass filter:

s_dHPc(n_d)＝s_dw(n_d)+s_dHPw(n_d),n_d＝kN_d+N_dHPfs,...,kN_d+N_dHPfe (25)

the high-pass filter output before the overlap region remains unchanged:

s_dHPc(n_d)＝s_dHP(n_d),n_d＜kN_d+N_dHPfs (26)

the high-pass filtered output signal after the overlap region is equal to the input signal of the high-pass filter:

s_dHPc(n_d)＝s_d(n_d),n_d＞kN_d+N_dHPfe (27)
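Equations (25) to (27) define a piecewise signal: the high-pass output before the overlap, a linear cross-fade from output to input inside it, and the unfiltered input after it. Below is a minimal sketch on zero-based buffers, where `fs` and `fe` stand for the positions of N_dHPfs and N_dHPfe in the buffer. For simplicity, the falling window is sized to cover fs..fe inclusive (fe-fs+1 samples). The text defines N_dHPfs = N_dHPfe-L_hpe2, so the two conventions differ by one sample. The function name is hypothetical.

```python
def compensate_hp(s_in, s_hp, fs, fe):
    """Apply eqs. (25)-(27) to buffers indexed from 0.

    n < fs: keep the high-pass output; fs <= n <= fe: linear falling window
    on the output plus rising window on the input; n > fe: use the input.
    """
    if len(s_in) != len(s_hp):
        raise ValueError("buffers must have the same length")
    length = fe - fs + 1
    if length < 2:
        raise ValueError("overlap region needs at least 2 samples")
    out = []
    for n, (x, y) in enumerate(zip(s_in, s_hp)):
        if n < fs:
            out.append(y)
        elif n > fe:
            out.append(x)
        else:
            w = (length - 1 - (n - fs)) / (length - 1)   # falls from 1 to 0
            out.append(w * y + (1.0 - w) * x)
    return out


# Unfiltered input of ones, high-pass output of zeros, overlap at indices 1..3.
compensated = compensate_hp([1.0] * 6, [0.0] * 6, 1, 3)
```

Because the compensated signal ends equal to the unfiltered input, the G.722.2 encoder output joins the subsequent G.722.1 signal, which is not high-pass filtered, without a phase discontinuity.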

step E204: subsequent normal G.722.2 encoding is performed on the compensated high-pass filter output s_dHPc(n_d) to obtain the G.722.2 code stream;

step E205: the code stream obtained by G.722.2 encoding is output together with the coding mode information of the (k+1)th frame;

step E206: for the (k+1)th frame, the previous frame is coded in ACELP mode and the current frame in MDCT mode. The (k+1)th frame is G.722.1-encoded with a new window type, so that part of the signal reconstructed when decoding the code stream of the (k+1)th frame overlaps the signal reconstructed when decoding the ACELP code stream of the kth frame. This compensates for the delay difference between the ACELP codec used before the mode switch and the MDCT codec used after it;

if the frame after the (k+1)th frame is coded in MDCT mode, the (k+1)th frame is G.722.1-encoded with a predefined third window type;

the third window type comprises 5 parts: a first zero value area, a rising window area, a 1 value holding area, a falling window area and a second zero value area; wherein:

the first zero value area is a window with the value of 0, and the length of the first zero value area is the same as the length of the 1 value holding area before the center position of the window;

the rising window area is a window of length N₂, N₂ > 0, that rises smoothly from 0 to 1; preferably, a cosine rising window is used for this segment;

the 1 value holding area is a window of value 1 and length N_2c, where N_2c ≥ D₁+D_21o-N+D_2f, and D_21o is the number of samples in the region where the G.722.1 decoded signal overlaps the preceding G.722.2 decoded signal;

the descending window area is a section of descending window which is the same as the rear half part of the first window type;

the center position of the window is the center point of the third window type; the two folding points in MDCT coding are the center points of the first half and the second half of the third window type;

the total length of the third window type is L_w3 = N+2*N_2c+N₂; preferably, L_w3 ≥ 2*N;

if L_w3 = 2*N, the other parts of the corresponding MDCT codec need not be changed;

if L_w3 > 2*N, the window-length-related parameters in the G.722.1 codec are scaled or adjusted accordingly, so that G.722.1 encoding and decoding can be performed on a signal of length L_w3 under the constraint of the original bit rate; preferably, the scaling factor is [L_w3/(2N)];

In this embodiment, the third window type is shown in fig. 4, and the specific formula is:

where N = 320 is the number of samples in one frame; N₂ = 80 is the length of the rising window area; the length of the 1 value holding area is N_2c = D₁+D_21o-N₁+D_2f = N-N₂/2 = 280; and D_21o = 40 is the number of samples in the region where the MDCT decoded signal overlaps the preceding G.722.2 decoded signal. When the (k+1)th frame is encoded, position n_w = 0 of the window h₂(n_w) coincides with the start of the kth frame, and h₂(n_w) spans the kth, (k+1)th and (k+2)th frames;
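The length bookkeeping of the third window type in this example can be checked in a few lines. The sketch recomputes N_2c = N-N₂/2 and L_w3 = N+2*N_2c+N₂ from the stated values and confirms that the window covers exactly the three frames k, k+1 and k+2. It does not construct the window shape, whose formula is not reproduced here. The function name is hypothetical.

```python
def third_window_lengths(n=320, n2=80):
    """Return (N_2c, L_w3) for the third window type with the example values."""
    n2c = n - n2 // 2          # length of the 1 value holding area
    lw3 = n + 2 * n2c + n2     # total window length
    return n2c, lw3


n2c, lw3 = third_window_lengths()
print(n2c, lw3, lw3 // 320)  # 280 960 3
```

Since L_w3 = 960 > 2*N = 640, this example falls under the case where the G.722.1 window-length parameters must be scaled or adjusted.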

if the frame after the (k+1)th frame is coded in ACELP mode, the (k+1)th frame is G.722.1-encoded with a fourth window type. At the same time, the other operations that embodiment 1 performs before outputting the code stream of the last G.722.1-encoded frame, when switching the coding mode from G.722.1 (MDCT mode) to G.722.2 (ACELP mode), are also carried out;

in the local G.722.1 decoding, only the G.722.1 coding information of the (k+1)th frame is decoded, giving the decoded signal s_1d(n), n = (k+1)*N+110-M, ..., (k+1)*N+109, where M = 375; local decoding uses the same window type as encoding;

the fourth window type allows one part of the reconstructed signal to overlap the signal reconstructed when decoding the code stream of the (k+1)th frame, and another part to overlap the ACELP decoded signal of the (k+2)th frame. It also compensates both for the delay caused by inter-frame overlap in the MDCT coding mode and for the delay caused by sampling-rate conversion in the ACELP coding mode;

the fourth window type comprises 5 parts: a first zero value area, a rising window area, a 1 value holding area, a falling window area and a second zero value area; wherein:

the 1 value holding area is a window of value 1 and length N₃, where N₃ ≥ D₁+D_2f; preferably, N₃ ≥ M;

the falling window area is a window that falls smoothly from 1 to 0, with length N_1f satisfying 0 < N_1f ≤ N-D_2f;

the center position of the window is the center point of the fourth window type; the two folding points in MDCT coding are the center points of the first half and the second half of the fourth window type;

preferably, the fourth window type is a combination of the second window type and the third window type, wherein:

the ascending window area is the same as the ascending window area in the third window type;

the length of the 1 value holding area is N₃ = N₁+N_2c;

The descending window area is the same as the descending window area of the second window type;

thus, the fourth window type has a total length of L_w4＝2*(N₃+N_2c+N_1f/2+N₂)；

since L_w4 > 2*N, the window-length-related parameters in the G.722.1 codec are scaled or adjusted accordingly, so that G.722.1 encoding and decoding can be performed on a signal of length L_w4 under the constraint of the original bit rate; preferably, the scaling factor is [L_w4/(2N)];

In this embodiment, the fourth window type is as shown in fig. 5, and the specific formula is:

where N = 320 is the number of samples in one frame, N₂ = 80 and N₁ = 110. When the (k+1)th frame is encoded, position n_w = 0 of the window h₃(n_w) coincides with the start of the kth frame, and h₃(n_w) spans the kth, (k+1)th and (k+2)th frames;

step E207: the G.722.1 code stream of the (k+1)th frame is output together with the coding mode information of the (k+2)th frame;

step E208: the (k+2)th and subsequent frames are encoded according to the coding modes obtained by signal classification; end;

preferably, if the (k+2)th frame is coded in MDCT mode, its input signal continues to be encoded with the original G.722.1 encoding method, and subsequent encoding follows the coding mode information obtained by subsequent signal classification;

preferably, if the (k+2)th frame is coded in ACELP mode, its input signal is encoded with the switching G.722.2 encoding method described in embodiment 1 for switching from the MDCT mode to the ACELP mode, and subsequent encoding follows the coding mode information obtained by subsequent signal classification.

As shown in fig. 11, the method comprises the following steps at the decoding end:

step D201: the code streams of frames 0 to k-1 sent by the encoding end are read in sequence, together with the coding mode information of the next frame sent with each frame. When the received coding mode information shows that the current, previous and next frames are all ACELP code streams, G.722.2 decoding is performed on the current code stream. The resulting decoded signal is combined with the signal buffered when decoding the previous frame's code stream to output the decoded signal of the current frame, and the part of the next frame's decoded signal obtained from the current code stream is stored in a buffer;

where the output for the last N-N_2s = 145 samples of frame 0 is obtained by decoding the frame-0 code stream, and the output for the first N_2s = 175 samples is set to all zeros;

step D202: the kth frame code stream and the coding mode information of the (k+1)th frame sent with it are read, and the (k+1)th frame code stream is determined to be of the MDCT type;

step D203: since the previous and current code streams are of the ACELP type and the next one is of the MDCT type, G.722.2 decoding is performed on the kth frame code stream. This yields the post high-pass filter input s_2ddp(n_d), n_d = (k-1)*N_d+N_2sp, ..., k*N_d+N_2sp-1, and the post high-pass filter output s_2ddpHP(n_d), n_d = (k-1)*N_d+N_2sp, ..., k*N_d+N_2sp-1; in this example, N_2sp = 76;

Step D204: compensating the influence of post-high-pass filtering nonlinear phase shift on smooth switching in the G.722.2 decoding process by using a third method;

a falling window is applied to the post high-pass filter output s_2ddpHP(n_d), n_d = k*N_d+N_2ddpHPfs, ..., k*N_d+N_2ddpHPfe, to obtain the windowed post high-pass filter output s_2ddpHPw(n_d):

where N_2ddpHPfe is the end position of the windowed portion, N_2ddpHPfe ≤ N_2sp-1-D_21od-D_2fd/2; in this example, N_2ddpHPfe = 31;

N_2ddpHPfs is the start position of the windowed portion, N_2ddpHPfs = N_2ddpHPfe-L_hpd2 ≥ N_2sp-N_d;

L_hpd2 is the length of the falling window w_hpd2(j), which may be a linear falling window:

w_hpd2(j)＝(L_hpd2-1-j)/(L_hpd2-1),j＝0,...,L_hpd2-1 (31)

the window length L_hpd2 is greater than 0 and at most the frame length N_d at the G.722.2 core operating frequency minus two terms. The first is D_2fd/2, half of the D_2fd-sample delay at the core operating frequency introduced by G.722.2's conversion between the input sampling rate and the core operating frequency. The second is D_21od, the length of the region at the core operating frequency where the G.722.2 decoded signal overlaps the subsequent G.722.1 decoded signal. That is, 0 < L_hpd2 ≤ N_d-D_2fd/2-D_21od; in this example, L_hpd2 = N_d-D_2fd/2-D_21od = 212;
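The decoder-side limits of step D204 can be checked the same way with the example values N_2sp = 76, N_d = 256, D_2fd = 24 and D_21od = 32. The sketch also brings out one detail: a window of exactly L_hpd2 samples ending at N_2ddpHPfe starts at N_2ddpHPfe-L_hpd2+1 = -180, which meets the bound N_2sp-N_d with equality. The function name is hypothetical.

```python
def decoder_hp_window(n_2sp=76, n_d=256, d_2fd=24, d_21od=32):
    """Return (N_2ddpHPfe, L_hpd2, first windowed index, lower bound N_2sp-N_d)."""
    fe = n_2sp - 1 - d_21od - d_2fd // 2     # N_2ddpHPfe
    length = n_d - d_2fd // 2 - d_21od       # L_hpd2 (maximum value)
    first = fe - length + 1                  # window of `length` samples ending at fe
    lower = n_2sp - n_d                      # N_2sp - N_d
    return fe, length, first, lower


print(decoder_hp_window())  # (31, 212, -180, -180)
```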

Of course, the falling window in the present invention is not limited to the above-mentioned linear falling window, and other forms of functions having a characteristic of smoothly falling from 1 to 0 may be adopted;

a rising window is applied to the post high-pass filter input s_2ddp(n_d), n_d = k*N_d+N_2ddpHPfs, ..., k*N_d+N_2ddpHPfe, to obtain the windowed high-pass filter input s_2ddpw(n_d):

where the rising window may be a linear rising window: 1-w_hpd2(j), j = 0, ..., L_hpd2-1;

the windowed high-pass filter input s_2ddpw(n_d) and the windowed high-pass filter output s_2ddpHPw(n_d) are overlap-added to obtain the signal s_2ddpHPc(n_d), which is compensated for the non-linear phase shift of the high-pass filter:

s_2ddpHPc(n_d)＝s_2ddpw(n_d)+s_2ddpHPw(n_d),n_d＝kN_d+N_2ddpHPfs,...,kN_d+N_2ddpHPfe (33)

The post-high-pass filtered output signal before the overlap region remains unchanged, and the post-high-pass filtered output signal after the overlap region is equal to the post-high-pass filtered input signal;

step D205: subsequent G.722.2 decoding is performed on the compensated post high-pass filter output s_2ddpHPc(n_d), n_d = (k-1)*N_d+76, ..., k*N_d+75, to obtain the G.722.2 decoded signal s_2d(n), n = (k-1)*N+80, ..., k*N+79;

step D206: the decoded signal buffered when decoding the (k-1)th frame code stream is combined with the decoded signal of the current kth frame code stream to output the kth frame decoded signal s_2d(n), n = (k-1)*N, ..., (k-1)*N+319; the decoded signal of the (k+1)th frame, s_2d(n), n = k*N, ..., k*N+79, is stored in the buffer;

step D207: the (k+1)th frame code stream and the coding mode information of the (k+2)th frame sent with it are read;

step D208: since the previous code stream is of the ACELP type and the current one is of the MDCT type, G.722.1 decoding is performed on the (k+1)th frame code stream with a new window type;

preferably, if the frame after the (k+1)th frame is of the MDCT type, the (k+1)th frame code stream is G.722.1-decoded with the third window type. This yields the last 280 samples of the (k+1)th frame, s_1d(n), n = k*N+40, ..., k*N+319, and the windowed, unfolded signal of the (k+2)th frame;

preferably, if the frame after the (k+1)th frame is of the ACELP type, the (k+1)th frame code stream is G.722.1-decoded with the fourth window type. At the same time, the other operations described in embodiment 1 as required when G.722.1-decoding the last frame in the switch from G.722.1 (MDCT mode) to G.722.2 (ACELP mode) are performed. This decoding yields the last 280 samples of the (k+1)th frame, s_1d(n), n = k*N+40, ..., k*N+319, and the first 110 samples of the (k+2)th frame;

step D209: performing overlap-add processing on the signals of the overlapping part of the decoded signal of G.722.2 and the decoded signal of G.722.1 to obtain a final decoded signal;

the overlap region of the G.722.2 and G.722.1 decoded signals is k*N+N_2s-D₂-D_f/2-D_21o ≤ n ≤ k*N+N_2s-1-D₂-D_f/2, i.e. k*N+40 ≤ n ≤ k*N+79. In this region, a falling window w_amf(j) is applied to the G.722.2 decoded signal to obtain the windowed G.722.2 decoded signal s_2dw(n):

s_2dw(n)＝w_amf(n-(k*N+40))*s_2d(n),n＝k*N+40,...,k*N+79 (34)

where the falling window w_amf(j) may be a cosine falling window:

w_amf(j)＝cos²(jπ/(2(L_amf-1))),j＝0,...,L_amf-1 (35)

of course, the falling window in the present invention is not limited to the cosine falling window; any other function that falls smoothly from 1 to 0 may be used. The window length L_amf is greater than 0 and at most D_21o, the length of the region where the G.722.2 and G.722.1 decoded signals overlap, i.e. 0 < L_amf ≤ D_21o. In this example, D_21o = 40 and L_amf = D_21o;

a rising window is applied to the G.722.1 decoded signal in this region to obtain the windowed G.722.1 decoded signal s_1dw(n):

s_1dw(n)＝(1-w_amf(n-(k*N+40)))*s_1d(n),n＝k*N+40,...,k*N+79 (36)

where the rising window may be the cosine rising window 1-w_amf(j), j = 0, ..., L_amf-1;

the windowed G.722.1 decoded signal s_1dw(n) and the windowed G.722.2 decoded signal s_2dw(n) are overlap-added to obtain the final decoded signal s_fd(n) in the overlap region:

s_fd(n)＝s_2dw(n)+s_1dw(n),n＝k*N+40,...,k*N+79 (37)

The final decoded signal before the overlap region is the decoded signal s of G.722.2_2d(n), the final decoded signal after the overlap region is the decoded signal s of G.722.1_1d(n)；
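The overlap-add of equations (34) to (37) can be sketched as follows for the L_amf = 40 samples k*N+40, ..., k*N+79. The sketch uses cos²(jπ/(2(L_amf-1))), which falls from 1 at j = 0 to 0 at j = L_amf-1, together with its complement as the rising window. The function name is hypothetical.

```python
import math


def cosine_overlap_add(s_acelp, s_mdct):
    """Cross-fade from the G.722.2 (ACELP) to the G.722.1 (MDCT) decoded signal.

    Both inputs cover the same overlap region. The two weights sum to 1 at
    every sample, so a signal present identically in both passes unchanged.
    """
    length = len(s_acelp)
    if length != len(s_mdct) or length < 2:
        raise ValueError("inputs must have equal length >= 2")
    out = []
    for j in range(length):
        w = math.cos(j * math.pi / (2 * (length - 1))) ** 2   # falling window
        out.append(w * s_acelp[j] + (1.0 - w) * s_mdct[j])
    return out


# 40-sample overlap: ACELP output of ones fading into an MDCT output of zeros.
faded = cosine_overlap_add([1.0] * 40, [0.0] * 40)
```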

the rising window need not be a cosine rising window; any other function that rises smoothly from 0 to 1 may be used.

Step D210: outputting the finally decoded k +1 frame signal, and storing part of the k +2 frame signal obtained by decoding in a buffer;

step D211: sequentially reading the (k + 2) th frame and the subsequent code streams and the coding mode information transmitted simultaneously, executing subsequent decoding processing according to the received coding mode information, and ending;

preferably, if the (k+2)th frame is coded in MDCT mode, its code stream continues to be decoded with the original G.722.1 decoding method, and subsequent decoding follows the subsequently received coding mode information;

preferably, if the (k+2)th frame is coded in ACELP mode, its code stream is decoded with the switching G.722.2 decoding method described in embodiment 1 for switching from the MDCT mode to the ACELP mode, and subsequent decoding follows the subsequently received coding mode information.

Example 3

This embodiment describes a codec device for smoothly switching between audio coding modes, which implements the smooth switching method of the above-mentioned embodiments, and includes a coding mode switching device when the coding mode is switched from the MDCT mode to the ACELP mode, a decoding mode switching device when the decoding mode is switched from the MDCT mode to the ACELP mode, a coding mode switching device when the coding mode is switched from the ACELP mode to the MDCT mode, and a decoding mode switching device when the decoding mode is switched from the ACELP mode to the MDCT mode.

The encoding mode switching apparatus for switching the encoding mode from the MDCT mode to the ACELP mode is used for encoding when the ith frame is coded in the modified discrete cosine transform (MDCT) coding mode and the following, (i+1)th, frame is coded in the algebraic code-excited linear prediction (ACELP) coding mode. As shown in fig. 12, it comprises a first encoding module E310, a first decoding module D310 and a second encoding module E320, where:

the first encoding module E310 is configured to perform MDCT encoding on the input signal of the ith frame by using a predefined window type to obtain encoding information of the ith frame; the predefined window type enables partial signals of the (i +1) th frame to be reconstructed when the coded code stream of the (i) th frame is decoded;

the first decoding module D310 is configured to decode the i-th frame or the encoded information of the i-th frame and frames before the i-th frame to obtain a decoded signal;

the second encoding module E320 is configured to establish and update the history states of the filters required in the ACELP coding mode according to the decoded signal, part of the input signal of the ith frame and a first part of the input signal of the (i+1)th frame. Based on the updated history states, it performs ACELP encoding on one ordinary frame length of input signal, consisting of a second part of the input signal of the (i+1)th frame and part of the input signal of the (i+2)th frame;

preferably, the second encoding module E320 includes a first preprocessing module E321 and a second core encoding module E322, as shown in fig. 13, wherein:

the first preprocessing module E321 is configured to: down-sample the decoded signal s_1d as in the ACELP coding mode to obtain the signal s_1dd; and down-sample part of the input signal of the ith frame and the first part of the input signal of the (i+1)th frame with the down-sampling filter of the ACELP coding mode to obtain the signal s_d, then save the state S_d of the down-sampling filter. Preferably, the first preprocessing module runs before the MDCT code stream of the ith frame is output.

The second core encoding module E322 is configured to: pre-emphasize s_1dd with the pre-emphasis filter of the ACELP coding mode to obtain the pre-emphasized signal s_1ddp at the core operating frequency of the ACELP coding mode;

pre-emphasize the signal s_d with the pre-emphasis filter of the ACELP coding mode to obtain the signal s_dp, then save s_dp as the ACELP preprocessed signal and save the state S_p of the pre-emphasis filter;

when the input signal s₂ is down-sampled in ACELP encoding, use the state S_d of the down-sampling filter as the history state of the down-sampling filter in ACELP encoding; the input signal s₂ is one ordinary frame length of input signal consisting of the second part of the input signal of the (i+1)th frame and part of the input signal of the (i+2)th frame;

when an open-loop pitch search is needed during ACELP encoding of the input signal s₂, high-pass filter the perceptually weighted signal with the high-pass filter used in the ACELP open-loop pitch search, obtaining a high-pass filtered perceptually weighted signal and the high-pass filter state. Use the high-pass filtered perceptually weighted signal as the buffered history of the high-pass filtered perceptually weighted signal needed for the open-loop pitch search gain calculation, and use the high-pass filter state as the history state of the high-pass filter needed for that calculation;

when the history of the LPC synthesis filter is needed during ACELP encoding of the input signal s₂, use the last M_LPCo samples of s_1ddp as the history state of the LPC synthesis filter required for processing the first subframe in ACELP encoding, where M_LPCo is the LPC order in ACELP encoding;

when the historical excitation signal is needed during ACELP encoding of the input signal s₂, form a prediction analysis filter from the LPC coefficients of the first subframe calculated in ACELP encoding, filter s_1ddp through it to obtain the LPC analysis residual, and use this residual as the historical excitation signal required in ACELP encoding;

when a closed-loop pitch search is needed during ACELP encoding of the input signal s₂, calculate the error between the encoder input signal over the one frame length before the start of the signal currently being ACELP-encoded and the MDCT decoded signal at the corresponding positions. Filter this error with the perceptual weighting filter of the ACELP encoding, and use the resulting filter state as the history state of the perceptual weighting filter when calculating the target signal for the closed-loop pitch search in ACELP encoding.
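Keeping the weighting-filter state, rather than restarting from zero, lets the closed-loop pitch target continue smoothly from the MDCT-coded past. Below is a minimal sketch with a generic direct-form I filter. The coefficients are placeholders, not the G.722.2 perceptual weighting filter, and the function name is hypothetical. The sketch shows that filtering the error in one pass, or in two passes with the saved state carried over, gives the same output.

```python
def iir_df1(b, a, x, x_hist=None, y_hist=None):
    """Direct-form I filter: y(n) = sum_k b[k] x(n-k) - sum_{k>=1} a[k] y(n-k).

    Requires a[0] == 1. x_hist / y_hist are the previous len(b)-1 inputs and
    len(a)-1 outputs, oldest first (zeros if omitted). Returns the output and
    the updated (x_hist, y_hist), so filtering can resume later.
    """
    if a[0] != 1.0:
        raise ValueError("a[0] must be 1")
    nb, na = len(b) - 1, len(a) - 1
    xh = list(x_hist) if x_hist is not None else [0.0] * nb
    yh = list(y_hist) if y_hist is not None else [0.0] * na
    y = []
    for v in x:
        xh.append(v)
        acc = sum(b[k] * xh[-1 - k] for k in range(nb + 1))
        acc -= sum(a[k] * yh[-k] for k in range(1, na + 1))
        yh.append(acc)
        y.append(acc)
    return y, (xh[len(xh) - nb:], yh[len(yh) - na:])


b, a = [1.0, -0.5, 0.2], [1.0, -0.7]    # placeholder filter coefficients
err = [0.3, -1.2, 0.7, 2.0, -0.4, 0.9]  # e.g. input minus MDCT decoded signal
full, _ = iir_df1(b, a, err)
first, state = iir_df1(b, a, err[:3])   # filter the past error, keep the state
rest, _ = iir_df1(b, a, err[3:], *state)
```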

The first encoding module E310 is further configured to, when the coding mode of the (i+2)th frame is the MDCT mode, encode the (i+2)th frame signal as follows:

performing MDCT encoding on the (i+2)th frame signal using a predefined window type; the predefined window type makes part of the signal reconstructed when decoding the (i+2)th frame code stream overlap the signal reconstructed when decoding the ACELP code stream of the (i+1)th frame, thereby compensating for the delay difference between ACELP coding/decoding before the mode switch and MDCT coding/decoding after the mode switch.

The decoding mode switching device for switching from the MDCT mode to the ACELP mode performs decoding when the code stream of the ith frame is a modified discrete cosine transform (MDCT) code stream and the code stream of the following frame, i.e., the (i+1)th frame, is an algebraic code-excited linear prediction (ACELP) code stream. As shown in fig. 14, it includes a first decoding module D310, a second decoding module D320, and a first comprehensive processing module Z310, where:

the first decoding module D310 is configured to perform MDCT decoding on the code stream of the ith frame using a predefined window type to obtain the decoded signal of the ith frame and the MDCT decoded signal of the (i+1)th frame; the predefined window type allows part of the (i+1)th frame signal to be reconstructed when the ith frame code stream is MDCT decoded;

the second decoding module D320 is configured to establish and update the history states of the filters required in ACELP decoding from the partial decoded signal of the ith frame and the MDCT decoded signal of the (i+1)th frame, and to perform ACELP decoding on the code stream of the (i+1)th frame based on these updated history states to obtain an ACELP decoded signal;

the first comprehensive processing module Z310 is configured to process the MDCT decoded signal of the (i +1) th frame and the ACELP decoded signal to obtain a final decoded signal of the (i +1) th frame.

Preferably, the second decoding module D320 includes a second preprocessing module D321 and a second core decoding module D322, as shown in fig. 15, wherein:

the second preprocessing module D321 is configured to down-sample the partial decoded signal of the ith frame and the MDCT decoded signal of the (i+1)th frame to obtain a signal s_1dd.

The second core decoding module D322 is configured to perform one or a combination of the following:

when a de-emphasis filter is needed in ACELP decoding of the (i+1)th frame code stream, using s_1dd as the de-emphasis filter history state in ACELP decoding;
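As a minimal sketch of the de-emphasis step above, assume a first-order de-emphasis filter 1/(1 - β z^-1), whose only memory is the previous output sample. Under that assumption, the last sample of s_1dd is the value the ACELP de-emphasis filter would have held had it produced s_1dd itself. The filter order, β, and all sample values are hypothetical.

```python
def deemphasis(x, beta, y_prev=0.0):
    """First-order de-emphasis 1/(1 - beta*z^-1): y[n] = x[n] + beta*y[n-1].

    `y_prev` is the filter memory (previous output sample); the updated memory
    is returned alongside the output.
    """
    y = []
    for sample in x:
        y_prev = sample + beta * y_prev
        y.append(y_prev)
    return y, y_prev


# Hypothetical down-sampled MDCT output s_1dd and de-emphasis factor.
s_1dd = [0.1, 0.4, 0.3]
beta = 0.68
# Assumption: seed the ACELP de-emphasis memory with the last sample of s_1dd.
acelp_out, memory = deemphasis([0.05, -0.02], beta, y_prev=s_1dd[-1])
```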

Preferably, the downsampling processes in the first decoding module D310 and the second preprocessing module D321 are both performed before the i-th frame MDCT decoded signal is output.

The first decoding module D310 is further configured to, when the code stream of the (i+2)th frame is an MDCT code stream, decode the (i+2)th frame code stream as follows:

performing MDCT decoding on the (i+2)th frame code stream using a predefined window type; the predefined window type makes part of the signal reconstructed when decoding the (i+2)th frame code stream overlap the signal reconstructed when decoding the ACELP code stream of the (i+1)th frame, thereby compensating for the delay difference between ACELP coding/decoding before the mode switch and MDCT coding/decoding after the mode switch.

The first comprehensive processing module Z310 is further configured to, when the code stream of the (i+2)th frame is an MDCT code stream, process the MDCT decoded signal of the (i+2)th frame together with the ACELP decoded signal obtained when decoding the ACELP code stream of the (i+1)th frame, to obtain the final decoded signal of the (i+2)th frame.

The encoding mode switching device for switching from the ACELP mode to the MDCT mode encodes the signals of the kth and (k+1)th frames when the kth frame uses the algebraic code-excited linear prediction (ACELP) coding mode, the preceding (k-1)th frame uses the ACELP coding mode, and the following (k+1)th frame uses the modified discrete cosine transform (MDCT) coding mode. As shown in fig. 16, it includes a third encoding module E330 and a fourth encoding module E340, where:

the third encoding module E330 is configured to down-sample the input signal of one frame length spanning the kth and (k+1)th frames to obtain a signal s_d at the ACELP core operating frequency; to process s_d with the high-pass filter of ACELP coding to obtain a signal s_dHP; to apply high-pass filtering nonlinear phase shift compensation to s_dHP to obtain a compensated signal s_dHPc; and to perform the remaining ACELP coding on s_dHPc to obtain the ACELP code stream of the kth frame. The kth frame portion of this one-frame-length input signal is the remainder of the kth frame left after part of the kth frame signal was already input to the previous ACELP coding;

the fourth encoding module E340 is configured to perform MDCT encoding on the (k+1)th frame signal using a predefined window type; the predefined window type makes part of the signal reconstructed when decoding the (k+1)th frame code stream overlap the signal reconstructed when decoding the ACELP code stream of the kth frame, compensating for the delay difference between ACELP coding/decoding before the coding mode switch and MDCT coding/decoding after it.

The decoding mode switching device for switching from the ACELP mode to the MDCT mode decodes the code streams of the kth and (k+1)th frames when the kth frame code stream is an algebraic code-excited linear prediction (ACELP) code stream, the preceding (k-1)th frame is an ACELP code stream, and the following (k+1)th frame is a modified discrete cosine transform (MDCT) code stream. As shown in fig. 17, it includes a third decoding module D330, a fourth decoding module D340, and a second comprehensive processing module Z320, where:

the third decoding module D330 is configured to perform ACELP decoding on the kth frame code stream to obtain the input signal s_2ddp and the output signal s_2ddpHP of the post-high-pass filter in ACELP decoding; to apply post-high-pass filtering nonlinear phase shift compensation to obtain s_2ddpHPc; and to perform the remaining ACELP decoding on s_2ddpHPc to obtain the ACELP decoded signal of the kth frame and the ACELP decoded signal of the (k+1)th frame;

the fourth decoding module D340 is configured to perform MDCT decoding on the (k+1)th frame code stream using a predefined window type to obtain an MDCT decoded signal; the predefined window type makes part of the signal reconstructed when MDCT decoding the (k+1)th frame code stream overlap the signal reconstructed when ACELP decoding the kth frame code stream, compensating for the delay difference between ACELP coding/decoding before the decoding mode switch and MDCT coding/decoding after it;

the second comprehensive processing module Z320 is configured to process the ACELP decoded signal and the MDCT decoded signal of the (k +1) th frame to obtain a final decoded signal of the (k +1) th frame.

It should be noted that many of the details described in the method embodiments are equally applicable to the above-described apparatus embodiments, and therefore repeated descriptions of the same or similar parts are omitted.

If the MDCT and ACELP modes are switched directly without any smoothing, the different delays of the two coding modes produce a blank or repeated segment at the switch, so information is lost or duplicated. Even setting the delay difference aside, the two modes process the signal differently and use different methods to achieve smooth transitions between frames; when switching directly, neither method takes effect, so the switched signal suffers severe distortion, such as discontinuities at the frame boundary, and no smooth transition is achieved. Previously disclosed methods for switching between the ACELP and MDCT coding modes are either ineffective, computationally complex, or require additional bit rate or delay.

Compared with the prior art, the invention makes full use of the inter-frame smoothing mechanisms of both the MDCT and ACELP coding modes. When switching from MDCT to ACELP, smooth transition is ensured by avoiding abrupt window-type changes in MDCT coding and decoding, making reasonable use of the memory of each filter in ACELP coding, compensating for the nonlinear phase shift of the high-pass filter in ACELP coding and decoding, and overlap-adding the MDCT decoded signal with the ACELP decoded signal. When switching from ACELP to MDCT, smooth transition is ensured by avoiding steep window changes in MDCT coding and decoding, together with compensation of the high-pass filter's nonlinear phase shift in ACELP coding and decoding and overlap-adding of the ACELP decoded signal with the MDCT decoded signal.

The invention takes into account that the MDCT and ACELP coding modes generally have different delays and require different signal lengths per coding operation. By modifying the MDCT window type during the switch, it compensates for the delay difference between the two modes and for the difference in the length of data available at the decoder, so that an audio codec that switches smoothly between the MDCT and ACELP modes incurs no additional delay.

The invention also ensures, through suitable use of the bit allocation scheme of the MDCT coding mode, that MDCT coding with the new window type during the switch adds no extra bits to the code stream. Because the parameters of ACELP coding affect switching quality differently, the bits occupied by parameters that are harmful to, or have little effect on, smooth switching are reassigned to carry side information that benefits smooth switching, so ACELP coding likewise adds no extra bits during the switch.

The invention considers the influence of each parameter of each coding mode on smooth switching and processes only the parameters with a large influence, keeping the added computation small. It also exploits the fact that MDCT coding and decoding are far less computationally complex than ACELP coding and decoding: part of the work of establishing the ACELP history states during the switch is moved into MDCT coding and decoding, spreading the extra complexity of the switch so that the peak computational complexity of the smooth switching method between the MDCT and ACELP coding modes remains low.

It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, and the program may be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present invention is not limited to any specific form of combination of hardware and software.

Claims

1. A method for coding mode switching, comprising:

when it is determined that the kth frame uses the algebraic code-excited linear prediction (ACELP) coding mode, the (k-1)th frame uses the ACELP coding mode, and the (k+1)th frame uses the modified discrete cosine transform (MDCT) coding mode:

down-sampling the input signal of one frame length spanning the kth and (k+1)th frames to obtain a signal s_d at the ACELP core operating frequency, and processing s_d with the high-pass filter of ACELP coding to obtain a signal s_dHP; wherein the kth frame portion of this one-frame-length input signal is the remainder of the kth frame left after part of the kth frame signal was input to the previous ACELP coding;

MDCT encoding the (k+1)th frame signal using a predefined window type, wherein the predefined window type makes part of the signal reconstructed when decoding the (k+1)th frame code stream overlap the signal reconstructed when decoding the ACELP code stream of the kth frame, compensating for the delay difference between ACELP coding/decoding before the coding mode switch and MDCT coding/decoding after it;

wherein the kth frame represents an input signal of the kth frame, the (k-1) th frame represents a frame previous to the kth frame, and the (k +1) th frame represents a frame subsequent to the kth frame.

2. The method of claim 1, wherein if the coding mode of the frame subsequent to the (k +1) th frame is an MDCT coding mode:

the predefined window type is a third window type, and the third window type sequentially comprises 5 parts from left to right: a first null region, a rising window region, a 1 value holding region, a falling window region, a second null region, wherein:

3. The method of claim 2, wherein the third window type is as follows:

where n_w is the sample index at a sampling rate of 16 kHz.

4. The method of claim 1, wherein if the coding mode of the frame subsequent to the (k +1) th frame is an ACELP coding mode:

the predefined window type is a fourth window type, and the fourth window type meets the following requirements:

the value-1 holding region has a value of 1 and a length N₃ satisfying N₃ ≥ D₁ + D_2f;

5. The method of claim 4, wherein the fourth window type is as follows:

where n_w is the sample index at a sampling rate of 16 kHz.

6. The method according to claim 2 or 4, characterized in that, when the length L_w of the predefined window type satisfies L_w > 2 × N, the parameters related to the length of the predefined window type are scaled by a factor [L_w/(2N)] in the MDCT encoding of the (k+1)th frame input signal using the predefined window type, where [·] denotes rounding to the nearest integer.
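The scaling rule above can be checked with a short integer computation. This is a minimal sketch: it assumes ties are rounded up, uses the integer identity floor((L_w + N)/(2N)) = floor(L_w/(2N) + 1/2) to avoid floating-point error, and the parameter `value` and the example numbers are hypothetical.

```python
def window_scale_factor(L_w: int, N: int) -> int:
    """[L_w / (2N)], rounded to the nearest integer (ties rounded up, an assumption)."""
    if L_w <= 0 or N <= 0:
        raise ValueError("L_w and N must be positive")
    return (L_w + N) // (2 * N)


def scale_length_parameter(value: int, L_w: int, N: int) -> int:
    """Scale a hypothetical window-length-related parameter only when L_w > 2N."""
    if L_w > 2 * N:
        return value * window_scale_factor(L_w, N)
    return value


# Hypothetical example: N = 320 samples per frame, window length L_w = 1280.
factor = window_scale_factor(1280, 320)
```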

7. The method of claim 1, wherein applying high-pass filtering nonlinear phase shift compensation to the signal s_dHP to obtain the compensated signal s_dHPc comprises:

8. The method of claim 7, wherein the fourth falling window is a linear falling window and the fourth rising window is a linear rising window.

9. A decoding mode switching method, comprising:

when the kth frame code stream is an algebraic code-excited linear prediction (ACELP) code stream, the (k-1)th frame is an ACELP code stream, and the (k+1)th frame code stream is a modified discrete cosine transform (MDCT) code stream:

performing ACELP decoding on the kth frame code stream to obtain the input signal s_2ddp and the output signal s_2ddpHP of the post-high-pass filter in ACELP decoding; applying post-high-pass filtering nonlinear phase shift compensation to s_2ddp and s_2ddpHP to obtain a signal s_2ddpHPc; and performing the remaining ACELP decoding on s_2ddpHPc to obtain the ACELP decoded signal of the kth frame and the ACELP decoded signal of the (k+1)th frame;

processing the ACELP decoded signal and the MDCT decoded signal of the (k +1) th frame to obtain a final decoded signal of the (k +1) th frame;

10. The method of claim 9, wherein obtaining the signal s_2ddpHPc by post-high-pass filtering nonlinear phase shift compensation comprises:

applying a fifth falling window of length L_hpd2, which falls smoothly from 1 to 0, to the output signal s_2ddpHP of the post-high-pass filter to obtain a signal s_2ddpHPw; applying a fifth rising window of length L_hpd2, which rises smoothly from 0 to 1, to the input signal s_2ddp of the post-high-pass filter to obtain a signal s_2ddpw; and taking the sum of s_2ddpHPw and s_2ddpw as the value of s_2ddpHPc over the L_hpd2 points of the fifth falling window, where s_2ddpHPc equals s_2ddpHP before the fifth falling window and equals s_2ddp after it; L_hpd2 satisfies 0 < L_hpd2 ≤ N_d − D_2fd/2 − D_21od, where N_d is the length of one frame at the ACELP core operating frequency, D_2fd is the number of samples at the ACELP core operating frequency corresponding to the delay introduced by sampling rate conversion in ACELP between the input sampling frequency and the ACELP core operating frequency, and D_21od is the length, at the ACELP core operating frequency, of the overlap region between the ACELP decoded signal and the subsequent MDCT decoded signal; and the fifth falling window and the fifth rising window sum to 1.
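The compensation described above is a cross-fade from the post-high-pass output back to its unfiltered input. The following is a minimal sketch using linear windows (the form named in claim 11). The exact window samples rise = (j+1)/(L_hpd2+1), fall = 1 - rise are an assumption chosen so the two windows sum to 1, and the window start index `start` is a hypothetical parameter because the text above does not fix it.

```python
def phase_shift_compensation(s_hp, s_in, start, L_hpd2, N_d, D_2fd, D_21od):
    """Cross-fade from the post-high-pass output s_hp (s_2ddpHP) to its input s_in (s_2ddp).

    Before the window the result follows s_hp, inside it the windowed sum,
    after it s_in. `start` is a hypothetical window start index.
    """
    if len(s_hp) != len(s_in):
        raise ValueError("s_hp and s_in must have the same length")
    if not 0 < L_hpd2 <= N_d - D_2fd / 2 - D_21od:
        raise ValueError("L_hpd2 must satisfy 0 < L_hpd2 <= N_d - D_2fd/2 - D_21od")
    if start < 0 or start + L_hpd2 > len(s_hp):
        raise ValueError("window does not fit inside the signal")
    out = list(s_hp[:start])
    for j in range(L_hpd2):
        rise = (j + 1) / (L_hpd2 + 1)  # assumed linear rising window, 0 -> 1
        fall = 1.0 - rise              # linear falling window; fall + rise == 1
        out.append(fall * s_hp[start + j] + rise * s_in[start + j])
    out.extend(s_in[start + L_hpd2:])
    return out
```

Because the two windows sum to 1, a signal for which the filter output and input coincide passes through unchanged; only their difference is faded out.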

11. The method of claim 10, wherein the fifth falling window is a linear falling window and the fifth rising window is a linear rising window.

12. The method of claim 9, wherein if the code stream of the frame subsequent to the (k +1) th frame is an MDCT code stream:

the first window type is a window type used when MDCT decoding is performed on a frame that is not in decoding mode switching.

13. The method of claim 12, wherein the third window type is as follows:

where n_w is the sample index at a sampling rate of 16 kHz.

14. The method of claim 9, wherein if the codestream of the frame subsequent to the (k +1) th frame is an ACELP codestream:

the rising window region rises smoothly from 0 to 1 and has a length N₂ greater than 0;

15. The method of claim 14, wherein the fourth window type is as follows:

where n_w is the sample index at a sampling rate of 16 kHz.

16. The method of claim 12 or 14, wherein, when the length L_w of the predefined window type satisfies L_w > 2 × N, the parameters related to the length of the predefined window type are scaled by [L_w/(2N)] in the MDCT decoding of the (k+1)th frame using the predefined window type, where [·] denotes rounding to the nearest integer.

17. The method of claim 9, wherein processing the ACELP decoded signal and the MDCT decoded signal of the (k+1)th frame to obtain the final decoded signal of the (k+1)th frame comprises:

applying a sixth falling window of length L_amf, which falls smoothly from 1 to 0, to the ACELP decoded signal s_2d of the (k+1)th frame to obtain a signal s_2dw; applying a sixth rising window of length L_amf, which rises smoothly from 0 to 1, to the MDCT decoded signal s_1d to obtain a signal s_1dw; the final decoded signal s_fd of the (k+1)th frame equals s_1dw + s_2dw within the sixth rising window, equals s_2d before the sixth rising window, and equals s_1d after it; wherein the sixth falling window and the sixth rising window sum to 1, 0 < L_amf ≤ D_21o, and D_21o is the number of samples in the overlap region between the ACELP decoded signal and the MDCT decoded signal.
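The overlap-add above can be sketched in the same way as the phase shift compensation, but with the constraint 0 < L_amf ≤ D_21o. This minimal sketch uses a sin²/cos² window pair instead of linear windows, to show that any pair summing to 1 fits the description; that window shape, the start index `start` of the overlap region, and the sample values are assumptions, not taken from the text.

```python
import math


def overlap_add_transition(s_2d, s_1d, start, L_amf, D_21o):
    """Final (k+1)th frame signal s_fd: s_2d, then a cross-fade, then s_1d.

    s_2d: ACELP decoded signal; s_1d: MDCT decoded signal.
    `start` is a hypothetical index where the overlap region begins.
    """
    if not 0 < L_amf <= D_21o:
        raise ValueError("L_amf must satisfy 0 < L_amf <= D_21o")
    if len(s_2d) != len(s_1d):
        raise ValueError("signals must have the same length")
    if start < 0 or start + L_amf > len(s_2d):
        raise ValueError("window does not fit inside the signal")
    s_fd = list(s_2d[:start])
    for j in range(L_amf):
        up = math.sin(0.5 * math.pi * (j + 0.5) / L_amf) ** 2  # assumed sixth rising window
        down = 1.0 - up                                       # sixth falling window; sums to 1
        s_fd.append(down * s_2d[start + j] + up * s_1d[start + j])
    s_fd.extend(s_1d[start + L_amf:])
    return s_fd
```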

18. An encoding mode switching apparatus, configured to encode the kth and (k+1)th frames when the kth frame uses the algebraic code-excited linear prediction (ACELP) coding mode, the preceding (k-1)th frame uses the ACELP coding mode, and the following (k+1)th frame uses the modified discrete cosine transform (MDCT) coding mode, the apparatus comprising:

a third encoding module for down-sampling the input signal of one frame length spanning the kth and (k+1)th frames to obtain a signal s_d at the ACELP core operating frequency; processing s_d with the high-pass filter of ACELP coding to obtain a signal s_dHP; applying high-pass filtering nonlinear phase shift compensation to s_dHP to obtain a compensated signal s_dHPc; and performing the remaining ACELP coding on s_dHPc to obtain the ACELP code stream of the kth frame; wherein the kth frame portion of this one-frame-length input signal is the remainder of the kth frame left after part of the kth frame signal was input to the previous ACELP coding;

a fourth encoding module for performing MDCT encoding on the (k+1)th frame signal using a predefined window type, wherein the predefined window type makes part of the signal reconstructed when decoding the (k+1)th frame code stream overlap the signal reconstructed when decoding the ACELP code stream of the kth frame, compensating for the delay difference between ACELP coding/decoding before the coding mode switch and MDCT coding/decoding after it;

wherein the k-th frame represents the input signal of the k-th frame.

19. The apparatus of claim 18, wherein if the coding mode of the frame subsequent to the k +1 th frame is an MDCT coding mode:

wherein D₁ is the number of samples, at the input signal sampling rate, corresponding to the delay introduced by frame overlap in the MDCT coding mode; D_2f is the number of samples, at the input signal sampling rate, corresponding to the delay introduced by sampling rate conversion in ACELP; D_21o ≥ 0 is the number of samples in the overlap region between the MDCT decoded signal and the ACELP decoded signal when the coding mode switches from ACELP to MDCT; and N is the number of samples in one frame of the MDCT coding mode;

20. The apparatus of claim 19, wherein the third window type is as follows:

where n_w is the sample index at a sampling rate of 16 kHz.

21. The apparatus of claim 18, wherein if the coding mode of the frame subsequent to the (k +1) th frame is an ACELP coding mode:

the fourth window type makes part of the reconstructed signal overlap the signal reconstructed when the code stream of the (k+1)th frame is decoded, and makes another part overlap the ACELP decoded signal of the (k+2)th frame; it thus compensates for the delay difference between ACELP coding/decoding before the coding mode switch and MDCT coding/decoding after it, and, when coding later switches back to the ACELP mode, for the delay difference between MDCT coding/decoding before that switch and ACELP coding/decoding after it;

22. The apparatus of claim 21, wherein the fourth window type is as follows:

where n_w is the sample index at a sampling rate of 16 kHz.

23. The apparatus of claim 19 or 21, wherein, when the length L_w of the predefined window type satisfies L_w > 2 × N, the fourth encoding module scales the parameters related to the length of the predefined window type by a factor [L_w/(2N)] in the MDCT encoding of the (k+1)th frame input signal using the predefined window type, where [·] denotes rounding to the nearest integer.

24. The apparatus of claim 18, wherein the third encoding module applying high-pass filtering nonlinear phase shift compensation to the signal s_dHP to obtain the compensated signal s_dHPc comprises:

applying a fourth falling window of length L_hpe2, which falls smoothly from 1 to 0, to the output signal s_dHP of the high-pass filter to obtain a signal s_dHPw; applying a fourth rising window of length L_hpe2, which rises smoothly from 0 to 1, to the input signal s_d of the high-pass filter to obtain a signal s_dw; and taking the sum of s_dHPw and s_dw as the value of s_dHPc over the L_hpe2 points of the fourth falling window, where s_dHPc equals s_dHP before the fourth falling window and equals s_d after it; L_hpe2 is greater than 0 and no greater than the length of three subframes at the ACELP core operating frequency minus the length, at the ACELP core operating frequency, of the overlap region between the ACELP decoded signal and the subsequent MDCT decoded signal; and the fourth falling window and the fourth rising window sum to 1.

25. The apparatus of claim 24, wherein the fourth falling window is a linear falling window and the fourth rising window is a linear rising window.

26. A decoding mode switching apparatus, configured to decode the kth and (k+1)th frame code streams when the kth frame code stream is an algebraic code-excited linear prediction (ACELP) code stream, the preceding (k-1)th frame is an ACELP code stream, and the following (k+1)th frame is a modified discrete cosine transform (MDCT) code stream, the apparatus comprising:

the second comprehensive processing module is used for processing the ACELP decoded signal and the MDCT decoded signal of the (k +1) th frame to obtain a final decoded signal of the (k +1) th frame;

wherein the k-th frame represents the input signal of the k-th frame.

27. The apparatus of claim 26, wherein the third decoding module obtaining s_2ddpHPc by post-high-pass filtering nonlinear phase shift compensation comprises:

applying a fifth falling window of length L_hpd2, which falls smoothly from 1 to 0, to the output signal s_2ddpHP of the post-high-pass filter to obtain a signal s_2ddpHPw; applying a fifth rising window of length L_hpd2, which rises smoothly from 0 to 1, to the input signal s_2ddp of the post-high-pass filter to obtain a signal s_2ddpw; and taking the sum of s_2ddpHPw and s_2ddpw as the value of s_2ddpHPc over the L_hpd2 points of the fifth falling window, where s_2ddpHPc equals s_2ddpHP before the fifth falling window and equals s_2ddp after it; L_hpd2 satisfies 0 < L_hpd2 ≤ N_d − D_2fd/2 − D_21od, where N_d is the length of one frame at the ACELP core operating frequency, D_2fd is the number of samples at the ACELP core operating frequency corresponding to the delay introduced by sampling rate conversion in ACELP between the input sampling frequency and the ACELP core operating frequency, and D_21od is the length, at the ACELP core operating frequency, of the overlap region between the ACELP decoded signal and the subsequent MDCT decoded signal; and the fifth falling window and the fifth rising window sum to 1.

28. The apparatus of claim 27, wherein the fifth falling window is a linear falling window and the fifth rising window is a linear rising window.

29. The apparatus of claim 26, wherein if the code stream of the frame subsequent to the k +1 th frame is an MDCT code stream:

wherein D₁ is the number of samples, at the input signal sampling rate, corresponding to the delay introduced by frame overlap in the MDCT coding mode; D_2f is the number of samples, at the input signal sampling rate, corresponding to the delay introduced by sampling rate conversion in ACELP; D_21o ≥ 0 is the number of samples in the overlap region between the MDCT decoded signal and the ACELP decoded signal when the coding mode switches from ACELP to MDCT; N is the number of samples in one frame of the MDCT coding mode; and the first window type is the window type used for MDCT decoding of a frame that is not at a decoding mode switch.

30. The apparatus of claim 29, wherein the third window type is as follows:

where n_w is the sample index at a sampling rate of 16 kHz.

31. The apparatus of claim 26, wherein if the codestream of the frame subsequent to the (k +1) th frame is an ACELP codestream:

32. The apparatus of claim 31, wherein the fourth window type is as follows:

where n_w is the sample index at a sampling rate of 16 kHz.

33. The apparatus of claim 29 or 31, wherein, when the length L_w of the predefined window type satisfies L_w > 2 × N, the fourth decoding module scales the parameters related to the length of the predefined window type by [L_w/(2N)] in the MDCT decoding of the (k+1)th frame using the predefined window type, where [·] denotes rounding to the nearest integer.

34. The apparatus of claim 26, wherein the second comprehensive processing module processing the ACELP decoded signal and the MDCT decoded signal of the (k+1)th frame to obtain the final decoded signal of the (k+1)th frame comprises: