CN103714822A

CN103714822A - Sub-band encoding and decoding method and device based on the SILK codec

Info

Publication number: CN103714822A
Application number: CN201310740505.7A
Authority: CN
Inventors: 陈若非; 高泽华; 邢世义
Original assignee: Guangzhou Huaduo Network Technology Co Ltd
Current assignee: Guangzhou Cubesili Information Technology Co Ltd
Priority date: 2013-12-27
Filing date: 2013-12-27
Publication date: 2014-04-09
Anticipated expiration: 2033-12-27
Also published as: CN103714822B

Abstract

The invention discloses a sub-band encoding and decoding method and device based on the SILK codec, belonging to the field of audio coding and decoding. The method comprises: obtaining the full-band time-domain signal corresponding to the current audio frame; decomposing the full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal; performing SILK encoding on the low-frequency time-domain signal to generate the corresponding low-frequency parameters; encoding the high-frequency time-domain signal according to the low-frequency parameters to generate the corresponding high-frequency parameters; and quantizing and compressing the low-frequency parameters and the high-frequency parameters to generate the bit stream corresponding to the current audio frame. More bit resources are allocated to the low-frequency signal and the high-frequency signal is encoded with relatively few bit resources, giving a more reasonable bit allocation. Coding efficiency is effectively improved, the harmonic structure in the high-frequency signal is preserved, and a better listening experience is obtained at the same bit rate setting.

Description

Sub-band encoding and decoding method and device based on the SILK codec

Technical field

The present invention relates to the field of audio coding and decoding, and in particular to a sub-band encoding and decoding method and device based on the SILK codec.

Background technology

With the development of the Internet, the demand for voice communication keeps growing. VoIP (Voice over Internet Protocol) technology, which is based on packet-switched voice, is increasingly favored by users for its low cost, easy extensibility, and good speech quality.

In VoIP, one of the mainstream coding schemes is SILK coding. It works as follows: at the encoding end, the speech signal is modeled, and the speech model decomposes the signal into a set of system parameters; these parameters are transmitted over the channel to the decoding end; the decoder recovers the parameters and then reconstructs the speech signal using the same speech model.

In the course of implementing the present invention, the inventors found that the prior art has at least the following problem:

In human speech, the high-frequency signal is usually less rich than the low-frequency signal. The SILK encoder, however, processes the low-frequency and high-frequency signals with a preset bit allocation, so the bit allocation is not efficient enough when coding wideband speech, which lowers the efficiency of low-frequency signal processing.

Summary of the invention

To solve the problem of the prior art, embodiments of the present invention provide a sub-band encoding and decoding method and device based on the SILK codec. The technical solution is as follows:

In one aspect, a sub-band encoding method based on the SILK codec is provided, the method comprising:

Obtaining the full-band time-domain signal corresponding to the current audio frame, and decomposing the full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal;

Performing SILK encoding on the low-frequency time-domain signal to generate the low-frequency parameters corresponding to the low-frequency time-domain signal, and encoding the high-frequency time-domain signal according to the low-frequency parameters to generate the high-frequency parameters corresponding to the high-frequency time-domain signal;

Quantizing and compressing the low-frequency parameters and the high-frequency parameters to generate the bit stream corresponding to the current audio frame.

Preferably, encoding the high-frequency time-domain signal to generate the corresponding high-frequency parameters comprises:

When the current audio frame is a voiced frame, converting the full-band time-domain signal into a full-band frequency-domain signal, taking the pitch period obtained when SILK encoding is performed on the low-frequency time-domain signal, and inputting the pitch period and the full-band frequency-domain signal into a harmonic structure analyzer to calculate the cutoff frequency of the harmonic structure;

Determining, according to the cutoff frequency of the harmonic structure, whether a harmonic structure exists in the high-frequency time-domain signal;

When a harmonic structure exists in the high-frequency time-domain signal, calculating a simulated high-frequency excitation from the low-frequency full excitation and the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal, together with the modulation frequency calculated from the cutoff frequency of the harmonic structure;

Inputting the high-frequency time-domain signal into a linear prediction coefficient (LPC) analyzer to calculate the true high-frequency excitation and the high-frequency line spectral pair (LSP) coefficients, and calculating a gain adjustment ratio from the simulated high-frequency excitation and the true high-frequency excitation;

Determining the modulation frequency, the gain adjustment ratio, and the high-frequency LSP coefficients as the high-frequency parameters corresponding to the high-frequency time-domain signal.

Preferably, calculating the simulated high-frequency excitation from the low-frequency full excitation and the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal, together with the modulation frequency calculated from the cutoff frequency of the harmonic structure, comprises:

Performing spectral translation on the low-frequency full excitation according to the modulation frequency to obtain a full-band excitation, and high-pass filtering the full-band excitation to obtain a first high-frequency excitation, the first high-frequency excitation carrying a harmonic structure;

Performing spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a second high-frequency excitation, the second high-frequency excitation carrying no harmonic structure;

Performing weighted mixing of the first high-frequency excitation and the second high-frequency excitation, according to the preset mixing coefficient corresponding to the first high-frequency excitation and the preset mixing coefficient corresponding to the second high-frequency excitation, to calculate the simulated high-frequency excitation.
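The weighted mixing in the last step can be illustrated with a minimal Python sketch. The function name `mix_excitations` and the default weights 0.7 and 0.3 are hypothetical; the text only says that each excitation has a preset mixing coefficient and does not give the values.

```python
def mix_excitations(harmonic_exc, noise_exc, w_harmonic=0.7, w_noise=0.3):
    """Weighted sum of two equal-length excitation signals.

    harmonic_exc: first high-frequency excitation (carries harmonic structure)
    noise_exc:    second high-frequency excitation (no harmonic structure)
    w_harmonic, w_noise: preset mixing coefficients (values assumed here)
    """
    if len(harmonic_exc) != len(noise_exc):
        raise ValueError("excitations must have the same length")
    return [w_harmonic * a + w_noise * b
            for a, b in zip(harmonic_exc, noise_exc)]
```

Because the coefficients are preset on both sides, the decoder can rebuild the same simulated excitation without any mixing weights being transmitted.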

Preferably, after determining whether a harmonic structure exists, the method further comprises:

When no harmonic structure exists in the high-frequency time-domain signal, or the current audio frame is an unvoiced frame, performing spectral folding and delay alignment on the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal to obtain a third high-frequency excitation, and determining the third high-frequency excitation as the simulated high-frequency excitation.

In another aspect, a sub-band decoding method based on the SILK codec is provided, the method comprising:

Obtaining the bit stream corresponding to the current audio frame, and decoding the bit stream with a parameter decoder to obtain low-frequency parameters and high-frequency parameters;

Decoding the low-frequency parameters with a SILK decoder to obtain a low-frequency time-domain signal, and decoding the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain a high-frequency time-domain signal;

Synthesizing the low-frequency time-domain signal and the high-frequency time-domain signal into a full-band time-domain signal with a QMF synthesizer, the full-band time-domain signal being the decoded audio data of the current audio frame.

Preferably, decoding the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain the high-frequency time-domain signal comprises:

Determining, according to the modulation frequency in the high-frequency parameters, whether a harmonic structure exists in the current audio frame;

When a harmonic structure exists in the audio data, obtaining the low-frequency full excitation and the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, and calculating a simulated high-frequency excitation from the low-frequency full excitation, the low-frequency unvoiced excitation, and the modulation frequency;

Inputting the high-frequency LPC coefficients and the gain adjustment ratio in the high-frequency parameters, together with the simulated high-frequency excitation, into an LPC synthesizer, and outputting the synthesized high-frequency time-domain signal.

Preferably, calculating the simulated high-frequency excitation from the low-frequency full excitation, the low-frequency unvoiced excitation, and the modulation frequency comprises:

Performing spectral translation on the low-frequency full excitation according to the modulation frequency to obtain a full-band excitation, and high-pass filtering the full-band excitation to obtain a fourth high-frequency excitation, the fourth high-frequency excitation carrying a harmonic structure;

Performing spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a fifth high-frequency excitation, the fifth high-frequency excitation carrying no harmonic structure;

Performing weighted mixing of the fourth high-frequency excitation and the fifth high-frequency excitation, according to the preset mixing coefficient corresponding to the fourth high-frequency excitation and the preset mixing coefficient corresponding to the fifth high-frequency excitation, to calculate the simulated high-frequency excitation.

Preferably, after determining whether a harmonic structure exists in the current audio frame, the method further comprises:

When no harmonic structure exists in the audio data, obtaining the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, performing spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a sixth high-frequency excitation, and determining the sixth high-frequency excitation as the simulated high-frequency excitation.

In another aspect, a sub-band encoding device based on the SILK codec is provided, the device comprising:

A first acquisition module, configured to obtain the full-band time-domain signal corresponding to the current audio frame and to decompose the full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal;

An encoding module, configured to perform SILK encoding on the low-frequency time-domain signal to generate the low-frequency parameters corresponding to the low-frequency time-domain signal, and to encode the high-frequency time-domain signal according to the low-frequency parameters to generate the high-frequency parameters corresponding to the high-frequency time-domain signal;

A generation module, configured to quantize and compress the low-frequency parameters and the high-frequency parameters to generate the bit stream corresponding to the current audio frame.

Preferably, the encoding module comprises:

A first computing unit, configured to, when the current audio frame is a voiced frame, convert the full-band time-domain signal into a full-band frequency-domain signal, take the pitch period obtained when SILK encoding is performed on the low-frequency time-domain signal, and input the pitch period and the full-band frequency-domain signal into a harmonic structure analyzer to calculate the cutoff frequency of the harmonic structure;

A first judging unit, configured to determine, according to the cutoff frequency of the harmonic structure, whether a harmonic structure exists in the high-frequency time-domain signal;

A second computing unit, configured to, when a harmonic structure exists in the high-frequency time-domain signal, calculate a simulated high-frequency excitation from the low-frequency full excitation and the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal, together with the modulation frequency calculated from the cutoff frequency of the harmonic structure;

A third computing unit, configured to input the high-frequency time-domain signal into a linear prediction coefficient (LPC) analyzer to calculate the true high-frequency excitation and the high-frequency line spectral pair (LSP) coefficients, and to calculate a gain adjustment ratio from the simulated high-frequency excitation and the true high-frequency excitation;

A determining unit, configured to determine the modulation frequency, the gain adjustment ratio, and the high-frequency LSP coefficients as the high-frequency parameters corresponding to the high-frequency time-domain signal.

Preferably, the second computing unit comprises:

A first processing subunit, configured to perform spectral translation on the low-frequency full excitation according to the modulation frequency to obtain a full-band excitation, and to high-pass filter the full-band excitation to obtain a first high-frequency excitation, the first high-frequency excitation carrying a harmonic structure;

A second processing subunit, configured to perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a second high-frequency excitation, the second high-frequency excitation carrying no harmonic structure;

A first computation subunit, configured to perform weighted mixing of the first high-frequency excitation and the second high-frequency excitation, according to the preset mixing coefficient corresponding to the first high-frequency excitation and the preset mixing coefficient corresponding to the second high-frequency excitation, to calculate the simulated high-frequency excitation.

Preferably, the encoding module further comprises:

A fourth computing unit, configured to, when no harmonic structure exists in the high-frequency time-domain signal or the current audio frame is an unvoiced frame, perform spectral folding and delay alignment on the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal to obtain a third high-frequency excitation, and to determine the third high-frequency excitation as the simulated high-frequency excitation.

In another aspect, a sub-band decoding device based on the SILK codec is provided, the device comprising:

A second acquisition module, configured to obtain the bit stream corresponding to the current audio frame and to decode the bit stream with a parameter decoder to obtain low-frequency parameters and high-frequency parameters;

A decoding module, configured to decode the low-frequency parameters with a SILK decoder to obtain a low-frequency time-domain signal, and to decode the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain a high-frequency time-domain signal;

A synthesis module, configured to synthesize the low-frequency time-domain signal and the high-frequency time-domain signal into a full-band time-domain signal with a QMF synthesizer, the full-band time-domain signal being the decoded audio data of the current audio frame.

Preferably, the decoding module comprises:

A second judging unit, configured to determine, according to the modulation frequency in the high-frequency parameters, whether a harmonic structure exists in the current audio frame;

A fifth computing unit, configured to, when a harmonic structure exists in the audio data, obtain the low-frequency full excitation and the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, and to calculate a simulated high-frequency excitation from the low-frequency full excitation, the low-frequency unvoiced excitation, and the modulation frequency;

A synthesis unit, configured to input the high-frequency LPC coefficients and the gain adjustment ratio in the high-frequency parameters, together with the simulated high-frequency excitation, into an LPC synthesizer and to output the synthesized high-frequency time-domain signal.

Preferably, the fifth computing unit comprises:

A third processing subunit, configured to perform spectral translation on the low-frequency full excitation according to the modulation frequency to obtain a full-band excitation, and to high-pass filter the full-band excitation to obtain a fourth high-frequency excitation, the fourth high-frequency excitation carrying a harmonic structure;

A fourth processing subunit, configured to perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a fifth high-frequency excitation, the fifth high-frequency excitation carrying no harmonic structure;

A second computation subunit, configured to perform weighted mixing of the fourth high-frequency excitation and the fifth high-frequency excitation, according to the preset mixing coefficient corresponding to the fourth high-frequency excitation and the preset mixing coefficient corresponding to the fifth high-frequency excitation, to calculate the simulated high-frequency excitation.

Preferably, the decoding module further comprises:

A sixth computing unit, configured to, when no harmonic structure exists in the audio data, obtain the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a sixth high-frequency excitation, and determine the sixth high-frequency excitation as the simulated high-frequency excitation.

The beneficial effects of the technical solution provided by the embodiments of the present invention are as follows:

The low-frequency signal is encoded by the SILK encoder and the high-frequency signal is encoded separately, so that more bit resources are allocated to the low-frequency signal and the high-frequency signal is encoded with relatively few bit resources, giving a more reasonable bit allocation. Coding efficiency is effectively improved and the harmonic structure in the high-frequency signal is preserved, so a better listening experience is obtained at the same bit rate setting.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of the sub-band encoding method based on the SILK codec provided by Embodiment 1 of the present invention;

Fig. 2 is a flowchart of the sub-band decoding method based on the SILK codec provided by Embodiment 2 of the present invention;

Fig. 3 is a flowchart of the sub-band encoding method based on the SILK codec provided by Embodiment 3 of the present invention;

Fig. 4 is a structural diagram of the encoder in the sub-band encoding and decoding method based on the SILK codec provided by Embodiment 3 of the present invention;

Fig. 5 is a flowchart of the sub-band encoding method based on the SILK codec provided by Embodiment 4 of the present invention;

Fig. 6 is a structural diagram of the decoder in the sub-band encoding and decoding method based on the SILK codec provided by Embodiment 4 of the present invention;

Fig. 7 is a schematic structural diagram of the sub-band encoding device based on the SILK codec provided by Embodiment 5 of the present invention;

Fig. 8 is a schematic structural diagram of the sub-band decoding device based on the SILK codec provided by Embodiment 6 of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.

Embodiment 1

This embodiment of the present invention provides a sub-band encoding method based on the SILK codec. Referring to Fig. 1, the method flow comprises:

101: Obtain the full-band time-domain signal corresponding to the current audio frame, and decompose the full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal;

102: Perform SILK encoding on the low-frequency time-domain signal to generate the low-frequency parameters corresponding to the low-frequency time-domain signal, and encode the high-frequency time-domain signal according to the low-frequency parameters to generate the high-frequency parameters corresponding to the high-frequency time-domain signal;

103: Quantize and compress the low-frequency parameters and the high-frequency parameters to generate the bit stream corresponding to the current audio frame.

In this embodiment, the low-frequency signal is encoded by the SILK encoder and the high-frequency signal is encoded separately, so that more bit resources are allocated to the low-frequency signal and the high-frequency signal is encoded with relatively few bit resources, giving a more reasonable bit allocation. Coding efficiency is effectively improved and the harmonic structure in the high-frequency signal is preserved, so a better listening experience is obtained at the same bit rate setting.

Embodiment 2

This embodiment of the present invention provides a sub-band decoding method based on the SILK codec. Referring to Fig. 2, the method flow comprises:

201: Obtain the bit stream corresponding to the current audio frame, and decode the bit stream with a parameter decoder to obtain low-frequency parameters and high-frequency parameters;

202: Decode the low-frequency parameters with a SILK decoder to obtain a low-frequency time-domain signal, and decode the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain a high-frequency time-domain signal;

203: Synthesize the low-frequency time-domain signal and the high-frequency time-domain signal into a full-band time-domain signal with a QMF synthesizer, the full-band time-domain signal being the decoded audio data of the current audio frame.

In this embodiment, audio data whose low-frequency and high-frequency signals were encoded separately is decoded band by band. The SILK decoder decodes the low-frequency parameters on their own, more bit resources are allocated to the low-frequency signal, and the harmonic structure carried by the high-frequency parameters is preserved, so a better listening experience is obtained at the same bit rate setting.

Embodiment 3

This embodiment of the present invention provides a sub-band encoding method based on the SILK codec; see Fig. 3. The structure of the audio encoder is shown in Fig. 4.

The method flow comprises:

301: Obtain the raw sampled digital signal through the analog-to-digital converter of a digital communication device, and frame and window it at a preset time interval to obtain full-band time-domain signals; obtain the full-band time-domain signal corresponding to the current frame data, and decompose this full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal.

Here, the raw sampled digital signal is audio data of a certain duration; after framing, the full-band time-domain signal corresponding to each frame of data is obtained.
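The framing and windowing of step 301 can be sketched in a few lines of Python. The frame length of 320 samples (20 ms at 16 kHz), the non-overlapping hop, and the Hamming window are all assumptions made for illustration; the text only specifies "a preset time interval" and does not name the window.

```python
import math

def frame_and_window(samples, frame_len=320, hop=320):
    """Split a sample sequence into frames and apply a Hamming window.

    frame_len and hop are in samples; both values are assumed here.
    Trailing samples that do not fill a whole frame are dropped.
    """
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + n] * window[n]
                       for n in range(frame_len)])
    return frames
```

Each returned frame plays the role of the full-band time-domain signal of one audio frame in the steps that follow.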

In this embodiment, the full-band time-domain signal is duplicated into two paths. One path is sent to the QMF (Quadrature Mirror Filter) analyzer unit 401, which decomposes the full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal. The other path is sent to the FFT (Fast Fourier Transform) unit 402 in the encoder, which converts the full-band time-domain signal into a full-band frequency-domain signal.

Taking a wideband signal with a sampling rate of 16 kHz as an example, the full-band time-domain signal s(n) first enters the QMF analyzer unit 401.

The QMF analysis filter bank that decomposes the full-band time-domain signal consists of two symmetric 64th-order high-pass and low-pass FIR (Finite Impulse Response) filters, whose impulse responses are related as follows:

h_{lp}(n) = (-1)^{n} h_{hp}(n)

The QMF analyzer 401 decomposes the original signal s(n) into the low-frequency time-domain signal y_{lb}(n) covering 0-4 kHz and the high-frequency time-domain signal y_{hb}(n) covering 4-8 kHz.

The low-frequency time-domain signal y_{lb}(n) and the high-frequency time-domain signal y_{hb}(n) are calculated as follows:

y_{lb}(n) = Σ_{i=0}^{31} h_{lp}(i) [s(n + 1 + i) + s(n - i)]

y_{hb}(n) = Σ_{i=0}^{31} h_{hp}(i) [s(n + 1 + i) + s(n - i)]
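As a rough illustration of the two-band split, the following Python sketch builds a 64-tap low-pass prototype, derives the high-pass filter through the (-1)^n relation, and filters and decimates each band by 2. The Hamming-windowed sinc design in `design_lowpass`, the direct-form convolution (instead of the folded 32-term sum above), and the decimation factor are assumptions for illustration; the patent's actual filter coefficients are not given in the text.

```python
import math

def design_lowpass(num_taps=64, cutoff=0.25):
    """Hamming-windowed sinc low-pass; cutoff is in cycles per sample.

    This design is an assumption, not the patent's filter.
    """
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]  # unity gain at 0 Hz

def qmf_analysis(s, h_lp):
    """Split s into (low band, high band), each decimated by 2."""
    # Mirror relation between the two impulse responses: multiply by (-1)^n.
    h_hp = [((-1) ** n) * h for n, h in enumerate(h_lp)]

    def filter_and_decimate(h):
        out = []
        for n in range(0, len(s), 2):  # keep every second output sample
            acc = 0.0
            for i, coeff in enumerate(h):
                if n - i >= 0:  # samples before the start are treated as zero
                    acc += coeff * s[n - i]
            out.append(acc)
        return out

    return filter_and_decimate(h_lp), filter_and_decimate(h_hp)
```

With a 16 kHz input, a 1 kHz tone should end up almost entirely in the first returned band and a 6 kHz tone almost entirely in the second, which matches the 0-4 kHz and 4-8 kHz split described above.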

Further, the low-frequency time-domain signal y_{lb}(n) enters the SILK encoder unit 403, which supports 8 kHz sampling. All low-frequency parameters are extracted in SILK's original coding manner, then quantized, compressed, and packed into the bit-stream payload. The coding and reconstruction of the high-frequency time-domain signal y_{hb}(n) use the classical source-filter model, in which the high-frequency time-domain signal is obtained by passing a high-frequency excitation through an LPC (Linear Prediction Coefficients) synthesizer. Under this model, high-frequency coding requires three basic elements: the high-frequency excitation signal, the high-frequency LSP (Line Spectral Pairs) coefficients of the high-frequency part, and the high-frequency gain, where the high-frequency gain is obtained by multiplying the low-frequency gain by the gain adjustment ratio.

302: Perform SILK encoding on the low-frequency time-domain signal to generate the low-frequency parameters corresponding to the low-frequency time-domain signal, and encode the high-frequency time-domain signal according to the low-frequency parameters to generate the high-frequency parameters corresponding to the high-frequency time-domain signal.

The low-frequency time-domain signal is encoded as follows:

3021: Perform SILK encoding on the low-frequency time-domain signal to generate the low-frequency parameters corresponding to the low-frequency time-domain signal.

The low-frequency time-domain signal y_{lb}(n) is encoded in SILK encoder unit 403. The parameters generated as low-frequency parameters include, but are not limited to, the irregular pulses, the pitch period and LTP (Long-Term Prediction) coefficients, the low-frequency LPC coefficients, the voiced/unvoiced decision parameter, and the low-frequency gain coefficient.

The high-frequency time-domain signal y_{hb}(n) may be encoded as follows:

3022: When the current audio frame is a voiced frame, convert the full-band time-domain signal into a full-band frequency-domain signal, take the pitch period obtained when SILK encoding is performed on the low-frequency time-domain signal, and input the pitch period and the full-band frequency-domain signal into the harmonic structure analyzer to calculate the cutoff frequency of the harmonic structure.

Meanwhile, the full-band time-domain signal is sent to the FFT (Fast Fourier Transform) unit 402 in the encoder, which converts it into a full-band frequency-domain signal. Whether the frame is an unvoiced signal or a voiced signal is determined by the voiced/unvoiced decision parameter of the SILK encoder.

The full-band frequency-domain signal and the pitch period are then input into harmonic structure analyzer module 404, which analyzes them to obtain the cutoff frequency of the harmonic structure.
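The inputs of the harmonic structure analyzer can be illustrated with a short Python sketch that evaluates the spectrum power at integer multiples of the fundamental frequency. This sketch swaps the FFT for a direct DFT evaluated only at the harmonic frequencies, which is simpler to show but not what unit 402 does. The function name `harmonic_powers`, the expression of the pitch period as a lag in samples, and the `max_hz` limit are assumptions.

```python
import cmath
import math

def harmonic_powers(frame, fs_hz, pitch_lag_samples, max_hz):
    """Return (F0, [|Y[m*F0]|^2 for m = 1, 2, ... while m*F0 <= max_hz]).

    frame: full-band time-domain samples of one frame
    pitch_lag_samples: pitch period in samples (assumed representation)
    """
    f0 = fs_hz / pitch_lag_samples  # fundamental frequency in Hz
    powers = []
    m = 1
    while m * f0 <= max_hz:
        w = 2.0 * math.pi * m * f0 / fs_hz  # radians per sample
        # Direct DFT at one frequency; an FFT would compute all bins at once.
        y = sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(frame))
        powers.append(abs(y) ** 2)
        m += 1
    return f0, powers
```

For example, at 16 kHz a pitch lag of 80 samples gives F0 = 200 Hz, so the returned list holds the power at 200 Hz, 400 Hz, 600 Hz, and so on up to `max_hz`.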

The principle is as follows: the pitch period determines the positions of the harmonics on the frequency axis, and harmonic structure analyzer 404 checks, from high frequency to low frequency, the harmonic amplitude |Y[m F_{0}]| at each integer multiple of the fundamental frequency F_{0}. The cutoff frequency of the harmonic structure is determined using the preset thresholds δ_{1} and δ_{2}.

|Y[m F_{0}]|^{2} - |Y[(m+1) F_{0}]|^{2} > δ_{1}

|Y[(m+1) F_{0}]|^{2} < δ_{2}

Satisfying the first inequality means that a transition point has been found at which the harmonics decay markedly; satisfying the second confirms that the subsequent amplitude is no longer large enough to count as an effective harmonic. Searching upward from low frequency, the first frequency position that satisfies both inequalities is the cutoff frequency of the harmonic structure.
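The search described above can be sketched in Python as follows. The index in the first inequality is garbled in the source text; this sketch assumes it compares harmonic m with harmonic m+1, which matches the description of a marked decay followed by a too-weak next harmonic. The function name, the list layout of `harmonic_power`, and the `None` return when nothing qualifies are also assumptions.

```python
def find_harmonic_cutoff(harmonic_power, f0_hz, delta1, delta2):
    """Scan harmonics from low to high frequency and return the cutoff in Hz.

    harmonic_power[m - 1] holds |Y[m*F0]|^2 for m = 1, 2, ...
    delta1, delta2 are the preset thresholds (values not given in the text).
    Returns None if no harmonic index satisfies both inequalities.
    """
    for m in range(1, len(harmonic_power)):
        p_m = harmonic_power[m - 1]   # |Y[m*F0]|^2
        p_next = harmonic_power[m]    # |Y[(m+1)*F0]|^2 (assumed index)
        if p_m - p_next > delta1 and p_next < delta2:
            return m * f0_hz
    return None
```

With powers `[10, 9, 8, 7, 0.5, 0.4]`, F0 = 200 Hz, δ1 = 3 and δ2 = 1, the first index that satisfies both tests is m = 4, so the function returns 800 Hz.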

3023: Determine, according to the cutoff frequency of the harmonic structure, whether a harmonic structure exists in the high-frequency time-domain signal.

If the cutoff frequency lies within 0-4 kHz, the high-frequency part has no harmonic structure, so the high-frequency excitation is obtained by modulating the low-frequency unvoiced excitation. If the cutoff frequency lies within 4-8 kHz, the high-frequency part still has some harmonic structure. In that case, half of the cutoff frequency is defined as the modulation frequency and is passed to the simulated high-frequency excitation generator unit 413 for further processing. When a harmonic structure exists in the high-frequency time-domain signal, step 3024 is performed; when no harmonic structure exists in the high-frequency time-domain signal, or the current frame is an unvoiced frame, step 3025 is performed.
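The branching of step 3023 can be summarized in a small Python sketch. The function name, the mode labels, and the handling of a cutoff exactly at 4 kHz (treated here as no high-band harmonic structure) are assumptions; the text only distinguishes the 0-4 kHz and 4-8 kHz ranges.

```python
def select_high_band_mode(is_voiced, cutoff_hz, split_hz=4000.0):
    """Choose how the simulated high-frequency excitation is generated.

    Returns ("harmonic", modulation_frequency_hz) when step 3024 applies,
    or ("unvoiced_only", None) when step 3025 applies.
    """
    if is_voiced and cutoff_hz is not None and cutoff_hz > split_hz:
        # The modulation frequency is defined as half the cutoff frequency.
        return "harmonic", cutoff_hz / 2.0
    return "unvoiced_only", None
```

For a voiced frame with a 6 kHz cutoff this selects the harmonic branch with a 3 kHz modulation frequency; an unvoiced frame, or a cutoff below 4 kHz, falls through to the unvoiced-excitation branch.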

3024: When a harmonic structure exists in the high-frequency time-domain signal, calculate the simulated high-frequency excitation from the complete low-frequency excitation and the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal, and from the modulation frequency calculated from the cutoff frequency of the harmonic structure.

This step is performed when the cutoff frequency lies between 4 and 8 kHz. In this embodiment, the simulated high-frequency excitation is a high-frequency excitation with harmonic structure obtained by mixing the low-frequency unvoiced excitation with the complete low-frequency excitation.

Half of the cutoff frequency is passed, as the calculated modulation frequency, to the spectrum translation unit 415 in the simulated high-frequency excitation generator unit 413.

The SILK encoder unit 403 also produces the complete low-frequency excitation and the low-frequency unvoiced excitation while encoding the low-frequency time-domain signal; these two signals are received from the SILK encoder unit 403 and passed to the simulated high-frequency excitation generator unit 413.

In step 3024, the simulated high-frequency excitation may specifically be calculated as follows:

30241: According to the modulation frequency, perform spectrum translation on the complete low-frequency excitation to obtain a full-band excitation, and high-pass filter the full-band excitation to obtain the first high-frequency excitation, which carries harmonic structure.

In this step, the modulation frequency and the complete low-frequency excitation are both input to the spectrum translation unit 415 in the simulated high-frequency excitation generator unit 413, and the part of the complete excitation in the band from 0 to Ω_m is shifted and copied to the band from Ω_m to 2Ω_m, according to the following formula:

u_{fb}(k) = u_{lb}(k) \cdot (1 + \zeta \cos(\Omega_m k))

Here the scaling factor ζ ∈ (1, 2) ensures that the signal energy is correct, and Ω_m is the modulation frequency. The full-band excitation u_fb(k) obtained above passes through the high-pass filter unit 406 to give the first high-frequency excitation u_{hb_v}(k); because the spectrum has been translated according to the modulation frequency, the first high-frequency excitation carries harmonic structure.
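
The spectrum translation formula can be illustrated with a short Python sketch. It is not the implementation of unit 415: the function name `spectrum_translate`, the default ζ = 1.5 (an arbitrary value inside the stated interval), and the convention that Ω_m is given in radians per sample (for example 2π·f_m/f_s for a modulation frequency f_m in Hz at sampling rate f_s) are assumptions. The subsequent high-pass filtering is deliberately left out.

```python
import math

def spectrum_translate(u_lb, omega_m, zeta=1.5):
    """Apply u_fb(k) = u_lb(k) * (1 + zeta * cos(omega_m * k)).

    u_lb    : complete low-frequency excitation samples
    omega_m : modulation frequency in radians per sample (assumed unit)
    zeta    : scaling factor, expected in (1, 2) per the text
    """
    return [u * (1.0 + zeta * math.cos(omega_m * k)) for k, u in enumerate(u_lb)]
```

Multiplying by the cosine creates copies of the low-band spectrum shifted by ±Ω_m, which is what moves the harmonic structure upward; the constant term 1 keeps the original band in the full-band excitation until the high-pass filter removes it.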

30242: Perform spectrum folding and delay alignment on the low-frequency unvoiced excitation to obtain the second high-frequency excitation, which carries no harmonic structure.

The low-frequency unvoiced excitation is input to the spectrum folding unit 407 and the delay alignment unit 408 in the simulated high-frequency excitation generator unit 413 to obtain the second high-frequency excitation u_{hb_uv}(k). Delay alignment compensates for the delay introduced by the high-pass filter.

In theory, spectrum folding obtains the second high-frequency excitation as follows: the low-frequency unvoiced excitation u_lb(k) is upsampled and converted into the full-band excitation u_fb(k) by the following formula, and the high-frequency excitation u_hb(k) is then obtained by high-pass filtering.

u_{fb}(k) = u_{lb}(k) \cdot (1 + (-1)^k)

Owing to the special nature of spectrum folding, the high-frequency excitation obtained by the steps above is equivalent to taking the negative of the low-frequency signal directly:

u_{hb}(k) = -u_{lb}(k)
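
The folding formula can be tried out with the Python sketch below. The helper names `spectrum_fold` and `mirror` are hypothetical, and the input to `spectrum_fold` is assumed to be already upsampled; delay alignment and high-pass filtering are not modeled. The second helper isolates the (-1)^k factor to show the property that spectrum folding relies on: multiplying a real sequence by (-1)^k moves a component at normalized frequency ω to π - ω.

```python
import math

def spectrum_fold(u_up):
    """Apply u_fb(k) = u_lb(k) * (1 + (-1)**k) to an upsampled excitation.

    Even-indexed samples are doubled and odd-indexed samples become zero.
    """
    return [u * (1 + (-1) ** k) for k, u in enumerate(u_up)]

def mirror(x):
    """Multiply by (-1)**k, which mirrors the spectrum of a real sequence
    about a quarter of the sampling rate (omega -> pi - omega)."""
    return [v * (-1) ** k for k, v in enumerate(x)]
```

Feeding `mirror` a cosine at ω = 0.3 rad/sample yields, sample for sample, a cosine at π - 0.3 rad/sample, so low-band content reappears in the high band.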

30243: Weight and mix the first high-frequency excitation and the second high-frequency excitation, according to the preset mixing coefficient corresponding to each, to calculate the simulated high-frequency excitation.

The final simulated high-frequency excitation u_hb(k) is mixed with the mixing coefficient α ∈ (0, 1) according to the following formula; the first and second high-frequency excitations are each assigned a mixing coefficient, and the two coefficients sum to 1.

u_{hb}(k) = \alpha \, u_{hb_v}(k) + (1 - \alpha) \, u_{hb_uv}(k)

The first high-frequency excitation u_{hb_v}(k) is not used directly as the final high-frequency excitation, for two reasons:

1. The harmonic structure obtained by spectrum translation covers only the band from 0 to 2Ω_m, so some unvoiced excitation signal must be mixed into the band from 2Ω_m to 8 kHz;

2. If different excitation generation methods were used for voiced and unvoiced frames and for different cutoff frequencies, consecutive frames could become discontinuous, which would impair the perceived sound quality.
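
The weighted mix of step 30243 amounts to a per-sample convex combination. A minimal Python sketch follows; the function name `mix_excitations`, the default α = 0.5 and the input checks are assumptions added for illustration, as the text only states that α lies in (0, 1) and that the two weights sum to 1.

```python
def mix_excitations(u_hb_v, u_hb_uv, alpha=0.5):
    """Return alpha * u_hb_v(k) + (1 - alpha) * u_hb_uv(k) for every sample k.

    u_hb_v  : first high-frequency excitation (carries harmonic structure)
    u_hb_uv : second high-frequency excitation (no harmonic structure)
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if len(u_hb_v) != len(u_hb_uv):
        raise ValueError("both excitations must have the same length")
    return [alpha * v + (1.0 - alpha) * uv for v, uv in zip(u_hb_v, u_hb_uv)]
```

Because the same mixing rule is applied to every frame that reaches this step, the excitation generator does not switch abruptly between methods, which is the continuity concern raised in the second reason above.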

3025: When no harmonic structure exists in the high-frequency time-domain signal, or the current audio frame is an unvoiced frame, perform spectrum folding and delay alignment on the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal to obtain the third high-frequency excitation, and determine the third high-frequency excitation as the simulated high-frequency excitation.

3026: Input the high-frequency time-domain signal into a linear prediction coefficient (LPC) analyzer to calculate the true high-frequency excitation and the high-frequency line spectral pair (LSP) coefficients, and calculate the gain adjustment ratio from the simulated high-frequency excitation and the true high-frequency excitation.

The LSP coefficients of the high-frequency part are calculated by inputting the high-frequency time-domain signal directly into the LPC analyzer unit 409, as follows:

First, according to the linear prediction model, the current sample x(n) can be formed as a linear combination of the past P samples x(n-i) with weights a_i, as in the following formula:

x(n) = \sum_{i=1}^{P} a_i \, x(n-i) + e(n) \qquad (1)

where e(n) is the prediction error. The residual signal output by the LPC analyzer unit 409 is the true high-frequency excitation.

The prediction coefficients {a_1, a_2, ..., a_P} are the high-frequency LPC coefficients and can be obtained by solving the following normal equations:

\sum_{i=1}^{P} a_i \, r(|i-j|) = r(j), \quad j = 1, \ldots, P \qquad (2)

where the autocorrelation coefficients r(i) are computed as

r(i) = \sum_{n=0}^{N-1-i} x(n) \, x(n+i) \qquad (3)

The process of calculating the true high-frequency excitation and the high-frequency LSP coefficients described above is therefore: first calculate the autocorrelation coefficients r(i) according to formula (3); then calculate the prediction coefficients {a_1, a_2, ..., a_P}, i.e. the high-frequency LPC coefficients, according to formula (2); finally calculate e(n), the true high-frequency excitation, according to formula (1).

In practical applications, the linear prediction coefficients can be solved efficiently by the Levinson-Durbin recursion. In addition, because of their better robustness, the LSP coefficients corresponding to each set of LPC coefficients are what is usually used in quantization and transmission.
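
The three calculations just described (autocorrelation, solving the normal equations by the Levinson-Durbin recursion, and forming the residual) can be sketched in plain Python. This is an illustrative sketch, not the analyzer unit 409: the function names are hypothetical, no windowing or bandwidth expansion is applied, samples before the start of the frame are taken as zero, and the conversion from LPC to LSP coefficients is not shown.

```python
def autocorrelation(x, order):
    """r(i) = sum_{n=0}^{N-1-i} x(n) * x(n+i) for i = 0..order."""
    n = len(x)
    return [sum(x[t] * x[t + i] for t in range(n - i)) for i in range(order + 1)]

def levinson_durbin(r, order):
    """Solve sum_i a_i r(|i-j|) = r(j), j = 1..order, for a_1..a_order.

    Returns (coefficients, final prediction error energy).
    """
    a = [0.0] * (order + 1)          # a[i] holds a_i; a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break                    # silent or degenerate frame: stop early
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= 1.0 - k * k
    return a[1:], err

def lpc_residual(x, a):
    """e(n) = x(n) - sum_i a_i x(n-i), with x(n) = 0 before the frame."""
    out = []
    for n in range(len(x)):
        pred = sum(a[i] * x[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        out.append(x[n] - pred)
    return out
```

The recursion needs on the order of P² operations instead of the P³ of a general linear solver, which is why it is the usual choice for this Toeplitz system.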

The gain adjustment ratio is calculated as follows:

The gain adjustment ratio of the high-frequency part is mainly used to compensate for the energy difference between the high-frequency excitation generated by the system model and the true high-frequency excitation. In this embodiment, the simulated high-frequency excitation produced by the simulated high-frequency excitation generator unit 413 enters the root-mean-square calculator unit 410, which computes:

u_{rms} = \sqrt{\frac{\sum_{k=0}^{N} u_{hb}(k)^2}{N}}

Similarly, the high-frequency time-domain signal enters the LPC analyzer unit 409 to obtain the residual signal, that is, the true high-frequency excitation signal that enters the LPC synthesizer in the inverse operation at the decoding end. This signal enters the root-mean-square calculator unit 412. The root-mean-square values calculated for the simulated and true high-frequency excitations enter the gain ratio calculator unit 411, where the value for the true high-frequency excitation is divided by the value for the simulated high-frequency excitation and limited by a threshold to give the gain adjustment ratio passed to the decoding end. This gain adjustment ratio is applied to all high-frequency excitation samples at the decoding end to adjust their energy to match the true high-frequency signal.
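
The gain calculation can be sketched as follows. The function names and, in particular, the limits `lo` and `hi` are hypothetical: the text says only that the ratio is threshold-limited and does not give the thresholds. The sketch divides by the number of samples actually summed, and returns the upper limit when the simulated excitation is silent, which is likewise an assumption.

```python
import math

def rms(u):
    """Root mean square of a frame of excitation samples."""
    return math.sqrt(sum(v * v for v in u) / len(u))

def gain_adjustment_ratio(u_true, u_sim, lo=0.0, hi=4.0):
    """RMS of the true excitation divided by RMS of the simulated excitation,
    clamped to [lo, hi] (placeholder limits, not taken from the text)."""
    sim = rms(u_sim)
    if sim == 0.0:
        return hi                    # avoid division by zero on a silent frame
    return min(max(rms(u_true) / sim, lo), hi)
```

At the decoding end the ratio would simply scale every simulated excitation sample, so that the synthesized high band has roughly the energy of the original.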

3027: Determine the modulation frequency, the gain adjustment ratio and the high-frequency LSP coefficients as the high-frequency parameters corresponding to the high-frequency time-domain signal.

Besides the original low-frequency parameters, the encoding end in this embodiment has three further parameters: the modulation frequency, the gain adjustment ratio and the high-frequency LSP coefficients.

303: Quantize and compress the low-frequency parameters and the high-frequency parameters to generate the bitstream corresponding to the current audio frame.

Embodiment four

This embodiment of the present invention provides a subband decoding method based on the SILK codec; see Fig. 5. The structure of the audio decoder is shown in Fig. 6.

The method flow comprises:

501: Obtain the bitstream corresponding to the current audio frame, and decode the bitstream with a parameter decoder to obtain low-frequency parameters and high-frequency parameters.

The audio decoding end receives the voice packet bitstream 601 and inputs it to the parameter decoder unit 602 in the audio decoder, which outputs the decoding parameters needed by the different modules.

The low-frequency parameters include, but are not limited to: the irregular pulses, the pitch period, the LTP coefficients, the low-frequency LPC coefficients, the voiced/unvoiced decision parameter and the low-frequency gain coefficient.

502: Decode the low-frequency parameters with a SILK decoder to obtain the low-frequency time-domain signal; and decode the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain the high-frequency time-domain signal.

The low-frequency parameters are decoded to obtain the low-frequency time-domain signal as follows:

5021: Decode the low-frequency parameters with the SILK decoder to obtain the low-frequency time-domain signal.

First, the parameter decoder unit 602 decodes the quantization indices of the low-frequency unvoiced excitation part, which are used to compute the irregular pulse signal in SILK. The unvoiced excitation generator unit 603 then generates the unvoiced excitation of the low-frequency part. Next, whether to enter the LTP synthesizer unit 604 is decided according to whether the frame is voiced or unvoiced.

If the frame is a voiced signal, the parameter decoder unit 602 decodes the pitch period and the LTP coefficients, which are input to the periodic-signal LTP synthesizer unit 604 to generate the low-frequency voiced excitation; the low-frequency unvoiced excitation and the low-frequency voiced excitation are added to obtain the complete low-frequency excitation. The complete low-frequency excitation finally enters the LPC synthesizer to obtain the final low-frequency time-domain signal.

If the frame is an unvoiced signal, the periodic-signal synthesizer unit 604 is skipped and the excitation enters the LPC synthesizer unit 605 directly to generate the low-frequency time-domain signal. Whether the frame is an unvoiced signal or a voiced signal is determined by the voiced/unvoiced decision parameter of the SILK encoder.
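
How the complete low-frequency excitation is assembled in the voiced and unvoiced cases can be sketched as below. This is a strong simplification and not SILK's actual long-term predictor: the text does not give the form of the LTP filter, so a single-tap predictor with one coefficient `ltp_coef` and an integer lag `pitch_lag` is assumed here purely for illustration, with no excitation history carried over from earlier frames. The function name is hypothetical.

```python
def complete_low_band_excitation(unvoiced, is_voiced, pitch_lag=0, ltp_coef=0.0):
    """Return the complete low-frequency excitation for one frame.

    Unvoiced frame: the unvoiced excitation is passed through unchanged.
    Voiced frame:   a voiced part, predicted from the excitation one pitch
                    period earlier (single-tap assumption), is added to it.
    """
    if not is_voiced:
        return list(unvoiced)
    if pitch_lag < 1:
        raise ValueError("a voiced frame needs a pitch lag of at least 1 sample")
    complete = []
    for n, u in enumerate(unvoiced):
        voiced = ltp_coef * complete[n - pitch_lag] if n >= pitch_lag else 0.0
        complete.append(u + voiced)
    return complete
```

With a single pulse as unvoiced excitation, a lag of 2 and a coefficient of 0.5, the voiced branch produces a decaying pulse train at the pitch period, which is the periodic component the LTP synthesizer contributes.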

In this embodiment, the SILK low-frequency decoder unit 612 works on the same principle as a SILK decoder.

The high-frequency parameters are decoded to obtain the high-frequency time-domain signal as follows:

5022: According to the modulation frequency in the high-frequency parameters, judge whether a harmonic structure exists in the current audio frame.

Whether a harmonic structure exists in the audio data is judged from the modulation frequency among the high-frequency parameters obtained by the parameter decoder unit 602. When the modulation frequency lies between 0 and 2 kHz, it is determined that no harmonic structure exists, and step 5024 is performed; when the modulation frequency lies between 2 and 4 kHz, it is determined that a harmonic structure exists, and step 5023 is performed.
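
The decoder-side decision mirrors the encoder's: since the modulation frequency is half the cutoff frequency, the 4 kHz boundary on the cutoff becomes a 2 kHz boundary here. A minimal Python sketch, in which the function name and the treatment of exactly 2 kHz as having harmonic structure are assumptions:

```python
def frame_has_harmonic_structure(modulation_hz):
    """True when the decoded modulation frequency lies in the 2 to 4 kHz range,
    i.e. when the encoder found a harmonic cutoff between 4 and 8 kHz."""
    return 2000.0 <= modulation_hz <= 4000.0
```

A frame for which this returns True goes through spectrum translation, spectrum folding and mixing; otherwise only the folded unvoiced excitation is used.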

5023: When a harmonic structure exists in the current audio frame, obtain the complete low-frequency excitation and the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, and calculate the simulated high-frequency excitation from the complete low-frequency excitation, the low-frequency unvoiced excitation and the modulation frequency.

Specifically, step 5023 may proceed as follows:

50231: According to the modulation frequency, perform spectrum translation on the complete low-frequency excitation to obtain a full-band excitation, and high-pass filter the full-band excitation to obtain the fourth high-frequency excitation, which carries harmonic structure.

The complete low-frequency excitation output by the LTP synthesizer unit 604 is input to the spectrum translation unit 608 in the high-frequency decoder unit 613, together with the modulation frequency among the high-frequency parameters obtained by the parameter decoder unit 602. The full-band excitation obtained in the spectrum translation unit 608 passes through the high-pass filter unit 609 to give the fourth high-frequency excitation. The calculation involved in this step is the same as in step 30241 of Embodiment two and is not repeated here.

50232: Perform spectrum folding and delay alignment on the low-frequency unvoiced excitation to obtain the fifth high-frequency excitation, which carries no harmonic structure.

The low-frequency unvoiced excitation output by the unvoiced excitation generator unit 603 is input to the spectrum folding unit 606 and the delay alignment unit 607 in the high-frequency decoder unit 613. The calculation is the same as in step 30242 of Embodiment two and is not repeated here.

50233: Weight and mix the fourth high-frequency excitation and the fifth high-frequency excitation, according to the preset mixing coefficient corresponding to each, to calculate the simulated high-frequency excitation.

The calculation in this step is the same as in step 30243 of Embodiment two and is not repeated here.

5024: When no harmonic structure exists in the current audio frame, obtain the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, perform spectrum folding and delay alignment on it to obtain the sixth high-frequency excitation, and determine the sixth high-frequency excitation as the simulated high-frequency excitation.

The calculation is the same as in step 3025 of Embodiment two and is not repeated here.

5025: Input the high-frequency LPC coefficients and the gain adjustment ratio in the high-frequency parameters, together with the simulated high-frequency excitation, into an LPC synthesizer, and output the synthesized high-frequency time-domain signal.

The high-frequency LPC coefficients and the gain adjustment ratio among the high-frequency parameters obtained by the parameter decoder unit 602, together with the simulated high-frequency excitation calculated in step 5023, are input to the LPC synthesizer unit 610 to synthesize the high-frequency time-domain signal.
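
LPC synthesis is the inverse of the analysis in formula (1): the scaled excitation is fed through the all-pole filter defined by the prediction coefficients. The sketch below assumes the same sign convention as formula (1), applies the gain adjustment ratio as a plain multiplier on the excitation, and starts from a zero filter state; the function name is hypothetical, and the conversion of transmitted LSP coefficients back to LPC coefficients is not shown.

```python
def lpc_synthesize(excitation, a, gain=1.0):
    """y(n) = gain * u(n) + sum_{i=1}^{P} a_i * y(n-i), zero initial state.

    excitation : simulated high-frequency excitation u(n)
    a          : prediction coefficients [a_1, ..., a_P]
    gain       : gain adjustment ratio applied to every excitation sample
    """
    y = []
    for n, u in enumerate(excitation):
        acc = gain * u
        for i, coef in enumerate(a, start=1):
            if n - i >= 0:
                acc += coef * y[n - i]
        y.append(acc)
    return y
```

For instance, a unit pulse with a = [0.5] and a gain of 2 yields the decaying sequence 2, 1, 0.5, the impulse response of the first-order synthesis filter scaled by the gain.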

503: Synthesize the low-frequency time-domain signal and the high-frequency time-domain signal into a full-band time-domain signal with a QMF synthesizer; the full-band time-domain signal is the decoded audio data of the current audio frame.

Embodiment five

This embodiment of the present invention provides a subband encoding device based on the SILK codec; see Fig. 7. The device comprises:

A first acquisition module 701, configured to obtain the full-band time-domain signal corresponding to the current audio frame, and to decompose the full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal;

A coding module 702, configured to perform SILK encoding on the low-frequency time-domain signal to generate the low-frequency parameters corresponding to the low-frequency time-domain signal, and to encode the high-frequency time-domain signal according to the low-frequency parameters to generate the high-frequency parameters corresponding to the high-frequency time-domain signal;

A generation module 703, configured to quantize and compress the low-frequency parameters and the high-frequency parameters to generate the bitstream corresponding to the current audio frame.

The coding module 702 comprises:

The second computing unit comprises:

The coding module 702 further comprises:

Embodiment six

This embodiment of the present invention provides a subband decoding device based on the SILK codec; see Fig. 8. The device comprises:

A second acquisition module 801, configured to obtain the bitstream corresponding to the current audio frame, and to decode the bitstream with a parameter decoder to obtain low-frequency parameters and high-frequency parameters;

A decoder module 802, configured to decode the low-frequency parameters with a SILK decoder to obtain the low-frequency time-domain signal, and to decode the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain the high-frequency time-domain signal;

A synthesis module 803, configured to synthesize the low-frequency time-domain signal and the high-frequency time-domain signal into a full-band time-domain signal with a QMF synthesizer, the full-band time-domain signal being the decoded audio data of the current audio frame.

The decoder module 802 comprises:

A fifth computing unit, configured to, when a harmonic structure exists in the current audio frame, obtain the complete low-frequency excitation and the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, and calculate the simulated high-frequency excitation from the complete low-frequency excitation, the low-frequency unvoiced excitation and the modulation frequency;

The fifth computing unit comprises:

The decoder module 802 further comprises:

A sixth computing unit, configured to, when no harmonic structure exists in the current audio frame, obtain the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, perform spectrum folding and delay alignment on the low-frequency unvoiced excitation to obtain the sixth high-frequency excitation, and determine the sixth high-frequency excitation as the simulated high-frequency excitation;

The sequence numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.

The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A subband encoding method based on the SILK codec, characterized in that the method comprises:

2. The method according to claim 1, characterized in that encoding according to the full-band time-domain signal and the high-frequency time-domain signal to generate the high-frequency parameters corresponding to the high-frequency time-domain signal comprises:

3. The method according to claim 2, characterized in that calculating the simulated high-frequency excitation from the complete low-frequency excitation and the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal, and from the modulation frequency calculated from the cutoff frequency of the harmonic structure, comprises:

4. The method according to claim 2, characterized in that, after judging whether a harmonic structure exists in the full-band time-domain signal, the method further comprises:

5. A subband decoding method based on the SILK codec, characterized in that the method comprises:

6. The method according to claim 5, characterized in that decoding the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain the high-frequency time-domain signal comprises:

When a harmonic structure exists in the current audio frame, obtaining the complete low-frequency excitation and the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, and calculating the simulated high-frequency excitation from the complete low-frequency excitation, the low-frequency unvoiced excitation and the modulation frequency;

7. The method according to claim 6, characterized in that calculating the simulated high-frequency excitation from the complete low-frequency excitation, the low-frequency unvoiced excitation and the modulation frequency comprises:

8. The method according to claim 6, characterized in that, after judging whether a harmonic structure exists in the current audio frame, the method further comprises:

When no harmonic structure exists in the current audio frame, obtaining the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, performing spectrum folding and delay alignment on the low-frequency unvoiced excitation to obtain the sixth high-frequency excitation, and determining the sixth high-frequency excitation as the simulated high-frequency excitation.

9. A subband encoding device based on the SILK codec, characterized in that the device comprises:

10. The device according to claim 9, characterized in that the coding module comprises:

11. The device according to claim 10, characterized in that the second computing unit comprises:

12. The device according to claim 10, characterized in that the coding module further comprises:

13. A subband decoding device based on the SILK codec, characterized in that the device comprises:

14. The device according to claim 13, characterized in that the decoder module comprises:

15. The device according to claim 14, characterized in that the fifth computing unit comprises:

16. The device according to claim 14, characterized in that the decoder module further comprises:

A sixth computing unit, configured to, when no harmonic structure exists in the current audio frame, obtain the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, perform spectrum folding and delay alignment on the low-frequency unvoiced excitation to obtain the sixth high-frequency excitation, and determine the sixth high-frequency excitation as the simulated high-frequency excitation.