CN103714822A - Sub-band coding and decoding method and device based on SILK coder decoder - Google Patents


Info

Publication number
CN103714822A
Authority
CN
China
Prior art keywords
frequency
high frequency
domain signal
time
excitation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310740505.7A
Other languages
Chinese (zh)
Other versions
CN103714822B (en)
Inventor
陈若非
高泽华
邢世义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cubesili Information Technology Co Ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd filed Critical Guangzhou Huaduo Network Technology Co Ltd
Priority to CN201310740505.7A priority Critical patent/CN103714822B/en
Publication of CN103714822A publication Critical patent/CN103714822A/en
Application granted granted Critical
Publication of CN103714822B publication Critical patent/CN103714822B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a sub-band encoding and decoding method and device based on the SILK codec, belonging to the field of audio encoding and decoding. The method comprises: obtaining the full-band time-domain signal corresponding to the current audio frame; dividing the full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal; performing SILK encoding on the low-frequency time-domain signal to generate the corresponding low-frequency parameters; encoding the high-frequency time-domain signal according to the low-frequency parameters to generate the corresponding high-frequency parameters; and quantizing and compressing the low-frequency and high-frequency parameters to generate the bit stream corresponding to the current audio frame. More bit resources are allocated to the low-frequency signal, while the high-frequency signal is encoded with relatively few bit resources, giving a more reasonable allocation of bit resources. Coding efficiency is effectively improved, the harmonic structure of the high-frequency signal is preserved, and better perceived audio quality is obtained at the same bit rate setting.

Description

Sub-band encoding and decoding method and device based on the SILK codec
Technical Field
The present invention relates to the field of audio encoding and decoding, and in particular to a sub-band encoding and decoding method and device based on the SILK codec.
Background
With the development of the Internet, demand for voice communication keeps growing, and VoIP (Voice over Internet Protocol) technology, which is based on voice packet switching, is increasingly favored by users for its low cost, easy extensibility and good speech quality.
In VoIP technology, a relatively mainstream coding scheme is SILK coding, which works as follows: at the encoding end, the speech signal is modeled, and the speech model decomposes the signal into different system parameters; these parameters are transmitted to the decoding end over the channel; the decoder recovers the parameters and then reconstructs the speech signal according to the same speech model.
In the course of realizing the present invention, the inventors found that the prior art has at least the following problem:
In human speech, the high-frequency signal is usually less rich than the low-frequency signal. The SILK encoder processes the high-frequency and low-frequency signals with preset bit resources, so bit allocation is not efficient enough when encoding wideband speech, which reduces the efficiency of low-frequency signal processing.
Summary of the Invention
To solve the problem of the prior art, embodiments of the present invention provide a sub-band encoding and decoding method and device based on the SILK codec. The technical solution is as follows:
In one aspect, a sub-band encoding method based on the SILK codec is provided, the method comprising:
Obtaining the full-band time-domain signal corresponding to the current audio frame, and decomposing the full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal;
Performing SILK encoding on the low-frequency time-domain signal to generate the low-frequency parameters corresponding to the low-frequency time-domain signal, and encoding the high-frequency time-domain signal according to the low-frequency parameters to generate the high-frequency parameters corresponding to the high-frequency time-domain signal;
Quantizing and compressing the low-frequency parameters and the high-frequency parameters to generate the bit stream corresponding to the current audio frame.
Preferably, the encoding, according to the full-band time-domain signal and the high-frequency time-domain signal, to generate the high-frequency parameters corresponding to the high-frequency time-domain signal comprises:
When the current audio frame is a voiced frame, converting the full-band time-domain signal into a full-band frequency-domain signal, obtaining the pitch period produced when SILK encoding is performed on the low-frequency time-domain signal, and inputting the pitch period and the full-band frequency-domain signal to a harmonic structure analyzer to calculate the cutoff frequency of the harmonic structure;
Judging, according to the cutoff frequency of the harmonic structure, whether a harmonic structure exists in the high-frequency time-domain signal;
When a harmonic structure exists in the high-frequency time-domain signal, calculating a simulated high-frequency excitation from the complete low-frequency excitation and the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal, and from the modulation frequency calculated from the cutoff frequency of the harmonic structure;
Inputting the high-frequency time-domain signal to a linear prediction coefficient (LPC) analyzer to calculate the actual high-frequency excitation and the high-frequency line spectral pair (LSP) coefficients, and calculating a gain adjustment ratio from the simulated high-frequency excitation and the actual high-frequency excitation;
Determining the modulation frequency, the gain adjustment ratio and the high-frequency LSP coefficients as the high-frequency parameters corresponding to the high-frequency time-domain signal.
Preferably, the calculating of the simulated high-frequency excitation from the complete low-frequency excitation and the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal, and from the modulation frequency calculated from the cutoff frequency of the harmonic structure, comprises:
Spectrally shifting the complete low-frequency excitation according to the modulation frequency to obtain a full-band excitation, and high-pass filtering the full-band excitation to obtain a first high-frequency excitation, the first high-frequency excitation carrying a harmonic structure;
Performing spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a second high-frequency excitation, the second high-frequency excitation carrying no harmonic structure;
Weighting and mixing the first high-frequency excitation and the second high-frequency excitation, according to the preset mixing coefficient corresponding to the first high-frequency excitation and the preset mixing coefficient corresponding to the second high-frequency excitation, to calculate the simulated high-frequency excitation.
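The weighted mixing in the last step can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name `mix_excitations`, the argument names and the example coefficient values are assumptions, since the text only states that each excitation has its own preset mixing coefficient.

```python
from typing import List, Sequence


def mix_excitations(
    harmonic_exc: Sequence[float],
    noise_exc: Sequence[float],
    w_harmonic: float,
    w_noise: float,
) -> List[float]:
    """Weighted sum of two equally long excitation signals.

    harmonic_exc: first high-frequency excitation (carries harmonic structure)
    noise_exc:    second high-frequency excitation (no harmonic structure)
    w_harmonic, w_noise: preset mixing coefficients (values are assumptions)
    """
    if len(harmonic_exc) != len(noise_exc):
        raise ValueError("excitations must have the same length")
    return [w_harmonic * h + w_noise * u for h, u in zip(harmonic_exc, noise_exc)]


# Example with illustrative coefficients only.
simulated = mix_excitations([1.0, 2.0], [3.0, 4.0], 0.5, 0.25)
```

Because both excitations are derived from signals that the decoder also has, the same mixing can be repeated at the decoding end without transmitting the excitation itself.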
Preferably, after the judging of whether a harmonic structure exists in the full-band time-domain signal, the method further comprises:
When no harmonic structure exists in the high-frequency time-domain signal, or the current audio frame is an unvoiced frame, performing spectral folding and delay alignment on the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal to obtain a third high-frequency excitation, and determining the third high-frequency excitation as the simulated high-frequency excitation.
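One common way to realize spectral folding on a sampled signal is to multiply sample n by (-1)^n, which mirrors the spectrum about one quarter of the sampling rate. The sketch below assumes that realization; the text does not say how the folding is carried out, the function name `spectral_fold` is hypothetical, and delay alignment is not shown.

```python
from typing import List, Sequence


def spectral_fold(x: Sequence[float]) -> List[float]:
    """Flip the sign of every odd-indexed sample.

    Multiplying by (-1)^n shifts the spectrum by half the sampling rate,
    so a component at frequency f reappears at fs/2 - f.
    This is an assumed realization, not one stated in the source text.
    """
    return [v if n % 2 == 0 else -v for n, v in enumerate(x)]


folded = spectral_fold([1.0, 2.0, 3.0, 4.0])
```

Applying the function twice returns the original signal, which is a quick sanity check on any folding step of this kind.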
In another aspect, a sub-band decoding method based on the SILK codec is provided, the method comprising:
Obtaining the bit stream corresponding to the current audio frame, and decoding the bit stream with a parameter decoder to obtain low-frequency parameters and high-frequency parameters;
Decoding the low-frequency parameters with a SILK decoder to obtain a low-frequency time-domain signal, and decoding the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain a high-frequency time-domain signal;
Synthesizing the low-frequency time-domain signal and the high-frequency time-domain signal into a full-band time-domain signal with a QMF synthesizer, the full-band time-domain signal being the decoded audio data of the current audio frame.
Preferably, the decoding of the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain the high-frequency time-domain signal comprises:
Judging, according to the modulation frequency in the high-frequency parameters, whether a harmonic structure exists in the current audio frame;
When a harmonic structure exists in the audio data, obtaining the complete low-frequency excitation and the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, and calculating a simulated high-frequency excitation from the complete low-frequency excitation, the low-frequency unvoiced excitation and the modulation frequency;
Inputting the high-frequency LPC coefficients and the gain adjustment ratio in the high-frequency parameters, together with the simulated high-frequency excitation, to an LPC synthesizer, and outputting the synthesized high-frequency time-domain signal.
Preferably, the calculating of the simulated high-frequency excitation from the complete low-frequency excitation, the low-frequency unvoiced excitation and the modulation frequency comprises:
Spectrally shifting the complete low-frequency excitation according to the modulation frequency to obtain a full-band excitation, and high-pass filtering the full-band excitation to obtain a fourth high-frequency excitation, the fourth high-frequency excitation carrying a harmonic structure;
Performing spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a fifth high-frequency excitation, the fifth high-frequency excitation carrying no harmonic structure;
Weighting and mixing the fourth high-frequency excitation and the fifth high-frequency excitation, according to the preset mixing coefficient corresponding to the fourth high-frequency excitation and the preset mixing coefficient corresponding to the fifth high-frequency excitation, to calculate the simulated high-frequency excitation.
Preferably, after the judging of whether a harmonic structure exists in the current audio frame, the method further comprises:
When no harmonic structure exists in the audio data, obtaining the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, performing spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a sixth high-frequency excitation, and determining the sixth high-frequency excitation as the simulated high-frequency excitation.
In another aspect, a sub-band encoding device based on the SILK codec is provided, the device comprising:
A first acquisition module, configured to obtain the full-band time-domain signal corresponding to the current audio frame, and to decompose the full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal;
An encoding module, configured to perform SILK encoding on the low-frequency time-domain signal to generate the low-frequency parameters corresponding to the low-frequency time-domain signal, and to encode the high-frequency time-domain signal according to the low-frequency parameters to generate the high-frequency parameters corresponding to the high-frequency time-domain signal;
A generation module, configured to quantize and compress the low-frequency parameters and the high-frequency parameters to generate the bit stream corresponding to the current audio frame.
Preferably, the encoding module comprises:
A first computing unit, configured to, when the current audio frame is a voiced frame, convert the full-band time-domain signal into a full-band frequency-domain signal, obtain the pitch period produced when SILK encoding is performed on the low-frequency time-domain signal, and input the pitch period and the full-band frequency-domain signal to a harmonic structure analyzer to calculate the cutoff frequency of the harmonic structure;
A first judging unit, configured to judge, according to the cutoff frequency of the harmonic structure, whether a harmonic structure exists in the high-frequency time-domain signal;
A second computing unit, configured to, when a harmonic structure exists in the high-frequency time-domain signal, calculate a simulated high-frequency excitation from the complete low-frequency excitation and the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal, and from the modulation frequency calculated from the cutoff frequency of the harmonic structure;
A third computing unit, configured to input the high-frequency time-domain signal to a linear prediction coefficient (LPC) analyzer to calculate the actual high-frequency excitation and the high-frequency line spectral pair (LSP) coefficients, and to calculate a gain adjustment ratio from the simulated high-frequency excitation and the actual high-frequency excitation;
A determining unit, configured to determine the modulation frequency, the gain adjustment ratio and the high-frequency LSP coefficients as the high-frequency parameters corresponding to the high-frequency time-domain signal.
Preferably, the second computing unit comprises:
A first processing subunit, configured to spectrally shift the complete low-frequency excitation according to the modulation frequency to obtain a full-band excitation, and to high-pass filter the full-band excitation to obtain a first high-frequency excitation, the first high-frequency excitation carrying a harmonic structure;
A second processing subunit, configured to perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a second high-frequency excitation, the second high-frequency excitation carrying no harmonic structure;
A first computation subunit, configured to weight and mix the first high-frequency excitation and the second high-frequency excitation, according to the preset mixing coefficient corresponding to the first high-frequency excitation and the preset mixing coefficient corresponding to the second high-frequency excitation, to calculate the simulated high-frequency excitation.
Preferably, the encoding module further comprises:
A fourth computing unit, configured to, when no harmonic structure exists in the high-frequency time-domain signal or the current audio frame is an unvoiced frame, perform spectral folding and delay alignment on the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal to obtain a third high-frequency excitation, and to determine the third high-frequency excitation as the simulated high-frequency excitation.
In another aspect, a sub-band decoding device based on the SILK codec is provided, the device comprising:
A second acquisition module, configured to obtain the bit stream corresponding to the current audio frame, and to decode the bit stream with a parameter decoder to obtain low-frequency parameters and high-frequency parameters;
A decoding module, configured to decode the low-frequency parameters with a SILK decoder to obtain a low-frequency time-domain signal, and to decode the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain a high-frequency time-domain signal;
A synthesis module, configured to synthesize the low-frequency time-domain signal and the high-frequency time-domain signal into a full-band time-domain signal with a QMF synthesizer, the full-band time-domain signal being the decoded audio data of the current audio frame.
Preferably, the decoding module comprises:
A second judging unit, configured to judge, according to the modulation frequency in the high-frequency parameters, whether a harmonic structure exists in the current audio frame;
A fifth computing unit, configured to, when a harmonic structure exists in the audio data, obtain the complete low-frequency excitation and the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, and to calculate a simulated high-frequency excitation from the complete low-frequency excitation, the low-frequency unvoiced excitation and the modulation frequency;
A synthesis unit, configured to input the high-frequency LPC coefficients and the gain adjustment ratio in the high-frequency parameters, together with the simulated high-frequency excitation, to an LPC synthesizer, and to output the synthesized high-frequency time-domain signal.
Preferably, the fifth computing unit comprises:
A third processing subunit, configured to spectrally shift the complete low-frequency excitation according to the modulation frequency to obtain a full-band excitation, and to high-pass filter the full-band excitation to obtain a fourth high-frequency excitation, the fourth high-frequency excitation carrying a harmonic structure;
A fourth processing subunit, configured to perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a fifth high-frequency excitation, the fifth high-frequency excitation carrying no harmonic structure;
A second computation subunit, configured to weight and mix the fourth high-frequency excitation and the fifth high-frequency excitation, according to the preset mixing coefficient corresponding to the fourth high-frequency excitation and the preset mixing coefficient corresponding to the fifth high-frequency excitation, to calculate the simulated high-frequency excitation.
Preferably, the decoding module further comprises:
A sixth computing unit, configured to, when no harmonic structure exists in the audio data, obtain the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a sixth high-frequency excitation, and determine the sixth high-frequency excitation as the simulated high-frequency excitation.
The beneficial effects of the technical solution provided by the embodiments of the present invention are as follows:
The low-frequency signal is encoded by the SILK encoder and the high-frequency signal is encoded separately, so that more bit resources are allocated to the low-frequency signal while the high-frequency signal is encoded with relatively few bit resources, giving a more reasonable allocation of bit resources. Coding efficiency is effectively improved and the harmonic structure in the high-frequency signal is preserved, so that better perceived audio quality is obtained at the same bit rate setting.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the sub-band encoding method based on the SILK codec provided by Embodiment 1 of the present invention;
Fig. 2 is a flow chart of the sub-band decoding method based on the SILK codec provided by Embodiment 2 of the present invention;
Fig. 3 is a flow chart of the sub-band encoding method based on the SILK codec provided by Embodiment 3 of the present invention;
Fig. 4 is a structural diagram of the encoder in the sub-band decoding method based on the SILK codec provided by Embodiment 3 of the present invention;
Fig. 5 is a flow chart of the sub-band encoding method based on the SILK codec provided by Embodiment 4 of the present invention;
Fig. 6 is a structural diagram of the decoder in the sub-band decoding method based on the SILK codec provided by Embodiment 4 of the present invention;
Fig. 7 is a schematic structural diagram of the sub-band encoding device based on the SILK codec provided by Embodiment 5 of the present invention;
Fig. 8 is a schematic structural diagram of the sub-band decoding device based on the SILK codec provided by Embodiment 6 of the present invention.
Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
Embodiment 1
This embodiment of the present invention provides a sub-band encoding method based on the SILK codec. Referring to Fig. 1, the method flow comprises:
101: Obtain the full-band time-domain signal corresponding to the current audio frame, and decompose the full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal;
102: Perform SILK encoding on the low-frequency time-domain signal to generate the low-frequency parameters corresponding to the low-frequency time-domain signal, and encode the high-frequency time-domain signal according to the low-frequency parameters to generate the high-frequency parameters corresponding to the high-frequency time-domain signal;
103: Quantize and compress the low-frequency parameters and the high-frequency parameters to generate the bit stream corresponding to the current audio frame.
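The data flow of steps 101 to 103 can be sketched as a small Python function. This is only a structural sketch: `encode_frame` and its four callable arguments are hypothetical names, and the actual band splitter, SILK encoder, high-band encoder and bit-stream packer are supplied by the caller rather than implemented here.

```python
from typing import Any, Callable, Sequence, Tuple

Signal = Sequence[float]


def encode_frame(
    frame: Signal,
    split_bands: Callable[[Signal], Tuple[Signal, Signal]],
    encode_low_band: Callable[[Signal], Any],
    encode_high_band: Callable[[Signal, Any], Any],
    pack: Callable[[Any, Any], bytes],
) -> bytes:
    """Run one frame through the three steps of the encoding flow."""
    # Step 101: split the full-band frame into low and high bands.
    low_band, high_band = split_bands(frame)
    # Step 102 (low band): SILK encoding yields the low-frequency parameters.
    low_params = encode_low_band(low_band)
    # Step 102 (high band): the high-band encoder also receives the
    # low-frequency parameters, as the method requires.
    high_params = encode_high_band(high_band, low_params)
    # Step 103: quantize, compress and pack both parameter sets.
    return pack(low_params, high_params)
```

The point of the sketch is the dependency order: the high-band encoder cannot run until the low-band parameters exist, which is what lets it reuse them instead of spending bits of its own.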
In this embodiment of the present invention, the low-frequency signal is encoded by the SILK encoder and the high-frequency signal is encoded separately, so that more bit resources are allocated to the low-frequency signal while the high-frequency signal is encoded with relatively few bit resources, giving a more reasonable allocation of bit resources. Coding efficiency is effectively improved and the harmonic structure in the high-frequency signal is preserved, so that better perceived audio quality is obtained at the same bit rate setting.
Embodiment 2
This embodiment of the present invention provides a sub-band decoding method based on the SILK codec. Referring to Fig. 2, the method flow comprises:
201: Obtain the bit stream corresponding to the current audio frame, and decode the bit stream with a parameter decoder to obtain low-frequency parameters and high-frequency parameters;
202: Decode the low-frequency parameters with a SILK decoder to obtain a low-frequency time-domain signal, and decode the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain a high-frequency time-domain signal;
203: Synthesize the low-frequency time-domain signal and the high-frequency time-domain signal into a full-band time-domain signal with a QMF synthesizer; the full-band time-domain signal is the decoded audio data of the current audio frame.
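Steps 201 to 203 can be sketched in the same style. Again this is only a structural sketch under assumptions: `decode_frame` and its callable arguments are hypothetical names, and the low-band decoder is assumed to return both the decoded signal and its intermediate parameters as a pair.

```python
from typing import Any, Callable, Sequence, Tuple

Signal = Sequence[float]


def decode_frame(
    bitstream: bytes,
    unpack: Callable[[bytes], Tuple[Any, Any]],
    decode_low_band: Callable[[Any], Tuple[Signal, Any]],
    decode_high_band: Callable[[Any, Any], Signal],
    synthesize: Callable[[Signal, Signal], Signal],
) -> Signal:
    """Run one frame through the three steps of the decoding flow."""
    # Step 201: recover both parameter sets from the bit stream.
    low_params, high_params = unpack(bitstream)
    # Step 202 (low band): the SILK decoder returns the signal plus the
    # intermediate parameters it generated while decoding.
    low_band, intermediates = decode_low_band(low_params)
    # Step 202 (high band): decoding reuses those intermediate parameters.
    high_band = decode_high_band(high_params, intermediates)
    # Step 203: QMF synthesis merges the two bands into the full-band frame.
    return synthesize(low_band, high_band)
```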
In this embodiment of the present invention, audio data whose high-frequency and low-frequency signals were encoded separately is decoded band by band. The low-frequency parameters are decoded separately by the SILK decoder, more bit resources are allocated to the low-frequency signal, and the harmonic structure carried in the high-frequency parameters is preserved, so that better perceived audio quality is obtained at the same bit rate setting.
Embodiment 3
This embodiment of the present invention provides a sub-band encoding method based on the SILK codec; see Fig. 3. The structure of the audio encoder is shown in Fig. 4.
The method flow comprises:
301: Obtain the original sampled digital signal through the analog-to-digital converter of a digital communication device, and frame and window it at a preset time interval to obtain full-band time-domain signals; obtain the full-band time-domain signal corresponding to the current frame data, and decompose this full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal.
The original sampled digital signal is audio data of a certain duration; after framing, the full-band time-domain signal corresponding to each frame of data is obtained.
In this embodiment of the present invention, the full-band time-domain signal is duplicated into two paths. One path is sent to the QMF (Quadrature Mirror Filter) analyzer unit 401, which decomposes the full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal. The other path is sent to the FFT (Fast Fourier Transform) unit 402 in the encoder, which converts the full-band time-domain signal into a full-band frequency-domain signal by fast Fourier transform.
Taking a wideband signal with a sampling rate of 16 kHz as an example, the full-band time-domain signal s(n) first enters the QMF analyzer unit 401.
The QMF analysis filter bank that decomposes the full-band time-domain signal consists of two symmetric 64th-order FIR (Finite Impulse Response) filters, one low-pass and one high-pass, whose impulse responses are related as follows:
$$h_{lp}(n) = (-1)^n \, h_{hp}(n)$$
The QMF analyzer 401 decomposes the original signal s(n) into the low-frequency time-domain signal $y_{lb}(n)$ covering 0-4 kHz and the high-frequency time-domain signal $y_{hb}(n)$ covering 4-8 kHz.
The low-frequency time-domain signal $y_{lb}(n)$ and the high-frequency time-domain signal $y_{hb}(n)$ are calculated as follows:
$$y_{lb}(n) = \sum_{i=0}^{31} h_{lp}(i)\,\bigl[s(n+1+i) + s(n-i)\bigr]$$
$$y_{hb}(n) = \sum_{i=0}^{31} h_{hp}(i)\,\bigl[s(n+1+i) + s(n-i)\bigr]$$
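The two band formulas, each a sum over i of h(i)·[s(n+1+i) + s(n−i)], can be evaluated directly as in the Python sketch below. It is a minimal illustration under stated assumptions: the function name `qmf_analyze` is hypothetical, samples outside the frame are treated as zero, the high-pass coefficients are derived from the low-pass ones through the (−1)^n relation, and the decimation by two that a two-band QMF normally applies is left out because the formulas do not show it. The real filter has 32 coefficients per sum; the usage line uses two only to keep the numbers readable.

```python
from typing import List, Sequence, Tuple


def qmf_analyze(s: Sequence[float], h_lp: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split s into low-band and high-band signals.

    Each output sample is sum_i h(i) * [s(n+1+i) + s(n-i)], with h = h_lp for
    the low band and h = h_hp for the high band.
    """
    # h_lp(n) = (-1)^n * h_hp(n) is equivalent to h_hp(n) = (-1)^n * h_lp(n).
    h_hp = [c if i % 2 == 0 else -c for i, c in enumerate(h_lp)]

    def sample(n: int) -> float:
        # Assumption: samples outside the frame are zero.
        return s[n] if 0 <= n < len(s) else 0.0

    y_lb: List[float] = []
    y_hb: List[float] = []
    for n in range(len(s)):
        pairs = [sample(n + 1 + i) + sample(n - i) for i in range(len(h_lp))]
        y_lb.append(sum(h * p for h, p in zip(h_lp, pairs)))
        y_hb.append(sum(h * p for h, p in zip(h_hp, pairs)))
    return y_lb, y_hb


# Toy coefficients for illustration only; they are not the patent's filter.
low, high = qmf_analyze([1.0] * 6, [0.5, 0.25])
```

With a constant input the low-band output is larger than the high-band output away from the frame edges, which is the expected behavior for a low-pass and high-pass pair.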
Further, the low-frequency time-domain signal $y_{lb}(n)$ enters the SILK encoder unit 403, which supports 8 kHz sampling; all low-frequency parameters are extracted in SILK's original encoding manner, then quantized, compressed and packed into the bit stream payload. The encoding and reconstruction of the high-frequency time-domain signal $y_{hb}(n)$ use the classical source-filter model, in which the high-frequency time-domain signal is obtained by passing a high-frequency excitation through an LPC (Linear Prediction Coefficients) synthesizer. Under this model, high-frequency coding needs three basic elements: the high-frequency excitation signal, the high-frequency LSP (Line Spectral Pairs) coefficients of the high-frequency part, and the high-frequency gain, where the high-frequency gain is obtained by multiplying the low-frequency gain by the gain adjustment ratio.
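The source-filter reconstruction described here can be sketched as an all-pole synthesis filter driven by a scaled excitation. This is a minimal sketch under stated assumptions: the function name `synthesize_high_band` is hypothetical, the recursion y(n) = g·e(n) + Σ a_k·y(n−k) is one common sign convention that the text does not specify, and the conversion from LSP coefficients to LPC coefficients is omitted.

```python
from typing import List, Sequence


def synthesize_high_band(
    excitation: Sequence[float],
    lpc: Sequence[float],
    low_band_gain: float,
    gain_ratio: float,
) -> List[float]:
    """Drive an all-pole LPC synthesis filter with the scaled excitation."""
    # High-frequency gain = low-frequency gain * gain adjustment ratio.
    gain = low_band_gain * gain_ratio
    out: List[float] = []
    for n, e in enumerate(excitation):
        acc = gain * e
        # Feedback from previous outputs; assumed convention y(n) += a_k * y(n-k).
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


# A unit impulse through a one-tap filter decays geometrically.
impulse_response = synthesize_high_band([1.0, 0.0, 0.0], [0.5], 4.0, 0.5)
```

Only the gain adjustment ratio has to be transmitted for the gain, because the low-frequency gain is already part of the low-frequency parameters.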
302: Perform SILK encoding on the low-frequency time-domain signal to generate the low-frequency parameters corresponding to the low-frequency time-domain signal, and encode the high-frequency time-domain signal according to the low-frequency parameters to generate the high-frequency parameters corresponding to the high-frequency time-domain signal.
The low-frequency time-domain signal is encoded as follows:
3021: Perform SILK encoding on the low-frequency time-domain signal to generate the low-frequency parameters corresponding to the low-frequency time-domain signal.
The low-frequency time-domain signal $y_{lb}(n)$ is encoded in the SILK encoder unit 403. The parameters generated as low-frequency parameters include, but are not limited to: irregular pulses, the pitch period and LTP (Long-Term Prediction) coefficients, the low-frequency LPC coefficients, the voiced/unvoiced decision parameter, and the low-frequency gain coefficient.
The high-frequency time-domain signal $y_{hb}(n)$ may specifically be encoded as follows:
3022: When the current audio frame is a voiced frame, convert the full-band time-domain signal into a full-band frequency-domain signal, obtain the pitch period produced when SILK encoding is performed on the low-frequency time-domain signal, and input the pitch period and the full-band frequency-domain signal to the harmonic structure analyzer to calculate the cutoff frequency of the harmonic structure.
Meanwhile, the full-band time-domain signal is sent to the FFT (Fast Fourier Transform) unit 402 in the encoder, which converts the full-band time-domain signal into a full-band frequency-domain signal by fast Fourier transform. Whether the frame is an unvoiced signal or a voiced signal is determined by the voiced/unvoiced decision parameter of the SILK encoder.
The full-band frequency-domain signal and the pitch period are then input to the harmonic structure analyzer module 404, which analyzes them to obtain the cutoff frequency of the harmonic structure.
The principle is as follows: the pitch period determines the positions of the harmonics on the frequency axis, and the harmonic structure analyzer 404 checks, from high frequency to low frequency, the harmonic amplitude $|Y[m F_0]|$ at integer multiples of the fundamental frequency $F_0$. The cutoff frequency of the harmonic structure is determined by comparison with the preset thresholds $\delta_1$ and $\delta_2$:
$$|Y[m F_0]|^2 - |Y[(m+1) F_0]|^2 > \delta_1$$
$$|Y[(m+1) F_0]|^2 < \delta_2$$
Satisfying the first formula indicates that a transition point with an obvious harmonic decay has been found; satisfying the second confirms that the subsequent amplitude is too small to be an effective harmonic. Starting from the low-frequency end, the first frequency position that satisfies both formulas is the cutoff frequency of the harmonic structure.
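As an illustration only, the cutoff search can be sketched in Python. The sketch assumes a precomputed power spectrum indexed by frequency bin, a fundamental frequency that falls on an integer bin, a scan that starts at the low-frequency end, and a comparison between harmonic m and harmonic m+1 (a decay larger than a first threshold, followed by a harmonic weaker than a second threshold). The function name, argument layout, and threshold values are hypothetical and are not taken from the embodiment.

```python
def harmonic_cutoff(power, f0_bin, delta1, delta2):
    """Return the bin of the harmonic cutoff, or None if no harmonic qualifies.

    power  : list of |Y[k]|^2 values, one per frequency bin (assumed layout)
    f0_bin : fundamental frequency F0 expressed in bins (assumed integer)
    delta1 : minimum power drop between harmonic m and harmonic m+1
    delta2 : maximum power allowed at harmonic m+1
    """
    m = 1
    # Walk the harmonics upward from the low-frequency end.
    while (m + 1) * f0_bin < len(power):
        current = power[m * f0_bin]
        following = power[(m + 1) * f0_bin]
        # Condition 1: clear decay. Condition 2: next harmonic is negligible.
        if current - following > delta1 and following < delta2:
            return m * f0_bin
        m += 1
    return None


# Hypothetical spectrum: strong harmonics at bins 10, 20, 30, weak ones above.
spectrum = [0.0] * 64
for b in (10, 20, 30):
    spectrum[b] = 100.0
for b in (40, 50, 60):
    spectrum[b] = 1.0
cutoff_bin = harmonic_cutoff(spectrum, f0_bin=10, delta1=50.0, delta2=5.0)
```

Converting the returned bin to hertz depends on the FFT size and sample rate, which this sketch leaves to the caller.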
3023: According to the cutoff frequency of the harmonic structure, determine whether a harmonic structure exists in the high-frequency time-domain signal.
If the cutoff frequency lies within 0-4 kHz, the high-frequency part has no harmonic structure, so the high-frequency excitation is obtained by modulating the low-frequency unvoiced excitation. If the cutoff frequency lies within 4-8 kHz, the high-frequency part still has some harmonic structure; in that case, half of the cutoff frequency is defined as the modulation frequency and passed to the simulated high-frequency excitation generator unit 413 for further processing. When a harmonic structure exists in the high-frequency time-domain signal, step 3024 is performed; when no harmonic structure exists in the high-frequency time-domain signal, or the current frame is an unvoiced frame, step 3025 is performed.
3024: When a harmonic structure exists in the high-frequency time-domain signal, calculate the simulated high-frequency excitation from the complete low-frequency excitation and the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal, and from the modulation frequency calculated from the cutoff frequency of the harmonic structure.
This step is performed when the cutoff frequency lies within 4-8 kHz. In this embodiment, a high-frequency excitation with harmonic structure, namely the simulated high-frequency excitation, is obtained by mixing the low-frequency unvoiced excitation with the complete low-frequency excitation.
Half of the cutoff frequency, taken as the calculated modulation frequency, is passed to the spectral translation unit 415 in the simulated high-frequency excitation generator unit 413.
The complete low-frequency excitation and the low-frequency unvoiced excitation are also produced while the low-frequency time-domain signal is encoded in the SILK encoder unit 403; these two signals are received from the SILK encoder unit 403 and passed to the simulated high-frequency excitation generator unit 413.
In step 3024, the simulated high-frequency excitation may be calculated as follows:
30241: Perform spectral translation on the complete low-frequency excitation according to the modulation frequency to obtain a full-band excitation, and high-pass filter the full-band excitation to obtain the first high-frequency excitation, which carries harmonic structure.
In this step, the modulation frequency and the complete low-frequency excitation are both input into the spectral translation unit 415 in the simulated high-frequency excitation generator unit 413, and the part of the complete excitation in the band from 0 to $\Omega_m$ is shifted and copied to the band from $\Omega_m$ to $2\Omega_m$, according to the following formula:
$$u_{fb}(k) = u_{lb}(k)\,\bigl(1 + \zeta \cos(\Omega_m k)\bigr)$$
where the scaling factor $\zeta \in (1, 2)$ keeps the signal energy accurate and $\Omega_m$ is the modulation frequency. The full-band excitation $u_{fb}(k)$ obtained above enters the high-pass filter unit 406 to yield the first high-frequency excitation $u_{hb\_v}(k)$; because the spectral translation is performed according to the modulation frequency, the first high-frequency excitation carries harmonic structure.
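The modulation formula can be sketched as follows. This is a minimal illustration, assuming the low-frequency excitation is already available at the full-band sample rate and that the modulation frequency is expressed in radians per sample; the helper names and the test tone are hypothetical. A naive DFT helper is included only to show where the energy ends up, and the high-pass filter that follows in the embodiment is not modeled.

```python
import cmath
import math


def spectral_translate(u_lb, omega_m, zeta):
    """Apply u_fb(k) = u_lb(k) * (1 + zeta * cos(omega_m * k)) per sample."""
    return [u * (1.0 + zeta * math.cos(omega_m * k)) for k, u in enumerate(u_lb)]


def dft_magnitude(x, bin_index):
    """Magnitude of one DFT bin, computed directly (for inspection only)."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * bin_index * i / n)
                   for i, v in enumerate(x)))


# Hypothetical input: a unit cosine at bin 4 of a 64-sample frame,
# translated with a modulation frequency placed at bin 16.
N = 64
tone = [math.cos(2.0 * math.pi * 4 * k / N) for k in range(N)]
translated = spectral_translate(tone, omega_m=2.0 * math.pi * 16 / N, zeta=1.5)
```

In this example the output keeps the original tone at bin 4 and adds copies at bins 12 and 20, that is, on both sides of the modulation frequency, each scaled by half of the scaling factor.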
30242: Perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain the second high-frequency excitation, which does not carry harmonic structure.
The low-frequency unvoiced excitation is input into the spectral folding unit 407 and the delay alignment unit 408 in the simulated high-frequency excitation generator unit 413 to obtain the second high-frequency excitation $u_{hb\_uv}(k)$. Delay alignment compensates for the delay introduced by the high-pass filter.
In theory, spectral folding yields the second high-frequency excitation as follows: the low-frequency unvoiced excitation $u_{lb}(k)$ is upsampled and converted into the full-band excitation $u_{fb}(k)$ by the following formula, and the high-frequency excitation $u_{hb}(k)$ is then obtained by high-pass filtering.
$$u_{fb}(k) = u_{lb}(k)\,\bigl(1 + (-1)^k\bigr)$$
Owing to the special properties of spectral folding, the high-frequency excitation obtained by the above steps is equivalent to directly negating the low-frequency signal:
$$u_{hb}(k) = -u_{lb}(k)$$
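The folding formula can be illustrated with a short sketch. It assumes the input sequence is already at the full-band sample rate, omits the high-pass filter and the delay alignment, and uses hypothetical helper names and a hypothetical test tone. Its only purpose is to show that multiplying by $(-1)^k$ mirrors the spectrum, so that a low-frequency component reappears in the upper half of the band.

```python
import cmath
import math


def spectral_fold(u_lb):
    """Apply u_fb(k) = u_lb(k) * (1 + (-1)^k) per sample."""
    return [u * (1.0 + (-1.0) ** k) for k, u in enumerate(u_lb)]


def dft_magnitude(x, bin_index):
    """Magnitude of one DFT bin, computed directly (for inspection only)."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * bin_index * i / n)
                   for i, v in enumerate(x)))


# Hypothetical input: a unit cosine at bin 4 of a 64-sample frame.
N = 64
tone = [math.cos(2.0 * math.pi * 4 * k / N) for k in range(N)]
folded = spectral_fold(tone)
```

Here the tone at bin 4 is kept and a mirror image of equal strength appears at bin 28 (that is, 32 minus 4); a subsequent high-pass filter would retain only the mirrored part.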
30243: According to the preset mixing coefficient corresponding to the first high-frequency excitation and the preset mixing coefficient corresponding to the second high-frequency excitation, compute a weighted mix of the first and second high-frequency excitations to obtain the simulated high-frequency excitation.
The final simulated high-frequency excitation $u_{hb}(k)$ is mixed with the mixing coefficient $\alpha \in (0, 1)$ by the following formula; the first and second high-frequency excitations each have a corresponding mixing coefficient, and the two coefficients sum to 1.
$$u_{hb}(k) = \alpha\,u_{hb\_v}(k) + (1 - \alpha)\,u_{hb\_uv}(k)$$
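A minimal sketch of the weighted mix, assuming both excitations are sample-aligned lists of equal length; the function name and the example values are hypothetical.

```python
def mix_excitations(u_hb_v, u_hb_uv, alpha):
    """Return alpha * u_hb_v(k) + (1 - alpha) * u_hb_uv(k) for every sample."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if len(u_hb_v) != len(u_hb_uv):
        raise ValueError("excitations must have the same length")
    return [alpha * v + (1.0 - alpha) * uv for v, uv in zip(u_hb_v, u_hb_uv)]


# Hypothetical two-sample excitations mixed with alpha = 0.25.
mixed = mix_excitations([1.0, 2.0], [3.0, 4.0], alpha=0.25)
```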
There are two reasons for not using the first high-frequency excitation $u_{hb\_v}(k)$ directly as the final high-frequency excitation:
1. The harmonic structure obtained by spectral translation covers the band from 0 to $2\Omega_m$, and some unvoiced excitation needs to be mixed in for the band from $2\Omega_m$ to 8 kHz;
2. If different excitation generation methods were used for voiced and unvoiced frames and for different cutoff frequencies, consecutive frames could become discontinuous, degrading the listening quality.
3025: When no harmonic structure exists in the high-frequency time-domain signal, or the current audio frame is an unvoiced frame, perform spectral folding and delay alignment on the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal to obtain the third high-frequency excitation, and take the third high-frequency excitation as the simulated high-frequency excitation.
3026: Input the high-frequency time-domain signal into the linear prediction coefficient (LPC) analyzer to calculate the true high-frequency excitation and the high-frequency line spectral pair (LSP) coefficients, and calculate the gain adjustment ratio from the simulated high-frequency excitation and the true high-frequency excitation.
The LSP coefficients of the high-frequency part are calculated by inputting the high-frequency time-domain signal directly into the LPC analyzer unit 409, as follows:
First, according to the linear prediction model, the current sample $x(n)$ can be formed as a linear combination of the past $P$ samples $x(n-i)$ with weights $a_i$, as in the following formula:
$$x(n) = \sum_{i=1}^{P} a_i\,x(n-i) + e(n) \tag{1}$$
where $e(n)$ is the prediction error; the residual signal output by the LPC analyzer unit 409 is the true high-frequency excitation.
The prediction coefficients $\{a_1, a_2, \ldots, a_P\}$ are the high-frequency LPC coefficients, and can be obtained by solving the following normal equations:
$$\sum_{j=1}^{P} a_j\,r(|i-j|) = r(i), \quad i = 1, 2, \ldots, P \tag{2}$$
where the autocorrelation coefficients $r(i)$ are computed as
$$r(i) = \sum_{n=0}^{N-1-i} x(n)\,x(n+i) \tag{3}$$
The true high-frequency excitation and the high-frequency LSP coefficients are thus calculated as follows: first compute the autocorrelation coefficients $r(i)$ by formula (3); then compute the prediction coefficients $\{a_1, a_2, \ldots, a_P\}$, i.e. the high-frequency LPC coefficients, by formula (2); finally compute $e(n)$, the true high-frequency excitation, by formula (1).
In practice, the linear prediction coefficients can be solved efficiently by the Levinson-Durbin recursion. In addition, because LSP coefficients are more robust, the LSP coefficients corresponding to each set of LPC coefficients are what is usually used in quantization and transmission.
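The three-stage calculation (autocorrelation, prediction coefficients, residual) can be sketched as below. This is a textbook Levinson-Durbin recursion written for illustration, not the analyzer used in the embodiment; the function names, the prediction order, and the test signal are hypothetical, and windowing, bandwidth expansion, and the conversion from LPC to LSP coefficients are left out.

```python
def autocorrelation(x, order):
    """r(i) = sum_{n=0}^{N-1-i} x(n) * x(n+i) for i = 0..order."""
    n = len(x)
    return [sum(x[t] * x[t + i] for t in range(n - i)) for i in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a_1..a_P; return (coefficients, error)."""
    a = [0.0] * (order + 1)  # a[0] is unused so that a[i] matches a_i
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input (for example, all zeros)
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(x, a):
    """e(n) = x(n) - sum_{i=1}^{P} a_i * x(n-i), with samples before n=0 taken as 0."""
    residual = []
    for n in range(len(x)):
        predicted = sum(a[i - 1] * x[n - i]
                        for i in range(1, len(a) + 1) if n - i >= 0)
        residual.append(x[n] - predicted)
    return residual


# Hypothetical test signal: a decaying sequence x(n) = 0.5 ** n.
signal = [0.5 ** n for n in range(200)]
coeffs, final_error = levinson_durbin(autocorrelation(signal, 2), 2)
excitation = lpc_residual(signal, coeffs)
```

For this signal the recursion recovers a first coefficient close to 0.5 and a second close to 0, and the residual is essentially a single pulse at the first sample.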
The gain adjustment ratio is calculated as follows:
The gain adjustment ratio of the high-frequency part mainly compensates for the energy difference between the high-frequency excitation generated by the system model and the true high-frequency excitation. In this embodiment, the simulated high-frequency excitation produced by the simulated high-frequency excitation generator unit 413 enters the root-mean-square calculator unit 410, which computes according to the following formula:
$$u_{rms} = \sqrt{\frac{\sum_{k=0}^{N} u_{hb}(k)^2}{N}}$$
Similarly, the high-frequency time-domain signal enters the LPC analyzer unit 409 to produce the residual signal, that is, the true high-frequency excitation signal that precedes the LPC synthesizer in the inverse operation at the decoding end. This signal enters the root-mean-square calculator unit 412. The root-mean-square values of the simulated and true high-frequency excitations both enter the gain ratio calculator unit 411, where the value for the true high-frequency excitation is divided by the value for the simulated high-frequency excitation and limited by a threshold to give the gain adjustment ratio that is passed to the decoding end. This gain adjustment ratio is applied to all high-frequency excitation signal samples at the decoding end, to adjust their energy to match that of the true high-frequency signal.
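A sketch of the ratio calculation follows. The text does not state the threshold values, so the lower and upper limits below are hypothetical placeholders, as are the function names; the root mean square is normalized by the number of samples in the frame, which is an assumption about the indexing.

```python
import math


def rms(samples):
    """Root mean square of one frame of excitation samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def gain_adjustment_ratio(true_exc, sim_exc, lower=0.0, upper=4.0):
    """RMS of the true excitation over RMS of the simulated one, clamped.

    lower and upper are hypothetical limits standing in for the
    threshold restriction mentioned in the text.
    """
    denom = rms(sim_exc)
    if denom == 0.0:
        return upper  # assumed behaviour for a silent simulated excitation
    return min(max(rms(true_exc) / denom, lower), upper)


# Hypothetical frames: the true excitation is twice as strong.
ratio = gain_adjustment_ratio([2.0, -2.0], [1.0, -1.0])
```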
3027: Determine the modulation frequency, the gain adjustment ratio, and the high-frequency LSP coefficients as the high-frequency parameters corresponding to the high-frequency time-domain signal.
In addition to the original low-frequency parameters, the encoding end in this embodiment has three further parameters: the modulation frequency, the gain adjustment ratio, and the high-frequency LSP coefficients.
303: Quantize and compress the low-frequency parameters and the high-frequency parameters to generate the bit stream corresponding to the current audio frame.
In this embodiment, the low-frequency signal is encoded by the SILK encoder and the high-frequency signal is encoded separately, so that more bit resources are allocated to the low-frequency signal and relatively few are used to encode the high-frequency signal, achieving a more reasonable allocation of bit resources. Coding efficiency is effectively improved and the harmonic structure in the high-frequency signal is preserved, so that better listening quality is obtained at the same bit rate.
Embodiment four
This embodiment provides a sub-band decoding method based on the SILK codec; see Fig. 5. The structure of the audio decoder is shown in Fig. 6.
The method flow comprises:
501: Obtain the bit stream corresponding to the current audio frame, and decode the bit stream with a parameter decoder to obtain the low-frequency parameters and the high-frequency parameters.
The audio decoding end receives the voice packet bit stream 601 and inputs it into the parameter decoder unit 602 in the audio decoder, which outputs the decoding parameters needed by the different modules.
The low-frequency parameters include but are not limited to: irregular pulses, the pitch period, LTP coefficients, low-frequency LPC coefficients, the voiced/unvoiced decision parameter, and low-frequency gain coefficients.
502: Decode the low-frequency parameters with the SILK decoder to obtain the low-frequency time-domain signal; and decode the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain the high-frequency time-domain signal.
The low-frequency parameters are decoded into the low-frequency time-domain signal as follows:
5021: Decode the low-frequency parameters with the SILK decoder to obtain the low-frequency time-domain signal.
First, the parameter decoder unit 602 decodes the quantization indices of the low-frequency unvoiced excitation part, which are used to calculate the irregular pulse signal in SILK. The unvoiced excitation generator unit 603 then generates the low-frequency unvoiced excitation. Next, the voiced/unvoiced decision determines whether the signal enters the LTP synthesizer unit 604.
If the frame is a voiced signal, the parameter decoder unit 602 decodes the pitch period and the LTP coefficients, which are input to the periodic-signal LTP synthesizer unit 604 to generate the low-frequency voiced excitation; the low-frequency unvoiced excitation and the low-frequency voiced excitation are added to give the complete low-frequency excitation. The complete low-frequency excitation finally enters the LPC synthesizer to produce the final low-frequency time-domain signal.
If the frame is an unvoiced signal, the periodic-signal synthesizer unit 604 is skipped and the excitation goes directly into the LPC synthesizer unit 605 to generate the low-frequency time-domain signal. Whether the frame is an unvoiced signal or a voiced signal is determined by the voiced/unvoiced decision parameter of the SILK encoder.
In this embodiment, the SILK low-frequency decoder unit 612 works on the same principle as the SILK decoder.
The high-frequency parameters are decoded into the high-frequency time-domain signal as follows:
5022: According to the modulation frequency in the high-frequency parameters, determine whether a harmonic structure exists in the current audio frame.
Whether a harmonic structure exists in the audio data is determined from the modulation frequency parameter among the high-frequency parameters obtained in the parameter decoder unit 602. When the modulation frequency lies within 0-2 kHz, it is determined that no harmonic structure exists, and step 5024 is performed; when the modulation frequency lies within 2-4 kHz, it is determined that a harmonic structure exists, and step 5023 is performed.
5023: When a harmonic structure exists in the current audio frame, obtain the complete low-frequency excitation and the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, and calculate the simulated high-frequency excitation from the complete low-frequency excitation, the low-frequency unvoiced excitation, and the modulation frequency.
Specifically, step 5023 may proceed as follows:
50231: Perform spectral translation on the complete low-frequency excitation according to the modulation frequency to obtain a full-band excitation, and high-pass filter the full-band excitation to obtain the fourth high-frequency excitation, which carries harmonic structure.
The complete low-frequency excitation output by the LTP synthesizer unit 604 is input to the spectral translation unit 608 in the high-frequency decoder unit 613, together with the modulation frequency parameter among the high-frequency parameters obtained in the parameter decoder unit 602. The full-band excitation obtained in the spectral translation unit 608 enters the high-pass filter unit 609 to yield the fourth high-frequency excitation. The calculation in this step is the same as in step 30241 of Embodiment two and is not repeated here.
50232: Perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain the fifth high-frequency excitation, which does not carry harmonic structure.
The low-frequency unvoiced excitation output by the unvoiced excitation generator unit 603 is input into the spectral folding unit 606 and the delay alignment unit 607 in the high-frequency decoder unit 613. The calculation is the same as in step 30242 of Embodiment two and is not repeated here.
50233: According to the preset mixing coefficient corresponding to the fourth high-frequency excitation and the preset mixing coefficient corresponding to the fifth high-frequency excitation, compute a weighted mix of the fourth and fifth high-frequency excitations to obtain the simulated high-frequency excitation.
The calculation in this step is the same as in step 30243 of Embodiment two and is not repeated here.
5024: When no harmonic structure exists in the current audio frame, obtain the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain the sixth high-frequency excitation, and take the sixth high-frequency excitation as the simulated high-frequency excitation.
The calculation is the same as in step 3025 of Embodiment two and is not repeated here.
5025: Input the high-frequency LPC coefficients and the gain adjustment ratio in the high-frequency parameters, together with the simulated high-frequency excitation, into the LPC synthesizer, and output the synthesized high-frequency time-domain signal.
The high-frequency LPC coefficients and the gain adjustment ratio among the high-frequency parameters obtained in the parameter decoder unit 602, together with the simulated high-frequency excitation calculated in step 5023 or step 5024, are input into the LPC synthesizer unit 610 to synthesize the high-frequency time-domain signal.
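The synthesis step can be sketched as an all-pole filter driven by the scaled excitation. This is an illustration under assumptions: the coefficients are taken to follow the convention in which each output sample is the scaled excitation plus a weighted sum of past outputs, the gain is applied as a single multiplier per frame, and filter state is not carried across frames. The function name and the example values are hypothetical.

```python
def lpc_synthesize(excitation, lpc_coeffs, gain):
    """y(n) = gain * u(n) + sum_{i=1}^{P} a_i * y(n-i), starting from zero state."""
    output = []
    for n, u in enumerate(excitation):
        acc = gain * u
        for i, a_i in enumerate(lpc_coeffs, start=1):
            if n - i >= 0:
                acc += a_i * output[n - i]
        output.append(acc)
    return output


# Hypothetical example: a unit pulse through a first-order filter (a_1 = 0.5)
# with a gain of 2.0 yields a decaying sequence.
high_band = lpc_synthesize([1.0, 0.0, 0.0, 0.0], [0.5], gain=2.0)
```

In the decoder described here, the gain would combine the low-frequency gain with the transmitted gain adjustment ratio; how the two are combined per sample is not modeled in this sketch.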
503: Synthesize the low-frequency time-domain signal and the high-frequency time-domain signal into a full-band time-domain signal with a QMF synthesizer; the full-band time-domain signal is the decoded audio data of the current audio frame.
In this embodiment, audio data whose low- and high-frequency signals were encoded separately is decoded by decoding the low- and high-frequency signals separately. The low-frequency parameters are decoded on their own by the SILK decoder, more bit resources are allocated to the low-frequency signal, and the harmonic structure in the high-frequency parameters is retained, so that better listening quality is obtained at the same bit rate.
Embodiment five
This embodiment provides a sub-band encoding device based on the SILK codec; see Fig. 7. The device comprises:
A first acquisition module 701, configured to obtain the full-band time-domain signal corresponding to the current audio frame, and to decompose the full-band time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal;
An encoding module 702, configured to perform SILK encoding on the low-frequency time-domain signal to generate the low-frequency parameters corresponding to the low-frequency time-domain signal, and to encode the high-frequency time-domain signal according to the low-frequency parameters to generate the high-frequency parameters corresponding to the high-frequency time-domain signal;
A generation module 703, configured to quantize and compress the low-frequency parameters and the high-frequency parameters to generate the bit stream corresponding to the current audio frame.
The encoding module 702 comprises:
A first computing unit, configured to, when the current audio frame is a voiced frame, convert the full-band time-domain signal into a full-band frequency-domain signal, obtain the pitch period produced when SILK encoding is performed on the low-frequency time-domain signal, and input the pitch period and the full-band frequency-domain signal into the harmonic structure analyzer to calculate the cutoff frequency of the harmonic structure;
A first judging unit, configured to determine, according to the cutoff frequency of the harmonic structure, whether a harmonic structure exists in the high-frequency time-domain signal;
A second computing unit, configured to, when a harmonic structure exists in the high-frequency time-domain signal, calculate the simulated high-frequency excitation from the complete low-frequency excitation and the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal, and from the modulation frequency calculated from the cutoff frequency of the harmonic structure;
A third computing unit, configured to input the high-frequency time-domain signal into the linear prediction coefficient (LPC) analyzer to calculate the true high-frequency excitation and the high-frequency line spectral pair (LSP) coefficients, and to calculate the gain adjustment ratio from the simulated high-frequency excitation and the true high-frequency excitation;
A determining unit, configured to determine the modulation frequency, the gain adjustment ratio, and the high-frequency LSP coefficients as the high-frequency parameters corresponding to the high-frequency time-domain signal.
The second computing unit comprises:
A first processing subunit, configured to perform spectral translation on the complete low-frequency excitation according to the modulation frequency to obtain a full-band excitation, and to high-pass filter the full-band excitation to obtain the first high-frequency excitation, which carries harmonic structure;
A second processing subunit, configured to perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain the second high-frequency excitation, which does not carry harmonic structure;
A first computation subunit, configured to compute, according to the preset mixing coefficient corresponding to the first high-frequency excitation and the preset mixing coefficient corresponding to the second high-frequency excitation, a weighted mix of the first and second high-frequency excitations to obtain the simulated high-frequency excitation.
The encoding module 702 further comprises:
A fourth computing unit, configured to, when no harmonic structure exists in the high-frequency time-domain signal or the current audio frame is an unvoiced frame, perform spectral folding and delay alignment on the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal to obtain the third high-frequency excitation, and to take the third high-frequency excitation as the simulated high-frequency excitation.
In this embodiment, the low-frequency signal is encoded by the SILK encoder and the high-frequency signal is encoded separately, so that more bit resources are allocated to the low-frequency signal and relatively few are used to encode the high-frequency signal, achieving a more reasonable allocation of bit resources. Coding efficiency is effectively improved and the harmonic structure in the high-frequency signal is preserved, so that better listening quality is obtained at the same bit rate.
Embodiment six
This embodiment provides a sub-band decoding device based on the SILK codec; see Fig. 8. The device comprises:
A second acquisition module 801, configured to obtain the bit stream corresponding to the current audio frame, and to decode the bit stream with a parameter decoder to obtain the low-frequency parameters and the high-frequency parameters;
A decoding module 802, configured to decode the low-frequency parameters with the SILK decoder to obtain the low-frequency time-domain signal, and to decode the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain the high-frequency time-domain signal;
A synthesis module 803, configured to synthesize the low-frequency time-domain signal and the high-frequency time-domain signal into a full-band time-domain signal with a QMF synthesizer, the full-band time-domain signal being the decoded audio data of the current audio frame.
The decoding module 802 comprises:
A second judging unit, configured to determine, according to the modulation frequency in the high-frequency parameters, whether a harmonic structure exists in the current audio frame;
A fifth computing unit, configured to, when a harmonic structure exists in the current audio frame, obtain the complete low-frequency excitation and the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, and to calculate the simulated high-frequency excitation from the complete low-frequency excitation, the low-frequency unvoiced excitation, and the modulation frequency;
A synthesis unit, configured to input the high-frequency LPC coefficients and the gain adjustment ratio in the high-frequency parameters, together with the simulated high-frequency excitation, into the LPC synthesizer, and to output the synthesized high-frequency time-domain signal.
The fifth computing unit comprises:
A third processing subunit, configured to perform spectral translation on the complete low-frequency excitation according to the modulation frequency to obtain a full-band excitation, and to high-pass filter the full-band excitation to obtain the fourth high-frequency excitation, which carries harmonic structure;
A fourth processing subunit, configured to perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain the fifth high-frequency excitation, which does not carry harmonic structure;
A second computation subunit, configured to compute, according to the preset mixing coefficient corresponding to the fourth high-frequency excitation and the preset mixing coefficient corresponding to the fifth high-frequency excitation, a weighted mix of the fourth and fifth high-frequency excitations to obtain the simulated high-frequency excitation.
The decoding module 802 further comprises:
A sixth computing unit, configured to, when no harmonic structure exists in the current audio frame, obtain the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain the sixth high-frequency excitation, and take the sixth high-frequency excitation as the simulated high-frequency excitation.
In this embodiment, audio data whose low- and high-frequency signals were encoded separately is decoded by decoding the low- and high-frequency signals separately. The low-frequency parameters are decoded on their own by the SILK decoder, more bit resources are allocated to the low-frequency signal, and the harmonic structure in the high-frequency parameters is retained, so that better listening quality is obtained at the same bit rate.
The sequence numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (16)

1. the method for coding subband based on SILK codec, is characterized in that, described method comprises:
Obtain the full range time-domain signal that current audio frame is corresponding; And described full range time-domain signal is decomposed into low frequency time-domain signal and high frequency time-domain signal;
Described low frequency time-domain signal is carried out to SILK coding and process, generate low-frequency parameter corresponding to described low frequency time-domain signal; And according to described low-frequency parameter, described high frequency time-domain signal is encoded and processed high-frequency parameter corresponding to generation high frequency time-domain signal;
Described low-frequency parameter and described high-frequency parameter are quantized to bit stream corresponding to the compression described current audio frame of generation.
2. The method according to claim 1, characterized in that said encoding the high-frequency time-domain signal according to the full-frequency time-domain signal and the high-frequency time-domain signal to generate the high-frequency parameters corresponding to the high-frequency time-domain signal comprises:
When the current audio frame is a voiced frame, converting the full-frequency time-domain signal into a full-frequency frequency-domain signal, obtaining the pitch period produced when SILK encoding is performed on the low-frequency time-domain signal, and inputting the pitch period and the full-frequency frequency-domain signal into a harmonic structure analyzer to calculate the cutoff frequency of the harmonic structure;
Judging, according to the cutoff frequency of the harmonic structure, whether a harmonic structure exists in the high-frequency time-domain signal;
When a harmonic structure exists in the high-frequency time-domain signal, calculating a simulated high-frequency excitation according to the complete low-frequency excitation and the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal, and according to a modulation frequency calculated from the cutoff frequency of the harmonic structure;
Inputting the high-frequency time-domain signal into a linear prediction coefficient (LPC) analyzer to calculate a true high-frequency excitation and high-frequency line spectral pair (LSP) coefficients, and calculating a gain adjustment ratio according to the simulated high-frequency excitation and the true high-frequency excitation;
Determining the modulation frequency, the gain adjustment ratio and the high-frequency LSP coefficients as the high-frequency parameters corresponding to the high-frequency time-domain signal.
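The LPC analysis and gain matching steps of claim 2 can be illustrated with a minimal Python sketch. It assumes an autocorrelation-method LPC analysis (Levinson-Durbin recursion) and an RMS-based gain adjustment ratio; the claims fix neither choice, and the function names (`lpc_analyze`, `lpc_residual`, `gain_adjust_ratio`) and the prediction order are hypothetical. Conversion of the LPC coefficients to LSP coefficients is omitted.

```python
import math

def autocorrelation(x, order):
    """Autocorrelation lags r[0..order] of one frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]

def lpc_analyze(x, order):
    """Levinson-Durbin recursion; returns A(z) as [1, a1, ..., ap]."""
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_residual(x, a):
    """Filter the frame with A(z); the output plays the role of the true excitation."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0

def gain_adjust_ratio(true_exc, simulated_exc, eps=1e-12):
    """Assumed definition: RMS of the true excitation over RMS of the simulated one."""
    return rms(true_exc) / (rms(simulated_exc) + eps)
```

Under this assumed definition, a decoder that scales the simulated excitation by the transmitted ratio would restore the energy of the true high-frequency excitation without transmitting the excitation itself.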
3. The method according to claim 2, characterized in that said calculating a simulated high-frequency excitation according to the complete low-frequency excitation and the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal, and according to the modulation frequency calculated from the cutoff frequency of the harmonic structure, comprises:
Performing spectral translation on the complete low-frequency excitation according to the modulation frequency to obtain a full-frequency excitation, and high-pass filtering the full-frequency excitation to obtain a first high-frequency excitation, the first high-frequency excitation carrying a harmonic structure;
Performing spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a second high-frequency excitation, the second high-frequency excitation carrying no harmonic structure;
Performing a weighted mix of the first high-frequency excitation and the second high-frequency excitation, according to a preset mixing coefficient corresponding to the first high-frequency excitation and a preset mixing coefficient corresponding to the second high-frequency excitation, to calculate the simulated high-frequency excitation.
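The three steps of claim 3 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: spectral translation is modelled as cosine modulation, the high-pass filter is a Hamming-windowed-sinc FIR, delay alignment compensates that filter's group delay, and the mixing weights `w_harm` and `w_noise`, the tap count and all function names are hypothetical. Both excitations are assumed to be available on the same sampling grid.

```python
import math

def spectral_translate(x, mod_freq_hz, fs_hz):
    """Shift the spectrum by cosine modulation at the modulation frequency."""
    return [v * math.cos(2.0 * math.pi * mod_freq_hz * n / fs_hz)
            for n, v in enumerate(x)]

def highpass_fir(cutoff_hz, fs_hz, taps=31):
    """High-pass FIR: windowed-sinc low-pass followed by spectral inversion."""
    fc = cutoff_hz / fs_hz
    mid = (taps - 1) // 2
    h = []
    for n in range(taps):
        m = n - mid
        ideal = 2.0 * fc if m == 0 else math.sin(2.0 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    total = sum(h)
    h = [-v / total for v in h]   # negated low-pass with unity DC gain
    h[mid] += 1.0                 # delta minus low-pass: zero gain at DC
    return h

def fir_filter(x, h):
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def spectral_fold(x):
    """Multiply by (-1)^n, mirroring the spectrum about fs/4."""
    return [v if n % 2 == 0 else -v for n, v in enumerate(x)]

def delay(x, d):
    """Delay by d samples, keeping the frame length."""
    return ([0.0] * d + list(x))[:len(x)]

def simulated_hf_excitation(full_exc, unvoiced_exc, mod_freq_hz, fs_hz,
                            cutoff_hz, w_harm=0.7, w_noise=0.3, taps=31):
    h = highpass_fir(cutoff_hz, fs_hz, taps)
    # First excitation: translated and high-pass filtered, keeps harmonics.
    first = fir_filter(spectral_translate(full_exc, mod_freq_hz, fs_hz), h)
    # Second excitation: folded noise-like part, aligned to the filter delay.
    second = delay(spectral_fold(unvoiced_exc), (taps - 1) // 2)
    return [w_harm * a + w_noise * b for a, b in zip(first, second)]
```

Aligning the folded branch to the FIR group delay keeps the harmonic and noise-like components time-aligned before they are mixed.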
4. The method according to claim 2, characterized in that, after said judging whether a harmonic structure exists in the full-frequency time-domain signal, the method further comprises:
When no harmonic structure exists in the high-frequency time-domain signal, or the current audio frame is an unvoiced frame, performing spectral folding and delay alignment on the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal to obtain a third high-frequency excitation, and determining the third high-frequency excitation as the simulated high-frequency excitation.
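Why spectral folding turns a low-band excitation into a high-band one can be checked numerically. The sketch below assumes folding is implemented as multiplication by (-1)^n, which the claims do not state explicitly; it uses a pure tone only to make the mirroring visible, and the helper names are hypothetical.

```python
import cmath
import math

def spectral_fold(x):
    """Multiply by (-1)^n: content at frequency f moves to fs/2 - f."""
    return [v if n % 2 == 0 else -v for n, v in enumerate(x)]

def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N/2 (direct O(N^2) evaluation)."""
    n_pts = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
                    for n in range(n_pts)))
            for k in range(n_pts // 2 + 1)]

def peak_bin(x):
    """Index of the strongest DFT bin."""
    mags = dft_magnitudes(x)
    return max(range(len(mags)), key=mags.__getitem__)

# A 16-sample tone in bin 2 (low band) lands in bin 6 (high band) after folding.
tone = [math.cos(2.0 * math.pi * 2 * n / 16) for n in range(16)]
low_bin = peak_bin(tone)
high_bin = peak_bin(spectral_fold(tone))
```

For a noise-like unvoiced excitation the same mirroring applies to every frequency component, so the folded signal occupies the upper half of the band without introducing any harmonic structure.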
5. A sub-band decoding method based on a SILK codec, characterized in that the method comprises:
Obtaining a bit stream corresponding to a current audio frame, and decoding the bit stream with a parameter decoder to obtain low-frequency parameters and high-frequency parameters;
Decoding the low-frequency parameters with a SILK decoder to obtain a low-frequency time-domain signal; and decoding the high-frequency parameters, according to intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain a high-frequency time-domain signal;
Synthesizing the low-frequency time-domain signal and the high-frequency time-domain signal into a full-frequency time-domain signal with a QMF synthesizer, the full-frequency time-domain signal being the decoded audio data of the current audio frame.
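The final QMF synthesis step can be illustrated with the shortest possible two-channel quadrature mirror filter bank. The claims do not specify the prototype filter, so this sketch assumes a 2-tap (Haar) pair purely for illustration; a practical codec would use a longer filter with better band separation. The function names are hypothetical.

```python
import math

_SCALE = 1.0 / math.sqrt(2.0)

def qmf_analysis(x):
    """Split an even-length signal into half-rate low and high sub-bands."""
    pairs = range(len(x) // 2)
    low = [(x[2 * n] + x[2 * n + 1]) * _SCALE for n in pairs]
    high = [(x[2 * n] - x[2 * n + 1]) * _SCALE for n in pairs]
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two half-rate sub-bands into one full-rate signal."""
    out = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) * _SCALE)
        out.append((lo - hi) * _SCALE)
    return out
```

With this filter pair, synthesis exactly inverts analysis, which makes it easy to verify that the decoder-side recombination is wired correctly before substituting a longer QMF prototype.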
6. The method according to claim 5, characterized in that said decoding the high-frequency parameters, according to the intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain a high-frequency time-domain signal comprises:
Judging, according to the modulation frequency in the high-frequency parameters, whether a harmonic structure exists in the current audio frame;
When a harmonic structure exists in the current audio frame, obtaining the complete low-frequency excitation and the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, and calculating a simulated high-frequency excitation according to the complete low-frequency excitation, the low-frequency unvoiced excitation and the modulation frequency;
Inputting the high-frequency LPC coefficients and the gain adjustment ratio in the high-frequency parameters, together with the simulated high-frequency excitation, into an LPC synthesizer, and outputting the synthesized high-frequency time-domain signal.
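The LPC synthesis step of claim 6 amounts to driving an all-pole filter with the gain-scaled simulated excitation. The sketch below assumes the gain adjustment ratio is applied as a plain multiplier on the excitation and that the coefficients are given as A(z) = [1, a1, ..., ap]; both conventions and the function name are assumptions, not details taken from the claims.

```python
def lpc_synthesize(excitation, a, gain=1.0):
    """All-pole synthesis: y[n] = gain * e[n] - sum_k a[k] * y[n - k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

# Impulse through 1 / (1 - 0.5 z^-1) with gain 2.
impulse_response = lpc_synthesize([1.0, 0.0, 0.0, 0.0], [1.0, -0.5], gain=2.0)
```

In this form the synthesis filter shapes the spectral envelope of the high band, while the excitation supplies its fine structure.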
7. The method according to claim 6, characterized in that said calculating a simulated high-frequency excitation according to the complete low-frequency excitation, the low-frequency unvoiced excitation and the modulation frequency comprises:
Performing spectral translation on the complete low-frequency excitation according to the modulation frequency to obtain a full-frequency excitation, and high-pass filtering the full-frequency excitation to obtain a fourth high-frequency excitation, the fourth high-frequency excitation carrying a harmonic structure;
Performing spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a fifth high-frequency excitation, the fifth high-frequency excitation carrying no harmonic structure;
Performing a weighted mix of the fourth high-frequency excitation and the fifth high-frequency excitation, according to a preset mixing coefficient corresponding to the fourth high-frequency excitation and a preset mixing coefficient corresponding to the fifth high-frequency excitation, to calculate the simulated high-frequency excitation.
8. The method according to claim 6, characterized in that, after said judging whether a harmonic structure exists in the current audio frame, the method further comprises:
When no harmonic structure exists in the current audio frame, obtaining the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, performing spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a sixth high-frequency excitation, and determining the sixth high-frequency excitation as the simulated high-frequency excitation.
9. A sub-band coding device based on a SILK codec, characterized in that the device comprises:
A first acquisition module, configured to obtain a full-frequency time-domain signal corresponding to a current audio frame, and to decompose the full-frequency time-domain signal into a low-frequency time-domain signal and a high-frequency time-domain signal;
A coding module, configured to perform SILK encoding on the low-frequency time-domain signal to generate low-frequency parameters corresponding to the low-frequency time-domain signal, and to encode the high-frequency time-domain signal according to the low-frequency parameters to generate high-frequency parameters corresponding to the high-frequency time-domain signal;
A generation module, configured to quantize and compress the low-frequency parameters and the high-frequency parameters to generate a bit stream corresponding to the current audio frame.
10. The device according to claim 9, characterized in that the coding module comprises:
A first computing unit, configured to, when the current audio frame is a voiced frame, convert the full-frequency time-domain signal into a full-frequency frequency-domain signal, obtain the pitch period produced when SILK encoding is performed on the low-frequency time-domain signal, and input the pitch period and the full-frequency frequency-domain signal into a harmonic structure analyzer to calculate the cutoff frequency of the harmonic structure;
A first judging unit, configured to judge, according to the cutoff frequency of the harmonic structure, whether a harmonic structure exists in the high-frequency time-domain signal;
A second computing unit, configured to, when a harmonic structure exists in the high-frequency time-domain signal, calculate a simulated high-frequency excitation according to the complete low-frequency excitation and the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal, and according to a modulation frequency calculated from the cutoff frequency of the harmonic structure;
A third computing unit, configured to input the high-frequency time-domain signal into a linear prediction coefficient (LPC) analyzer to calculate a true high-frequency excitation and high-frequency line spectral pair (LSP) coefficients, and to calculate a gain adjustment ratio according to the simulated high-frequency excitation and the true high-frequency excitation;
A determining unit, configured to determine the modulation frequency, the gain adjustment ratio and the high-frequency LSP coefficients as the high-frequency parameters corresponding to the high-frequency time-domain signal.
11. The device according to claim 10, characterized in that the second computing unit comprises:
A first processing subunit, configured to perform spectral translation on the complete low-frequency excitation according to the modulation frequency to obtain a full-frequency excitation, and to high-pass filter the full-frequency excitation to obtain a first high-frequency excitation, the first high-frequency excitation carrying a harmonic structure;
A second processing subunit, configured to perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a second high-frequency excitation, the second high-frequency excitation carrying no harmonic structure;
A first computation subunit, configured to perform a weighted mix of the first high-frequency excitation and the second high-frequency excitation, according to a preset mixing coefficient corresponding to the first high-frequency excitation and a preset mixing coefficient corresponding to the second high-frequency excitation, to calculate the simulated high-frequency excitation.
12. The device according to claim 10, characterized in that the coding module further comprises:
A fourth computing unit, configured to, when no harmonic structure exists in the high-frequency time-domain signal or the current audio frame is an unvoiced frame, perform spectral folding and delay alignment on the low-frequency unvoiced excitation obtained when SILK encoding is performed on the low-frequency time-domain signal to obtain a third high-frequency excitation, and to determine the third high-frequency excitation as the simulated high-frequency excitation.
13. A sub-band decoding device based on a SILK codec, characterized in that the device comprises:
A second acquisition module, configured to obtain a bit stream corresponding to a current audio frame, and to decode the bit stream with a parameter decoder to obtain low-frequency parameters and high-frequency parameters;
A decoder module, configured to decode the low-frequency parameters with a SILK decoder to obtain a low-frequency time-domain signal, and to decode the high-frequency parameters, according to intermediate parameters generated when the SILK decoder decodes the low-frequency parameters, to obtain a high-frequency time-domain signal;
A synthesis module, configured to synthesize the low-frequency time-domain signal and the high-frequency time-domain signal into a full-frequency time-domain signal with a QMF synthesizer, the full-frequency time-domain signal being the decoded audio data of the current audio frame.
14. The device according to claim 13, characterized in that the decoder module comprises:
A second judging unit, configured to judge, according to the modulation frequency in the high-frequency parameters, whether a harmonic structure exists in the current audio frame;
A fifth computing unit, configured to, when a harmonic structure exists in the current audio frame, obtain the complete low-frequency excitation and the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, and to calculate a simulated high-frequency excitation according to the complete low-frequency excitation, the low-frequency unvoiced excitation and the modulation frequency;
A synthesis unit, configured to input the high-frequency LPC coefficients and the gain adjustment ratio in the high-frequency parameters, together with the simulated high-frequency excitation, into an LPC synthesizer, and to output the synthesized high-frequency time-domain signal.
15. The device according to claim 14, characterized in that the fifth computing unit comprises:
A third processing subunit, configured to perform spectral translation on the complete low-frequency excitation according to the modulation frequency to obtain a full-frequency excitation, and to high-pass filter the full-frequency excitation to obtain a fourth high-frequency excitation, the fourth high-frequency excitation carrying a harmonic structure;
A fourth processing subunit, configured to perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a fifth high-frequency excitation, the fifth high-frequency excitation carrying no harmonic structure;
A second computation subunit, configured to perform a weighted mix of the fourth high-frequency excitation and the fifth high-frequency excitation, according to a preset mixing coefficient corresponding to the fourth high-frequency excitation and a preset mixing coefficient corresponding to the fifth high-frequency excitation, to calculate the simulated high-frequency excitation.
16. The device according to claim 14, characterized in that the decoder module further comprises:
A sixth computing unit, configured to, when no harmonic structure exists in the current audio frame, obtain the low-frequency unvoiced excitation generated when the SILK decoder decodes the low-frequency parameters, perform spectral folding and delay alignment on the low-frequency unvoiced excitation to obtain a sixth high-frequency excitation, and determine the sixth high-frequency excitation as the simulated high-frequency excitation.
CN201310740505.7A 2013-12-27 2013-12-27 Sub-band coding and decoding method and device based on SILK coder decoder Active CN103714822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310740505.7A CN103714822B (en) 2013-12-27 2013-12-27 Sub-band coding and decoding method and device based on SILK coder decoder

Publications (2)

Publication Number Publication Date
CN103714822A (en) 2014-04-09
CN103714822B (en) 2017-01-11

Family

ID=50407727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310740505.7A Active CN103714822B (en) 2013-12-27 2013-12-27 Sub-band coding and decoding method and device based on SILK coder decoder

Country Status (1)

Country Link
CN (1) CN103714822B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5687157A (en) * 1994-07-20 1997-11-11 Sony Corporation Method of recording and reproducing digital audio signal and apparatus thereof
CN1222997A (en) * 1996-07-01 1999-07-14 松下电器产业株式会社 Audio signal coding and decoding method and audio signal coder and decoder
WO2006107840A1 (en) * 2005-04-01 2006-10-12 Qualcomm Incorporated Systems, methods, and apparatus for wideband speech coding
CN101185124A (en) * 2005-04-01 2008-05-21 高通股份有限公司 Method and apparatus for dividing frequencyband coding of voice signal
US20100228541A1 (en) * 2005-11-30 2010-09-09 Matsushita Electric Industrial Co., Ltd. Subband coding apparatus and method of coding subband
CN101276587B (en) * 2007-03-27 2012-02-01 北京天籁传音数字技术有限公司 Audio encoding apparatus and method thereof, audio decoding device and method thereof
CN101276587A (en) * 2007-03-27 2008-10-01 北京天籁传音数字技术有限公司 Audio encoding apparatus and method thereof, audio decoding device and method thereof
CN101903945A (en) * 2007-12-21 2010-12-01 松下电器产业株式会社 Encoder, decoder, and encoding method
CN102473414A (en) * 2009-06-29 2012-05-23 弗兰霍菲尔运输应用研究公司 Bandwidth extension encoder, bandwidth extension decoder and phase vocoder
CN101964189A (en) * 2010-04-28 2011-02-02 华为技术有限公司 Audio signal switching method and device
CN102436820A (en) * 2010-09-29 2012-05-02 华为技术有限公司 High frequency band signal coding and decoding methods and devices
US20130117029A1 (en) * 2011-05-25 2013-05-09 Huawei Technologies Co., Ltd. Signal classification method and device, and encoding and decoding methods and devices
CN103165134A (en) * 2013-04-02 2013-06-19 武汉大学 Coding and decoding device of audio signal high frequency parameter

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
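K. Vos, S. Jensen: "SILK Speech Codec draft-vos-silk-02", Network Working Group *
郑国宏 et al.: "Lecture series on wideband speech coding technology (4): SILK, an open wideband speech coding algorithm suitable for VoIP", 《军事通信技术》 *
韩怡: "Optimization of the SILK speech codec based on GPGPPU", 《中国优秀硕士学位论文全文数据库信息科技辑》 *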

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105047201A (en) * 2015-06-15 2015-11-11 广东顺德中山大学卡内基梅隆大学国际联合研究院 Broadband excitation signal synthesis method based on segmented expansion
CN105808651A (en) * 2016-02-29 2016-07-27 四川秘无痕信息安全技术有限责任公司 Android WeChat based silk_v3 voice file format decoding method
CN108231083A (en) * 2018-01-16 2018-06-29 重庆邮电大学 A kind of speech coder code efficiency based on SILK improves method
CN110085242A (en) * 2019-04-28 2019-08-02 武汉大学 A kind of adaptive steganography method in SILK fundamental tone domain based on minimum distortion cost
CN110085242B (en) * 2019-04-28 2021-04-16 武汉大学 SILK-based sound range self-adaptive steganography method based on minimum distortion cost
CN112767954A (en) * 2020-06-24 2021-05-07 腾讯科技(深圳)有限公司 Audio encoding and decoding method, device, medium and electronic equipment
WO2021258940A1 (en) * 2020-06-24 2021-12-30 腾讯科技(深圳)有限公司 Audio encoding/decoding method and apparatus, medium, and electronic device
CN112767954B (en) * 2020-06-24 2024-06-14 腾讯科技(深圳)有限公司 Audio encoding and decoding method, device, medium and electronic equipment
CN113096670A (en) * 2021-03-30 2021-07-09 北京字节跳动网络技术有限公司 Audio data processing method, device, equipment and storage medium
CN113096670B (en) * 2021-03-30 2024-05-14 北京字节跳动网络技术有限公司 Audio data processing method, device, equipment and storage medium
CN114598886A (en) * 2022-05-09 2022-06-07 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image coding method, decoding method and related device
CN114598886B (en) * 2022-05-09 2022-09-13 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image coding method, decoding method and related devices

Also Published As

Publication number Publication date
CN103714822B (en) 2017-01-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 511446 28th floor, block B1, Wanda Plaza, Wanbo business district, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Applicant after: Guangzhou Huaduo Network Technology Co., Ltd.

Address before: 510655, Guangzhou, Whampoa Avenue, No. 2, creative industrial park, building 3-08,

Applicant before: Guangzhou Huaduo Network Technology Co., Ltd.

COR Change of bibliographic data
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210118

Address after: 511442 3108, 79 Wanbo 2nd Road, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Patentee after: GUANGZHOU CUBESILI INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 511446 28th floor, block B1, Wanda Plaza, Wanbo business district, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Patentee before: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd.

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20140409

Assignee: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd.

Assignor: GUANGZHOU CUBESILI INFORMATION TECHNOLOGY Co.,Ltd.

Contract record no.: X2021440000053

Denomination of invention: Sub-band coding and decoding method and device based on SILK codec

Granted publication date: 20170111

License type: Common License

Record date: 20210208