CN108231083A - A method for improving the coding efficiency of a SILK-based speech coder
- Publication number: CN108231083A (application CN201810040152.2A)
- Authority: CN (China)
- Prior art keywords: signal, noise, input, speech, voice
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G10L19/16: Vocoder architecture
- G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/012: Comfort noise or silence coding
- G10L19/265: Pre-filtering, e.g. high frequency emphasis prior to encoding
- G10L21/0208: Noise filtering
- G10L25/78: Detection of presence or absence of voice signals
- G10L25/93: Discriminating between voiced and unvoiced parts of speech signals
Abstract
The present invention proposes a method for improving the coding efficiency of a SILK-based speech coder. The method is implemented as follows. First, specific noise is added to the input speech signal to generate a simulated signal; long-term and short-term prediction are then performed on the simulated signal, which raises the prediction gain of the prediction filters and lowers the entropy of the quantization indices, thereby improving coding efficiency. Second, an encoder conventionally determines the excitation signal by minimizing a perceptually weighted reconstruction error, and a decoder conventionally uses post-filtering to suppress spectral regions in which the quantization noise and the signal are highly correlated; by applying different weighting filters to the input of the noise shaping quantizer and to the reconstructed signal, both functions are combined in the quantizer at the encoder. The proposed method needs no side information and no change to the bitstream format, yet improves the coding efficiency of SILK.
Description
Technical field
The invention belongs to the field of voice communication and relates in particular to a SILK-based wideband speech coder, which is widely applicable to real-time voice communication scenarios such as teleconferencing, Voice over Internet Protocol (VoIP) telephony, wireless communication and gaming platforms.
Background art
Speech is the most direct, convenient and efficient medium of human communication, so transmitting speech signals is a basic function of most communication systems. As technology has developed, non-speech information such as images and text has taken up a growing share of transmitted information, but transmitting speech effectively remains an indispensable function of many communication systems.
In a digital communication system, the original speech signal must be digitized before it can be transmitted, and analog-to-digital conversion increases the data volume: sampling speech at 16 kHz with 16-bit uniform quantization yields a bit rate of 256 kbps. Such high-rate digital speech needs more bandwidth on the network, which raises transmission cost in bandwidth-limited systems such as cellular mobile communication. The digitized speech signal therefore has to be compressed.
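The 256 kbps figure follows directly from the sampling parameters. A minimal sketch of the arithmetic is given here; the function name is illustrative and not part of the patent.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Raw (uncompressed) PCM bit rate in kbit/s, with 1 kbit = 1000 bits."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Wideband speech as described above: 16 kHz sampling, 16-bit uniform quantization.
wideband_kbps = pcm_bitrate_kbps(16_000, 16)
# For comparison: 8 kHz sampling with 8 bits per sample.
narrowband_kbps = pcm_bitrate_kbps(8_000, 8)
print(wideband_kbps, narrowband_kbps)
```

Any codec operating at a few tens of kbps therefore has to remove most of the raw bit rate.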
In 1972, the International Telegraph and Telephone Consultative Committee (CCITT) published the 64 kbps speech coding standard G.711, which uses pulse code modulation (PCM) and is applied in telephone communication services. In 1980, CCITT formulated the 32 kbps adaptive differential pulse code modulation (ADPCM) speech coding standard G.721. Analysis-by-synthesis speech coding algorithms subsequently became mainstream: in 1992, CCITT published the 16 kbps low-delay code excited linear prediction (LD-CELP) speech coding scheme G.728, and in 1996 the 8 kbps conjugate-structure algebraic code excited linear prediction (CS-ACELP) speech coding standard G.729 was formulated. This standard can be applied to speech communication fields such as VoIP and H.323. As network bandwidth and terminal processing power keep growing, users demand ever higher speech quality, and wideband, super-wideband and fullband speech coding techniques have been widely studied and applied.
In traditional narrowband speech coding standards, the speech bandwidth is generally limited to 300 Hz to 3400 Hz and the sampling frequency is 8 kHz. This bandwidth limit reduces the naturalness of speech, makes some speech processing results unsatisfactory, and restricts further improvement of speech coding quality. To achieve high-quality voice communication, wideband speech coding with a band of 50 Hz to 7000 Hz was introduced. Compared with narrowband speech, the low-frequency extension from 50 Hz to 300 Hz improves the naturalness, presence and comfort of speech, while the high-frequency extension from 3400 Hz to 7000 Hz makes fricatives, plosives and similar sounds easier to distinguish and so improves intelligibility. For this reason, many research institutions and organizations at home and abroad have long worked on wideband speech coding algorithms and standards. Several wideband speech coding standards have been formulated so far: ITU-T G.722, ITU-T G.722.1, ITU-T G.722.2 and the 3GPP2 Variable-Rate Multimode Wideband (VMR-WB) speech codec. In 2003, 3GPP2 selected VMR-WB as the wideband speech codec for CDMA2000 systems. ITU-T later released several new embedded wideband speech coding standards, ITU-T G.729.1, ITU-T G.711.1 and ITU-T G.718, of which G.729.1, formulated in 2006, is the most representative. G.729.1 is a wideband extension of G.729 (the bandwidth is extended to 50 Hz to 7000 Hz). In March 2008, ITU-T standardized the embedded wideband speech and audio coding standard G.711.1, with bit rates of 64 kbps, 80 kbps and 96 kbps. G.718, standardized by ITU-T in June 2008, is a narrowband/wideband embedded variable-rate speech and audio coder that is robust to frame erasure; it has five bit rates, 8 kbps, 12 kbps, 16 kbps, 24 kbps and 32 kbps, of which only 8 kbps and 12 kbps are supported for narrowband coding, while all five are supported for wideband coding. Early wideband multi-rate speech coders were mainly used for video conferencing, whereas today they are mainly used for VoIP and wireless applications.
With the development of Internet technology and the spread of its applications, low-cost Internet telephony has been studied in depth, and standardization bodies and industry organizations around the world have proposed many corresponding speech coding schemes. These include G.711, G.723.1 and G.729A from the International Telecommunication Union, and speech coding algorithms such as iLBC and SILK proposed by companies such as GIPS and Skype. SILK is a speech codec solution developed in-house by Skype. It supports sampling frequencies of 8, 12, 16 and 24 kHz and coding bit rates of 6 to 40 kbps. The coder offers real-time scalability to adapt to changes in network quality, can deliver super-wideband audio while using less than 50% of the network bandwidth previously required, and maintains a stable call quality even at high packet loss rates. Because it provides good speech quality in low-bandwidth environments, SILK's application prospects have attracted wide attention, and research on the key algorithms of the SILK coder and on further improving its performance has become a goal for many researchers. Designing a SILK-based speech coder with high quality and high coding efficiency, and applying it to real-time voice communication scenarios such as teleconferencing, VoIP, wireless communication and gaming platforms, is therefore of significant research and practical value.
SILK supports redundant coding and multi-frame packing when encoding. Although this strengthens SILK's error resilience, redundant coding raises the bit rate and so lowers SILK's coding efficiency. The aim is therefore to improve coding efficiency without reducing coding quality.
Summary of the invention
In view of the deficiencies of the prior art, a SILK-based speech coder with higher coding efficiency and better coding quality is proposed. The technical solution of the invention is as follows. It comprises an encoding step at the encoder and a decoding step at the decoder, and the steps of the method for improving the coding efficiency of a SILK-based speech coder are as follows:
101. First perform voice activity detection (VAD) on the input speech signal to detect pauses, silent intervals and active speech components; at the same time, pass the speech signal through a high-pass filter with a cutoff frequency of 70 Hz to remove any DC offset and 50 Hz or 60 Hz mains hum;
102. Then perform pitch analysis on the speech signal. SILK uses open-loop pitch analysis to make a voiced/unvoiced decision and to estimate the pitch period of voiced signals, which yields the pitch autocorrelation coefficient and the pitch lag;
103. Perform noise shaping analysis (NSA) on the output of the high-pass filter. NSA provides the gains and filter coefficients used in the prefilter and in the noise shaping quantizer;
104. Feed the signals obtained from pitch analysis and NSA into the generate-simulated-signal module; at the same time, perform long-term prediction (LTP) analysis on the pitch analysis output and pre-filter the output of NSA;
105. Perform further prediction analysis on the simulated signal and the high-pass-filtered signal, convert the result to line spectral frequency (LSF) parameters, extract the feature parameters using multi-stage vector quantization, and convert the quantized parameters back to linear predictive coding (LPC) parameters; this conversion keeps the encoder and decoder synchronized;
106. Perform noise shaping quantization (NSQ) on the basis of step 105; noise shaping makes the noise spectrum follow the spectral changes of the signal, so that the noise is less audible;
107. Range-code the extracted speech feature parameters, completing the encoding process.
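The high-pass filtering in step 101 can be illustrated with a first-order DC-blocking filter. This is a sketch under assumptions: the text gives only the 70 Hz cutoff, not the filter structure or order used by SILK, so the function name and the RC-style coefficient formula below are illustrative choices.

```python
import math


def high_pass(samples, sample_rate_hz, cutoff_hz=70.0):
    """First-order high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1]).

    Assumption: a simple RC-style design, standing in for whatever
    filter the actual coder uses.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    a = rc / (rc + dt)
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


# A constant (DC) input decays towards zero at the output.
dc_removed = high_pass([1.0] * 1000, 16_000)
print(abs(dc_removed[-1]) < 1e-6)
```

A first-order filter removes the DC offset but attenuates 50 Hz or 60 Hz hum only mildly; a steeper filter would be needed to suppress the hum as step 101 describes.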
Further, in step 104 the generate-simulated-signal module encodes the speech signal using a time-varying source-filter model, and the encoder consists of the following parts:
An input, consisting of a speech signal comprising a series of consecutive frames;
A first signal processing module, which generates a simulated signal for the speech signal of each frame in the series by adding a specific noise signal to the input speech frame.
A second signal processing module, which determines the LPC coefficients based on the simulated signal frame, and further determines an LPC residual signal of the input signal based on those LPC coefficients;
A third signal processing module, which encodes the LPC coefficients and the LPC residual signal to generate an encoded signal representing the speech signal.
The simulated signal is generated as follows:
A1: First, add the input speech signal to the output of the noise shaping filter to form the first input of the simulated output signal, where the noise shaping filter consists of a long-term shaping filter and a short-term shaping filter;
A2: Take white noise scaled by the quantization gain obtained from noise shaping analysis as the second input of the simulated output signal, where the white noise has the same variance as the quantization noise;
A3: Add the two signals obtained in steps A1 and A2 to obtain the final simulated output signal, which completes the generation of the simulated signal in step 104;
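Steps A1 to A3 can be sketched as follows. This is an illustration under several assumptions that the text does not settle: the shaping filter is taken to run over the previously injected noise, only a short-term shaping filter is modelled (the long-term part is omitted), and the white noise standard deviation defaults to that of a unit-step uniform quantizer. All names and parameters are hypothetical.

```python
import random


def generate_simulated_frame(speech, shaping_coefs, quant_gain,
                             noise_std=(1.0 / 12.0) ** 0.5, seed=0):
    """Return a simulated frame: speech plus shaped, gain-scaled white noise."""
    rng = random.Random(seed)
    # Past injected noise (simulated minus input), most recent first.
    injected = [0.0] * len(shaping_coefs)
    simulated = []
    for x in speech:
        # A1: input sample plus the (short-term) shaping filter output.
        shaped = sum(c * d for c, d in zip(shaping_coefs, injected))
        first_input = x + shaped
        # A2: white noise scaled by the quantization gain.
        second_input = quant_gain * rng.gauss(0.0, noise_std)
        # A3: the sum of both inputs is the simulated output sample.
        s = first_input + second_input
        simulated.append(s)
        if injected:
            injected = [s - x] + injected[:-1]
    return simulated


frame = [float(n % 7) for n in range(50)]
print(generate_simulated_frame(frame, [0.6], quant_gain=0.0) == frame)
```

With a quantization gain of zero no noise is injected and the simulated frame equals the input frame, which is a quick sanity check on the structure.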
Further, in step 106 the noise shaping quantizer shapes the spectra of the signal and of the coding noise independently, which gives higher speech quality at the same bit rate. The prefilter output signal is first multiplied by a compensation gain G computed during NSA, then added to the output of the synthesis shaping filter, and the output of a prediction filter is then subtracted, giving a residual signal. The residual signal is scaled by the quantization gain obtained from NSA, and the result is fed to the lattice quantizer together with the specific noise generated in step 104. The quantization indices of the quantizer are the excitation indices passed to the range coder. The output of the prediction filter is added to the excitation signal to give the quantized output signal, which in turn serves as the input of the synthesis shaping filter and the prediction filter. Unlike classical NSQ, the noise shaping in the NSQ of the invention is wrapped directly around the quantizer and fed back to its input: the input and output speech signals are compared and the result is returned to the quantizer input.
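The signal flow described for step 106 can be sketched as a sample-by-sample loop. This is a simplified illustration, not the coder's implementation: a rounding scalar quantizer stands in for the lattice quantizer, the specific noise from step 104 is omitted, filter state starts at zero, and all names and coefficient values are hypothetical.

```python
import math


def nsq_sketch(x, comp_gain, quant_gain, shape_coefs, pred_coefs):
    """Noise shaping quantizer sketch; returns (excitation indices, quantized output)."""
    order = max(len(shape_coefs), len(pred_coefs), 1)
    y_hist = [0.0] * order  # past quantized output samples, most recent first
    indices, output = [], []
    for sample in x:
        shape = sum(c * h for c, h in zip(shape_coefs, y_hist))  # synthesis shaping filter
        pred = sum(c * h for c, h in zip(pred_coefs, y_hist))    # prediction filter
        residual = comp_gain * sample + shape - pred
        idx = round(residual / quant_gain)   # quantization index (excitation index)
        excitation = idx * quant_gain
        y = pred + excitation                # quantized output, fed back to both filters
        indices.append(idx)
        output.append(y)
        y_hist = [y] + y_hist[:-1]
    return indices, output


signal = [10.0 * math.sin(0.3 * n) for n in range(200)]
idx, y = nsq_sketch(signal, comp_gain=1.0, quant_gain=0.5,
                    shape_coefs=[0.5], pred_coefs=[0.9])
print(len(idx), len(y))
```

In this structure the output satisfies y(n) = G x(n) + shaping(y) + q(n), where q(n) is the rounding error, so the quantization noise is shaped by the synthesis shaping filter alone, while a decoder can rebuild y(n) from the indices using only the prediction filter.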
The advantages and beneficial effects of the invention are as follows:
The method for improving the coding efficiency of a SILK-based speech coder effectively lowers the coding bit rate without affecting coding quality, giving a SILK speech coder with high coding efficiency and high quality. It can be applied to real-time voice communication scenarios such as teleconferencing, VoIP, wireless communication and gaming platforms, so the invention has good application prospects and practical value.
Description of the drawings
Fig. 1 is the SILK speech encoding flowchart of the embodiment provided by the invention.
Fig. 2 is a schematic diagram of the generate-simulated-signal module of the invention.
Fig. 3 is the high-efficiency SILK speech encoding flowchart of the embodiment provided by the invention.
Fig. 4 is the functional block diagram of noise shaping quantization of the invention.
Fig. 5 is the SILK speech decoding flowchart of the embodiment provided by the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings:
The principle of SILK speech encoding is shown in Fig. 1. It follows the classical source-filter model, that is, it is based on modelling the speech production system, and uses two filter stages. The first stage, the long-term prediction filter, removes the periodic component of voiced speech; unvoiced speech needs no LTP processing. The second stage performs short-term filtering to remove the redundancy between neighbouring samples; here the LPC coefficients are computed with the Burg algorithm and then quantized with multi-stage vector quantization. These two filter stages yield the excitation signal, after which gain quantization, NSQ and normalization are performed, and the normalized signal is range-coded. The specific implementation steps are as follows:
Step 1: First perform VAD on the input speech signal to detect pauses, silent intervals and active speech components; at the same time, pass the speech signal through a high-pass filter with a cutoff frequency of 70 Hz to remove any DC offset and 50 Hz or 60 Hz mains hum;
Step 2: Then perform pitch analysis on the speech signal. SILK uses open-loop pitch analysis to make a voiced/unvoiced decision and to estimate the pitch period of voiced signals, which yields the pitch autocorrelation coefficient and the pitch lag;
Step 3: Perform noise shaping analysis (NSA) on the output of the high-pass filter. NSA provides the gains and filter coefficients used in the prefilter and in the noise shaping quantizer;
Step 4: Feed the signals obtained from pitch analysis and NSA into the generate-simulated-signal module; at the same time, perform long-term prediction analysis on the pitch analysis output and pre-filter the output of NSA;
Step 5: Perform further prediction analysis on the simulated signal and the high-pass-filtered signal, convert the result to LSF parameters, extract the feature parameters using multi-stage vector quantization, and convert the quantized parameters back to linear prediction parameters; this conversion keeps the encoder and decoder synchronized;
Step 6: Perform noise shaping quantization on the basis of step 5; noise shaping makes the noise spectrum follow the spectral changes of the signal, so that the noise is less audible;
Step 7: Range-code the extracted speech feature parameters, completing the encoding process.
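The open-loop pitch analysis of step 2 can be illustrated with a normalized autocorrelation search. This is a minimal sketch, not SILK's multi-stage pitch estimator; the search range, the voicing threshold and the function name are assumptions chosen for illustration.

```python
import math


def estimate_pitch(frame, sample_rate_hz, f_min=60.0, f_max=400.0,
                   voicing_threshold=0.5):
    """Return (voiced, pitch_lag, correlation) from a normalized autocorrelation search."""
    lag_min = int(sample_rate_hz / f_max)
    lag_max = min(int(sample_rate_hz / f_min), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        a = frame[lag:]                 # frame delayed by `lag` samples
        b = frame[:len(frame) - lag]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        corr = num / den if den > 0.0 else 0.0
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_corr >= voicing_threshold, best_lag, best_corr


# Synthetic "voiced" frame: 100 Hz fundamental plus one harmonic, 40 ms at 8 kHz.
fs = 8000
frame = [math.sin(2 * math.pi * 100 * n / fs) + 0.5 * math.sin(2 * math.pi * 200 * n / fs)
         for n in range(320)]
voiced, lag, corr = estimate_pitch(frame, fs)
print(voiced, lag)
```

The returned lag and correlation correspond to the pitch lag and pitch autocorrelation coefficient named in step 2; for the synthetic frame the lag is one period of the 100 Hz fundamental, i.e. 80 samples.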
Fig. 2 shows a specific way of improving coding efficiency. The encoder generates a simulated signal that matches the spectral characteristics of the input and uses it in place of the original input signal. Combined with long-term and short-term prediction on the simulated signal, this raises the prediction gain of the prediction filters and lowers the entropy of the quantization indices, which reduces the bit rate needed to transmit the encoded speech signal and improves the coding efficiency of the coder.
The speech signal is encoded using a time-varying source-filter model, and the encoder consists of the following parts:
An input, consisting of a speech signal comprising a series of consecutive frames;
A first signal processing module, which generates a simulated signal for the speech signal of each frame in the series by adding a specific noise signal to the input speech frame.
A second signal processing module, which determines the LPC coefficients based on the simulated signal frame, and further determines an LPC residual signal of the input signal based on those LPC coefficients;
A third signal processing module, which encodes the LPC coefficients and the LPC residual signal to generate an encoded signal representing the speech signal.
The simulated signal is generated as follows:
S1: First, add the input speech signal to the output of the noise shaping filter to form the first input of the simulated output signal, where the noise shaping filter consists of a long-term shaping filter and a short-term shaping filter;
S2: Take white noise scaled by the quantization gain obtained from noise shaping analysis as the second input of the simulated output signal, where the white noise has the same variance as the quantization noise;
S3: Add the two signals obtained in steps S1 and S2 to obtain the final simulated output signal, which completes the generation of the simulated signal in step 4;
By integrating the generate-simulated-signal module into the SILK speech coder in a suitable way, and taking the quantization noise obtained in step 6 as an input of the NSQ, the high-efficiency SILK speech coder shown in Fig. 3 is obtained. The simulated signal replaces the original input signal; combined with long-term and short-term prediction on the simulated signal, this raises the prediction gain of the prediction filters and lowers the entropy of the quantization indices, which reduces the bit rate needed to transmit the encoded speech signal and improves the coding efficiency of the coder.
The NSQ module quantizes the residual signal and at the same time generates the excitation signal. Conventionally, the encoder determines the excitation signal by minimizing a perceptually weighted reconstruction error, and the decoder uses post-filtering to suppress spectral regions in which the quantization noise and the signal are highly correlated. The NSQ of the invention applies different weighting filters to the input and to the reconstructed signal, so that both functions are combined in the quantizer of the encoder. Integrating the two operations at the encoder not only simplifies the decoder but also lets the encoder use a perceptual model of arbitrary simplicity or complexity to shape the quantization noise and to enhance or suppress spectral regions, jointly or independently. With this model, no side information is spent and the bitstream format does not change. Fig. 4 is the functional block diagram of noise shaping quantization in the embodiment provided by the invention; the prediction filter in the figure comprises both the LPC and the LTP prediction filters. F_ana and F_syn are the analysis and synthesis noise shaping filters respectively; for voiced frames each contains both a long-term and a short-term filter. The quantized excitation indices are denoted i(n). The LTP coefficients, gains and noise shaping coefficients are updated once per subframe, whereas the LPC coefficients are updated once per frame. The output of the NSQ quantizer is given by formula (1):
The first part of formula (1) is the input signal shaping part, and the second part is the quantization noise shaping part.
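Formula (1) itself is not reproduced in this text. A form consistent with the two parts just described, and with the signal flow given for the noise shaping quantizer (compensation gain G, analysis filter F_ana, synthesis filter F_syn), is sketched here as an assumption; X(z), Y(z) and Q(z) are introduced for illustration and denote the z-transforms of the input signal, the quantized output signal and the quantization noise.

```latex
Y(z) = G \, \frac{1 - F_{ana}(z)}{1 - F_{syn}(z)} \, X(z) \;+\; \frac{1}{1 - F_{syn}(z)} \, Q(z)
```

Under this reading, the first term shapes the input signal through both filters, while the second term shapes the quantization noise through the synthesis filter alone, which is why signal and noise spectra can be controlled independently.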
Fig. 5 is the SILK speech decoding flowchart of the embodiment provided by the invention.
At the receiver, a variable-length decoder splits each received data packet into the frames it contains. Each frame carries the information needed to reconstruct a 20 ms frame of the output signal.
Step 1: Range decoding. This module decodes the speech feature parameters from the received bitstream. Its output comprises the pulses that generate the excitation signal, the gain indices, and the LTP and LSF codebook indices; these are used to decode the LTP and LPC coefficients, which are in turn used for LTP and LPC synthesis filtering of the excitation signal;
Step 2: Parameter decoding. The pulses and gains are available after step 1. If the decoded speech frame is voiced, the LTP codebook and its index are also decoded, and the LTP coefficients are decoded from that codebook; this is done in the same way for each of the four subframes of a frame. The LPC coefficients are decoded from the LSF codebook, with each stage of the multi-stage codebook contributing one vector;
Step 3: Excitation generation. Multiplying the pulse signal by the quantization gain gives the excitation signal;
Step 4: LTP synthesis. For voiced speech, the excitation signal e(n) is the input of the LTP synthesis filter, which restores the long-term correlation removed by the LTP analysis filter and generates the LPC excitation signal e_LPC(n) according to formula (2);
where L is the pitch lag and b_i are the decoded LTP coefficients;
For unvoiced speech, the output signal is simply a copy of the excitation signal, i.e. e_LPC(n) = e(n);
Step 5: LPC synthesis. The LPC synthesis filter restores the short-term correlation removed by the LPC analysis filter. The LPC excitation signal e_LPC(n) is filtered using the LPC coefficients a_i, and the decoded signal is obtained according to formula (3):
where d_LPC is the order of the LPC synthesis filter and y(n) is the decoded output signal.
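Decoding steps 3 to 5 can be sketched as two recursive filters. Formulas (2) and (3) are not reproduced in this text, so the recursions below are assumptions based on the standard all-pole form of LTP and LPC synthesis, using the symbols defined above (e(n), e_LPC(n), L, b_i, a_i, d_LPC, y(n)); filter state from previous frames is taken as zero, and the function names are hypothetical.

```python
def ltp_synthesis(excitation, pitch_lag, ltp_coefs):
    """Assumed form: e_LPC(n) = e(n) + sum_i b_i * e_LPC(n - L - i), taps centred on L."""
    half = len(ltp_coefs) // 2
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, b in enumerate(ltp_coefs):
            idx = n - pitch_lag - (k - half)
            if 0 <= idx < n:            # use only samples already synthesized
                acc += b * out[idx]
        out.append(acc)
    return out


def lpc_synthesis(lpc_excitation, lpc_coefs):
    """Assumed form: y(n) = e_LPC(n) + sum_{i=1..d_LPC} a_i * y(n - i)."""
    out = []
    for n, e in enumerate(lpc_excitation):
        acc = e
        for i, a in enumerate(lpc_coefs, start=1):
            if n - i >= 0:
                acc += a * out[n - i]
        out.append(acc)
    return out


def decode_frame(pulses, gain, voiced, pitch_lag, ltp_coefs, lpc_coefs):
    excitation = [gain * p for p in pulses]                      # step 3
    if voiced:
        e_lpc = ltp_synthesis(excitation, pitch_lag, ltp_coefs)  # step 4, voiced
    else:
        e_lpc = list(excitation)                                 # step 4, unvoiced copy
    return lpc_synthesis(e_lpc, lpc_coefs)                       # step 5


print(decode_frame([1, 0, 0, 0], gain=2.0, voiced=False,
                   pitch_lag=0, ltp_coefs=[], lpc_coefs=[0.5]))
```

For the unvoiced example, a single pulse scaled by the gain excites a first-order LPC synthesis filter, so the output decays geometrically by the factor a_1 = 0.5.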
The above embodiments are to be understood as merely illustrating the invention and not as limiting its scope of protection. After reading the content recorded herein, a person skilled in the art may make various changes or modifications to the invention, and such equivalent changes and modifications likewise fall within the scope of the claims of the invention.
Claims (4)
1. A method for improving the coding efficiency of a SILK-based speech encoder, characterized in that it comprises encoding steps at the encoder and decoding steps at the decoder, wherein the steps of the method are specifically:
101. A speech signal is input; VAD processing is first applied to the input speech signal to detect the pauses, silent intervals, and active speech components in the speech; at the same time, the speech signal is passed through a high-pass filter with a cutoff frequency of 70 Hz to remove any DC offset and 50 Hz or 60 Hz hum;
102. Pitch analysis is performed on the speech signal: SILK makes an unvoiced/voiced decision on the speech signal by open-loop pitch analysis and estimates the pitch period of voiced signals, yielding the pitch autocorrelation coefficients and the pitch lag;
103. Noise shaping analysis is performed on the output of the high-pass filter to obtain the gains and filter coefficients used in the prefilter and the noise shaping quantizer;
104. The signals from pitch analysis and noise shaping analysis are input to the simulated signal generation module; at the same time, LTP analysis is performed on the pitch analysis output, and prefiltering is applied to the output of the noise shaping analysis;
105. Further prediction analysis is performed on the speech signal processed by the simulated signal generation module and the high-pass filter; LSF parameters are then extracted and the extracted characteristic parameters are quantized with multi-stage vector quantization; the quantized parameters are then converted to LPC coefficients, and this conversion keeps the encoder and decoder synchronized;
106. Noise shaping quantization is performed on the basis of step 105; through noise shaping, the noise spectrum follows the spectral changes of the signal, making the noise less audible;
107. Range coding is applied to the extracted speech characteristic parameters, completing the encoding process.
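The high-pass filtering in step 101 can be sketched as follows. This is a minimal illustration only: it uses a first-order recursive high-pass filter, whereas the filter order is not stated in the text, and the 16 kHz sampling rate and the name `high_pass` are assumptions. Only the 70 Hz cutoff comes from the claim.

```python
import math


def high_pass(x, fs=16000.0, fc=70.0):
    """First-order high-pass filter (illustrative stand-in).

    y(n) = alpha * (y(n-1) + x(n) - x(n-1)), alpha = RC / (RC + 1/fs),
    with RC chosen so that the cutoff frequency is fc (70 Hz in step 101).
    fs = 16 kHz is an assumed sampling rate.
    """
    rc = 1.0 / (2.0 * math.pi * fc)
    alpha = rc / (rc + 1.0 / fs)
    y = []
    prev_x = 0.0
    prev_y = 0.0
    for sample in x:
        cur = alpha * (prev_y + sample - prev_x)
        y.append(cur)
        prev_x, prev_y = sample, cur
    return y
```

A constant (DC) input decays towards zero at the output, while components well above 70 Hz pass almost unchanged.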
2. The method for improving the coding efficiency of a SILK-based speech encoder according to claim 1, characterized in that, in step 104, the simulated signal generation module encodes the speech signal using a time-varying source-filter model, and the encoder consists of the following parts:
An input, consisting of a speech signal comprising a series of consecutive frames;
A first signal processing module, which generates a simulated signal for each speech signal in the series of consecutive frames by adding a specific noise signal to the input speech signal frames;
A second signal processing module, which determines the LPC coefficient signal based on the simulated signal frames and further determines the LPC residual signal based on the LPC coefficients of the input signal;
A third signal processing module, which generates an encoded signal representing the speech signal by arithmetic coding of the LPC coefficients and the LPC residual signal;
The simulated signal is generated as follows:
A1: First, the input speech signal is added to the output of the noise shaping filter to form the first input of the simulated output signal, where the noise shaping filter consists of a long-term shaping filter and a short-term shaping filter;
A2: White noise and the quantization gain obtained from noise shaping analysis form the second input of the simulated output signal, where the variance of the white noise equals the variance of the quantization noise;
A3: The two signals obtained in steps A1 and A2 are added to give the final simulated output signal, completing the generation of the simulated signal in step 104.
3. The method for implementing the simulated signal generation module according to claim 2, characterized in that the encoder in step A4 consists of the following parts:
An input, consisting of a speech signal comprising a series of consecutive frames;
A first signal processing module, which generates a simulated signal for each speech signal in the series of consecutive frames by adding a specific noise signal to the input speech signal frames;
A second signal processing module, which determines the LPC coefficient signal based on the simulated signal frames and further determines the LPC residual signal based on the LPC coefficients of the input signal;
A third signal processing module, which generates an encoded signal representing the speech signal by arithmetic coding of the LPC coefficients and the LPC residual signal.
4. The method for improving the coding efficiency of a SILK-based speech encoder according to claim 1, characterized in that, in step 106, the noise shaping quantizer shapes the spectra of the signal and of the coding noise separately, so that higher speech quality can be obtained at the same bit rate: the prefilter output signal is first multiplied by a compensation gain G computed during noise shaping analysis (NSA); it is then added to the output of the synthesis shaping filter, and the output of a prediction filter is subtracted, finally yielding a residual signal; the residual signal is multiplied by the quantization gain obtained from NSA and the result is input to a lattice quantizer; the quantization indices of the quantizer represent the excitation indices input to the range coder; the output of the prediction filter is added to the excitation signal to give the quantized output signal, which is in turn used as the input to the synthesis shaping and prediction filters.
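The feedback loop described in claim 4 can be sketched as follows, under several assumptions that are not in the source. A uniform scalar quantizer stands in for the lattice quantizer. The residual is divided by `q_gain` before rounding, so that multiplying the indices by the gain recovers the excitation as in decoder step 3. Both filters are plain FIR filters over past quantized output samples, with zero initial state. The names `noise_shaping_quantize`, `pred`, and `shape` are illustrative.

```python
def noise_shaping_quantize(x, G, q_gain, pred, shape):
    """Simplified noise shaping quantizer loop.

    x      : prefilter output samples
    G      : compensation gain from noise shaping analysis
    q_gain : quantization gain (step size of the stand-in scalar quantizer)
    pred   : prediction filter coefficients applied to past quantized output
    shape  : synthesis shaping filter coefficients applied to past quantized output
    Returns (indices, y): excitation indices and quantized output signal.
    """
    y = []
    indices = []
    for n, sample in enumerate(x):
        # Filter outputs computed from past quantized output samples.
        p = sum(c * y[n - 1 - k] for k, c in enumerate(pred) if n - 1 - k >= 0)
        s = sum(c * y[n - 1 - k] for k, c in enumerate(shape) if n - 1 - k >= 0)
        # Residual: gain-compensated input plus shaping output minus prediction.
        r = G * sample + s - p
        # Stand-in quantizer: uniform rounding of the gain-normalised residual.
        idx = int(round(r / q_gain))
        excitation = idx * q_gain
        indices.append(idx)
        # Quantized output feeds both filters on the next iteration.
        y.append(p + excitation)
    return indices, y
```

With the shaping filter disabled (`shape=[]`), each quantized output sample differs from `G * x[n]` by at most half a quantization step; a non-empty `shape` redistributes that error spectrally.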
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810040152.2A CN108231083A (en) | 2018-01-16 | 2018-01-16 | A kind of speech coder code efficiency based on SILK improves method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810040152.2A CN108231083A (en) | 2018-01-16 | 2018-01-16 | A kind of speech coder code efficiency based on SILK improves method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108231083A true CN108231083A (en) | 2018-06-29 |
Family
ID=62641268
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810040152.2A Pending CN108231083A (en) | 2018-01-16 | 2018-01-16 | A kind of speech coder code efficiency based on SILK improves method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108231083A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102341848A (en) * | 2009-01-06 | 2012-02-01 | 斯凯普有限公司 | Speech encoding |
CN103714822A (en) * | 2013-12-27 | 2014-04-09 | 广州华多网络科技有限公司 | Sub-band coding and decoding method and device based on SILK coder decoder |
Non-Patent Citations (3)
Title |
---|
K. VOS等: ""SILK Speech Codec"", 《HTTP://TOOLS.IETF.ORG/HTML/DRAFT-VOS-SILK-02》 * |
KOEN VOS等: ""Voice Coding with Opus"", 《AES 135TH CONVENTION》 * |
郑国宏 等: ""一种适用于VoIP 的宽带语音编码算法: SILK"", 《军事通信技术》 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097887A (en) * | 2019-04-28 | 2019-08-06 | 武汉大学 | A kind of safe steganography method of SILK based on LSF coefficient Statistical Distribution Characteristics |
CN110085242B (en) * | 2019-04-28 | 2021-04-16 | 武汉大学 | SILK-based sound range self-adaptive steganography method based on minimum distortion cost |
CN110085242A (en) * | 2019-04-28 | 2019-08-02 | 武汉大学 | A kind of adaptive steganography method in SILK fundamental tone domain based on minimum distortion cost |
CN110730015A (en) * | 2019-11-20 | 2020-01-24 | 深圳市星网荣耀科技有限公司 | Multilink portable communication device and voice coding compression and decoding method thereof |
CN111063361B (en) * | 2019-12-31 | 2023-02-21 | 广州方硅信息技术有限公司 | Voice signal processing method, system, device, computer equipment and storage medium |
CN111063361A (en) * | 2019-12-31 | 2020-04-24 | 广州华多网络科技有限公司 | Voice signal processing method, system, device, computer equipment and storage medium |
WO2021136343A1 (en) * | 2019-12-31 | 2021-07-08 | 华为技术有限公司 | Audio signal encoding and decoding method, and encoding and decoding apparatus |
CN113129910A (en) * | 2019-12-31 | 2021-07-16 | 华为技术有限公司 | Coding and decoding method and coding and decoding device for audio signal |
CN112509591A (en) * | 2020-12-04 | 2021-03-16 | 北京百瑞互联技术有限公司 | Audio coding and decoding method and system |
CN112509591B (en) * | 2020-12-04 | 2024-05-14 | 北京百瑞互联技术股份有限公司 | Audio encoding and decoding method and system |
CN112992161A (en) * | 2021-04-12 | 2021-06-18 | 北京世纪好未来教育科技有限公司 | Audio encoding method, audio decoding method, audio encoding apparatus, audio decoding medium, and electronic device |
CN113744751A (en) * | 2021-08-16 | 2021-12-03 | 清华大学苏州汽车研究院(相城) | Multi-channel speech signal enhancement method applied to microphone array |
CN113744751B (en) * | 2021-08-16 | 2024-05-17 | 清华大学苏州汽车研究院(相城) | Multichannel voice signal enhancement method applied to microphone array |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108231083A (en) | A kind of speech coder code efficiency based on SILK improves method | |
US10249313B2 (en) | Adaptive bandwidth extension and apparatus for the same | |
US9837092B2 (en) | Classification between time-domain coding and frequency domain coding | |
CN101180676B (en) | Methods and apparatus for quantization of spectral envelope representation | |
CN100369112C (en) | Variable rate speech coding | |
US6260009B1 (en) | CELP-based to CELP-based vocoder packet translation | |
CN1244907C (en) | High frequency intensifier coding for bandwidth expansion speech coder and decoder | |
JP4270866B2 (en) | High performance low bit rate coding method and apparatus for non-speech speech | |
Hasegawa-Johnson et al. | Speech coding: Fundamentals and applications | |
WO2010028301A1 (en) | Spectrum harmonic/noise sharpness control | |
CN105264596B (en) | The noise filling without side information for Code Excited Linear Prediction class encoder | |
JP2002530705A (en) | Low bit rate coding of unvoiced segments of speech. | |
TW463143B (en) | Low-bit rate speech encoding method | |
EP2888734A1 (en) | Audio classification based on perceptual quality for low or medium bit rates | |
KR100499047B1 (en) | Apparatus and method for transcoding between CELP type codecs with a different bandwidths | |
Lin et al. | Speech enhancement for low bit rate speech codec | |
KR100480341B1 (en) | Apparatus for coding wide-band low bit rate speech signal | |
US6801887B1 (en) | Speech coding exploiting the power ratio of different speech signal components | |
Srivastava et al. | Performance evaluation of Speex audio codec for wireless communication networks | |
KR100554164B1 (en) | Transcoder between two speech codecs having difference CELP type and method thereof | |
Gournay et al. | A 1200 bits/s HSX speech coder for very-low-bit-rate communications | |
Aguilar et al. | An embedded sinusoidal transform codec with measured phases and sampling rate scalability | |
Bakır | Compressing English Speech Data with Hybrid Methods without Data Loss | |
KR100309873B1 (en) | A method for encoding by unvoice detection in the CELP Vocoder | |
JPH08160996A (en) | Voice encoding device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180629 |