CN105957533B - Voice compression method, voice decompression method, audio encoder and audio decoder - Google Patents


Info

Publication number: CN105957533B (application CN201610260757.3A)
Authority: CN (China)
Prior art keywords: bit, frequency domain, bit allocation, quantization, bits
Legal status: Active (an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN105957533A
Inventors: 杨洋 (Yang Yang), 姚嘉 (Yao Jia), 任金平 (Ren Jinping), 高永泽 (Gao Yongze)
Current assignee: Hangzhou Nanosic Technology Co., Ltd.
Original assignee: Hangzhou Nanosic Technology Co., Ltd.
Application filed by Hangzhou Nanosic Technology Co., Ltd.
Priority to CN201610260757.3A
Publication of CN105957533A; application granted; publication of CN105957533B

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 - Speech or audio signals analysis-synthesis techniques for redundancy reduction using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032 - Quantisation or dequantisation of spectral components
    • G10L19/038 - Vector quantisation, e.g. TwinVQ audio


Abstract

The invention discloses a voice compression method, a voice decompression method, an audio encoder and an audio decoder. An MLT (modulated lapped transform) converts the time domain signal into a frequency domain signal; an RMS (root mean square) weight analysis method refines the quantization levels of the frequency domain signal; and vector quantization, Huffman coding and similar methods compress the quantization parameters (quantization weights and bit allocation numbers) and the frequency domain data respectively, so that the compression ratio is maximized while the spectral characteristics remain approximately lossless.

Description

Voice compression method, voice decompression method, audio encoder and audio decoder
Technical Field
The invention belongs to the field of wireless voice signal compression, and particularly relates to a voice compression method and a decompression method based on the MLT (modulated lapped transform) and vector entropy coding, together with an audio encoder and an audio decoder.
Background
Voice signal compression saves hardware memory space and eases storage and transmission. A wireless digital voice system differs from a common wired audio system in that it transmits voice signals over the air rather than over wired carriers, improving the practical user experience.
A wireless digital audio system based on embedded technology effectively combines embedded, audio codec and wireless transmission technologies, and offers small size, easy portability, highly specialized function, low cost, high stability and good real-time performance. It is, however, limited in bandwidth, delay and power consumption. A compression algorithm applied to wireless voice transmission is therefore required to offer high sound quality, a high compression ratio, low delay and low computational complexity at the same time.
Current frequency domain compression coding, such as the Bluetooth SBC voice algorithm, has relatively low sound quality, and time domain compression algorithms such as ADPCM and G.711 generally have low compression ratios. It is therefore very meaningful to design a codec with a high compression ratio, low delay and low computational complexity for wireless transmission, realizing higher-quality speech coding and decoding, and to apply it in wireless audio systems based on embedded technology.
Voice data compression exploits the redundancy of voice signals and the perceptual characteristics of the human auditory system. The redundancy of voice signals takes two main forms, time domain redundancy and frequency domain redundancy, and the currently known voice compression methods can accordingly be divided into two types by coding mode. The first type is time domain compression, in which the coder compresses by analyzing the correlation of the speech data in the time domain; the second type is frequency domain compression, in which the coder compresses the speech data by analyzing correlations across the frequency domain.
The first type of compression method mainly works by eliminating the time domain redundancy of the voice signal: the difference between the audio data and a predicted value is calculated, the quantization level of an adaptive quantizer is set accordingly, and the prediction for the next sample is updated. Since time domain prediction can hardly raise the subjective sound quality while maintaining a given compression ratio, it is characterized by low delay, low computation, medium sound quality and a low compression ratio. Mainstream time domain prediction methods include ADPCM, G.711 and the like, with compression ratios generally between 2:1 and 4:1.
The second type of compression method mainly works by eliminating the frequency domain redundancy of the voice signal, generally combining a transform domain with a psychoacoustic model. The transform converts the time domain voice data into frequency domain data, and the psychoacoustic model then quantizes the frequency domain signal hierarchically according to the auditory characteristics of the human ear: frequency regions where hearing is highly sensitive are quantized lightly, preserving high precision, while regions where hearing is less sensitive are quantized heavily, retaining less precision. Thanks to the psychoacoustic analysis, transform domain methods can compress the audio data stream maximally while preserving the subjective impression, so they are characterized by high delay, high complexity, high sound quality and a low bit rate. Mainstream transform domain methods include subband coding implemented with cosine modulated filter banks, such as SBC (medium sound quality, with a compression ratio of only about 5:1), and coding based on the modified discrete cosine transform (MDCT), such as CELT, SPEEX and the like (high sound quality, but requiring 50 ms to 100 ms of delay).
Because a voice code stream for wireless voice transmission requires high sound quality, a high compression ratio, low delay and low computational complexity, the mainstream time domain predictive coding of the first type cannot meet the requirements owing to its low compression ratio and sound quality, and the mainstream transform domain coding of the second type cannot meet the requirements of wireless transmission owing to its high delay and heavy computation.
Disclosure of Invention
In view of the problems in the prior art, an object of the present invention is to provide a speech compression method based on the MLT and vector entropy coding, which can simultaneously and effectively satisfy the wireless voice transmission requirements of high sound quality, low delay, a high compression ratio and low computational complexity. Another object of the present invention is to provide a corresponding speech decompression method based on the MLT and vector entropy coding.
In order to achieve the above object, the speech compression method based on MLT transform and vector entropy coding of the present invention specifically comprises:
1) MLT frequency domain transformation: converting a time domain digital voice signal collected by a digital microphone into a frequency domain spectral coefficient;
2) RMS quantization weight calculation: the frequency domain spectral coefficients are grouped, the root mean square (RMS) of each group of the signal is calculated, and the weight of each frequency domain component group is derived from its RMS;
3) optimal group bit allocation: the optimal group bits are obtained according to the frequency domain component weights of the grouped signal and the set bit rate parameter;
4) carrying out vector quantization on the grouped frequency domain voice signals to generate grouped vector quantization coefficients;
5) Huffman coding is performed on the grouped vector quantization coefficients to complete the data compression.
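Step 5) can be sketched with the classic heap-based Huffman construction over the grouped quantization indices; the index alphabet and frequencies below are illustrative, not the patent's actual code tables.

```python
import heapq
from collections import Counter

def build_huffman_code(symbols):
    """Build a prefix-free Huffman code for a sequence of quantization indices."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate single-symbol alphabet
        return {next(iter(freq)): "0"}
    # heap entries: (frequency, tiebreak, {symbol: partial codeword})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)      # two least-frequent subtrees
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (f0 + f1, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def huffman_encode(symbols, code):
    return "".join(code[s] for s in symbols)

def huffman_decode(bits, code):
    rev = {w: s for s, w in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in rev:
            out.append(rev[cur])
            cur = ""
    return out
```

Because the resulting code is prefix-free, the decoder can recover the index sequence from the bit stream without any separators.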
Further, step 1) adopts the modulated lapped transform: the PCM time domain audio data of each short time frame is converted into MLT frequency domain spectral coefficients through the MLT, and the MLT frequency domain spectral coefficients are grouped according to frequency domain correlation.
Further, the PCM time domain audio data first undergoes 50% data overlap processing, then anti-aliasing filtering to prevent spectral aliasing, and then a DCT-IV transform that converts the time domain data into frequency domain spectral coefficients.
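The 50%-overlap, window-and-DCT-IV structure just described is, in effect, an MDCT. Since the patent's own equation images are not reproduced here, the direct O(N²) form below is only a sketch of one common MLT convention (sine window, sqrt(2/N) scaling); the exact phase and scaling are assumptions, not the patent's formula.

```python
import math

def mlt(x):
    """Direct modulated lapped transform of one 2N-sample block
    (sine analysis window folded into the modulation, DCT-IV phase)."""
    N = len(x) // 2
    return [
        math.sqrt(2.0 / N) * sum(
            math.sin(math.pi * (n + 0.5) / (2 * N))
            * math.cos(math.pi / N * (n + 0.5 + N / 2.0) * (m + 0.5))
            * x[n]
            for n in range(2 * N))
        for m in range(N)]

def imlt(X):
    """Inverse MLT: N coefficients back to a windowed 2N-sample block."""
    N = len(X)
    return [
        math.sqrt(2.0 / N) * math.sin(math.pi * (n + 0.5) / (2 * N)) * sum(
            X[m] * math.cos(math.pi / N * (n + 0.5 + N / 2.0) * (m + 0.5))
            for m in range(N))
        for n in range(2 * N)]
```

With 50% overlap, adding the second half of one inverse-transformed block to the first half of the next reconstructs the middle frame exactly (time domain aliasing cancellation), which is the "perfect signal reconstruction" property the description relies on.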
Further, the formula of the MLT frequency domain transform is as follows:
[equation images not reproduced]
Further, in step 2), the quantization weights are calculated from the frequency domain spectral coefficients after time-frequency conversion via the root mean square (RMS); the RMS calculation formula is as follows:
[equation image not reproduced]
calculate the quantization weight value for each set of RMS values:
[equation image not reproduced]
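With the equation images missing, one plausible reading of step 2) is sketched below: the RMS of each group of coefficients is its root mean square, and the quantization weight index is a logarithmic quantization of that RMS. The log base and rounding here are assumptions, not the patent's formula.

```python
import math

def region_rms(coeffs):
    """Root mean square of one group of MLT frequency-domain coefficients."""
    return math.sqrt(sum(c * c for c in coeffs) / len(coeffs))

def rms_index(coeffs):
    """Quantization-weight index: log2-quantized RMS (base and rounding assumed)."""
    r = region_rms(coeffs)
    return int(round(math.log2(r))) if r > 0 else 0
```

Representing the weight on a logarithmic scale gives more usable quantization levels than an absolute-value weight, which matches the precision argument made for RMS weights later in the description.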
further, in the step 3), the optimal grouping bit calculation method includes: and calculating the maximum bit and the minimum bit according to the quantization weight, and optimizing the grouping bits according to the bit rate parameters to ensure that the optimized bits meet the requirements of each grouping spectral coefficient under the bit limit.
Further, the bit allocation coefficient of each group is calculated from the quantization weight values:
category(r)=MAX{0,MIN{7,(offset-rms_index(r)/2)}};
(0≤r≤number_of_regions;-32≤offset≤31);
calculating the bit number required by the prediction quantization according to the bit distribution parameter:
[equation image not reproduced]
then, the number of available bits is calculated from the set bit rate parameter:
estimated_number_of_available_bits=320+((number_of_available_bits-320)*5/8);
and the bit allocation parameters of each group are adjusted so that each group's available bits are maximized within the range of the available bit number, determining the optimal group bits.
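The offset/category mechanism above can be sketched as a monotonic search: lowering the offset lowers the categories and spends more bits, so the finest offset whose predicted bit cost fits the budget is the optimum. The bits-per-category table below is a made-up placeholder (the patent gives no table); only its decreasing shape matters.

```python
# Hypothetical cost table: bits to code one group at category 0..7.
# A decreasing, invented sequence; category 7 spends no bits on the group.
BITS_PER_CATEGORY = [104, 92, 80, 68, 56, 44, 32, 0]

def categories(rms_index, offset):
    # category(r) = MAX{0, MIN{7, offset - rms_index(r)/2}}
    return [max(0, min(7, offset - ri // 2)) for ri in rms_index]

def allocate_bits(rms_index, available_bits):
    """Scan offsets from fine to coarse; return the finest allocation that fits."""
    for offset in range(-32, 32):   # smaller offset => lower categories => more bits
        cats = categories(rms_index, offset)
        cost = sum(BITS_PER_CATEGORY[c] for c in cats)
        if cost <= available_bits:
            return offset, cats, cost
    raise ValueError("budget too small even at the coarsest setting")
```

Because the predicted cost is non-increasing in the offset, the first offset that fits in the scan is the optimal one, which is why a simple linear search suffices in this sketch.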
Further, the processing procedures of the step 4) and the step 5) are as follows:
A) the frequency domain spectral coefficients are divided into sign bits and magnitudes, and the normalization index of each group of magnitudes is calculated:
k(i)=MIN{(x*magnitude_of(mlt(20r+i))+deadzone_rounding),kmax}
(0<i<20; x=1/(stepsize*magnitude_of_rms(r)));
B) the normalized indexes are grouped into a vector group bit stream:
[equation images not reproduced]
C) Huffman coding is performed on each group of vectors and sign bit groups to form the compressed bit stream.
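Steps A) and B) amount to a sign/magnitude deadzone quantizer per 20-coefficient group. The sketch below follows the k(i) formula above; the stepsize, kmax and deadzone_rounding values are illustrative placeholders, not the patent's tables.

```python
def quantize_group(mlt_group, rms, stepsize=1.0, kmax=13, deadzone_rounding=0.4):
    """k(i) = MIN{ x*|mlt(20r+i)| + deadzone_rounding, kmax },  x = 1/(stepsize*rms).

    Returns the sign bits and the clamped magnitude indices for one group.
    """
    x = 1.0 / (stepsize * rms)
    signs = [1 if c < 0 else 0 for c in mlt_group]                # sign bits
    k = [min(int(x * abs(c) + deadzone_rounding), kmax) for c in mlt_group]
    return signs, k
```

Note how a larger group RMS shrinks x, so loud groups are quantized with a coarser effective step; this is how the RMS weight steers precision toward each group.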
A speech decompression method based on the MLT and vector entropy coding, matching the above speech compression method, uses inverse vector quantization and the inverse MLT to decompress the compressed data, and specifically comprises the following steps:
1) parsing the compressed bit stream and performing Huffman decoding on it to obtain the vector groups and sign bit groups;
2) performing an inverse normalization operation on the vector groups to obtain the magnitudes of the frequency domain spectral coefficients, which are combined with the corresponding sign bits to recover the frequency domain spectral coefficients;
3) performing the inverse modulated lapped transform (IMLT) on the frequency domain spectral coefficients to obtain the time domain voice data and complete decoding.
Further, in step 1), the encoded and compressed code stream data is parsed to obtain the time domain PCM stream information: sampling rate, bit rate and frame length.
Further, the inverse normalization operation formula in step 2) is as follows:
[equation images not reproduced]
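Since the inverse-normalization equation images are not reproduced, the sketch below simply inverts the forward normalization k(i) = |mlt|/(stepsize*rms) + rounding by scaling the index back and reapplying the sign bit. Reconstructing at k*stepsize*rms rather than at a table centroid is an assumption of this sketch.

```python
def dequantize_group(signs, k, rms, stepsize=1.0):
    """Inverse normalization for one group: |mlt| ~= k * stepsize * rms,
    with the sign restored from the decoded sign bit."""
    scale = stepsize * rms
    return [(-1 if s else 1) * ki * scale for s, ki in zip(signs, k)]
```

With a deadzone rounding below 1, the reconstruction error of each coefficient stays within one step (stepsize * rms), so groups with small RMS are recovered more precisely, as the hierarchical quantization intends.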
Further, the IMLT transformation formula in step 3) is as follows:
[equation images not reproduced]
An audio encoder implementing the above voice compression method comprises an MLT frequency domain transformer, an RMS quantization weight calculator, an optimal group bit allocator and a Huffman encoder. The MLT transformer converts the time domain signal into a frequency domain signal, the RMS quantization weight calculator refines the quantization levels of the frequency domain signal, and the optimal group bit allocator and the Huffman encoder compress the quantization parameters and the frequency domain data respectively, so that the voice data compression ratio is maximized while the spectral characteristics remain approximately lossless.
An audio decoder implementing the above speech decompression method comprises a code stream analyzer, a Huffman decoder, an inverse vector quantizer, and an inverse MLT transform filter, wherein:
the code stream analyzer reads and parses the encoded and compressed code stream data, obtaining time domain PCM stream information such as the sampling rate, bit rate and frame length;
the Huffman decoder decodes the RMS weights, the bit allocation parameters and the quantized MLT frequency domain spectral vectors;
the inverse vector quantizer dequantizes the quantized MLT frequency domain spectral vectors using the RMS weights and the bit allocation parameters, obtaining the MLT frequency domain spectral coefficients;
the inverse MLT transform filter performs inverse MLT transform filtering on the MLT frequency domain spectral coefficients, obtaining time domain PCM data;
and the PCM data is assembled under control of the PCM stream information parsed from the code stream, reconstructing and integrating the PCM voice stream.
The invention has the following beneficial effects: it achieves a high compression ratio, low delay and medium computational complexity while ensuring high sound quality of the voice data, making it well suited to wireless voice applications.
Drawings
FIG. 1 is a compression flow diagram;
FIG. 2 is a decompression flow diagram;
FIG. 3 is a schematic diagram of an MLT transform;
FIG. 4 is a flow chart of optimal bit allocation;
FIG. 5 is a diagram of raw PCM waveform data in the time domain;
FIG. 6 is a graph of raw PCM waveform data spectrum data;
FIG. 7 is a time domain data plot of PCM waveform data after MLT transform;
FIG. 8 is a diagram of the data spectrum of the PCM waveform data after MLT transformation.
Detailed Description
The present invention now will be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the exemplary embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The invention relates to a voice compression method based on MLT transformation and vector entropy coding, which specifically comprises the following steps:
(1) An MLT (modulated lapped transform) frequency domain transformer. The MLT is a frequency domain transform that processes short time frames of time domain data independently, adopts 50% frame overlap so that the spectrum at frame boundaries is not distorted, and has the properties of linearity, perfect signal reconstruction and the like. The MLT transform formula is as follows:
[equation image not reproduced]
(2) An RMS quantization weight calculator. The calculator computes the root mean square (RMS) of each group of frequency domain spectral coefficients to represent the quantization weight; compared with weights represented by absolute values, RMS values provide more quantization levels and higher quantization precision. The RMS calculation formula is as follows:
[equation image not reproduced]
The quantization weight value is calculated for each group of RMS values:
[equation image not reproduced]
(3) An optimal group bit allocator, which calculates the bit allocation coefficient of each group from the quantization weight values:
category(r)=MAX{0,MIN{7,(offset-rms_index(r)/2)}},
(0≤r≤number_of_regions;-32≤offset≤31);
calculating the bit number required by the prediction quantization according to the bit distribution parameter:
[equation image not reproduced]
then, the number of available bits is calculated from the set bit rate parameter:
estimated_number_of_available_bits=320+((number_of_available_bits-320)*5/8),
the bit allocation parameters of each group are adjusted so that each group's available bits reach the maximum within the range of the available bit number, determining the optimal group bits;
(4) vector quantization of the frequency domain spectral coefficients to generate grouped vector quantization coefficients:
the frequency domain spectral coefficients are divided into sign bits and magnitudes, and the normalization index of each group of magnitudes is calculated:
k(i)=MIN{(x*magnitude_of(mlt(20r+i))+deadzone_rounding),kmax},
(0<i<20; x=1/(stepsize*magnitude_of_rms(r))),
the normalized indexes are grouped into a vector group bit stream:
[equation image not reproduced]
(5) Huffman coding is performed on each group of vectors and sign bit groups to form the compressed bit stream.
A speech decompression method based on the MLT and vector entropy coding, matching the above speech compression method, uses inverse vector quantization and the inverse MLT to decompress the compressed data, and specifically comprises the following steps:
(1) decoding and parsing the compressed code stream with a Huffman decoder to obtain the quantized MLT frequency domain spectral coefficient data;
(2) performing inverse quantization on the quantized MLT frequency domain spectral coefficient data with an inverse vector quantizer: an inverse normalization operation on the vector groups yields the magnitudes of the frequency domain spectral coefficients, which are combined with the corresponding sign bits to recover the frequency domain spectral coefficients;
[equation images not reproduced]
(3) performing the IMLT (inverse modulated lapped transform) on the frequency domain spectral coefficients to obtain the time domain voice data and complete decoding; the IMLT transformation formula is as follows:
[equation images not reproduced]
An audio encoder implementing the above voice compression method comprises an MLT frequency domain transformer, an RMS quantization weight calculator, an optimal group bit allocator and a Huffman encoder. The MLT transformer converts the time domain signal into a frequency domain signal, the RMS quantization weight calculator refines the quantization levels of the frequency domain signal, and the optimal group bit allocator and the Huffman encoder compress the quantization parameters and the frequency domain data respectively, so that the voice data compression ratio is maximized while the spectral characteristics remain approximately lossless.
An audio decoder implementing the above speech decompression method comprises a code stream analyzer, a Huffman decoder, an inverse vector quantizer, and an inverse MLT transform filter, wherein:
the code stream analyzer reads and parses the encoded and compressed code stream data, obtaining time domain PCM stream information such as the sampling rate, bit rate and frame length;
the Huffman decoder decodes the RMS weights, the bit allocation parameters and the quantized MLT frequency domain spectral vectors;
the inverse vector quantizer dequantizes the quantized MLT frequency domain spectral vectors using the RMS weights and the bit allocation parameters, obtaining the MLT frequency domain spectral coefficients;
the inverse MLT transform filter performs inverse MLT transform filtering on the MLT frequency domain spectral coefficients, obtaining time domain PCM data;
and the PCM data is assembled under control of the PCM stream information parsed from the code stream, reconstructing and integrating the PCM voice stream.
In the invention, the embodiment of the compression part is as shown in figure 1:
(1) Voice data is sampled with a digital microphone to acquire raw PCM digital voice data, which is divided into short time frames of 5 ms (80 samples), 10 ms (160 samples) or 20 ms (320 samples), and information such as the bit rate and sampling rate of the PCM configuration is written into the code stream.
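The frame sizes in step (1) imply a 16 kHz sampling rate (80 samples in 5 ms); the arithmetic can be checked as follows.

```python
def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of PCM samples in one short time frame."""
    return sample_rate_hz * frame_ms // 1000

# 16 kHz yields the 80/160/320-sample frames named in the embodiment.
FRAME_SIZES = {ms: samples_per_frame(16000, ms) for ms in (5, 10, 20)}
```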
(2) The time domain PCM data of each short time frame is converted into MLT frequency domain spectral coefficients by the MLT, and the coefficients are grouped by frequency domain correlation into 20 groups of MLT frequency domain spectral vectors.
(3) The RMS weight calculator computes the RMS of the grouped MLT frequency domain spectral vectors to obtain the quantization weight of each group of frequency domain spectral vectors; the quantization weights are written directly into the code stream.
(4) The optimal bit allocator uses the quantization weights (RMS) of the grouped frequency domain spectral coefficients to perform bit allocation for each grouped MLT frequency domain spectral vector, yielding the optimal bit allocation numbers, which are also written directly into the code stream.
(5) The vector quantizer set quantizes the spectral coefficients using the quantization weights and the optimal bit allocation, performing vector quantization on the grouped MLT frequency domain spectral vectors.
(6) The Huffman encoder encodes the quantization weights, the bit allocation parameters and the quantized grouped MLT frequency domain spectral vectors, producing the final compressed code stream.
In the present invention, the specific implementation of the decoding part is as shown in fig. 2:
(1) The code stream analyzer parses the encoded and compressed code stream data to obtain time domain PCM stream information such as the sampling rate, bit rate and frame length;
(2) the Huffman decoder decodes the RMS weights, the bit allocation parameters and the quantized MLT frequency domain spectral vectors;
(3) the inverse vector quantizer dequantizes the quantized MLT frequency domain spectral vectors using the RMS weights and the bit allocation parameters, obtaining the MLT frequency domain spectral coefficients;
(4) the inverse MLT transform filter performs inverse MLT transform filtering on the MLT frequency domain spectral coefficients to obtain time domain PCM data;
(5) the PCM data is assembled under control of the PCM stream information parsed from the code stream, reconstructing and integrating the PCM voice stream.
As shown in fig. 3, a schematic diagram of the MLT, the PCM time domain audio data first undergoes 50% data overlap processing, then anti-aliasing filtering to prevent spectral aliasing, and finally a DCT-IV transform that converts the time domain data into frequency domain spectral coefficients. As shown in figs. 5, 6, 7 and 8, comparing the PCM data before and after the MLT shows that the transform preserves both the time domain and the frequency domain information losslessly.
As shown in fig. 4, the optimal bit allocation process allocates bits per group of frequency domain spectral coefficients:
(1) first, the RMS quantization weight information of the group of frequency domain spectral coefficients is analyzed, the bit allocation parameters are set, and the bit allocation is calculated;
(2) then, the number of bits consumed by the predicted allocation is calculated from the bit allocation result, and it is checked whether the current predicted bit allocation number meets the limits imposed by the preset signal-to-noise ratio and the number of remaining bits. If not, the bit allocation parameters are reset and allocation is repeated; if so, bit allocation proceeds to the next group of frequency domain spectral coefficients, and the number of remaining bits is updated for the next group's allocation operation.
In this embodiment the psychoacoustic model, the bit allocation and the quantization mode are optimized to reduce the computational complexity of the psychoacoustic model: verified frequency domain hearing thresholds and masking thresholds are applied directly to analyze the subband data. Because the bit allocation unit adopts a symmetric quantization scheme, the bit allocation result is not transmitted to the decoding end in the code stream; instead, the decoding end computes the bit allocation numbers from the quantization factors through the same bit allocation mechanism, freeing a large part of the code stream for quantized audio data. By setting a code stream length adjustment parameter, the bit allocation numbers can further be adjusted at any time according to the wireless transmission environment.
As described above, for the characteristics of wireless voice transmission applications, the invention uses the perfectly reconstructing MLT for time-to-frequency conversion, ensuring high voice quality; the MLT transform length can be modified directly according to the system's delay requirement, ensuring low delay; optimal bit allocation keeps the compression ratio as high as possible without affecting voice quality; and finally Huffman coding further compresses the quantized data.

Claims (9)

1. A method of speech compression, the method comprising:
1) MLT frequency domain transformation: converting a time domain digital voice signal collected by a digital microphone into a frequency domain spectral coefficient;
2) RMS quantization weight calculation: the frequency domain spectral coefficients are grouped, the root mean square (RMS) of each group of the signal is calculated, and the weight of each frequency domain component group is derived from its RMS;
3) optimal group bit allocation: the optimal group bits are obtained according to the frequency domain component weights of the grouped signal and the set bit rate parameter;
4) carrying out vector quantization on the grouped frequency domain voice signals to generate grouped vector quantization coefficients;
5) performing Huffman coding on the grouped vector quantization coefficients to complete data compression;
wherein the bit allocation unit adopts a symmetric quantization scheme: from the result of bit allocation, the decoding end calculates the bit allocation numbers from the quantization factors through the same bit allocation mechanism, and a code stream length adjustment parameter is set so that the bit allocation numbers can be adjusted at any time according to the wireless transmission environment;
the method comprises the following steps of carrying out bit allocation according to a unit of a grouped frequency domain spectral coefficient:
(1) firstly, analyzing RMS quantization weight information of the group of frequency domain spectral coefficients, setting bit distribution parameters, and carrying out bit distribution calculation;
(2) then, the number of bits consumed by the predicted allocation is calculated from the bit allocation result, and it is checked whether the current predicted bit allocation number meets the limits imposed by the preset signal-to-noise ratio and the number of remaining bits; if not, the bit allocation parameters are reset and allocation is performed again; if so, bit allocation calculation proceeds to the next group of frequency domain spectral coefficients; at the same time the number of remaining bits is updated for the next group's allocation operation;
in the step 3), the optimal grouping bit calculation method comprises the following steps: calculating a maximum bit and a minimum bit according to the quantization weight, and optimizing grouping bits according to the bit rate parameters to ensure that the optimized bits meet the requirements of each grouping spectral coefficient under the bit limit; and calculating the distribution coefficient of each group of bits according to the quantization weight value:
category(r)=MAX{0,MIN{7,(offset-rms_index(r)/2)}};
0≤r≤number_of_regions;-32≤offset≤31;
calculating the bit number required by the prediction quantization according to the bit distribution parameter:
[equation image not reproduced]
then, the number of available bits is calculated from the set bit rate parameter:
estimated_number_of_available_bits=320+((number_of_available_bits-320)*5/8);
the bit allocation parameters of each group are then adjusted to maximize each group's available bits within the available-bit range, and the optimal grouping bits are determined.
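The available-bits rule grants a fixed 320-bit base plus five-eighths of the surplus beyond it (assuming the separator printed in the formula is a minus sign); numerically:

```python
def estimated_available_bits(number_of_available_bits):
    # 320-bit base plus 5/8 of the bits beyond 320
    return 320 + (number_of_available_bits - 320) * 5 // 8
```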
2. The speech compression method as recited in claim 1, wherein in step 1) the PCM time domain audio data of the short time frame is converted into MLT frequency domain spectral coefficients by the modulated lapped transform (MLT), and the MLT frequency domain spectral coefficients are grouped by frequency domain correlation; the PCM time domain audio data first undergoes 50% overlap between consecutive frames, then anti-aliasing filtering to prevent spectral aliasing, and finally a DCT-IV transform that converts the time domain data into frequency domain spectral coefficients.
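The frame chain of claim 2 (50% overlap, anti-aliasing window, DCT-IV-style modulated transform) can be sketched numerically; the sine window and the √(2/N) scaling are generic MLT/MDCT conventions assumed here, not the patent's exact coefficients:

```python
import numpy as np

def mlt_frame(prev_half, cur_half):
    """One MLT frame: overlap the previous frame's last N samples with the
    current N samples, window, and apply the modulated cosine transform
    that maps 2N time samples to N spectral coefficients."""
    N = len(cur_half)
    x = np.concatenate([prev_half, cur_half]).astype(float)  # 2N samples
    n = np.arange(2 * N)
    window = np.sin(np.pi * (n + 0.5) / (2 * N))             # sine window
    xw = x * window
    m = np.arange(N)
    # MDCT kernel: cos(pi/N * (n + 0.5 + N/2) * (m + 0.5))
    basis = np.cos(np.pi / N * np.outer(n + 0.5 + N / 2, m + 0.5))
    return np.sqrt(2.0 / N) * (xw @ basis)                   # N coefficients
```

In the codec, N would be one of the frame sizes listed in claim 3 (80, 160 or 320 samples).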
3. The speech compression method of claim 1, wherein the MLT frequency-domain transform is formulated as follows:
Figure FDA0002659038940000022
0 ≤ m < N, 0 ≤ n < 2N, N ∈ {80, 160, 320};
in the step 2), the quantization weights of the frequency domain spectral coefficients after time-frequency conversion are calculated by the root mean square (RMS); the RMS formula is as follows:
Figure FDA0002659038940000023
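The RMS weight of one group is simply the root mean square of its spectral coefficients; as a sanity check:

```python
import math

def region_rms(coeffs):
    # root-mean-square quantization weight of one group of spectral coefficients
    return math.sqrt(sum(c * c for c in coeffs) / len(coeffs))
```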
4. The speech compression method according to claim 1, wherein the processing of the step 4) and the step 5) is as follows:
A) the frequency domain spectral coefficients are divided into sign bits and magnitudes, and the normalization index of each group of magnitudes is calculated:
k(i) = MIN{(x * magnitude_of(mlt(20r+i)) + deadzone_rounding), kmax}
0 < i < 20; x = 1/(stepsize * magnitude_of_rms(r));
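Step A) amounts to a scalar dead-zone quantizer; a sketch with placeholder constants (in the codec, stepsize, deadzone_rounding and kmax depend on the group's category):

```python
def quant_index(mlt_coeff, rms, stepsize=1.0, deadzone_rounding=0.4, kmax=13):
    """Normalization index: scale the coefficient magnitude by
    x = 1/(stepsize * rms), add dead-zone rounding, truncate, clamp to kmax."""
    x = 1.0 / (stepsize * rms)
    return min(int(x * abs(mlt_coeff) + deadzone_rounding), kmax)
```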
B) the normalization indexes are packed into a vector-group bit stream:
Figure DEST_PATH_FDA0002643982680000031
j = index of the jth value of k(); vd = vector dimension;
C) Huffman coding is performed on each vector group and sign bit group to form the compressed bit stream.
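For step C), a minimal Huffman-code builder illustrates the principle; the codec itself would use fixed, pre-computed code tables per category rather than building trees per frame:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix-free code: repeatedly merge the two least frequent
    subtrees, prefixing their codewords with 0 and 1."""
    freq = Counter(symbols)
    if len(freq) == 1:                          # degenerate one-symbol case
        return {next(iter(freq)): "0"}
    # heap entries: (count, tiebreaker, {symbol: codeword-so-far})
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]
```

The most frequent symbol always receives the shortest codeword, which is what compresses the heavily-skewed index distributions produced by the dead-zone quantizer.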
5. A speech decompression method, characterized in that inverse vector quantization and an inverse MLT transform are adopted to decompress the compressed data, specifically comprising the following steps:
1) the compressed bit stream is parsed and Huffman-decoded to obtain the vector groups and sign bit groups;
2) an inverse normalization operation is performed on the vector groups to obtain the magnitudes of the frequency domain spectral coefficients, which together with the corresponding sign bits yield the frequency domain spectral coefficients;
3) an inverse modulated lapped transform (IMLT) is performed on the frequency domain spectral coefficients to obtain the time domain voice data and complete the decoding;
the bit allocation unit adopts a symmetric quantization scheme: the decoding end calculates the number of allocated bits through the same bit allocation mechanism, from the quantization factors that carry the bit allocation result, and a code stream length adjustment parameter is set so that the number of allocated bits can be adjusted at any time according to the wireless transmission environment;
bit allocation is performed with a group of frequency domain spectral coefficients as the unit, in the following steps:
(1) first, the RMS quantization weight information of the group of frequency domain spectral coefficients is analyzed, the bit allocation parameters are set, and the bit allocation is calculated;
(2) then, the number of bits consumed by the predicted allocation is calculated from the bit allocation result, and it is checked whether the current predicted allocation satisfies the limits imposed by the preset signal-to-noise ratio and the number of remaining bits; if not, the bit allocation parameters are reset and the allocation is performed again; if so, the bit allocation calculation proceeds to the next group of frequency domain spectral coefficients, and the number of remaining bits is updated for the next group's allocation;
the optimal grouping bit calculation is as follows: the maximum and minimum bits are calculated from the quantization weights, and the grouping bits are optimized according to the bit rate parameters so that the optimized bits satisfy each group of spectral coefficients under the bit limit; the allocation coefficient of each group of bits is calculated from the quantization weights:
category(r) = MAX{0, MIN{7, offset - rms_index(r)/2}};
0 ≤ r ≤ number_of_regions; -32 ≤ offset ≤ 31;
the number of bits required by the predicted quantization is calculated from the bit allocation parameters:
Figure FDA0002659038940000041
then, the number of available bits is calculated from the set bit rate parameter:
estimated_number_of_available_bits = 320 + ((number_of_available_bits - 320) * 5/8);
the bit allocation parameters of each group are then adjusted to maximize each group's available bits within the available-bit range, and the optimal grouping bits are determined.
6. The speech decompression method according to claim 5, wherein in the step 1) the coded and compressed code stream data is parsed to obtain the time domain PCM stream information of sampling rate, bit rate and time division frame length; the inverse normalization formula in the step 2) is as follows:
Figure FDA0002659038940000042
Figure FDA0002659038940000043
where ⌊z⌋ denotes the greatest integer less than or equal to z,
i = (n+1)vd - j - 1; 0 ≤ j ≤ vd-1; 0 ≤ n ≤ vpr-1.
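Assuming the customary base-(kmax+1) packing of vd quantization indices into one vector index n (consistent with the floor expression above, though the exact radix is an assumption here), the unpacking side of the inverse normalization is:

```python
def decode_vector_index(n, vd, kmax):
    """Unpack a vector index n into its vd per-coefficient quantization
    indices, most significant digit first (base kmax+1)."""
    base = kmax + 1
    return [(n // base ** (vd - j - 1)) % base for j in range(vd)]
```

For example, with vd = 2 and kmax = 13, the indices (5, 7) pack to n = 5·14 + 7 = 77, and decoding 77 recovers them.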
7. The speech decompression method according to claim 5, wherein the IMLT transform formula in the step 3) is as follows:
Figure FDA0002659038940000051
Figure FDA0002659038940000052
Figure FDA0002659038940000053
wherein
Figure FDA0002659038940000054
8. An audio encoder, characterized by comprising an MLT frequency domain transformer, an RMS quantization weight calculator, an optimal grouping bit allocator and a Huffman encoder, wherein the time domain signal is converted into a frequency domain signal by the MLT transformer, the RMS quantization weight calculator refines the quantization levels of the frequency domain signal, and the optimal grouping bit allocator and the Huffman encoder compress the quantization parameters and the frequency domain data respectively, maximizing the compression ratio of the voice data while keeping the spectral characteristics approximately lossless; the bit allocation unit adopts a symmetric quantization scheme: the decoding end calculates the number of allocated bits through the same bit allocation mechanism, from the quantization factors that carry the bit allocation result, and a code stream length adjustment parameter is set so that the number of allocated bits can be adjusted at any time according to the wireless transmission environment;
bit allocation is performed with a group of frequency domain spectral coefficients as the unit, in the following steps:
(1) first, the RMS quantization weight information of the group of frequency domain spectral coefficients is analyzed, the bit allocation parameters are set, and the bit allocation is calculated;
(2) then, the number of bits consumed by the predicted allocation is calculated from the bit allocation result, and it is checked whether the current predicted allocation satisfies the limits imposed by the preset signal-to-noise ratio and the number of remaining bits; if not, the bit allocation parameters are reset and the allocation is performed again; if so, the bit allocation calculation proceeds to the next group of frequency domain spectral coefficients, and the number of remaining bits is updated for the next group's allocation;
the optimal grouping bit calculation is as follows: the maximum and minimum bits are calculated from the quantization weights, and the grouping bits are optimized according to the bit rate parameters so that the optimized bits satisfy each group of spectral coefficients under the bit limit; the allocation coefficient of each group of bits is calculated from the quantization weights:
category(r) = MAX{0, MIN{7, offset - rms_index(r)/2}};
0 ≤ r ≤ number_of_regions; -32 ≤ offset ≤ 31;
the number of bits required by the predicted quantization is calculated from the bit allocation parameters:
Figure FDA0002659038940000061
then, the number of available bits is calculated from the set bit rate parameter:
estimated_number_of_available_bits = 320 + ((number_of_available_bits - 320) * 5/8);
the bit allocation parameters of each group are then adjusted to maximize each group's available bits within the available-bit range, and the optimal grouping bits are determined.
9. An audio decoder, comprising a code stream analyzer, a Huffman decoder, an inverse vector quantizer and an inverse MLT transform filter, wherein:
the code stream analyzer reads and parses the coded and compressed code stream data, obtaining time domain PCM stream information such as the sampling rate, bit rate and time division frame length;
the Huffman decoder decodes and obtains the RMS weights, the bit allocation parameters and the quantized MLT frequency domain spectral vectors;
the inverse vector quantizer performs an inverse quantization operation on the quantized MLT frequency domain spectral vectors using the RMS weights and the bit allocation parameters to obtain the MLT frequency domain spectral coefficients;
the inverse MLT transform filter performs inverse MLT transform filtering on the MLT frequency domain spectral coefficients to obtain the time domain PCM data;
the PCM data is controlled by the PCM stream information parsed from the code stream, and the PCM voice code stream is reconstructed and integrated;
the bit allocation unit adopts a symmetric quantization scheme: the decoding end calculates the number of allocated bits through the same bit allocation mechanism, from the quantization factors that carry the bit allocation result, and a code stream length adjustment parameter is set so that the number of allocated bits can be adjusted at any time according to the wireless transmission environment;
bit allocation is performed with a group of frequency domain spectral coefficients as the unit, in the following steps:
(1) first, the RMS quantization weight information of the group of frequency domain spectral coefficients is analyzed, the bit allocation parameters are set, and the bit allocation is calculated;
(2) then, the number of bits consumed by the predicted allocation is calculated from the bit allocation result, and it is checked whether the current predicted allocation satisfies the limits imposed by the preset signal-to-noise ratio and the number of remaining bits; if not, the bit allocation parameters are reset and the allocation is performed again; if so, the bit allocation calculation proceeds to the next group of frequency domain spectral coefficients, and the number of remaining bits is updated for the next group's allocation;
the optimal grouping bit calculation is as follows: the maximum and minimum bits are calculated from the quantization weights, and the grouping bits are optimized according to the bit rate parameters so that the optimized bits satisfy each group of spectral coefficients under the bit limit; the allocation coefficient of each group of bits is calculated from the quantization weights:
category(r) = MAX{0, MIN{7, offset - rms_index(r)/2}};
0 ≤ r ≤ number_of_regions; -32 ≤ offset ≤ 31;
the number of bits required by the predicted quantization is calculated from the bit allocation parameters:
Figure FDA0002659038940000071
then, the number of available bits is calculated from the set bit rate parameter:
estimated_number_of_available_bits = 320 + ((number_of_available_bits - 320) * 5/8);
the bit allocation parameters of each group are then adjusted to maximize each group's available bits within the available-bit range, and the optimal grouping bits are determined.
CN201610260757.3A 2016-04-22 2016-04-22 Voice compression method, voice decompression method, audio encoder and audio decoder Active CN105957533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610260757.3A CN105957533B (en) 2016-04-22 2016-04-22 Voice compression method, voice decompression method, audio encoder and audio decoder

Publications (2)

Publication Number Publication Date
CN105957533A CN105957533A (en) 2016-09-21
CN105957533B true CN105957533B (en) 2020-11-10

Family

ID=56915027

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583056A (en) * 2018-11-16 2019-04-05 中国科学院信息工程研究所 A kind of network-combination yarn tool performance appraisal procedure and system based on emulation platform
CN111402907B (en) * 2020-03-13 2023-04-18 大连理工大学 G.722.1-based multi-description speech coding method
CN113612672A (en) * 2021-08-04 2021-11-05 杭州微纳科技股份有限公司 Asynchronous single-wire audio transmission circuit and audio transmission method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0684705A2 (en) * 1994-05-06 1995-11-29 Nippon Telegraph And Telephone Corporation Multichannel signal coding using weighted vector quantization
CN101165778A (en) * 2006-10-18 2008-04-23 宝利通公司 Dual-transform coding of audio signals
CN101206860A (en) * 2006-12-20 2008-06-25 华为技术有限公司 Method and apparatus for encoding and decoding layered audio
CN101572087A (en) * 2008-04-30 2009-11-04 北京工业大学 Method and device for encoding and decoding embedded voice or voice-frequency signal
CN101572586A (en) * 2008-04-30 2009-11-04 北京工业大学 Method, device and system for encoding and decoding
CN102081926A (en) * 2009-11-27 2011-06-01 中兴通讯股份有限公司 Method and system for encoding and decoding lattice vector quantization audio
CN102150202A (en) * 2008-07-14 2011-08-10 三星电子株式会社 Method and apparatus to encode and decode an audio/speech signal
CN102801427A (en) * 2012-08-08 2012-11-28 深圳广晟信源技术有限公司 Encoding and decoding method and system for variable-rate lattice vector quantization of source signal

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2836122C (en) * 2011-05-13 2020-06-23 Samsung Electronics Co., Ltd. Bit allocating, audio encoding and decoding
CN102436819B (en) * 2011-10-25 2013-02-13 杭州微纳科技有限公司 Wireless audio compression and decompression methods, audio coder and audio decoder

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant