CN112614495A - Software radio multi-system voice coder-decoder - Google Patents

Software radio multi-system voice coder-decoder

Info

Publication number
CN112614495A
CN112614495A (application CN202011452195.5A)
Authority
CN
China
Prior art keywords
coding
decoding
algorithm
encoding
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011452195.5A
Other languages
Chinese (zh)
Inventor
周小青
李建
刘新
曹清亮
赵静怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huaxin Shengyuan Technology Co ltd
Original Assignee
Beijing Huaxin Shengyuan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huaxin Shengyuan Technology Co ltd filed Critical Beijing Huaxin Shengyuan Technology Co ltd
Priority to CN202011452195.5A priority Critical patent/CN112614495A/en
Publication of CN112614495A publication Critical patent/CN112614495A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26Pre-filtering or post-filtering
    • G10L19/265Pre-filtering, e.g. high frequency emphasis prior to encoding


Abstract

The invention provides a software radio multi-standard voice codec. The hardware is an embedded system platform comprising an encoding module and a decoding module; encoding and decoding of multiple voice standards such as CVSD, G.729 and MELP are realized by an application made up of a main process, an audio process and a codec-algorithm process. The main process provides a communication port for the user, coordinates the work of the audio process and the codec-algorithm process, and accepts configuration parameters. The audio process provides volume adjustment, switching between MIC and line-level input channels, and handling of the recording and playback interfaces. The codec-algorithm process hosts multiple algorithms, performs the encoding and decoding, applies the configured parameters, and obtains the audio process's interface. By integrating several voice coding algorithms in a single chip, the invention meets the demand of new-generation software-radio communication stations for flexible switching among voice waveform standards and enables flexible switching of station voice.

Description

Software radio multi-system voice coder-decoder
Technical Field
The invention relates to software radio devices, in particular to a software radio multi-mode voice coder-decoder.
Background
Language is an important means of human interaction, and speech is the most common form of data in a communication system; voice communication is one of the most basic and important ways in which people communicate. As society has entered the information age, demands on the utilization of all kinds of resources have grown, driving the development of voice coding and decoding technology.
At present, military and civil communication systems around the world use different voice coding and decoding schemes owing to differences in communication environment, distance, channel bandwidth and user requirements. In China, each military service operates its own independent communication system with its own voice codec standard, and the dedicated codec chips in the corresponding equipment differ as well, so different combat systems cannot interconnect, which degrades combat effectiveness. New-generation military radio stations adopt the flexible architecture of a software radio system and can exchange information (text, images and video) among stations. To achieve interconnection under such an architecture, a station must be able to switch flexibly among multiple coding and decoding standards, which urgently requires a multi-standard voice codec.
Disclosure of Invention
The invention provides a software radio multi-system voice codec that solves the traditional customized station's problems of a single communication mode, a fixed code rate and algorithm, and poor system flexibility. It adopts the following technical scheme:
A software radio multi-standard voice codec whose hardware is an embedded system platform, comprising an encoding module and a decoding module. Encoding and decoding of multiple voice standards such as CVSD, G.729 and MELP are realized by an application made up of a main process, an audio process and a codec-algorithm process. The main process provides a communication port for the user, coordinates the work of the audio process and the codec-algorithm process, and accepts configuration parameters; the audio process provides volume adjustment, switching between MIC and line-level input channels, and handling of the recording and playback interfaces; the codec-algorithm process hosts multiple algorithms, performs encoding and decoding, applies the configured parameters, and obtains the audio process's interface.
The encoding and decoding steps of the invention mirror each other; the encoding flow comprises the following steps:
S1: the main process applies the input configuration parameters to form the working parameters of the codec-algorithm process;
S2: the adjustable-gain amplifier in the audio process receives the captured audio data, amplifies it, and passes it to the audio process's ADC module;
S3: the ADC module of the audio process performs analog-to-digital conversion and passes the digitized audio to the encoding module of the codec-algorithm process through a ring buffer;
S4: the encoding module of the codec-algorithm process selects the algorithm corresponding to the chosen standard and performs encoding;
S5: finally, the encoding module of the codec-algorithm process outputs the resulting code stream to other equipment through the network port.
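The ring buffer in step S3 above can be sketched as a minimal single-producer/single-consumer queue carrying ADC samples to the encoder. This is an illustrative sketch, not the patent's implementation; the capacity, method names and full-buffer policy are assumptions.

```python
class RingBuffer:
    """Minimal ring buffer sketching the 'ring buffer' between the
    ADC module and the encoding module (API is illustrative)."""

    def __init__(self, capacity):
        self.buf = [0] * capacity
        self.capacity = capacity
        self.head = 0   # next write position
        self.tail = 0   # next read position
        self.count = 0  # samples currently stored

    def write(self, samples):
        """Store as many samples as fit; return how many were written."""
        written = 0
        for s in samples:
            if self.count == self.capacity:
                break  # full: a real driver would block or overwrite
            self.buf[self.head] = s
            self.head = (self.head + 1) % self.capacity
            self.count += 1
            written += 1
        return written

    def read(self, n):
        """Remove and return up to n samples in FIFO order."""
        out = []
        while n > 0 and self.count > 0:
            out.append(self.buf[self.tail])
            self.tail = (self.tail + 1) % self.capacity
            self.count -= 1
            n -= 1
        return out
```

In use, the audio process would `write` each converted block and the encoding module would `read` fixed-size frames as they become available.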
The codec-algorithm process is provided with a CVSD encoding algorithm. During encoding, CVSD tracks signal changes by continuously varying the step size δ, reducing granular noise and slope-overload distortion; the step size δ is derived from the past 3 or 4 output samples:
1) when f(n) > g(n), the comparator outputs e(n) > 0, the digital code is y(n) = 1, and the integrator output is g(n) = g(n-1) + δ;
2) when f(n) < g(n), the comparator outputs e(n) < 0, the digital code is y(n) = 0, and the integrator output is g(n) = g(n-1) - δ.
The codec-algorithm process is provided with a CVSD decoding algorithm. Decoding judges the received digital code y(n): the integrator output rises on a "1" code and falls on a "0" code, rising continuously on a run of "1" codes and falling continuously on a run of "0" codes, thereby recovering the input signal:
1) when y(n) = 1, the integrator outputs g(n) = g(n-1) + δ;
2) when y(n) = 0, the integrator outputs g(n) = g(n-1) - δ.
The codec-algorithm process is provided with a G.729 encoding algorithm. During encoding, the input signal is first high-pass filtered as preprocessing. LP analysis is performed once per 10 ms frame: the LP filter coefficients are computed and converted into line spectral pairs (LSPs). The excitation signal is searched by the analysis-by-synthesis (A-b-S) method, with the perceptually weighted error between the original and synthesized speech minimized as the criterion; the perceptual weighting filter is built from the unquantized LP coefficients. The excitation parameters are determined once per subframe; the quantized and unquantized LP filter coefficients are used for subframe 2, while interpolated LP coefficients are used in subframe 1, and an open-loop pitch delay is estimated from the perceptually weighted speech once per 10 ms frame. For each subframe the following is repeated: the target signal is obtained by filtering the LP residual through the weighted synthesis filter; the impulse response of the weighted synthesis filter is computed; closed-loop pitch analysis searches values near the open-loop pitch delay using the target signal and impulse response; the adaptive-codebook contribution is subtracted from the target signal, and the new target is used in the fixed-codebook search to find the optimal excitation; finally, the filter memories are updated with the determined excitation signal.
The codec-algorithm process is provided with a G.729 decoding algorithm. During decoding, the parameter indices are first extracted from the received code stream and decoded to obtain the coding parameters of a 10 ms speech frame: the LSP parameters, two fractional pitch delays, two fixed codevectors, and two sets of adaptive- and fixed-codebook gains. The LSP parameters are interpolated per subframe and converted into LPC filter coefficients, and each 5 ms subframe is then processed as follows: first, the adaptive and fixed codewords are multiplied by their respective gains and summed to form the excitation; second, the excitation drives the LPC synthesis filter to reconstruct the speech; third, the reconstructed speech undergoes post-processing, including long-term post-filtering, short-term synthesis filtering, and high-pass filtering.
The codec-algorithm process is provided with a 2.4 kbps encoding algorithm. The digitized voice signal first passes through a fourth-order Chebyshev high-pass filter to remove DC and power-line interference; multi-band mixed-excitation unvoiced/voiced decisions are then made so the pitch signal can be extracted accurately. Linear prediction comprises analysis of the input speech and of the residual signal. When a voiced segment is poorly periodic, an excitation source adapted by the aperiodic flag drives the unstable vocal-cord pulses at the decoding end. The relevant parameters are quantized with a four-stage fast-codebook-search vector quantization algorithm under the minimum perceptually weighted distortion criterion, and the error-correction-coded bit stream is packed and transmitted.
The codec-algorithm process is provided with a 1.2 kbps encoding algorithm; compared with the 2.4 kbps algorithm, its linear prediction removes only the intra-frame correlation, and the code rate is reduced.
The codec-algorithm process is provided with a 0.6 kbps encoding algorithm, whose encoding divides into parameter extraction and parameter quantization; the encoder's parameter extraction comprises four parts: pitch extraction, band-pass unvoiced/voiced analysis, line-spectral-frequency parameter extraction, and gain estimation. During decoding, the received bit stream is first unpacked and arranged in parameter order, separating the coded bits of each parameter; the bits of each parameter are then sent to the parameter-decoding module and inverse-quantized, yielding the four parameters of the whole superframe: line spectral frequencies, band-pass unvoiced/voiced decisions, pitch period, and gain. Finally, an excitation signal is formed from the pitch period, residual harmonic amplitudes and band-pass voicing decisions; the generated excitation is spectrally enhanced using the line spectral frequencies; and speech synthesis with the line spectral frequencies and gain yields two frames of synthesized speech for output.
The software radio multi-system voice codec meets the demand of new-generation software-radio communication stations for flexible switching among voice waveform standards and replaces the traditional customized station's reliance on a single chip for a single mode of conversation. Using micro-system integration, the codec integrates several voice coding algorithms, including the CVSD, G.729 and MELP codec modes, into one chip, enabling flexible switching of station voice. It solves the problem of combat communication systems that cannot interconnect, improves combat effectiveness, and can play an important role in sea, land and air communication systems.
Drawings
FIG. 1 is a schematic diagram of a multi-process design of the software radio multi-mode speech codec;
FIG. 2 is a schematic diagram of the relationship of user space and kernel space of the present invention;
FIG. 3 is a schematic of the encoding flow process of the present invention;
FIG. 4 is a diagram of a delta modulation waveform and a corresponding digital code pattern in a CVSD speech codec algorithm;
FIG. 5 is a schematic diagram of the encoding operation of a CVSD;
FIG. 6 is a diagram illustrating the decoding operation of a CVSD;
FIG. 7 is a schematic diagram of the G.729 speech codec algorithm for encoding;
FIG. 8 is a schematic diagram of the decoding performed by the G.729 speech codec algorithm;
FIG. 9 is a schematic illustration of the encoding performed by the 2.4kbps speech codec algorithm;
FIG. 10 is a schematic illustration of the decoding performed by the 2.4kbps speech codec algorithm;
FIG. 11 is a schematic illustration of the encoding performed by the 1.2kbps speech codec algorithm;
FIG. 12 is a schematic illustration of the decoding performed by the 1.2kbps speech codec algorithm;
FIG. 13 is a schematic illustration of the encoding performed by the 0.6kbps speech codec algorithm;
FIG. 14 is a schematic illustration of decoding performed by the 0.6kbps speech codec algorithm.
Detailed Description
I. Introduction to the software and hardware platform of the invention
The invention loads the multi-standard voice coding and decoding software onto an embedded system platform, giving the chip multiple voice codec modes such as CVSD, G.729 and MELP, with high voice quality, support for multiple rates and codec modes, and full-duplex codec capability. The multi-standard speech coding and decoding rate can vary between 600 bps and 32000 bps; natural sound quality and speech intelligibility are maintained even at 600 bps.
The software radio multi-mode voice codec comprises hardware and software. The hardware is an embedded system that mainly provides a platform for the software; the software comprises device drivers and application programs, combines several voice codec algorithms, and can switch among them flexibly according to customer requirements.
The software radio multi-system voice codec comprises an encoding module and a decoding module; the working mode may be encode-only, decode-only, or simultaneous encoding and decoding.
Encoding: audio data captured by the microphone undergoes analog-to-digital (A/D) conversion and is passed to the encoding module; the encoding algorithm corresponding to the chosen standard is then selected for encoding, and the resulting code stream is finally output to other equipment through the network port.
Decoding: code-stream data from the network port is passed to the decoding module; the decoding algorithm corresponding to the standard is selected for decoding, and the decoded data is finally output to the loudspeaker after digital-to-analog (D/A) conversion.
As shown in fig. 1, the software-defined radio multi-mode speech codec adopts a multi-process design, its functions being split into a main process, an audio process, and codec-algorithm processes (an encoding-algorithm process and a decoding-algorithm process), related as follows:
(1) main process: provides a communication port for the user, performs data parsing, and coordinates the work of the audio process, the encoding-algorithm process and the decoding-algorithm process;
(2) audio process: provides volume adjustment, switching between MIC and line-level input channels, input gain control, and handling of the recording and playback interfaces;
(3) codec-algorithm processes: algorithms 1 to n provide algorithm support for the encoding- and decoding-algorithm processes, along with process parameter configuration and the interface handling needed to access the audio process.
The multi-process design of fig. 1 is implemented in software and hardware, as shown in fig. 2. The software can be regarded as user space: its top layer is the application processes, implemented as modular application programs built on language libraries. The hardware side can be regarded as kernel space, implemented by the device-driver layer.
The application processes comprise the main process, audio process, encoding-algorithm process and decoding-algorithm process of fig. 1. The modular application programs comprise the voice algorithms, state management, command interaction, data interaction, protocol handling, log management, CORBA services and the ring buffer. The language libraries comprise the system C library and other third-party libraries. The device-driver layer comprises hardware drivers such as the SPI, UART, network, GPIO, audio and Flash drivers.
The following is a functional description of some of the main contents:
(1) Main process
Responsible for scheduling the whole software and deciding the current software flow and the parameters used.
(2) Audio process
Relies on the alsa-lib library to provide recording, playback and audio-parameter-setting services for the system. This process communicates with other processes over sockets.
(3) Encoding and decoding process
Relies on the codec algorithms to provide coding and decoding services for the various standards. Communicates with other processes over sockets.
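The socket link between processes might look like the following minimal sketch. A connected socket pair stands in for the main-process/codec-process link, and the "SET_ALG"/"OK" message format is a hypothetical illustration; the patent does not specify the wire protocol.

```python
import socket

# One connected socket pair stands in for the inter-process link;
# a real system would run the two ends in separate processes.
main_end, codec_end = socket.socketpair()

# "Main process" sends a configuration command (format is illustrative).
main_end.sendall(b"SET_ALG CVSD\n")

# "Codec process" reads the command and acknowledges with the chosen algorithm.
request = codec_end.recv(64)
codec_end.sendall(b"OK " + request.split()[1] + b"\n")

reply = main_end.recv(64)
```

A production version would add message framing and timeouts, since TCP-style streams do not preserve message boundaries.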
(4) Speech algorithm
Implements the speech codec algorithm interfaces and configuration interfaces for CVSD, G729, MELP and so on; each standard is designed independently as a separate process.
(5) State management
Implements the state machine of system operation and outputs each state on GPIO, including abnormal-state indication.
(6) Data interaction
Uses TCP connection management as the communication-layer middleware interface, hiding the specific communication hardware port from upper-layer applications, and implements data transmission and reception, communication timeouts, and so on. The server is designed for concurrency and accepts multiple clients.
(7) Protocol processing
Implements the CORBA protocol layer and the encapsulation and parsing of a custom TCP protocol.
(8) Command interaction
Implements in-program human-machine interaction for troubleshooting during the test stage; this module is independent and can operate on any other module.
(9) Log management
Writes and reads logs, adding a timestamp to each entry, and decides by parameter whether to print immediately, write only to the log file, or both print and write.
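The three output policies described for log management can be sketched with the standard `logging` module. The mode names and the injected streams are illustrative assumptions for testability, not the patent's design.

```python
import io
import logging

def make_logger(mode, console, logfile):
    """Build a logger whose output policy follows the three modes above:
    'print' (console only), 'file' (log file only), or 'both'.
    Streams are injected so the sketch is testable; a real system would
    pass sys.stdout and an opened log file."""
    logger = logging.getLogger("codec")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()  # rebuild handlers on each reconfiguration
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    targets = {"print": [console], "file": [logfile],
               "both": [console, logfile]}[mode]
    for stream in targets:
        handler = logging.StreamHandler(stream)
        handler.setFormatter(fmt)  # every entry carries a timestamp
        logger.addHandler(handler)
    return logger
```

Usage: `make_logger("both", sys.stdout, open("codec.log", "a"))` would print each timestamped entry and append it to the file.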
II. Encoding and decoding processing steps of the invention
As shown in fig. 3, on the platform provided by the software and hardware, the encoding flow of the invention comprises the following steps:
S1: the main process applies the input configuration parameters to form the working parameters;
S2: the adjustable-gain amplifier receives the captured audio data, amplifies it, and passes it to the ADC module;
S3: after analog-to-digital (A/D) conversion, the ADC module passes the audio data to the encoding module through a ring buffer;
S4: the encoding module selects the algorithm corresponding to the chosen standard and performs encoding. Algorithm selection is carried out through the data frames: when a standard is selected, the algorithm type is signaled in the data frames exchanged before transmission and reception begin; the frames sent by the transmitting end carry the algorithm-type information, from which the receiving end determines the algorithm.
S5: finally, the encoding module outputs the resulting code stream to other equipment through the network port.
Correspondingly, the decoding flow of the invention is the inverse of the encoding flow.
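The algorithm-type signaling in step S4 can be illustrated with a small frame format. The patent says only that frames carry the algorithm type; the 1-byte id, 2-byte length layout and the identifier values below are hypothetical.

```python
import struct

# Hypothetical algorithm identifiers -- the patent does not define
# the actual frame layout, only that frames carry the algorithm type.
ALG_CVSD, ALG_G729, ALG_MELP = 1, 2, 3

def pack_frame(alg_id, payload):
    """Frame = 1-byte algorithm id + 2-byte big-endian payload length
    + payload bytes."""
    return struct.pack(">BH", alg_id, len(payload)) + payload

def unpack_frame(frame):
    """Parse a frame; the receiver picks its decoder from alg_id."""
    alg_id, length = struct.unpack(">BH", frame[:3])
    return alg_id, frame[3:3 + length]
```

On the receiving end, `alg_id` would select which codec-algorithm process handles the payload.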
III. Description of the algorithms of the invention
The codec-algorithm process mainly implements the voice encoding and decoding functions; it offers multiple codec modes such as CVSD (16K/32K), G.729 (8K) and MELP (2.4K/1.2K/0.6K), delivers high voice quality, and meets the requirements of multiple rates, multiple codec modes and full-duplex communication systems.
The voice codec methods comprise the CVSD codec algorithm, the G.729 codec algorithm, and the 2.4 kbps, 1.2 kbps and 0.6 kbps voice algorithms, as follows:
1. CVSD voice coding and decoding algorithm
Among the many voice coding modulations, continuously variable slope delta modulation (CVSD) is one of the delta-modulation family. It is a differential waveform-quantization technique that needs only one bit per code, requires no code-pattern synchronization between transmitter and receiver, and automatically tracks signal changes with its step size δ, so it is highly resistant to bit errors.
Dedicated CVSD encoder chips are available on the market, but their universality, flexibility and expandability are severely limited, product development cycles are long, and development costs are high. A dedicated CVSD encoder can realize only one codec path; multi-path CVSD coding and decoding requires multiple dedicated encoders, which is a limitation.
CVSD is a delta-modulation mode in which the step size δ varies continuously with the average slope of the input speech signal, as shown in fig. 4. Its working principle is to approximate the speech signal with line segments of continuously variable slope: when a segment's slope is positive, the corresponding digital code is 1; when the slope is negative, the corresponding digital code is 0.
When CVSD operates in encoding mode, the flow is as shown in fig. 5. CVSD tracks signal changes by constantly varying the step size δ to reduce granular noise and slope-overload distortion; δ is derived from the past 3 or 4 output samples.
1) When f(n) > g(n), the comparator outputs e(n) > 0, the digital code is y(n) = 1, and the integrator output is
g(n) = g(n-1) + δ
2) When f(n) < g(n), the comparator outputs e(n) < 0, the digital code is y(n) = 0, and the integrator output is
g(n) = g(n-1) - δ
When CVSD operates in decoding mode, the flow is as shown in fig. 6. Decoding judges the received digital code y(n): the integrator output rises on a "1" code and falls on a "0" code, rising (or falling) continuously on a run of "1" (or "0") codes, so the input signal can be approximately recovered.
1) When y(n) = 1, the integrator outputs g(n) = g(n-1) + δ.
2) When y(n) = 0, the integrator outputs g(n) = g(n-1) - δ.
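The encode and decode rules above can be sketched in a few lines. The specific step-adaptation rule used here (grow the step by 1.5x on a run of three identical bits, otherwise decay by 0.98 toward a minimum step) is an illustrative assumption; the patent states only that δ is derived from the past 3 or 4 output samples.

```python
def _step_update(delta, history, delta_min, delta_max, decay):
    """Syllabic companding shared by encoder and decoder: grow the step
    on a run of 3 identical bits (slope-overload region), otherwise let
    it decay toward the minimum (granular region). Constants are
    illustrative assumptions."""
    if len(history) == 3 and len(set(history)) == 1:
        return min(delta * 1.5, delta_max)
    return max(delta * decay, delta_min)

def cvsd_encode(samples, delta_min=1.0, delta_max=100.0, decay=0.98):
    """y(n) = 1 when f(n) > g(n), else 0; g(n) = g(n-1) +/- delta."""
    bits, g, delta, history = [], 0.0, delta_min, []
    for f in samples:
        bit = 1 if f > g else 0
        bits.append(bit)
        history = (history + [bit])[-3:]
        delta = _step_update(delta, history, delta_min, delta_max, decay)
        g = g + delta if bit else g - delta
    return bits

def cvsd_decode(bits, delta_min=1.0, delta_max=100.0, decay=0.98):
    """Mirror integrator: output rises on '1' codes, falls on '0' codes."""
    out, g, delta, history = [], 0.0, delta_min, []
    for bit in bits:
        history = (history + [bit])[-3:]
        delta = _step_update(delta, history, delta_min, delta_max, decay)
        g = g + delta if bit else g - delta
        out.append(g)
    return out
```

Because encoder and decoder apply the identical step update in the same order, the decoder's integrator exactly tracks the encoder's, which is why no code-pattern synchronization is needed.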
2. G.729 speech coding and decoding algorithm
In March 1996 the ITU-T published G.729, the 8 kbps Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP) speech coding scheme. Its characteristics: the analysis window is a hybrid window; the LSP (line spectral pair) parameters use two-stage vector quantization; the per-subframe codebook search is split into an adaptive-codebook search and an algebraic-codebook search; pitch analysis combines open-loop pitch analysis with the adaptive-codebook search, reducing computation and the number of pitch quantization bits while improving pitch-prediction accuracy; the algebraic-codebook algorithm is simple and needs no stored codebook; and the recovered sound quality is clear.
The encoding workflow of the g.729 algorithm is shown in fig. 7. The input signal is first high-pass filtered as preprocessing; LP analysis is performed once per 10 ms frame, the LP filter coefficients are computed, and these coefficients are converted into Line Spectral Pairs (LSPs). The excitation signal is searched by the analysis-by-synthesis (A-b-S) method, with the perceptually weighted error between the original and synthesized speech minimized as the criterion; the perceptual weighting filter is built from the unquantized LP coefficients.
The excitation parameters (fixed-codebook and adaptive-codebook parameters) are determined once per subframe (5 ms, 40 samples). The quantized and unquantized LP filter coefficients are used for subframe 2, while interpolated LP coefficients are used in subframe 1. The open-loop pitch delay is estimated from the perceptually weighted speech signal once per 10 ms frame. The following operations are repeated for each subframe: (1) the target signal is computed by filtering the LP residual through the weighted synthesis filter; (2) the impulse response of the weighted synthesis filter is computed; (3) closed-loop pitch analysis (i.e., the search for the adaptive-codebook delay and gain) searches values near the open-loop pitch delay using the target signal and impulse response; (4) the adaptive-codebook contribution is subtracted from the target signal and the new target is used in the fixed-codebook search to find the optimal excitation; finally, the filter memories are updated with the determined excitation signal.
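The "LP analysis" step of the encoder flow above is the classical autocorrelation method solved by the Levinson-Durbin recursion. The sketch below shows only that step (order 10, as in G.729); the windowing, bandwidth expansion and LSP conversion of the real codec are omitted, so this is a simplified illustration, not the G.729 reference implementation.

```python
def lp_analysis(frame, order=10):
    """Autocorrelation-method LP analysis via the Levinson-Durbin
    recursion. Returns the prediction-error filter coefficients
    a = [1, a1, ..., a_order] and the final prediction-error energy."""
    n = len(frame)
    # Autocorrelation lags 0..order
    r = [sum(frame[j] * frame[j + k] for j in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order  # A(z) coefficients
    err = r[0]                 # prediction-error energy
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err         # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k     # energy shrinks at every order
    return a, err
```

For a first-order autoregressive input the recovered coefficient approaches the generating pole, which is the property the perceptual weighting filter and LSP conversion then build on.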
The decoding workflow of the G.729 algorithm is shown in fig. 8. The parameter indices are first extracted from the received code stream and decoded to obtain the coding parameters of a 10 ms speech frame: the LSP parameters, two fractional pitch delays, two fixed codevectors, and two sets of adaptive- and fixed-codebook gains. The LSP parameters are interpolated per subframe and converted into LPC filter coefficients; each 5 ms subframe is then processed as follows: first, the adaptive and fixed codewords are multiplied by their respective gains and summed to form the excitation; second, the excitation drives the LPC synthesis filter to reconstruct the speech; third, the reconstructed speech undergoes post-processing, including long-term post-filtering, short-term synthesis filtering, and high-pass filtering.
3. 2.4kbps speech algorithm
Encoding of MELP algorithm as shown in fig. 9, the whole algorithm can be divided into two parts of parameter extraction and parameter quantization. The extraction of parameters of the MELP coder is divided into a fundamental tone extraction part, a band-pass unvoiced and voiced analysis part, a line spectrum pair (LSF) parameter extraction part, a gain estimation part and a Fourier spectrum amplitude extraction part, and the parts are correlated, and one part may use the results of the other parts in the calculation. The parameter quantization part of the MELP coder is characterized by using multi-stage vector quantization, the quantization performance is excellent, the bit number of LSF parameter quantization is effectively reduced, and the calculation complexity is low.
The MELP coding process is as follows: the digitized speech signal passes through a fourth-order Chebyshev high-pass filter to remove DC and power-line interference; multi-band mixed-excitation voicing decisions are then made so that the pitch signal can be extracted accurately; linear prediction mainly comprises analysis of the input speech and of the residual signal; when the periodicity of a voiced segment is poor, an aperiodic flag causes the decoding end to use an excitation source adapted to unstable vocal-cord pulses; the relevant parameters are quantized with a four-stage fast codebook-search vector quantization algorithm under the minimum perceptually weighted distortion criterion; and the error-correction-coded bit stream is packed and transmitted.
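The high-pass preprocessing stage can be sketched with scipy's filter design; the text only specifies a fourth-order Chebyshev high-pass filter, so the 60 Hz cutoff, 30 dB stopband attenuation, Chebyshev type II and 8 kHz sampling rate below are illustrative assumptions:

```python
import numpy as np
from scipy.signal import cheby2, lfilter

# 4th-order Chebyshev type-II high-pass (assumed parameters: 60 Hz edge,
# 30 dB stopband attenuation, 8 kHz speech sampling rate).
b, a = cheby2(4, 30, 60, btype='highpass', fs=8000)

def remove_dc(speech):
    """Suppress DC / power-line interference before parameter extraction."""
    return lfilter(b, a, speech)
```

Feeding a constant (pure DC) signal through the filter drives the output down toward the stopband floor, which is a quick way to verify the design.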
Decoding of the MELP algorithm is shown in fig. 10. MELP synthesizes speech with a speech-production model that better matches the human vocal mechanism, and post-processes the synthesized speech with adaptive spectral enhancement and pulse dispersion filtering to improve its match to the analyzed speech, yielding higher reconstructed speech quality.
The decoder unpacks the received code-stream bits and arranges them in parameter order; decoding then proceeds through data unpacking and mixed-excitation generation, after which a series of post-processing methods is applied to the mixed excitation to improve the quality of the synthesized speech; finally, the synthesized speech is obtained.
4. 1.2kbps speech algorithm
As shown in FIGS. 11 and 12, the 1.2 kbps speech coding algorithm is built on the 2.4 kbps MELP. To reduce the code rate further, multi-frame joint coding is adopted: three consecutive frames form a superframe for coding, and each frame in the superframe is called a subframe. The subframe length is 22.5 ms (180 samples), so each superframe is 67.5 ms. The superframe is classified into different states according to the unvoiced/voiced (U/V) attributes of its three subframes, and each state uses a different bit-allocation scheme. The parameters of each subframe in the superframe are computed as in the 2.4 kbps algorithm; to improve quality, the 1.2 kbps algorithm adds two modules during parameter estimation, pitch smoothing and band-pass voicing-strength smoothing.
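The classification of a superframe by the U/V attributes of its three subframes can be sketched as a simple state index; the real 1.2 kbps coder merges some of the eight possible patterns into fewer states with dedicated bit allocations, so this plain binary packing is only an illustration:

```python
def uv_state(v1, v2, v3):
    """Pack three subframe U/V flags (1 = voiced, 0 = unvoiced) into a
    superframe state index 0-7; the bit-allocation table would then be
    looked up by this index."""
    return (v1 << 2) | (v2 << 1) | v3
```

For example, a voiced-unvoiced-voiced superframe maps to state 5, and an all-unvoiced one to state 0.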
5. 0.6kbps speech algorithm
The 600 bps speech coder is an improvement on MELP, so its parameter extraction method is largely the same; to reduce the coding rate, only the four parameters most important to speech intelligibility are retained: line spectral frequencies, unvoiced/voiced decisions, pitch period, and gain. Three consecutive frames are encoded as a superframe, each frame in the superframe being called a subframe. The subframe length is 25 ms (200 samples), so each superframe is 75 ms and is quantized with 45 bits. The encoding of these four parameters proceeds as follows.
The encoding of the 0.6 kbps speech algorithm is shown in fig. 13. The algorithm divides into two parts, parameter extraction and parameter quantization. The encoder's parameter extraction comprises four parts: pitch extraction, band-pass voicing analysis, line spectral frequency (LSF) parameter extraction, and gain estimation.
The encoding process is as follows: the digitized speech signal passes through a fourth-order Chebyshev high-pass filter to remove DC and power-line interference; the preprocessed speech is then fed to four modules: linear prediction analysis, band-pass voicing-strength analysis, pitch detection, and gain analysis.
After these four modules, the superframe's parameter vector is obtained and a suitable quantization scheme is selected for quantization coding; the resulting 45-bit speech data frame is finally output to the coding channel. The decoder synthesizes speech with a speech-production model that better matches the human vocal mechanism, and post-processes the synthesized speech with adaptive spectral enhancement and pulse dispersion filtering, improving its match to the analyzed speech and the reconstructed speech quality.
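The rate arithmetic implied by these frame sizes is easy to check: 45 bits per 75 ms superframe gives exactly 600 bps. The sketch below also checks the commonly cited MELP-family figures of 54 bits per 22.5 ms frame (2400 bps) and 81 bits per 67.5 ms superframe (1200 bps) — those two bit counts are assumptions drawn from the MELP standards, not stated in this text:

```python
def superframe_rate(bits, frame_ms, frames_per_superframe=1):
    """Bits per second, given `bits` quantized over one superframe of
    frames_per_superframe frames of frame_ms milliseconds each."""
    return bits / (frames_per_superframe * frame_ms / 1000.0)
```

Checking all three rates against the frame durations in the text confirms the bit budgets are self-consistent.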
As shown in fig. 14, after receiving the bit stream from the channel, the decoding end first unpacks it, arranges the bits in parameter order, and separates the coded bits of each parameter. The coded bits of each parameter are then sent to the parameter decoding module and dequantized with a suitable inverse quantization means, yielding the four parameters of the whole superframe: line spectral frequencies, band-pass voicing decisions, pitch period, and gain. An excitation signal is then formed from the pitch period, residual harmonic magnitudes, and band-pass voicing decisions, and the generated excitation is spectrally enhanced using the line spectral frequencies. Finally, speech synthesis is performed on the excitation using the line spectral frequencies and gain, and two frames of synthesized speech signal are obtained and output.
The invention has the following characteristics: 1. the speech codec rate can vary from 600 bps to 32000 bps; 2. the algorithms are optimized on the basis of the relevant standards; 3. multiple speech coding and decoding modes are implemented digitally in an embedded system and can be switched freely.
The software radio multi-standard voice coder-decoder provided by the invention has the following physical characteristics:
(1) Overall dimensions: 35 × 35 × 5 mm (width × depth × height, tolerance ±0.01 mm); weight no greater than 50 g.
(2) Ambient temperature requirements:
Operating temperature: −40 °C to +85 °C.
Storage temperature: −55 °C to +125 °C.
(3) Operating voltage and frequency:
The operating voltage is 3.3 V and the operating frequency is 600 MHz.
(4) Application environment requirements: applicable to both internal-field and external-field environments.
The invention utilizes microsystem SiP (system-in-package) technology: smaller components are given a secondary packaging, so the product has a smaller volume; microsystem SiP packaging demands high process requirements and a high technical level.

Claims (9)

1. A software defined radio multi-system speech codec, characterized by: the hardware adopts an embedded system platform comprising an encoding module and a decoding module, and realizes the encoding and decoding of multiple speech standards such as CVSD, G.729 and MELP through an application composed of a main process, an audio process and a coding and decoding algorithm process; the main process provides the communication port for the user, coordinates the work of the audio process and the coding and decoding algorithm process, and takes in the configuration parameters; the audio process provides volume adjustment, switching between the MIC and line input channels, and handling of the recording and playback interfaces; the coding and decoding algorithm process is provided with a plurality of algorithms to realize the encoding and decoding processing, uses the configured parameters, and obtains the interface of the audio process.
2. The software defined radio multi-system speech codec of claim 1, wherein: the encoding and decoding steps are the reverse of each other, the encoding process comprising the following steps:
S1: the main process takes in the input configuration parameters to form the working parameters of the coding and decoding algorithm process;
S2: an adjustable-gain amplifier in the audio process receives the captured audio data, amplifies it, and sends it to the ADC module of the audio process;
S3: the ADC module of the audio process converts the analog audio data into digital data and passes it to the encoding module of the coding and decoding algorithm process through a ring buffer;
S4: the encoding module of the coding and decoding algorithm process selects the algorithm corresponding to the configured standard and performs the encoding;
S5: finally, the encoding module of the coding and decoding algorithm process outputs the resulting code stream to other devices through the network port.
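The ring buffer of step S3 can be sketched as a fixed-capacity queue between the ADC and the encoder; this deque-based version, which drops the oldest samples when the producer overruns the consumer, is one plausible policy rather than the invention's specified behavior:

```python
from collections import deque

class RingBuffer:
    """Fixed-capacity sample buffer between the audio process's ADC
    output and the encoding module; on overflow the oldest samples
    are silently discarded (assumed policy)."""
    def __init__(self, capacity):
        self._q = deque(maxlen=capacity)

    def write(self, samples):
        self._q.extend(samples)  # deque(maxlen=...) drops from the left

    def read(self, n):
        return [self._q.popleft() for _ in range(min(n, len(self._q)))]
```

The encoder would call `read()` once per frame's worth of samples while the ADC keeps calling `write()`.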
3. The software defined radio multi-system speech codec of claim 1, wherein: the coding and decoding algorithm process is provided with a CVSD encoding algorithm; during encoding, the CVSD algorithm tracks changes in the signal by continuously varying the quantization step size δ so as to reduce both granular noise and slope-overload distortion, the step size δ being adapted from the past 3 or 4 output samples;
1) when f(n) > g(n), the comparator outputs e(n) > 0, the digital code y(n) = 1, and the integrator outputs g(n) = g(n−1) + δ;
2) when f(n) < g(n), the comparator outputs e(n) < 0, the digital code y(n) = 0, and the integrator outputs g(n) = g(n−1) − δ.
4. The software defined radio multi-system speech codec of claim 1, wherein: the coding and decoding algorithm process is provided with a CVSD decoding algorithm; during decoding, the received digital code y(n) is examined: the integrator output rises on a 1 code and falls on a 0 code, rising faster on consecutive 1 codes and falling faster on consecutive 0 codes, and the input signal is thereby recovered;
1) when y(n) = 1, the integrator outputs g(n) = g(n−1) + δ;
2) when y(n) = 0, the integrator outputs g(n) = g(n−1) − δ.
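Because the decoder of claim 4 mirrors the encoder's step adaptation exactly, it reconstructs the same g(n) from the bit stream alone. A round-trip sketch follows (a compact encoder is included so the example is self-contained; all adaptation constants are illustrative assumptions):

```python
import math

def cvsd_encode(x, d_min=0.01, d_max=0.5, decay=0.98, run=3):
    """Encoder: compare, emit bit, adapt delta, integrate."""
    g, d, bits, recon = 0.0, d_min, [], []
    for s in x:
        b = 1 if s >= g else 0
        bits.append(b)
        if bits[-run:] in ([1] * run, [0] * run):
            d = min(d * 1.5, d_max)
        else:
            d = max(d * decay, d_min)
        g = g + d if b else g - d
        recon.append(g)
    return bits, recon

def cvsd_decode(bits, d_min=0.01, d_max=0.5, decay=0.98, run=3):
    """Decoder: same delta adaptation driven by the received bits,
    so g(n) = g(n-1) + delta on a 1 code and g(n-1) - delta on a 0."""
    g, d, out, seen = 0.0, d_min, [], []
    for b in bits:
        seen.append(b)
        if seen[-run:] in ([1] * run, [0] * run):
            d = min(d * 1.5, d_max)
        else:
            d = max(d * decay, d_min)
        g = g + d if b else g - d
        out.append(g)
    return out
```

Since both sides run identical arithmetic in identical order, the decoder's output matches the encoder's local reconstruction exactly.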
5. The software defined radio multi-system speech codec of claim 1, wherein: the coding and decoding algorithm process is provided with a G.729 encoding algorithm; during encoding, the input signal is high-pass filtered as preprocessing, LP analysis is performed once per 10 ms frame, and the LP filter coefficients are computed and converted to line spectrum pairs (LSP); the excitation signal is searched by the analysis-by-synthesis (A-b-S) method, the search using the minimum perceptually weighted error between the original and synthesized speech as its measure, with the perceptual weighting filter built from the unquantized LP coefficients; the excitation parameters are determined once per subframe, the quantized and unquantized LP filter coefficients being used for the second subframe and interpolated LP coefficients in the first subframe, and the open-loop pitch delay is estimated once per 10 ms frame from the perceptually weighted speech signal; the following is repeated for each subframe: the target signal is computed by filtering the LP residual through the weighted synthesis filter; the impulse response of the weighted synthesis filter is computed; closed-loop pitch analysis is performed with the target signal and the impulse response, searching values near the open-loop pitch delay; the adaptive-codebook contribution is subtracted from the target signal, and the new target signal is used in the fixed-codebook search to find the optimal excitation; finally, the filter memories are updated with the determined excitation signal.
6. The software defined radio multi-system speech codec of claim 1, wherein: the coding and decoding algorithm process is provided with a G.729 decoding algorithm; during decoding, the parameter indices are first extracted from the received code stream and decoded to obtain the coding parameters of a 10 ms speech frame, namely the LSP parameters, two fractional pitch delays, two fixed codevectors, and two sets of adaptive- and fixed-codebook gains; the LSP parameters are interpolated for each subframe and converted to LP filter coefficients, and each 5 ms subframe is then processed as follows: first, the adaptive and fixed codevectors are multiplied by their respective gains and summed to form the excitation; second, the excitation drives the LP synthesis filter to reconstruct the speech; third, the reconstructed speech signal is post-processed, including long-term postfiltering, short-term postfiltering, and high-pass filtering.
7. The software defined radio multi-system speech codec of claim 1, wherein: the coding and decoding algorithm process is provided with a 2.4 kbps encoding algorithm; the digitized speech signal passes through a fourth-order Chebyshev high-pass filter to remove DC and power-line interference; multi-band mixed-excitation voicing decisions are then made so that the pitch signal can be extracted accurately; linear prediction mainly comprises analysis of the input speech and of the residual signal; when the periodicity of a voiced segment is poor, an aperiodic flag causes the decoding end to use an excitation source adapted to unstable vocal-cord pulses; the relevant parameters are quantized with a four-stage fast codebook-search vector quantization algorithm under the minimum perceptually weighted distortion criterion; and the error-correction-coded bit stream is packed and transmitted.
8. The software defined radio multi-system speech codec of claim 7, wherein: the coding and decoding algorithm process is provided with a 1.2 kbps encoding algorithm which, compared with the 2.4 kbps encoding algorithm, additionally removes the inter-frame correlation in linear prediction, reducing the code rate.
9. The software defined radio multi-system speech codec of claim 1, wherein: the coding and decoding algorithm process is provided with a 0.6 kbps encoding algorithm; encoding divides into parameter extraction and parameter quantization, the encoder's parameter extraction comprising four parts: pitch extraction, band-pass voicing analysis, line spectral frequency parameter extraction, and gain estimation; during decoding, the received bit stream is first unpacked, arranged in parameter order, and the coded bits of each parameter separated; the coded bits of each parameter are then sent to the parameter decoding module and dequantized, yielding the four parameters of the whole superframe: line spectral frequencies, band-pass voicing decisions, pitch period, and gain; finally, an excitation signal is formed from the pitch period, residual harmonic magnitudes, and band-pass voicing decisions, the generated excitation is spectrally enhanced using the line spectral frequencies, and speech synthesis is performed on the excitation using the line spectral frequencies and gain, obtaining and outputting two frames of synthesized speech signal.
CN202011452195.5A 2020-12-10 2020-12-10 Software radio multi-system voice coder-decoder Pending CN112614495A (en)

Publications (1)

CN112614495A (en), published 2021-04-06


Cited By (2)

CN113542401A (en), 2021-10-22: Voice communication method based on LoRa technology
CN117793077A (en), 2024-03-29: Communication system and soft-hard volume adjusting method thereof

Patent Citations (3)

JP2005315973A (en), 2005-11-10, Seiko Epson Corp: Semiconductor integrated circuit
CN101506876A (en), 2009-08-12: Vocoder and associated method that transcodes between mixed excitation linear prediction (MELP) vocoders with different speech frame rates
CN106098072A (en), 2016-11-09: A 600 bps very-low-rate speech coding and decoding method based on MELP


Non-Patent Citations (2)

孟利 (Meng Li), "Design and Implementation of a Multi-Service Speech Processing Platform", China Master's Theses Full-text Database (Information Science and Technology), pages 5-21
王国文 (Wang Guowen), 赵耿 (Zhao Geng), 方晓 (Fang Xiao) et al., "Research and Improvement of the MELP Low-Bit-Rate Digital Speech Algorithm", Proceedings of the 16th National Youth Communication Academic Conference (Vol. I), pages 80-82



Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination