CN112233686A - Voice data processing method of NVOCPLUS high-speed broadband vocoder - Google Patents


Info

Publication number
CN112233686A
CN112233686A
Authority
CN
China
Prior art keywords
voice
signal
value
parameter
voice data
Prior art date
Legal status
Granted
Application number
CN202011047245.1A
Other languages
Chinese (zh)
Other versions
CN112233686B (en)
Inventor
肖文雄
朱振荣
Current Assignee
Tianjin Liansheng Software Development Co ltd
Original Assignee
Tianjin Liansheng Software Development Co ltd
Priority date
Filing date
Publication date
Application filed by Tianjin Liansheng Software Development Co ltd filed Critical Tianjin Liansheng Software Development Co ltd
Priority to CN202011047245.1A priority Critical patent/CN112233686B/en
Publication of CN112233686A publication Critical patent/CN112233686A/en
Application granted granted Critical
Publication of CN112233686B publication Critical patent/CN112233686B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/02 using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 using orthogonal transformation
    • G10L19/26 Pre-filtering or post-filtering
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to a voice data processing method of an NVOCPLUS high-speed broadband vocoder, which comprises the following steps: step 1, an encoding end performs initialization configuration and analysis processing on the original voice digital signal, judges whether the current signal is voice, and, if so, extracts the pitch from the voice and then calculates the pitch period and the unvoiced/voiced numerical parameters of each sub-band; step 2, the line spectrum pair, pitch value, gain parameter, residual compensation gain and codebook vector parameters are extracted and quantized to obtain the voice quantization parameters; and step 3, after the voice quantization parameters of step 2 are extracted, they are synthesized into voice, voice quality is improved through noise suppression, and voice reconstruction is performed when parameter recovery or voice synthesis fails. The invention provides good voice quality at low bit rates and in applications where voice frequencies below 300 Hz are lost, and has strong adaptability to dialects.

Description

Voice data processing method of NVOCPLUS high-speed broadband vocoder
Technical Field
The invention belongs to the technical field of vocoder digital voice compression, and particularly relates to a voice data processing method of an NVOCPLUS high-speed wideband vocoder.
Background
With the rapid development of communication technology, frequency resources are precious. Compared with an analog voice communication system, a digital voice communication system has strong anti-interference performance, high confidentiality, ease of integration and other advantages, and the low-speed vocoder plays an important role in digital voice communication systems.
At present, most speech coding algorithms are built on acoustic models of the human vocal organs. The human vocal organs consist of the glottis, the vocal tract and other auxiliary organs. In the actual speech generation process, the vibration generated by the glottis is modulated by the vocal tract filter and radiated via the mouth, nose, etc., which can be expressed by the following formula:
s(n) = h(n) * e(n)
where s(n) is the voice signal, h(n) is the unit impulse response of the vocal tract filter, and e(n) is the glottal vibration signal.
To represent speech signals clearly, the glottis and the vocal tract can each be described by their spectral characteristics, and efficiently quantizing the characteristic parameters of the glottis and the vocal tract is the goal of parametric coding algorithms.
Vocoders belong to the class of parametric coders: they compress the digital representation of a speech signal and recover, with fewer bits, speech that is as similar as possible to the original. With the rapid increase in the efficiency of digital signal processing hardware, research on vocoders has accelerated and vocoders have come into widespread use.
The existing NVOC narrowband vocoder supports two code rates, 2.4 kbps and 2.2 kbps (used for encryption); the channel FEC code rate is 1.2 kbps, and both the voice codec and the FEC encode and decode using 8 kHz sampling with 20 milliseconds as one frame. The NVOC wideband vocoder implements a high speed of 12.2 kbps; after compression a frame carries more bits (200+ bits versus 40+ bits), and the encoded data carries more information that helps recover the voice.
In the field of existing wideband vocoders, because the compression ratio of voice coding is not high, the following problems remain in obtaining good tone quality and accuracy: (1) the pitch parameters are extracted using time-domain correlation alone, so the calculation is error-prone; (2) because the sound is not denoised, the extracted sound parameters are inaccurate when noise is present; (3) compatibility with low-speed narrowband vocoders is neglected.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a voice data processing method of an NVOCPLUS high-speed broadband vocoder that has a reasonable design, high voice quality and strong adaptability to dialects.
The invention solves the practical problem by adopting the following technical scheme:
a voice data processing method of NVOCPLUS high-speed wideband vocoder comprises the following steps:
step 1, an encoding end carries out initialization configuration and analysis processing on an original voice digital signal, firstly carries out noise suppression processing on the original voice digital signal, then judges whether the current voice signal is voice, if the current voice signal is voice, calculates a pitch period and unvoiced and voiced numerical parameters of each sub-band after extracting a pitch in the voice;
step 2, on the basis of the pitch period and the unvoiced/voiced numerical parameters calculated in step 1, extracting and quantizing the line spectrum pair, pitch value, gain parameter, residual compensation gain and codebook vector parameters to obtain the voice quantization parameters;
and step 3, after the voice quantization parameters of step 2 are extracted, synthesizing them into voice, improving voice quality through noise suppression, and performing voice reconstruction when parameter recovery or voice synthesis fails.
The step 1 specifically includes:
(1) carrying out noise suppression processing on the original voice digital signal S(n) to obtain the noise-suppressed voice data S1(n) and the 0-4000 Hz sound spectrum characteristics of the original data S(n);
(2) judging whether the current noise-suppressed signal is voice by using voice activity detection (VAD), obtaining the voice data S2(n);
(3) extracting the pitch of the voice data S2(n);
(4) calculating the pitch period and the unvoiced/voiced value parameters of each sub-band.
Moreover, step 1, item (1), comprises the following specific steps:
firstly, a high-pass filter is applied to the voice data to remove the direct-current component, boost the high-frequency components and attenuate the low frequencies;
secondly, the signal is windowed: a Hamming window of length N is applied and overlapped Fourier transforms yield the energy distribution over the spectrum, giving the noise-suppressed voice data S1(n), the noise suppression result parameters, and the 0-4000 Hz sound spectrum characteristics of the original voice digital signal S(n).
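As an illustration of the windowing step above, the following sketch applies a first-order pre-emphasis (a stand-in for the high-pass filter) and computes the energy spectrum of overlapped Hamming-windowed frames; the 0.94 coefficient, window length 256 and hop 128 are illustrative assumptions, not values from the patent:

```python
import numpy as np

def pre_emphasis_and_spectrum(x, n_win=256, hop=128):
    """Pre-emphasize, then take overlapped Hamming-windowed FFT frames.
    n_win/hop/0.94 are illustrative assumptions."""
    # first-order pre-emphasis: removes DC, boosts high frequencies
    y = np.append(x[0], x[1:] - 0.94 * x[:-1])
    win = np.hamming(n_win)
    frames = []
    for start in range(0, len(y) - n_win + 1, hop):
        seg = y[start:start + n_win] * win
        frames.append(np.abs(np.fft.rfft(seg)) ** 2)  # energy spectrum
    return np.array(frames)
```

Each row of the result is the energy distribution of one overlapped frame over 0 to half the sampling rate.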
Moreover, the specific method of step 1, item (2), is as follows:
according to the auditory characteristics of the human ear, the noise-suppressed voice data S1(n) is filtered into sub-bands and the level of each sub-band signal is calculated; the signal-to-noise ratio is estimated according to the following formula and compared with a preset threshold value to judge whether the current speech signal is voice:
SNR = 20·lg(a/b)
where a is the signal level value of the current frame, and b is the current signal level value estimated from the previous frames;
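A minimal sketch of the threshold comparison described above; the RMS level measure and the 6 dB threshold are assumptions for illustration, not values taken from the patent:

```python
import numpy as np

def is_voice(frame, noise_level, snr_threshold_db=6.0):
    """Compare the current frame level a against a running noise
    estimate b (from prior frames); threshold is an assumption."""
    a = np.sqrt(np.mean(np.asarray(frame, float) ** 2))  # current frame level
    b = max(noise_level, 1e-9)                           # estimate from prior frames
    snr_db = 20.0 * np.log10(max(a, 1e-9) / b)
    return snr_db > snr_threshold_db
```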
Moreover, the specific method of step 1, item (3), is as follows:
the voice data S2(n) is low-pass filtered with a low-pass filter whose cut-off frequency is B Hz; after the low-pass filtered voice data is inverse-filtered with a second-order inverse filter, the autocorrelation function of the output signal of the second-order inverse filter is calculated according to the following formula and the pitch is extracted:
R(k) = Σ Sw(i)·Sw(i+k), summed over i = 0 to N-1-k
where N is the window length of the window function mentioned in step 1, item (1), and Sw(i) is the output signal of the second-order inverse filtering in step 1, item (3).
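The autocorrelation-based pitch extraction can be sketched as follows; the 60-400 Hz search range is an assumed plausible pitch range, and the low-pass and second-order inverse filtering stages are omitted for brevity:

```python
import numpy as np

def pitch_by_autocorr(s, fs=8000, fmin=60, fmax=400):
    """Pitch (Hz) from the autocorrelation peak over a lag range
    corresponding to an assumed fmin..fmax pitch range."""
    s = np.asarray(s, float) - np.mean(s)
    r = np.correlate(s, s, mode='full')[len(s) - 1:]  # R(k), k >= 0
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(r[lo:hi]))               # skip the k = 0 maximum
    return fs / lag                                    # pitch frequency in Hz
```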
Moreover, the specific steps of step 1, item (4), comprise:
dividing the 0-4000 Hz frequency domain into 5 frequency bands, namely [0-500] Hz, [500-1000] Hz, [1000-2000] Hz, [2000-3000] Hz and [3000-4000] Hz;
the autocorrelation function of the bandpass signal in each interval is calculated by the following formula:
R(τ) = ∫ f(t+τ)·f*(t) dt, integrated over all t
where t is the continuous time argument, τ is the input signal delay, * denotes the convolution operator, and f*(·) denotes complex conjugation;
the average value of the product of two values of the same time function at times t and t+τ is taken as a function of the delay τ; this measures the similarity between the signal and its delayed version. When the delay is zero, the mean square value of the signal is obtained, which is the maximum of the function; the maximum of the function is taken as the voiced intensity to calculate the unvoiced/voiced value of each sub-band;
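A sketch of the per-sub-band voicing measure: each of the five sub-bands of 0-4000 Hz is isolated (here by FFT masking rather than a filter bank, purely for brevity), and the peak of its normalized autocorrelation over plausible pitch lags is taken as the voiced intensity. The lag range 20-140 is an assumption:

```python
import numpy as np

def subband_voicing(s, fs=8000):
    """Voicing strength per sub-band: normalized autocorrelation peak
    of each band-limited signal over assumed pitch lags 20..139."""
    bands = [(0, 500), (500, 1000), (1000, 2000), (2000, 3000), (3000, 4000)]
    n = len(s)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec = np.fft.rfft(np.asarray(s, float))
    strengths = []
    for lo, hi in bands:
        mask = (freqs >= lo) & (freqs < hi)       # crude band isolation
        x = np.fft.irfft(spec * mask, n)
        r = np.correlate(x, x, mode='full')[n - 1:]
        r0 = r[0] if r[0] > 0 else 1.0
        strengths.append(float(np.max(r[20:140]) / r0))
    return strengths
```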
Further, the specific steps of step 2 include:
(1) filtering the noise-suppressed voice data with a high-pass filter with cut-off frequency A Hz to obtain S3(n); windowing, calculating the autocorrelation coefficients, solving the line spectrum pair parameters with the Levinson-Durbin recursive algorithm, and quantizing the obtained line spectrum pair parameters with a three-stage vector quantization scheme;
(2) quantizing the pitch value calculated in step 1, item (3): the integer interval containing the pitch values is linearly mapped onto [0~z], and z is represented by m1 bits;
(3) passing the voice data S2(n) detected in step 1, item (2), through a second-order inverse filter to obtain a prediction error signal r(n) free of formant influence, where the coefficients of the second-order inverse filter are a1 and a2; the gain parameter is expressed by the RMS of r(n), and the quantization is completed in the logarithmic domain;
(4) quantizing the maximum value obtained from the correlation function of each band-pass signal after the frequency-domain segmentation of step 1, item (4), into m2 bits;
(5) computing the residual compensation gain: the quantized LSF parameters are used to calculate the linear prediction coefficients, forming a prediction error filter that filters the input speech S2(n) to obtain a residual signal whose length is 160 points;
(6) windowing the prediction residual with a Hamming window of length 160 points, zero-padding the windowed signal to 512 points, performing a 512-point complex FFT on it, and finding the Fourier transform values corresponding to the first x harmonics with a spectral peak detection algorithm;
(7) letting P be the quantized pitch and taking the initial position of the i-th harmonic as 512i/P, peak detection searches for the maximum peak within a width of 512/P frequency samples centred on the initial position of each harmonic, the width being truncated to an integer; the number of harmonics searched is limited to the smaller of x and P/4; the coefficients corresponding to these harmonics are then normalized, and this x-dimensional vector is quantized with a vector codebook of m3 ∈ [0,48] bits, obtaining m3 ∈ [0,48] bits.
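Steps (6) and (7) above can be sketched as follows; n_harm = 10 is an illustrative cap standing in for the x of the text:

```python
import numpy as np

def harmonic_magnitudes(residual, pitch_p, n_harm=10, nfft=512):
    """Window the 160-point residual, zero-pad to 512, FFT, and take
    the largest magnitude within a 512/P-wide band around each
    harmonic; the vector is then RMS-normalized."""
    win = np.hamming(len(residual))
    spec = np.abs(np.fft.fft(residual * win, nfft))
    width = int(nfft / pitch_p)                 # truncated to an integer
    n_harm = min(n_harm, int(pitch_p / 4))      # limit stated in the text
    mags = []
    for i in range(1, n_harm + 1):
        center = int(round(nfft * i / pitch_p)) # initial position 512i/P
        lo = max(center - width // 2, 0)
        hi = min(center + width // 2 + 1, nfft // 2)
        mags.append(spec[lo:hi].max())          # largest peak in the band
    m = np.array(mags)
    return m / np.sqrt(np.mean(m ** 2))         # normalize the vector
```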
Moreover, the specific method of synthesizing the voice quantization parameters into voice in step 3 is as follows:
excitations are formed by dividing into several frequency bands; they are summed and passed through a synthesis filter to obtain synthesized speech, and the synthesized speech is then post-filtered to obtain the decoded synthesized voice data, where the z-transform transfer functions of the synthesis filter H(z) and the post-filter Hpf(z) are as follows:
H(z) = 1/A(z)
[the transfer function Hpf(z), in terms of γ, β and μ, appears only as an image in the source]
where A(z) = 1 - a·z^(-1), a is the filter coefficient, z is a complex variable with real and imaginary parts (one may set z = e^(jw)), γ = 0.56, β = 0.75, and μ is determined by the reflection coefficient according to a formula that likewise appears only as an image in the source.
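The synthesis and post-filtering can be sketched with first-order filters. Since the post-filter transfer function appears only as an image in the source, the common short-term form Hpf(z) = A(z/γ)/A(z/β) is assumed here, with A(z) = 1 - a·z^(-1); a = 0.9 is an illustrative coefficient, while γ = 0.56 and β = 0.75 are the values quoted in the text (any μ tilt term is omitted):

```python
import numpy as np

def synthesize(excitation, a=0.9, gamma=0.56, beta=0.75):
    """Run excitation through H(z) = 1/(1 - a z^-1), then the assumed
    post-filter (1 - gamma*a z^-1)/(1 - beta*a z^-1)."""
    n = len(excitation)
    speech = np.zeros(n)
    post = np.zeros(n)
    for i in range(n):
        prev_s = speech[i - 1] if i > 0 else 0.0
        prev_p = post[i - 1] if i > 0 else 0.0
        speech[i] = excitation[i] + a * prev_s          # synthesis filter
        post[i] = speech[i] - gamma * a * prev_s + beta * a * prev_p
    return post
```

Both poles (a and beta*a) lie inside the unit circle, so the impulse response decays; this is the stability the post-filter relies on.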
Furthermore, the method further comprises the following steps before the step 3:
and initializing and configuring a decoding end, wherein the initialization and configuration comprises rate selection, parameters of an algorithm of the decoding end and filter coefficients.
Before the step of initializing and configuring the decoding end, the method also comprises the following steps:
expanding the linear prediction coefficient, the excitation gain parameter and the pitch period parameter in step 3 to obtain the respective expanded parameters;
The specific steps are as follows:
(1) expanding the gain value quantization interval obtained in step 3 and computing per sub-frame, obtaining the expanded excitation gain parameter;
(2) subtracting the LSP parameter mean from the current frame's LSP parameters and from the quantized previous frame's LSP parameters of step 3 to obtain the mean-removed vectors (the vector symbols appear only as images in the source); the mean-removed vector is used as the input of the staged vector quantization and quantized, yielding the expanded LSP linear prediction parameters;
(3) enlarging the pitch value quantization bits obtained in step 3 and computing per sub-frame once every two subframes, i.e. dividing the interval set in step 3, item (2), into the two parts corresponding to the subframes, calculating the maximum value and the index i from the autocorrelation function of step 2, item (3), normalizing (the normalization factor appears only as an image in the source), and obtaining the expanded pitch period parameter.
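The mean removal and one stage of the vector quantization in step (2) can be sketched as below; the two-entry codebook in the usage example is a made-up illustration, not a trained codebook:

```python
import numpy as np

def mean_removed(lsp, lsp_mean):
    """Subtract the long-term LSP mean to form the mean-removed vector."""
    return np.asarray(lsp, float) - np.asarray(lsp_mean, float)

def vq_stage(vec, codebook):
    """One vector-quantization stage: index of the nearest codeword
    by squared Euclidean distance, plus the codeword itself."""
    d = np.sum((np.asarray(codebook, float) - vec) ** 2, axis=1)
    i = int(np.argmin(d))
    return i, codebook[i]

# usage: quantize a mean-removed 2-dimensional vector with a toy codebook
r = mean_removed([0.3, 0.5], [0.2, 0.4])
idx, cw = vq_stage(r, np.array([[0.0, 0.0], [0.1, 0.1]]))
```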
The advantages and beneficial effects of the invention are:
1. By analyzing the continuity of voice in the time domain and the correlation of voice in the frequency domain, the invention provides excellent voice quality at low rates, provides good voice quality in applications that lose voice frequencies below 300 Hz, and has strong adaptability to dialects.
2. The invention extracts the actual parameters in two stages, extracting the parameters more accurately, improving the sound quality and saving computational resources for users.
3. Unlike a low-speed vocoder, the invention expands the linear prediction coefficient, the excitation gain parameter and the pitch period parameter, so that even under the bit errors of a poor-quality channel the coding result carries more information and the degree of voice reconstruction is much higher than that of the narrowband vocoder.
4. The invention suppresses noise through the noise suppression function, improving the accuracy of the extracted sound parameters in the presence of noise and guaranteeing sound quality.
5. The invention adopts codebooks trained on speech from various regions and has strong adaptability to dialects.
6. The invention is developed on the basis of standard code; it is standardized, maintainable, and easy to port to various hardware platforms.
Drawings
Fig. 1 is a working principle diagram of the present invention.
Detailed Description
The embodiments of the invention will be described in further detail below with reference to the accompanying drawings:
The input of the voice data processing method of the NVOCPLUS high-speed broadband vocoder is a linear PCM voice digital signal with a sampling rate of 8000 Hz (the number of voice signal samples collected per second) and a resolution of 16 bits; in the time domain, analysis is performed every 20 milliseconds, and in the frequency domain, several frequency bands within 0-4000 Hz are analyzed.
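The framing described above (8000 Hz, 16-bit PCM, 20 ms analysis frames) amounts to slicing the signal into 160-sample blocks; a minimal sketch, assuming any trailing partial frame is simply dropped:

```python
import numpy as np

def frames_20ms(pcm, fs=8000):
    """Split linear PCM at 8000 Hz into 20 ms frames (160 samples
    each); a trailing partial frame is dropped (assumption)."""
    n = int(0.020 * fs)                      # 160 samples per frame
    usable = (len(pcm) // n) * n
    return np.asarray(pcm[:usable]).reshape(-1, n)
```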
A voice data processing method of NVOCPLUS high-speed wideband vocoder, as shown in fig. 1, includes the following steps:
step 1, initializing and configuring the encoding end, including rate selection, the parameters used by the encoding-end algorithm, and the filter coefficients;
step 2, the encoding end performs initialization configuration and analysis processing on the original voice digital signal, firstly performs noise suppression processing on the original voice digital signal, then judges whether the current voice signal is voice, and if the current voice signal is voice, calculates the pitch period and the unvoiced and voiced numerical parameters of each sub-band after extracting the pitch in the voice;
the step 2 specifically comprises the following steps:
(1) noise suppression: carrying out noise suppression processing on the original voice digital signal S(n) to obtain the noise-suppressed voice data S1(n) and the 0-4000 Hz sound spectrum characteristics of the original data S(n);
Step 2, item (1), comprises the following specific steps:
firstly, a high-pass filter is applied to the voice data to remove the direct-current component, boost the high-frequency components and attenuate the low frequencies;
secondly, the signal is windowed: a Hamming window of length N is applied and overlapped Fourier transforms yield the energy distribution over the spectrum, giving the noise-suppressed voice data S1(n), the noise suppression result parameters, and the 0-4000 Hz sound spectrum characteristics of the original voice digital signal S(n).
(2) Voice detection: judging whether the current noise-suppressed signal is voice by using voice activity detection (VAD), obtaining the voice data S2(n);
The specific method of step 2, item (2), is as follows:
according to the auditory characteristics of the human ear, the noise-suppressed voice data S1(n) is filtered into sub-bands and the level of each sub-band signal is calculated; the signal-to-noise ratio is estimated according to the following formula and compared with a preset threshold value to judge whether the current speech signal is voice:
SNR = 20·lg(a/b)
where a is the signal level value of the current frame, and b is the current signal level value estimated from the previous frames;
(3) pitch estimation, first stage: extracting the pitch of the voice data S2(n);
The specific method of step 2, item (3), is as follows:
the voice data S2(n) is low-pass filtered with a low-pass filter whose cut-off frequency is B Hz; after the low-pass filtered voice data is inverse-filtered with a second-order inverse filter, the autocorrelation function of the output signal of the second-order inverse filter is calculated according to the following formula and the pitch is extracted:
R(k) = Σ Sw(i)·Sw(i+k), summed over i = 0 to N-1-k
where N is the window length of the window function mentioned in step 2, item (1), and Sw(i) is the output signal of the second-order inverse filtering in step 2, item (3).
In this embodiment, in the frequency domain the speech signal has peaks whose frequencies are multiples of the fundamental, and a possible pitch value or pitch range is first calculated preliminarily. In the time domain, speech has short-term autocorrelation: if the original signal is periodic, its autocorrelation function is also periodic with the same period, and peaks occur at integer multiples of the period. An unvoiced signal is aperiodic and its autocorrelation function decays as the frame length increases, while voiced sound is periodic and its autocorrelation function has peaks at integer multiples of the pitch period. A low-pass filter with cut-off frequency B Hz is applied to the voice data S2(n) to remove the influence of high-frequency signals on pitch extraction; a second-order inverse filter then inverse-filters the low-pass filtered voice data to remove the influence of formants, the autocorrelation function of the second-order inverse filter's output signal is calculated, and the pitch is extracted:
R(k) = Σ Sw(i)·Sw(i+k), summed over i = 0 to N-1-k
In the frame's autocorrelation function, after the first maximum (at zero lag) is excluded, the frame's pitch frequency is the sampling rate divided by the lag at which the maximum appears.
(4) First stage of the multi-subband voiced/unvoiced decision: calculating the unvoiced/voiced value of each sub-band.
Step 2, item (4), comprises the following specific steps:
dividing the 0-4000 Hz frequency domain into 5 frequency bands, namely [0-500] Hz, [500-1000] Hz, [1000-2000] Hz, [2000-3000] Hz and [3000-4000] Hz, and calculating the autocorrelation function of the bandpass signal in each interval by the following formula:
R(τ) = ∫ f(t+τ)·f*(t) dt, integrated over all t
where * denotes the convolution operator and f*(·) denotes complex conjugation;
the average value of the product of two values of the same time function at times t and t+τ is taken as a function of the delay τ; this measures the similarity between the signal and its delayed version. When the delay is zero, the mean square value of the signal is obtained, which is the maximum of the function; the maximum of the function is taken as the voiced intensity to calculate the unvoiced/voiced value of each sub-band;
step 3, on the basis of the pitch period and the unvoiced/voiced numerical parameters calculated in step 2, extracting and quantizing the line spectrum pair, pitch value, gain parameter, residual compensation gain and codebook vector parameters to obtain the voice quantization parameters;
The specific steps of step 3 include:
(1) filtering the noise-suppressed voice data with a high-pass filter with cut-off frequency A Hz to obtain S3(n); applying a Hamming window of length N2, calculating the autocorrelation coefficients, solving the line spectrum pair parameters (i.e. the LSF prediction parameters) with the Levinson-Durbin recursive algorithm, and quantizing the obtained line spectrum pair parameters with a three-stage vector quantization scheme to obtain m1 bits;
(2) quantizing the pitch value calculated in step 2, item (3): the integer interval containing the pitch values is linearly mapped onto [0~z], and z is represented by m2 bits;
(3) passing the voice data S2(n) detected in step 2, item (2), through a second-order inverse filter to obtain a prediction error signal r(n) free of formant influence, where the coefficients of the second-order filter are a1 and a2; the excitation gain parameter is expressed by the RMS (root mean square) of r(n), and the quantization is completed in the logarithmic domain;
(4) quantizing the maximum value (i.e. the unvoiced/voiced state value) obtained from the correlation function of each band-pass signal after the frequency-domain segmentation of step 2, item (4), into m3 bits;
(5) calculating the spectral compensation gain: the quantized LSF parameters are used to calculate the linear prediction coefficients, forming a prediction error filter that filters the input speech S2(n) to obtain a residual signal whose length is 160 points;
(6) windowing the prediction residual with a Hamming window of length 160 points, zero-padding the windowed signal to 512 points, performing a 512-point complex FFT on it, and finding the Fourier transform values corresponding to the first x harmonics with a spectral peak detection algorithm;
(7) letting P be the quantized pitch and taking the initial position of the i-th harmonic as 512i/P, peak detection searches for the maximum peak within a width of 512/P frequency samples centred on the initial position of each harmonic, the width being truncated to an integer; the number of harmonics searched is limited to the smaller of x and P/4; the coefficients corresponding to these harmonics are then normalized, and this x-dimensional vector is quantized with a vector codebook of m4 ∈ [0,48] bits, obtaining m4 ∈ [0,48] bits.
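Step (1) above solves for the prediction coefficients with the Levinson-Durbin recursion; a self-contained sketch operating on autocorrelation values r[0..order] (the conversion of the resulting LPC coefficients to line spectrum pairs is omitted):

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion: autocorrelation values r[0..order]
    -> LPC coefficients a (with a[0] = 1) and final prediction error."""
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # prediction of r[i] from the coefficients found so far
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                     # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)               # error shrinks each order
    return a, err
```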
Step 4, expanding the linear prediction coefficient, the excitation gain parameter and the pitch period parameter of step 3 to obtain the respective expanded parameters;
because the high-speed vocoder has a larger bandwidth, more bits can be carried; to improve the accuracy of pitch detection, the resolution and reliability of pitch detection are improved and the calculation is carried out per sub-frame.
In the present embodiment, taking 12.2 kbps as an example, a sub-frame here means every 40 sampling points (5 ms of data).
The specific steps of the step 4 comprise:
(1) extending the gain value quantization interval obtained in step 3 and computing per sub-frame to obtain the extended excitation gain parameter;
(2) subtracting the LSP parameter mean from the current frame's LSP parameters and from the quantized previous frame's LSP parameters of step 3, respectively, to obtain the de-meaned vectors, denoted respectively as
[equation images BDA0002708379040000121-123: the de-meaned LSP vectors]
which serve as the input of the hierarchical (multi-stage) vector quantization; quantizing them yields the extended LSP linear prediction parameters;
(3) enlarging the pitch value quantization bits obtained in step 3 and computing per sub-frame once every two subframes, namely dividing the interval set in step 3 (2) into two parts for the corresponding sub-frames; for each, the maximum value and its index i are calculated from the autocorrelation function of step 2 (3) and normalized using [equation image BDA0002708379040000124] to obtain the extended pitch period parameter.
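A minimal sketch of the per-sub-frame gain computation of step 4 (1) follows; the split into four 40-sample sub-frames matches the 12.2 kbps example above, while the dB-domain representation and the function name are assumptions (the patent only states that quantization is done in the logarithmic domain):

```python
import numpy as np

def subframe_gains(residual, n_sub=4):
    """Per-sub-frame excitation gains: the 160-sample frame is split
    into n_sub sub-frames of 40 samples (5 ms at 8 kHz) and an RMS
    gain is computed for each, giving the decoder a finer-grained
    gain track than one value per frame."""
    sub = np.reshape(residual[:160], (n_sub, -1))
    rms = np.sqrt(np.mean(sub ** 2, axis=1))
    # log-domain values, ready for scalar quantization (resolution assumed)
    return 20.0 * np.log10(np.maximum(rms, 1e-6))
```

A frame of constant unit amplitude yields 0 dB in every sub-frame, as expected.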
In this embodiment, the extended parameters are the result of adding extension information bits on top of the original parameters, rather than entirely new parameters derived from encoding.
Step 5, performing initialization configuration at the decoding end, including rate selection (2.2 kbps or 2.4 kbps) and the parameters and filter coefficients of the decoding-end algorithm;
and step 6, after the voice quantization parameters of steps 3 and 4 are extracted, synthesizing them into speech, enhancing speech quality through noise suppression, and performing voice reconstruction when parameter recovery or speech synthesis fails.
The specific method of the step 6 comprises the following steps:
the result of encoding each frame of signal is a value formed by representing the line spectrum pair, gain, pitch period, unvoiced/voiced decisions and vector codebook as bits. Among these parameters, the noise suppression result parameter determines whether an audio segment with excessive environmental noise is replaced by silence or comfort noise; the pitch period and the unvoiced/voiced values determine the excitation source used to synthesize the speech signal at the decoding end. Since, per step 1 (4) at the encoding end, the unvoiced/voiced signal covers 5 frequency bands, the excitation is formed band by band; the band excitations are then summed and passed through the synthesis filter and post-filter to obtain the decoded synthesized speech data. If the frame is unvoiced, i.e. all unvoiced/voiced bits are 0, a random number sequence is used as the excitation source; if the frame is voiced, a periodic pulse sequence passed through an all-pass filter generates the excitation source. The amplitude of the excitation source is weighted by the gain parameter, and its length in sampling points depends on the pitch period. The z-transform transfer functions of the all-pass filter H1(z), the synthesis filter H2(z) and the post-filter Hpf(z) are as follows:
[equation images BDA0002708379040000131-133: transfer functions of H1(z), H2(z) and Hpf(z)]
wherein A(z) = 1 - az^-1 and a is the filter coefficient, obtained by transforming the linear prediction parameters of step 4; z in all of the above formulas is a complex variable with real and imaginary parts, and one may set z = e^{jω}; γ = 0.56 and β = 0.75; μ is determined by the reflection coefficient, and its value depends on
[equation image BDA0002708379040000141: expression for μ]
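The excitation construction described above (random noise for unvoiced frames, a gain-weighted periodic pulse train for voiced frames, then a synthesis filter) can be sketched as follows. This is a simplification: the per-band excitation mixing, the all-pass filter and the post-filter are omitted, and the one-pole synthesis filter and all default values are illustrative assumptions.

```python
import numpy as np

def synthesize_frame(voiced, pitch, gain, a, n=160, seed=0):
    """Build a frame's excitation and run it through a one-pole
    synthesis filter H(z) = 1/(1 - a*z^-1).

    Unvoiced frames (all band-voicing bits zero) use white noise as
    the excitation; voiced frames use a pulse train whose spacing is
    the pitch period. The gain parameter scales the amplitude."""
    rng = np.random.default_rng(seed)
    if voiced:
        exc = np.zeros(n)
        exc[::max(int(pitch), 1)] = 1.0   # one pulse every pitch-period samples
    else:
        exc = rng.standard_normal(n)      # random excitation for unvoiced speech
    exc *= gain
    out = np.zeros(n)
    prev = 0.0
    for i in range(n):                    # direct-form IIR: y[i] = x[i] + a*y[i-1]
        prev = exc[i] + a * prev
        out[i] = prev
    return out
```

With a = 0 the filter is transparent, so a voiced frame reduces to the bare pulse train, which makes the excitation structure easy to inspect.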
It should be understood that the encoding and decoding algorithms correspond to each other, and that the input parameter format of the decoding end matches the output parameter format of the encoding end. The decoder outputs 160 sample values per decoded frame, and the caller must use the same rate as the encoder.
It should be emphasized that the examples described herein are illustrative and not restrictive, and thus the present invention includes, but is not limited to, those examples described in this detailed description, as well as other embodiments that can be derived from the teachings of the present invention by those skilled in the art and that are within the scope of the present invention.

Claims (10)

1. A voice data processing method of NVOCPLUS high-speed broadband vocoder is characterized in that: the method comprises the following steps:
step 1, the encoding end performs initialization configuration and analysis on the original voice digital signal: it first applies noise suppression to the original voice digital signal, then judges whether the current voice signal is voice; if so, it extracts the pitch from the voice and then calculates the pitch period and the unvoiced/voiced value parameters of each sub-band;
step 2, on the basis of the pitch period and unvoiced/voiced value parameters calculated in step 1, extracting and quantizing the line spectrum pair, pitch value, gain parameter, residual compensation gain and codebook vector parameters to obtain voice quantization parameters;
and step 3, after the voice quantization parameters of step 2 are extracted, synthesizing them into speech, enhancing speech quality through noise suppression, and performing voice reconstruction when parameter recovery or speech synthesis fails.
2. The method as claimed in claim 1, wherein the voice data processing method of the NVOCPLUS high-speed wideband vocoder comprises: the step 1 specifically comprises the following steps:
(1) performing noise suppression processing on the original voice digital signal S(n) to obtain noise-suppressed voice data S1(n) and the 0-4000 Hz sound spectrum characteristics of the original data S(n);
(2) judging whether the current noise-suppressed voice signal is voice by using VAD (voice activity detection) to obtain voice data S2(n);
(3) extracting the pitch of the voice data S2(n);
(4) and calculating the parameters of the pitch period and the unvoiced and voiced values of each sub-band.
3. The method as claimed in claim 2, wherein the voice data processing method of the NVOCPLUS high-speed wideband vocoder comprises: the step 1, the step (1), comprises the following specific steps:
firstly, a high-pass filter is used to remove the direct-current component from the voice data, boost the high-frequency components and attenuate the low frequencies;
secondly, windowing the signal: a Hamming window of length N is applied, and overlapped Fourier transforms yield the energy distribution over the spectrum, giving the noise-suppressed voice data S1(n), the noise suppression result parameters, and the 0-4000 Hz sound spectrum characteristics of the original voice digital signal S(n).
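The front end of claim 3 can be sketched as below; this is a generic illustration, not the claimed implementation. The pre-emphasis coefficient 0.95 and the window/hop sizes are assumptions, since the patent leaves N and the high-pass design unspecified.

```python
import numpy as np

def preprocess(s, n_win=256, hop=128):
    """Sketch of step 1 (1): a first-order high-pass (pre-emphasis)
    removes DC and boosts high frequencies, then overlapping
    Hamming-windowed FFT frames give the spectral energy
    distribution."""
    # pre-emphasis: y[n] = x[n] - 0.95*x[n-1] (coefficient assumed)
    emphasized = np.append(s[0], s[1:] - 0.95 * s[:-1])
    win = np.hamming(n_win)
    frames = []
    for start in range(0, len(emphasized) - n_win + 1, hop):
        seg = emphasized[start:start + n_win] * win
        frames.append(np.abs(np.fft.rfft(seg)) ** 2)  # per-frame energy spectrum
    return emphasized, np.array(frames)
```

A constant (DC) input is almost entirely removed by the pre-emphasis stage, which is the behavior the claim describes.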
4. The method as claimed in claim 2, wherein the voice data processing method of the NVOCPLUS high-speed wideband vocoder comprises: the specific method of the step 1 and the step (2) comprises the following steps:
according to the auditory characteristics of the human ear, the noise-suppressed voice data S1(n) is sub-band filtered and the level of each sub-band signal is calculated; the signal-to-noise ratio is estimated according to the following formula and compared with a preset threshold to judge whether the current speech signal is voice:
[equation image FDA0002708379030000021: signal-to-noise ratio estimate computed from a and b]
where a is the signal level value of the current frame and b is the level value estimated from the previous frames.
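A toy version of this VAD decision is sketched below. The patent's exact SNR formula is given only as an image, so the dB form and the 9 dB threshold here are explicit assumptions:

```python
import numpy as np

def is_voice(frame_level, noise_level, thresh_db=9.0):
    """Sketch of step 1 (2): estimate an SNR from the current frame's
    signal level (a) and the level estimated from previous frames (b),
    then compare it against a preset threshold. The dB formulation
    and the threshold value are illustrative assumptions."""
    snr_db = 20.0 * np.log10(frame_level / max(noise_level, 1e-12))
    return snr_db > thresh_db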
5. The method as claimed in claim 2, wherein the voice data processing method of the NVOCPLUS high-speed wideband vocoder comprises: the specific method of the step 1 and the step (3) comprises the following steps:
the voice data S2(n) is low-pass filtered with a low-pass filter whose cut-off frequency is B Hz; after the low-pass filtered voice data is inverse-filtered by a second-order inverse filter, the autocorrelation function of the inverse filter's output signal is calculated according to the following formula and the pitch is extracted:
[equation image FDA0002708379030000031: autocorrelation function used for pitch extraction]
wherein N is the window length of the window function mentioned in step 1, and Sw(i) is the output signal of the second-order inverse filtering in step 1 (3).
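The autocorrelation-based pitch search of claim 5 can be sketched generically as follows. The low-pass and second-order inverse filtering that precede it are assumed already applied, and the 60-400 Hz search range is an assumption the patent does not state:

```python
import numpy as np

def pitch_by_autocorr(s, fs=8000, fmin=60, fmax=400):
    """Sketch of step 1 (3): estimate the pitch period (in samples) by
    maximizing the autocorrelation R(k) = sum_i s[i]*s[i+k] over a
    range of candidate lags."""
    lag_min = fs // fmax
    lag_max = fs // fmin
    best_lag, best_r = lag_min, -np.inf
    for k in range(lag_min, lag_max + 1):
        r = np.dot(s[:-k], s[k:])   # autocorrelation at lag k
        if r > best_r:
            best_r, best_lag = r, k
    return best_lag
```

A 100 Hz sinusoid sampled at 8 kHz has a period of 80 samples, and the search returns that lag.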
6. The method as claimed in claim 2, wherein the voice data processing method of the NVOCPLUS high-speed wideband vocoder comprises: the step 1, the step (4) comprises the following specific steps:
dividing the 0-4000 Hz frequency domain into 5 bands, namely [0-500] Hz, [500-1000] Hz, [1000-2000] Hz, [2000-3000] Hz and [3000-4000] Hz, and calculating the autocorrelation function of the band-pass signal in each band using the following formula:
[equation image FDA0002708379030000032: continuous-time autocorrelation function of the band-pass signal]
where t is the continuous time argument, τ is the delay of the input signal, ⊛ denotes the convolution operator, and f*(·) denotes complex conjugation;
and secondly, taking the average of the product of two values of the same time function at times t and t+a as a function of time t. This average is a measure of the similarity between the signal and its delayed version; when the delay is zero it becomes the mean square value of the signal, which is its maximum. The maximum value of this function is taken as the voiced-sound intensity, from which the unvoiced/voiced value of each sub-band is calculated.
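As an illustrative sketch of claim 6 (not the claimed filter bank): each of the five fixed bands is isolated here with an FFT brick-wall mask, since the patent does not specify the band-pass filters, and the maximum normalized autocorrelation over an assumed 60-400 Hz pitch-lag range is taken as that band's voiced-sound intensity.

```python
import numpy as np

BANDS_HZ = [(0, 500), (500, 1000), (1000, 2000), (2000, 3000), (3000, 4000)]

def band_voicing(s, fs=8000):
    """Per-band voicing strengths: isolate each band, then take the
    maximum of the normalized autocorrelation over candidate pitch
    lags as the band's voiced-sound intensity."""
    n = len(s)
    spec = np.fft.rfft(s)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    strengths = []
    for lo, hi in BANDS_HZ:
        band_spec = np.where((freqs >= lo) & (freqs < hi), spec, 0)
        b = np.fft.irfft(band_spec, n)
        e = np.dot(b, b)
        if e < 1e-10:                 # numerically empty band: treat as unvoiced
            strengths.append(0.0)
            continue
        best = 0.0
        for k in range(20, min(134, n - 1)):   # 60-400 Hz lags at 8 kHz (assumed)
            best = max(best, np.dot(b[:-k], b[k:]) / e)
        strengths.append(best)
    return strengths
```

A pure 100 Hz tone yields a strong voicing value in the lowest band and zero in the top band, matching the intent of the per-band unvoiced/voiced decision.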
7. The method as claimed in claim 1, wherein the voice data processing method of the NVOCPLUS high-speed wideband vocoder comprises: the specific steps of the step 2 comprise:
(1) filtering the noise-suppressed voice data with a high-pass filter whose cut-off frequency is A Hz to obtain S3(n); windowing, calculating the autocorrelation coefficients, solving the line spectrum pair parameters by the Levinson-Durbin recursive algorithm, and quantizing the obtained line spectrum pair parameters with a three-stage vector quantization scheme;
(2) quantizing the pitch value calculated in step 1 (3): linearly mapping the integer interval containing the pitch value onto [0, z], where z is represented by m1 bits;
(3) passing the voice data S2(n) obtained by the voice detection in step 1 (2) through a second-order inverse filter to obtain a prediction error signal r(n) free of formant influence, wherein the coefficients of the second-order inverse filter are a1 and a2; the gain parameter is expressed as the RMS of r(n), and its quantization is completed in the logarithmic domain;
(4) quantizing the maximum values obtained from the correlation functions of the band-pass signals after the frequency-domain division in step 1 (4) into m2 bits;
(5) computing the residual compensation gain: computing linear prediction coefficients from the quantized LSF parameters to form a prediction error filter, and filtering the input speech S2(n) with it to obtain a residual signal of 160 points in length;
(6) windowing the prediction residual with a Hamming window of length 160 points, zero-padding the windowed signal to 512 points, performing a 512-point complex FFT, and finding the Fourier transform values corresponding to the first x harmonics using a spectral peak detection algorithm;
(7) setting P as the quantized pitch period; the initial position of the i-th harmonic is 512i/P, and peak detection searches for the largest peak within 512/P frequency samples centered on each harmonic's initial position, the width being truncated to an integer; the number of harmonics searched is limited to the smaller of x and P/4; the coefficients corresponding to these harmonics are then normalized, and the x-dimensional vector is quantized with an m3 ∈ [0,48]-bit vector codebook, yielding m3 bits.
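For illustration, the Levinson-Durbin recursion named in sub-step (1) can be written as follows; the LSP conversion and the three-stage vector quantizer that follow it are not shown, and the sign convention A(z) = 1 - Σ a_k z^-k is an assumption:

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve for LPC coefficients from autocorrelation values
    r[0..order] via the Levinson-Durbin recursion.

    Returns prediction coefficients a[1..order] under the convention
    A(z) = 1 - sum_k a[k] z^-k."""
    a = np.zeros(order + 1)
    e = r[0]                                  # initial prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        k = acc / e                           # reflection coefficient
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        e *= (1.0 - k * k)                    # prediction error update
    return a[1:]
```

For an AR(1) process with autocorrelation r = [1, 0.5, 0.25], the recursion recovers the single coefficient 0.5 and a zero second coefficient.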
8. The method as claimed in claim 1, wherein the voice data processing method of the NVOCPLUS high-speed wideband vocoder comprises: the specific method for synthesizing the voice quantization parameter into the voice in the step 3 is as follows:
excitations are formed band by band, summed, and passed through the synthesis filter to obtain synthesized speech; the synthesized speech is then post-filtered to obtain the decoded synthesized speech data, wherein the z-transform transfer functions of the synthesis filter H(z) and the post-filter Hpf(z) are as follows:
H(z)=1/A(z)
[equation image FDA0002708379030000051: transfer function of the post-filter Hpf(z)]
wherein A(z) = 1 - az^-1 and a is the filter coefficient; z in all of the above formulas is a complex variable with real and imaginary parts, and one may set z = e^{jω}; γ = 0.56 and β = 0.75; μ is determined by the reflection coefficient, and its value depends on
[equation image FDA0002708379030000052: expression for μ]
9. The method as claimed in claim 1, wherein the voice data processing method of the NVOCPLUS high-speed wideband vocoder comprises: the method also comprises the following steps before the step 3:
and initializing and configuring the decoding end, including rate selection, the parameters of the decoding-end algorithm, and the filter coefficients.
10. The method as claimed in claim 9, wherein the voice data processing method of the NVOCPLUS high-speed wideband vocoder comprises: before the step of initializing and configuring the decoding end, the method also comprises the following steps:
extending the linear prediction coefficients, the excitation gain parameter and the pitch period parameter of step 3 to obtain the respective extended parameters;
the method comprises the following specific steps:
(1) extending the gain value quantization interval obtained in step 3 and computing per sub-frame to obtain the extended excitation gain parameter;
(2) subtracting the LSP parameter mean from the current frame's LSP parameters and from the quantized previous frame's LSP parameters of step 3, respectively, to obtain the de-meaned vectors, denoted respectively as
[equation images FDA0002708379030000055 and FDA0002708379030000053-054: the de-meaned LSP vectors]
which serve as the input of the hierarchical (multi-stage) vector quantization; quantizing them yields the extended LSP linear prediction parameters;
(3) enlarging the pitch value quantization bits obtained in step 3 and computing per sub-frame once every two subframes, namely dividing the interval set in step 3 (2) into two parts for the corresponding sub-frames; for each, the maximum value and its index i are calculated from the autocorrelation function of step 2 (3) and normalized using [equation image FDA0002708379030000061] to obtain the extended pitch period parameter.
CN202011047245.1A 2020-09-29 2020-09-29 Voice data processing method of NVOCPLUS high-speed broadband vocoder Active CN112233686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011047245.1A CN112233686B (en) 2020-09-29 2020-09-29 Voice data processing method of NVOCPLUS high-speed broadband vocoder


Publications (2)

Publication Number Publication Date
CN112233686A true CN112233686A (en) 2021-01-15
CN112233686B CN112233686B (en) 2022-10-14

Family

ID=74120236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011047245.1A Active CN112233686B (en) 2020-09-29 2020-09-29 Voice data processing method of NVOCPLUS high-speed broadband vocoder

Country Status (1)

Country Link
CN (1) CN112233686B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115148214A (en) * 2022-07-28 2022-10-04 周士杰 Audio compression method, decompression method, computer device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6094629A (en) * 1998-07-13 2000-07-25 Lockheed Martin Corp. Speech coding system and method including spectral quantizer
CN103050121A (en) * 2012-12-31 2013-04-17 北京迅光达通信技术有限公司 Linear prediction speech coding method and speech synthesis method
CN108597529A (en) * 2018-01-22 2018-09-28 北京交通大学 A police digital trunking system air interface voice monitoring system and method
CN109729553A (en) * 2017-10-30 2019-05-07 成都鼎桥通信技术有限公司 The voice service processing method and equipment of LTE trunked communication system
CN111107501A (en) * 2018-10-25 2020-05-05 普天信息技术有限公司 Group call service processing method and device
CN111243610A (en) * 2020-01-19 2020-06-05 福建泉盛电子有限公司 Method for realizing intercommunication of different vocoder and mobile stations in digital intercommunication system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GONG Lezhong et al.: "Interworking Scheme between Public-Network PoC Soft Intercom and PDT Digital Trunking", Communications Technology *



Similar Documents

Publication Publication Date Title
Shrawankar et al. Techniques for feature extraction in speech recognition system: A comparative study
JP4308345B2 (en) Multi-mode speech encoding apparatus and decoding apparatus
US5450522A (en) Auditory model for parametrization of speech
KR100348899B1 (en) The Harmonic-Noise Speech Coding Algorhthm Using Cepstrum Analysis Method
EP2491558B1 (en) Determining an upperband signal from a narrowband signal
JP4624552B2 (en) Broadband language synthesis from narrowband language signals
Milner et al. Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model.
JPH05346797A (en) Voiced sound discriminating method
WO2002062120A2 (en) Method and apparatus for speech reconstruction in a distributed speech recognition system
JP2002516420A (en) Voice coder
JP4040126B2 (en) Speech decoding method and apparatus
JP3687181B2 (en) Voiced / unvoiced sound determination method and apparatus, and voice encoding method
KR100216018B1 (en) Method and apparatus for encoding and decoding of background sounds
CN103854655A (en) Low-bit-rate voice coder and decoder
CN112233686B (en) Voice data processing method of NVOCPLUS high-speed broadband vocoder
CN112270934B (en) Voice data processing method of NVOC low-speed narrow-band vocoder
JPH07199997A (en) Audio signal processing method in audio signal processing system and method for reducing processing time in the processing
Shahnaz et al. Robust pitch estimation at very low SNR exploiting time and frequency domain cues
Ramabadran et al. The ETSI extended distributed speech recognition (DSR) standards: server-side speech reconstruction
CN114913844A (en) A broadcast language recognition method based on pitch normalization and reconstruction
JP4527175B2 (en) Spectral parameter smoothing apparatus and spectral parameter smoothing method
EP0713208B1 (en) Pitch lag estimation system
Schlien et al. Acoustic tube interpolation for spectral envelope estimation in artificial bandwidth extension
JPH0736484A (en) Sound signal encoding device
Milner Speech feature extraction and reconstruction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant