CN101046955A

CN101046955A - PCM code flow voice detection method

Info

Publication number: CN101046955A
Application number: CN 200610075906
Authority: CN
Inventors: 黄育延
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2006-04-24
Filing date: 2006-04-24
Publication date: 2007-10-03
Also published as: WO2007121648A1

Abstract

The present invention provides a PCM stream speech detection method in which the speech signal is detected at a trunk interface and a detection threshold is set beforehand. The method comprises the following steps: A, obtaining PCM stream data from the trunk interface at a sampling frequency; B, using the obtained PCM stream data to obtain the amplitude of the speech signal within a detection frame; and C, judging whether the amplitude of the speech signal within the detection frame satisfies the detection threshold: if it does, a speech signal is judged to be present in that detection frame; otherwise, no speech signal is judged to be present.

Description

A PCM stream speech detection method

Technical field

The present invention relates to the field of telecommunications, and in particular to a PCM (Pulse Code Modulation) stream speech detection method for telecommunications switches.

Background technology

When a telecommunications switch locates voice quality problems autonomously, it needs to detect whether speech is present on the PCM streams of its incoming and outgoing trunk interfaces, in order to identify voice quality problems such as one-way audio, no audio, and crosstalk.

Current speech detection methods mainly include GSM VAD, ITU-T G.723.1 Annex A, ITU-T G.729 Annex B, noise amplitude calibration analysis, and soft computing (Soft-Computing). Of these, GSM VAD and ITU-T G.723.1 Annex A are essentially energy detectors whose decision threshold adapts to the noise, and both use linear prediction analysis; ITU-T G.729 Annex B performs detection based on time-domain energy, zero-crossing rate, and frequency-domain energy difference information.

The prior-art speech detection methods are described further below:

Prior art one, the time-domain energy detection method:

$$E = \sum_{n=1}^{N} x^{2}(n)$$

E is the energy of the signal within one frame, where N is the frame length and x(n) is the digitized speech amplitude. Speech can be divided into two broad classes, unvoiced and voiced. Against background noise that resembles white noise and is not too strong, the short-time energy of voiced speech is much higher than that of the noise, so it can be used to distinguish speech from background noise.
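As an illustration of the frame energy formula above, a minimal sketch follows. The function name `frame_energy` and the sample values are hypothetical and chosen only to show how voiced speech stands out from weak background noise.

```python
def frame_energy(samples):
    """Short-time energy of one frame: the sum of squared amplitudes."""
    return sum(x * x for x in samples)


# Hypothetical linear amplitude values, for illustration only.
voiced = [120, -340, 560, -410, 230]
background = [3, -2, 4, -1, 2]

print(frame_energy(voiced), frame_energy(background))
```

Every sample costs one multiplication and one addition, which is the computational load the text lists as a shortcoming.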

The shortcomings of prior art one are:

1. The computational load is large: many multiplications and summations are required;

2. The data sampling volume is large: a continuously sampled signal is required, which for a PCM stream means fetching 8 KB of data per second (sampling frequency 8 kHz).

Prior art two, the short-time zero-crossing rate detection method:

$$\mathrm{ZCR} = \frac{1}{2} \sum_{n=1}^{N-1} \left| \operatorname{sign}[x(n)] - \operatorname{sign}[x(n-1)] \right|$$

The short-time zero-crossing rate is the number of times the speech signal crosses the zero level within one frame, where

$$\operatorname{sign}[x(n)] = \begin{cases} 1, & x(n) \geq 0 \\ -1, & x(n) < 0 \end{cases}$$

is the sign function. Unvoiced speech has low short-time energy, sometimes close to that of background noise, so it is hard to distinguish by short-time energy; its zero-crossing rate, however, is very high and can serve as one basis for deciding whether speech is present.
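The zero-crossing count above can be sketched as follows. The function names are hypothetical; the indexing follows the formula, with samples x(0) to x(N-1) and the sum running over n = 1 to N-1.

```python
def sign(x):
    """Sign function as defined above: +1 for x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1


def zero_crossing_rate(samples):
    """Number of zero-level crossings within one frame."""
    total = sum(
        abs(sign(samples[n]) - sign(samples[n - 1]))
        for n in range(1, len(samples))
    )
    # Each crossing contributes |(+1) - (-1)| = 2, hence the factor 1/2.
    return total // 2
```

The sketch needs no multiplications, but it still needs every consecutive sample of the frame, which is the shortcoming noted for this method.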

The shortcoming of prior art two is:

The data sampling volume is large: a continuously sampled signal is required, which for a PCM stream means fetching 8 KB of data per second.

Prior art three, the short-time autocorrelation/cross-correlation detection method:

$$R(m) = \sum_{n=0}^{N-1-m} x(n)\, x(n+m)$$

The autocorrelation of noise is very small everywhere except at m = 0, whereas a speech signal is short-time stationary and highly autocorrelated, so noise and speech can be told apart by computing the autocorrelation function of the signal. In addition, the autocorrelation waveform of a speech signal has fairly high secondary peaks besides the main peak, which can also be used to distinguish speech from noise.

$$R_{xy}(m) = \sum_{n=0}^{N-1-m} x(n)\, y(n+m)$$

where x(n) is the incoming speech signal and y(n) is the outgoing speech signal.

Cross-correlation can also be used to check incoming against outgoing speech: only when the voice connection is normal are the incoming and outgoing speech data identical apart from a certain delay, and hence cross-correlated; the cross-correlation of crosstalk or noise is very small, so the cases can be clearly distinguished.

The cross-correlation detection method is well suited to voice quality detection on the incoming and outgoing trunk interfaces of a switch, and can detect voice quality problems such as one-way audio, no audio, and crosstalk.
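The two correlation sums above can be sketched as follows. The function names and the sample values are hypothetical; the outgoing signal is constructed as the incoming signal delayed by two samples, so the cross-correlation should peak at lag 2.

```python
def cross_correlation(x, y, m):
    """Short-time cross-correlation of x and y at lag m."""
    n_len = min(len(x), len(y))
    return sum(x[n] * y[n + m] for n in range(n_len - m))


def autocorrelation(x, m):
    """Short-time autocorrelation: the cross-correlation of x with itself."""
    return cross_correlation(x, x, m)


# Hypothetical data: outgoing is incoming delayed by two samples.
incoming = [1, -2, 3, -1, 0, 0]
outgoing = [0, 0, 1, -2, 3, -1]

best_lag = max(range(4), key=lambda m: cross_correlation(incoming, outgoing, m))
print(best_lag)
```

Each lag costs roughly N multiplications, and several lags must be evaluated per frame, which is why the text notes that this approach needs a dedicated digital signal processor.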

The shortcomings of prior art three are:

1. The correlation calculation is complex and computationally heavy, and requires a dedicated digital signal processor;

2. The data sampling volume is large: a continuously sampled signal is required, which for a PCM stream means fetching 8 KB of data per second.

The speech detection methods above are mainly used in communication systems and have good real-time performance.

However, when speech detection on the PCM streams of incoming and outgoing trunk interfaces is implemented on a telecommunications switch, it is constrained by the switch's high-capacity trunk ports and by the processing capability of its boards. A typical telecommunications switch has hundreds or thousands of E1/T1 trunks and a large number of trunk interface boards, and each trunk interface board generally supports multiple E1/T1 links. Taking E1 as an example, each PCM30 has 30 usable time slots. If a trunk interface board carrying 16 E1 links performs speech detection on both incoming and outgoing trunk PCM streams, 16 × 2 × 30 = 960 circuit channels must be detected and computed simultaneously. If the data of every circuit channel is fetched at a sampling frequency of 8 kHz, both the data sampling volume and the computational load are large. The prior-art speech detection methods are therefore not applicable.
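The load estimate above can be checked with a short calculation. The constant names are hypothetical; the figures (16 E1 links, two directions, 30 time slots, 8 kHz, one byte per sample) are taken from the text.

```python
E1_PER_BOARD = 16        # E1 links on one trunk interface board
DIRECTIONS = 2           # incoming and outgoing
TIMESLOTS_PER_E1 = 30    # usable time slots per PCM30
FULL_RATE_HZ = 8000      # one 8-bit PCM sample per channel per 1/8000 s

channels = E1_PER_BOARD * DIRECTIONS * TIMESLOTS_PER_E1
bytes_per_second = channels * FULL_RATE_HZ

print(channels)          # 960
print(bytes_per_second)  # 7680000
```

At full rate a single board would therefore have to fetch and process about 7.68 million samples per second.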

Summary of the invention

In view of the large data sampling volume and heavy computational load of the prior-art speech detection methods, the object of the present invention is to provide a PCM stream speech detection method that can detect the presence or absence of speech on the PCM streams of the high-capacity incoming and outgoing trunk interfaces of a telecommunications switch.

To achieve this object, the present invention provides a PCM stream speech detection method in which a speech signal is detected at a trunk interface and a detection threshold is set beforehand, comprising:

Step (A): obtaining PCM stream data from the trunk interface at sampling frequency fs;

Step (B): obtaining, from the PCM stream data obtained, the amplitude of the speech signal within the detection frame;

Step (C): judging whether the amplitude of the speech signal within the detection frame satisfies the detection threshold; if it does, judging that a speech signal is present in the detection frame; if it does not, judging that no speech signal is present in the detection frame.

In step (B), the amplitude of the speech signal within the detection frame may be obtained with a simplified speech amplitude calculation method, which further comprises:

Step (B1): obtaining the three-bit segment code of each sampled PCM code;

Step (B2): looking up the linear amplitude value corresponding to the three-bit segment code in a table;

Step (B3): accumulating the amplitude values within the detection frame to obtain the amplitude of the speech signal within the detection frame.

Alternatively, in step (B), the amplitude of the speech signal within the detection frame is obtained by converting the 8-bit non-uniformly coded data into 13-bit linear speech data according to the correspondence between PCM codes and 13-bit linear codes, thereby restoring the amplitude information of the actual speech signal.

In step (A), the sampling frequency satisfies fs >> 1/ts, where ts is the short-time stationary time of the speech signal; the detection frame length satisfies T >> 1/fs.

The detection threshold comprises a fixed amplitude detection threshold or an adaptive amplitude detection threshold.

The adaptive amplitude detection threshold is obtained by calculating an average amplitude value from the speech signal amplitudes of the preceding frames and then adding a detection threshold on top of that average.

The beneficial effects of the present invention are:

1. The amount of data sampled from the PCM stream for speech detection is greatly reduced;

2. The data processing flow of PCM stream speech detection is simplified: the linear conversion of the PCM stream is no longer required, which speeds up the detection computation;

3. PCM stream speech detection can be performed on high-capacity trunks simultaneously within a telecommunications switch.

Description of drawings

Fig. 1 is a flowchart of embodiment one of the PCM stream speech detection method of the present invention;

Fig. 2 is waveform comparison diagram 1 of speech signal amplitude over time according to the present invention;

Fig. 3 is waveform comparison diagram 2 of speech signal amplitude over time according to the present invention;

Fig. 4 is waveform comparison diagram 3 of speech signal amplitude over time according to the present invention;

Fig. 5 is a flowchart of embodiment two of the PCM stream speech detection method of the present invention.

Embodiment

The PCM stream speech detection method of the present invention is described further below with reference to the accompanying drawings:

As shown in Fig. 1, embodiment one of the method of the present invention comprises the steps:

Step (101): obtaining PCM stream data from the trunk interface at sampling frequency fs;

Step (102): obtaining the three-bit segment code of each sampled PCM code;

Step (103): looking up the linear amplitude value corresponding to the three-bit segment code in a table;

Step (104): accumulating the amplitude values within the detection frame to obtain the amplitude of the speech signal within the detection frame;

Step (105): threshold detection, judging whether the amplitude of the speech signal within the detection frame satisfies the detection threshold.

The steps above are described in detail below:

Step (101): obtaining PCM stream data from the trunk interface at sampling frequency fs.

Because a speech signal is short-time stationary, that is, its characteristics stay essentially constant over a short period (on the order of tens of ms), full sampling at 8 kHz is unnecessary when only the amplitude and energy of the speech signal over a period of time are needed. It suffices to sample at regular intervals with a sampling frequency fs >> 1/ts, where ts is the short-time stationary time of the speech signal, about 50 ms.

To keep incoming and outgoing speech detection consistent, the choice of detection frame length is very important. First, the detection frame length must satisfy T >> 1/fs, because the per-frame result is stable only when each frame contains enough sampling points. Second, the detection frame length must not differ too much from the short-time stationary period of speech, otherwise the variation of the speech cannot be captured. The detection frame length is therefore generally set to 250 ms.

Tests for the present invention confirm that, with sampling frequency fs >> 1/ts and detection frame length T >> 1/fs, the variation of the speech can be obtained approximately. Fig. 2 compares, for the same segment of speech, full sampling of the PCM stream at fs = 8 kHz with T = 250 ms against the interval sampling of the present invention at fs = 100 Hz with T = 250 ms; the two amplitude-versus-time waveforms are essentially the same. In Fig. 2, series 1 is the amplitude-versus-time waveform obtained with full sampling at fs = 8 kHz, and series 2 is the waveform obtained with interval sampling at fs = 100 Hz. As the figure shows, the two waveforms agree well and also essentially follow the variation trend of the actual speech signal.

In addition, the trunk interface boards of a typical telecommunications switch have no synchronization mechanism, so the time at which each board starts a frame differs, and the maximum offset may approach one frame length. Fig. 3 shows that when the frame start times differ (in this embodiment, three tests were taken with frame start times 50 ms apart), the amplitude-versus-time waveforms obtained by interval sampling of the PCM stream at fs = 100 Hz and T = 250 ms (series 1, 2 and 3) still essentially agree.

This shows that interval sampling of the PCM stream with sampling frequency fs >> 1/ts and detection frame length T >> 1/fs meets the needs of PCM stream speech detection.
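The interval sampling and framing described above can be sketched as follows, using fs = 100 Hz and T = 250 ms from the text. The helper names are hypothetical, and slicing a captured full-rate buffer is used here only for illustration; on a real board only every 80th sample would be fetched in the first place.

```python
FULL_RATE_HZ = 8000   # native PCM sampling rate
FS_HZ = 100           # interval sampling frequency fs
FRAME_MS = 250        # detection frame length T

STEP = FULL_RATE_HZ // FS_HZ                   # keep one sample in every 80
SAMPLES_PER_FRAME = FS_HZ * FRAME_MS // 1000   # 25 samples per detection frame


def interval_sample(full_rate_stream):
    """Keep one PCM code out of every STEP codes."""
    return full_rate_stream[::STEP]


def split_frames(samples):
    """Group interval-sampled codes into complete detection frames."""
    last_start = len(samples) - SAMPLES_PER_FRAME
    return [
        samples[i:i + SAMPLES_PER_FRAME]
        for i in range(0, last_start + 1, SAMPLES_PER_FRAME)
    ]


one_second = bytes(FULL_RATE_HZ)   # placeholder buffer of 8000 PCM codes
frames = split_frames(interval_sample(one_second))
print(len(frames), len(frames[0]))
```

Under these parameters each channel contributes 100 samples per second instead of 8000, an 80-fold reduction in data sampling volume.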

Step (102): obtaining the three-bit segment code of each sampled PCM code.

To reduce the computational load of speech detection and avoid the multiplications required by energy detection, the present invention uses amplitude detection. PCM speech is non-uniformly coded. Taking A-law PCM coding as an example, it uses a folded binary code: the most significant bit represents polarity, the input signal is divided into 8 non-uniform segments represented by a 3-bit code, and each segment is further divided into 16 levels represented by a 4-bit code to meet the quantization signal-to-noise ratio requirement. The speech data sampled from the trunk interface therefore cannot be used directly; it would first have to undergo linear conversion of the PCM stream, which requires operations such as multiplication and greatly increases the computational load of speech detection.

The three-bit segment code in the PCM stream also represents the amplitude of the speech signal, only with lower quantization precision. Because the present invention uses interval sampling of the PCM stream, quantization precision has little influence on the final statistics, so the three-bit segment code can be used directly as the amplitude of the speech signal.

A PCM code b1b2b3b4b5b6b7b8 has 8 bits. Bit b1 is the sign bit; since the amplitude calculation needs the absolute value, b1 is discarded. The bits used are the three-bit segment code b2b3b4, which represents the amplitude of the speech signal. The b2b3b4 bits have only 8 values, in one-to-one correspondence with linear amplitudes, so the linear amplitude corresponding to a segment code can be obtained by table lookup. The correspondence is shown in Table 1:

Table 1

PCM code bits b2b3b4	Corresponding linear amplitude
000	0
001	1
010	2
011	4
100	8
101	16
110	32
111	64

(Note: the even bits of an A-law PCM code must be inverted before conversion.)

The accumulated speech signal amplitude within the detection frame is then:

$$V = \sum_{n=1}^{25} x_{s}(n)$$

which is the amplitude of the speech signal within the detection frame for fs = 100 Hz and T = 250 ms, where x_s(n) is the interval-sampled speech signal.
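The table lookup and accumulation of steps (102) to (104) can be sketched as follows. The names are hypothetical. The sketch assumes that b1 is the most significant bit of the byte, so that the even bits b2, b4, b6 and b8 correspond to the mask 0x55, and that the codes arrive from the line with those bits still inverted; if the codes have already been restored, `invert_even_bits=False` skips that step.

```python
# Table 1: segment code b2b3b4 -> linear amplitude.
SEGMENT_AMPLITUDE = (0, 1, 2, 4, 8, 16, 32, 64)

# Assumption: b1 is the MSB, so the even bits b2, b4, b6, b8 form mask 0x55.
ALAW_EVEN_BIT_MASK = 0x55


def segment_amplitude(pcm_byte, invert_even_bits=True):
    """Look up the amplitude for one PCM code from its segment code only."""
    if invert_even_bits:
        pcm_byte ^= ALAW_EVEN_BIT_MASK
    segment = (pcm_byte >> 4) & 0x07   # b2b3b4; the sign bit b1 is dropped
    return SEGMENT_AMPLITUDE[segment]


def frame_amplitude(frame_bytes, invert_even_bits=True):
    """Accumulate the looked-up amplitudes over one detection frame (V)."""
    return sum(segment_amplitude(b, invert_even_bits) for b in frame_bytes)
```

For a 25-sample frame this amounts to 25 shifts, masks, lookups and additions, with no multiplication.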

Tests show that this simplified speech amplitude calculation method captures the actual amplitude variation of PCM speech. In Fig. 4, series 1 is the amplitude-versus-time waveform obtained after normal linear conversion of the PCM stream, and series 2 is the waveform obtained with the simplified speech amplitude calculation method described above. The variation trends of the two waveforms are the same, with only small differences in the detail of specific amplitudes. The simplified speech amplitude calculation method therefore meets the needs of PCM stream speech detection.

Step (105): threshold detection, judging whether the amplitude of the speech signal within the detection frame satisfies the detection threshold.

The amplitude of the speech signal within the detection frame is obtained with the simplified speech amplitude calculation method described above. If this amplitude satisfies the detection threshold, a speech signal is judged to be present in the detection frame; if not, no speech signal is judged to be present.

The detection threshold can be set in two ways. The first is a fixed amplitude detection threshold: the amplitude of the speech signal at the trunk port is compared with the amplitude of a typical speech signal. If it exceeds the amplitude detection threshold, a speech signal is judged to be present; otherwise no speech signal is judged to be present. The fixed threshold is simple to implement and suits systems that do not demand high detection accuracy. Its drawback is that, in a noisy environment, the amplitude of the noise may also exceed the set threshold.

The second is an adaptive amplitude detection threshold. A relatively simple implementation calculates an average amplitude value from the speech signal amplitudes of the preceding frames and adds a detection threshold ΔV to that average to obtain the adaptive amplitude detection threshold. If the amplitude exceeds this threshold, a speech signal is judged to be present; otherwise no speech signal is judged to be present.
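The two threshold schemes can be sketched as follows. The function names are hypothetical, and the text does not state how many preceding frames enter the average, so the history length is left to the caller. "Satisfies the threshold" is read here as "strictly exceeds", following the wording of the fixed-threshold description.

```python
def detect_fixed(frame_amp, threshold):
    """Fixed scheme: speech is present if the frame amplitude exceeds the threshold."""
    return frame_amp > threshold


def adaptive_threshold(previous_amps, delta_v):
    """Adaptive scheme: average amplitude of the preceding frames plus delta V."""
    if not previous_amps:
        raise ValueError("at least one preceding frame is required")
    return sum(previous_amps) / len(previous_amps) + delta_v


def detect_adaptive(frame_amp, previous_amps, delta_v):
    """Speech is present if the frame amplitude exceeds the adaptive threshold."""
    return frame_amp > adaptive_threshold(previous_amps, delta_v)
```

With preceding frame amplitudes of 10, 20 and 30 and ΔV = 15, for example, the adaptive threshold is 35, so a frame amplitude of 40 is judged as speech and one of 30 is not.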

The steps above are the preferred embodiment of the PCM stream speech detection method of the present invention. In addition, the present invention provides a detection method that, after obtaining PCM stream data at sampling frequency fs, applies linear conversion to the sampled PCM stream data to obtain the amplitude of the speech signal. As shown in Fig. 5, embodiment two of the method of the present invention comprises the steps:

Step (501): obtaining PCM stream data from the trunk interface at sampling frequency fs;

Step (502): applying linear conversion to the PCM stream to obtain the amplitude data of the speech signal within the detection frame;

Step (503): threshold detection, judging whether the amplitude of the speech signal within the detection frame satisfies the detection threshold.

Steps (501) and (503) are the same as the corresponding steps above and are not repeated. Step (502) is described below:

A PCM stream is non-uniformly coded, using either A-law or μ-law coding, to improve small-signal resolution so that 8-bit coding approaches the quantization precision of 13-bit coding. For example, the A-law 13-segment companding method uses a folded binary code: the most significant bit represents polarity, the input signal is divided into 8 non-uniform segments represented by a 3-bit code, and each segment is further divided into 16 levels represented by a 4-bit code to meet the quantization signal-to-noise ratio requirement.

The correspondence between PCM codes and 13-bit linear codes is shown in Table 2:

Table 2

PCM code b1b2b3b4b5b6b7b8	13-bit linear code
k000wxyz	s0000000wxyz1
k001wxyz	s0000001wxyz1
k010wxyz	s000001wxyz10
k011wxyz	s00001wxyz100
k100wxyz	s0001wxyz1000
k101wxyz	s001wxyz10000
k110wxyz	s01wxyz100000
k111wxyz	s1wxyz1000000

The linear conversion of the PCM stream converts the 8-bit non-uniformly coded data into 13-bit linear speech data according to the correspondence between PCM codes and 13-bit linear codes shown in Table 2, restoring the amplitude information of the actual speech signal and yielding the amplitude of the speech signal within the detection frame.
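The mapping of Table 2 can be sketched with shifts and masks as follows. The function names are hypothetical. The sketch assumes that the input code already has its even bits restored and that b1 is the most significant bit; it returns the polarity bit unchanged together with the 12-bit magnitude, because the text does not spell out how k maps to s.

```python
def pcm_to_linear13(code):
    """Map one 8-bit PCM code to (polarity bit, 12-bit magnitude) per Table 2."""
    polarity = (code >> 7) & 0x01   # b1
    segment = (code >> 4) & 0x07    # b2b3b4
    wxyz = code & 0x0F              # b5b6b7b8
    if segment == 0:
        # Row k000wxyz -> s0000000wxyz1: no leading 1 before wxyz.
        magnitude = (wxyz << 1) | 1
    else:
        # Other rows: the pattern 1wxyz1, shifted left by (segment - 1) bits.
        magnitude = (((0x10 | wxyz) << 1) | 1) << (segment - 1)
    return polarity, magnitude


def frame_amplitude_linear(frame_codes):
    """Accumulate the absolute linear amplitudes over one detection frame."""
    return sum(pcm_to_linear13(c)[1] for c in frame_codes)
```

Compared with the segment-code lookup of embodiment one, this keeps the 4-bit level wxyz within each segment at the cost of a few more operations per sample.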

The steps of the present invention described above are built on the characteristics of PCM speech, which are as follows:

Although a speech signal is non-stationary and time-varying, it has momentary steady states. A speech signal is a quasi-periodic signal with a period of 3.3 to 16 ms, and its basic characteristic is short-time stationarity: over a short period (on the order of tens of ms) its characteristics stay essentially constant. Over longer periods, however, the speech signal is non-stationary and changes sharply.

The characteristics of PCM stream speech detection on the incoming and outgoing trunk interfaces of a telecommunications switch are:

1. Low real-time requirement

When performing PCM stream speech detection, a telecommunications switch does not need to detect the exact start and end points of speech. It only needs to judge whether a speech signal was present in the incoming and outgoing directions during a past period (a few seconds) of the call, as a basis for locating voice quality problems. The real-time requirement is therefore low, and a detection time precision within 1 s is sufficient.

2. Consistency matters more than accuracy

When a telecommunications switch uses PCM stream speech detection to locate voice quality problems, the key is to compare the speech detection results of the incoming and outgoing trunk PCM streams and check whether they agree over a period of the call. The detection result at a trunk port therefore need not be 100% correct, but it must be consistent: for the same input speech, the results detected on the incoming and outgoing sides must agree.
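One way to compare incoming and outgoing detection results over a period of the call is sketched below. The text does not prescribe a comparison rule, so the function `agreement_ratio` and the idea of scoring agreement as a fraction of frames are assumptions made purely for illustration.

```python
def agreement_ratio(incoming_flags, outgoing_flags):
    """Fraction of detection frames on which both directions give the same result."""
    pairs = list(zip(incoming_flags, outgoing_flags))
    if not pairs:
        raise ValueError("no detection frames to compare")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical per-frame results (True = speech detected) over four frames.
incoming = [True, True, False, False]
outgoing = [True, True, False, True]
print(agreement_ratio(incoming, outgoing))   # 0.75
```

A low ratio over several seconds would point to a possible voice quality problem on that circuit; where to draw the line is left open here.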

Based on the above discussion, the present invention is feasible and brings the beneficial effects described above.

The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims

1. A PCM stream speech detection method, characterized in that a speech signal is detected at a trunk interface and a detection threshold is set beforehand, comprising:

2. The method of claim 1, characterized in that, in step (B), the amplitude of the speech signal within the detection frame is obtained with a simplified speech amplitude calculation method, which further comprises:

3. The method of claim 1, characterized in that, in step (B), the amplitude of the speech signal within the detection frame is obtained by converting the 8-bit non-uniformly coded data into 13-bit linear speech data according to the correspondence between PCM codes and 13-bit linear codes, thereby restoring the amplitude information of the actual speech signal.

4. The method of claim 1, characterized in that, in step (A), the sampling frequency satisfies fs >> 1/ts, where ts is the short-time stationary time of the speech signal.

5. The method of claim 4, characterized in that the detection frame length satisfies T >> 1/fs.

6. The method of claim 1, characterized in that the detection threshold comprises a fixed amplitude detection threshold or an adaptive amplitude detection threshold.

7. The method of claim 6, characterized in that the adaptive amplitude detection threshold is obtained by calculating an average amplitude value from the speech signal amplitudes of the preceding frames and then adding a detection threshold on top of that average.