CN105118516A

CN105118516A - Identification method of engineering machinery based on sound linear prediction cepstrum coefficients (LPCC)

Info

Publication number: CN105118516A
Application number: CN201510637964.1A
Authority: CN
Inventors: 曹九稳; 杨三伟; 王建中; 王瑞荣; 赵伟杰; 吴成才
Original assignee: ZHEJIANG TUWEI POWER TECHNOLOGY Co Ltd
Current assignee: ZHEJIANG TUWEI POWER TECHNOLOGY Co Ltd; Zhejiang Tuwei Electricity Technology Co Ltd
Priority date: 2015-09-29
Filing date: 2015-09-29
Publication date: 2015-12-02

Abstract

The invention provides a method for identifying engineering machinery based on sound linear prediction cepstrum coefficients (LPCC). The method comprises the following steps: collecting background noise signals in a real environment at a construction site; placing the engineering machinery in an ideal environment and starting it; collecting the sound signals emitted by the engineering machinery with a sound collection device; performing endpoint detection, windowing and framing on the sound signals, wherein endpoint detection is realized by the zero-crossing rate and framing uses a Hamming window; extracting the LPCC from every frame; establishing an engineering machinery sound feature fingerprint database with a support vector machine; and extracting the LPCC from real-time sound signals and matching them against the feature fingerprint database with the support vector machine to realize classification.

Description

Identification method of engineering machinery based on sound linear prediction cepstrum coefficients

Technical field

The present invention relates to the field of sound recognition technology, and specifically to a method for identifying engineering machinery based on sound linear prediction cepstrum coefficients (LPCC).

Background technology

Because of land shortages and the demands of urban appearance, most urban power transmission uses laid cable. Compared with overhead cable, underground cable occupies little surface area, has strong anti-interference capability and helps keep the city tidy. However, a large amount of non-standard road construction work means that underground cables are damaged by engineering machinery, which affects the stability of the power system and causes great inconvenience to daily life and commercial production. During road construction, the engineering machinery most likely to damage underground cable mainly includes mechanical percussion hammers, cutting machines and handheld electric picks. Collecting the sound signals that engineering machinery produces during operation and recognizing them intelligently makes it possible to raise an alarm before the cable is damaged, helping the power company take timely measures to prevent the damage.

Underground cable is often damaged by engineering machinery, yet there is currently no recognition method dedicated to engineering machinery. The present invention therefore proposes a method that identifies engineering machinery by extracting its sound features.

Summary of the invention

In view of the above deficiencies of the prior art, the invention provides a method for identifying engineering machinery based on sound linear prediction cepstrum coefficients (LPCC).

The technical solution adopted by the present invention is:

A method for identifying engineering machinery based on sound linear prediction cepstrum coefficients, comprising the following steps:

Building an engineering machinery sound feature fingerprint database from the engineering machinery LPCC coefficients extracted in an ideal environment;

Extracting LPCC coefficients from the real-time sound signal of the engineering machinery;

Assigning a class label to the LPCC coefficients of every frame of engineering machinery signal according to the sound source it belongs to;

Using the support vector machine learning algorithm, training a model on the labelled per-frame LPCC sound feature fingerprint database to establish a classification model;

Extracting the LPCC features of every frame of the real-time engineering machinery sound signal and feeding them into the classification model trained by the support vector machine learning algorithm. The output value for the frame under test is computed with the constructed classifier function, that is, the fitting function obtained from the training data is evaluated on the LPCC features of the signal under test, and the position of the largest element in the resulting vector gives the type of engineering equipment.

Building the engineering machinery sound feature fingerprint database from the LPCC coefficients extracted in an ideal environment comprises:

Extracting LPCC coefficients from the engineering machinery sound signal;

Extracting LPCC coefficients from the background noise;

Extracting LPCC coefficients from the engineering machinery sound signal comprises:

Placing the engineering machinery in an ideal environment and starting it;

Collecting the sound signal emitted by the engineering machinery with the sound collection device;

Preprocessing the engineering machinery sound signal;

Framing the engineering machinery sound signal;

Extracting LPCC coefficients from every frame of the framed engineering machinery sound signal.

Preprocessing the engineering machinery sound signal comprises endpoint detection, windowing and framing.

Endpoint detection of the engineering machinery sound signal is realized by the zero-crossing rate.

The engineering machinery sound signal is framed using a Hamming window.
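The text states only that endpoint detection is realized by the zero-crossing rate; it gives no threshold or decision rule. The following is a minimal Python sketch of one possible reading, in which a frame counts as active when its zero-crossing rate exceeds a threshold. The function names, the threshold value of 0.1 and the direction of the comparison are illustrative assumptions, not values from the patent.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


def detect_endpoints(frames, threshold=0.1):
    """Return (first, last) indices of frames whose zero-crossing rate
    exceeds the threshold, or None when no frame qualifies.

    The threshold and the 'greater than' rule are assumptions.
    """
    active = [i for i, frame in enumerate(frames)
              if zero_crossing_rate(frame) > threshold]
    if not active:
        return None
    return active[0], active[-1]
```

In practice the threshold would have to be tuned on recordings of the machinery and of the site background noise.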

Extracting LPCC coefficients from the background noise comprises:

Collecting background noise in a real environment at the construction site;

Preprocessing the background noise;

Framing the background noise;

Extracting LPCC coefficients from every frame of the framed background noise.

Preprocessing the background noise comprises endpoint detection, windowing and framing.

Endpoint detection of the background noise is realized by the zero-crossing rate.

The background noise is framed using a Hamming window.

Extracting LPCC coefficients from the real-time sound signal of the engineering machinery comprises:

Collecting the real-time sound signal of the engineering machinery with the sound collection device;

Preprocessing the real-time sound signal of the engineering machinery;

Framing the real-time sound signal of the engineering machinery;

Extracting LPCC coefficients from every frame of the framed real-time sound signal of the engineering machinery.

Preprocessing the real-time sound signal of the engineering machinery comprises endpoint detection, windowing and framing.

Endpoint detection of the real-time sound signal of the engineering machinery is realized by the zero-crossing rate.

The real-time sound signal of the engineering machinery is framed using a Hamming window.

The background noise consists of the sound of sand trucks, human voices and the sound emitted by machines working on site.

The engineering machinery comprises: electric pick, cutting machine, excavator and mechanical percussion hammer.

The distance between the collection device and the engineering machinery takes three values: near, middle and far.

For the electric pick, cutting machine and excavator, the near distance is 10 meters, the middle distance is 30 meters and the far distance is 50 meters.

For the mechanical percussion hammer, the near distance is 50 meters, the middle distance is 100 meters and the far distance is 150 meters.

Each frame of the sound signal emitted by the engineering machinery contains 1024 sampling points, and the frame shift is 512 sampling points.
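The framing parameters above (1024 samples per frame, a frame shift of 512 samples, Hamming window) can be sketched in plain Python as follows. The function names are illustrative, and dropping trailing samples that do not fill a whole frame is an assumption; the text does not say how the tail of a recording is handled.

```python
import math

FRAME_LEN = 1024   # samples per frame, as stated in the text
FRAME_SHIFT = 512  # frame shift in samples, as stated in the text


def hamming(n):
    """Hamming window w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1))
            for i in range(n)]


def frame_signal(signal, frame_len=FRAME_LEN, shift=FRAME_SHIFT):
    """Split a signal into overlapping frames and window each frame.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption, not specified in the text).
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

With these values consecutive frames overlap by half a frame, so a recording of 2048 samples yields three frames.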

The sound collection device is a cross-shaped acoustic array.

The cross-shaped acoustic array collects sound at intervals of 0.5 seconds.

The LPCC features are extracted as follows:

First the LPC values of every frame are obtained, and then its LPCC values;

The LPC values are solved with the Levinson-Durbin algorithm;

LPC stands for linear prediction coefficients: the value of the present or a future sound sample is predicted from the values of past samples, that is, the value s(n) of the n-th sample can be predicted by a linear combination of the values of the previous p samples, as shown in formula (1):

s(n) \approx a_{1} s(n-1) + a_{2} s(n-2) + \dots + a_{p} s(n-p) \qquad (1)

where a_{1}, a_{2}, ..., a_{p} are the linear prediction coefficients (LPC) and p is the LPC order, taken as 16 here.

The Levinson-Durbin algorithm obtains the p-th order linear prediction coefficients a_{1}, a_{2}, ..., a_{p} by minimizing the squared error (formula (2)) between the actual sound samples and the linearly predicted samples.

E = \sum_{n=1}^{N} e_{n}^{2} = \sum_{n=1}^{N} \left( s(n) - \sum_{k=1}^{p} a_{k} s(n-k) \right)^{2} \qquad (2)
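The Levinson-Durbin recursion named above can be sketched in plain Python. This sketch assumes the autocorrelation method (the text does not say how the normal equations are formed) and uses the sign convention of formula (1), s(n) ≈ Σ a_k s(n-k). The function names are illustrative.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for a_1..a_order in s(n) ~ sum_k a_k * s(n-k).

    Returns (coefficients, residual_energy).
    """
    a = [0.0] * (order + 1)  # a[k] holds a_k; a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. all zeros): remaining a_k stay 0
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        reflection = acc / err
        updated = a[:]
        updated[i] = reflection
        for k in range(1, i):
            updated[k] = a[k] - reflection * a[i - k]
        a = updated
        err *= 1.0 - reflection * reflection
    return a[1:], err


def lpc(frame, order=16):
    """LPC coefficients of one frame; the order 16 follows the text."""
    return levinson_durbin(autocorrelation(frame, order), order)[0]
```

For a first-order autoregressive autocorrelation sequence such as r = [1, 0.5, 0.25], the recursion recovers a_1 = 0.5 and a_2 = 0, which is a convenient sanity check.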

Once the linear prediction coefficients are obtained, the transfer function model of the sound production system follows:

H(z) = \frac{1}{1 - \sum_{k=1}^{p} a_{k} z^{-k}} \qquad (3)

LPCC is the representation of LPC in the cepstral domain. The cepstrum coefficients are obtained by homomorphic processing: take the z-transform of the sound signal, take the logarithm, and then apply the inverse z-transform. After the linear prediction coefficients and hence the system model are obtained, let c(n) denote the cepstrum of its impulse response; by the definition of the cepstrum:

\hat{H}(z) = \ln H(z) = \sum_{n=1}^{\infty} c(n) z^{-n} \qquad (4)

Substituting (3) into (4) and differentiating both sides with respect to z^{-1} gives (5):

\left( 1 - \sum_{k=1}^{p} a_{k} z^{-k} \right) \sum_{n=1}^{\infty} n c(n) z^{-n+1} = \sum_{k=1}^{p} k a_{k} z^{-k+1} \qquad (5)

Equating the coefficients of like powers on both sides of (5) gives (6):

\begin{cases} c_{1} = a_{1} \\ c_{n} = a_{n} + \sum_{k=1}^{n-1} \frac{k}{n} c_{k} a_{n-k}, & 1 < n \leq p \\ c_{n} = \sum_{k=n-p}^{n-1} \frac{k}{n} c_{k} a_{n-k}, & n > p \end{cases} \qquad (6)

where c_{1}, c_{2}, ..., c_{n} are the LPCC values and n is the LPCC order, taken as n = 18 here. For n > p the term a_{n} vanishes because the predictor has only p coefficients.
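The recursion of formula (6) translates directly into code. The sketch below is plain Python with an illustrative function name; it treats a_n as zero for n > p, which is what makes the two non-trivial cases of (6) collapse into a single loop.

```python
def lpc_to_lpcc(a, n_ceps=18):
    """Convert LPC coefficients [a_1, ..., a_p] into [c_1, ..., c_n_ceps].

    Uses c_n = a_n + sum_k (k/n) * c_k * a_{n-k}, with a_m = 0 for m > p.
    The default of 18 cepstrum coefficients follows the text.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[n] holds c_n; c[0] is unused
    for n in range(1, n_ceps + 1):
        a_n = a[n - 1] if n <= p else 0.0
        acc = 0.0
        # Only terms with 1 <= n - k <= p contribute.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = a_n + acc
    return c[1:]
```

A quick check is the single-pole model H(z) = 1 / (1 - a z^{-1}), whose cepstrum is known in closed form: c_n = a^n / n.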

The engineering machinery sound feature fingerprint database is established as follows:

A column is prepended to the 18th-order LPCC values extracted from every frame as the class label: label '0' represents the handheld electric pick, '1' represents noise, '2' represents the cutting machine, '3' represents the mechanical percussion hammer, and '4' represents the excavator. This forms a 19-dimensional feature vector;
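The labelled 19-dimensional rows described above might be assembled as in the following sketch. The label values follow the text; the dictionary keys and function names are hypothetical and chosen only for illustration.

```python
# Class labels as given in the text; the key names are illustrative.
CLASS_LABELS = {
    "electric_pick": 0,
    "noise": 1,
    "cutting_machine": 2,
    "percussion_hammer": 3,
    "excavator": 4,
}


def make_row(source, lpcc):
    """Prepend the class label to one frame's 18 LPCC values."""
    if len(lpcc) != 18:
        raise ValueError("expected 18 LPCC values per frame")
    return [float(CLASS_LABELS[source])] + list(lpcc)


def build_fingerprint_matrix(frames_by_source):
    """Stack labelled rows; frames_by_source maps a source name to a
    list of per-frame LPCC vectors."""
    rows = []
    for source, lpcc_frames in frames_by_source.items():
        for lpcc in lpcc_frames:
            rows.append(make_row(source, lpcc))
    return rows
```

Each row then has 19 entries: the label in the first column and the LPCC features in the remaining 18.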

libsvm is used as the classifier, with the radial basis function (RBF) kernel as the classifier kernel function. The RBF kernel has two parameters, the penalty factor c and the parameter gamma, whose values can be chosen with the grid search function opti_svm_coeff of libsvm. The chosen RBF kernel can then effectively map the data into a high-dimensional space, where the labelled signals of each class of engineering equipment are converted into different high-dimensional signals according to their class labels;

The training process uses the svmtrain function, which takes four parameters: the feature vectors, namely the labelled LPCC values extracted above; the kernel function type, for which the RBF kernel is selected; and the RBF kernel parameters c and gamma, which are determined by grid search. Calling svmtrain yields a variable named model that stores the model information obtained from training; this variable is saved for the subsequent recognition step;

The LPCC values obtained for every frame are classified with the svmtest function of libsvm;

svmtest takes three parameters: the first is the class label, used to test the recognition rate, for which the first column of the LPCC values is taken; the second is the feature vector, namely the variable storing the LPCC values; the third is the matching model, namely the variable model obtained in the training step above;

The return value of svmtest is the classification result, that is, the class label, from which the type of equipment producing the sound can be determined.
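The text names opti_svm_coeff as the grid search function but does not describe how it works. As a generic illustration only, a grid search over (c, gamma) can be sketched as below. The exponentially spaced candidate grids are a commonly used convention, not values from the patent, and score_fn is a hypothetical callback that would train an RBF SVM with the given parameters and return a validation accuracy.

```python
def grid_search(score_fn, c_values, gamma_values):
    """Return (best_score, best_c, best_gamma) over the whole grid.

    score_fn(c, gamma) is a hypothetical callback returning a score
    where higher is better, e.g. cross-validation accuracy.
    """
    best = None
    for c in c_values:
        for gamma in gamma_values:
            score = score_fn(c, gamma)
            if best is None or score > best[0]:
                best = (score, c, gamma)
    return best


# Assumed candidate grids: powers of two, a common convention.
C_GRID = [2.0 ** e for e in range(-5, 16, 2)]
GAMMA_GRID = [2.0 ** e for e in range(-15, 4, 2)]
```

A coarse grid like this is usually followed by a finer search around the best pair.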

Beneficial effects of the present invention relative to the prior art:

The present invention can improve the recognition performance and reduce the false alarm rate, and can effectively identify four kinds of equipment: handheld electric pick, cutting machine, mechanical percussion hammer and excavator. Existing protection of underground cable against external damage mostly relies on vibration or optical fiber signal detection; the present invention uses sound, which gives a longer detection distance. The effective detection distance can reach 60 m for the handheld electric pick, cutting machine and excavator, and 180 m for the mechanical percussion hammer, and the recognition rate can reach more than 80%.

Brief description of the drawings

Fig. 1 is a schematic flow chart of the engineering machinery identification method based on sound linear prediction cepstrum coefficients according to the present invention;

Fig. 2 is a flow chart of establishing the engineering machinery sound feature fingerprint database and of intelligent classification.

Embodiment

The present invention is described in detail below with reference to the drawings and embodiments:

As shown in Fig. 1, a method for identifying engineering machinery based on sound linear prediction cepstrum coefficients comprises the following steps:

Using the support vector machine learning algorithm, a model is trained on the labelled per-frame LPCC sound feature fingerprint database to establish a classification model (using the function svmtrain). The labelled signals of each class of engineering equipment are converted into different high-dimensional signals according to their class labels, which turns data classification into a function fitting problem. Specifically, the five classes of labelled training data ("0" represents the handheld electric pick, "1" the noise signal, "2" the cutting machine, "3" the mechanical percussion hammer and "4" the excavator) are converted into five-dimensional vectors, where

"0", the handheld electric pick, is converted to (1, -1, -1, -1, -1)

"1", the noise signal, is converted to (-1, 1, -1, -1, -1)

"2", the cutting machine, is converted to (-1, -1, 1, -1, -1)

"3", the mechanical percussion hammer, is converted to (-1, -1, -1, 1, -1)

"4", the excavator, is converted to (-1, -1, -1, -1, 1)

The construction of the classifier is thus converted into function fitting, and the function output is a five-dimensional vector.

The LPCC features of every frame of the real-time engineering machinery sound signal are extracted and fed into the classification model trained by the support vector machine learning algorithm (i.e. the return value of the svmtrain function). The output value for the frame under test is computed with the constructed classifier function, that is, the fitting function obtained from the training data is evaluated on the LPCC features of the signal under test (the output is a five-dimensional vector), and the position of the largest element in the resulting vector gives the type of engineering equipment.
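The label encoding and the largest-element decision rule described above can be illustrated with a short Python sketch. The function names are illustrative, and the sample output vector in the usage note is made up for demonstration.

```python
# Class order follows the labels 0..4 given in the text.
CLASS_NAMES = [
    "handheld electric pick",
    "noise",
    "cutting machine",
    "mechanical percussion hammer",
    "excavator",
]


def encode_label(label, n_classes=5):
    """Map a class label to a target vector of +1 / -1 entries."""
    return [1 if i == label else -1 for i in range(n_classes)]


def decode_output(output):
    """Return the index of the largest element of the output vector."""
    return max(range(len(output)), key=lambda i: output[i])
```

For example, an output vector such as [-0.9, 0.2, -0.4, 0.7, -1.1] has its largest element at position 3, so CLASS_NAMES[decode_output(...)] would give "mechanical percussion hammer".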

Extracting LPCC coefficients from the engineering machinery sound signal;

Extracting LPCC coefficients from the background noise;

Preprocessing the engineering machinery sound signal;

Framing the engineering machinery sound signal;

Preprocessing the background noise;

Framing the background noise;

Endpoint detection of the background noise is realized by the zero-crossing rate.

The background noise is framed using a Hamming window.

Collecting the real-time sound signal of the engineering machinery with the sound collection device;

Preprocessing the real-time sound signal of the engineering machinery;

Framing the real-time sound signal of the engineering machinery;

For the noise signal, 1000 frames each of sand truck sound and construction site background noise are taken, because these two sound sources occur frequently and strongly affect the recognition algorithm.

The sound collection device is a cross-shaped acoustic array.

The LPCC features are extracted as follows:

First the LPC (linear prediction coefficient) values of every frame are obtained, and then its LPCC values;

The LPC values are solved with the Levinson-Durbin algorithm;

s(n) \approx a_{1} s(n-1) + a_{2} s(n-2) + \dots + a_{p} s(n-p) \qquad (1)

E = \sum_{n=1}^{N} e_{n}^{2} = \sum_{n=1}^{N} \left( s(n) - \sum_{k=1}^{p} a_{k} s(n-k) \right)^{2} \qquad (2)

H(z) = \frac{1}{1 - \sum_{k=1}^{p} a_{k} z^{-k}} \qquad (3)

\hat{H}(z) = \ln H(z) = \sum_{n=1}^{\infty} c(n) z^{-n} \qquad (4)

Substituting (3) into (4) and differentiating both sides with respect to z^{-1} gives (5):

\left( 1 - \sum_{k=1}^{p} a_{k} z^{-k} \right) \sum_{n=1}^{\infty} n c(n) z^{-n+1} = \sum_{k=1}^{p} k a_{k} z^{-k+1} \qquad (5)

Equating the coefficients of like powers on both sides of (5) gives (6):

\begin{cases} c_{1} = a_{1} \\ c_{n} = a_{n} + \sum_{k=1}^{n-1} \frac{k}{n} c_{k} a_{n-k}, & 1 < n \leq p \\ c_{n} = \sum_{k=n-p}^{n-1} \frac{k}{n} c_{k} a_{n-k}, & n > p \end{cases} \qquad (6)

libsvm is used as the classifier, with the radial basis function (RBF) kernel as the classifier kernel function. The RBF kernel has two parameters, the penalty factor c and the parameter gamma, whose optimal values can be selected with the grid search function opti_svm_coeff of libsvm. The chosen RBF kernel can then effectively map the data into a high-dimensional space, where the labelled signals of each class of engineering equipment are converted into different high-dimensional signals according to their class labels, turning data classification into a function fitting problem.

Specifically, the five classes of labelled training data ("0" represents the handheld electric pick, "1" the noise signal, "2" the cutting machine, "3" the mechanical percussion hammer and "4" the excavator) are converted into five-dimensional vectors, where

"0", the handheld electric pick, is converted to (1, -1, -1, -1, -1)

"1", the noise signal, is converted to (-1, 1, -1, -1, -1)

"2", the cutting machine, is converted to (-1, -1, 1, -1, -1)

"3", the mechanical percussion hammer, is converted to (-1, -1, -1, 1, -1)

"4", the excavator, is converted to (-1, -1, -1, -1, 1)

Establishing the engineering machinery sound feature fingerprint database. Since there is currently no sound feature fingerprint database dedicated to engineering machinery, one needs to be established. The specific method is as follows:

18th-order LPCC values are extracted from every frame of the electric pick, cutting machine, mechanical percussion hammer, excavator and noise signals of the technical solution. For each of the four kinds of engineering machinery sound, the 18th-order LPCC values of 1000 frames are taken at each of the near, middle and far distances; for the noise signal, 1000 frames each of sand truck sound and construction site background noise are taken.

A column is prepended as the class label, forming a 19-dimensional vector. Extensive experiments show that 1000 frames per distance are enough to represent the features of the sound signal at that distance; adding more data introduces redundancy and does not improve the recognition rate. Therefore, for the sound signals of the four kinds of engineering machinery, 1000 frames are taken at each of the near, middle and far distances, and for the noise signal 1000 frames each of sand truck sound and construction site background noise are taken. Their 18th-order LPCC values are computed, and a column is prepended to these 18 values as the class label: label "0" represents the handheld electric pick, "1" the noise signal, "2" the cutting machine, "3" the mechanical percussion hammer and "4" the excavator. This forms a matrix of 12000 rows and 19 columns, in which the first column is the class label and the second to nineteenth columns are the corresponding LPCC features.

The 3*4*1000 = 12000 feature vectors of dimension 19 are used to train the SVM, where 3 stands for the three distances (near, middle and far), 4 for the four kinds of engineering machinery, and 1000 for the 1000 frames chosen at each distance.

The svmtrain function of libsvm is used. svmtrain takes four parameters: the feature vectors, namely the 12000-row, 19-column matrix obtained after feature extraction; the kernel function type, for which the RBF kernel is chosen; and the values of the RBF parameters c and gamma, where the optimal penalty factor c and parameter gamma are selected with the grid search function opti_svm_coeff of libsvm. The return value of svmtrain stores the model information obtained from training; it is the trained classifier, that is, the sound feature fingerprint database of the four kinds of engineering machinery.

The SVM implementation is libsvm, which supports multi-class classification; the function svmtrain is used for training and the function svmtest for recognition. SVMs have four commonly used kernel functions: the linear kernel, the polynomial kernel, the radial basis function (RBF) kernel and the sigmoid kernel. The training success rates obtained with these four kernels are 82.72%, 62.49%, 88.25% and 81.91% respectively. The RBF kernel performs best and is therefore chosen as the classifier kernel function here. The optimal values of its penalty factor c and parameter gamma are selected with the grid search function opti_svm_coeff of libsvm. The return value of svmtrain stores the model information obtained from training; this value is saved for the subsequent recognition step.
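For reference, the four kernel functions compared above have the following standard forms, written here as a plain Python sketch. The default parameter values (gamma, coef0, degree) are illustrative assumptions, not settings reported in the patent; the success rates in the dictionary are the figures quoted in the text.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    """exp(-gamma * ||u - v||^2)"""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(u, v)))


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


# Training success rates (percent) as reported in the text.
REPORTED_SUCCESS = {
    "linear": 82.72,
    "polynomial": 62.49,
    "rbf": 88.25,
    "sigmoid": 81.91,
}
best_kernel = max(REPORTED_SUCCESS, key=REPORTED_SUCCESS.get)
```

Selecting the maximum of the reported rates gives the RBF kernel, in agreement with the choice made in the text.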

Recognition of engineering machinery sound. The LPCC values obtained for every frame are classified with the svmtest function of libsvm. svmtest takes three parameters: the first is the class label, used to test the recognition rate; the second is the feature vector, namely the variable storing the LPCC values; the third is the matching model, namely the return value of the svmtrain function from the training step above. The return value of svmtest is the classification result, that is, the class label, from which the type of equipment producing the sound can be determined.
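The recognition rate mentioned above is the share of frames whose predicted class label matches the known label. A minimal sketch follows; the function name is illustrative, and returning 0.0 for an empty input is an assumption.

```python
def recognition_rate(true_labels, predicted_labels):
    """Percentage of frames whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0  # assumption: an empty test set scores zero
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100.0 * correct / len(true_labels)
```

For instance, four correct predictions out of five frames give a recognition rate of 80%.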


The above is only a preferred embodiment of the present invention and does not limit the structure of the present invention in any form. Any simple modification, equivalent change or adaptation of the above embodiment made according to the technical spirit of the present invention falls within the scope of the technical solution of the present invention.

Claims

1. A method for identifying engineering machinery based on sound linear prediction cepstrum coefficients, comprising the following steps:

2. The method for identifying engineering machinery based on sound linear prediction cepstrum coefficients according to claim 1, characterized in that building the engineering machinery sound feature fingerprint database from the engineering machinery LPCC coefficients extracted in an ideal environment comprises:

Extracting LPCC coefficients from the engineering machinery sound signal;

Extracting LPCC coefficients from the background noise.

3. The method for identifying engineering machinery based on sound linear prediction cepstrum coefficients according to claim 2, characterized in that extracting LPCC coefficients from the engineering machinery sound signal comprises:

Preprocessing the engineering machinery sound signal;

Framing the engineering machinery sound signal;

4. The method for identifying engineering machinery based on sound linear prediction cepstrum coefficients according to claim 3, characterized in that preprocessing the engineering machinery sound signal comprises endpoint detection, windowing and framing of the engineering machinery sound signal.

5. The method for identifying engineering machinery based on sound linear prediction cepstrum coefficients according to claim 4, characterized in that endpoint detection of the engineering machinery sound signal is realized by the zero-crossing rate.

6. The method for identifying engineering machinery based on sound linear prediction cepstrum coefficients according to claim 4, characterized in that the engineering machinery sound signal is framed using a Hamming window.

7. The method for identifying engineering machinery based on sound linear prediction cepstrum coefficients according to claim 2, characterized in that extracting LPCC coefficients from the background noise comprises:

Preprocessing the background noise;

Framing the background noise;

8. The method for identifying engineering machinery based on sound linear prediction cepstrum coefficients according to claim 7, characterized in that preprocessing the background noise comprises endpoint detection, windowing and framing of the background noise.

9. The method for identifying engineering machinery based on sound linear prediction cepstrum coefficients according to claim 8, characterized in that endpoint detection of the background noise is realized by the zero-crossing rate.

10. The method for identifying engineering machinery based on sound linear prediction cepstrum coefficients according to claim 8, characterized in that the background noise is framed using a Hamming window.