Disclosure of Invention
Based on this method, the data are preprocessed with a principal component analysis method to reduce the dimensionality of the sample space, which greatly increases the calculation speed while maintaining the recognition rate.
The technical scheme of the invention is as follows:
a partial discharge identification method comprising the steps of:
collecting an audio signal of high-voltage equipment, performing time domain analysis and frequency domain analysis on the audio signal, and extracting signal characteristics;
reducing the dimension of the signal characteristics by using a principal component analysis method;
classifying and identifying the signal characteristics after dimensionality reduction, and judging whether partial discharge is generated or not;
wherein the signal features include: short-time average amplitude, high zero-crossing rate ratio, root mean square value, Mel cepstrum coefficient, sub-band energy ratio and signal bandwidth;
the method for extracting the signal features comprises the following steps:
(1) short-time average amplitude extraction:
the short-time average magnitude difference Fn(k) is calculated as follows:
<math>
<mrow>
<msub>
<mi>F</mi>
<mi>n</mi>
</msub>
<mrow>
<mo>(</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>=</mo>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>m</mi>
<mo>=</mo>
<mn>0</mn>
</mrow>
<mrow>
<mi>N</mi>
<mo>-</mo>
<mn>1</mn>
<mo>-</mo>
<mi>k</mi>
</mrow>
</msubsup>
<mo>|</mo>
<msub>
<mi>x</mi>
<mi>n</mi>
</msub>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
<mo>-</mo>
<msub>
<mi>x</mi>
<mi>n</mi>
</msub>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mo>+</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
</math>
wherein xn(m+k) = w(m+k)·x(n+m+k);
w is a window function;
x is an original signal;
(2) high zero-crossing rate ratio extraction:
setting a zero-crossing rate threshold, and calculating the proportion of frames with zero-crossing rates higher than the threshold in an audio segment, namely, the ratio of the high zero-crossing rates, which is defined as:
<math>
<mrow>
<mfrac>
<mn>1</mn>
<mrow>
<mn>2</mn>
<mi>N</mi>
</mrow>
</mfrac>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mn>0</mn>
</mrow>
<mrow>
<mi>N</mi>
<mo>-</mo>
<mn>1</mn>
</mrow>
</msubsup>
<mo>[</mo>
<mi>sgn</mi>
<mrow>
<mo>(</mo>
<mi>ZCR</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>-</mo>
<mn>1.1</mn>
<mi>avZCR</mi>
<mo>)</mo>
</mrow>
<mo>+</mo>
<mn>1</mn>
<mo>]</mo>
</mrow>
</math>
wherein N is the total frame number in an audio segment;
ZCR (n) is the zero crossing rate of the nth frame;
the ZCR threshold value is 1.1 times of the average value of ZCR (n) in one audio segment;
sgn is a sign function;
avZCR is the average of the zero-crossing rates in the audio segment.
(3) Root mean square value extraction:
the root mean square (RMS) value is the square root of the mean of the squared amplitudes of the signal sequence s(n), defined as:
<math>
<mrow>
<msub>
<mi>T</mi>
<mi>rms</mi>
</msub>
<mo>=</mo>
<msqrt>
<mfrac>
<mn>1</mn>
<mi>n</mi>
</mfrac>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>n</mi>
</msubsup>
<msubsup>
<mi>s</mi>
<mi>i</mi>
<mn>2</mn>
</msubsup>
</msqrt>
<mo>;</mo>
</mrow>
</math>
(4) extraction of the Mel cepstrum coefficient:
first, determine the number of points N in each frame of the sampled sequence, taking N = 240 points; zero-pad the sequence and perform a 256-point discrete FFT (fast Fourier transform), so that the frequency spectrum of the mth frame is:
<math>
<mrow>
<mi>S</mi>
<mrow>
<mo>(</mo>
<mi>k</mi>
<mo>,</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
<mo>=</mo>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mn>0</mn>
</mrow>
<mn>255</mn>
</msubsup>
<mi>s</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>,</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
<msup>
<mi>e</mi>
<mrow>
<mo>-</mo>
<mi>j</mi>
<mfrac>
<mrow>
<mn>2</mn>
<mi>πnk</mi>
</mrow>
<mn>255</mn>
</mfrac>
</mrow>
</msup>
</mrow>
</math>
wherein {s(n, m) | n = 0, 1, …, 239} are the 240 sampling points of the mth frame and {s(n, m) | n = 240, …, 255} are zero; the discrete power spectrum S(m) is obtained by taking the squared modulus of this spectrum;
pass S(m) through I filters Hi(m), with I = 24; for each filter, sum the products of S(m) and Hi(m) at each discrete frequency point to obtain I power parameters Pi, i = 0, 1, …, I-1;
compute the natural logarithm of each Pi to obtain Li, i = 0, 1, …, I-1;
apply the discrete cosine transform to L0, L1, …, LI-1 to obtain Di, i = 0, 1, …, I-1;
discard D0, which represents the direct-current component, and take D1, D2, …, DJ as the Mel cepstrum coefficients, with J = 15.
(5) Sub-band energy ratio extraction:
the calculation formula of the sub-band energy ratio is as follows:
<math>
<mrow>
<msub>
<mi>BandSpec</mi>
<mi>i</mi>
</msub>
<mo>=</mo>
<mfrac>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>k</mi>
<mo>=</mo>
<mrow>
<mo>(</mo>
<mi>i</mi>
<mo>-</mo>
<mn>1</mn>
<mo>)</mo>
</mrow>
<mfrac>
<mi>K</mi>
<mi>B</mi>
</mfrac>
</mrow>
<mrow>
<mi>i</mi>
<mo>·</mo>
<mfrac>
<mi>K</mi>
<mi>B</mi>
</mfrac>
</mrow>
</munderover>
<mo>|</mo>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>,</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>k</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>K</mi>
</munderover>
<mo>|</mo>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>,</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
</mfrac>
<mi>i</mi>
<mo>=</mo>
<mn>1,2</mn>
<mo>,</mo>
<mo>.</mo>
<mo>.</mo>
<mo>.</mo>
<mo>,</mo>
<mi>B</mi>
</mrow>
</math>
where DFT (n, k) is the Fourier transform coefficient of the nth frame of the input signal,
<math>
<mrow>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>,</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>=</mo>
<mo>|</mo>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>m</mi>
<mo>=</mo>
<mo>-</mo>
<mo>∞</mo>
</mrow>
<mo>∞</mo>
</munderover>
<mi>x</mi>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
<msup>
<mi>e</mi>
<mrow>
<mo>-</mo>
<mi>j</mi>
<mfrac>
<mrow>
<mn>2</mn>
<mi>π</mi>
</mrow>
<mi>L</mi>
</mfrac>
<mi>km</mi>
</mrow>
</msup>
<mo>|</mo>
</mrow>
</math>
wherein L is the window length;
k is the order of the discrete Fourier transform;
n is the number of audio frames in the segment;
the value of B is 4, i.e. the frequency domain is divided into 4 octave sub-band intervals: sb1 = [0, ω0/8], sb2 = [ω0/8, ω0/4], sb3 = [ω0/4, ω0/2], sb4 = [ω0/2, ω0], wherein ω0 = fs/2 and fs is the sampling frequency;
(6) extracting signal bandwidth:
the bandwidth is defined as:
<math>
<mrow>
<mi>BW</mi>
<mo>=</mo>
<mfrac>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>N</mi>
</munderover>
<mo>[</mo>
<msup>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>-</mo>
<mi>SC</mi>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
<mo>|</mo>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
<mo>]</mo>
</mrow>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>N</mi>
</munderover>
<mo>|</mo>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
</mfrac>
</mrow>
</math>
wherein N is the number of sampling points in a frame;
DFT is the fourier transform coefficient of the signal;
SC is the spectral centroid.
In one embodiment, the method further comprises the step of preprocessing the acquired audio signal, wherein the preprocessing comprises front-end processing, noise reduction and end-point detection, and the front-end processing comprises pre-emphasis and windowing.
In one embodiment, the signal features include: short-time average amplitude, high zero-crossing rate ratio, root mean square value, Mel cepstrum coefficient, sub-band energy ratio and signal bandwidth.
In one embodiment, the dimension reduction of the signal features is to reduce the dimension of the mel-frequency cepstral coefficients.
It is also an object of the invention to provide a partial discharge recognition system capable of implementing the above method.
A partial discharge recognition system comprises an extraction module, a dimension reduction module and a judgment module;
the extraction module is used for collecting the audio signal of the high-voltage equipment, performing time domain analysis and frequency domain analysis on the audio signal and extracting signal characteristics;
the dimensionality reduction module is used for reducing dimensionality of the signal features by using a principal component analysis method;
the judging module is used for carrying out classification and identification on the signal characteristics after the dimension reduction and judging whether partial discharge is generated or not;
wherein the signal features include: short-time average amplitude, high zero-crossing rate ratio, root mean square value, Mel cepstrum coefficient, sub-band energy ratio and signal bandwidth;
the extraction module is further configured to:
(1) short-time average amplitude extraction:
the short-time average magnitude difference Fn(k) is calculated as follows:
<math>
<mrow>
<msub>
<mi>F</mi>
<mi>n</mi>
</msub>
<mrow>
<mo>(</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>=</mo>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>m</mi>
<mo>=</mo>
<mn>0</mn>
</mrow>
<mrow>
<mi>N</mi>
<mo>-</mo>
<mn>1</mn>
<mo>-</mo>
<mi>k</mi>
</mrow>
</msubsup>
<mo>|</mo>
<msub>
<mi>x</mi>
<mi>n</mi>
</msub>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
<mo>-</mo>
<msub>
<mi>x</mi>
<mi>n</mi>
</msub>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mo>+</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
</math>
wherein xn(m+k) = w(m+k)·x(n+m+k);
w is a window function;
x is an original signal;
(2) high zero-crossing rate ratio extraction:
setting a zero-crossing rate threshold, and calculating the proportion of frames with zero-crossing rates higher than the threshold in an audio segment, namely, the ratio of the high zero-crossing rates, which is defined as:
<math>
<mrow>
<mfrac>
<mn>1</mn>
<mrow>
<mn>2</mn>
<mi>N</mi>
</mrow>
</mfrac>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mn>0</mn>
</mrow>
<mrow>
<mi>N</mi>
<mo>-</mo>
<mn>1</mn>
</mrow>
</msubsup>
<mo>[</mo>
<mi>sgn</mi>
<mrow>
<mo>(</mo>
<mi>ZCR</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>-</mo>
<mn>1.1</mn>
<mi>avZCR</mi>
<mo>)</mo>
</mrow>
<mo>+</mo>
<mn>1</mn>
<mo>]</mo>
</mrow>
</math>
wherein N is the total frame number in an audio segment;
ZCR (n) is the zero crossing rate of the nth frame;
the ZCR threshold value is 1.1 times of the average value of ZCR (n) in one audio segment;
sgn is a sign function;
avZCR is the average of the zero-crossing rates in the audio segment.
(3) Root mean square value extraction:
the root mean square (RMS) value is the square root of the mean of the squared amplitudes of the signal sequence s(n), defined as:
<math>
<mrow>
<msub>
<mi>T</mi>
<mi>rms</mi>
</msub>
<mo>=</mo>
<msqrt>
<mfrac>
<mn>1</mn>
<mi>n</mi>
</mfrac>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>n</mi>
</msubsup>
<msubsup>
<mi>s</mi>
<mi>i</mi>
<mn>2</mn>
</msubsup>
</msqrt>
<mo>;</mo>
</mrow>
</math>
(4) extraction of the Mel cepstrum coefficient:
first, determine the number of points N in each frame of the sampled sequence, taking N = 240 points; zero-pad the sequence and perform a 256-point discrete FFT (fast Fourier transform), so that the frequency spectrum of the mth frame is:
<math>
<mrow>
<mi>S</mi>
<mrow>
<mo>(</mo>
<mi>k</mi>
<mo>,</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
<mo>=</mo>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mn>0</mn>
</mrow>
<mn>255</mn>
</msubsup>
<mi>s</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>,</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
<msup>
<mi>e</mi>
<mrow>
<mo>-</mo>
<mi>j</mi>
<mfrac>
<mrow>
<mn>2</mn>
<mi>πnk</mi>
</mrow>
<mn>255</mn>
</mfrac>
</mrow>
</msup>
</mrow>
</math>
wherein {s(n, m) | n = 0, 1, …, 239} are the 240 sampling points of the mth frame and {s(n, m) | n = 240, …, 255} are zero; the discrete power spectrum S(m) is obtained by taking the squared modulus of this spectrum;
pass S(m) through I filters Hi(m), with I = 24; for each filter, sum the products of S(m) and Hi(m) at each discrete frequency point to obtain I power parameters Pi, i = 0, 1, …, I-1;
compute the natural logarithm of each Pi to obtain Li, i = 0, 1, …, I-1;
apply the discrete cosine transform to L0, L1, …, LI-1 to obtain Di, i = 0, 1, …, I-1;
discard D0, which represents the direct-current component, and take D1, D2, …, DJ as the Mel cepstrum coefficients, with J = 15.
(5) Sub-band energy ratio extraction:
the calculation formula of the sub-band energy ratio is as follows:
<math>
<mrow>
<msub>
<mi>BandSpec</mi>
<mi>i</mi>
</msub>
<mo>=</mo>
<mfrac>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>k</mi>
<mo>=</mo>
<mrow>
<mo>(</mo>
<mi>i</mi>
<mo>-</mo>
<mn>1</mn>
<mo>)</mo>
</mrow>
<mfrac>
<mi>K</mi>
<mi>B</mi>
</mfrac>
</mrow>
<mrow>
<mi>i</mi>
<mo>·</mo>
<mfrac>
<mi>K</mi>
<mi>B</mi>
</mfrac>
</mrow>
</munderover>
<mo>|</mo>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>,</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>k</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>K</mi>
</munderover>
<mo>|</mo>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>,</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
</mfrac>
<mi>i</mi>
<mo>=</mo>
<mn>1,2</mn>
<mo>,</mo>
<mo>.</mo>
<mo>.</mo>
<mo>.</mo>
<mo>,</mo>
<mi>B</mi>
</mrow>
</math>
where DFT (n, k) is the Fourier transform coefficient of the nth frame of the input signal,
<math>
<mrow>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>,</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>=</mo>
<mo>|</mo>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>m</mi>
<mo>=</mo>
<mo>-</mo>
<mo>∞</mo>
</mrow>
<mo>∞</mo>
</munderover>
<mi>x</mi>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
<msup>
<mi>e</mi>
<mrow>
<mo>-</mo>
<mi>j</mi>
<mfrac>
<mrow>
<mn>2</mn>
<mi>π</mi>
</mrow>
<mi>L</mi>
</mfrac>
<mi>km</mi>
</mrow>
</msup>
<mo>|</mo>
</mrow>
</math>
wherein L is the window length;
k is the order of the discrete Fourier transform;
n is the number of audio frames in the segment;
the value of B is 4, i.e. the frequency domain is divided into 4 octave sub-band intervals: sb1 = [0, ω0/8], sb2 = [ω0/8, ω0/4], sb3 = [ω0/4, ω0/2], sb4 = [ω0/2, ω0], wherein ω0 = fs/2 and fs is the sampling frequency;
(6) extracting signal bandwidth:
the bandwidth is defined as:
<math>
<mrow>
<mi>BW</mi>
<mo>=</mo>
<mfrac>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>N</mi>
</munderover>
<mo>[</mo>
<msup>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>-</mo>
<mi>SC</mi>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
<mo>|</mo>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
<mo>]</mo>
</mrow>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>N</mi>
</munderover>
<mo>|</mo>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
</mfrac>
</mrow>
</math>
wherein N is the number of sampling points in a frame;
DFT is the fourier transform coefficient of the signal;
SC is the spectral centroid.
In one embodiment, the device further comprises a preprocessing module, configured to preprocess the audio signal acquired in the extracting module; wherein the preprocessing comprises front-end processing, noise reduction, and endpoint detection, the front-end processing comprising pre-emphasis and windowed framing.
In one embodiment, the signal features include: short-time average amplitude, high zero-crossing rate ratio, root mean square value, Mel cepstrum coefficient, sub-band energy ratio and signal bandwidth.
In one embodiment, the dimensionality reduction module performs dimensionality reduction on the signal features into dimensionality reduction on the mel-frequency cepstrum coefficients.
To address the shortcomings of existing SVM recognizers when handling large-capacity samples, the invention first preprocesses the acquired audio signal to eliminate data correlation and noise interference. After extracting signal features through time domain and frequency domain analysis, it applies principal component analysis to reduce the feature dimensionality, retaining the principal components that carry the sample information. This reduces the dimension of the sample space and, with the partial discharge recognition rate essentially unchanged, greatly simplifies the computation and saves resource space.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Fig. 1 is a schematic flow chart of a partial discharge identification method according to an embodiment of the present invention, including the following steps:
s101, collecting an audio signal of high-voltage equipment, performing time domain analysis and frequency domain analysis on the audio signal, and extracting signal characteristics;
for S101, in a preferred embodiment, the method further includes preprocessing the acquired audio signal; in this embodiment, an audio signal containing partial discharge may be collected for preprocessing. After an ultrasonic sensor collects the audio signal to be detected, the signal is down-converted, segmented in units of 1 s, and stored in wav format for subsequent preprocessing. The sampling frequency of the audio signal in this embodiment is set to 40 kHz.
The pre-processing preferably comprises front-end processing, noise reduction and end-point detection, the front-end processing preferably comprises pre-emphasis and windowing framing;
because the signal rolls off severely in the high-frequency band and its spectrum is steep there, preprocessing must first pre-emphasize the signal to boost the high-frequency band and flatten the spectrum, which facilitates spectral analysis and feature-parameter extraction. The collected wav-format signal can be pre-emphasized with a first-order digital filter whose transfer function is H(z) = 1 - ψz^(-1);
In this embodiment, pre-emphasis uses a filter coefficient ψ of 0.9375; this pre-filtering improves the signal-to-noise ratio of the audio signal.
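As an illustrative sketch only (not part of the claimed method), the first-order pre-emphasis filter H(z) = 1 - ψz^(-1) with ψ = 0.9375 can be applied as follows; keeping the first sample unchanged at the boundary is an assumption:

```python
import numpy as np

def pre_emphasis(x, psi=0.9375):
    """First-order pre-emphasis filter H(z) = 1 - psi * z^-1.

    psi = 0.9375 matches the embodiment; keeping the first
    sample unchanged is an assumption about the boundary."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                     # boundary: no previous sample
    y[1:] = x[1:] - psi * x[:-1]    # y(n) = x(n) - psi * x(n-1)
    return y

# A constant (low-frequency) signal is strongly attenuated.
emphasized = pre_emphasis(np.ones(8))
```

A constant input is reduced to 1 - ψ = 0.0625 after the first sample, showing the intended high-pass behavior.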
The windowing framing process may employ a hanning window sequence of 240 samples in length:
<math>
<mrow>
<mi>η</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>=</mo>
<msup>
<mi>sin</mi>
<mn>2</mn>
</msup>
<mrow>
<mo>(</mo>
<mfrac>
<mi>nπ</mi>
<mi>N</mi>
</mfrac>
<mo>)</mo>
</mrow>
<mo>=</mo>
<mn>0.5</mn>
<mo>-</mo>
<mn>0.5</mn>
<mi>cos</mi>
<mrow>
<mo>(</mo>
<mfrac>
<mrow>
<mn>2</mn>
<mi>πn</mi>
</mrow>
<mi>N</mi>
</mfrac>
<mo>)</mo>
</mrow>
<mo>,</mo>
<mi>n</mi>
<mo>=</mo>
<mn>0,1</mn>
<mo>,</mo>
<mo>…</mo>
<mo>,</mo>
<mi>N</mi>
<mo>-</mo>
<mn>1</mn>
</mrow>
</math>
the pre-emphasized audio signal is processed by sliding this window sequence along it; adjacent frames overlap by two thirds to preserve continuity, giving a frame shift of 80 samples.
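The windowing and framing described above (240-sample Hanning window η(n) = 0.5 - 0.5·cos(2πn/N), frame shift of 80 samples, two-thirds overlap) can be sketched as follows; the function name and the use of NumPy are illustrative assumptions:

```python
import numpy as np

def frame_signal(x, frame_len=240, frame_shift=80):
    """Split the signal into overlapping frames and apply the
    Hanning window eta(n) = 0.5 - 0.5*cos(2*pi*n/N); with
    frame_len = 240 and frame_shift = 80, adjacent frames
    overlap by two thirds, as in the embodiment."""
    x = np.asarray(x, dtype=float)
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len)
    starts = range(0, len(x) - frame_len + 1, frame_shift)
    return np.array([x[s:s + frame_len] * window for s in starts])

frames = frame_signal(np.ones(400))   # 400 samples -> 3 frames
```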
The noise reduction processing can use filtering: for the single-frequency noise remaining in the audio signal after front-end processing, an FIR band-stop filter designed by the window method filters the signal to remove background-noise interference. With suitable parameters, the band-stop filter removes noise interference around 50 Hz; the filter is realized with a Kaiser window, achieving filtering-based noise reduction that removes noise interference, aids feature extraction, improves the quality of the audio signal, and reduces the influence of background noise on signal analysis.
Endpoint detection, i.e. segmenting the audio signal, is a crucial step in studying and identifying the characteristics of the audio signal. Because signal processing and feature-parameter extraction operate on the corresponding signal segments, the useful audio segments can be extracted accurately only if the endpoints of each frame of the audio signal are judged accurately.
In a preferred embodiment, a time domain endpoint detection method based on the zero-crossing rate and short-time energy may be employed. Endpoint detection algorithms that use time domain parameters as features mainly rely on amplitude, energy and other quantities that reflect the characteristics of the audio signal along the time axis to distinguish signal from noise, and have the advantages of being intuitive and accurate.
The zero-crossing rate of a discrete signal is the number of times the sign of the signal samples changes within each frame. Processing applied after the signal is divided into frames is called short-time processing; the short-time average zero-crossing rate is the statistical average, over the frames, of the number of zero crossings of the sampling points in each frame. Since the sampling frequency is fixed, the zero-crossing rate reflects the frequency content of the signal, so the spectral characteristics of the signal can be roughly estimated from it. The mathematical expression is:
<math>
<mrow>
<msub>
<mi>G</mi>
<mi>n</mi>
</msub>
<mo>=</mo>
<mfrac>
<mn>1</mn>
<mrow>
<mn>2</mn>
<mi>N</mi>
</mrow>
</mfrac>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>m</mi>
<mo>=</mo>
<mo>-</mo>
<mo>∞</mo>
</mrow>
<mrow>
<mo>+</mo>
<mo>∞</mo>
</mrow>
</munderover>
<mo>|</mo>
<mi>sgn</mi>
<mo>[</mo>
<mi>s</mi>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
<mo>]</mo>
<mo>-</mo>
<mi>sgn</mi>
<mo>[</mo>
<mi>s</mi>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mo>-</mo>
<mn>1</mn>
<mo>)</mo>
</mrow>
<mo>]</mo>
<mo>|</mo>
<mi>w</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>-</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
</mrow>
</math>
wherein sgn [ ] is a sign function;
<math>
<mrow>
<mi>sgn</mi>
<mo>[</mo>
<mi>s</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>]</mo>
<mo>=</mo>
<mfenced open='{' close=''>
<mtable>
<mtr>
<mtd>
<mn>1</mn>
</mtd>
<mtd>
<mi>s</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>≥</mo>
<mn>0</mn>
</mtd>
</mtr>
<mtr>
<mtd>
<mn>0</mn>
</mtd>
<mtd>
<mi>s</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo><</mo>
<mn>0</mn>
</mtd>
</mtr>
</mtable>
</mfenced>
<mo>.</mo>
</mrow>
</math>
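The short-time zero-crossing rate with the sgn convention above (sgn[s(n)] = 1 for s(n) ≥ 0, 0 otherwise) can be sketched per frame as follows; a rectangular window w is assumed for simplicity:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of sign changes in one frame, using the sgn
    convention above: sgn[s(n)] = 1 if s(n) >= 0, else 0.
    A rectangular window is assumed."""
    frame = np.asarray(frame, dtype=float)
    sgn = np.where(frame >= 0, 1, 0)
    # (1/2N) * sum |sgn[s(m)] - sgn[s(m-1)]|
    return float(np.sum(np.abs(np.diff(sgn))) / (2 * len(frame)))

# An alternating signal changes sign at every sample.
zcr = zero_crossing_rate(np.array([1.0, -1.0] * 4))
```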
The main difference between the audio signal and the noise signal lies in their energy and its evident variation over time: the energy of an audio segment is larger than that of a noise segment, being roughly the superposition of the noise energy and the acoustic energy of the audio signal. When the environmental noise and system input noise are small, i.e. a high signal-to-noise ratio is ensured, signal segments and noise segments can be distinguished by computing the short-time energy of the input signal. If s(m) is the original sound-signal sample sequence, the short-time energy Em of the signal is defined as:
<math>
<mrow>
<msub>
<mi>E</mi>
<mi>m</mi>
</msub>
<mo>=</mo>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mo>-</mo>
<mo>∞</mo>
</mrow>
<mrow>
<mo>+</mo>
<mo>∞</mo>
</mrow>
</munderover>
<msup>
<mrow>
<mo>[</mo>
<mi>s</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mi>w</mi>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mo>-</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>]</mo>
</mrow>
<mn>2</mn>
</msup>
<mo>=</mo>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mi>m</mi>
<mo>-</mo>
<mi>N</mi>
<mo>+</mo>
<mn>1</mn>
</mrow>
<mi>m</mi>
</munderover>
<msup>
<mrow>
<mo>[</mo>
<mi>s</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mi>w</mi>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mo>-</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>]</mo>
</mrow>
<mn>2</mn>
</msup>
<mo>.</mo>
</mrow>
</math>
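A minimal sketch of the short-time energy Em over the finite support n = m - N + 1 … m; using a rectangular window w is an assumption made here for illustration:

```python
import numpy as np

def short_time_energy(s, m, frame_len):
    """E_m = sum_{n=m-N+1}^{m} [s(n) * w(m-n)]^2 with a
    rectangular window w of length N (window choice assumed)."""
    s = np.asarray(s, dtype=float)
    start = max(0, m - frame_len + 1)   # clamp at the signal start
    return float(np.sum(s[start:m + 1] ** 2))

energy = short_time_energy(np.array([0.0, 1.0, 2.0, 3.0]), m=3, frame_len=3)
```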
in a preferred embodiment, the signal features include: short-time average amplitude, high zero-crossing rate ratio, root mean square value, Mel cepstrum coefficient, sub-band energy ratio and signal bandwidth. The time domain analysis is based on processing energy, power and amplitude; the frequency domain analysis is based on the short-time Fourier transform, cepstrum transform and discrete cosine transform.
The method for extracting the signal features comprises the following steps:
(1) short-time average amplitude extraction:
the short-time average magnitude difference Fn(k) is calculated as follows:
<math>
<mrow>
<msub>
<mi>F</mi>
<mi>n</mi>
</msub>
<mrow>
<mo>(</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>=</mo>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>m</mi>
<mo>=</mo>
<mn>0</mn>
</mrow>
<mrow>
<mi>N</mi>
<mo>-</mo>
<mn>1</mn>
<mo>-</mo>
<mi>k</mi>
</mrow>
</msubsup>
<mo>|</mo>
<msub>
<mi>x</mi>
<mi>n</mi>
</msub>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
<mo>-</mo>
<msub>
<mi>x</mi>
<mi>n</mi>
</msub>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mo>+</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
</math>
wherein xn(m+k) = w(m+k)·x(n+m+k);
w is a window function;
x is the original signal.
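The short-time average magnitude difference Fn(k) above can be sketched for a single windowed frame as follows (illustrative only; the function name is an assumption):

```python
import numpy as np

def amdf(frame, k):
    """F_n(k) = sum_{m=0}^{N-1-k} |x_n(m) - x_n(m+k)| for a
    windowed frame x_n and lag k."""
    frame = np.asarray(frame, dtype=float)
    return float(np.sum(np.abs(frame[:len(frame) - k] - frame[k:])))

# For a periodic frame, F_n(k) dips to zero at the period (k = 2 here).
periodic = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
dip = amdf(periodic, 2)
```

The dip at the signal's period is what makes this feature useful for characterizing repetitive discharge pulses.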
(2) High zero-crossing rate ratio extraction:
a zero-crossing rate threshold can be set, and the proportion of frames with zero-crossing rates higher than the threshold in an audio segment, namely, the high zero-crossing rate ratio, is calculated and defined as:
<math>
<mrow>
<mfrac>
<mn>1</mn>
<mrow>
<mn>2</mn>
<mi>N</mi>
</mrow>
</mfrac>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mn>0</mn>
</mrow>
<mrow>
<mi>N</mi>
<mo>-</mo>
<mn>1</mn>
</mrow>
</msubsup>
<mo>[</mo>
<mi>sgn</mi>
<mrow>
<mo>(</mo>
<mi>ZCR</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>-</mo>
<mn>1.1</mn>
<mi>avZCR</mi>
<mo>)</mo>
</mrow>
<mo>+</mo>
<mn>1</mn>
<mo>]</mo>
</mrow>
</math>
wherein N is the total frame number in an audio segment;
ZCR (n) is the zero crossing rate of the nth frame;
the ZCR threshold value is 1.1 times of the average value of ZCR (n) in one audio segment;
sgn is a sign function;
avZCR is the average of the zero-crossing rates in the audio segment.
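A sketch of the high zero-crossing rate ratio (1/2N)·Σ[sgn(ZCR(n) - 1.1·avZCR) + 1]; the ±1 sign convention for sgn is assumed here so that each frame above the threshold contributes exactly 1 to the count:

```python
import numpy as np

def hzcrr(zcr_per_frame):
    """High zero-crossing rate ratio: the fraction of frames whose
    ZCR exceeds 1.1 times the segment average avZCR.  The +/-1
    sign convention for sgn is an assumption."""
    zcr = np.asarray(zcr_per_frame, dtype=float)
    threshold = 1.1 * zcr.mean()            # 1.1 times avZCR
    sgn = np.where(zcr - threshold >= 0, 1, -1)
    return float(np.sum(sgn + 1) / (2 * len(zcr)))

ratio = hzcrr([0.1, 0.1, 0.1, 0.9])   # one of four frames is "high"
```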
(3) Root mean square value extraction:
the root mean square (RMS) value is the square root of the mean of the squared amplitudes of the signal sequence s(n), defined as:
<math>
<mrow>
<msub>
<mi>T</mi>
<mi>rms</mi>
</msub>
<mo>=</mo>
<msqrt>
<mfrac>
<mn>1</mn>
<mi>n</mi>
</mfrac>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>n</mi>
</msubsup>
<msubsup>
<mi>s</mi>
<mi>i</mi>
<mn>2</mn>
</msubsup>
</msqrt>
<mo>.</mo>
</mrow>
</math>
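The RMS value follows directly from its definition sqrt((1/n)·Σ s_i²); a minimal sketch:

```python
import numpy as np

def rms(frame):
    """Root mean square value: sqrt((1/n) * sum_i s_i^2)."""
    frame = np.asarray(frame, dtype=float)
    return float(np.sqrt(np.mean(frame ** 2)))

value = rms([3.0, -3.0, 3.0, -3.0])
```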
(4) extraction of the Mel cepstrum coefficient:
the number of points N in each frame of the sampled sequence may be determined; in this embodiment, N = 240 points are taken, the sequence is zero-padded, and a 256-point discrete FFT is performed, so that the frequency spectrum of the mth frame is:
<math>
<mrow>
<mi>S</mi>
<mrow>
<mo>(</mo>
<mi>k</mi>
<mo>,</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
<mo>=</mo>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mn>0</mn>
</mrow>
<mn>255</mn>
</msubsup>
<mi>s</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>,</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
<msup>
<mi>e</mi>
<mrow>
<mo>-</mo>
<mi>j</mi>
<mfrac>
<mrow>
<mn>2</mn>
<mi>πnk</mi>
</mrow>
<mn>255</mn>
</mfrac>
</mrow>
</msup>
</mrow>
</math>
wherein {s(n, m) | n = 0, 1, …, 239} are the 240 sampling points of the mth frame and {s(n, m) | n = 240, …, 255} are zero; the discrete power spectrum S(m) is obtained by taking the squared modulus of this spectrum;
pass S(m) through I filters Hi(m), with I = 24; for each filter, sum the products of S(m) and Hi(m) at each discrete frequency point to obtain I power parameters Pi, i = 0, 1, …, I-1;
compute the natural logarithm of each Pi to obtain Li, i = 0, 1, …, I-1;
apply the discrete cosine transform to L0, L1, …, LI-1 to obtain Di, i = 0, 1, …, I-1;
discard D0, which represents the direct-current component, and take D1, D2, …, DJ as the Mel cepstrum coefficients, with J = 15.
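The four Mel-cepstrum steps above (power spectrum, I = 24 filter powers Pi, logarithms Li, DCT to Di, keep D1…D15) can be sketched as follows; the triangular filters spaced evenly on the mel scale are an assumption, since the text only specifies I filters Hi(m):

```python
import numpy as np

def mel_cepstrum(frame, fs=40000, n_fft=256, n_filters=24, n_ceps=15):
    """FFT -> power spectrum S(m) -> I = 24 filter powers P_i ->
    log L_i -> DCT D_i -> keep D_1..D_15.  The triangular mel
    filter bank is an assumption."""
    spec = np.fft.fft(frame, n_fft)                 # zero-pads to 256 points
    power = np.abs(spec[:n_fft // 2 + 1]) ** 2      # discrete power spectrum

    # Assumed mel filter bank: triangular filters, mel-spaced edges.
    def mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def inv_mel(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    edges = inv_mel(np.linspace(0.0, mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, c):
            fbank[i, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fbank[i, k] = (hi - k) / max(hi - c, 1)

    P = fbank @ power + 1e-10                       # filter powers P_i
    L = np.log(P)                                   # natural logarithms L_i
    i = np.arange(n_filters)
    D = np.array([np.sum(L * np.cos(np.pi * j * (2 * i + 1) / (2 * n_filters)))
                  for j in range(n_filters)])       # DCT-II of L_0..L_{I-1}
    return D[1:n_ceps + 1]                          # drop DC term D_0

coeffs = mel_cepstrum(np.sin(2 * np.pi * 1000 * np.arange(240) / 40000))
```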
(5) Sub-band energy ratio extraction:
The subband energy ratio describes the frequency-domain distribution of the audio signal: it measures the proportion of each subband's energy relative to the energy of the whole frequency band. The subbands may be of equal width, or their widths may be allocated according to the perceptual characteristics of the human ear so that each subband contains the same number of critical bands. The calculation formula of the subband energy ratio is:
<math>
<mrow>
<msub>
<mi>BandSpec</mi>
<mi>i</mi>
</msub>
<mo>=</mo>
<mfrac>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>k</mi>
<mo>=</mo>
<mrow>
<mo>(</mo>
<mi>i</mi>
<mo>-</mo>
<mn>1</mn>
<mo>)</mo>
</mrow>
<mfrac>
<mi>K</mi>
<mi>B</mi>
</mfrac>
</mrow>
<mrow>
<mi>i</mi>
<mo>·</mo>
<mfrac>
<mi>K</mi>
<mi>B</mi>
</mfrac>
</mrow>
</munderover>
<mo>|</mo>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>,</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>k</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>K</mi>
</munderover>
<mo>|</mo>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>,</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
</mfrac>
<mi>i</mi>
<mo>=</mo>
<mn>1,2</mn>
<mo>,</mo>
<mo>.</mo>
<mo>.</mo>
<mo>.</mo>
<mo>,</mo>
<mi>B</mi>
</mrow>
</math>
where DFT (n, k) is the Fourier transform coefficient of the nth frame of the input signal,
<math>
<mrow>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>,</mo>
<mi>k</mi>
<mo>)</mo>
</mrow>
<mo>=</mo>
<mo>|</mo>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>m</mi>
<mo>=</mo>
<mo>-</mo>
<mo>∞</mo>
</mrow>
<mo>∞</mo>
</munderover>
<mi>x</mi>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mo>)</mo>
</mrow>
<msup>
<mi>e</mi>
<mrow>
<mo>-</mo>
<mi>j</mi>
<mfrac>
<mrow>
<mn>2</mn>
<mi>π</mi>
</mrow>
<mi>L</mi>
</mfrac>
<mi>km</mi>
</mrow>
</msup>
<mo>|</mo>
</mrow>
</math>
wherein L is the window length;
k is the order of the discrete Fourier transform;
n is the number of audio frames in the segment.
In actual calculation, the value of B is 4, i.e. the frequency domain is divided into 4 octave sub-band intervals: sb1 = [0, ω0/8], sb2 = [ω0/8, ω0/4], sb3 = [ω0/4, ω0/2], sb4 = [ω0/2, ω0], wherein ω0 = fs/2 and fs is the sampling frequency.
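A sketch of the subband energy ratio using the octave intervals sb1…sb4 stated above; note the summation limits in the displayed formula suggest equal-width bands, so following the octave split from the text is an interpretive choice:

```python
import numpy as np

def subband_energy_ratio(frame):
    """Energy proportion of the octave subbands sb1 = [0, w0/8],
    sb2 = [w0/8, w0/4], sb3 = [w0/4, w0/2], sb4 = [w0/2, w0],
    where w0 = fs/2 is the Nyquist frequency, using spectral
    magnitudes |DFT| as in the defining formula."""
    mag = np.abs(np.fft.rfft(np.asarray(frame, dtype=float)))
    nyquist_bin = len(mag) - 1
    edges = [0, nyquist_bin // 8, nyquist_bin // 4,
             nyquist_bin // 2, nyquist_bin + 1]
    total = np.sum(mag) + 1e-12
    return [float(np.sum(mag[edges[i]:edges[i + 1]]) / total)
            for i in range(4)]

# A low-frequency tone concentrates its energy in the first subband.
ratios = subband_energy_ratio(np.sin(2 * np.pi * np.arange(64) / 64))
```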
(6) Extracting signal bandwidth:
the bandwidth reflects the range of signal power or signal energy concentrated in the frequency spectrum, and is an index for measuring the range of audio frequency domain, and is defined as:
<math>
<mrow>
<mi>BW</mi>
<mo>=</mo>
<mfrac>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>N</mi>
</munderover>
<mo>[</mo>
<msup>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>-</mo>
<mi>SC</mi>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
<mo>|</mo>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
<mo>]</mo>
</mrow>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>n</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>N</mi>
</munderover>
<mo>|</mo>
<mi>DFT</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>|</mo>
</mrow>
</mfrac>
</mrow>
</math>
wherein N is the number of sampling points in a frame;
DFT is the fourier transform coefficient of the signal;
SC is the spectral centroid.
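The bandwidth BW above can be sketched as follows; since the spectral centroid SC is not defined in the text, the usual magnitude-weighted mean of the bin index is assumed here:

```python
import numpy as np

def spectral_bandwidth(frame):
    """BW = sum_n (n - SC)^2 |DFT(n)| / sum_n |DFT(n)|.  The
    spectral centroid SC is taken as the magnitude-weighted mean
    of the bin index (an assumed definition)."""
    mag = np.abs(np.fft.rfft(np.asarray(frame, dtype=float)))
    n = np.arange(len(mag))
    total = np.sum(mag) + 1e-12
    sc = np.sum(n * mag) / total                 # assumed centroid definition
    return float(np.sum(((n - sc) ** 2) * mag) / total)

# A pure tone concentrates its energy in one bin, so BW is near zero.
bw = spectral_bandwidth(np.sin(2 * np.pi * 4 * np.arange(64) / 64))
```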
S102, reducing the dimension of the signal features by using a principal component analysis method.
For S102, in a preferred embodiment, the mel-frequency cepstrum coefficients may be dimensionality-reduced using principal component analysis; the principle of the principal component analysis method is as follows:
(1) raw data normalization
Suppose there are n samples, each with p indexes; the original sample matrix is
X=(Xij)n×p; i=1,2,…,n; j=1,2,…,p
where Xij denotes the j-th index of the i-th sample.
Considering the trend and dimension problems of the indexes, the indexes can first be made consistent in trend by a reciprocal method, and then the samples can be standardized with the Z-score method, namely:
<math>
<mrow>
<msub>
<mi>Z</mi>
<mi>ij</mi>
</msub>
<mo>=</mo>
<mrow>
<mo>(</mo>
<msub>
<mi>x</mi>
<mi>ij</mi>
</msub>
<mo>-</mo>
<mover>
<msub>
<mi>x</mi>
<mi>j</mi>
</msub>
<mo>‾</mo>
</mover>
<mo>)</mo>
</mrow>
<mo>/</mo>
<msub>
<mi>S</mi>
<mi>j</mi>
</msub>
</mrow>
</math>
wherein: <math>
<mrow>
<mover>
<msub>
<mi>x</mi>
<mi>j</mi>
</msub>
<mo>‾</mo>
</mover>
<mo>=</mo>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>n</mi>
</msubsup>
<msub>
<mi>x</mi>
<mi>ij</mi>
</msub>
<mo>/</mo>
<mi>n</mi>
<mo>;</mo>
</mrow>
</math>
<math>
<mrow>
<msubsup>
<mi>S</mi>
<mi>j</mi>
<mn>2</mn>
</msubsup>
<mo>=</mo>
<mo>[</mo>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>n</mi>
</msubsup>
<msup>
<mrow>
<mo>(</mo>
<msub>
<mi>x</mi>
<mi>ij</mi>
</msub>
<mo>-</mo>
<mover>
<msub>
<mi>x</mi>
<mi>j</mi>
</msub>
<mo>‾</mo>
</mover>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
<mo>]</mo>
<mo>/</mo>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>-</mo>
<mn>1</mn>
<mo>)</mo>
</mrow>
<mo>;</mo>
</mrow>
</math>
a normalized sample matrix can be obtained:
Z=(Zij)n×p, i=1,2,…,n; j=1,2,…,p.
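Step (1) above amounts to column-wise Z-score standardization. A minimal NumPy sketch (the trend-consistency preprocessing is omitted, since it depends on the particular indexes):

```python
import numpy as np

def zscore_normalize(X):
    """Z-score standardization of an n x p sample matrix, as in step (1).

    Each column j is shifted by its mean x̄_j and divided by its sample
    standard deviation S_j, computed with (n - 1) in the denominator.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)   # ddof=1 matches the (n - 1) divisor above
    return (X - mean) / std
```

After normalization every column of Z has mean 0 and sample standard deviation 1.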
(2) matrix of correlation coefficients
Calculating the correlation coefficient between every pair of indexes of the normalized samples gives the correlation coefficient matrix R:
<math>
<mrow>
<mi>R</mi>
<mo>=</mo>
<mfrac>
<mn>1</mn>
<mrow>
<mi>n</mi>
<mo>-</mo>
<mn>1</mn>
</mrow>
</mfrac>
<msup>
<mi>Z</mi>
<mi>T</mi>
</msup>
<mi>Z</mi>
<mo>=</mo>
<msub>
<mrow>
<mo>(</mo>
<msub>
<mi>r</mi>
<mi>uv</mi>
</msub>
<mo>)</mo>
</mrow>
<mrow>
<mi>p</mi>
<mo>×</mo>
<mi>p</mi>
</mrow>
</msub>
<mo>,</mo>
<mi>u</mi>
<mo>=</mo>
<mn>1,2</mn>
<mo>,</mo>
<mo>.</mo>
<mo>.</mo>
<mo>.</mo>
<mo>,</mo>
<mi>p</mi>
<mo>;</mo>
<mi>v</mi>
<mo>=</mo>
<mn>1,2</mn>
<mo>,</mo>
<mo>.</mo>
<mo>.</mo>
<mo>.</mo>
<mo>,</mo>
<mi>p</mi>
</mrow>
</math>
wherein, <math>
<mrow>
<msub>
<mi>r</mi>
<mi>uv</mi>
</msub>
<mo>=</mo>
<mfrac>
<mn>1</mn>
<mrow>
<mi>n</mi>
<mo>-</mo>
<mn>1</mn>
</mrow>
</mfrac>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>n</mi>
</msubsup>
<mo>[</mo>
<mrow>
<mo>(</mo>
<msub>
<mi>x</mi>
<mi>iu</mi>
</msub>
<mo>-</mo>
<mover>
<msub>
<mi>x</mi>
<mi>u</mi>
</msub>
<mo>‾</mo>
</mover>
<mo>)</mo>
</mrow>
<mo>/</mo>
<msub>
<mi>S</mi>
<mi>u</mi>
</msub>
<mo>]</mo>
<mo>[</mo>
<mrow>
<mo>(</mo>
<msub>
<mi>x</mi>
<mi>iv</mi>
</msub>
<mo>-</mo>
<mover>
<msub>
<mi>x</mi>
<mi>v</mi>
</msub>
<mo>‾</mo>
</mover>
<mo>)</mo>
</mrow>
<mo>/</mo>
<msub>
<mi>S</mi>
<mi>v</mi>
</msub>
<mo>]</mo>
<mo>.</mo>
</mrow>
</math>
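In code, step (2) collapses to one matrix product on the standardized data: reading the product of the normalized matrix with itself as ZᵀZ (the p×p form), the correlation matrix is ZᵀZ/(n − 1). A minimal sketch:

```python
import numpy as np

def correlation_matrix(X):
    """Correlation coefficient matrix R from step (2).

    X is the raw n x p sample matrix; Z is its Z-score-standardized
    version, and R = Z^T Z / (n - 1) is p x p with unit diagonal.
    """
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return Z.T @ Z / (n - 1)
```

This agrees with NumPy's built-in column-wise correlation, which offers a quick cross-check.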
(3) calculating principal components
The p characteristic roots are obtained from the characteristic equation |λI−R|=0 and arranged in descending order as λ1 ≥ λ2 ≥ … ≥ λp ≥ 0. They are the variances of the principal components, and their magnitudes describe the weight of each corresponding principal component in the original samples. The eigenvector Lg = (lg1, lg2, lg3, …, lgp) corresponding to each characteristic root is obtained from the characteristic equation, and the normalized indexes are converted into principal components by Fg = Z × Lg, g = 1, 2, …, p, where F1 is the 1st principal component and Fp is the p-th principal component.
(4) Determining the number of principal components
The number of principal components obtained in this way equals the number of original indexes. To reduce the calculation amount and the dimensionality, the number k of principal components retained is generally determined by requiring a principal component variance cumulative contribution rate of more than 80%-90%, namely:
<math>
<mrow>
<mfrac>
<mrow>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>g</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>k</mi>
</msubsup>
<msub>
<mi>λ</mi>
<mi>g</mi>
</msub>
</mrow>
<mrow>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>g</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>p</mi>
</msubsup>
<msub>
<mi>λ</mi>
<mi>g</mi>
</msub>
</mrow>
</mfrac>
<mo>≥</mo>
<mn>80</mn>
<mo>%</mo>
<mo>~</mo>
<mn>90</mn>
<mo>%</mo>
<mo>.</mo>
</mrow>
</math>
In this embodiment, principal component analysis is applied to reduce the dimensionality of the mel-frequency cepstrum coefficients of the audio signal; with the cumulative contribution rate of the principal component variance set to 85%, 15 mel-frequency cepstrum coefficient principal components are obtained.
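Step (4) above, selecting k from the cumulative contribution rate, can be sketched directly from the eigenvalues of R. The 85% threshold is the embodiment's value; the example matrix below is an arbitrary illustration:

```python
import numpy as np

def pca_num_components(R, threshold=0.85):
    """Number k of principal components whose cumulative variance
    contribution rate first reaches `threshold`, per step (4).

    R is the p x p correlation matrix; its eigenvalues are the
    principal-component variances lambda_1 >= ... >= lambda_p.
    """
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]     # descending order
    cumulative = np.cumsum(eigvals) / eigvals.sum()    # contribution rates
    return int(np.searchsorted(cumulative, threshold) + 1)
```

For eigenvalues (4, 3, 2, 1), the cumulative rates are 40%, 70%, 90%, 100%, so an 85% threshold keeps k = 3 components.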
S103, classifying and identifying the signal characteristics after dimensionality reduction, and judging whether partial discharge is generated.
For S103, in a preferred embodiment, the classification and identification step may include normalization processing, an identification step, and post-classification processing. After the characteristic parameters of the audio segment to be recognized are extracted, all characteristic parameters except the mel-frequency cepstrum coefficients and the first-order difference mel-frequency cepstrum coefficients are normalized to form the data set to be recognized; the characteristic parameters comprise the mel-frequency cepstrum coefficients, first-order difference mel-frequency cepstrum coefficients, short-time energy, zero-crossing rate, high zero-crossing rate ratio and short-time average amplitude difference. The identification step uses a support vector machine model based on a polynomial kernel function, and the discriminant function classifies the data to be recognized to obtain the classification information of each audio segment, where xi is a support vector, i.e., a characteristic parameter of a training sample; x is the unknown vector, i.e., the characteristic parameter of the sample to be measured; yi is the class label corresponding to xi; K(x, xi)=[(xTxi)+1]q with q = 3; and f(x) is the fault classification result: if it is 1, it is determined that discharge is present, and if it is -1, it is determined that discharge is absent.
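The polynomial-kernel discriminant described above can be sketched as follows. This is an illustration of the decision rule only: the support vectors, labels, multipliers alphas and bias b are hypothetical placeholders that would come from training, not values from the patent:

```python
import numpy as np

def poly_kernel(x, xi, q=3):
    """Polynomial kernel K(x, x_i) = [(x^T x_i) + 1]^q, q = 3 as in the text."""
    return (x @ xi + 1.0) ** q

def svm_decide(x, support_vecs, labels, alphas, b, q=3):
    """Sign of the SVM discriminant f(x) = sum_i alpha_i y_i K(x, x_i) + b.

    Returns 1 (discharge present) or -1 (discharge absent).
    """
    s = sum(a * y * poly_kernel(x, xi, q)
            for a, y, xi in zip(alphas, labels, support_vecs))
    return 1 if s + b >= 0 else -1
```

With a trained model, each reduced feature vector of an audio segment is passed through `svm_decide` to obtain its discharge/no-discharge label.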
The following describes the technical solution of the partial discharge identification system in detail with reference to the accompanying drawings and specific embodiments.
Fig. 2 is a block diagram of a partial discharge recognition system according to the present invention, which includes: an extraction module 201, a dimension reduction module 202 and a judgment module 203;
the extraction module 201 is configured to collect an audio signal of a high-voltage device, perform time domain analysis and frequency domain analysis on the audio signal, and extract signal characteristics;
the dimensionality reduction module 202 is configured to perform dimensionality reduction on the signal features by using a principal component analysis method;
the judging module 203 is configured to perform classification and identification on the signal features after the dimension reduction, and judge whether the partial discharge is generated.
In a preferred embodiment, the apparatus further includes a preprocessing module, configured to preprocess the signal acquired in the extracting module 201; the preprocessing includes front-end processing, including pre-emphasis and windowed framing, noise reduction, and endpoint detection.
In a preferred embodiment, the signal features include: short-time average amplitude, high zero-crossing rate ratio, root mean square value, mel-frequency cepstrum coefficient, sub-band energy ratio and signal bandwidth.
In a preferred embodiment, the dimension reduction performed by the dimensionality reduction module 202 on the signal features is a dimension reduction of the mel-frequency cepstrum coefficients.
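The three-module structure of Fig. 2 can be wired together as a simple pipeline. This is a structural sketch only; the `extract`, `reduce_dim` and `judge` callables are hypothetical stand-ins for the feature extraction, PCA, and SVM steps detailed earlier:

```python
class PartialDischargeRecognizer:
    """Minimal wiring of modules 201-203 described above."""

    def __init__(self, extract, reduce_dim, judge):
        self.extract = extract        # module 201: audio -> signal features
        self.reduce_dim = reduce_dim  # module 202: features -> reduced features
        self.judge = judge            # module 203: reduced features -> 1 / -1

    def run(self, audio):
        # The three modules run in sequence on one audio segment.
        return self.judge(self.reduce_dim(self.extract(audio)))
```

Any concrete implementations of the three steps can be dropped in without changing the pipeline itself.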
According to the invention, after the audio signal to be detected is acquired by an ultrasonic sensor, preprocessing such as front-end processing, noise reduction and endpoint detection is carried out, so that the signal-to-noise ratio of the audio signal is improved, background noise interference is eliminated, and the endpoints corresponding to each frame of the audio signal are accurately determined. After the signal features are extracted by time domain and frequency domain analysis, the mel-frequency cepstrum coefficients among the signal features are reduced in dimension by principal component analysis, which reduces the dimension of the sample space; finally, a support vector machine partial discharge recognizer judges whether partial discharge occurs, so that the calculation speed is greatly increased on the premise that the partial discharge recognition rate remains basically unchanged.
The above-mentioned embodiments express only several embodiments of the present invention, and their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.