CN106228979B

CN106228979B - Method for extracting and identifying abnormal sound features in public places

Info

Publication number: CN106228979B
Application number: CN201610674982.1A
Authority: CN
Inventors: 李伟红; 田真真; 龚卫国; 王伟冰
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2016-08-16
Filing date: 2016-08-16
Publication date: 2020-01-10
Anticipated expiration: 2036-08-16
Also published as: CN106228979A

Abstract

The invention relates to a method for extracting and identifying abnormal sound features in public places. It improves the extreme-point symmetric mode decomposition (ESMD); the improved method is called D-ESMD for short. The method is characterized in that: a random T-distributed sequence is added to the abnormal sound signal, reducing the influence of public-place background noise on the extraction of abnormal sound features; because the original ESMD decomposes abnormal sounds poorly, a symmetric midpoint interpolation method is proposed to replace the extreme-midpoint odd-even interpolation method, improving both the decomposition efficiency and the recognition rate; and because the original ESMD is deficient in selecting effective decomposition modes, the complexity of the modes obtained by ESMD decomposition is checked with the permutation entropy algorithm, so that the effective modal components of the abnormal sound are obtained adaptively. The method describes the characteristics of abnormal sounds fully, yields better classification and recognition results, extracts abnormal sound features more accurately, and is more robust to environmental background noise.

Description

Method for extracting and identifying abnormal sound features in public places

Technical Field

The invention belongs to the technical field of audio signal feature extraction and pattern recognition, and particularly relates to a method for extracting and recognizing abnormal sound features in public places.

Background

Public places such as squares, bus stations and subways are characterized by heavy pedestrian traffic and large areas, and their security has long been a concern of governments and the public in all countries. At present, surveillance based mainly on video monitoring plays an active role in public-place security, but video monitoring suffers from blind spots, blurred images in rainy weather, and similar problems. Abnormal events are often accompanied by abnormal sounds such as screams, gunshots, breaking glass and explosions, so combining audio monitoring with video monitoring has become a development direction in public-place security monitoring. Existing audio monitoring systems only perform simple sound collection and transmission and cannot effectively identify abnormal sounds, because the core theory and technology of audio monitoring have not yet seen a breakthrough. Recognizing abnormal sounds in public places is the core technology of an audio monitoring system, so research on this technology has important social significance and research value.

At present, extracting the abnormal sound features of public places with the Extreme-point Symmetric Mode Decomposition (ESMD) method has the following problems. ① An abnormal sound recorded in a public place consists of the abnormal sound signal and a background noise signal, and the background noise can mask local features of the abnormal sound. When ESMD decomposes such a recording, the resulting modal components necessarily contain background noise components, so the extracted features are biased. ② To improve the decomposition, ESMD constructs 1, 2, 3 or more interpolation curves through the midpoints of the extreme points (the ESMD-I, ESMD-II and ESMD-III methods), and the interpolation method strongly affects the modal decomposition. Comparing the three methods shows that, as the number of interpolation curves increases, the number of modes decreases, the degree of symmetry decreases, the amplitude variation is enhanced, and the decomposition efficiency improves. ③ ESMD uses the number of remaining extreme points as the decomposition termination condition; when abnormal sounds with background noise are decomposed, this condition does not judge whether a decomposed mode is effective, so noise components are retained among the modes. ④ ESMD lets the number of screening iterations vary within the interval [K_min, K_max], decomposes the abnormal sound signal repeatedly with different screening counts, and finally determines the optimal screening count by the least-squares principle, so decomposing an abnormal sound signal with ESMD is time-consuming.

In summary, the ESMD decomposition technique has room for improvement.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a method for extracting and identifying abnormal sound features in public places based on an improved ESMD decomposition technique (D-ESMD). The method adds noise to the input signal of the ESMD and improves the internal interpolation method, the termination condition for mode decomposition, and the number of screening iterations for the modal components, so as to obtain the features of public-place abnormal sounds at different scales.

A method for extracting and identifying abnormal sound features in public places comprises the following specific steps:

Step 1: input the abnormal sound to be identified in a public place and preprocess it.

Step 2: decompose the abnormal sound signal to be identified with the improved extreme-point symmetric mode decomposition (D-ESMD) method to obtain the modal components of each order, each of which contains the features of the abnormal sound signal in a different frequency band.

Step 3: calculate the energy ratio of each order of modal component obtained in step 2 relative to the original abnormal sound signal, combine the energy ratios into a vector, and normalize it to serve as the feature vector of the abnormal sound signal to be identified.

Step 4: judge whether the feature vector is valid; if not, return to step 3; if so, go to step 5.

Step 5: identify the abnormal sound to be identified in the public place as follows: first, randomly select a certain number of training samples of each class from an established abnormal sound library, compute the feature vectors of the training samples through step 2 and step 3, and build an SVM classification model; then classify the feature vector of the abnormal sound to be identified with the established SVM classification model to obtain the classification and recognition result.
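Step 3 above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `energy_ratio_features` is hypothetical, and because the text only says the energy ratios are "normalized", the sketch assumes normalization to unit sum.

```python
def energy_ratio_features(modes, signal):
    """Energy of each mode relative to the original signal.

    Assumption: "normalization" is taken to mean scaling the ratio
    vector so that its entries sum to 1.
    """
    total = sum(s * s for s in signal)
    if total == 0:
        raise ValueError("signal has zero energy")
    # Energy ratio of each modal component relative to the original signal.
    ratios = [sum(v * v for v in mode) / total for mode in modes]
    norm = sum(ratios)
    return [r / norm for r in ratios] if norm > 0 else ratios


# Toy data: two made-up "modes" of a four-sample signal.
signal = [3.0, -1.0, 2.0, 0.0]
modes = [[2.0, -1.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0]]
features = energy_ratio_features(modes, signal)
```

The resulting vector has one entry per modal component and can be passed to any classifier, such as the SVM named in step 5.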

The D-ESMD decomposition method builds on the extreme-point symmetric mode decomposition (ESMD) method as follows: a random T-distributed noise sequence is added to the abnormal sound to be identified in the public place; a symmetric midpoint interpolation method replaces the extreme-midpoint odd-even interpolation method of ESMD; the permutation entropy of each decomposed modal component is calculated; and the number of screening iterations for the modal components is improved. In this way the complexity of each mode is checked and the effective modal components of the abnormal sound are obtained adaptively.

The abnormal sound library comprises explosion sound, scream sound, gunshot sound and glass breaking sound.

Specifically, the D-ESMD decomposition method comprises the following specific processes:

Step 2.1: determine the number N of times that T-distributed random noise is added;

Step 2.2: let the abnormal sound signal to be identified be X; add a random T-distributed sequence to it to obtain the noise-added abnormal sound signal X_i;

Step 2.3: find the extreme points of the noise-added abnormal sound signal X_i, connect adjacent extreme points, and denote the midpoints of the connecting line segments as F_i; supplement the left and right boundary points F_0 and F_n; construct an interpolation curve L^* through the n+1 extreme midpoints using the symmetric midpoint interpolation method in place of the extreme-midpoint odd-even interpolation method of ESMD;

Step 2.4: take X_i - L^* as the input and repeat step 2.3 until the number of screening iterations reaches its maximum value, giving the first-order modal component M_1^i; calculate the permutation entropy of this modal component; if the permutation entropy is greater than a predetermined threshold θ, the component is regarded as an abnormal sound modal component, otherwise it is regarded as a noise component;

Step 2.5: if the modal component M_1^i is an abnormal sound modal component, take X_i - M_1^i as the input signal and repeat steps 2.3 to 2.4 until the modal component M_n^i obtained by decomposition is a noise component;

Step 2.6: if i < N, set i = i + 1 and repeat steps 2.2 to 2.5, adding a different T-distributed noise sequence each time, until N decompositions have been performed; take the ensemble average of all the modal components M_k^i obtained and use the result as the final modal component M_k of the signal to be decomposed:

$$M_k = \frac{1}{N}\sum_{i=1}^{N} M_k^{i}$$

In the above formula, k is the order of the modal component and N is the number of noise additions.
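The noise-assisted ensemble averaging of steps 2.2 and 2.6 can be sketched as below. Several things here are assumptions rather than details from the patent: the degrees of freedom `df`, the noise `scale`, and the helper names are hypothetical, and `toy_decompose` is only a stand-in for the D-ESMD screening of steps 2.3 to 2.5 (it splits a signal into a 3-point moving-average trend and the remaining detail). If trials yield different numbers of modes, the sketch simply averages the orders present in the first trial.

```python
import math
import random


def t_noise(length, df, scale, rng):
    """Student's t samples built as Z / sqrt(V / df), V ~ chi-square(df)."""
    out = []
    for _ in range(length):
        z = rng.gauss(0.0, 1.0)
        v = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
        out.append(scale * z / math.sqrt(v / df))
    return out


def ensemble_modes(signal, decompose, trials, df=5, scale=0.1, seed=0):
    """Average the k-th mode over `trials` noisy decompositions (step 2.6)."""
    rng = random.Random(seed)
    sums = None
    for _ in range(trials):
        noise = t_noise(len(signal), df, scale, rng)   # a different sequence each trial
        noisy = [s + e for s, e in zip(signal, noise)]
        modes = decompose(noisy)
        if sums is None:
            sums = [[0.0] * len(signal) for _ in modes]
        for k in range(min(len(sums), len(modes))):
            for t, value in enumerate(modes[k]):
                sums[k][t] += value
    return [[v / trials for v in row] for row in sums]


def toy_decompose(x):
    """Hypothetical stand-in for D-ESMD: returns [detail, trend]."""
    n = len(x)
    trend = [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]
    detail = [a - b for a, b in zip(x, trend)]
    return [detail, trend]


signal = [math.sin(0.3 * i) for i in range(200)]
modes = ensemble_modes(signal, toy_decompose, trials=50)
```

Passing the decomposer in as a callable keeps the averaging logic independent of how the individual modes are extracted.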

Specifically, the symmetric midpoint interpolation method specifically comprises the following steps:

Step 3.1: let the input signal be y, and find all maximum points y_max and minimum points y_min of y;

Step 3.2: connect all adjacent extreme points and find the extreme midpoints y_mean:

y_mean = (y_max + y_min)/2

Step 3.3: find the symmetric midpoints y_m of the adjacent extreme midpoints, and interpolate y_m with the cubic spline interpolation method to obtain the final interpolation curve.
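The node computation in steps 3.1 to 3.3 can be sketched as follows. The patent gives no formula for the symmetric midpoint y_m, so this sketch assumes it is the midpoint between two adjacent extreme midpoints; that reading, and all function names, are hypothetical. The final cubic spline through the y_m nodes is omitted here and would normally be delegated to a numerical library.

```python
def local_extrema(y):
    """Indices of strict interior local maxima and minima (step 3.1).

    Flat plateaus are not treated as extrema in this simple sketch.
    """
    idx = []
    for i in range(1, len(y) - 1):
        if (y[i] - y[i - 1]) * (y[i + 1] - y[i]) < 0:
            idx.append(i)
    return idx


def midpoints(points):
    """Midpoint (t, value) of each pair of adjacent points."""
    return [((t0 + t1) / 2.0, (v0 + v1) / 2.0)
            for (t0, v0), (t1, v1) in zip(points, points[1:])]


def symmetric_midpoints(y):
    extrema = [(float(i), y[i]) for i in local_extrema(y)]
    y_mean = midpoints(extrema)  # step 3.2: extreme midpoints
    y_m = midpoints(y_mean)      # step 3.3 (assumed reading): midpoints of adjacent midpoints
    return y_mean, y_m


y = [0.0, 2.0, 0.0, -2.0, 0.0, 4.0, 0.0, -4.0, 0.0]
y_mean, y_m = symmetric_midpoints(y)
```

Each node is a (time index, value) pair, which is the form a spline routine needs as its knots.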

Specifically, the maximum number of screening iterations in step 2.4 is preferably 12.

Specifically, the specific calculation process of the permutation entropy is as follows:

Assume a time series x(i), i = 1, 2, …, N, of length N. Delay reconstruction of the series gives the following vectors:

$$X(i) = \{x(i),\, x(i+l),\, \ldots,\, x(i+(m-1)l)\}, \quad i = 1, 2, \ldots, N-(m-1)l$$

where l is the time delay and m is the reconstruction dimension. Arranging the m elements of X(i) in ascending order gives:

$$X_i' = \{x(i+(j_1-1)l) \le x(i+(j_2-1)l) \le \cdots \le x(i+(j_m-1)l)\}$$

Thus, each vector X(i) has a permutation sequence:

$$S_g = \{j_1, j_2, j_3, \ldots, j_m\}$$

In the formula, j denotes the index of the column in which each element of the reconstructed component is located.

There are m! different possible permutations. Let p_1, p_2, …, p_k (k ≤ m!) be the probabilities with which the permutations appear among the vectors X(i); the normalized permutation entropy is then:

$$H = -\frac{1}{\ln(m!)}\sum_{g=1}^{k} p_g \ln p_g$$

where N is the time series length, m is the reconstruction dimension and l is the time delay.
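The permutation entropy calculation described above can be sketched in a few lines. This follows the standard normalized definition (Shannon entropy of the ordinal-pattern frequencies divided by ln(m!)), which is assumed to match the formula lost from the source text; the function name and the tie-breaking rule (equal values keep their original order) are choices made for this sketch.

```python
import math
from collections import Counter


def permutation_entropy(x, m=3, delay=1):
    """Normalized permutation entropy of sequence x, in the range [0, 1]."""
    n_vectors = len(x) - (m - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and delay")
    counts = Counter()
    for i in range(n_vectors):
        window = [x[i + j * delay] for j in range(m)]
        # Ordinal pattern: the indices that sort the window in ascending order.
        pattern = tuple(sorted(range(m), key=lambda j: window[j]))
        counts[pattern] += 1
    h = -sum((c / n_vectors) * math.log(c / n_vectors) for c in counts.values())
    return h / math.log(math.factorial(m))


h_example = permutation_entropy([4, 7, 9, 10, 6, 11, 3], m=3, delay=1)
h_monotone = permutation_entropy(list(range(20)), m=3, delay=1)
```

A strictly increasing series produces a single ordinal pattern and therefore an entropy of zero, while an irregular series spreads its probability over many patterns and scores closer to one.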

The beneficial effects are as follows:

When the invention decomposes abnormal sounds in public places with D-ESMD, random T-distributed noise sequences are added to the abnormal sound signal to be decomposed, which reduces the decomposition deviation caused by background noise at the source and thereby greatly improves the ability to recognize abnormal sounds in public places. In addition, the invention takes the characteristics of both the abnormal sound and the public-place background noise into account, provides a D-ESMD method for extracting and identifying the features of abnormal sounds in public places, and decomposes the abnormal sound into a series of modal components that each contain a single frequency component. On the theoretical side, the interpolation method inside the ESMD, the termination condition for mode decomposition, and the number of screening iterations for the modal components are all improved, so that the modal components obtained by decomposition reflect the characteristics of public-place abnormal sounds at different scales.

Drawings

FIG. 1: flow chart of the method for extracting and identifying abnormal sound features in public places provided by the invention;

FIG. 2: decomposition of a simulated signal using the ESMD interpolation method;

FIG. 3: decomposition of the simulated signal using the improved interpolation method provided by the invention;

FIG. 4: comparison of the Receiver Operating Characteristic (ROC) curves of the invention and other abnormal sound feature extraction methods.

Detailed Description

The invention is explained in further detail below with reference to the drawings.

The core technology of the invention is the D-ESMD decomposition method, which improves on the ESMD decomposition method in the following respects.

First, an ESMD decomposition based on the T distribution is adopted to weaken the background noise components in the modal components, so that the features of abnormal sounds are extracted better. Specifically, a random T-distributed sequence is added to the sound signal to be identified, which weakens the background noise component in each modal component, reduces the decomposition deviation caused by background noise at the source, and improves the ability to extract abnormal sound features. The specific process is as follows:

Suppose the abnormal sound recorded in a public place is X(t). It generally consists of the real abnormal sound signal x(t) and the background noise signal N(t), that is:

$$X(t) = x(t) + N(t)$$

When ESMD is used to decompose X(t), each obtained mode M_j(t) likewise contains an abnormal sound component m_j(t) and a background noise component c_j(t), so that:

$$X(t) = \sum_{j=1}^{n} M_j(t) + r(t) = \sum_{j=1}^{n}\bigl[m_j(t) + c_j(t)\bigr] + r(t)$$

In the formula, n is the number of modal components and r(t) is the decomposition residue.

After k different T-distributed noise sequences n_i(t) are added to the signal X(t), the decompositions can be expressed as the following series of equations:

X(t)+n₁(t)＝m₁₁(t)+m₁₂(t)+…+m_1n(t)+c₁₁(t)+c₁₂(t)+…+c_1n(t)+r₁(t)

X(t)+n₂(t)＝m₂₁(t)+m₂₂(t)+…+m_2n(t)+c₂₁(t)+c₂₂(t)+…+c_2n(t)+r₂(t)

………

X(t)+n_i(t)＝m_i1(t)+m_i2(t)+…+m_in(t)+c_i1(t)+c_i2(t)+…+c_in(t)+r_i(t)

………

X(t)+n_k(t)＝m_k1(t)+m_k2(t)+…+m_kn(t)+c_k1(t)+c_k2(t)+…+c_kn(t)+r_k(t)

Adding the k equations gives:

$$kX(t) + \sum_{i=1}^{k} n_i(t) = \sum_{i=1}^{k}\sum_{j=1}^{n} m_{ij}(t) + \sum_{i=1}^{k}\sum_{j=1}^{n} c_{ij}(t) + \sum_{i=1}^{k} r_i(t)$$

Substituting X(t) = x(t) + N(t), the left-hand side becomes k·x(t) + k·N(t) + n_1(t) + n_2(t) + … + n_k(t). As k → ∞, the terms k·N(t) + n_1(t) + n_2(t) + … + n_k(t) and the c_ij(t) terms all approach zero, so the above equation becomes:

$$x(t) = \frac{1}{k}\sum_{i=1}^{k}\sum_{j=1}^{n} m_{ij}(t) + \frac{1}{k}\sum_{i=1}^{k} r_i(t)$$

As can be seen from the above formula, when random T-distributed noise sequences are added to the public-place abnormal sound k times and the modes of each order obtained by ESMD decomposition are averaged, the background noise components c(t) are eliminated and the influence of public-place background noise on the decomposition of the abnormal sound is reduced.
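One part of this argument is easy to check numerically: the average of k independent zero-mean T-distributed sequences shrinks roughly as 1/sqrt(k). The sketch below illustrates only that point. It does not verify the stated cancellation of the background components c_ij(t), and the degrees of freedom (5), sequence length and seed are arbitrary assumptions.

```python
import math
import random


def t_sample(df, rng):
    """One Student's t draw: standard normal over sqrt(chi-square(df) / df)."""
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)


def rms_of_mean_noise(k, length=500, df=5, seed=1):
    """RMS of the pointwise average of k independent t-noise sequences."""
    rng = random.Random(seed)
    mean = [0.0] * length
    for _ in range(k):
        for t in range(length):
            mean[t] += t_sample(df, rng) / k
    return math.sqrt(sum(v * v for v in mean) / length)


rms_small = rms_of_mean_noise(4)     # few noise additions
rms_large = rms_of_mean_noise(400)   # many noise additions
```

With 100 times as many noise additions, the residual noise level in the average is expected to be about 10 times smaller.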

Second, symmetric midpoint interpolation is adopted in place of the extreme-midpoint odd-even interpolation, improving the efficiency and accuracy of the ESMD decomposition at the source.

The symmetric midpoint interpolation method comprises the following steps:

Step 3.1: find all maximum points y_max and minimum points y_min of the original signal;

Step 3.2: connect all adjacent extreme points and find the extreme midpoints y_mean:

y_mean = (y_max + y_min)/2

Step 3.3: find the symmetric midpoints y_m of the adjacent extreme midpoints, and interpolate y_m with the cubic spline interpolation method to obtain the final interpolation curve.

A simulated signal z is decomposed using both symmetric midpoint interpolation and extreme-midpoint odd-even interpolation. The simulated signal z is assumed to consist of three sinusoidal components with different frequencies and amplitudes:

z = sin(20*π*t) + 1.5cos(40*π*t) + 2.5cos(80*π*t)
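For reference, the simulated signal can be generated as below. The sketch assumes the first term is sin(20πt), so that the three components lie at 10 Hz, 20 Hz and 40 Hz; the sampling rate and duration are arbitrary choices, since the patent does not state them.

```python
import math


def simulated_signal(duration=1.0, fs=1000):
    """Sample z(t) = sin(20*pi*t) + 1.5*cos(40*pi*t) + 2.5*cos(80*pi*t)."""
    ts = [n / fs for n in range(int(duration * fs))]
    z = [math.sin(20 * math.pi * t)
         + 1.5 * math.cos(40 * math.pi * t)
         + 2.5 * math.cos(80 * math.pi * t)
         for t in ts]
    return ts, z


ts, z = simulated_signal()
```

A signal with known components like this makes it possible to compare each decomposed mode against the component it should reproduce.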

As shown in FIG. 2, when the ESMD interpolation method is used to decompose the simulated signal, the resulting modes are distorted and deviate considerably in amplitude from the original signal. FIG. 3 shows the simulated signal decomposed with the improved interpolation method provided by the invention, which effectively alleviates the distortion caused by the endpoint ambiguity of the ESMD interpolation.

Third, the complexity of the modal components obtained by ESMD decomposition is checked with the permutation entropy algorithm. The result serves as the criterion for distinguishing abnormal sound from background noise, so that the effective abnormal sound components are obtained adaptively.

The specific calculation process of the permutation entropy is as follows:

Assume a time series x(i), i = 1, 2, …, N, of length N. Delay reconstruction of the series gives the following vectors:

$$X(i) = \{x(i),\, x(i+l),\, \ldots,\, x(i+(m-1)l)\}, \quad i = 1, 2, \ldots, N-(m-1)l$$

where l is the time delay and m is the reconstruction dimension. Arranging the m elements of X(i) in ascending order gives:

$$X_i' = \{x(i+(j_1-1)l) \le x(i+(j_2-1)l) \le \cdots \le x(i+(j_m-1)l)\}$$

Thus, each vector X(i) has a permutation sequence:

$$S_g = \{j_1, j_2, j_3, \ldots, j_m\}$$

There are m! different possible permutations. Let p_1, p_2, …, p_k (k ≤ m!) be the probabilities with which the permutations appear among the vectors X(i); the normalized permutation entropy is then:

$$H = -\frac{1}{\ln(m!)}\sum_{g=1}^{k} p_g \ln p_g$$

where N is the time series length, m is the reconstruction dimension and l is the time delay. According to the experimental results, the reconstruction dimension m is generally chosen between 3 and 7. The time delay has little influence on the permutation entropy and can generally be set to 1.

In the invention, a mode is selected by judging whether its permutation entropy H is greater than the threshold θ, where the modes are the modal components at different frequency scales obtained by decomposing the public-place abnormal sound signal to which the random T-distributed sequences have been added. Experiments show that abnormal sound features are extracted well when θ lies in the range 0.25 to 0.35.
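The selection rule can be sketched as a simple stopping loop over the modes in decomposition order. The entropy values below are made up for illustration, the function name is hypothetical, and the default θ = 0.3 is just one value inside the stated 0.25 to 0.35 range.

```python
def count_effective_modes(entropies, theta=0.3):
    """Number of leading modes whose permutation entropy exceeds theta.

    Following the text, a mode with entropy greater than theta counts as an
    abnormal sound component; the first mode at or below theta is treated as
    noise and ends the decomposition.
    """
    count = 0
    for h in entropies:
        if h <= theta:
            break
        count += 1
    return count


# Hypothetical entropies of successive modes, in decomposition order.
kept = count_effective_modes([0.82, 0.61, 0.37, 0.21, 0.40])
```

In this example the fourth mode falls below the threshold, so only the first three are kept, and the later mode with entropy 0.40 is never reached.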

Fourth, the number of screening iterations for the modal components.

The optimal number of screening iterations for the modes was determined through a large number of experiments; the preferred value is 12.

The invention uses the above improvements to extract and identify the features of abnormal sounds in public places. As shown in FIG. 1, the method mainly comprises three parts: decomposition, feature extraction, and identification of the abnormal sounds to be identified in public places.

The method comprises the following specific steps:

Step 1: input the abnormal sound signal to be identified in a public place and preprocess it.

Step 2: decompose the abnormal sound signal to be identified into a series of modal components with the improved extreme-point symmetric mode decomposition (D-ESMD) method, each order of modal component containing the features of the abnormal sound signal in a different frequency band.

Step 3: calculate the energy ratio of each order of modal component obtained in step 2 relative to the original abnormal sound signal, combine the energy ratios into a vector, and normalize it to serve as the feature vector of the abnormal sound signal to be identified.

Step 4: judge whether the feature vector is valid; if not, return to step 3; if so, go to step 5.

Step 5: identify the abnormal sound to be identified in the public place as follows: first, randomly select a certain number of training samples of each class from an established abnormal sound library, compute the feature vectors of the training samples through step 2 and step 3, and build an SVM classification model; then classify the feature vector of the abnormal sound to be identified with the established SVM classification model to obtain the classification and recognition result.

The D-ESMD feature extraction for the abnormal sounds to be identified in public places includes the following steps:

Step 2.3: find the extreme points of the noise-added abnormal sound signal X_i, connect adjacent extreme points, and denote the midpoints of the connecting line segments as F_i; supplement the left and right boundary points F_0 and F_n. Construct an interpolation curve L^* through the n+1 extreme midpoints using the symmetric midpoint interpolation method in place of the extreme-midpoint odd-even interpolation method of ESMD.

Step 2.4: take X_i - L^* as the input and repeat step 2.3 until the number of screening iterations reaches the maximum of 12, giving the first-order modal component M_1^i; calculate the permutation entropy of this modal component; if the permutation entropy is greater than a predetermined threshold θ, the component is regarded as an abnormal sound modal component, otherwise it is regarded as a noise component.

FIG. 4 compares the ROC curves of the invention and several other abnormal sound feature extraction methods. ESMD is the extreme-point symmetric mode decomposition method, EEMD is the ensemble empirical mode decomposition method, SaSEEMD is an ensemble empirical mode decomposition method based on the alpha-stable distribution, and ELMD is the ensemble local mean decomposition method. D-ESMD is the improved ESMD decomposition method provided by the invention.

Claims

1. A method for extracting and identifying abnormal sound features in public places is characterized by comprising the following steps: decomposing abnormal sounds to be identified in public places, extracting and identifying characteristics; the method comprises the following concrete steps:

step 1: inputting abnormal sounds to be identified in public places and preprocessing the abnormal sounds;

step 2: decomposing the abnormal sound signal to be identified by adopting an improved pole symmetric modal decomposition D-ESMD method to obtain modal components of each order, wherein each modal component respectively comprises the characteristics of the abnormal sound signal in different frequency bands;

step 3: calculating the energy ratio of each order of modal component obtained in the step 2 relative to the original abnormal sound signal, and combining the energy ratios into a vector form to carry out normalization processing to be used as a feature vector of the abnormal sound signal to be identified;

the D-ESMD decomposition method is characterized in that on the basis of a pole symmetric mode ESMD decomposition method, a random T distribution noise sequence is added to abnormal sounds to be identified in a public place, a symmetric midpoint interpolation method is adopted to replace an extreme value midpoint parity interpolation method of the ESMD, arrangement entropy values are calculated for decomposed mode components, the mode component screening times are improved, complexity detection of each mode is completed, and effective mode components of the abnormal sounds are obtained in a self-adaptive mode;

the abnormal sound library comprises explosion sound, scream sound, gunshot sound and glass breaking sound;

the D-ESMD decomposition method comprises the following specific processes:

step 2.6: if i < N, set i = i + 1 and repeat steps 2.2 to 2.5, adding a different T-distributed noise sequence each time, until N decompositions have been performed; take the ensemble average of all the modal components M_k^i obtained and use the result as the final modal component M_k of the signal to be decomposed:

$$M_k = \frac{1}{N}\sum_{i=1}^{N} M_k^{i}$$

2. The method for extracting and identifying the abnormal sound features in the public places according to claim 1, wherein the symmetrical midpoint interpolation method comprises the following specific steps:

y_mean＝(y_max+y_min)/2

3. The method for extracting and identifying the abnormal sound features in the public places according to claim 1, wherein the maximum value of the screening times in the step 2.4 is preferably 12.

4. The method for extracting and identifying the abnormal sound features in the public places according to claim 1, wherein the specific calculation process of the permutation entropy is as follows:

X'_i＝{x(i+(j₁-1)*l)≤x(i+(j₂-1)*l)≤…≤x(i+(j_m-1)*l)}

thus, each vector X(i) has a set of permutation sequences:

Sg＝{j₁,j₂,j₃,…j_m}

in the formula, j represents an index of a column where each element in the reconstruction component is located;