CN113792596A - Acoustic classification method and system based on preprocessing ensemble learning


Info

Publication number: CN113792596A
Application number: CN202110913549.XA
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Prior art keywords: data, normal, sound, training, sound data
Inventors: 周松斌, 万智勇, 刘忆森
Applicant / current assignee: Institute of Intelligent Manufacturing of Guangdong Academy of Sciences
Priority / filing date: 2021-08-10
Publication date: 2021-12-14
Priority to CN202110913549.XA

Classifications

    • G06F2218/08 — Aspects of pattern recognition specially adapted for signal processing; feature extraction
    • G01M13/045 — Testing of machine parts; bearings; acoustic or vibration analysis
    • G06F18/241 — Pattern recognition; analysing; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045 — Computing arrangements based on biological models; neural networks; combinations of networks
    • G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
    • G06F2218/12 — Aspects of pattern recognition specially adapted for signal processing; classification; matching


Abstract

The invention relates to an acoustic classification method and system based on preprocessing ensemble learning. In the method, a sound acquisition device is deployed at the object to be detected to receive the sound signal it emits; the collected sound signals are processed with several preprocessing algorithms to obtain the corresponding feature map sets; the same neural network algorithm is trained and modelled on each feature map set; and the results of the resulting models are fused by weighting and voting to finally identify whether the state of the detected object is normal, which greatly improves the robustness of the model. Because the invention judges the state of the object under test from the sound it emits, the object can be inspected rapidly and nondestructively, and the subjective influence of manual auscultation is avoided. Detection efficiency is improved and detection time is shortened, achieving fast, efficient and lossless detection of the object state.

Description

Acoustic classification method and system based on preprocessing ensemble learning
Technical Field
The invention relates to the technical field of detection, in particular to an acoustic classification method and system based on preprocessing ensemble learning.
Background
In recent years, acoustic defect detection, as one form of nondestructive testing, has been widely used in industrial production. However, most production lines still rely on trained workers listening to and checking the equipment. This manual inspection is inefficient, and its results are affected by the worker's mental state, operating state and working state, making it highly subjective and insufficiently stable. An acoustic classification method based on preprocessing ensemble learning is therefore urgently needed.
Disclosure of Invention
In order to overcome the defects of the prior-art methods, the invention provides an acoustic classification method and system based on preprocessing ensemble learning, which can perform rapid, nondestructive online classification and detection of the object under test.
To address this technical problem, the invention adopts the following scheme:
in a first aspect, the present invention provides an acoustic classification method based on preprocessing ensemble learning, including the following steps:
step S1, placing a sound collection device at the object to be detected and collecting two classes of sound data sets, normal sound data D1 and abnormal sound data D2;
step S2, dividing the two classes of data sets into a training set D and a test set T respectively, wherein the training set D comprises training normal sound samples D1 and training abnormal sound samples D2, and the test set T comprises test normal samples T1 and test abnormal samples T2;
step S3, applying 4 feature extraction methods to D1, D2, T1 and T2 to generate {D11, D12, ..., D1n}, {D21, ..., D2n}, {T11, ..., T1n} and {T21, ..., T2n};
step S4, inputting the generated n feature data sets of the two classes, {D11, D12, ..., D1n} and {D21, ..., D2n}, into n homogeneous convolutional-neural-network binary classifiers {M1, M2, ..., Mn} for training and modeling;
step S5, testing the n trained classifiers with the test set, counting the number of times a sample is misjudged as qualified, and obtaining the respective weight of each model;
step S6, in the real-time classification stage, computing the various preprocessed feature data {t1, t2, ..., tn} of the collected sound data t, inputting them into the n trained binary classifiers, and weighting the outputs of the n models according to the output formula to obtain the classification result.
Further, the step S1 specifically includes:
step S11, placing a microphone at the object to be detected and, while it operates, continuously acquiring normal and abnormal sound data sets according to whether the object is normal; storing the recordings in segments of duration T0 at a sampling frequency fc, so that each collected sound record has time length T0 and data length N = T0 * fc;
step S12, computing the kurtosis of each collected sound record and taking the difference of the kurtosis values; after the difference operation, determining the signal starting point S0 using a threshold (10-30) together with a fixed signal length N0, and intercepting the data according to the starting point S0 and N0 to finally obtain the intercepted D1 and D2 data sets:
Ki = (xi - x̄)^4 / σ^4
in the formula, Ki is the kurtosis at the ith point, xi is the ith sample point, x̄ is the mean of the data, and σ is the standard deviation of the data.
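As an illustration of step S12, the following Python sketch locates the starting point from the difference of the point-wise kurtosis and cuts a fixed-length segment; the exponential form of the kurtosis follows the definitions above, while the specific threshold and segment length are assumptions rather than values fixed by the patent.

import numpy as np

def intercept_segment(x, threshold=20.0, seg_len=8192):
    """Kurtosis-based interception of one sound record (step S12).

    threshold corresponds to the (10-30) range stated above; seg_len plays
    the role of the fixed signal length N0. Both values are illustrative."""
    mu = x.mean()
    sigma = x.std() + 1e-12
    k = ((x - mu) / sigma) ** 4              # point-wise kurtosis K_i
    dk = np.abs(np.diff(k))                  # difference of the kurtosis values
    above = np.nonzero(dk > threshold)[0]
    s0 = int(above[0]) if above.size else 0  # signal starting point S0
    return x[s0:s0 + seg_len]

# Building the intercepted D1 / D2 data sets from lists of raw recordings:
# D1 = np.stack([intercept_segment(x) for x in normal_recordings])
# D2 = np.stack([intercept_segment(x) for x in abnormal_recordings])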
Further, the step S2 specifically includes:
dividing the two classes of sound data sets D1 and D2 into a training set and a test set respectively;
the training set is denoted T = [{(D11,0),(D12,0),...,(D1m,0)}, {(D21,1),(D22,1),...,(D2m,1)}] and the test set P = [{(D11,0),(D12,0),...,(D1j,0)}, {(D21,1),(D22,1),...,(D2j,1)}], where label 0 indicates normal sound data and label 1 indicates abnormal sound data; the training set contains 2m sound records and the test set contains 2j.
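A minimal sketch of the labelling and splitting in step S2 (label 0 for normal, 1 for abnormal); the shuffling and the choice of m are assumptions, since the patent does not fix a split ratio.

import numpy as np

def make_train_test(d1, d2, m, seed=0):
    """Split the two sound classes into training and test sets (step S2)."""
    rng = np.random.default_rng(seed)
    d1 = [d1[i] for i in rng.permutation(len(d1))]
    d2 = [d2[i] for i in rng.permutation(len(d2))]
    train = [(x, 0) for x in d1[:m]] + [(x, 1) for x in d2[:m]]  # 2m samples
    test = [(x, 0) for x in d1[m:]] + [(x, 1) for x in d2[m:]]   # the rest
    return train, test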
Further, the step S3 specifically includes:
step S31, windowing and framing each intercepted sound record, with each frame of length h; the window function is a window sequence w(n) defined over the h points of a frame, where n is the point index;
after windowing each sound record frame by frame, a short-time Fourier transform of Nf ∈ [512, 1024, 4096] points is performed to obtain the time-frequency spectrum feature map data set Fstft:
Fstft = {Γ1(W*D11), ..., Γi(W*D1i)}
in the formula, Fstft is the time-frequency spectrum, i is the frame index, D1i is the ith frame data sequence of the first class of sound data, and Γ is the short-time Fourier transform;
step S32, applying an exponential operation to the obtained time-frequency spectrum data Fstft to adjust the differences between the energy amplitudes of the frequency bands and increase the amplitude of the frequencies of interest, giving the corresponding feature map Fv; the adjustment formula is:
F(fi) = fi^α
in the formula, fi is the ith frequency amplitude in each frame spectrum, α ∈ (0,3) is the amplitude adjustment factor, and F(fi) is the amplitude-adjusted spectrum of each frame;
step S33, extracting multichannel Mel features from the time-frequency spectrum feature data with a Mel filter bank to obtain the Mel spectrum feature map data Fmel:
Fmel = Fstft * Hmel
in the formula, Hmel is the Mel transformation matrix;
step S34, splitting each frame spectrum in the time-frequency spectrum data at the frequencies fa and fb into the bands 0-fa, fa-fb and the remainder, splicing the retained bands front to back so that only the effective frequency bands of the time-frequency spectrum are kept, and obtaining the spliced time-frequency spectrum feature map data.
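The four preprocessing methods of steps S31-S34 can be sketched as follows; librosa is used here for the STFT and the Mel filter bank, and the values of alpha, n_mels, f_a and f_b are illustrative assumptions (the patent only constrains alpha to (0,3) and leaves fa, fb as design parameters).

import numpy as np
import librosa

def preprocess_all(x, fs, n_fft=1024, alpha=0.5, n_mels=64, f_a=2000.0, f_b=8000.0):
    """Return the four feature maps of step S3 for one intercepted segment x."""
    # S31: framing + windowing + short-time Fourier transform -> F_stft
    f_stft = np.abs(librosa.stft(x, n_fft=n_fft, window="hann"))

    # S32: exponential (power-law) amplitude adjustment with factor alpha -> F_v
    f_v = f_stft ** alpha

    # S33: multichannel Mel features through a Mel filter bank -> F_mel
    h_mel = librosa.filters.mel(sr=fs, n_fft=n_fft, n_mels=n_mels)
    f_mel = h_mel @ f_stft

    # S34: split each frame spectrum at f_a and f_b and splice the kept bands
    freqs = librosa.fft_frequencies(sr=fs, n_fft=n_fft)
    band_low = f_stft[(freqs >= 0) & (freqs < f_a), :]
    band_mid = f_stft[(freqs >= f_a) & (freqs < f_b), :]
    f_splice = np.concatenate([band_low, band_mid], axis=0)

    return f_stft, f_v, f_mel, f_splice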
Further, the step S4 specifically includes:
step S41, applying the four preprocessing methods above to the original sound data sets D1 and D2, each preprocessing method generating one or more kinds of feature map data, giving n groups in total;
step S42, feeding the n (n odd) groups of preprocessed feature training set data into n binary-classification convolutional networks of the same type, and training them to construct the n binary classification models {M1, M2, ..., Mn}.
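The n homogeneous binary classifiers {M1, ..., Mn} of step S4 can be realised, for example, as small convolutional networks sharing one architecture; the Keras layer sizes and training hyper-parameters below are assumptions, since the patent does not specify a concrete network.

import tensorflow as tf

def build_binary_cnn(input_shape):
    """One of the n homogeneous two-class CNN models M_i (step S42)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),        # (freq, time, 1) feature map
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(label 1 = abnormal)
    ])

# One model per preprocessed feature group (n groups, n odd):
# models = []
# for x_train, y_train in feature_groups:
#     m = build_binary_cnn(x_train.shape[1:])
#     m.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
#     m.fit(x_train, y_train, epochs=20, batch_size=32, verbose=0)
#     models.append(m)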
Further, the step S5 specifically includes:
testing the n trained binary classifiers with the n groups of test set data obtained by the above preprocessing, obtaining the misjudgment rate {r1, r2, ..., rn} of the system on normal sounds and counting the number v of the n models that judge 'normal' in each prediction; the respective weight of each model is then obtained from its misjudgment rate on normal predictions, where wi, the weight of the ith model's result on normal predictions, is computed from its misjudgment rate ri using the natural constant e, so that a lower misjudgment rate yields a larger weight;
with v the number of the n models judged normal in each prediction, the proportion p of the models that judge normal in each prediction is:
p = v / n
the output probability Mout with which the final system predicts 'normal' is the weighted fusion of the per-model results, combining the probability pi that the ith model judges 'normal' with the weights wi.
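The weight and fusion formulas of step S5 appear only as images in the original publication; the sketch below therefore uses an exponential down-weighting of the misjudgment rate (consistent with the stated dependence on the natural constant e and on ri) and a normalised weighted average of the per-model probabilities — both are assumptions about the exact formulas, not the patent's literal expressions.

import numpy as np

def fuse_outputs(p_normal, r):
    """Weighted voting fusion of the n model outputs (steps S5/S6).

    p_normal: per-model probabilities of 'normal' for one sample, shape (n,)
    r:        per-model misjudgment rates on normal test samples, shape (n,)
    Returns the fused probability M_out and the vote fraction p = v / n."""
    p_normal = np.asarray(p_normal, dtype=float)
    r = np.asarray(r, dtype=float)
    w = np.exp(-r)                    # assumed weight: lower r_i -> larger w_i
    w = w / w.sum()                   # normalise the weights
    v = int((p_normal > 0.5).sum())   # number of models voting 'normal'
    p_vote = v / p_normal.size        # p = v / n
    m_out = float(w @ p_normal)       # assumed fusion: weighted average
    return m_out, p_vote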
Further, the step S6 specifically includes:
step S61, during real-time detection, intercepting the collected sound data t according to the signal starting point S0 and the length N0, and applying the 4 kinds of feature extraction (n extractions in total) to the intercepted data to obtain the n feature data {t1, t2, ..., tn};
step S62, feeding each processed feature datum into its corresponding trained binary classifier to obtain the output of each model;
step S63, using the weighting formula to obtain the system's final classification result for the device under test in the currently acquired working state.
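Putting the pieces together, one real-time classification pass of step S6 might look like the following; it reuses intercept_segment, preprocess_all and fuse_outputs sketched above, and the 0.5 decision threshold is an assumption.

import numpy as np

def classify_realtime(t_raw, fs, models, r, threshold=0.5):
    """Intercept, preprocess, score with the n classifiers and fuse (step S6)."""
    seg = intercept_segment(t_raw)          # interception by S0 and N0
    feats = preprocess_all(seg, fs)         # feature data {t1, ..., tn}
    p_abnormal = np.array([
        float(m.predict(f[np.newaxis, ..., np.newaxis], verbose=0)[0, 0])
        for m, f in zip(models, feats)
    ])
    p_normal = 1.0 - p_abnormal             # label 0 = normal
    m_out, _ = fuse_outputs(p_normal, r)
    return "normal" if m_out >= threshold else "abnormal"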
In a second aspect, the present invention provides an acoustic classification system based on pre-processing ensemble learning, including:
a data acquisition module, configured to place a sound acquisition device at the object to be detected and collect two classes of sound data sets, normal sound data D1 and abnormal sound data D2;
a sample classification module, configured to divide the two classes of data sets into a training set D and a test set T respectively, wherein the training set D comprises training normal sound samples D1 and training abnormal sound samples D2, and the test set T comprises test normal samples T1 and test abnormal samples T2;
a feature extraction module, configured to apply 4 feature extraction methods to D1, D2, T1 and T2 to generate {D11, D12, ..., D1n}, {D21, ..., D2n}, {T11, ..., T1n} and {T21, ..., T2n};
a training modeling module, configured to input the generated n feature data sets of the two classes, {D11, D12, ..., D1n} and {D21, ..., D2n}, into n homogeneous convolutional-neural-network binary classifiers {M1, M2, ..., Mn} for training and modeling;
a test statistics module, configured to test the n trained classifiers with the test set, count the number of times a sample is misjudged as qualified, and obtain the respective weight of each model;
a real-time classification module, configured to compute, in the real-time classification stage, the various preprocessed feature data {t1, t2, ..., tn} of the collected sound data t, input them into the n trained binary classifiers, and weight the outputs of the n models according to the output formula to obtain the classification result.
In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the acoustic classification method based on pre-processing ensemble learning when executing the program.
In a fourth aspect, the present invention provides a non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the acoustic classification method based on pre-processing ensemble learning.
The beneficial effects of this patent are as follows:
The invention realizes online detection of the working condition of the object to be detected by learning in advance the sound information of the object in its working state. The detection speed can be raised to dozens of times that of manual listening inspection, greatly improving detection efficiency. In addition, fusing the results of multiple AI models that learn different feature maps improves the stability and interference resistance of conventional online detection.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of the training process of the acoustic classification method based on pre-processing ensemble learning of the present invention;
FIG. 2 is a flow chart of a testing process of the acoustic classification method based on pre-processing ensemble learning of the present invention;
FIG. 3 is a flow chart of the detection process of the acoustic classification method based on pre-processing ensemble learning of the present invention;
FIG. 4 is a block diagram of an acoustic classification system based on pre-processing ensemble learning according to the present invention;
fig. 5 is a block diagram of an electronic device according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It should be noted that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and all other embodiments obtained by those skilled in the art without any inventive work based on the embodiments of the present invention belong to the protection scope of the present invention.
Example 1
As shown in fig. 1-3, the present invention provides an acoustic classification method based on preprocessing ensemble learning, comprising the following steps:
step S1, placing a sound collection device at the object to be detected and collecting two classes of sound data sets, normal sound data D1 and abnormal sound data D2;
step S2, dividing the two classes of data sets into a training set D and a test set T respectively, wherein the training set D comprises training normal sound samples D1 and training abnormal sound samples D2, and the test set T comprises test normal samples T1 and test abnormal samples T2;
step S3, applying 4 feature extraction methods to D1, D2, T1 and T2 to generate {D11, D12, ..., D1n}, {D21, ..., D2n}, {T11, ..., T1n} and {T21, ..., T2n};
step S4, inputting the generated n feature data sets of the two classes, {D11, D12, ..., D1n} and {D21, ..., D2n}, into n homogeneous convolutional-neural-network binary classifiers {M1, M2, ..., Mn} for training and modeling;
step S5, testing the n trained classifiers with the test set, counting the number of times a sample is misjudged as qualified, and obtaining the respective weight of each model;
step S6, in the real-time classification stage, computing the various preprocessed feature data {t1, t2, ..., tn} of the collected sound data t, inputting them into the n trained binary classifiers, and weighting the outputs of the n models according to the output formula to obtain the classification result.
The step S1 specifically includes:
step S11, placing a microphone at the object to be detected to acquire sound signals in the working state, and acquiring normal and abnormal sound data sets according to whether the running state of the object is normal; storing the recordings in segments of duration T0 at a sampling frequency fc, so that each collected sound record has time length T0 and data length N = T0 * fc.
Step S12, computing the kurtosis of each collected sound record and taking the difference of the kurtosis values. After the difference operation, the signal starting point S0 is determined using a threshold (10-30), the first position exceeding the threshold being taken as the starting point, together with a fixed signal length N0; the data are intercepted according to the starting point S0 and N0 to finally obtain the intercepted D1 and D2 data sets:
Ki = (xi - x̄)^4 / σ^4
in the formula, Ki is the kurtosis at the ith point, xi is the ith sample point, x̄ is the mean of the data, and σ is the standard deviation of the data.
The step S2 specifically includes:
dividing the two classes of sound data sets D1 and D2 into a training set and a test set respectively, and numbering and labeling them correspondingly;
the training set is denoted T = [{(D11,0),(D12,0),...,(D1m,0)}, {(D21,1),(D22,1),...,(D2m,1)}] and the test set P = [{(D11,0),(D12,0),...,(D1j,0)}, {(D21,1),(D22,1),...,(D2j,1)}], where label 0 represents normal sound data and label 1 represents abnormal sound data. The training set contains 2m sound records and the test set contains 2j.
The step S3 specifically includes:
step S31, windowing and framing each sound record, with each frame of length h; the window function is a window sequence w(n) defined over the h points of a frame, where n is the point index;
after windowing each sound record frame by frame, a short-time Fourier transform of Nf ∈ [512, 1024, 4096] points is performed to obtain the time-frequency spectrum feature map data set Fstft:
Fstft = {Γ1(W*D11), ..., Γi(W*D1i)}
in the formula, Fstft is the time-frequency spectrum, i is the frame index, D1i is the ith frame data sequence of the first class of sound data, and Γ is the short-time Fourier transform;
step S32, applying an exponential operation to the obtained time-frequency spectrum data Fstft to adjust the differences between the energy amplitudes of the frequency bands, obtaining the corresponding feature map data Fv; the adjustment formula is:
F(fi) = fi^α
in the formula, fi is the ith frequency amplitude in each frame spectrum, α ∈ (0,3) is the amplitude adjustment factor, and F(fi) is the amplitude-adjusted spectrum of each frame;
step S33, extracting multichannel Mel features from the time-frequency spectrum feature data with a Mel filter bank to obtain the Mel spectrum feature map data Fmel:
Fmel = Fstft * Hmel
in the formula, Hmel is the Mel transformation matrix;
step S34, splitting each frame spectrum in the time-frequency spectrum data at the frequencies fa and fb into the bands 0-fa, fa-fb and the remainder, and splicing the retained bands front to back to obtain the spliced time-frequency spectrum feature map data.
The step S4 specifically includes:
step S41, applying the above four preprocessing methods to the sound data sets D1 and D2, each preprocessing method generating one or more kinds of feature map data, giving n groups in total;
step S42, feeding all n (n odd) groups of feature training set data together with their labels into n binary-classification convolutional networks of the same type for training, and constructing the n binary classification models {M1, M2, ..., Mn}.
The step S5 specifically includes:
testing the n trained binary classifiers with the n groups of test set data obtained by the above preprocessing, obtaining the misjudgment rate {r1, r2, ..., rn} of the system on normal sounds and counting the number v of the n models that judge 'normal' in each prediction; the respective weight of each model is then obtained from its misjudgment rate on normal predictions, where wi, the weight of the ith model's result on normal predictions, is computed from its misjudgment rate ri using the natural constant e, so that a lower misjudgment rate yields a larger weight;
with v the number of the n models judged normal in each prediction, the proportion p of the models that judge normal in each prediction is:
p = v / n
the output probability Mout with which the final system predicts 'normal' is the weighted fusion of the per-model results, combining the probability pi that the ith model judges 'normal' with the weights wi.
The step S6 specifically includes:
step S61, during real-time detection, intercepting the collected sound data according to the signal starting point S0 and the length N0, and applying the 4 kinds of feature extraction (n extractions in total) to the intercepted data to obtain the n feature data {t1, t2, ..., tn};
step S62, feeding each processed feature datum into its corresponding trained binary classifier to obtain the output of each model;
step S63, using the weighting formula to obtain the system's final classification result for the device under test in the currently acquired working state.
Example 2
As shown in fig. 4, the present invention provides an acoustic classification system based on preprocessing ensemble learning, which includes a data acquisition module, a sample classification module, a feature extraction module, a training modeling module, a test statistics module, and a real-time classification module;
the data acquisition module is used for placing a sound acquisition device at the object to be detected and acquiring two classes of sound data sets, normal sound data D1 and abnormal sound data D2;
the sample classification module is used for dividing the two classes of data sets into a training set D and a test set T respectively, the training set D comprising training normal sound samples D1 and training abnormal sound samples D2, and the test set T comprising test normal samples T1 and test abnormal samples T2;
the feature extraction module is used for applying 4 feature extraction methods to D1, D2, T1 and T2 to generate {D11, D12, ..., D1n}, {D21, ..., D2n}, {T11, ..., T1n} and {T21, ..., T2n};
the training modeling module is used for inputting the generated n feature data sets of the two classes, {D11, D12, ..., D1n} and {D21, ..., D2n}, into n homogeneous convolutional-neural-network binary classifiers {M1, M2, ..., Mn} for training and modeling;
the test statistics module is used for testing the n trained classifiers with the test set, counting the number of times a sample is misjudged as qualified, and obtaining the respective weight of each model;
the real-time classification module is used for computing, in the real-time classification stage, the various preprocessed feature data {t1, t2, ..., tn} of the collected sound data t, inputting them into the n trained binary classifiers, and weighting the outputs of the n models according to the output formula to obtain the classification result.
Other features in this embodiment are the same as those in embodiment 1, and therefore are not described herein again.
Example 3
As shown in fig. 5, based on the same concept, the present invention provides an electronic device. The electronic device is a server, and the server may include a processor 810, a communication interface 820, a memory 830 and a communication bus 840, where the processor 810, the communication interface 820 and the memory 830 communicate with each other via the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform the steps of the acoustic classification method based on preprocessing ensemble learning described in the embodiments above.
In addition, the logic instructions in the memory 830 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
Example 4
Based on the same concept, the present invention also provides a non-transitory computer-readable storage medium storing a computer program, the computer program comprising at least one code section, which is executable by a master control device to control the master control device to implement the steps of the acoustic classification method based on pre-processing ensemble learning according to the embodiments.
The invention realizes online detection and classification of the working condition of the object to be detected by learning in advance the sound information of the object in its working state. Applying the invention can raise the detection speed to dozens of times that of manual listening inspection, greatly improving detection efficiency. Moreover, by fusing the results of multiple AI models that learn different feature maps, it offers better stability and interference resistance than conventional online detection methods.
Although the present invention has been described in detail with reference to the embodiments, it will be apparent to those skilled in the art that modifications, equivalents and improvements can still be made to the technical solutions of the foregoing embodiments or to some of their technical features, and such modifications, equivalents and improvements all fall within the spirit and principle of the present invention.

Claims (10)

1. An acoustic classification method based on preprocessing ensemble learning is characterized by comprising the following steps:
step S1, placing a sound collection device at the object to be detected and collecting two classes of sound data sets, normal sound data D1 and abnormal sound data D2;
step S2, dividing the two classes of data sets into a training set D and a test set T respectively, wherein the training set D comprises training normal sound samples D1 and training abnormal sound samples D2, and the test set T comprises test normal samples T1 and test abnormal samples T2;
step S3, applying 4 feature extraction methods to D1, D2, T1 and T2 to generate {D11, D12, ..., D1n}, {D21, ..., D2n}, {T11, ..., T1n} and {T21, ..., T2n};
step S4, inputting the generated n feature data sets of the two classes, {D11, D12, ..., D1n} and {D21, ..., D2n}, into n homogeneous convolutional-neural-network binary classifiers {M1, M2, ..., Mn} for training and modeling;
step S5, testing the n trained classifiers with the test set, counting the number of times a sample is misjudged as qualified, and obtaining the respective weight of each model;
step S6, in the real-time classification stage, computing the various preprocessed feature data {t1, t2, ..., tn} of the collected sound data t, inputting them into the n trained binary classifiers, and weighting the outputs of the n models according to the output formula to obtain the classification result.
2. The acoustic classification method based on pre-processing ensemble learning of claim 1, wherein the step S1 specifically includes:
step S11, placing a microphone at the object to be detected and, while it operates, continuously acquiring normal and abnormal sound data sets according to whether the object is normal; storing the recordings in segments of duration T0 at a sampling frequency fc, so that each collected sound record has time length T0 and data length N = T0 * fc;
step S12, computing the kurtosis of each collected sound record and taking the difference of the kurtosis values; after the difference operation, determining the signal starting point S0 using a threshold (10-30) together with a fixed signal length N0, and intercepting the data according to the starting point S0 and N0 to finally obtain the intercepted D1 and D2 data sets:
Ki = (xi - x̄)^4 / σ^4
in the formula, Ki is the kurtosis at the ith point, xi is the ith sample point, x̄ is the mean of the data, and σ is the standard deviation of the data.
3. The acoustic classification method based on pre-processing ensemble learning of claim 1, wherein the step S2 specifically includes:
dividing two types of sound data sets D1 and D2 into a training set and a test set respectively;
the training set is denoted T = [{(D11,0),(D12,0),...,(D1m,0)}, {(D21,1),(D22,1),...,(D2m,1)}] and the test set P = [{(D11,0),(D12,0),...,(D1j,0)}, {(D21,1),(D22,1),...,(D2j,1)}], wherein label 0 represents normal sound data and label 1 represents abnormal sound data; the training set contains 2m sound records and the test set contains 2j.
4. The acoustic classification method based on pre-processing ensemble learning of claim 1, wherein the step S3 specifically includes:
step S31, windowing and framing each sound record, with each frame of length h; the window function is a window sequence w(n) defined over the h points of a frame, where n is the point index;
after windowing each sound record frame by frame, a short-time Fourier transform of Nf ∈ [512, 1024, 4096] points is performed to obtain the time-frequency spectrum feature map data set Fstft:
Fstft = {Γ1(W*D11), ..., Γi(W*D1i)}
in the formula, Fstft is the time-frequency spectrum, i is the frame index, D1i is the ith frame data sequence of the first class of sound data, and Γ is the short-time Fourier transform;
step S32, applying an exponential operation to the obtained time-frequency spectrum data Fstft to adjust the differences between the energy amplitudes of the frequency bands, obtaining the corresponding feature map data Fv; the adjustment formula is:
F(fi) = fi^α
in the formula, fi is the ith frequency amplitude in each frame spectrum, α ∈ (0,3) is the amplitude adjustment factor, and F(fi) is the amplitude-adjusted spectrum of each frame;
step S33, extracting multichannel Mel features from the time-frequency spectrum feature data with a Mel filter bank to obtain the Mel spectrum feature map data Fmel:
Fmel = Fstft * Hmel
in the formula, Hmel is the Mel transformation matrix;
step S34, splitting each frame spectrum in the time-frequency spectrum data at the frequencies fa and fb into frequency bands, and splicing the retained bands front to back to obtain the spliced time-frequency spectrum feature map data.
5. The acoustic classification method based on pre-processing ensemble learning of claim 1, wherein the step S4 specifically includes:
step S41, applying the four preprocessing methods to the sound data sets D1 and D2, each preprocessing method generating one or more kinds of feature map data, giving n groups in total;
step S42, feeding all n groups of feature training set data into n binary-classification convolutional networks of the same type to construct the n binary classification models {M1, M2, ..., Mn}, where n is an odd number.
6. The acoustic classification method based on pre-processing ensemble learning of claim 1, wherein the step S5 specifically includes:
testing the n trained binary classifiers with the test set T on the n groups of feature data obtained by preprocessing, obtaining the misjudgment rate {r1, r2, ..., rn} of the system on normal predictions and counting the number v of the n models that judge 'normal' in each prediction; the respective weight of each model is then obtained from its misjudgment rate on normal predictions, where wi, the weight of the ith model's result on normal predictions, is computed from its misjudgment rate ri using the natural constant e, so that a lower misjudgment rate yields a larger weight;
with v the number of the n models judged normal in each prediction, the proportion p of the models that judge normal in each prediction is:
p = v / n
the output probability Mout with which the final system predicts 'normal' is the weighted fusion of the per-model results, combining the probability pi that the ith model judges 'normal' with the weights wi.
7. The acoustic classification method based on pre-processing ensemble learning of claim 1, wherein the step S6 specifically includes:
step S61, during real-time detection, intercepting the collected sound data t according to the signal starting point S0 and the length N0, and applying the 4 kinds of feature extraction (n extractions in total) to the intercepted data to obtain the n feature data {t1, t2, ..., tn};
step S62, feeding each processed feature datum into its corresponding trained binary classifier to obtain the output of each model;
step S63, using the weighting formula to obtain the system's final classification result for the device under test in the currently acquired working state.
8. An acoustic classification system based on pre-processing ensemble learning, comprising:
a data acquisition module for placing a sound acquisition device at the object to be detected and acquiring two classes of sound data sets, normal sound data D1 and abnormal sound data D2;
a sample classification module for dividing the two classes of data sets into a training set D and a test set T respectively, the training set D comprising training normal sound samples D1 and training abnormal sound samples D2, and the test set T comprising test normal samples T1 and test abnormal samples T2;
a feature extraction module for applying 4 feature extraction methods to D1, D2, T1 and T2 to generate {D11, D12, ..., D1n}, {D21, ..., D2n}, {T11, ..., T1n} and {T21, ..., T2n};
a training modeling module for inputting the generated n feature data sets of the two classes, {D11, D12, ..., D1n} and {D21, ..., D2n}, into n homogeneous convolutional-neural-network binary classifiers {M1, M2, ..., Mn} for training and modeling;
a test statistics module for testing the n trained classifiers with the test set, counting the number of times a sample is misjudged as qualified, and obtaining the respective weight of each model;
a real-time classification module for computing, in the real-time classification stage, the various preprocessed feature data {t1, t2, ..., tn} of the collected sound data t, inputting them into the n trained binary classifiers, and weighting the outputs of the n models according to the output formula to obtain the classification result.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the acoustic classification method based on pre-processing ensemble learning according to any of claims 1 to 7.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the acoustic classification method based on pre-processing ensemble learning according to any one of claims 1 to 7.
CN202110913549.XA, filed 2021-08-10 — Acoustic classification method and system based on preprocessing ensemble learning — Pending, CN113792596A (en)

Priority Applications (1)

Application number: CN202110913549.XA; priority/filing date: 2021-08-10; title: Acoustic classification method and system based on preprocessing ensemble learning

Publications (1)

Publication number: CN113792596A; publication date: 2021-12-14

Family

ID=78875812

Country Status (1)

Country: CN — CN113792596A (en)

Citations (3)

* Cited by examiner, † Cited by third party

  • CN107967917A — priority date 2016-10-19, published 2018-04-27, 福特全球技术公司: Vehicle-surroundings audio classification learned by neural network machine learning
  • CN110189769A — priority date 2019-05-23, published 2019-08-30, 复钧智能科技(苏州)有限公司: Abnormal sound detection method based on the coupling of multiple convolutional neural network models
  • CN113140229A — priority date 2021-04-21, published 2021-07-20, 上海泛德声学工程有限公司: Sound detection method based on neural network, industrial acoustic detection system and method


Legal Events

  • PB01 — Publication
  • SE01 — Entry into force of request for substantive examination
  • RJ01 — Rejection of invention patent application after publication

Application publication date: 20211214