CN115457966A - Pig cough sound identification method based on improved DS evidence theory multi-classifier fusion - Google Patents
Pig cough sound identification method based on improved DS evidence theory multi-classifier fusion
- Publication number
- CN115457966A (application CN202211128776.2A)
- Authority
- CN
- China
- Prior art keywords
- base
- classifier
- fusion
- classifiers
- improved
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING; G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/14—Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
Abstract
The application discloses a pig cough sound identification method based on improved DS evidence theory multi-classifier fusion, comprising the following steps: collecting sound segments of live pigs in a pigsty to obtain a corpus; obtaining a training set and a test set from the corpus, and extracting a plurality of acoustic features from each; inputting the acoustic features of the training set into a plurality of base classifiers and computing performance evaluation indexes for the base classifiers; screening the base classifiers according to these indexes to obtain the preferred base classifiers; training the preferred base classifiers on the training set to obtain the target training model; and inputting the test set into the target training model and fusing the outputs of the preferred base classifiers with the improved DS evidence theory to recognize pig cough sounds. The method improves DS fusion with distance fusion, which overcomes the unreliability of DS fusion for data near the decision boundary and can markedly improve the identification accuracy of live pig cough sounds.
Description
Technical Field
The application relates to the field of voice signal processing, in particular to a pig cough sound identification method based on improved DS evidence theory multi-classifier fusion.
Background
In the process of live pig farming, respiratory diseases have become one of the main constraints on the industry owing to their high fatality rate and strong infectivity, so a rapid and accurate early-warning method for respiratory disease is urgently needed. In recent years, research has shown that early warning of respiratory disease can be achieved by monitoring pig cough sounds, the key technology being recognition of those cough sounds. In the existing research, the main approaches focus on feature selection, feature fusion and classifier optimization to improve classification performance. However, these algorithms are all built on a single classifier model, are susceptible to environmental noise interference, and their classification accuracy is difficult to improve further.
Disclosure of Invention
The application identifies live pig cough sounds with an improved DS multi-classifier fusion method, which markedly improves sound identification accuracy.
In order to achieve the above object, the present application provides a pig cough sound identification method based on improved DS evidence theory multi-classifier fusion, comprising the steps of:
collecting sound segments of live pigs in a pigsty to obtain a corpus;
obtaining a training set and a test set based on the corpus, and extracting a plurality of acoustic features in the training set and the test set;
inputting a plurality of acoustic features in the training set into a plurality of base classifiers, and outputting to obtain a plurality of base classifier performance evaluation indexes;
screening the base classifier according to the performance evaluation index of the base classifier to obtain an optimal base classifier;
training the optimal base classifier by using the training set to complete a target training model;
and inputting the test set into the target training model, and fusing output results of the optimal base classifier by adopting an improved DS evidence theory to complete the pig cough sound recognition.
Preferably, the method for obtaining the training set and the test set includes:
labeling the corpus to obtain cough sound segments and non-cough sound segments;
and dividing the cough sound segments and the non-cough sound segments into a training set and a test set in a set proportion.
Preferably, the acoustic features include: Mel-frequency cepstral coefficients, linear prediction cepstral coefficients, Gammatone cepstral coefficients, and power spectral density.
Preferably, the base classifier includes: support vector machines, random forests and K nearest neighbors classifiers.
Preferably, the base classifier performance evaluation indexes include: classification accuracy, error similarity, and the comprehensive classification accuracy-error similarity evaluation, where the indexes are defined as follows:
assume the total number of cough and non-cough samples involved in classification is NA, the number of samples correctly classified by the ith base classifier is NR_i, the number correctly classified by the jth base classifier is NR_j, and the number of samples misclassified by both base classifiers i and j is NF_ij;
The classification accuracy of the ith base classifier is defined as:

OA_i = NR_i / NA
correspondingly, the classification accuracy of the jth base classifier is:

OA_j = NR_j / NA
where OA_i and OA_j represent the classification accuracies of base classifiers i and j, respectively;
thus OA represents the ratio of the number of correctly classified samples to the total number of samples, i.e. the classification accuracy, with value range [0,1]; meanwhile, the error similarity between the ith and jth base classifiers is defined as:

ESR_ij = NF_ij / NA
where ESR_ij represents the degree of error similarity between base classifiers i and j;
thus ESR represents the proportion of samples misclassified simultaneously by both base classifiers to the total number of samples, i.e. the degree of error similarity between the two, with value range [0,1]; the comprehensive classification accuracy-error similarity evaluation between the ith and jth base classifiers is defined as:
where OAESR_ij represents the comprehensive classification accuracy-error similarity evaluation between base classifiers i and j;
for a system consisting of N base classifiers, its OAESR is defined as:
wherein OAESR represents the classification precision-error similarity comprehensive evaluation index.
Preferably, the method for screening the plurality of base classifiers is as follows: a two-step screening method is adopted to obtain the preferred base classifiers, the two-step screening method comprising:
assume the number of initial base classifiers is L_1 and set an ESR threshold ESR_thr; compute OA for each base classifier and ESR_ij for each pair of base classifiers. If the ESR_ij between two base classifiers does not exceed ESR_thr, both are temporarily retained; if the ESR_ij exceeds ESR_thr, the error similarity of the two base classifiers is too high, so their OA values are compared, the classifier with the smaller OA is eliminated and the one with the larger OA is temporarily retained. All ESR values are traversed in this way, continually removing base classifiers with high ESR and low OA; the classifiers that finally remain are the preferred classifiers of the first screening, and their number is L_2. The preferred base classifiers are then formed into groups according to the number of classifiers fused, the OAESR values of all classifier combinations in each group are computed and sorted in descending order, an OAESR threshold OAESR_thr is set, and the combinations in each group that exceed the threshold are fused.
Preferably, the method for fusing the output results of the preferred base classifier by using the improved DS evidence theory includes:
assume each base classifier outputs proposition A_i with probability m_i(A_i); then, for n base classifiers, the DS-fused output is:

m(A) = (1 / (1 - K)) * Σ_{A_1 ∩ … ∩ A_n = A} ∏_{i=1}^{n} m_i(A_i)
where K represents the conflict coefficient, expressed as:

K = Σ_{A_1 ∩ … ∩ A_n = ∅} ∏_{i=1}^{n} m_i(A_i)
where Σ denotes summation, ∩ denotes intersection, and ∅ denotes the empty set. In DS fusion, a KNN output probability of 0 causes a one-vote veto: if any single base classifier outputs probability 0, the fused output probability is forced to 0, so fusion brings no advantage and may even degrade performance. The hard KNN outputs of 0 and 1 are therefore converted: an output of 1 is replaced by the probability α and an output of 0 by the probability 1 - α, where α ranges over (0.5, 1); the optimal α is obtained by linear search;
when the fusion result is close to the decision boundary 0.5, the DS fusion result is unreliable owing to noise interference and other factors, so a distance fusion algorithm is adopted to improve the DS fusion algorithm. For a test sample x, let P be the cough probability output after DS evidence theory fusion and RD the classification result of distance fusion; the final classification result R of the test sample is:

R = 1 if P > 1 - β; R = RD if β ≤ P ≤ 1 - β; R = 0 if P < β
where 1 denotes cough, 0 denotes non-cough, and β is the conversion boundary with value range [0.3, 0.5]; the optimal β for each base classifier fusion strategy is found by linear search.
Preferably, the process of using the distance fusion algorithm includes:
let x_i^j denote the ith feature vector of the jth training sample and y_i the ith feature vector of the current test sample, where i = 1, 2, 3, 4 index the Mel-frequency cepstral coefficients, linear prediction cepstral coefficients, Gammatone cepstral coefficients, and power spectral density, respectively. Let ρ(j) be a function that returns the class of the jth training sample, and let d(y_i, x_i^j) denote the distance between y_i and x_i^j, computed as the Manhattan distance. The fusion distance D_j from the current test sample to the jth training sample is defined as:
wherein,
where M represents the total number of training samples; the class R of the test sample is:

R = ρ(argmin_j D_j)
compared with the prior art, the application has the following beneficial effects:
The application identifies live pig cough sounds with an improved DS multi-classifier fusion method: combinations of different acoustic features and classifiers are screened with a two-step screening method, three indexes (OA, ESR and OAESR) are defined for the screening of the base classifiers, and the outputs of the screened base classifiers are fused with the improved DS algorithm to obtain the classification result. Different features and different classifiers are screened jointly, and DS fusion is improved with distance fusion, which solves the unreliability of DS fusion for data near the decision boundary; compared with existing algorithms, the method markedly improves the identification accuracy of live pig cough sounds.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the embodiments will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method in an embodiment of the present application;
FIG. 2 is a flow chart of a first-pass filter-based classifier algorithm in an embodiment of the present application;
fig. 3 is a flowchart of a second filtering-based classifier algorithm in the embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
As shown in fig. 1, a schematic flow chart of a method according to an embodiment of the present application includes:
Collect sound segments of live pigs in a pigsty to obtain a corpus. The corpus consists of labeled cough and non-cough segments collected in an actual pigsty; 1250 cough sounds and 1250 non-cough sounds are randomly selected from it to form the training and test sets. In this embodiment, all sound samples are divided into a training set and a test set in a ratio of 4. The sound signal is first preprocessed: it is band-pass filtered between 100 Hz and 16 kHz, then framed and windowed with a frame length of 20 ms, an overlap of 10 ms, and a Hamming window. The acoustic features of the sound signal are then extracted, including Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), Gammatone cepstral coefficients (GTCC), and power spectral density (PSD).
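The framing and windowing step above (20 ms frames, 10 ms overlap, Hamming window) can be sketched as follows; the function name is my own, and the 100 Hz to 16 kHz band-pass stage is omitted for brevity:

```python
import numpy as np

def frame_signal(signal, fs=16000, frame_ms=20, overlap_ms=10):
    """Split a 1-D signal into overlapping Hamming-windowed frames
    (20 ms frames with 10 ms overlap, as in the embodiment)."""
    frame_len = int(fs * frame_ms / 1000)
    hop = frame_len - int(fs * overlap_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return frames

x = np.random.randn(16000)   # 1 s of audio at 16 kHz
frames = frame_signal(x)
print(frames.shape)          # (99, 320): 20 ms frames, 10 ms hop
```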
The MFCC extraction process is as follows: apply a fast Fourier transform to the preprocessed signal to obtain its spectrum, compute the power spectrum, filter it through a bank of Mel filters, and apply a discrete cosine transform (DCT) to obtain the MFCC.
The LPCC is the representation of the linear prediction coefficients (LPC) in the cepstral domain. The LPC is a set of prediction coefficients determined directly from the sound signal that minimizes the prediction error between the actual signal and its linear prediction under the minimum mean-square error criterion. Applying the cepstral transform to the LPC yields the LPCC, which captures the envelope of the signal spectrum.
The GTCC extraction process is the same as for the MFCC except that the Mel filter bank is replaced with a Gammatone filter bank.
The PSD reflects how the power of the sound signal varies with frequency. PSD estimation methods include the autocorrelation function method, the periodogram method, and the averaged periodogram method. The most common is the averaged periodogram method: the signal is divided into a number of segments, the PSD is computed in each segment, and the results are averaged.
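The averaged-periodogram estimate described above corresponds to Welch's method; a minimal sketch using `scipy.signal.welch` with the 1024-point FFT from the embodiment (the 1 kHz test tone is purely illustrative):

```python
import numpy as np
from scipy.signal import welch

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)   # 1 kHz tone, 1 s long

# Averaged periodogram: split into segments, take a periodogram per
# segment, then average; nperseg=1024 matches the 1024-point FFT.
f, psd = welch(x, fs=fs, nperseg=1024)
peak_hz = f[np.argmax(psd)]
print(round(peak_hz))              # peak near 1000 Hz
```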
In the acoustic feature extraction, the MFCC order is 13, the LPCC order is 24, the GTCC order is 13, and a 1024-point FFT is used when computing the PSD.
Inputting different features into different classifiers yields different results. In this embodiment, each feature input into one classifier is called a base classifier, so four features and three classifiers give twelve distinct base classifiers. The base classifiers used are support vector machine (SVM), random forest (RF) and K-nearest-neighbor (KNN) classifiers. For convenience, C_i denotes the ith base classifier, with i from 1 to 12; the specific numbering rule is shown in Table 1. For example, C_1 is the base classifier obtained by inputting the LPCC into the SVM classifier. (C_i, C_j) denotes the fusion of C_i and C_j. The SVM uses an RBF kernel, the KNN uses the Manhattan distance, and the RF uses 100 decision trees.
TABLE 1
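The twelve base classifiers can be assembled as (feature, classifier) pairs with scikit-learn; since Table 1 is not reproduced in this text, the numbering below is only an assumed illustration that happens to make C1 the LPCC + SVM pair mentioned above:

```python
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Hyperparameters follow the embodiment: RBF-kernel SVM (probability
# outputs are needed later for DS fusion), 100-tree RF, Manhattan KNN.
FEATURES = ["LPCC", "MFCC", "GTCC", "PSD"]

def make_classifiers():
    base = {}
    k = 1
    for feat in FEATURES:
        for clf in (SVC(kernel="rbf", probability=True),
                    RandomForestClassifier(n_estimators=100),
                    KNeighborsClassifier(metric="manhattan")):
            base[f"C{k}"] = (feat, clf)
            k += 1
    return base

classifiers = make_classifiers()
print(len(classifiers))   # 12
```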
In this embodiment, accuracy and diversity indexes are defined to evaluate and screen the base classifiers: the Overall Accuracy (OA), the Error Similarity (ESR), and a fusion of the two, the Overall Accuracy-Error Similarity (OAESR). OA measures a classifier's overall recognition accuracy, ESR measures how similar the misclassified data of two classifiers are, and the combined index OAESR evaluates both the accuracy and the diversity of two classifiers. These indexes are used for the screening of the base classifiers.
Assume the total number of cough and non-cough samples involved in classification is NA, the number of samples correctly classified by the ith base classifier is NR_i, the number correctly classified by the jth base classifier is NR_j, and the number of samples misclassified by both base classifiers is NF_ij.
Then the OA of the ith base classifier is defined as:

OA_i = NR_i / NA
correspondingly, the OA of the jth base classifier is:

OA_j = NR_j / NA
OA represents the ratio of correctly classified samples to the total number of samples, with value range [0,1]; the ESR_ij between the ith and jth base classifiers is defined as:

ESR_ij = NF_ij / NA
ESR_ij represents the proportion of samples misclassified simultaneously by both base classifiers to the total number of samples, with value range [0,1]; the OAESR_ij between the ith and jth base classifiers is defined as:
for a system consisting of N base classifiers, its OAESR is defined as:
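The OA and ESR indexes defined above can be computed directly from prediction vectors; the OAESR formulas themselves appear only as images in the source, so only OA and ESR are sketched here:

```python
import numpy as np

def oa(pred, truth):
    """Overall accuracy: correctly classified / total, range [0, 1]."""
    return np.mean(pred == truth)

def esr(pred_i, pred_j, truth):
    """Error similarity: fraction of samples BOTH classifiers get wrong."""
    return np.mean((pred_i != truth) & (pred_j != truth))

truth  = np.array([1, 1, 1, 0, 0, 0, 1, 0])
pred_i = np.array([1, 1, 0, 0, 0, 1, 1, 0])   # 2 errors -> OA = 0.75
pred_j = np.array([1, 0, 0, 0, 0, 1, 1, 0])   # 3 errors -> OA = 0.625
print(oa(pred_i, truth), esr(pred_i, pred_j, truth))  # 0.75 0.25
```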
the screening process of the base classifiers is divided into two times, the first screening algorithm flow is shown in figure 2, and the number of the initial base classifiers is assumed to be L 1 Setting ESR threshold ESR thr Calculate OA and ESR between each base classifier ij Wherein the set A = { C 1 ,C 2 ,…C L1 }. If ESR between two base classifiers ij Not greater than a threshold ESR thr Then the base classifier is temporarily retained if the ESR between two base classifiers is present ij ESR greater than threshold thr If the error similarity of the two base classifiers is higher, continuing to judge the OA of the two base classifiers, eliminating the base classifiers with lower OA values, temporarily retaining the base classifiers with higher OA values, traversing all ESR, and continuously removing the base classifiers with higher ESR and lower OA values, wherein the finally retained base classifier is the preferred classifier after the first screening. After the first screening, the number of the preferred base classifiers is L 2 。
The OA and ESR of each base classifier were computed; in this embodiment the ESR threshold is set to 2.5%, and the base classifiers retained after the first screening are C_1, C_2, C_5, C_8 and C_9.
The L_2 base classifiers obtained in the first screening admit many combinations depending on how many classifiers are fused, so a further screening is required. In this embodiment, all combinations of the L_2 base classifiers are first enumerated and grouped by the number of fused classifiers; the OAESR values of all combinations in each group are computed and sorted in descending order. The combinations whose OAESR values fall in the top 20% of each group are taken for fusion, and the fusion results are then obtained and comparatively analyzed.
The five base classifiers obtained after the first screening are screened a second time, as shown in Fig. 3. Four groups of combinations of sizes 2, 3, 4 and 5 are obtained and denoted sets 2 to 5; set 5 contains only one combination, which is retained. The OAESR values of the combinations in each set are computed and the top 20% of each set are taken, yielding the final fusion strategies (C_2, C_5), (C_2, C_9), (C_1, C_2, C_5), (C_2, C_5, C_9), (C_1, C_2, C_5, C_8) and (C_1, C_2, C_5, C_8, C_9).
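The grouping-by-size and top-20% selection can be sketched with `itertools.combinations`; the scoring callback below is a toy stand-in for the patent's OAESR formula, which is not reproduced here:

```python
from itertools import combinations
from math import ceil

def candidate_fusions(classifiers, oaesr_of, top_frac=0.20):
    """Group the retained classifiers into sets by fusion size (2..N),
    rank each group by its OAESR score, and keep the top fraction
    (at least one combination per group)."""
    selected = []
    n = len(classifiers)
    for size in range(2, n + 1):
        group = sorted(combinations(classifiers, size),
                       key=oaesr_of, reverse=True)
        keep = max(1, ceil(len(group) * top_frac))
        selected.extend(group[:keep])
    return selected

clfs = ["C1", "C2", "C5", "C8", "C9"]
score = lambda combo: ("C2" in combo) + len(combo) / 10  # toy score
fusions = candidate_fusions(clfs, score)
print(len(fusions))   # 6 candidate strategies for 5 classifiers
```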
The screened combinations are fused with the improved DS multi-classifier fusion algorithm. Assume each base classifier outputs proposition A_i with probability m_i(A_i); then, for n base classifiers, the DS-fused output is:

m(A) = (1 / (1 - K)) * Σ_{A_1 ∩ … ∩ A_n = A} ∏_{i=1}^{n} m_i(A_i)
where K is the conflict coefficient, expressed as:

K = Σ_{A_1 ∩ … ∩ A_n = ∅} ∏_{i=1}^{n} m_i(A_i)
where Σ denotes summation, ∩ denotes intersection, and ∅ denotes the empty set. In DS fusion, a KNN output probability of 0 causes a one-vote veto: if any single base classifier outputs probability 0, the fused output probability is forced to 0, so fusion brings no advantage and performance may even degrade. The hard KNN outputs of 0 and 1 are therefore converted: an output of 1 is replaced by the probability α and an output of 0 by the probability 1 - α, where α ranges over (0.5, 1); the optimal α is obtained by linear search;
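For the two-class frame {cough, non-cough} with masses only on the singletons, Dempster's rule reduces to a product form; the sketch below also applies the α remapping of hard KNN outputs described above (α = 0.9 is an arbitrary illustration):

```python
import numpy as np

def remap_knn(p, alpha=0.9):
    """Replace hard KNN outputs: 1 -> alpha, 0 -> 1 - alpha, so a single
    zero cannot veto the whole fusion (alpha in (0.5, 1))."""
    if p == 1.0:
        return alpha
    if p == 0.0:
        return 1.0 - alpha
    return p

def ds_fuse(probs):
    """Dempster's rule for {cough, non-cough} with singleton masses:
    the conflict K collects the contradictory assignments and the
    surviving mass is renormalised."""
    p = np.prod(probs)                    # every classifier says cough
    q = np.prod(1.0 - np.asarray(probs))  # every classifier says non-cough
    k = 1.0 - p - q                       # conflict coefficient K
    return p / (1.0 - k)                  # fused P(cough)

probs = [remap_knn(x) for x in (0.8, 0.7, 1.0)]
print(round(float(ds_fuse(probs)), 3))
```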
when the fusion result is close to the decision boundary 0.5, the DS fusion result is unreliable owing to noise interference and other factors, so a distance fusion algorithm is adopted to improve the DS fusion algorithm. For a test sample x, let P be the cough probability output after DS evidence theory fusion and RD the classification result of distance fusion; the final classification result R of the test sample is:

R = 1 if P > 1 - β; R = RD if β ≤ P ≤ 1 - β; R = 0 if P < β
where 1 denotes cough, 0 denotes non-cough, and β is the conversion boundary with value range [0.3, 0.5]; the optimal β for each base classifier fusion strategy is found by linear search.
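A sketch of the conversion rule as I read it: the exact piecewise formula is an image in the source, so the thresholding below (trust DS outside [β, 1 - β], otherwise fall back on the distance result) is a reconstruction consistent with the surrounding text:

```python
def final_decision(p_ds, rd, beta=0.4):
    """p_ds: fused cough probability from DS evidence theory.
    rd: distance-fusion label (0 or 1), used near the 0.5 boundary.
    beta: conversion boundary in [0.3, 0.5] per the text."""
    if p_ds > 1.0 - beta:
        return 1      # DS is confidently cough
    if p_ds < beta:
        return 0      # DS is confidently non-cough
    return rd         # near the boundary: use the distance-fusion result

print(final_decision(0.95, rd=0))   # 1  (DS confident)
print(final_decision(0.52, rd=0))   # 0  (near boundary -> distance result)
```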
The distance fusion algorithm is as follows. Let x_i^j denote the ith feature vector of the jth training sample and y_i the ith feature vector of the current test sample, where i = 1, 2, 3, 4 index the LPCC, MFCC, GTCC and PSD, respectively. Let ρ(j) be a function that returns the class of the jth training sample, and let d(y_i, x_i^j) denote the distance between y_i and x_i^j, computed as the Manhattan distance. The fusion distance D_j from the current test sample to the jth training sample is defined as:
wherein,
where M represents the total number of training samples; the class R of the test sample is:

R = ρ(argmin_j D_j)
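A sketch of the distance fusion: the test sample takes the label of the training sample with the smallest fused Manhattan distance over the four feature vectors. The patent's weight formula is an image and is not reproduced, so the uniform weights here are an assumption:

```python
import numpy as np

def distance_fusion(test_feats, train_feats, train_labels, weights=None):
    """test_feats: list of 4 feature vectors y_i of the test sample.
    train_feats: per training sample, a list of 4 feature vectors x_i^j.
    Returns rho(argmin_j D_j), the label of the nearest training sample
    under the weighted sum of per-feature Manhattan distances."""
    if weights is None:
        weights = np.ones(len(test_feats)) / len(test_feats)
    fused = []
    for sample in train_feats:
        d = sum(w * np.sum(np.abs(y - x))       # Manhattan distance
                for w, y, x in zip(weights, test_feats, sample))
        fused.append(d)
    return train_labels[int(np.argmin(fused))]

y = [np.array([1.0, 1.0])] * 4
train = [[np.array([0.0, 0.0])] * 4,            # far from y
         [np.array([1.0, 0.9])] * 4]            # near y
print(distance_fusion(y, train, train_labels=[0, 1]))   # 1
```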
in the embodiment, the KNN output probabilities are 0 and 1, and the conversion parameter α finds the optimal value by means of linear search. And replacing the DS fusion samples close to the decision boundary beta by adopting the classification result of distance fusion. The decision boundary of each combination may be different, and the optimal boundary value needs to be obtained by means of linear search.
The results of the different combinations are compared and analyzed, and a suitable base classifier fusion scheme is selected according to classification accuracy and computational complexity.
The above-described embodiments are merely illustrative of the preferred embodiments of the present application, and do not limit the scope of the present application, and various modifications and improvements made to the technical solutions of the present application by those skilled in the art without departing from the spirit of the present application should fall within the protection scope defined by the claims of the present application.
Claims (8)
1. The pig cough sound identification method based on the improved DS evidence theory multi-classifier fusion is characterized by comprising the following steps of:
collecting sound fragments of live pigs in a pigsty to obtain a corpus;
obtaining a training set and a test set based on the corpus, and extracting a plurality of acoustic features in the training set and the test set;
inputting a plurality of acoustic features in the training set into a plurality of base classifiers, and outputting to obtain a plurality of base classifier performance evaluation indexes;
screening the base classifier according to the performance evaluation index of the base classifier to obtain an optimal base classifier;
training the optimal base classifier by using the training set to complete a target training model;
and inputting the test set into the target training model, and fusing output results of the optimal base classifier by adopting an improved DS evidence theory to complete the pig cough sound recognition.
2. The improved DS evidence theory multi-classifier fusion-based pig cough sound identification method according to claim 1, wherein the method for obtaining the training set and the test set comprises:
labeling the corpus to obtain cough sound segments and non-cough sound segments;
and dividing the cough sound segments and the non-cough sound segments into a training set and a test set in a set proportion.
3. The improved DS evidence theory multi-classifier fusion-based pig cough sound identification method according to claim 1, wherein the acoustic features comprise: Mel-frequency cepstral coefficients, linear prediction cepstral coefficients, Gammatone cepstral coefficients, and power spectral density.
4. The improved DS evidence theory multi-classifier fusion-based pig cough sound identification method according to claim 1, wherein the base classifier comprises: support vector machine, random forest and K nearest neighbors classifier.
5. The improved DS evidence theory multi-classifier fusion-based pig cough sound identification method according to claim 1, wherein the base classifier performance evaluation indexes comprise: the classification accuracy, the error similarity, and the comprehensive classification accuracy-error similarity evaluation, which are defined as follows:
assuming that the total number of cough and non-cough samples involved in classification is NA, the number of samples correctly classified by the i-th base classifier is NR_i, the number of samples correctly classified by the j-th base classifier is NR_j, and the number of samples misclassified simultaneously by both base classifiers i and j is NF_ij;
the classification accuracy of the i-th base classifier is defined as:
OA_i = NR_i / NA
and correspondingly, the classification accuracy of the j-th base classifier is:
OA_j = NR_j / NA
where OA_i and OA_j represent the classification accuracies of base classifiers i and j, respectively;
thus OA represents the ratio of the number of correctly classified samples to the total number of samples, i.e., the classification accuracy, and its value range is [0,1]; meanwhile, the error similarity between the i-th and j-th base classifiers is defined as:
ESR_ij = NF_ij / NA
where ESR_ij represents the degree of error similarity between the two base classifiers i and j;
thus ESR represents the ratio of the number of samples misclassified simultaneously by the two base classifiers to the total number of samples, i.e., the degree of error similarity between them, and its value range is [0,1]; the comprehensive classification accuracy-error similarity evaluation between the i-th and j-th base classifiers is defined as:
where OAESR_ij represents the comprehensive classification accuracy-error similarity evaluation between base classifiers i and j;
for a system consisting of N base classifiers, its OAESR is defined as:
where OAESR represents the comprehensive classification accuracy-error similarity evaluation index of the system.
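The OA and ESR indexes above can be computed directly from prediction vectors. A minimal sketch follows; the patent's exact OAESR combination formula is not reproduced in the extracted text, so only OA and ESR are shown:

```python
import numpy as np

def oa(pred, true):
    # OA_i = NR_i / NA: fraction of correctly classified samples
    pred, true = np.asarray(pred), np.asarray(true)
    return float(np.mean(pred == true))

def esr(pred_i, pred_j, true):
    # ESR_ij = NF_ij / NA: fraction of samples that both classifiers
    # misclassify simultaneously
    pred_i, pred_j, true = map(np.asarray, (pred_i, pred_j, true))
    return float(np.mean((pred_i != true) & (pred_j != true)))
```

For example, two classifiers that are each 50% accurate but share only one wrong sample out of four have ESR = 0.25, indicating their errors are largely complementary.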
6. The improved DS evidence theory multi-classifier fusion-based pig cough sound identification method according to claim 5, wherein the base classifiers are screened by a two-step screening method to obtain the preferred base classifiers, the two-step screening method comprising:
assuming the number of initial base classifiers is L_1, setting a threshold ESR_thr for the error similarity, and calculating the OA of each base classifier and the pairwise ESR_ij; if the ESR_ij between two base classifiers is not greater than the threshold ESR_thr, both base classifiers are temporarily retained; if the ESR_ij between two base classifiers is greater than the threshold ESR_thr, the error similarity between the two is high, so their OA values are compared, the base classifier with the smaller OA is eliminated, and the one with the larger OA is temporarily retained; all ESR_ij values are traversed in this way, continuously removing base classifiers with high ESR and low OA, and the L_2 base classifiers remaining after this first screening are the preferred base classifiers; in the second step, the preferred base classifiers are formed into groups by combination size, the OAESR values of all base classifier combinations in each group are calculated and sorted in descending order, a threshold OAESR_thr is set, and the combinations in each group exceeding the threshold are fused.
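The first screening step of claim 6 can be sketched as follows (a hypothetical helper; the second step's grouping and OAESR ranking is omitted because the OAESR formula is not reproduced in the extracted text):

```python
def first_screening(oas, esr_matrix, esr_thr):
    """From each pair of base classifiers whose pairwise ESR exceeds
    esr_thr, drop the one with the lower OA; keep everything else.

    oas: per-classifier OA values; esr_matrix[i][j]: pairwise ESR.
    Returns the indices of the retained (preferred) base classifiers."""
    n = len(oas)
    keep = set(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if i in keep and j in keep and esr_matrix[i][j] > esr_thr:
                # errors too similar: eliminate the less accurate one
                keep.discard(i if oas[i] < oas[j] else j)
    return sorted(keep)
```

Classifiers whose errors strongly overlap add little diversity to the ensemble, which is why the less accurate member of each high-ESR pair is the one removed.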
7. The improved DS evidence theory multi-classifier fusion-based pig cough sound identification method according to claim 3, wherein the method for fusing the output results of the preferred base classifiers using the improved DS evidence theory comprises:
assuming each base classifier i outputs proposition A_i with probability m_i(A_i), then for n base classifiers, the DS-fused output is:
m(A) = [ Σ_{A_1 ∩ A_2 ∩ … ∩ A_n = A} m_1(A_1) m_2(A_2) … m_n(A_n) ] / (1 − K)
where K represents the conflict coefficient, expressed as:
K = Σ_{A_1 ∩ A_2 ∩ … ∩ A_n = ∅} m_1(A_1) m_2(A_2) … m_n(A_n)
where Σ denotes summation, ∩ denotes intersection, and ∅ denotes the empty set; in DS fusion, when the KNN output probability is 0, a one-vote veto occurs: if a single base classifier outputs probability 0, the fused output is forced to 0, so the fusion yields no benefit and performance may even degrade; therefore, KNN output probabilities of 0 and 1 are converted: an output probability of 1 is replaced by α and an output probability of 0 by 1 − α, where α has the value range (0.5, 1), and a linear search method can be used to obtain the optimal α value;
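For the two-class cough/non-cough frame of discernment with mass only on the singletons, Dempster's rule and the 0/1 conversion above reduce to the following sketch (α = 0.9 is an assumed value, not taken from the patent):

```python
import numpy as np

def ds_fuse(cough_probs, alpha=0.9):
    """Fuse per-classifier cough probabilities with Dempster's rule,
    first softening hard 0/1 outputs to (1 - alpha)/alpha to avoid
    the one-vote veto."""
    p = np.array([alpha if q == 1.0 else (1.0 - alpha) if q == 0.0 else q
                  for q in cough_probs], dtype=float)
    m_cough = np.prod(p)          # joint mass on "cough"
    m_non = np.prod(1.0 - p)      # joint mass on "non-cough"
    # normalisation by 1 - K, K being the conflict mass
    return m_cough / (m_cough + m_non)
```

Note that a hard 0 from one classifier no longer annihilates the fused result: it is softened to 1 − α, so the other classifiers' evidence still contributes.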
when the fusion result is close to the decision boundary of 0.5, the DS fusion result is unreliable owing to factors such as noise interference, so the DS fusion algorithm is improved with a distance fusion algorithm; for a test sample x, assuming the probability of cough output after DS evidence theory fusion is P and the classification result of distance fusion is RD, the final classification result R of the test sample is:
where 1 represents cough, 0 represents non-cough, and β is a conversion boundary with value range [0.3, 0.5]; the optimal β value is searched for each base classifier fusion strategy by a linear search method.
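Since the extracted text omits the piecewise formula for R, the sketch below encodes one plausible reading of claim 7: trust the DS result when P is far from 0.5, and fall back to the distance-fusion result RD inside the conversion boundary β (β = 0.4 is an assumed value):

```python
def final_decision(p_ds, rd, beta=0.4):
    # Assumed decision rule, not the patent's exact formula:
    # decisive DS probabilities are thresholded directly, while the
    # ambiguous band [beta, 1 - beta] defers to distance fusion.
    if p_ds > 1.0 - beta:
        return 1            # cough
    if p_ds < beta:
        return 0            # non-cough
    return rd               # near the 0.5 boundary: use RD
```

The width of the ambiguous band (controlled by β) trades off how often the noisier DS result is overridden by the distance-fusion fallback.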
8. The improved DS evidence theory multi-classifier fusion-based pig cough sound identification method according to claim 7, wherein the process of adopting the distance fusion algorithm comprises:
let x_i^(j) denote the i-th feature vector of the j-th training sample and y_i the i-th feature vector of the current test sample, where i = 1, 2, 3, 4 indexes the Mel-frequency cepstral coefficients, the linear prediction cepstral coefficients, the gammatone cepstral coefficients, and the power spectral density, respectively; let p(j) be a function that returns the class of the j-th training sample, and let d(y_i, x_i^(j)) denote the distance between y_i and x_i^(j), computed as the Manhattan distance; the fused distance D_j from the current test sample to the j-th training sample is defined as:
wherein,
where M represents the total number of training samples; the class R of the test sample is then R = p(argmin_{j=1..M} D_j), i.e., the class of the training sample with the minimum fused distance.
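A sketch of the distance-fusion classifier of claim 8, assuming the fused distance D_j is a weighted sum of per-feature Manhattan distances (the patent's weight formula is not reproduced in the extracted text, so equal weights are assumed here):

```python
import numpy as np

def distance_fusion_classify(test_feats, train_feats, train_labels,
                             weights=None):
    """test_feats: four feature vectors (MFCC, LPCC, GTCC, PSD) of the
    test sample; train_feats: for each training sample, the same four
    vectors. Returns the label of the training sample with the minimum
    fused distance."""
    if weights is None:
        weights = [1.0] * len(test_feats)
    fused = []
    for sample in train_feats:
        # Manhattan distance per feature type, combined by weights
        d = sum(w * float(np.sum(np.abs(np.asarray(y) - np.asarray(x))))
                for w, y, x in zip(weights, test_feats, sample))
        fused.append(d)
    return train_labels[int(np.argmin(fused))]
```

This is effectively a 1-nearest-neighbor rule in the fused distance, which is what makes its result RD a useful tiebreaker when the DS probability sits near 0.5.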
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211128776.2A CN115457966B (en) | 2022-09-16 | 2022-09-16 | Pig cough sound identification method based on improved DS evidence theory multi-classifier fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115457966A true CN115457966A (en) | 2022-12-09 |
CN115457966B CN115457966B (en) | 2023-05-12 |
Family
ID=84305069
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117423342A (en) * | 2023-10-27 | 2024-01-19 | 东北农业大学 | Pig abnormal state monitoring method and system based on edge calculation |
CN117647587A (en) * | 2024-01-30 | 2024-03-05 | 浙江大学海南研究院 | Acoustic emission signal classification method, computer equipment and medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140336537A1 (en) * | 2011-09-15 | 2014-11-13 | University Of Washington Through Its Center For Commercialization | Cough detecting methods and devices for detecting coughs |
CN106847262A (en) * | 2016-12-28 | 2017-06-13 | 华中农业大学 | A kind of porcine respiratory disease automatic identification alarm method |
CN109471942A (en) * | 2018-11-07 | 2019-03-15 | 合肥工业大学 | Chinese comment sensibility classification method and device based on evidential reasoning rule |
CN112861984A (en) * | 2021-02-25 | 2021-05-28 | 西华大学 | Speech emotion classification method based on feature fusion and ensemble learning |
CN113240034A (en) * | 2021-05-25 | 2021-08-10 | 北京理工大学 | Depth decision fusion method based on entropy method and D-S evidence theory |
CN114330453A (en) * | 2022-01-05 | 2022-04-12 | 东北农业大学 | Live pig cough sound identification method based on fusion of acoustic features and visual features |
CN114330454A (en) * | 2022-01-05 | 2022-04-12 | 东北农业大学 | Live pig cough sound identification method based on DS evidence theory fusion characteristics |
Non-Patent Citations (2)
Title |
---|
黎煊; 赵建; 高云; 刘望宏; 雷明刚; 谭鹤群: "Recognition of continuous pig coughing sounds based on continuous speech recognition technology" (基于连续语音识别技术的猪连续咳嗽声识别), 农业工程学报 (Transactions of the Chinese Society of Agricultural Engineering) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110120218B (en) | Method for identifying highway large-scale vehicles based on GMM-HMM | |
CN115457966A (en) | Pig cough sound identification method based on improved DS evidence theory multi-classifier fusion | |
US7177808B2 (en) | Method for improving speaker identification by determining usable speech | |
CN104795064B (en) | The recognition methods of sound event under low signal-to-noise ratio sound field scape | |
CN102915729B (en) | Speech keyword spotting system and system and method of creating dictionary for the speech keyword spotting system | |
CN103730130A (en) | Detection method and system for pathological voice | |
CN102799899A (en) | Special audio event layered and generalized identification method based on SVM (Support Vector Machine) and GMM (Gaussian Mixture Model) | |
US9043207B2 (en) | Speaker recognition from telephone calls | |
CN112259104A (en) | Training device of voiceprint recognition model | |
CN106910495A (en) | A kind of audio classification system and method for being applied to abnormal sound detection | |
CN115101076B (en) | Speaker clustering method based on multi-scale channel separation convolution feature extraction | |
CN112015874A (en) | Student mental health accompany conversation system | |
Xiao et al. | AMResNet: An automatic recognition model of bird sounds in real environment | |
Ge et al. | Speaker change detection using features through a neural network speaker classifier | |
CN107886071A (en) | A kind of processing method of fibre reinforced composites damage acoustic emission signal | |
CN114822557A (en) | Method, device, equipment and storage medium for distinguishing different sounds in classroom | |
CN114373453A (en) | Voice keyword detection method based on motion trail and discriminative information | |
CN117727307B (en) | Bird voice intelligent recognition method based on feature fusion | |
CN113571050A (en) | Voice depression state identification method based on Attention and Bi-LSTM | |
CN109036390B (en) | Broadcast keyword identification method based on integrated gradient elevator | |
Aurchana et al. | Musical instruments sound classification using GMM | |
Kumar et al. | Parkinson’s Speech Detection Using YAMNet | |
CN118098288B (en) | Weak supervision voice depression detection method based on self-learning label correction | |
Muthusamy et al. | A review of research in automatic language identification | |
Li et al. | The analysis on the acoustic parameters of distinctive features for Mandarin vowels |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||