CN114550697B - Voice sample equalization method combining mixed sampling and random forest

Info

Publication number: CN114550697B
Application number: CN202210083571.0A
Authority: CN (China)
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN114550697A
Inventors: 张晓俊, 周长伟, 朱欣程, 陶智, 赵鹤鸣
Assignee (current and original): Suzhou University
Application filed by Suzhou University; application granted; publication of CN114550697B.


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 - Classification techniques
    • G06F 18/243 - Classification techniques relating to the number of classes
    • G06F 18/24323 - Tree-organised classifiers


Abstract

The invention relates to a voice sample equalization method combining mixed sampling and random forest. First, feature extraction is performed on an initial voice data set. The extracted voice data feature set is then equalized using SMOTE-ENN mixed sampling to obtain a current equalized voice data set. Next, the current equalized voice data set is input into a double-factor random forest model, which outputs a classification evaluation index and an out-of-bag misclassification rate. Finally, whether the classification evaluation index has converged is judged: if it has converged, the current equalized voice data set is output; otherwise, the mixed sampling rate of the SMOTE-ENN mixed sampling is updated according to the out-of-bag misclassification rate, and equalization of the extracted voice data set is performed again until the classification evaluation index converges and the current equalized voice data set is output. The invention retains sample data of high information value to the maximum extent.

Description

Voice sample equalization method combining mixed sampling and random forest
Technical Field
The invention relates to the technical field of data processing, and in particular to a voice sample equalization method, apparatus and device combining mixed sampling and random forest, and to a computer-readable storage medium.
Background
In recent years, artificial intelligence technology has made breakthrough progress in speech recognition. However, data imbalance remains a challenging problem in machine learning. Data with unevenly distributed classes biases the recognition ability of a classifier markedly toward the majority classes, so satisfactory classification performance on the minority classes cannot be achieved.
At present, traditional imbalanced-learning techniques for the imbalanced-data classification problem can be divided into two categories: internal methods and external methods. Internal methods improve existing classification algorithms to reduce their sensitivity to class imbalance. External methods preprocess the training data to balance it. Among external methods, the sampling methods for balancing imbalanced data sets include SMOTE oversampling and ENN undersampling.
The basic idea of SMOTE oversampling is to analyze the minority-class samples and artificially synthesize new samples from them to add to the data set; however, the distribution of nearby majority-class samples is not considered when new samples are generated, the k-nearest-neighbor selection is blind, much noise is introduced, and synthetic samples invade the majority-class sample space. ENN undersampling eliminates majority-class samples to obtain an ideal class distribution rate, but causes the loss of classification information in the data set. A voice sample equalization method combining mixed sampling and random forest therefore needs to be designed.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is to overcome the defects in the prior art that SMOTE oversampling does not consider the distribution of nearby majority-class samples when generating new samples, so that much noise invades the majority-class sample space, and that ENN undersampling causes the loss of classification information in the data set.
In order to solve the technical problem, the invention provides a voice sample equalization method combining mixed sampling and random forest, which comprises the following steps:
S101: acquiring an initial voice data set, and performing feature extraction on the initial voice data set to obtain an extracted voice data feature set;
S102: analyzing the minority-class samples of the voice data feature set using SMOTE oversampling and generating new target minority-class samples from them; analyzing the nearest neighbor samples of the target minority-class samples and of the majority-class samples in the voice data feature set using ENN undersampling, and deleting target minority-class samples and majority-class samples according to those nearest neighbor samples, to obtain a current equalized voice data set;
S103: calculating the information gain ratio and Gini index of the current equalized voice data set, and linearly combining them with two factors to construct a double-factor random forest model;
S104: inputting the current equalized voice data set into the double-factor random forest model, and outputting a classification evaluation index and an out-of-bag misclassification rate of the current equalized voice data set under a preset double-factor condition;
S105: judging whether the classification evaluation index converges; if it converges, outputting the current equalized voice data set; if it diverges, updating the oversampling rate of SMOTE and the undersampling rate of ENN according to the out-of-bag misclassification rate, and returning to step S102 until the classification evaluation index converges, then outputting the current equalized voice data set.
In an embodiment of the present invention, analyzing the minority-class samples of the voice data feature set using SMOTE oversampling and generating new target minority-class samples from them, analyzing the nearest neighbor samples of the target minority-class samples and of the majority-class samples in the voice data feature set using ENN undersampling, and deleting target minority-class samples and majority-class samples according to those nearest neighbor samples to obtain the current equalized voice data set includes:
S201: analyzing the minority-class samples S_min using SMOTE oversampling, generating samples T_gen from S_min, and storing T_gen into the minority-class sample space K_min[]; the generated-sample count is C_gen = count(K_min);
S202: judging whether C_gen is less than the number of samples M_up that SMOTE oversampling needs to generate; if C_gen < M_up, returning to step S201, otherwise executing step S203; where M_up = (number of minority-class samples S_min) × oversampling rate N_1;
S203: analyzing with ENN undersampling the nearest neighbor samples of the samples T_gen and of the majority-class samples S_maj in the voice data feature set; if k or more of the nearest neighbors of a sample T_gen belong to a class different from that of T_gen, deleting the corresponding T_gen from K_min[]; if k or more of the nearest neighbors of a majority-class sample S_maj belong to a class different from that of S_maj, deleting S_maj; the count of samples deleted by ENN undersampling is T_del = T_gen + S_maj (deleted generated samples plus deleted majority-class samples);
S204: judging whether the count T_del of samples deleted by ENN undersampling is less than the number of samples M_down that ENN needs to delete; if T_del < M_down, returning to step S203, otherwise outputting the current equalized voice data set; where M_down = (number of majority-class samples S_maj) × undersampling rate N_2.
In one embodiment of the present invention, analyzing the minority-class samples S_min using SMOTE oversampling and generating the samples T_gen from S_min includes:
searching for the k nearest neighbor samples S_min_i within the minority-class samples S_min;
assuming that the number of samples to be generated by SMOTE oversampling is M_up, randomly selecting M_up samples from these neighbors, marked S_min_1, S_min_2, ..., S_min_j;
combining S_min_i and S_min_j by a random interpolation operation to generate the samples T_gen = S_min_i + rand(0,1) · (S_min_j − S_min_i), where rand(0,1) denotes a random number in the interval (0,1), i = 1,2,...,k and j = 1,2,...,M_up.
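This interpolation rule can be written down directly. The following NumPy sketch generates M_up synthetic samples by the formula above; the function name smote_generate and its defaults are illustrative, not from the patent:

import numpy as np

def smote_generate(S_min, m_up, k=5, seed=0):
    # T_gen = S_min_i + rand(0,1) * (S_min_j - S_min_i)
    rng = np.random.default_rng(seed)
    K_min = []
    for _ in range(m_up):
        i = rng.integers(len(S_min))                  # pick a minority sample S_min_i
        d = np.linalg.norm(S_min - S_min[i], axis=1)  # distances within the minority class
        nbrs = np.argsort(d)[1:k + 1]                 # its k nearest minority neighbors
        j = rng.choice(nbrs)                          # randomly chosen neighbor S_min_j
        gap = rng.random()                            # rand(0,1)
        K_min.append(S_min[i] + gap * (S_min[j] - S_min[i]))
    return np.asarray(K_min)                          # the space K_min[] of generated samples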
In an embodiment of the present invention, if the classification evaluation index diverges, updating the oversampling rate of SMOTE and the undersampling rate of ENN according to the out-of-bag misclassification rate and returning to step S102 until the classification evaluation index converges, then outputting the current equalized voice data set, includes:
if the classification evaluation index diverges, updating the oversampling rate of SMOTE and the undersampling rate of ENN according to the out-of-bag misclassification rate, initializing the out-of-bag misclassification rate together with T_gen, M_up, M_down and T_del, and returning to step S102 until the classification evaluation index converges, then outputting the current equalized voice data set.
In an embodiment of the present invention, calculating the information gain ratio and Gini index of the current equalized voice data set and linearly combining them with two factors to construct the double-factor random forest model includes:
calculating the information gain ratio and Gini index of the current equalized voice data set, linearly combining them with the two factors, and adaptively splitting the decision tree nodes of the double-factor random forest model;
constructing the double-factor random forest model according to the adaptive splitting of the decision tree nodes;
judging whether the out-of-bag error of the double-factor random forest model reaches the preset out-of-bag error value; if so, outputting the double-factor random forest model under the preset double-factor condition; otherwise, updating the two factors used for adaptive splitting of the decision tree nodes and reconstructing the double-factor random forest model.
In an embodiment of the present invention, calculating the information gain ratio and Gini index of the current equalized voice data set, linearly combining them with the two factors, and adaptively splitting the decision tree nodes of the double-factor random forest model includes:
dividing the current equalized voice data set D into subsets D_1, ..., D_k and calculating the information gain of the current equalized voice data set
Gain(D; D_1, ..., D_k) = Entropy(D) − Σ_{i=1..k} (|D_i|/|D|) · Entropy(D_i),
where the entropy of the current equalized voice data set D is
Entropy(D) = −Σ_c p_c · log2(p_c), with p_c the proportion of class-c samples in D;
normalizing the information gain by the number of values taken by the feature to obtain the information gain ratio of the current equalized voice data set
Gain_ratio(D; D_1, ..., D_k) = Gain(D; D_1, ..., D_k) / (−Σ_{i=1..k} (|D_i|/|D|) · log2(|D_i|/|D|));
computing the Gini index of the current equalized voice data set
Gini(D; D_1, ..., D_k) = Σ_{i=1..k} (|D_i|/|D|) · Gini(D_i),
where Gini(D_i) = 1 − Σ_c p_c²;
using the two factors to form the linear combination of the information gain ratio and Gini index of the current equalized voice data set
ψ(D; D_1, ..., D_k) = α[β_1 · Gini(D; D_1, ..., D_k) − β_2 · Gain_ratio(D; D_1, ..., D_k)]
and adaptively splitting the decision tree nodes of the double-factor random forest model accordingly; where α is the random factor and β_i are the balance factors of the node splitting index.
In an embodiment of the present invention, if the classification evaluation index diverges, updating the oversampling rate of SMOTE and the undersampling rate of ENN according to the out-of-bag misclassification rate and returning to step S102 until the classification evaluation index converges, then outputting the current equalized voice data set, includes:
if the classification evaluation index diverges, updating the oversampling rate of SMOTE and the undersampling rate of ENN according to the out-of-bag misclassification rate OOBM_mis_rate (the update formula is rendered only as an image in the original document), returning to step S102 until the classification evaluation index converges, and outputting the current equalized voice data set;
where OOBM_mis_rate is the out-of-bag misclassification rate of the double-factor random forest model, N_mis_maj is the number of misclassified majority-class samples, N_mis_min_i is the number of misclassified samples of the i-th minority class, and minclass is the number of minority classes.
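The aggregation formula itself survives only as an image, but the quantities it names can be computed from out-of-bag predictions. The sketch below counts N_mis_maj and N_mis_min_i and uses the plain overall out-of-bag error as a stand-in aggregate, which is an assumption since the original formula is not recoverable:

import numpy as np

def oob_misclassification(y_true, y_oob_pred, majority_label):
    # Counts the quantities named in the text; the final OOBM_mis_rate
    # aggregate used here (overall error rate) is an assumption.
    y_true, y_oob_pred = np.asarray(y_true), np.asarray(y_oob_pred)
    wrong = y_true != y_oob_pred
    n_mis_maj = int(np.sum(wrong & (y_true == majority_label)))
    minority = [c for c in np.unique(y_true) if c != majority_label]
    n_mis_min = {c: int(np.sum(wrong & (y_true == c))) for c in minority}
    minclass = len(minority)            # number of minority classes
    oobm_mis_rate = wrong.mean()        # stand-in aggregate
    return n_mis_maj, n_mis_min, minclass, oobm_mis_rate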
The invention provides a voice sample equalization device combining mixed sampling and random forest, which comprises:
an acquisition module, configured to acquire an initial voice data set and perform feature extraction on it to obtain an extracted voice data feature set;
an analysis module, configured to analyze the minority-class samples of the voice data feature set using SMOTE oversampling and generate new target minority-class samples from them, analyze the nearest neighbor samples of the target minority-class samples and of the majority-class samples in the voice data feature set using ENN undersampling, and delete target minority-class samples and majority-class samples according to those nearest neighbor samples, to obtain a current equalized voice data set;
a construction module, configured to calculate the information gain ratio and Gini index of the current equalized voice data set and linearly combine them with two factors to construct a double-factor random forest model;
an input module, configured to input the current equalized voice data set into the double-factor random forest model and output a classification evaluation index and an out-of-bag misclassification rate of the current equalized voice data set under a preset double-factor condition;
a judging module, configured to judge whether the classification evaluation index converges and, if it converges, output the current equalized voice data set; if it diverges, update the oversampling rate of SMOTE and the undersampling rate of ENN according to the out-of-bag misclassification rate and return to the analysis module until the classification evaluation index converges, then output the current equalized voice data set.
The invention provides a voice sample equalization device combining mixed sampling and random forest, comprising:
a memory for storing a computer program;
and a processor for implementing the steps of the above voice sample equalization method combining mixed sampling and random forest when executing the computer program.
The invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a method of speech sample equalization combining mixed sampling and random forest as described above.
Compared with the prior art, the technical scheme of the invention has the following advantages:
the invention relates to a voice sample equalization method for joint mixed sampling and random forests, which comprises the steps of firstly, collecting an initial voice data set, and carrying out feature extraction on the initial voice data set to obtain an extracted voice data feature set; then, analyzing a minority class sample of the voice data feature set by utilizing oversampling SMOTE, generating a new target minority class sample according to the minority class sample, analyzing a nearest neighbor sample of the target minority class sample and a nearest neighbor sample of a plurality of classes of samples in the voice data feature set by utilizing undersampling ENN, deleting the target minority class sample and the majority class sample according to the nearest neighbor sample of the target minority class sample and the nearest neighbor sample of the majority class sample, and obtaining a current balanced voice data set; secondly, calculating the information gain rate and the kini coefficient of the current balanced voice data set, and linearly combining the information gain rate and the kini coefficient of the current balanced voice data set by using double factors so as to construct a double-factor random forest model; inputting the current balanced voice data set into a double-factor random forest model, and outputting a classification evaluation index and an out-of-bag error classification rate of the double-factor random forest model under a preset double-factor condition; finally, judging whether the classification evaluation index is converged, and if the classification evaluation index is converged, outputting the current balanced voice data set; and if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampled ENN according to the out-of-bag error classification rate, returning to perform equalization processing on the extracted voice data feature set again until the classification evaluation index converges, and outputting the current equalized voice data set. According to the method, the characteristic extraction is carried out on the voice data set, so that the consideration on the inherent characteristic of the sample is increased; by applying the under-sampled ENN to the target few samples generated by the over-sampled SMOTE to remove the samples, the problem that the distribution condition of the nearby most samples is not considered when the SMOTE over-sampled generates a new sample is solved, and the generation of noise samples is reduced; meanwhile, self-adaptive double-factor parameters are introduced into the random forest to adjust the bias of the double-factor random forest model, iterative analysis is carried out on the data characteristics of the double-factor random forest input in each turn, and the data characteristics are fed back to the mixed sampling stage according to the classification evaluation indexes, so that the mixed sampling technology can be assisted to obtain more reliable data results, sample data with high information value is reserved to the maximum extent, and the loss of the classification information of the data set is reduced.
Drawings
In order that the present disclosure may be more readily and clearly understood, reference is now made to the following detailed description of the present disclosure taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a first embodiment of the voice sample equalization method combining mixed sampling and random forest according to the present invention;
FIG. 2 is a flow chart of a second embodiment of the voice sample equalization method combining mixed sampling and random forest according to the present invention;
FIG. 3 is a schematic diagram of the voice sample equalization method combining mixed sampling and random forest according to the present invention;
FIG. 4 is a schematic diagram of feature extraction for a voice data set in accordance with the present invention;
FIG. 5 is a flow chart of the double-factor random forest of the present invention;
FIG. 6 is a structural block diagram of a voice sample equalization apparatus combining mixed sampling and random forest according to an embodiment of the present invention.
Detailed Description
The present invention is further described below in conjunction with the drawings and specific embodiments so that those skilled in the art can better understand and implement it; the embodiments, however, are not to be construed as limiting the invention.
Referring to fig. 1, fig. 1 is a flowchart of a first embodiment of the voice sample equalization method combining mixed sampling and random forest according to the present invention; the specific operation steps are as follows:
Step S101: acquiring an initial voice data set, and performing feature extraction on the initial voice data set to obtain an extracted voice data feature set;
Step S102: analyzing the minority-class samples of the voice data feature set using SMOTE oversampling and generating new target minority-class samples from them; analyzing the nearest neighbor samples of the target minority-class samples and of the majority-class samples in the voice data feature set using ENN undersampling, and deleting target minority-class samples and majority-class samples according to those nearest neighbor samples, to obtain a current equalized voice data set;
Step S103: calculating the information gain ratio and Gini index of the current equalized voice data set, and linearly combining them with two factors to construct a double-factor random forest model;
Step S104: inputting the current equalized voice data set into the double-factor random forest model, and outputting a classification evaluation index and an out-of-bag misclassification rate of the current equalized voice data set under a preset double-factor condition;
Step S105: judging whether the classification evaluation index converges; if it converges, outputting the current equalized voice data set; if it diverges, updating the oversampling rate of SMOTE and the undersampling rate of ENN according to the out-of-bag misclassification rate, and returning to step S102 until the classification evaluation index converges, then outputting the current equalized voice data set.
In the method provided by this embodiment, the extracted voice data set is equalized by a mixed sampling technique, and the undersampling algorithm is applied to the new samples generated by the oversampling algorithm to eliminate samples. Meanwhile, the data features are analyzed by means of the random forest and fed back to the mixed sampling stage, assisting the mixed sampling technique in obtaining more reliable data results. By introducing adaptive double-factor parameters into the random forest to adjust the model bias, the data features input to the double-factor random forest are analyzed iteratively in each round, and the out-of-bag misclassification rate obtained after each iteration is fed back to the mixed sampling stage as guidance, so that sample data of high information value is retained to the maximum extent.
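This feedback loop can be sketched with off-the-shelf components. In the following Python sketch, the standard SMOTE, ENN and random forest from imbalanced-learn and scikit-learn stand in for the patented double-factor random forest, and the sampling-rate update rule, the function name equalize and the starting rate n1 are illustrative assumptions (binary labels assumed):

import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import EditedNearestNeighbours
from imblearn.combine import SMOTEENN
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def equalize(X, y, n1=0.5, k=3, max_iter=10):
    # n1: target minority/majority ratio for SMOTE; it must exceed the
    # current minority/majority ratio of the data (binary labels only)
    f1_history, best = [], (X, y)
    for _ in range(max_iter):
        sampler = SMOTEENN(smote=SMOTE(sampling_strategy=n1, k_neighbors=k),
                           enn=EditedNearestNeighbours(n_neighbors=k))
        Xb, yb = sampler.fit_resample(X, y)        # current equalized data set
        rf = RandomForestClassifier(oob_score=True).fit(Xb, yb)
        oob_mis = 1.0 - rf.oob_score_              # out-of-bag misclassification rate
        oob_pred = rf.classes_[np.argmax(rf.oob_decision_function_, axis=1)]
        f1_history.append(f1_score(yb, oob_pred, average="macro"))
        # stop once F1-macro has dropped twice in a row or stayed flat
        if len(f1_history) >= 3 and f1_history[-1] <= f1_history[-2] <= f1_history[-3]:
            break
        best = (Xb, yb)
        n1 = min(1.0, n1 * (1.0 + oob_mis))        # assumed rate-update rule
    return best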
Based on the above embodiments, the present embodiment further describes the speech sample equalization method, and with reference to fig. 2 and fig. 3, the specific operation steps are as follows:
step S201: acquiring an initial voice data set, and performing feature extraction on the initial voice data set to obtain an extracted voice data feature set;
as shown in fig. 4, in order to analyze the nonlinear phenomenon caused by the eddy current at the glottis during the sounding process, a Bark wavelet sub-band filter bank is firstly adopted to filter the voice signal, then a discrete cosine transform method is adopted to extract the characteristics at the low frequency band, and the correlation and the maximum lyapunov characteristics are extracted at the high frequency band, so that the characteristics of the voice can be embodied in detail at each frequency band. The fluid-solid coupling feature extraction thought based on glottis flow field distribution to be extracted is as follows: firstly, dividing voice frequency bands into 24 frequency bands according to a Bark filter bank, then calculating logarithmic energy after carrying out Fourier transform on a low frequency band according to an MFCC extraction method, then carrying out discrete cosine transform, carrying out nonlinear dynamics analysis on a high frequency band, extracting correlation dimension and a maximum Lyapunov exponent, and then fusing multiple features.
Analyzing the speech signal further from the vocal cord vibration perspective, the vocal cord model is described by a system of equations that is rendered only as an image in the original document. In those equations, α = 1,2 denotes the left and right side portions respectively; x and υ are the motion displacement and velocity of the mass blocks; the mass of each mass block, the spring elastic coefficient, the coupling elastic coefficient and the damping constant appear as parameters; l and d are the vocal cord length and the thickness of the lower-layer mass block; and the remaining terms denote the Bernoulli pressure and the impact force generated at the time of collision.
Setting the model mass, elastic coefficient, coupling coefficient, damping constant and subglottal pressure as optimizable parameters, expressed as the vector Φ := [m, k, k_c, r, P_S], a suitable Φ is searched with a variation particle swarm quasi-Newton method so that the glottal fluid-solid coupling model accurately reproduces the glottal waveform. To avoid the gradient method directly returning a local minimum in the non-convex search space, an optimized solution is first obtained with the variation particle swarm algorithm, and the quasi-Newton method then performs local optimization on that solution to find the global optimum.
The selection and crossover process adopts a roulette-wheel selection rule to select M individuals, and the particle swarm algorithm terminates when the highest fitness obtained exceeds a preset threshold or the preset number of iterations is reached. The time-domain error between the target voice source U_ge and the waveform U_gs simulated with the parameter vector Φ is defined as the objective function F (the formula of F is rendered only as an image in the original document).
In the formula, N denotes the number of sample points of U_ge and U_gs. When the value of the objective function F reaches the global minimum, the simulated glottal airflow U_gs of the vocal cord mass model matches the target glottal airflow U_ge, and the vocal cord mass model accurately simulates the actual vocal cord structure of the target voice source.
The fluid-solid characteristics of the voice signal are extracted through sub-band nonlinear analysis and the vocal cord mass model: the sub-band nonlinear analysis reflects the nonlinear characteristics caused by airflow vortices during voice production, and the vocal cord mass model simulates the actual vocal cord structure of the target voice signal. The extracted fluid-solid characteristics of the voice signal are then applied to subsequent voice recognition.
Step S202: analyzing the minority-class samples S_min of the voice data feature set using SMOTE oversampling, generating samples T_gen from S_min, and storing T_gen into the minority-class sample space K_min[]; the generated-sample count is C_gen = count(K_min);
Step S203: judging whether C_gen is less than the number of samples M_up that SMOTE oversampling needs to generate; if C_gen < M_up, returning to step S202, otherwise executing step S204; where M_up = (number of minority-class samples S_min) × oversampling rate N_1;
Step S204: analyzing with ENN undersampling the nearest neighbor samples of the samples T_gen and of the majority-class samples S_maj in the voice data feature set; if k or more of the nearest neighbors of a sample T_gen belong to a class different from that of T_gen, deleting the corresponding T_gen from K_min[]; if k or more of the nearest neighbors of a majority-class sample S_maj belong to a class different from that of S_maj, deleting S_maj; the count of samples deleted by ENN undersampling is T_del = T_gen + S_maj;
Step S205: judging whether the count T_del of samples deleted by ENN undersampling is less than the number of samples M_down that ENN needs to delete; if T_del < M_down, returning to step S204, otherwise outputting the current equalized voice data set; where M_down = (number of majority-class samples S_maj) × undersampling rate N_2.
The SMOTE oversampling algorithm searches for the k nearest neighbor samples S_min_i within the minority class based on the k-nearest-neighbor idea. Assuming the number of samples to be generated for the data set is M_up, M_up samples are randomly selected from these neighbors and marked S_min_1, S_min_2, ..., S_min_j. The data samples S_min_i and S_min_j are combined by the corresponding random interpolation operation to obtain the synthetic sample
S_new = S_min_i + rand(0,1) · (S_min_j − S_min_i),
where rand(0,1) denotes a random number in the interval (0,1), i = 1,2,...,k, j = 1,2,...,M_up, and the number of generated samples M_up is determined by the oversampling rate.
The ENN (Edited Nearest Neighbour) undersampling algorithm is likewise based on a k-nearest-neighbor selection strategy and reduces the majority-class and minority-class samples. Its basic idea is as follows: given an unbalanced data set D whose majority-class samples are S_maj, traverse each sample S_maj_i in S_maj and find its three nearest neighbor samples; if two or more of the three differ in class from S_maj_i, delete the sample S_maj_i. By combining SMOTE and ENN, the extracted voice data set is equalized through SMOTE-ENN mixed sampling, and the undersampling algorithm is applied to the new samples generated by the oversampling algorithm to eliminate samples, which alleviates the noise-sample problem without losing data set information.
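The three-neighbor rule translates into a few lines; the names are illustrative and scikit-learn's NearestNeighbors supplies the neighbor search:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def enn_filter(X, y, k=3):
    # Delete any sample whose k nearest neighbors mostly disagree with its label
    X, y = np.asarray(X), np.asarray(y)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1: the query point itself
    _, idx = nn.kneighbors(X)
    disagree = (y[idx[:, 1:]] != y[:, None]).sum(axis=1)
    keep = disagree < (k + 1) / 2                     # e.g. k=3: delete when >= 2 disagree
    return X[keep], y[keep]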
Step S206: calculating the information gain ratio and Gini index of the current equalized voice data set, linearly combining them with the two factors, and adaptively splitting the decision tree nodes of the double-factor random forest model;
Step S207: constructing the double-factor random forest model according to the adaptive splitting of the decision tree nodes;
the classification performance of random forests is reduced when non-uniform data sets are processed. The main reasons for this are two, one is that in the process of random forest construction, the training set is selected by bootstrap self-sampling. Because few samples of the original data set are fewer and the probability of sampling of the few samples is lower, the number of the few samples in the sub-training set is smaller than that of the original data set, and the non-equilibrium of the data set is aggravated. Secondly, because the number of the samples of the minority class in the original data set is low, the decision tree based on the sub-training set lacks the generalization capability and cannot embody the characteristics of the minority class.
A random forest is an ensemble classifier R = {h(x, θ_k), k = 1,2,...,K} composed of a set of decision trees, where {θ_k} are independent identically distributed random vectors, K is the number of decision trees in the random forest, and the training set of each classifier is obtained by random sampling from the data set D = <X, Y>. The margin function of the random forest is
mr(X, Y) = av_k I(h(X, θ_k) = Y) − max_{j≠Y} av_k I(h(X, θ_k) = j),
where av_k denotes the average over the K trees and I(·) is the indicator function. The classification performance (strength) of the base classifiers {h(x, θ)} is defined as
s = E_{X,Y} mr(X, Y).
Assume s ≥ 0, i.e., the base classifiers are weak classifiers. The upper bound of the random forest generalization error PE* is
PE* ≤ ρ̄ (1 − s²) / s²,
where the subscripts X, Y denote the probability P over the X, Y space and ρ̄ is the mean correlation coefficient among all the classifiers. This shows that the generalization error of the random forest is related to the classification performance of the base classifiers and to the correlation coefficients among the base classifiers. Therefore, a double-factor decision tree splitting algorithm is proposed to reduce the correlation coefficient among the base classifiers, improve the classification performance of the base classifiers, and thereby reduce the generalization error of the random forest.
The node splitting algorithms for decision trees mainly include ID3, C4.5 [23], CART [24], etc. The ID3 algorithm selects the information gain as the splitting criterion: the "feature-value" combination with the maximum information gain is selected for splitting. Its disadvantage is that the information gain criterion favors features with many possible values while ignoring their relevance to the classification, so the classification result generalizes poorly. The C4.5 and CART algorithms use the "information gain ratio" and the "Gini index", respectively, as splitting criteria. ID3, using information gain as the node splitting criterion, can only process discrete features, while C4.5 and CART, using the information gain ratio and Gini index, can also process numerical features. The difference between the two indexes is that the information gain ratio computes the entropy difference before and after splitting from class probabilities multiplied by their logarithms, which favors smaller distributions with fewer feature values, whereas the Gini index is obtained by subtracting the sum of squared class probabilities from one, which favors larger data distributions. Both are information-theory-based algorithms, and their rationales for node splitting are approximately similar. Therefore, a combination of the two is established, and a random factor and balance factors are introduced to realize adaptive node splitting.
Given the current equalized voice data set D, the entropy of the data set is defined as
Entropy(D) = −Σ_c p_c · log2(p_c),
where p_c is the proportion of class-c samples in D. When the current equalized voice data set D is divided into subsets D_1, ..., D_k, the corresponding reduction in entropy yields the "information gain"
Gain(D; D_1, ..., D_k) = Entropy(D) − Σ_{i=1..k} (|D_i|/|D|) · Entropy(D_i).
The information gain ratio normalizes the information gain by the number of values taken by the feature, that is,
Gain_ratio(D; D_1, ..., D_k) = Gain(D; D_1, ..., D_k) / (−Σ_{i=1..k} (|D_i|/|D|) · log2(|D_i|/|D|)).
The Gini index of the current equalized voice data set D is then defined as
Gini(D; D_1, ..., D_k) = Σ_{i=1..k} (|D_i|/|D|) · Gini(D_i),
where Gini(D_i) = 1 − Σ_c p_c².
Considering the linear combination of the information gain ratio and the Gini index, the double-factor node splitting algorithm is
ψ(D; D_1, ..., D_k) = α[β_1 · Gini(D; D_1, ..., D_k) − β_2 · Gain_ratio(D; D_1, ..., D_k)],
where α is a random factor (0 ≤ α ≤ 1) that controls the randomness of node splitting: when α = 1, the generated decision tree is identical to the deterministic decision tree, and when α = 0 it is a completely random tree. β_i (i = 1,2) are the balance factors of the node splitting index, with 0 ≤ β_i ≤ 1; they may not both be 0 or 1 at the same time, and on the boundary only the two combinations (1,0) and (0,1) exist. As shown in FIG. 5, the double-factor random forest is constructed by this double-factor node splitting algorithm. When a decision tree node is split, CART seeks the smallest Gini index while the C4.5 algorithm seeks the largest information gain ratio; when both indexes are optimal, ψ(D; D_1, ..., D_k) takes its minimum value, which serves as the optimal rule for splitting the node. After the random forest is generated, the out-of-bag error is estimated: if the out-of-bag error reaches its minimum, the random forest under the optimal factor condition is output; otherwise, the two factors are updated and the random forest is reconstructed.
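These definitions translate directly into NumPy helpers. A split is represented as the list of label subsets D_1, ..., D_k, and the alpha/beta defaults in psi() are illustrative:

import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def info_gain(y, subsets):
    w = np.array([len(s) for s in subsets]) / len(y)      # |D_i| / |D|
    return entropy(y) - sum(wi * entropy(s) for wi, s in zip(w, subsets))

def gain_ratio(y, subsets):
    w = np.array([len(s) for s in subsets]) / len(y)
    split_info = -np.sum(w * np.log2(w))                  # intrinsic value; > 0 for real splits
    return info_gain(y, subsets) / split_info

def gini_index(y, subsets):
    w = np.array([len(s) for s in subsets]) / len(y)
    return sum(wi * gini(s) for wi, s in zip(w, subsets))

def psi(y, subsets, alpha=0.8, beta1=0.5, beta2=0.5):
    # Double-factor score; the candidate split minimizing psi is chosen,
    # consistent with CART minimizing Gini and C4.5 maximizing gain ratio.
    return alpha * (beta1 * gini_index(y, subsets) - beta2 * gain_ratio(y, subsets))

def best_split(y, candidate_splits, **factors):
    return min(candidate_splits, key=lambda s: psi(y, s, **factors))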
Step S208: inputting the current balanced voice data set into a double-factor random forest model, and outputting a classification evaluation index and an out-of-bag error classification rate of the double-factor random forest model under a preset double-factor condition;
the out-of-bag misclassification rate is:
OOBM_mis_rate, computed from the out-of-bag prediction errors (the formula itself is rendered only as an image in the original document);
wherein N_mis_maj is the number of misclassified majority-class samples, N_mis_min_i is the number of misclassified samples of the i-th minority class, and minclass is the number of minority classes.
Step S209: judging whether the classification evaluation index converges; if it converges, outputting the current equalized voice data set; if it diverges, updating the oversampling rate of SMOTE and the undersampling rate of ENN according to the out-of-bag misclassification rate, initializing the out-of-bag misclassification rate together with T_gen, M_up, M_down and T_del, and returning to step S202 until the classification evaluation index converges, then outputting the current equalized voice data set.
In the invention, oversampling and undersampling can equalize the data distribution as far as possible, but existing sampling algorithms pay insufficient attention to class overlap and noise, and the spatial distribution of the data is distorted after sampling. Therefore a mixed sampling algorithm combined with the double-factor random forest is proposed: new samples are synthesized for the minority classes according to the sample distribution law, and redundant information is removed, without changing the spatial structure of the majority classes, according to the feedback of the double-factor random forest. The data set is pre-equalized by the mixed sampling algorithm combining SMOTE and ENN, the pre-equalized data set is then evaluated with the double-factor random forest, and the classification evaluation index and the misclassification rate are calculated respectively. The mixed sampling rate is corrected according to the misclassification rate; during the mixed-sampling iterations the sampling rate changes dynamically with the out-of-bag misclassification rate of the random forest rather than with the imbalance degree of the data set. The classification evaluation index F1-macro serves as the iteration stop criterion: if it converges, that is, when F1-macro decreases twice in succession or remains unchanged, the mixed sampling ends, the iteration stops, and the output data set is the optimal equalized data set conforming to the original data distribution; if the classification evaluation index diverges, the mixed sampling rate of SMOTE-ENN mixed sampling is updated according to the out-of-bag misclassification rate, the out-of-bag misclassification rate and T_gen, M_up, M_down and T_del are initialized, and equalization of the extracted voice data set is performed again until the classification evaluation index converges and the current equalized voice data set is output.
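The stop criterion itself is a few lines, with f1_history holding the F1-macro of each mixed-sampling round:

def converged(f1_history):
    # Stop when F1-macro has decreased twice in succession or stayed unchanged
    if len(f1_history) < 3:
        return False
    a, b, c = f1_history[-3], f1_history[-2], f1_history[-1]
    return (c < b < a) or (c == b)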
The specific flow of the OOBM-SMOTE-ENN mixed sampling algorithm combined with the double-factor random forest is as follows:
Input: data set D, majority-class samples S_maj, minority-class samples S_min, number of nearest neighbor samples k, initial oversampling rate N_1, initial undersampling rate N_2.
Output: pre-equalized data set D'.
1. Initialize OOBM_mis_rate = 1 and the number of samples M_up that oversampling needs to generate (M_up = S_min × N_1); set the generation counter to 0;
2. Correct N_1 and N_2 according to the double-factor random forest feedback; set C_gen = 0; traverse each minority-class sample S_min_i and store the minority samples generated by the SMOTE algorithm into the space K_min[];
3. C_gen = C_gen + count(K_min); if C_gen < M_up, return to step 2, otherwise go to step 4;
4. Initialize the number of samples M_down that undersampling needs to delete (M_down = S_maj × N_2); set the deletion counter to 0;
5. Set T_del = 0, traverse each majority-class sample S_maj_i, and compare the labels of S_maj_i and of the samples in K_min[]; if a generated sample T_gen_i in K_min[] has k or more nearest neighbors whose class differs from that of T_gen_i, delete the corresponding minority-class generated sample T_gen_i from K_min[]. Meanwhile, compare neighborhood samples according to the ENN definition: if S_maj_i has k or more nearest neighbors whose class differs from that of S_maj_i, delete the sample S_maj_i;
6. T_del = T_del + (T_gen_i + S_maj_i); if T_del < M_down, return to step 5, otherwise output the once-equalized sample set D'.
The current equalized voice data set is divided into a training set and a testing set; the random forest recognition model is trained with the feature parameters of the training-set voices, and the trained random forest model performs prediction and classification on the feature parameters of the testing set. The comparative experiment results of the OOBMSE algorithm and classical sampling algorithms are shown in Table 1 below.
Table 1. Comparative experiment results of the OOBMSE algorithm and classical sampling algorithms

Metric               Raw data   SMOTE    ADASYN   BSM      CNN      OOBMSE
Recognition rate/%   97.05      99.03    99.04    99.03    91.43    100.00
Recall/%             92.31      99.16    98.86    99.16    91.34    100.00
Kappa/%              89.89      98.03    98.02    98.03    82.82    100.00
F1 score/%           94.94      99.02    99.01    99.02    91.40    100.00
In the table, SMOTE stands for Synthetic Minority Oversampling Technique. It is an improved scheme based on the random oversampling algorithm: since random oversampling adds minority samples by a simple sample-replication strategy, the basic idea of the SMOTE algorithm is instead to analyze the minority samples and artificially synthesize new samples from them to add to the data set.
ADASYN is an adaptive synthetic sampling method for imbalanced learning. Its idea is to generate minority-class data samples adaptively according to their distribution: minority samples that are harder to learn generate more synthetic data than those that are easier to learn. The ADASYN method can reduce the learning bias brought by the original imbalanced data distribution and can adaptively shift the decision boundary toward samples that are difficult to learn.
Borderline-SMOTE (BSM) is an improved oversampling algorithm based on SMOTE, which uses only the minority-class samples on the borderline to synthesize new samples, thereby improving the class distribution of the samples.
Condensed Nearest Neighbor (CNN) is an undersampling technique used to find a subset of the sample set that incurs no loss in model performance (called the minimum consistent set).
As can be seen from the table, the OOBMSE mixed sampling algorithm provided by the invention is superior to the traditional SMOTE, ADASYN, BSM and CNN algorithms. The accuracy of OOBMSE in the random forest classifier reaches 100%, and the other evaluation indexes also reach their optimal values, so the method outperforms the traditional methods. The equalization algorithm provided by the invention therefore improves the recognition rate and reliability of the system.
The scenario addressed by this embodiment is common in voice recognition and even intelligent medical diagnosis. The mixed sampling algorithm combined with the double-factor random forest provided by the invention builds on the double-factor random forest and combines SMOTE and ENN to solve the imbalanced-data classification problem in voice recognition. In view of the shortcomings of conventional oversampling algorithms, during mixed sampling the oversampling rate changes dynamically according to the out-of-bag misclassification rate of the double-factor random forest rather than the imbalance rate of the data set, while noise in the samples generated by oversampling is removed through ENN. The double-factor random forest and mixed sampling are combined via the out-of-bag misclassification rate, the mixed sampling rate is dynamically corrected, the number of minority-class samples is increased, and noise and redundant information in the samples are removed to balance the data.
Referring to fig. 6, fig. 6 is a structural block diagram of a voice sample equalization apparatus combining mixed sampling and random forest according to an embodiment of the present invention; the specific apparatus may include:
an acquisition module 100, configured to acquire an initial voice data set and perform feature extraction on it to obtain an extracted voice data feature set;
an analysis module 200, configured to analyze the minority-class samples of the voice data feature set using SMOTE oversampling and generate new target minority-class samples from them, analyze the nearest neighbor samples of the target minority-class samples and of the majority-class samples in the voice data feature set using ENN undersampling, and delete target minority-class samples and majority-class samples according to those nearest neighbor samples, to obtain a current equalized voice data set;
a construction module 300, configured to calculate the information gain ratio and Gini index of the current equalized voice data set and linearly combine them with two factors to construct a double-factor random forest model;
an input module 400, configured to input the current equalized voice data set into the double-factor random forest model and output a classification evaluation index and an out-of-bag misclassification rate of the current equalized voice data set under a preset double-factor condition;
a judging module 500, configured to judge whether the classification evaluation index converges and, if it converges, output the current equalized voice data set; if it diverges, update the oversampling rate of SMOTE and the undersampling rate of ENN according to the out-of-bag misclassification rate and return to the analysis module until the classification evaluation index converges, then output the current equalized voice data set.
The voice sample equalization apparatus combining mixed sampling and random forest of this embodiment is used to implement the aforementioned voice sample equalization method combining mixed sampling and random forest, so specific implementations of the apparatus can be found in the foregoing method embodiments: for example, modules 100, 200, 300, 400 and 500 are respectively used to implement steps S101, S102, S103, S104 and S105 of the method, so their specific implementations can refer to the descriptions of the corresponding embodiments and are not repeated here.
An embodiment of the invention also provides voice sample equalization equipment combining mixed sampling and random forest, including: a memory for storing a computer program; and a processor for implementing the steps of the above voice sample equalization method combining mixed sampling and random forest when executing the computer program.
The specific embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps of the above method for equalizing voice samples by combining mixed sampling and random forest.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Various other modifications and alterations will occur to those skilled in the art upon reading the foregoing description; it is neither necessary nor possible to exhaust all embodiments here, and obvious variations or modifications derived therefrom remain within the protection scope of the invention.

Claims (8)

1. A voice sample equalization method combining mixed sampling and random forest is characterized by comprising the following steps:
s101: acquiring an initial voice data set, and performing feature extraction on the initial voice data set to obtain an extracted voice data feature set;
s102: analyzing a few class samples of the voice data feature set by using oversampling SMOTE, generating a new target few class sample according to the few class samples, analyzing a nearest neighbor sample of the target few class sample and a nearest neighbor sample of a plurality of class samples in the voice data feature set by using undersampling ENN, deleting the target few class sample and the majority class sample according to the nearest neighbor sample of the target few class sample and the nearest neighbor sample of the majority class sample, and obtaining a current balanced voice data set;
s103: calculating the information gain rate and the kini coefficient of the current balanced voice data set, and linearly combining the information gain rate and the kini coefficient of the current balanced voice data set by using double factors to construct a double-factor random forest model, wherein the method comprises the following steps:
calculating the information gain rate and the Gini coefficient of the current equalized voice data set, linearly combining them with the two factors, and adaptively splitting the decision tree nodes of the double-factor random forest model, which comprises:
dividing the current equalized voice data set D into subsets D_1, ..., D_k, and calculating the information gain of the current equalized voice data set
Gain(D; D_1, ..., D_k) = Ent(D) − Σ_{i=1}^{k} (|D_i|/|D|) · Ent(D_i)
wherein the entropy of the current equalized voice data set D is
Ent(D) = −Σ_{c=1}^{C} p_c · log₂ p_c
with p_c the proportion of class-c samples in D, and Ent(D_i) defined in the same way on each subset D_i;
normalizing the information gain of the current equalized voice data set by the number of values taken by the splitting feature to obtain the information gain rate of the current equalized voice data set
Gain_ratio(D; D_1, ..., D_k) = Gain(D; D_1, ..., D_k) / IV(D), where IV(D) = −Σ_{i=1}^{k} (|D_i|/|D|) · log₂(|D_i|/|D|);
computing the Gini coefficient of the current equalized voice data set
Gini(D; D_1, ..., D_k) = Σ_{i=1}^{k} (|D_i|/|D|) · Gini(D_i)
wherein
Gini(D_i) = 1 − Σ_{c=1}^{C} p_c²;
linearly combining the information gain rate and the Gini coefficient of the current equalized voice data set with the two factors,
ψ(D; D_1, ..., D_k) = α[β₁ · Gini(D; D_1, ..., D_k) − β₂ · Gain_ratio(D; D_1, ..., D_k)],
and adaptively splitting the decision tree nodes of the double-factor random forest model accordingly; wherein α is a random factor and β₁, β₂ are the balance factors of the node splitting indexes;
constructing the double-factor random forest model according to the adaptive splitting of the decision tree nodes;
judging whether the out-of-bag error of the double-factor random forest model reaches a preset out-of-bag error value; if so, outputting the double-factor random forest model under a preset double-factor condition; otherwise, updating the two factors used for the adaptive splitting of the decision tree nodes and reconstructing the double-factor random forest model;
s104: inputting the current balanced voice data set into the double-factor random forest model, and outputting a classification evaluation index and an out-of-bag error classification rate of the current balanced voice data set under a preset double-factor condition;
s105: judging whether the classification evaluation index is converged, and if the classification evaluation index is converged, outputting the current balanced voice data set; and if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, returning to execute the step S102 until the classification evaluation index converges, and outputting the current balanced voice data set.
2. The method of claim 1, wherein analyzing the minority class samples of the voice data feature set using the oversampling SMOTE and generating new target minority class samples from them, analyzing the nearest neighbor samples of the target minority class samples and of the majority class samples in the voice data feature set using the undersampling ENN, and deleting target minority class samples and majority class samples according to those nearest neighbor samples to obtain a current equalized voice data set comprises:
s201: analyzing the minority sample S using the oversampled SMOTE min And according to the minority class samples S min Generating a sample T gen The sample T is gen Store to minority sample space K min []Performing the following steps; wherein, sample C gen =count(K min );
S202: judging the sample C gen Whether less than the number M of samples that the oversampled SMOTE needs to generate up If C is gen <M up If not, returning to execute the step S201, otherwise, executing the step S203; wherein M is up = few class samples S min X oversampling ratio N 1
S203: analyzing the sample T with the undersampled ENN gen And a plurality of classes of samples S in the speech data feature set maj If said sample T is a nearest neighbor sample gen The nearest neighbor samples of (A) are k and k or more and the samples T gen If the samples are of different types, deleting K min []Of the corresponding sample T gen If said plurality of samples S maj The nearest neighbor samples of (A) are k and more than k and the plurality of types of samples S maj If the samples with different categories are selected, deleting the samples S with most categories maj (ii) a Wherein the undersampled ENN deleted samples T del =T gen +S maj
S204: determining the samples T deleted by the undersampled ENN del Whether less than the number M of samples that the undersampled ENN needs to delete down If T is del <M down Returning to execute the step S203, otherwise, outputting the current balanced voice data set; wherein, M down = majority class sample S maj X undersampling rate N 2
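To make steps S201 to S204 concrete, here is a rough Python sketch under stated assumptions: scikit-learn's NearestNeighbors performs the neighbour searches, the "k or more of the k nearest neighbours belong to a different class" deletion test follows the claim wording literally, and the ENN step is a single capped pass where the claim loops back to S203.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_generate(X_min, m_up, k=5, rng=None):
    """S201-S202: synthesise minority samples until M_up have been generated."""
    rng = np.random.default_rng(rng)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # column 0 is the sample itself
    generated = []
    while len(generated) < m_up:             # C_gen < M_up: keep generating
        i = rng.integers(len(X_min))
        j = rng.choice(idx[i, 1:])           # one of the k nearest neighbours
        t = X_min[i] + rng.random() * (X_min[j] - X_min[i])
        generated.append(t)
    return np.asarray(generated)

def enn_filter(X, y, m_down, k=5):
    """S203-S204: delete samples whose k nearest neighbours disagree with
    their own label, stopping once M_down deletions have been made."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    keep = np.ones(len(X), dtype=bool)
    deleted = 0
    for i in range(len(X)):
        if deleted >= m_down:                # T_del has reached M_down
            break
        # literal reading of S203: k or more of the k neighbours differ
        if np.sum(y[idx[i, 1:]] != y[i]) >= k:
            keep[i] = False
            deleted += 1
    return X[keep], y[keep]
```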
3. The method of claim 2, wherein analyzing the minority class samples S_min using the oversampling SMOTE and generating a sample T_gen from the minority class samples S_min comprises:
searching among the minority class samples S_min for the k nearest neighbor samples S_min_i;
assuming the number of samples that the oversampling SMOTE generates is M_up, randomly selecting M_up samples from the S_min_i, the M_up samples being denoted S_min_1, S_min_2, ..., S_min_j;
performing a random interpolation operation on S_min_i and S_min_j to generate the sample T_gen = S_min_i + rand(0,1)·(S_min_j − S_min_i); wherein rand(0,1) denotes a random number in the interval (0,1), and i = 1, 2, ..., M_up.
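A worked instance of the interpolation above, with purely illustrative values:

```python
import numpy as np
s_i = np.array([1.0, 2.0])       # a minority class sample S_min_i
s_j = np.array([3.0, 6.0])       # a selected neighbour S_min_j
r = 0.25                         # stands in for rand(0,1)
t_gen = s_i + r * (s_j - s_i)    # -> array([1.5, 3.0])
```

The synthetic sample always lies on the line segment between S_min_i and S_min_j, which is what keeps SMOTE's new points inside the minority class region.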
4. The method of claim 2, wherein if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag misclassification rate, returning to execute step S102 until the classification evaluation index converges, and outputting the current equalized voice data set comprises:
if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag misclassification rate, initializing the out-of-bag misclassification rate and T_gen, M_up, M_down and T_del, returning to execute step S102 until the classification evaluation index converges, and outputting the current equalized voice data set.
5. The method of claim 1, wherein if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag misclassification rate, returning to execute step S102 until the classification evaluation index converges, and outputting the current equalized voice data set comprises:
if the classification evaluation index diverges, then according to
[formula image FDA0003874061400000041: the update rule for the oversampling rate N_1 and the undersampling rate N_2, expressed in terms of OOB_mis_rate, N_mis_maj, N_mis_min_i and minclass; the formula itself is not recoverable from this extraction]
updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN, returning to execute step S102 until the classification evaluation index converges, and outputting the current equalized voice data set;
wherein OOB_mis_rate is the out-of-bag misclassification rate of the double-factor random forest model, N_mis_maj is the number of misclassified majority class samples, N_mis_min_i is the number of misclassified samples of the i-th minority class, and minclass is the number of minority classes.
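The update rule itself lives in the formula image above and cannot be reproduced here, but the bookkeeping the claim names is straightforward. A minimal sketch, assuming the overall rate is the fraction of out-of-bag predictions that are wrong (one plausible reading, not confirmed by the source):

```python
import numpy as np

def oob_error_counts(y_true, y_oob_pred, maj_label):
    """Per-class out-of-bag misclassification counts used by claim 5:
    N_mis_maj for the majority class, N_mis_min_i for each minority class."""
    wrong = y_true != y_oob_pred
    n_mis_maj = int(np.sum(wrong & (y_true == maj_label)))
    minority_classes = [c for c in np.unique(y_true) if c != maj_label]
    n_mis_min = {c: int(np.sum(wrong & (y_true == c))) for c in minority_classes}
    # Assumption: OOB_mis_rate as the overall fraction of wrong OOB predictions.
    oob_mis_rate = float(wrong.mean())
    return oob_mis_rate, n_mis_maj, n_mis_min
```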
6. A voice sample equalization device combining mixed sampling and random forest, comprising:
an acquisition module, configured to acquire an initial voice data set and perform feature extraction on the initial voice data set to obtain an extracted voice data feature set;
an analysis module, configured to analyze the minority class samples of the voice data feature set using the oversampling SMOTE and generate new target minority class samples from them, analyze the nearest neighbor samples of the target minority class samples and of the majority class samples in the voice data feature set using the undersampling ENN, and delete target minority class samples and majority class samples according to those nearest neighbor samples to obtain a current equalized voice data set;
a construction module, configured to calculate the information gain rate and the Gini coefficient of the current equalized voice data set and linearly combine them with two factors to construct a double-factor random forest model, which comprises:
calculating the information gain rate and the Gini coefficient of the current equalized voice data set, linearly combining them with the two factors, and adaptively splitting the decision tree nodes of the double-factor random forest model, which comprises:
dividing the current equalized voice data set D into subsets D_1, ..., D_k, and calculating the information gain of the current equalized voice data set
Gain(D; D_1, ..., D_k) = Ent(D) − Σ_{i=1}^{k} (|D_i|/|D|) · Ent(D_i)
wherein the entropy of the current equalized voice data set D is
Ent(D) = −Σ_{c=1}^{C} p_c · log₂ p_c
with p_c the proportion of class-c samples in D, and Ent(D_i) defined in the same way on each subset D_i;
normalizing the information gain of the current equalized voice data set by the number of values taken by the splitting feature to obtain the information gain rate of the current equalized voice data set
Gain_ratio(D; D_1, ..., D_k) = Gain(D; D_1, ..., D_k) / IV(D), where IV(D) = −Σ_{i=1}^{k} (|D_i|/|D|) · log₂(|D_i|/|D|);
computing the Gini coefficient of the current equalized voice data set
Gini(D; D_1, ..., D_k) = Σ_{i=1}^{k} (|D_i|/|D|) · Gini(D_i)
wherein
Gini(D_i) = 1 − Σ_{c=1}^{C} p_c²;
linearly combining the information gain rate and the Gini coefficient of the current equalized voice data set with the two factors,
ψ(D; D_1, ..., D_k) = α[β₁ · Gini(D; D_1, ..., D_k) − β₂ · Gain_ratio(D; D_1, ..., D_k)],
and adaptively splitting the decision tree nodes of the double-factor random forest model accordingly; wherein α is a random factor and β₁, β₂ are the balance factors of the node splitting indexes;
constructing the double-factor random forest model according to the adaptive splitting of the decision tree nodes;
judging whether the out-of-bag error of the double-factor random forest model reaches the preset out-of-bag error value; if so, outputting the double-factor random forest model under the preset double-factor condition; otherwise, updating the two factors used for the adaptive splitting of the decision tree nodes and reconstructing the double-factor random forest model;
an input module, configured to input the current equalized voice data set into the double-factor random forest model and output a classification evaluation index and an out-of-bag misclassification rate of the current equalized voice data set under the preset double-factor condition;
a judging module, configured to judge whether the classification evaluation index converges; if it converges, output the current equalized voice data set; if it diverges, update the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag misclassification rate and return to the analysis module until the classification evaluation index converges, then output the current equalized voice data set.
7. A voice sample equalization apparatus that combines hybrid sampling and random forest, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the voice sample equalization method combining mixed sampling and random forest according to any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the voice sample equalization method combining mixed sampling and random forest according to any one of claims 1 to 5.
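Stepping back from the individual claims, the sketch below shows one way the whole S101 to S105 loop could be wired together. It is a hypothetical driver, not the patented implementation: smote_generate and enn_filter refer to the sketch after claim 2, scikit-learn's RandomForestClassifier stands in for the double-factor forest (an off-the-shelf library cannot express the custom ψ split criterion), and the rate update at the bottom is a placeholder for the claim-5 formula image.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def equalize(X_min, X_maj, n1=1.0, n2=0.5, k=5, max_iter=10, tol=1e-3):
    prev = None
    for _ in range(max_iter):
        # S102: hybrid sampling (sketch functions from claim 2 slot in here)
        X_syn = smote_generate(X_min, m_up=int(len(X_min) * n1), k=k)
        X = np.vstack([X_min, X_syn, X_maj])
        y = np.array([1] * (len(X_min) + len(X_syn)) + [0] * len(X_maj))
        X, y = enn_filter(X, y, m_down=int(len(X_maj) * n2), k=k)
        # S103-S104: fit a forest and read its out-of-bag score
        rf = RandomForestClassifier(n_estimators=100, oob_score=True,
                                    random_state=0).fit(X, y)
        score = rf.oob_score_
        # S105: stop once the evaluation index has converged
        if prev is not None and abs(score - prev) < tol:
            break
        prev = score
        # Placeholder update -- NOT the patent's rule: raise the oversampling
        # rate and lower the undersampling rate with the OOB error.
        n1 *= 1.0 + (1.0 - score)
        n2 *= max(0.1, 1.0 - (1.0 - score))
    return X, y
```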
CN202210083571.0A 2022-01-17 2022-01-17 Voice sample equalization method combining mixed sampling and random forest Active CN114550697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210083571.0A CN114550697B (en) 2022-01-17 2022-01-17 Voice sample equalization method combining mixed sampling and random forest


Publications (2)

Publication Number Publication Date
CN114550697A CN114550697A (en) 2022-05-27
CN114550697B (en) 2022-11-18

Family

ID=81671633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210083571.0A Active CN114550697B (en) 2022-01-17 2022-01-17 Voice sample equalization method combining mixed sampling and random forest

Country Status (1)

Country Link
CN (1) CN114550697B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11501304B2 (en) * 2020-03-11 2022-11-15 Synchrony Bank Systems and methods for classifying imbalanced data
US20210097449A1 (en) * 2020-12-11 2021-04-01 Intel Corporation Memory-efficient system for decision tree machine learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273909A (en) * 2016-04-08 2017-10-20 上海市玻森数据科技有限公司 The sorting algorithm of high dimensional data
CN111202526A (en) * 2020-01-20 2020-05-29 华东医院 Method for simplifying and optimizing multi-dimensional elderly auditory function evaluation system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A hybrid sampling algorithm combining M-SMOTE and ENN based on Random forest for medical imbalanced data; Zhaozhao Xu et al.; Journal of Biomedical Informatics; 2020-07-05; full text *
Class-imbalanced voice pathology classification: Combining hybrid sampling with optimal two-factor random forests; X Zhang et al.; Applied Acoustics; 2022-01-20; full text *
Solving the Problem of Class Imbalance in the Prediction of Hotel Cancelations: A Hybridized Machine Learning Approach; Mohd Adil; Processes; 2021-09-21; full text *
Research on ECG-assisted diagnosis and treatment applications based on SMOTE+ENN and random forest; Shang Ziwei; China Master's Theses Full-text Database; 2020-03-15; full text *
Research on pathological voice recognition combining multi-band nonlinear methods; Zhao Pinhui; Informatization Research; 2019-06-20; full text *

Also Published As

Publication number Publication date
CN114550697A (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN107564513B (en) Voice recognition method and device
CN106952644A (en) A kind of complex audio segmentation clustering method based on bottleneck characteristic
JP7024515B2 (en) Learning programs, learning methods and learning devices
US11462210B2 (en) Data collecting method and system
US20210224647A1 (en) Model training apparatus and method
CN108877947B (en) Depth sample learning method based on iterative mean clustering
CN106971180B (en) A kind of micro- expression recognition method based on the sparse transfer learning of voice dictionary
JP6979203B2 (en) Learning method
JP2014026455A (en) Media data analysis device, method and program
CN112348068B (en) Time sequence data clustering method based on noise reduction encoder and attention mechanism
CN108154186B (en) Pattern recognition method and device
CN109344751B (en) Reconstruction method of noise signal in vehicle
WO2020214253A1 (en) Condition-invariant feature extraction network for speaker recognition
CN104077598A (en) Emotion recognition method based on speech fuzzy clustering
CN106384587B (en) A kind of audio recognition method and system
Fan et al. Modeling voice pathology detection using imbalanced learning
US20180061395A1 (en) Apparatus and method for training a neural network auxiliary model, speech recognition apparatus and method
CN114550697B (en) Voice sample equalization method combining mixed sampling and random forest
CN117527495A (en) Modulation mode identification method and device for wireless communication signals
JP2014215385A (en) Model estimation system, sound source separation system, model estimation method, sound source separation method, and program
Saeidi et al. Particle swarm optimization for sorted adapted gaussian mixture models
Choi et al. Adversarial speaker-consistency learning using untranscribed speech data for zero-shot multi-speaker text-to-speech
CN114048770B (en) Automatic detection method and system for digital audio deletion and insertion tampering operation
CN115472179A (en) Automatic detection method and system for digital audio deletion and insertion tampering operation
Farsi et al. Implementation and optimization of a speech recognition system based on hidden Markov model using genetic algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant