CN114550697A - Voice sample equalization method combining mixed sampling and random forest


Info

Publication number
CN114550697A
Authority
CN
China
Prior art keywords
sample
data set
voice data
rate
samples
Prior art date
Legal status
Granted
Application number
CN202210083571.0A
Other languages
Chinese (zh)
Other versions
CN114550697B (en)
Inventor
张晓俊
周长伟
朱欣程
陶智
赵鹤鸣
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Priority date
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority claimed from CN202210083571.0A
Publication of CN114550697A
Application granted
Publication of CN114550697B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G06F18/24323 - Tree-organised classifiers


Abstract

The invention relates to a voice sample equalization method combining mixed sampling and random forest. First, feature extraction is performed on an initial voice data set; the extracted voice data feature set is then equalized with SMOTE-ENN mixed sampling to obtain a current balanced voice data set. Next, the current balanced voice data set is input into a dual-factor random forest model, which outputs a classification evaluation index and an out-of-bag error classification rate. Finally, whether the classification evaluation index has converged is judged: if it has converged, the current balanced voice data set is output; otherwise, the mixed sampling rate of the SMOTE-ENN mixed sampling is updated according to the out-of-bag error classification rate and the extracted voice data set is equalized again, until the classification evaluation index converges and the current balanced voice data set is output. By combining SMOTE-ENN mixed sampling with the dual-factor random forest model to balance the data set, the method retains sample data with high information value to the greatest extent.

Description

Voice sample equalization method combining mixed sampling and random forest
Technical Field
The invention relates to the technical field of data processing, and in particular to a voice sample equalization method, apparatus, device and computer-readable storage medium combining mixed sampling and random forest.
Background
In recent years, artificial intelligence technology has made breakthrough progress in speech recognition. However, data imbalance remains a challenging problem in machine learning: when the class distribution is uneven, the recognition ability of a classifier is strongly biased toward the majority classes, and satisfactory classification performance cannot be achieved for the minority classes.
At present, traditional imbalanced-learning techniques for the imbalanced-data classification problem fall into two categories: internal methods and external methods. Internal methods modify existing classification algorithms to reduce their sensitivity to class imbalance. External methods preprocess the training data to balance it. Among external methods, the sampling approaches for balancing an unbalanced data set include SMOTE oversampling and ENN undersampling.
The basic idea of SMOTE oversampling is to analyze the minority class samples and artificially synthesize new samples from them to add to the data set; however, the distribution of nearby majority class samples is not considered when the new samples are generated, and the k-nearest-neighbor selection is blind, so considerable noise is introduced and synthetic samples intrude into the majority class sample space. ENN undersampling obtains the desired class distribution by eliminating majority class samples, but this loses classification information in the data set. A voice sample equalization method combining mixed sampling and random forest therefore needs to be designed.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is to overcome the defects of the prior art that SMOTE oversampling does not consider the distribution of nearby majority class samples when generating new samples, so that much noise intrudes into the majority class sample space, and that ENN undersampling causes the loss of classification information in the data set.
In order to solve the technical problem, the invention provides a voice sample equalization method combining mixed sampling and random forest, which comprises the following steps:
s101: acquiring an initial voice data set, and performing feature extraction on the initial voice data set to obtain an extracted voice data feature set;
s102: analyzing minority class samples of the voice data feature set by using oversampling SMOTE and generating new target minority class samples from them, analyzing the nearest neighbor samples of the target minority class samples and of the majority class samples in the voice data feature set by using undersampling ENN, and deleting target minority class samples and majority class samples according to their nearest neighbor samples, so as to obtain a current balanced voice data set;
s103: calculating the information gain rate and the Gini coefficient of the current balanced voice data set, and linearly combining them with dual factors to construct a dual-factor random forest model;
s104: inputting the current balanced voice data set into the dual-factor random forest model, and outputting a classification evaluation index and an out-of-bag error classification rate of the current balanced voice data set under a preset dual-factor condition;
s105: judging whether the classification evaluation index converges; if it converges, outputting the current balanced voice data set; if it diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, returning to step S102 until the classification evaluation index converges, and outputting the current balanced voice data set, as sketched below.
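To make the overall loop concrete, the following minimal Python sketch mirrors steps S101 to S105, using the open-source imbalanced-learn and scikit-learn packages as stand-ins for the patent's custom SMOTE-ENN sampler and dual-factor random forest; the initial sampling rate, the feedback step size and the binary (for example normal/pathological voice) label assumption are illustrative choices, not part of the claimed method.

```python
import numpy as np
from imblearn.combine import SMOTEENN
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def equalize_speech_features(X, y, max_rounds=10):
    """Iteratively re-balance (X, y); stop once F1-macro stops improving (step S105)."""
    sampling_rate = 0.5                 # illustrative initial mixed-sampling rate (binary labels assumed)
    history, best = [], (X, y)
    for _ in range(max_rounds):
        # step S102: SMOTE-ENN mixed sampling of the extracted feature set
        X_bal, y_bal = SMOTEENN(sampling_strategy=sampling_rate).fit_resample(X, y)

        # steps S103/S104: evaluate the balanced set with a random forest (plain RF as a stand-in)
        X_tr, X_va, y_tr, y_va = train_test_split(X_bal, y_bal, test_size=0.3, stratify=y_bal)
        forest = RandomForestClassifier(n_estimators=100, oob_score=True).fit(X_tr, y_tr)
        f1_macro = f1_score(y_va, forest.predict(X_va), average="macro")
        oob_mis_rate = 1.0 - forest.oob_score_          # stand-in for OOBMmis_rate

        # step S105: stop when F1-macro drops or stays flat, else feed the OOB rate back
        if history and f1_macro <= history[-1]:
            break
        history.append(f1_macro)
        best = (X_bal, y_bal)
        # illustrative feedback rule: raise the sampling rate when the OOB error is still high
        sampling_rate = float(np.clip(sampling_rate + 0.5 * (oob_mis_rate - 0.05), 0.1, 1.0))
    return best
```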
In an embodiment of the present invention, the step of analyzing minority class samples of the voice data feature set by using oversampling SMOTE and generating new target minority class samples from them, analyzing the nearest neighbor samples of the target minority class samples and of the majority class samples in the voice data feature set by using undersampling ENN, and deleting target minority class samples and majority class samples according to their nearest neighbor samples to obtain a current balanced voice data set includes:
S201: analyzing the minority class samples Smin by using the oversampling SMOTE, generating samples from Smin, and storing the generated samples in the minority class sample space Kmin[]; the number of generated samples is Tgen = count(Kmin);
S202: judging whether Tgen is less than the number Mup of samples that the oversampling SMOTE needs to generate; if Tgen < Mup, returning to step S201, otherwise executing step S203; where Mup = (number of minority class samples Smin) × oversampling rate N1;
S203: analyzing, with the undersampling ENN, the nearest neighbor samples of the generated samples Tgen and of the majority class samples Smaj in the voice data feature set; if k or more of the nearest neighbor samples of a generated sample Tgen belong to a class different from that of Tgen, deleting the corresponding Tgen from Kmin[]; if k or more of the nearest neighbor samples of a majority class sample Smaj belong to a class different from that of Smaj, deleting Smaj; the number of samples deleted by the undersampling ENN is Tdel = Tgen + Smaj;
S204: judging whether the number Tdel of samples deleted by the undersampling ENN is less than the number Mdown of samples that the undersampling ENN needs to delete; if Tdel < Mdown, returning to step S203, otherwise outputting the current balanced voice data set; where Mdown = (number of majority class samples Smaj) × undersampling rate N2.
In one embodiment of the present invention, analyzing the minority class samples Smin by using the oversampling SMOTE and generating samples Tgen from Smin comprises the following steps (an implementation sketch follows):
searching, among the minority class samples Smin, the k nearest neighbor samples Smin_i;
assuming that the number of samples to be generated by the oversampling SMOTE is Mup, randomly selecting Mup samples from the neighbors Smin_i, the Mup samples being denoted Smin_1, Smin_2, ..., Smin_j;
combining Smin_i and Smin_j through a random interpolation operation to generate the samples Tgen = Smin_i + rand(0,1)(Smin_j - Smin_i), where rand(0,1) denotes a random number in the interval (0,1), i = 1, 2, ..., k, and j = 1, 2, ..., Mup.
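A minimal NumPy sketch of this interpolation step is given below; the function name, the use of scikit-learn's NearestNeighbors and the random-seed handling are illustrative assumptions rather than part of the claimed method.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_generate(S_min, m_up, k=5, rng=None):
    """Generate m_up synthetic minority samples: Tgen = Smin_i + rand(0,1)*(Smin_j - Smin_i)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    S_min = np.asarray(S_min)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(S_min)            # +1: the sample itself comes back first
    neighbors = nn.kneighbors(S_min, return_distance=False)[:, 1:]  # k nearest neighbours of each Smin_i
    generated = []
    for _ in range(m_up):
        i = rng.integers(len(S_min))                # pick a seed minority sample Smin_i
        j = rng.choice(neighbors[i])                # pick one of its k nearest neighbours Smin_j
        generated.append(S_min[i] + rng.random() * (S_min[j] - S_min[i]))
    return np.asarray(generated)
```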
In an embodiment of the present invention, if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, returning to step S102 until the classification evaluation index converges, and outputting the current balanced voice data set includes:
if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, initializing the out-of-bag error classification rate, Tgen, Mup, Mdown and Tdel, returning to step S102 until the classification evaluation index converges, and outputting the current balanced voice data set.
In an embodiment of the present invention, calculating the information gain rate and the Gini coefficient of the current balanced voice data set and linearly combining them with dual factors to construct the dual-factor random forest model includes:
calculating the information gain rate and the Gini coefficient of the current balanced voice data set, linearly combining them with dual factors, and adaptively splitting the decision tree nodes of the dual-factor random forest model;
constructing the dual-factor random forest model according to the adaptive splitting of the decision tree nodes;
judging whether the out-of-bag error value of the dual-factor random forest model reaches the preset out-of-bag error value; if so, outputting the dual-factor random forest model under the preset dual-factor condition; otherwise, updating the dual factors used for the adaptive splitting of the decision tree nodes and reconstructing the dual-factor random forest model.
In an embodiment of the present invention, calculating the information gain rate and the Gini coefficient of the current balanced voice data set, linearly combining them with dual factors, and adaptively splitting the decision tree nodes of the dual-factor random forest model includes:
dividing the current balanced voice data set D into subsets D1,...,Dk and calculating the information gain of the current balanced voice data set
Gain(D; D1,...,Dk) = Ent(D) - Σ_{j=1..k} (|Dj|/|D|)Ent(Dj),
where the entropy of the current balanced voice data set D is
Ent(D) = -Σ_i pi·log2(pi), pi being the proportion of samples of the i-th class in D;
normalizing the information gain of the current balanced voice data set by the number of values of the feature to obtain the information gain rate of the current balanced voice data set
Gain_ratio(D; D1,...,Dk) = Gain(D; D1,...,Dk) / IV(D; D1,...,Dk),
IV(D; D1,...,Dk) = -Σ_{j=1..k} (|Dj|/|D|)log2(|Dj|/|D|);
computing the Gini coefficient of the current balanced voice data set
Gini(D; D1,...,Dk) = Σ_{j=1..k} (|Dj|/|D|)Gini(Dj),
where
Gini(Dj) = 1 - Σ_i p_{ij}², p_{ij} being the proportion of class-i samples in the subset Dj;
linearly combining the information gain rate and the Gini coefficient of the current balanced voice data set by the dual factor ψ(D; D1,...,Dk) = α[β1Gini(D; D1,...,Dk) - β2Gain_ratio(D; D1,...,Dk)], and adaptively splitting the decision tree nodes of the dual-factor random forest model; where α is a randomness factor and βi are balance factors of the node splitting indexes (an implementation sketch follows).
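As a concrete illustration, the short Python sketch below evaluates the dual-factor splitting criterion ψ for one candidate partition; the helper names, the assumption of non-negative integer class labels and the non-empty partitions are choices made for the example only, and candidate splits would be compared by selecting the one with the smallest ψ.

```python
import numpy as np

def entropy(y):
    """Ent(D) = -sum_i p_i * log2(p_i) for integer class labels y."""
    p = np.bincount(y) / len(y)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gini(y):
    """Gini(D) = 1 - sum_i p_i^2 for integer class labels y."""
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def two_factor_criterion(y_parent, partitions, alpha, beta1, beta2):
    """psi(D; D1..Dk) = alpha * [beta1*Gini(D;D1..Dk) - beta2*Gain_ratio(D;D1..Dk)]."""
    weights = np.array([len(p) for p in partitions]) / len(y_parent)   # |Dj| / |D| (non-empty Dj assumed)
    gini_split = sum(w * gini(p) for w, p in zip(weights, partitions))
    gain = entropy(y_parent) - sum(w * entropy(p) for w, p in zip(weights, partitions))
    iv = -np.sum(weights * np.log2(weights))                           # intrinsic value of the split
    gain_ratio = gain / iv if iv > 0 else 0.0
    return alpha * (beta1 * gini_split - beta2 * gain_ratio)

# candidate splits are compared and the one with the smallest psi is chosen for the node
```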
In an embodiment of the present invention, if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, returning to step S102 until the classification evaluation index converges, and outputting the current balanced voice data set includes:
if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN as a function of the out-of-bag error classification rate OOBMmis_rate, returning to step S102 until the classification evaluation index converges, and outputting the current balanced voice data set;
where OOBMmis_rate is the out-of-bag error classification rate of the dual-factor random forest model, Nmis_maj is the number of misclassified majority class samples, Nmis_min_i is the number of misclassified samples of the i-th minority class, and minclass is the number of minority classes.
The invention further provides a voice sample equalization apparatus combining mixed sampling and random forest, comprising:
an acquisition module, which is used for acquiring an initial voice data set and performing feature extraction on the initial voice data set to obtain an extracted voice data feature set;
an analysis module, which is used for analyzing minority class samples of the voice data feature set by using oversampling SMOTE and generating new target minority class samples from them, analyzing the nearest neighbor samples of the target minority class samples and of the majority class samples in the voice data feature set by using undersampling ENN, and deleting target minority class samples and majority class samples according to their nearest neighbor samples to obtain a current balanced voice data set;
a construction module, which is used for calculating the information gain rate and the Gini coefficient of the current balanced voice data set and linearly combining them with dual factors to construct a dual-factor random forest model;
the input module is used for inputting the current balanced voice data set into the double-factor random forest model and outputting a classification evaluation index and an out-of-bag error classification rate of the current balanced voice data set under a preset double-factor condition;
the judging module is used for judging whether the classification evaluation index is converged or not, and if the classification evaluation index is converged, outputting the current balanced voice data set; and if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, returning to execute the step S102 until the classification evaluation index converges, and outputting the current balanced voice data set.
The invention further provides a voice sample equalization device combining mixed sampling and random forest, comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the voice sample equalization method combining the mixed sampling and the random forest when the computer program is executed.
The invention provides a computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of a method for speech sample equalization combining mixed sampling and random forest as described above.
Compared with the prior art, the technical scheme of the invention has the following advantages:
the invention relates to a voice sample equalization method for joint mixed sampling and random forests, which comprises the steps of firstly, collecting an initial voice data set, and carrying out feature extraction on the initial voice data set to obtain an extracted voice data feature set; then, analyzing a minority class sample of the voice data feature set by utilizing oversampling SMOTE, generating a new target minority class sample according to the minority class sample, analyzing a nearest neighbor sample of the target minority class sample and a nearest neighbor sample of a plurality of classes of samples in the voice data feature set by utilizing undersampling ENN, deleting the target minority class sample and the majority class sample according to the nearest neighbor sample of the target minority class sample and the nearest neighbor sample of the majority class sample, and obtaining a current balanced voice data set; secondly, calculating the information gain rate and the kini coefficient of the current balanced voice data set, and linearly combining the information gain rate and the kini coefficient of the current balanced voice data set by using double factors so as to construct a double-factor random forest model; inputting the current balanced voice data set into a double-factor random forest model, and outputting a classification evaluation index and an out-of-bag error classification rate of the double-factor random forest model under a preset double-factor condition; finally, judging whether the classification evaluation index is converged, and if the classification evaluation index is converged, outputting the current balanced voice data set; and if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampled ENN according to the out-of-bag error classification rate, returning to perform equalization processing on the extracted voice data feature set again until the classification evaluation index converges, and outputting the current equalized voice data set. The method increases the consideration of the inherent characteristics of the sample by extracting the characteristics of the voice data set; by applying the under-sampled ENN to the target few samples generated by the over-sampled SMOTE to remove the samples, the problem that the distribution condition of the nearby most samples is not considered when the SMOTE over-sampled generates a new sample is solved, and the generation of noise samples is reduced; meanwhile, self-adaptive double-factor parameters are introduced into the random forest to adjust the bias of the double-factor random forest model, iterative analysis is carried out on the data characteristics of the double-factor random forest input in each turn, and the data characteristics are fed back to the mixed sampling stage according to the classification evaluation indexes, so that the mixed sampling technology can be assisted to obtain more reliable data results, sample data with high information value is reserved to the maximum extent, and the loss of the classification information of the data set is reduced.
Drawings
In order that the present disclosure may be more readily and clearly understood, reference is now made to the following detailed description of the present disclosure taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a first embodiment of a method for equalizing a speech sample of a combined mixed sampling and random forest according to the present invention;
FIG. 2 is a flow chart of a second embodiment of a method for equalizing a speech sample of a combined mixed sampling and random forest according to the present invention;
FIG. 3 is a schematic diagram of a method for equalizing a speech sample of a combined mixed sampling and random forest according to the present invention;
FIG. 4 is a schematic diagram of feature extraction for a speech data set in accordance with the present invention;
FIG. 5 is a flow chart of a two-factor random forest of the present invention;
fig. 6 is a block diagram of a voice sample equalization apparatus combining mixed sampling and random forest according to an embodiment of the present invention.
Detailed Description
The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a first embodiment of a method for equalizing a speech sample of a combined mixed sampling and random forest according to the present invention; the specific operation steps are as follows:
step S101: acquiring an initial voice data set, and performing feature extraction on the initial voice data set to obtain an extracted voice data feature set;
step S102: analyzing minority class samples of the voice data feature set by using oversampling SMOTE and generating new target minority class samples from them, analyzing the nearest neighbor samples of the target minority class samples and of the majority class samples in the voice data feature set by using undersampling ENN, and deleting target minority class samples and majority class samples according to their nearest neighbor samples to obtain a current balanced voice data set;
step S103: calculating the information gain rate and the Gini coefficient of the current balanced voice data set, and linearly combining them with dual factors so as to construct a dual-factor random forest model;
step S104: inputting the current balanced voice data set into the double-factor random forest model, and outputting a classification evaluation index and an out-of-bag error classification rate of the current balanced voice data set under a preset double-factor condition;
step S105: judging whether the classification evaluation index is converged, and if the classification evaluation index is converged, outputting the current balanced voice data set; and if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, returning to execute the step S102 until the classification evaluation index converges, and outputting the current balanced voice data set.
In the method provided by this embodiment, the extracted voice data set is equalized by a hybrid sampling technique, and the undersampling algorithm is applied to the new samples generated by the oversampling algorithm to remove noisy samples. Meanwhile, the data characteristics are analyzed by means of the random forest and fed back to the mixed sampling stage, so that the hybrid sampling technique obtains a more reliable data result. Adaptive dual-factor parameters are introduced into the random forest to adjust the model bias, the data features input to the dual-factor random forest in each round are iteratively analyzed, and the out-of-bag error classification rate obtained after each iteration is fed back to the mixed sampling stage as guidance, so that sample data with high information value is retained to the greatest extent.
Based on the above embodiments, the present embodiment further describes the speech sample equalization method, and with reference to fig. 2 and fig. 3, the specific operation steps are as follows:
step S201: acquiring an initial voice data set, and performing feature extraction on the initial voice data set to obtain an extracted voice data feature set;
As shown in fig. 4, in order to analyze the nonlinear phenomena caused by vortices at the glottis during phonation, the speech signal is first filtered by a Bark wavelet sub-band filter bank; features are then extracted with a discrete cosine transform in the low frequency bands, while the correlation dimension and the maximum Lyapunov exponent are extracted in the high frequency bands, so that the characteristics of the voice are captured in detail in each frequency band. The fluid-solid coupling feature extraction based on the glottal flow field distribution proceeds as follows: the voice spectrum is first divided into 24 frequency bands according to the Bark filter bank; for the low bands, the logarithmic energy is calculated after Fourier transform following the MFCC extraction method and a discrete cosine transform is applied; for the high bands, nonlinear dynamics analysis is performed to extract the correlation dimension and the maximum Lyapunov exponent; the multiple features are then fused.
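The sketch below illustrates the low-band part of this pipeline (Bark sub-band log energies followed by a DCT) with NumPy and SciPy; the Bark conversion formula used, the frame-based interface and all function names are assumptions made for illustration, and the high-band correlation dimension and maximum Lyapunov exponent would require a separate nonlinear-dynamics routine (for example from a package such as nolds), which is not shown.

```python
import numpy as np
from scipy.fft import dct

def hz_to_bark(f):
    # Zwicker & Terhardt approximation of the Bark scale
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def bark_band_log_energies(frame, sr, n_bands=24):
    """Split one speech frame into 24 Bark sub-bands and return the log energy of each band."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    bands = np.minimum(hz_to_bark(freqs).astype(int), n_bands - 1)   # map each FFT bin to a Bark band
    energies = np.array([spectrum[bands == b].sum() + 1e-12 for b in range(n_bands)])
    return np.log(energies)

def low_band_cepstral_features(frame, sr, n_low=12, n_coeffs=8):
    """MFCC-style features for the low Bark bands: log energy followed by a DCT."""
    log_e = bark_band_log_energies(frame, sr)[:n_low]
    return dct(log_e, type=2, norm="ortho")[:n_coeffs]
```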
Analyzing the speech signal further from the vocal cord vibration perspective, the vocal cord mass block model is described by a set of coupled equations of motion in which α = 1, 2 denote the left and right vocal folds respectively; x and υ are the motion displacement and the velocity of each mass block; m, k, kc and r represent the mass of the mass block, the spring elastic coefficient, the coupling elastic coefficient and the damping constant; l and d are the vocal cord length and the thickness of the lower mass block; and the Bernoulli pressure and the impact force generated at collision also enter the equations.
The model mass, elastic coefficient, coupling coefficient, damping constant and subglottal pressure are set as optimizable parameters, expressed as the vector Φ = [m, k, kc, r, Ps], and a suitable Φ is searched for with a variation particle swarm quasi-Newton method so that the glottal fluid-solid coupling model accurately reproduces the glottal waveform. To avoid the gradient method converging to a local minimum in the non-convex search space, the variation particle swarm method is first used to obtain a candidate solution, and the quasi-Newton method then performs local optimization on that solution to find the global optimum.
The selection and crossover process adopts a roulette-wheel selection rule to select M individuals, and the particle swarm algorithm terminates when the best fitness exceeds a preset threshold or a preset number of iterations is reached. The time-domain error between the target voice source Uge and the waveform Ugs simulated with the parameter vector Φ, accumulated over the N sample points of Uge and Ugs, is defined as the objective function F. When F reaches its global minimum, the simulated glottal airflow Ugs of the vocal cord mass block model is consistent with the target glottal airflow Uge, meaning that the vocal cord mass block model accurately reflects the actual vocal cord structure of the target voice source.
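A hedged Python sketch of this fitting stage is given below; since the variation particle swarm quasi-Newton optimizer is not reproduced here, SciPy's differential evolution is used as a stand-in for the global stage and BFGS for the quasi-Newton stage, and the mean-squared time-domain error, the simulate callback and the function name are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

def fit_vocal_fold_parameters(u_ge, simulate, bounds):
    """Fit the parameter vector Phi so the simulated glottal flow matches the target u_ge.

    simulate(phi) is a user-supplied function returning the model waveform U_gs for
    parameters phi; bounds is a list of (low, high) pairs for [m, k, kc, r, Ps].
    """
    def objective(phi):
        # time-domain error between target U_ge and simulated U_gs (mean-squared form assumed)
        u_gs = simulate(phi)
        return float(np.mean((np.asarray(u_ge) - np.asarray(u_gs)) ** 2))

    coarse = differential_evolution(objective, bounds, maxiter=50, seed=0)   # global stage (stand-in for PSO)
    refined = minimize(objective, coarse.x, method="BFGS")                   # local quasi-Newton refinement
    return refined.x
```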
The fluid-solid features of the voice signal are thus extracted through sub-band nonlinear analysis and the vocal cord mass block model: the sub-band nonlinear analysis reflects the nonlinear characteristics caused by the airflow vortex in the generation of the voice signal, and the vocal cord mass block model simulates the actual vocal cord structure of the target voice signal. The extracted fluid-solid features of the voice signal are then applied to subsequent voice recognition.
Step S202: analyzing the minority class samples Smin of the voice data feature set by using oversampling SMOTE, generating samples from Smin, and storing the generated samples in the minority class sample space Kmin[]; the number of generated samples is Tgen = count(Kmin);
Step S203: judging whether Tgen is less than the number Mup of samples that the oversampling SMOTE needs to generate; if Tgen < Mup, returning to step S202, otherwise executing step S204; where Mup = (number of minority class samples Smin) × oversampling rate N1;
Step S204: analyzing, with the undersampling ENN, the nearest neighbor samples of the generated samples Tgen and of the majority class samples Smaj in the voice data feature set; if k or more of the nearest neighbor samples of a generated sample Tgen belong to a class different from that of Tgen, deleting the corresponding Tgen from Kmin[]; if k or more of the nearest neighbor samples of a majority class sample Smaj belong to a class different from that of Smaj, deleting Smaj; the number of samples deleted by the undersampling ENN is Tdel = Tgen + Smaj;
Step S205: judging whether the number Tdel of samples deleted by the undersampling ENN is less than the number Mdown of samples that the undersampling ENN needs to delete; if Tdel < Mdown, returning to step S204, otherwise outputting the current balanced voice data set; where Mdown = (number of majority class samples Smaj) × undersampling rate N2.
The SMOTE oversampling algorithm, based on the k-nearest-neighbor idea, searches the k nearest neighbor samples Smin_i within the minority class. Assuming the number of samples to be generated for the data set is Mup, Mup samples are randomly selected from the neighbors Smin_i and denoted Smin_1, Smin_2, ..., Smin_j. The data samples Smin_i and Smin_j are combined through the corresponding random interpolation operation to obtain the synthetic sample Snew:
Snew = Smin_i + rand(0,1)(Smin_j - Smin_i);
where rand(0,1) denotes a random number in the interval (0,1), i = 1, 2, ..., k, j = 1, 2, ..., Mup, and Mup is the number of samples to generate, which is determined by the oversampling rate.
The ENN (Edited Nearest Neighbour) undersampling algorithm also removes majority class and minority class samples based on a k-nearest-neighbor selection strategy. Its basic idea is as follows: given an unbalanced data set D in which Smaj denotes the majority class samples, each sample Smaj_i in Smaj is traversed and its three nearest neighbor samples are found; if two or more of the three nearest neighbors belong to a class different from that of Smaj_i, the sample Smaj_i is deleted. By combining SMOTE and ENN, the extracted voice data set is equalized through SMOTE-ENN mixed sampling, and the undersampling algorithm is applied to the new samples generated by the oversampling algorithm to eliminate noisy samples, so that the noise problem is addressed without losing data set information.
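For illustration, the following short sketch applies the ENN rule with NumPy and scikit-learn's NearestNeighbors; the function name, the default k = 3 and the majority-vote threshold are assumptions chosen to match the description above.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def enn_clean(X, y, k=3):
    """Edited Nearest Neighbour cleaning: drop a sample when most of its k neighbours disagree."""
    X, y = np.asarray(X), np.asarray(y)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nn.kneighbors(X, return_distance=False)[:, 1:]      # exclude the sample itself
    disagree = (y[idx] != y[:, None]).sum(axis=1)             # neighbours with a different label
    keep = disagree < (k / 2 + 0.5)                           # keep unless a majority of neighbours differ
    return X[keep], y[keep]
```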
Step S206: calculating the information gain rate and the Gini coefficient of the current balanced voice data set, linearly combining them with dual factors, and adaptively splitting the decision tree nodes of the dual-factor random forest model;
step S207: constructing the dual-factor random forest model according to the self-adaptive splitting of the decision tree nodes;
the classification performance of random forests is reduced when non-uniform data sets are processed. The main reasons are two, firstly, in the random forest construction process, the training set is selected by bootstrap self-sampling. Because few samples of the original data set are fewer and the probability of sampling of the few samples is lower, the number of the few samples in the sub-training set is smaller than that of the original data set, and the non-equilibrium of the data set is aggravated. Secondly, because the number of the samples of the minority class in the original data set is low, the decision tree based on the sub-training set lacks the generalization capability and cannot embody the characteristics of the minority class.
A random forest is an ensemble classifier R = {h(x, θk), k = 1, 2, ..., K} composed of a set of decision trees, where {θk} are independent and identically distributed random vectors, K is the number of decision trees in the random forest, and the training set of each classifier is obtained by random sampling from the data set D = <X, Y>. The margin function of the random forest is
mr(X, Y) = avk I(h(X, θk) = Y) - max_{j≠Y} avk I(h(X, θk) = j),
where avk denotes averaging over the K trees and I(·) is the indicator function. The classification performance (strength) of the base classifiers {h(x, θ)} is defined as
s = E_{X,Y} mr(X, Y).
Assuming s ≥ 0, i.e., the base classifier is a weak classifier, the upper bound of the random forest generalization error PE* is
PE* ≤ ρ̄(1 - s²)/s²,
where the subscripts X, Y indicate that the probability P covers the X, Y space and ρ̄ is the average correlation coefficient between the base classifiers. This shows that the generalization error of the random forest is related to the classification performance of the base classifiers and to the correlation coefficient between them. Therefore, a dual-factor decision tree splitting algorithm is proposed to reduce the correlation coefficient between the base classifiers, improve the classification performance of the base classifiers, and thus reduce the generalization error of the random forest.
The node splitting algorithms of decision trees mainly include ID3, C4.5 [23] and CART [24]. The ID3 algorithm selects the information gain as the splitting criterion: the "feature-value" combination with the maximum information gain is chosen for splitting. Its disadvantage is that the information gain criterion favors features with many possible values while ignoring their relevance to classification, so the classification result does not generalize. The C4.5 and CART algorithms use the "information gain ratio" and the "Gini coefficient", respectively, as splitting criteria. The ID3 algorithm, which uses the information gain as the node splitting standard, can only process discrete features, whereas C4.5 and CART, which use the information gain ratio and the Gini coefficient as indexes, can also handle numerical features. The difference between the two is that the information gain ratio computes the entropy difference before and after splitting (entropy being the class probability multiplied by the logarithm of the class probability), which favors smaller distributions with fewer feature values, while the Gini coefficient, obtained by subtracting the sum of the squared class probabilities from one, favors larger data distributions. Both are algorithms based on information theory and their node-splitting rationales are similar; therefore a combination of the two is established, and a random factor and balance factors are introduced to realize adaptive node splitting.
Given the current equalized speech data set D, the entropy of this data set is defined as
Ent(D) = -Σ_i pi·log2(pi),
where pi is the proportion of samples of the i-th class in D. When the current equalized speech data set D is divided into subsets D1,...,Dk, the corresponding reduction in entropy gives the information gain
Gain(D; D1,...,Dk) = Ent(D) - Σ_{j=1..k} (|Dj|/|D|)Ent(Dj).
The information gain rate normalizes the information gain by the number of values of the feature, i.e.
Gain_ratio(D; D1,...,Dk) = Gain(D; D1,...,Dk) / IV(D; D1,...,Dk),
IV(D; D1,...,Dk) = -Σ_{j=1..k} (|Dj|/|D|)log2(|Dj|/|D|).
The Gini coefficient of the current equalized speech data set D is then defined as
Gini(D; D1,...,Dk) = Σ_{j=1..k} (|Dj|/|D|)Gini(Dj),
where
Gini(Dj) = 1 - Σ_i p_{ij}², with p_{ij} the proportion of class-i samples in the subset Dj.
Considering the linear combination of the information gain rate and the Gini coefficient, the two-factor node splitting algorithm is as follows:
ψ(D;D1,...,Dk)=α[β1Gini(D;D1,...,Dk)-β2Gain_ratio(D;D1,...,Dk)]
where α is a randomness factor (0 ≤ α ≤ 1) controlling the randomness of node splitting: when α = 1 the generated decision tree is the same as the deterministic decision tree, and when α = 0 the generated decision tree is a completely random tree. βi (i = 1, 2) are balance factors of the node splitting indexes, with 0 ≤ βi ≤ 1; they may not both be 0 or both be 1 at the same time, and on the boundary only the combinations (1, 0) or (0, 1) are allowed. As shown in FIG. 5, a dual-factor random forest is constructed with this dual-factor node splitting algorithm: when a decision tree node is split, CART seeks the smallest Gini coefficient while the C4.5 algorithm seeks the largest information gain rate; if both indexes are to be optimal, ψ(D; D1,...,Dk) takes its minimum value, which is used as the optimal rule to split the node. After the random forest is generated, the out-of-bag error is estimated; if the out-of-bag error reaches the minimum, the random forest under the optimal factor condition is output, otherwise the dual factors are updated and the random forest is reconstructed.
Step S208: inputting the current balanced voice data set into a double-factor random forest model, and outputting a classification evaluation index and an out-of-bag error classification rate of the double-factor random forest model under a preset double-factor condition;
The out-of-bag error classification rate OOBMmis_rate is computed from the out-of-bag predictions of the dual-factor random forest model, where Nmis_maj is the number of misclassified majority class samples, Nmis_min_i is the number of misclassified samples of the i-th minority class, and minclass is the number of minority classes.
Step S209: judging whether the classification evaluation index converges; if it converges, outputting the current balanced voice data set; if it diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, initializing the out-of-bag error classification rate, Tgen, Mup, Mdown and Tdel, returning to step S202 until the classification evaluation index converges, and outputting the current balanced voice data set.
In the invention, oversampling and undersampling can equalize the data distribution as far as possible, but existing sampling algorithms do not pay enough attention to class overlap and noise, and the spatial distribution of the data is distorted after sampling. Therefore, a mixed sampling algorithm combined with the dual-factor random forest is proposed: new samples are synthesized for the minority classes according to the sample distribution rule, and redundant information is removed, without changing the spatial structure of the majority classes, according to the feedback of the dual-factor random forest. The data set is pre-equalized with a mixed sampling algorithm combining SMOTE and ENN, the pre-equalized data set is then evaluated with the dual-factor random forest, and the classification evaluation index and the error classification rate are calculated respectively. The mixed sampling rate is corrected according to the error classification rate; in the iterative mixed sampling process the sampling rate changes dynamically with the out-of-bag error classification rate of the random forest rather than with the imbalance degree of the data set. Convergence is judged with the classification evaluation index F1-macro as the iteration stop criterion: if it converges, that is, when F1-macro decreases twice in succession or remains unchanged, mixed sampling ends, the iteration stops, and the output data set is the optimal balanced data set conforming to the original data distribution; if the classification evaluation index diverges, the mixed sampling rate of the SMOTE-ENN mixed sampling is updated according to the out-of-bag error classification rate, the out-of-bag error classification rate, Tgen, Mup, Mdown and Tdel are initialized, and the extracted voice data set is equalized again until the classification evaluation index converges and the current balanced voice data set is output. The stopping test is sketched below.
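A minimal sketch of the F1-macro stopping test described above, with the function name and the list-based bookkeeping as illustrative assumptions:

```python
def f1_macro_converged(history, current):
    """Iteration-stop test: stop when F1-macro stays unchanged or falls twice in a row."""
    if history and current == history[-1]:
        return True                                    # unchanged between rounds
    if len(history) >= 2 and current < history[-1] < history[-2]:
        return True                                    # decreased in two consecutive rounds
    return False
```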
The specific flow steps of the OOBM-SMOTE-ENN combined double-factor random forest mixed sampling algorithm are as follows:
Input: data set D, majority class samples Smaj, minority class samples Smin, number of nearest neighbor samples k, initial oversampling rate N1, initial undersampling rate N2.
Output: the pre-equalized data set D'.
1. Initialize OOBMmis_rate and the number of samples Mup that oversampling needs to generate (Mup = Smin × N1);
2. Correct N1 and N2 according to the dual-factor random forest feedback, set Tgen = 0, traverse each minority class sample Smin_i, and store the minority class samples generated by the SMOTE algorithm in the space Kmin[];
3. Tgen = Tgen + count(Kmin); if Tgen < Mup, return to step 2, otherwise go to step 4;
4. Initialize the number of samples Mdown that undersampling needs to delete (Mdown = Smaj × N2);
5. Set Tdel = 0 and traverse each majority class sample Smaj_i, comparing Smaj_i with the labels of the samples in Kmin[]; if k or more of the nearest neighbor samples of a generated sample Tgen_i in Kmin[] belong to a class different from that of Tgen_i, delete the corresponding minority class generated sample Tgen_i from Kmin[]; meanwhile, the neighborhood samples are compared according to the ENN definition, and if k or more of the nearest neighbor samples of Smaj_i belong to a class different from that of Smaj_i, the sample Smaj_i is deleted;
6. Tdel = Tdel + (Tgen_i + Smaj_i); if Tdel < Mdown, return to step 5, otherwise output the once-equalized sample set D'.
The current balanced voice data set is divided into a training set and a test set, the random forest recognition model is trained with the feature parameters of the training-set voices, and the trained random forest model is used to predict and classify the feature parameters of the test set. The comparative experimental results of the OOBMSE algorithm and classical sampling algorithms are shown in Table 1 below.
Table 1  Comparative experimental results of the OOBMSE algorithm and classical sampling algorithms

Index                  Raw data   SMOTE    ADASYN   BSM      CNN      OOBMSE
Recognition rate/%     97.05      99.03    99.04    99.03    91.43    100.00
Recall/%               92.31      99.16    98.86    99.16    91.34    100.00
Kappa coefficient/%    89.89      98.03    98.02    98.03    82.82    100.00
F1 score/%             94.94      99.02    99.01    99.02    91.40    100.00
In the table, SMOTE stands for Synthetic Minority Oversampling Technique, an improved scheme based on the random oversampling algorithm: since random oversampling adds minority class samples by simple replication, the basic idea of the SMOTE algorithm is instead to analyze the minority class samples and artificially synthesize new samples from them to add to the data set.
ADASYN is an adaptive synthetic sampling method for imbalanced learning. It is based on the idea of adaptively generating minority class samples according to their distribution: minority class samples that are harder to learn generate more synthetic data than those that are easier to learn. The ADASYN method can reduce the learning bias brought by the original imbalanced data distribution and can adaptively shift the decision boundary toward the samples that are difficult to learn.
Borderline-SMOTE (BSM) is an improved oversampling algorithm based on SMOTE, which synthesizes new samples only from the minority class samples on the class boundary, thereby improving the class distribution of the samples.
Condensed Nearest Neighbour, CNN for short, is an undersampling technique used to find a subset of the sample set (referred to as the minimum consistent set) that incurs no loss in model performance.
As can be seen from the table, the OOBMSE mixed sampling algorithm provided by the invention is superior to the traditional SMOTE, ADASYN, BSM and CNN algorithms. The recognition rate of OOBMSE with the random forest classifier reaches 100%, and the other evaluation indexes also reach their optimal values, so the method outperforms the traditional methods. The equalization algorithm provided by the invention therefore improves the recognition rate and the reliability of the system.
The method provided by this embodiment is widely applicable in voice recognition and even in intelligent medical diagnosis. The mixed sampling algorithm combined with the dual-factor random forest provided by the invention is based on the dual-factor random forest and combines SMOTE and ENN to solve the problem of imbalanced data classification in voice recognition. In view of the shortcomings of conventional oversampling algorithms, during the hybrid sampling process the oversampling rate is changed dynamically according to the out-of-bag error classification rate of the dual-factor random forest rather than the imbalance rate of the data set, and at the same time the noise in the samples generated by oversampling is removed by ENN. The dual-factor random forest and the mixed sampling are combined through the out-of-bag error classification rate, the mixed sampling rate is corrected dynamically, the number of minority class samples is increased, and noise and redundant information in the samples are removed to balance the data.
Referring to fig. 6, fig. 6 is a block diagram of a voice sample equalization apparatus combining mixed sampling and random forest according to an embodiment of the present invention; the specific apparatus may include:
an acquisition module 100, which is used for acquiring an initial voice data set and performing feature extraction on the initial voice data set to obtain an extracted voice data feature set;
an analysis module 200, which is used for analyzing minority class samples of the voice data feature set by using oversampling SMOTE and generating new target minority class samples from them, analyzing the nearest neighbor samples of the target minority class samples and of the majority class samples in the voice data feature set by using undersampling ENN, and deleting target minority class samples and majority class samples according to their nearest neighbor samples to obtain a current balanced voice data set;
a construction module 300, which is used for calculating the information gain rate and the Gini coefficient of the current balanced voice data set and linearly combining them with dual factors to construct a dual-factor random forest model;
an input module 400, configured to input the current balanced voice data set into the two-factor random forest model, and output a classification evaluation index and an out-of-bag error classification rate of the current balanced voice data set under a preset two-factor condition;
a determining module 500, configured to determine whether the classification evaluation indicator converges, and if the classification evaluation indicator converges, output the current balanced voice data set; and if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, returning to execute the step S102 until the classification evaluation index converges, and outputting the current balanced voice data set.
The voice sample equalization apparatus combining mixed sampling and random forest of this embodiment is used to implement the foregoing voice sample equalization method combining mixed sampling and random forest, so the specific implementations in the apparatus can be found in the method embodiments above; for example, the modules 100, 200, 300, 400 and 500 are respectively used to implement steps S101, S102, S103, S104 and S105 of the method, and their specific implementations can therefore refer to the descriptions of the corresponding embodiments, which are not repeated here.
The embodiment of the invention also provides voice sample equalization equipment combining mixed sampling and random forest, which comprises: a memory for storing a computer program; and the processor is used for realizing the steps of the voice sample equalization method combining the mixed sampling and the random forest when the computer program is executed.
A specific embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the steps of the method for equalizing the voice samples of the combined mixed sampling and random forest.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. And obvious variations or modifications therefrom are within the scope of the invention.

Claims (10)

1. A voice sample equalization method combining mixed sampling and random forest is characterized by comprising the following steps:
s101: acquiring an initial voice data set, and performing feature extraction on the initial voice data set to obtain an extracted voice data feature set;
s102: analyzing minority class samples of the voice data feature set by using oversampling SMOTE and generating new target minority class samples from them, analyzing the nearest neighbor samples of the target minority class samples and of the majority class samples in the voice data feature set by using undersampling ENN, and deleting target minority class samples and majority class samples according to their nearest neighbor samples, so as to obtain a current balanced voice data set;
s103: calculating the information gain rate and the Gini coefficient of the current balanced voice data set, and linearly combining them with dual factors to construct a dual-factor random forest model;
s104: inputting the current balanced voice data set into the double-factor random forest model, and outputting a classification evaluation index and an out-of-bag error classification rate of the current balanced voice data set under a preset double-factor condition;
s105: judging whether the classification evaluation index is converged, and if the classification evaluation index is converged, outputting the current balanced voice data set; and if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, returning to execute the step S102 until the classification evaluation index converges, and outputting the current balanced voice data set.
2. The method of claim 1, wherein analyzing the minority class samples of the voice data feature set using oversampling SMOTE and generating new target minority class samples from the minority class samples, analyzing the nearest neighbor samples of the target minority class samples and of the majority class samples in the voice data feature set using undersampling ENN, and deleting target minority class samples and majority class samples according to their nearest neighbor samples to obtain the current balanced voice data set comprises:
s201: analyzing the minority sample S using the oversampled SMOTEminAnd according to the minority sample SminGenerating a sample TgenThe sample T isgenStore to a small numberClass sample space Kmin[]Performing the following steps; wherein the sample Tgen=count(Kmin);
S202: judging the sample TgenWhether less than the number M of samples that the oversampled SMOTE needs to generateupIf T isgen<MupReturning to execute the step S201, otherwise executing the step S203; wherein M isupSample S of minority classminX oversampling ratio N1
S203: analyzing the sample T with the undersampled ENNgenAnd a plurality of classes of samples S in the speech data feature setmajIf said sample T is a nearest neighbor samplegenThe nearest neighbor samples of (A) are k and k or more and the samples TgenIf the samples are of different types, deleting Kmin[]Of the corresponding sample TgenIf said plurality of samples SmajThe nearest neighbor samples of (A) are k and more than k and the plurality of types of samples SmajIf the samples with different categories are not the same, deleting the majority of samples Smaj(ii) a Wherein the undersampled ENN deleted samples Tdel=Tgen+Smaj
S204: determining the samples T deleted by the undersampled ENNdelWhether less than the number M of samples that the undersampled ENN needs to deletedownIf T isdel<MdownReturning to execute the step S203, otherwise, outputting the current balanced voice data set; wherein M isdownMajority class sample SmajX undersampling rate N2
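A minimal sketch of the deletion test in step S203, assuming the claim's rule that a sample is removed when k or more of its k nearest neighbors carry a different class label; the function name and parameters are illustrative.

```python
# Sketch of the ENN-style deletion test in S203; names and threshold are illustrative.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def enn_keep_mask(X, y, k=3):
    """Return a boolean mask of samples to keep; X is a 2-D feature array and
    y a 1-D integer label array. A sample is marked for deletion when k or more
    of its k nearest neighbours carry a different class label."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1: each point is its own neighbour
    _, idx = nn.kneighbors(X)
    neighbour_labels = y[idx[:, 1:]]                  # drop the self-neighbour column
    disagreements = (neighbour_labels != y[:, None]).sum(axis=1)
    return disagreements < k                          # keep samples with fewer than k disagreements
```

In the claimed flow this mask would be applied to the generated samples in K_min[] and to the majority class samples S_maj, and deletion stops once T_del reaches M_down (step S204).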
3. The method of claim 2, wherein analyzing the minority class samples S_min using the oversampling SMOTE and generating the samples T_gen from the minority class samples S_min comprises:
searching for the k nearest neighbor samples S_min_i among the minority class samples S_min;
assuming that the number of samples to be generated by the oversampling SMOTE is M_up, randomly selecting M_up samples from S_min_i, and denoting the M_up samples as S_min_1, S_min_2, ..., S_min_j;
performing a random interpolation operation on S_min_i and S_min_j to generate the samples T_gen = S_min_i + rand(0,1)·(S_min_j − S_min_i); wherein rand(0,1) denotes a random number in the interval (0,1), i = 1, 2, ..., k, and j = 1, 2, ..., M_up.
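A minimal numpy sketch of the interpolation in claim 3, assuming S_min is a 2-D array of minority-class feature vectors; the function name and parameters are illustrative.

```python
# Minimal sketch of the SMOTE interpolation T_gen = S_min_i + rand(0,1)*(S_min_j - S_min_i).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_generate(S_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating between each chosen
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(S_min)
    _, idx = nn.kneighbors(S_min)                    # idx[:, 0] is the sample itself
    generated = []
    for _ in range(n_new):
        i = rng.integers(len(S_min))                 # pick a minority sample S_min_i
        j = idx[i, rng.integers(1, k + 1)]           # pick one of its k neighbours S_min_j
        gap = rng.random()                           # rand(0, 1)
        generated.append(S_min[i] + gap * (S_min[j] - S_min[i]))
    return np.vstack(generated)
```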
4. The method of claim 2, wherein if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, returning to perform step S102 until the classification evaluation index converges, and outputting the current balanced voice data set comprises:
if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, initializing the out-of-bag error classification rate and T_gen, M_up, M_down and T_del, returning to execute step S102 until the classification evaluation index converges, and outputting the current balanced voice data set.
5. The method of claim 1, wherein calculating the information gain rate and the Gini coefficient of the current balanced voice data set, and linearly combining the information gain rate and the Gini coefficient of the current balanced voice data set with double factors to construct the double-factor random forest model comprises:
calculating the information gain rate and the Gini coefficient of the current balanced voice data set, linearly combining them with the double factors, and adaptively splitting the decision tree nodes of the double-factor random forest model;
constructing the double-factor random forest model according to the adaptive splitting of the decision tree nodes;
judging whether the out-of-bag error value of the double-factor random forest model reaches a preset out-of-bag error value; if so, outputting the double-factor random forest model under the preset double-factor condition; otherwise, updating the double factors used for the adaptive splitting of the decision tree nodes and reconstructing the double-factor random forest model.
6. The method of claim 5, wherein calculating the information gain rate and the Gini coefficient of the current balanced voice data set, linearly combining the information gain rate and the Gini coefficient of the current balanced voice data set with the double factors, and adaptively splitting the decision tree nodes of the double-factor random forest model comprises:
dividing the current balanced voice data set D into subsets D1, ..., Dk, and calculating the information gain of the current balanced voice data set
Gain(D; D1, ..., Dk) = Ent(D) − Σ_{i=1}^{k} (|Di|/|D|) · Ent(Di);
wherein the entropy of the current balanced voice data set D is Ent(D) = −Σ_{y∈Y} P(y|D) · log P(y|D);
normalizing the information gain of the current balanced voice data set by the number of values taken by the feature to obtain the information gain rate of the current balanced voice data set
Gain_ratio(D; D1, ..., Dk) = Gain(D; D1, ..., Dk) / IV(D1, ..., Dk),
IV(D1, ..., Dk) = −Σ_{i=1}^{k} (|Di|/|D|) · log(|Di|/|D|);
calculating the Gini coefficient of the current balanced voice data set
Gini(D; D1, ..., Dk) = Σ_{i=1}^{k} (|Di|/|D|) · I(Di),
wherein I(D) = 1 − Σ_{y∈Y} P(y|D)²;
linearly combining the information gain rate and the Gini coefficient of the current balanced voice data set by the double factors
ψ(D; D1, ..., Dk) = α[β1 · Gini(D; D1, ..., Dk) − β2 · Gain_ratio(D; D1, ..., Dk)],
and adaptively splitting the decision tree nodes of the double-factor random forest model according to ψ; wherein α is a random factor and βi is the balance factor of the node splitting index.
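A minimal sketch of the quantities in claim 6 and of the combined criterion ψ for a single candidate split, assuming non-negative integer class labels; the α and β values are inputs left to the caller and the helper names are illustrative.

```python
# Sketch of the double-factor split criterion psi from claim 6; alpha/beta are assumed inputs.
import numpy as np

def entropy(labels):
    p = np.bincount(labels) / len(labels)            # assumes non-negative integer labels
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gini_impurity(labels):
    p = np.bincount(labels) / len(labels)
    return 1.0 - np.sum(p ** 2)

def two_factor_criterion(labels, subsets, alpha, beta1, beta2):
    """psi(D; D1..Dk) = alpha * (beta1 * Gini - beta2 * Gain_ratio) for one split of D,
    where `labels` holds the class labels of D and `subsets` is the list of label
    arrays of the non-empty subsets D1, ..., Dk produced by the candidate split."""
    weights = np.array([len(s) for s in subsets]) / len(labels)
    gain = entropy(labels) - sum(w * entropy(s) for w, s in zip(weights, subsets))
    iv = -np.sum(weights * np.log2(weights))         # intrinsic value of the split
    gain_ratio = gain / iv if iv > 0 else 0.0
    gini = sum(w * gini_impurity(s) for w, s in zip(weights, subsets))
    return alpha * (beta1 * gini - beta2 * gain_ratio)
```

A decision tree node of the double-factor random forest would then be split by optimizing ψ over the candidate splits, with α acting as the random factor and β1, β2 as the balance factors.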
7. The method of claim 1, wherein if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, returning to perform step S102 until the classification evaluation index converges, and outputting the current balanced voice data set comprises:
if the classification evaluation index diverges, then according to
(formula FDA0003474249250000041)
updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN, returning to execute step S102 until the classification evaluation index converges, and outputting the current balanced voice data set;
wherein OOB_mis_rate is the out-of-bag error classification rate of the double-factor random forest model, N_mis_maj is the number of misclassified majority class samples, N_mis_min_i is the number of misclassified samples of the i-th minority class, and minclass is the number of minority classes.
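A minimal sketch of how the per-class out-of-bag error counts named in claim 7 could be gathered from a scikit-learn forest's oob_decision_function_; how N_mis_maj and N_mis_min_i then enter the update of the oversampling and undersampling rates follows the patent's own formula, which is not reproduced here.

```python
# Sketch: gathering the per-class out-of-bag misclassification counts of claim 7.
# The actual update of the oversampling/undersampling rates follows the patent's formula.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def oob_misclassification_counts(X, y, majority_class):
    rf = RandomForestClassifier(n_estimators=300, oob_score=True,
                                random_state=0).fit(X, y)
    # Rows never left out of bag can be NaN for very small forests; 300 trees makes this rare.
    oob_pred = rf.classes_[np.argmax(rf.oob_decision_function_, axis=1)]
    wrong = oob_pred != y
    n_mis_maj = int(np.sum(wrong & (y == majority_class)))
    n_mis_min = {c: int(np.sum(wrong & (y == c)))
                 for c in rf.classes_ if c != majority_class}
    oob_mis_rate = 1.0 - rf.oob_score_
    return oob_mis_rate, n_mis_maj, n_mis_min
```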
8. A voice sample equalization apparatus combining mixed sampling and random forest, comprising:
an acquisition module, which is used for acquiring an initial voice data set and performing feature extraction on the initial voice data set to obtain an extracted voice data feature set;
the analysis module is used for analyzing the minority class samples of the voice data feature set using oversampling SMOTE, generating new target minority class samples from the minority class samples, analyzing the nearest neighbor samples of the target minority class samples and of the majority class samples in the voice data feature set using undersampling ENN, and deleting target minority class samples and majority class samples according to their nearest neighbor samples, to obtain a current balanced voice data set;
the construction module is used for calculating the information gain rate and the Gini coefficient of the current balanced voice data set, and linearly combining the information gain rate and the Gini coefficient of the current balanced voice data set with double factors to construct a double-factor random forest model;
the input module is used for inputting the current balanced voice data set into the double-factor random forest model and outputting a classification evaluation index and an out-of-bag error classification rate of the current balanced voice data set under a preset double-factor condition;
the judging module is used for judging whether the classification evaluation index is converged or not, and if the classification evaluation index is converged, outputting the current balanced voice data set; and if the classification evaluation index diverges, updating the oversampling rate of the oversampling SMOTE and the undersampling rate of the undersampling ENN according to the out-of-bag error classification rate, returning to execute the step S102 until the classification evaluation index converges, and outputting the current balanced voice data set.
9. A voice sample equalization apparatus combining mixed sampling and random forest, comprising:
a memory for storing a computer program;
a processor for implementing the steps of a method of speech sample equalization combining mixed sampling and random forest as claimed in any one of claims 1 to 7 when executing said computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the voice sample equalization method combining mixed sampling and random forest according to any one of claims 1 to 7.
CN202210083571.0A 2022-01-17 2022-01-17 Voice sample equalization method combining mixed sampling and random forest Active CN114550697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210083571.0A CN114550697B (en) 2022-01-17 2022-01-17 Voice sample equalization method combining mixed sampling and random forest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210083571.0A CN114550697B (en) 2022-01-17 2022-01-17 Voice sample equalization method combining mixed sampling and random forest

Publications (2)

Publication Number Publication Date
CN114550697A true CN114550697A (en) 2022-05-27
CN114550697B CN114550697B (en) 2022-11-18

Family

ID=81671633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210083571.0A Active CN114550697B (en) 2022-01-17 2022-01-17 Voice sample equalization method combining mixed sampling and random forest

Country Status (1)

Country Link
CN (1) CN114550697B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273909A (en) * 2016-04-08 2017-10-20 上海市玻森数据科技有限公司 The sorting algorithm of high dimensional data
CN111202526A (en) * 2020-01-20 2020-05-29 华东医院 Method for simplifying and optimizing multi-dimensional elderly auditory function evaluation system
US20210287136A1 (en) * 2020-03-11 2021-09-16 Synchrony Bank Systems and methods for generating models for classifying imbalanced data
US20210097449A1 (en) * 2020-12-11 2021-04-01 Intel Corporation Memory-efficient system for decision tree machine learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
MOHD ADIL: "Solving the Problem of Class Imbalance in the Prediction of Hotel Cancelations: A Hybridized Machine Learning Approach", Processes *
X ZHANG, et al.: "Class-imbalanced voice pathology classification: Combining hybrid sampling with optimal two-factor random forests", Applied Acoustics *
ZHAOZHAO XU, et al.: "A hybrid sampling algorithm combining M-SMOTE and ENN based on Random forest for medical imbalanced data", Journal of Biomedical Informatics *
SHANG ZIWEI: "Research on ECG-assisted diagnosis and treatment applications based on SMOTE+ENN and random forest", China Master's Theses Full-text Database *
ZHAO PINHUI: "Research on pathological voice recognition combining multi-band nonlinear methods", Informatization Research *

Also Published As

Publication number Publication date
CN114550697B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
CN107564513B (en) Voice recognition method and device
US11462210B2 (en) Data collecting method and system
US20210224647A1 (en) Model training apparatus and method
JP2014026455A (en) Media data analysis device, method and program
Zhang et al. Noise robust speaker recognition based on adaptive frame weighting in GMM for i-vector extraction
CN108154186B (en) Pattern recognition method and device
JP6979203B2 (en) Learning method
CN110956277A (en) Interactive iterative modeling system and method
EP3956885A1 (en) Condition-invariant feature extraction network for speaker recognition
CN104077598A (en) Emotion recognition method based on speech fuzzy clustering
CN106384587B (en) A kind of audio recognition method and system
KR102406512B1 (en) Method and apparatus for voice recognition
Chen et al. SEC4SR: A security analysis platform for speaker recognition
Fan et al. Modeling voice pathology detection using imbalanced learning
CN114550697B (en) Voice sample equalization method combining mixed sampling and random forest
Lin et al. Domestic activities clustering from audio recordings using convolutional capsule autoencoder network
Saeidi et al. Particle swarm optimization for sorted adapted gaussian mixture models
JP2014215385A (en) Model estimation system, sound source separation system, model estimation method, sound source separation method, and program
Choi et al. Adversarial speaker-consistency learning using untranscribed speech data for zero-shot multi-speaker text-to-speech
CN115437960A (en) Regression test case sequencing method, device, equipment and storage medium
Farsi et al. Implementation and optimization of a speech recognition system based on hidden Markov model using genetic algorithm
CN111755012A (en) Robust speaker recognition method based on depth layer feature fusion
CN106373576A (en) Speaker confirmation method based on VQ and SVM algorithms, and system thereof
KR20140077774A (en) Apparatus and method for adapting language model based on document clustering
Toman et al. Content-based audio retrieval by using elitism GA-KNN approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant