CN116933166A - Cerebral apoplexy-oriented unbalanced data set classification method and system - Google Patents



Publication number
CN116933166A
Authority
CN
China
Prior art keywords
positive
sample
negative
membership function
samples
Legal status
Pending
Application number
CN202310944187.XA
Other languages
Chinese (zh)
Inventor
李凤莲
张雪英
魏鑫
回海生
李彦民
Current Assignee
Taiyuan University of Technology
Original Assignee
Taiyuan University of Technology
Application filed by Taiyuan University of Technology
Priority to CN202310944187.XA
Publication of CN116933166A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a classification method and system for cerebral apoplexy (stroke) unbalanced data sets, relating to the technical field of data processing. The method comprises the following steps: dividing a cerebral apoplexy unbalanced data set into a training sample set and a test sample set; determining positive/negative class sample adaptive adjustment radii from the distances between sample points, from which positive/negative class sample adaptive adjustment factors are determined and a difference matrix is constructed; counting, according to the difference matrix, the numbers of positive class and negative class samples within the effective range of each sample point to determine the positive/negative class information amounts the point contains, from which an information amount fuzzy membership function is constructed; determining positive/negative class fuzzy membership functions based on inter-sample distance; combining these into improved positive/negative class fuzzy membership functions, from which a fuzzy support vector machine classifier is constructed; and classifying the cerebral apoplexy unbalanced data set with the fuzzy support vector machine classifier. The invention effectively improves classification performance on cerebral apoplexy unbalanced data sets.

Description

Cerebral apoplexy-oriented unbalanced data set classification method and system
This application is a divisional application of the application for a cerebral apoplexy unbalanced data set-oriented classification method and system; the parent application number is 201911189087.0 and its filing date is 2019.11.28.
Technical Field
The invention relates to the technical field of data processing, in particular to a cerebral apoplexy-oriented unbalanced data set classification method and system.
Background
Cerebral apoplexy, also called "stroke" or "cerebrovascular accident", is an acute cerebrovascular disease in which brain tissue is damaged because a cerebral blood vessel suddenly ruptures or a vessel blockage prevents blood from flowing into the brain. With the development of medical informatization, cerebral apoplexy data increasingly show the characteristics of unbalanced data sets: stroke patients are generally far fewer than non-stroke patients. Classification models are generally biased toward the class with more samples, so they classify non-stroke patients (the majority class) well but perform poorly on stroke patients (the minority class), who may even go unrecognized. Consequently, when existing classification models are applied to cerebral apoplexy unbalanced data sets, their classification performance on stroke patient (minority class) data is poor.
Disclosure of Invention
The invention aims to provide a classification method and system for cerebral apoplexy unbalanced data sets, so as to solve the problem that existing classification models perform poorly on stroke patients, i.e. the minority class, when classifying cerebral apoplexy unbalanced data sets.
In order to achieve the above object, the present invention provides the following solutions:
A cerebral apoplexy unbalanced data set-oriented classification method comprises the following steps:
acquiring a cerebral apoplexy unbalanced data set, whose data are labeled for binary classification: the large number of normal individuals are regarded as negative samples and the small number of sick individuals are regarded as positive samples;
randomly dividing the cerebral apoplexy unbalanced data set into a training sample set and a test sample set in a 7:3 ratio, without changing the imbalance rate of either set; the sample points in the training sample set are expressed as $x_i = (x_i^1, x_i^2, \dots, x_i^d)$, $x_i \in \mathbb{R}^d$, where $x_i$ is the feature vector of the $i$-th sample point in the cerebral apoplexy unbalanced data set, $d$ is the dimension of the feature vector, $x_i^d$ is the $d$-th component of the feature vector, and $\mathbb{R}^d$ indicates that the training samples lie in $d$-dimensional real space; $y_i \in \{-1, +1\}$ denotes the two class labels, where $y_i = -1$ denotes a negative sample, i.e. a non-stroke patient, and $y_i = +1$ denotes a positive sample, i.e. a stroke patient; $u(x_i)$ is the fuzzy membership function, giving the membership of the $i$-th sample, i.e. the degree to which $x_i$ belongs to class $y_i$, with $0 < u(x_i) \le 1$;
Calculating the distance between each sample point in the training sample set;
Determining a positive/negative sample self-adaptive adjusting radius according to the distance between the sample points; determining positive/negative sample adaptive adjustment factors according to the positive/negative sample adaptive adjustment radius; constructing a difference matrix according to the positive/negative sample self-adaptive adjustment factors;
counting, according to the difference matrix, the number of positive class samples and the number of negative class samples within the effective range of each sample point, where the positive samples are the stroke patient data and the negative samples are the non-stroke patient data in the cerebral apoplexy unbalanced data set;
determining the positive/negative type information quantity contained in the sample points according to the number of the positive type samples and the number of the negative type samples;
constructing an information amount fuzzy membership function $u_1(x_i)$ according to the positive/negative class information amounts contained in the sample points;
Determining a positive/negative type fuzzy membership function based on the distance between samples according to the distance between each sample point;
determining an improved positive/negative type fuzzy membership function according to the information quantity fuzzy membership function and the positive/negative type fuzzy membership function based on the distance between samples;
constructing a fuzzy support vector machine classifier according to the improved positive/negative class fuzzy membership function;
classifying the test sample set of the cerebral apoplexy unbalanced data set with the fuzzy support vector machine classifier: the cerebral apoplexy unbalanced data to be classified are input into the constructed fuzzy support vector machine classifier, which outputs the class of each test sample, i.e. stroke patient or non-stroke patient.
Optionally, determining the positive/negative sample adaptive adjustment radius according to the distance between the sample points specifically includes:
the positive class sample adaptive adjustment radius is defined as:
$AR^+ = \max(d_{ij})/Q^+$
the negative class sample adaptive adjustment radius is defined as:
$AR^- = \max(d_{ij})/Q^-$
where $\max(d_{ij})$ is the maximum of the distances $d_{ij}$ between the sample points; the positive class sample adaptive factor is $Q^+ = Q$ and the negative class sample adaptive factor is $Q^- = Q/r$; $Q$ is an adaptive factor; and $r$ is the imbalance rate of the unbalanced data set, $r$ = number of negative samples / number of positive samples.
Optionally, determining the positive/negative class sample adaptive adjustment factor according to the positive/negative class sample adaptive adjustment radius specifically includes:
the positive sample adaptive adjustment factors are:
the negative sample adaptive adjustment factors are:
Optionally, the constructed difference matrix is:
where $t_{ij}$ is the positive/negative class sample adaptive adjustment factor and $n$ is the number of sample points in the training sample set; the number $m^+$ of positive class samples and the number $m^-$ of negative class samples within the effective range of each sample point are counted according to the difference matrix.
Optionally, determining a positive/negative fuzzy membership function based on the distance between samples according to the distance between the sample points specifically includes:
according to the distance $d_{ij}$ between the $i$-th sample point $x_i$ and the $j$-th sample point $x_j$ in the training sample set, determining the centripetal degree of the positive class;
according to the distance $d_{ij}$ between the $i$-th sample point $x_i$ and the $j$-th sample point $x_j$ in the training sample set, determining the centripetal degree of the negative class, where $m^+$ and $m^-$ denote the number of positive class samples and the number of negative class samples, respectively;
according to the centripetal degree of the positive class, determining the positive class fuzzy membership function based on inter-sample distance, where $\delta$ is a positive parameter value and the formula uses the maximum value of the positive class centripetal degree;
according to the centripetal degree of the negative class, determining the negative class fuzzy membership function based on inter-sample distance, where the formula uses the maximum value of the negative class centripetal degree.
Optionally, determining an improved positive/negative type fuzzy membership function according to the information quantity fuzzy membership function and the positive/negative type fuzzy membership function based on the distance between samples specifically includes:
according to the information amount fuzzy membership function $u_1(x_i)$ and the positive class fuzzy membership function based on inter-sample distance, determining the improved positive class fuzzy membership function $u^+(x_i)$;
according to the information amount fuzzy membership function $u_1(x_i)$ and the negative class fuzzy membership function based on inter-sample distance, determining the improved negative class fuzzy membership function $u^-(x_i)$, defined for $m^+ \neq 0$.
Optionally, the function formula of the fuzzy support vector machine classifier is:
where $w$ is the normal vector of the hyperplane; $C^+$ and $C^-$ are the penalty factors of the positive class and negative class samples, respectively, and are constants; $n$ is the number of sample points; $u^+(x_i)$ is the improved positive class fuzzy membership function and $u^-(x_i)$ is the improved negative class fuzzy membership function; $\xi_i$ is a slack variable; $y_i$ is the class label; $\phi(x_i)$ is the nonlinear mapping induced by the kernel function; and $b$ is the bias;
the optimal classification hyperplane is obtained by solving this optimization problem of the fuzzy support vector machine classifier, which then gives the class label of each sample point $x_i$.
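The objective itself is missing from this text. Assuming the standard class-weighted fuzzy SVM form, which uses exactly the symbols listed above, the problem would read as follows; this is an assumed reconstruction, not a confirmed copy of the patent's formula.

```latex
\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w\rVert^{2}
  + C^{+}\sum_{i:\,y_i=+1} u^{+}(x_i)\,\xi_i
  + C^{-}\sum_{i:\,y_i=-1} u^{-}(x_i)\,\xi_i
\quad\text{s.t.}\quad
y_i\bigl(w\cdot\phi(x_i)+b\bigr)\ge 1-\xi_i,\;\; \xi_i\ge 0,\;\; i=1,\dots,n
```

In this form each sample's membership scales its own penalty, so a noise point whose membership is close to $\delta$ barely influences the hyperplane, and a new point $x$ would be labeled $\operatorname{sign}(w\cdot\phi(x)+b)$.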
In order to achieve the above purpose, the present invention also provides the following technical solutions:
a stroke-oriented unbalanced data set classification system, the system comprising:
the unbalanced data set acquisition module is used for acquiring a cerebral apoplexy unbalanced data set;
the unbalanced data set dividing module is used for randomly dividing the cerebral apoplexy unbalanced data set into a training sample set and a test sample set in a 7:3 ratio, without changing the imbalance rate of either set;
the sample distance calculating module is used for calculating the distance between each sample point in the training sample set;
the difference matrix construction module is used for constructing a difference matrix according to the distance between each sample point in the training sample set;
the sample number statistics module is used for counting, according to the difference matrix, the number of positive class samples and the number of negative class samples within the effective range of each sample point, where the positive samples are the stroke patient data and the negative samples are the non-stroke patient data in the cerebral apoplexy unbalanced data set;
the positive and negative type information amount calculation module is used for determining positive/negative type information amount contained in the sample points according to the number of the positive type samples and the number of the negative type samples;
the information quantity fuzzy membership function construction module is used for constructing an information quantity fuzzy membership function according to the positive/negative type information quantity contained in the sample points;
the positive and negative class fuzzy membership function determining module is used for determining a positive/negative class fuzzy membership function based on the distance between samples according to the distance between the sample points;
The improved positive and negative fuzzy membership function construction module is used for determining an improved positive/negative fuzzy membership function according to the information quantity fuzzy membership function and the positive/negative fuzzy membership function based on the distance between samples;
the fuzzy support vector machine classifier construction module is used for constructing a fuzzy support vector machine classifier according to the improved positive/negative class fuzzy membership function;
and the unbalanced data classification module is used for classifying the test sample set in the cerebral apoplexy unbalanced data set by adopting the fuzzy support vector machine classifier.
Optionally, the inter-sample distance calculating module specifically includes:
an inter-sample distance calculation unit for computing the distance $d_{ij}$ between the $i$-th sample point $x_i$ and the $j$-th sample point $x_j$ in the training sample set using the formula $d_{ij} = \lVert x_i - x_j \rVert$.
Optionally, the difference matrix construction module specifically includes:
an adaptive adjustment radius determining unit for determining the positive/negative class sample adaptive adjustment radii according to the distances $d_{ij}$ between the sample points;
the self-adaptive adjustment factor determining unit is used for determining a positive/negative type sample self-adaptive adjustment factor according to the positive/negative type sample self-adaptive adjustment radius;
And the difference matrix construction unit is used for constructing a difference matrix according to the positive/negative sample self-adaptive adjustment factors.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The invention provides a classification method and system for cerebral apoplexy unbalanced data sets. A cerebral apoplexy unbalanced data set is obtained in which the large number of normal individuals are regarded as negative samples and the small number of sick individuals as positive samples. A difference matrix for the positive and negative class samples is constructed using adaptive factors, which fully accounts for the influence of the imbalance of the stroke data set on the classification result, so that the improved fuzzy membership function is better suited to classifying cerebral apoplexy unbalanced data sets, i.e. identifying stroke and non-stroke patients within a large amount of patient data. When designing the fuzzy membership function, the uncertainty of each sample point is first measured with information entropy according to the numerical relationship between samples of different classes; second, the distances between samples of the same class are taken into account; the resulting improved fuzzy membership function is then applied in a fuzzy support vector machine. The core of the constructed fuzzy support vector machine classifier is this improved membership function, which addresses the low classification accuracy of the minority class and thereby improves the classifier's performance on cerebral apoplexy unbalanced data sets.
The invention fully considers the numerical relationship between samples of different classes and the distances between samples of the same class, calculates fuzzy membership from both, and constructs a fuzzy support vector machine. When data in a cerebral apoplexy unbalanced data set need to be classified, the data set is input directly into the trained fuzzy support vector machine to obtain the class of each record, i.e. stroke patient or non-stroke patient, which solves the classification problem for cerebral apoplexy unbalanced data sets and improves classification accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for classifying cerebral apoplexy-oriented unbalanced data sets;
FIG. 2 is a schematic diagram of a classification method for cerebral apoplexy unbalanced data sets;
Fig. 3 is a block diagram of a classification system for cerebral apoplexy unbalanced data set provided by the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a classification method and system for cerebral apoplexy unbalanced data sets. It mainly addresses the shortcomings of the fuzzy support vector machine when classifying such data sets, such as inaccurately assigned fuzzy membership values and poor classification performance. It offers a reference for improving the fuzzy membership function and applies the improved function to the fuzzy support vector machine, effectively improving its classification performance on cerebral apoplexy unbalanced data sets.
In order that the above objects, features and advantages of the present invention may be more readily understood, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flow chart of a classification method for cerebral apoplexy unbalanced data set provided by the invention. Fig. 2 is a schematic diagram of a classification method for cerebral apoplexy unbalanced data set provided by the invention. Referring to fig. 1 and 2, the method for classifying cerebral apoplexy-oriented unbalanced data sets provided by the invention specifically includes:
Step 101: a cerebral apoplexy unbalanced data set is acquired.
An unbalanced data set is a data set in which samples of one class far outnumber those of the other classes. Taking binary classification as an example, the class with fewer samples is called the minority class or positive class, and the class with more samples is called the majority class or negative class. Unbalanced data sets are characterized in two respects: the numbers of samples in different classes differ, and the distributions of samples of different classes are uneven.
The data in the cerebral apoplexy unbalanced data set obtained by the invention are labeled for binary classification: the large number of normal individuals are regarded as negative samples and the small number of sick individuals are regarded as positive samples.
Step 102: the cerebral apoplexy unbalanced data set is divided into a training sample set and a testing sample set randomly.
The samples in the cerebral apoplexy unbalanced data set are randomly divided in a 7:3 ratio into a training sample set and a test sample set. The ratio of positive to negative samples in each set is kept the same as in the original data set, i.e. the split does not change the imbalance rate.
The sample points in the training sample set are represented as $x_i = (x_i^1, x_i^2, \dots, x_i^d)$, $x_i \in \mathbb{R}^d$, where $x_i$ is the feature vector of the $i$-th sample point in the cerebral apoplexy unbalanced data set, $d$ is the dimension of the feature vector, $x_i^d$ is the $d$-th component of the feature vector, and $\mathbb{R}^d$ indicates that the training samples lie in $d$-dimensional real space.
Let $y_i \in \{-1, +1\}$ denote the two class labels: $y_i = -1$ denotes a negative sample, i.e. a non-stroke patient, and $y_i = +1$ denotes a positive sample, i.e. a stroke patient. $u(x_i)$ is the fuzzy membership function, giving the membership of the $i$-th sample, i.e. the degree to which $x_i$ belongs to class $y_i$, with $0 < u(x_i) \le 1$; the larger the value, the more strongly $x_i$ belongs to class $y_i$.
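The stratified 7:3 split described above can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: the function name `stratified_split`, the fixed `seed`, and the per-class rounding are assumptions. Each class is shuffled and split separately, which is what keeps the imbalance rate identical in both subsets.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class separately so the
    negative/positive ratio (imbalance rate) is preserved in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)                      # random order within the class
        n_train = round(train_frac * len(idx))
        train_idx += idx[:n_train]
        test_idx += idx[n_train:]
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 90 non-stroke (-1) and 10 stroke (+1) records, so r = 9.
labels = [-1] * 90 + [+1] * 10
train, test = stratified_split(labels)
print(len(train), len(test))  # 70 30
```

With these toy labels the training set receives 63 negatives and 7 positives and the test set 27 and 3, so both keep r = 9.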
Step 103: the distances between the sample points in the training sample set are calculated.
The distances between all sample points in the training sample set are calculated to measure the differences between the feature vectors, using the following formula:
$d_{ij} = \lVert x_i - x_j \rVert$ (1)
where $x_i$ and $x_j$ are the $i$-th and $j$-th sample points in the training sample set and $d_{ij}$ is the distance between them. The smaller $d_{ij}$ is, the smaller the difference between $x_i$ and $x_j$, and the greater the probability that they belong to the same class.
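Formula (1) can be evaluated for all pairs as below. This sketch assumes that $\lVert \cdot \rVert$ is the Euclidean norm, which the text does not state explicitly; the helper name `distance_matrix` is hypothetical.

```python
import math


def distance_matrix(X):
    """Return the n x n matrix d with d[i][j] = ||x_i - x_j|| (Euclidean)."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]


X = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
d = distance_matrix(X)
print(d[0][1], d[0][2])  # 5.0 10.0
```

The matrix is symmetric with a zero diagonal, so only the upper triangle strictly needs computing; the full form is kept here for readability.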
Step 104: a difference matrix is constructed according to the distances between the sample points in the training sample set.
Constructing a difference matrix according to the distance between each sample point in the training sample set, specifically including:
1) Positive/negative class sample adaptive adjustment radii are determined according to the distances $d_{ij}$ between the sample points.
Let $Q$ be an adaptive factor; it is a constant that can be adjusted according to the size of the sample set. The invention takes $Q = 12$, giving the positive class sample adaptive factor $Q^+ = Q$ and the negative class sample adaptive factor $Q^- = Q/r$, where $r$ is the imbalance rate of the unbalanced data set; for the cerebral apoplexy unbalanced data set it is obtained as $r$ = number of negative samples / number of positive samples.
In the invention, the positive class sample adaptive adjustment radius is defined as:
$AR^+ = \max(d_{ij})/Q^+$ (2)
The negative class sample adaptive adjustment radius is defined as:
$AR^- = \max(d_{ij})/Q^-$ (3)
where $\max(d_{ij})$ is the maximum of the distances $d_{ij}$ between the sample points.
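Formulas (2) and (3) reduce to a few lines of code. The sketch below assumes the reading $Q^+ = Q$ and $Q^- = Q/r$ of the adaptive factors and labels coded as +1/-1; the function name `adaptive_radii` is hypothetical.

```python
def adaptive_radii(d, labels, Q=12.0):
    """Adaptive adjustment radii of formulas (2) and (3).

    Assumes Q+ = Q and Q- = Q / r, where r = (#negative) / (#positive)
    is the imbalance rate; labels use +1 (positive) and -1 (negative).
    """
    n_pos = sum(1 for y in labels if y == +1)
    n_neg = sum(1 for y in labels if y == -1)
    r = n_neg / n_pos                       # imbalance rate
    d_max = max(max(row) for row in d)      # max(d_ij) over all pairs
    q_pos, q_neg = Q, Q / r                 # Q+ and Q-
    return d_max / q_pos, d_max / q_neg     # (AR+, AR-)


d = [[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]]
ar_pos, ar_neg = adaptive_radii(d, [-1, -1, +1])
print(round(ar_pos, 4), round(ar_neg, 4))  # 0.8333 1.6667
```

Under this reading $AR^-/AR^+ = Q^+/Q^- = r$, so the negative class radius grows in proportion to the imbalance rate.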
2) Positive/negative class sample adaptive adjustment factors are determined according to the positive/negative class sample adaptive adjustment radii.
The positive/negative class sample adaptive adjustment factors are then defined from the positive class sample adaptive adjustment radius $AR^+$ and the negative class sample adaptive adjustment radius $AR^-$.
The positive class sample adaptive adjustment factor is:
The negative class sample adaptive adjustment factor is:
3) A difference matrix R is constructed according to the positive/negative class sample adaptive adjustment factors $t_{ij}$.
Let $T = \{t_{ij}\}$ be the adaptive matrix based on the imbalance rate, constructed as follows:
In addition, from the obtained $d_{ij}$, the difference matrix R can be further obtained as:
where $n$ is the number of sample points in the training sample set, $t_{ij}$ is the corresponding positive/negative class sample adaptive adjustment factor, and $d_{ij}$ is the inter-sample difference.
Step 105: the numbers of positive class samples and negative class samples within the effective range of each sample point are counted according to the difference matrix.
For each sample point $x_i$, the number $m^+$ of positive class samples and the number $m^-$ of negative class samples within its effective range are counted, where the effective range of $x_i$ is determined by the $i$-th row of the difference matrix R.
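Because the formulas for $t_{ij}$ and R are not reproduced in this text, the counting step can only be illustrated under an explicit stand-in rule. The sketch below hypothetically treats $x_j$ ($j \neq i$) as inside the effective range of $x_i$ when $d_{ij}$ does not exceed the adaptive radius of $x_i$'s own class; this rule and the name `count_neighbours` are assumptions, not the patent's definition.

```python
def count_neighbours(d, labels, ar_pos, ar_neg):
    """Hypothetical Step 105 stand-in: for each x_i return (m_pos, m_neg).

    ASSUMPTION: x_j (j != i) counts as inside x_i's effective range when
    d[i][j] <= the adaptive radius of x_i's own class (ar_pos for +1,
    ar_neg for -1). The patent derives the range from row i of the
    difference matrix R, whose formula is not given in this text.
    """
    counts = []
    for i, y_i in enumerate(labels):
        radius = ar_pos if y_i == +1 else ar_neg
        m_pos = m_neg = 0
        for j, y_j in enumerate(labels):
            if j != i and d[i][j] <= radius:
                if y_j == +1:
                    m_pos += 1
                else:
                    m_neg += 1
        counts.append((m_pos, m_neg))
    return counts


d = [[0.0, 1.0, 4.0], [1.0, 0.0, 3.0], [4.0, 3.0, 0.0]]
print(count_neighbours(d, [-1, -1, +1], ar_pos=3.5, ar_neg=2.0))
# [(0, 1), (0, 1), (0, 1)]
```

In this toy case the positive point $x_2$ sees only a negative neighbour, so the case analysis of Step 109 would treat it as a noise point.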
Step 106: the positive/negative class information amounts contained in each sample point are determined according to the numbers of positive class and negative class samples.
The positive/negative class information amounts are the positive class information amount and the negative class information amount contained in the sample point. Let the probability that sample point $x_i$ belongs to the positive class be $p^+ = m^+/k$ and the probability that it belongs to the negative class be $p^- = m^-/k$, where $k = m^+ + m^-$, $m^+$ is the number of positive class samples within the effective range of the $i$-th sample point $x_i$, and $m^-$ is the number of negative class samples within that range. The positive/negative class information amounts of $x_i$ are then:
$H^+(x_i) = -p^+ \ln p^+$ (8)
$H^-(x_i) = -p^- \ln p^-$ (9)
where $H^+(x_i)$ is the positive class information amount contained in the $i$-th sample point $x_i$ of the training sample set, $H^-(x_i)$ is its negative class information amount, $p^+$ is the probability that $x_i$ belongs to the positive class, and $p^-$ is the probability that $x_i$ belongs to the negative class.
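Formulas (8) and (9) can be computed directly from the neighbour counts. This minimal sketch takes $p^{\pm} = m^{\pm}/k$ with $k = m^+ + m^-$, applies the usual convention $0 \ln 0 = 0$ (an assumption, since the text does not address empty classes), and rejects $k = 0$; the name `class_information` is hypothetical.

```python
import math


def class_information(m_pos, m_neg):
    """Positive/negative class information amounts of formulas (8) and (9).

    p+ = m+/k and p- = m-/k with k = m+ + m- (k must be positive);
    the convention 0 * ln 0 = 0 is applied.
    """
    k = m_pos + m_neg
    if k == 0:
        raise ValueError("no samples in the effective range")

    def term(p):
        # -p ln p written as p ln(1/p); returns 0.0 for p == 0
        return p * math.log(1.0 / p) if p > 0 else 0.0

    return term(m_pos / k), term(m_neg / k)


print(class_information(3, 0))  # (0.0, 0.0)
print([round(h, 4) for h in class_information(5, 5)])  # [0.3466, 0.3466]
```

A point whose neighbours all share one class carries no uncertainty, while an even mix gives each term its maximum of $\tfrac{1}{2}\ln 2$.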
Step 107: an information amount fuzzy membership function is constructed according to the positive/negative class information amounts contained in the sample points.
From the positive class information amount $H^+(x_i)$ and the negative class information amount $H^-(x_i)$ of the $i$-th sample point $x_i$, the information amount fuzzy membership function $u_1(x_i)$ is constructed:
$u_1(x_i) = 1 - (H^+(x_i) + H^-(x_i))$ (10)
where $0 < u_1(x_i) \le 1$.
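The stated range of $u_1(x_i)$ can be checked directly from formulas (8) to (10). Writing $p = p^+$, so that $p^- = 1 - p$, the sum of the two information amounts is the binary entropy, which peaks at $p = \tfrac{1}{2}$:

```latex
H^+(x_i) + H^-(x_i) = -p\ln p - (1-p)\ln(1-p) \le \ln 2,
\qquad\text{hence}\qquad
1 - \ln 2 \approx 0.307 \le u_1(x_i) \le 1 .
```

The upper value $u_1(x_i) = 1$ is reached when all neighbours belong to one class ($p \in \{0, 1\}$, using $0 \ln 0 = 0$), and the lower value when the two classes are evenly mixed around $x_i$, which is the most uncertain case.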
Step 108: and determining a positive/negative fuzzy membership function based on the distance between samples according to the distance between the sample points.
Adjusting intra-radius target sample x based on adaptation i The distance between the sample and the sample of the same class obtains centripetal degree of positive class and negative class And->
Centripetal degree of positive class:
centripetal degree of negative class:
wherein d ij Representing the difference between the target sample and the sample of the same kind, m + And m Representing the number of positive class samples and the number of negative class samples, respectively.
The positive/negative class fuzzy membership functions based on inter-sample distance comprise a positive-class function and a negative-class function. The positive-class fuzzy membership function based on inter-sample distance is obtained from equation (11):
The negative-class fuzzy membership function based on inter-sample distance is obtained from equation (12):
where the two functions above are the positive-class and negative-class fuzzy membership functions based on inter-sample distance, respectively; δ is a very small positive parameter; and the maximum positive-class centripetal degree and the maximum negative-class centripetal degree also appear in the respective formulas.
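The centripetal-degree and distance-based membership formulas are not legible in this copy, so the following Python sketch is only a stand-in under loudly stated assumptions: the centripetal degree is taken as the mean distance from $x_i$ to the same-class samples inside its adaptive radius, and the distance-based membership is modelled on the classic distance-to-centre form $1 - D/(D_{max} + \delta)$, which uses the two ingredients the text names (δ and the class maximum of the centripetal degree). Neither form should be read as the patent's actual formula, and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def centripetal_degree(same_class_dists: Sequence[float]) -> float:
    """HYPOTHETICAL stand-in for the centripetal-degree formulas.

    Mean distance from x_i to the same-class samples inside its adaptive
    radius; smaller values mean x_i sits more tightly inside its class.
    """
    if not same_class_dists:
        return math.inf  # no same-class neighbours: treated as noise in step 109
    return sum(same_class_dists) / len(same_class_dists)


def distance_membership(degrees: Sequence[float], delta: float = 1e-6) -> List[float]:
    """HYPOTHETICAL stand-in for the distance-based membership, applied to ONE class.

    u = 1 - D / (D_max + delta): D_max is the largest finite centripetal
    degree in the class, and delta keeps every value strictly positive.
    """
    finite = [d for d in degrees if math.isfinite(d)]
    d_max = max(finite) if finite else 0.0
    return [1.0 - d / (d_max + delta) if math.isfinite(d) else delta for d in degrees]
```

`distance_membership` would be called once with the positive-class degrees and once with the negative-class degrees, so that each class is normalised by its own maximum, mirroring the separate positive and negative maxima named in the text.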
The invention measures how tightly samples are clustered through the within-class centripetal degree and builds on it a fuzzy support vector machine based on within-class centripetal degree. This overcomes shortcomings of the traditional fuzzy support vector machine: the centripetal degree can distinguish samples in heavily mixed regions, so valid samples can be told apart from noise and outliers, which reduces the influence of noise and outliers on the construction of the optimal classification surface.
Step 109: Determine improved positive/negative class fuzzy membership functions from the information-quantity fuzzy membership function and the distance-based positive/negative class fuzzy membership functions.
To compute the fuzzy membership, the distances between all sample points are first determined, a difference matrix is constructed using the adaptive radius, and the membership is then determined from the numbers of positive and negative samples indicated by the difference matrix. If sample point $x_i$ belongs to the positive class and has only negative samples (no positive samples) around it, it is regarded as a noise point and its membership is set to a minimum value δ; likewise, if $x_i$ belongs to the negative class and has only positive samples around it, it is also regarded as a noise point and its membership is set to δ. If $x_i$ belongs to the positive class and has only positive samples (no negative samples) around it, it is regarded as a valid point and its membership is set to 1; likewise, if $x_i$ belongs to the negative class and has only negative samples around it, it is also a valid point with membership 1. When both positive and negative samples lie around $x_i$, both the numbers of surrounding positive and negative samples and the inter-sample distances are taken into account: information entropy measures the relationship between the numbers of surrounding samples, and the distance-based membership functions measure the positive-class and negative-class membership.
The improved positive/negative class fuzzy membership functions comprise an improved positive-class function and an improved negative-class function, obtained from formulas (13), (14) and (10). From the information-quantity fuzzy membership function $u_1(x_i)$ and the distance-based positive-class fuzzy membership function, the improved positive-class fuzzy membership function $u^+(x_i)$ is determined as:
From the information-quantity fuzzy membership function $u_1(x_i)$ and the distance-based negative-class fuzzy membership function, the improved negative-class fuzzy membership function $u^-(x_i)$ is determined as:
where $0 < u^+(x_i) \le 1$ and $0 < u^-(x_i) \le 1$ are the improved positive-class and negative-class fuzzy membership functions. They give the membership of the ith sample, i.e., the degree of confidence that sample $x_i$ belongs to class $y_i$. δ is a small value that can be set according to the practical situation.
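The case analysis of step 109 can be sketched in Python as below (the function name is hypothetical). The noise branch (δ) and the valid-point branch (1) follow the text directly; formulas (15) and (16) for the mixed case are not legible here, so the product $u_1 \cdot u_{dist}$ used below is a purely hypothetical placeholder, as is the treatment of an empty neighbourhood as noise.

```python
def improved_membership(label: int, m_pos: int, m_neg: int,
                        u1: float, u_dist: float, delta: float = 1e-6) -> float:
    """Improved membership u+(x_i) (label=+1) or u-(x_i) (label=-1).

    m_pos / m_neg: positive / negative samples in the effective range of x_i;
    u1: information-quantity membership, equation (10);
    u_dist: distance-based membership for x_i's own class.
    """
    same, other = (m_pos, m_neg) if label == 1 else (m_neg, m_pos)
    if same == 0:
        # Only opposite-class samples around x_i (or none at all): noise point.
        return delta
    if other == 0:
        # Only same-class samples around x_i: valid point.
        return 1.0
    # Mixed neighbourhood: HYPOTHETICAL combination; clamping keeps u in (0, 1].
    return min(1.0, max(delta, u1 * u_dist))
```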
Step 110: Construct a fuzzy support vector machine classifier from the improved positive/negative class fuzzy membership functions.
The fuzzy support vector machine (Fuzzy Support Vector Machine, FSVM) extends the support vector machine by assigning each training sample a membership degree, so that different training samples have different memberships. In the objective function, samples therefore weigh differently in the computation of the optimal solution and contribute differently to determining the optimal hyperplane. Noise points and isolated points receive small memberships, which reduces their influence on the optimal hyperplane. The design of the membership function directly affects the classification performance of the FSVM, and different design methods strongly affect both the difficulty of implementing the algorithm and the final classification result.
The invention constructs a fuzzy support vector machine classifier by utilizing the improved fuzzy membership function, and classifies the test sample by adopting the fuzzy support vector machine classifier.
The general form of the fuzzy support vector machine classifier constructed by the present invention can be expressed as:
where w is the normal vector of the hyperplane; $C^+$ and $C^-$ are constant penalty factors for positive and negative samples, respectively; n is the number of sample points; y = +1 is the positive sample label, i.e., the label of a stroke patient, and y = -1 is the negative sample label, i.e., the label of a non-stroke patient; the positive-class membership term is the improved positive-class fuzzy membership function $u^+(x_i)$ and the negative-class membership term is the improved negative-class fuzzy membership function $u^-(x_i)$; $\xi_i$ is a slack variable; $y_i \in \{-1, +1\}$ denotes the two class labels; $\phi(x_i)$ is the kernel-induced feature mapping; and b is the offset.
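Formula (17) itself is not legible in this copy. For orientation, the standard class-weighted fuzzy SVM matching the symbols defined above ($w$, $b$, $\xi_i$, $\phi$, $C^+$, $C^-$, $u^+$, $u^-$) is shown below as an assumed reconstruction, together with its dual and decision function. The dual makes the role of the memberships explicit: each sample's Lagrange multiplier is capped at its penalty times its membership.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w\rVert^{2}
  + C^{+}\sum_{i:\,y_i=+1} u^{+}(x_i)\,\xi_i
  + C^{-}\sum_{i:\,y_i=-1} u^{-}(x_i)\,\xi_i
\qquad \text{s.t.}\quad y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ge 1-\xi_i,\quad \xi_i\ge 0,\quad i=1,\dots,n

\max_{\alpha}\quad \sum_{i=1}^{n}\alpha_i
  - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i,x_j)
\qquad \text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\quad 0\le\alpha_i\le C_i

f(x) = \operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_i y_i K(x_i,x) + b\Bigr)
```

Here $C_i = C^+ u^+(x_i)$ for $y_i = +1$, $C_i = C^- u^-(x_i)$ for $y_i = -1$, and $K(x_i, x_j) = \phi(x_i)^{\top}\phi(x_j)$. A noise point whose membership is close to δ therefore has $\alpha_i \approx 0$ and contributes almost nothing to the decision function.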
Solving formula (17) yields the optimal classification hyperplane, from which the class label of sample point $x_i$ is obtained.
The fuzzy support vector machine classifier constructed by the invention differs mainly in its improved fuzzy membership function, which aims to solve the problem of low classification accuracy on the minority class.
Step 111: Classify the stroke unbalanced data set with the fuzzy support vector machine classifier.
In practical application, the stroke unbalanced data set to be classified is input into the constructed fuzzy support vector machine classifier, which outputs the class of each test sample, i.e., classifies it as a stroke patient or a non-stroke patient.
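To show how the memberships enter training, here is a minimal pure-Python sketch that minimises the primal objective of a class-weighted FSVM by full-batch subgradient descent. It assumes a linear kernel ($\phi(x) = x$) and a hand-picked step size and epoch count, none of which come from the source; a practical implementation would instead solve the kernelised dual with a QP solver. Each sample's hinge loss is weighted by $C^+ u^+(x_i)$ for positive samples and $C^- u^-(x_i)$ for negative ones, so a point with a tiny membership barely moves the hyperplane. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_linear_fsvm(X: Sequence[Sequence[float]], y: Sequence[int], u: Sequence[float],
                      c_pos: float = 1.0, c_neg: float = 1.0,
                      lr: float = 0.01, epochs: int = 1000) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the linear FSVM primal
    (1/2)||w||^2 + sum_i C_{y_i} * u_i * max(0, 1 - y_i (w.x_i + b)).
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the (1/2)||w||^2 term
        for xi, yi, ui in zip(X, y, u):
            ci = (c_pos if yi == 1 else c_neg) * ui  # membership-scaled penalty
            if yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b) < 1:  # hinge active
                grad_w = [g - ci * yi * xk for g, xk in zip(grad_w, xi)]
                grad_b -= ci * yi
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Sign of the decision function w.x + b (ties go to +1)."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3), (-2.5, -2.5)]
    y = [1, 1, 1, -1, -1, -1, 1]    # last point: a positive label deep inside the negatives
    u = [1, 1, 1, 1, 1, 1, 0.01]    # ...given a tiny membership, as a noise point would be
    w, b = train_linear_fsvm(X, y, u)
    print([predict(w, b, x) for x in X])
```

With the tiny membership, the mislabelled last point is effectively ignored and predicted as -1, while the six clean points are classified correctly. Raising `c_pos` relative to `c_neg` is how separate penalties $C^+$ and $C^-$ would make errors on the minority (stroke) class more costly.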
Conventional fuzzy membership function designs do not jointly consider the number of samples and the distances between samples, and existing fuzzy-membership classification models are inaccurate and classify poorly on the stroke unbalanced data set. To address these shortcomings, the fuzzy membership function designed here first uses information entropy to measure the uncertainty of each sample point according to the quantity relationship among samples of different classes, and then takes the relationship among same-class samples into account to construct a new fuzzy membership function, which offers a reference for improving fuzzy membership functions. Finally, applying this function in the fuzzy support vector machine effectively improves classification performance on the stroke unbalanced data set.
The data in the test sample set are used to verify whether the stroke fuzzy support vector machine classifier designed by the invention effectively improves the classification accuracy and performance on stroke patient data. The verification experiment uses evaluation indices common in classification problems, namely sensitivity Se (Sensitivity), specificity Sp (Specificity), accuracy Acc (Accuracy) and geometric mean Gm (G-mean), defined as:
$Se = \frac{TP}{TP+FN}$, $Sp = \frac{TN}{TN+FP}$, $Acc = \frac{TP+TN}{TP+TN+FP+FN}$, $Gm = \sqrt{Se \cdot Sp}$
where TP is the number of stroke-patient samples that the classification model (i.e., the fuzzy support vector machine classifier of the invention) correctly predicts as stroke patients; FN is the number of stroke-patient samples incorrectly predicted as non-stroke patients; TN is the number of non-stroke samples correctly predicted as non-stroke patients; and FP is the number of non-stroke samples incorrectly predicted as stroke patients. Larger values of Se, Sp, Acc and Gm indicate better classification. A large Se means stroke-patient data are classified with high accuracy, which is the result desired for an unbalanced data set, while Sp reflects how well non-stroke data are classified. In general, however, a classifier with high Se does not also have high Sp: good performance on stroke-patient data comes at the cost of performance on non-stroke data. The invention therefore also uses Gm as an evaluation index for the stroke unbalanced data set, to reflect the overall performance of the stroke classifier more accurately.
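The four indices can be computed from confusion-matrix counts with a short Python sketch; the function name and the example counts are hypothetical. The first example shows why Gm is added: on a 10:90 test set, a classifier that labels everyone non-stroke reaches Acc = 0.9 but Se = 0, and therefore Gm = 0.

```python
import math
from typing import Dict


def evaluate(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Se, Sp, Acc and Gm from confusion-matrix counts (positive = stroke patient)."""
    se = tp / (tp + fn)                    # sensitivity: recall on stroke patients
    sp = tn / (tn + fp)                    # specificity: recall on non-stroke patients
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    gm = math.sqrt(se * sp)                # geometric mean of Se and Sp
    return {"Se": se, "Sp": sp, "Acc": acc, "Gm": gm}


if __name__ == "__main__":
    # Hypothetical test set: 10 stroke patients, 90 non-stroke patients.
    print(evaluate(tp=0, fn=10, tn=90, fp=0))  # "always non-stroke" classifier
    print(evaluate(tp=8, fn=2, tn=72, fp=18))  # a more balanced classifier
```

As a consistency check, $\sqrt{Se \cdot Sp}$ computed from the Se and Sp columns of Table 2 reproduces the reported Gm values to within 0.01; for data1, $\sqrt{62.5 \times 79.31} \approx 70.41$.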
The performance of the method is verified experimentally on the stroke unbalanced data set from the Kaggle database. By reorganizing this data set, three stroke data sets with different imbalance rates, data1, data2 and data3, are obtained; they are detailed in Table 1.
Table 1 Description of the three stroke unbalanced data sets with different imbalance rates
Classification experiments were carried out on the three data sets data1, data2 and data3 with the stroke fuzzy support vector machine classifier constructed by the invention; the detailed results are shown in Table 2.
Table 2 Detailed experimental results (%)
Data set   Se      Sp      Acc     Gm
data1      62.5    79.31   76.43   70.41
data2      76.49   71.5    72.22   73.95
data3      73.68   70.06   70.43   71.84
As the experimental results in Table 2 show, Acc and Gm exceed 70% on all three data sets, and the method adapts well as the total number of samples in the data set increases. Comparing Tables 1 and 2, as the total number of samples and the imbalance rate grow, Se rises (from 62.5% on data1 to 76.49% and 73.68% on data2 and data3), improving the classification of stroke-patient data, while Sp falls (from 79.31% to 71.5% and 70.06%), though not by much. This shows that, by fully considering both the quantity relationship among samples of different classes and the distance relationship among same-class samples, the invention can be used to classify stroke unbalanced data sets and improves classification performance on them.
The method uses an adaptive factor to construct the difference matrix, fully taking into account the effect of class imbalance on the numbers of positive and negative samples, so that the improved fuzzy membership function is better suited to classifying stroke unbalanced data sets. In designing the fuzzy membership function, a difference matrix is first set up, and the membership function is then split into two parts according to the relationship between the numbers of positive and negative samples: information entropy measures the information contained in the positive and negative samples, and the distance relationship among samples yields the distance-based membership function. Together these form a new, improved fuzzy membership function. The invention thus designs the fuzzy membership function more accurately and also offers a new approach to designing fuzzy membership functions.
Based on the classification method for stroke unbalanced data sets provided above, the invention also provides a classification system for stroke unbalanced data sets. Referring to Fig. 3, the system comprises:
The unbalanced data set acquisition module 301 is configured to acquire a stroke unbalanced data set.
The unbalanced data set dividing module 302 is configured to randomly divide the stroke unbalanced data set into a training sample set and a test sample set at a ratio of 7:3, keeping the imbalance rate of both sets unchanged.
The inter-sample distance calculating module 303 is configured to calculate the distances between the sample points in the training sample set.
The difference matrix construction module 304 is configured to construct a difference matrix from the distances between the sample points in the training sample set.
The sample number statistics module 305 is configured to count, according to the difference matrix, the number of positive-class samples and the number of negative-class samples within the effective range of each sample point.
The positive and negative class information amount calculating module 306 is configured to determine the positive/negative class information amount contained in each sample point from the numbers of positive-class and negative-class samples.
The information-quantity fuzzy membership function construction module 307 is configured to construct the information-quantity fuzzy membership function from the positive/negative class information amounts contained in the sample points.
The distance-based positive and negative class fuzzy membership function determining module 308 is configured to determine distance-based positive/negative class fuzzy membership functions from the distances between the sample points.
The improved positive and negative class fuzzy membership function construction module 309 is configured to determine improved positive/negative class fuzzy membership functions from the information-quantity fuzzy membership function and the distance-based positive/negative class fuzzy membership functions.
The fuzzy support vector machine classifier construction module 310 is configured to construct a fuzzy support vector machine classifier from the improved positive/negative class fuzzy membership functions.
The unbalanced data classification module 311 is configured to classify the stroke unbalanced data set with the fuzzy support vector machine classifier.
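Keeping the imbalance rate unchanged in a random 7:3 split amounts to splitting each class separately (stratified sampling). A minimal Python sketch, with a hypothetical function name and a fixed seed for reproducibility:

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_ratio: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices 7:3 class by class so both parts keep the imbalance rate."""
    rng = random.Random(seed)  # fixed seed: reproducible split
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_ratio * len(idx))  # per-class 70% cut
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)


if __name__ == "__main__":
    labels = [1] * 10 + [-1] * 90  # hypothetical 1:9 data set
    train, test = stratified_split(labels)
    print(len(train), len(test))  # 70 30
    print(sum(labels[i] == 1 for i in train), sum(labels[i] == 1 for i in test))  # 7 3
```

With 10 positive and 90 negative samples, the training part receives 7 positives and 63 negatives and the test part 3 and 27, so r = 9 in both parts.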
The inter-sample distance calculating module 303 specifically includes:
an inter-sample distance calculation unit, configured to compute the distance $d_{ij} = \lVert x_i - x_j \rVert$ between the ith sample point $x_i$ and the jth sample point $x_j$ in the training sample set.
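The distance computation of unit 303 can be sketched in Python as follows, assuming the norm in $d_{ij}$ is Euclidean (the text does not name the norm). Only the upper triangle is computed, since $d_{ij} = d_{ji}$ and $d_{ii} = 0$; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def distance_matrix(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise Euclidean distances d_ij = ||x_i - x_j|| (norm assumed Euclidean)."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]  # d_ii = 0
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = math.dist(X[i], X[j])  # symmetric: d_ij = d_ji
    return D


if __name__ == "__main__":
    print(distance_matrix([(0, 0), (3, 4), (6, 8)]))
    # [[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]]
```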
The difference matrix construction module 304 specifically includes:
an adaptive adjustment radius determining unit, configured to determine the positive/negative class sample adaptive adjustment radii from the distances $d_{ij}$ between the sample points;
an adaptive adjustment factor determining unit, configured to determine the positive/negative class sample adaptive adjustment factors from the positive/negative class sample adaptive adjustment radii;
and a difference matrix construction unit, configured to construct the difference matrix from the positive/negative class sample adaptive adjustment factors.
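The three units above, together with the counting in module 305, might be sketched as follows. The radius formulas follow claim 2, read as $AR^+ = \max(d_{ij})/Q^+$ and $AR^- = \max(d_{ij})/Q^-$ with $Q^+ = q$, $Q^- = q/r$ and r = number of negative samples / number of positive samples. The adjustment-factor and difference-matrix formulas of claims 3 and 4 are not legible here, so the sketch assumes a 0/1 factor: $t_{ij} = 1$ when $x_j \ne x_i$ lies within the adaptive radius of $x_i$'s own class. The default q = 2 and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple


def neighbour_counts(D: Sequence[Sequence[float]], labels: Sequence[int],
                     q: float = 2.0) -> Tuple[List[List[int]], List[Tuple[int, int]]]:
    """Return a hypothetical 0/1 difference matrix T and (m+, m-) for every sample.

    D: n x n distance matrix; labels: +1 (stroke) / -1 (non-stroke); q: adaptive factor.
    """
    n = len(labels)
    n_pos = sum(1 for y in labels if y == 1)
    r = (n - n_pos) / n_pos                       # imbalance rate: #negative / #positive
    d_max = max(max(row) for row in D)
    radius = {1: d_max / q, -1: d_max / (q / r)}  # AR+ with Q+ = q, AR- with Q- = q/r
    # ASSUMPTION: t_ij = 1 if x_j (j != i) lies within the radius of x_i's class.
    T = [[1 if j != i and D[i][j] <= radius[labels[i]] else 0 for j in range(n)]
         for i in range(n)]
    counts = []
    for i in range(n):
        m_pos = sum(T[i][j] for j in range(n) if labels[j] == 1)
        m_neg = sum(T[i][j] for j in range(n) if labels[j] == -1)
        counts.append((m_pos, m_neg))
    return T, counts


if __name__ == "__main__":
    xs = [0, 1, 10, 11, 12, 13]  # 1-D toy data
    labels = [1, 1, -1, -1, -1, -1]
    D = [[abs(a - b) for b in xs] for a in xs]
    _, counts = neighbour_counts(D, labels)
    print(counts)  # [(1, 0), (1, 0), (2, 3), (2, 3), (2, 3), (2, 3)]
```

In this toy run the majority-class radius (13) spans the whole data set, so every negative sample sees both classes; under this reading of claim 2, the majority class always receives the larger radius whenever r > 1.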
The positive and negative information amount calculation module 306 specifically includes:
a positive-class information amount calculation unit, configured to determine the positive-class information amount $H^+(x_i)$ contained in the ith sample point $x_i$ of the training sample set using the formula $H^+(x_i) = -p^+ \ln p^+$, where $p^+ = m^+/k$, $k = m^+ + m^-$, $m^+$ is the number of positive samples within the effective range of $x_i$, and $m^-$ is the number of negative samples within that range;
and a negative-class information amount calculation unit, configured to determine the negative-class information amount $H^-(x_i)$ contained in the ith sample point $x_i$ of the training sample set using the formula $H^-(x_i) = -p^- \ln p^-$, where $p^- = m^-/k$.
The information quantity fuzzy membership function construction module 307 specifically includes:
an information-quantity fuzzy membership function construction unit, configured to construct the information-quantity fuzzy membership function $u_1(x_i)$ from the positive-class information amount $H^+(x_i)$ and the negative-class information amount $H^-(x_i)$ of the ith sample point $x_i$, using the formula $u_1(x_i) = 1 - (H^+(x_i) + H^-(x_i))$.
The positive and negative class fuzzy membership function determining module 308 based on the distance between samples specifically includes:
a positive-class centripetal degree calculation unit, configured to determine the positive-class centripetal degree from the distance $d_{ij}$ between the ith sample point $x_i$ and the jth sample point $x_j$ in the training sample set, using the positive-class centripetal degree formula;
a negative-class centripetal degree calculation unit, configured to determine the negative-class centripetal degree from the distance $d_{ij}$ between the ith sample point $x_i$ and the jth sample point $x_j$ in the training sample set, using the negative-class centripetal degree formula;
a distance-based positive-class fuzzy membership function determining unit, configured to determine the distance-based positive-class fuzzy membership function from the positive-class centripetal degree, where δ is a positive parameter and the formula also uses the maximum of the positive-class centripetal degree;
and a distance-based negative-class fuzzy membership function determining unit, configured to determine the distance-based negative-class fuzzy membership function from the negative-class centripetal degree, where the formula also uses the maximum of the negative-class centripetal degree.
The improved positive and negative class fuzzy membership function construction module 309 specifically includes:
an improved positive-class fuzzy membership function determining unit, configured to determine the improved positive-class fuzzy membership function $u^+(x_i)$ from the information-quantity fuzzy membership function $u_1(x_i)$ and the distance-based positive-class fuzzy membership function, using the corresponding formula (with the condition $m^- \neq 0$);
and an improved negative-class fuzzy membership function determining unit, configured to determine the improved negative-class fuzzy membership function $u^-(x_i)$ from the information-quantity fuzzy membership function $u_1(x_i)$ and the distance-based negative-class fuzzy membership function, using the corresponding formula (with the condition $m^+ \neq 0$).
Each embodiment in this specification is described progressively, with each focusing on its differences from the others; for parts that are identical or similar between embodiments, the embodiments may be referred to one another. Since the system disclosed in the embodiment corresponds to the disclosed method, its description is relatively brief, and the relevant details can be found in the description of the method.
Specific examples have been used herein to explain the principles and embodiments of the invention; they are intended only to help in understanding the method of the invention and its core ideas. Persons of ordinary skill in the art may make changes to the specific embodiments and the scope of application in light of the ideas of the invention. Accordingly, this description should not be construed as limiting the invention.

Claims (8)

1. A method for classifying a cerebral stroke-oriented unbalanced data set, the method comprising:
acquiring a cerebral stroke unbalanced data set, wherein the data in the data set are labelled with classes, the large number of normal individuals are regarded as negative samples, and the small number of diseased individuals are regarded as positive samples;
randomly dividing the cerebral stroke unbalanced data set into a training sample set and a test sample set at a ratio of 7:3, without changing the imbalance rate of either set; the sample points in the training sample set are expressed as triples $(x_i, y_i, u(x_i))$, $i = 1, \ldots, n$, with $x_i = (x_i^1, x_i^2, \ldots, x_i^d) \in R^d$, where $x_i$ is the feature vector of the ith sample point in the cerebral stroke unbalanced data set, d is the dimension of the feature vector, $x_i^d$ is the dth-dimension feature, and $R^d$ indicates that the training sample set lies in d-dimensional real space; $y_i \in \{-1, +1\}$ denotes the two class labels, with $y_i = -1$ a negative sample, i.e., a non-stroke patient, and $y_i = +1$ a positive sample, i.e., a stroke patient; $u(x_i)$ is the fuzzy membership function, giving the membership of the ith sample, i.e., the degree to which sample $x_i$ belongs to class $y_i$, with $0 < u(x_i) \le 1$;
calculating the distances between the sample points in the training sample set;
determining positive/negative class sample adaptive adjustment radii from the distances between the sample points; determining positive/negative class sample adaptive adjustment factors from the adaptive adjustment radii; and constructing a difference matrix from the adaptive adjustment factors;
counting, according to the difference matrix, the number of positive-class samples and the number of negative-class samples within the effective range of each sample point, wherein the positive samples are the stroke-patient data in the cerebral stroke unbalanced data set and the negative samples are the non-stroke-patient data in that data set;
determining the positive/negative class information amounts contained in the sample points from the numbers of positive-class and negative-class samples;
constructing an information-quantity fuzzy membership function $u_1(x_i)$ from the positive/negative class information amounts contained in the sample points;
determining distance-based positive/negative class fuzzy membership functions from the distances between the sample points;
determining an improved positive/negative type fuzzy membership function according to the information quantity fuzzy membership function and the positive/negative type fuzzy membership function based on the distance between samples;
constructing a fuzzy support vector machine classifier according to the improved positive/negative class fuzzy membership function;
classifying the test sample set in the cerebral stroke unbalanced data set with the fuzzy support vector machine classifier: the cerebral stroke unbalanced data set to be classified is input into the constructed fuzzy support vector machine classifier, which outputs the class of each test sample, classifying it as a stroke patient or a non-stroke patient.
2. The method for classifying a cerebral stroke-oriented unbalanced data set according to claim 1, wherein determining a positive/negative sample adaptive adjustment radius according to a distance between the sample points comprises:
the positive class sample adaptive adjustment radius is defined as:
$AR^+ = \max(d_{ij})/Q^+$
the negative-class sample adaptive adjustment radius is defined as:
$AR^- = \max(d_{ij})/Q^-$
where $\max(d_{ij})$ is the maximum of the distances $d_{ij}$ between the sample points; the positive-class sample adaptive factor is $Q^+ = q$ and the negative-class sample adaptive factor is $Q^- = q/r$; q is an adaptive factor; and r is the imbalance rate of the unbalanced data set, r = number of negative samples / number of positive samples.
3. The method for classifying a cerebral stroke oriented unbalanced data set according to claim 2, wherein determining the positive/negative type sample adaptive adjustment factor according to the positive/negative type sample adaptive adjustment radius specifically comprises:
the positive sample adaptive adjustment factors are:
the negative sample adaptive adjustment factors are:
4. The method for classifying a cerebral stroke-oriented unbalanced data set according to claim 3, wherein the constructed difference matrix is:
where $t_{ij}$ is the positive/negative class sample adaptive adjustment factor and n is the number of sample points in the training sample set; the number of positive-class samples $m^+$ and the number of negative-class samples $m^-$ within the effective range of each sample point are counted according to the difference matrix.
5. The method for classifying a cerebral stroke-oriented unbalanced data set according to claim 1, wherein determining a positive/negative class fuzzy membership function based on an inter-sample distance according to a distance between the respective sample points, specifically comprises:
determining the positive-class centripetal degree from the distance $d_{ij}$ between the ith sample point $x_i$ and the jth sample point $x_j$ in the training sample set, using the positive-class centripetal degree formula;
determining the negative-class centripetal degree from the distance $d_{ij}$ between the ith sample point $x_i$ and the jth sample point $x_j$ in the training sample set, using the negative-class centripetal degree formula, where $m^+$ and $m^-$ are the number of positive-class samples and the number of negative-class samples, respectively;
determining the distance-based positive-class fuzzy membership function from the positive-class centripetal degree, where δ is a positive parameter and the formula also uses the maximum of the positive-class centripetal degree;
determining the distance-based negative-class fuzzy membership function from the negative-class centripetal degree, where the formula also uses the maximum of the negative-class centripetal degree.
6. The method for classifying a cerebral stroke-oriented unbalanced data set according to claim 5, wherein the determining of the improved positive/negative class fuzzy membership function according to the information quantity fuzzy membership function and the positive/negative class fuzzy membership function based on the distance between samples specifically comprises:
determining the improved positive-class fuzzy membership function $u^+(x_i)$ from the information-quantity fuzzy membership function $u_1(x_i)$ and the distance-based positive-class fuzzy membership function, using the corresponding formula (with the condition $m^- \neq 0$);
determining the improved negative-class fuzzy membership function $u^-(x_i)$ from the information-quantity fuzzy membership function $u_1(x_i)$ and the distance-based negative-class fuzzy membership function, using the corresponding formula (with the condition $m^+ \neq 0$).
7. The method for classifying a cerebral stroke-oriented unbalanced data set according to claim 1, wherein the function formula of the fuzzy support vector machine classifier is:
where w is the normal vector of the hyperplane; $C^+$ and $C^-$ are constant penalty factors for positive-class and negative-class samples, respectively; n is the number of sample points; the positive-class membership term is the positive-class fuzzy membership function $u^+(x_i)$ and the negative-class membership term is the negative-class fuzzy membership function $u^-(x_i)$; $\xi_i$ is a slack variable; $y_i$ denotes the two class labels; $\phi(x_i)$ is the kernel-induced feature mapping; and b is the offset;
obtaining the optimal classification hyperplane by solving the function of the fuzzy support vector machine classifier, thereby obtaining the class label of sample point $x_i$.
8. A stroke imbalance data set oriented classification system, the system comprising:
the unbalanced data set acquisition module is used for acquiring a cerebral apoplexy unbalanced data set;
the unbalanced data set dividing module is used for randomly dividing the cerebral stroke unbalanced data set into a training sample set and a test sample set at a ratio of 7:3, wherein the imbalance rate of the training sample set and the test sample set is not changed;
the inter-sample distance calculating module is used for calculating the distances between the sample points in the training sample set;
the difference matrix construction module is used for constructing a difference matrix according to the distances between the sample points in the training sample set;
the sample number statistics module is used for counting the number of positive-class samples and the number of negative-class samples within the effective range of each sample point according to the difference matrix, wherein the positive samples are the stroke-patient data in the cerebral stroke unbalanced data set and the negative samples are the non-stroke-patient data in that data set;
the positive and negative class information amount calculation module is used for determining the positive/negative class information amount contained in each sample point according to the number of positive class samples and the number of negative class samples;
the information amount fuzzy membership function construction module is used for constructing an information amount fuzzy membership function according to the positive/negative class information amount contained in the sample points;
the positive and negative class fuzzy membership function determining module is used for determining a positive/negative class fuzzy membership function based on inter-sample distance, according to the distances between the sample points;
the improved positive and negative class fuzzy membership function construction module is used for determining an improved positive/negative class fuzzy membership function according to the information amount fuzzy membership function and the inter-sample-distance-based positive/negative class fuzzy membership function;
the fuzzy support vector machine classifier construction module is used for constructing a fuzzy support vector machine classifier according to the improved positive/negative class fuzzy membership function;
and the unbalanced data classification module is used for classifying the test sample set of the cerebral apoplexy unbalanced data set by using the fuzzy support vector machine classifier.
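The dividing module calls for a random 7:3 split that leaves the imbalance rate of both subsets unchanged. One way to satisfy that requirement is a stratified split, in which each class is shuffled and cut at 70% on its own. The Python sketch below illustrates that assumption only; the function name `stratified_split` and the fixed seed are hypothetical and are not taken from the patent.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_ratio=0.7, seed=0):
    """Split indices train_ratio : (1 - train_ratio) separately within each class,
    so both subsets keep (approximately) the original imbalance rate."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)  # fixed seed for a reproducible random split
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_ratio)
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return sorted(train_idx), sorted(test_idx)


# 20 positive (cerebral apoplexy) and 80 negative samples: imbalance rate 80 / 20 = 4
labels = [1] * 20 + [-1] * 80
train, test = stratified_split(labels)
n_pos_train = sum(labels[i] == 1 for i in train)
n_pos_test = sum(labels[i] == 1 for i in test)
print(len(train) - n_pos_train, n_pos_train)  # → 56 14
print(len(test) - n_pos_test, n_pos_test)     # → 24 6
```

Both subsets keep the 4:1 negative-to-positive ratio, which a plain random 7:3 split over the pooled data would only do on average.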
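This section of the claims names the modules but not their formulas (the effective range, the difference matrix, the information amount, or how the two membership functions are combined). The Python sketch below therefore fills each gap with a labelled assumption, and every function name in it is hypothetical. The effective range is taken as the k nearest neighbours in the Euclidean distance matrix, which stands in for the difference matrix. The information amount is taken as the Shannon entropy of the neighbourhood's own-class/other-class mix, with points outnumbered by the other class treated as likely noise. The distance-based membership is the common class-centre form s_i = 1 - d_i / (r_c + δ). The improved membership is assumed to be the product of the two, scaled by the imbalance rate for the positive class, so the values act as per-sample cost weights rather than staying in [0, 1]. The classifier is a linear fuzzy SVM trained by subgradient descent, in which each sample's hinge loss is weighted by its membership.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distance_matrix(X):
    # Pairwise Euclidean distances; used here in place of the patent's difference matrix (assumption)
    return [[euclidean(a, b) for b in X] for a in X]


def info_membership(D, y, k=2, floor=0.05):
    # ASSUMPTION: effective range = k nearest neighbours; information amount = entropy (bits)
    s = []
    for i, row in enumerate(D):
        nn = sorted((d, j) for j, d in enumerate(row) if j != i)[:k]
        n_own = sum(1 for _, j in nn if y[j] == y[i])
        h = 0.0
        for count in (n_own, len(nn) - n_own):
            if count:
                p = count / len(nn)
                h -= p * math.log2(p)
        # interior point -> 1.0; mixed/boundary -> lower; own class outnumbered (likely noise) -> floor
        s.append(max(1.0 - h, floor) if 2 * n_own >= len(nn) else floor)
    return s


def distance_membership(X, y, delta=0.1):
    # Class-centre membership: s_i = 1 - d(x_i, centre_c) / (r_c + delta)
    s = [0.0] * len(X)
    for c in (1, -1):
        idx = [i for i, yi in enumerate(y) if yi == c]
        centre = [sum(X[i][j] for i in idx) / len(idx) for j in range(len(X[0]))]
        dist = {i: euclidean(X[i], centre) for i in idx}
        r = max(dist.values())
        for i in idx:
            s[i] = 1.0 - dist[i] / (r + delta)
    return s


def improved_membership(X, y, k=2):
    # ASSUMPTION: product of both memberships, positive (minority) class scaled by n_neg / n_pos
    ratio = y.count(-1) / y.count(1)
    s_info = info_membership(distance_matrix(X), y, k)
    s_dist = distance_membership(X, y)
    return [a * b * (ratio if yi == 1 else 1.0) for a, b, yi in zip(s_info, s_dist, y)]


def train_linear_fsvm(X, y, s, lam=0.01, epochs=300, lr=0.1):
    # Full-batch subgradient descent on lam/2*||w||^2 + (1/n) * sum_i s_i * hinge_i
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        eta = lr / math.sqrt(t)  # decaying step size
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi, si in zip(X, y, s):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                # hinge loss active: subgradient scaled by the sample's membership s_i
                gw = [g - si * yi * xj / n for g, xj in zip(gw, xi)]
                gb -= si * yi / n
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b


def predict(w, b, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]


# Toy data: 3 positive (minority) and 7 negative samples
X = [[2, 2], [2.5, 2], [2, 2.5],
     [-2, -2], [-2.5, -2], [-2, -2.5], [-3, -2], [-2, -3], [-3, -3], [-2.5, -2.5]]
y = [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]
s = improved_membership(X, y)
w, b = train_linear_fsvm(X, y, s)
print(predict(w, b, [[3, 3], [-3.5, -3.5]]))  # → [1, -1]
```

Weighting each hinge term by s_i is equivalent to giving each sample its own penalty C·s_i, which is the defining idea of a fuzzy SVM. In practice a kernel solver that accepts per-sample weights would replace the linear trainer; for example, scikit-learn's `SVC.fit` takes a `sample_weight` argument that rescales C per sample.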

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310944187.XA CN116933166A (en) 2019-11-28 2019-11-28 Cerebral apoplexy-oriented unbalanced data set classification method and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911189087.0A CN110991517A (en) 2019-11-28 2019-11-28 Classification method and system for unbalanced data set in stroke
CN202310944187.XA CN116933166A (en) 2019-11-28 2019-11-28 Cerebral apoplexy-oriented unbalanced data set classification method and system

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201911189087.0A Division CN110991517A (en) 2019-11-28 2019-11-28 Classification method and system for unbalanced data set in stroke

Publications (1)

Publication Number Publication Date
CN116933166A (en) 2023-10-24

Family

ID=70087703

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310944187.XA Pending CN116933166A (en) 2019-11-28 2019-11-28 Cerebral apoplexy-oriented unbalanced data set classification method and system
CN201911189087.0A Pending CN110991517A (en) 2019-11-28 2019-11-28 Classification method and system for unbalanced data set in stroke

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201911189087.0A Pending CN110991517A (en) 2019-11-28 2019-11-28 Classification method and system for unbalanced data set in stroke

Country Status (1)

Country Link
CN (2) CN116933166A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814917B (en) * 2020-08-28 2020-11-24 成都千嘉科技有限公司 Character wheel image digital identification method with fuzzy state
CN114841294B (en) * 2022-07-04 2022-10-28 杭州德适生物科技有限公司 Classifier model training method and device for detecting chromosome structure abnormality

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN106355198A (en) * 2016-08-23 2017-01-25 衢州学院 Method for acquiring fuzzy support vector machine membership function
CN107978311B (en) * 2017-11-24 2020-08-25 腾讯科技(深圳)有限公司 Voice data processing method and device and voice interaction equipment
CN108335744B (en) * 2018-04-03 2019-01-11 江苏大学附属医院 A kind of emergency cardiovascular care network system and its method for early warning of classifying
CN109934280A (en) * 2019-03-07 2019-06-25 贵州大学 A kind of unbalanced data classification method based on PSO-DEC-IFSVM sorting algorithm

Also Published As

Publication number Publication date
CN110991517A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CN110443798B (en) Autism detection method, device and system based on magnetic resonance image
Gürüler A novel diagnosis system for Parkinson’s disease using complex-valued artificial neural network with k-means clustering feature weighting method
CN109935336B (en) Intelligent auxiliary diagnosis system for respiratory diseases of children
KR102556896B1 (en) Reject biased data using machine learning models
CN109785976A (en) A kind of goat based on Soft-Voting forecasting system by stages
CN109840554B (en) Alzheimer's disease MRI image classification method based on SVM-RFE-MRMR algorithm
CN108763590B (en) Data clustering method based on double-variant weighted kernel FCM algorithm
CN112633601B (en) Method, device, equipment and computer medium for predicting disease event occurrence probability
US20110179044A1 (en) Morphological analysis
CN111161814A (en) DRGs automatic grouping method based on convolutional neural network
Potdar et al. A comparative study of machine learning algorithms applied to predictive breast cancer data
CN117078026B (en) Wind control index management method and system based on data blood margin
CN116933166A (en) Cerebral apoplexy-oriented unbalanced data set classification method and system
CN113674862A (en) Acute renal function injury onset prediction method based on machine learning
Nababan et al. Implementation of K-Nearest Neighbors (KNN) algorithm in classification of data water quality
US20220101062A1 (en) System and a Method for Bias Estimation in Artificial Intelligence (AI) Models Using Deep Neural Network
Li et al. A mixed data clustering algorithm with noise-filtered distribution centroid and iterative weight adjustment strategy
CN114494263A (en) Medical image lesion detection method, system and equipment integrating clinical information
El-Habil et al. A comparative study between linear discriminant analysis and multinomial logistic regression
CN117195027A (en) Cluster weighted clustering integration method based on member selection
CN117315379A (en) Deep learning-oriented medical image classification model fairness evaluation method and device
CN112233742A (en) Medical record document classification system, equipment and storage medium based on clustering
CN109191452B (en) Peritoneal transfer automatic marking method for abdominal cavity CT image based on active learning
CN116013527A (en) CV-MABAC hypertension age bracket prediction method based on entropy
Westari et al. Performa Comparison of the K-Means Method for Classification in Diabetes Patients Using Two Normalization Methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination