CN107845407A - Human physiological feature selection algorithm based on combination of filtering and improved clustering - Google Patents

Info

Publication number
CN107845407A
Authority
CN
China
Prior art keywords
feature
clusters
body composition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710733507.1A
Other languages
Chinese (zh)
Inventor
陈波
俞洁
高秀娥
郑庆国
白旭飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University
Priority to CN201710733507.1A
Publication of CN107845407A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The invention discloses a human physiological feature selection algorithm based on a combination of filtering and improved clustering, comprising: S1: selecting an impedance model and collecting first and second feature parameter data to construct an initial feature set and a final optimal subset; S2: introducing a Filter algorithm and calculating an HSIC value for each feature in the collected data; S3: sorting the feature set in descending order of HSIC value; S4: adding the top K ranked features to the feature set, thereby filtering out parameters irrelevant to body composition with the Filter algorithm, and constructing an initial data set; S5: constructing a feature sparse graph from the data set according to a clustering algorithm; S6: screening out redundant features within clusters using an improved clustering algorithm. The feature selection algorithm established in this application can improve the prediction accuracy of human body composition and provides a more effective means of detection for body composition research and clinical application.

Description

Human physiological feature selection algorithm based on combination of filtering and improved clustering
Technical Field
The invention belongs to the field of bioinformatics, and particularly relates to a human physiological characteristic selection algorithm based on combination of filtering and improved clustering.
Background
The balance of body composition plays an important role in maintaining the stability of the body's internal environment and is an important factor in human health. When a disease develops, changes in body composition often precede its clinical symptoms. Changes in body composition can therefore be used to predict associated diseases such as hypertension, dyslipidemia and metabolic syndrome. However, many parameters influence body composition, and these parameters are highly nonlinear, redundant and partly irrelevant.
The existing Wrapper algorithm removes redundant features and achieves good generalization performance, but its high complexity makes it unsuitable for large data sets. The Filter algorithm assigns each feature a weight according to the result of a criterion calculation and is computationally efficient, but it does not fully consider redundancy among features, so the selected feature subset is likely to contain a large amount of redundancy. The clustering method partitions the body composition parameter data, object by object, into groups or clusters so that the objects within a cluster are highly similar, and makes its decision according to the distance between each cluster and a center point; it effectively screens out redundant features but cannot effectively screen out irrelevant ones. In view of this, a new dimension-reduction method is needed for processing high-dimensional body composition data before they are analyzed.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a human physiological feature selection algorithm based on a combination of filtering and improved clustering. A Filter feature selection algorithm first removes the features irrelevant to the body composition classification, and an M-Chameleon feature clustering method then removes redundant features, so that the advantages of both Filter feature selection and feature clustering are exploited. A body composition prediction model built in this way can improve prediction accuracy and provides a more effective means of detection for body composition research and clinical application.
In order to achieve the above object, the present invention provides a human physiological feature selection algorithm based on a combination of filtering and improved clustering, comprising:
S1: selecting an impedance model, collecting data of first and second feature parameters to construct an initial feature set and a final optimal subset, and initializing the initial feature set and the final optimal subset as empty sets;
Further, body composition data measured by a human body composition analyzer (INBODY) are used as the data set, denoted T = (O, F, C), where O is the data sample set, F is the selected feature set, and C is the body composition classification. Parameters that strongly influence human body composition, such as weight, height, age, sex and the impedance of each body segment, are taken as the first feature parameters, and the reciprocal $1/R_i$, the square $R_i^2$ and the products $R_iR_j$ of the segment impedances are taken as the second feature parameters. The impedance values are the 1 kHz impedance parameters from INBODY. The first feature parameters ($R_1$, $R_2$, $R_3$, $R_4$, $R_5$, A, H, W, where A is age, H is height and W is weight) and the second feature parameters ($1/R_1$, $1/R_2$, $1/R_3$, $1/R_4$, $1/R_5$, $R_1R_2$, $R_1R_3$, $R_1R_4$, $R_1R_5$, $R_2R_3$, $R_2R_4$, $R_2R_5$, $R_3R_4$, $R_3R_5$, $R_4R_5$, etc.) form the original feature parameter set, denoted $F = \{f_1, f_2, \dots, f_m\}$. The body composition classification set C includes body fat mass (BFM) and total body water (TBW).
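The construction of the original feature set can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_features` and the string keys are hypothetical, and only the parameters enumerated above (five segment impedances, age, height, weight) are used.

```python
from itertools import combinations

def build_features(R, age, height, weight):
    """Build first and second feature parameters from five segment impedances.

    R is a list [R1, ..., R5]; the key names are illustrative only.
    """
    feats = {"A": age, "H": height, "W": weight}
    for i, r in enumerate(R, start=1):
        feats[f"R{i}"] = r            # first feature parameter
        feats[f"1/R{i}"] = 1.0 / r    # reciprocal
        feats[f"R{i}^2"] = r * r      # square
    # pairwise products R_i * R_j for i < j
    for (i, ri), (j, rj) in combinations(enumerate(R, start=1), 2):
        feats[f"R{i}R{j}"] = ri * rj
    return feats
```

With five impedances this yields 3 + 15 + 10 = 28 candidate features per sample.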
S2: introducing a filtering algorithm (Filter) and, for each feature in the collected data, calculating an HSIC value under the body composition class C, the HSIC value representing the degree of correlation between the physiological feature and the body composition class;
Further, for the features $\{f_1, f_2, \dots, f_m\} \in F$, a nonlinear feature map $\phi: F \to \mathcal{F}$ is defined, which maps the feature points $f_1, f_2, \dots, f_m$ into a reproducing kernel Hilbert space $\mathcal{F}$ with kernel function

$$k(f_i, f_j) = \langle \phi(f_i), \phi(f_j) \rangle_{\mathcal{F}},$$

where $\langle \cdot, \cdot \rangle_{\mathcal{F}}$ is the inner product of the space $\mathcal{F}$. Similarly, a body composition classification map $\psi: C \to \mathcal{G}$ is defined, which maps the body composition index space C into a reproducing kernel Hilbert space $\mathcal{G}$ with kernel function

$$l(c_i, c_j) = \langle \psi(c_i), \psi(c_j) \rangle_{\mathcal{G}}.$$

In addition, the cross-covariance operator between the features and the body composition classes is defined as

$$C_{fc} = \mathbb{E}_{fc}\big[(\phi(f) - \mu_f) \otimes (\psi(c) - \mu_c)\big],$$

where $\otimes$ denotes the tensor product and $\mu_f = \mathbb{E}_f[\phi(f)]$ and $\mu_c = \mathbb{E}_c[\psi(c)]$ denote expectations. For each feature $f_1, f_2, \dots, f_m \in F$, an HSIC value under the body composition class c is calculated. (HSIC is a kernel-based independence measure: a cross-covariance operator is defined on the reproducing kernel Hilbert spaces, and an independence criterion is obtained through empirical estimation of the operator norm. It measures the similarity between two data distributions and is widely used in feature selection and dimension reduction.) The value represents the degree of correlation between the physiological feature and the body composition class:

$$\mathrm{HSIC}(f, c) = \lVert C_{fc} \rVert_{\mathrm{HS}}^2$$

For a given feature f and body composition class c, a larger HSIC value indicates a stronger dependence of c on f.
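One common way to estimate HSIC from n samples is the biased empirical estimator $\mathrm{tr}(KHLH)/(n-1)^2$, where K and L are kernel matrices and H is the centering matrix. The sketch below assumes Gaussian kernels with a fixed bandwidth `sigma`; the patent text does not state which kernel or estimator is used, so both are assumptions, as are the function names.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for samples x (1-D or 2-D array). Kernel choice is an assumption."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    norms = np.sum(x ** 2, axis=1)
    sq_dist = norms[:, None] + norms[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))

def hsic(f, c, sigma=1.0):
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = len(f)
    K = rbf_kernel(f, sigma)                 # kernel matrix of the feature
    L = rbf_kernel(c, sigma)                 # kernel matrix of the body composition value
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

A feature that determines the target, even nonlinearly, should score higher than an unrelated one, which is the property the ranking in S3 relies on.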
S3: sorting the feature set from large to small according to the value of HSIC;
S4: adding the top K ranked features to the feature set, thereby filtering out parameters irrelevant to body composition with the Filter algorithm, and constructing an initial data set;
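Steps S3 and S4 amount to a sort followed by a cut-off. A minimal sketch, assuming the HSIC scores have already been computed and stored in a dictionary (the function name and the score values in the usage note are hypothetical):

```python
def top_k_features(scores, k):
    """Return the k feature names with the largest scores, in descending order."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]
```

For example, `top_k_features({"W": 0.9, "R1": 0.2, "A": 0.5}, 2)` keeps `W` and `A` and drops `R1`.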
S5: constructing a feature sparse graph from the data set according to a clustering algorithm (M-Chameleon), where RI measures the interconnection among the features (relative interconnectivity) and RC measures the similarity among the features (relative closeness), and initializing the expected number of clusters k;
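A feature sparse graph of this kind can be sketched as a K-nearest-neighbor graph over the feature columns. The similarity measure is not specified in the text; the absolute Pearson correlation used below is an assumption, as is the function name `knn_sparse_graph`.

```python
import numpy as np

def knn_sparse_graph(X, k):
    """Symmetric K-nearest-neighbor weight matrix over the columns (features) of X.

    X has shape (n_samples, n_features). Similarity is |Pearson correlation|,
    which is an illustrative choice.
    """
    S = np.abs(np.corrcoef(X, rowvar=False))   # feature-to-feature similarity
    np.fill_diagonal(S, 0.0)                   # no self-loops
    W = np.zeros_like(S)
    for i in range(S.shape[0]):
        nbrs = np.argsort(S[i])[::-1][:k]      # the k most similar features
        W[i, nbrs] = S[i, nbrs]
    return np.maximum(W, W.T)                  # keep an edge if either end selected it
```

Each vertex is a feature, and an edge weight is the similarity of the two features it joins; entries that stay zero are the pruned edges that make the graph sparse.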
Furthermore, Chameleon is an agglomerative hierarchical clustering method. A feature sparse graph is constructed as a K-nearest-neighbor graph: each vertex in the graph represents a data object, edges connect neighboring vertices, and the edge weights reflect the similarity of the objects. The principle of the algorithm is shown in fig. 1. The similarity of feature sub-clusters is evaluated on two criteria: 1) the interconnectivity of the objects in the clusters; 2) the closeness of the clusters. If two feature clusters are highly interconnected and close to each other, they are merged. The similarity between two feature clusters is determined from their relative interconnectivity RI and relative closeness RC. Given a normalized, Filter-filtered feature data set $F = \{f_1, f_2, \dots, f_m\}$, the data cluster F is divided into sub-clusters $f_1$ and $f_2$ so that the total weight of the edges cut by the division is minimal. The relative interconnectivity $RI(f_1, f_2)$ of two feature clusters $f_1$ and $f_2$ is defined as the absolute interconnectivity between $f_1$ and $f_2$, normalized with respect to the internal interconnectivity of the two clusters, namely:

$$RI(f_1, f_2) = \frac{\left|EC_{\{f_1, f_2\}}\right|}{\frac{1}{2}\left(\left|EC_{f_1}\right| + \left|EC_{f_2}\right|\right)}$$

where $\left|EC_{\{f_1, f_2\}}\right|$ is the edge cut of the cluster containing $f_1$ and $f_2$, that is, the sum of the weights of the edges that connect them, and similarly $\left|EC_{f_1}\right|$ (or $\left|EC_{f_2}\right|$) is the minimum edge-cut sum that divides $f_1$ (or $f_2$) into two roughly equal parts.
The relative closeness $RC(f_1, f_2)$ of two feature clusters $f_1$ and $f_2$ is defined as the absolute closeness between $f_1$ and $f_2$, normalized with respect to the internal closeness of the two clusters, namely:

$$RC(f_1, f_2) = \frac{\bar{S}_{EC_{\{f_1, f_2\}}}}{\frac{|f_1|}{|f_1| + |f_2|}\,\bar{S}_{EC_{f_1}} + \frac{|f_2|}{|f_1| + |f_2|}\,\bar{S}_{EC_{f_2}}}$$

where $\bar{S}_{EC_{\{f_1, f_2\}}}$ is the average weight of the edges connecting vertices of $f_1$ to vertices of $f_2$, and $\bar{S}_{EC_{f_1}}$ (or $\bar{S}_{EC_{f_2}}$) is the average weight of the edges in the minimum bisection of cluster $f_1$ (or $f_2$). The similarity between two sub-clusters is determined by the relative interconnectivity and relative closeness of the feature sub-clusters $f_1$ and $f_2$.
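The two measures can be illustrated on a small weighted graph. The sketch below finds the minimum bisection by brute force, which is only feasible for tiny clusters; a real Chameleon implementation would use a graph partitioner. It assumes every cluster has at least two vertices and a nonzero internal cut, and all function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def cross_edges(W, a, b):
    """Weights of the edges that join vertex list a to vertex list b."""
    w = W[np.ix_(a, b)]
    return w[w > 0]

def min_bisection(W, cluster):
    """Brute-force the lightest cut that splits `cluster` into two near-equal halves."""
    best = None
    for half in combinations(cluster, len(cluster) // 2):
        rest = [v for v in cluster if v not in half]
        cut = cross_edges(W, list(half), rest)
        if best is None or cut.sum() < best.sum():
            best = cut
    return best

def mean_weight(w):
    """Average edge weight, or 0.0 when there are no edges."""
    return float(w.mean()) if w.size else 0.0

def ri_rc(W, f1, f2):
    """Relative interconnectivity and relative closeness of clusters f1 and f2."""
    between = cross_edges(W, f1, f2)
    in1, in2 = min_bisection(W, f1), min_bisection(W, f2)
    ri = between.sum() / (0.5 * (in1.sum() + in2.sum()))
    n1, n2 = len(f1), len(f2)
    rc = mean_weight(between) / (
        n1 / (n1 + n2) * mean_weight(in1) + n2 / (n1 + n2) * mean_weight(in2)
    )
    return ri, rc
```

Chameleon-style methods typically combine the two values into a single merge score, for instance by multiplying RI with a power of RC; the text here does not fix that combination.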
S6: screening redundant features in the clusters by using an improved clustering algorithm;
S61: calculating the distances between clusters and sorting them, and judging whether the number h of sample sub-clusters is equal to the initialized expected number of clusters k;
S62: if not equal, selecting the two sub-clusters with the largest similarity function value and merging them; if equal, ending;
S63: recalculating the relative similarity RC of the new sub-clusters, traversing all sub-clusters, and judging whether merging has been attempted for every pair of sub-clusters;
S64: if merging has been attempted for all sub-clusters, returning to S61; otherwise, merging the two sub-clusters with the minimum similarity function and returning to S63;
S65: selecting the features with the largest HSIC value for combination.
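The core of the merging loop (S61 and S62) can be sketched as a greedy agglomeration that stops once k clusters remain. This is a simplification under stated assumptions: it omits the pairwise re-traversal of S63 and S64, and the `similarity` callable is a placeholder for whatever function of RI and RC is used.

```python
from itertools import combinations

def merge_until_k(clusters, similarity, k):
    """Greedily merge the most similar pair of clusters until only k remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > k:
        # pick the pair (a, b), a < b, with the largest similarity value
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: similarity(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Any function that takes two clusters and returns a number can be passed as `similarity`, so the loop itself stays independent of how RI and RC are combined.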
S7: and selecting one feature with the highest HSIC value from each feature cluster to combine into an optimal feature set.
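Step S7 keeps one representative per cluster. A minimal sketch, assuming the clusters are lists of feature names and the HSIC scores are available in a dictionary (names and values are illustrative):

```python
def select_representatives(clusters, hsic_scores):
    """From each cluster, keep the single feature with the highest HSIC score."""
    return [max(cluster, key=hsic_scores.get) for cluster in clusters]
```

The returned list has exactly one feature per cluster, so its length equals the number of clusters k.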
By adopting the above technical solution, the invention achieves the following technical effects: based on the characteristics of human physiological feature parameters, a feature parameter selection algorithm combining Filter and clustering is provided. Features irrelevant to the categories are removed with a feature filtering method based on the Hilbert-Schmidt independence criterion; an improved and optimized Chameleon clustering is applied to feature selection, which removes redundant features well; an optimal feature parameter set for constructing the body composition model is effectively selected; the problems of numerous and redundant human physiological feature parameters are solved; and a more effective means of detection is provided for body composition research and clinical application.
Drawings
FIG. 1 is a schematic diagram of a Chameleon clustering algorithm;
FIG. 2 is a schematic diagram of an improved Chameleon algorithm;
FIG. 3 is a human body feature parameter selection process;
FIG. 4 is the correlation between the feature parameters obtained with the filtering algorithm and BFM in the 1 kHz frequency band;
FIG. 5 is the correlation between the feature parameters obtained with the filtering algorithm and BFM in the 250 kHz frequency band;
FIG. 6 is the correlation between the feature parameters obtained with the filtering algorithm and BFM in the 500 kHz frequency band;
FIG. 7 is an analysis of the number of parameter clusters after using a filtering algorithm;
FIG. 8 shows the distance between the characteristic parameter and the BFM index when different sample numbers are grouped into four categories;
FIG. 9 shows the comparison between the predicted value and the actual value of the BFM model;
FIG. 10 shows the comparison of the BFM model predicted values with respect to the error.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
Body composition data measured by INBODY are taken as the data set, denoted T = (O, F, C). Parameters that strongly influence human body composition, such as weight, height, age, sex and the impedance of each body segment, are taken as the first feature parameters, and the reciprocal $1/R_i$, the square $R_i^2$ and the products $R_iR_j$ of the segment impedances are taken as the second feature parameters. The INBODY measurement frequencies include three bands, 1 kHz, 250 kHz and 500 kHz, and the relation between body composition and the feature parameters is studied in these three bands and for different sample sizes. The first feature parameters ($R_1$, $R_2$, $R_3$, $R_4$, $R_5$, A, H, W) and the second feature parameters $1/R_1$, $1/R_2$, $1/R_3$, $1/R_4$, $1/R_5$, $R_1R_2$, $R_1R_3$, $R_1R_4$, $R_1R_5$, $R_2R_3$, $R_2R_4$, $R_2R_5$, $R_3R_4$, $R_3R_5$, $R_4R_5$ are selected as the original feature parameter set, denoted $F = \{f_1, f_2, \dots, f_m\}$; the body composition classification set C includes body fat mass (BFM) and total body water (TBW). The following table lists part of the sample data set.
However, many parameters influence body composition, and these parameters are highly nonlinear, redundant and partly irrelevant. A dimension-reduction method is therefore needed to deal with redundant and irrelevant feature parameters. The clustering method partitions the body composition parameter data, object by object, into groups or clusters so that the objects within a cluster are highly similar, and makes its decision according to the distance between each cluster and a center point, which effectively screens out redundant features. In addition, before the high-dimensional body composition data are analyzed, attributes irrelevant to the required features are eliminated through a step that reduces the number of features.
Therefore, the filtering algorithm is applied to the data first. Given the original feature set $F = \{f_1, f_2, \dots, f_m\}$ and the data sample set $O = \{o_1, o_2, \dots, o_n\}$, the filtering algorithm is run on the first 100 human samples in the three frequency bands of 1 kHz, 250 kHz and 500 kHz; the resulting degree of correlation between the filtered feature parameters and the body composition BFM is shown in the figures below.
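The features are mapped by a nonlinear map $\phi: F \to \mathcal{F}$ into a reproducing kernel Hilbert space $\mathcal{F}$ with kernel function

$$k(f_i, f_j) = \langle \phi(f_i), \phi(f_j) \rangle_{\mathcal{F}},$$

where $\langle \cdot, \cdot \rangle_{\mathcal{F}}$ is the inner product of the space $\mathcal{F}$. Similarly, a body composition classification map $\psi: C \to \mathcal{G}$ is defined, which maps the body composition index space C into a reproducing kernel Hilbert space $\mathcal{G}$; the corresponding kernel function is

$$l(c_i, c_j) = \langle \psi(c_i), \psi(c_j) \rangle_{\mathcal{G}}.$$

The kernel function computes the inner product of the projections of two feature points in the feature space without explicitly computing the specific map $\phi$, and thus without paying the computational cost implied by the dimensionality. The cross-covariance operator between the features and the body composition classes can therefore be defined as

$$C_{fc} = \mathbb{E}_{fc}\big[(\phi(f) - \mu_f) \otimes (\psi(c) - \mu_c)\big],$$

where $\otimes$ denotes the tensor product and $\mu_f = \mathbb{E}_f[\phi(f)]$ and $\mu_c = \mathbb{E}_c[\psi(c)]$ denote expectations [16]. The squared Hilbert-Schmidt norm of this cross-covariance operator is called HSIC [14]:

$$\mathrm{HSIC}(f, c) = \lVert C_{fc} \rVert_{\mathrm{HS}}^2$$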
Running the filter algorithm for the body composition BFM at the different impedance frequencies gives the correlations shown in figs. 4, 5 and 6. The three figures show that as the impedance frequency band increases, the impedance values decrease and the amount of BFM information contained in each feature parameter gradually decreases. Feature parameters are screened using an 80% confidence interval, and the features retained after running the filter algorithm in the different frequency bands are summarized in Table 2:
Table 2: Features after running the filter algorithm in different frequency bands
As can be seen from Table 2, the algorithm greatly reduces the size of the original feature set, and the features in the 250 kHz band are more concentrated. The filtered features of the intermediate 250 kHz impedance band are therefore selected for cluster analysis to screen out redundant information.
Before clustering, the number of classes into which the feature parameters should be clustered is determined. The amount of information contained under different numbers of clusters is calculated from the screened feature parameters, as shown in fig. 7; the analysis shows that dividing the feature parameters and body composition into 4 classes represents the selected feature information well. For sample sizes of 20, 40, 60, 80 and 100, fig. 8 shows that the clusters change little: $1/R_4$ and $1/R_5$ form one cluster; A, H, W, $R_5$ and $R_4$ form one cluster; $R_4R_5$, $R_1R_2$, $R_2^2$, $R_1^2$ and $R_5^2$ form one cluster; and $R_2R_3$ and $R_1R_3$ form one cluster. After the feature parameters obtained by the Filter algorithm are clustered into 4 classes, the parameters far from the cluster center BFM, namely $1/R_4$, $R_4$, $R_1^2$ and $R_1R_3$, can be removed. Table 3 lists the feature parameters selected after the Filter and clustering algorithms.
Table 3: characteristic parameters after Filter and clustering algorithm
Table 4 lists the candidate feature sets and the time complexity for predicting the body composition BFM with the three feature selection methods;
Table 4: Optimal feature set and complexity comparison
As can be seen from Table 4, for data sets of the same dimension, both the size of the candidate feature set obtained with the present algorithm and its time complexity are smaller than those of the Filter, Wrapper and mRMR feature selection algorithms.
To verify the performance of the feature selection algorithm, features for the body composition BFM are selected with mRMR, with the combined Filter and Wrapper feature selection algorithm, and with the present feature selection algorithm. To measure the quality of the candidate feature sets for the given body composition BFM accurately, the first 80 samples are taken as the training set, denoted $T_1 = \{(x_1, y_1), (x_2, y_2), \dots, (x_{80}, y_{80})\}$, and the last 20 as the test set, $T_2 = \{(x_{81}, y_{81}), (x_{82}, y_{82}), \dots, (x_{100}, y_{100})\}$, where $x_i \in \mathbb{R}^l$ is the input feature parameter value, used as the independent variable, and $y_i \in \mathbb{R}$ is the actual body composition value, used as the dependent variable. Multiple linear regression in SPSS is used to train on $T_1$. Table 5 summarizes the models obtained by regression modeling of BFM with the above feature sets:
table 5: model collection (modified)
a. Predictor variables (constants), W, S, A, R3,1/R2,1/R1,1/R3,R4 2,R4R5,R5 2
b. Predictor variable (constant), 1/R3,W,S,R2 2,R4 2,R4R5,R5 2,1/R1,R5
c. Predictor variables (constants), A, H, W, R5,R1R2,R2R3,R4R5,1/R5,R2 2,R5 2
As can be seen from Table 5, the correlations between the physiological feature sets in models 1, 2 and 3 and BFM are 0.927, 0.906 and 0.978 respectively, so the feature set obtained with the present algorithm correlates most strongly with body composition.
From the regression coefficients of the models, the prediction equations are:
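$$\mathrm{BFM}_1 = 0.041W + 0.126S + 0.523A - 0.212R_3 + 0.171\cdot\frac{1}{R_1} + 0.126\cdot\frac{1}{R_2} + 0.179\cdot\frac{1}{R_3} + 0.132R_4^2 + 0.13R_4R_5 + 0.127R_5^2 - 8.56 \tag{1}$$

$$\mathrm{BFM}_2 = 0.313W - 0.044S - 0.125\cdot\frac{1}{R_3} + 0.108\cdot\frac{1}{R_1} + 0.016R_4^2 - 0.01R_2^2 + 0.071R_5^2 + 0.072R_4R_5 - 0.526R_5 + 5.674 \tag{2}$$

$$\mathrm{BFM}_3 = -0.464A - 0.15H + 0.122W - 0.143R_5 + 0.129R_1R_2 + 0.122R_2R_3 - 0.134R_4R_5 + 0.145\cdot\frac{1}{R_5} + 0.129R_2^2 - 0.141R_5^2 \tag{3}$$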
The obtained prediction models are applied to the test set $T_2$ and the predictions are compared with the actual values, giving the comparison of predicted and actual BFM values in fig. 9 and the error analysis in fig. 10. As can be seen from fig. 10, the prediction model built from the features obtained with the present feature selection algorithm has high accuracy, with a relative prediction error below 0.12. The results show that the feature set obtained by the human physiological feature selection algorithm based on the combination of filtering and clustering correlates well with body composition, so that the fitting accuracy of the body composition prediction model can be improved and the prediction error reduced.
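The evaluation protocol (fit a multiple linear regression on the first 80 samples, predict the last 20, report the relative error) can be reproduced outside SPSS. The sketch below uses ordinary least squares from NumPy on synthetic data; the data, coefficients and function names are invented for illustration and are not the patent's measurements.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X plus an intercept (returned as the last entry)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply the fitted coefficients, including the intercept."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

def relative_error(y_true, y_pred):
    """Element-wise |prediction - actual| / |actual|."""
    return np.abs(y_pred - y_true) / np.abs(y_true)

# Synthetic stand-in for 100 samples with 3 selected features (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(100, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + 10.0 + rng.normal(scale=0.1, size=100)

coef = fit_ols(X[:80], y[:80])                        # train on the first 80 samples
err = relative_error(y[80:], predict(coef, X[80:]))   # test on the last 20
```

The array `err` holds one relative error per test sample, the quantity that a relative-error plot such as fig. 10 reports.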
Compared with the prior art, the invention provides a human physiological feature selection algorithm based on a combination of Filter and clustering. Features irrelevant to the categories are removed with a feature filtering method based on the Hilbert-Schmidt independence criterion; an improved and optimized Chameleon clustering is applied to feature selection, which removes redundant features well; an optimal feature parameter set for constructing the body composition model is effectively selected; and the problems of numerous and redundant human physiological feature parameters are solved. A body composition prediction model built in this way can improve prediction accuracy and provides a more effective means of detection for body composition research and clinical application.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the present invention, according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (8)

1. Human physiological characteristic selection algorithm based on combination of filtering and improved clustering, which is characterized by comprising the following steps:
S1: selecting an impedance model, collecting data of first and second feature parameters to construct an initial feature set and a final optimal subset, and initializing the initial feature set and the final optimal subset as empty sets;
S2: introducing a filtering algorithm and, for each feature in the collected data, calculating an HSIC value under the body composition classification, the value representing the degree of correlation between the physiological feature and the body composition classification;
S3: sorting the feature set in descending order of HSIC value;
S4: adding the top K ranked features to the feature set, thereby filtering out parameters irrelevant to body composition with the filtering algorithm, and constructing an initial data set;
S5: constructing a feature sparse graph from the data set according to a clustering algorithm, wherein RI measures the interconnection among the features and RC measures the similarity among the features, and initializing the expected number of clusters k;
S6: screening out redundant features within the clusters using an improved clustering algorithm;
S7: selecting from each feature cluster the one feature with the highest HSIC value and combining these features into an optimal feature set.
2. The human physiological feature selection algorithm based on the combination of filtering and improved clustering according to claim 1, wherein body composition data measured by a human body composition analyzer are used as the data set, denoted T = (O, F, C), where O is the data sample set, F is the selected feature set, and C is the body composition classification; the parameters that strongly influence human body composition are taken as the first feature parameters, and the reciprocal $1/R_i$, the square $R_i^2$ and the products $R_iR_j$ of the segment impedances are taken as the second feature parameters; wherein the impedance values are the 1 kHz impedance parameters of the human body composition analyzer, and the first feature parameters $R_1$, $R_2$, $R_3$, $R_4$, $R_5$, A, H, W and the second feature parameters $1/R_1$, $1/R_2$, $1/R_3$, $1/R_4$, $1/R_5$, $R_1R_2$, $R_1R_3$, $R_1R_4$, $R_1R_5$, $R_2R_3$, $R_2R_4$, $R_2R_5$, $R_3R_4$, $R_3R_5$ and $R_4R_5$ form the original feature parameter set $F = \{f_1, f_2, \dots, f_m\}$, where $f_1, f_2, \dots, f_m$ are the features, A is age, H is height and W is weight, and the body composition classification C includes body fat mass and total body water.
3. The human physiological feature selection algorithm based on a combination of filtering and improved clustering according to claim 1 or 2, wherein for each feature in the collected data, the HSIC value under the body composition class C is calculated, specifically:
for the features $f_1, f_2, \dots, f_m \in F$, a nonlinear feature map $\phi: F \to \mathcal{F}$ is defined, which maps the feature points $f_1, f_2, \dots, f_m$ into a reproducing kernel Hilbert space $\mathcal{F}$; the body composition index space C is mapped by a map $\psi$ into a reproducing kernel Hilbert space $\mathcal{G}$, with kernel function $l(c_i, c_j) = \langle \psi(c_i), \psi(c_j) \rangle_{\mathcal{G}}$; and for each feature $f_1, f_2, \dots, f_m \in F$, the HSIC value under the body composition classification C is calculated.
4. The human physiological feature selection algorithm based on a combination of filtering and improved clustering according to claim 3, wherein the cross-covariance operator between the features and the body composition classes is defined as $C_{fc} = \mathbb{E}_{fc}\big[(\phi(f) - \mu_f) \otimes (\psi(c) - \mu_c)\big]$, where $\otimes$ denotes the tensor product and $\mu_f = \mathbb{E}_f[\phi(f)]$ and $\mu_c = \mathbb{E}_c[\psi(c)]$ denote expectations; the HSIC value characterizes the degree of correlation between the physiological features and the body composition classes:

$$\mathrm{HSIC}(f, c) = \lVert C_{fc} \rVert_{\mathrm{HS}}^2$$

for a given feature f and body composition class c, a larger HSIC value indicates a stronger dependence of c on f.
5. The human physiological feature selection algorithm based on the combination of filtering and improved clustering according to claim 1, wherein a feature sparse graph is constructed as a K-nearest-neighbor graph using the Chameleon agglomerative hierarchical clustering method, each vertex in the graph represents a data object, edges connect neighboring vertices, the edge weights reflect the similarity of the objects, and the similarity of feature sub-clusters is evaluated on two criteria: 1) the interconnectivity of the objects in the clusters; 2) the closeness of the clusters; and the similarity between two feature clusters is determined from their relative interconnectivity RI and relative closeness RC.
6. The human physiological feature selection algorithm based on a combination of filtering and improved clustering according to claim 4, wherein, given the normalized feature data set $F = \{f_1, f_2, \dots, f_m\}$ filtered using the filtering algorithm, the data cluster F is divided into sub-clusters $f_1$ and $f_2$ so that the total weight of the edges cut by the division is minimal; the relative interconnectivity $RI(f_1, f_2)$ of two feature clusters $f_1$ and $f_2$ is defined as the absolute interconnectivity between $f_1$ and $f_2$, normalized with respect to the internal interconnectivity of the two clusters, namely:

$$RI(f_1, f_2) = \frac{\left|EC_{\{f_1, f_2\}}\right|}{\frac{1}{2}\left(\left|EC_{f_1}\right| + \left|EC_{f_2}\right|\right)}$$

wherein $\left|EC_{\{f_1, f_2\}}\right|$ is the edge cut of the cluster containing $f_1$ and $f_2$, and similarly $\left|EC_{f_1}\right|$ or $\left|EC_{f_2}\right|$ is the minimum edge-cut sum that divides $f_1$ or $f_2$ into two roughly equal parts.
7. The human physiological feature selection algorithm based on a combination of filtering and improved clustering according to claim 5,
two feature clusters f1And f2Relative degree of approximation RC (f)1,f2) Is defined as f1And f2Relative degree of approximation between, with respect to two feature clusters f1And f2The normalization of the internal approximation of (a), namely:
$$RC(f_1, f_2) = \frac{\bar{S}_{EC_{\{f_1, f_2\}}}}{\frac{|f_1|}{|f_1| + |f_2|}\,\bar{S}_{EC_{f_1}} + \frac{|f_2|}{|f_1| + |f_2|}\,\bar{S}_{EC_{f_2}}}$$
wherein $\bar{S}_{EC_{\{f_1, f_2\}}}$ is the average weight of the edges connecting the vertices of $f_1$ to the vertices of $f_2$, and $\bar{S}_{EC_{f_1}}$ or $\bar{S}_{EC_{f_2}}$ is the average weight of the edges cut by the minimum bisection of cluster $f_1$ or $f_2$;
the similarity between the two feature sub-clusters $f_1$ and $f_2$ is determined by their relative interconnection degree and relative approximation degree.
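The relative approximation degree and one possible way of combining it with RI can be sketched as follows. The sketch assumes the usual Chameleon form of RC, in which each cluster's internal average edge weight is weighted by that cluster's share of the total size. The claim does not state how RI and RC are combined; the product RI · RC^α used here is the form from the original Chameleon algorithm and is an assumption, as are the function names and the default `alpha`.

```python
def relative_closeness(s_between, s_f1, s_f2, n1, n2):
    """RC from precomputed average edge weights.

    s_between: average weight of the edges joining f1 and f2
    s_f1, s_f2: average weight of the edges in each cluster's minimum bisection
    n1, n2: number of vertices in f1 and f2
    """
    total = n1 + n2
    denom = (n1 / total) * s_f1 + (n2 / total) * s_f2
    # Assumed convention: RC is 0 when the clusters have no internal edges.
    return s_between / denom if denom > 0 else 0.0


def similarity(ri, rc, alpha=2.0):
    """Assumed combination RI * RC**alpha; alpha > 1 favors closeness."""
    return ri * rc ** alpha
```

With `alpha` above 1 the merge decision leans toward relative closeness; with `alpha` below 1 it leans toward relative interconnectivity.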
8. The human physiological feature selection algorithm based on the combination of filtering and improved clustering according to claim 1, wherein the improved clustering algorithm traverses all feature sub-clusters and attempts merging and replacement, evaluates the feature selection quality after sub-clusters are merged, and attempts a merge between every pair of existing sub-clusters; the specific steps are as follows:
S61: calculating the distances between clusters and sorting them, and judging whether the number h of sample sub-clusters equals the initialized expected number k of clusters;
S62: if they are not equal, selecting the two sub-clusters with the maximum similarity function value and merging them; if they are equal, ending;
S63: recalculating the relative approximation degree RC of the new sub-cluster, traversing all sub-clusters, and judging whether every pair of sub-clusters has been tried for merging;
S64: if every pair has been tried, returning to S61; otherwise, merging the two sub-clusters with the minimum similarity function value and returning to S63;
S65: selecting the features with the maximum HSIC value for combination.
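The core of steps S61 and S62 (merge the most similar pair until the number of sub-clusters h equals the expected number k) can be sketched as a simple greedy loop. This is a simplified illustration, not the full claimed procedure: it omits the pairwise trial bookkeeping of S63 and S64 and the HSIC-based selection of S65, and both the function name `merge_until_k` and the `similarity` callable are hypothetical.

```python
def merge_until_k(clusters, k, similarity):
    """Greedily merge the most similar pair of sub-clusters until k remain.

    clusters: list of lists of items
    similarity: assumed callable taking two clusters and returning a score
                (higher means more similar), e.g. one built from RI and RC
    """
    clusters = [list(c) for c in clusters]
    while len(clusters) > k:
        # Enumerate every pair of existing sub-clusters and keep the best one.
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = max(pairs, key=lambda p: similarity(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        # Replace the two merged sub-clusters with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters
```

Passing the similarity as a callable keeps the loop independent of how RI and RC are computed or combined.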
CN201710733507.1A 2017-08-24 2017-08-24 Based on filtering type and improve the human body physiological characteristics selection algorithm for clustering and being combined Pending CN107845407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710733507.1A CN107845407A (en) 2017-08-24 2017-08-24 Based on filtering type and improve the human body physiological characteristics selection algorithm for clustering and being combined

Publications (1)

Publication Number Publication Date
CN107845407A (en) 2018-03-27

Family

ID=61683251

Country Status (1)

Country Link
CN (1) CN107845407A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007002991A1 (en) * 2005-07-01 2007-01-11 Impedimed Limited Monitoring system
CN106485086A (en) * 2016-10-19 2017-03-08 大连大学 Human body composition Forecasting Methodology based on AIC and improvement entropy assessment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
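刘文凤 et al.: "Weka Implementation of the Chameleon Clustering Algorithm", 《计算机系统应用》 (Computer Systems & Applications) *
吴金峰: "Research on a Human Body Composition Prediction Model Based on Support Vector Machines", 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Masters' Theses Full-text Database, Information Science and Technology) *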

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210559A (en) * 2019-05-31 2019-09-06 北京小米移动软件有限公司 Object screening technique and device, storage medium
CN110210559B (en) * 2019-05-31 2021-10-08 北京小米移动软件有限公司 Object screening method and device and storage medium
CN110363229A (en) * 2019-06-27 2019-10-22 岭南师范学院 A kind of characteristics of human body's parameter selection method combined based on improvement RReliefF and mRMR
WO2020258973A1 (en) * 2019-06-27 2020-12-30 岭南师范学院 Human body feature parameter selection method based on improved rrelieff in combination with mrmr
CN110363229B (en) * 2019-06-27 2021-07-27 岭南师范学院 Human body characteristic parameter selection method based on combination of improved RReliefF and mRMR

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180327