Disclosure of Invention
The invention provides a functional magnetic resonance image data classification method based on hyper-network fusion features, which aims to solve the problems of one-sided information and low classification accuracy that arise when a brain-function hyper-network model is quantified with a single attribute feature.
The invention adopts the technical scheme that a functional magnetic resonance image data classification method based on the super-network fusion attribute characteristics is specifically carried out according to the following steps:
step S1: preprocessing the resting state functional magnetic resonance image data, and extracting a time sequence according to a brain map template.
step S2: a sparse linear regression model is solved using the composite MCP method as the penalty term, thereby obtaining the hyper-network;
step S3: computing 11 different topological attributes of the hyper-network, comprising 3 different single-node-based hyper-network clustering coefficients (hereinafter represented by HCC), 5 different hyper-network mutual clustering coefficients (hereinafter represented by ComHCC), the average shortest path (hereinafter represented by dist), the node degree (hereinafter represented by n), and the betweenness centrality (hereinafter represented by B);
step S4: adopting the KS (Kolmogorov-Smirnov) test as the feature selection method on the training set, with p < 0.05 as the threshold for feature selection;
step S5: using a Support Vector Machine (SVM) as a classifier, using fusion attributes as features (selecting difference features obtained after statistical analysis as classification features), using a given regularization parameter C and a given optimal feature subset to construct a classification model, and then adopting a cross validation method to test the constructed classifier;
step S6: the invention adopts a mutual information analysis method to compute the importance and the redundancy of the features respectively, and then screens out the features with higher importance and lower redundancy according to the computed results, thereby obtaining the optimal fusion feature set.
Further, in step S1, the resting-state functional magnetic resonance image data are preprocessed, where the preprocessing at least includes slice timing correction, head motion correction, co-registration, spatial normalization, and low-frequency filtering. The fMRI data used by the invention are affected during acquisition by noise from sources such as the scanner model and the subject's head motion. Therefore, the fMRI data need to be preprocessed first to improve the signal-to-noise ratio of the images. The images are then normalized to the selected standard space by methods such as local nonlinear transformation.
The average time series of each segmented brain region is extracted as follows: the activation signals of all voxels in each brain region are extracted at the different time points, and the arithmetic mean of the activation signals of all voxels is taken at each time point to obtain the mean time series of that brain region.
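As an illustration, the averaging step described above can be sketched as follows; the function name, array shapes, and the atlas labelling convention (integer region labels 1..n_regions) are assumptions of the example, not part of the invention:

```python
import numpy as np

def extract_mean_time_series(fmri, atlas, n_regions):
    """fmri: (X, Y, Z, T) volume; atlas: (X, Y, Z) integer labels 1..n_regions.
    Returns an (n_regions, T) matrix of mean time series, one row per region."""
    T = fmri.shape[-1]
    series = np.zeros((n_regions, T))
    for r in range(1, n_regions + 1):
        mask = atlas == r                        # voxels belonging to region r
        series[r - 1] = fmri[mask].mean(axis=0)  # arithmetic mean over voxels
    return series
```

The result is the ROI-by-time matrix that the sparse regression of step S2 operates on.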
Further, in step S2, solving the sparse linear regression model specifically includes:
the sparse linear regression model is as follows:
x_k = A_k·α_k + τ_k   (1)

where the response vector x_k ∈ R^N represents the average time series of the k-th region of interest (ROI). A_k = [x_1, ..., x_{k−1}, 0, x_{k+1}, ..., x_K] ∈ R^{N×P} is the matrix of the average time series of all ROIs other than the k-th, with the column of the k-th ROI set to zero. α_k ∈ R^P is a weight vector representing the degree of influence of the other ROIs on the k-th ROI: the ROIs corresponding to the zero elements of α_k have no interaction with the selected k-th ROI and are regarded as independent of it, so that setting the independent connections to zero effectively identifies the brain regions that do interact with the selected brain region. τ_k is a noise term.
Solving the sparse regression model can be approached in a variety of ways: introducing different penalty terms yields different solution methods and hence different ways of constructing the hyper-network. Traditionally, the brain-function hyper-network has been constructed with the Lasso method, and some studies have extended the Lasso approach to account for the group-effect problem between brain regions and further improve the construction of the hyper-network. However, these methods share the same problem: the excessive compression of the coefficients by the penalty function leads to biased estimates of the regression coefficients of the target variables in the model, so a hyper-network constructed with the Lasso method or its extensions may lose some important connections. Therefore, the invention proposes a composite MCP method to create the hyper-network, further improving the construction of the brain-function hyper-network and thereby better modelling the complex multivariate interactions of the human brain. The composite MCP method is an extension of MCP that realizes bi-level variable selection by using the MCP simultaneously as the inner and the outer penalty, i.e. variables can be selected between groups and important variables can be selected within groups. Before hyperedges are created between groups with this method, all brain regions must first be clustered with a clustering algorithm; the composite MCP is then used to construct the brain-function hyper-network, with the following optimization objective function:
α̂_k = argmin_{α_k} { (1/2)·||x_k − A_k·α_k||₂² + Σ_{p=1}^{P} ρ( Σ_{j=1}^{d_p} ρ(|α_k^{(p,j)}|; λ, γ₁); λ, γ₂ ) }   (2)

where ρ(·; λ, γ) denotes the MCP penalty

ρ(t; λ, γ) = λt − t²/(2γ) for 0 ≤ t ≤ γλ;   ρ(t; λ, γ) = γλ²/2 for t > γλ   (3)

x_k, A_k and α_k have the same meaning as in equation (1). γ₁ and γ₂ are the adjustment parameters of the intra-group and inter-group penalties, respectively, with γ > 1. As γ → ∞, the sparsity of the solved regression-coefficient vector gradually decreases and the MCP penalty approaches the Lasso penalty term, performing stricter compression; as γ → 1, the sparsity of the regression-coefficient vector becomes larger and larger and large coefficients are gradually left unpenalized, i.e. no compression is performed. In previous studies γ was mostly set to a default value. λ ≥ 0 is another adjustment parameter: the larger λ is, the sparser the model, and vice versa.

α_k^{(p,j)} denotes the regression coefficient of the j-th variable of the p-th group. A non-zero regression coefficient indicates that the corresponding brain region interacts with the k-th ROI, while a zero coefficient indicates that the corresponding brain region is independent of the k-th ROI, with no interaction. All the regression coefficients together form the weight vector α_k.
A brain-function hyper-network model is constructed on this basis: a node represents one region of interest, a hyperedge represents an interaction among several regions of interest, and the hyperedges are obtained by computing α_k, i.e. the non-zero elements of α_k form a hyperedge. Specifically, for each subject, based on the selected ROI, γ₁ and γ₂ are fixed and a value of λ is selected to obtain the weight vector α_k, i.e. one hyperedge is generated. Here, considering the multi-level nature of the information interaction between brain regions, γ₁ and γ₂ are fixed for each ROI while λ is varied over a range of values, producing a set of hyperedges based on that particular brain region. Finally, the weight vectors corresponding to all brain regions are computed with the composite MCP method to obtain the hyperedges generated by all brain regions, and these hyperedges are combined to form the subject's brain-function hyper-network model.
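A minimal sketch of this hyperedge-generation scheme is given below. Since a composite MCP solver is not available in common libraries, the sketch substitutes scikit-learn's Lasso as the sparse-regression penalty, which the text above describes as the traditional (biased) choice; the function name and the λ grid are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

def build_hyperedges(X, lambdas=(0.05, 0.1, 0.2)):
    """X: (T, K) matrix of mean time series, one column per ROI.
    For each ROI k and each lambda, regress x_k on the other ROIs and form a
    hyperedge from {k} plus the ROIs with non-zero coefficients.
    Returns a binary incidence matrix H of shape (K, n_hyperedges)."""
    T, K = X.shape
    edges = []
    for k in range(K):
        A = X.copy()
        A[:, k] = 0.0                              # zero out the k-th column (A_k)
        for lam in lambdas:
            coef = Lasso(alpha=lam).fit(A, X[:, k]).coef_
            members = set(np.flatnonzero(coef)) | {k}
            if len(members) > 1:                   # keep hyperedges with >= 2 nodes
                edges.append(members)
    H = np.zeros((K, len(edges)), dtype=int)
    for j, e in enumerate(edges):
        H[list(e), j] = 1
    return H
```

Replacing the `Lasso` call with a composite MCP solver recovers the method of the invention; the surrounding hyperedge bookkeeping is unchanged.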
Further, in step S3, the method includes:
each single attribute feature needs to be computed separately. First is the node importance, including degree and betweenness centrality. There are also many definitions of degrees in a super network, including node degrees and super-edge degrees. The node degree refers to the number of nodes directly connected with each node, and the excess edge degree refers to the number of excess edges connected with each node. For the diagnosis of brain diseases, the hyper-network is generally characterized by calculating local attributes of nodes, and therefore, in this document, the node degree is introduced as an index for the quantification of the hyper-network. The formula is as follows:
In formula (4), H(v, e) denotes the incidence matrix of the hypergraph, obtained from formula (5); v denotes a specific node and e a specific hyperedge.

Here v ∈ V denotes a node and e ∈ E a hyperedge. Each column of the incidence matrix represents a hyperedge and each row represents a node: if v ∈ e then H(v, e) = 1, otherwise H(v, e) = 0.
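Under the incidence-matrix convention above (rows = nodes, columns = hyperedges), the node degree of formula (4) and the hyperedge degree reduce to row and column sums; the function names are illustrative:

```python
import numpy as np

def node_degree(H):
    """Degree of each node: number of hyperedges incident to it (row sums)."""
    return H.sum(axis=1)

def hyperedge_degree(H):
    """Degree of each hyperedge: number of nodes it contains (column sums)."""
    return H.sum(axis=0)
```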
The betweenness centrality refers to the number of shortest paths passing through a node, and in previous studies it has mainly been used in social networks. It is defined as:

B(i) = [ Σ_{j≠i≠k} g_{jk}(i) / g_{jk} ] / [ (n−1)(n−2)/2 ]   (6)

where g_{jk} is the number of all shortest paths from vertex v_j to vertex v_k; g_{jk}(i) is the number of those shortest paths that pass through node v_i; and n denotes the number of nodes.
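The text does not state how shortest paths are defined on the hypergraph. A common choice, sketched below, is to compute betweenness centrality on the clique expansion, in which two nodes are adjacent whenever they share at least one hyperedge; both the expansion and the use of networkx are assumptions of this sketch:

```python
import itertools

import networkx as nx
import numpy as np

def hypergraph_betweenness(H):
    """Betweenness centrality on the clique expansion of the hypergraph given
    by incidence matrix H (rows = nodes, columns = hyperedges). Two nodes are
    joined whenever they co-occur in some hyperedge -- an assumed convention,
    since the source does not define paths on the hypergraph directly."""
    G = nx.Graph()
    G.add_nodes_from(range(H.shape[0]))
    for e in range(H.shape[1]):
        members = np.flatnonzero(H[:, e])              # nodes of hyperedge e
        G.add_edges_from(itertools.combinations(members, 2))
    return nx.betweenness_centrality(G)                # normalized, as in (6)
```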
Next come the calculations of the different clustering coefficients. The clustering coefficients computed on pairs of nodes are called mutual clustering coefficients; a mutual clustering coefficient takes a pair of nodes as parameters to compute its result. The five different mutual clustering coefficients (ComHCC) are specifically defined as follows:
Definitions with the same meaning can also be obtained by transforming the denominator:
In addition to the conventional definitions set forth above, there are also geometry-based and hypergeometric definitions, ComHCC4 and ComHCC5:

where u and v are both nodes; M(u) = {e_i ∈ E : u ∈ e_i}   (7), in which u denotes a node and e_i a hyperedge, so that M(u) is the set of all hyperedges containing node u; total refers to the set of all hyperedges.
Then, we use these two-node clustering coefficient definitions to define the clustering coefficient of a single vertex as the average of the clustering coefficients of the vertex and its neighboring points:
ComHCC(u, v) refers to any of the five hyper-network mutual clustering coefficients above; u and v refer to nodes;
V denotes the node set, E the hyperedge set, and e a hyperedge; N(u) denotes the set of the other nodes contained in the hyperedges in which node u participates, i.e. the hypergraph neighbours of u.
Next, the hyper-network is quantified with clustering coefficients defined on single nodes. This definition of the clustering coefficient is closer to that of an ordinary graph: it measures how tightly the neighbours of a node are interconnected, extends the ordinary-graph clustering coefficient, and incorporates the additional information contained in the hypergraph. All vertices in a hyperedge are observed, and connections between nodes are counted only if the hyperedge satisfies a certain condition. There are three different ways of defining it, as follows:
where u, v and w refer to nodes, and N(u) has the same meaning as in formula (12) above; when v, w ∈ N(u) and there exists a hyperedge e_i with v, w ∈ e_i and u ∉ e_i, the indicator I_F(v, w, u) = 1, and otherwise it is 0. HCC_1(u) finds the connections between the neighbours of u that do not contain u. Its advantage is that any interaction found in this set is likely to represent a true connection between the neighbours; its drawback is that it may focus too much on minor shared connections that have little relationship to u's own interactions.
where u, v and w refer to nodes, and N(u) has the same meaning as in formula (12) above; when v, w ∈ N(u) and at the same time v, w ∈ e_i and u ∈ e_i, the indicator I_E(v, w, u) = 1, and otherwise it is 0. HCC_2(u) finds those connections among the neighbours of u that do contain u. The hyperedges found in this way truly reflect the aggregation between u and its neighbours; it should be noted, however, that such a connection may simply be an artifact of sharing a hyperedge with u.
where u refers to a node; e refers to a hyperedge; N(u) has the same meaning as in formula (12) above; M(u) has the same meaning as in formula (7) above. HCC_3(u) measures the density of the neighbourhood through the amount of overlap of the neighbourhood hyperedges. Unlike the two preceding definitions, it defines the amount of overlap from the perspective of the node, thereby avoiding the problems mentioned above.
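A sketch of the first definition, HCC_1(u), under the assumption that the (omitted) denominator is the number of neighbour pairs of u:

```python
import itertools

import numpy as np

def hcc1(H, u):
    """HCC_1(u): fraction of neighbour pairs (v, w) of u that co-occur in some
    hyperedge NOT containing u. The normalisation by the number of neighbour
    pairs is an assumption -- the source omits the explicit formula."""
    Mu = set(np.flatnonzero(H[u]))                      # hyperedges containing u
    nbrs = sorted({v for e in Mu for v in np.flatnonzero(H[:, e])} - {u})
    pairs = list(itertools.combinations(nbrs, 2))
    if not pairs:
        return 0.0
    hits = 0
    for v, w in pairs:
        shared = set(np.flatnonzero(H[v])) & set(np.flatnonzero(H[w]))
        if shared - Mu:          # some hyperedge holds v and w but not u
            hits += 1
    return hits / len(pairs)
```

HCC_2 follows by replacing `shared - Mu` with `shared & Mu`, and HCC_3 by counting hyperedge overlap from u's own perspective.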
Further, in step S4, the Kolmogorov-Smirnov test (KS test) is adopted as the feature selection method on the training set. Specifically, the Kolmogorov-Smirnov test compares one frequency distribution f(x) with a theoretical distribution g(x), or compares two observed distributions. Its null hypothesis is H0: the two samples follow the same distribution, or the data fit the theoretical distribution. The statistic is D = max |f(x) − g(x)|; when the observed value D exceeds the critical value D(n, α), H0 is rejected, otherwise H0 is accepted. With this method the correlation between each feature and the label can be computed and expressed as a p-value; p < 0.05 is then used as the threshold to select the better features;
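The KS-based selection step can be sketched with scipy's two-sample test; the function name and the data layout (rows = subjects, columns = features, binary labels) are assumptions of the example:

```python
import numpy as np
from scipy import stats

def ks_select(features, labels, p_threshold=0.05):
    """For each feature column, compare its distribution in the two groups with
    a two-sample Kolmogorov-Smirnov test; keep columns with p < threshold."""
    g0 = features[labels == 0]
    g1 = features[labels == 1]
    pvals = np.array([stats.ks_2samp(g0[:, j], g1[:, j]).pvalue
                      for j in range(features.shape[1])])
    return np.flatnonzero(pvals < p_threshold), pvals
```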
Further, in step S5, the classifier is constructed as follows: an SVM classifier with a Gaussian kernel function is adopted, the optimal feature subset is selected as the classification features, and the optimal regularization parameter C is selected, thereby constructing the classifier;
The constructed classifier is tested with a cross-validation method as follows: 90% of the samples of the optimal feature subset are randomly selected as the training set and the remaining 10% are used as the test set, a classification test is performed and the classification accuracy obtained; the classification test is repeated 100 times and the arithmetic mean of the resulting accuracies is taken as the classification accuracy of the classifier.
Further, in step S6, the quantization formulas are specifically expressed as follows:

D = (1/|S|) · Σ_{x_i ∈ S} F(x_i, c)
R = (1/|S|²) · Σ_{x_i, x_j ∈ S} F(x_i, x_j)

D denotes the importance of a feature in the classifier; S denotes the fusion feature set; |S| denotes the total number of features in S; x_i denotes the selected feature; c denotes the class label of a sample; F(x_i, c) denotes the mutual information between the selected feature and the class label c;
R denotes the redundancy of the selected feature in the classifier; x_j denotes another feature of the fusion feature set; F(x_i, x_j) denotes the mutual information between the selected feature and the other feature;
the secondary screening step is as follows: and ranking the selected features according to the importance and the redundancy respectively, and then screening out the features with higher importance and lower redundancy.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The functional magnetic resonance image data classification method based on the hyper-network fusion characteristics specifically comprises the following steps:
step S1: preprocessing resting state functional magnetic resonance image data, then performing region segmentation on the image according to a selected standardized brain map, and extracting average time sequences of all segmented brain regions;
step S2: a sparse linear regression model is solved using the composite MCP method as the penalty term, thereby obtaining the hyper-network;
step S3: computing 11 different topological attributes of the hyper-network, comprising 3 different single-node-based hyper-network clustering coefficients (hereinafter represented by HCC), 5 different hyper-network mutual clustering coefficients (hereinafter represented by ComHCC), the average shortest path (hereinafter represented by dist), the node degree (hereinafter represented by n), and the betweenness centrality (hereinafter represented by B);
step S4: adopting the KS (Kolmogorov-Smirnov) test as the feature selection method on the training set, with p < 0.05 as the threshold for feature selection;
step S5: using a Support Vector Machine (SVM) as a classifier, using fusion attributes as features (selecting difference features obtained after statistical analysis as classification features), using a given regularization parameter C and a given optimal feature subset to construct a classification model, and then adopting a cross validation method to test the constructed classifier;
step S6: because the number of the fusion attribute features is too large, a mutual information analysis method can be adopted to quantify the importance and redundancy of the selected features in the classifier, and then secondary screening is carried out on the selected features according to quantification results, so that the optimal fusion feature set is obtained.
Further, in step S1, the resting-state functional magnetic resonance image data are preprocessed. The fMRI data used by the invention are affected during acquisition by noise from sources such as the scanner model and the subject's head motion. Therefore, the fMRI data need to be preprocessed first to improve the signal-to-noise ratio of the images. The images are then normalized to the selected standard space by methods such as local nonlinear transformation.
The average time series of each segmented brain region is extracted as follows: the activation signals of all voxels in each brain region are extracted at the different time points, and the arithmetic mean of the activation signals of all voxels is taken at each time point to obtain the mean time series of that brain region.
Further, the preprocessing step at least comprises slice timing correction, head motion correction, co-registration, spatial normalization, and low-frequency filtering.
Further, in step S2, solving the sparse linear regression model specifically includes:
the sparse linear regression model is as follows:
x_k = A_k·α_k + τ_k   (1)

where the response vector x_k ∈ R^N represents the average time series of the k-th region of interest (ROI). A_k = [x_1, ..., x_{k−1}, 0, x_{k+1}, ..., x_K] ∈ R^{N×P} is the matrix of the average time series of all ROIs other than the k-th, with the column of the k-th ROI set to zero. α_k ∈ R^P is a weight vector representing the degree of influence of the other ROIs on the k-th ROI: the ROIs corresponding to the zero elements of α_k have no interaction with the selected k-th ROI and are regarded as independent of it, so that setting the independent connections to zero effectively identifies the brain regions that do interact with the selected brain region. τ_k is a noise term.
Solving the sparse regression model can be approached in a variety of ways: introducing different penalty terms yields different solution methods and hence different ways of constructing the hyper-network. Traditionally, the brain-function hyper-network has been constructed with the Lasso method, and some studies have extended the Lasso approach to account for the group-effect problem between brain regions and further improve the construction of the hyper-network. However, these methods share the same problem: the excessive compression of the coefficients by the penalty function leads to biased estimates of the regression coefficients of the target variables in the model, so a hyper-network constructed with the Lasso method or its extensions may lose some important connections. Therefore, the invention proposes a composite MCP method to create the hyper-network, further improving the construction of the brain-function hyper-network and thereby better modelling the complex multivariate interactions of the human brain. The composite MCP method is an extension of MCP that realizes bi-level variable selection by using the MCP simultaneously as the inner and the outer penalty, i.e. variables can be selected between groups and important variables can be selected within groups. Before hyperedges are created between groups with this method, all brain regions must first be clustered with a clustering algorithm; the composite MCP is then used to construct the brain-function hyper-network, with the following optimization objective function:
α̂_k = argmin_{α_k} { (1/2)·||x_k − A_k·α_k||₂² + Σ_{p=1}^{P} ρ( Σ_{j=1}^{d_p} ρ(|α_k^{(p,j)}|; λ, γ₁); λ, γ₂ ) }   (2)

where ρ(·; λ, γ) denotes the MCP penalty

ρ(t; λ, γ) = λt − t²/(2γ) for 0 ≤ t ≤ γλ;   ρ(t; λ, γ) = γλ²/2 for t > γλ   (3)

x_k, A_k and α_k have the same meaning as in equation (1). γ₁ and γ₂ are the adjustment parameters of the intra-group and inter-group penalties, respectively, with γ > 1. As γ → ∞, the sparsity of the solved regression-coefficient vector gradually decreases and the MCP penalty approaches the Lasso penalty term, performing stricter compression; as γ → 1, the sparsity of the regression-coefficient vector becomes larger and larger and large coefficients are gradually left unpenalized, i.e. no compression is performed. In previous studies γ was mostly set to a default value. λ ≥ 0 is another adjustment parameter: the larger λ is, the sparser the model, and vice versa.

α_k^{(p,j)} denotes the regression coefficient of the j-th variable of the p-th group. A non-zero regression coefficient indicates that the corresponding brain region interacts with the k-th ROI, while a zero coefficient indicates that the corresponding brain region is independent of the k-th ROI, with no interaction. All the regression coefficients together form the weight vector α_k.
A brain-function hyper-network model is constructed on this basis: a node represents one region of interest, a hyperedge represents an interaction among several regions of interest, and the hyperedges are obtained by computing α_k, i.e. the non-zero elements of α_k form a hyperedge. Specifically, for each subject, based on the selected ROI, γ₁ and γ₂ are fixed and a value of λ is selected to obtain the weight vector α_k, i.e. one hyperedge is generated. Here, considering the multi-level nature of the information interaction between brain regions, γ₁ and γ₂ are fixed for each ROI while λ is varied over a range of values, producing a set of hyperedges based on that particular brain region. Finally, the weight vectors corresponding to all brain regions are computed with the composite MCP method to obtain the hyperedges generated by all brain regions, and these hyperedges are combined to form the subject's brain-function hyper-network model.
Further, in step S3, the method includes:
each single attribute feature needs to be computed separately. First is the node importance, including degree and betweenness centrality. There are also many definitions of degrees in a super network, including node degrees and super-edge degrees. The node degree refers to the number of nodes directly connected with each node, and the excess edge degree refers to the number of excess edges connected with each node. For the diagnosis of brain diseases, the hyper-network is generally characterized by calculating local attributes of nodes, and therefore, in this document, the node degree is introduced as an index for the quantification of the hyper-network. The formula is as follows:
In formula (4), H(v, e) denotes the incidence matrix of the hypergraph, obtained from formula (5); v denotes a specific node and e a specific hyperedge.

Here v ∈ V denotes a node and e ∈ E a hyperedge. Each column of the incidence matrix represents a hyperedge and each row represents a node: if v ∈ e then H(v, e) = 1, otherwise H(v, e) = 0.
The betweenness centrality refers to the number of shortest paths passing through a node, and in previous studies it has mainly been used in social networks. It is defined as:

B(i) = [ Σ_{j≠i≠k} g_{jk}(i) / g_{jk} ] / [ (n−1)(n−2)/2 ]   (6)

where g_{jk} is the number of all shortest paths from vertex v_j to vertex v_k; g_{jk}(i) is the number of those shortest paths that pass through node v_i; and n denotes the number of nodes.
Next come the calculations of the different clustering coefficients. The clustering coefficients computed on pairs of nodes are called mutual clustering coefficients; a mutual clustering coefficient takes a pair of nodes as parameters to compute its result. The five different mutual clustering coefficients (ComHCC) are specifically defined as follows:
Definitions with the same meaning can also be obtained by transforming the denominator:
In addition to the conventional definitions set forth above, there are also geometry-based and hypergeometric definitions, ComHCC4 and ComHCC5:

where u and v are both nodes; M(u) = {e_i ∈ E : u ∈ e_i}   (7), in which u denotes a node and e_i a hyperedge, so that M(u) is the set of all hyperedges containing node u; total refers to the set of all hyperedges.
Then, we use these two-node clustering coefficient definitions to define the clustering coefficient of a single vertex as the average of the clustering coefficients of the vertex and its neighboring points:
ComHCC(u, v) refers to any of the five hyper-network mutual clustering coefficients above; u and v refer to nodes;
V denotes the node set, E the hyperedge set, and e a hyperedge; N(u) denotes the set of the other nodes contained in the hyperedges in which node u participates, i.e. the hypergraph neighbours of u.
Next, the hyper-network is quantified with clustering coefficients defined on single nodes. This definition of the clustering coefficient is closer to that of an ordinary graph: it measures how tightly the neighbours of a node are interconnected, extends the ordinary-graph clustering coefficient, and incorporates the additional information contained in the hypergraph. All vertices in a hyperedge are observed, and connections between nodes are counted only if the hyperedge satisfies a certain condition. There are three different ways of defining it, as follows:
where u, v and w refer to nodes, and N(u) has the same meaning as in formula (12) above; when v, w ∈ N(u) and there exists a hyperedge e_i with v, w ∈ e_i and u ∉ e_i, the indicator I_F(v, w, u) = 1, and otherwise it is 0. HCC_1(u) finds the connections between the neighbours of u that do not contain u. Its advantage is that any interaction found in this set is likely to represent a true connection between the neighbours; its drawback is that it may focus too much on minor shared connections that have little relationship to u's own interactions.
where u, v and w refer to nodes, and N(u) has the same meaning as in formula (12) above; when v, w ∈ N(u) and at the same time v, w ∈ e_i and u ∈ e_i, the indicator I_E(v, w, u) = 1, and otherwise it is 0. HCC_2(u) finds those connections among the neighbours of u that do contain u. The hyperedges found in this way truly reflect the aggregation between u and its neighbours; it should be noted, however, that such a connection may simply be an artifact of sharing a hyperedge with u.
where u refers to a node; e refers to a hyperedge; N(u) has the same meaning as in formula (12) above; M(u) has the same meaning as in formula (7) above. HCC_3(u) measures the density of the neighbourhood through the amount of overlap of the neighbourhood hyperedges. Unlike the two preceding definitions, it defines the amount of overlap from the perspective of the node, thereby avoiding the problems mentioned above.
Further, in step S4, the Kolmogorov-Smirnov test (KS test) is adopted as the feature selection method on the training set. Specifically, the Kolmogorov-Smirnov test compares one frequency distribution f(x) with a theoretical distribution g(x), or compares two observed distributions. Its null hypothesis is H0: the two samples follow the same distribution, or the data fit the theoretical distribution. The statistic is D = max |f(x) − g(x)|; when the observed value D exceeds the critical value D(n, α), H0 is rejected, otherwise H0 is accepted. With this method the correlation between each feature and the label can be computed and expressed as a p-value; p < 0.05 is then used as the threshold to select the better features;
Further, in step S5, the classifier is constructed as follows: an SVM classifier with a Gaussian kernel function is adopted, the optimal feature subset is selected as the classification features, and the optimal regularization parameter C is selected, thereby constructing the classifier;
The constructed classifier is tested with a cross-validation method as follows: 90% of the samples of the optimal feature subset are randomly selected as the training set and the remaining 10% are used as the test set, a classification test is performed and the classification accuracy obtained; the classification test is repeated 100 times and the arithmetic mean of the resulting accuracies is taken as the classification accuracy of the classifier.
Further, in step S6, the quantization formulas are specifically expressed as follows:

D = (1/|S|) · Σ_{x_i ∈ S} F(x_i, c)
R = (1/|S|²) · Σ_{x_i, x_j ∈ S} F(x_i, x_j)

D denotes the importance of a feature in the classifier; S denotes the fusion feature set; |S| denotes the total number of features in S; x_i denotes the selected feature; c denotes the class label of a sample; F(x_i, c) denotes the mutual information between the selected feature and the class label c;
R denotes the redundancy of the selected feature in the classifier; x_j denotes another feature of the fusion feature set; F(x_i, x_j) denotes the mutual information between the selected feature and the other feature;
The secondary screening is performed as follows: the selected features are ranked by importance and by redundancy respectively, and the features with higher importance and lower redundancy are then screened out.
The invention has the beneficial effects that: compared with traditional magnetic resonance image data classification methods, the method first uses the composite MCP method to construct the hyper-network model and then extracts 11 different features from the hyper-network as fusion features for classification, so as to compensate for the limited information contained in any single-attribute feature. A feature set rich in information can represent the topology of the brain hyper-network comprehensively and from multiple angles, presenting the completeness of the hyper-network topological structure, so that the subsequently constructed classifier can effectively extract discriminative information, raising the upper limit of its classification accuracy. The invention overcomes the drawback of previous studies that used a single attribute as the feature, and can be used for classifying magnetic resonance image data.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.