CN114580497A - Method for analyzing influence of genes on multi-modal brain image phenotype - Google Patents

Publication number
CN114580497A
Authority
CN
China
Prior art keywords
matrix
similarity
phenotype
hypergraph
modal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210092765.7A
Other languages
Chinese (zh)
Other versions
CN114580497B (en)
Inventor
汪美玲
张道强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202210092765.7A
Publication of CN114580497A
Application granted
Publication of CN114580497B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a method for analyzing the influence of genes on multi-modal brain imaging phenotypes. First, given multi-modal phenotypes, a graph diffusion method is provided that enhances the similarity measurement among samples and fuses a plurality of input similarity graphs into a unified graph carrying the geometric structure among the different imaging phenotypes. Second, the unified graph is used to represent the high-order similarity relations between samples, cross-modal information is combined through a hypergraph regularization term, a graph-diffusion-based hypergraph-regularized multi-modal learning model is established, and an optimization strategy is designed to solve the model and obtain the gene-related imaging phenotypes. Finally, the phenotype features selected from the different modalities are fused using a multi-kernel support vector machine. The invention can make full use of imaging phenotype data and effectively analyze the influence of genes on multi-modal brain imaging phenotypes.

Description

Method for analyzing influence of genes on multi-modal brain image phenotype
Technical Field
The invention belongs to the field of image analysis based on imaging genetics, and particularly relates to a method for analyzing the influence of genes on multi-modal brain imaging phenotypes.
Background
Brain imaging genetics combines multi-modal neuroimaging and genetic methods to detect genetic variations, expressed in brain structure and function, that are associated with behaviors affecting cognition, mood regulation, and the like. Using brain imaging techniques, the structure and function of the brain are taken as phenotypes to evaluate the influence of genes on individuals and to study how genes affect the neural structure and function of the brain. Studying the correlation between heredity and brain structure and function builds a visible bridge among genes, brain, and behavior.
Recent research has indicated that integrating multi-modal brain imaging data helps in understanding changes in human brain state. Three issues remain. First, most studies do not consider the effect of genes on multi-modal brain imaging phenotypes. Not all changes in the brain are necessarily the result of genetic effects, and it is often not known which imaging phenotypes are associated with a particular brain state; gene-related imaging phenotypes are therefore features that cannot be neglected when observing changes in brain state. Second, feature extraction is the key to observing the state of the human brain, yet most current feature extraction methods only focus on distinct features drawn from multiple imaging and genetic data. Since genetic variation has been identified as an important factor for different imaging phenotypes, the invention integrates multiple imaging phenotype data with genetic data to facilitate the understanding of brain mechanisms. Finally, the structural information obtained by conventional graph-based feature extraction methods captures only pairwise relationships and ignores the higher-order relationships among multiple vertices. An ideal feature extraction method should be able to describe the geometric structure among different imaging phenotypes and exploit the complementarity of the data in order to observe and understand the state of the human brain.
Disclosure of Invention
Purpose of the invention: the invention provides a method for analyzing the influence of genes on multi-modal brain imaging phenotypes, which makes full use of the magnetic resonance image data of the samples in order to analyze the influence of genes on brain imaging phenotypes.
Technical scheme: the method for analyzing the influence of genes on multi-modal brain imaging phenotypes provided by the invention specifically comprises the following steps:
(1) preprocessing pre-acquired genetic and brain image data;
(2) obtaining, based on a multi-modal phenotype graph diffusion method, a unified graph containing the valuable geometric structure information of all the multi-modal phenotype graphs;
(3) establishing a graph-diffusion-based hypergraph-regularized multi-modal learning model;
(4) optimizing the graph-diffusion-based hypergraph-regularized multi-modal learning model to obtain gene-related brain imaging phenotype feature data;
(5) classifying the obtained gene-related imaging phenotype feature data using a multi-kernel support vector machine, and performing correlation analysis.
Further, the step (2) is realized as follows:
given N samples and M modal phenotypes, constructing a weighted graph G_m = (V_m, E_m) from the features of the m-th modality to model the relationships between samples, where V_m corresponds to the N samples and E_m to the weights of the similarity relationships between samples; the edge weights are represented by an N×N similarity matrix S, where S_{i,j} represents the similarity between sample i and sample j; using the labels as prior information, S_{i,j} is calculated as:
Figure BDA0003489737080000021
where
Figure BDA0003489737080000022
c is the number of sample classes; x_i is reconstructed using the samples that share its label, where
Figure BDA0003489737080000023
represents the i-th sample in class d, p is a regularization parameter, and
Figure BDA0003489737080000024
is a vector whose element at position
Figure BDA0003489737080000025
is zero; negative solutions are ignored, i.e.:
Figure BDA0003489737080000026
Figure BDA0003489737080000027
the SLEP toolkit is used to obtain the optimal solution of the above optimization problem:
Figure BDA0003489737080000028
Figure BDA0003489737080000029
where the global similarity matrix S is an N×N symmetric matrix and S_{i,j} is an element of S;
computing the local similarity matrix of each multi-modal phenotype graph using a K-nearest-neighbor algorithm,
Figure BDA00034897370800000210
Namely:
Figure BDA00034897370800000211
where
Figure BDA00034897370800000212
is an element of the local similarity matrix
Figure BDA00034897370800000213
Local similarities can be propagated to distant similarities through a diffusion process, an approach that is also widely adopted in other manifold learning algorithms; the sparse matrix
Figure BDA00034897370800000214
retains only the strong connections between vertices and removes the weak ones; notably, S carries the complete information about the similarity between vertices, whereas
Figure BDA0003489737080000031
describes the local similarity of each vertex to its K nearest vertices, which is robust to noise in the similarity measure;
establishing a diffusion process of the multi-modal phenotype map:
Figure BDA0003489737080000032
Figure BDA0003489737080000033
where
Figure BDA0003489737080000034
and
Figure BDA0003489737080000035
respectively represent the global similarity matrices of the 1st and 2nd modalities at time t,
Figure BDA0003489737080000036
and
Figure BDA0003489737080000037
are respectively the local similarity matrices of the 1st and 2nd modalities, η is a parameter, and I is the identity matrix; ηI is introduced into the diffusion process, and the diffusion process for multiple modalities is:
Figure BDA0003489737080000038
where
Figure BDA0003489737080000039
and
Figure BDA00034897370800000310
respectively represent the global similarity matrix and the local similarity matrix of the m-th modality at time t; the diffusion matrix is then obtained as:
Figure BDA00034897370800000311
Further, the step (3) is realized as follows:
(31) constructing a hypergraph from the unified graph structure information P obtained by multi-modal phenotype graph diffusion; if x_i is strongly correlated with multiple samples in the graph, it is contained in the hyperedge e_j, and the resulting incidence matrix H is:
Figure BDA00034897370800000312
where θ is a threshold; whether the vertex v_i is assigned to the hyperedge e_j is determined directly by judging whether the similarity P_{ij} is larger than the threshold θ;
(32) taking each sample as a centre and constructing a hyperedge by selecting its most relevant samples in the unified graph; according to H, the degree of a vertex v_i ∈ V and the degree of a hyperedge e_j ∈ E are defined as:
Figure BDA00034897370800000313
Figure BDA00034897370800000314
where w(e_j) is the weight of e_j and h ∈ H; P_i is taken as the feature vector of sample x_i, and the similarity between two samples is measured by the dot product of their feature vectors:
M(i,j)=|<Pi,Pj>|
the neighborhood matrix is then calculated as M = |P^T P|, and the weight of a hyperedge is calculated as:
Figure BDA0003489737080000041
obtaining a weighted hypergraph G = (V, E, Q), where Q is the weight matrix of the hyperedges and q ∈ Q;
(33) based on the constructed hypergraph G, the following Laplacian matrix is obtained:
Figure BDA0003489737080000042
where D_v and D_e are the diagonal matrices of vertex degrees and hyperedge degrees, respectively; according to the Laplacian matrix L_h, the hypergraph regularization term Ω is defined as:
Ω = (Xw)^T L_h Xw
(34) according to the hypergraph regularization term Ω, the graph-diffusion-based hypergraph-regularized multi-modal learning model is established as:
Figure BDA0003489737080000043
where
Figure BDA0003489737080000044
refers to the phenotypes of the M modalities; y = [y_1, y_2, …, y_n, …, y_N] ∈ R^N is the corresponding gene; N is the number of samples, and p is the feature dimension of each modal phenotype; w_m ∈ R^p is the weight vector of the m-th modality; W = [w_1, w_2, …, w_M] ∈ R^{p×M} is the matrix formed by the weight vectors of the corresponding modalities; ||W||_{2,1} is a group-sparsity regularization term used to jointly select a small number of gene-locus-related features from the multi-modal brain imaging phenotypes; λ and μ are two regularization parameters.
Further, the step (4) is realized as follows:
(41) decomposing the graph-diffusion-based hypergraph-regularized multi-modal learning model proposed in step (3) into a smooth part g_1(W) and a non-smooth part g_2(W), namely:
Figure BDA0003489737080000045
g_2(W) = λ||W||_{2,1}
(42) defining an approximation function Q_l(W, W_t) as follows:
Figure BDA0003489737080000051
where ||·||_F denotes the Frobenius norm;
Figure BDA0003489737080000052
is the gradient of g_1(W) at W_t in the t-th iteration; the symbol l denotes the step size;
(43) optimizing the graph-diffusion-based hypergraph-regularized multi-modal learning model using an accelerated proximal gradient method, in which:
Figure BDA0003489737080000053
where w_k and v_k denote the k-th columns of the matrices W and V, respectively;
(44) obtaining the gene-related imaging phenotype feature data through iterative optimization of (41) to (43).
Further, the step (5) is realized as follows:
checking whether all the gene-related brain imaging phenotype feature data pass the check; if so, outputting the gene-related brain imaging phenotype feature data and classifying them using a multi-kernel support vector machine; otherwise, returning to step (1).
Beneficial effects: compared with the prior art, the invention can make full use of imaging phenotype data and effectively analyze the influence of genes on multi-modal brain imaging phenotypes as the human brain changes; the invention not only achieves strong associations but also discovers significant, consistent, and stable gene-related phenotype biomarkers.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The present invention provides a method for analyzing the influence of genes on multi-modal brain imaging phenotypes. First, given multi-modal phenotypes, a graph diffusion method is provided for enhancing the similarity measurement between samples; the method takes a plurality of input similarity graphs and fuses them into a unified graph carrying the valuable geometric structure among the different imaging phenotypes. Second, the invention uses the unified graph to represent the high-order similarity relations between samples, combines cross-modal information through a hypergraph regularization term, establishes a graph-diffusion-based hypergraph-regularized multi-modal learning model, and designs an optimization strategy to solve the model and obtain the gene-related imaging phenotypes. In the validation section, the above algorithm was tested on the ADNI dataset using MATLAB software. The overall flow chart of the invention is shown in FIG. 1, and the method specifically comprises the following steps:
Step 1: preprocess the pre-acquired genetic and brain image data to remove irrelevant information.
Step 2: provide a multi-modal phenotype graph diffusion method and, based on it, obtain a unified graph containing the valuable geometric structure information of all the multi-modal phenotype graphs.
First, assume N samples and M modal phenotypes. A weighted graph G_m = (V_m, E_m) is constructed from the features of the m-th modality to model the relationships between samples, where V_m corresponds to the N samples and E_m to the weights of the similarity relationships between samples. The edge weights are represented by an N×N similarity matrix S, where S_{i,j} represents the similarity between sample i and sample j. To make full use of the label information, the labels are used as a prior and S_{i,j} is calculated as:
Figure BDA0003489737080000061
where
Figure BDA0003489737080000062
c is the number of sample classes. In the above equation, x_i is reconstructed using the samples that share its label, where
Figure BDA0003489737080000063
represents the i-th sample in class d, which can be represented by the samples with the same diagnostic label; p is a regularization parameter, and
Figure BDA0003489737080000064
is a vector whose element at position
Figure BDA0003489737080000065
is zero. Negative solutions are ignored, i.e.:
Figure BDA0003489737080000066
Figure BDA0003489737080000067
Here the SLEP toolkit is used to obtain the optimal solution of the above optimization problem:
Figure BDA0003489737080000068
At this time:
Figure BDA0003489737080000069
where the global similarity matrix S is an N×N symmetric matrix and S_{i,j} is an element of S.
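For illustration only, the label-guided sparse reconstruction described above can be sketched in Python as follows. The exact objective and the symmetrization appear only in the figure images, so this sketch assumes a non-negative l1-regularized least-squares problem per sample, solved with a simple projected shrinkage loop in place of the SLEP toolkit. The names `nonneg_lasso`, `global_similarity`, and `rho` are hypothetical, and averaging the coefficient matrix with its transpose is an assumed way of obtaining a symmetric S.

```python
import numpy as np

def nonneg_lasso(x, D, rho, n_iter=500):
    """Minimise 0.5*||x - D a||^2 + rho*||a||_1 subject to a >= 0 (projected ISTA)."""
    a = np.zeros(D.shape[1])
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = np.maximum(a - step * (grad + rho), 0.0)   # shrink, then discard negative entries
    return a

def global_similarity(X, labels, rho=0.1):
    """X: (N, p) features of one modality; labels: (N,) class labels."""
    N = X.shape[0]
    A = np.zeros((N, N))
    for i in range(N):
        # Reconstruct x_i only from other samples carrying the same label.
        same = np.flatnonzero((labels == labels[i]) & (np.arange(N) != i))
        if same.size:
            A[i, same] = nonneg_lasso(X[i], X[same].T, rho)
    return 0.5 * (A + A.T)                             # assumed symmetrisation

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(8, 5))
labels_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])
S_demo = global_similarity(X_demo, labels_demo)
```

Under these assumptions, samples from different classes receive zero similarity, which is one way to read the statement that the labels act as a prior.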
Then, the local similarity matrix of each multi-modal phenotype graph is calculated using a K-nearest-neighbor algorithm,
Figure BDA00034897370800000610
Namely:
Figure BDA00034897370800000611
where
Figure BDA00034897370800000612
is an element of the local similarity matrix
Figure BDA00034897370800000613
Local similarities can be propagated to distant similarities through a diffusion process, an approach that is also widely adopted in other manifold learning algorithms. The sparse matrix
Figure BDA00034897370800000614
retains only the strong connections between vertices and removes the weak ones. Notably, S carries the complete information about the similarity between vertices, whereas
Figure BDA0003489737080000071
describes the local similarity of each vertex to its K nearest vertices, which is robust to noise in the similarity measure.
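A minimal sketch of such a K-nearest-neighbor sparsification is given below. Because the defining formula is only available as a figure image, the normalization used here (each retained entry divided by the sum over the K neighbors, all other entries set to zero) is an assumption, and `local_similarity` is a hypothetical helper name.

```python
import numpy as np

def local_similarity(S, K):
    """Keep, for every vertex, only its K most similar neighbours and normalise the row."""
    N = S.shape[0]
    S_hat = np.zeros_like(S, dtype=float)
    for i in range(N):
        row = S[i].astype(float)
        row[i] = -np.inf                      # a vertex is not its own neighbour
        nbrs = np.argsort(row)[-K:]           # indices of the K largest similarities
        total = S[i, nbrs].sum()
        if total > 0:
            S_hat[i, nbrs] = S[i, nbrs] / total
    return S_hat

rng = np.random.default_rng(0)
S_demo = rng.random((6, 6))
S_demo = (S_demo + S_demo.T) / 2              # symmetric toy similarity matrix
S_hat_demo = local_similarity(S_demo, K=2)
```

Each row of the result has at most K non-zero entries, so only the strong connections survive, as the text describes.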
Finally, the diffusion process of the multi-modal phenotype graphs is established. Consider first the case of two modal phenotypes. According to step (2), the global similarity matrices S_1 and S_2 and the local similarity matrices
Figure BDA0003489737080000072
and
Figure BDA0003489737080000073
can be calculated. Let
Figure BDA0003489737080000074
Figure BDA0003489737080000075
and
Figure BDA0003489737080000076
denote the similarity matrices at t = 0. The key to the diffusion process is to iteratively update the similarity using the information in the other modality, as follows:
Figure BDA0003489737080000077
Figure BDA0003489737080000078
where
Figure BDA0003489737080000079
is the similarity matrix of the m-th modality after t iterations and
Figure BDA00034897370800000710
is the local similarity matrix of the m-th modality. Each update of the similarity matrices produces two parallel, interchanging diffusion processes. After each iteration,
Figure BDA00034897370800000711
and
Figure BDA00034897370800000712
are both normalized, which ensures that, throughout the diffusion process, a sample is always most similar to itself rather than to other samples. Further,
Figure BDA00034897370800000713
Figure BDA00034897370800000714
where η is a parameter and I is the identity matrix. Introducing ηI into the diffusion process not only avoids the loss of self-similarity but also ensures a more stable quality distribution. Based on the above considerations, the algorithm is extended to the multi-modal case (i.e., M > 2). As in the two-modality case, the diffusion process for multiple modalities can be modeled as:
Figure BDA00034897370800000715
Finally, the diffusion matrix is obtained as follows:
Figure BDA00034897370800000716
The basic idea of the diffusion method described above is as follows: x_i and x_j are often not strongly connected in one modality, yet their connection in other modalities may be strong, and this information can be propagated through the diffusion process. Thus, when the strong connections in one or more graphs are added to the other graphs, the weak connections in those graphs vanish at the same time. In addition, if x_i and x_j share local connections in all graphs, they are likely to belong to the same class, in which case the diffusion process enhances their similarity. In each iteration, the graph of one modality absorbs complementary information from the graphs of the other modalities. The unified graph finally obtained by diffusion therefore contains the valuable geometric structure information of all the multi-modal phenotype graphs.
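The cross-modal diffusion described above might be sketched as follows. The update equations themselves are only present as figure images, so the form used here is an assumption: each modality's matrix is updated as the local matrix times the average of the other modalities' current matrices times the transposed local matrix, plus ηI, followed by row normalization, and the unified graph is the average over modalities. The names `diffuse`, `row_normalize`, `eta`, and `n_iter` are hypothetical.

```python
import numpy as np

def row_normalize(P):
    """Scale every row to sum to one (rows are assumed to have positive sums)."""
    return P / P.sum(axis=1, keepdims=True)

def diffuse(S_list, S_hat_list, eta=1.0, n_iter=20):
    """Fuse M global similarity matrices into one unified graph by cross-modal diffusion."""
    M = len(S_list)
    N = S_list[0].shape[0]
    P = [row_normalize(S) for S in S_list]            # state at t = 0
    for _ in range(n_iter):
        P_next = []
        for m in range(M):
            # Average the current state of all other modalities.
            others = sum(P[k] for k in range(M) if k != m) / (M - 1)
            P_m = S_hat_list[m] @ others @ S_hat_list[m].T + eta * np.eye(N)
            P_next.append(row_normalize(P_m))
        P = P_next                                     # parallel update of all modalities
    return sum(P) / M                                  # unified graph

rng = np.random.default_rng(0)
S_demo = [rng.random((5, 5)) + 0.1 for _ in range(2)]
S_demo = [(S + S.T) / 2 for S in S_demo]
S_hat_demo = [row_normalize(S) for S in S_demo]        # stand-in for the local matrices
P_demo = diffuse(S_demo, S_hat_demo)
```

With η = 1 in this sketch, the diagonal entry stays the largest in every row, which matches the stated goal that a sample remains most similar to itself during diffusion.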
Step 3: establish a graph-diffusion-based hypergraph-regularized multi-modal learning model.
Specifically, the hypergraph is constructed from the unified graph structure information P obtained by the multi-modal phenotype graph diffusion in step 2. If x_i is strongly correlated with multiple samples in the graph, it is contained in the hyperedge e_j, and the resulting incidence matrix H is:
Figure BDA0003489737080000081
where θ is a threshold. Whether the vertex v_i is assigned to the hyperedge e_j is thus determined directly by judging whether the similarity P_{ij} is larger than the threshold θ. Each sample is taken as a centre, and a hyperedge is constructed by selecting its most relevant samples in the unified graph. The number of neighbors adapts to each sample centre, which helps capture the local group information of the data. According to H, the degree of a vertex v_i ∈ V and the degree of a hyperedge e_j ∈ E are defined as:
Figure BDA0003489737080000082
Figure BDA0003489737080000083
where w(e_j) is the weight of e_j and h ∈ H. The main difference between a hypergraph and an ordinary graph is the hyperedge, which can connect more than two vertices. Hyperedges can therefore model local group information and complex relationships among samples. In addition, the hyperedge weights play an important role in the hypergraph. In particular, if the vertices in a hyperedge are strongly connected and similar, they are likely to belong to the same class of samples, and the hyperedge should have a high weight. Conversely, if the vertices in a hyperedge are not very similar, the hyperedge should have a lower weight. Thus, P_i can be taken as the feature vector of sample x_i, and the similarity between two samples can be measured by the dot product of their feature vectors:
M(i,j)=|<Pi,Pj>|
The neighborhood matrix can then be calculated as M = |P^T P|, and the weight of a hyperedge can be calculated as:
Figure BDA0003489737080000084
The resulting weighted hypergraph is G = (V, E, Q), where Q is the weight matrix of the hyperedges and q ∈ Q.
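The hypergraph construction above can be sketched as follows. The thresholded incidence matrix, the neighborhood matrix M = |P^T P|, and the two degree definitions follow the text; the hyperedge weight formula is only available as a figure image, so the weight used here (the mean pairwise affinity in M among the members of a hyperedge) is a placeholder assumption, as is forcing each centre sample into its own hyperedge. `build_hypergraph` is a hypothetical helper name.

```python
import numpy as np

def build_hypergraph(P, theta):
    """Return incidence matrix H, hyperedge weights q, vertex degrees and hyperedge degrees."""
    N = P.shape[0]
    H = (P > theta).astype(float)          # H[i, j] = 1 if vertex v_i joins hyperedge e_j
    np.fill_diagonal(H, 1.0)               # assumption: a centre always belongs to its own hyperedge
    M = np.abs(P.T @ P)                    # neighbourhood matrix M = |P^T P|
    q = np.empty(N)
    for j in range(N):
        members = np.flatnonzero(H[:, j])
        # Placeholder weight: mean affinity among the members of hyperedge e_j.
        q[j] = M[np.ix_(members, members)].mean()
    d_v = H @ q                            # vertex degree: sum of weights of incident hyperedges
    d_e = H.sum(axis=0)                    # hyperedge degree: number of member vertices
    return H, q, d_v, d_e

P_demo = np.array([[0.6, 0.3, 0.1],
                   [0.3, 0.5, 0.2],
                   [0.1, 0.2, 0.7]])
H_demo, q_demo, d_v_demo, d_e_demo = build_hypergraph(P_demo, theta=0.25)
```

Because membership is decided by a threshold rather than a fixed K, the size of each hyperedge varies with its centre, which is the adaptive neighborhood behaviour the text refers to.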
Based on the constructed hypergraph G, the following Laplacian matrix can be obtained:
Figure BDA0003489737080000091
where D_v and D_e are the diagonal matrices of vertex degrees and hyperedge degrees, respectively.
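As a hedged illustration, the sketch below builds a hypergraph Laplacian from H, the hyperedge weights, D_v, and D_e. The patent's formula is only present as a figure image; the code assumes the commonly used normalized form L_h = I - D_v^{-1/2} H Q D_e^{-1} H^T D_v^{-1/2}, which may differ from the patented expression. `hypergraph_laplacian` is a hypothetical name.

```python
import numpy as np

def hypergraph_laplacian(H, q):
    """Normalised hypergraph Laplacian (assumed form) from incidence matrix H and weights q."""
    d_v = H @ q                                   # vertex degrees
    d_e = H.sum(axis=0)                           # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    # Theta = Dv^{-1/2} H Q De^{-1} H^T Dv^{-1/2}, with Q and De diagonal.
    Theta = Dv_inv_sqrt @ H @ np.diag(q / d_e) @ H.T @ Dv_inv_sqrt
    return np.eye(H.shape[0]) - Theta

H_demo = np.array([[1, 1, 0],
                   [1, 1, 0],
                   [0, 1, 1]], dtype=float)
q_demo = np.array([1.0, 0.5, 2.0])
L_demo = hypergraph_laplacian(H_demo, q_demo)
```

In this assumed form the matrix is symmetric and positive semi-definite, so a quadratic form built on it is non-negative and can serve as a smoothness penalty.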
According to the Laplacian matrix L_h, the hypergraph regularization term Ω may be defined as:
Ω = (Xw)^T L_h Xw
In order to retain the basic structure and high-order information of the data, the following graph-diffusion-based hypergraph-regularized multi-modal learning model is established according to the hypergraph regularization term Ω:
Figure BDA0003489737080000092
where
Figure BDA0003489737080000093
refers to the phenotypes of the M modalities, y = [y_1, y_2, …, y_n, …, y_N] ∈ R^N is the corresponding gene (APOE SNP rs429358), N is the number of samples, p is the feature dimension of each modal phenotype, w_m ∈ R^p is the weight vector of the m-th modality, W = [w_1, w_2, …, w_M] ∈ R^{p×M} is the matrix formed by the weight vectors of the corresponding modalities, ||W||_{2,1} is a group-sparsity regularization term used to jointly select a small number of gene-locus-related features from the multi-modal brain imaging phenotypes, and λ and μ are two regularization parameters. The graph-diffusion-based hypergraph-regularized multi-modal learning model can fully mine and describe the high-order information among samples so as to better discover and select gene-related multi-modal phenotypes.
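To make the roles of λ and μ concrete, the sketch below evaluates one plausible reading of the model. Since the objective itself is only given as a figure image, the form assumed here is a squared regression loss per modality, plus μ times the hypergraph term Ω = (Xw)^T L_h Xw summed over modalities, plus λ times ||W||_{2,1} taken as the sum of the l2 norms of the rows of W. The function names are hypothetical.

```python
import numpy as np

def l21_norm(W):
    """Sum of the l2 norms of the rows of W (one row per feature, one column per modality)."""
    return np.linalg.norm(W, axis=1).sum()

def objective(X_list, y, W, L_h, lam, mu):
    """Assumed objective: data fit + mu * hypergraph regulariser + lam * group sparsity."""
    fit, reg = 0.0, 0.0
    for m, X in enumerate(X_list):
        z = X @ W[:, m]                      # prediction of modality m
        fit += 0.5 * np.sum((y - z) ** 2)
        reg += z @ L_h @ z                   # Omega = (Xw)^T L_h Xw
    return fit + mu * reg + lam * l21_norm(W)

rng = np.random.default_rng(0)
X_demo = [rng.normal(size=(6, 4)) for _ in range(2)]
y_demo = rng.normal(size=6)
value_at_zero = objective(X_demo, y_demo, np.zeros((4, 2)), np.eye(6), lam=0.1, mu=0.1)
```

Under this reading, a row of W that is driven to zero removes the corresponding feature from every modality at once, which is how the group-sparsity term performs joint feature selection.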
Step 4: design an optimization algorithm to optimize the graph-diffusion-based hypergraph-regularized multi-modal learning model of step 3 and obtain gene-related brain imaging phenotype feature data, as follows:
1) the model proposed in step 3 is decomposed into a smooth part g_1(W) and a non-smooth part g_2(W), namely:
Figure BDA0003489737080000094
g_2(W) = λ||W||_{2,1}
2) an approximation function Q_l(W, W_t) is defined as follows:
Figure BDA0003489737080000095
where ||·||_F denotes the Frobenius norm,
Figure BDA0003489737080000096
is the gradient of g_1(W) at W_t in the t-th iteration, and the symbol l denotes the step size.
3) According to 1) and 2), the model is optimized using an accelerated proximal gradient method, in which:
Figure BDA0003489737080000101
where w_k and v_k denote the k-th columns of the matrices W and V, respectively.
4) The gene-related imaging phenotype feature data are obtained through iterative optimization of 1) to 3).
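Steps 1) to 4) might be sketched as the accelerated proximal gradient loop below. Several points are assumptions because the corresponding formulas are only present as figure images: the smooth part is taken as the squared loss plus μ times the hypergraph term, the proximal step is the standard group soft-thresholding applied to the rows of W (the text speaks of columns, which may reflect a transposed convention), the step is fixed from a Lipschitz bound rather than found by line search, and the momentum sequence is the usual FISTA one. All function names are hypothetical.

```python
import numpy as np

def smooth_grad(X_list, y, W, L_h, mu):
    """Gradient of g1(W) = sum_m 0.5*||y - X_m w_m||^2 + mu*(X_m w_m)^T L_h (X_m w_m)."""
    G = np.zeros_like(W)
    for m, X in enumerate(X_list):
        z = X @ W[:, m]
        G[:, m] = X.T @ (z - y) + 2.0 * mu * X.T @ (L_h @ z)   # L_h assumed symmetric
    return G

def prox_l21(V, tau):
    """Group soft-thresholding: shrink every row of V towards zero by tau in l2 norm."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * V

def apg(X_list, y, L_h, lam, mu, n_iter=200):
    """Accelerated proximal gradient for g1(W) + lam*||W||_{2,1}."""
    p, M = X_list[0].shape[1], len(X_list)
    lip = max(np.linalg.norm(X, 2) ** 2 * (1.0 + 2.0 * mu * np.linalg.norm(L_h, 2))
              for X in X_list)               # upper bound on the gradient's Lipschitz constant
    step = 1.0 / lip
    W = np.zeros((p, M))
    Z, t = W.copy(), 1.0
    for _ in range(n_iter):
        V = Z - step * smooth_grad(X_list, y, Z, L_h, mu)       # gradient step at search point
        W_new = prox_l21(V, lam * step)                         # proximal step on g2
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        Z = W_new + ((t - 1.0) / t_new) * (W_new - W)           # momentum extrapolation
        W, t = W_new, t_new
    return W

rng = np.random.default_rng(0)
X_demo = [rng.normal(size=(20, 5)) for _ in range(2)]
y_demo = 2.0 * X_demo[0][:, 0]
W_demo = apg(X_demo, y_demo, 0.1 * np.eye(20), lam=0.1, mu=0.1)
```

Features whose rows in the returned W are non-zero would be the ones retained as gene-related under this sketch; a sufficiently large λ drives every row to zero.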
Step 5: classify the obtained gene-related imaging phenotype feature data using a multi-kernel support vector machine, and perform correlation analysis.
Finally, it is checked whether all the gene-related brain imaging phenotype feature data pass the check; if so, the gene-related brain imaging phenotype feature data are output and classified using a multi-kernel support vector machine; otherwise, the method returns to the genetic and image data preprocessing step.
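The fusion performed by a multi-kernel support vector machine can be illustrated by the kernel combination step alone. The sketch below assumes one linear kernel per modality and a convex combination with weights β_m; the patent does not state the kernel type or how the weights are chosen, so both are assumptions, and `combined_kernel` and `betas` are hypothetical names.

```python
import numpy as np

def linear_kernel(A, B):
    """Gram matrix of inner products between the rows of A and the rows of B."""
    return A @ B.T

def combined_kernel(feats_a, feats_b, betas):
    """k(x, z) = sum_m beta_m * k_m(x^m, z^m) with beta_m >= 0 and sum_m beta_m = 1."""
    betas = np.asarray(betas, dtype=float)
    if betas.min() < 0 or not np.isclose(betas.sum(), 1.0):
        raise ValueError("kernel weights must be non-negative and sum to one")
    return sum(b * linear_kernel(Xa, Xb) for b, Xa, Xb in zip(betas, feats_a, feats_b))

rng = np.random.default_rng(1)
feats_demo = [rng.normal(size=(5, 3)), rng.normal(size=(5, 4))]   # selected features per modality
K_demo = combined_kernel(feats_demo, feats_demo, betas=[0.3, 0.7])
```

The resulting Gram matrix can be handed to any SVM solver that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`; the weights would typically be chosen by cross-validation on the training data.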
To verify the effectiveness of the method, this example evaluated its performance on the real ADNI dataset. Multi-modal brain image data and the corresponding genetic data were used, comprising 913 samples. The invention not only achieves strong associations but also discovers significant, consistent, and stable gene-related phenotype biomarkers.
The technical means disclosed in the invention scheme are not limited to the technical means disclosed in the above embodiments, but also include the technical scheme formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to be within the scope of the present invention.

Claims (5)

1. A method for analyzing the influence of genes on multi-modal brain imaging phenotypes, comprising the steps of:
(1) preprocessing pre-acquired genetic and brain image data;
(2) obtaining, based on a multi-modal phenotype graph diffusion method, a unified graph containing the valuable geometric structure information of all the multi-modal phenotype graphs;
(3) establishing a graph-diffusion-based hypergraph-regularized multi-modal learning model;
(4) optimizing the graph-diffusion-based hypergraph-regularized multi-modal learning model to obtain gene-related brain imaging phenotype feature data;
(5) classifying the obtained gene-related imaging phenotype feature data using a multi-kernel support vector machine, and performing correlation analysis.
2. The method according to claim 1, wherein the step (2) is performed by:
given N samples and M modal phenotypes, constructing a weighted graph G_m = (V_m, E_m) from the features of the m-th modality to model the relationships between samples, where V_m corresponds to the N samples and E_m to the weights of the similarity relationships between samples; the edge weights are represented by an N×N similarity matrix S, where S_{i,j} represents the similarity between sample i and sample j; using the labels as prior information, S_{i,j} is calculated as:
Figure FDA0003489737070000011
where
Figure FDA0003489737070000012
c is the number of sample classes; x_i is reconstructed using the samples that share its label, where
Figure FDA0003489737070000013
represents the i-th sample in class d, p is a regularization parameter, and
Figure FDA0003489737070000014
is a vector whose element at position
Figure FDA0003489737070000015
is zero; negative solutions are ignored, i.e.:
Figure FDA0003489737070000016
Figure FDA0003489737070000017
the SLEP toolkit is used to obtain the optimal solution of the above optimization problem:
Figure FDA0003489737070000018
Figure FDA0003489737070000019
where the global similarity matrix S is an N×N symmetric matrix and S_{i,j} is an element of S;
computing the local similarity matrix of each multi-modal phenotype graph using a K-nearest-neighbor algorithm,
Figure FDA00034897370700000110
Namely:
Figure FDA0003489737070000021
where
Figure FDA0003489737070000022
is an element of the local similarity matrix
Figure FDA0003489737070000023
Local similarities can be propagated to distant similarities through a diffusion process, an approach that is also widely adopted in other manifold learning algorithms; the sparse matrix
Figure FDA0003489737070000024
retains only the strong connections between vertices and removes the weak ones; notably, S carries the complete information about the similarity between vertices, whereas
Figure FDA0003489737070000025
describes the local similarity of each vertex to its K nearest vertices, which is robust to noise in the similarity measure;
establishing a diffusion process of the multi-modal phenotype map:
Figure FDA0003489737070000026
Figure FDA0003489737070000027
where
Figure FDA0003489737070000028
and
Figure FDA0003489737070000029
respectively represent the global similarity matrices of the 1st and 2nd modalities at time t,
Figure FDA00034897370700000210
and
Figure FDA00034897370700000211
are respectively the local similarity matrices of the 1st and 2nd modalities, η is a parameter, and I is the identity matrix; ηI is introduced into the diffusion process, and the diffusion process for multiple modalities is:
Figure FDA00034897370700000212
where
Figure FDA00034897370700000213
and
Figure FDA00034897370700000214
respectively represent the global similarity matrix and the local similarity matrix of the m-th modality at time t; the diffusion matrix is then obtained as:
Figure FDA00034897370700000215
3. The method according to claim 1, wherein the step (3) is performed by:
(31) constructing a hypergraph from the unified graph structure information P obtained by multi-modal phenotype graph diffusion; if x_i is strongly correlated with multiple samples in the graph, it is contained in the hyperedge e_j, and the resulting incidence matrix H is:
Figure FDA00034897370700000216
where θ is a threshold; whether the similarity P_ij is larger than the threshold θ directly determines whether the vertex v_i is assigned to the hyperedge e_j;
(32) taking each sample as the center, constructing a hyperedge by selecting the most relevant samples in the unified graph; according to H, the degree of a vertex v_i ∈ V and the degree of a hyperedge e_j ∈ E are defined as follows:
Figure FDA0003489737070000031
Figure FDA0003489737070000032
wherein w(e_j) is the weight of e_j and h ∈ H; P_i is taken as the feature vector of sample x_i, and the similarity between two samples is measured by the dot product of the two feature vectors:
M(i,j) = |<P_i, P_j>|
accordingly, the neighborhood matrix is calculated as M = |P^T P|, and the weight of the hyperedge is then calculated as:
Figure FDA0003489737070000033
a weighted hypergraph G = (V, E, Q) is thus obtained, wherein Q is the weight matrix of the hyperedges and q ∈ Q;
(33) based on the constructed hypergraph G, the following Laplacian matrix is obtained:
Figure FDA0003489737070000034
wherein D_v and D_e are the diagonal matrices of vertex degrees and hyperedge degrees, respectively; according to the Laplacian matrix L_h, the hypergraph regularization term Ω is defined as:
Ω = (Xw)^T L_h Xw
(34) according to the hypergraph regularization term Ω, establishing a hypergraph-regularized multi-modal learning model based on graph diffusion:
Figure FDA0003489737070000035
wherein
Figure FDA0003489737070000036
refers to the phenotypes of the M modalities; y = [y_1, y_2, …, y_n, …, y_N] ∈ R^N is the corresponding gene; N is the number of samples, and p is the feature dimension of each modality's phenotype; w_m ∈ R^p is the weight vector for the m-th modality; W = [w_1, w_2, …, w_M] ∈ R^(p×M) is the matrix formed by the weight vectors of the corresponding modalities; ||W||_{2,1} is a group-sparsity regularization term used to jointly select a small number of features related to the gene locus from the multi-modal brain image phenotypes; λ and μ are two regularization parameters.
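Steps (31) to (33) can be illustrated with a short Python sketch. The incidence, degree, weight and Laplacian formulas appear only as figure placeholders in this text, so every formula below is an assumption: a binary incidence matrix thresholded at θ, hyperedge weights summed from M = |P^T P| over the member vertices, and the widely used normalized hypergraph Laplacian L_h = I − D_v^(−1/2) H Q D_e^(−1) H^T D_v^(−1/2). The function names are hypothetical.

```python
import numpy as np

def build_hypergraph(P, theta):
    """Return incidence matrix H, hyperedge weights q, vertex and hyperedge degrees."""
    H = (P > theta).astype(float)      # assumed: h(v_i, e_j) = 1 if P_ij > theta, else 0
    np.fill_diagonal(H, 1.0)           # assumed: the centre sample always belongs to its own hyperedge
    M = np.abs(P.T @ P)                # neighbourhood matrix M = |P^T P|
    q = (H * M).sum(axis=0)            # assumed weight: sum of M(i, j) over vertices i in e_j
    d_v = H @ q                        # vertex degree: sum_j w(e_j) h(v_i, e_j)
    d_e = H.sum(axis=0)                # hyperedge degree: sum_i h(v_i, e_j)
    return H, q, d_v, d_e

def hypergraph_laplacian(H, q, d_v, d_e):
    """Assumed normalized form: I - Dv^(-1/2) H Q De^(-1) H^T Dv^(-1/2)."""
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    A_norm = dv_inv_sqrt @ H @ np.diag(q / d_e) @ H.T @ dv_inv_sqrt
    return np.eye(H.shape[0]) - A_norm

def hypergraph_regulariser(X, w, L_h):
    """Omega = (Xw)^T L_h (Xw) for one modality."""
    z = X @ w
    return float(z @ L_h @ z)
```

Under these assumptions L_h is symmetric and positive semi-definite, so Ω is non-negative and penalizes predictions Xw that differ between samples sharing a hyperedge.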
4. The method according to claim 1, wherein the step (4) is performed by:
(41) decomposing the graph-diffusion-based hypergraph-regularized multi-modal learning model proposed in step (3) into a smooth term g_1(W) and a non-smooth term g_2(W), namely:
Figure FDA0003489737070000041
g_2(W) = λ||W||_{2,1}
(42) defining the approximation function Q_l(W, W_t) as follows:
Figure FDA0003489737070000042
wherein ||·||_F denotes the Frobenius norm;
Figure FDA0003489737070000043
is the gradient of g_1(W) at W_t in the t-th iteration; the symbol l denotes the step size;
(43) optimizing the graph-diffusion-based hypergraph-regularized multi-modal learning model using the accelerated proximal gradient method, as follows:
Figure FDA0003489737070000044
wherein w_k and v_k refer to the k-th column of the matrix W and of the matrix V, respectively;
(44) obtaining the gene-related image phenotype feature data by iterating the optimization of (41) to (43).
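The optimization in steps (41) to (43) can be sketched in Python as an accelerated proximal gradient loop. The objective and update formulas are figure placeholders in this text, so the sketch assumes a smooth term g_1(W) = Σ_m [ ||y − X_m w_m||² + μ (X_m w_m)^T L_h (X_m w_m) ], a fixed l equal to the Lipschitz constant of its gradient, and the standard FISTA momentum sequence. It also assumes the groups of ||W||_{2,1} are the rows of W (one feature across all modalities); change `axis` in `prox_l21` if the groups are meant to be columns. All function names are hypothetical.

```python
import numpy as np

def smooth_grad(W, Xs, y, L_h, mu):
    """Gradient of the assumed smooth term g_1 with respect to each column w_m."""
    G = np.zeros_like(W)
    for m, X in enumerate(Xs):
        r = X @ W[:, m]
        G[:, m] = 2.0 * X.T @ (r - y) + 2.0 * mu * X.T @ (L_h @ r)
    return G

def prox_l21(V, tau):
    """Group soft-thresholding: shrink each row of V by tau in Euclidean norm."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * V

def objective(W, Xs, y, L_h, mu, lam):
    """Assumed objective g_1(W) + lam * ||W||_{2,1}."""
    g1 = 0.0
    for m, X in enumerate(Xs):
        r = X @ W[:, m]
        g1 += np.sum((y - r) ** 2) + mu * (r @ L_h @ r)
    return g1 + lam * np.linalg.norm(W, axis=1).sum()

def apg(Xs, y, L_h, mu=0.1, lam=0.1, n_iter=200):
    """Accelerated proximal gradient with a fixed l (requires symmetric L_h)."""
    n, p = Xs[0].shape
    A = np.eye(n) + mu * L_h
    l = max(2.0 * np.linalg.eigvalsh(X.T @ A @ X).max() for X in Xs)
    W = np.zeros((p, len(Xs)))
    W_prev = W.copy()
    alpha_prev, alpha = 1.0, 1.0
    for _ in range(n_iter):
        Z = W + ((alpha_prev - 1.0) / alpha) * (W - W_prev)   # momentum search point
        V = Z - smooth_grad(Z, Xs, y, L_h, mu) / l            # gradient step on g_1
        W_prev, W = W, prox_l21(V, lam / l)                   # proximal step on g_2
        alpha_prev, alpha = alpha, (1.0 + np.sqrt(1.0 + 4.0 * alpha ** 2)) / 2.0
    return W
```

Rows of the returned W whose norm is zero correspond to features dropped for every modality; the remaining rows are the jointly selected features.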
5. The method according to claim 1, wherein the step (5) is performed by:
detecting whether all the gene-related brain image phenotype feature data have been traversed; if so, outputting the gene-related brain image phenotype feature data and classifying them using a multi-kernel support vector machine; otherwise, returning to step (1).
CN202210092765.7A 2022-01-26 2022-01-26 Method for analyzing influence of genes on multimodal brain image phenotype Active CN114580497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210092765.7A CN114580497B (en) 2022-01-26 2022-01-26 Method for analyzing influence of genes on multimodal brain image phenotype

Publications (2)

Publication Number Publication Date
CN114580497A true CN114580497A (en) 2022-06-03
CN114580497B CN114580497B (en) 2023-07-11

Family

ID=81769732

Country Status (1)

Country Link
CN (1) CN114580497B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203470A (en) * 2016-06-22 2016-12-07 南京航空航天大学 A kind of multi-modal feature selection based on hypergraph and sorting technique
CN107507195A (en) * 2017-08-14 2017-12-22 四川大学 The multi-modal nasopharyngeal carcinoma image partition methods of PET CT based on hypergraph model
US20190139236A1 (en) * 2016-12-28 2019-05-09 Shanghai United Imaging Healthcare Co., Ltd. Method and system for processing multi-modality image
CN110222745A (en) * 2019-05-24 2019-09-10 中南大学 A kind of cell type identification method based on similarity-based learning and its enhancing
CN110833414A (en) * 2019-11-28 2020-02-25 广州中医药大学第一附属医院 Multi-modal molecular imaging strategy of radioactive brain injury biomarker after nasopharyngeal carcinoma radiotherapy
CN112288027A (en) * 2020-11-05 2021-01-29 河北工业大学 Heterogeneous multi-modal image genetics data feature analysis method
CN112614129A (en) * 2020-12-31 2021-04-06 南方医科大学 Image correlation detection method based on time sequence sparse regression and additive model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MEILING WANG et al.: "Discovering network phenotype between genetic risk factors and disease status via diagnosis-aligned multi-modality regression method in Alzheimer's disease", vol. 35, no. 11, pages 1948 - 1957 *
MEILING WANG et al.: "Identify Consistent Cross-Modality Imaging Genetic Patterns via Discriminant Sparse Canonical Correlation Analysis", vol. 18, no. 4, pages 1549 - 1561, XP011870857, DOI: 10.1109/TCBB.2019.2944825 *
彭瑶: "Research on hypergraph-based multi-modal feature selection and classification methods" (基于超图的多模态特征选择及分类方法研究), pages 21 - 27 *
彭瑶; 祖辰; 张道强: "Hypergraph-based multi-modal feature selection algorithm and its application" (基于超图的多模态特征选择算法及其应用), pages 112 - 119 *

Also Published As

Publication number Publication date
CN114580497B (en) 2023-07-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant