CN114580497A

CN114580497A - Method for analyzing influence of genes on multi-modal brain image phenotype

Info

Publication number: CN114580497A
Application number: CN202210092765.7A
Authority: CN
Inventors: 汪美玲; 张道强
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2022-01-26
Filing date: 2022-01-26
Publication date: 2022-06-03
Anticipated expiration: 2042-01-26
Also published as: CN114580497B

Abstract

The invention discloses a method for analyzing the influence of genes on multi-modal brain imaging phenotypes. First, given multi-modal phenotypes, a graph diffusion method is provided that enhances the similarity measurement among samples by fusing several input similarity graphs into a unified graph carrying the geometric structure shared across the imaging phenotypes. Second, the unified graph is used to represent high-order similarity relations among samples, cross-modal information is combined through a hypergraph regularization term, a graph-diffusion-based hypergraph-regularized multi-modal learning model is established, and an optimization strategy is designed to solve the model and obtain the gene-related imaging phenotypes. Finally, the phenotype features selected from the different modalities are fused by a multi-kernel support vector machine. The invention makes full use of the imaging phenotype data and effectively analyzes the influence of genes on multi-modal brain imaging phenotypes.

Description

Method for analyzing influence of genes on multi-modal brain image phenotype

Technical Field

The invention belongs to the field of image analysis based on imaging genetics, and particularly relates to a method for analyzing the influence of genes on multi-modal brain imaging phenotypes.

Background

Brain imaging genetics combines multi-modal neuroimaging with genetic methods to detect genetic variations in brain structure and function that are associated with behaviors affecting cognition, mood regulation, and the like. Brain imaging technology is used to take the structure and function of the brain as phenotypes, so that the influence of genes on individuals can be evaluated and the way genes affect the neural structure and function of the brain can be studied. Studying the correlation between heredity and brain structure and function builds a visible bridge between genes, the brain, and behavior.

Recent work has indicated that integrating multi-modal brain imaging data can help in understanding changes in human brain state. Three limitations should be noted. First, most studies do not consider the effect of genes on the multi-modal brain imaging phenotypes. Not all changes in the brain are the result of genetic effects, and it is often unknown which imaging phenotypes are associated with a particular brain state; the gene-related imaging phenotypes are therefore features that cannot be neglected when observing changes in brain state. Second, feature extraction is the key to observing the state of the human brain, yet most current feature extraction methods focus only on distinct features taken from the multiple imaging and genetic data. Since genetic variation has been identified as an important factor for different imaging phenotypes, we attempt to integrate multiple imaging phenotype data with genetic data to facilitate the understanding of brain mechanisms. Finally, the structural information obtained by conventional graph-based feature extraction methods captures only pairwise relationships and ignores the higher-order relationships among multiple vertices. An ideal feature extraction method should be able to describe the geometry among different imaging phenotypes and exploit the complementarity of the data to observe and understand the state of the human brain.

Disclosure of Invention

The purpose of the invention is as follows: the invention provides a method for analyzing influence of genes on a multi-modal brain image phenotype, which can fully utilize sample magnetic resonance image data so as to analyze influence of the genes in a human brain on the brain image phenotype.

The technical scheme is as follows: the invention provides a method for analyzing influence of genes on a multi-modal brain image phenotype, which specifically comprises the following steps:

(1) preprocessing pre-acquired gene brain image data;

(2) obtaining a unified graph containing the valuable geometric structure information of all the multi-modal phenotype graphs, based on a multi-modal phenotype graph diffusion method;

(3) establishing a graph-diffusion-based hypergraph-regularized multi-modal learning model;

(4) optimizing the graph-diffusion-based hypergraph-regularized multi-modal learning model to obtain gene-related brain imaging phenotype feature data;

(5) classifying the obtained gene-related imaging phenotype feature data with a multi-kernel support vector machine, and performing correlation analysis.

Further, the step (2) is realized as follows:

given N samples and M modal phenotypes, constructing a weighted graph G^m = (V^m, E^m) from the features of the m-th modality to model the relationships between samples, where V^m corresponds to the N samples and E^m corresponds to the weights of the similarity relationships between samples; the edge weights are represented by an N×N similarity matrix S, where S_{i,j} denotes the similarity between sample i and sample j; using the labels as prior information, S_{i,j} is calculated as:

[…]

where c is the number of sample classes; x_i is reconstructed using samples with the same label, in which […] denotes the i-th sample in class d; p is a regularization parameter; […] is a vector whose element at position […] is zero; negative solutions are ignored, i.e.:

[…]

the SLEP toolkit is used to obtain the optimal solution […] of the above optimization problem, where the global similarity matrix S is an N×N symmetric matrix and S_{i,j} is an element of S;

computing the local similarity matrix […] of the multi-modal phenotype graph using a K-nearest-neighbor algorithm, namely:

[…]

where […] is an element of the local similarity matrix […]; local similarities can be propagated to distant similarities through a diffusion process, an idea widely adopted in other manifold learning algorithms; the sparse matrix […] retains only the strong connections between vertices and removes the weak ones; notably, S carries the complete information about the similarity between vertices, whereas […] describes the local similarity of each vertex to its K nearest vertices, which is robust to noise in the similarity measure;

establishing the diffusion process of the multi-modal phenotype graph:

[…]

where […] and […] denote the global similarity matrices of the 1st and 2nd modalities at time t, respectively; […] and […] are the local similarity matrices of the 1st and 2nd modalities, respectively; η is a parameter and I is the identity matrix, with ηI introduced into the diffusion process; the diffusion process for multiple modalities is:

[…]

where […] and […] denote the global similarity matrix at time t and the local similarity matrix of the m-th modality, respectively; the diffusion matrix is then obtained as:

[…]

Further, the step (3) is realized as follows:

(31) constructing a hypergraph according to the unified graph structure information P obtained by the multi-modal phenotype graph diffusion; if x_i is strongly correlated with multiple samples in the graph, it is contained in the hyperedge e_j, and the resulting incidence matrix H is:

[…]

where θ is a threshold; whether the vertex v_i is assigned to the hyperedge e_j is determined directly by whether the similarity P_{ij} is greater than the threshold θ;

(32) taking each sample as a center and constructing a hyperedge by selecting its most relevant samples in the unified graph; according to H, the degree of a vertex v_i ∈ V and the degree of a hyperedge e_j ∈ E are defined as:

[…]

where w(e_j) is the weight of e_j and h ∈ H; P_i is taken as the feature vector of sample x_i, and the similarity between two samples is measured by the dot product of the two feature vectors:

M(i,j) = |<P_i, P_j>|

the neighborhood matrix is then calculated as M = |P^T P|, and the weight of a hyperedge is calculated as:

[…]

obtaining a weighted hypergraph G(V, E, Q), where Q is the weight matrix of the hyperedges and q ∈ Q;

(33) based on the constructed hypergraph G, the following Laplacian matrix is obtained:

[…]

where D_v and D_e are the diagonal matrices of vertex degrees and hyperedge degrees, respectively; according to the Laplacian matrix L^h, the hypergraph regularization term Ω is defined as:

Ω = (Xw)^T L^h Xw

(34) according to the hypergraph regularization term Ω, establishing the graph-diffusion-based hypergraph-regularized multi-modal learning model as follows:

[…]

where […] refers to the phenotypes of the M modalities; y = [y_1, y_2, …, y_n, …, y_N] ∈ R^N is the corresponding gene; N is the number of samples, and p is the feature dimension of each modal phenotype; w^m ∈ R^p is the weight vector of the m-th modality; W = [w^1, w^2, …, w^M] ∈ R^{p×M} is the matrix formed by the weight vectors of the modalities; ||W||_{2,1} is a group-sparsity regularization term for jointly selecting a small number of gene-locus-related features from the multi-modal brain imaging phenotypes; λ and μ are two regularization parameters.

Further, the step (4) is realized as follows:

(41) decomposing the graph-diffusion-based hypergraph-regularized multi-modal learning model proposed in step (3) into a smooth term g_1(W) and a non-smooth term g_2(W), namely:

[…]

g_2(W) = λ||W||_{2,1}

(42) defining an approximation function Q_l(W, W_t) as follows:

[…]

where ||·||_F denotes the Frobenius norm; […] is the gradient of g_1(W) at W_t in the t-th iteration; l denotes the step size;

(43) optimizing the graph-diffusion-based hypergraph-regularized multi-modal learning model with an accelerated proximal gradient method, as follows:

[…]

where w_k and v_k denote the k-th columns of matrix W and matrix V, respectively;

(44) obtaining the gene-related imaging phenotype feature data through iterative optimization of (41) to (43).

Further, the step (5) is realized as follows:

checking whether all the gene-related brain imaging phenotype feature data have been processed; if so, outputting the gene-related brain imaging phenotype feature data and classifying them with a multi-kernel support vector machine; otherwise, returning to step (1).

Beneficial effects: compared with the prior art, the invention makes full use of the imaging phenotype data and effectively analyzes the influence of genes on multi-modal brain imaging phenotypes as the human brain changes; it not only achieves strong associations but also discovers significant, consistent, and stable gene-related phenotypic biomarkers.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

The present invention provides a method for analyzing the influence of genes on multi-modal brain imaging phenotypes. First, given multi-modal phenotypes, a graph diffusion method is provided to enhance the similarity measurement between samples; the method takes several input similarity graphs and fuses them into a unified graph carrying the valuable geometric structure shared across the imaging phenotypes. Second, the unified graph is used to represent the high-order similarity relations between samples, cross-modal information is combined through a hypergraph regularization term, a graph-diffusion-based hypergraph-regularized multi-modal learning model is established, and an optimization strategy is designed to solve the model and obtain the gene-related imaging phenotypes. For validation, the algorithm was tested on the ADNI dataset using MATLAB. The overall flow of the invention is shown in FIG. 1 and comprises the following steps:

Step 1: the pre-acquired gene and brain image data are preprocessed to remove irrelevant information.

Step 2: a multi-modal phenotype graph diffusion method is provided, and a unified graph containing the valuable geometric structure information of all the multi-modal phenotype graphs is obtained with it.

First, assume N samples and M modal phenotypes. A weighted graph G^m = (V^m, E^m) is constructed from the features of the m-th modality to model the relationships between samples, where V^m corresponds to the N samples and E^m corresponds to the weights of the similarity relationships between samples. The edge weights are represented by an N×N similarity matrix S, where S_{i,j} denotes the similarity between sample i and sample j. To make full use of the label information, S_{i,j} is calculated with the labels as prior information as:

[…]

where c is the number of sample classes. In the above equation, x_i is reconstructed using samples with the same label, where […] denotes the i-th sample in class d, which can be represented by the samples sharing its diagnostic label. p is a regularization parameter, and […] is a vector whose element at position […] is zero. Negative solutions are ignored, i.e.:

[…]

Here the SLEP toolkit is used to obtain the optimal solution […] of the above optimization problem. At this point:

[…]

where the global similarity matrix S is an N×N symmetric matrix and S_{i,j} are the elements of S.
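The formulas for this step are not reproduced in the text. As a rough guide only, one form consistent with the description (reconstruction of x_i from same-class samples, a regularization parameter, a coefficient vector that is zero at the sample's own position, non-negative solutions, and a symmetric S) could be written as follows. The symbols X_d, s_i, \rho, and \tilde{S}, the choice of an \ell_1 penalty, and the symmetrization step are all assumptions made for illustration and are not taken from the patent.

```latex
\min_{s_i \ge 0} \; \lVert x_i - X_d\, s_i \rVert_2^2 + \rho \lVert s_i \rVert_1
\quad \text{s.t.} \quad (s_i)_i = 0,
\qquad
S = \tfrac{1}{2}\bigl(\tilde{S} + \tilde{S}^{\top}\bigr)
```

Here X_d would collect the samples of the class d that x_i belongs to, and \tilde{S} would stack the optimal coefficient vectors s_i row by row, with zeros for samples outside class d.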

Then, the local similarity matrix […] of the multi-modal phenotype graph is calculated using a K-nearest-neighbor algorithm, namely:

[…]

where […] is an element of the local similarity matrix […]. Local similarities can be propagated to distant similarities through a diffusion process, an idea also widely adopted in other manifold learning algorithms. The sparse matrix […] retains only the strong connections between vertices and removes the weak ones. Notably, S carries the complete information about the similarity between vertices, whereas […] describes the local similarity of each vertex to its K nearest vertices, which is robust to noise in the similarity measure.
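The K-nearest-neighbor sparsification described above can be sketched as follows. This is a minimal illustration, not the patent's formula: the function name `knn_local_similarity`, the exclusion of the diagonal, and the row normalization are assumptions.

```python
import numpy as np

def knn_local_similarity(S, k):
    """Keep, for every row of S, only its k largest off-diagonal entries
    and rescale each row to sum to one (assumed normalisation)."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    off = S.copy()
    np.fill_diagonal(off, -np.inf)          # a sample is not its own neighbour
    nn = np.argsort(-off, axis=1)[:, :k]    # indices of the k most similar samples
    local = np.zeros_like(S)
    rows = np.arange(n)[:, None]
    local[rows, nn] = S[rows, nn]           # retain the strong connections only
    row_sum = local.sum(axis=1, keepdims=True)
    row_sum[row_sum == 0] = 1.0             # guard against all-zero rows
    return local / row_sum

# Toy global similarity matrix with two obvious pairs: (0, 1) and (2, 3).
S = np.array([[1.0, 0.8, 0.1, 0.3],
              [0.8, 1.0, 0.2, 0.1],
              [0.1, 0.2, 1.0, 0.9],
              [0.3, 0.1, 0.9, 1.0]])
S_local = knn_local_similarity(S, k=1)
```

With k = 1 each sample keeps only its single strongest neighbor, so the weak cross-pair links (for example 0.1 and 0.3) are removed.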

Finally, the diffusion process of the multi-modal phenotype graph is established. Consider first the case of two modal phenotypes. As described above, the global similarity matrices S^1 and S^2 and the local similarity matrices […] and […] can be calculated, and […] and […] are taken as the similarity matrices at t = 0. The key to the diffusion process is to iteratively update the similarity of each modality with information from the other modality, as follows:

[…]

where […] is the similarity matrix of the m-th modality after t iterations and […] is the local similarity matrix of the m-th modality. Each update of the similarity matrices runs two parallel, interchanging diffusion processes. After each iteration, […] and […] are both normalized so that, throughout the diffusion, each sample always remains more similar to itself than to any other sample. Further,

[…]

where η is a parameter and I is the identity matrix. Introducing ηI into the diffusion process not only avoids the loss of self-similarity but also ensures a more stable quality distribution. Based on the above, the algorithm is extended to the multi-modal case (i.e., M > 2). Similar to the two-modality case, the diffusion process for multiple modalities can be modeled as:

[…]

Finally, the diffusion matrix is obtained as:

[…]

The basic idea of the diffusion method is as follows. x_i and x_j are often not strongly connected in one modality, yet their connection in other modalities may be strong, and this information can be propagated through the diffusion process. Thus, when the strong connections in one or more graphs are added to the other graphs, the weak connections in the graphs disappear at the same time. In addition, if x_i and x_j share local connections in all graphs, they are likely to belong to the same class, in which case the diffusion process enhances their similarity. In each iteration, the graph of one modality absorbs complementary information from the graphs of the other modalities. The unified graph finally obtained by diffusion therefore contains the valuable geometric information of all the multi-modal phenotypes.
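The diffusion step can be illustrated with a short sketch. Because the update formulas are not reproduced in the text, the code assumes the commonly used cross-diffusion form, in which each modality's matrix is propagated through its own local matrix using the average of the other modalities, with ηI added and rows renormalized after every iteration. The function names, the averaging over modalities, and the thresholding used as a stand-in for the K-nearest-neighbor step are hypothetical.

```python
import numpy as np

def normalize_rows(A):
    """Scale each row to sum to one."""
    return A / A.sum(axis=1, keepdims=True)

def sparsify(S, thresh):
    """Stand-in for the K-nearest-neighbour step: drop weak links, renormalise."""
    return normalize_rows(np.where(S >= thresh, S, 0.0))

def cross_diffusion(S_global, S_local, eta=1.0, n_iter=20):
    """Fuse M similarity graphs into one unified graph.

    S_global: list of M (N, N) global similarity matrices (state at t = 0).
    S_local:  list of M (N, N) sparse local similarity matrices.
    Assumed update: P_m <- S_local_m @ mean(P_k, k != m) @ S_local_m.T + eta * I
    """
    M = len(S_global)
    N = S_global[0].shape[0]
    P = [normalize_rows(np.asarray(S, dtype=float)) for S in S_global]
    I = np.eye(N)
    for _ in range(n_iter):
        P_next = []
        for m in range(M):
            others = sum(P[k] for k in range(M) if k != m) / (M - 1)
            P_m = S_local[m] @ others @ S_local[m].T + eta * I
            P_next.append(normalize_rows(P_m))
        P = P_next                      # all modalities are updated in parallel
    return sum(P) / M                   # unified graph: average over modalities

# Two toy modalities: pair (0, 1) is strong in the first, pair (2, 3) in the second.
S1 = np.array([[1.0, 0.8, 0.1, 0.1], [0.8, 1.0, 0.1, 0.1],
               [0.1, 0.1, 1.0, 0.2], [0.1, 0.1, 0.2, 1.0]])
S2 = np.array([[1.0, 0.3, 0.1, 0.1], [0.3, 1.0, 0.1, 0.1],
               [0.1, 0.1, 1.0, 0.9], [0.1, 0.1, 0.9, 1.0]])
P_unified = cross_diffusion([S1, S2], [sparsify(S1, 0.5), sparsify(S2, 0.5)])
```

In this sketch the ηI term keeps every diagonal entry the largest in its row, which matches the stated goal that a sample stays most similar to itself during diffusion.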

Step 3: a graph-diffusion-based hypergraph-regularized multi-modal learning model is established.

Specifically, the hypergraph is constructed from the unified graph structure information P obtained by the multi-modal phenotype graph diffusion in step 2. If x_i is strongly correlated with multiple samples in the graph, it is contained in the hyperedge e_j, and the resulting incidence matrix H is:

[…]

where θ is a threshold. Whether the vertex v_i is assigned to the hyperedge e_j is thus determined directly by whether the similarity P_{ij} is greater than the threshold θ. Each sample is taken as a center, and a hyperedge is constructed by selecting its most relevant samples in the unified graph. The number of neighbors adapts to each sample center, which helps capture the local group information of the data. According to H, the degree of a vertex v_i ∈ V and the degree of a hyperedge e_j ∈ E are defined as:

[…]

where w(e_j) is the weight of e_j and h ∈ H. The main difference between a hypergraph and an ordinary graph is the hyperedge, which can connect more than two vertices. Hyperedges can therefore model local group information and complex relationships among samples. In addition, the hyperedge weights play an important role in the hypergraph. In particular, if the vertices in a hyperedge are strongly connected and similar, they are likely to belong to the same class of samples, and the hyperedge should have a high weight. Conversely, if the vertices in a hyperedge are not very similar, the hyperedge should have a lower weight. P_i can therefore be taken as the feature vector of sample x_i, and the similarity between two samples can be measured by the dot product of the two feature vectors:

M(i,j) = |<P_i, P_j>|

The neighborhood matrix can then be calculated as M = |P^T P|, and the weight of a hyperedge can be calculated as:

[…]

The resulting weighted hypergraph is G(V, E, Q), where Q is the weight matrix of the hyperedges and q ∈ Q.
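The hypergraph construction can be sketched as below. The thresholding of P and the matrix M = |P^T P| follow the description; the hyperedge weight (the sum of M(i, j) over the vertices in e_j), the always-included center, and the degree formulas are standard hypergraph conventions assumed here because the corresponding equations are not reproduced in the text.

```python
import numpy as np

def build_hypergraph(P, theta):
    """Incidence matrix, hyperedge weights and degrees from a unified graph P.

    One hyperedge e_j is centred on each sample j; vertex v_i joins e_j when
    P[i, j] > theta (the centre itself is always included, by assumption).
    """
    P = np.asarray(P, dtype=float)
    H = (P > theta).astype(float)
    np.fill_diagonal(H, 1.0)
    M = np.abs(P.T @ P)                 # neighbourhood matrix M = |P^T P|
    w = (H * M).sum(axis=0)             # assumed weight: sum of M(i, j) over v_i in e_j
    d_v = H @ w                         # vertex degree: sum_j w(e_j) * h(v_i, e_j)
    d_e = H.sum(axis=0)                 # hyperedge degree: number of member vertices
    return H, w, d_v, d_e

# Toy unified graph with two groups of samples: {0, 1} and {2, 3}.
P = np.array([[0.60, 0.30, 0.05, 0.05],
              [0.30, 0.60, 0.05, 0.05],
              [0.05, 0.05, 0.60, 0.30],
              [0.05, 0.05, 0.30, 0.60]])
H, w, d_v, d_e = build_hypergraph(P, theta=0.2)
```

For this toy P, each hyperedge contains exactly the two samples of its group, so every hyperedge has degree 2.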

Based on the constructed hypergraph G, the following Laplacian matrix can be obtained:

[…]

where D_v and D_e are the diagonal matrices of vertex degrees and hyperedge degrees, respectively.

According to the Laplacian matrix L^h, the hypergraph regularization term Ω can be defined as:

Ω = (Xw)^T L^h Xw
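To make the regularization term concrete, the sketch below computes Ω = (Xw)^T L^h Xw for a small hypergraph. The Laplacian formula itself is not reproduced in the text, so the code assumes the widely used normalized form L^h = I - D_v^{-1/2} H Q D_e^{-1} H^T D_v^{-1/2}; the function names are hypothetical.

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Normalised hypergraph Laplacian (assumed form):
    L = I - Dv^{-1/2} H Q De^{-1} H^T Dv^{-1/2}, with Q = diag(w)."""
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    d_v = H @ w                          # vertex degrees
    d_e = H.sum(axis=0)                  # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    Theta = Dv_inv_sqrt @ H @ np.diag(w / d_e) @ H.T @ Dv_inv_sqrt
    return np.eye(H.shape[0]) - Theta

def hypergraph_regularizer(X, w_vec, L):
    """Omega = (Xw)^T L^h (Xw)."""
    z = X @ w_vec
    return float(z @ L @ z)

# Two hyperedge groups {0, 1} and {2, 3}, unit hyperedge weights.
H = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
L_h = hypergraph_laplacian(H, np.ones(4))
X = np.eye(4)                            # identity features, so Xw equals w
omega_smooth = hypergraph_regularizer(X, np.array([1.0, 1.0, 2.0, 2.0]), L_h)
omega_rough = hypergraph_regularizer(X, np.array([1.0, -1.0, 0.0, 0.0]), L_h)
```

Predictions that are constant within each hyperedge incur no penalty, while predictions that differ inside a hyperedge are penalized, which is the behavior the regularizer is meant to encourage.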

To retain the basic structure and high-order information of the data, the following graph-diffusion-based hypergraph-regularized multi-modal learning model is established according to the hypergraph regularization term Ω:

[…]

where […] refers to the phenotypes of the M modalities, y = [y_1, y_2, …, y_n, …, y_N] ∈ R^N is the corresponding gene (APOE SNP rs429358), N is the number of samples, p is the feature dimension of each modal phenotype, w^m ∈ R^p is the weight vector of the m-th modality, W = [w^1, w^2, …, w^M] ∈ R^{p×M} is the matrix formed by the weight vectors of the modalities, ||W||_{2,1} is a group-sparsity regularization term for jointly selecting a small number of gene-locus-related features from the multi-modal brain imaging phenotypes, and λ and μ are two regularization parameters. The graph-diffusion-based hypergraph-regularized multi-modal learning model can fully mine and describe the high-order information among samples, and thus better discover and select the multi-modal phenotypes related to the gene.
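The objective function is not reproduced in the text. Based on the symbols defined above, and on the non-smooth term λ||W||_{2,1} that appears in step 4, one plausible form is sketched below. The squared loss, the factor 1/2, the per-modality sum of the hypergraph term, and the symbol X^m for the phenotype matrix of the m-th modality are assumptions.

```latex
\min_{W} \; \frac{1}{2}\sum_{m=1}^{M} \lVert y - X^{m} w^{m} \rVert_2^2
\;+\; \mu \sum_{m=1}^{M} (X^{m} w^{m})^{\top} L^{h} (X^{m} w^{m})
\;+\; \lambda \lVert W \rVert_{2,1},
\qquad
\lVert W \rVert_{2,1} = \sum_{i=1}^{p} \lVert w_{i} \rVert_2
```

Here w_i denotes the i-th row of W, so that a feature is kept or discarded jointly across all modalities.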

Step 4: an optimization algorithm is designed to optimize the graph-diffusion-based hypergraph-regularized multi-modal learning model of step 3 and obtain the gene-related brain imaging phenotype feature data, as follows:

1) The model proposed in step 3 is decomposed into a smooth term g_1(W) and a non-smooth term g_2(W), namely:

[…]

g_2(W) = λ||W||_{2,1}

2) An approximation function Q_l(W, W_t) is defined as follows:

[…]

where ||·||_F denotes the Frobenius norm, […] is the gradient of g_1(W) at W_t in the t-th iteration, and l denotes the step size.

3) According to 1) and 2), the model is optimized with an accelerated proximal gradient method, in which:

[…]

where w_k and v_k denote the k-th columns of matrix W and matrix V, respectively.

4) The gene-related imaging phenotype feature data are obtained through iterative optimization of 1) to 3).
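Steps 1) to 4) can be sketched with a generic accelerated proximal gradient loop. This is an illustration under stated assumptions, not the patent's algorithm: g_1 is taken to be a squared loss plus the hypergraph term, the ℓ2,1 norm is assumed to sum the Euclidean norms of the rows of W (one row per feature), the step size comes from a Lipschitz bound rather than a line search, and the Laplacian in the demo is a simple stand-in matrix.

```python
import numpy as np

def prox_l21(V, tau):
    """Row-wise soft-thresholding: proximal operator of tau * ||.||_{2,1}."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * V

def smooth_part(W, Xs, y, L, mu):
    """g1(W) and its gradient for an assumed squared loss plus hypergraph term."""
    value, grad = 0.0, np.zeros_like(W)
    for m, X in enumerate(Xs):
        z = X @ W[:, m]
        r = z - y
        value += 0.5 * (r @ r) + mu * (z @ L @ z)
        grad[:, m] = X.T @ r + 2.0 * mu * (X.T @ (L @ z))
    return value, grad

def accelerated_proximal_gradient(Xs, y, L, lam, mu, n_iter=200):
    p, M = Xs[0].shape[1], len(Xs)
    # Lipschitz constant of the gradient of g1 (inverse of the step size).
    A = np.eye(len(y)) + 2.0 * mu * L
    l = max(np.linalg.norm(X.T @ A @ X, 2) for X in Xs)
    W = np.zeros((p, M))
    V = W.copy()
    t = 1.0
    for _ in range(n_iter):
        _, grad = smooth_part(V, Xs, y, L, mu)
        W_new = prox_l21(V - grad / l, lam / l)          # proximal step
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        V = W_new + ((t - 1.0) / t_new) * (W_new - W)    # momentum step
        W, t = W_new, t_new
    return W

# Synthetic demo: only feature 0 of each modality drives y.
rng = np.random.default_rng(0)
N, p = 30, 6
X1 = rng.standard_normal((N, p))
X2 = X1 + 0.1 * rng.standard_normal((N, p))
y = X1[:, 0] + X2[:, 0]
L_demo = np.eye(N) - np.ones((N, N)) / N   # stand-in PSD matrix, not a real hypergraph
lam, mu = 0.5, 0.1
W = accelerated_proximal_gradient([X1, X2], y, L_demo, lam, mu)
row_norms = np.linalg.norm(W, axis=1)      # large norm = feature selected
```

Features whose row norm in W is non-negligible would be the ones retained as gene-related; in this synthetic setup that should be dominated by feature 0.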

Step 5: the obtained gene-related imaging phenotype feature data are classified with a multi-kernel support vector machine, and correlation analysis is performed.

Finally, it is checked whether all the gene-related brain imaging phenotype feature data have been processed; if so, the gene-related brain imaging phenotype feature data are output and classified with a multi-kernel support vector machine; otherwise, the procedure returns to the gene and image data preprocessing step.
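The fusion step of a multi-kernel support vector machine can be illustrated by combining one kernel per modality into a single kernel matrix. The sketch below shows only this kernel combination; the linear kernels, the equal weights, and the names `mod1` and `mod2` are assumptions, and no classifier is trained here.

```python
import numpy as np

def linear_kernel(A, B):
    """Gram matrix of inner products between rows of A and rows of B."""
    return A @ B.T

def combined_kernel(feats_a, feats_b, betas):
    """Weighted sum of one linear kernel per modality (weights sum to one)."""
    betas = np.asarray(betas, dtype=float)
    if np.any(betas < 0) or not np.isclose(betas.sum(), 1.0):
        raise ValueError("betas must be non-negative and sum to one")
    return sum(b * linear_kernel(Xa, Xb)
               for b, Xa, Xb in zip(betas, feats_a, feats_b))

# Toy selected features: 3 samples, two modalities.
mod1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
mod2 = np.array([[2.0], [0.0], [1.0]])
K = combined_kernel([mod1, mod2], [mod1, mod2], betas=[0.5, 0.5])
```

The resulting matrix K could then be passed to any SVM implementation that accepts a precomputed kernel, with the weights chosen for example by cross-validation.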

To verify the effectiveness of the inventive method, this example evaluated the performance of the inventive method on a real ADNI dataset. Multimodal brain image data and corresponding genetic data were used, including 913 samples. The invention not only achieves strong association, but also discovers a remarkable, consistent and stable phenotype biomarker related to genes.

The technical means disclosed in the invention scheme are not limited to the technical means disclosed in the above embodiments, but also include the technical scheme formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to be within the scope of the present invention.

Claims

1. A method for analyzing influence of genes on a multi-modal brain imaging phenotype, comprising the steps of:

(1) preprocessing pre-acquired gene brain image data;

2. The method according to claim 1, wherein the step (2) is performed by:

where

c is the number of sample classes; x_i is reconstructed using samples with the same label, in which […] denotes the i-th sample in class d; p is a regularization parameter; […] is a vector whose element at position […] is zero; negative solutions are ignored, i.e.:

[…]

the SLEP toolkit is used to obtain the optimal solution […] of the above optimization problem

Namely:

where […] is a local similarity matrix, which describes the local similarity of each vertex to its K nearest vertices and is robust to noise in the similarity measure;

establishing a diffusion process of the multi-modal phenotype map:

where

and

and

where

and

3. The method according to claim 1, wherein the step (3) is performed by:

(31) constructing a hypergraph according to the unified graph structure information P obtained by the multi-modal phenotype graph diffusion; if x_i is strongly correlated with multiple samples in the graph, it is contained in the hyperedge e_j, and the resulting incidence matrix H is:

M(i,j)＝|<P_i,P_j>|

Ω＝(Xw)^TL^hXw

(34) according to the hypergraph regularization term omega, establishing a hypergraph regularization multi-mode learning model based on graph diffusion:

where […] refers to the phenotypes of the M modalities; y = [y_1, y_2, …, y_n, …, y_N] ∈ R^N is the corresponding gene; N is the number of samples, and p is the feature dimension of each modal phenotype; w^m ∈ R^p is the weight vector of the m-th modality; W = [w^1, w^2, …, w^M] ∈ R^{p×M} is the matrix formed by the weight vectors of the modalities; ||W||_{2,1} is a group-sparsity regularization term for jointly selecting a small number of gene-locus-related features from the multi-modal brain imaging phenotypes; λ and μ are two regularization parameters.

4. The method according to claim 1, wherein the step (4) is performed by:

g₂(W)＝λ||W||_2,1

(42) defining an approximation function Q_l(W,W_t) The following were used:

where ||·||_F denotes the Frobenius norm;

where w_k and v_k denote the k-th columns of matrix W and matrix V, respectively;

5. The method according to claim 1, wherein the step (5) is performed by: