CN111261243B

CN111261243B - Method for detecting phase change critical point of complex biological system based on relative entropy index

Info

Publication number: CN111261243B
Application number: CN202010025627.8A
Authority: CN
Inventors: 刘锐; 王俊霞; 陈培
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2020-01-10
Filing date: 2020-01-10
Publication date: 2023-04-21
Anticipated expiration: 2040-01-10
Also published as: CN111261243A

Abstract

The invention discloses a method for detecting the phase transition critical point of a complex biological system based on a relative entropy index. The method exploits the different characteristics of the normal state and the pre-disease state. It mines the rich dynamic information provided by high-throughput data and learns how the network differs between the two states, in order to determine early warning signals of the pre-disease state, i.e., of the impending phase transition. To verify its validity, the method is applied to two real datasets: lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD).

Description

Method for detecting phase change critical point of complex biological system based on relative entropy index

Technical Field

The invention relates to the technical field of detecting phase transition critical points in biological systems. In particular, it relates to a method for detecting the phase transition critical point of a complex biological system based on a relative entropy index (Relative Entropy Score, abbreviated RES).

Background

The progression of complex diseases such as diabetes and cancer is generally a nonlinear process with three states: a normal state, a pre-disease state, and a disease state. The pre-disease state is the critical state (tipping point) immediately before the disease state. Traditional biomarkers identify the disease state from the observed differential expression of molecules. However, they may fail to detect the pre-disease state, because there is typically no significant difference in expression between the normal and pre-disease states. Detecting a signal of the pre-disease state is therefore a challenge, and doing so is in effect a form of disease prediction.

The theoretical derivation of the calculation method is presented below:

Different dynamics before and around the critical phase transition:

The dynamics of complex disease progression can be represented by the following nonlinear discrete-time dynamical system:

$$Z(t) = f\big(Z(t-1);\, P\big), \qquad (1)$$

Here $Z(t) = (z_1(t), z_2(t), \ldots, z_n(t))$ is the n-dimensional state vector at time $t = 1, 2, \ldots$. $P = (P_1, \ldots, P_s)$ is a parameter vector, or vector of driving factors, representing slowly varying factors such as genetic factors (SNPs, CNVs, etc.), epigenetic factors (methylation, acetylation, etc.), or environmental factors. $f: \mathbb{R}^n \times \mathbb{R}^s \to \mathbb{R}^n$ is a nonlinear function.

Consider such a nonlinear system at a stable equilibrium $\bar{Z}$ (i.e., $\bar{Z} = f(\bar{Z}; P)$). It undergoes a phase transition, namely a bifurcation from the stable equilibrium, when the parameter P reaches the threshold $P_c$ (Gilmore, 1993). A detailed description is given in supplementary information A1.
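Near a stable equilibrium, the local behaviour of system (1) is governed by the Jacobian of f. The following minimal Python sketch checks the stability condition and shows how the stationary standard deviation grows as the dominant eigenvalue approaches 1. It assumes a two-variable linearisation for the stability check and, for the variance part, a scalar linear model with additive Gaussian noise. The function names are illustrative only and are not part of the patented method.

```python
import cmath
import math


def spectral_radius_2x2(a, b, c, d):
    """Largest eigenvalue modulus of the 2x2 Jacobian [[a, b], [c, d]]."""
    trace, det = a + d, a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)  # complex sqrt also covers complex eigenvalues
    return max(abs((trace + disc) / 2), abs((trace - disc) / 2))


def is_stable(jacobian):
    """A fixed point of Z(t) = f(Z(t-1); P) is locally stable if every eigenvalue modulus is < 1."""
    return spectral_radius_2x2(*jacobian) < 1.0


def stationary_sd(lam, noise_sd=1.0):
    """Stationary SD of the assumed model z(t) = lam * z(t-1) + e(t), e ~ N(0, noise_sd**2)."""
    if abs(lam) >= 1.0:
        raise ValueError("no stationary distribution for |lam| >= 1")
    return noise_sd / math.sqrt(1.0 - lam * lam)


print(is_stable((0.5, 0.1, 0.0, 0.8)))  # eigenvalues 0.5 and 0.8, prints True
for lam in (0.5, 0.9, 0.99):
    print(lam, round(stationary_sd(lam), 2))  # 1.15, then 2.29, then 7.09
```

As the eigenvalue modulus approaches 1 (i.e., as P approaches $P_c$), the same noise produces ever larger fluctuations. This is the mechanism behind the third DNB condition below.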

For system (1) near the equilibrium $\bar{Z}$, the system remains at the stable equilibrium $\bar{Z}$ until P reaches $P_c$. All eigenvalues of the Jacobian $\partial f(Z; P)/\partial Z$ at $\bar{Z}$ therefore have moduli in (0, 1). The parameter value $P_c$ at which the system state shifts is called the bifurcation parameter value or threshold, and the state just before this bifurcation is called the pre-disease state. In general, a real system is disturbed by noise and therefore has stochastic dynamics. The dynamic and statistical properties of such a system have been analysed as it moves from the normal state towards the pre-disease state. This analysis shows that a dominant group of observed variables, the dynamic network biomarkers (DNBs), appears and satisfies the following three conditions (Chen et al., 2012; Liu et al., 2012, 2013a, 2014b):

1. the correlations between the variables $z_i(t)$ within the group increase;
2. the correlations between the variables $z_i(t)$ of the group and the variables $z_j(t)$ of other groups decrease;
3. the standard deviations of the variables $z_i(t)$ in the group increase.
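The DNB literature cited above often combines the three conditions into a single composite index, $CI = SD_{in} \cdot PCC_{in} / PCC_{out}$. The sketch below illustrates that composite index only. It is not the RES method of this invention. The data layout (a dict mapping each variable to its values across samples) and the function names are assumptions made for the example.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def pcc(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def dnb_composite_index(data, group):
    """Return (SD_in, PCC_in, PCC_out, CI) for a candidate DNB group.

    data  : dict variable -> list of values over samples (assumed layout)
    group : list of at least two variable names forming the candidate group
    """
    others = [v for v in data if v not in group]
    sd_in = mean(stdev(data[v]) for v in group)  # condition 3: fluctuation inside the group
    pcc_in = mean(abs(pcc(data[a], data[b])) for a, b in combinations(group, 2))  # condition 1
    pcc_out = mean(abs(pcc(data[a], data[b])) for a in group for b in others)  # condition 2
    return sd_in, pcc_in, pcc_out, sd_in * pcc_in / pcc_out


data = {"g1": [1, 2, 3, 4, 5], "g2": [2, 4, 6, 8, 10], "g3": [3, 1, 4, 1, 5]}
sd_in, pcc_in, pcc_out, ci = dnb_composite_index(data, ["g1", "g2"])
```

A group that is tightly correlated internally, loosely correlated with the rest, and highly variable yields a large CI.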

There is thus a significant difference in dynamics between the normal and pre-disease states. The normal state is a stable state with high resilience that is insensitive to parameter perturbations, so it can be modeled as a stationary Markov process. When the system is in the normal state, the distribution of Z(t) does not differ significantly from that of Z(t-1); the probability distribution stays almost unchanged over time. In contrast, the pre-disease state has low resilience and is sensitive to parameter changes, so its dynamics, or probability distribution, change over time. The pre-disease state is therefore modeled as a time-varying Markov process: when the system is in the pre-disease state, the distribution of Z(t) differs significantly from that of Z(t-1). Based on these dynamics, the time at which the system switches from the normal state to the pre-disease state can be identified.
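One way to quantify the difference between the distributions of Z(t) and Z(t-1) for a single variable is to fit a Gaussian at each time point. The Kullback-Leibler divergence between the two fits then has a closed form. The sketch below assumes this Gaussian fit. It is not the RES formula of the invention, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def gaussian_kl(mu1, sd1, mu2, sd2):
    """Closed-form KL( N(mu1, sd1^2) || N(mu2, sd2^2) ) in nats."""
    return math.log(sd2 / sd1) + (sd1 ** 2 + (mu1 - mu2) ** 2) / (2 * sd2 ** 2) - 0.5


def distribution_shift(values_prev, values_now):
    """Fit one Gaussian per time point and measure how far Z(t) has moved from Z(t-1)."""
    return gaussian_kl(mean(values_now), stdev(values_now),
                       mean(values_prev), stdev(values_prev))


# An almost unchanged distribution gives a small value; a shifted, wider one gives a large value.
stable = distribution_shift([1.0, 1.2, 0.9, 1.1], [1.05, 1.15, 0.95, 1.1])
shifted = distribution_shift([1.0, 1.2, 0.9, 1.1], [1.8, 0.4, 2.5, 0.2])
```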

Most biomolecules perform their functions through interactions with other biomolecules, either within a functional module or between modules. Because of this intra- and inter-module connectivity, a particular genetic abnormality affects not only the activity of the gene product that carries it. The effect can also propagate along the links of the network of biomolecules and alter the activity of other gene products. Understanding the interaction network context of a biomolecule is therefore critical to determining the phenotypic effects of defects that affect it.

Disclosure of Invention

The invention aims to provide a method for detecting the phase transition critical point of a complex biological system based on a relative entropy index (Relative Entropy Score). The method exploits the different characteristics of the normal state and the disease state to identify the pre-disease state before the critical point is reached in the biological progression of a complex disease. In particular, identifying the pre-disease state amounts to detecting the switching point at which two networks differ.

To study the evolution of the network system, the invention uses a differential network that integrates the differential edges. It quantifies the statistical significance of each differential edge in the differential network by the relative entropy index (RES).

The aim of the invention can be achieved by adopting the following technical scheme:

A method for detecting the phase transition critical point of a complex biological system based on a relative entropy index comprises the following steps:

S1. Convert the time-course observation data sequence $O_t = \{o_1, o_2, o_3, \ldots, o_t\}$ into a sequence of time-varying networks $\{DN_2, DN_3, \ldots, DN_{t-1}, DN_t\}$;

A correlation network is built first by mapping the correlations onto an existing functional network, namely the STRING network. At each sampling time point, a sequence of correlation networks $\{N_1, \ldots, N_t\}$ is constructed from the observation sequence $\{o_1, o_2, o_3, \ldots, o_t\}$, where $N_t$ denotes the correlation network at time t. Each edge connecting two nodes represents the correlation between two biomolecules, while each edge attached to only one node represents the self-regulation or variation of that biomolecule. A parameter α is then selected, and an edge of the correlation network is retained if its Pearson correlation coefficient (PCC) satisfies $|PCC| \ge \alpha$ and removed otherwise, which yields the correlation network. The parameter α is determined from the specific real dataset;
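The thresholding in step S1 can be sketched as follows. The sketch assumes the expression values of one sampling time point are stored as a dict gene -> list of values across samples, and that the STRING background is supplied as a plain list of gene pairs. Self-loop edges are skipped for simplicity. `correlation_network` is a hypothetical helper, not an API of STRING or of the original work.

```python
import math
from statistics import mean


def pcc(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_network(expr, background_edges, alpha):
    """Keep background edges whose |PCC| across samples is at least alpha.

    expr             : dict gene -> expression values across the samples of one time point
    background_edges : iterable of (gene_a, gene_b) pairs, e.g. exported from STRING (assumed)
    alpha            : correlation threshold chosen for the dataset
    """
    network = {}
    for a, b in background_edges:
        if a == b or a not in expr or b not in expr:
            continue  # skip self-loops and genes not measured in this dataset
        r = pcc(expr[a], expr[b])
        if abs(r) >= alpha:
            network[(a, b)] = r  # keep the signed PCC as the edge weight
    return network


expr_t = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5], "C": [4, 1, 3, 2]}
edges = [("A", "B"), ("A", "C"), ("B", "D")]
net = correlation_network(expr_t, edges, alpha=0.8)  # keeps only ("A", "B"); A-C has |PCC| = 0.4
```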

S2. Prepare reference samples: samples collected in the normal period are used as reference samples. For a real dataset, samples from normal tissue are usually chosen as reference samples;

S3. Fit the distribution of each biomolecule from the reference samples, specifically:

For biomolecule $g_i$, fit a Gaussian distribution to its expression levels in the reference samples $\{s_1, s_2, \ldots, s_k\}$. Then obtain the k-dimensional vector $(\mathrm{area}(D_{g_i}(S_1)), \mathrm{area}(D_{g_i}(S_2)), \ldots, \mathrm{area}(D_{g_i}(S_k)))$, where $\mathrm{area}(D_{g_i}(S_k))$ denotes the cumulative area of biomolecule $g_i$ in the k-th sample as determined by the fitted Gaussian distribution;
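Step S3 can be sketched with the Python standard library alone. The sketch makes two assumptions about how $\mathrm{area}(D_{g_i}(S_k))$ is computed. First, the cumulative area is taken to be the fitted Gaussian's cumulative distribution function evaluated at the expression level of $g_i$ in sample $S_k$. Second, the Gaussian is fitted with the sample mean and sample standard deviation. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def gaussian_cdf(x, mu, sd):
    """Cumulative distribution function of N(mu, sd^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def cumulative_areas(reference_values):
    """Fit a Gaussian to one biomolecule's reference-sample values (assumed: mean and sample SD)
    and return the k-dimensional vector of cumulative areas, one entry per reference sample."""
    mu, sd = mean(reference_values), stdev(reference_values)
    return [gaussian_cdf(v, mu, sd) for v in reference_values]


areas = cumulative_areas([1.0, 2.0, 3.0])  # about [0.1587, 0.5, 0.8413]
```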

S4. Construct the reference distribution P from these cumulative areas, where $\mathrm{area}(D_{g_i}(S_k))$ denotes the cumulative area of biomolecule $g_i$ in the k-th sample as determined by the corresponding Gaussian distribution, and P satisfies $\sum_x P(x) = 1$;
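The following is a minimal sketch of step S4. It assumes that P is obtained by normalising the k cumulative areas so that they sum to 1. This normalisation is an assumption; whatever construction is used must at least yield a discrete probability distribution with $\sum_x P(x) = 1$. `reference_distribution` is a hypothetical name.

```python
def reference_distribution(areas):
    """Normalise the k cumulative areas of one biomolecule into a discrete distribution P.

    Assumption: P(x) = area_x / (sum of all areas), so that sum_x P(x) = 1.
    """
    total = sum(areas)
    if total <= 0:
        raise ValueError("cumulative areas must have a positive sum")
    return [a / total for a in areas]


P = reference_distribution([0.1587, 0.5, 0.8413])
```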

S5. Calculate the relative entropy index (Relative Entropy Score), denoted RES.

Here RES_N denotes the relative entropy index obtained from the normal samples.

$H_{\langle u,v\rangle}(x_v, x_{ul})$ denotes the edge feature between subject v and the l-th normal sample of subject u, and $H_{\langle v,u\rangle}(x_u, x_{vp})$ denotes the edge feature between subject u and the p-th normal sample of subject v. $x_v$ denotes all normal samples of subject v, $x_u$ all normal samples of subject u, $x_{ul}$ the l-th normal sample of subject u, and $x_{vs}$ the s-th normal sample of subject v.

$p(x_{v1}), p(x_{v2}), \ldots, p(x_{vm})$ denote the distributions of the 1st, 2nd, …, m-th normal samples of subject v, and $p(x_{ul})$ denotes the distribution of the l-th normal sample of subject u.

Similarly, RES_D denotes the relative entropy index obtained from the disease samples. $H_{\langle u,v\rangle}(y_v, y_{ul})$ denotes the edge feature between subject v and the l-th disease sample of subject u, and $H_{\langle v,u\rangle}(y_u, y_{vs})$ denotes the edge feature between subject u and the s-th disease sample of subject v. $y_v$ denotes all disease samples of subject v, $y_u$ all disease samples of subject u, $y_{ul}$ the l-th disease sample of subject u, and $y_{vs}$ the s-th disease sample of subject v. P denotes a discrete probability distribution satisfying $\sum_x P(x) = 1$,

where P(x) is the probability value associated with the x-th reference sample, u and v denote subjects, l denotes the l-th sample, and s denotes the s-th sample.
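Relative entropy is the Kullback-Leibler divergence $D(P\|Q) = \sum_x P(x)\log\frac{P(x)}{Q(x)}$. The sketch below computes it for two discrete distributions over the same support. It shows only this building block. How the invention combines such terms with the edge features H into RES_N and RES_D is not reproduced here, so this is not the RES formula itself.

```python
import math


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(P || Q) in nats for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("P and Q must be defined on the same support")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # by convention 0 * log(0 / q) = 0
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total


d_pq = relative_entropy([0.2, 0.8], [0.6, 0.4])
d_qp = relative_entropy([0.6, 0.4], [0.2, 0.8])  # generally differs from d_pq: KL is asymmetric
```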

Further, the method for detecting the phase transition critical point of the complex biological system requires at least 3 samples.

Further, the relative entropy index (RES) behaves differently in different states: its value in the disease state is smaller than in the normal state.

Further, the parameter α is selected so that the differential network in the normal state has as few differential edges as possible. This makes the pre-disease state, which has a certain number of differential edges, stand out.
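The selection rule for α can be operationalised in several ways. The sketch below is one hypothetical version. It defines a differential edge as an edge present in exactly one of two networks thresholded at α: the reference network and the current network. It then picks the grid value of α that minimises the number of differential edges in the normal state, while leaving at least `min_pre_edges` differential edges in a candidate pre-disease state. The edge-weight dicts, the grid search, and all names are assumptions for illustration.

```python
def edges_at(abs_pcc, alpha):
    """Edges whose |PCC| reaches the threshold alpha."""
    return {edge for edge, r in abs_pcc.items() if r >= alpha}


def difference_edges(ref_pcc, cur_pcc, alpha):
    """Edges present in exactly one of the two thresholded networks."""
    return edges_at(ref_pcc, alpha) ^ edges_at(cur_pcc, alpha)


def choose_alpha(ref_pcc, normal_pcc, pre_pcc, grid, min_pre_edges=1):
    """Return (alpha, normal-state count, pre-disease count), minimising the normal-state count."""
    best = None
    for alpha in grid:
        n_normal = len(difference_edges(ref_pcc, normal_pcc, alpha))
        n_pre = len(difference_edges(ref_pcc, pre_pcc, alpha))
        if n_pre < min_pre_edges:
            continue  # alpha would also hide the pre-disease signal
        if best is None or n_normal < best[1]:
            best = (alpha, n_normal, n_pre)
    return best


ref = {("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.3}
normal = {("A", "B"): 0.85, ("A", "C"): 0.25, ("B", "C"): 0.55}
pre = {("A", "B"): 0.3, ("A", "C"): 0.8, ("B", "C"): 0.7}
result = choose_alpha(ref, normal, pre, grid=[0.4, 0.5, 0.6, 0.8])
```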

Compared with the prior art, the invention has the following advantages and effects:

The invention provides a computational method based on the relative entropy index (RES) for identifying an upcoming critical transition, and its validity is demonstrated on real datasets. Note that the object of the invention is to detect the early warning signal emitted in the normal (or pre-disease) state, rather than to find signs of a disease state in which a qualitative change has already occurred. The innovations of the invention are as follows:

1. Traditional methods can only judge whether an individual is in a healthy state or a disease state, and cannot effectively perceive the critical transition period at the limit of the healthy state. The invention adopts a computational method based on a time-difference network, so it can accurately reflect the pre-disease period in the progression of a complex disease, or predict the onset of its deterioration;

2. In the prior art, methods based on a single variable or a few variables are strongly affected by noise, so the critical-point signal is not obvious; the method of the invention overcomes this problem;

3. The method uses unsupervised learning and a forward algorithm, which makes it practical for high-throughput data;

4. A key and difficult point of the method is its model design, which converts continuous gene expression data into observation data.

Drawings

FIG. 1 is a schematic flow chart of the method for detecting the phase transition critical point of a complex biological system based on the relative entropy index;

FIG. 2(a) compares survival analysis results on the lung squamous cell carcinoma (LUSC) dataset in the first case. Here stage IA samples form the control group and the remaining samples form the experimental group. The left panel shows the survival analysis based on gene expression and the right panel the survival analysis based on the relative entropy index (RES);

FIG. 2(b) shows the same comparison on the LUSC dataset in the second case, with stage IA and IB samples as the control group and the remaining samples as the experimental group (left: gene expression; right: RES);

FIG. 2(c) shows the same comparison on the LUSC dataset in the third case, with stage IA, IB, and IIA samples as the control group and the remaining samples as the experimental group (left: gene expression; right: RES);

FIG. 2(d) shows the same comparison on the LUSC dataset in the fourth case, with stage IA, IB, IIA, and IIB samples as the control group and the remaining samples as the experimental group (left: gene expression; right: RES);

FIG. 2(e) shows the same comparison on the LUSC dataset in the fifth case, with stage IA, IB, IIA, IIB, and IIIA samples as the control group and the remaining samples as the experimental group (left: gene expression; right: RES);

FIG. 2 (f) is a graphical representation of a comparison of gene expression and relative entropy index (RES) results on a lung squamous cell carcinoma (LUSC) dataset;

FIG. 2 (g) is a schematic representation of the dynamic evolution of a network of relative entropy indices (RES) on a lung squamous cell carcinoma (LUSC) dataset;

FIG. 3(a) compares survival analysis results on the lung adenocarcinoma (LUAD) dataset in the first case. Here stage I samples form the control group and the remaining samples form the experimental group. The left panel shows the survival analysis based on gene expression and the right panel the survival analysis based on the relative entropy index (RES);

FIG. 3(b) shows the same comparison on the LUAD dataset in the second case, with stage I and IA samples as the control group and the remaining samples as the experimental group (left: gene expression; right: RES);

FIG. 3(c) shows the same comparison on the LUAD dataset in the third case, with stage I, IA, and IB samples as the control group and the remaining samples as the experimental group (left: gene expression; right: RES);

FIG. 3(d) shows the same comparison on the LUAD dataset in the fourth case, with stage I, IA, IB, and IIA samples as the control group and the remaining samples as the experimental group (left: gene expression; right: RES);

FIG. 3(e) shows the same comparison on the LUAD dataset in the fifth case, with stage I, IA, IB, IIA, and IIB samples as the control group and the remaining samples as the experimental group (left: gene expression; right: RES);

FIG. 3(f) shows the same comparison on the LUAD dataset in the sixth case, with stage I, IA, IB, IIA, IIB, and IIIA samples as the control group and the remaining samples as the experimental group (left: gene expression; right: RES);

FIG. 3(g) shows the same comparison on the LUAD dataset in the seventh case, with stage I, IA, IB, IIB, IIIA, and IIIB samples as the control group and the remaining samples as the experimental group (left: gene expression; right: RES);

FIG. 3 (h) is a graphical representation of a comparison of gene expression and relative entropy index (RES) results on a lung adenocarcinoma (LUAD) dataset;

fig. 3 (i) is a diagram of the dynamic evolution of a network of relative entropy indices (RES) on a lung adenocarcinoma (LUAD) dataset.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Examples

As shown in fig. 1, the present invention discloses a method for detecting critical state before phase transition of complex biological system based on relative entropy index.

A data matrix illustration of node features and edge features for calculating the relative entropy index (RES) is given below.

First, for the node features, normal samples are denoted by x and disease samples by y. The m-th normal sample of subject u is denoted $x_{um}$, and the n-th disease sample of subject u is denoted $y_{un}$. Likewise, the m-th normal sample of subject v is denoted $x_{vm}$, and the n-th disease sample of subject v is denoted $y_{vn}$.

The edge features are defined as follows:

$H_{\langle u,v\rangle}(x_v, x_{ul})$: the edge feature between the normal samples of subject v and the l-th normal sample of subject u;

$H_{\langle v,u\rangle}(x_u, x_{vp})$: the edge feature between the normal samples of subject u and the p-th normal sample of subject v;

$H_{\langle u,v\rangle}(y_v, y_{ul})$: the edge feature between the disease samples of subject v and the l-th disease sample of subject u;

$H_{\langle v,u\rangle}(y_u, y_{vp})$: the edge feature between the disease samples of subject u and the p-th disease sample of subject v.

The relative entropy index RES_N obtained from the normal samples:

Here $x_v$ denotes all normal samples of subject v, $x_u$ all normal samples of subject u, $x_{ul}$ the l-th normal sample of subject u, and $x_{vs}$ the s-th normal sample of subject v. $p(x_{v1}), p(x_{v2}), \ldots, p(x_{vm})$ denote the distributions of the 1st, 2nd, …, m-th normal samples of subject v, and $p(x_{ul})$ denotes the distribution of the l-th normal sample of subject u.

The relative entropy index RES_D obtained from the disease samples is computed in the same way.

Here $H_{\langle u,v\rangle}(y_v, y_{ul})$ denotes the edge feature between subject v and the l-th disease sample of subject u, and $H_{\langle v,u\rangle}(y_u, y_{vp})$ denotes the edge feature between subject u and the p-th disease sample of subject v. $y_v$ denotes all disease samples of subject v, $y_u$ all disease samples of subject u, $y_{ul}$ the l-th disease sample of subject u, and $y_{vs}$ the s-th disease sample of subject v.

The method follows the flow chart shown in FIG. 1.

The results obtained in this example are as follows:

1. Predicting critical points of real datasets

This example applies the relative entropy index method to two real experimental datasets, namely lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD).

2. Application of the relative entropy index to 2 tumor datasets

To further demonstrate the effectiveness of the method, it was applied to 2 tumor datasets: lung squamous cell carcinoma and lung adenocarcinoma. Both come from The Cancer Genome Atlas (TCGA) and consist of tumor and tumor-adjacent samples. The tumors were divided into stages according to the corresponding TCGA clinical data; both lung squamous cell carcinoma and lung adenocarcinoma can be divided into 7 stages. In both datasets, the relative entropy index (RES) of each gene pair was calculated with the RES algorithm. The critical stage of the tumor was then determined by observing the changes in the RES values of the gene pairs.
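Observing the changes in RES values across stages can be made concrete in several ways. One hypothetical rule, sketched below, averages the RES values of all gene pairs in each stage and reports the pair of consecutive stages with the largest change in that average. The aggregation by mean, the largest-jump criterion, and the function name are all assumptions, not the procedure stated in the original.

```python
from statistics import mean


def largest_stage_change(stage_scores):
    """Hypothetical rule: report the consecutive stages whose mean RES differs most.

    stage_scores : list of (stage_label, list of RES values of gene pairs), in clinical order
    returns      : (previous_stage, stage, absolute change in mean RES), or None for < 2 stages
    """
    means = [(label, mean(values)) for label, values in stage_scores]
    best = None
    for (prev_label, prev_mean), (label, cur_mean) in zip(means, means[1:]):
        change = abs(cur_mean - prev_mean)
        if best is None or change > best[2]:
            best = (prev_label, label, change)
    return best


toy = [("IA", [1.0, 1.2]), ("IB", [1.1, 1.1]), ("IIA", [1.0, 1.2]), ("IIB", [0.3, 0.5])]
transition = largest_stage_change(toy)  # toy values only, not results from the datasets
```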

The relative entropy index (RES) successfully identified the critical stage before deterioration in both cancers. To validate the identified critical stages, Kaplan-Meier survival analysis with the log-rank test was performed on the samples before and after the critical transition (FIGS. 2(a)-2(e) and 3(a)-3(g)). Samples before the critical transition generally have a better prognosis (longer survival) than samples after it. In particular, for lung squamous cell carcinoma, FIG. 2(c) shows that the survival time of the samples before the critical stage (stages IA-IIA) is much longer than that of the samples after it (stages IIB-IV). The survival curves of the two groups differ significantly (p = 0.042). For lung adenocarcinoma, the survival curves of the samples before and after stage IIB also differ significantly (p = 0.015, FIG. 3(g)). The survival time of the pre-critical samples (stages IA-IIB) is much longer than that of the post-critical samples (stages IIIA-IV). These results indicate that the identified critical stages are accurate and closely related to prognosis.
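The survival curves compared in FIGS. 2 and 3 are Kaplan-Meier estimates. A minimal standard-library sketch of the Kaplan-Meier product-limit estimator is given below for reference. The log-rank test used to compare two curves is not implemented here. In practice a survival-analysis library such as the Python lifelines package would normally be used; the function name below is illustrative.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times  : follow-up times
    events : 1 if death was observed at that time, 0 if the subject was censored
    returns: list of (time, S(time)) pairs
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk  # product-limit step
            curve.append((t, surv))
        n_at_risk -= removed  # censored subjects leave the risk set after time t
    return curve


curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])  # steps to 0.8, 0.6 and 0.3
```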

In summary, the invention uses the observed differential correlation information between molecules in the normal and pre-disease states to provide a computational method based on a time-difference network. The method can accurately reflect the pre-disease state or predict the onset of serious disease. This differential network approach differs from existing methods in that it studies the differential associations (or correlations) of genes or proteins rather than their differential expression. The theoretical basis of this work is the quantification of critical states with dynamic network biomarkers.

The above examples are preferred embodiments of the invention, but the embodiments of the invention are not limited to them. Any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the invention is an equivalent replacement and falls within the protection scope of the invention.

Claims

1. A method for detecting the phase transition critical point of a complex biological system based on a relative entropy index, characterized by comprising the following steps:

S1. Convert the time-course observation data sequence $O_t = \{o_1, o_2, o_3, \ldots, o_t\}$ into a sequence of time-varying networks $\{DN_2, DN_3, \ldots, DN_{t-1}, DN_t\}$;

A correlation network is built first: at each sampling time point, the observation sequence $\{o_1, o_2, o_3, \ldots, o_t\}$ is mapped onto an existing functional network, namely the STRING network, to construct a sequence of correlation networks $\{N_1, \ldots, N_t\}$, where $N_t$ denotes the correlation network at time t. Each edge connecting two nodes represents the correlation between two biomolecules, while each edge attached to only one node represents the self-regulation or variation of that biomolecule. A parameter α is then selected, and an edge of the correlation network is retained if its Pearson correlation coefficient (PCC) satisfies $|PCC| \ge \alpha$ and removed otherwise, which yields the correlation network. The parameter α is determined from the specific real dataset;

For biomolecule $g_i$, fit a Gaussian distribution to its expression levels in the reference samples $\{s_1, s_2, \ldots, s_k\}$. Then obtain a k-dimensional vector,

where $\mathrm{area}(D_{g_i}(S_k))$ denotes the cumulative area of biomolecule $g_i$ in the k-th sample as determined by the Gaussian distribution;

S4. Construct the reference distribution P from these cumulative areas, where $\mathrm{area}(D_{g_i}(S_k))$ denotes the cumulative area of biomolecule $g_i$ in the k-th sample as determined by the corresponding Gaussian distribution, and P satisfies $\sum_x P(x) = 1$;

S5. Calculate the relative entropy index, denoted RES,

where $H_{\langle u,v\rangle}(x_v, x_{ul})$ denotes the edge feature between subject v and the l-th normal sample of subject u, and $H_{\langle v,u\rangle}(x_u, x_{vp})$ denotes the edge feature between subject u and the p-th normal sample of subject v. $x_v$ denotes all normal samples of subject v, $x_u$ all normal samples of subject u, $x_{ul}$ the l-th normal sample of subject u, and $x_{vs}$ the s-th normal sample of subject v;

$p(x_{v1}), p(x_{v2}), \ldots, p(x_{vm})$ denote the distributions of the 1st, 2nd, …, m-th normal samples of subject v, and $p(x_{ul})$ denotes the distribution of the l-th normal sample of subject u;

similarly, RES_D denotes the relative entropy index obtained from the disease samples. $H_{\langle u,v\rangle}(y_v, y_{ul})$ denotes the edge feature between subject v and the l-th disease sample of subject u, and $H_{\langle v,u\rangle}(y_u, y_{vs})$ denotes the edge feature between subject u and the s-th disease sample of subject v. $y_v$ denotes all disease samples of subject v, $y_u$ all disease samples of subject u, $y_{ul}$ the l-th disease sample of subject u, and $y_{vs}$ the s-th disease sample of subject v. P denotes a discrete probability distribution, and P satisfies $\sum_x P(x) = 1$.

2. The method for detecting phase transition critical points of a complex biological system based on relative entropy indexes of claim 1, wherein the method for detecting phase transition critical points of a complex biological system requires at least 3 samples.

3. The method for detecting the phase transition critical point of a complex biological system based on a relative entropy index according to claim 2, wherein the relative entropy index behaves differently in different states, and its value in the disease state is smaller than in the normal state.

4. The method for detecting the phase transition critical point of a complex biological system based on a relative entropy index according to claim 1, wherein the parameter α is selected so that the differential network in the normal state has as few differential edges as possible, which makes the pre-disease state, with a certain number of differential edges, stand out.