CN109101783A - Method and system for determining cancer network markers based on a probabilistic model - Google Patents


Info

Publication number
CN109101783A
Authority
CN
China
Prior art keywords
sample
gene
disease
likelihood score
normal
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810920673.7A
Other languages
Chinese (zh)
Other versions
CN109101783B (en)
Inventor
杜玉改
刘文斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou University
Original Assignee
Wenzhou University
Application filed by Wenzhou University
Priority to CN201810920673.7A
Publication of CN109101783A
Application granted
Publication of CN109101783B
Expired - Fee Related
Anticipated expiration


Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a method and system for determining cancer network markers based on a probabilistic model. The method comprises: using a probability density function to convert the acquired gene expression data matrices of all normal samples and disease samples into likelihood matrices, and constructing a normal sample distribution function from all the normal sample likelihood matrices; then substituting each element of each disease sample likelihood matrix into the normal sample distribution function to determine the significantly differential gene set of each disease sample, and mapping that gene set onto a protein-protein interaction network to determine the network markers of each disease sample. With the method or system provided by the invention, cancer network markers can be obtained accurately and effectively, and these markers can be used to classify a disease into subtypes so as to achieve accurate diagnosis and treatment.

Description

Method and system for determining cancer network markers based on a probabilistic model
Technical field
The present invention relates to the technical field of gene detection, and in particular to a method and system for determining cancer network markers based on a probabilistic model.
Background art
Existing research shows that the occurrence and development of cancer result from the joint action of multiple genes. Traditional gene expression profile data suffer from heavy noise, small sample sizes, and an imbalance between positive and negative samples, so combining expression profile data with biological networks to determine cancer network markers has become a promising approach. Moreover, network markers are more efficient and more stable than the single-gene markers used previously.
Summary of the invention
Taking into account the heterogeneity between samples, and the fact that differences in factors such as pathogenic causes make a disease differ between patients, the present invention provides a method and system for determining cancer network markers based on a probabilistic model. The invention can obtain cancer network markers accurately and effectively, and uses these markers to classify the disease so as to achieve accurate diagnosis and treatment.
To achieve the above object, the present invention provides the following solutions:
A method for determining cancer network markers based on a probabilistic model, the method comprising:
Obtaining gene expression data matrices of multiple normal samples and multiple disease samples; the elements of the gene expression data matrices are gene expression levels;
Using a probability density function, converting the gene expression data matrices of all the normal samples into normal sample likelihood matrices, and converting the gene expression data matrices of all the disease samples into disease sample likelihood matrices; the elements of the normal sample likelihood matrices and the disease sample likelihood matrices are gene likelihoods;
Constructing a normal sample distribution function from all the normal sample likelihood matrices;
Substituting each element of each disease sample likelihood matrix in turn into the normal sample distribution function to determine the significantly differential gene set of each disease sample;
Mapping the significantly differential gene set of each disease sample in turn onto a protein-protein interaction network to determine the network markers of each disease sample.
Optionally, the method for determining cancer network markers further comprises:
Classifying the disease samples into different subtypes according to the network markers of each disease sample and prior data on known cancer subtypes.
Optionally, using a probability density function to convert the gene expression data matrices of all the normal samples into normal sample likelihood matrices and the gene expression data matrices of all the disease samples into disease sample likelihood matrices specifically comprises:
Constructing a gene likelihood computation model using the probability density function; the expression of the gene likelihood computation model is [formula not reproduced in the source text], where $\lambda_i$ denotes the likelihood of gene $i$; the expression level of the $i$-th gene in the $j$-th sample is denoted here by $x_{ij}$; $f_i^1$ denotes the normal distribution curve of gene $i$ under disease samples; and $f_i^2$ denotes the normal distribution curve of gene $i$ under normal samples;
Converting, according to the gene likelihood computation model, the gene expression data matrices of all the normal samples into normal sample likelihood matrices and the gene expression data matrices of all the disease samples into disease sample likelihood matrices.
Optionally, constructing the normal sample distribution function from all the normal sample likelihood matrices specifically comprises:
Calculating the mean and variance of each gene likelihood from all the normal sample likelihood matrices;
Constructing, from the mean and variance of each gene likelihood, the normal distribution function of that gene likelihood under normal samples.
Optionally, substituting each element of each disease sample likelihood matrix in turn into the normal sample distribution function to determine the significantly differential gene set of each disease sample specifically comprises:
Substituting each element of the disease sample likelihood matrix in turn into the normal sample distribution function, and calculating the probability value of each gene in each disease sample;
Judging whether the probability value is less than or equal to a set threshold;
If so, determining the gene corresponding to the probability value that is less than or equal to the set threshold as a significantly differential gene of the disease sample.
Optionally, mapping the significantly differential gene set of each disease sample in turn onto the protein-protein interaction network to determine the network markers of each disease sample specifically comprises:
Mapping the significantly differential gene set of the disease sample onto the protein-protein interaction network and, according to the interaction relationships between genes, determining the five genes with the largest number of connected genes, together with the first-order neighbor nodes of those five genes, as the network markers of the disease sample.
The present invention also provides a system for determining cancer network markers based on a probabilistic model, the system comprising:
A gene expression data matrix acquisition module, configured to obtain gene expression data matrices of multiple normal samples and multiple disease samples; the elements of the gene expression data matrices are gene expression levels;
A gene expression data matrix conversion module, configured to use a probability density function to convert the gene expression data matrices of all the normal samples into normal sample likelihood matrices and the gene expression data matrices of all the disease samples into disease sample likelihood matrices; the elements of the normal sample likelihood matrices and the disease sample likelihood matrices are gene likelihoods;
A normal sample distribution function construction module, configured to construct a normal sample distribution function from all the normal sample likelihood matrices;
A significantly differential gene set determination module, configured to substitute each element of each disease sample likelihood matrix in turn into the normal sample distribution function and determine the significantly differential gene set of each disease sample;
A network marker determination module, configured to map the significantly differential gene set of each disease sample in turn onto a protein-protein interaction network and determine the network markers of each disease sample.
Optionally, the system for determining cancer network markers further comprises:
A disease subtype classification module, configured to classify the disease samples into different subtypes according to the network markers of each disease sample and prior data on known cancer subtypes.
Optionally, the gene expression data matrix conversion module specifically comprises:
A gene likelihood computation model construction unit, configured to construct a gene likelihood computation model using the probability density function; the expression of the gene likelihood computation model is [formula not reproduced in the source text], where $\lambda_i$ denotes the likelihood of gene $i$; the expression level of the $i$-th gene in the $j$-th sample is denoted here by $x_{ij}$; $f_i^1$ denotes the normal distribution curve of gene $i$ under disease samples; and $f_i^2$ denotes the normal distribution curve of gene $i$ under normal samples;
A conversion unit, configured to convert, according to the gene likelihood computation model, the gene expression data matrices of all the normal samples into normal sample likelihood matrices and the gene expression data matrices of all the disease samples into disease sample likelihood matrices.
Optionally, the significantly differential gene set determination module specifically comprises:
A probability value calculation unit, configured to substitute each element of the disease sample likelihood matrix in turn into the normal sample distribution function and calculate the probability value of each gene in each disease sample;
A judging unit, configured to judge whether the probability value is less than or equal to a set threshold;
A significantly differential gene set determination unit, configured to determine the gene corresponding to a probability value less than or equal to the set threshold as a significantly differential gene of the disease sample.
According to the specific embodiments provided by the present invention, the invention discloses the following technical effects:
The present invention provides a method and system for determining cancer network markers based on a probabilistic model. The method comprises: obtaining gene expression data matrices of multiple normal samples and multiple disease samples and, using a probability density function, converting the gene expression data matrices of all the normal samples into normal sample likelihood matrices and those of all the disease samples into disease sample likelihood matrices; then constructing a normal sample distribution function from all the normal sample likelihood matrices, and substituting each element of each disease sample likelihood matrix in turn into the normal sample distribution function to determine the significantly differential gene set of each disease sample; and finally mapping the significantly differential gene set of each disease sample in turn onto a protein-protein interaction network to determine the network markers of each disease sample. With the method or system provided by the invention, cancer network markers can be obtained accurately and effectively, and these markers can be used to classify a disease into subtypes so as to achieve accurate diagnosis and treatment.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the method for determining cancer network markers based on a probabilistic model according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of determining cancer network markers based on a probabilistic model according to the present invention;
Fig. 3 is a schematic diagram of the network markers screened out by the present invention;
Fig. 4 is a relationship diagram of some of the markers of each subtype obtained for the cancer UCEC;
Fig. 5 is the subtype classification result for the cancer UCEC;
Fig. 6 is the sample size distribution of each subtype of the cancer UCEC;
Fig. 7 shows the survival curves of each subtype of the cancer UCEC;
Fig. 8 is a schematic structural diagram of the system for determining cancer network markers based on a probabilistic model according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Taking into account the heterogeneity between samples, and the fact that differences in factors such as pathogenic causes make a disease differ between patients, the present invention provides a method and system for determining cancer network markers based on a probabilistic model. The invention can obtain cancer network markers accurately and effectively, and uses these markers to classify the disease so as to achieve accurate diagnosis and treatment.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.
To overcome data noise, the present invention assumes that, within a specific group or phenotype, the expression profile data of each gene follow a normal distribution. Under this assumption, the original gene expression profile data matrix can be converted into a likelihood matrix. The invention uses the likelihood matrix to determine the significantly differential genes in each disease sample, and projecting those genes onto a protein-protein interaction (PPI) network yields the network markers of each disease sample.
Because of differences in factors such as pathogenic causes, the same disease differs between patients, so traditional disease classifications cannot characterize all disease samples well. Classifying these traditional diseases into more detailed subtypes is therefore of great biological significance for diagnosis and treatment. The union of the markers of all disease samples is taken to obtain an integrated likelihood matrix of cancer markers, and, in combination with existing cancer subtype information, the disease samples are classified into different subtypes using the ConsensusClusterPlus method in R.
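The union step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `integrated_marker_matrix`, the dictionary layout, and all numbers are hypothetical, and the clustering itself (ConsensusClusterPlus in R) is not reproduced here.

```python
def integrated_marker_matrix(markers_by_sample, disease_likelihood):
    """Build the integrated likelihood matrix restricted to marker genes.

    markers_by_sample:  sample id -> set of marker genes for that sample
    disease_likelihood: gene -> {sample id: likelihood}
    Returns (genes, samples, rows), where rows[k][m] is the likelihood of
    genes[k] in samples[m].
    """
    # Union of the markers of all disease samples, sorted for a stable row order.
    genes = sorted(set().union(*markers_by_sample.values()))
    samples = sorted(markers_by_sample)
    rows = [[disease_likelihood[g][s] for s in samples] for g in genes]
    return genes, samples, rows


# Hypothetical inputs; the values are illustrative only.
markers_by_sample = {"d1": {"g1", "g2"}, "d2": {"g2", "g4"}}
disease_likelihood = {
    "g1": {"d1": 2.0, "d2": 0.9},
    "g2": {"d1": 3.1, "d2": 2.7},
    "g3": {"d1": 1.0, "d2": 1.1},
    "g4": {"d1": 0.8, "d2": 4.2},
}

genes, samples, rows = integrated_marker_matrix(markers_by_sample, disease_likelihood)
print(genes, samples, rows)
```

The resulting matrix (marker genes by disease samples) is the kind of input a consensus clustering routine would take; gene g3 is dropped because it is not a marker of any sample.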
Based on the above, the main idea of the invention is to introduce a probability density function and combine it with the single-sample approach to screen the network markers of each disease sample, and then to classify the different cancer subtypes using these sample-specific markers and the clinical information of the samples.
Fig. 1 is a schematic flowchart of the method for determining cancer network markers based on a probabilistic model according to an embodiment of the present invention. As shown in Fig. 1, the method provided by the embodiment comprises the following steps.
Step 101: Obtain gene expression data matrices of multiple normal samples and multiple disease samples; the elements of the gene expression data matrices are gene expression levels.
Step 102: Using a probability density function, convert the gene expression data matrix of each normal sample into a normal sample likelihood matrix and the gene expression data matrix of each disease sample into a disease sample likelihood matrix; the elements of the normal sample likelihood matrix and the disease sample likelihood matrix are gene likelihoods.
Step 103: Construct a normal sample distribution function from all the normal sample likelihood matrices.
Step 104: Substitute each element of each disease sample likelihood matrix into the normal sample distribution function to determine the significantly differential gene set of each disease sample.
Step 105: Map the significantly differential gene set of each disease sample onto a protein-protein interaction network to determine the network markers of each disease sample.
Step 106: Classify the disease samples into different subtypes according to the network markers of each disease sample and prior data on known cancer subtypes.
The data in the gene expression data matrices of the disease samples in step 101 are obtained from The Cancer Genome Atlas (TCGA) database.
Step 102 specifically includes:
Using the probability density function, construct a gene likelihood computation model; the expression of the gene likelihood computation model is [formula not reproduced in the source text], where $\lambda_i$ denotes the likelihood of gene $i$; the expression level of the $i$-th gene in the $j$-th sample is denoted here by $x_{ij}$, with $i$ the gene index and $j$ the sample index; $f_i^1$ denotes the normal distribution curve of gene $i$ under disease samples; and $f_i^2$ denotes the normal distribution curve of gene $i$ under normal samples, the superscripts 1 and 2 standing for disease and normal respectively.
Specifically: the mean and variance of the expression level of each gene are computed separately for the normal samples and the disease samples, and the normal distribution curves $f_i^2$ and $f_i^1$ of each gene under normal samples and under disease samples are constructed, where the normal distribution function is $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, $x$ is the expression level, $\mu$ is the mean, and $\sigma$ is the standard deviation; the gene likelihood computation model is then constructed from the normal distribution curves $f_i^2$ and $f_i^1$ of each gene under normal samples and under disease samples.
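As a minimal sketch of this fitting step, the snippet below estimates the mean and standard deviation of one gene in each group and evaluates the normal distribution function described above. The helper names `normal_pdf` and `fit_curve` and the expression values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the source does not say which estimator is used.

```python
import math
from statistics import mean, stdev


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def fit_curve(values):
    """Return (mu, sigma) for one gene in one group of samples.

    Assumption: sigma is the sample standard deviation (n - 1 denominator).
    """
    return mean(values), stdev(values)


# Hypothetical expression levels of one gene i; the numbers are illustrative only.
normal_values = [1.0, 2.0, 3.0, 4.0]    # samples n1..n4
disease_values = [6.0, 7.0, 8.0, 9.0]   # samples d1..d4

mu2, sigma2 = fit_curve(normal_values)    # parameters of f_i^2 (normal samples)
mu1, sigma1 = fit_curve(disease_values)   # parameters of f_i^1 (disease samples)

x = 6.5  # expression level of gene i in some sample
print(normal_pdf(x, mu1, sigma1), normal_pdf(x, mu2, sigma2))
```

For an expression level close to the disease-group mean, the density under the disease curve is much larger than under the normal curve, which is the contrast the likelihood model relies on.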
According to the gene likelihood computation model, the gene expression data matrix of each normal sample is converted into a normal sample likelihood matrix, and the gene expression data matrix of each disease sample is converted into a disease sample likelihood matrix.
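The conversion can be illustrated with the sketch below. Because the likelihood formula is not reproduced in this text, the code uses a plain likelihood ratio of the disease curve to the normal curve as a stand-in; that ratio, the function name `likelihood_matrix`, and the toy data are all assumptions for illustration and may differ from the patent's actual expression.

```python
from statistics import NormalDist


def likelihood_matrix(expression, normal_ids, disease_ids):
    """Convert gene -> {sample: expression} into gene -> {sample: likelihood}.

    ASSUMPTION: likelihood = f1(x) / f2(x), where f1 and f2 are the normal
    curves fitted on the disease and normal samples of the same gene. This
    ratio is a stand-in, not the formula from the source.
    """
    result = {}
    for gene, row in expression.items():
        f2 = NormalDist.from_samples([row[s] for s in normal_ids])   # normal samples
        f1 = NormalDist.from_samples([row[s] for s in disease_ids])  # disease samples
        # Note: f2.pdf(x) can underflow to 0.0 for extreme x; not handled in this sketch.
        result[gene] = {s: f1.pdf(x) / f2.pdf(x) for s, x in row.items()}
    return result


# Hypothetical toy matrix; the values are illustrative only.
NORMAL = ["n1", "n2", "n3", "n4"]
DISEASE = ["d1", "d2", "d3", "d4"]
expression = {
    "g1": {"n1": 1.0, "n2": 2.0, "n3": 3.0, "n4": 4.0,
           "d1": 6.0, "d2": 7.0, "d3": 8.0, "d4": 9.0},
    "g2": {"n1": 5.0, "n2": 5.5, "n3": 4.5, "n4": 5.2,
           "d1": 5.1, "d2": 4.8, "d3": 5.6, "d4": 4.9},
}

lik = likelihood_matrix(expression, NORMAL, DISEASE)
print(lik["g1"]["n1"], lik["g1"]["d1"])
```

Under this stand-in, a gene whose expression separates the two groups (g1) gets values far from 1, while a gene with overlapping groups (g2) stays close to 1.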
Step 103 specifically includes:
From all the normal sample likelihood matrices, calculate the mean and variance of each gene likelihood.
From the mean and variance of each gene likelihood, construct the distribution function of that gene likelihood under normal samples. The distribution function here is a normal distribution function.
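A minimal sketch of step 103, assuming the likelihood matrix is stored as a gene-to-sample dictionary: for each gene, the likelihoods of the normal samples are summarized by a mean and a variance, which define one normal distribution per gene. The function name, the data layout, the numbers, and the choice of the sample variance (n - 1 denominator) are assumptions.

```python
from statistics import NormalDist


def normal_sample_distributions(likelihood, normal_ids):
    """Return gene -> NormalDist fitted on the likelihoods of the normal samples.

    Assumption: NormalDist.from_samples uses the sample variance (n - 1
    denominator); the source only says "mean and variance".
    """
    return {
        gene: NormalDist.from_samples([row[s] for s in normal_ids])
        for gene, row in likelihood.items()
    }


# Hypothetical likelihood values for two genes; illustrative only.
NORMAL = ["n1", "n2", "n3", "n4"]
likelihood = {
    "g1": {"n1": 0.1, "n2": 0.2, "n3": 0.3, "n4": 0.4, "d1": 5.0},
    "g2": {"n1": 0.9, "n2": 1.1, "n3": 1.0, "n4": 1.2, "d1": 1.05},
}

dists = normal_sample_distributions(likelihood, NORMAL)
print(dists["g1"].mean, dists["g1"].variance)
```

Only the normal-sample columns enter the fit; the disease-sample likelihoods are evaluated against these per-gene distributions in the next step.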
Step 104 follows the single-sample approach and computes the significantly differential gene set of each disease sample. The normal samples in the likelihood matrix are used to construct a probability density function, and for each disease sample, each gene is tested for a significant difference relative to the normal samples, so as to screen out the significantly differential genes.
Step 104 specifically includes:
Substitute each element of the disease sample likelihood matrix into the normal sample distribution function, and calculate the probability value p of each gene in each disease sample.
Judge whether the probability value p is less than or equal to a set threshold; the set threshold here is 0.05.
If so, determine the gene corresponding to the probability value p that is less than or equal to the set threshold as a significantly differential gene of the disease sample.
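The thresholding in step 104 might look like the sketch below. The source does not state how the probability value p is derived from the distribution function, so the two-sided tail probability used here is an assumption, as are the function names and the toy distributions.

```python
from statistics import NormalDist

THRESHOLD = 0.05  # set threshold stated in the text


def two_sided_p(dist, value):
    """ASSUMPTION: p is the two-sided tail probability of value under dist."""
    c = dist.cdf(value)
    return 2.0 * min(c, 1.0 - c)


def significant_genes(disease_likelihood, normal_dists, threshold=THRESHOLD):
    """disease_likelihood: sample -> {gene: likelihood}.

    Returns sample -> set of genes whose p value is <= threshold.
    """
    return {
        sample: {g for g, v in genes.items()
                 if two_sided_p(normal_dists[g], v) <= threshold}
        for sample, genes in disease_likelihood.items()
    }


# Hypothetical per-gene distributions under normal samples and disease likelihoods.
normal_dists = {"g1": NormalDist(0.25, 0.1), "g2": NormalDist(1.0, 0.2)}
disease_likelihood = {
    "d1": {"g1": 0.9, "g2": 1.1},
    "d2": {"g1": 0.3, "g2": 1.9},
}

diff_sets = significant_genes(disease_likelihood, normal_dists)
print(diff_sets)
```

Each disease sample gets its own gene set, which reflects the single-sample idea: a gene can be significant in d1 and not in d2.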
The protein-protein interaction (PPI) network information is obtained from the STRING database. STRING is a widely used and relatively mature database for searching interactions between proteins; it includes direct physical interactions between proteins verified by experiment, as well as protein interactions mined from PubMed abstracts and results predicted by other bioinformatics methods.
Step 105 specifically includes:
Map the significantly differential gene set of the disease sample onto the protein-protein interaction network and, according to the interaction relationships between genes, determine the five genes with the largest number of connected genes, together with the first-order neighbor nodes of those five genes, as the network markers of the disease sample. This removes false positives from the differential genes and avoids false-positive markers caused by noisy gene expression data, small sample sizes, and the imbalance between positive and negative samples.
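The screening rule in step 105 can be sketched as a small graph routine. This is an illustration under stated assumptions: the PPI network is given as a list of gene-symbol pairs (mapping STRING identifiers to gene symbols is not shown), connectivity and first-order neighbors are both counted inside the differential-gene subnetwork, ties are broken alphabetically, and the function name `network_markers` is hypothetical.

```python
def network_markers(diff_genes, ppi_edges, top_k=5):
    """Pick the top_k most connected differential genes plus their neighbors.

    diff_genes: iterable of significantly differential genes of one sample
    ppi_edges:  iterable of (gene_a, gene_b) interaction pairs
    ASSUMPTION: only edges with both endpoints in diff_genes are counted.
    """
    diff = set(diff_genes)
    adjacency = {g: set() for g in diff}
    for a, b in ppi_edges:
        if a in diff and b in diff and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    # Rank by number of connected differential genes; ties broken alphabetically.
    ranked = sorted(adjacency, key=lambda g: (-len(adjacency[g]), g))
    hubs = [g for g in ranked[:top_k] if adjacency[g]]  # skip isolated genes
    markers = set(hubs)
    for hub in hubs:
        markers |= adjacency[hub]  # first-order neighbors of each hub
    return markers


# Hypothetical differential genes and PPI edges; illustrative only.
diff_genes = {"A", "B", "C", "D", "E", "F", "G", "H"}
ppi_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
             ("E", "F"), ("G", "X"), ("H", "Y")]

markers = network_markers(diff_genes, ppi_edges)
print(sorted(markers))
```

In this toy network, G and H are differential but connect only to non-differential genes, so they are dropped; this is how the rule filters out differential genes that lack network support.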
Step 106 specifically includes: using the cancer network markers of each disease sample and prior knowledge of cancer subtypes, classifying the disease samples into different subtypes with the ConsensusClusterPlus method in R, and performing survival analysis on each obtained subtype using the clinical data of the disease samples. The clinical data of the disease samples are also obtained from the TCGA database.
On this basis, researchers can use this concept to study the acquisition of cancer markers and the classification of cancer subtypes in more depth, and thereby achieve accurate diagnosis and treatment of disease.
The present invention also provides a specific data example to illustrate the invention.
Fig. 2 is a schematic diagram of determining cancer network markers based on a probabilistic model according to the present invention. As shown in Fig. 2, the details are as follows:
Converting the gene expression matrix into a likelihood matrix
Table 1 mRNA gene expression matrix
Table 1 is an mRNA gene expression matrix containing 8 samples, where (n1, n2, n3, n4) denote normal tissue samples and (d1, d2, d3, d4) denote diseased tissue samples. g1, g2, g3, g4, and g5 denote the names of the mRNAs, and the data in the table are gene expression data. The likelihood matrix after conversion is:
Table 2 Likelihood matrix
Table 2 is a likelihood matrix, obtained by evaluating the likelihood expression [formula not reproduced in the source text] for each of the 5 genes in these 8 samples; the data in the table are the likelihood values after conversion.
Obtaining the differentially expressed genes of each disease sample
Using the likelihood matrix converted from the mRNA gene expression matrix, and assuming that the normal samples still follow a normal distribution, each gene in each disease sample is tested for a significant difference relative to the normal samples, giving the differentially expressed gene set of each disease sample (p < 0.05), as shown in Table 3:
Table 3 Differential genes screened out
As shown in Table 3, for the four disease samples (d1, d2, d3, d4), each gene is examined for a significant difference relative to the normal samples (p < 0.05); bold data in the table indicate genes that are significantly differential in the corresponding sample.
Obtaining the network markers
Because differential genes obtained from gene expression levels may contain false positives, the interaction relationships between genes in the PPI network are used to remove them. In the network, if a gene is significantly differential and many of the genes directly connected to it are also differential genes, these differential genes are judged to be relatively stable and are taken as the cancer markers of the sample. The screening criterion is to take the five genes with the highest connectivity in the differential gene network, together with their directly connected first-order nodes, as the network markers; the dark squares in Fig. 3 are the screened network markers.
Classifying the cancer into different subtypes
According to the obtained network marker information of each disease sample, combined with the existing cancer subtype knowledge shown in Fig. 4 and the clinical data of the disease samples, the endometrial carcinoma (UCEC) data are classified into different subtypes. This gives the subtype classification result for UCEC and the sample size distribution of each UCEC subtype, shown in Fig. 5 and Fig. 6, and then the survival curves of each UCEC subtype, shown in Fig. 7. The survival difference between subtypes is characterized by the p value; p < 0.05 indicates a marked difference between the cancer subtypes.
To achieve the above object, the present invention also provides a system for determining cancer network markers based on a probabilistic model.
Fig. 8 is a schematic structural diagram of the system for determining cancer network markers based on a probabilistic model according to an embodiment of the present invention. As shown in Fig. 8, the system provided by the embodiment comprises:
A gene expression data matrix acquisition module 100, configured to obtain gene expression data matrices of multiple normal samples and multiple disease samples; the elements of the gene expression data matrices are gene expression levels.
A gene expression data matrix conversion module 200, configured to use a probability density function to convert the gene expression data matrix of each normal sample into a normal sample likelihood matrix and the gene expression data matrix of each disease sample into a disease sample likelihood matrix; the elements of the normal sample likelihood matrix and the disease sample likelihood matrix are gene likelihoods.
A normal sample distribution function construction module 300, configured to construct a normal sample distribution function from all the normal sample likelihood matrices.
A significantly differential gene set determination module 400, configured to substitute each element of each disease sample likelihood matrix into the normal sample distribution function and determine the significantly differential gene set of each disease sample.
A network marker determination module 500, configured to map the significantly differential gene set of each disease sample onto a protein-protein interaction network and determine the network markers of each disease sample.
A disease subtype classification module 600, configured to classify the disease samples into different subtypes according to the network markers of each disease sample and prior data on known cancer subtypes.
The gene expression data matrix conversion module 200 specifically comprises:
A gene likelihood computation model construction unit, configured to construct a gene likelihood computation model using the probability density function; the expression of the gene likelihood computation model is [formula not reproduced in the source text], where $\lambda_i$ denotes the likelihood of gene $i$; the expression level of the $i$-th gene in the $j$-th sample is denoted here by $x_{ij}$; $f_i^1$ denotes the normal distribution curve of gene $i$ under disease samples; and $f_i^2$ denotes the normal distribution curve of gene $i$ under normal samples.
A conversion unit, configured to convert, according to the gene likelihood computation model, the gene expression data matrix of each normal sample into a normal sample likelihood matrix and the gene expression data matrix of each disease sample into a disease sample likelihood matrix.
The significantly differential gene set determination module 400 specifically comprises:
A probability value calculation unit, configured to substitute each element of the disease sample likelihood matrix into the normal sample distribution function and calculate the probability value of each gene in each disease sample.
A judging unit, configured to judge whether the probability value is less than or equal to a set threshold.
A significantly differential gene set determination unit, configured to determine the gene corresponding to a probability value less than or equal to the set threshold as a significantly differential gene of the disease sample.
The present invention is the base for causing disease to have differences between different patients in the difference due to factors such as pathogenic factors On plinth, by carrying out each hypotype that classification obtains corresponding disease to disease, helping preferably to improve the diagnosis of disease and controlling It treats, and proposes a kind of cancer operator logo object based on probabilistic model and determine method and system.This is in cancer operator logo object Play the role of in terms of obtaining with cancer subtypes classification very important.Cancer markers phase is shared with traditional disease sample Than the present invention can obtain the special operator logo object of each disease sample, and can find belonging to each disease sample Cancer subtypes type preferably realizes Precise Diagnosis and treatment to disease, from screening during disease development The diagnosing and treating of mRNA and improvement cancer to key effect all have very important significance.
Specific examples are used herein to illustrate the principle and implementation of the invention. The description of the above embodiments is intended only to help in understanding the method and core idea of the invention. Those skilled in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. In conclusion, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A method for determining cancer network markers based on a probabilistic model, characterized in that the method comprises:
obtaining gene expression data matrices of a plurality of normal samples and a plurality of disease samples, wherein the elements of each gene expression data matrix are gene expression levels;
using a probability density function, converting the gene expression data matrices of all the normal samples into normal sample likelihood matrices and converting the gene expression data matrices of all the disease samples into disease sample likelihood matrices, wherein the elements of the normal sample likelihood matrices and the disease sample likelihood matrices are gene likelihoods;
constructing a normal sample distribution function from all the normal sample likelihood matrices;
substituting each element of each disease sample likelihood matrix in turn into the normal sample distribution function to determine the significantly differential gene set of each disease sample; and
mapping the significantly differential gene set of each disease sample in turn onto a protein-protein interaction network to determine the network marker of each disease sample.
2. The method for determining cancer network markers according to claim 1, characterized in that the method further comprises:
classifying the disease samples into different subtypes according to the network marker of each disease sample and known prior data on cancer subtypes.
3. The method for determining cancer network markers according to claim 1, characterized in that converting, using a probability density function, the gene expression data matrices of all the normal samples into normal sample likelihood matrices and the gene expression data matrices of all the disease samples into disease sample likelihood matrices specifically comprises:
constructing a gene likelihood calculation model using a probability density function, the expression of the gene likelihood calculation model being [formula missing in source], where $\lambda_i$ denotes the likelihood of gene i; the model's input is the expression level of the i-th gene in the j-th sample; $f_i^1$ denotes the normal distribution curve of gene i under disease samples; and $f_i^2$ denotes the normal distribution curve of gene i under normal samples; and
converting, according to the gene likelihood calculation model, the gene expression data matrices of all the normal samples into normal sample likelihood matrices and the gene expression data matrices of all the disease samples into disease sample likelihood matrices.
4. The method for determining cancer network markers according to claim 1, characterized in that constructing a normal sample distribution function from all the normal sample likelihood matrices specifically comprises:
calculating the mean and variance of each gene likelihood from all the normal sample likelihood matrices; and
constructing, from the mean and variance of each gene likelihood, the normal distribution function of that gene likelihood under normal samples.
5. The method for determining cancer network markers according to claim 1, characterized in that substituting each element of each disease sample likelihood matrix in turn into the normal sample distribution function to determine the significantly differential gene set of each disease sample specifically comprises:
substituting each element of the disease sample likelihood matrix in turn into the normal sample distribution function to calculate the probability value of each gene in each disease sample;
judging whether the probability value is less than or equal to a set threshold; and
if so, determining the gene corresponding to the probability value that is less than or equal to the set threshold as a significantly differential gene of the disease sample.
6. The method for determining cancer network markers according to claim 1, characterized in that mapping the significantly differential gene set of each disease sample in turn onto a protein-protein interaction network to determine the network marker of each disease sample specifically comprises:
mapping the significantly differential gene set of the disease sample onto the protein-protein interaction network and, according to the interaction relationships between genes, determining the five genes connected to the largest number of genes, together with the first-order neighbor nodes of those five genes, as the network marker of the disease sample.
7. A system for determining cancer network markers based on a probabilistic model, characterized in that the system comprises:
a gene expression data matrix acquisition module, for obtaining gene expression data matrices of a plurality of normal samples and a plurality of disease samples, wherein the elements of each gene expression data matrix are gene expression levels;
a gene expression data matrix conversion module, for converting, using a probability density function, the gene expression data matrices of all the normal samples into normal sample likelihood matrices and the gene expression data matrices of all the disease samples into disease sample likelihood matrices, wherein the elements of the normal sample likelihood matrices and the disease sample likelihood matrices are gene likelihoods;
a normal sample distribution function construction module, for constructing a normal sample distribution function from all the normal sample likelihood matrices;
a significantly differential gene set determination module, for substituting each element of each disease sample likelihood matrix in turn into the normal sample distribution function to determine the significantly differential gene set of each disease sample; and
a network marker determination module, for mapping the significantly differential gene set of each disease sample in turn onto a protein-protein interaction network to determine the network marker of each disease sample.
8. The system for determining cancer network markers according to claim 7, characterized in that the system further comprises:
a disease subtype classification module, for classifying the disease samples into different subtypes according to the network marker of each disease sample and known prior data on cancer subtypes.
9. The system for determining cancer network markers according to claim 7, characterized in that the gene expression data matrix conversion module specifically comprises:
a gene likelihood calculation model construction unit, for constructing a gene likelihood calculation model using a probability density function, the expression of the gene likelihood calculation model being [formula missing in source], where $\lambda_i$ denotes the likelihood of gene i; the model's input is the expression level of the i-th gene in the j-th sample; $f_i^1$ denotes the normal distribution curve of gene i under disease samples; and $f_i^2$ denotes the normal distribution curve of gene i under normal samples; and
a conversion unit, for converting, according to the gene likelihood calculation model, the gene expression data matrices of all the normal samples into normal sample likelihood matrices and the gene expression data matrices of all the disease samples into disease sample likelihood matrices.
10. The system for determining cancer network markers according to claim 7, characterized in that the significantly differential gene set determination module specifically comprises:
a probability value calculation unit, for substituting each element of the disease sample likelihood matrix in turn into the normal sample distribution function to calculate the probability value of each gene in each disease sample;
a judging unit, for judging whether the probability value is less than or equal to a set threshold; and
a significantly differential gene set determination unit, for determining each gene whose probability value is less than or equal to the set threshold as a significantly differential gene of the disease sample.
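The network mapping step recited in claim 6 can be illustrated with a minimal sketch. It assumes that "mapping" means restricting the protein-protein interaction network to the sample's significantly differential genes, that connections are counted within that restricted network, and that ties are broken alphabetically. The claims do not settle these points, so they are assumptions, and the function name is hypothetical.

```python
def network_marker(significant, ppi_edges, top_k=5):
    """Return (hub genes, network marker) for one disease sample.

    significant: set of significantly differential genes of the sample.
    ppi_edges:   iterable of (gene_a, gene_b) interaction pairs.
    top_k:       number of hub genes to keep (five in claim 6).
    """
    # Adjacency restricted to the significant genes (assumed meaning of "mapping").
    adj = {g: set() for g in significant}
    for a, b in ppi_edges:
        if a in adj and b in adj and a != b:
            adj[a].add(b)
            adj[b].add(a)

    # Rank by number of connected genes; ties broken by name (an assumption).
    hubs = sorted(adj, key=lambda g: (-len(adj[g]), g))[:top_k]

    # The marker is the hubs plus their first-order neighbors.
    marker = set(hubs)
    for h in hubs:
        marker |= adj[h]
    return hubs, marker

# Toy network (gene names are placeholders).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "F"), ("A", "X")]
sig = {"A", "B", "C", "D", "E", "F", "G"}

hubs, marker = network_marker(sig, edges, top_k=2)
```

Here gene X is ignored because it is not in the sample's significantly differential gene set, and the isolated gene G never enters the marker unless fewer than `top_k` connected genes exist.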
CN201810920673.7A 2018-08-14 2018-08-14 Cancer network marker determination method and system based on probability model Expired - Fee Related CN109101783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810920673.7A CN109101783B (en) 2018-08-14 2018-08-14 Cancer network marker determination method and system based on probability model


Publications (2)

Publication Number    Publication Date
CN109101783A (en)     2018-12-28
CN109101783B (en)     2020-09-04

Family

ID=64849535


Country Status (1)

Country Link
CN (1) CN109101783B (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200904

Termination date: 20210814