CN109101783A - Method and system for determining cancer network markers based on a probabilistic model - Google Patents
- Publication number: CN109101783A (application CN201810920673.7A)
- Authority: CN (China)
- Prior art keywords: sample, gene, disease, likelihood, normal
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Landscapes: Measuring Or Testing Involving Enzymes Or Micro-Organisms; Investigating Or Analysing Biological Materials
Abstract
The invention discloses a method and system for determining cancer network markers based on a probabilistic model. The method comprises: using a probability density function to convert the acquired gene expression data matrices of all normal samples and disease samples into likelihood matrices, and constructing a normal-sample distribution function from all the normal-sample likelihood matrices; then substituting each element of each disease-sample likelihood matrix into the normal-sample distribution function to determine the significantly differential gene set of each disease sample; and mapping the significantly differential gene set of each disease sample onto a protein-protein interaction network to determine the network markers of each disease sample. With the method or system provided by the invention, cancer network markers can be obtained accurately and effectively, and these markers can be used to classify the disease into subtypes so as to achieve precise diagnosis and treatment.
Description
Technical field
The present invention relates to the technical field of gene detection, and in particular to a method and system for determining cancer network markers based on a probabilistic model.
Background
Existing research shows that the occurrence and development of cancer result from the joint action of multiple genes. Traditional gene expression profile data suffer from high noise, small sample sizes, and an imbalance between positive and negative samples, so combining expression profile data with biological networks to determine cancer network markers has become a promising approach. Moreover, network markers are more efficient and more stable than conventional single-gene markers.
Summary of the invention
Taking into account the heterogeneity between samples, and the fact that differences in factors such as pathogenic cause make a disease differ between patients, the present invention provides a method and system for determining cancer network markers based on a probabilistic model. The invention can obtain cancer network markers accurately and effectively, and uses these markers to classify the disease so as to achieve precise diagnosis and treatment.
To achieve the above object, the present invention provides the following scheme:
A method for determining cancer network markers based on a probabilistic model, the method comprising:
obtaining gene expression data matrices of multiple normal samples and multiple disease samples, wherein the elements of the gene expression data matrices are gene expression levels;
using a probability density function, converting the gene expression data matrices of all the normal samples into normal-sample likelihood matrices and converting the gene expression data matrices of all the disease samples into disease-sample likelihood matrices, wherein the elements of the normal-sample likelihood matrices and the disease-sample likelihood matrices are gene likelihoods;
constructing a normal-sample distribution function from all the normal-sample likelihood matrices;
substituting each element of each disease-sample likelihood matrix in turn into the normal-sample distribution function to determine the significantly differential gene set of each disease sample;
mapping the significantly differential gene set of each disease sample in turn onto a protein-protein interaction network to determine the network markers of each disease sample.
Optionally, the method for determining cancer network markers further comprises:
classifying the disease samples into different subtypes according to the network markers of each disease sample and known prior data on cancer subtypes.
Optionally, the step of using a probability density function to convert the gene expression data matrices of all the normal samples into normal-sample likelihood matrices and the gene expression data matrices of all the disease samples into disease-sample likelihood matrices specifically comprises:
constructing a gene likelihood calculation model using the probability density function; the expression of the gene likelihood calculation model is [formula missing in the source text], where $\lambda_i$ denotes the likelihood of gene $i$; [symbol missing] denotes the expression level of the $i$-th gene in the $j$-th sample; $f_i^1$ denotes the normal distribution curve of gene $i$ over the disease samples; and $f_i^2$ denotes the normal distribution curve of gene $i$ over the normal samples;
according to the gene likelihood calculation model, converting the gene expression data matrices of all the normal samples into normal-sample likelihood matrices and converting the gene expression data matrices of all the disease samples into disease-sample likelihood matrices.
Optionally, the step of constructing a normal-sample distribution function from all the normal-sample likelihood matrices specifically comprises:
calculating the mean and variance of each gene likelihood from all the normal-sample likelihood matrices;
constructing, from the mean and variance of each gene likelihood, the normal distribution function of that gene likelihood over the normal samples.
Optionally, the step of substituting each element of each disease-sample likelihood matrix in turn into the normal-sample distribution function to determine the significantly differential gene set of each disease sample specifically comprises:
substituting each element of the disease-sample likelihood matrix in turn into the normal-sample distribution function and calculating the probability value of each gene in each disease sample;
judging whether the probability value is less than or equal to a set threshold;
if so, determining the gene corresponding to the probability value that is less than or equal to the set threshold as a significantly differential gene of the disease sample.
Optionally, the step of mapping the significantly differential gene set of each disease sample in turn onto a protein-protein interaction network to determine the network markers of each disease sample specifically comprises:
mapping the significantly differential gene set of the disease sample onto the protein-protein interaction network and, according to the interaction relationships between genes, selecting the five genes connected to the largest number of genes, and determining these five genes together with their first-order neighbor nodes as the network markers of the disease sample.
The present invention also provides a system for determining cancer network markers based on a probabilistic model, the system comprising:
a gene expression data matrix acquisition module, for obtaining gene expression data matrices of multiple normal samples and multiple disease samples, wherein the elements of the gene expression data matrices are gene expression levels;
a gene expression data matrix conversion module, for using a probability density function to convert the gene expression data matrices of all the normal samples into normal-sample likelihood matrices and the gene expression data matrices of all the disease samples into disease-sample likelihood matrices, wherein the elements of the normal-sample likelihood matrices and the disease-sample likelihood matrices are gene likelihoods;
a normal-sample distribution function construction module, for constructing a normal-sample distribution function from all the normal-sample likelihood matrices;
a significantly differential gene set determination module, for substituting each element of each disease-sample likelihood matrix in turn into the normal-sample distribution function to determine the significantly differential gene set of each disease sample;
a network marker determination module, for mapping the significantly differential gene set of each disease sample in turn onto a protein-protein interaction network to determine the network markers of each disease sample.
Optionally, the system for determining cancer network markers further comprises:
a disease subtype classification module, for classifying the disease samples into different subtypes according to the network markers of each disease sample and known prior data on cancer subtypes.
Optionally, the gene expression data matrix conversion module specifically comprises:
a gene likelihood calculation model construction unit, for constructing a gene likelihood calculation model using the probability density function; the expression of the gene likelihood calculation model is [formula missing in the source text], where $\lambda_i$ denotes the likelihood of gene $i$; [symbol missing] denotes the expression level of the $i$-th gene in the $j$-th sample; $f_i^1$ denotes the normal distribution curve of gene $i$ over the disease samples; and $f_i^2$ denotes the normal distribution curve of gene $i$ over the normal samples;
a conversion unit, for converting, according to the gene likelihood calculation model, the gene expression data matrices of all the normal samples into normal-sample likelihood matrices and the gene expression data matrices of all the disease samples into disease-sample likelihood matrices.
Optionally, the significantly differential gene set determination module specifically comprises:
a probability value calculation unit, for substituting each element of the disease-sample likelihood matrix in turn into the normal-sample distribution function and calculating the probability value of each gene in each disease sample;
a judging unit, for judging whether the probability value is less than or equal to a set threshold;
a significantly differential gene set determination unit, for determining the gene corresponding to a probability value less than or equal to the set threshold as a significantly differential gene of the disease sample.
According to the specific embodiments provided by the present invention, the invention discloses the following technical effects:
The present invention provides a method and system for determining cancer network markers based on a probabilistic model. The method comprises: obtaining gene expression data matrices of multiple normal samples and multiple disease samples, and using a probability density function to convert the gene expression data matrices of all the normal samples into normal-sample likelihood matrices and the gene expression data matrices of all the disease samples into disease-sample likelihood matrices; then constructing a normal-sample distribution function from all the normal-sample likelihood matrices, and substituting each element of each disease-sample likelihood matrix in turn into the normal-sample distribution function to determine the significantly differential gene set of each disease sample; and finally mapping the significantly differential gene set of each disease sample in turn onto a protein-protein interaction network to determine the network markers of each disease sample. With the method or system provided by the invention, cancer network markers can be obtained accurately and effectively, and these markers can be used to classify the disease into subtypes so as to achieve precise diagnosis and treatment.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the method for determining cancer network markers based on a probabilistic model according to an embodiment of the invention;
Fig. 2 is a schematic diagram of determining cancer network markers based on a probabilistic model according to the invention;
Fig. 3 is a schematic diagram of the network markers screened out by the invention;
Fig. 4 is a relationship diagram of some of the markers obtained for each subtype of the cancer UCEC;
Fig. 5 shows the subtype classification result for the cancer UCEC;
Fig. 6 shows the distribution of sample counts across the subtypes of the cancer UCEC;
Fig. 7 shows the survival curves of the subtypes of the cancer UCEC;
Fig. 8 is a structural schematic diagram of the system for determining cancer network markers based on a probabilistic model according to an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the invention.
Taking into account the heterogeneity between samples, and the fact that differences in factors such as pathogenic cause make a disease differ between patients, the present invention provides a method and system for determining cancer network markers based on a probabilistic model. The invention can obtain cancer network markers accurately and effectively, and uses these markers to classify the disease so as to achieve precise diagnosis and treatment.
To make the above objects, features and advantages of the invention clearer and easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.
To overcome data noise, the invention assumes that, within a specific group or phenotype, the expression profile data of each gene follow a normal distribution. Under this assumption, the original gene expression profile data matrix can be converted into a likelihood matrix. The invention uses the likelihood matrix to determine the significantly differential genes in each disease sample, and projecting these genes onto a protein-protein interaction (PPI) network yields the network markers of each disease sample.
Because of differences in factors such as pathogenic cause, the same disease differs between patients, and traditional disease classification cannot characterize all disease samples well. Classifying these traditional diseases into finer subtypes is therefore of great biological significance for diagnosis and treatment. The markers of all disease samples are combined by taking their union, which gives an integrated likelihood matrix restricted to the cancer markers; using existing cancer subtype information, the disease samples are then classified into different subtypes with the ConsensusClusterPlus method of the R language.
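The union step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `integrated_matrix`, the dictionary layout of the likelihood matrix, and the toy values are hypothetical and do not come from the patent. The resulting matrix is what would be handed to a consensus clustering tool such as ConsensusClusterPlus.

```python
def integrated_matrix(lik, markers_by_sample):
    """Build the integrated likelihood matrix over the union of all markers.

    lik: {gene: {sample: likelihood}} (hypothetical layout)
    markers_by_sample: {disease_sample: set of marker genes}
    Returns (genes, samples, rows) with rows[i][j] = likelihood of
    genes[i] in samples[j].
    """
    # Union of the per-sample marker sets, sorted for a stable row order.
    genes = sorted(set().union(*markers_by_sample.values()))
    samples = sorted(markers_by_sample)
    rows = [[lik[g][s] for s in samples] for g in genes]
    return genes, samples, rows


# Toy values, made up purely for illustration.
toy_lik = {
    "g1": {"d1": 4.0, "d2": 0.5},
    "g2": {"d1": 0.1, "d2": 3.2},
    "g3": {"d1": 0.0, "d2": 0.2},
}
toy_markers = {"d1": {"g1"}, "d2": {"g1", "g2"}}
genes, samples, rows = integrated_matrix(toy_lik, toy_markers)
```

Gene g3 is a marker of no sample, so it is dropped from the integrated matrix; g1 appears once even though two samples share it.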
Based on the above, the main idea of the invention is to introduce a probability density function and combine it with the single-sample approach to screen the network markers of each disease sample, and then to use these sample-specific markers together with the clinical information of the samples to classify the cancer into different subtypes.
Fig. 1 is a flow diagram of the method for determining cancer network markers based on a probabilistic model according to an embodiment of the invention. As shown in Fig. 1, the method provided by this embodiment comprises the following steps.
Step 101: obtain gene expression data matrices of multiple normal samples and multiple disease samples; the elements of the gene expression data matrices are gene expression levels.
Step 102: using a probability density function, convert the gene expression data matrix of each normal sample into a normal-sample likelihood matrix and the gene expression data matrix of each disease sample into a disease-sample likelihood matrix; the elements of the normal-sample likelihood matrix and the disease-sample likelihood matrix are gene likelihoods.
Step 103: construct a normal-sample distribution function from all the normal-sample likelihood matrices.
Step 104: substitute each element of each disease-sample likelihood matrix into the normal-sample distribution function to determine the significantly differential gene set of each disease sample.
Step 105: map the significantly differential gene set of each disease sample onto a protein-protein interaction network to determine the network markers of each disease sample.
Step 106: classify the disease samples into different subtypes according to the network markers of each disease sample and known prior data on cancer subtypes.
The data in the gene expression data matrices of the disease samples in step 101 are obtained from The Cancer Genome Atlas (TCGA) database.
Step 102 specifically comprises:
Constructing a gene likelihood calculation model using the probability density function. The expression of the gene likelihood calculation model is [formula missing in the source text], where $\lambda_i$ denotes the likelihood of gene $i$; [symbol missing] denotes the expression level of the $i$-th gene in the $j$-th sample, $i$ being the gene index and $j$ the sample index; $f_i^1$ denotes the normal distribution curve of gene $i$ over the disease samples; and $f_i^2$ denotes the normal distribution curve of gene $i$ over the normal samples, the superscripts 1 and 2 standing for disease and normal respectively.
Specifically, the mean and variance of each gene's expression level are computed separately over the normal samples and over the disease samples, and the normal distribution curves $f_i^2$ and $f_i^1$ of each gene over the normal samples and over the disease samples are constructed, where the normal distribution function is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, $x$ is the expression level, $\mu$ is the mean, and $\sigma$ is the standard deviation. The gene likelihood calculation model is then constructed from the curves $f_i^2$ and $f_i^1$ of each gene.
According to the gene likelihood calculation model, the gene expression data matrix of each normal sample is converted into a normal-sample likelihood matrix, and the gene expression data matrix of each disease sample is converted into a disease-sample likelihood matrix.
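As a minimal sketch of step 102, the Python below fits the two per-gene normal curves and applies a likelihood model to every matrix entry. The text available here does not reproduce the likelihood formula, so the way $f_i^1$ and $f_i^2$ are combined is passed in as a parameter; the log-ratio used in the example is purely an assumption, as are all function names, the data layout, and the toy expression values.

```python
import math
from statistics import mean, stdev


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def fit_gene_curves(expr, normal_ids, disease_ids):
    """Per gene, fit (mu, sigma) over disease samples (f1) and normal samples (f2)."""
    curves = {}
    for gene, row in expr.items():
        disease_vals = [row[s] for s in disease_ids]
        normal_vals = [row[s] for s in normal_ids]
        curves[gene] = {
            "f1": (mean(disease_vals), stdev(disease_vals)),
            "f2": (mean(normal_vals), stdev(normal_vals)),
        }
    return curves


def likelihood_matrix(expr, curves, combine):
    """Replace each expression value x by combine(f1(x), f2(x))."""
    lik = {}
    for gene, row in expr.items():
        mu1, s1 = curves[gene]["f1"]
        mu2, s2 = curves[gene]["f2"]
        lik[gene] = {
            s: combine(normal_pdf(x, mu1, s1), normal_pdf(x, mu2, s2))
            for s, x in row.items()
        }
    return lik


def log_ratio(f1, f2):
    """ASSUMPTION: a log-likelihood ratio; the patent's own formula is not shown here."""
    return math.log(f1 / f2)


# Toy expression values, made up for illustration only.
expr = {"g1": {"n1": 1.0, "n2": 2.0, "n3": 3.0, "d1": 5.0, "d2": 6.0, "d3": 7.0}}
curves = fit_gene_curves(expr, ["n1", "n2", "n3"], ["d1", "d2", "d3"])
lik = likelihood_matrix(expr, curves, log_ratio)
```

Keeping the combination rule as a parameter means the fitting and conversion code stays unchanged if the actual model differs from the log-ratio assumed here.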
Step 103 specifically comprises:
Calculating the mean and variance of each gene likelihood from all the normal-sample likelihood matrices.
Constructing, from the mean and variance of each gene likelihood, the distribution function of that gene likelihood over the normal samples. The distribution function here is a normal distribution function.
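Step 103 amounts to estimating two parameters per gene. A small Python sketch follows, assuming the likelihood matrix is held as a nested dictionary and that the sample variance is the intended estimator; both choices, the function name, and the toy values are assumptions made for illustration.

```python
from statistics import mean, variance


def fit_normal_reference(lik, normal_ids):
    """Per gene, return (mean, variance) of its likelihood over the normal samples.

    lik: {gene: {sample: likelihood}} (hypothetical layout)
    """
    reference = {}
    for gene, row in lik.items():
        values = [row[s] for s in normal_ids]
        # Sample variance (n - 1 denominator); an assumption, not stated in the text.
        reference[gene] = (mean(values), variance(values))
    return reference


# Toy likelihood values, made up for illustration only.
toy_lik = {"g1": {"n1": -12.0, "n2": -8.0, "n3": -4.0, "d1": 4.0}}
reference = fit_normal_reference(toy_lik, ["n1", "n2", "n3"])
```

Only the normal-sample columns enter the estimate; the disease-sample value for g1 is ignored at this stage.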
Step 104 follows the single-sample approach and computes the significantly differential gene set of each disease sample. A probability density function is constructed from the normal samples in the likelihood matrix, and for each disease sample, each gene is compared against the normal samples to test whether it differs significantly, thereby screening out the significantly differential genes.
Step 104 specifically comprises:
Substituting each element of the disease-sample likelihood matrix into the normal-sample distribution function and calculating the probability value p of each gene in each disease sample.
Judging whether the probability value p is less than or equal to a set threshold; the set threshold here is 0.05.
If so, determining the gene corresponding to the probability value p that is less than or equal to the set threshold as a significantly differential gene of the disease sample.
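The thresholding in step 104 can be illustrated as follows. The text does not say how the probability value p is derived from the normal distribution function, so this sketch assumes a two-sided tail probability under the fitted normal; that choice, the function names, and the toy values are assumptions rather than the patent's stated procedure.

```python
import math


def two_sided_p(x, mu, var):
    """ASSUMPTION: p = P(|Z| >= |x - mu| / sigma) for a standard normal Z."""
    z = abs(x - mu) / math.sqrt(var)
    return math.erfc(z / math.sqrt(2))


def differential_genes(lik, reference, sample, threshold=0.05):
    """Genes whose likelihood in `sample` has p <= threshold under the normal reference.

    lik: {gene: {sample: likelihood}}; reference: {gene: (mean, variance)}
    """
    return {
        gene
        for gene, (mu, var) in reference.items()
        if two_sided_p(lik[gene][sample], mu, var) <= threshold
    }


# Toy values, made up for illustration only.
toy_reference = {"g1": (-8.0, 16.0), "g2": (0.0, 1.0)}
toy_lik = {"g1": {"d1": 4.0}, "g2": {"d1": 0.5}}
selected = differential_genes(toy_lik, toy_reference, "d1")
```

Here g1 lies three standard deviations from the normal-sample mean and is selected, while g2 lies half a standard deviation away and is not.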
The protein-protein interaction (PPI) network information is obtained from the STRING database. STRING is a widely used and relatively mature database for searching interactions between proteins. It includes experimentally verified direct physical interactions between proteins, as well as protein interactions mined from PubMed abstracts and interactions predicted by other bioinformatics methods.
Step 105 specifically comprises:
Mapping the significantly differential gene set of the disease sample onto the protein-protein interaction network and, according to the interaction relationships between genes, selecting the five genes connected to the largest number of genes, and determining these five genes together with their first-order neighbor nodes as the network markers of the disease sample. This removes false positives from the differential genes, which would otherwise appear among the markers because gene expression data are noisy, sample sizes are small, and positive and negative samples are imbalanced.
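The hub selection in step 105 can be sketched in plain Python. The sketch assumes that connectivity is counted inside the subnetwork induced by the differential genes and that ties are broken alphabetically; the function name, the edge-list format, and the toy network are hypothetical.

```python
def network_markers(diff_genes, ppi_edges, top_k=5):
    """Pick the top_k most connected differential genes plus their first-order neighbors.

    diff_genes: iterable of significantly differential genes of one sample
    ppi_edges: iterable of (gene_a, gene_b) interaction pairs
    ASSUMPTION: only edges between two differential genes are counted.
    """
    diff = set(diff_genes)
    neighbors = {g: set() for g in diff}
    for a, b in ppi_edges:
        if a in diff and b in diff and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Rank by number of connected genes, highest first; gene name breaks ties.
    hubs = sorted(diff, key=lambda g: (-len(neighbors[g]), g))[:top_k]
    markers = set(hubs)
    for hub in hubs:
        markers |= neighbors[hub]
    return hubs, markers


# Toy network, made up for illustration only ("x" is not a differential gene).
toy_diff = {"a", "b", "c", "d", "e", "f", "g"}
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("e", "x")]
hubs, markers = network_markers(toy_diff, toy_edges, top_k=2)
```

Isolated differential genes such as e, f and g never enter the marker set unless they rank among the hubs, which is how poorly connected candidates are filtered out.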
Step 106 specifically comprises: using the cancer network markers of each disease sample together with prior knowledge of cancer subtypes, classifying the disease samples into different subtypes with the ConsensusClusterPlus method of the R language, and using the clinical data of the disease samples to perform survival analysis on each subtype obtained. The clinical data of the disease samples are also obtained from the TCGA database.
On this basis, researchers can use this concept to study cancer marker discovery and cancer subtype classification in greater depth, and thereby achieve precise diagnosis and treatment of disease.
A specific data example is also provided here to illustrate the invention.
Fig. 2 is a schematic diagram of determining cancer network markers based on a probabilistic model according to the invention. As shown in Fig. 2, the details are as follows:
Converting the gene expression matrix into a likelihood matrix
Table 1: mRNA gene expression matrix
Table 1 is an mRNA gene expression matrix containing 8 samples, where (n1, n2, n3, n4) denote normal tissue samples and (d1, d2, d3, d4) denote diseased tissue samples. g1, g2, g3, g4 and g5 are the names of the mRNAs, and the values in the table are gene expression data. The likelihood matrix after conversion is:
Table 2: likelihood matrix
Table 2 is a likelihood matrix. The likelihood [formula missing in the source text] is computed for each of the 5 genes in these 8 samples to obtain the likelihood matrix; the values in the table are the likelihoods after conversion.
Obtaining the differentially expressed genes of each disease sample
Using the likelihood matrix obtained by converting the mRNA gene expression matrix, and assuming that the normal samples still follow a normal distribution, each gene in each disease sample is tested for a significant difference from the normal samples, giving the set of differentially expressed genes of each disease sample (p < 0.05), as shown in Table 3:
Table 3: differential genes screened out
As shown in Table 3, for the four disease samples (d1, d2, d3, d4), each gene is tested for a significant difference from the normal samples (p < 0.05); values in bold in the table indicate genes that differ significantly in the corresponding sample.
Obtaining the network markers
Because differential genes obtained from gene expression levels may include false positives, the interaction relationships between genes in the PPI network are used to remove them. In the network, if a gene differs significantly and many of the genes directly connected to it are also differential genes, these differential genes are judged to be relatively stable and are taken as the cancer markers of the sample. The screening criterion is to take the five genes with the largest number of connected genes in the differential gene network, together with the first-order nodes connected to them, as the network markers. The dark squares in Fig. 3 are the network markers screened out.
Classifying the cancer into different subtypes
Based on the network marker information obtained for each disease sample, and combined with the existing cancer subtype knowledge shown in Fig. 4 and the clinical data of the disease samples, the endometrial carcinoma (UCEC) data are classified into different subtypes. This gives the subtype classification result shown in Fig. 5 and the distribution of sample counts across subtypes shown in Fig. 6, and in turn the survival curves of the UCEC subtypes shown in Fig. 7. The survival difference between subtypes is characterized by the p value; p < 0.05 indicates a significant difference between the cancer subtypes.
To achieve the above object, the present invention also provides a system for determining cancer network markers based on a probabilistic model.
Fig. 8 is a structural schematic diagram of the system for determining cancer network markers based on a probabilistic model according to an embodiment of the invention. As shown in Fig. 8, the system provided by this embodiment comprises:
A gene expression data matrix acquisition module 100, for obtaining gene expression data matrices of multiple normal samples and multiple disease samples; the elements of the gene expression data matrices are gene expression levels.
A gene expression data matrix conversion module 200, for using a probability density function to convert the gene expression data matrix of each normal sample into a normal-sample likelihood matrix and the gene expression data matrix of each disease sample into a disease-sample likelihood matrix; the elements of the normal-sample likelihood matrix and the disease-sample likelihood matrix are gene likelihoods.
A normal-sample distribution function construction module 300, for constructing a normal-sample distribution function from all the normal-sample likelihood matrices.
A significantly differential gene set determination module 400, for substituting each element of each disease-sample likelihood matrix into the normal-sample distribution function to determine the significantly differential gene set of each disease sample.
A network marker determination module 500, for mapping the significantly differential gene set of each disease sample onto a protein-protein interaction network to determine the network markers of each disease sample.
A disease subtype classification module 600, for classifying the disease samples into different subtypes according to the network markers of each disease sample and known prior data on cancer subtypes.
The gene expression data matrix conversion module 200 specifically comprises:
A gene likelihood calculation model construction unit, for constructing a gene likelihood calculation model using the probability density function; the expression of the gene likelihood calculation model is [formula missing in the source text], where $\lambda_i$ denotes the likelihood of gene $i$; [symbol missing] denotes the expression level of the $i$-th gene in the $j$-th sample; $f_i^1$ denotes the normal distribution curve of gene $i$ over the disease samples; and $f_i^2$ denotes the normal distribution curve of gene $i$ over the normal samples.
A conversion unit, for converting, according to the gene likelihood calculation model, the gene expression data matrix of each normal sample into a normal-sample likelihood matrix and the gene expression data matrix of each disease sample into a disease-sample likelihood matrix.
The significantly differential gene set determination module 400 specifically comprises:
A probability value calculation unit, for substituting each element of the disease-sample likelihood matrix into the normal-sample distribution function and calculating the probability value of each gene in each disease sample.
A judging unit, for judging whether the probability value is less than or equal to a set threshold.
A significantly differential gene set determination unit, for determining the gene corresponding to a probability value less than or equal to the set threshold as a significantly differential gene of the disease sample.
Starting from the fact that differences in factors such as pathogenic cause make a disease differ between patients, the invention classifies the disease to obtain its subtypes, which helps improve diagnosis and treatment, and to this end proposes a method and system for determining cancer network markers based on a probabilistic model. This plays an important role in obtaining cancer network markers and in classifying cancer subtypes. Compared with the traditional approach in which all disease samples share the same cancer markers, the invention can obtain the network markers specific to each disease sample and can identify the cancer subtype to which each disease sample belongs, so that precise diagnosis and treatment of the disease are better achieved. This is of great significance both for screening the mRNAs that play a key role in disease development and for improving the diagnosis and treatment of cancer.
A specific example is used herein to explain the principle and implementation of the invention. The description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. Those skilled in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.
Claims (10)
1. A method for determining cancer network markers based on a probability model, characterized in
that the method comprises:
obtaining gene expression data matrices of a plurality of normal samples and a plurality of disease
samples, wherein the elements of the gene expression data matrices are gene expression levels;
using a probability density function, converting the gene expression data matrices of all the normal
samples into normal-sample likelihood matrices, and converting the gene expression data matrices of
all the disease samples into disease-sample likelihood matrices, wherein the elements of the
normal-sample likelihood matrices and the disease-sample likelihood matrices are gene likelihoods;
constructing a normal-sample distribution function from all the normal-sample likelihood matrices;
substituting each element of each disease-sample likelihood matrix in turn into the normal-sample
distribution function to determine the significantly differential gene set of each disease sample;
mapping the significantly differential gene set of each disease sample in turn onto a
protein-protein interaction network to determine the network marker of each disease sample.
2. The method for determining cancer network markers according to claim 1, characterized in that the
method further comprises:
classifying the disease samples into different subtypes according to the network marker of each
disease sample and known prior data on cancer subtypes.
3. The method for determining cancer network markers according to claim 1, characterized in that
said using a probability density function to convert the gene expression data matrices of all the
normal samples into normal-sample likelihood matrices and the gene expression data matrices of all
the disease samples into disease-sample likelihood matrices specifically comprises:
constructing a gene likelihood calculation model using the probability density function, wherein the
expression of the gene likelihood calculation model is [formula not reproduced in this text], where
λ_i denotes the likelihood of gene i; [symbol not reproduced in this text] denotes the expression
level of the i-th gene in the j-th sample; f_i^1 denotes the normal distribution curve of gene i in
the disease samples; and f_i^2 denotes the normal distribution curve of gene i in the normal samples;
according to the gene likelihood calculation model, converting the gene expression data matrices of
all the normal samples into normal-sample likelihood matrices, and converting the gene expression
data matrices of all the disease samples into disease-sample likelihood matrices.
4. The method for determining cancer network markers according to claim 1, characterized in that
said constructing a normal-sample distribution function from all the normal-sample likelihood
matrices specifically comprises:
calculating the mean and variance of each gene likelihood from all the normal-sample likelihood
matrices;
constructing, from the mean and variance of each gene likelihood, the normal distribution function
of that gene likelihood in the normal samples.
5. The method for determining cancer network markers according to claim 1, characterized in that
said substituting each element of each disease-sample likelihood matrix in turn into the
normal-sample distribution function to determine the significantly differential gene set of each
disease sample specifically comprises:
substituting each element of the disease-sample likelihood matrix in turn into the normal-sample
distribution function, and calculating the probability value of each gene in each disease sample;
judging whether the probability value is less than or equal to a set threshold;
if so, determining the gene corresponding to the probability value that is less than or equal to the
set threshold as a significantly differential gene of the disease sample.
6. The method for determining cancer network markers according to claim 1, characterized in that
said mapping the significantly differential gene set of each disease sample in turn onto a
protein-protein interaction network to determine the network marker of each disease sample
specifically comprises:
mapping the significantly differential gene set of the disease sample onto the protein-protein
interaction network, and, according to the interaction relationships between genes, determining the
five selected genes with the largest number of connected genes, together with the first-order
neighbor nodes of those five genes, as the network marker of the disease sample.
7. A system for determining cancer network markers based on a probability model, characterized in
that the system comprises:
a gene expression data matrix acquisition module, configured to obtain gene expression data matrices
of a plurality of normal samples and a plurality of disease samples, wherein the elements of the
gene expression data matrices are gene expression levels;
a gene expression data matrix conversion module, configured to use a probability density function to
convert the gene expression data matrices of all the normal samples into normal-sample likelihood
matrices and the gene expression data matrices of all the disease samples into disease-sample
likelihood matrices, wherein the elements of the normal-sample likelihood matrices and the
disease-sample likelihood matrices are gene likelihoods;
a normal-sample distribution function construction module, configured to construct a normal-sample
distribution function from all the normal-sample likelihood matrices;
a significantly differential gene set determination module, configured to substitute each element of
each disease-sample likelihood matrix in turn into the normal-sample distribution function to
determine the significantly differential gene set of each disease sample;
a network marker determination module, configured to map the significantly differential gene set of
each disease sample in turn onto a protein-protein interaction network to determine the network
marker of each disease sample.
8. The system for determining cancer network markers according to claim 7, characterized in that the
system further comprises:
a disease subtype classification module, configured to classify the disease samples into different
subtypes according to the network marker of each disease sample and known prior data on cancer
subtypes.
9. The system for determining cancer network markers according to claim 7, characterized in that the
gene expression data matrix conversion module specifically comprises:
a gene likelihood calculation model construction unit, configured to construct a gene likelihood
calculation model using the probability density function, wherein the expression of the gene
likelihood calculation model is [formula not reproduced in this text], where λ_i denotes the
likelihood of gene i; [symbol not reproduced in this text] denotes the expression level of the i-th
gene in the j-th sample; f_i^1 denotes the normal distribution curve of gene i in the disease
samples; and f_i^2 denotes the normal distribution curve of gene i in the normal samples;
a conversion unit, configured to convert, according to the gene likelihood calculation model, the
gene expression data matrices of all the normal samples into normal-sample likelihood matrices and
the gene expression data matrices of all the disease samples into disease-sample likelihood matrices.
10. The system for determining cancer network markers according to claim 7, characterized in that
the significantly differential gene set determination module specifically comprises:
a probability value calculation unit, configured to substitute each element of the disease-sample
likelihood matrix in turn into the normal-sample distribution function and to calculate the
probability value of each gene in each disease sample;
a judging unit, configured to judge whether the probability value is less than or equal to a set
threshold;
a significantly differential gene determination unit, configured to determine the gene corresponding
to a probability value that is less than or equal to the set threshold as a significantly
differential gene of the disease sample.
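As an illustration of the steps recited in claims 3 to 6, the following Python sketch shows one possible reading of them; it is not the patented implementation. The likelihood formula is not reproduced in this text, so the log ratio of the disease-sample density to the normal-sample density is assumed here. The two-sided tail probability, the default threshold of 0.05, the restriction of the network to significant genes, and all function names are likewise assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def fit_normal(values):
    """Fit a normal distribution from the sample mean and standard deviation.

    Needs at least two values, and they must not all be equal.
    """
    return NormalDist(mean(values), stdev(values))


def likelihood_matrix(expr, f_disease, f_normal):
    """Convert expression values expr[i][j] (gene i, sample j) to gene likelihoods.

    ASSUMPTION: likelihood = log f_disease_i(x) - log f_normal_i(x).
    The source text does not reproduce the actual formula.
    """
    tiny = 1e-300  # floor that avoids log(0) far out in the tails
    return [
        [math.log(max(f_disease[i].pdf(x), tiny))
         - math.log(max(f_normal[i].pdf(x), tiny)) for x in row]
        for i, row in enumerate(expr)
    ]


def significant_genes(normal_lik, disease_lik, threshold=0.05):
    """Return, for each disease sample, the set of significant gene indices.

    One normal distribution per gene is fitted to the normal-sample likelihoods
    (claim 4). ASSUMPTION: the "probability value" of claim 5 is taken to be
    the two-sided tail probability under that distribution.
    """
    dists = [fit_normal(row) for row in normal_lik]
    result = []
    for j in range(len(disease_lik[0])):
        genes = set()
        for i, dist in enumerate(dists):
            cdf = dist.cdf(disease_lik[i][j])
            p_value = 2 * min(cdf, 1 - cdf)
            if p_value <= threshold:
                genes.add(i)
        result.append(genes)
    return result


def network_marker(sig_genes, ppi_edges, top_k=5):
    """Pick the top_k most connected significant genes plus their neighbours.

    ASSUMPTION: only interactions between two significant genes are counted,
    and ties in degree are broken by gene name.
    """
    adj = {g: set() for g in sig_genes}
    for a, b in ppi_edges:
        if a in adj and b in adj and a != b:
            adj[a].add(b)
            adj[b].add(a)
    hubs = sorted(adj, key=lambda g: (-len(adj[g]), g))[:top_k]
    marker = set(hubs)
    for hub in hubs:
        marker |= adj[hub]  # first-order neighbours of each hub
    return marker
```

For example, with the significant genes A, B, C and D and the interactions A-B, A-C, A-X and D-Y, calling `network_marker({"A", "B", "C", "D"}, [("A", "B"), ("A", "C"), ("A", "X"), ("D", "Y")], top_k=1)` selects A as the only hub and returns the set containing A, B and C, because X and Y are not significant genes under the assumption above.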
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810920673.7A CN109101783B (en) | 2018-08-14 | 2018-08-14 | Cancer network marker determination method and system based on probability model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109101783A (en) | 2018-12-28 |
CN109101783B (en) | 2020-09-04 |
Family
ID=64849535
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810920673.7A Expired - Fee Related CN109101783B (en) | 2018-08-14 | 2018-08-14 | Cancer network marker determination method and system based on probability model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109101783B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103268431A (en) * | 2013-05-21 | 2013-08-28 | 中山大学 | Cancer subtype biomarker detection system based on Student's t-distribution |
CN103473416A (en) * | 2013-09-13 | 2013-12-25 | 中国人民解放军国防科学技术大学 | Protein-protein interaction model building method and device |
WO2013192504A1 (en) * | 2012-06-22 | 2013-12-27 | The Trustees Of Dartmouth College | Novel vista-ig constructs and the use of vista-ig for treatment of autoimmune, allergic and inflammatory disorders |
CN105117617A (en) * | 2015-08-26 | 2015-12-02 | 大连海事大学 | Method for screening environmentally sensitive biomolecules |
CN106295246A (en) * | 2016-08-07 | 2017-01-04 | 吉林大学 | Finding tumor-related lncRNAs and predicting their function |
CN107025387A (en) * | 2017-03-29 | 2017-08-08 | 电子科技大学 | Method for identifying cancer biomarkers |
CN108181471A (en) * | 2017-12-15 | 2018-06-19 | 新疆医科大学第附属医院 | Detection marker for aortic dissection and marker evaluation method |
US20180211013A1 (en) * | 2017-01-25 | 2018-07-26 | International Business Machines Corporation | Patient Communication Priority By Compliance Dates, Risk Scores, and Organizational Goals |
CN108345768A (en) * | 2017-01-20 | 2018-07-31 | 深圳华大生命科学研究院 | Method and marker combination for determining infant intestinal flora maturity |
Non-Patent Citations (4)
Title |
---|
JOSE M. PENA et al.: "Learning Gaussian Graphical Models of Gene Networks with False Discovery Rate Control", 《EUROPEAN CONFERENCE ON EVOLUTIONARY COMPUTATION, MACHINE LEARNING AND DATA MINING IN BIOINFORMATICS》 * |
JUNJIE SU et al.: "Accurate and Reliable Cancer Classification Based on Probabilistic Inference of Pathway Activity", 《PLOS ONE》 * |
XIAOPING LIU et al.: "Personalized characterization of diseases using sample-specific networks", 《NUCLEIC ACIDS RESEARCH》 * |
高云朝: "Selection of serum tumor markers in the diagnosis of pancreatic cancer", 《上海医学》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110010204A (en) * | 2019-04-04 | 2019-07-12 | 中南大学 | Prognostic biomarker identification method based on fusion network and multi-scoring strategy |
CN110010204B (en) * | 2019-04-04 | 2022-12-02 | 中南大学 | Fusion network and multi-scoring strategy based prognostic biomarker identification method |
CN110444248A (en) * | 2019-07-22 | 2019-11-12 | 山东大学 | Cancer biomolecule marker screening method and system based on network topology parameters |
CN110444248B (en) * | 2019-07-22 | 2021-09-24 | 山东大学 | Cancer biomolecule marker screening method and system based on network topology parameters |
CN110797083A (en) * | 2019-09-18 | 2020-02-14 | 中南大学 | Multi-network-based biomarker identification method |
CN110797083B (en) * | 2019-09-18 | 2023-04-18 | 中南大学 | Biomarker identification method based on multiple networks |
Also Published As
Publication number | Publication date |
---|---|
CN109101783B (en) | 2020-09-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200904; Termination date: 20210814 |