CN108647489A - Method and system for screening disease drug targets and target combinations - Google Patents
- Publication number
- CN108647489A CN201810461277.2A
- Authority
- CN
- China
- Prior art keywords
- protein
- target
- gene
- value
- testing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
Abstract
The present invention discloses a method and system for screening disease drug targets and drug target combinations. The method includes: building an autoencoder from differential expression data of proteins between disease cell lines and normal tissue; computing the knockout effect of genes with the autoencoder and building a knockout network; predicting disease-related proteins from the knockout network, the related proteins being drug targets; and predicting combinations of disease-related proteins from the knockout network, the combinations of related proteins being drug target combinations. With this method or system, disease-related proteins and the combined effects of proteins can be predicted at the same time.
Description
Technical field
The present invention relates to the field of deep neural networks, and in particular to a method and system for screening disease drug targets and target combinations.
Background art
With advances in biological measurement techniques, high-throughput data related to diseases and drugs continue to accumulate, and the understanding of certain diseases and of disease-related genes/proteins continues to deepen. Targeted drug therapy is currently considered superior to conventional drug therapy in terms of safety and adverse drug reactions (ADR), so targeted drugs are gradually becoming the main direction of disease treatment and drug development. In such drug development work, the most critical step is determining the drug target, and the key to determining the drug target is prioritizing disease-related proteins.
Many current bioinformatics methods for drug design can combine various types of data, such as protein-protein interactions (PPI), genomic variation, gene/protein expression and functional annotation, to screen disease-related genes/proteins; among them, some methods that use biological networks perform comparatively well. Some methods use the disease-related biological process information contained in the protein interaction network to predict disease-related genes/proteins; some combine the protein interaction network with other omics data, such as gene/protein expression profiles and gene mutation information, to infer new related genes; others screen by network topology. These biological network methods generally follow the "guilt by association" (GBA) principle, that is, genes/proteins or phenotypes closely related to known disease genes/proteins are more likely to be related to the disease, and this kind of prediction is likely to introduce biased results. Some methods integrate data from multiple samples to build the network, which can also overlook the tissue and condition specificity present in the network.
Existing methods for predicting disease targets based on protein-protein interaction networks typically follow these steps. One: collect a large amount of protein interaction data and curate it into a non-redundant set with incorrect links removed. Two: collect gene expression profiles of normal tissue and diseased tissue, and compute the differential expression values between the two types of tissue. Three: compute the sum of the differential expression values of all proteins that interact with the selected protein, and use this value as the criterion for prioritizing candidate genes.
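The three prior-art steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `neighbor_sum_scores`, the protein labels `A`, `B`, `C` and the numeric values are all made up for the example.

```python
from collections import defaultdict

def neighbor_sum_scores(interactions, diff_expr):
    """Score each protein by the summed differential expression of its partners."""
    # Step one: build a non-redundant, undirected neighbor map (self-loops dropped).
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Step three: sum the differential expression values of all interaction partners.
    return {p: sum(diff_expr.get(q, 0.0) for q in partners)
            for p, partners in neighbors.items()}

# Hypothetical interaction list with a duplicate and a self-loop to be cleaned up.
ppi = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "C")]
# Step two is assumed to be done already: these are made-up differential expression values.
diff = {"A": 0.5, "B": 2.0, "C": 1.5}
scores = neighbor_sum_scores(ppi, diff)
```

Because the score of a protein depends only on its direct neighbors, a candidate ranks highly simply by sitting next to strongly changed proteins, which is the guilt-by-association bias noted in the background.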
Neural networks have strong nonlinear fitting ability, are easy to implement on a computer, and offer strong robustness, memory capacity, nonlinear mapping ability and self-learning ability; they are currently an important means of deep learning and have broad application prospects. Here we propose a deep neural network model based on the autoencoder (auto-encoder) structure to learn the specificity of protein interactions in diseased tissue, and use the trained network to screen disease-related proteins and protein combinations.
Summary of the invention
The object of the present invention is to provide a method and system for screening disease drug targets and drug target combinations. It proposes a deep learning method based on an autoencoder that can fully learn the specificity of protein interactions in cancer multi-omics data; the network obtained after deep learning training can effectively screen cancer-related drug targets and target combinations.
To achieve the above object, the present invention provides the following scheme:
A method for screening disease drug targets and drug target combinations, including:
building an autoencoder from differential expression data of proteins between disease cell lines and normal tissue;
computing the knockout effect of genes with the autoencoder, and building a knockout network;
predicting disease-related proteins from the knockout network, the related proteins being drug targets;
predicting combinations of disease-related proteins from the knockout network, the combinations of related proteins being drug target combinations.
Optionally, computing the knockout effect of genes with the autoencoder and building the knockout network specifically includes:
building a deep learning network model from the autoencoder;
given a differential expression profile, inputting the differential expression profile into the deep learning network model to obtain differential expression values, denoted as the background output B;
setting a difference value threshold, and selecting the genes in the differential expression profile whose difference value is greater than the difference value threshold, denoted as highly expressed genes;
sorting all the highly expressed genes by difference value in descending order, assigning the highly expressed gene with the largest difference value the smallest difference value in the differential expression profile, and assigning new difference values to all the highly expressed genes in turn;
forming a new differential expression profile from the highly expressed genes with new difference values and the genes remaining in the differential expression profile after all the highly expressed genes are removed;
inputting the new differential expression profile into the deep learning network model to obtain a second output K;
setting a comparison threshold;
computing the difference between the second output K of every highly expressed gene and the background output B of that highly expressed gene, to obtain a comparison difference;
denoting every highly expressed gene whose comparison difference is greater than the comparison threshold as a knockout gene;
building the knockout network from all the knockout genes.
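The steps above can be read as an in-silico perturbation experiment. The sketch below is one possible reading, not the patented procedure itself, and it rests on stated assumptions: each highly expressed gene is knocked out one at a time, the absolute value of K minus B is compared with the comparison threshold, and a gene's effect on itself is ignored. The function name `simulate_knockouts` is hypothetical, and the linear map `W` merely stands in for the trained deep learning network model.

```python
import numpy as np

def simulate_knockouts(model, profile, diff_threshold, compare_threshold):
    """Return {knocked_gene_index: {affected_gene_index: K - B}} for one profile."""
    profile = np.asarray(profile, dtype=float)
    background = model(profile)                       # background output B
    high = np.flatnonzero(profile > diff_threshold)   # highly expressed genes
    floor = profile.min()                             # smallest value in the profile
    effects = {}
    # Visit the highly expressed genes from the largest difference value down.
    for g in high[np.argsort(-profile[high])]:
        perturbed = profile.copy()
        perturbed[g] = floor                          # simulated knockout of gene g
        delta = model(perturbed) - background         # second output K minus B
        # ASSUMPTION: compare |K - B| with the threshold and skip the gene itself.
        hit = np.flatnonzero(np.abs(delta) > compare_threshold)
        hit = hit[hit != g]
        if hit.size:
            effects[int(g)] = {int(t): float(delta[t]) for t in hit}
    return effects

# Stand-in "model": a fixed linear map instead of a trained autoencoder stack.
W = np.array([[1.0, 0.0, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, -0.5, 1.0]])
effects = simulate_knockouts(lambda x: W @ x, [3.0, 0.2, -1.0],
                             diff_threshold=1.0, compare_threshold=0.5)
```

In this toy run only gene 0 exceeds the difference value threshold, and lowering its input changes the model output for gene 1, so gene 0 is recorded as a knockout gene with one affected gene.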
Optionally, building the knockout network from the knockout genes is specifically:
using each knockout gene as a source node of the knockout network;
connecting the source node by an edge to each gene affected by the knockout gene;
using the comparison difference as the weight of the edge.
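A weighted directed graph of this kind can be held in two plain dictionaries, one for outgoing edges and one for incoming edges, so that both the genes a knockout affects and the knockouts that affect a gene are cheap to look up. The sketch below assumes the knockout effects are already available as a nested mapping; the name `build_knockout_network`, the gene labels and the weights are illustrative only.

```python
from collections import defaultdict

def build_knockout_network(effects):
    """Turn {source: {affected: weight}} into out-edge and in-edge maps."""
    out_edges = defaultdict(dict)   # source node -> {target node: weight}
    in_edges = defaultdict(dict)    # target node -> {source node: weight}
    for source, affected in effects.items():
        for target, weight in affected.items():
            out_edges[source][target] = weight
            in_edges[target][source] = weight
    return dict(out_edges), dict(in_edges)

# Made-up comparison differences for three hypothetical genes.
effects = {"G1": {"G2": -3.2, "G3": 0.9}, "G2": {"G3": -1.1}}
out_edges, in_edges = build_knockout_network(effects)
```

Keeping the reverse map alongside the forward map trades a little memory for direct access to the source nodes of any gene.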
Optionally, predicting disease-related proteins from the knockout network specifically includes:
setting known drug targets as marker genes, and setting the test proteins and a relevance threshold;
obtaining, from the knockout network, the target-node proteins and source-node proteins connected to the test protein;
distinguishing, according to the target-node proteins and the marker genes, the target-node proteins with an inhibitory effect from the target-node proteins with an activating effect;
computing the sum of the weights of the edges connecting the inhibitory target-node proteins to the test protein, denoted as the first weight sum;
computing the sum of the absolute values of the weights of the edges connecting the activating target-node proteins to the test protein, denoted as the first absolute-value sum;
computing the sum of all positive edge weights of the source-node proteins, denoted as the second weight sum, and the sum of the absolute values of all negative edge weights, denoted as the second absolute-value sum;
computing the relevance score of the test protein from the first weight sum, the first absolute-value sum, the second weight sum and the second absolute-value sum;
selecting, among all test proteins, those whose relevance score is higher than the relevance threshold as disease-related proteins.
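The four sums can be computed directly from the out-edge and in-edge maps of a knockout network. The sketch below is illustrative and makes two assumptions that the text does not confirm: an edge weight is taken to be K minus B, so that a positive weight is read as an inhibitory effect and a negative weight as an activating effect, and the four sums are combined by plain addition because the text gives no formula for the relevance score. All names and values are hypothetical.

```python
def relevance_sums(protein, out_edges, in_edges, markers):
    """Compute the four sums named in the text for one test protein."""
    # Edges from the test protein to its target-node proteins that are marker genes.
    to_markers = {t: w for t, w in out_edges.get(protein, {}).items() if t in markers}
    # ASSUMPTION: weight = K - B, so positive means inhibitory, negative means activating.
    first_weight_sum = sum(w for w in to_markers.values() if w > 0)
    first_abs_sum = sum(-w for w in to_markers.values() if w < 0)
    # Edges arriving at the test protein from its source-node proteins.
    incoming = in_edges.get(protein, {}).values()
    second_weight_sum = sum(w for w in incoming if w > 0)
    second_abs_sum = sum(-w for w in incoming if w < 0)
    return first_weight_sum, first_abs_sum, second_weight_sum, second_abs_sum

def relevance_score(sums, combine=sum):
    # ASSUMPTION: plain addition is a placeholder; pass another `combine` as needed.
    return combine(sums)

out_edges = {"P": {"M1": 2.0, "M2": -1.5, "X": 4.0}}
in_edges = {"P": {"S1": 0.5, "S2": -0.25}}
sums = relevance_sums("P", out_edges, in_edges, markers={"M1", "M2"})
score = relevance_score(sums)
is_related = score > 3.0            # hypothetical relevance threshold
```

The edge to `X` is ignored in the first two sums because `X` is not a marker gene in this toy setup.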
Optionally, predicting combinations of disease-related proteins from the knockout network specifically includes:
collecting known protein combinations with lethal and combined effects as positive samples, and randomly generating, based on the positive samples, negative samples numbering 10 times the positive samples;
selecting any target protein in the knockout network, and screening all target-node proteins and source-node proteins directly connected to the target protein;
determining whether the target protein in the knockout network and its target-node proteins and source-node proteins are present in the positive samples or in the negative samples;
if the target protein in the knockout network and its target-node proteins and source-node proteins are present in the positive samples, adding the weights of the edges between the target protein and its target-node proteins and source-node proteins and taking the absolute value, to obtain the absolute value of the first combined weight sum;
adding the absolute values of the first combined weight sums of all target proteins in the positive samples, to obtain the absolute value of the positive combined weight sum;
if the target protein in the knockout network and its target-node proteins and source-node proteins are present in the negative samples, adding the weights of the edges between the target protein and its target-node proteins and source-node proteins and taking the absolute value, to obtain the absolute value of the second combined weight sum;
adding the absolute values of the second combined weight sums of all target proteins in the negative samples, to obtain the absolute value of the negative combined weight sum;
assigning the target protein a value of 1, -1 or 0 according to the absolute value of the first combined weight sum and the absolute value of the second combined weight sum, to obtain the target protein assignment;
selecting a first test protein and a second test protein;
setting a first detection threshold and a second detection threshold;
computing the proportion of proteins jointly affected by the first test protein and the second test protein, denoted as the jointly affected protein ratio;
computing, according to the target protein assignment, the proportion of assessed proteins jointly affected by the first test protein and the second test protein, denoted as the jointly affected assessed protein ratio, where the assessed proteins are the proteins assigned a value of 1 or -1;
determining whether the jointly affected protein ratio is greater than the first detection threshold and, at the same time, whether the jointly affected assessed protein ratio is greater than the second detection threshold;
if so, the combination of the first test protein and the second test protein is a combination of disease-related proteins.
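The final two-threshold test on a pair of test proteins might look like the sketch below. The text does not define how the two ratios are normalized, so this example assumes a Jaccard-style ratio: proteins affected by both test proteins divided by proteins affected by either. The function name `is_related_combination`, the labels and the thresholds are hypothetical, and the 1, -1 or 0 assignment is taken as given.

```python
def is_related_combination(p1, p2, out_edges, assignment, thr1, thr2):
    """Apply the two detection thresholds to one pair of test proteins."""
    t1 = set(out_edges.get(p1, {}))
    t2 = set(out_edges.get(p2, {}))
    union, shared = t1 | t2, t1 & t2
    if not union:
        return False
    # ASSUMPTION: "ratio" is shared targets over all targets of either protein.
    shared_ratio = len(shared) / len(union)
    # Assessed proteins are those assigned 1 or -1 (not 0).
    assessed_union = {t for t in union if assignment.get(t, 0) != 0}
    assessed_shared = {t for t in shared if assignment.get(t, 0) != 0}
    assessed_ratio = len(assessed_shared) / len(assessed_union) if assessed_union else 0.0
    return shared_ratio > thr1 and assessed_ratio > thr2

out_edges = {"A": {"x": 1.0, "y": -1.0, "z": 2.0},
             "B": {"y": 1.0, "z": 1.0, "w": 3.0}}
assignment = {"x": 0, "y": 1, "z": -1, "w": 0}
accepted = is_related_combination("A", "B", out_edges, assignment, thr1=0.4, thr2=0.8)
rejected = is_related_combination("A", "B", out_edges, assignment, thr1=0.6, thr2=0.8)
```

Here the pair shares two of four affected proteins and both assessed proteins, so it passes with a first threshold of 0.4 and fails with 0.6.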
A system for screening disease drug targets and drug target combinations, including:
an autoencoding module, for building an autoencoder from differential expression data of proteins between disease cell lines and normal tissue;
a knockout network building module, for computing the knockout effect of genes with the autoencoder and building a knockout network;
a related protein prediction module, for predicting disease-related proteins from the knockout network, the related proteins being drug targets;
a protein combination prediction module, for predicting combinations of disease-related proteins from the knockout network, the combinations of related proteins being drug target combinations.
Optionally, the knockout network building module specifically includes:
a network model construction unit, for building a deep learning network model from the autoencoder;
a background output computing unit, for inputting a given differential expression profile into the deep learning network model to obtain differential expression values, denoted as the background output B;
a highly expressed gene acquiring unit, for setting a difference value threshold and selecting the genes in the differential expression profile whose difference value is greater than the difference value threshold, denoted as highly expressed genes;
a highly expressed gene assignment unit, for sorting all the highly expressed genes by difference value in descending order, assigning the highly expressed gene with the largest difference value the smallest difference value in the differential expression profile, and assigning new difference values to all the highly expressed genes in turn;
a new differential expression profile construction unit, for forming a new differential expression profile from the highly expressed genes with new difference values and the genes remaining in the differential expression profile after all the highly expressed genes are removed;
a second output computing unit, for inputting the new differential expression profile into the deep learning network model to obtain a second output K;
a comparison threshold setting unit, for setting a comparison threshold;
a comparison difference computing unit, for computing the difference between the second output K of every highly expressed gene and the background output B of that highly expressed gene, to obtain a comparison difference;
a knockout gene acquiring unit, for denoting every highly expressed gene whose comparison difference is greater than the comparison threshold as a knockout gene;
a network building unit, for building the knockout network from all the knockout genes.
Optionally, the network building unit is specifically configured for:
using each knockout gene as a source node of the knockout network;
connecting the source node by an edge to each gene affected by the knockout gene;
using the comparison difference as the weight of the edge.
Optionally, the related protein prediction module specifically includes:
a gene setting unit, for setting known drug targets as marker genes, and setting the test proteins and a relevance threshold;
a target-node and source-node protein acquiring unit, for obtaining, from the knockout network, the target-node proteins and source-node proteins connected to the test protein;
a target-node protein discrimination unit, for distinguishing, according to the target-node proteins and the marker genes, the target-node proteins with an inhibitory effect from the target-node proteins with an activating effect;
an inhibitory target-node protein computing unit, for computing the sum of the weights of the edges connecting the inhibitory target-node proteins to the test protein, denoted as the first weight sum;
an activating target-node protein computing unit, for computing the sum of the absolute values of the weights of the edges connecting the activating target-node proteins to the test protein, denoted as the first absolute-value sum;
a source-node protein computing unit, for computing the sum of all positive edge weights of the source-node proteins, denoted as the second weight sum, and the sum of the absolute values of all negative edge weights, denoted as the second absolute-value sum;
a relevance score computing unit, for computing the relevance score of the test protein from the first weight sum, the first absolute-value sum, the second weight sum and the second absolute-value sum;
a disease-related protein determination unit, for selecting, among all test proteins, those whose relevance score is higher than the relevance threshold as disease-related proteins.
Optionally, the protein combination prediction module specifically includes:
a sample collection unit, for collecting known protein combinations with lethal and combined effects as positive samples, and randomly generating, based on the positive samples, negative samples numbering 10 times the positive samples;
a target protein screening unit, for selecting any target protein in the knockout network and screening all target-node proteins and source-node proteins directly connected to the target protein;
a first judging unit, for determining whether the target protein in the knockout network and its target-node proteins and source-node proteins are present in the positive samples or in the negative samples;
a first combined weight computing unit, for, if the target protein in the knockout network and its target-node proteins and source-node proteins are present in the positive samples, adding the weights of the edges between the target protein and its target-node proteins and source-node proteins and taking the absolute value, to obtain the absolute value of the first combined weight sum;
a positive combined weight computing unit, for adding the absolute values of the first combined weight sums of all target proteins in the positive samples, to obtain the absolute value of the positive combined weight sum;
a second combined weight computing unit, for, if the target protein in the knockout network and its target-node proteins and source-node proteins are present in the negative samples, adding the weights of the edges between the target protein and its target-node proteins and source-node proteins and taking the absolute value, to obtain the absolute value of the second combined weight sum;
a negative combined weight computing unit, for adding the absolute values of the second combined weight sums of all target proteins in the negative samples, to obtain the absolute value of the negative combined weight sum;
a target protein assignment unit, for assigning the target protein a value of 1, -1 or 0 according to the absolute value of the first combined weight sum and the absolute value of the second combined weight sum, to obtain assigned target proteins;
a protein group selection unit, for selecting a first test protein and a second test protein as a test protein group;
a threshold setting unit, for setting a first detection threshold and a second detection threshold;
a first protein ratio computing unit, for computing the proportion of proteins jointly affected by the first test protein and the second test protein, denoted as the jointly affected protein ratio;
a second protein ratio computing unit, for computing, according to the assigned target proteins, the proportion of assessed proteins jointly affected by the first test protein and the second test protein, denoted as the jointly affected assessed protein ratio;
a second judging unit, for determining whether the jointly affected protein ratio is greater than the first detection threshold and, at the same time, whether the jointly affected assessed protein ratio is greater than the second detection threshold;
a protein combination determination unit, for determining, if so, that the combination of the first test protein and the second test protein is a combination of disease-related proteins, and otherwise that it is not a combination of disease-related proteins.
According to the specific embodiments provided by the present invention, the present invention discloses the following technical effects:
In the present invention, an autoencoder is built from differential expression data of proteins between disease cell lines and normal tissue; the knockout effect of genes is computed with the autoencoder and a knockout network is built; disease-related proteins are predicted from the knockout network, the related proteins being drug targets; and combinations of disease-related proteins are predicted from the knockout network, the combinations of related proteins being drug target combinations. A method of simulating the knockout effect based on a deep neural network is proposed: in the deep model, the input value of each protein is changed, and the resulting difference in the output is observed to assess the effect of that protein on the disease. Using only one network model, the present invention can capture the feature structure and inherent regularities implicit in complex data, and can predict disease-related proteins and the combined effects of proteins at the same time, unifying the two kinds of prediction problem in theory and in implementation.
Description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of the method for screening disease drug targets and drug target combinations according to an embodiment of the present invention;
Fig. 2 is a flowchart of the method for predicting disease-related proteins according to an embodiment of the present invention;
Fig. 3 is a module diagram of the system for screening disease drug targets and drug target combinations according to an embodiment of the present invention;
Fig. 4 is a structural diagram of the related protein prediction module according to an embodiment of the present invention;
Fig. 5 is a flowchart of predicting disease-related proteins and protein combinations with the deep neural network model according to an embodiment of the present invention;
Fig. 6 is the network of cancer-related protein combinations predicted by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of the method for screening disease drug targets and drug target combinations according to an embodiment of the present invention. Referring to Fig. 1, a method for screening disease drug targets and drug target combinations includes:
Step 101: building an autoencoder from differential expression data of proteins between disease cell lines and normal tissue;
Step 102: computing the knockout effect of genes with the autoencoder, and building a knockout network;
Step 103: predicting disease-related proteins from the knockout network, the related proteins being drug targets;
Step 104: predicting combinations of disease-related proteins from the knockout network, the combinations of related proteins being drug target combinations.
The above method can capture the feature structure and inherent regularities implicit in complex data, and can predict disease-related proteins and the combined effects of proteins at the same time, unifying the two kinds of prediction problem in theory and in implementation.
The basic unit of the deep neural network model of the present invention is the autoencoder. The internal structure of the autoencoder corresponds to the protein-protein interaction network, and differential expression data are used during training, so the specificity of protein interactions can be learned from the differences between tissues. In addition, combining multiple autoencoders can establish a link between two proteins that are far apart in the interaction network.
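One way to picture an autoencoder whose internal structure mirrors the interaction network is a layer whose weights are zeroed wherever two proteins do not interact. The forward-pass sketch below is only an illustration under that assumption: it uses random, untrained weights and a hypothetical four-protein chain, and it is not the architecture or training procedure of the patent. It shows why stacking such layers lets a perturbation reach proteins that are two interaction steps away.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-protein chain P0 - P1 - P2 - P3, with self-connections.
adjacency = np.array([[1, 1, 0, 0],
                      [1, 1, 1, 0],
                      [0, 1, 1, 1],
                      [0, 0, 1, 1]], dtype=float)

def masked_layer(x, weights, mask):
    """One layer whose weights are zeroed wherever two proteins do not interact."""
    return np.tanh((weights * mask) @ x)

w1 = rng.normal(size=(4, 4))        # untrained, random weights for illustration
w2 = rng.normal(size=(4, 4))

def two_layer_output(x):
    return masked_layer(masked_layer(x, w1, adjacency), w2, adjacency)

base = np.zeros(4)
bumped = base.copy()
bumped[0] = 1.0                     # perturb only P0

# Which outputs react to the perturbation after one layer and after two layers?
one_layer_changed = ~np.isclose(masked_layer(bumped, w1, adjacency),
                                masked_layer(base, w1, adjacency))
two_layer_changed = ~np.isclose(two_layer_output(bumped), two_layer_output(base))
```

After one masked layer the change at P0 can only reach P0 and its direct neighbor P1; after two layers it also reaches P2, while P3, three steps away, remains untouched.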
Here, computing the knockout effect of genes with the autoencoder and building the knockout network specifically includes:
building a deep learning network model from the autoencoder;
given a differential expression profile, inputting the differential expression profile into the deep learning network model to obtain differential expression values, denoted as the background output B;
setting a difference value threshold, and selecting the genes in the differential expression profile whose difference value is greater than the difference value threshold, denoted as highly expressed genes;
sorting all the highly expressed genes by difference value in descending order, assigning the highly expressed gene with the largest difference value the smallest difference value in the differential expression profile, and assigning new difference values to all the highly expressed genes in turn;
forming a new differential expression profile from the highly expressed genes with new difference values and the genes remaining in the differential expression profile after all the highly expressed genes are removed;
inputting the new differential expression profile into the deep learning network model to obtain a second output K;
setting a comparison threshold;
computing the difference between the second output K of every highly expressed gene and the background output B of that highly expressed gene, to obtain a comparison difference;
denoting every highly expressed gene whose comparison difference is greater than the comparison threshold as a knockout gene;
building the knockout network from all the knockout genes.
Here, building the knockout network from the knockout genes is specifically:
using each knockout gene as a source node of the knockout network;
connecting the source node by an edge to each gene affected by the knockout gene;
using the comparison difference as the weight of the edge.
Fig. 2 is a flowchart of the method for predicting disease-related proteins according to an embodiment of the present invention. Referring to Fig. 2, predicting disease-related proteins from the knockout network specifically includes:
Step 201: setting known drug targets as marker genes, and setting the test proteins and a relevance threshold;
Step 202: obtaining, from the knockout network, the target-node proteins and source-node proteins connected to the test protein;
Step 203: distinguishing, according to the target-node proteins and the marker genes, the target-node proteins with an inhibitory effect from the target-node proteins with an activating effect;
Step 204: computing the sum of the weights of the edges connecting the inhibitory target-node proteins to the test protein, denoted as the first weight sum;
Step 205: computing the sum of the absolute values of the weights of the edges connecting the activating target-node proteins to the test protein, denoted as the first absolute-value sum;
Step 206: computing the sum of all positive edge weights of the source-node proteins, denoted as the second weight sum, and the sum of the absolute values of all negative edge weights, denoted as the second absolute-value sum;
Step 207: computing the relevance score of the test protein from the first weight sum, the first absolute-value sum, the second weight sum and the second absolute-value sum;
Step 208: selecting, among all test proteins, those whose relevance score is higher than the relevance threshold as disease-related proteins.
Here, predicting combinations of disease-related proteins from the knockout network specifically includes:
collecting known protein combinations with lethal and combined effects as positive samples, and randomly generating, based on the positive samples, negative samples numbering 10 times the positive samples;
selecting any target protein in the knockout network, and screening all target-node proteins and source-node proteins directly connected to the target protein;
determining whether the target protein in the knockout network and its target-node proteins and source-node proteins are present in the positive samples or in the negative samples;
if the target protein in the knockout network and its target-node proteins and source-node proteins are present in the positive samples, adding the weights of the edges between the target protein and its target-node proteins and source-node proteins and taking the absolute value, to obtain the absolute value of the first combined weight sum;
adding the absolute values of the first combined weight sums of all target proteins in the positive samples, to obtain the absolute value of the positive combined weight sum;
if the target protein in the knockout network and its target-node proteins and source-node proteins are present in the negative samples, adding the weights of the edges between the target protein and its target-node proteins and source-node proteins and taking the absolute value, to obtain the absolute value of the second combined weight sum;
adding the absolute values of the second combined weight sums of all target proteins in the negative samples, to obtain the absolute value of the negative combined weight sum;
assigning the target protein a value of 1, -1 or 0 according to the absolute value of the first combined weight sum and the absolute value of the second combined weight sum, to obtain the target protein assignment;
selecting a first test protein and a second test protein;
setting a first detection threshold and a second detection threshold;
computing the proportion of proteins jointly affected by the first test protein and the second test protein, denoted as the jointly affected protein ratio;
computing, according to the target protein assignment, the proportion of assessed proteins jointly affected by the first test protein and the second test protein, denoted as the jointly affected assessed protein ratio, where the assessed proteins are the proteins assigned a value of 1 or -1;
determining whether the jointly affected protein ratio is greater than the first detection threshold and, at the same time, whether the jointly affected assessed protein ratio is greater than the second detection threshold;
if so, the combination of the first test protein and the second test protein is a combination of disease-related proteins.
Fig. 3 is a module diagram of the system for screening disease drug targets and drug target combinations according to an embodiment of the present invention. Referring to Fig. 3, a system for screening disease drug targets and drug target combinations includes:
An autoencoding module 301, for building an autoencoder from the differential expression data of proteins between disease cell lines and normal tissue;
A knockout network construction module 302, for calculating the knockout effect of genes according to the autoencoder and building a knockout network;
A related-protein prediction module 303, for predicting disease-related proteins according to the knockout network, the related proteins being drug targets;
A protein combination prediction module 304, for predicting combinations of disease-related proteins according to the knockout network, the combinations of related proteins being drug target combinations.
Wherein, the knockout network construction module specifically includes:
A network model construction unit, for building a deep learning network model from the autoencoder;
A background output calculation unit, for taking a given differential expression profile, inputting it into the deep learning network model, and obtaining differential expression values, denoted background output B;
A highly expressed gene acquisition unit, for setting a difference value threshold and selecting the genes in the differential expression profile whose difference value exceeds the threshold, denoted highly expressed genes;
A highly expressed gene assignment unit, for sorting all highly expressed genes by difference value in descending order, assigning the highly expressed gene with the largest difference value the smallest difference value in the differential expression profile, and assigning new difference values to all highly expressed genes in turn;
A new differential expression profile construction unit, for forming a new differential expression profile from the highly expressed genes with new difference values and the genes remaining in the differential expression profile after all highly expressed genes are removed;
A second output calculation unit, for inputting the new differential expression profile into the deep learning network model to obtain the second output K;
A comparison threshold setting unit, for setting a comparison threshold;
A comparison difference calculation unit, for calculating, for every highly expressed gene, the difference between its second output K and its background output B, to obtain the comparison difference;
A knockout gene acquisition unit, for denoting every highly expressed gene whose comparison difference exceeds the comparison threshold as a knockout gene;
A network construction unit, for building the knockout network from all knockout genes.
Wherein, the network construction unit is specifically configured to:
Take the knockout gene as a source point of the knockout network;
Draw an edge from the source point to each gene affected by the knockout gene;
Take the comparison difference as the weight of the edge.
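The network construction unit described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_knockout_network`, the nested-dictionary input and the use of the absolute value when comparing against the threshold are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (source gene, affected gene, weight)


def build_knockout_network(
    compare_diff: Dict[str, Dict[str, float]],
    compare_threshold: float,
) -> List[Edge]:
    """Turn comparison differences into directed, weighted edges.

    compare_diff[g][h] is the comparison difference observed for gene h
    when highly expressed gene g is knocked out (hypothetical layout).
    """
    edges: List[Edge] = []
    for knocked_gene, diffs in compare_diff.items():
        for affected_gene, diff in diffs.items():
            # Assumption: magnitude is compared, so both positive and
            # negative effects can produce an edge.
            if abs(diff) > compare_threshold:
                # The knocked-out gene is the source point; the edge points
                # to the affected gene and carries the difference as weight.
                edges.append((knocked_gene, affected_gene, diff))
    return edges


example = {"A": {"B": 0.4, "C": -0.2, "D": 0.0}}
network = build_knockout_network(example, compare_threshold=0.1)
```

In this toy input, gene D is unaffected by the knockout of A and therefore receives no edge.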
Fig. 4 is a structure diagram of the related-protein prediction module according to an embodiment of the present invention. Referring to Fig. 4, the related-protein prediction module specifically includes:
A marker gene setting unit 401, for setting known drug targets as marker genes, and for setting the test protein and the relevance threshold;
A target-point and source-point protein acquisition unit 402, for obtaining, according to the knockout network, the target-point proteins and source-point proteins connected to the test protein;
A target-point protein discrimination unit 403, for distinguishing, according to the target-point proteins and the marker genes, target-point proteins with an inhibitory effect from target-point proteins with an activating effect;
An inhibitory target-point protein calculation unit 404, for calculating the sum of the weights of the edges connecting the inhibitory-effect target-point proteins to the test protein, denoted the first weight sum;
An activating target-point protein calculation unit 405, for calculating the sum of the absolute values of the weights of the edges connecting the activating-effect target-point proteins to the test protein, denoted the first absolute-value sum;
A source-point protein calculation unit 406, for calculating the sum of all positive edge weights of the source-point proteins, denoted the second weight sum, and the sum of the absolute values of all negative weights, denoted the second absolute-value sum;
A relevance score calculation unit 407, for calculating the relevance score of the test protein from the first weight sum, the first absolute-value sum, the second weight sum and the second absolute-value sum;
A disease-related protein determination unit 408, for selecting every test protein whose relevance score is higher than the relevance threshold as a disease-related protein.
Wherein, the protein combination prediction module specifically includes:
A sample collection unit, for collecting known protein combinations with lethal and combined effects as positive samples, and randomly generating, according to the positive samples, negative samples numbering 10 times the positive samples;
A target protein screening unit, for selecting any target protein in the knockout network and screening all target-point proteins and source-point proteins directly connected to the target protein;
A first judging unit, for judging whether the target protein in the knockout network and the target-point proteins and source-point proteins of the target protein are present in the positive samples or in the negative samples;
A first combination weight calculation unit, for, if the target protein in the knockout network and the target-point proteins and source-point proteins of the target protein are present in the positive samples, adding the weights of the edges between the target protein and its target-point proteins and source-point proteins and taking the absolute value, to obtain the absolute value of the first combination weight sum;
A positive combination weight calculation unit, for adding the absolute values of the first combination weight sums of all target proteins in the positive samples to obtain the absolute value of the positive combination weight sum;
A second combination weight calculation unit, for, if the target protein in the knockout network and the target-point proteins and source-point proteins of the target protein are present in the negative samples, adding the weights of the edges between the target protein and its target-point proteins and source-point proteins and taking the absolute value, to obtain the absolute value of the second combination weight sum;
A negative combination weight calculation unit, for adding the absolute values of the second combination weight sums of all target proteins in the negative samples to obtain the absolute value of the negative combination weight sum;
A target protein assignment unit, for assigning the target protein a value of 1, -1 or 0 according to the absolute value of the first combination weight sum and the absolute value of the second combination weight sum, to obtain the assigned target protein;
A protein group selection unit, for selecting a first test protein and a second test protein as the test protein group;
A threshold setting unit, for setting a first detection threshold and a second detection threshold;
A first protein ratio calculation unit, for calculating the ratio of proteins jointly affected by the first test protein and the second test protein, denoted the jointly affected protein ratio;
A second protein ratio calculation unit, for calculating, according to the assigned target proteins, the ratio of assessed proteins jointly affected by the first test protein and the second test protein, denoted the jointly affected assessed protein ratio;
A second judging unit, for judging whether the jointly affected protein ratio is greater than the first detection threshold and, at the same time, whether the jointly affected assessed protein ratio is greater than the second detection threshold;
A protein combination determination unit, for determining, if so, that the combination of the first test protein and the second test protein is a combination of disease-related proteins.
Fig. 5 is a flow chart of predicting disease-related proteins and protein combinations with the deep neural network model according to an embodiment of the present invention. The method of the present invention is described in detail below with reference to Fig. 5:
Neural network model design and training (see part a, model training, in Fig. 5)
This research optimizes the standard auto-encoder into an ultra-sparse model suited to learning disease-related features. In the model, each input unit represents the differential expression value of a gene between diseased tissue and normal tissue, and each hidden neuron represents a protein interaction. Because a differential expression value can be positive or negative, a hidden neuron has three patterns: ++, -- and +-. Different patterns have different biological meanings, so they need to be distinguished. Neurons of the same pattern form one autoencoder, so there are three autoencoders. The number of input units of each autoencoder is the number of proteins, and the number of hidden neurons is the number of interactions. The three autoencoders are trained separately and merged after training; the merged autoencoder keeps the same number of input units, and its number of hidden neurons is three times the original. The activation function of the autoencoder is the rectified linear unit (formula 4). Weights are updated with the back-propagation algorithm; the learning rate is set to 0.005, the momentum to 0.5, and L2 regularization is applied. Each training cycle has 6 samples, and the number of cycles is set to a fixed number according to the size of the training set.
P_1 = W_1 |i_1| (1)
P_2 = W_2 |i_2| (2)
In the formulas, i_1 and i_2 are the input values of the input units, W_1 and W_2 are the corresponding weights, W is the average weight in each cycle, P_i (i = 1, 2) is the weight multiplied by the input, and P is the input value of the hidden neuron.
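As a small illustration of the three hidden-neuron patterns and of formulas (1) and (2), the sketch below classifies an interaction by the signs of its two inputs and computes P_1 and P_2. The function names are hypothetical, and treating a value of exactly zero as positive is an assumption of this example; how P is formed from P_1 and P_2 is not shown because that formula is not reproduced in the text.

```python
from typing import Tuple


def sign_pattern(i1: float, i2: float) -> str:
    """Return '++', '--' or '+-' for the two differential expression inputs.

    Assumption: an input of exactly 0 is treated as positive.
    """
    if i1 >= 0 and i2 >= 0:
        return "++"
    if i1 < 0 and i2 < 0:
        return "--"
    return "+-"


def weighted_inputs(w1: float, i1: float, w2: float, i2: float) -> Tuple[float, float]:
    """Formulas (1) and (2): P_1 = W_1 * |i_1| and P_2 = W_2 * |i_2|."""
    return w1 * abs(i1), w2 * abs(i2)


pattern = sign_pattern(1.2, -0.3)
p1, p2 = weighted_inputs(0.5, -2.0, 2.0, 1.5)
```

Each interaction would then be routed to the autoencoder that handles its pattern, which matches the description of three separately trained autoencoders.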
Calculating the knockout effect of genes (see part b, knockout simulation, in Fig. 5)
To calculate the knockout effect of a given gene/protein, we build a deep learning network from the trained autoencoder. First, a deep learning network is built from the autoencoder; then a differential expression profile is used as input and the model output is computed as the background output (differential expression profiles are calculated from experimental data downloaded from public databases); finally, the input value of the given gene is changed and the output is computed again. The difference between this output and the background output is taken as the effect of knocking out the gene on the other genes. In practice the value given to the altered gene is negative; its size depends on the distribution of all differential expression values and is close to the minimum of the distribution. The computation model consists of five trained autoencoders connected in series. The output of each layer consists of two parts, the output of that layer's autoencoder and the input of that layer's autoencoder, weighted by α and (1-α) respectively, with α set to 0.25 here (formula 5). Calculating the knockout effect comprises the following steps:
1: Given a differential expression profile as input, calculate the output of the neural network model. The output is the differential expression value computed by the network and is recorded as background output B.
2: Calculate the knockout effect of the up-regulated genes in the differential expression profile. Set a threshold, for example 0.5, select the genes in the profile whose difference value exceeds 0.5, and assign each of them in turn a small value while keeping the values of the other genes unchanged (the small value depends on the data distribution, or is taken as the minimum of the whole data set). Input the modified expression profile into the same model and record its output as K.
3: For the output K of each highly expressed gene in step 2, calculate its difference from background output B. Since the output units also correspond to genes, the genes with a large difference are the genes affected by knocking out the given gene.
4: Build the knockout network. The result of step 3 can be expressed as a network: the knocked-out gene is the source point, each edge points to an affected gene, and the weight of the edge is the difference value of the affected gene in K-B, where K is the output obtained after altering a highly expressed gene and B is the output obtained from the unaltered input. Each differential expression profile yields a corresponding knockout network.
l_n = α e_n + (1-α) l_{n-1} (5)
D = B - K (6)
In the formulas, l_n is the output of layer n, e_n is the output of the layer-n autoencoder, and α is the weight, set to 0.25 here. The vectors K and B are the output after knocking out a gene and the background output respectively, and D is the difference between the two outputs.
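Steps 1 to 3 and formulas (5) and (6) can be sketched as below. This is a minimal sketch under stated assumptions: `autoencoder` is a hypothetical callable standing in for the trained model, l_0 is taken to be the input profile, e_n is taken to be the autoencoder applied to l_{n-1}, and the replacement value is the minimum of the profile (one of the two options the text mentions).

```python
import numpy as np

ALPHA = 0.25  # weight of the autoencoder output in formula (5)


def stacked_output(x, autoencoder, n_layers=5, alpha=ALPHA):
    """Formula (5): l_n = alpha * e_n + (1 - alpha) * l_{n-1}, with l_0 = x."""
    layer = np.asarray(x, dtype=float)
    for _ in range(n_layers):
        e = autoencoder(layer)  # output of this layer's autoencoder
        layer = alpha * e + (1.0 - alpha) * layer
    return layer


def knockout_effects(profile, autoencoder, de_threshold=0.5):
    """Return {gene index: D} with D = B - K (formula 6) per knocked-out gene."""
    profile = np.asarray(profile, dtype=float)
    background = stacked_output(profile, autoencoder)  # background output B
    low_value = profile.min()  # assumed replacement value
    effects = {}
    for g in np.flatnonzero(profile > de_threshold):
        modified = profile.copy()
        modified[g] = low_value  # alter one gene, keep the others unchanged
        knocked = stacked_output(modified, autoencoder)  # output K
        effects[int(g)] = background - knocked
    return effects


# Toy check with an identity "autoencoder": only the altered gene changes.
toy_effects = knockout_effects([1.0, -2.0, 0.2], autoencoder=lambda v: v)
```

With a real trained autoencoder the difference vector would generally be non-zero for other genes too, and those entries are what the knockout network records as edge weights.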
Predicting disease-related proteins (see part c, protein prioritization, in Fig. 5)
To assess the relevance of each gene to the disease, the gene is scored according to its connections in the knockout network; the higher the score, the higher the relevance to the disease. Scoring requires marker genes, and here we use known drug targets as markers. The score of a given protein P takes into account the proteins directly connected to it in the knockout network. Because the edges of the knockout network are directed, the directly connected proteins fall into two classes, source-point proteins and target-point proteins, which are used separately to calculate the score of P. Known drug targets can be divided into activating-effect targets and inhibitory-effect targets. An edge with positive weight in the knockout network means that knocking out the source-point protein weakens the function of the target-point protein. Therefore, if a target-point protein of P is a known inhibitory-effect drug target and the edge has positive weight, the sum of the weights of the edges connecting all such proteins to P is calculated and denoted S_pwt. If a target-point protein of P is an activating-effect drug target and the edge has negative weight, the sum of the absolute values of the negative weights of the edges connecting such proteins to P is calculated and denoted S_nwt. For each source-point protein i connected to P, we first record the sum of all its positive weights and the sum of the absolute values of its negative weights, and label the S_pwt and S_nwt of protein i separately. The full calculation of the final score of protein P is described in formula 7.
In the formula, ω_i is the weight of the edge from source-point protein i to target-point protein P.
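The two target-side sums S_pwt and S_nwt described above can be sketched as follows. The edge-list layout and the function name `target_side_sums` are assumptions for illustration, and the sketch stops at the two sums because formula 7, which combines them with the source-side terms, is not reproduced in the text.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str, float]  # (source protein, target protein, weight)


def target_side_sums(
    edges: Iterable[Edge],
    protein: str,
    inhibitor_targets: Set[str],
    agonist_targets: Set[str],
) -> Tuple[float, float]:
    """Return (S_pwt, S_nwt) for the given protein P."""
    s_pwt = 0.0  # positive weights to known inhibitory-effect targets
    s_nwt = 0.0  # |negative weights| to known activating-effect targets
    for source, target, weight in edges:
        if source != protein:
            continue  # only edges leaving P reach its target-point proteins
        if target in inhibitor_targets and weight > 0:
            s_pwt += weight
        elif target in agonist_targets and weight < 0:
            s_nwt += abs(weight)
    return s_pwt, s_nwt


edges = [
    ("P", "A", 0.5),
    ("P", "B", -0.25),
    ("P", "C", 0.75),
    ("P", "E", -1.0),  # inhibitory target but negative weight: not counted
    ("Q", "A", 1.0),   # edge of another protein: ignored
]
s_pwt, s_nwt = target_side_sums(edges, "P", {"A", "C", "E"}, {"B"})
```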
Predicting combinations of disease-related proteins (see part d, protein combination prioritization, in Fig. 5)
We also use the knockout network to predict protein combinations with a lethal effect. The basic assumption is that a protein combination with a lethal effect affects the same group of other proteins, or is affected by the same group of proteins; if these proteins can be identified effectively, they can be used to predict new protein combinations with a lethal effect. First, known protein combinations with lethal and combined effects are collected from databases as positive samples, and negative samples numbering 10 times the positive samples are then generated at random. The steps are as follows:
1: For each protein in a knockout network, screen all directly connected target-point proteins and source-point proteins. If any combination of these proteins is present in the positive samples, add the weights of the edges between the two proteins of the combination and the selected protein and take the absolute value. Do this for all protein combinations present in the positive samples, sum the values, and label the sum LW_pos. Do the same calculation for the negative samples and label the result LW_neg. Note that source-point proteins and target-point proteins are calculated separately.
2: Using the results of the previous step, assign each protein a value of 1, -1 or 0 according to formula 10. The threshold T in the formula is set to 2.3 so that the numbers of proteins assigned -1 and 1 are close.
3: Select proteins x and y and determine whether x and y have a lethal or combined effect. First screen all proteins connected to x and to y, label the two sets X and Y, and label their intersection XY. Then use formula 9 and formula 10 to calculate CR_xy and CRA_xy, where CR_xy is the ratio of proteins jointly affected by x and y, CRA_xy is the ratio of jointly affected proteins that are assessed, and the edge-weight term in the formulas denotes the weight of the edge connecting the given protein x to protein p. Finally, if CR_xy > 0.3 and CRA_xy > 0.03, the protein combination is identified as having a combined or lethal effect.
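Step 3 can be sketched as below. The neighbour sets X, Y and XY and the two thresholds follow the text, but the ratio definitions are stand-ins: the formulas for CR_xy and CRA_xy are weight-based and not reproduced here, so this example substitutes simple set-size ratios over the union of X and Y. All function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, float]  # (source protein, target protein, weight)


def neighbours(edges: Iterable[Edge], protein: str) -> Set[str]:
    """All proteins directly connected to `protein`, in either direction."""
    found = set()
    for source, target, _weight in edges:
        if source == protein:
            found.add(target)
        elif target == protein:
            found.add(source)
    return found


def is_predicted_combination(
    edges: Iterable[Edge],
    x: str,
    y: str,
    assignment: Dict[str, int],
    t1: float = 0.3,
    t2: float = 0.03,
) -> bool:
    """Apply the two-threshold test CR_xy > t1 and CRA_xy > t2."""
    edges = list(edges)
    set_x, set_y = neighbours(edges, x), neighbours(edges, y)
    union = set_x | set_y
    if not union:
        return False
    shared = set_x & set_y  # the set XY of jointly affected proteins
    # Stand-in ratios (assumption): counts instead of the weight-based formulas.
    cr = len(shared) / len(union)
    assessed = {p for p in shared if assignment.get(p, 0) != 0}  # value 1 or -1
    cra = len(assessed) / len(union)
    return cr > t1 and cra > t2


edges = [("x", "a", 1.0), ("y", "a", 0.5), ("x", "b", 0.2), ("c", "y", 0.3)]
hit = is_predicted_combination(edges, "x", "y", assignment={"a": 1})
```

In the toy network, x and y share one of three neighbours and that neighbour is assessed, so both stand-in ratios pass the thresholds.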
The following operations are also required in the embodiment of the present invention:
1 Collecting the known drug target data set
This research obtained 913 known drug target proteins from the Therapeutic Target Database (TTD) and divided them into two classes: inhibitors and agonists. They are used to screen for new cancer-related genes.
2 Collecting the gene and protein expression data sets
Gene and protein expression data come from 3 sources: (1) protein expression data downloaded from the ProteomicsDB database, including 98 cell line samples; (2) RNA expression data downloaded from the BioXpress database, including samples of 18 cancer types and 660 patients. These gene and protein expression data are mainly used in 2 parts of this research. Training the neural network model uses the differential expression values between cell lines and tissues in ProteomicsDB. Predicting cancer-related genes uses the data in ProteomicsDB and BioXpress.
3 Collecting the protein interaction data set
The protein-protein interaction (PPI) data set collects high-quality interacting proteins from five public protein databases: The Biological General Repository for Interaction Datasets (BioGRID), Human Protein Reference Database (HPRD), IntAct, The Database of Interacting Proteins (DIP) and The Molecular Interaction Database (MINT). These data were integrated into a non-redundant protein interaction data set, from which 224,988 interaction pairs among 14,759 proteins with ProteomicsDB protein expression information were selected. It is used to build the neural network.
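One possible way to integrate several interaction lists into a non-redundant set restricted to proteins with expression data is sketched below. The text does not describe the integration procedure, so everything here is an assumption: interactions are treated as undirected pairs, self-interactions are dropped, and `merge_ppi` is a hypothetical helper name.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = Tuple[str, str]


def merge_ppi(
    sources: Iterable[Iterable[Pair]],
    expressed: Set[str],
) -> Set[FrozenSet[str]]:
    """Union of undirected pairs whose two proteins both have expression data."""
    merged: Set[FrozenSet[str]] = set()
    for pairs in sources:
        for a, b in pairs:
            if a == b:
                continue  # assumption: self-interactions are not kept
            if a in expressed and b in expressed:
                # frozenset makes (A, B) and (B, A) the same interaction
                merged.add(frozenset((a, b)))
    return merged


db_one = [("A", "B"), ("B", "C")]
db_two = [("B", "A"), ("C", "D")]
ppi = merge_ppi([db_one, db_two], expressed={"A", "B", "C"})
```

Here the pair A-B appears in both toy databases but is stored once, and C-D is discarded because D has no expression information.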
4 Collecting the protein combination data set
This research built a protein combination data set containing known drug target combination (drug combination) and synthetic lethal information. 1,272 pairs of drug target combinations were obtained from The Drug Combination Database V2.0 (DCDB2), after removing the drug target combinations in the database that are not applicable to cancer. In addition, synthetic lethal protein combinations come from the SynLethDB database (13,171 experimentally verified pairs and 5,489 computationally predicted pairs) and from the result data of research papers (182 computationally predicted synthetic lethal pairs). The protein combination data set integrates the above information into a network of 20,062 protein combination pairs. It is used to predict new protein combinations.
Fig. 6 shows the networks of cancer-related protein combinations predicted by the embodiment of the present invention. Referring to Fig. 6, this research optimizes the standard auto-encoder into an ultra-sparse model suited to learning cancer-related features. We use the differential expression data of proteins between cancer cell lines and normal tissue in ProteomicsDB as input, and constrain the connections between hidden units and input units with the protein interaction network, so that each hidden unit represents one interaction. Three auto-encoders are trained separately and then merged into a single auto-encoder. The trained network captures which interaction patterns are more important to the cancer disease process. The derivative term of the activation function is constrained to 1 by multiplying by the corresponding coefficient. Training runs for 2,940 iterations, each including the 6 comparisons of one cell line, so each cell line is trained 30 times. The threshold of the activation function is set to 1 and multiplied by the average weight. The learning rate is set to 0.005, the momentum coefficient to 0.5, and the decay coefficient to 1.
The trained auto-encoder is assembled into a five-layer recurrent neural network in which every layer uses the auto-encoder with the same weights. At each layer the previous output is added to the input, so the final output amplifies the difference values of the key proteins that were learned. This research uses the differential expression values in BioXpress and ProteomicsDB as input, and defines the knockout effect as the difference in output after the input value of each highly differentially expressed protein (differential expression (DE) value > 0.5) is changed to a negative value (modified value = -7.5) in the deep neural network model. The interacting proteins whose output difference has an absolute value > 0.000001 are used to build the knockdown (KD) network.
The result of protein assessment takes the form of a KD network. A change in the input value of one protein can affect the output of many other proteins; only the effect on up-regulated proteins/genes is considered here, and that effect can be positive or negative. To obtain a single value expressing the importance of a protein, known cancer target proteins from the TTD database are used as markers in a KD network, and a KD score is calculated as the index of cancer importance of the assessed protein (the KD score strategy considers both direct effect values and indirect effect factors in the network). We assess every highly differentially expressed protein (DE value > 0.5), and the final KD score is the mean over the cell lines or disease samples.
In this research, proteins with a high KD score (ProteomicsDB: KD score >= 0.1; BioXpress: KD score >= 0.02) are preselected as cancer-related proteins. Combining the results of the ProteomicsDB and BioXpress data sets, preliminary prediction yields 4,862 cancer-related proteins; the intersection of the two data sets accounts for 87% and 85% of their respective totals. These include 386 known drug targets, covering 86.35% of the known drug targets that this method can assess (the research uses 913 known drug targets from the TTD database, of which 447 proteins have the PPI and expression information the method requires). Among the TOP 500 preselected cancer-related proteins (the proteins with the largest average KD score across cell lines), 211 are known cancer drug targets. Gene Ontology (GO) enrichment analysis of the remaining 289 proteins finds that their functions are mainly enriched in DNA replication (GO:0006270; GO:0006260; GO:0006268), metabolic pathways (GO:0009058; GO:0044267; GO:0051246; GO:0009894; GO:0019538), chromatin structure (GO:0051276; GO:0098813; GO:0007059; GO:0051983) and cell division related processes (cell cycle GO:0007049; GO:0000278; GO:0022402; GO:1903047; GO:0007346). Among the TOP 500 preselected cancer-related proteins, some widely studied proteins, such as cellular tumor antigen p53 (TP53), epidermal growth factor receptor (EGFR), GTPase HRas (HRAS), GTPase NRas (NRAS) and GTPase KRas (KRAS), have high KD scores. In addition, this research also finds some new cancer-related proteins, such as amyloid beta A4 protein (APP), neural cell adhesion molecule L1 (L1CAM), thymidine kinase 1 (TK1), DNA replication licensing factor MCM2 (MCM2) and MCM4, which have high KD scores and can serve as potential drug targets for follow-up study. We focus on one newly found cancer-related target, the APP protein. Previous research shows that APP protein variation is an important cause of Alzheimer's disease (AD), that loss of APP protein can inhibit cell division, and that adding the C-terminus of the APP protein can restart division. The results of this research show that the APP protein has high KD scores in 18 cancers and 77 cell lines, suggesting that the APP protein may also play an important role in cancer.
This research also uses the neural network model to predict combinations of cancer-related proteins. Drug combinations have in recent years proved an effective means of cancer treatment, and synthetic lethal effects are very helpful for individualized precision therapy research. Because of the differences among individuals and cancer types and the huge search space of target combinations, finding all drug target combinations and synthetic lethal genes by routine experiment is difficult. Prescreening protein combinations with bioinformatic methods and then verifying them experimentally therefore becomes a key strategy. The method is based on the following assumption: the proteins of an effective synthetic lethal pair or an effective target combination affect another protein (or proteins), or are affected by another protein (or proteins), and these proteins can be screened out using known synthetic lethal pairs and target combinations. This research built a protein combination data set containing known drug target combination and synthetic lethal information (20,062 pairs of protein combinations), and uses the KD networks obtained by inputting the expression data in ProteomicsDB and BioXpress into the deep neural network model to represent the interactions between proteins.
This research focuses its analysis on the combinations predicted among the known cancer target combinations and among the PPI protein combinations. Combinations occurring in cell lines with a frequency above 0.25 are retained; to obtain protein combinations of higher confidence, the protein combinations covering 10% of cell lines are predicted to be cancer-related protein combinations. This research builds two networks (Fig. 6) from the protein sets predicted as cancer-related protein combinations among the PPI combinations and among the known target combinations respectively. The network of PPI cancer-related protein combinations contains 2,439 pairs of protein combinations, including combinations of proteins with high connectivity (degree) such as EGFR, TP53, cyclin-dependent kinase 2 (CDK2), KRAS and RAC-alpha serine/threonine-protein kinase (AKT1) (see part a of Fig. 6). The network of cancer-related protein combinations from known targets contains 2,543 pairs of protein combinations, including combinations of high-degree proteins such as NRAS, KRAS, EGFR, thymidylate synthase (TYMS), ribonucleoside-diphosphate reductase large subunit (RRM1), DNA topoisomerase 2-alpha (TOP2A) and serine/threonine-protein kinase Chk1 (CHEK1) (see part b of Fig. 6). These combinations can serve as potential cancer-related protein combinations for follow-up study.
The basic unit of the deep neural network model is the autoencoder. The internal structure of the autoencoder corresponds to the protein interaction network, and training uses differential expression data, so the model can use the differences between tissues to learn the specificity of protein interactions. In addition, combining multiple autoencoders links two proteins that are far apart in the interaction network. These two points make the model superior to existing methods.
The present invention can predict disease-related proteins. The essence of predicting disease-related proteins/genes with informatics methods is to predict the effect of knocking out a given protein/gene on cellular activity. Accurately calculating the effect of gene activation/knockout involves very complex interactions among cellular components; complete and accurate interaction relationships are difficult to measure, and the kinetic parameters involved are even harder to measure. For predicting disease-related proteins, this model proposes a method of simulating the knockout effect based on a deep neural network: in the deep model, the input value of each protein is changed, and the resulting difference in output is observed to assess the effect of the protein on the disease.
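The simulated knockout can be sketched as follows. This is a minimal illustration, assuming that one gene is perturbed at a time and that the magnitude of K - B is compared with the threshold so that both raised and lowered outputs are kept. The names `knockout_edges` and `toy_model` and the matrix `W` are hypothetical stand-ins for the trained deep model. The rank-reversed replacement value (the largest highly expressed gene receives the smallest value in the profile) follows the steps recited in the claims.

```python
import numpy as np

def knockout_edges(model, profile, de_threshold, compare_threshold):
    """Simulate knockouts on a differential-expression profile.

    model: callable mapping a profile (1-D array) to an output of the same length.
    Returns a list of (source, target, weight) edges of a knockout network.
    """
    profile = np.asarray(profile, dtype=float)
    background = model(profile)                      # background output B

    # Highly expressed genes, sorted from largest to smallest differential value.
    high = [i for i in np.argsort(-profile) if profile[i] > de_threshold]
    lowest_first = np.sort(profile)                  # profile values, smallest first

    edges = []
    for rank, gene in enumerate(high):
        knocked = profile.copy()
        knocked[gene] = lowest_first[rank]           # rank-reversed replacement value
        output = model(knocked)                      # second output K
        diff = output - background                   # comparison difference K - B
        for target in np.flatnonzero(np.abs(diff) > compare_threshold):
            if target != gene:
                edges.append((int(gene), int(target), float(diff[target])))
    return edges

# Hypothetical toy model: gene 0 activates gene 1 and inhibits gene 2.
W = np.array([[1.0, 0.0, 0.0],
              [0.8, 1.0, 0.0],
              [-0.8, 0.0, 1.0]])

def toy_model(x):
    return W @ x

edges = knockout_edges(toy_model, profile=[2.0, 0.5, -1.0],
                       de_threshold=1.0, compare_threshold=0.1)
```

In this toy example only gene 0 exceeds the differential value threshold. Knocking it down lowers the output for gene 1 (negative edge weight) and raises the output for gene 2 (positive edge weight), which gives two edges with gene 0 as the source.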
The present invention can predict combinations of disease-related proteins. Drug combinations have in recent years been shown to be an effective means of treating disease, and synthetic lethality is very helpful for research on individualized precision therapy. However, because of the differences among individuals and disease types and the huge search space of target combinations, it is difficult to find all drug target combinations and synthetic lethal genes by routine experimental means. Pre-screening protein combinations with the aid of this model and then verifying them experimentally therefore becomes a key strategy. The method is based on the following assumption: the proteins of an effective synthetic lethal pair or an effective target combination influence, or are influenced by, one or more other proteins, and these proteins can be screened out using known synthetic lethal pairs and target combinations. This study uses the KD network, obtained after inputting expression data into the deep neural network model, to represent the interactions between proteins.
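The screening of a candidate pair by the proteins it jointly influences can be sketched as follows. This is only an illustration under stated assumptions: the KD network is given as (source, target, weight) edges, the proportion of jointly influenced proteins is taken relative to the union of both neighbourhoods (the text does not fix the denominator), and the function names, protein labels, edges and thresholds are hypothetical toy values, not results.

```python
def neighbours(edges, protein):
    """Proteins that influence, or are influenced by, `protein` in the KD network."""
    linked = set()
    for source, target, _weight in edges:
        if source == protein:
            linked.add(target)
        elif target == protein:
            linked.add(source)
    return linked

def screen_pair(edges, first, second, evaluated, first_threshold, second_threshold):
    """Return (is_candidate, shared_ratio, evaluated_ratio) for a protein pair."""
    a = neighbours(edges, first) - {second}
    b = neighbours(edges, second) - {first}
    shared = a & b
    union = a | b
    # Assumed denominator: all proteins linked to either member of the pair.
    shared_ratio = len(shared) / len(union) if union else 0.0
    # Share of the jointly influenced proteins that carry an evaluation (1 or -1).
    evaluated_ratio = len(shared & evaluated) / len(shared) if shared else 0.0
    is_candidate = shared_ratio > first_threshold and evaluated_ratio > second_threshold
    return is_candidate, shared_ratio, evaluated_ratio

# Toy KD network with made-up labels and weights.
kd_edges = [("A", "P1", 0.9), ("A", "P2", -0.4), ("B", "P1", 0.7),
            ("P2", "B", 0.3), ("B", "P3", -0.2)]
evaluated = {"P1"}      # proteins assigned 1 or -1 from known pairs (hypothetical)
result = screen_pair(kd_edges, "A", "B", evaluated,
                     first_threshold=0.5, second_threshold=0.4)
```

Here A and B share two of the three proteins linked to either of them, and one of the two shared proteins is evaluated, so the pair passes both thresholds and would be kept for experimental verification.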
This model is also suitable as a general and effective method for analyzing gene/protein expression data. Because the model is trained on gene/protein expression data, it can learn the interaction specificity under different backgrounds according to the input data. A well-trained model can be obtained in advance from large-scale expression data, and the trained model can then be used to quickly analyze the differential features of new expression data from the same platform, which shows the advantage of big-data analysis (the method is applicable to new expression data sets of any size; even for data with a small sample size, the method can integrate the large-scale expression information of the same platform to analyze the specificity of the data).
The model predicts disease-related genes better than simple differential gene analysis and better than differential gene analysis combined with a network. This model has important application value in computer-aided drug target screening.
The method of screening significantly related genes with this model is therefore also suitable for single-cell sequencing data. Since single-cell sequencing data originate from a single cancer type, the model needs to be pre-trained on the given data set. Here, the transcriptome data of circulating tumor cells (CTCs) from prostate cancer patients are used as input, and a prostate cancer-specific neural network model is trained through a workflow similar to the one described above, for screening prostate cancer-related proteins and drug and protein combinations.
Single-cell sequencing data downloaded from NCBI (accession number GSE67980) were used, comprising 122 prostate circulating tumor cell (CTC) samples from 12 patients, 12 prostate tumor samples and 3 normal prostate tissue samples. The differential expression between cancer cell lines and normal tissue was used as input data to train the prostate cancer-specific model. This training process is essentially the same as the model training process described earlier. Training the prostate cancer-specific model took 2,144 rounds in total, with 6 comparison calculations per round (the first 5 use the differential expression values of 536 randomly selected sample comparisons; the last uses the differential expression values of 536 sample comparisons selected in order, repeated 4 times). The learning rate was set to 0.005, the momentum coefficient to 0.5 and the decay coefficient to 1. In the 5-layer recurrent neural network, the threshold of the activation function was set to 0.01 and multiplied by the average weight. For all proteins with a DE value > 0, the modified value was set to -4.
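The stated hyperparameters can be collected in a small configuration object. The sketch below is illustrative only: the update rule is ordinary gradient descent with momentum, which the text does not confirm as the optimizer used; the decay coefficient is assumed to scale the learning rate per round (so a value of 1 leaves it unchanged); and the modified value of -4 is read as the replacement input for proteins with a DE value > 0. The names `TrainConfig` and `momentum_step` and the toy loss are hypothetical.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class TrainConfig:
    learning_rate: float = 0.005   # stated in the text
    momentum: float = 0.5          # stated in the text
    decay: float = 1.0             # stated in the text; assumed to scale the learning rate per round
    modified_value: float = -4.0   # stated for proteins with DE value > 0; assumed replacement input

def momentum_step(weights, velocity, gradient, cfg, round_index):
    """One gradient-descent-with-momentum update (an assumed, generic update rule)."""
    lr = cfg.learning_rate * cfg.decay ** round_index   # decay of 1.0 keeps lr constant
    velocity = cfg.momentum * velocity - lr * gradient
    return weights + velocity, velocity

cfg = TrainConfig()
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
start_loss = float(np.sum(w ** 2))
for step in range(3):
    grad = 2 * w                     # gradient of the toy loss sum(w**2)
    w, v = momentum_step(w, v, grad, cfg, step)
end_loss = float(np.sum(w ** 2))
```

On the toy quadratic loss the weights move toward zero at every step, so `end_loss` is smaller than `start_loss`.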
The results show that this model can effectively identify key proteins related to prostate cancer, such as androgen receptor (AR), kallikrein-2 (KLK2) and KLK3. The androgen receptor (AR) is an important known prostate cancer-related gene. The results show that AR is not significantly expressed in some CTC cell lines, and that in these cell lines some functionally important proteins, such as disabled homolog 2 (DAB2), chromatin modification-related protein MEAF6 (MEAF6), tyrosine-protein kinase JAK1 (JAK1), interleukin-2 receptor subunit beta (IL2RB), mitogen-activated protein kinase kinase kinase 2 (MAP3K2), integrin-linked protein kinase (ILK) and cyclin-dependent kinase inhibitor 1 (CDKN1A), have higher KD scores. This suggests that these genes play an important role in the function of prostate cancer CTC cells that do not express AR (and may substitute for some functions of AR).
In addition, this study also predicted key protein combinations related to prostate cancer. These protein combinations differ among different prostate cancer CTC cells. Some proteins with high coverage were screened out; the network formed by the prostate cancer-related protein combinations includes several functionally important proteins, such as signal transducer and activator of transcription 3 (STAT3), exportin-1 (XPO1), cyclin-dependent kinase 4 (CDK4), microtubule-associated protein 4 (MAP4) and bcl-2-like protein 1 (BCL2L1).
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Specific examples are used herein to explain the principle and implementation of the present invention. The description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, the specific implementation and the scope of application may change in accordance with the idea of the present invention. In conclusion, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A method for screening disease drug targets and drug target combinations, characterized by comprising:
constructing an autoencoder according to differential expression data of proteins between disease cell lines and normal tissue;
calculating the knockout effect of genes according to the autoencoder, and constructing a knockout network;
predicting disease-related proteins according to the knockout network, the related proteins being drug targets;
predicting combinations of disease-related proteins according to the knockout network, the combinations of the related proteins being drug target combinations.
2. The method for screening disease drug targets and drug target combinations according to claim 1, characterized in that calculating the knockout effect of genes according to the autoencoder and constructing a knockout network specifically comprises:
constructing a deep learning network model according to the autoencoder;
giving a differential expression profile, inputting the differential expression profile into the deep learning network model, and obtaining differential expression values, denoted as background output B;
setting a differential value threshold, and selecting the genes whose differential value in the differential expression profile is greater than the differential value threshold, denoted as highly expressed genes;
sorting all the highly expressed genes by differential value from largest to smallest, assigning the highly expressed gene with the largest differential value the smallest differential value in the differential expression profile, and assigning new differential values to all the highly expressed genes in turn;
constructing a new differential expression profile from the highly expressed genes with new differential values and the genes remaining in the differential expression profile after all the highly expressed genes are removed;
inputting the new differential expression profile into the deep learning network model to obtain a second output K;
setting a comparison threshold;
calculating the difference between the second output K of each highly expressed gene and the background output B of that highly expressed gene, to obtain comparison differences;
denoting every highly expressed gene whose comparison difference is greater than the comparison threshold as a knockout gene;
constructing the knockout network according to all the knockout genes.
3. The method for screening disease drug targets and drug target combinations according to claim 2, characterized in that constructing the knockout network according to the knockout genes is specifically:
taking the knockout gene as a source node of the knockout network;
connecting each gene affected by the knockout gene to the source node by an edge;
taking the comparison difference as the weight of the edge.
4. The method for screening disease drug targets and drug target combinations according to claim 1, characterized in that predicting disease-related proteins according to the knockout network specifically comprises:
setting known drug targets as marker genes, and setting proteins to be tested and a relevance threshold;
obtaining, according to the knockout network, the target-node proteins and source-node proteins connected to the protein to be tested;
distinguishing, according to the target-node proteins and the marker genes, target-node proteins with an inhibitory effect from target-node proteins with an activating effect;
calculating the sum of the weights of the edges connecting the inhibitory-effect target-node proteins to the protein to be tested, denoted as the first weight sum;
calculating the sum of the absolute values of the weights of the edges connecting the activating-effect target-node proteins to the protein to be tested, denoted as the first absolute value sum;
calculating the sum of all positive edge weights of the source-node proteins, denoted as the second weight sum, and the sum of the absolute values of all negative edge weights, denoted as the second absolute value sum;
calculating the relevance score of the protein to be tested according to the first weight sum, the first absolute value sum, the second weight sum and the second absolute value sum;
selecting, among all proteins to be tested, those whose relevance score is higher than the relevance threshold as disease-related proteins.
5. The method for screening disease drug targets and drug target combinations according to claim 1, characterized in that predicting combinations of disease-related proteins according to the knockout network specifically comprises:
collecting known synthetic lethal and combination-effect protein combinations as positive samples, and randomly generating, according to the positive samples, negative samples numbering 10 times the positive samples;
selecting any target protein in the knockout network, and screening all target-node proteins and source-node proteins directly connected to the target protein;
judging whether the target protein in the knockout network and the target-node proteins and source-node proteins of the target protein are present in the positive samples or in the negative samples;
if the target protein in the knockout network and the target-node proteins and source-node proteins of the target protein are present in the positive samples, adding the weights of the edges between the target protein and its target-node proteins and source-node proteins and taking the absolute value, to obtain the absolute value of the first combination weight sum;
adding the absolute values of the first combination weight sums of all target proteins in the positive samples to obtain the absolute value of the positive combination weight sum;
if the target protein in the knockout network and the target-node proteins and source-node proteins of the target protein are present in the negative samples, adding the weights of the edges between the target protein and its target-node proteins and source-node proteins and taking the absolute value, to obtain the absolute value of the second combination weight sum;
adding the absolute values of the second combination weight sums of all target proteins in the negative samples to obtain the absolute value of the negative combination weight sum;
assigning the target protein a value of 1, -1 or 0 according to the absolute value of the first combination weight sum and the absolute value of the second combination weight sum, to obtain the target protein assignment;
selecting a first protein to be tested and a second protein to be tested;
setting a first detection threshold and a second detection threshold;
calculating the proportion of proteins jointly affected by the first protein to be tested and the second protein to be tested, denoted as the jointly affected protein proportion;
calculating, according to the target protein assignment, the proportion of evaluated proteins among the proteins jointly affected by the first protein to be tested and the second protein to be tested, denoted as the jointly affected evaluated protein proportion, an evaluated protein being a protein assigned a value of 1 or -1;
judging whether the jointly affected protein proportion is greater than the first detection threshold and, at the same time, whether the jointly affected evaluated protein proportion is greater than the second detection threshold;
if so, the combination of the first protein to be tested and the second protein to be tested is a combination of disease-related proteins.
6. A system for screening disease drug targets and drug target combinations, characterized by comprising:
an autoencoding module, configured to construct an autoencoder according to differential expression data of proteins between disease cell lines and normal tissue;
a knockout network construction module, configured to calculate the knockout effect of genes according to the autoencoder and construct a knockout network;
a related protein prediction module, configured to predict disease-related proteins according to the knockout network, the related proteins being drug targets;
a protein combination prediction module, configured to predict combinations of disease-related proteins according to the knockout network, the combinations of the related proteins being drug target combinations.
7. The system for screening disease drug targets and drug target combinations according to claim 6, characterized in that the knockout network construction module specifically comprises:
a network model construction unit, configured to construct a deep learning network model according to the autoencoder;
a background output calculation unit, configured to, given a differential expression profile, input the differential expression profile into the deep learning network model and obtain differential expression values, denoted as background output B;
a highly expressed gene acquisition unit, configured to set a differential value threshold and select the genes whose differential value in the differential expression profile is greater than the differential value threshold, denoted as highly expressed genes;
a highly expressed gene assignment unit, configured to sort all the highly expressed genes by differential value from largest to smallest, assign the highly expressed gene with the largest differential value the smallest differential value in the differential expression profile, and assign new differential values to all the highly expressed genes in turn;
a new differential expression profile construction unit, configured to construct a new differential expression profile from the highly expressed genes with new differential values and the genes remaining in the differential expression profile after all the highly expressed genes are removed;
a second output calculation unit, configured to input the new differential expression profile into the deep learning network model to obtain a second output K;
a comparison threshold setting unit, configured to set a comparison threshold;
a comparison difference calculation unit, configured to calculate the difference between the second output K of each highly expressed gene and the background output B of that highly expressed gene, to obtain comparison differences;
a knockout gene acquisition unit, configured to denote every highly expressed gene whose comparison difference is greater than the comparison threshold as a knockout gene;
a network construction unit, configured to construct the knockout network according to all the knockout genes.
8. The system for screening disease drug targets and drug target combinations according to claim 7, characterized in that the network construction unit is specifically configured for:
taking the knockout gene as a source node of the knockout network;
connecting each gene affected by the knockout gene to the source node by an edge;
taking the comparison difference as the weight of the edge.
9. The system for screening disease drug targets and drug target combinations according to claim 6, characterized in that the related protein prediction module specifically comprises:
a marker gene setting unit, configured to set known drug targets as marker genes, and to set proteins to be tested and a relevance threshold;
a target-node and source-node protein acquisition unit, configured to obtain, according to the knockout network, the target-node proteins and source-node proteins connected to the protein to be tested;
a target-node protein discrimination unit, configured to distinguish, according to the target-node proteins and the marker genes, target-node proteins with an inhibitory effect from target-node proteins with an activating effect;
an inhibitory target-node protein calculation unit, configured to calculate the sum of the weights of the edges connecting the inhibitory-effect target-node proteins to the protein to be tested, denoted as the first weight sum;
an activating target-node protein calculation unit, configured to calculate the sum of the absolute values of the weights of the edges connecting the activating-effect target-node proteins to the protein to be tested, denoted as the first absolute value sum;
a source-node protein calculation unit, configured to calculate the sum of all positive edge weights of the source-node proteins, denoted as the second weight sum, and the sum of the absolute values of all negative edge weights, denoted as the second absolute value sum;
a relevance score calculation unit, configured to calculate the relevance score of the protein to be tested according to the first weight sum, the first absolute value sum, the second weight sum and the second absolute value sum;
a disease-related protein determination unit, configured to select, among all proteins to be tested, those whose relevance score is higher than the relevance threshold as disease-related proteins.
10. The system for screening disease drug targets and drug target combinations according to claim 6, characterized in that the protein combination prediction module specifically comprises:
a sample collection unit, configured to collect known synthetic lethal and combination-effect protein combinations as positive samples, and to randomly generate, according to the positive samples, negative samples numbering 10 times the positive samples;
a target protein screening unit, configured to select any target protein in the knockout network and screen all target-node proteins and source-node proteins directly connected to the target protein;
a first judging unit, configured to judge whether the target protein in the knockout network and the target-node proteins and source-node proteins of the target protein are present in the positive samples or in the negative samples;
a first combination weight calculation unit, configured to, if the target protein in the knockout network and the target-node proteins and source-node proteins of the target protein are present in the positive samples, add the weights of the edges between the target protein and its target-node proteins and source-node proteins and take the absolute value, to obtain the absolute value of the first combination weight sum;
a positive combination weight calculation unit, configured to add the absolute values of the first combination weight sums of all target proteins in the positive samples to obtain the absolute value of the positive combination weight sum;
a second combination weight calculation unit, configured to, if the target protein in the knockout network and the target-node proteins and source-node proteins of the target protein are present in the negative samples, add the weights of the edges between the target protein and its target-node proteins and source-node proteins and take the absolute value, to obtain the absolute value of the second combination weight sum;
a negative combination weight calculation unit, configured to add the absolute values of the second combination weight sums of all target proteins in the negative samples to obtain the absolute value of the negative combination weight sum;
a target protein assignment unit, configured to assign the target protein a value of 1, -1 or 0 according to the absolute value of the first combination weight sum and the absolute value of the second combination weight sum, to obtain assigned target proteins;
a protein group selection unit, configured to select a first protein to be tested and a second protein to be tested as the group of proteins to be tested;
a threshold setting unit, configured to set a first detection threshold and a second detection threshold;
a first protein proportion calculation unit, configured to calculate the proportion of proteins jointly affected by the first protein to be tested and the second protein to be tested, denoted as the jointly affected protein proportion;
a second protein proportion calculation unit, configured to calculate, according to the assigned target proteins, the proportion of evaluated proteins among the proteins jointly affected by the first protein to be tested and the second protein to be tested, denoted as the jointly affected evaluated protein proportion;
a second judging unit, configured to judge whether the jointly affected protein proportion is greater than the first detection threshold and, at the same time, whether the jointly affected evaluated protein proportion is greater than the second detection threshold;
a protein combination determination unit, configured to, if so, determine that the combination of the first protein to be tested and the second protein to be tested is a combination of disease-related proteins.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810461277.2A CN108647489B (en) | 2018-05-15 | 2018-05-15 | Method and system for screening disease drug target and target combination |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108647489A true CN108647489A (en) | 2018-10-12 |
CN108647489B CN108647489B (en) | 2020-06-30 |
Family
ID=63755608
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810461277.2A Expired - Fee Related CN108647489B (en) | 2018-05-15 | 2018-05-15 | Method and system for screening disease drug target and target combination |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108647489B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109637595A (en) * | 2018-12-12 | 2019-04-16 | 中国人民解放军军事科学院军事医学研究院 | A kind of drug method for relocating, device, electronic equipment and storage medium |
CN110021367A (en) * | 2018-10-16 | 2019-07-16 | 中国人民解放军军事科学院军事医学研究院 | Drug integrated information database building method and system based on drug and target information |
CN110544506A (en) * | 2019-08-27 | 2019-12-06 | 上海源兹生物科技有限公司 | Protein interaction network-based target point PPIs (Portable information processors) drug property prediction method and device |
CN112180098A (en) * | 2019-12-06 | 2021-01-05 | 中山大学 | Screening method of placenta-related disease marker and marker |
CN112326767A (en) * | 2020-11-03 | 2021-02-05 | 浙江大学滨海产业技术研究院 | Cancer drug target effect prediction method based on targeted proteomics |
CN112820417A (en) * | 2021-01-26 | 2021-05-18 | 四川大学 | Transcriptomics-based prostate cancer drug combination prediction method |
CN112927766A (en) * | 2021-03-29 | 2021-06-08 | 天士力国际基因网络药物创新中心有限公司 | Method for screening disease combination drug |
CN113053470A (en) * | 2019-12-26 | 2021-06-29 | 财团法人工业技术研究院 | Drug screening system and drug screening method |
WO2021208993A1 (en) * | 2020-04-17 | 2021-10-21 | 中国科学院上海药物研究所 | Information processing method and apparatus for predicting drug target |
WO2022060139A1 (en) * | 2020-09-17 | 2022-03-24 | 에스케이 주식회사 | Method and system for discovering target by using artificial intelligence |
CN115116561A (en) * | 2022-06-29 | 2022-09-27 | 南方医科大学南方医院 | Construction method and application of drug-target protein-schizophrenia interaction network |
CN115458061A (en) * | 2022-10-13 | 2022-12-09 | 南开大学 | Drug-protein interaction prediction method and system |
US11664094B2 (en) | 2019-12-26 | 2023-05-30 | Industrial Technology Research Institute | Drug-screening system and drug-screening method |
CN116312866A (en) * | 2023-05-09 | 2023-06-23 | 普瑞基准生物医药(苏州)有限公司 | Training method and device for synthetic lethal gene pair prediction model and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101925902A (en) * | 2007-11-28 | 2010-12-22 | 剑桥企业有限公司 | Protein aggregation prediction systems |
US20130071418A1 (en) * | 2009-11-18 | 2013-03-21 | The Board Of Regents Of The University Of Texas System | Physicochemical (PCP) Based Consensus Sequences and Uses Thereof |
CN104376234A (en) * | 2014-12-03 | 2015-02-25 | 苏州大学 | Promoter identification method and system |
CN106909807A (en) * | 2017-02-14 | 2017-06-30 | 同济大学 | A kind of Forecasting Methodology that drug targeting interactions between protein is predicted based on multivariate data |
CN107885971A (en) * | 2017-10-30 | 2018-04-06 | 陕西师范大学 | Using the method for improving flower pollination algorithm identification key protein matter |
Also Published As
Publication number | Publication date |
---|---|
CN108647489B (en) | 2020-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108647489A (en) | A kind of method and system of screening disease medicament target and target combination | |
CN109360604A (en) | An ovarian cancer molecular typing prediction system | |
Yang et al. | Kinase inhibition-related adverse events predicted from in vitro kinome and clinical trial data | |
CN108830040A (en) | A drug sensitivity prediction method based on cell line and drug similarity networks | |
CN105243296A (en) | Tumor feature gene selection method combining mRNA and microRNA expression profile chips | |
Jaume et al. | Modeling dense multimodal interactions between biological pathways and histology for survival prediction | |
KR102639616B1 (en) | Method for predicting therapeutic efficacy of combined drug by machine learning ensemble model | |
US8655598B2 (en) | Predictive radiosensitivity network model | |
KR102431534B1 (en) | Model for predicting the toxic side effects of the intended drug and method thereof | |
CN107111689A (en) | Method and system for generating non-coding encoding gene coexpression network | |
Ge et al. | FRL: An integrative feature selection algorithm based on the fisher score, recursive feature elimination, and logistic regression to identify potential genomic biomarkers | |
So et al. | GraphComm: a graph-based deep learning method to predict cell-cell communication in single-cell RNAseq data | |
Li et al. | The research and development thinking on the status of artificial intelligence in traditional Chinese medicine | |
Ford et al. | Selecting compounds for focused screening using linear discriminant analysis and artificial neural networks | |
Mythili et al. | CTCHABC-hybrid online sequential fuzzy Extreme Kernel learning method for detection of Breast Cancer with hierarchical Artificial Bee | |
Li et al. | Assumption weighting for incorporating heterogeneity into meta-analysis of genomic data | |
CN112397140A (en) | Target identification method and device based on allosteric mechanism and storage medium | |
CN112750510A (en) | Method for predicting permeability of blood brain barrier of medicine | |
CN105243294B (en) | A method for predicting protein pairs related to cancer patient prognosis | |
Xu et al. | AutoOmics: An AutoML Tool for Multi-Omics Research | |
CN110322929A (en) | A method for predicting the direct targets and active components of traditional Chinese medicine compound prescriptions | |
Saghapour et al. | Explorative Discovery of Gene Signatures and Clinotypes in Glioblastoma Cancer Through GeneTerrain Knowledge Map Representation | |
US20230178173A1 (en) | Systems and methods for gut microbiome precision medicine | |
Han et al. | The combination of single-cell and Seq-RNA sequences revealed homeostatic chondrocyte osteoarthritic immune infiltrate | |
CN115910212A (en) | Method for analyzing cell communication mediated by ligand-receptor interaction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200630; Termination date: 20210515 |