CN108647489A - Method and system for screening disease drug targets and target combinations - Google Patents

Method and system for screening disease drug targets and target combinations

Info

Publication number: CN108647489A
Authority: CN (China)
Prior art keywords: protein, target, gene, value, testing
Legal status: Granted; Expired - Fee Related (the legal status is an assumption and is not a legal conclusion)
Application number: CN201810461277.2A
Other languages: Chinese (zh)
Other versions: CN108647489B (en
Inventors: 陈玲玲, 常继伟, 丁毓端, 高俊祥
Current Assignee: Huazhong Agricultural University (the listed assignees may be inaccurate)
Original Assignee: Huazhong Agricultural University
Application filed by: Huazhong Agricultural University
Priority to: CN201810461277.2A
Publication of CN108647489A
Application granted
Publication of CN108647489B

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Abstract

The present invention discloses a method and system for screening disease drug targets and drug target combinations. The method includes: building an autoencoder from differential expression data of proteins between disease cell lines and normal tissue; computing the knockout effect of genes with the autoencoder and building a knockout network; predicting disease-related proteins from the knockout network, the related proteins being drug targets; and predicting combinations of disease-related proteins from the knockout network, the combinations of related proteins being drug target combinations. With this method or system, disease-related proteins and the combination effects of proteins can be predicted at the same time.

Description

Method and system for screening disease drug targets and target combinations
Technical Field
The present invention relates to the field of deep neural networks, and more particularly to a method and system for screening disease drug targets and target combinations.
Background
With advances in biological measurement techniques, high-throughput data related to diseases and drugs keep accumulating, and the understanding of certain diseases and of disease-related genes/proteins keeps deepening. Targeted drug therapy is currently considered superior to conventional drug therapy in terms of safety and adverse drug reactions (ADR), so targeted drugs are becoming the main direction of disease treatment and drug development. In this kind of drug development, the most critical step is determining the drug target, and the key to determining the drug target is prioritizing disease-related proteins.
Many current bioinformatics methods for drug design combine various types of data, such as protein-protein interactions (PPI), genomic variation, gene/protein expression and functional annotation, to screen disease-related genes/proteins; among them, some methods that use biological networks perform comparatively well. Some methods use the disease-related biological process information contained in the protein interaction network to predict disease-related genes/proteins. Other methods combine the protein interaction network with other omics data, such as gene/protein expression profiles and genomic mutation information, to infer new related genes. Still other methods screen by network topology. These biological network methods generally follow the "guilt by association (GBA)" principle: genes/proteins or phenotypes closely related to known disease genes/proteins are more likely to be related to the disease as well. Such predictions are likely to introduce biased results. Methods that integrate data from multiple samples to build the network may also ignore the tissue and condition specificity present in the network.
Existing methods that predict disease targets from the protein-protein interaction network usually follow these steps. One: collect a large amount of protein interaction data and curate it into a non-redundant set with incorrect links removed. Two: collect gene expression profiles of normal tissue and diseased tissue, and compute the differential expression values between the two kinds of tissue. Three: compute the sum of the differential expression values of all proteins that interact with a selected protein, and use this value as the criterion for prioritizing candidate genes.
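The third step of the existing approach can be illustrated with a minimal sketch. The protein names, the interaction map `ppi` and the values in `diff_expr` are hypothetical toy data, not taken from the patent; the function only shows the neighbor-sum criterion described above.

```python
# Hypothetical toy data: names and values are illustrative only.
ppi = {
    "P1": {"P2", "P3"},
    "P2": {"P1"},
    "P3": {"P1"},
}
diff_expr = {"P1": 0.2, "P2": 1.5, "P3": -0.4}


def neighbor_score(protein, ppi, diff_expr):
    """Sum the differential expression values of all interaction partners."""
    return sum(diff_expr.get(partner, 0.0) for partner in ppi.get(protein, ()))


# Score every protein and rank candidates from highest to lowest score.
scores = {p: neighbor_score(p, ppi, diff_expr) for p in ppi}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because the score depends only on direct partners, a protein with no strongly changed neighbors is ranked low regardless of its wider network context, which is one reading of the bias the background section attributes to such methods.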
Neural networks have strong nonlinear fitting ability, are easy to implement on a computer, and offer strong robustness, memory capacity, nonlinear mapping ability and self-learning ability. They are currently an important tool of deep learning and have broad application prospects. Here we propose a deep neural network model based on the autoencoder (auto-encoder) structure to learn the specificity of protein interactions in diseased tissue, and we use the trained network to screen disease-related proteins and protein combinations.
Summary of the Invention
The object of the present invention is to provide a method and system for screening disease drug targets and drug target combinations. It proposes a deep learning method based on the autoencoder that can fully learn the specificity of protein interactions in cancer multi-omics data; the network obtained after deep learning training can effectively screen cancer-related drug targets and target combinations.
To achieve the above object, the present invention provides the following scheme:
A method for screening disease drug targets and drug target combinations, comprising:
Building an autoencoder from differential expression data of proteins between disease cell lines and normal tissue;
Computing the knockout effect of genes with the autoencoder and building a knockout network;
Predicting disease-related proteins from the knockout network, the related proteins being drug targets;
Predicting combinations of disease-related proteins from the knockout network, the combinations of related proteins being drug target combinations.
Optionally, computing the knockout effect of genes with the autoencoder and building the knockout network specifically includes:
Building a deep learning network model from the autoencoder;
Given a differential expression profile, feeding the differential expression profile into the deep learning network model to obtain differential expression values, denoted the background output B;
Setting a differential value threshold and selecting the genes whose differential value in the differential expression profile exceeds the threshold, denoted highly expressed genes;
Sorting all highly expressed genes by differential value in descending order, assigning the highly expressed gene with the largest differential value the smallest differential value found in the differential expression profile, and assigning new differential values to all highly expressed genes in turn;
Forming a new differential expression profile from the highly expressed genes carrying new differential values together with the genes that remain in the differential expression profile after all highly expressed genes are removed;
Feeding the new differential expression profile into the deep learning network model to obtain the second output K;
Setting a comparison threshold;
Computing the difference between the second output K of every highly expressed gene and the background output B of that highly expressed gene, giving the comparison difference;
Denoting as knockout genes all highly expressed genes whose comparison difference exceeds the comparison threshold;
Building the knockout network from all knockout genes.
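The knockout procedure above can be sketched as follows. This is only one possible reading, with several assumptions: `model` is any callable standing in for the trained deep learning network, each highly expressed gene is knocked out one at a time, the knocked-out gene receives the smallest value in the profile, and the magnitude of K - B is compared with the threshold while the signed difference is kept. The names `simulate_knockouts` and `toy_model` are hypothetical.

```python
def simulate_knockouts(model, profile, diff_threshold, compare_threshold):
    """Return {knocked_gene: {affected_gene: K - B}} for a trained model.

    `model` maps a profile (dict gene -> value) to an output dict with the
    same keys; it is a stand-in for the trained network, not the patent's model.
    """
    background = model(profile)                    # background output B
    floor = min(profile.values())                  # smallest value in the profile
    high = sorted((g for g, v in profile.items() if v > diff_threshold),
                  key=profile.get, reverse=True)   # highly expressed genes, descending
    effects = {}
    for gene in high:                              # assumption: one gene at a time
        knocked = dict(profile)
        knocked[gene] = floor                      # simulated knockout
        output = model(knocked)                    # second output K
        diffs = {g: output[g] - background[g] for g in output if g != gene}
        hits = {g: d for g, d in diffs.items() if abs(d) > compare_threshold}
        if hits:
            effects[gene] = hits
    return effects


def toy_model(profile):
    # Hypothetical stand-in: B's output depends on A, and C's output on B.
    return {"A": profile["A"],
            "B": profile["B"] + 0.5 * profile["A"],
            "C": profile["C"] - 0.5 * profile["B"]}


profile = {"A": 2.0, "B": 1.5, "C": -1.0}
effects = simulate_knockouts(toy_model, profile,
                             diff_threshold=1.0, compare_threshold=0.1)
```

In this toy run, knocking out A lowers the output of B, and knocking out B raises the output of C, so both A and B would count as knockout genes under the stated assumptions.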
Optionally, building the knockout network from the knockout genes is specifically:
Taking each knockout gene as a source node of the knockout network;
Connecting the source node by an edge to each gene affected by the knockout gene;
Using the comparison difference as the weight of the edge.
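A directed, weighted network of this kind can be held in two plain dictionaries. The sketch below assumes knockout results in the form `{source: {affected: difference}}`; that layout and the function name `build_knockout_network` are illustrative choices, not part of the patent.

```python
from collections import defaultdict


def build_knockout_network(effects):
    """Turn {source: {affected: difference}} into directed weighted edges."""
    out_edges = defaultdict(dict)   # source -> {target: weight}
    in_edges = defaultdict(dict)    # target -> {source: weight}
    for source, affected in effects.items():
        for target, weight in affected.items():
            out_edges[source][target] = weight   # edge weight = comparison difference
            in_edges[target][source] = weight
    return dict(out_edges), dict(in_edges)


# Hypothetical knockout results for three genes.
effects = {"A": {"B": -1.5}, "B": {"C": 1.25}}
out_edges, in_edges = build_knockout_network(effects)
```

Keeping both directions indexed makes it cheap to look up, for any protein, the proteins it affects and the proteins that affect it.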
Optionally, predicting disease-related proteins from the knockout network specifically includes:
Setting known drug targets as marker genes, and setting the candidate proteins and a relevance threshold;
Obtaining from the knockout network the target-node proteins and source-node proteins connected to the candidate protein;
Distinguishing, based on the target-node proteins and the marker genes, the target-node proteins with an inhibition effect from the target-node proteins with an activation effect;
Computing the sum of the weights of the edges connecting the inhibition-effect target-node proteins to the candidate protein, denoted the first weight sum;
Computing the sum of the absolute values of the weights of the edges connecting the activation-effect target-node proteins to the candidate protein, denoted the first absolute value sum;
Computing the sum of the weights of all positive-valued edges of the source-node proteins, denoted the second weight sum, and the sum of the absolute values of all negative-valued weights, denoted the second absolute value sum;
Computing the relevance score of the candidate protein from the first weight sum, the first absolute value sum, the second weight sum and the second absolute value sum;
Selecting, among all candidate proteins, those whose relevance score is higher than the relevance threshold as disease-related proteins.
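The text names four partial sums but does not state how they are combined, nor how the marker genes separate inhibition from activation. The sketch below therefore rests on explicit assumptions: only target nodes that are marker genes are counted, a positive edge weight is read as inhibition and a negative one as activation, and the final score is simply the sum of the four parts. The function `relevance_score` and the toy data are hypothetical.

```python
def relevance_score(protein, out_edges, in_edges, markers):
    """Combine the four partial sums into one hypothetical score."""
    # Assumption: only target nodes that are known drug targets (markers) count.
    targets = {t: w for t, w in out_edges.get(protein, {}).items() if t in markers}
    sources = in_edges.get(protein, {})
    first_weight_sum = sum(w for w in targets.values() if w > 0)    # inhibition effect
    first_abs_sum = sum(-w for w in targets.values() if w < 0)      # activation effect
    second_weight_sum = sum(w for w in sources.values() if w > 0)   # positive source edges
    second_abs_sum = sum(-w for w in sources.values() if w < 0)     # negative source edges
    parts = (first_weight_sum, first_abs_sum, second_weight_sum, second_abs_sum)
    return sum(parts), parts   # assumption: the score is the plain sum of the parts


# Hypothetical knockout network around candidate protein X.
out_edges = {"X": {"M1": 0.5, "M2": -0.25, "N": 2.0}}
in_edges = {"X": {"S1": 1.0, "S2": -0.5}}
markers = {"M1", "M2"}
score, parts = relevance_score("X", out_edges, in_edges, markers)
```

Any weighting or normalization of the four parts would slot into the last line of the function; the plain sum is used here only because the source gives no formula.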
Optionally, predicting combinations of disease-related proteins from the knockout network specifically includes:
Collecting protein combinations known to have lethal and combination effects as positive samples, and randomly generating negative samples numbering 10 times the positive samples;
Choosing any protein of interest in the knockout network and screening all target-node proteins and source-node proteins directly connected to that protein of interest;
Judging whether the protein of interest in the knockout network and its target-node proteins and source-node proteins are present in the positive samples or in the negative samples;
If the protein of interest in the knockout network and its target-node proteins and source-node proteins are present in the positive samples, adding up the weights of the edges between the protein of interest and its target-node and source-node proteins and taking the absolute value, giving the first combined-weight-sum absolute value;
Adding the first combined-weight-sum absolute values of all proteins of interest in the positive samples to obtain the positive combined-weight-sum absolute value;
If the protein of interest in the knockout network and its target-node proteins and source-node proteins are present in the negative samples, adding up the weights of the edges between the protein of interest and its target-node and source-node proteins and taking the absolute value, giving the second combined-weight-sum absolute value;
Adding the second combined-weight-sum absolute values of all proteins of interest in the negative samples to obtain the negative combined-weight-sum absolute value;
Assigning the protein of interest a value of 1, -1 or 0 according to the absolute value of the first combined weight sum and the absolute value of the second combined weight sum, giving the protein assignment;
Selecting a first candidate protein and a second candidate protein;
Setting a first detection threshold and a second detection threshold;
Computing the proportion of proteins affected by both the first candidate protein and the second candidate protein, denoted the jointly affected protein ratio;
Computing, from the protein assignments, the proportion of evaluated proteins among those affected by both the first candidate protein and the second candidate protein, denoted the jointly affected evaluated-protein ratio, the evaluated proteins being those assigned a value of 1 or -1;
Judging whether the jointly affected protein ratio exceeds the first detection threshold and, at the same time, whether the jointly affected evaluated-protein ratio exceeds the second detection threshold;
If so, the combination of the first candidate protein and the second candidate protein is a combination of disease-related proteins.
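The final two-threshold test on a candidate pair can be sketched as below. The denominators of the two ratios are not given in the text, so this sketch assumes the joint ratio is shared targets over the union of targets, and the evaluated ratio is evaluated shared targets over all shared targets. The function name, the `assignment` dictionary and the toy network are hypothetical.

```python
def is_target_combination(p1, p2, out_edges, assignment, t1, t2):
    """Apply the two detection thresholds to a candidate protein pair."""
    a = set(out_edges.get(p1, {}))      # proteins affected by the first candidate
    b = set(out_edges.get(p2, {}))      # proteins affected by the second candidate
    shared, union = a & b, a | b
    if not shared:
        return False
    joint_ratio = len(shared) / len(union)              # assumed denominator: union
    evaluated = [p for p in shared if assignment.get(p, 0) != 0]   # assigned 1 or -1
    evaluated_ratio = len(evaluated) / len(shared)      # assumed denominator: shared
    return joint_ratio > t1 and evaluated_ratio > t2


# Hypothetical network: P1 and P2 share the affected proteins B and C.
out_edges = {
    "P1": {"A": 1.0, "B": 1.0, "C": 1.0},
    "P2": {"B": 1.0, "C": 1.0, "D": 1.0},
}
assignment = {"B": 1, "C": 0}
accepted = is_target_combination("P1", "P2", out_edges, assignment, 0.4, 0.4)
rejected = is_target_combination("P1", "P2", out_edges, assignment, 0.6, 0.4)
```

Here both ratios equal 0.5, so the pair passes with thresholds of 0.4 and fails once the first threshold is raised to 0.6.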
A system for screening disease drug targets and drug target combinations, comprising:
An autoencoding module, for building an autoencoder from differential expression data of proteins between disease cell lines and normal tissue;
A knockout network construction module, for computing the knockout effect of genes with the autoencoder and building a knockout network;
A related protein prediction module, for predicting disease-related proteins from the knockout network, the related proteins being drug targets;
A protein combination prediction module, for predicting combinations of disease-related proteins from the knockout network, the combinations of related proteins being drug target combinations.
Optionally, the knockout network construction module specifically includes:
A network model construction unit, for building a deep learning network model from the autoencoder;
A background output computing unit, for feeding a given differential expression profile into the deep learning network model to obtain differential expression values, denoted the background output B;
A highly expressed gene acquisition unit, for setting a differential value threshold and selecting the genes whose differential value in the differential expression profile exceeds the threshold, denoted highly expressed genes;
A highly expressed gene assignment unit, for sorting all highly expressed genes by differential value in descending order, assigning the highly expressed gene with the largest differential value the smallest differential value found in the differential expression profile, and assigning new differential values to all highly expressed genes in turn;
A new differential expression profile construction unit, for forming a new differential expression profile from the highly expressed genes carrying new differential values together with the genes that remain in the differential expression profile after all highly expressed genes are removed;
A second output computing unit, for feeding the new differential expression profile into the deep learning network model to obtain the second output K;
A comparison threshold setting unit, for setting a comparison threshold;
A comparison difference computing unit, for computing the difference between the second output K of every highly expressed gene and the background output B of that highly expressed gene, giving the comparison difference;
A knockout gene acquisition unit, for denoting as knockout genes all highly expressed genes whose comparison difference exceeds the comparison threshold;
A network construction unit, for building the knockout network from all knockout genes.
Optionally, the network construction unit specifically:
Takes each knockout gene as a source node of the knockout network;
Connects the source node by an edge to each gene affected by the knockout gene;
Uses the comparison difference as the weight of the edge.
Optionally, the related protein prediction module specifically includes:
A gene setting unit, for setting known drug targets as marker genes and for setting the candidate proteins and a relevance threshold;
A target-node and source-node protein acquisition unit, for obtaining from the knockout network the target-node proteins and source-node proteins connected to the candidate protein;
A target-node protein discrimination unit, for distinguishing, based on the target-node proteins and the marker genes, the target-node proteins with an inhibition effect from the target-node proteins with an activation effect;
An inhibition target-node protein computing unit, for computing the sum of the weights of the edges connecting the inhibition-effect target-node proteins to the candidate protein, denoted the first weight sum;
An activation target-node protein computing unit, for computing the sum of the absolute values of the weights of the edges connecting the activation-effect target-node proteins to the candidate protein, denoted the first absolute value sum;
A source-node protein computing unit, for computing the sum of the weights of all positive-valued edges of the source-node proteins, denoted the second weight sum, and the sum of the absolute values of all negative-valued weights, denoted the second absolute value sum;
A relevance score computing unit, for computing the relevance score of the candidate protein from the first weight sum, the first absolute value sum, the second weight sum and the second absolute value sum;
A disease-related protein determination unit, for selecting, among all candidate proteins, those whose relevance score is higher than the relevance threshold as disease-related proteins.
Optionally, the protein combination prediction module specifically includes:
A sample collection unit, for collecting protein combinations known to have lethal and combination effects as positive samples, and randomly generating negative samples numbering 10 times the positive samples;
A protein screening unit, for choosing any protein of interest in the knockout network and screening all target-node proteins and source-node proteins directly connected to that protein of interest;
A first judging unit, for judging whether the protein of interest in the knockout network and its target-node proteins and source-node proteins are present in the positive samples or in the negative samples;
A first combined weight computing unit, for adding up the weights of the edges between the protein of interest and its target-node and source-node proteins and taking the absolute value, giving the first combined-weight-sum absolute value, if the protein of interest in the knockout network and its target-node proteins and source-node proteins are present in the positive samples;
A positive combined weight computing unit, for adding the first combined-weight-sum absolute values of all proteins of interest in the positive samples to obtain the positive combined-weight-sum absolute value;
A second combined weight computing unit, for adding up the weights of the edges between the protein of interest and its target-node and source-node proteins and taking the absolute value, giving the second combined-weight-sum absolute value, if the protein of interest in the knockout network and its target-node proteins and source-node proteins are present in the negative samples;
A negative combined weight computing unit, for adding the second combined-weight-sum absolute values of all proteins of interest in the negative samples to obtain the negative combined-weight-sum absolute value;
A protein assignment unit, for assigning the protein of interest a value of 1, -1 or 0 according to the absolute value of the first combined weight sum and the absolute value of the second combined weight sum, giving the assigned protein;
A protein group selection unit, for selecting a first candidate protein and a second candidate protein as the candidate protein group;
A threshold setting unit, for setting a first detection threshold and a second detection threshold;
A first protein ratio computing unit, for computing the proportion of proteins affected by both the first candidate protein and the second candidate protein, denoted the jointly affected protein ratio;
A second protein ratio computing unit, for computing, from the assigned proteins, the proportion of evaluated proteins among those affected by both the first candidate protein and the second candidate protein, denoted the jointly affected evaluated-protein ratio;
A second judging unit, for judging whether the jointly affected protein ratio exceeds the first detection threshold and, at the same time, whether the jointly affected evaluated-protein ratio exceeds the second detection threshold;
A protein combination determination unit, for determining that the combination of the first candidate protein and the second candidate protein is a combination of disease-related proteins if so, and that it is not a combination of disease-related proteins otherwise.
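The sample collection unit draws ten random negative pairs for every known positive pair. A minimal sketch of that sampling follows; it assumes negatives are unordered pairs of distinct proteins that do not appear among the positives, and the function name, the fixed seed and the protein identifiers are illustrative.

```python
import random


def sample_negative_pairs(positive_pairs, proteins, ratio=10, seed=0):
    """Draw `ratio` random non-positive protein pairs per positive pair."""
    positives = {frozenset(pair) for pair in positive_pairs}
    wanted = ratio * len(positives)
    # Upper bound on available pairs, assuming all positives lie within `proteins`.
    available = len(proteins) * (len(proteins) - 1) // 2 - len(positives)
    if wanted > available:
        raise ValueError("not enough distinct protein pairs to sample from")
    rng = random.Random(seed)            # fixed seed so the draw is repeatable
    negatives = set()
    while len(negatives) < wanted:
        pair = frozenset(rng.sample(proteins, 2))   # two distinct proteins
        if pair not in positives:
            negatives.add(pair)
    return negatives


# Hypothetical protein identifiers and two known positive combinations.
proteins = ["P%d" % i for i in range(10)]
positive_pairs = [("P0", "P1"), ("P2", "P3")]
negatives = sample_negative_pairs(positive_pairs, proteins)
```

Storing pairs as `frozenset` objects treats (P0, P1) and (P1, P0) as the same combination, which avoids counting a pair twice.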
According to the specific embodiments provided by the present invention, the present invention discloses the following technical effects:
In the present invention, an autoencoder is built from differential expression data of proteins between disease cell lines and normal tissue; the knockout effect of genes is computed with the autoencoder and a knockout network is built; disease-related proteins are predicted from the knockout network, the related proteins being drug targets; and combinations of disease-related proteins are predicted from the knockout network, the combinations of related proteins being drug target combinations. The invention proposes a method of simulating the knockout effect based on a deep neural network: in the deep model, the input value of each protein is changed and the resulting difference in the output is observed to assess the protein's effect on the disease. Using only one network model, the present invention can capture the feature structure and inherent regularities implied in complex data and can predict disease-related proteins and the combination effects of proteins at the same time, unifying the two classes of prediction problems in both theory and implementation.
Brief Description of the Drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the method for screening disease drug targets and drug target combinations according to an embodiment of the present invention;
Fig. 2 is a flow chart of the method for predicting disease-related proteins according to an embodiment of the present invention;
Fig. 3 is a module diagram of the system for screening disease drug targets and drug target combinations according to an embodiment of the present invention;
Fig. 4 is a structural diagram of the related protein prediction module according to an embodiment of the present invention;
Fig. 5 is a flow chart of using the deep neural network model to predict disease-related proteins and protein combinations according to an embodiment of the present invention;
Fig. 6 is the network of cancer-related protein combinations predicted according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of the method for screening disease drug targets and drug target combinations according to an embodiment of the present invention. Referring to Fig. 1, a method for screening disease drug targets and drug target combinations includes:
Step 101: build an autoencoder from differential expression data of proteins between disease cell lines and normal tissue;
Step 102: compute the knockout effect of genes with the autoencoder and build a knockout network;
Step 103: predict disease-related proteins from the knockout network, the related proteins being drug targets;
Step 104: predict combinations of disease-related proteins from the knockout network, the combinations of related proteins being drug target combinations.
The above method can capture the feature structure and inherent regularities implied in complex data and can predict disease-related proteins and the combination effects of proteins at the same time, unifying the two classes of prediction problems in both theory and implementation.
The basic unit of the deep neural network model of the present invention is the autoencoder. The internal structure of the autoencoder corresponds to the protein-protein interaction network, and differential expression data are used in training, so the specificity of protein interactions can be learned from the differences between tissues. In addition, combining multiple autoencoders establishes a link between two proteins that are far apart in the interaction network.
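One way to picture an autoencoder whose internal structure follows the interaction network is a layer whose connections are masked by the PPI adjacency matrix. The sketch below is an assumption about how such a correspondence could be realized, not the patent's implementation: the mask, the constant weights and the `tanh` activation are illustrative, and no training is shown (in practice the weights would be learned from differential expression data).

```python
import math


def masked_layer(values, weights, mask, bias):
    """One layer whose connections exist only where the PPI mask is 1."""
    n = len(values)
    out = []
    for j in range(n):
        total = bias[j] + sum(weights[i][j] * mask[i][j] * values[i]
                              for i in range(n))
        out.append(math.tanh(total))
    return out


# Hypothetical chain A - B - C (with self-connections): A and C do not interact.
mask = [[1, 1, 0],
        [1, 1, 1],
        [0, 1, 1]]
weights = [[0.5] * 3 for _ in range(3)]   # untrained placeholder weights
bias = [0.0, 0.0, 0.0]

x = [1.0, 0.0, 0.0]                       # only protein A carries signal
one_layer = masked_layer(x, weights, mask, bias)
two_layers = masked_layer(one_layer, weights, mask, bias)
```

After one masked layer the signal from A has not reached C, while after two stacked layers it has, which mirrors the statement that combining several autoencoders links proteins that are far apart in the interaction network.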
Computing the knockout effect of genes with the autoencoder and building the knockout network specifically includes:
Building a deep learning network model from the autoencoder;
Given a differential expression profile, feeding the differential expression profile into the deep learning network model to obtain differential expression values, denoted the background output B;
Setting a differential value threshold and selecting the genes whose differential value in the differential expression profile exceeds the threshold, denoted highly expressed genes;
Sorting all highly expressed genes by differential value in descending order, assigning the highly expressed gene with the largest differential value the smallest differential value found in the differential expression profile, and assigning new differential values to all highly expressed genes in turn;
Forming a new differential expression profile from the highly expressed genes carrying new differential values together with the genes that remain in the differential expression profile after all highly expressed genes are removed;
Feeding the new differential expression profile into the deep learning network model to obtain the second output K;
Setting a comparison threshold;
Computing the difference between the second output K of every highly expressed gene and the background output B of that highly expressed gene, giving the comparison difference;
Denoting as knockout genes all highly expressed genes whose comparison difference exceeds the comparison threshold;
Building the knockout network from all knockout genes.
Building the knockout network from the knockout genes is specifically:
Taking each knockout gene as a source node of the knockout network;
Connecting the source node by an edge to each gene affected by the knockout gene;
Using the comparison difference as the weight of the edge.
Fig. 2 is a flow chart of the method for predicting disease-related proteins according to an embodiment of the present invention. Referring to Fig. 2, predicting disease-related proteins from the knockout network specifically includes:
Step 201: set known drug targets as marker genes, and set the candidate proteins and a relevance threshold;
Step 202: obtain from the knockout network the target-node proteins and source-node proteins connected to the candidate protein;
Step 203: distinguish, based on the target-node proteins and the marker genes, the target-node proteins with an inhibition effect from the target-node proteins with an activation effect;
Step 204: compute the sum of the weights of the edges connecting the inhibition-effect target-node proteins to the candidate protein, denoted the first weight sum;
Step 205: compute the sum of the absolute values of the weights of the edges connecting the activation-effect target-node proteins to the candidate protein, denoted the first absolute value sum;
Step 206: compute the sum of the weights of all positive-valued edges of the source-node proteins, denoted the second weight sum, and the sum of the absolute values of all negative-valued weights, denoted the second absolute value sum;
Step 207: compute the relevance score of the candidate protein from the first weight sum, the first absolute value sum, the second weight sum and the second absolute value sum;
Step 208: select, among all candidate proteins, those whose relevance score is higher than the relevance threshold as disease-related proteins.
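Steps 202 and 208 amount to a neighbor lookup and a threshold filter. The following sketch assumes the knockout network is stored as a list of `(source, target, weight)` triples and that scores have already been computed by some scoring rule; the helper names and the numbers are hypothetical.

```python
def connected_proteins(protein, edges):
    """Step 202: split a protein's neighbors into target nodes and source nodes.

    `edges` is a list of (source, target, weight) triples.
    """
    targets = {t: w for s, t, w in edges if s == protein}   # proteins it affects
    sources = {s: w for s, t, w in edges if t == protein}   # proteins affecting it
    return targets, sources


def select_related(scores, threshold):
    """Step 208: keep candidates scoring above the threshold, best first."""
    kept = [p for p, s in scores.items() if s > threshold]
    return sorted(kept, key=scores.get, reverse=True)


# Hypothetical knockout network and precomputed relevance scores.
edges = [("A", "B", -1.5), ("B", "C", 1.25), ("D", "B", 0.75)]
targets, sources = connected_proteins("B", edges)
related = select_related({"A": 0.4, "B": 2.5, "C": 1.1}, threshold=1.0)
```

The lookup scans the whole edge list, which is fine for a sketch; an indexed structure would be preferable for a network with many edges.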
Predicting combinations of disease-related proteins according to the knockout network specifically includes:
Collecting known protein combinations with lethal and combined effects as positive samples, and randomly generating, according to the positive samples, negative samples numbering 10 times the positive samples;
Selecting any target protein in the knockout network, and screening all target-point proteins and source-point proteins directly connected to the target protein;
Judging whether the target protein in the knockout network and its target-point proteins and source-point proteins are present in the positive samples or in the negative samples;
If the target protein in the knockout network and its target-point proteins and source-point proteins are present in the positive samples, adding the weights of the edges between the target protein and its target-point proteins and source-point proteins and taking the absolute value, to obtain the absolute value of the first combined weight sum;
Adding the absolute values of the first combined weight sums of all target proteins in the positive samples to obtain the absolute value of the positive combined weight sum;
If the target protein in the knockout network and its target-point proteins and source-point proteins are present in the negative samples, adding the weights of the edges between the target protein and its target-point proteins and source-point proteins and taking the absolute value, to obtain the absolute value of the second combined weight sum;
Adding the absolute values of the second combined weight sums of all target proteins in the negative samples to obtain the absolute value of the negative combined weight sum;
According to the absolute value of the first combined weight sum and the absolute value of the second combined weight sum, assigning the target protein a value of 1, -1 or 0, to obtain the target protein assignment;
Selecting a first protein to be tested and a second protein to be tested;
Setting a first detection threshold and a second detection threshold;
Calculating the ratio of proteins jointly affected by the first protein to be tested and the second protein to be tested, denoted as the jointly-affected protein ratio;
Calculating, according to the target protein assignment, the ratio of evaluated proteins jointly affected by the first protein to be tested and the second protein to be tested, denoted as the jointly-affected evaluated protein ratio, the evaluated proteins being the proteins assigned a value of 1 or -1;
Judging whether the jointly-affected protein ratio is greater than the first detection threshold and, at the same time, whether the jointly-affected evaluated protein ratio is greater than the second detection threshold;
If so, the combination of the first protein to be tested and the second protein to be tested is a combination of disease-related proteins.
Fig. 3 is a module diagram of the system for screening disease drug targets and drug target combinations according to an embodiment of the present invention. Referring to Fig. 3, a system for screening disease drug targets and drug target combinations includes:
Auto-encoding module 301, for building an auto-encoder from the differential expression data of proteins between disease cell lines and normal tissue;
Knockout network construction module 302, for calculating the knockout effect of genes according to the auto-encoder and building the knockout network;
Related protein prediction module 303, for predicting disease-related proteins according to the knockout network, the related proteins being drug targets;
Protein combination prediction module 304, for predicting combinations of disease-related proteins according to the knockout network, the combinations of related proteins being drug target combinations.
The knockout network construction module specifically includes:
Network model construction unit, for building a deep learning network model from the auto-encoder;
Background output calculation unit, for inputting a given differential expression profile into the deep learning network model to obtain differential expression values, denoted as the background output B;
Highly expressed gene acquisition unit, for setting a difference value threshold and selecting the genes in the differential expression profile whose difference value is greater than the difference value threshold, denoted as highly expressed genes;
Highly expressed gene assignment unit, for sorting all highly expressed genes by difference value from large to small, assigning the highly expressed gene with the largest difference value the smallest difference value in the differential expression profile, and assigning new difference values to all highly expressed genes in turn;
New differential expression profile construction unit, for forming a new differential expression profile from the highly expressed genes with new difference values and the genes remaining in the differential expression profile after all highly expressed genes are removed;
Second output calculation unit, for inputting the new differential expression profile into the deep learning network model to obtain the second output K;
Comparison threshold setting unit, for setting a comparison threshold;
Comparison difference calculation unit, for calculating the difference between the second output K of each highly expressed gene and the background output B, to obtain the comparison difference;
Knockout gene acquisition unit, for denoting all highly expressed genes whose comparison difference is greater than the comparison threshold as knockout genes;
Network construction unit, for building the knockout network from all knockout genes.
The network construction unit specifically:
Takes the knockout gene as a source point of the knockout network;
Takes each gene affected by the knockout gene as the end point of an edge from the source point;
Takes the comparison difference as the weight of the edge.
Fig. 4 is a structure diagram of the related protein prediction module according to an embodiment of the present invention. Referring to Fig. 4, the related protein prediction module specifically includes:
Marker gene setting unit 401, for setting known drug targets as marker genes, and setting the proteins to be tested and a relevance threshold;
Target-point and source-point protein acquisition unit 402, for obtaining, according to the knockout network, the target-point proteins and source-point proteins connected to the protein to be tested;
Target-point protein discrimination unit 403, for distinguishing, according to the target-point proteins and the marker genes, target-point proteins with an inhibitory effect from target-point proteins with an activation effect;
Inhibitory target-point protein calculation unit 404, for calculating the sum of the weights of the edges connecting the inhibitory-effect target-point proteins to the protein to be tested, denoted as the first weight sum;
Activation target-point protein calculation unit 405, for calculating the sum of the absolute values of the weights of the edges connecting the activation-effect target-point proteins to the protein to be tested, denoted as the first absolute-value sum;
Source-point protein calculation unit 406, for calculating the sum of all positive edge weights of the source-point proteins, denoted as the second weight sum, and the sum of the absolute values of all negative edge weights, denoted as the second absolute-value sum;
Relevance score calculation unit 407, for calculating the relevance score of the protein to be tested from the first weight sum, the first absolute-value sum, the second weight sum and the second absolute-value sum;
Disease-related protein determination unit 408, for selecting, from all proteins to be tested, those whose relevance scores are higher than the relevance threshold as the disease-related proteins.
The protein combination prediction module specifically includes:
Sample collection unit, for collecting known protein combinations with lethal and combined effects as positive samples, and randomly generating, according to the positive samples, negative samples numbering 10 times the positive samples;
Target protein screening unit, for selecting any target protein in the knockout network and screening all target-point proteins and source-point proteins directly connected to the target protein;
First judgment unit, for judging whether the target protein in the knockout network and its target-point proteins and source-point proteins are present in the positive samples or in the negative samples;
First combined weight calculation unit, for adding, if the target protein in the knockout network and its target-point proteins and source-point proteins are present in the positive samples, the weights of the edges between the target protein and its target-point proteins and source-point proteins and taking the absolute value, to obtain the absolute value of the first combined weight sum;
Positive combined weight calculation unit, for adding the absolute values of the first combined weight sums of all target proteins in the positive samples to obtain the absolute value of the positive combined weight sum;
Second combined weight calculation unit, for adding, if the target protein in the knockout network and its target-point proteins and source-point proteins are present in the negative samples, the weights of the edges between the target protein and its target-point proteins and source-point proteins and taking the absolute value, to obtain the absolute value of the second combined weight sum;
Negative combined weight calculation unit, for adding the absolute values of the second combined weight sums of all target proteins in the negative samples to obtain the absolute value of the negative combined weight sum;
Target protein assignment unit, for assigning the target protein a value of 1, -1 or 0 according to the absolute value of the first combined weight sum and the absolute value of the second combined weight sum, to obtain the assigned target proteins;
Protein group selection unit, for selecting a first protein to be tested and a second protein to be tested as the protein group to be tested;
Threshold setting unit, for setting a first detection threshold and a second detection threshold;
First protein ratio calculation unit, for calculating the ratio of proteins jointly affected by the first protein to be tested and the second protein to be tested, denoted as the jointly-affected protein ratio;
Second protein ratio calculation unit, for calculating, according to the assigned target proteins, the ratio of evaluated proteins jointly affected by the first protein to be tested and the second protein to be tested, denoted as the jointly-affected evaluated protein ratio;
Second judgment unit, for judging whether the jointly-affected protein ratio is greater than the first detection threshold and, at the same time, whether the jointly-affected evaluated protein ratio is greater than the second detection threshold;
Protein combination determination unit, for determining, if so, that the combination of the first protein to be tested and the second protein to be tested is a combination of disease-related proteins.
Fig. 5 is a flow chart of predicting disease-related proteins and protein combinations with the deep neural network model according to an embodiment of the present invention. The method of the present invention is described in detail below with reference to Fig. 5:
Neural network model design and training (see part a, model training, in Fig. 5)
This study optimizes the standard auto-encoder into an ultra-sparse model suited to learning disease-related features. In the model, each input unit represents the differential expression value of one gene between diseased tissue and normal tissue, and each hidden neuron represents one protein interaction. Because a differential expression value can be positive or negative, a hidden neuron has three patterns: ++, -- and +-. The patterns have different biological meanings and therefore need to be distinguished. Neurons of the same pattern form one auto-encoder, giving three auto-encoders. For each auto-encoder, the number of input units equals the number of proteins and the number of hidden neurons equals the number of interactions. The three auto-encoders are trained separately and merged after training; in the merged auto-encoder the number of input units is unchanged and the number of hidden neurons is three times the original. The activation function of the auto-encoder is a rectified linear activation function (formula 4); weights are updated with the back-propagation algorithm, the learning rate is set to 0.005, the momentum is set to 0.5, and L2 regularization is applied. Each training cycle uses 6 samples, and the number of cycles is fixed according to the size of the training set.
$P_1 = W_1 |i_1|$ (1)
$P_2 = W_2 |i_2|$ (2)
In the formulas, $i_1$ and $i_2$ represent the input values of the input units, $W_1$ and $W_2$ the corresponding weights, $W$ the average weight in each cycle, $P_i$ (i = 1, 2) the value of the weight multiplied by the input, and $P$ the input value of the hidden neuron.
Calculating the knockout effect of genes (see part b, knockout simulation, in Fig. 5)
To calculate the knockout effect of a given gene/protein, we build a deep learning network from the trained auto-encoder. A differential expression profile (calculated from experimental data downloaded from public databases) is first used as input, and the model output is taken as the background output; the input value of the given gene is then changed and the output recalculated. The difference between this output and the background output is taken as the effect of knocking out the gene on other genes. In practice the changed value of the given gene is negative; its size depends on the distribution of all differential expression values and is close to the minimum of that distribution. The computation model consists of five trained auto-encoders connected in series. The output of each layer has two parts, the output of that layer's auto-encoder and the input of that layer's auto-encoder, weighted by α and (1-α) respectively, with α set to 0.25 here (formula 5). Calculating the knockout effect comprises the following steps:
1: Given a differential expression profile as input, calculate the output of the neural network model. The output is the differential expression value computed by the network and is recorded as the background output B.
2: Calculate the knockout effect of the up-regulated genes in the differential expression profile. Set a threshold, for example 0.5, and select the genes whose difference value in the profile is greater than 0.5; assign each of them in turn a smaller value while keeping the values of the other genes unchanged (the smaller value depends on the data distribution and is generally the minimum value in the whole data set). Input the modified expression profile into the same model and record its output as K.
3: For the output K of each highly expressed gene in step 2, calculate its difference from the background output B. Because each output unit corresponds to a gene, the genes with large differences are the genes affected by knocking out the given gene.
4: Build the knockout network. The result of step 3 can be expressed as a network: the knocked-out gene is the source point, each edge points to an affected gene, and the weight of the edge is the difference between K and B for the affected gene (formula 6), where K is the output obtained after changing a given highly expressed gene and B is the output obtained from the unmodified input. Each differential expression profile yields a corresponding knockout network.
$l_n = \alpha e_n + (1-\alpha)\, l_{n-1}$ (5)
$D = B - K$ (6)
In the formulas, $l_n$ represents the output of layer n, $e_n$ the output of the layer-n auto-encoder, and $\alpha$ the weight, set to 0.25 here. Vectors K and B denote the output after knocking out a given gene and the background output respectively, and D denotes the difference between the two outputs.
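The knockout simulation in steps 1 to 4 and formulas 5 and 6 can be sketched as follows. This is a minimal illustration, not the patented implementation: the trained auto-encoder is replaced by an arbitrary callable `encoder`, and the names `run_stack` and `knockout_edges` are hypothetical. The knock value is taken as the minimum of the input profile, self-loops are dropped, and the comparison is made on the absolute difference; all three are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def run_stack(x: Sequence[float], encoder: Callable[[Vector], Vector],
              alpha: float = 0.25, n_layers: int = 5) -> Vector:
    """Apply formula 5, l_n = alpha * e_n + (1 - alpha) * l_{n-1}, n_layers times."""
    layer = list(x)
    for _ in range(n_layers):
        encoded = encoder(layer)  # e_n: auto-encoder output for this layer's input
        layer = [alpha * e + (1 - alpha) * prev for e, prev in zip(encoded, layer)]
    return layer


def knockout_edges(profile: Sequence[float], genes: Sequence[str],
                   encoder: Callable[[Vector], Vector],
                   de_threshold: float = 0.5,
                   compare_threshold: float = 1e-6) -> Dict[Tuple[str, str], float]:
    """Return {(knocked-out gene, affected gene): D} with D = B - K (formula 6)."""
    background = run_stack(profile, encoder)   # background output B
    knock_value = min(profile)                 # assumption: minimum of this profile
    edges: Dict[Tuple[str, str], float] = {}
    for g, value in enumerate(profile):
        if value <= de_threshold:
            continue                           # only highly expressed genes are knocked out
        modified = list(profile)
        modified[g] = knock_value              # the other genes keep their values
        knocked = run_stack(modified, encoder)  # output K for this knockout
        for j, (b, k) in enumerate(zip(background, knocked)):
            d = b - k
            # assumption: self-loops are skipped and |D| is compared with the threshold
            if j != g and abs(d) > compare_threshold:
                edges[(genes[g], genes[j])] = d
    return edges
```

Genes are knocked out one at a time, following step 2, so each highly expressed gene contributes its own set of outgoing edges to the knockout network.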
Predicting disease-related proteins (see part c, preferred proteins, in Fig. 5)
To assess the relevance of each gene to the disease, each gene is scored according to its connections in the knockout network; the higher the score, the higher the relevance to the disease. Scoring requires marker genes, and here known drug targets are used as markers. Calculating the score of a given protein P requires considering the proteins directly connected to it in the knockout network. Because the edges of the knockout network are directed, the directly connected proteins fall into two classes, source-point proteins and target-point proteins, which are used separately in calculating the score of P. Known drug targets can be divided into activation-effect targets and inhibitory-effect targets. In the knockout network, an edge with a positive weight means that knocking out the source-point protein weakens the function of the target-point protein. Therefore, if a target-point protein of P is a known inhibitory-effect drug target and the edge has a positive weight, the weights of the edges connecting all such proteins to P are summed and denoted $S_{pwt}$; if a target-point protein of P is an activation-effect drug target and the edge has a negative weight, the absolute values of the negative weights of the edges connecting such proteins to P are summed and denoted $S_{nwt}$. For each source-point protein i connected to P, the sum of all its positive weights and the sum of the absolute values of its negative weights are calculated first, and the $S_{pwt}$ and $S_{nwt}$ of protein i are labeled accordingly. The calculation of the final score of protein P is described in full in formula 7.
In the formula, $\omega_i$ represents the weight of the edge from source-point protein i to target-point protein P.
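The quantities that feed the score can be sketched as below. This is a hedged illustration only: formula 7 is not reproduced in the text above, so the sketch stops at the component sums and does not combine them into a final score. The function names and the edge dictionary layout are hypothetical, and reading "all its positive weights" as the outgoing edges of a source-point protein is an assumption.

```python
from typing import Dict, Set, Tuple

Edges = Dict[Tuple[str, str], float]  # (source point, target point) -> edge weight


def marker_sums(edges: Edges, p: str, inhibitory: Set[str],
                activating: Set[str]) -> Tuple[float, float]:
    """Return (S_pwt, S_nwt) for protein p over its edges to marker target points."""
    s_pwt = 0.0
    s_nwt = 0.0
    for (src, tgt), w in edges.items():
        if src != p:
            continue
        if tgt in inhibitory and w > 0:
            s_pwt += w        # positive edge to an inhibitory-effect drug target
        elif tgt in activating and w < 0:
            s_nwt += -w       # absolute value of a negative edge to an activation-effect target
    return s_pwt, s_nwt


def signed_weight_sums(edges: Edges, p: str) -> Tuple[float, float]:
    """Sum of positive weights and of absolute negative weights on p's outgoing edges
    (assumption: 'all its weights' means outgoing edges)."""
    pos = sum(w for (src, _), w in edges.items() if src == p and w > 0)
    neg = sum(-w for (src, _), w in edges.items() if src == p and w < 0)
    return pos, neg


def source_proteins(edges: Edges, p: str) -> Set[str]:
    """Proteins with an edge pointing at p, i.e. the source-point proteins of p."""
    return {src for (src, tgt) in edges if tgt == p}
```

For a protein P, `marker_sums` would be evaluated for P itself and for every protein returned by `source_proteins`, together with `signed_weight_sums` for each of those source-point proteins.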
Predicting combinations of disease-related proteins (see part d, preferred protein combinations, in Fig. 5)
We also use the knockout network to predict protein combinations with a lethal effect. The basic assumption is that a protein combination with a lethal effect affects the same group of other proteins, or is affected by the same group of proteins; if these proteins can be identified effectively, they can be used to predict new protein combinations with a lethal effect. Known protein combinations with lethal and combined effects are first collected from databases as positive samples, and negative samples numbering 10 times the positive samples are then generated at random. The specific steps are as follows:
1: For each protein in a knockout network, screen all target-point proteins and source-point proteins directly connected to it. If any combination of two of these proteins is present in the positive samples, add the weights of the edges between the two proteins of the combination and the selected protein and take the absolute value; do this for all protein combinations present in the positive samples, sum the values and label the result $LW_{pos}$. Perform the same calculation for the negative samples and label the result $LW_{neg}$. Note that source-point proteins and target-point proteins are calculated separately.
2: Using the results of the previous step, assign each protein a value of 1, -1 or 0 according to formula 10. The threshold T in the formula is set to 2.3 so that the numbers of proteins assigned -1 and 1 are close.
3: Select proteins x and y and determine whether x and y have a lethal effect or combined effect. First screen all proteins connected to x and to y, label the two sets of proteins X and Y, and label their intersection XY. Then calculate $CR_{xy}$ and $CRA_{xy}$ with formula 9 and formula 10: $CR_{xy}$ is the ratio of proteins jointly affected by proteins x and y, and $CRA_{xy}$ is the ratio of evaluated proteins that are jointly affected; the edge-weight term in the formulas denotes the weight of the edge connecting the given protein x to protein p. Finally, if $CR_{xy} > 0.3$ and $CRA_{xy} > 0.03$, the protein combination is identified as having a combined effect or lethal effect.
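Step 3 can be sketched as follows, with strong caveats. Formulas 9 and 10 are not reproduced in the text above and use edge weights; the sketch below replaces them with simple count ratios over the union of X and Y, which is an assumption made only to show the shape of the decision rule. The names `neighbours`, `joint_ratios` and `is_combination` are hypothetical; only the thresholds 0.3 and 0.03 come from the description.

```python
from typing import Dict, Set, Tuple

Edges = Dict[Tuple[str, str], float]  # (source point, target point) -> edge weight


def neighbours(edges: Edges, p: str) -> Set[str]:
    """All proteins directly connected to p, in either edge direction."""
    found: Set[str] = set()
    for src, tgt in edges:
        if src == p:
            found.add(tgt)
        elif tgt == p:
            found.add(src)
    return found


def joint_ratios(edges: Edges, assignment: Dict[str, int],
                 x: str, y: str) -> Tuple[float, float]:
    """Return stand-ins for (CR_xy, CRA_xy) as unweighted count ratios (assumption)."""
    set_x = neighbours(edges, x)
    set_y = neighbours(edges, y)
    union = set_x | set_y
    if not union:
        return 0.0, 0.0
    shared = set_x & set_y  # the intersection XY
    # evaluated proteins are those assigned 1 or -1 in step 2
    evaluated = {p for p in shared if assignment.get(p, 0) != 0}
    return len(shared) / len(union), len(evaluated) / len(union)


def is_combination(cr: float, cra: float,
                   first_threshold: float = 0.3,
                   second_threshold: float = 0.03) -> bool:
    """Both ratios must exceed their detection thresholds."""
    return cr > first_threshold and cra > second_threshold
```

Keeping the ratio calculation separate from the threshold test means the weighted formulas of the description can be substituted for `joint_ratios` without touching the decision rule.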
The following operations are also required in the embodiments of the present invention:
1. Collecting the known drug target data set
This study obtained 913 known drug target proteins from the Therapeutic Target Database (TTD) and divided them into two classes: inhibitors and agonists. They are used for screening new cancer-related genes.
2. Collecting the gene and protein expression data sets
Gene and protein expression data come from 3 sources: (1) protein expression data downloaded from the ProteomicsDB database, comprising 98 cell line samples; (2) RNA expression data downloaded from the BioXpress database, comprising samples of 18 cancer types and 660 patients. These gene and protein expression data are mainly used in 2 parts of this study: training the neural network model, using the differential expression values between cell lines and tissues in ProteomicsDB; and predicting cancer-related genes, using the data in ProteomicsDB and BioXpress.
3. Collecting the protein interaction data set
The protein-protein interaction (PPI) data set collects high-quality interacting proteins from five public protein databases, namely The Biological General Repository for Interaction Datasets (BioGRID), Human Protein Reference Database (HPRD), IntAct, The Database of Interacting Proteins (DIP) and The Molecular Interaction Database (MINT). These data were integrated into a non-redundant protein interaction data set, from which 224,988 interaction pairs among 14,759 proteins having ProteomicsDB protein expression information were selected. They are used for building the neural network.
4. Collecting the protein combination data set
This study built a protein combination data set containing known drug target combination (drug combination) and synthetic lethal information. 1,272 pairs of drug target combinations were obtained from The Drug Combination Database V2.0 (DCDB2), after removing the drug target combinations in the database that are not suitable for cancer. In addition, synthetic lethal protein combination information came from the SynLethDB database (13,171 experimentally verified pairs and 5,489 computationally predicted pairs of synthetic lethal protein combinations) and from the result data of research papers (182 computationally predicted pairs of synthetic lethal protein combinations). The protein combination data set integrates the above information into a network of 20,062 pairs of protein combinations. It is used for predicting new protein combinations.
Fig. 6 shows the networks of cancer-related protein combinations predicted by an embodiment of the present invention. Referring to Fig. 6, this study optimized the standard auto-encoder into an ultra-sparse model suited to learning cancer-related features. The differential expression data of proteins between cancer cell lines and normal tissue in ProteomicsDB are used as input, the connections between hidden units and input units are constrained by the protein interaction network, and each hidden unit represents one interaction. Three auto-encoders are trained separately and then merged into a single auto-encoder. The trained network can capture which interaction patterns are more important to the cancer disease process. The derivative term of the activation function is constrained to 1 by multiplying by a corresponding coefficient. Training runs for 2,940 iterations, each comprising the 6 comparisons of one cell line, so each cell line is trained repeatedly 30 times. The threshold of the activation function is set to 1 and multiplied by the average weight. The learning rate is set to 0.005, the momentum coefficient to 0.5, and the decay coefficient to 1.
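As a quick consistency check on the iteration count, assuming the 98 ProteomicsDB cell line samples listed in the data collection section are the cell lines used for training, the stated figures agree with each other:

```latex
\frac{2{,}940\ \text{training iterations}}{98\ \text{cell lines}} = 30\ \text{iterations per cell line},
\qquad
2{,}940 \times 6 = 17{,}640\ \text{comparisons in total}
```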
The trained auto-encoder is assembled into a five-layer recurrent neural network in which every layer uses the auto-encoder with the same weights. At each layer the previous output is added to the input, so the final output amplifies the differential values of the key proteins that were learned. This study uses the differential expression values in BioXpress and ProteomicsDB as input and defines the knockout effect as the difference in output values of the deep neural network model after the input value of each highly differentially expressed protein (differential expression (DE) value > 0.5) is changed to a negative value (modified value = -7.5). All interacting proteins whose output difference has an absolute value > 0.000001 are used to build the knockdown (KD) network.
The result of the protein assessment takes the form of KD networks. Changing the input value of one protein can affect the output of many other proteins; only the effect on up-regulated proteins/genes is considered here, and the effect on a protein can be positive or negative. To obtain a single value expressing the importance of a protein, in each KD network the known cancer target proteins from the TTD database are used as markers and a KD score is calculated as the index of the protein's importance in cancer (the KD score strategy considers both direct effect values and indirect effect factors in the network). We assessed every highly differentially expressed protein (DE value > 0.5), and the final KD score is the average over the cell lines or disease samples.
In this study, proteins with a high KD score (ProteomicsDB: KD score >= 0.1; BioXpress: KD score >= 0.02) were pre-selected as cancer-related proteins. Combining the results of the ProteomicsDB and BioXpress data sets, the preliminary prediction yielded 4,862 cancer-related proteins; the intersection of the two data sets accounts for 87% and 85% of their respective totals. The predicted proteins include 386 known drug targets, covering 86.35% of the known drug targets that can be assessed by this method (this study uses the information on 913 known drug targets in the TTD database, of which 447 proteins have the PPI and expression information needed for assessment by this method). Among the top 500 pre-selected cancer-related proteins (those with the largest average KD score across cell lines) there are 211 known cancer drug targets. Gene Ontology (GO) enrichment analysis of the remaining 289 proteins shows that their functions are mainly enriched in DNA replication (GO:0006270; GO:0006260; GO:0006268), metabolic pathways (GO:0009058; GO:0044267; GO:0051246; GO:0009894; GO:0019538), chromatin structure (GO:0051276; GO:0098813; GO:0007059; GO:0051983) and cell division-related processes (cell cycle, GO:0007049; GO:0000278; GO:0022402; GO:1903047; GO:0007346). Among these top 500 pre-selected cancer-related proteins, some widely studied proteins, such as cellular tumor antigen p53 (TP53), epidermal growth factor receptor (EGFR), GTPase HRas (HRAS), GTPase NRas (NRAS) and GTPase KRas (KRAS), have high KD score values. In addition, this study found that some new cancer-related proteins, such as amyloid beta A4 protein (APP), neural cell adhesion molecule L1 (L1CAM), thymidine kinase 1 (TK1), DNA replication licensing factor MCM2 (MCM2) and MCM4, also have high KD score values and can serve as potential drug targets for follow-up study.
We focus on one newly found cancer-related target, the APP protein. Previous studies have shown that variation in the APP protein is an important cause of Alzheimer's disease (AD); loss of the APP protein can inhibit cell division, and adding the C-terminus of the APP protein can restart division. The results of this study show that the APP protein has high KD score values in 18 cancers and 77 cell lines, indicating that the APP protein may also play a very important role in cancer.
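The pre-selection by KD score described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the per-sample dictionary layout are hypothetical, and averaging a protein only over the samples in which it was assessed is an assumption. The thresholds 0.1 (ProteomicsDB) and 0.02 (BioXpress) are the ones given in the text.

```python
from statistics import mean
from typing import Dict, List, Set


def mean_kd_scores(per_sample: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each protein's KD score over the samples in which it was assessed."""
    collected: Dict[str, List[float]] = {}
    for sample in per_sample:
        for protein, score in sample.items():
            collected.setdefault(protein, []).append(score)
    return {protein: mean(scores) for protein, scores in collected.items()}


def preselect(avg_scores: Dict[str, float], threshold: float) -> Set[str]:
    """Keep proteins whose average KD score reaches the data set's threshold."""
    return {protein for protein, score in avg_scores.items() if score >= threshold}


# Thresholds stated in the description; one per data set.
PROTEOMICSDB_THRESHOLD = 0.1
BIOXPRESS_THRESHOLD = 0.02
```

Applying `preselect` to each data set and then taking the union and intersection of the two resulting sets gives the combined prediction and the overlap between data sets.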
This study also uses the neural network model to predict combinations of cancer-related proteins. In recent years combination drugs have proved to be an effective means of cancer treatment, and the synthetic lethal effect is very helpful for research on individual precision therapy. Because of the differences among individuals and cancer types and the huge search space of target combinations, finding all drug target combinations and synthetic lethal genes by routine experimental means is difficult. Pre-screening protein combinations with bioinformatic methods and then verifying them experimentally has therefore become a key strategy. The method is based on the following assumption: the proteins of an effective synthetic lethal pair or an effective target combination affect one or more other proteins, or are affected by one or more other proteins, and these proteins can be screened out using known synthetic lethal pairs and target combinations. This study built a protein combination data set containing known drug target combination and synthetic lethal information (20,062 pairs of protein combinations), and used the KD networks obtained by inputting the expression data in ProteomicsDB and BioXpress into the deep neural network model to represent the interactions between proteins.
This study focuses on the combinations predicted among the known cancer target combinations and the PPI protein combinations. Combinations whose frequency of occurrence in cell lines is higher than 0.25 are retained; to obtain protein combinations with higher confidence, protein combinations covering 10% of the cell lines are predicted to be cancer-related protein combinations. Two networks were built from the protein sets predicted as cancer-related protein combinations in the PPI and in the known target combinations respectively (Fig. 6). The network of cancer-related protein combinations from PPI contains 2,439 pairs of protein combinations, including combinations of proteins with high degree such as EGFR, TP53, cyclin-dependent kinase 2 (CDK2), KRAS and RAC-alpha serine/threonine-protein kinase (AKT1) (see part a of Fig. 6). The network of cancer-related protein combinations from known target combinations contains 2,543 pairs of protein combinations, including combinations of high-degree proteins such as NRAS, KRAS, EGFR, thymidylate synthase (TYMS), ribonucleoside-diphosphate reductase large subunit (RRM1), DNA topoisomerase 2-alpha (TOP2A) and serine/threonine-protein kinase Chk1 (CHEK1) (see part b of Fig. 6). These combinations can serve as potential cancer-related protein combinations for follow-up study.
The basic unit of the deep neural network model is the autoencoder. The internal structure of the autoencoder corresponds to the protein interaction network, and differential expression data are used during training, so the model can learn the specificity of protein interactions from the differences between tissues. In addition, combining multiple autoencoders makes it possible to link two proteins that are far apart in the interaction network. These two features make the model superior to existing methods.
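To illustrate why stacking layers that follow the interaction network can relate distant proteins, the sketch below propagates a signal through layers whose connections are restricted to interacting proteins. It is a simplification under stated assumptions: the layers are linear with unit weights and no encoding step, and `masked_layer` and the three-protein chain are hypothetical, so it shows only the connectivity argument and not the trained autoencoder.

```python
def masked_layer(values, weights, adjacency):
    """One linear layer that uses weight w[i][j] only when proteins i and j
    interact in the network (or when i == j)."""
    n = len(values)
    out = []
    for j in range(n):
        total = 0.0
        for i in range(n):
            if i == j or adjacency[i][j]:
                total += weights[i][j] * values[i]
        out.append(total)
    return out

# Chain A - B - C: proteins A (index 0) and C (index 2) do not interact directly.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
weights = [[1.0] * 3 for _ in range(3)]  # unit weights, assumption for clarity
signal = [1.0, 0.0, 0.0]                 # perturb only protein A

after_one = masked_layer(signal, weights, adjacency)
after_two = masked_layer(after_one, weights, adjacency)
```

After one layer the perturbation of A has not reached C; after two stacked layers it has, which is the sense in which combining several units links proteins separated by more than one interaction.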
The present invention can predict disease-related proteins. Predicting disease-related proteins/genes by informatics methods is, in essence, predicting the effect of knocking out a given protein/gene on cellular activity. Accurately calculating the effect of gene activation/knockout involves very complex interactions among cellular components; complete and accurate interaction relationships are difficult to measure, and the kinetic parameters involved are even harder to measure. The disease-related protein prediction method of this model proposes a way of simulating the knockout effect with a deep neural network: in the deep model, the input value of each protein is changed, and the resulting difference in the output is observed to assess the effect of the protein on the disease.
The present invention can predict combinations of disease-related proteins. In recent years, drug combinations have proven to be an effective means of treating disease, and synthetic lethality is highly valuable for research on individualized precision therapy. However, because individuals and disease types differ and the search space of target combinations is huge, finding all drug target combinations and synthetic lethal genes by conventional experimental means is impractical. Pre-screening protein combinations with the aid of this model and then verifying them experimentally has therefore become a key strategy. The method rests on the following assumption: the two proteins of an effective synthetic lethal pair or an effective target combination affect, or are affected by, one or more other proteins, and those proteins can be identified from known synthetic lethal pairs and target combinations. This study uses the KD network obtained by feeding expression data into the deep neural network model to represent the interactions between proteins.
This model is also suitable as a general and effective method for analysing gene/protein expression data. Because the model is trained on gene/protein expression data, it can learn the interaction specificity under different backgrounds depending on the input data. A well-trained model can first be obtained from large-scale expression data; the trained model can then be used to quickly analyse the differential features of new expression data from the same platform, which demonstrates the advantage of big-data analysis. (The method applies to new expression data of any scale: even for data with a small number of samples, it integrates the large-scale expression information from the same platform to analyse the specificity of the data.)
The model predicts disease-related genes better than simple differential gene analysis and better than differential gene analysis combined with networks. The model has important application value in computer-aided drug target screening.
This model is also suitable as a general and effective method for analysing gene/protein expression data. Because the model is trained on gene/protein expression data, it is suited to studying the interaction specificity under different backgrounds depending on the input data. A well-trained model can first be obtained from large-scale expression data; the trained model can then be used to quickly analyse the differential features of new expression data from the same platform, which demonstrates the advantage of big-data analysis. (The method applies to new expression data of any scale: even for data with a small number of samples, it integrates the large-scale expression information from the same platform to analyse the specificity of the data.) The method of screening significantly related genes with this model is therefore also suitable for single-cell sequencing data. Because single-cell sequencing data originate from a single cancer type, the model must be pre-trained on the given data set. Here, the model is applied to transcriptome data of circulating tumor cells (CTCs) from prostate cancer patients: a prostate-cancer-specific neural network model is trained through a workflow similar to the one described above and is used to screen prostate cancer-related proteins and drug target protein combinations.
Single-cell sequencing data downloaded from NCBI (accession number GSE67980) were used. The data set comprises 122 prostate circulating tumor cell (CTC) samples from 12 patients, 12 prostate tumor samples and 3 normal prostate tissue samples. The differential expression between cancer cell lines and normal tissue was used as input data to train the prostate-cancer-specific model. The training process was essentially the same as the earlier model training process. Training the prostate-cancer-specific model took 2,144 rounds in total, with 6 comparison calculations per round (the first 5 use the differential expression values of 536 randomly selected sample comparisons; the last uses the differential expression values of 536 sample comparisons selected in order and is repeated 4 times). The learning rate was set to 0.005, the momentum coefficient to 0.5 and the decay coefficient to 1. In the 5-layer recurrent neural network, the threshold of the activation function was set to 0.01 and multiplied by the average weight. For all proteins with a DE value > 0, the modified value was set to -4.
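The text gives the hyperparameter values but not the update rule. Assuming classical momentum gradient descent and a multiplicative decay applied to the learning rate (both are assumptions, as are the function names), the reported settings would act as in the following sketch; with a decay coefficient of 1 the learning rate stays constant.

```python
def momentum_step(weight, velocity, gradient, learning_rate=0.005, momentum=0.5):
    """One classical momentum update (assumed rule, not stated in the text)."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity

def decayed_rate(learning_rate, decay=1.0):
    """Multiplicative learning-rate decay (assumed meaning of 'decay coefficient')."""
    return learning_rate * decay

# Two updates on a single weight with a constant, made-up gradient of 2.0.
w, v = 1.0, 0.0
rate = 0.005
for _ in range(2):
    w, v = momentum_step(w, v, gradient=2.0, learning_rate=rate, momentum=0.5)
    rate = decayed_rate(rate, decay=1.0)
```

The second step moves the weight further than the first because half of the previous velocity is carried over, which is the practical effect of a momentum coefficient of 0.5.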
The results show that the model effectively identifies key proteins related to prostate cancer, such as androgen receptor (AR), kallikrein-2 (KLK2) and KLK3. AR is an important known prostate cancer-related gene. The results also show that AR is not significantly expressed in some CTC cell lines, and that in these cell lines several functionally important proteins, such as disabled homolog 2 (DAB2), chromatin modification-related protein MEAF6 (MEAF6), tyrosine-protein kinase JAK1 (JAK1), interleukin-2 receptor subunit beta (IL2RB), mitogen-activated protein kinase kinase kinase 2 (MAP3K2), integrin-linked protein kinase (ILK) and cyclin-dependent kinase inhibitor 1 (CDKN1A), have high KD scores. This suggests that these genes play an important role in the cellular functions of prostate cancer CTCs that do not express AR (and may substitute for some functions of AR).
In addition, this study predicted key protein combinations related to prostate cancer. These protein combinations differ among individual prostate cancer CTCs. Some proteins with relatively high coverage were selected; the network formed by the prostate cancer-related protein combinations contains several functionally important proteins, such as signal transducer and activator of transcription 3 (STAT3), exportin-1 (XPO1), cyclin-dependent kinase 4 (CDK4), microtubule-associated protein 4 (MAP4) and bcl-2-like protein 1 (BCL2L1).
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and the parts that are identical or similar across embodiments may be cross-referenced. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
Specific examples are used herein to explain the principle and implementation of the present invention. The description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method for screening disease drug targets and drug target combinations, characterized by comprising:
constructing an autoencoder from differential expression data of proteins between disease cell lines and normal tissue;
calculating the knockout effect of genes with the autoencoder, and constructing a knockout network;
predicting disease-related proteins from the knockout network, the related proteins being drug targets;
predicting combinations of disease-related proteins from the knockout network, the combinations of related proteins being drug target combinations.
2. The method for screening disease drug targets and drug target combinations according to claim 1, characterized in that calculating the knockout effect of genes with the autoencoder and constructing the knockout network specifically comprises:
constructing a deep learning network model from the autoencoder;
given a differential expression profile, inputting the differential expression profile into the deep learning network model to obtain differential expression values, denoted as background output B;
setting a difference value threshold, and selecting the genes whose difference value in the differential expression profile exceeds the difference value threshold, denoted as highly expressed genes;
sorting all highly expressed genes by difference value in descending order, assigning the highly expressed gene with the largest difference value the smallest difference value in the differential expression profile, and assigning new difference values to all highly expressed genes in turn;
forming a new differential expression profile from the highly expressed genes with new difference values and the genes remaining in the differential expression profile after all highly expressed genes are removed;
inputting the new differential expression profile into the deep learning network model to obtain a second output K;
setting a comparison threshold;
calculating, for every highly expressed gene, the difference between the second output K and the background output B of that gene, to obtain a comparison difference;
denoting every highly expressed gene whose comparison difference exceeds the comparison threshold as a knockout gene;
constructing the knockout network from all knockout genes.
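The steps of claim 2 can be illustrated with a small sketch. It takes one possible reading of the claim: each highly expressed gene is perturbed on its own, so that every comparison difference can be attributed to a single source gene, whereas the claim wording also admits reassigning all highly expressed genes at once. The function `toy_model` is a hypothetical stand-in for the trained deep network, and the use of the smallest profile value as the knock value, the absolute-value test and the exclusion of a gene's effect on itself are likewise assumptions.

```python
import math

def toy_model(profile):
    """Hypothetical stand-in for the trained network: each output depends on
    the gene itself and on its left neighbour in the list."""
    n = len(profile)
    return [math.tanh(profile[i] + 0.5 * profile[(i - 1) % n]) for i in range(n)]

def knockout_differences(profile, model, diff_threshold, knock_value=None):
    """Return {gene index: list of comparison differences K - B}."""
    if knock_value is None:
        knock_value = min(profile)       # assumption: reuse the smallest value
    background = model(profile)          # background output B
    high = [g for g, v in enumerate(profile) if v > diff_threshold]
    diffs = {}
    for g in high:
        perturbed = list(profile)
        perturbed[g] = knock_value       # simulated knockout of gene g
        second = model(perturbed)        # second output K
        diffs[g] = [k - b for k, b in zip(second, background)]
    return diffs

def knockout_genes(diffs, compare_threshold):
    """Genes whose knockout changes some other gene by more than the threshold."""
    return sorted(
        g for g, row in diffs.items()
        if any(abs(x) > compare_threshold for j, x in enumerate(row) if j != g)
    )

profile = [2.0, 0.1, -1.0, 1.5]          # made-up differential expression values
diffs = knockout_differences(profile, toy_model, diff_threshold=1.0)
selected = knockout_genes(diffs, compare_threshold=0.5)
```

Genes 0 and 3 exceed the difference value threshold and are perturbed; only gene 0 shifts another gene's output by more than the comparison threshold, so only gene 0 is kept as a knockout gene in this toy run.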
3. The method for screening disease drug targets and drug target combinations according to claim 2, characterized in that constructing the knockout network from the knockout genes is specifically:
taking the knockout genes as the source nodes of the knockout network;
taking the genes affected by a knockout gene as the edges of that source node;
taking the comparison difference as the weight of the edge.
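A weighted, directed edge list is one natural way to hold the network described in claim 3. In the sketch below, the dictionary layout, the function name `build_knockout_network`, the gene labels g1 to g4 and all numeric values are illustrative assumptions rather than data from the patent.

```python
def build_knockout_network(diffs, compare_threshold):
    """Build {(source, target): weight} from comparison differences.

    diffs: dict mapping each knockout gene (source) to a dict of
    {affected gene: comparison difference K - B}.
    """
    edges = {}
    for source, row in diffs.items():
        for target, diff in row.items():
            # Skip a gene's effect on itself and effects below the threshold.
            if target != source and abs(diff) > compare_threshold:
                edges[(source, target)] = diff
    return edges

# Made-up comparison differences for two knockout genes.
diffs = {
    "g1": {"g1": -1.9, "g2": -0.8, "g3": -0.6, "g4": 0.05},
    "g2": {"g2": -1.5, "g4": -0.7, "g1": 0.3},
}
network = build_knockout_network(diffs, compare_threshold=0.2)
```

Keeping the sign of the comparison difference on each edge preserves the direction of the effect, which the later scoring steps rely on when they separate positive from negative weights.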
4. The method for screening disease drug targets and drug target combinations according to claim 1, characterized in that predicting disease-related proteins from the knockout network specifically comprises:
setting known drug targets as marker genes, and setting a protein to be tested and a relevance threshold;
obtaining, from the knockout network, the target-node proteins and source-node proteins connected to the protein to be tested;
distinguishing, according to the target-node proteins and the marker genes, target-node proteins with an inhibitory effect from target-node proteins with an activating effect;
calculating the sum of the weights of the edges connecting the inhibitory-effect target-node proteins to the protein to be tested, denoted as the first weight sum;
calculating the sum of the absolute values of the weights of the edges connecting the activating-effect target-node proteins to the protein to be tested, denoted as the first absolute value sum;
calculating the sum of all positive edge weights of the source-node proteins, denoted as the second weight sum, and the sum of the absolute values of all negative edge weights, denoted as the second absolute value sum;
calculating the relevance score of the protein to be tested from the first weight sum, the first absolute value sum, the second weight sum and the second absolute value sum;
selecting, among all proteins to be tested, those whose relevance score is higher than the relevance threshold as disease-related proteins.
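The four quantities named in claim 4 can be computed from an edge list as sketched below. Several points are assumptions, because the claim does not fix them: only outgoing edges to marker genes are counted, a positive weight is read as an inhibitory effect (the target rises when the source is knocked out) and a negative weight as an activating effect, and the final score is shown as a plain sum only as a placeholder for the unspecified combination rule. All names and numbers are hypothetical.

```python
def relevance_terms(network, candidate, markers):
    """Return (first weight sum, first absolute value sum,
    second weight sum, second absolute value sum) for `candidate`.

    network: dict {(source, target): weight}.
    """
    inhibited = activated = 0.0
    source_pos = source_neg = 0.0
    for (source, target), w in network.items():
        if source == candidate and target in markers:
            if w > 0:
                inhibited += w          # assumed: positive weight = inhibition
            else:
                activated += abs(w)     # assumed: negative weight = activation
        elif target == candidate:
            if w > 0:
                source_pos += w
            else:
                source_neg += abs(w)
    return inhibited, activated, source_pos, source_neg

def relevance_score(terms):
    """Placeholder combination (plain sum); the claim does not give the formula."""
    return sum(terms)

network = {
    ("c", "m1"): 0.5, ("c", "m2"): -0.25, ("c", "x"): 0.9,
    ("s1", "c"): 0.75, ("s2", "c"): -0.5,
}
terms = relevance_terms(network, candidate="c", markers={"m1", "m2"})
score = relevance_score(terms)
is_related = score > 1.0                # 1.0 is an arbitrary example threshold
```

Separating the computation of the four terms from their combination keeps the sketch usable if a different scoring formula is substituted.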
5. The method for screening disease drug targets and drug target combinations according to claim 1, characterized in that predicting combinations of disease-related proteins from the knockout network specifically comprises:
collecting known protein combinations with synthetic lethal or combination effects as positive samples, and randomly generating, based on the positive samples, negative samples numbering 10 times the positive samples;
selecting any target protein in the knockout network, and screening all target-node proteins and source-node proteins directly connected to the target protein;
judging whether the target protein in the knockout network and the target-node proteins and source-node proteins of the target protein are present in the positive samples or in the negative samples;
if the target protein in the knockout network and the target-node proteins and source-node proteins of the target protein are present in the positive samples, adding the weights of the edges between the target protein and its target-node proteins and source-node proteins and taking the absolute value, to obtain the absolute value of the first combination weight sum;
adding the absolute values of the first combination weight sums of all target proteins in the positive samples, to obtain the absolute value of the positive combination weight sum;
if the target protein in the knockout network and the target-node proteins and source-node proteins of the target protein are present in the negative samples, adding the weights of the edges between the target protein and its target-node proteins and source-node proteins and taking the absolute value, to obtain the absolute value of the second combination weight sum;
adding the absolute values of the second combination weight sums of all target proteins in the negative samples, to obtain the absolute value of the negative combination weight sum;
assigning the target protein a value of 1, -1 or 0 according to the absolute value of the first combination weight sum and the absolute value of the second combination weight sum, to obtain the target protein assignment;
selecting a first protein to be tested and a second protein to be tested;
setting a first detection threshold and a second detection threshold;
calculating the proportion of proteins jointly affected by the first protein to be tested and the second protein to be tested, denoted as the jointly affected protein proportion;
calculating, according to the target protein assignment, the proportion of evaluated proteins jointly affected by the first protein to be tested and the second protein to be tested, denoted as the jointly affected evaluated protein proportion, the evaluated proteins being the proteins assigned a value of 1 or -1;
judging whether the jointly affected protein proportion is greater than the first detection threshold and, at the same time, whether the jointly affected evaluated protein proportion is greater than the second detection threshold;
if so, the combination of the first protein to be tested and the second protein to be tested is a combination of disease-related proteins.
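The final two-threshold test of claim 5 can be sketched as follows. The claim does not define the denominators of the two proportions; the sketch assumes that the first is the shared neighbours divided by the union of neighbours, and the second is the evaluated shared neighbours divided by all shared neighbours. The function names, the edge set and the assignments are hypothetical.

```python
def neighbours(edges, protein):
    """Proteins directly connected to `protein` as source or target."""
    found = set()
    for source, target in edges:
        if source == protein:
            found.add(target)
        elif target == protein:
            found.add(source)
    return found

def is_candidate_pair(edges, assignment, first, second, threshold_1, threshold_2):
    """Apply both detection thresholds to the pair (first, second)."""
    n1 = neighbours(edges, first) - {second}
    n2 = neighbours(edges, second) - {first}
    shared = n1 & n2
    if not shared:
        return False
    shared_ratio = len(shared) / len(n1 | n2)            # assumed denominator
    evaluated = {p for p in shared if assignment.get(p, 0) != 0}
    evaluated_ratio = len(evaluated) / len(shared)       # assumed denominator
    return shared_ratio > threshold_1 and evaluated_ratio > threshold_2

edges = {("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("b", "z")}
assignment = {"x": 1, "y": 0, "z": -1}   # 1 or -1 marks an evaluated protein

accepted = is_candidate_pair(edges, assignment, "a", "b", 0.5, 0.4)
rejected = is_candidate_pair(edges, assignment, "a", "b", 0.5, 0.5)
```

Here two of three neighbouring proteins are shared and one of the two shared proteins is evaluated, so the pair passes thresholds of 0.5 and 0.4 but fails when the second threshold is raised to 0.5, since the comparison is strict.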
6. A system for screening disease drug targets and drug target combinations, characterized by comprising:
an autoencoding module, configured to construct an autoencoder from differential expression data of proteins between disease cell lines and normal tissue;
a knockout network construction module, configured to calculate the knockout effect of genes with the autoencoder and to construct a knockout network;
a related protein prediction module, configured to predict disease-related proteins from the knockout network, the related proteins being drug targets;
a protein combination prediction module, configured to predict combinations of disease-related proteins from the knockout network, the combinations of related proteins being drug target combinations.
7. The system for screening disease drug targets and drug target combinations according to claim 6, characterized in that the knockout network construction module specifically comprises:
a network model construction unit, configured to construct a deep learning network model from the autoencoder;
a background output calculation unit, configured to, given a differential expression profile, input the differential expression profile into the deep learning network model to obtain differential expression values, denoted as background output B;
a highly expressed gene acquisition unit, configured to set a difference value threshold and to select the genes whose difference value in the differential expression profile exceeds the difference value threshold, denoted as highly expressed genes;
a highly expressed gene assignment unit, configured to sort all highly expressed genes by difference value in descending order, to assign the highly expressed gene with the largest difference value the smallest difference value in the differential expression profile, and to assign new difference values to all highly expressed genes in turn;
a new differential expression profile construction unit, configured to form a new differential expression profile from the highly expressed genes with new difference values and the genes remaining in the differential expression profile after all highly expressed genes are removed;
a second output calculation unit, configured to input the new differential expression profile into the deep learning network model to obtain a second output K;
a comparison threshold setting unit, configured to set a comparison threshold;
a comparison difference calculation unit, configured to calculate, for every highly expressed gene, the difference between the second output K and the background output B of that gene, to obtain a comparison difference;
a knockout gene acquisition unit, configured to denote every highly expressed gene whose comparison difference exceeds the comparison threshold as a knockout gene;
a network construction unit, configured to construct the knockout network from all knockout genes.
8. The system for screening disease drug targets and drug target combinations according to claim 7, characterized in that the network construction unit is specifically configured to:
take the knockout genes as the source nodes of the knockout network;
take the genes affected by a knockout gene as the edges of that source node;
take the comparison difference as the weight of the edge.
9. The system for screening disease drug targets and drug target combinations according to claim 6, characterized in that the related protein prediction module specifically comprises:
a marker gene setting unit, configured to set known drug targets as marker genes and to set a protein to be tested and a relevance threshold;
a target-node and source-node protein acquisition unit, configured to obtain, from the knockout network, the target-node proteins and source-node proteins connected to the protein to be tested;
a target-node protein discrimination unit, configured to distinguish, according to the target-node proteins and the marker genes, target-node proteins with an inhibitory effect from target-node proteins with an activating effect;
an inhibitory target-node protein calculation unit, configured to calculate the sum of the weights of the edges connecting the inhibitory-effect target-node proteins to the protein to be tested, denoted as the first weight sum;
an activating target-node protein calculation unit, configured to calculate the sum of the absolute values of the weights of the edges connecting the activating-effect target-node proteins to the protein to be tested, denoted as the first absolute value sum;
a source-node protein calculation unit, configured to calculate the sum of all positive edge weights of the source-node proteins, denoted as the second weight sum, and the sum of the absolute values of all negative edge weights, denoted as the second absolute value sum;
a relevance score calculation unit, configured to calculate the relevance score of the protein to be tested from the first weight sum, the first absolute value sum, the second weight sum and the second absolute value sum;
a disease-related protein determination unit, configured to select, among all proteins to be tested, those whose relevance score is higher than the relevance threshold as disease-related proteins.
10. The system for screening disease drug targets and drug target combinations according to claim 6, characterized in that the protein combination prediction module specifically comprises:
a sample collection unit, configured to collect known protein combinations with synthetic lethal or combination effects as positive samples, and to randomly generate, based on the positive samples, negative samples numbering 10 times the positive samples;
a target protein screening unit, configured to select any target protein in the knockout network and to screen all target-node proteins and source-node proteins directly connected to the target protein;
a first judging unit, configured to judge whether the target protein in the knockout network and the target-node proteins and source-node proteins of the target protein are present in the positive samples or in the negative samples;
a first combination weight calculation unit, configured to, if the target protein in the knockout network and the target-node proteins and source-node proteins of the target protein are present in the positive samples, add the weights of the edges between the target protein and its target-node proteins and source-node proteins and take the absolute value, to obtain the absolute value of the first combination weight sum;
a positive combination weight calculation unit, configured to add the absolute values of the first combination weight sums of all target proteins in the positive samples, to obtain the absolute value of the positive combination weight sum;
a second combination weight calculation unit, configured to, if the target protein in the knockout network and the target-node proteins and source-node proteins of the target protein are present in the negative samples, add the weights of the edges between the target protein and its target-node proteins and source-node proteins and take the absolute value, to obtain the absolute value of the second combination weight sum;
a negative combination weight calculation unit, configured to add the absolute values of the second combination weight sums of all target proteins in the negative samples, to obtain the absolute value of the negative combination weight sum;
a target protein assignment unit, configured to assign the target protein a value of 1, -1 or 0 according to the absolute value of the first combination weight sum and the absolute value of the second combination weight sum, to obtain assigned target proteins;
a protein group selection unit, configured to select a first protein to be tested and a second protein to be tested as the group of proteins to be tested;
a threshold setting unit, configured to set a first detection threshold and a second detection threshold;
a first protein proportion calculation unit, configured to calculate the proportion of proteins jointly affected by the first protein to be tested and the second protein to be tested, denoted as the jointly affected protein proportion;
a second protein proportion calculation unit, configured to calculate, according to the assigned target proteins, the proportion of evaluated proteins jointly affected by the first protein to be tested and the second protein to be tested, denoted as the jointly affected evaluated protein proportion;
a second judging unit, configured to judge whether the jointly affected protein proportion is greater than the first detection threshold and, at the same time, whether the jointly affected evaluated protein proportion is greater than the second detection threshold;
a protein combination determination unit, configured to determine, if so, that the combination of the first protein to be tested and the second protein to be tested is a combination of disease-related proteins.
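The sample collection step of claims 5 and 10 generates ten negative pairs for every positive pair. A minimal sketch of such sampling is given below; excluding known positive pairs from the negatives, the fixed random seed and the names `sample_negatives`, `proteins` and `positives` are assumptions, since the claims state only that the negatives are generated at random.

```python
import random

def sample_negatives(positives, proteins, ratio=10, seed=0):
    """Draw `ratio` random negative pairs per positive pair.

    positives: iterable of 2-tuples of protein names.
    proteins: list of protein names to draw from.
    """
    rng = random.Random(seed)            # seeded for reproducibility
    positive_set = {frozenset(pair) for pair in positives}
    wanted = ratio * len(positive_set)
    available = len(proteins) * (len(proteins) - 1) // 2 - len(positive_set)
    if wanted > available:
        raise ValueError("not enough distinct pairs to sample from")
    negatives = set()
    while len(negatives) < wanted:
        pair = frozenset(rng.sample(proteins, 2))
        if pair not in positive_set:     # assumption: skip known positives
            negatives.add(pair)
    return negatives

proteins = [f"p{i}" for i in range(10)]  # hypothetical protein labels
positives = [("p0", "p1"), ("p2", "p3")]
negatives = sample_negatives(positives, proteins)
```

Storing each pair as a `frozenset` makes the pair order irrelevant, so ("p0", "p1") and ("p1", "p0") count as the same combination.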
CN201810461277.2A 2018-05-15 2018-05-15 Method and system for screening disease drug target and target combination Expired - Fee Related CN108647489B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810461277.2A CN108647489B (en) 2018-05-15 2018-05-15 Method and system for screening disease drug target and target combination

Publications (2)

Publication Number Publication Date
CN108647489A CN108647489A (en) 2018-10-12
CN108647489B CN108647489B (en) 2020-06-30

Family

ID=63755608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810461277.2A Expired - Fee Related CN108647489B (en) 2018-05-15 2018-05-15 Method and system for screening disease drug target and target combination

Country Status (1)

Country Link
CN (1) CN108647489B (en)

Also Published As

Publication number Publication date
CN108647489B (en) 2020-06-30

Similar Documents

Publication Publication Date Title
CN108647489A (en) A kind of method and system of screening disease medicament target and target combination
CN109360604A (en) Ovarian cancer molecular typing prediction system
Yang et al. Kinase inhibition-related adverse events predicted from in vitro kinome and clinical trial data
CN108830040A (en) Drug sensitivity prediction method based on cell line and drug similarity networks
CN105243296A (en) Tumor feature gene selection method combining mRNA and microRNA expression profile chips
Jaume et al. Modeling dense multimodal interactions between biological pathways and histology for survival prediction
KR102639616B1 (en) Method for predicting therapeutic efficacy of combined drug by machine learning ensemble model
US8655598B2 (en) Predictive radiosensitivity network model
KR102431534B1 (en) Model for predicting the toxic side effects of the intended drug and method thereof
CN107111689A (en) Method and system for generating non-coding encoding gene coexpression network
Ge et al. FRL: An integrative feature selection algorithm based on the fisher score, recursive feature elimination, and logistic regression to identify potential genomic biomarkers
So et al. GraphComm: a graph-based deep learning method to predict cell-cell communication in single-cell RNAseq data
Li et al. The research and development thinking on the status of artificial intelligence in traditional Chinese medicine
Ford et al. Selecting compounds for focused screening using linear discriminant analysis and artificial neural networks
Mythili et al. CTCHABC-hybrid online sequential fuzzy Extreme Kernel learning method for detection of Breast Cancer with hierarchical Artificial Bee
Li et al. Assumption weighting for incorporating heterogeneity into meta-analysis of genomic data
CN112397140A (en) Target identification method and device based on allosteric mechanism and storage medium
CN112750510A (en) Method for predicting permeability of blood brain barrier of medicine
CN105243294B (en) Method for predicting protein pairs related to cancer patient prognosis
Xu et al. AutoOmics: An AutoML Tool for Multi-Omics Research
CN110322929A (en) Method for predicting the direct targets and active components of traditional Chinese medicine compound prescriptions
Saghapour et al. Explorative Discovery of Gene Signatures and Clinotypes in Glioblastoma Cancer Through GeneTerrain Knowledge Map Representation
US20230178173A1 (en) Systems and methods for gut microbiome precision medicine
Han et al. The combination of single-cell and Seq-RNA sequences revealed homeostatic chondrocyte osteoarthritic immune infiltrate
CN115910212A (en) Method for analyzing cell communication mediated by ligand-receptor interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200630

Termination date: 20210515
