CN108508207A - The identification method of protein-DNA binding sites - Google Patents
- Publication number
- CN108508207A (application number CN201710245597.XA)
- Authority
- CN
- China
- Prior art keywords
- amino acid
- protein
- dna binding
- attribute
- binding sites
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/68—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
- G01N33/6803—General methods of protein analysis not limited to specific proteins or families of proteins
- G01N33/6845—Methods of identifying protein-protein interactions in protein mixtures
Abstract
The present invention provides a method and a device for determining protein-DNA binding sites. The method comprises: splitting the amino acid sequences of a reference protein set and the amino acid sequence of a test protein into multiple candidate units, each with a predetermined number of amino acids; determining the amino acid attributes of each of the candidate units; and determining the protein-DNA binding sites of the test protein. The method and device determine protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.
Description
Technical field
The present invention relates to the field of biology, in particular to a method for identifying protein-DNA binding sites, and more specifically to a method and a device for determining protein-DNA binding sites.
Background
Interactions between proteins and DNA are widespread in the life activities of cells. DNA not only serves as the genetic material that encodes proteins, but can also bind specific proteins to regulate gene expression. For example, DNA replication, mRNA transcription and modification, and viral infection all involve interactions between DNA and proteins.
However, current methods and devices for determining protein-DNA binding sites still need improvement.
Summary of the invention
The present invention aims to solve, at least to some extent, at least one of the technical problems in the prior art. To this end, the invention provides a method and a device for determining protein-DNA binding sites. The method and device according to embodiments of the invention determine protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.
It should be noted that the present invention was completed on the basis of the following findings of the inventors:
At present, methods for determining protein-DNA binding sites mainly include point-mutation experiments, DNA mobility shift assays, DNase I footprinting, X-ray diffraction, and nuclear magnetic resonance. These experiments, however, take a long time and require substantial investment, and some protein-DNA complexes are particularly difficult to obtain. As a result, the annotation of protein functional sites lags far behind the growth of protein sequence and structure information.
In view of this, the inventors found through extensive experiments that certain amino acid attributes significantly affect the binding of amino acids to DNA. Consequently, the protein-DNA binding sites of a test protein can be determined accurately from these attributes of the reference proteins, the known protein-DNA binding sites of the reference proteins, and the same attributes of the test protein. The method and device according to embodiments of the invention therefore determine protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.
To this end, in one aspect, the present invention provides a method for determining protein-DNA binding sites. According to an embodiment of the invention, the method comprises: splitting the amino acid sequences of a reference protein set and the amino acid sequence of a test protein into multiple candidate units, each with a predetermined number of amino acids; determining the amino acid attributes of each of the candidate units, the amino acid attributes including at least one selected from the following: average non-bonded energy per residue, transfer free energy cap-chx, amino acid composition, short- and medium-range non-bonded energy, molecular weight, transfer free energy vap-oct, alpha-helix propensity, RF value in high-salt chromatography, average residue volume, amino acid composition of cytochrome synthetic proteins, principal component III, amino acid composition of SD total proteins, accessible surface area, amino acid distribution in 18 non-redundant mesophilic protein families, and surface-accessible protein content; and determining the protein-DNA binding sites of the test protein based on the attributes of the amino acids in the candidate units.
The inventors found that the above amino acid attributes significantly affect the binding of amino acids to DNA. Consequently, the protein-DNA binding sites of the test protein can be determined accurately from these attributes of the reference proteins, the protein-DNA binding sites of the reference proteins, and the same attributes of the test protein. In addition, because adjacent amino acid residues in a protein sequence may interact, the amino acid sequences of the reference protein set and of the test protein are split into multiple candidate units with a predetermined number of amino acids, which improves the accuracy of the result. The method according to embodiments of the invention therefore determines protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.
According to embodiments of the invention, the above method for determining protein-DNA binding sites may also have the following additional technical features:
According to an embodiment of the invention, the predetermined amino acid number is 19. This further improves the accuracy with which the method determines protein-DNA binding sites.
According to an embodiment of the invention, the reference protein set contains at least 30 reference proteins. This further improves the accuracy with which the method determines protein-DNA binding sites.
According to an embodiment of the invention, an amino acid in a candidate unit having at least one of the following attributes is an indication that the amino acid is a protein-DNA binding site: average non-bonded energy per residue of -26.17 to -7.59; transfer free energy cap-chx of -8.21 to 1.45; amino acid composition of 0.7 to 8.8; short- and medium-range non-bonded energy of -14.42 to -5.46; molecular weight of 75.07 to 204.24; transfer free energy vap-oct of -18.6 to 2.39; alpha-helix propensity of -0.38 to 1.24; RF value in high-salt chromatography of 0.2 to 0.97; average residue volume of 67.5 to 237.2; amino acid composition of cytochrome synthetic proteins of 1.06 to 8.36; principal component III of -0.29 to 0.49; amino acid composition of SD total proteins of 1.15 to 3.73; accessible surface area of 0 to 271.6; amino acid distribution in 18 non-redundant mesophilic protein families of 1 to 9.4; and surface-accessible protein content of 0 to 0.22. This further improves the accuracy with which the method determines protein-DNA binding sites.
In another aspect, the present invention provides a device for determining protein-DNA binding sites. According to an embodiment of the invention, the device comprises: a splitting component, adapted to split the amino acid sequences of a reference protein set and the amino acid sequence of a test protein into multiple candidate units, each with a predetermined number of amino acids; an amino acid attribute determination component, connected to the splitting component and adapted to determine the amino acid attributes of each of the candidate units, the amino acid attributes including at least one selected from the following: average non-bonded energy per residue, transfer free energy cap-chx, amino acid composition, short- and medium-range non-bonded energy, molecular weight, transfer free energy vap-oct, alpha-helix propensity, RF value in high-salt chromatography, average residue volume, amino acid composition of cytochrome synthetic proteins, principal component III, amino acid composition of SD total proteins, accessible surface area, amino acid distribution in 18 non-redundant mesophilic protein families, and surface-accessible protein content; and a determination component, connected to the amino acid attribute determination component and adapted to determine the protein-DNA binding sites of the test protein based on the attributes of the amino acids in the candidate units. The device according to embodiments of the invention therefore determines protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.
Additional aspects and advantages of the invention will be set forth in part in the description below, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flow chart of the method for determining protein-DNA binding sites according to an embodiment of the invention;
Fig. 2 is a schematic structural diagram of the device for determining protein-DNA binding sites according to an embodiment of the invention;
Fig. 3 is a schematic comparative analysis of the actually determined protein-DNA binding sites according to an embodiment of the invention and the theoretical protein-DNA binding sites; and
Fig. 4 is a schematic analysis of the effect of the predetermined amino acid number on the result according to an embodiment of the invention.
Detailed description
Embodiments of the invention are described in detail below. The embodiments described below are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. Further, in the description of the invention, unless otherwise stated, "plurality" means two or more.
The present invention provides a method and a device for determining protein-DNA binding sites, each of which is described in detail below.
Method for determining protein-DNA binding sites
In one aspect, the present invention provides a method for determining protein-DNA binding sites. According to an embodiment of the invention, and referring to Fig. 1, the method comprises:
S100: Splitting into multiple candidate units
In this step, the amino acid sequences of the reference protein set and the amino acid sequence of the test protein are each split into multiple candidate units with a predetermined number of amino acids.
Because adjacent amino acid residues in a protein sequence may interact, the inventors split the amino acid sequences of the reference protein set and of the test protein into multiple candidate units with a predetermined number of amino acids; adding the features of neighboring residues to the prediction features can improve prediction performance. The method according to embodiments of the invention therefore determines protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.
According to an embodiment of the invention, the predetermined amino acid number is 19. The inventors found that the number of amino acids in each candidate unit (that is, the predetermined amino acid number) significantly affects the accuracy of the result. If the number is too small, the unit carries too little information; if it is too large, subsequent operations slow down and overall efficiency falls. Through in-depth study, the inventors found that a predetermined amino acid number of 19 gives good results.
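The splitting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: it assumes one candidate unit per residue, centered on that residue, with a placeholder character padding the sequence ends. The function name `split_into_units`, the padding symbol `X`, and the toy sequence are all hypothetical.

```python
def split_into_units(sequence: str, unit_size: int = 19, pad: str = "X") -> list[str]:
    """Return one window of `unit_size` residues centered on each residue.

    Assumption: the ends of the sequence are padded with `pad` so that the
    first and last residues also get a full-length unit.
    """
    if unit_size < 1 or unit_size % 2 == 0:
        raise ValueError("unit_size must be a positive odd number")
    half = unit_size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + unit_size] for i in range(len(sequence))]


# Toy sequence containing each of the 20 standard amino acids once.
units = split_into_units("ACDEFGHIKLMNPQRSTVWY", unit_size=19)
print(len(units), units[0], units[-1])
```

An odd unit size keeps the residue of interest exactly in the middle of its unit, with the same number of neighbors on each side.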
According to an embodiment of the invention, the reference protein set contains at least 30 reference proteins. The inventors found that using the amino acid attributes of no fewer than 30 reference proteins, together with the relationship between those attributes and the protein-DNA binding sites, as the reference gives relatively strong generality. Comparing the amino acid attributes of the test protein with those of the reference proteins then allows protein-DNA binding sites to be determined accurately.
S200: Determining amino acid attributes
In this step, the attributes of each amino acid in each of the candidate units are determined. The amino acid attributes include at least one selected from the following: average non-bonded energy per residue, transfer free energy cap-chx, amino acid composition, short- and medium-range non-bonded energy, molecular weight, transfer free energy vap-oct, alpha-helix propensity, RF value in high-salt chromatography, average residue volume, amino acid composition of cytochrome synthetic proteins, principal component III, amino acid composition of SD total proteins, accessible surface area, amino acid distribution in 18 non-redundant mesophilic protein families, and surface-accessible protein content.
The inventors found that the above amino acid attributes significantly affect the binding of amino acids to DNA. Consequently, the protein-DNA binding sites of the test protein can be determined accurately from these attributes of the reference proteins, the protein-DNA binding sites of the reference proteins, and the same attributes of the test protein.
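As an illustration of how one attribute turns a candidate unit into numeric features, the sketch below looks up molecular weight for each residue. The lookup table holds approximate molecular weights of the free amino acids and is a stand-in chosen for this example; it is an assumption that the molecular weight attribute uses these values, although their minimum and maximum (75.07 for glycine, 204.24 for tryptophan) coincide with the range the document gives for molecular weight. The names `MOLECULAR_WEIGHT` and `unit_features`, and the default of 0.0 for unknown or padding residues, are hypothetical.

```python
# Approximate molecular weights (g/mol) of the 20 standard free amino acids,
# keyed by one-letter code. Stand-in table for illustration only.
MOLECULAR_WEIGHT = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.15,
    "Q": 146.15, "E": 147.13, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.24, "Y": 181.19, "V": 117.15,
}


def unit_features(unit: str, table: dict[str, float], default: float = 0.0) -> list[float]:
    """Map every residue of a candidate unit to its attribute value.

    Residues missing from the table (for example a padding symbol) get
    `default`; that choice is an assumption of this sketch.
    """
    return [table.get(residue, default) for residue in unit]


features = unit_features("GXW", MOLECULAR_WEIGHT)
print(features)
```

With several attributes, the per-attribute vectors of a unit could be concatenated into one longer feature vector before modeling.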
According to an embodiment of the invention, an amino acid in a candidate unit having the following attributes is an indication that the amino acid is a protein-DNA binding site: average non-bonded energy per residue of -26.17 to -7.59; transfer free energy cap-chx of -8.21 to 1.45; amino acid composition of 0.7 to 8.8; short- and medium-range non-bonded energy of -14.42 to -5.46; molecular weight of 75.07 to 204.24; transfer free energy vap-oct of -18.6 to 2.39; alpha-helix propensity of -0.38 to 1.24; RF value in high-salt chromatography of 0.2 to 0.97; average residue volume of 67.5 to 237.2; amino acid composition of cytochrome synthetic proteins of 1.06 to 8.36; principal component III of -0.29 to 0.49; amino acid composition of SD total proteins of 1.15 to 3.73; accessible surface area of 0 to 271.6; amino acid distribution in 18 non-redundant mesophilic protein families of 1 to 9.4; and surface-accessible protein content of 0 to 0.22.
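The range check described in the paragraph above can be sketched as a simple lookup. The numeric ranges are copied from that paragraph; the dictionary keys are hypothetical English labels, and the function `attributes_in_range` is an illustrative helper rather than part of the disclosed method.

```python
# Indicative ranges (low, high) copied from the paragraph above.
# The keys are hypothetical labels chosen for this sketch.
INDICATIVE_RANGES = {
    "avg_nonbonded_energy": (-26.17, -7.59),
    "transfer_free_energy_cap_chx": (-8.21, 1.45),
    "amino_acid_composition": (0.7, 8.8),
    "short_medium_range_nonbonded_energy": (-14.42, -5.46),
    "molecular_weight": (75.07, 204.24),
    "transfer_free_energy_vap_oct": (-18.6, 2.39),
    "alpha_helix_propensity": (-0.38, 1.24),
    "rf_high_salt_chromatography": (0.2, 0.97),
    "avg_residue_volume": (67.5, 237.2),
    "cytochrome_synthetic_protein_composition": (1.06, 8.36),
    "principal_component_iii": (-0.29, 0.49),
    "sd_total_protein_composition": (1.15, 3.73),
    "accessible_surface_area": (0.0, 271.6),
    "mesophilic_family_distribution": (1.0, 9.4),
    "surface_accessible_protein_content": (0.0, 0.22),
}


def attributes_in_range(values: dict[str, float]) -> list[str]:
    """Return the names of the supplied attributes whose value lies in range."""
    hits = []
    for name, value in values.items():
        low, high = INDICATIVE_RANGES[name]
        if low <= value <= high:
            hits.append(name)
    return hits


hits = attributes_in_range({"molecular_weight": 131.17, "alpha_helix_propensity": 2.0})
print(hits)
```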
It should be noted that, in the present invention, the amino acid attributes are obtained with the PSAIA software.
S300: Determining protein-DNA binding sites
In this step, the protein-DNA binding sites of the test protein are determined based on the attributes of the amino acids in the candidate units. According to a specific embodiment of the invention, the binding sites are determined using a kernel-function model. Specifically, a model (kernel function) is built from the amino acid sequences of the reference protein set and their protein-DNA binding sites, and the amino acid attributes of the test protein are then substituted into the model; if the calculated value (called the decision value) lies in the range 0 to 1, the amino acid is determined to be a protein-DNA binding site.
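The document does not name the kernel or the model type, so the following is only a stand-in to show how a kernel can turn reference data into a score for a query residue. It uses a Gaussian (RBF) kernel in a kernel-weighted average of reference labels (1 for binding, 0 for non-binding), which always yields a value between 0 and 1. This is not the model of the invention; the function names, the `gamma` parameter, and the toy vectors are assumptions, and any cutoff applied to the score would be a further assumption.

```python
import math


def rbf_kernel(x: list[float], y: list[float], gamma: float = 0.1) -> float:
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def kernel_score(query: list[float],
                 reference_vectors: list[list[float]],
                 reference_labels: list[int],
                 gamma: float = 0.1) -> float:
    """Kernel-weighted average of reference labels (stand-in scoring rule).

    Returns a value in [0, 1]; returns 0.0 if every kernel weight underflows.
    """
    weights = [rbf_kernel(query, ref, gamma) for ref in reference_vectors]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * label for w, label in zip(weights, reference_labels)) / total


# Toy reference data: one binding example and one non-binding example.
refs = [[0.0, 0.0], [10.0, 10.0]]
labels = [1, 0]
near_binding = kernel_score([0.0, 0.0], refs, labels)
near_nonbinding = kernel_score([10.0, 10.0], refs, labels)
print(near_binding, near_nonbinding)
```

A query close to binding examples scores near 1 and a query close to non-binding examples scores near 0, which is the qualitative behavior expected of any kernel-based classifier.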
To aid understanding, the source code of the model used in the method for determining protein-DNA binding sites, with corresponding explanations, is given below:
Device for determining protein-DNA binding sites
In another aspect, the present invention provides a device for determining protein-DNA binding sites. According to an embodiment of the invention, and referring to Fig. 2, the device comprises: a splitting component 100, an amino acid attribute determination component 200, and a determination component 300. The device according to embodiments of the invention therefore determines protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.
Splitting component 100
According to an embodiment of the invention, the splitting component 100 is adapted to split the amino acid sequences of the reference protein set and the amino acid sequence of the test protein into multiple candidate units with a predetermined number of amino acids.
Amino acid attribute determination component 200
According to an embodiment of the invention, the amino acid attribute determination component 200 is connected to the splitting component 100 and is adapted to determine the amino acid attributes of each of the candidate units. The amino acid attributes include at least one selected from the following: average non-bonded energy per residue, transfer free energy cap-chx, amino acid composition, short- and medium-range non-bonded energy, molecular weight, transfer free energy vap-oct, alpha-helix propensity, RF value in high-salt chromatography, average residue volume, amino acid composition of cytochrome synthetic proteins, principal component III, amino acid composition of SD total proteins, accessible surface area, amino acid distribution in 18 non-redundant mesophilic protein families, and surface-accessible protein content.
Determination component 300
According to an embodiment of the invention, the determination component 300 is connected to the amino acid attribute determination component 200 and is adapted to determine the protein-DNA binding sites of the test protein based on the attributes of the amino acids in the candidate units.
Those skilled in the art will appreciate that the features and advantages described above for the method for determining protein-DNA binding sites apply equally to the device for determining protein-DNA binding sites and are not repeated here.
The solution of the invention is explained below with reference to examples. Those skilled in the art will understand that the following examples serve only to illustrate the invention and should not be taken as limiting its scope. Where specific techniques or conditions are not indicated in the examples, the techniques or conditions described in the literature of the field, or the product instructions, were followed. Reagents and instruments whose manufacturers are not indicated are conventional, commercially available products.
Example 1
In this example, the protein with the identifier 5EEA on the PDB website was used as the test protein, and its protein-DNA binding sites were determined as follows:
1. 62 reference proteins were collected (listed in the table below), and their protein structure information and protein-DNA binding sites were obtained from the PDB website (http://www.rcsb.org/pdb/home/home.do). The amino acid sequence of the test protein can come from various sources, such as experiment or sequencing. The amino acid sequence of each reference protein and of the test protein was split into multiple units of 19 amino acids.
1AAY | 1AZQ | 1A74 | 1A02 | 1BER-a | 1BF5 | 1BHM-a | 1BL0 | 1B3T |
1CDW | 1CF7-a | 1CJG | 1CMA | 1C0W-b | 1DP7 | 1D02-a | 1D66-a | 1ECR |
1FJL-a | 1GAT | 1GCC | 1GDT-a | 1HCQ-a | 1HCR | 1HDD-c | 1HLO-a | 1HRY |
1HWT-h | 1IFL-a | 1IGN-a | 1IHF | 1LMB-4 | 1MDY-a | 1MEY-c | 1MHD-a | 1MNM |
1MSE | 1OCT | 1PAR-b | 1PDN | 1PER-1 | 1PNR | 1PUE-e | 1PVI-b | 1PYI-a |
1REP-c | 1SRS | 1SVC | 1TC3 | 1TF3 | 1TRO-a | 1TSR-b | 1UBD | 1YRN-a |
1YSA | 1YUI | 1XBR-a | 2BOP | 2DRP-a | 2GLI | 2HDC | 3CRO-1 |
2. For each amino acid in each unit, the following attributes were determined: average non-bonded energy per residue, transfer free energy cap-chx, amino acid composition, short- and medium-range non-bonded energy, molecular weight, transfer free energy vap-oct, alpha-helix propensity, RF value in high-salt chromatography, average residue volume, amino acid composition of cytochrome synthetic proteins, principal component III, amino acid composition of SD total proteins, accessible surface area, amino acid distribution in 18 non-redundant mesophilic protein families, and surface-accessible protein content.
3. The protein-DNA binding sites of the test protein were determined based on the amino acid attributes of the reference proteins, the protein-DNA binding sites of the reference proteins, and the amino acid attributes of the test protein.
Fig. 3 compares the protein-DNA binding sites determined with the method of the invention (actual binding sites) with those determined by a method reported in the literature (theoretical binding sites). The method of the invention outperforms the literature method on four key measures of prediction performance (prediction accuracy Ac, prediction sensitivity Sn, Matthews correlation coefficient MCC, and area under the ROC curve) and is on par with the literature method on prediction specificity.
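For reference, the threshold-based measures named above (Ac, Sn, Sp, MCC) follow from the counts of true and false positives and negatives using their standard definitions. The sketch below uses hypothetical counts; the function name `evaluate` and the numbers are illustrative and are not results from this document.

```python
import math


def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and MCC from a confusion matrix."""
    total = tp + tn + fp + fn
    ac = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"Ac": ac, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical counts for 50 binding and 50 non-binding residues.
metrics = evaluate(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

The area under the ROC curve is not included because it requires the ranked decision values rather than a single confusion matrix.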
Example 2
In this example, the effect of the predetermined amino acid number on the result was studied.
The predetermined amino acid number is an odd number. Because adjacent amino acid residues in a protein sequence may interact, adding the features of neighboring residues to the prediction features can improve prediction performance. If the predetermined amino acid number is too small, the information content is insufficient; if it is too large, the program runs more slowly without much improvement in the prediction model. Selecting a suitable predetermined amino acid number is therefore an important part of building the model.
To assess the effect of different predetermined amino acid numbers on prediction performance, models were built with each of the 11 odd numbers from 3 to 23 as the predetermined amino acid number, giving 11 sets of model evaluation parameters (see the table below). Because Ac, MCC, and AUC are the evaluation parameters of most interest, the trends of these three values were plotted as a line chart, with the predetermined amino acid number on the horizontal axis and the evaluation parameter on the vertical axis, as shown in Fig. 4.
As the table and Fig. 4 show, as the predetermined amino acid number increases, Ac, Sn, Sp, and MCC generally rise first and then fall, whereas AUC increases gradually with growth slowing at larger numbers. Taking the five evaluation indices together, the model built with a predetermined amino acid number of 19 predicts best. Subsequent experiments therefore all used 19 as the predetermined amino acid number.
Table 1. Effect of different predetermined amino acid numbers on the prediction model
Predetermined amino acid number | Ac | Sn | Sp | MCC | AUC |
---|---|---|---|---|---|
3 | 0.568844 | 0.556183 | 0.581505 | 0.13801 | 0.599206 |
5 | 0.572204 | 0.596129 | 0.54828 | 0.145152 | 0.615707 |
7 | 0.606075 | 0.601075 | 0.611075 | 0.212586 | 0.637525 |
9 | 0.611102 | 0.617903 | 0.604301 | 0.223451 | 0.653098 |
11 | 0.617608 | 0.621129 | 0.614086 | 0.236466 | 0.665952 |
13 | 0.621855 | 0.606183 | 0.637527 | 0.244997 | 0.671059 |
15 | 0.623522 | 0.617903 | 0.62914 | 0.248033 | 0.681997 |
17 | 0.623495 | 0.612903 | 0.634086 | 0.248059 | 0.677847 |
19 | 0.626855 | 0.614516 | 0.639194 | 0.25445 | 0.680827 |
21 | 0.609462 | 0.599624 | 0.619301 | 0.21966 | 0.683184 |
23 | 0.617769 | 0.618065 | 0.617473 | 0.237091 | 0.682246 |
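The values in Table 1 can be checked with a short script that finds, for each evaluation parameter, the predetermined amino acid number with the highest score. The numbers are copied from the table; the variable names `RESULTS`, `METRICS`, and `best` are chosen for this sketch.

```python
# Rows of Table 1: predetermined amino acid number -> (Ac, Sn, Sp, MCC, AUC).
RESULTS = {
    3: (0.568844, 0.556183, 0.581505, 0.13801, 0.599206),
    5: (0.572204, 0.596129, 0.54828, 0.145152, 0.615707),
    7: (0.606075, 0.601075, 0.611075, 0.212586, 0.637525),
    9: (0.611102, 0.617903, 0.604301, 0.223451, 0.653098),
    11: (0.617608, 0.621129, 0.614086, 0.236466, 0.665952),
    13: (0.621855, 0.606183, 0.637527, 0.244997, 0.671059),
    15: (0.623522, 0.617903, 0.62914, 0.248033, 0.681997),
    17: (0.623495, 0.612903, 0.634086, 0.248059, 0.677847),
    19: (0.626855, 0.614516, 0.639194, 0.25445, 0.680827),
    21: (0.609462, 0.599624, 0.619301, 0.21966, 0.683184),
    23: (0.617769, 0.618065, 0.617473, 0.237091, 0.682246),
}
METRICS = ("Ac", "Sn", "Sp", "MCC", "AUC")

# For each metric, pick the unit size whose row has the largest value.
best = {
    metric: max(RESULTS, key=lambda size: RESULTS[size][index])
    for index, metric in enumerate(METRICS)
}
print(best)
```

A predetermined amino acid number of 19 gives the highest Ac, Sp, and MCC, while Sn peaks at 11 and AUC at 21, which is consistent with choosing 19 on the combined indices.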
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the invention have been shown and described above, it is to be understood that these embodiments are exemplary and are not to be construed as limiting the invention; those skilled in the art may change, modify, replace, and vary the above embodiments within the scope of the invention.
Claims (5)
1. A method for determining protein-DNA binding sites, characterized by comprising:
splitting the amino acid sequences of a reference protein set and the amino acid sequence of a test protein into multiple candidate units, each with a predetermined number of amino acids;
determining the amino acid attributes of each of the candidate units, the amino acid attributes including at least one selected from the following: average non-bonded energy per residue, transfer free energy cap-chx, amino acid composition, short- and medium-range non-bonded energy, molecular weight, transfer free energy vap-oct, alpha-helix propensity, RF value in high-salt chromatography, average residue volume, amino acid composition of cytochrome synthetic proteins, principal component III, amino acid composition of SD total proteins, accessible surface area, amino acid distribution in 18 non-redundant mesophilic protein families, and surface-accessible protein content; and
determining the protein-DNA binding sites of the test protein based on the attributes of the amino acids in the candidate units.
2. The method according to claim 1, characterized in that the predetermined amino acid number is 19.
3. The method according to claim 1, characterized in that the reference protein set contains at least 30 reference proteins.
4. The method according to claim 1, characterized in that an amino acid in the candidate unit having at least one of the following attributes is an indication that the amino acid is a protein-DNA binding site:
average non-bonded energy per residue of -26.17 to -7.59;
transfer free energy cap-chx of -8.21 to 1.45;
amino acid composition of 0.7 to 8.8;
short- and medium-range non-bonded energy of -14.42 to -5.46;
molecular weight of 75.07 to 204.24;
transfer free energy vap-oct of -18.6 to 2.39;
alpha-helix propensity of -0.38 to 1.24;
RF value in high-salt chromatography of 0.2 to 0.97;
average residue volume of 67.5 to 237.2;
amino acid composition of cytochrome synthetic proteins of 1.06 to 8.36;
principal component III of -0.29 to 0.49;
amino acid composition of SD total proteins of 1.15 to 3.73;
accessible surface area of 0 to 271.6;
amino acid distribution in 18 non-redundant mesophilic protein families of 1 to 9.4; and
surface-accessible protein content of 0 to 0.22.
5. A device for determining protein-DNA binding sites, characterized by comprising:
A splitting component, adapted to split the amino acid sequences of the reference protein set and the amino acid sequence of the test protein each into multiple candidate units having a predetermined number of amino acids;
An amino acid attribute determination component, connected to the splitting component and adapted to determine the amino acid attributes of each of the multiple candidate units, the amino acid attributes including at least one selected from the following: average non-binding energy per residue, transfer free energy cap-chx, amino acid composition, short- and medium-range non-binding energy, molecular weight, transfer free energy vap-oct, alpha-helix propensity, high-salt chromatography RF value, average residue volume, cytochrome synthetic protein amino acid composition, principal component III, SD total protein amino acid composition, accessible surface area, amino acid distribution of 18 non-redundant mesophilic protein families, and surface accessibility protein content; and
A determination component, connected to the amino acid attribute determination component and adapted to determine the protein-DNA binding sites of the test protein based on the attributes of the amino acids in the candidate units.
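The claims above can be read as a short pipeline: split a sequence into candidate units of 19 amino acids (claim 2), obtain attribute values, and check them against the ranges of claim 4. The following is a minimal Python sketch of that reading, not the applicant's implementation. The function names and dictionary keys are hypothetical. The overlapping one-residue-step windows are an assumption, since the claims fix only the unit length. The attribute values are supplied by the caller, because the claims do not reproduce the underlying lookup tables.

```python
from typing import Dict, List, Tuple

# Candidate unit length taken from claim 2.
UNIT_SIZE = 19

# Three of the attribute ranges listed in claim 4, as (lower, upper) bounds.
# The keys are hypothetical names chosen for this sketch.
ATTRIBUTE_RANGES: Dict[str, Tuple[float, float]] = {
    "molecular_weight": (75.07, 204.24),
    "alpha_helix_propensity": (-0.38, 1.24),
    "accessible_surface_area": (0.0, 271.6),
}


def split_into_units(sequence: str, size: int = UNIT_SIZE) -> List[str]:
    """Splitting component: cut a sequence into candidate units.

    Assumption: overlapping windows that advance one residue at a time.
    A sequence shorter than `size` yields no units.
    """
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


def in_claimed_range(attribute: str, value: float) -> bool:
    """Return True if `value` lies inside the claim 4 range (bounds inclusive)."""
    low, high = ATTRIBUTE_RANGES[attribute]
    return low <= value <= high


def indicating_attributes(values: Dict[str, float]) -> List[str]:
    """Determination component, simplified.

    Returns the names of the supplied attributes whose values fall inside
    their claim 4 range. Under claim 4, a non-empty result is an indication
    that the amino acid is a binding site. Unknown attribute names are ignored.
    """
    return [
        name
        for name, value in values.items()
        if name in ATTRIBUTE_RANGES and in_claimed_range(name, value)
    ]


if __name__ == "__main__":
    units = split_into_units("M" * 25)  # placeholder sequence, not real data
    print(len(units), len(units[0]))
    # Hypothetical attribute values for a single amino acid.
    print(indicating_attributes(
        {"molecular_weight": 146.19, "alpha_helix_propensity": 1.5}
    ))
```

Because claim 4 requires only one attribute to fall within its range, the sketch reports every attribute that matches rather than a single yes/no answer, which leaves any stricter decision rule to the caller.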
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710245597.XA CN108508207A (en) | 2017-04-14 | 2017-04-14 | The identification method of protein-DNA binding sites |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108508207A (en) | 2018-09-07 |
Family
ID=63373335
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710245597.XA Pending CN108508207A (en) | The identification method of protein-DNA binding sites | 2017-04-14 | 2017-04-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108508207A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102930152A (en) * | 2012-10-26 | 2013-02-13 | 中国科学院上海药物研究所 | Method and system for simulating ligand molecule and target receptor reaction and calculating and forecasting thermodynamics and kinetics parameters of reaction |
CN105912886A (en) * | 2016-03-29 | 2016-08-31 | 上海师范大学 | Method of predicting binding site of protein in RNA virus gene |
CN106446602A (en) * | 2016-09-06 | 2017-02-22 | 中南大学 | Prediction method and system for RNA binding sites in protein molecules |
Non-Patent Citations (2)
Title |
---|
LIANGJIANG WANG,ET AL.: "Prediction of DNA-binding residues from protein sequence information using random forests", 《BMC GENOMICS》 * |
S.AHMAD ET AL.: "Analysis and prediction of DNA-binding proteins and their binding residues based on composition,sequence and structural information", 《BIOINFORMATICS》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liu et al. | Single-cell DNA methylome and 3D multi-omic atlas of the adult mouse brain | |
Fehlmann et al. | miRMaster 2.0: multi-species non-coding RNA sequencing analyses at scale | |
Wang et al. | Identification of the functional alteration signatures across different cancer types with support vector machine and feature analysis | |
Gao et al. | Comparison of high-throughput single-cell RNA sequencing data processing pipelines | |
JP2018092575A (en) | Program, device, and method for predicting biological activity of chemical compound | |
Schneider et al. | The utility of differential scanning calorimetry curves of blood plasma for diagnosis, subtype differentiation and predicted survival in lung cancer | |
Zheng et al. | Cistrome Data Browser and Toolkit: analyzing human and mouse genomic data using compendia of ChIP-seq and chromatin accessibility data | |
CN108508207A (en) | The identification method of protein-DNA binding sites | |
Case et al. | Machine learning to predict continuous protein properties from binary cell sorting data and map unseen sequence space | |
Dong et al. | Integrating single-cell datasets with ambiguous batch information by incorporating molecular network features | |
Nadeau et al. | PIGNON: a protein–protein interaction-guided functional enrichment analysis for quantitative proteomics | |
CN103163257A (en) | Processing method of cow urine iTRAQ test data | |
US20030124548A1 (en) | Method for association of genomic and proteomic pathways associated with physiological or pathophysiological processes | |
Kriebel et al. | Nonnegative matrix factorization integrates single-cell multi-omic datasets with partially overlapping features | |
Mir et al. | In vivo ChIP-Seq of nuclear receptors: a rough guide to transform frozen tissues into high-confidence genome-wide binding profiles | |
Liu et al. | CyclicPepedia: a knowledge base of natural and synthetic cyclic peptides | |
CN110970093A (en) | Method and device for screening primer design template and application | |
Zhang et al. | A high platelet-to-lymphocyte ratio predicts all-cause mortality and cardiovascular mortality in maintenance hemodialysis patients | |
CN102321733A (en) | Method for analyzing iTRAQ (isobaric Tags for Relative and Absolute Quantitation) data | |
Pikin et al. | Analysis of postoperative complications after pneumo-n-ectomy using thoracic morbidity and mortality (tmm) system in nsclc patients for a 5-year period | |
Liu et al. | An informatics pipeline for profiling and annotating RNA modifications | |
Kim et al. | Comparative proteomics: assessment of biological variability and dataset comparability | |
Gao et al. | ClusterMap: comparing analyses across multiple single cell RNA-seq profiles | |
CN108763861A (en) | Prediction technique, device, terminal and the medium of protein-protein interaction | |
Al Bkhetan et al. | Multi-levels 3D chromatin interactions prediction using epigenomic profiles |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180907 |