CN108508207A - The identification method of protein-DNA binding sites - Google Patents

The identification method of protein-DNA binding sites

Publication number
CN108508207A
Authority
CN
China
Prior art keywords
amino acid
protein
dna binding
attribute
binding sites
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710245597.XA
Other languages
Chinese (zh)
Inventor
张德强
司婧娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Forestry University
Original Assignee
Beijing Forestry University

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803: General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6845: Methods of identifying protein-protein interactions in protein mixtures

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Engineering & Computer Science (AREA)
  • Hematology (AREA)
  • Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Urology & Nephrology (AREA)
  • Immunology (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Food Science & Technology (AREA)
  • Biotechnology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Cell Biology (AREA)
  • Medicinal Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention proposes a method and a device for determining protein-DNA binding sites. The method comprises: splitting the amino acid sequences of a reference protein set and the amino acid sequence of a protein under test into multiple candidate units, each having a predetermined number of amino acids; determining the amino acid attributes of each of the candidate units; and determining the protein-DNA binding sites of the protein under test. The method and device of the present invention can determine protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.

Description

The identification method of protein-DNA binding sites
Technical field
The present invention relates to the field of biology, in particular to the identification of protein-DNA binding sites. More specifically, the present invention relates to a method for determining protein-DNA binding sites and a device for determining protein-DNA binding sites.
Background art
Interactions between proteins and DNA are widespread in the life activities of cells. DNA molecules not only serve as genetic material that encodes proteins, but can also bind specific proteins to regulate gene expression. For example, DNA replication, mRNA transcription and modification, and viral infection all involve interactions between DNA and proteins.
However, current methods and devices for determining protein-DNA binding sites still need improvement.
Summary of the invention
The present invention aims to solve, at least to some extent, at least one of the technical problems existing in the prior art. To this end, the present invention proposes a method and a device for determining protein-DNA binding sites. The method and device according to embodiments of the present invention can determine protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.
It should be noted that the present invention was completed based on the following findings of the inventors:
At present, methods for determining protein-DNA binding sites mainly include point mutation experiments, DNA mobility shift assays, DNase I footprinting assays, X-ray diffraction, and nuclear magnetic resonance. However, these experiments take a long time and require heavy investment, and some protein-DNA complexes are particularly difficult to obtain, so the annotation of protein functional sites lags far behind the growth of protein sequence and structure information.
In view of this, the inventors found through extensive experiments that certain amino acid attributes significantly affect the binding of amino acids to DNA. Consequently, based on these amino acid attributes of the reference proteins, the protein-DNA binding sites of the reference proteins, and these amino acid attributes of the protein under test, the protein-DNA binding sites of the protein under test can be determined accurately. The method and device according to embodiments of the present invention can thus determine protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.
To this end, in one aspect, the present invention proposes a method for determining protein-DNA binding sites. According to an embodiment of the present invention, the method comprises: splitting the amino acid sequences of a reference protein set and the amino acid sequence of a protein under test into multiple candidate units, each having a predetermined number of amino acids; determining the amino acid attributes of each of the candidate units, the amino acid attributes including at least one selected from the following: average non-bonded energy per residue, transfer free energy cap-chx, amino acid composition, short- and medium-range non-bonded energy, molecular weight, transfer free energy vap-oct, alpha-helix propensity, RF value in high-salt chromatography, average residue volume, amino acid composition of cytochrome synthetic proteins, principal component III, amino acid composition of SD total protein, accessible surface area, amino acid distribution in the 18 non-redundant families of mesophilic proteins, and surface-accessible protein content; and determining the protein-DNA binding sites of the protein under test based on the attributes of the amino acids in the candidate units.
The inventors found that the above amino acid attributes significantly affect the binding of amino acids to DNA. Consequently, based on these amino acid attributes of the reference proteins, the protein-DNA binding sites of the reference proteins, and these amino acid attributes of the protein under test, the protein-DNA binding sites of the protein under test can be determined accurately. In addition, considering that adjacent amino acid residues in a protein sequence may interact, the amino acid sequences of the reference protein set and of the protein under test are split into multiple candidate units with a predetermined number of amino acids, which improves the accuracy of the result. The method according to embodiments of the present invention can thus determine protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.
According to embodiments of the present invention, the above method for determining protein-DNA binding sites may further have the following additional technical features:
According to an embodiment of the present invention, the predetermined number of amino acids is 19. This further improves the accuracy with which the method determines protein-DNA binding sites.
According to an embodiment of the present invention, the reference protein set contains at least 30 reference proteins. This further improves the accuracy with which the method determines protein-DNA binding sites.
According to an embodiment of the present invention, an amino acid in a candidate unit having at least one of the following attributes is an indication that the amino acid is a protein-DNA binding site: the average non-bonded energy per residue is -26.17 to -7.59; the transfer free energy cap-chx is -8.21 to 1.45; the amino acid composition is 0.7 to 8.8; the short- and medium-range non-bonded energy is -14.42 to -5.46; the molecular weight is 75.07 to 204.24; the transfer free energy vap-oct is -18.6 to 2.39; the alpha-helix propensity is -0.38 to 1.24; the RF value in high-salt chromatography is 0.2 to 0.97; the average residue volume is 67.5 to 237.2; the amino acid composition of cytochrome synthetic proteins is 1.06 to 8.36; principal component III is -0.29 to 0.49; the amino acid composition of SD total protein is 1.15 to 3.73; the accessible surface area is 0 to 271.6; the amino acid distribution in the 18 non-redundant families of mesophilic proteins is 1 to 9.4; and the surface-accessible protein content is 0 to 0.22. This further improves the accuracy with which the method determines protein-DNA binding sites.
In another aspect, the present invention proposes a device for determining protein-DNA binding sites. According to an embodiment of the present invention, the device comprises: a splitting component, adapted to split the amino acid sequences of a reference protein set and the amino acid sequence of a protein under test into multiple candidate units, each having a predetermined number of amino acids; an amino acid attribute determination component, connected to the splitting component and adapted to determine the amino acid attributes of each of the candidate units, the amino acid attributes including at least one selected from the following: average non-bonded energy per residue, transfer free energy cap-chx, amino acid composition, short- and medium-range non-bonded energy, molecular weight, transfer free energy vap-oct, alpha-helix propensity, RF value in high-salt chromatography, average residue volume, amino acid composition of cytochrome synthetic proteins, principal component III, amino acid composition of SD total protein, accessible surface area, amino acid distribution in the 18 non-redundant families of mesophilic proteins, and surface-accessible protein content; and a determination component, connected to the amino acid attribute determination component and adapted to determine the protein-DNA binding sites of the protein under test based on the attributes of the amino acids in the candidate units. The device according to embodiments of the present invention can thus determine protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 shows a schematic flowchart of the method for determining protein-DNA binding sites according to an embodiment of the present invention;
Fig. 2 shows a schematic structural diagram of the device for determining protein-DNA binding sites according to an embodiment of the present invention;
Fig. 3 shows a schematic comparison of the protein-DNA binding sites actually determined according to an embodiment of the present invention with the theoretical protein-DNA binding sites; and
Fig. 4 shows a schematic analysis of the influence of the predetermined number of amino acids on the result according to an embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below. The embodiments described below are exemplary, are intended only to explain the present invention, and should not be construed as limiting it.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more such features. Further, in the description of the present invention, unless otherwise stated, "multiple" means two or more.
The present invention proposes a method and a device for determining protein-DNA binding sites, each of which is described in detail below.
Method for determining protein-DNA binding sites
In one aspect, the present invention proposes a method for determining protein-DNA binding sites. Referring to Fig. 1, according to an embodiment of the present invention, the method comprises:
S100: splitting into multiple candidate units
In this step, the amino acid sequences of the reference protein set and the amino acid sequence of the protein under test are each split into multiple candidate units having a predetermined number of amino acids.
Considering that adjacent amino acid residues in a protein sequence may interact, the inventors split the amino acid sequences of the reference protein set and of the protein under test into multiple candidate units with a predetermined number of amino acids; adding the features of neighboring residues to the prediction features can improve the prediction. The method according to embodiments of the present invention can thus determine protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.
According to an embodiment of the present invention, the predetermined number of amino acids is 19. The inventors found that the number of amino acids in each candidate unit (that is, the predetermined number of amino acids) significantly affects the accuracy of the result. If the number is too small, the information content is insufficient; if it is too large, subsequent operations slow down and overall efficiency drops. Through in-depth study, the inventors found that the result is good when the predetermined number of amino acids is 19.
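The splitting step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the text does not say how the units are laid out, so the sketch assumes one unit per residue, centered on that residue, with a hypothetical padding symbol `X` at the sequence ends. The function name `split_into_candidate_units` is illustrative.

```python
def split_into_candidate_units(sequence: str, unit_size: int = 19, pad: str = "X") -> list[str]:
    """Return one candidate unit per residue, centered on that residue.

    Assumption: units overlap (sliding window) and the ends of the
    sequence are padded with `pad` so every residue gets a full unit.
    """
    if unit_size < 1 or unit_size % 2 == 0:
        raise ValueError("unit_size must be a positive odd number")
    half = unit_size // 2
    padded = pad * half + sequence + pad * half
    # The unit for residue i spans padded[i : i + unit_size].
    return [padded[i:i + unit_size] for i in range(len(sequence))]


units = split_into_candidate_units("ACDEFGHIK", unit_size=5)
print(units[0], units[4], units[-1])
```

An odd unit size keeps the same number of neighbors on each side of the central residue, which is one plausible reading of why the predetermined number is odd.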
According to an embodiment of the present invention, the reference protein set contains at least 30 reference proteins. The inventors found that using the amino acid attributes of no fewer than 30 reference proteins, together with the relationship of those attributes to protein-DNA binding sites, as the reference gives strong generality; comparing the amino acid attributes of the protein under test with those of the reference proteins allows protein-DNA binding sites to be determined accurately.
S200: determining amino acid attributes
In this step, the attributes of each amino acid in each of the candidate units are determined, the amino acid attributes including at least one selected from the following: average non-bonded energy per residue, transfer free energy cap-chx, amino acid composition, short- and medium-range non-bonded energy, molecular weight, transfer free energy vap-oct, alpha-helix propensity, RF value in high-salt chromatography, average residue volume, amino acid composition of cytochrome synthetic proteins, principal component III, amino acid composition of SD total protein, accessible surface area, amino acid distribution in the 18 non-redundant families of mesophilic proteins, and surface-accessible protein content.
The inventors found that the above amino acid attributes significantly affect the binding of amino acids to DNA. Consequently, based on these amino acid attributes of the reference proteins, their protein-DNA binding sites, and these amino acid attributes of the protein under test, the protein-DNA binding sites of the protein under test can be determined accurately.
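As an illustration of turning a candidate unit into per-residue attribute values, the following Python sketch uses molecular weight, one of the listed attributes. It is a sketch under assumptions: the table holds commonly cited molecular weights of the free amino acids (g/mol), the function name `encode_unit` is hypothetical, and unknown or padding symbols are mapped to a placeholder value of 0.0, which the text does not specify.

```python
# Commonly cited molecular weights of the 20 free amino acids (g/mol).
MOLECULAR_WEIGHT = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.15,
    "Q": 146.15, "E": 147.13, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.24, "Y": 181.19, "V": 117.15,
}


def encode_unit(unit: str, table: dict[str, float], missing: float = 0.0) -> list[float]:
    """Map each residue of a candidate unit to its attribute value.

    Assumption: symbols absent from the table (for example padding)
    receive the placeholder `missing`.
    """
    return [table.get(residue, missing) for residue in unit]


print(encode_unit("XGW", MOLECULAR_WEIGHT))
```

The smallest and largest entries of this table (glycine and tryptophan) coincide with the molecular weight range of 75.07 to 204.24 quoted in the text. The same lookup pattern would apply to any other per-residue attribute table.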
According to an embodiment of the present invention, an amino acid in a candidate unit having the following attributes is an indication that the amino acid is a protein-DNA binding site: the average non-bonded energy per residue is -26.17 to -7.59; the transfer free energy cap-chx is -8.21 to 1.45; the amino acid composition is 0.7 to 8.8; the short- and medium-range non-bonded energy is -14.42 to -5.46; the molecular weight is 75.07 to 204.24; the transfer free energy vap-oct is -18.6 to 2.39; the alpha-helix propensity is -0.38 to 1.24; the RF value in high-salt chromatography is 0.2 to 0.97; the average residue volume is 67.5 to 237.2; the amino acid composition of cytochrome synthetic proteins is 1.06 to 8.36; principal component III is -0.29 to 0.49; the amino acid composition of SD total protein is 1.15 to 3.73; the accessible surface area is 0 to 271.6; the amino acid distribution in the 18 non-redundant families of mesophilic proteins is 1 to 9.4; and the surface-accessible protein content is 0 to 0.22.
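The range criteria above can be expressed as a simple lookup. The following Python sketch copies the fifteen ranges from the text; the dictionary keys and the function name `check_ranges` are hypothetical English identifiers chosen for illustration, and treating the bounds as inclusive is an assumption.

```python
# Ranges quoted from the text; the keys are hypothetical identifiers.
BINDING_RANGES = {
    "avg_nonbonded_energy_per_residue": (-26.17, -7.59),
    "transfer_free_energy_cap_chx": (-8.21, 1.45),
    "amino_acid_composition": (0.7, 8.8),
    "short_medium_range_nonbonded_energy": (-14.42, -5.46),
    "molecular_weight": (75.07, 204.24),
    "transfer_free_energy_vap_oct": (-18.6, 2.39),
    "alpha_helix_propensity": (-0.38, 1.24),
    "rf_value_high_salt_chromatography": (0.2, 0.97),
    "average_residue_volume": (67.5, 237.2),
    "cytochrome_synthetic_protein_composition": (1.06, 8.36),
    "principal_component_iii": (-0.29, 0.49),
    "sd_total_protein_composition": (1.15, 3.73),
    "accessible_surface_area": (0.0, 271.6),
    "mesophilic_family_distribution": (1.0, 9.4),
    "surface_accessible_protein_content": (0.0, 0.22),
}


def check_ranges(values: dict[str, float]) -> dict[str, bool]:
    """Report, per supplied attribute, whether its value lies in the quoted range."""
    flags = {}
    for name, value in values.items():
        low, high = BINDING_RANGES[name]
        flags[name] = low <= value <= high  # inclusive bounds are an assumption
    return flags


flags = check_ranges({"molecular_weight": 75.07, "alpha_helix_propensity": 1.5})
print(flags, any(flags.values()), all(flags.values()))
```

Returning one flag per attribute leaves the caller free to apply either reading found in the text: `any(...)` for "at least one of the following attributes", or `all(...)` for the stricter reading.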
It should be noted that, in the present invention, the amino acid attributes are obtained with the PSAIA software.
S300: determining protein-DNA binding sites
In this step, the protein-DNA binding sites of the protein under test are determined based on the attributes of the amino acids in the candidate units. According to a specific embodiment of the present invention, the protein-DNA binding sites of the protein under test are determined using a kernel-function model. Specifically, a model (kernel function) is built from the amino acid sequences of the reference protein set and their protein-DNA binding sites, and the amino acid attributes of the protein under test are then substituted into the model; if the calculated value (called the decision value) lies in the range 0 to 1, the amino acid is determined to be a protein-DNA binding site.
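The decision step can be illustrated with a small kernel expansion in Python. This is not the patent's source code, and the text names neither the kernel nor the training procedure. The sketch assumes a radial basis function (RBF) kernel and a decision value of the form sum of coefficient times kernel plus bias; the support vectors, coefficients, bias, and `gamma` are hypothetical placeholders that a trained model would supply. Only the final rule (decision value within 0 to 1) comes from the text.

```python
import math


def rbf_kernel(x: list[float], y: list[float], gamma: float = 0.1) -> float:
    """RBF kernel exp(-gamma * ||x - y||^2); the kernel choice is an assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, coefficients, bias=0.0, gamma=0.1) -> float:
    """Kernel expansion: sum_i coefficient_i * K(sv_i, x) + bias."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coefficients)) + bias


def is_binding_site(value: float, low: float = 0.0, high: float = 1.0) -> bool:
    """Rule from the text: a decision value within 0 to 1 indicates a binding site."""
    return low <= value <= high


# Hypothetical model parameters, for illustration only.
support_vectors = [[0.0, 0.0], [3.0, 4.0]]
coefficients = [0.8, -0.5]

for features in ([0.0, 0.0], [3.0, 4.0]):
    value = decision_value(features, support_vectors, coefficients)
    print(features, round(value, 3), is_binding_site(value))
```

In practice the feature vector would hold the attribute values of all residues in a candidate unit, and the parameters would come from fitting the model to the reference protein set.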
To facilitate understanding, the source code of the model used in the method for determining protein-DNA binding sites, together with the corresponding explanation, is given below:
Device for determining protein-DNA binding sites
In another aspect, the present invention proposes a device for determining protein-DNA binding sites. Referring to Fig. 2, according to an embodiment of the present invention, the device comprises: a splitting component 100, an amino acid attribute determination component 200, and a determination component 300. The device according to embodiments of the present invention can thus determine protein-DNA binding sites accurately, with simple steps, convenient operation, and significantly reduced cost.
Splitting component 100
According to an embodiment of the present invention, the splitting component 100 is adapted to split the amino acid sequences of the reference protein set and the amino acid sequence of the protein under test into multiple candidate units, each having a predetermined number of amino acids.
Amino acid attribute determination component 200
According to an embodiment of the present invention, the amino acid attribute determination component 200 is connected to the splitting component 100 and is adapted to determine the amino acid attributes of each of the candidate units, the amino acid attributes including at least one selected from the following: average non-bonded energy per residue, transfer free energy cap-chx, amino acid composition, short- and medium-range non-bonded energy, molecular weight, transfer free energy vap-oct, alpha-helix propensity, RF value in high-salt chromatography, average residue volume, amino acid composition of cytochrome synthetic proteins, principal component III, amino acid composition of SD total protein, accessible surface area, amino acid distribution in the 18 non-redundant families of mesophilic proteins, and surface-accessible protein content.
Determination component 300
According to an embodiment of the present invention, the determination component 300 is connected to the amino acid attribute determination component 200 and is adapted to determine the protein-DNA binding sites of the protein under test based on the attributes of the amino acids in the candidate units.
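The chaining of the three components can be sketched as a small Python class. This is only an illustration of how components 100, 200, and 300 feed one another; the class name `BindingSiteDevice`, its field names, and the toy rules passed in below are all hypothetical and stand in for the real splitting, attribute, and decision logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BindingSiteDevice:
    split: Callable[[str], list[str]]         # plays the role of component 100
    attributes: Callable[[str], list[float]]  # plays the role of component 200
    decide: Callable[[list[float]], bool]     # plays the role of component 300

    def run(self, sequence: str) -> list[bool]:
        """Pass each candidate unit through attribute determination, then decision."""
        units = self.split(sequence)
        return [self.decide(self.attributes(unit)) for unit in units]


# Toy stand-ins, for illustration only: units of 3 residues, one feature
# (the count of lysine "K" in the unit), and an arbitrary threshold.
device = BindingSiteDevice(
    split=lambda seq: [seq[i:i + 3] for i in range(len(seq) - 2)],
    attributes=lambda unit: [float(unit.count("K"))],
    decide=lambda features: features[0] >= 1,
)
print(device.run("AKAAA"))
```

Passing the three stages in as callables mirrors the description of separate, connected components and lets each stage be replaced independently.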
Those skilled in the art will appreciate that the features and advantages described above for the method for determining protein-DNA binding sites apply equally to the device for determining protein-DNA binding sites, and are not repeated here.
The solution of the present invention is explained below with reference to embodiments. Those skilled in the art will understand that the following embodiments are intended only to illustrate the present invention and should not be regarded as limiting its scope. Where specific techniques or conditions are not indicated in the embodiments, they are carried out according to the techniques or conditions described in the literature in the art or according to the product instructions. Reagents or instruments whose manufacturers are not indicated are conventional, commercially available products.
Embodiment 1
In this embodiment, the protein numbered 5EEA on the PDB website is used as the protein under test, and its protein-DNA binding sites are determined as follows:
1. Sixty-two reference proteins (listed in the table below) were collected, and their protein structure information and protein-DNA binding sites were obtained from the PDB website (http://www.rcsb.org/pdb/home/home.do). The amino acid sequence information of the protein under test can come from various sources, such as experiments or sequencing. The amino acid sequence of each reference protein and of the protein under test is split into multiple units of 19 amino acids.
1AAY 1AZQ 1A74 1A02 1BER-a 1BF5 1BHM-a 1BL0 1B3T
1CDW 1CF7-a 1CJG 1CMA 1C0W-b 1DP7 1D02-a 1D66-a 1ECR
1FJL-a 1GAT 1GCC 1GDT-a 1HCQ-a 1HCR 1HDD-c 1HLO-a 1HRY
1HWT-h 1IFL-a 1IGN-a 1IHF 1LMB-4 1MDY-a 1MEY-c 1MHD-a 1MNM
1MSE 1OCT 1PAR-b 1PDN 1PER-1 1PNR 1PUE-e 1PVI-b 1PYI-a
1REP-c 1SRS 1SVC 1TC3 1TF3 1TRO-a 1TSR-b 1UBD 1YRN-a
1YSA 1YUI 1XBR-a 2BOP 2DRP-a 2GLI 2HDC 3CRO-1
2. For each amino acid in each unit, the following attributes are determined: average non-bonded energy per residue, transfer free energy cap-chx, amino acid composition, short- and medium-range non-bonded energy, molecular weight, transfer free energy vap-oct, alpha-helix propensity, RF value in high-salt chromatography, average residue volume, amino acid composition of cytochrome synthetic proteins, principal component III, amino acid composition of SD total protein, accessible surface area, amino acid distribution in the 18 non-redundant families of mesophilic proteins, and surface-accessible protein content.
3. Based on the amino acid attributes of the reference proteins, their protein-DNA binding sites, and the amino acid attributes of the protein under test, the protein-DNA binding sites of the protein under test are determined.
Fig. 3 shows the protein-DNA binding sites actually determined by the method of the present invention (actual binding sites) and the protein-DNA binding sites determined by a method reported in the literature (theoretical binding sites). As can be seen, the method of the present invention outperforms the literature method on four important indices for evaluating prediction performance (prediction accuracy Ac, prediction sensitivity Sn, Matthews correlation coefficient MCC, and area under the ROC curve), and is on par with the literature method on prediction specificity.
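The evaluation indices named above can be computed from a confusion matrix. The text does not give formulas, so the following Python sketch assumes the standard definitions of accuracy (Ac), sensitivity (Sn), specificity (Sp), and the Matthews correlation coefficient (MCC); the function name `evaluate` and the example counts are hypothetical.

```python
import math


def evaluate(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix metrics (assumed definitions)."""
    total = tp + fp + tn + fn
    ac = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"Ac": ac, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical counts: binding residues are the positive class.
print(evaluate(tp=6, fp=4, tn=6, fn=4))
```

The area under the ROC curve is not included here because it requires the ranked decision values of all residues rather than a single confusion matrix.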
Embodiment 2
In this embodiment, the influence of the predetermined number of amino acids on the result is studied.
The predetermined number of amino acids is an odd number. Considering that adjacent amino acid residues in a protein sequence may interact, adding the features of neighboring residues to the prediction features can improve the prediction. When choosing the predetermined number of amino acids, a value that is too small gives insufficient information, while a value that is too large slows the program without improving the prediction model much. Choosing a suitable predetermined number of amino acids is therefore an important part of building the model.
To assess the influence of different predetermined numbers of amino acids on prediction performance, the 11 odd numbers from 3 to 23 were each used as the predetermined number of amino acids to build a model, giving 11 sets of model evaluation parameters (see the table below). Because Ac, MCC, and AUC are the evaluation parameters of most interest, the trends of these three values with the predetermined number of amino acids were plotted as a line chart, with the predetermined number of amino acids on the horizontal axis and the evaluation parameter on the vertical axis, as shown in Fig. 4.
As the table and Fig. 4 show, as the predetermined number of amino acids increases, the four parameters Ac, Sn, Sp, and MCC generally first rise and then fall, while AUC increases gradually with its growth slowing at larger values. Taking the five evaluation indices together, the model built with a predetermined number of 19 amino acids predicts best. Therefore, 19 is used as the predetermined number of amino acids in all subsequent experiments.
Table 1. Influence of different predetermined amino acid numbers on the prediction model
Number  Ac        Sn        Sp        MCC       AUC
3       0.568844  0.556183  0.581505  0.13801   0.599206
5       0.572204  0.596129  0.54828   0.145152  0.615707
7       0.606075  0.601075  0.611075  0.212586  0.637525
9       0.611102  0.617903  0.604301  0.223451  0.653098
11      0.617608  0.621129  0.614086  0.236466  0.665952
13      0.621855  0.606183  0.637527  0.244997  0.671059
15      0.623522  0.617903  0.62914   0.248033  0.681997
17      0.623495  0.612903  0.634086  0.248059  0.677847
19      0.626855  0.614516  0.639194  0.25445   0.680827
21      0.609462  0.599624  0.619301  0.21966   0.683184
23      0.617769  0.618065  0.617473  0.237091  0.682246
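The comparison across predetermined amino acid numbers can be reproduced from the table with a few lines of Python. The values below are copied from Table 1; the names `TABLE_1`, `METRICS`, and `best_number` are illustrative.

```python
# Table 1 from the text: predetermined amino acid number -> (Ac, Sn, Sp, MCC, AUC).
METRICS = ("Ac", "Sn", "Sp", "MCC", "AUC")
TABLE_1 = {
    3:  (0.568844, 0.556183, 0.581505, 0.13801,  0.599206),
    5:  (0.572204, 0.596129, 0.54828,  0.145152, 0.615707),
    7:  (0.606075, 0.601075, 0.611075, 0.212586, 0.637525),
    9:  (0.611102, 0.617903, 0.604301, 0.223451, 0.653098),
    11: (0.617608, 0.621129, 0.614086, 0.236466, 0.665952),
    13: (0.621855, 0.606183, 0.637527, 0.244997, 0.671059),
    15: (0.623522, 0.617903, 0.62914,  0.248033, 0.681997),
    17: (0.623495, 0.612903, 0.634086, 0.248059, 0.677847),
    19: (0.626855, 0.614516, 0.639194, 0.25445,  0.680827),
    21: (0.609462, 0.599624, 0.619301, 0.21966,  0.683184),
    23: (0.617769, 0.618065, 0.617473, 0.237091, 0.682246),
}


def best_number(metric: str) -> int:
    """Return the predetermined amino acid number with the highest value of `metric`."""
    column = METRICS.index(metric)
    return max(TABLE_1, key=lambda number: TABLE_1[number][column])


for metric in METRICS:
    print(metric, best_number(metric))
```

On these figures, 19 gives the highest Ac, Sp, and MCC, while Sn peaks at 11 and AUC at 21, which is consistent with choosing 19 when the indices are weighed together.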
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims (5)

1. a kind of method of determining protein-DNA binding sites, which is characterized in that including:
The amino acid sequence of the amino acid sequence of reference protein collection and testing protein is split as to have predetermined amino respectively Multiple candidate units of sour number;
Determine the multiple candidate unit amino acid attribute of each, the amino acid attribute include selected from it is following at least it One:Residue be averaged non-binding energy, transfer free energy cap-chx, amino acid composition, participate in short- and medium-range it is non-binding can, molecule Amount, transfer free energy vap-oct, alpha-helix tendentiousness, chromatography RF values with high salt, residue average external volume, cytochromes synthetic proteins Amino acid composition, the amino acid composition of principal component III, SD total protein, accessible surface product, the mesophilic protein family ammonia of 18 nonredundancies Base acid is distributed and surface accessibility protein content;And
Based on the attribute of amino acid in the candidate unit, the protein-DNA binding sites of the testing protein are determined.
2. according to the method described in claim 1, it is characterized in that, the predetermined amino acid numerical value is 19.
3. according to the method described in claim 1, it is characterized in that, the reference protein collection contains at least 30 references Protein.
4. The method according to claim 1, characterized in that an amino acid in the candidate unit having at least one of the following attributes is an indication that the amino acid is a protein-DNA binding site:
the average non-bonded energy per residue is -26.17~-7.59;
the transfer free energy cap-chx is -8.21~1.45;
the amino acid composition is 0.7~8.8;
the short- and medium-range non-bonded energy is -14.42~-5.46;
the molecular weight is 75.07~204.24;
the transfer free energy vap-oct is -18.6~2.39;
the alpha-helix propensity is -0.38~1.24;
the RF value in high-salt chromatography is 0.2~0.97;
the average residue volume is 67.5~237.2;
the amino acid composition of cytochrome synthetic proteins is 1.06~8.36;
principal component III is -0.29~0.49;
the amino acid composition of SD total proteins is 1.15~3.73;
the accessible surface area is 0~271.6;
the amino acid distribution in the 18 non-redundant mesophilic protein families is 1~9.4; and
the surface accessibility protein content is 0~0.22.
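The range test of claim 4 can be illustrated with a minimal Python sketch. The three ranges are copied from the claim; the function name `in_claimed_ranges`, the dictionary keys, the inclusive treatment of the endpoints, and the sample values are all assumptions made for illustration. How the per-residue attribute values are obtained is not specified here.

```python
# Ranges copied from claim 4 for three of the listed attributes.
# Assumption: bounds are inclusive (the claim does not say).
CLAIMED_RANGES = {
    "molecular_weight": (75.07, 204.24),
    "alpha_helix_propensity": (-0.38, 1.24),
    "accessible_surface_area": (0.0, 271.6),
}


def in_claimed_ranges(attributes):
    """Return the names of the attributes whose value lies in its claimed range."""
    hits = []
    for name, value in attributes.items():
        bounds = CLAIMED_RANGES.get(name)
        # Attributes without a listed range are ignored.
        if bounds is not None and bounds[0] <= value <= bounds[1]:
            hits.append(name)
    return hits


# Hypothetical attribute values for one residue (illustrative numbers only).
sample = {"molecular_weight": 146.19, "alpha_helix_propensity": 1.5}
hits = in_claimed_ranges(sample)

# Claim 4 treats "at least one" matching attribute as an indication.
print(hits, bool(hits))
```

In this sketch a residue counts as indicated when the returned list is non-empty, which mirrors the "at least one of the following attributes" wording.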
5. A device for determining protein-DNA binding sites, characterized by comprising:
a splitting component, adapted to split the amino acid sequences of a reference protein set and the amino acid sequence of a protein to be tested into multiple candidate units each having a predetermined number of amino acids;
an amino acid attribute determination component, connected to the splitting component and adapted to determine the amino acid attributes of each of the multiple candidate units, the amino acid attributes including at least one selected from the following: average non-bonded energy per residue, transfer free energy cap-chx, amino acid composition, short- and medium-range non-bonded energy, molecular weight, transfer free energy vap-oct, alpha-helix propensity, RF value in high-salt chromatography, average residue volume, amino acid composition of cytochrome synthetic proteins, principal component III, amino acid composition of SD total proteins, accessible surface area, amino acid distribution in the 18 non-redundant mesophilic protein families, and surface accessibility protein content; and
a determination component, connected to the amino acid attribute determination component and adapted to determine the protein-DNA binding sites of the protein to be tested based on the attributes of the amino acids in the candidate units.
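The splitting and attribute-determination steps recited in the claims can be sketched in Python as follows. This is only an illustration under stated assumptions: the claims say each candidate unit has a predetermined number of amino acids (19 in claim 2) but do not say how units are cut, so the overlapping sliding window with step 1 is an assumption. The function names, the example sequence, and the two-entry attribute table are likewise hypothetical.

```python
def split_into_units(sequence, unit_length=19):
    """Split a sequence into candidate units of fixed length.

    Assumption: units are contiguous, overlapping windows with step 1.
    A sequence shorter than unit_length yields no units.
    """
    if unit_length <= 0:
        raise ValueError("unit_length must be positive")
    return [sequence[i:i + unit_length]
            for i in range(len(sequence) - unit_length + 1)]


def unit_attributes(unit, attribute_table):
    """Look up one attribute value per residue; unknown residues give None."""
    return [attribute_table.get(residue) for residue in unit]


# Hypothetical 25-residue sequence in one-letter code.
sequence = "MKTAYIAKQRQISFVKSHFSRQLEE"
units = split_into_units(sequence)
print(len(units), units[0])

# Illustrative partial table (molecular weights of glycine and alanine).
weights = {"G": 75.07, "A": 89.09}
print(unit_attributes("GAX", weights))
```

The per-residue values returned by `unit_attributes` would then be passed to whatever decision rule is used to flag binding sites.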
CN201710245597.XA 2017-04-14 2017-04-14 The identification method of protein-DNA binding sites Pending CN108508207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710245597.XA CN108508207A (en) 2017-04-14 2017-04-14 The identification method of protein-DNA binding sites

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710245597.XA CN108508207A (en) 2017-04-14 2017-04-14 The identification method of protein-DNA binding sites

Publications (1)

Publication Number Publication Date
CN108508207A true CN108508207A (en) 2018-09-07

Family

ID=63373335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710245597.XA Pending CN108508207A (en) 2017-04-14 2017-04-14 The identification method of protein-DNA binding sites

Country Status (1)

Country Link
CN (1) CN108508207A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930152A (en) * 2012-10-26 2013-02-13 中国科学院上海药物研究所 Method and system for simulating ligand molecule and target receptor reaction and calculating and forecasting thermodynamics and kinetics parameters of reaction
CN105912886A (en) * 2016-03-29 2016-08-31 上海师范大学 Method of predicting binding site of protein in RNA virus gene
CN106446602A (en) * 2016-09-06 2017-02-22 中南大学 Prediction method and system for RNA binding sites in protein molecules

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liangjiang Wang et al.: "Prediction of DNA-binding residues from protein sequence information using random forests", BMC Genomics *
S. Ahmad et al.: "Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information", Bioinformatics *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180907)