CN107358064A - System and method for predicting the influence of amino acid variation on protein structural stability - Google Patents


Info

Publication number
CN107358064A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710533801.8A
Other languages
Chinese (zh)
Inventor
杨洋
朱斐
严文颖
钱福良
郁春江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Application filed by Suzhou University
Priority to CN201710533801.8A
Publication of CN107358064A

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Abstract

The invention discloses a system and method for predicting the influence of amino acid variation on protein structural stability. The system consists of an amino acid variation information input module, an amino acid variation site attribute calculation module, a protein sequence attribute calculation module, a stability change prediction module, and a prediction result output module. The method comprises: entering and obtaining the variation information; extracting the relevant AAindex attribute features and calculating the physicochemical attribute features of the amino acid site; calculating the conservation and protein attributes of the protein sequence corresponding to the amino acid variation; calculating the influence of the amino acid variation on protein stability with a two-layer, three-class random forest algorithm; and storing and outputting the prediction result. Given an amino acid variation and the corresponding protein sequence supplied by a user, the invention accurately predicts whether the variation raises, lowers, or leaves unchanged the structural stability of the protein, gives the corresponding probability, and stores the result and sends it to the user.

Description

System and method for predicting the influence of amino acid variation on protein structural stability
Technical field
The invention belongs to the technical field of biomedical data analysis, and specifically relates to a system and method for predicting the influence of amino acid variation on protein structural stability.
Background
An important indicator for predicting the influence of an amino acid variation on protein stability is the difference in Gibbs free energy, ddG, between the wild-type protein and the variant protein. Existing prediction methods fall into two kinds. The first computes the value directly from physical energy equations; however, because the physical structure of the protein is not known precisely, the results are inaccurate and generalize poorly.
The second is based on existing experimental data and predicts with machine learning methods, but it has the following problems:
(1) Poor accuracy: the widely used experimental database ProTherm contains many erroneous and missing entries, so the training datasets are of poor quality, which seriously affects the accuracy of the prediction results;
(2) Poor generalization: these methods use a large number of input attributes related to protein structure, so no prediction can be made when the protein structure is unknown;
(3) Poor practicality: there is no system that supports both single and batch input and divides the prediction results into three classes (the variation raises, lowers, or leaves unchanged the protein stability).
Summary of the invention
To solve the above problems, the invention aims to provide a system and method for predicting the influence of amino acid variation on protein structural stability. Given an amino acid variation and the corresponding protein sequence supplied by a user, the system and method accurately predict whether the variation raises, lowers, or leaves unchanged the structural stability of the protein, give the corresponding probability, and store the result and send it to the user.
To achieve the above technical purpose and technical effect, the invention is realized through the following technical solution:
A system for predicting the influence of amino acid variation on protein structural stability consists of an amino acid variation information input module, an amino acid variation site attribute calculation module, a protein sequence attribute calculation module, a stability change prediction module, and a prediction result output module, wherein the amino acid variation information input module is connected to the amino acid variation site attribute calculation module and to the protein sequence attribute calculation module; the amino acid variation site attribute calculation module and the protein sequence attribute calculation module are both connected to the stability change prediction module; and the stability change prediction module is connected to the prediction result output module;
The function of the amino acid variation information input module is to obtain the single or batch amino acid variations and their protein sequences submitted by a user, and to store the user information and data;
The function of the amino acid variation site attribute calculation module is to extract the corresponding AAindex attribute feature values according to the wild-type and mutant amino acids at the variation site, and to calculate the physicochemical attribute features of the amino acid site after variation from the amino acid variation data;
The function of the protein sequence attribute calculation module is to calculate the conservation and the protein attribute features of the relevant protein from the amino acid variation data;
The function of the stability change prediction module is to calculate and classify the influence of the amino acid variation on protein stability with a two-layer, three-class algorithm based on random forests, and to give the corresponding probability as the prediction result;
The function of the prediction result output module is to generate the prediction result as Excel and PDF files, store it and automatically email it to the user, and support user queries and statistics.
A method for predicting the influence of amino acid variation on protein structural stability comprises the following steps:
Step 1) The amino acid variation information input module first obtains the amino acid variation and its protein sequence from the amino acid variation information entered by the user. It then transmits the amino acid variation data to the amino acid variation site attribute calculation module and the protein sequence data corresponding to the amino acid variation to the protein sequence attribute calculation module. Meanwhile, all input data and the information of the user who submitted them are stored by the system;
Step 2) After receiving the amino acid variation data, the amino acid variation site attribute calculation module, on the one hand, extracts the corresponding AAindex attribute feature values from the AAindex database according to the wild-type and mutant amino acids at the variation site; on the other hand, taking the variation site as the center, it calculates the distribution of the amino acids at the adjacent sites and converts it into the corresponding amino acid site physicochemical attribute features. It then transmits the extracted AAindex attribute feature values together with the calculated amino acid site physicochemical attribute features to the stability change prediction module;
Step 3) After receiving the protein sequence data corresponding to the amino acid variation, the protein sequence attribute calculation module, on the one hand, calls BLAST to find the homologous sequences of the protein sequence, constructs a PSSM matrix, and calculates the conservation of the protein sequence as an input attribute feature for prediction; on the other hand, it calls the ProtDCal algorithm to calculate the protein attribute features of the protein sequence. It then transmits the calculated conservation and protein attributes of the protein sequence together to the stability change prediction module;
Step 4) After receiving the AAindex attribute feature values, the amino acid site physicochemical attribute features, the conservation of the protein sequence, and the protein attribute features, the stability change prediction module uses a two-layer, three-class prediction method based on random forests to classify the influence of the amino acid variation on protein structural stability into one of three classes (lowering, raising, or leaving unchanged the protein stability) and calculates the corresponding probability as the prediction result. It then transmits the prediction result to the prediction result output module;
Step 5) After receiving the prediction result, the prediction result output module first stores it, then generates Excel and PDF files from it and emails them, task by task, to the user who submitted the corresponding data. Registered users of the system can enter a task name to view the corresponding prediction results, or enter a specific protein to obtain statistics on how all variations on that protein influence its stability.
Further, in step 1), the amino acid variation information can be entered in the following three ways:
1) Enter a single variation together with the protein sequence containing it, the experimental temperature, and the pH value;
2) Batch-enter multiple amino acid variations together with the protein sequence, experimental temperature, and pH value corresponding to each variation;
3) Enter a specified protein sequence, experimental temperature, and pH value (the purpose being to predict the influence of all possible amino acid variations on that protein on its stability).
Further, in step 4), the specific steps of the two-layer, three-class prediction method based on random forests are as follows:
1) Using the calculation methods of the amino acid variation site attribute calculation module and the protein sequence attribute calculation module together with a feature extraction algorithm, construct two classification predictors based on random forests. The first classification predictor, based on one part of the important input attributes, divides amino acid variations into two classes: those that lower protein stability and those that do not. The second classification predictor, based on another part of the input attributes, divides variations into two classes: those that raise protein stability and those that leave it unchanged;
2) Extract the corresponding input attributes for all amino acid variation data to be predicted, and use the first classification predictor to classify the variations as lowering or not lowering protein stability;
3) For the variations predicted in the previous step not to lower protein stability, extract the corresponding input attributes again, and use the second classification predictor to classify them as raising protein stability or leaving it unchanged;
Thus, the amino acid variations to be predicted are divided into three classes: lowering, raising, or leaving unchanged the protein stability.
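The cascade described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the two predictors are passed in as plain callables, and the toy threshold functions standing in for the trained random forests, as well as all function and label names, are hypothetical.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def classify_three_way(
    features_layer1: Features,
    features_layer2: Features,
    predicts_decrease: Callable[[Features], bool],
    predicts_increase: Callable[[Features], bool],
) -> str:
    """Cascade two binary predictors into one three-class label."""
    # Layer 1: does the variation lower stability or not?
    if predicts_decrease(features_layer1):
        return "decrease"
    # Layer 2 is only consulted for variations that layer 1 did not flag,
    # and it works on its own (possibly different) attribute subset.
    if predicts_increase(features_layer2):
        return "increase"
    return "unchanged"


# Toy stand-ins for the two trained random-forest predictors (hypothetical).
def toy_layer1(x: Features) -> bool:
    return x[0] < -0.5


def toy_layer2(x: Features) -> bool:
    return x[0] > 0.5


label = classify_three_way([-1.0], [0.0], toy_layer1, toy_layer2)
```

Passing the attribute subsets separately mirrors the statement that each predictor is built on a different part of the input attributes.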
Compared with the prior art, the beneficial effects of the invention are as follows:
Given an amino acid variation and the corresponding protein sequence supplied by a user, the system and method of the invention accurately predict whether the variation raises, lowers, or leaves unchanged the structural stability of the protein, give the corresponding probability, and store the result and send it to the user. This three-class prediction is practical and has a high prediction accuracy; in particular, the influence of a variation can be predicted even when the protein structure is unknown, so the method generalizes well. It is of significance for protein function analysis, for assisting protein engineering and design, for drug design, and so on.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and practiced according to the content of the specification, preferred embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments of the invention are given in detail by the following examples and their drawings.
Brief description of the drawings
The drawings described here provide a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their description are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a structural diagram of the system of the invention for predicting the influence of amino acid variation on protein structural stability;
Fig. 2 is a schematic diagram of the two-layer, three-class algorithm in the method of the invention for predicting the influence of amino acid variation on protein structural stability;
Fig. 3 is a flow chart of the attribute feature extraction algorithm of the prediction model in the invention.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
Referring to Fig. 1, a system for predicting the influence of amino acid variation on protein structural stability consists of an amino acid variation information input module 1, an amino acid variation site attribute calculation module 2, a protein sequence attribute calculation module 3, a stability change prediction module 4, and a prediction result output module 5, wherein the amino acid variation information input module 1 is connected to the amino acid variation site attribute calculation module 2 and to the protein sequence attribute calculation module 3; the amino acid variation site attribute calculation module 2 and the protein sequence attribute calculation module 3 are both connected to the stability change prediction module 4; and the stability change prediction module 4 is connected to the prediction result output module 5;
The function of the amino acid variation information input module 1 is to obtain the single or batch amino acid variations and their protein sequences submitted by a user, and to store the user information and data;
The function of the amino acid variation site attribute calculation module 2 is to extract the corresponding AAindex attribute feature values according to the wild-type and mutant amino acids at the variation site, and to calculate the physicochemical attribute features of the amino acid site after variation from the amino acid variation data;
The function of the protein sequence attribute calculation module 3 is to calculate the conservation and the protein attribute features of the relevant protein from the amino acid variation data;
The function of the stability change prediction module 4 is to calculate and classify the influence of the amino acid variation on protein stability with a two-layer, three-class algorithm based on random forests, and to give the corresponding probability as the prediction result;
The function of the prediction result output module 5 is to generate the prediction result as Excel and PDF files, store it and automatically email it to the user, and support user queries and statistics.
Referring to Fig. 1 and Fig. 2, a method for predicting the influence of amino acid variation on protein structural stability comprises the following steps:
Step 1.1) The user enters amino acid variation information through the amino acid variation information input module 1, in one of the following three ways:
(1) Enter a single variation together with the protein sequence containing it, the experimental temperature, and the pH value;
(2) Batch-enter multiple amino acid variations together with the protein sequence, experimental temperature, and pH value corresponding to each variation;
(3) Enter a specified protein sequence, experimental temperature, and pH value (the purpose being to predict the influence of all possible amino acid variations on that protein on its stability);
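For input mode (3), "all possible amino acid variations" can be read as every single-residue substitution, which gives 19 candidates per position. The sketch below enumerates them under that reading; the tuple layout and the function name are assumptions made for illustration and are not specified in the source.

```python
from typing import Iterator, Tuple

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_single_variants(sequence: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (wild_type, position, mutant) for every single substitution.

    Positions are 1-based, as is usual for residue numbering.
    """
    for index, wild_type in enumerate(sequence):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield wild_type, index + 1, mutant


# A 3-residue toy sequence yields 3 * 19 candidate variations.
variants = list(enumerate_single_variants("MKV"))
```

Each tuple can then be passed through the same attribute calculation and prediction steps as a user-supplied variation.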
Step 1.2) The amino acid variation information input module 1 obtains and verifies the variation information submitted by the user. If the user submits a single variation with the corresponding protein name, sequence, experimental temperature, and pH value, the module checks that the variation site is consistent with the sequence, that the temperature lies between -20 and 100 (default 25), and that the pH lies between 0 and 14 (default 7); otherwise it reports an error and asks the user to resubmit. If the user submits a file containing batch amino acid variations with the corresponding protein names, sequences, and other information, the module checks that the file format matches the standard format; if it does, the module checks each entry in turn for consistency between the variation site and the sequence, a temperature between -20 and 100 (default 25), and a pH between 0 and 14 (default 7); otherwise it reports an error and asks the user to resubmit;
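The checks in step 1.2 can be sketched as a small validation routine. The ranges and defaults come from the text; the `Variant` record, its field names, and the error messages are hypothetical, and the temperature unit is left open because the source does not state it.

```python
from dataclasses import dataclass
from typing import List

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class Variant:
    wild_type: str
    position: int              # 1-based residue position
    mutant: str
    temperature: float = 25.0  # default from the text; unit not stated there
    ph: float = 7.0            # default from the text


def validate_variant(variant: Variant, sequence: str) -> List[str]:
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    if variant.wild_type not in AMINO_ACIDS or variant.mutant not in AMINO_ACIDS:
        errors.append("unknown amino acid code")
    # The variation site must exist and match the submitted sequence.
    if not 1 <= variant.position <= len(sequence):
        errors.append("position outside sequence")
    elif sequence[variant.position - 1] != variant.wild_type:
        errors.append("wild-type residue does not match sequence")
    if not -20 <= variant.temperature <= 100:
        errors.append("temperature outside [-20, 100]")
    if not 0 <= variant.ph <= 14:
        errors.append("pH outside [0, 14]")
    return errors


ok = validate_variant(Variant("K", 2, "A"), "MKV")
bad = validate_variant(Variant("A", 2, "G", temperature=150.0), "MKV")
```

For batch files, the same routine would be applied entry by entry after the file format check.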
Step 1.3) The amino acid variation information input module 1 obtains and verifies the personal information submitted by the user and constructs a task for storage: the personal information, mainly the email address, is verified, and an error is reported if it does not conform to the format specification; a task name is constructed from the user's email address plus a number, linked to the variation information submitted by the user, and stored together with it in the database;
Step 1.4) The amino acid variation information input module 1 transmits the amino acid variation data to the amino acid variation site attribute calculation module 2 and the protein sequence data corresponding to the amino acid variation to the protein sequence attribute calculation module 3;
Step 2.1) After receiving the amino acid variation data, the amino acid variation site attribute calculation module 2 reads the wild-type and mutant amino acids of the variation site and then, according to these amino acids, extracts the corresponding AAindex attribute feature values from the AAindex data matrix of the AAindex database;
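The lookup in step 2.1 can be illustrated with a single index. The source does not say which AAindex entries are used or how the wild-type and mutant values are combined, so the choice of the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) and the three derived values below are assumptions for illustration only.

```python
from typing import Dict

# Kyte-Doolittle hydropathy index, one value per amino acid.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aaindex_features(wild_type: str, mutant: str,
                     index_table: Dict[str, float]) -> Dict[str, float]:
    """Look up one index for the wild-type and mutant residues.

    Returning the two values plus their difference is an illustrative
    choice, not something the source prescribes.
    """
    wt_value = index_table[wild_type]
    mt_value = index_table[mutant]
    return {
        "wild_type": wt_value,
        "mutant": mt_value,
        "difference": mt_value - wt_value,
    }


# Isoleucine to aspartate: a strongly hydrophobic residue becomes charged.
features = aaindex_features("I", "D", KYTE_DOOLITTLE)
```

In a fuller version, the same lookup would be repeated over every index retained by the feature extraction algorithm.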
Step 2.2) The amino acid variation site attribute calculation module 2 reads the residue position of the variation site and, taking that position in the amino acid sequence as the center, defines a window of length 21 (10 residues on each side of the variation site). It calculates the distribution of the amino acids at the adjacent sites within the window (classified by their physicochemical properties) and converts it into the corresponding amino acid site physicochemical attribute features;
Specifically, the 20 amino acids can be divided into 6 groups by physicochemical properties:
Hydrophobic: V, I, L, F, M, W, Y, C;
Negatively charged: D, E;
Positively charged: R, K, H;
Conformational: G, P;
Polar: N, Q, S;
Other: A, T;
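The window calculation of step 2.2 can be sketched with the six groups listed above. The source does not specify whether the distribution is expressed as counts or fractions, nor how windows near the sequence ends are handled; the sketch assumes fractions over the neighbouring residues (the central residue excluded) and simply truncates the window at the termini.

```python
from typing import Dict

# The six physicochemical groups listed in the text.
GROUPS: Dict[str, str] = {
    "hydrophobic": "VILFMWYC",
    "negative": "DE",
    "positive": "RKH",
    "conformational": "GP",
    "polar": "NQS",
    "other": "AT",
}
RESIDUE_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def window_group_fractions(sequence: str, position: int,
                           flank: int = 10) -> Dict[str, float]:
    """Fraction of neighbouring residues per group around a 1-based position."""
    center = position - 1
    start = max(0, center - flank)                 # truncate at the N-terminus
    end = min(len(sequence), center + flank + 1)   # truncate at the C-terminus
    neighbours = sequence[start:center] + sequence[center + 1:end]
    counts = {name: 0 for name in GROUPS}
    for residue in neighbours:
        group = RESIDUE_TO_GROUP.get(residue)
        if group is not None:  # non-standard residue codes are ignored
            counts[group] += 1
    total = len(neighbours)
    return {name: (counts[name] / total if total else 0.0) for name in GROUPS}


fractions = window_group_fractions("AVDKG", 3)
```

With the default `flank=10` the window spans 21 positions, matching the length given in the text.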
Step 2.3) The amino acid variation site attribute calculation module 2 transmits the extracted AAindex attribute feature values together with the calculated amino acid site physicochemical attribute features to the stability change prediction module 4;
Step 3.1) After receiving the protein sequence data corresponding to the amino acid variation, the protein sequence attribute calculation module 3 calls BLAST to find the homologous sequences of the protein sequence, then constructs a PSSM matrix and calculates the conservation of the protein sequence, yielding 3 input attribute features;
Step 3.2) The protein sequence attribute calculation module 3 calls the ProtDCal algorithm to calculate a further 19 energy- and structure-related protein attribute features of the protein sequence;
Step 3.3) The protein sequence attribute calculation module 3 transmits the calculated conservation and protein attributes of the protein sequence together to the stability change prediction module 4;
Step 4.1) Before receiving the AAindex attribute feature values, the amino acid site physicochemical attribute features, the conservation of the protein sequence, and the protein attribute features, the stability change prediction module 4 first obtains a training dataset by combining machine learning with manual reading, and verifies it repeatedly to ensure the accuracy of the data;
Step 4.2) The stability change prediction module 4 then calculates the input attributes needed for prediction; that is, it uses the calculation methods of the amino acid variation site attribute calculation module 2 and the protein sequence attribute calculation module 3 to calculate the site attributes and protein sequence attributes of the amino acid variations in the training dataset;
Step 4.3) The stability change prediction module 4 then applies the feature extraction algorithm, iterating over the attribute features calculated in the previous step, to obtain the input attribute sets needed by each of the two classification predictors in Fig. 2;
Referring to Fig. 3, the feature extraction algorithm is described in detail as follows:
Stage one
4.3.1) Start;
4.3.2) For the 5 data subsets used in 5-fold cross-validation, leave 1 out as the test set each time and use the other 4 as the training set; on each of the 5 training sets, build a random forest (RF) classification predictor based on all input attributes;
4.3.3) Calculate the prediction accuracy of the 5-fold cross-validation, and sort the input attributes in descending order of their importance in the RF;
4.3.4) Remove the input attribute ranked last;
4.3.5) Retrain the classification predictor on the remaining input attributes;
4.3.6) Check whether only 8 input attributes remain; if so, proceed to the next step; if not, return to step 4.3.3);
4.3.7) Calculate the prediction accuracy on each of the 5 training sets, and retain and store the input attributes extracted for the corresponding classification predictors;
Stage two
4.3.8) Build an input attribute set containing all input attributes extracted on the 5 training sets in stage one;
4.3.9) Train a classification predictor on all attributes in the input attribute set, sort these attributes by their importance in the RF, and move the most important attribute into the final attribute set;
4.3.10) Train a classification predictor on all attributes in the final attribute set;
4.3.11) Add the highest-ranked attribute remaining in the input attribute set to the final attribute set;
4.3.12) Retrain the classification predictor; if the prediction accuracy increases, keep the newly added attribute; otherwise remove it;
4.3.13) Check whether all attributes in the attribute set have been traversed; if so, proceed to the next step; if not, return to step 4.3.10);
4.3.14) Obtain the finally extracted attribute set;
4.3.15) End;
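The two stages above amount to backward elimination per fold followed by greedy forward selection over the pooled attributes. The sketch below keeps the model abstract: `rank` and `accuracy` are hypothetical callables that would wrap random forest training, importance ranking, and cross-validated scoring in a real implementation; the toy versions exist only to make the example runnable.

```python
from typing import Callable, List, Sequence

Ranker = Callable[[List[str]], List[str]]  # attributes sorted, most important first
Scorer = Callable[[List[str]], float]      # prediction accuracy for an attribute list


def backward_eliminate(attributes: Sequence[str], rank: Ranker,
                       keep: int = 8) -> List[str]:
    """Stage one: retrain, rank, and drop the last attribute until `keep` remain."""
    remaining = list(attributes)
    while len(remaining) > keep:
        remaining = rank(remaining)[:-1]
    return remaining


def forward_select(candidates: Sequence[str], rank: Ranker,
                   accuracy: Scorer) -> List[str]:
    """Stage two: add attributes in rank order, keeping only those that help."""
    ranked = rank(list(candidates))
    if not ranked:
        return []
    selected = [ranked[0]]
    best = accuracy(selected)
    for attribute in ranked[1:]:
        trial = selected + [attribute]
        score = accuracy(trial)
        if score > best:  # keep the new attribute only if accuracy increases
            selected, best = trial, score
    return selected


def select_features(attributes: Sequence[str], fold_rankers: Sequence[Ranker],
                    rank: Ranker, accuracy: Scorer, keep: int = 8) -> List[str]:
    pool: List[str] = []
    for fold_rank in fold_rankers:  # one ranker per cross-validation training set
        for attribute in backward_eliminate(attributes, fold_rank, keep):
            if attribute not in pool:
                pool.append(attribute)
    return forward_select(pool, rank, accuracy)


# Toy stand-ins (hypothetical): fixed importances and a synthetic accuracy.
WEIGHTS = {f"f{i}": i for i in range(12)}
USEFUL = {"f11", "f9"}


def toy_rank(attrs: List[str]) -> List[str]:
    return sorted(attrs, key=lambda a: WEIGHTS[a], reverse=True)


def toy_accuracy(attrs: List[str]) -> float:
    return 0.5 + 0.2 * len(USEFUL & set(attrs)) - 0.01 * len(attrs)


chosen = select_features(list(WEIGHTS), [toy_rank] * 5, toy_rank, toy_accuracy)
```

Running the procedure once per classification predictor would give the two input attribute sets that the text refers to.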
Step 4.4) Next, two sets of important input attributes are extracted from the input attribute set, and two classification predictors based on random forests are constructed from them. The first classification predictor divides amino acid variations into two classes: those that lower protein stability and those that do not. The second classification predictor divides variations into two classes: those that raise protein stability and those that leave it unchanged;
Step 4.5) Referring to Fig. 2, after receiving the AAindex attribute feature values, the amino acid site physicochemical attribute features, the conservation of the protein sequence, and the protein attribute features, the stability change prediction module 4 first extracts the corresponding input attributes for all amino acid variation data to be predicted and uses the first classification predictor to classify the variations as lowering or not lowering protein stability;
Step 4.6) For the variations predicted in the previous step not to lower protein stability, the stability change prediction module 4 then extracts the corresponding input attributes again and uses the second classification predictor to classify them as raising protein stability or leaving it unchanged;
Thus, the amino acid variations to be predicted are divided into three classes: lowering, raising, or leaving unchanged the protein stability;
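The text states that a probability accompanies each class but does not say how the outputs of the two layers are combined. One plausible reading, shown here purely as an assumption, treats the second predictor's output as conditional on the variation not lowering stability, so that the three class probabilities sum to one. The function and argument names are hypothetical.

```python
from typing import Dict


def three_class_probabilities(p_decrease: float,
                              p_increase_given_not_decrease: float) -> Dict[str, float]:
    """Combine the two layer outputs into three class probabilities.

    Assumption (not from the source): layer 2 estimates the probability of
    an increase given that stability is not lowered.
    """
    for p in (p_decrease, p_increase_given_not_decrease):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    p_not_decrease = 1.0 - p_decrease
    return {
        "decrease": p_decrease,
        "increase": p_not_decrease * p_increase_given_not_decrease,
        "unchanged": p_not_decrease * (1.0 - p_increase_given_not_decrease),
    }


probabilities = three_class_probabilities(0.5, 0.5)
```

Other conventions, such as reporting only the probability from the layer that made the final decision, would be equally consistent with the text.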
Step 4.7) The stability change prediction module 4 transmits the prediction result to the prediction result output module 5;
Step 5.1) After receiving the prediction result, the prediction result output module 5 stores it;
Step 5.2) Task by task, the prediction result output module 5 generates Excel and PDF files from the prediction result and emails them to the user who submitted the corresponding data;
Step 5.3) Registered users of the system can enter a task name to view the corresponding prediction results, or enter a specific protein to obtain statistics on how all variations on that protein influence its stability.
The above are only preferred embodiments of the invention and are not intended to limit it; for those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (4)

1. A system for predicting the influence of amino acid variation on protein structural stability, characterized in that it consists of an amino acid variation information input module (1), an amino acid variation site attribute calculation module (2), a protein sequence attribute calculation module (3), a stability change prediction module (4), and a prediction result output module (5), wherein the amino acid variation information input module (1) is connected to the amino acid variation site attribute calculation module (2) and to the protein sequence attribute calculation module (3); the amino acid variation site attribute calculation module (2) and the protein sequence attribute calculation module (3) are both connected to the stability change prediction module (4); and the stability change prediction module (4) is connected to the prediction result output module (5);
The function of the amino acid variation information input module (1) is to obtain the single or batch amino acid variations and their protein sequences submitted by a user, and to store the user information and data;
The function of the amino acid variation site attribute calculation module (2) is to extract the corresponding AAindex attribute feature values according to the wild-type and mutant amino acids at the variation site, and to calculate the physicochemical attribute features of the amino acid site after variation from the amino acid variation data;
The function of the protein sequence attribute calculation module (3) is to calculate the conservation and the protein attribute features of the relevant protein from the amino acid variation data;
The function of the stability change prediction module (4) is to calculate and classify the influence of the amino acid variation on protein stability with a two-layer, three-class algorithm based on random forests, and to give the corresponding probability as the prediction result;
The function of the prediction result output module (5) is to generate the prediction result as Excel and PDF files, store it and automatically email it to the user, and support user queries and statistics.
2. A method for predicting the influence of amino acid variation on protein structure stability using the system according to claim 1, characterized by comprising the following steps:
Step 1) The amino acid variation information input module (1) first obtains the amino acid variation and its protein sequence from the amino acid variation information entered by the user; the amino acid variation information input module (1) then transmits the obtained amino acid variation data to the amino acid variation site attribute calculation module (2) and the protein sequence data corresponding to the amino acid variation to the protein sequence attribute calculation module (3); meanwhile, all input data and the information of the user who submitted the data are stored by the system;
Step 2) After receiving the amino acid variation data, the amino acid variation site attribute calculation module (2), on the one hand, extracts the corresponding AAindex attribute feature values from the AAindex database according to the amino acids at the wild-type and variant sites and, on the other hand, taking the amino acid variation site as the center, calculates the distribution of each amino acid at the adjacent sites and converts it into the corresponding amino acid site physicochemical property features; the amino acid variation site attribute calculation module (2) then transmits the extracted AAindex attribute feature values and the calculated amino acid site physicochemical property features together to the stability change prediction module (4);
Step 3) After receiving the protein sequence data corresponding to the amino acid variation, the protein sequence attribute calculation module (3), on the one hand, calls BLAST to find homologous sequences of the protein sequence, then constructs a PSSM matrix and calculates the conservation of the protein sequence as an input attribute feature for prediction and, on the other hand, calls the ProtDCal algorithm to calculate the protein attribute features of the protein sequence; the protein sequence attribute calculation module (3) then transmits the calculated conservation of the protein sequence and the protein attribute features together to the stability change prediction module (4);
Step 4) After receiving the AAindex attribute feature values, the amino acid site physicochemical property features, the conservation of the protein sequence and the protein attribute features, the stability change prediction module (4) uses a two-layer, three-class model prediction method based on random forest to classify the influence of the amino acid variation on protein structure stability into one of three classes, namely decreased, increased or unchanged protein stability, and calculates the corresponding probability as the prediction result; the stability change prediction module (4) then transmits the calculated prediction result to the prediction result output module (5);
Step 5) After receiving the prediction result, the prediction result output module (5) first stores the prediction result, then generates Excel and PDF documents from it and, according to the task, emails them to the user who submitted the corresponding data; a user registered with the system can enter a task name to view the corresponding prediction result, or enter a specified protein to obtain statistics on the influence on stability of all variations in that protein.
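The site-level features of step 2) can be sketched as follows. This is an illustration under stated assumptions, not the claimed computation: the Kyte-Doolittle hydropathy scale (one of the indices collected in AAindex) stands in for whichever AAindex entries the system actually uses, the neighbourhood is summarised by a simple mean over a window of 3 residues on each side, and the function names are hypothetical.

```python
from typing import Dict

# Kyte-Doolittle hydropathy values, used here as one example property scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def site_property_features(
    wild_type: str, variant: str, scale: Dict[str, float] = KYTE_DOOLITTLE
) -> Dict[str, float]:
    """Property value of the wild-type and variant residue, plus the change."""
    wild_value = scale[wild_type]
    variant_value = scale[variant]
    return {
        "wild": wild_value,
        "variant": variant_value,
        "delta": variant_value - wild_value,
    }


def neighbour_property_mean(
    sequence: str,
    position: int,
    window: int = 3,
    scale: Dict[str, float] = KYTE_DOOLITTLE,
) -> float:
    """Mean property value of residues within `window` of a 1-based position.

    The varied site itself is excluded; the window is clipped at the
    sequence ends. Returns 0.0 if there are no neighbours.
    """
    centre = position - 1
    start = max(0, centre - window)
    end = min(len(sequence), centre + window + 1)
    neighbours = [sequence[i] for i in range(start, end) if i != centre]
    if not neighbours:
        return 0.0
    return sum(scale[aa] for aa in neighbours) / len(neighbours)
```

A real feature vector would repeat this for many property scales; the conservation and ProtDCal features of step 3) depend on external tools and are not sketched here.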
3. The method for predicting the influence of amino acid variation on protein structure stability according to claim 2, characterized in that, in step 1), the amino acid variation information is input in one of the following three ways:
1) entering a single variation together with the protein sequence in which the variation occurs, the experimental temperature and the pH value;
2) entering multiple amino acid variations in batch, together with the protein sequence, experimental temperature and pH value corresponding to each variation;
3) entering a specified protein sequence, experimental temperature and pH value.
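Input modes 1) and 2) can be sketched as a small validation routine. The claim does not state the text format of a variation, so the notation assumed here (wild-type letter, 1-based position, variant letter, for example "A2V") is hypothetical, as are the names `VariationRecord`, `parse_record` and `parse_batch`.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
# Assumed notation: wild-type letter, 1-based position, variant letter.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


@dataclass(frozen=True)
class VariationRecord:
    sequence: str
    position: int         # 1-based
    wild_type: str
    variant: str
    temperature_c: float  # experimental temperature, assumed in Celsius
    ph: float


def parse_record(
    mutation: str, sequence: str, temperature_c: float, ph: float
) -> VariationRecord:
    """Validate one variation against its protein sequence (mode 1)."""
    match = _MUTATION.match(mutation.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised variation: {mutation!r}")
    wild_type, position_text, variant = match.groups()
    position = int(position_text)
    sequence = sequence.strip().upper()
    if wild_type not in AMINO_ACIDS or variant not in AMINO_ACIDS:
        raise ValueError(f"non-standard residue in {mutation!r}")
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(f"sequence has {sequence[position - 1]} at {position}")
    return VariationRecord(sequence, position, wild_type, variant, temperature_c, ph)


def parse_batch(
    rows: Iterable[Tuple[str, str, float, float]]
) -> List[VariationRecord]:
    """Validate several variations, each with its own conditions (mode 2)."""
    return [parse_record(*row) for row in rows]
```

Checking the wild-type residue against the sequence at input time catches off-by-one position errors before any features are computed.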
4. The method for predicting the influence of amino acid variation on protein structure stability according to claim 2, characterized in that, in step 4), the two-layer, three-class model prediction method based on random forest comprises the following specific steps:
1) according to the calculation methods and feature extraction algorithms of the amino acid variation site attribute calculation module (2) and the protein sequence attribute calculation module (3), two classification predictors based on random forest are constructed; the first classification predictor divides amino acid variations into two classes, those that decrease protein stability and those that do not; the second classification predictor divides variations into two classes, those that increase protein stability and those that leave it unchanged;
2) the input attributes corresponding to all amino acid variation data to be predicted are extracted, and the first classification predictor classifies the amino acid variations into two classes, those that decrease protein stability and those that do not;
3) for the variation data predicted in the previous step not to decrease protein stability, the corresponding input attributes are extracted again, and the second classification predictor classifies these amino acid variations into two classes, those that increase protein stability and those that leave it unchanged; the amino acid variations to be predicted are thus divided into three classes: decreased, increased and unchanged protein stability.
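The two-layer cascade of steps 1) to 3) can be sketched without committing to a particular random forest library. In the sketch below the two binary predictors are plain callables that return a probability; in practice each could wrap the `predict_proba` output of a fitted random forest, for example scikit-learn's `RandomForestClassifier`. The function name, the 0.5 threshold and the way the two layer probabilities are multiplied into one reported probability are assumptions, since the claim does not specify them.

```python
from typing import Callable, Sequence, Tuple

# A binary predictor maps a feature vector to the probability of its
# positive class ("decreases" for layer 1, "increases" for layer 2).
BinaryPredictor = Callable[[Sequence[float]], float]


def cascade_predict(
    features: Sequence[float],
    p_decrease: BinaryPredictor,
    p_increase: BinaryPredictor,
    threshold: float = 0.5,
) -> Tuple[str, float]:
    """Classify one variation as 'decrease', 'increase' or 'unchanged'.

    Layer 1 separates 'decrease' from 'not decrease'. Only variations
    that pass layer 1 as 'not decrease' reach layer 2, which separates
    'increase' from 'unchanged'. The returned probability is the product
    of the layer probabilities along the path taken (an assumption).
    """
    p1 = p_decrease(features)
    if p1 >= threshold:
        return "decrease", p1
    p2 = p_increase(features)
    if p2 >= threshold:
        return "increase", (1.0 - p1) * p2
    return "unchanged", (1.0 - p1) * (1.0 - p2)
```

Because the second predictor only ever sees variations that the first one did not label as stability-decreasing, it can be trained on that subset alone, which is the main design consequence of using a cascade rather than a single three-class model.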
CN201710533801.8A 2017-07-03 2017-07-03 System and method for predicting the influence of amino acid variation on protein structure stability Pending CN107358064A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710533801.8A CN107358064A (en) 2017-07-03 2017-07-03 System and method for predicting the influence of amino acid variation on protein structure stability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710533801.8A CN107358064A (en) 2017-07-03 2017-07-03 System and method for predicting the influence of amino acid variation on protein structure stability

Publications (1)

Publication Number Publication Date
CN107358064A true CN107358064A (en) 2017-11-17

Family

ID=60291947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710533801.8A Pending CN107358064A (en) System and method for predicting the influence of amino acid variation on protein structure stability

Country Status (1)

Country Link
CN (1) CN107358064A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109801672A (en) * 2018-11-16 2019-05-24 天津大学 Interaction prediction method between multivariate mutual information and residue combination calorie-protein matter
CN110415762A (en) * 2019-08-06 2019-11-05 苏州大学 System and method for predicting protein denaturation temperature based on sequence

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046103A (en) * 2015-07-03 2015-11-11 景德镇陶瓷学院 Novel representation method for protein sequence fusing genetic information
CN106778065A (en) * 2016-12-30 2017-05-31 同济大学 Method for predicting the influence of DNA mutations on protein interactions based on multivariate data

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
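FANG ZHENG et al.: "Complex network-based random forest algorithm for predicting the impact of amino acid mutation on protein stability", "Chemical Research and Application" *
YASSER B RUIZ-BLANCO et al.: "ProtDCal: A program to compute general-purpose-numerical descriptors for sequences and 3D-structures of proteins", "BMC BIOINFORMATICS" *
严文颖: "Construction, analysis and application of amino acid interaction networks", "China Doctoral Dissertations Full-text Database, Basic Sciences" *
杨洋: "Bioinformatics study of disease-related amino acid variations", "China Doctoral Dissertations Full-text Database, Medicine and Health Sciences" *
王燕春: "Research on feature extraction methods for protein secondary structure prediction", "China Excellent Master's Theses Full-text Database, Basic Sciences" *
谌标: "Prediction of the influence of amino acid variations in proteins on their structural stability", "China Excellent Master's Theses Full-text Database, Basic Sciences" *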

Similar Documents

Publication Publication Date Title
CN107657015B (en) Interest point recommendation method and device, electronic equipment and storage medium
CN110473083B (en) Tree risk account identification method, device, server and storage medium
CN106919957B (en) Method and device for processing data
CN104615616B (en) group recommendation method and system
CN105389480B (en) Multiclass imbalance genomics data iteration Ensemble feature selection method and system
CN105069470A (en) Classification model training method and device
CN110415091A (en) Shop and Method of Commodity Recommendation, device, equipment and readable storage medium storing program for executing
CN105069122A (en) Personalized recommendation method and recommendation apparatus based on user behaviors
TW201437933A (en) Ranking product search results
CN106651057A (en) Mobile terminal user age prediction method based on installation package sequence table
CN109919532B (en) Logistics node determining method and device
CN107357902A (en) A kind of tables of data categorizing system and method based on correlation rule
CN109726764A (en) A kind of model selection method, device, equipment and medium
CN105761154A (en) Socialized recommendation method and device
CN112131261B (en) Community query method and device based on community network and computer equipment
CN110866775A (en) User air-rail joint inter-city trip information processing method based on machine learning
Cruz-Ramírez et al. A preliminary study of ordinal metrics to guide a multi-objective evolutionary algorithm
WO2024067387A1 (en) User portrait generation method based on characteristic variable scoring, device, vehicle, and storage medium
CN107358064A (en) System and method for predicting the influence of amino acid variation on protein structure stability
CN103942251A (en) Method and system for inputting high altitude meteorological data into database based on multiple quality control methods
CN110765351A (en) Target user identification method and device, computer equipment and storage medium
CN107909498B (en) Recommendation method based on area below maximized receiver operation characteristic curve
CN110147449A (en) File classification method and device
CN113537878A (en) Package delivery method, device, equipment and storage medium
CN110415762B (en) System and method for predicting protein denaturation temperature based on sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171117
