CN107463793A - Antibody complementarity-determining region conformation fingerprint database - Google Patents

Publication number
CN107463793A
Authority
CN
China
Prior art keywords
conformation
antibody
amino acid
complementary
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710475559.3A
Other languages
Chinese (zh)
Inventor
杨家安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Mai Gro Pharmaceutical Technology Co Ltd
Original Assignee
Nanjing Mai Gro Pharmaceutical Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Mai Gro Pharmaceutical Technology Co Ltd
Priority to CN201710475559.3A
Publication of CN107463793A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Landscapes

  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The present invention relates to an antibody complementarity-determining region (CDR) conformation fingerprint database and belongs to the field of bioinformatics. For each antibody protein, the database holds four groups of data: name, amino acid sequence, CDR segment ranges, and full-information conformation fingerprint. The amino acid sequence and CDR segment ranges are obtained from a protein knowledgebase. The full-information conformation fingerprint is handled in two cases. For an antibody protein of known three-dimensional structure, all structural data are obtained from a protein database and expressed as protein folding shape codes. For an antibody protein of unknown structure, the three-dimensional conformation of the antibody CDR is predicted to obtain a conformation band. The invention covers not only the primary structure of antibody protein sequences and regular secondary structure, but also extends to irregular tertiary structure. Similarity scores of conformation fingerprints can be used to label the conformational features of antibody CDRs, providing a new parameter for classifying them.

Description

Antibody complementarity-determining region conformation fingerprint database
Technical field
The present invention relates to an antibody complementarity-determining region (CDR) conformation fingerprint database and belongs to the field of bioinformatics.
Background
Antibodies, also known as immunoglobulins, are large Y-shaped proteins that the immune system uses to identify and resist invading foreign bodies such as bacteria and viruses. Antibodies can be classified by physicochemical properties, biological function, or origin. The development of monoclonal antibodies has gone through four stages: mouse monoclonal antibodies, chimeric monoclonal antibodies, humanized monoclonal antibodies, and fully human monoclonal antibodies.
An antibody is a symmetric structure of 4 polypeptide chains: 2 longer heavy chains (H chains) and 2 shorter light chains (L chains). The whole antibody molecule can be divided into a constant region and a variable region. The variable regions are located at the ends of the two arms of the "Y". Within the variable region, a small portion of the amino acid residues varies especially strongly; the regions whose residue composition and order are most prone to variation are called hypervariable regions. Hypervariable regions lie on the molecular surface, and because their spatial structure can form a precise complement to the antigenic determinant, they are also called complementarity-determining regions. The amino acid sequence and conformation of the hypervariable regions determine the antibody's specific recognition of, and binding to, the antigen. The main function of an antibody is to recognize and bind antigen through its CDRs, thereby effectively clearing foreign matter such as microorganisms and parasites that have invaded the body. The study of antibody CDRs is therefore critical to antibody research.
To date, the three-dimensional structures of about 550 antibody proteins in the global protein knowledgebase (UniProt) have been determined, while roughly 25,000 further antibody proteins are known only by their primary sequence. Using the Protein Structure Fingerprint technique (PSFT) developed by the inventor, antibody proteins of known structure are characterized with the protein structure fingerprint method, and the conformations of antibody proteins of unknown structure are predicted. On this basis, an antibody CDR fingerprint database is established.
Summary of the invention
The technical problem to be solved by the invention is to provide an antibody CDR conformation fingerprint database.
In the antibody CDR conformation fingerprint database of the invention, each antibody protein has four groups of data: name, amino acid sequence, CDR segment ranges, and full-information conformation fingerprint. The amino acid sequence and CDR segment ranges are obtained from the protein knowledgebase (UniProt). The full-information conformation fingerprint is handled in two cases:
For an antibody protein of known three-dimensional structure, all structural data are obtained from the protein database and expressed as protein folding shape codes; the folding shape codes of the antibody CDR are then extracted and serve as the full-information conformation fingerprint of that CDR;
For an antibody protein of unknown structure, the full-information conformation fingerprint of the antibody CDR is the conformation band obtained by predicting the three-dimensional conformation of the CDR.
From a mathematical point of view, 5 amino acids placed in different orders form different arrangements. Choosing each of 5 positions freely from all 20 amino acids gives 20^5 = 3,200,000 different arrangements in total. The possible folded conformations of each arrangement can be obtained from the worldwide Protein Data Bank (PDB) and then expressed as protein folding shape codes (PFSC). On this basis, a database was created to collect the folded conformations of these 3,200,000 arrangements. This new database is named 5AAPFSC.
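The count of 3,200,000 can be checked directly. The following is a minimal sketch, not part of the patent: it assumes the 20 standard residues in one-letter code, and the helper `pentamer_index` is a hypothetical way to give each 5-residue arrangement a unique key, as a table such as 5AAPFSC would need.

```python
# Assumption: the 20 standard amino acids, written as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_pentamers(alphabet: str = AMINO_ACIDS, length: int = 5) -> int:
    """Number of ordered arrangements of `length` residues drawn with repetition."""
    return len(alphabet) ** length


def pentamer_index(fragment: str, alphabet: str = AMINO_ACIDS) -> int:
    """Hypothetical key: map a fragment to a unique integer in [0, 20**5)."""
    index = 0
    for residue in fragment:
        # Treat the fragment as a base-20 number, most significant residue first.
        index = index * len(alphabet) + alphabet.index(residue)
    return index


print(count_pentamers())        # 20**5
print(pentamer_index("AAAAC"))  # second key in the enumeration
```

Because residues may repeat within a fragment, the count is 20 to the power 5 rather than a count of permutations without repetition.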
In the database of the invention, the predicted conformation band for an antibody CDR of unknown structure is obtained by the following process:
1) Choosing 5 amino acids freely from all 20 amino acids gives 3,200,000 different arrangements in total. The possible folded conformations of each arrangement are obtained from the worldwide Protein Data Bank (PDB) and expressed as protein folding shape codes. A database is created to collect these arrangements and their corresponding protein folding shape codes; the database is named 5AAPFSC;
2) For the protein segment of the antibody CDR, move along the amino acid sequence step by step from the N-terminus to the C-terminus, reading each run of 5 consecutive amino acids in turn. The folded conformations that each run may adopt are obtained directly from the 5AAPFSC database and expressed as protein folding shape code letters. The protein folding shape code of the folded conformation that occurs most frequently in the Protein Data Bank (PDB) is placed first, the code with the second-highest frequency is placed second, and so on, forming a column from top to bottom until all codes are collected. Different runs of 5 consecutive amino acids have different numbers of possible folded conformations;
3) All possible folding shape codes of the antibody CDR form an array, called the protein folding conformation band, which represents all possible folded conformations of the CDR. At each site, substituting its possible folding shape codes for one another yields every possible conformation exactly. The total number of possible conformations is the product of the numbers of possible folded conformations over all 5-amino-acid windows.
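Steps 2) and 3) can be sketched as follows. This is an illustrative sketch only: `TOY_5AAPFSC` is a made-up three-entry stand-in for the 5AAPFSC database, the sequence and the code letters in it are placeholders rather than real data, and the function names are hypothetical.

```python
from math import prod

# Hypothetical stand-in for 5AAPFSC: each 5-residue window maps to its possible
# folding shape codes, already ranked from most to least frequent in the PDB.
TOY_5AAPFSC = {
    "ARDYW": ["A", "V", "J"],
    "RDYWG": ["V", "A"],
    "DYWGQ": ["B"],
}


def conformation_band(sequence: str, table: dict) -> list:
    """Slide a 5-residue window from the N-terminus to the C-terminus.

    Returns one ranked column of shape codes per window (step 2).
    """
    columns = []
    for start in range(len(sequence) - 4):
        window = sequence[start:start + 5]
        columns.append(list(table.get(window, [])))
    return columns


def total_conformations(band: list) -> int:
    """Product of the column lengths (step 3)."""
    return prod(len(column) for column in band)


band = conformation_band("ARDYWGQ", TOY_5AAPFSC)
print(band)
print(total_conformations(band))
```

A window that is missing from the table yields an empty column here, which makes the product zero; how the real database treats such windows is not stated in the text.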
For any antibody CDR, although the total number of possible spatial conformations is huge, the most likely spatial conformations can be obtained by combining the local folded conformations that occur most frequently. For example, the first high-likelihood spatial conformation is formed jointly by the most frequent folding shape code at each site. The second spatial conformation is formed by the second most frequent folding shape code at each site; at positions that have no second-ranked conformation, the most frequent folding shape code is used instead. The third spatial conformation is formed by the third most frequent folding shape code at each site; at positions that have no third-ranked conformation, the most frequent folding shape code is used instead. Continuing in this way yields a series of predicted conformations of relatively high likelihood.
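The ranking rule just described, including the fallback to the most frequent code, can be sketched as below. The band is a hypothetical example with placeholder code letters, and the function names are not from the patent.

```python
def ranked_conformation(band: list, rank: int) -> str:
    """Build the conformation at a given rank (0 = most frequent code per site).

    Where a site has no code at that rank, fall back to its most frequent code.
    Every column is assumed to hold at least one code.
    """
    return "".join(
        column[rank] if rank < len(column) else column[0]
        for column in band
    )


def top_conformations(band: list, how_many: int) -> list:
    """First `how_many` ranked conformations, most likely first."""
    return [ranked_conformation(band, rank) for rank in range(how_many)]


# Hypothetical band: one ranked column of shape codes per 5-residue window.
example_band = [["A", "V", "J"], ["V", "A"], ["B"]]
print(top_conformations(example_band, 3))
```

In this example the third site has a single code, so it contributes the same letter to every ranked conformation.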
Both the characterization of CDRs of antibody proteins of known structure and the prediction of conformation bands for antibody CDRs of unknown structure use the protein folding shape code (Protein Folding Shape Code, PFSC) disclosed in the inventor's earlier patent ZL200880003164.2. The rigorously derived PFSC can fully describe the folding shape of any fragment of 5 consecutive amino acids. Any folding shape of a 5-amino-acid fragment in a protein can be described by one of 27 PFSC vectors, which are denoted by the 26 English letters plus the $ symbol. Importantly, the 27 PFSC vectors together cover a complete mathematical space. Moreover, the folding shapes of the 27 PFSC vectors are closely related to one another: each PFSC vector can be transformed into another by a transition from one vector to the next.
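The count of 27 follows from three descriptors with three ranges each (3 x 3 x 3), as set out in claim 5: two dihedral angles and one end-to-end distance per 5-residue unit. The sketch below illustrates that binning under several assumptions that are not in the patent: one representative atom per residue is used as the residue position, boundary values are assigned half-open, the resulting index 0..26 is an arbitrary ordering (the mapping to the 26 letters and $ is not given in the text), and because the stated distance ranges overlap (0 to 7.0, 4.0 to 17, above 12 angstroms) the caller must supply the two cut points.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed angle in degrees between plane (p0, p1, p2) and plane (p1, p2, p3)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def angle_bin(angle):
    """0: 0..130, 1: 130..-130 (through 180), 2: -130..0 (half-open, assumed)."""
    if 0.0 <= angle < 130.0:
        return 0
    if -130.0 <= angle < 0.0:
        return 2
    return 1


def distance_bin(distance, low_cut, high_cut):
    """Three distance classes; the cut points are caller-chosen assumptions."""
    if distance < low_cut:
        return 0
    return 1 if distance < high_cut else 2


def shape_index(points, low_cut, high_cut):
    """Combine the three bins of one 5-residue unit into an index in 0..26."""
    a = angle_bin(dihedral_degrees(*points[0:4]))
    b = angle_bin(dihedral_degrees(*points[1:5]))
    c = distance_bin(math.dist(points[0], points[4]), low_cut, high_cut)
    return a * 9 + b * 3 + c


# Toy coordinates in angstroms; 5.5 and 14.5 are illustrative cut points only.
unit = [(0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1), (2, 0, 1)]
print(shape_index(unit, low_cut=5.5, high_cut=14.5))
```

The index arithmetic only demonstrates that the three 3-way classifications enumerate 27 distinct shapes; it does not reproduce the letter assignment of the actual PFSC.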
The antibody CDR conformation fingerprint database of the invention can collect all data obtained from the various structure determinations, including structural data that cannot be annotated. Only the coordinates of the backbone carbon atoms need to be read (the coordinates of other atoms are not required) to determine the antibody CDR conformation fingerprint. Because the method is not limited by the resolution of the determined structure, more antibody CDR structures can be collected. For antibody CDRs whose three-dimensional structure has been determined, the database includes not only experimentally determined high-resolution structures but also experimentally determined low-resolution structural data.
For antibody CDRs of unknown three-dimensional structure, the invention provides not merely a single predicted structure but a full-information prediction of the protein conformation fingerprint, which comprehensively reveals the possible variations in conformation.
The antibody CDR conformation fingerprint database of the invention covers not only the primary structure of antibody protein sequences and regular secondary structure, but also extends to irregular tertiary structure. Similarity scores of conformation fingerprints can be used to label the conformational features of antibody CDRs, providing a new parameter for classifying them.
The antibody CDR conformation fingerprint database of the invention contains more than 500 antibody proteins of known structure and predicted conformations for more than 25,000 antibody proteins of unknown structure.
Brief description of the drawings
Fig. 1: amino acid sequence of the CDR of an antibody of unknown structure, with the predicted fingerprint conformation.
Detailed description of embodiments
In the antibody CDR conformation fingerprint database of the invention, each antibody protein has four groups of data: name, amino acid sequence, variable region segment ranges, and full-information conformation fingerprint. The amino acid sequence and variable region segment ranges are obtained from the protein knowledgebase (UniProt). The full-information conformation fingerprint is prepared as follows:
All antibody protein structural information is collected from the global protein knowledgebase (UniProt). Antibody proteins of known three-dimensional structure and antibody proteins of unknown structure are handled separately.
For the roughly 550 antibody proteins of known structure, each protein may have several known structures in the Protein Data Bank (PDB), and all structural data are obtained from the PDB. Each protein structure is converted into fingerprint data using the protein folding shape code (PFSC). The fingerprint of the antibody CDR is then extracted and serves as the full-information conformation fingerprint of that CDR.
For the roughly 25,000 antibody proteins of unknown structure, the full amino acid sequences are obtained directly from the protein knowledgebase (UniProt). Using the full-information protein conformation prediction (CPPC) method, the conformation of each antibody protein's amino acid sequence is predicted, and all prediction results are expressed as protein folding shape codes (PFSC), forming a predicted band. The fingerprint of the antibody CDR is then identified and serves as the full-information conformation fingerprint of that antibody's CDR.
The resulting antibody variable region conformation fingerprint database is saved in XML file format. The fingerprint information in the antibody variable region fingerprint database includes the name, amino acid sequence, variable region segment ranges, and full-information conformation fingerprint.
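The XML schema itself is not given in the text. The following sketch shows one way a record with the four named fields could be written with Python's standard library; every element and attribute name (`antibody`, `name`, `sequence`, `region_ranges`, `range`, `fingerprint`, `row`) and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_record(name, sequence, region_ranges, fingerprint_rows):
    """Assemble one database entry with the four groups of data."""
    record = ET.Element("antibody")
    ET.SubElement(record, "name").text = name
    ET.SubElement(record, "sequence").text = sequence
    ranges = ET.SubElement(record, "region_ranges")
    for start, end in region_ranges:
        # Residue positions are stored as attributes (an arbitrary choice).
        ET.SubElement(ranges, "range", start=str(start), end=str(end))
    fingerprint = ET.SubElement(record, "fingerprint")
    for row in fingerprint_rows:
        # One row per rank of the conformation band.
        ET.SubElement(fingerprint, "row").text = row
    return record


# Placeholder values, not real antibody data.
record = build_record("example-antibody", "ARDYWGQ", [(1, 7)], ["AVB", "VAB"])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Storing the band as one row per rank keeps the most likely conformation in the first row, which matches the top-to-bottom ordering described for the band.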
In Fig. 1, the first row shows the amino acid sequence of the antibody's CDR; below it is the predicted conformation fingerprint band, which represents the full-information conformation fingerprint of that CDR.

Claims (5)

1. An antibody complementarity-determining region (CDR) conformation fingerprint database which, for each antibody protein, includes four groups of data: name, amino acid sequence, CDR segment ranges, and full-information conformation fingerprint; the amino acid sequence and CDR segment ranges are obtained from a protein knowledgebase, and the full-information conformation fingerprint is handled in two cases:
For an antibody protein of known three-dimensional structure, all structural data are obtained from a protein database and expressed as protein folding shape codes; the folding shape codes of the antibody CDR are then extracted and serve as the full-information conformation fingerprint of that CDR;
For an antibody protein of unknown structure, the full-information conformation fingerprint of the antibody CDR is the conformation band obtained by predicting the three-dimensional conformation of the antibody CDR;
The predicted conformation band for an antibody CDR of unknown structure is obtained by the following process:
1) Choosing 5 amino acids freely from all 20 amino acids gives 3,200,000 different arrangements in total; the possible folded conformations of each arrangement are obtained from the worldwide Protein Data Bank and expressed as protein folding shape codes; a database is created to collect these arrangements and their corresponding protein folding shape codes, and the database is named 5AAPFSC;
2) For the protein segment of the antibody CDR, move along the amino acid sequence step by step from the N-terminus to the C-terminus, reading each run of 5 consecutive amino acids in turn; the folded conformations that each run may adopt are obtained directly from the 5AAPFSC database and expressed as protein folding shape code letters; the protein folding shape code of the folded conformation that occurs most frequently in the Protein Data Bank is placed first, the code with the second-highest frequency is placed second, and so on, forming a column from top to bottom until all codes are collected; different runs of 5 consecutive amino acids have different numbers of possible folded conformations;
3) All possible folding shape codes of the antibody CDR form an array, called the protein folding conformation band, which represents all possible folded conformations of the CDR; at each site, substituting its possible folding shape codes for one another yields every possible conformation exactly; the total number of possible conformations is the product of the numbers of possible folded conformations over all 5-amino-acid windows.
2. The antibody CDR conformation fingerprint database according to claim 1, characterised in that the spatial conformation of the predicted conformation band for an antibody CDR of unknown structure is formed jointly by the most frequent folding shape code at each site.
3. The antibody CDR conformation fingerprint database according to claim 1, characterised in that the spatial conformation of the predicted conformation band for an antibody CDR of unknown structure is formed by the second most frequent folding shape code at each site; at positions that have no second-ranked conformation, the most frequent folding shape code is used instead.
4. The antibody CDR conformation fingerprint database according to claim 1, characterised in that the spatial conformation of the predicted conformation band for an antibody CDR of unknown structure is formed by the third most frequent folding shape code at each site; at positions that have no third-ranked conformation, the most frequent folding shape code is used instead.
5. The antibody CDR conformation fingerprint database according to claim 1, characterised in that the protein folding shape code corresponds to 27 vectors describing the folding shape of five consecutive amino acid residues; the vectors are constructed by the following method:
A) Every five consecutive amino acids in the protein are taken as one elementary unit;
B) The first dihedral angle in each elementary unit is calculated; this dihedral angle is the angle between the plane defined by the first, second, and third amino acids and the plane defined by the second, third, and fourth amino acids; the dihedral angle falls within one of the ranges a1, a2, a3;
C) The second dihedral angle in each elementary unit is calculated; this dihedral angle is the angle between the plane defined by the second, third, and fourth amino acids and the plane defined by the third, fourth, and fifth amino acids; the dihedral angle falls within one of the ranges b1, b2, b3;
D) The extension distance between the first and fifth amino acids in each elementary unit is calculated; the extension distance falls within one of the ranges c1, c2, c3;
E) The vector of each elementary unit is determined from the values obtained in steps B, C, and D;
Here a1 is from 0° to 130°, a2 is from 130° to -130°, and a3 is from -130° to 0°; b1 is from 0° to 130°, b2 is from 130° to -130°, and b3 is from -130° to 0°; c1 is from 0 to 7.0 angstroms, c2 is from 4.0 to 17 angstroms, and c3 is greater than 12 angstroms.
CN201710475559.3A 2017-06-21 2017-06-21 Antibody complementarity-determining region conformation fingerprint database Pending CN107463793A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710475559.3A CN107463793A (en) 2017-06-21 2017-06-21 Antibody complementarity-determining region conformation fingerprint database

Publications (1)

Publication Number Publication Date
CN107463793A (en) 2017-12-12

Family

ID=60544124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710475559.3A Pending CN107463793A (en) 2017-06-21 2017-06-21 Antibody complementarity-determining region conformation fingerprint database

Country Status (1)

Country Link
CN (1) CN107463793A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101647022A (en) * 2007-01-31 2010-02-10 桑迪亚医药技术(上海)有限责任公司 Methods, systems, algorithyms and means for describing the possible conformations of actual and theoretical proteins and for evaluating actual and theoretical proteins with respect to folding, overall
CN105205351A (en) * 2015-09-25 2015-12-30 麦科罗医药科技(武汉)有限公司 High-throughput retrieval method for drug targets
CN105243292A (en) * 2015-09-25 2016-01-13 麦科罗医药科技(武汉)有限公司 Protein structure fingerprint database
CN105260626A (en) * 2015-09-25 2016-01-20 麦科罗医药科技(武汉)有限公司 Complete prediction method for protein structure spatial conformation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIANFU ZHOU ET AL.: "Rapid search for tertiary fragments reveals protein sequence–structure relationships", 《PROTEIN SCIENCE》 *

Similar Documents

Publication Publication Date Title
Jia et al. Gabor cube selection based multitask joint sparse representation for hyperspectral image classification
Saha et al. BcePred: prediction of continuous B-cell epitopes in antigenic sequences using physico-chemical properties
Justino et al. Reconstructing shredded documents through feature matching
Han et al. A two-stage SVM method to predict membrane protein types by incorporating amino acid classifications and physicochemical properties into a general form of Chou's PseAAC
Bu et al. Wat: Finding top-k discords in time series database
Zhao et al. Antibody-specified B-cell epitope prediction in line with the principle of context-awareness
CN109858477A (en) The Raman spectrum analysis method of object is identified in complex environment with depth forest
CN103679192A (en) Image scene type discrimination method based on covariance features
Nyborg et al. Generalized classification of satellite image time series with thermal positional encoding
CN105678342B (en) Corn seed hyperspectral image band selection method based on the joint degree of bias
CN112381144A (en) Heterogeneous deep network method for non-European and European domain space spectrum feature learning
CN117292742A (en) Anticancer peptide identification method and system
Rusakov et al. Towards query-by-eXpression retrieval of cuneiform signs
Mirceva et al. Efficient approaches for retrieving protein tertiary structures
CN107463793A (en) Antibody complementarity-determining region conformation fingerprint database
CN105260626B (en) The full information Forecasting Methodology of protein structure space conformation
CN107451421A (en) epitope conformation fingerprint database
Einav et al. Quantitatively visualizing bipartite datasets
Zhang et al. DeepANIS: Predicting antibody paratope from concatenated CDR sequences by integrating bidirectional long-short-term memory and transformer neural networks
Tripathi et al. TemPred: A novel protein template search engine to improve protein structure prediction
CN1889086A (en) Cross reaction antigen computer-aided screening method
Tonnelier et al. Machine learning of generic reactions: 3. an efficient algorithm for maximal common substructure determination
Rathod et al. An extensive review of deep learning driven remote sensing image classification models
Poona et al. Reducing hyperspectral data dimensionality using random forest based wrappers
Benros et al. Analyzing the sequence–structure relationship of a library of local structural prototypes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171212