CN107463793A - Antibody complementarity-determining region conformation fingerprint database - Google Patents
Antibody complementarity-determining region conformation fingerprint database
- Publication number
- CN107463793A (application number CN201710475559.3A)
- Authority
- CN
- China
- Prior art keywords
- conformation
- antibody
- amino acid
- complementary
- area
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Landscapes
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Chemical & Material Sciences (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Computational Biology (AREA)
- Analytical Chemistry (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Peptides Or Proteins (AREA)
Abstract
The present invention relates to an antibody complementarity-determining region (CDR) conformation fingerprint database and belongs to the field of bioinformatics. For each antibody protein, the database holds four groups of data: name, amino acid sequence, CDR segment ranges, and full-information conformation fingerprint. The amino acid sequence and the CDR segment ranges are obtained from a protein knowledgebase. The full-information conformation fingerprint is derived in one of two ways. For an antibody protein of known three-dimensional structure, all structure data are obtained from a protein database and expressed as protein folding shape codes. For an antibody protein of unknown structure, the three-dimensional conformation of the CDR is predicted to give a conformation spectrum band. The database covers not only the primary structure of the antibody sequence and regular secondary structure but also extends to irregular tertiary structure. Similarity scores between conformation fingerprints can be used to characterize CDR conformational features, providing a new parameter for classifying antibody CDRs.
Description
Technical field
The present invention relates to an antibody complementarity-determining region (CDR) conformation fingerprint database and belongs to the field of bioinformatics.
Background art
Antibodies, also known as immunoglobulins, are large Y-shaped proteins that the immune system uses to identify and resist foreign invaders such as bacteria and viruses. Antibodies can be classified by physicochemical properties, biological function, or origin. The development of monoclonal antibodies has passed through four stages: mouse monoclonal antibodies, chimeric monoclonal antibodies, humanized monoclonal antibodies, and fully human monoclonal antibodies.
An antibody is a symmetric structure of four polypeptide chains: two longer heavy chains (H chains) and two shorter light chains (L chains). The whole antibody molecule can be divided into a constant region and a variable region. The variable regions lie at the ends of the two arms of the "Y". Within the variable region, a small fraction of amino acid residues varies especially strongly; the regions in which residue composition and order are most prone to variation are called hypervariable regions. Hypervariable regions lie on the molecular surface and, because their spatial structure can form a precise complement to an antigenic determinant, they are also called complementarity-determining regions (CDRs). The amino acid sequence and conformation of the hypervariable regions determine the specific recognition and binding of antigen by the antibody. The main function of an antibody is to recognize and bind antigen through its CDRs, thereby effectively clearing foreign matter such as invading microorganisms and parasites from the body. The study of CDRs is therefore essential to antibody research.
To date, the three-dimensional structures of about 550 antibody proteins in the global protein knowledgebase (UniProt) have been determined, while a further 25,000 or so antibody proteins are known only by their primary sequence. Using the Protein Structure Fingerprint Technology (PSFT) developed by the inventor, antibody proteins of known structure are characterized by the protein structure fingerprint method, and the conformations of antibody proteins of unknown structure are predicted. On this basis, the antibody CDR fingerprint database is established.
Summary of the invention
The technical problem to be solved by the invention is to provide an antibody CDR conformation fingerprint database.
The antibody CDR conformation fingerprint database of the invention holds, for each antibody protein, four groups of data: name, amino acid sequence, CDR segment ranges, and full-information conformation fingerprint. The amino acid sequence and the CDR segment ranges are obtained from the protein knowledgebase (UniProt). The full-information conformation fingerprint is derived in one of two ways:
For an antibody protein of known three-dimensional structure, all structure data are obtained from the protein database and expressed as protein folding shape codes. The folding shape codes of the antibody CDR are then extracted and used as the full-information conformation fingerprint of that CDR.
For an antibody protein of unknown structure, the full-information conformation fingerprint of the CDR is the conformation spectrum band obtained by predicting the three-dimensional conformation of the CDR.
From a mathematical point of view, five amino acids placed in different orders form different arrangements. Choosing five amino acids from the full set of 20, with repetition allowed and order significant, gives 20^5 = 3,200,000 different arrangements. The possible folded conformations of each arrangement can be obtained from the global Protein Data Bank (PDB) and then expressed as protein folding shape codes (PFSC). On this basis, we created a database that collects the folded conformations of these 3,200,000 arrangements. This new database is named 5AAPFSC.
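The figure of 3,200,000 is what one gets by picking one of 20 amino acids independently at each of five positions. The following Python sketch checks that count and enumerates the keys such a table would need. The one-letter alphabet and the function names are illustrative assumptions; the source does not describe how 5AAPFSC is stored.

```python
from itertools import product

# Standard one-letter codes for the 20 proteinogenic amino acids (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def count_pentapeptides(alphabet: str = AMINO_ACIDS, length: int = 5) -> int:
    """Number of ordered sequences of `length` residues, repeats allowed."""
    return len(alphabet) ** length

def iter_pentapeptides(alphabet: str = AMINO_ACIDS, length: int = 5):
    """Lazily yield every pentapeptide as a string, in lexicographic order."""
    for combo in product(alphabet, repeat=length):
        yield "".join(combo)

total = count_pentapeptides()          # size of the key space of a 5AAPFSC-like table
first = next(iter_pentapeptides())     # first key in lexicographic order
```

Iterating lazily avoids holding all keys in memory at once; a real table would presumably live in a database rather than a Python object.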
In the database of the invention, the conformation spectrum band predicted for a CDR of unknown structure is obtained by the following process:
1) Five amino acids are chosen arbitrarily from all 20 amino acids, giving a total of 3,200,000 different arrangements. The possible folded conformations of each arrangement are obtained from the global Protein Data Bank (PDB) and expressed as protein folding shape codes. A database, named 5AAPFSC, is created to collect these arrangements and their corresponding protein folding shape codes.
2) For the protein of the antibody CDR, the amino acid sequence is read from the N-terminus, moving step by step toward the C-terminus and taking every five consecutive amino acids in turn. The folded conformations that each five-residue segment may adopt are obtained directly from the 5AAPFSC database and expressed as protein folding shape code letters. The protein folding shape code of the folded conformation that occurs most frequently in the PDB is placed first, the code with the second-highest frequency is placed second, and so on, forming a column from top to bottom until all codes are collected. Different five-residue segments may have different numbers of possible folded conformations.
3) All possible folding shape codes of the antibody CDR form an array, called the protein folding conformation spectrum band, which represents all possible folded conformations of the CDR. At each site, substituting among all of its possible folding shape codes yields every possible conformation exactly. The total number of possible conformations is the product of the numbers of possible folded conformations over all five-amino-acid segments.
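Steps 2) and 3) can be sketched as a sliding-window lookup followed by a product over the columns. The sketch below is only an illustration under stated assumptions: the lookup table, the example sequence, and the code letters are made-up placeholders, not entries from 5AAPFSC, and the function names are hypothetical.

```python
from math import prod

# Hypothetical excerpt of a 5AAPFSC-style lookup: pentapeptide -> fold codes,
# ordered from most to least frequent. The letters are placeholders, not real data.
TOY_5AAPFSC = {
    "ARDYW": ["A", "V"],
    "RDYWG": ["B"],
    "DYWGQ": ["A", "J", "P"],
}

def windows(sequence: str, size: int = 5):
    """Yield consecutive overlapping windows from the N- to the C-terminus."""
    for start in range(len(sequence) - size + 1):
        yield sequence[start:start + size]

def conformation_band(sequence: str, lookup: dict) -> list:
    """One ranked column of fold codes per window (empty if the window is unknown)."""
    return [list(lookup.get(window, [])) for window in windows(sequence)]

def count_conformations(band: list) -> int:
    """Product of the number of candidate codes over all windows."""
    return prod(len(column) for column in band)

band = conformation_band("ARDYWGQ", TOY_5AAPFSC)
n_conformations = count_conformations(band)
```

With the toy table, the three windows carry 2, 1 and 3 candidate codes, so the band describes 2 × 1 × 3 = 6 conformations.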
For any antibody CDR, although the number of possible spatial conformations is very large, the most probable spatial conformations can be obtained by combining the local folded conformations that occur most frequently. For example, the first high-probability spatial conformation is composed of the most frequent folding shape code at each site. The second spatial conformation is composed of the second most frequent folding shape code at each site; at sites that have no second-ranked conformation, the most frequent folding shape code is used instead. The third spatial conformation is composed of the third most frequent folding shape code at each site; at sites that have no third-ranked conformation, the most frequent folding shape code is used instead. Continuing in this way gives a series of predicted conformations of relatively high probability.
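The ranking rule above, including the fallback to the most frequent code, can be sketched in a few lines of Python. The band below is placeholder data, the rank is 0-based, and the function name is a hypothetical choice; none of these come from the source.

```python
def ranked_conformation(band, rank):
    """Build the rank-th (0-based) conformation string from a spectrum band.

    Each column lists fold codes from most to least frequent. Where a column
    has no code at the requested rank, its most frequent code is used instead.
    """
    codes = []
    for column in band:
        if not column:
            raise ValueError("every site needs at least one fold code")
        codes.append(column[rank] if rank < len(column) else column[0])
    return "".join(codes)

# Placeholder band: three sites with 2, 1 and 3 candidate codes.
example_band = [["A", "V"], ["B"], ["A", "J", "P"]]
top_three = [ranked_conformation(example_band, rank) for rank in range(3)]
```

In this toy case the second conformation reuses "B" at the middle site and the third reuses "A" at the first site, because those sites have no lower-ranked alternative.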
Both the characterization of CDRs of antibody proteins of known structure and the prediction of conformation spectrum bands for CDRs of unknown structure use the protein folding shape code (Protein Folding Shape Code, PFSC) disclosed in the inventor's earlier patent ZL200880003164.2. The rigorously derived PFSC can fully describe the folding shape of a fragment of five consecutive amino acids. Any folding shape of a five-amino-acid fragment in a protein can be described by one of 27 PFSC vectors, which are denoted by the 26 letters of the English alphabet plus the $ symbol. Importantly, the 27 PFSC vectors cover a complete mathematical space. Moreover, the folding shapes of the 27 PFSC vectors are closely related to one another: each PFSC vector can change into another through a transition.
The antibody CDR conformation fingerprint database of the invention can collect all data obtained from the various structure determinations, including structure data that cannot otherwise be annotated. Only the coordinates of the backbone C atoms need to be read (the coordinates of other atoms are not required) to determine the CDR conformation fingerprint. Because the method is not limited by the resolution of the determined structure, more antibody CDR structures can be collected. For CDRs whose three-dimensional structure has been determined, the database includes not only structures determined at high resolution but also structure data determined at low resolution.
For CDRs of unknown three-dimensional structure, the invention does not provide just a single predicted structure; it provides a full-information prediction of the protein conformation fingerprint, which can comprehensively reveal the possible conformational variations.
The antibody CDR conformation fingerprint database of the invention covers not only the primary structure of the antibody protein sequence and regular secondary structure but also extends to irregular tertiary structure. Similarity scores between conformation fingerprints can be used to characterize CDR conformational features, providing a new parameter for classifying antibody CDRs.
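The source does not define how the similarity score is computed. Purely as an illustration, and as an assumption rather than the patent's method, the sketch below scores two equal-length fingerprint strings by the fraction of positions that carry the same folding shape code. The function name and the example strings are hypothetical.

```python
def fingerprint_identity(fp_a: str, fp_b: str) -> float:
    """Fraction of positions with identical fold codes (assumed scoring rule)."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    if not fp_a:
        return 0.0
    matches = sum(a == b for a, b in zip(fp_a, fp_b))
    return matches / len(fp_a)

# Placeholder fingerprints that differ at one of five positions.
score = fingerprint_identity("ABAVJ", "ABPVJ")
```

A scoring scheme that weights related PFSC vectors more gently than unrelated ones would be a natural refinement, but the source gives no weights, so none are assumed here.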
The antibody CDR conformation fingerprint database of the invention contains more than 500 known antibody protein structures and predicted conformations for more than 25,000 antibody proteins of unknown structure.
Brief description of the drawings
Fig. 1: Amino acid sequence of the complementarity-determining region of an antibody of unknown structure and its predicted conformation fingerprint.
Detailed description of the embodiments
The antibody CDR conformation fingerprint database of the invention holds, for each antibody protein, four groups of data: name, amino acid sequence, variable-region segment ranges, and full-information conformation fingerprint. The amino acid sequence and the variable-region segment ranges are obtained from the protein knowledgebase (UniProt). The full-information conformation fingerprint is prepared as follows:
All antibody protein structure information is collected from the global protein knowledgebase (UniProt). Antibody proteins of known three-dimensional structure and antibody proteins of unknown structure are processed separately.
For the roughly 550 antibody proteins of known structure, each protein may have several known structures in the Protein Data Bank (PDB), and all of the structure data are obtained from the PDB. Each protein structure is converted into fingerprint data using the protein folding shape code (Protein Folding Shape Code, PFSC). The fingerprint of the antibody CDR is then extracted and used as the full-information conformation fingerprint of that CDR.
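Extracting the CDR portion of a full-length fingerprint amounts to slicing by the segment range. The sketch below assumes, without support from the source, that code i of the fingerprint describes the five-residue window starting at residue i (1-based), and keeps only windows lying entirely inside the range. The function name and the example string are hypothetical.

```python
def extract_segment_fingerprint(full_fingerprint: str, start: int, end: int,
                                window: int = 5) -> str:
    """Codes of the windows lying entirely within residues start..end.

    Residue numbers are 1-based and inclusive. Assumes full_fingerprint[i]
    describes the window that begins at residue i + 1.
    """
    if start < 1 or end < start:
        raise ValueError("invalid segment range")
    if end - start + 1 < window:
        return ""  # segment shorter than one window
    return full_fingerprint[start - 1:end - window + 1]

# Placeholder fingerprint; residues 3..9 contain the windows starting at 3, 4 and 5.
cdr_codes = extract_segment_fingerprint("ABCDEFGH", 3, 9)
```

If the codes are instead attached to the central residue of each window, the slice bounds shift by two positions; the indexing convention would need to be taken from the PFSC definition actually used.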
For the roughly 25,000 antibody proteins of unknown structure, the full amino acid sequences are obtained directly from the protein knowledgebase (UniProt). Using the full-information protein conformation prediction (CPPC) method, the conformation of each antibody protein's amino acid sequence is predicted, and all prediction results are expressed as protein folding shape codes (Protein Folding Shape Code, PFSC), forming a predicted spectrum band. The fingerprint of the antibody CDR is then identified and used as the full-information conformation fingerprint of that antibody's CDR.
The antibody variable-region fingerprint database that is created is saved in XML file format. Each fingerprint record in the database includes the name, amino acid sequence, variable-region segment ranges, and full-information conformation fingerprint.
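The source names XML as the storage format and lists the four fields but gives no schema. The sketch below shows one possible record layout using Python's standard `xml.etree.ElementTree`; every element and attribute name (`antibody`, `segmentRanges`, `site`, and so on) is a hypothetical choice, and the example values are placeholders.

```python
import xml.etree.ElementTree as ET

def build_entry(name, sequence, segment_ranges, fingerprint_columns):
    """Build one record holding the four data groups (hypothetical schema)."""
    entry = ET.Element("antibody")
    ET.SubElement(entry, "name").text = name
    ET.SubElement(entry, "sequence").text = sequence
    ranges = ET.SubElement(entry, "segmentRanges")
    for start, end in segment_ranges:
        ET.SubElement(ranges, "segment", start=str(start), end=str(end))
    fingerprint = ET.SubElement(entry, "fingerprint")
    # One <site> per five-residue window; its text lists the ranked fold codes.
    for index, codes in enumerate(fingerprint_columns, start=1):
        ET.SubElement(fingerprint, "site", index=str(index)).text = "".join(codes)
    return entry

entry = build_entry(
    "example-antibody",                      # placeholder name
    "ARDYWGQ",                               # placeholder sequence
    [(1, 7)],                                # placeholder segment range
    [["A", "V"], ["B"], ["A", "J", "P"]],    # placeholder spectrum band
)
xml_text = ET.tostring(entry, encoding="unicode")
```

Storing each site's ranked codes as a short string keeps the whole spectrum band in the record, so both single fingerprints and full predicted bands fit the same layout.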
In Fig. 1, the first row shows the amino acid sequence of the antibody's complementarity-determining region; the predicted conformation fingerprint spectrum band is shown below it and represents the full-information conformation fingerprint of that CDR.
Claims (5)
1. An antibody complementarity-determining region (CDR) conformation fingerprint database comprising, for each antibody protein, four groups of data: name, amino acid sequence, CDR segment ranges, and full-information conformation fingerprint; the amino acid sequence and the CDR segment ranges are obtained from a protein knowledgebase, and the full-information conformation fingerprint is derived in one of two ways:
for an antibody protein of known three-dimensional structure, all structure data are obtained from a protein database and expressed as protein folding shape codes, and the folding shape codes of the antibody CDR are then extracted and used as the full-information conformation fingerprint of that CDR;
for an antibody protein of unknown structure, the full-information conformation fingerprint of the CDR is the conformation spectrum band obtained by predicting the three-dimensional conformation of the antibody CDR;
the conformation spectrum band predicted for a CDR of unknown structure is obtained by the following process:
1) five amino acids are chosen arbitrarily from all 20 amino acids, giving a total of 3,200,000 different arrangements; the possible folded conformations of each arrangement are obtained from the global Protein Data Bank and expressed as protein folding shape codes; a database, named 5AAPFSC, is created to collect these arrangements and their corresponding protein folding shape codes;
2) for the protein of the antibody CDR, the amino acid sequence is read from the N-terminus, moving step by step toward the C-terminus and taking every five consecutive amino acids in turn; the folded conformations that each five-residue segment may adopt are obtained directly from the 5AAPFSC database and expressed as protein folding shape code letters; the protein folding shape code of the folded conformation that occurs most frequently in the Protein Data Bank is placed first, the code with the second-highest frequency is placed second, and so on, forming a column from top to bottom until all codes are collected; different five-residue segments may have different numbers of possible folded conformations;
3) all possible folding shape codes of the antibody CDR form an array, called the protein folding conformation spectrum band, which represents all possible folded conformations of the CDR; at each site, substituting among all of its possible folding shape codes yields every possible conformation exactly; the total number of possible conformations is the product of the numbers of possible folded conformations over all five-amino-acid segments.
2. The antibody CDR conformation fingerprint database according to claim 1, characterized in that a spatial conformation of the conformation spectrum band predicted for a CDR of unknown structure is composed of the most frequent folding shape code at each site.
3. The antibody CDR conformation fingerprint database according to claim 1, characterized in that a spatial conformation of the conformation spectrum band predicted for a CDR of unknown structure is composed of the second most frequent folding shape code at each site, and at sites that have no second-ranked conformation the most frequent folding shape code is used instead.
4. The antibody CDR conformation fingerprint database according to claim 1, characterized in that a spatial conformation of the conformation spectrum band predicted for a CDR of unknown structure is composed of the third most frequent folding shape code at each site, and at sites that have no third-ranked conformation the most frequent folding shape code is used instead.
5. The antibody CDR conformation fingerprint database according to claim 1, characterized in that the protein folding shape code corresponds to 27 vectors describing the folding shapes of five consecutive amino acid residues; the vectors are constructed by the following method:
A) every five consecutive amino acids in the protein are taken as one elementary unit;
B) the first dihedral angle in each elementary unit is calculated, this dihedral angle being the angle between the plane defined by the first, second and third amino acids and the plane defined by the second, third and fourth amino acids; the dihedral angle falls in one of the ranges a1, a2, a3;
C) the second dihedral angle in each elementary unit is calculated, this dihedral angle being the angle between the plane defined by the second, third and fourth amino acids and the plane defined by the third, fourth and fifth amino acids; the dihedral angle falls in one of the ranges b1, b2, b3;
D) the extension distance between the first and fifth amino acids in each elementary unit is calculated; the extension distance falls in one of the ranges c1, c2, c3;
E) the vector of each elementary unit is determined from the values obtained in steps B, C and D;
a1 is from 0° to 130°, a2 is from 130° to -130°, and a3 is from -130° to 0°; b1 is from 0° to 130°, b2 is from 130° to -130°, and b3 is from -130° to 0°; c1 is from 0 to 7.0 Å, c2 is from 4.0 to 17 Å, and c3 is greater than 12 Å.
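The geometry in steps B) to D) can be sketched as follows. This is a minimal illustration under several assumptions that the claim does not state: each amino acid is represented by a single 3D point (for example one backbone atom), the dihedral sign follows the common atan2 convention, boundary values are assigned as coded, and the range "130° to -130°" is read as wrapping through ±180°. Because the stated distance ranges overlap, the sketch reports every range that contains the distance. The mapping from a triple of ranges to one of the 27 code letters is not given in this document and is not attempted.

```python
import math

def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed angle in degrees between plane (p0, p1, p2) and plane (p1, p2, p3)."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def angle_range(angle_deg):
    """1, 2 or 3 for the ranges 0..130, 130..-130 (through 180), -130..0."""
    if 0.0 <= angle_deg < 130.0:
        return 1
    if -130.0 < angle_deg < 0.0:
        return 3
    return 2

def distance_ranges(distance):
    """All of c1 (0-7.0), c2 (4.0-17) and c3 (>12) that contain the distance."""
    matches = []
    if 0.0 <= distance <= 7.0:
        matches.append(1)
    if 4.0 <= distance <= 17.0:
        matches.append(2)
    if distance > 12.0:
        matches.append(3)
    return matches

def describe_unit(points):
    """Ranges for the two dihedral angles and the end-to-end distance of one unit."""
    if len(points) != 5:
        raise ValueError("an elementary unit has exactly five points")
    first = dihedral_deg(*points[0:4])
    second = dihedral_deg(*points[1:5])
    extension = math.dist(points[0], points[4])
    return angle_range(first), angle_range(second), distance_ranges(extension)

# Placeholder coordinates in angstroms, chosen only to exercise the functions.
unit = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 2)]
ranges = describe_unit(unit)
```

Three ranges for each of the three quantities give 3 × 3 × 3 = 27 combinations, which matches the 27 vectors named in the claim.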
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710475559.3A CN107463793A (en) | 2017-06-21 | 2017-06-21 | Antibody complementarity-determining region conformation fingerprint database |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710475559.3A CN107463793A (en) | 2017-06-21 | 2017-06-21 | Antibody complementarity-determining region conformation fingerprint database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107463793A true CN107463793A (en) | 2017-12-12 |
Family
ID=60544124
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710475559.3A Pending CN107463793A (en) | 2017-06-21 | 2017-06-21 | Antibody complementarity-determining region conformation fingerprint database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107463793A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101647022A (en) * | 2007-01-31 | 2010-02-10 | 桑迪亚医药技术(上海)有限责任公司 | Methods, systems, algorithyms and means for describing the possible conformations of actual and theoretical proteins and for evaluating actual and theoretical proteins with respect to folding, overall |
CN105205351A (en) * | 2015-09-25 | 2015-12-30 | 麦科罗医药科技(武汉)有限公司 | High-throughput retrieval method for drug targets |
CN105243292A (en) * | 2015-09-25 | 2016-01-13 | 麦科罗医药科技(武汉)有限公司 | Protein structure fingerprint database |
CN105260626A (en) * | 2015-09-25 | 2016-01-20 | 麦科罗医药科技(武汉)有限公司 | Complete prediction method for protein structure spatial conformation |
Non-Patent Citations (1)
Title |
---|
JIANFU ZHOU ET AL.: "Rapid search for tertiary fragments reveals protein sequence–structure relationships", 《PROTEIN SCIENCE》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20171212 |