CN110349628A - Protein phosphorylation site identification method, system, device and storage medium - Google Patents
Protein phosphorylation site identification method, system, device and storage medium
- Publication number
- CN110349628A (application CN201910569671.2A)
- Authority
- CN
- China
- Prior art keywords
- amino acid
- phosphorylation site
- protein phosphorylation
- feature vector
- acid sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The invention discloses a protein phosphorylation site identification method, system, device and storage medium. The method comprises: obtaining the amino acid sequence fragment of a protein phosphorylation site to be identified; performing logical operations on the binary codes of the amino acids in the amino acid sequence fragment to obtain the logical binary feature vector of the fragment; performing kernel principal component analysis (KPCA) on the logical binary feature vector according to a preset kernel function to obtain a kernel principal component logical binary feature vector; and inputting the kernel principal component logical binary feature vector into a random forest model for processing to obtain the identification result for the protein phosphorylation site. Because the invention relies on computation with a random forest model, it can identify a large amount of protein phosphorylation site information quickly, accurately and at low cost, which facilitates research on phosphorylation mechanisms and on the relationship between phosphorylation and disease. The invention is widely applicable in the field of protein phosphorylation site identification.
Description
Technical field
The present invention relates to the field of protein phosphorylation site identification, and in particular to a protein phosphorylation site identification method, system, device and storage medium.
Background
Proteins are the carriers and executors of the biological functions of living organisms. The protein produced directly by gene expression is called a precursor protein; it usually has no biological activity and must undergo a series of processing and modification steps before it becomes a protein with a specific biological function. Protein phosphorylation refers to a class of reactions in which, catalyzed by a protein kinase, the terminal phosphate group of adenosine triphosphate is transferred to a specific amino acid of a substrate protein; it is the most common type of post-translational modification currently known. Studies have shown that protein phosphorylation plays an important role in cell proliferation, development, differentiation and apoptosis, in cell signaling, neural activity and muscle contraction, and in metabolism and tumorigenesis, and that it is also a main mechanism for regulating and controlling protein function. Identifying protein phosphorylation sites is therefore important for elucidating the diverse and complex physiological and pathological processes of living organisms, for the prevention, diagnosis and treatment of disease, and for drug research, development and design. With the rapid development of high-throughput sequencing technologies, massive amounts of protein sequence data have been produced. However, only a very small amount of protein phosphorylation site information has been identified, which greatly hinders research on the mechanism of protein phosphorylation. Moreover, identifying phosphorylation sites experimentally is usually time-consuming, laborious and expensive.
Summary of the invention
In view of this, the purpose of the embodiments of the present invention is to provide a protein phosphorylation site identification method, system, device and storage medium. The identification method is computational and can identify a large amount of protein phosphorylation site information efficiently, accurately and at low cost.
In a first aspect, an embodiment of the present invention provides a protein phosphorylation site identification method, comprising the following steps:
obtaining the amino acid sequence fragment of a protein phosphorylation site to be identified;
performing logical operations on the binary codes of the amino acids in the amino acid sequence fragment to obtain the logical binary feature vector of the amino acid sequence fragment;
performing kernel principal component analysis (KPCA) on the logical binary feature vector according to a preset kernel function to obtain a kernel principal component logical binary feature vector;
inputting the kernel principal component logical binary feature vector into a random forest model for processing to obtain the identification result for the protein phosphorylation site.
Preferably, performing logical operations on the binary codes of the amino acids in the amino acid sequence fragment to obtain the logical binary feature vector of the amino acid sequence fragment comprises the following steps:
performing a pairwise logical AND operation on the binary codes of the amino acids in the amino acid sequence fragment to obtain a first feature vector set;
performing a pairwise logical OR operation on the binary codes of the amino acids in the amino acid sequence fragment to obtain a second feature vector set;
performing a pairwise logical XOR operation on the binary codes of the amino acids in the amino acid sequence fragment to obtain a third feature vector set;
concatenating the first feature vector set, the second feature vector set and the third feature vector set end to end to obtain the logical binary feature vector of the amino acid sequence fragment.
Preferably, performing kernel principal component analysis on the logical binary feature vector according to the preset kernel function to obtain the kernel principal component logical binary feature vector comprises the following steps:
mapping the logical binary feature vector into a high-dimensional space according to the preset kernel function to obtain the kernel matrix of the high-dimensional space;
calculating the eigenvalues of the kernel matrix and the corresponding eigenvectors;
selecting the eigenvectors corresponding to the k largest eigenvalues and concatenating them end to end to obtain the kernel principal component logical binary feature vector corresponding to the amino acid sequence fragment.
Preferably, the kernel function is a Gaussian kernel function.
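The three steps above (kernel matrix, eigen-decomposition, selection of the k leading eigenvectors) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the kernel width `sigma`, the power-iteration solver, the toy input vectors and the final per-sample scaling by the square root of the eigenvalue (a common KPCA convention) are all choices made here and are not given in the source.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]

def gaussian_kernel_matrix(X: List[List[float]], sigma: float = 1.0) -> Matrix:
    """Kernel matrix with entries exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    def k(a, b):
        return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2 * sigma ** 2))
    return [[k(a, b) for b in X] for a in X]

def center(K: Matrix) -> Matrix:
    """Center a symmetric kernel matrix in feature space."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    grand_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]

def matvec(A: Matrix, v: List[float]) -> List[float]:
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_eigenpairs(K: Matrix, k: int, iters: int = 500,
                   seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Largest k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    n = len(K)
    A = [row[:] for row in K]
    pairs = []
    for _component in range(k):
        v = normalize([rng.uniform(-1.0, 1.0) for _ in range(n)])
        for _ in range(iters):
            w = matvec(A, v)
            if math.sqrt(sum(x * x for x in w)) < 1e-12:
                break  # the remaining spectrum is numerically zero
            v = normalize(w)
        lam = sum(x * y for x, y in zip(v, matvec(A, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: remove the component just found before searching for the next one.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

# Toy binary vectors standing in for logical binary feature vectors (made up).
X = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
K_centered = center(gaussian_kernel_matrix(X, sigma=1.0))
pairs = top_eigenpairs(K_centered, k=2)
# One k-dimensional feature vector per input sample (assumed convention).
features = [[math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs]
            for i in range(len(X))]
print(features)
```

Power iteration is used only to keep the sketch free of third-party dependencies; a numerical library's symmetric eigen-solver would normally be preferred for kernel matrices of realistic size.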
Preferably, the random forest model must be trained and tested before use, and the process comprises the following steps:
according to the protein amino acid sequences corresponding to the protein phosphorylation sites in a data bank, obtaining the kernel principal component logical binary feature vectors corresponding to the protein phosphorylation sites as positive input samples, and using the protein phosphorylation site information as positive output samples;
obtaining from the data bank the protein amino acid sequences corresponding to non-phosphorylation sites, obtaining the kernel principal component logical binary feature vectors corresponding to the non-phosphorylation sites as negative input samples, and using the non-phosphorylation site information as negative output samples;
selecting a portion of the positive input samples, negative input samples, positive output samples and negative output samples to train the random forest model;
selecting the remaining positive input samples, negative input samples and the corresponding outputs to test the random forest model.
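The train-then-test procedure above can be illustrated with a deliberately small, pure-Python random forest. Everything specific in this sketch is an assumption made for illustration: the number of trees (25), the depth limit (5), the square-root rule for the number of features tried per split, the 70/30 split and the made-up four-dimensional toy data, which stand in for kernel principal component logical binary feature vectors. Labels are 1 for a phosphorylation site and 0 otherwise.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, rng, n_features, depth=0, max_depth=5):
    """Grow one tree; a node is either a label (leaf) or (feature, threshold, left, right)."""
    if len(set(y)) == 1 or depth >= max_depth:
        return majority(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_features):  # random feature subset
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint threshold
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no sampled feature offers a split
        return majority(y)
    _, f, t, left, right = best
    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          rng, n_features, depth + 1, max_depth)
    return (f, t, child(left), child(right))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def train_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 rng, n_features))
    return forest

def predict_forest(forest, x):
    return majority([predict_tree(tree, x) for tree in forest])  # majority vote

# Made-up toy samples: positives cluster near 1.0, negatives near 0.0.
data_rng = random.Random(1)
def toy_sample(center):
    return [center + data_rng.uniform(-0.1, 0.1) for _ in range(4)]
positives = [toy_sample(1.0) for _ in range(10)]
negatives = [toy_sample(0.0) for _ in range(10)]

# Part of each class is used for training, the remainder for testing.
train_X, train_y = positives[:7] + negatives[:7], [1] * 7 + [0] * 7
test_X, test_y = positives[7:] + negatives[7:], [1] * 3 + [0] * 3

forest = train_forest(train_X, train_y)
accuracy = sum(predict_forest(forest, x) == t for x, t in zip(test_X, test_y)) / len(test_y)
print(accuracy)
```

The toy classes are trivially separable, so the accuracy printed here says nothing about performance on real phosphorylation data.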
Preferably, the non-phosphorylation sites are obtained as follows:
searching for all lysine residues in the protein sequence in which a protein phosphorylation site is located;
when a lysine residue is determined not to be a protein phosphorylation site annotated in the data bank, labeling it as a non-phosphorylation site.
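The negative-sample rule above amounts to a simple filter over residue positions. The sketch below is an illustration only: the function name `negative_sites`, the 0-based indexing, the toy sequence and the annotated position are hypothetical and do not come from the source.

```python
from typing import List, Set

def negative_sites(sequence: str, annotated_sites: Set[int], residue: str = "K") -> List[int]:
    """Return 0-based positions of `residue` that are not annotated phosphorylation sites."""
    return [i for i, aa in enumerate(sequence)
            if aa == residue and i not in annotated_sites]

sequence = "MKTAYKAKQR"   # made-up toy sequence
annotated = {5}           # hypothetical annotated site (0-based position)
print(negative_sites(sequence, annotated))  # → [1, 7]
```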
In a second aspect, an embodiment of the present invention provides a protein phosphorylation site identification system, comprising:
an acquisition module, for obtaining the amino acid sequence fragment of a protein phosphorylation site to be identified;
a first vector acquisition module, for performing logical operations on the binary codes of the amino acids in the amino acid sequence fragment to obtain the logical binary feature vector of the amino acid sequence fragment;
a second vector acquisition module, for performing kernel principal component analysis on the logical binary feature vector according to a preset kernel function to obtain a kernel principal component logical binary feature vector;
an identification module, for inputting the kernel principal component logical binary feature vector into a random forest model for processing to obtain the identification result for the protein phosphorylation site.
In a third aspect, an embodiment of the present invention provides a protein phosphorylation site identification device, comprising:
at least one processor;
at least one memory, for storing at least one program;
wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the protein phosphorylation site identification method.
In a fourth aspect, an embodiment of the present invention provides a storage medium storing processor-executable instructions which, when executed by a processor, perform the protein phosphorylation site identification method.
In a fifth aspect, an embodiment of the present invention provides a protein phosphorylation site identification system, comprising an amino acid sequence acquisition device and a computer device connected to the amino acid sequence acquisition device, wherein:
the amino acid sequence acquisition device is used to acquire the amino acid sequence fragment corresponding to a protein phosphorylation site to be identified;
the computer device comprises:
at least one processor;
at least one memory, for storing at least one program;
wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the protein phosphorylation site identification method.
Implementing the present invention has the following beneficial effects: the invention converts the amino acid sequence fragment corresponding to a protein phosphorylation site into a kernel principal component logical binary feature vector and then processes that vector with a random forest model to obtain the identification result for the protein phosphorylation site. Because the identification method is computational, it can identify a large amount of protein phosphorylation site information quickly, accurately and at low cost, which facilitates research on phosphorylation mechanisms and on the relationship between phosphorylation and disease.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of a protein phosphorylation site identification method provided by an embodiment of the present invention;
Fig. 2 is a structural block diagram of a protein phosphorylation site identification system provided by an embodiment of the present invention;
Fig. 3 is a structural block diagram of a protein phosphorylation site identification device provided by an embodiment of the present invention;
Fig. 4 is a structural block diagram of another protein phosphorylation site identification system provided by an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments. The step numbers in the following embodiments are provided only for ease of explanation and place no restriction on the order of the steps; the execution order of the steps in each embodiment may be adapted according to the understanding of those skilled in the art.
As shown in Fig. 1, an embodiment of the present invention provides a protein phosphorylation site identification method, comprising the following steps:
S1. obtaining the amino acid sequence fragment corresponding to a protein phosphorylation site to be identified;
S2. performing logical operations on the binary codes corresponding to the amino acids in the amino acid sequence fragment to obtain the logical binary feature vector corresponding to the amino acid sequence fragment;
S3. performing kernel principal component analysis on the logical binary feature vector according to a preset kernel function to obtain a kernel principal component logical binary feature vector;
S4. inputting the kernel principal component logical binary feature vector into a random forest model for processing to obtain the identification result for the protein phosphorylation site.
Specifically, the amino acids in the amino acid sequence fragment corresponding to a protein phosphorylation site and their binary codes are as follows:
Alanine A is represented as: [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
Cysteine C is represented as: [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
Aspartic acid D is represented as: [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
Glutamic acid E is represented as: [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
Phenylalanine F is represented as: [0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
Glycine G is represented as: [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
Histidine H is represented as: [0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0];
Isoleucine I is represented as: [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0];
Lysine K is represented as: [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0];
Leucine L is represented as: [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0];
Methionine M is represented as: [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0];
Asparagine N is represented as: [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0];
Proline P is represented as: [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0];
Glutamine Q is represented as: [0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0];
Arginine R is represented as: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0];
Serine S is represented as: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0];
Threonine T is represented as: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0];
Valine V is represented as: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0];
Tryptophan W is represented as: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0];
Tyrosine Y is represented as: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1].
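The table above is a one-hot code over the 20 amino acids, ordered alphabetically by one-letter code. A minimal sketch follows; the names `AMINO_ACIDS` and `one_hot` are introduced here for illustration and are not part of the source.

```python
from typing import List

# One-letter codes in the order used by the table above.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(residue: str) -> List[int]:
    """Return the 20-bit binary code of a single amino acid letter."""
    index = AMINO_ACIDS.index(residue)  # raises ValueError for an unknown letter
    return [1 if i == index else 0 for i in range(len(AMINO_ACIDS))]

print(one_hot("A"))
print(one_hot("Y"))
```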
The logical operations comprise the logical AND, logical OR and logical XOR operations, with the following rules:
Logical AND: And(0,0)=0; And(1,0)=0; And(0,1)=0; And(1,1)=1;
Logical OR: Or(0,0)=0; Or(1,0)=1; Or(0,1)=1; Or(1,1)=1;
Logical XOR: Xor(0,0)=0; Xor(1,0)=1; Xor(0,1)=1; Xor(1,1)=0.
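Applied element by element to two binary codes, these rules correspond to Python's bitwise operators on 0/1 integers. The helper names below are illustrative, and the two input vectors are the codes of alanine (A) and cysteine (C) from the table of binary codes.

```python
from typing import List

def logic_and(a: List[int], b: List[int]) -> List[int]:
    return [x & y for x, y in zip(a, b)]

def logic_or(a: List[int], b: List[int]) -> List[int]:
    return [x | y for x, y in zip(a, b)]

def logic_xor(a: List[int], b: List[int]) -> List[int]:
    return [x ^ y for x, y in zip(a, b)]

code_a = [1] + [0] * 19        # alanine
code_c = [0, 1] + [0] * 18     # cysteine

print(logic_and(code_a, code_c))  # all zeros: the two codes share no set bit
print(logic_or(code_a, code_c))   # bits 0 and 1 set
print(logic_xor(code_a, code_c))  # bits 0 and 1 set
```

For two different amino acids the AND result is always the zero vector and the OR and XOR results coincide; the operations only differ when the two residues are the same.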
Implementing the present invention has the following beneficial effects: the invention converts the amino acid sequence fragment corresponding to a protein phosphorylation site into a kernel principal component logical binary feature vector and then processes that vector with a random forest model to obtain the identification result for the protein phosphorylation site. Because the identification method is computational, it can identify a large amount of protein phosphorylation site information quickly, accurately and at low cost, which facilitates research on phosphorylation mechanisms and on the relationship between phosphorylation and disease.
Preferably, performing logical operations on the binary codes corresponding to the amino acids in the amino acid sequence fragment to obtain the logical binary feature vector corresponding to the amino acid sequence fragment comprises the following steps:
performing a pairwise logical AND operation on the binary codes of the amino acids in the amino acid sequence fragment to obtain a first feature vector set;
performing a pairwise logical OR operation on the binary codes of the amino acids in the amino acid sequence fragment to obtain a second feature vector set;
performing a pairwise logical XOR operation on the binary codes of the amino acids in the amino acid sequence fragment to obtain a third feature vector set;
concatenating the first feature vector set, the second feature vector set and the third feature vector set end to end to obtain the logical binary feature vector of the amino acid sequence fragment.
Specifically, take the following amino acid sequence fragment as an example: alanine A, cysteine C, tyrosine Y, aspartic acid D and glutamic acid E, abbreviated ACYDE. The logical AND operation on the fragment ACYDE proceeds as follows: following the rules of the logical AND operation, the binary code of A is ANDed with the binary codes of C, Y, D and E in turn; the binary code of C is ANDed with those of Y, D and E; the binary code of Y is ANDed with those of D and E; and the binary codes of D and E are ANDed. All vectors produced by the logical AND operation are then concatenated end to end to form the binary feature vector of the logical AND operation. By analogy, the logical OR and logical XOR operations are applied to the fragment ACYDE according to their rules to obtain the binary feature vectors of the logical OR operation and the logical XOR operation.
The binary feature vectors are concatenated end to end in the order logical AND, logical OR, logical XOR to obtain the logical binary feature vector corresponding to the fragment ACYDE, BFVi=[0 1 0 0 ... 0 0 1], which characterizes the protein phosphorylation site amino acid sequence fragment ACYDE.
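The pairwise procedure described for ACYDE can be sketched as follows. The function name `logical_binary_feature_vector` is introduced here for illustration; the pair order produced by `itertools.combinations` matches the order in the text (A with C, Y, D, E; then C with Y, D, E; and so on). For a five-residue fragment there are 10 pairs, so each operation contributes 10 × 20 = 200 bits and the concatenated vector has 600 bits.

```python
from itertools import combinations
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(residue: str) -> List[int]:
    index = AMINO_ACIDS.index(residue)
    return [1 if i == index else 0 for i in range(len(AMINO_ACIDS))]

def logical_binary_feature_vector(fragment: str) -> List[int]:
    """Concatenate pairwise AND, OR and XOR results, in that order."""
    codes = [one_hot(r) for r in fragment]
    pairs = list(combinations(codes, 2))  # (A,C), (A,Y), (A,D), (A,E), (C,Y), ...
    and_part = [x & y for a, b in pairs for x, y in zip(a, b)]
    or_part = [x | y for a, b in pairs for x, y in zip(a, b)]
    xor_part = [x ^ y for a, b in pairs for x, y in zip(a, b)]
    return and_part + or_part + xor_part

bfv = logical_binary_feature_vector("ACYDE")
print(len(bfv))  # → 600
```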
Preferably, performing kernel principal component analysis on the logical binary feature vector according to the preset kernel function to obtain the kernel principal component logical binary feature vector comprises the following steps:
mapping the logical binary feature vector into a high-dimensional space according to the preset kernel function to obtain the kernel matrix of the high-dimensional space;
calculating the eigenvalues of the kernel matrix and the corresponding eigenvectors;
selecting the eigenvectors corresponding to the k largest eigenvalues and concatenating them end to end to obtain the kernel principal component logical binary feature vector corresponding to the amino acid sequence fragment, where k is a positive integer.
Preferably, the kernel function is a Gaussian kernel function.
Specifically, the logical binary feature vector BFV_i is mapped into the high-dimensional space by the mapping Φ(x) associated with the preset kernel function: BFV_i → Φ(BFV_i). The kernel matrix KM_{i,j} = (Φ(BFV_i), Φ(BFV_j)) is then calculated and centered to obtain the centered kernel matrix, from which the eigenvalues λ_i and the corresponding eigenvectors α_i are calculated. Finally, the eigenvalues are sorted in descending order, and the eigenvectors corresponding to the first k eigenvalues are concatenated end to end to form the kernel principal component logical binary feature vector, which characterizes the sequence information of the protein phosphorylation site. The kernel function used is the Gaussian kernel function.
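In the usual textbook formulation of kernel principal component analysis, the steps above take the following form. The symbols N (the number of feature vectors), 1_N (the N×N matrix whose entries are all 1/N) and σ (the Gaussian bandwidth) are assumptions borrowed from that standard formulation; they are not symbols defined in this description, and the exact expressions used in the embodiment may differ.

```latex
\begin{aligned}
KM_{i,j} &= \langle \Phi(BFV_i), \Phi(BFV_j) \rangle
          = \exp\!\left( -\frac{\lVert BFV_i - BFV_j \rVert^{2}}{2\sigma^{2}} \right) \\
\widetilde{KM} &= KM - \mathbf{1}_N\, KM - KM\, \mathbf{1}_N + \mathbf{1}_N\, KM\, \mathbf{1}_N \\
\widetilde{KM}\, \alpha_i &= \lambda_i\, \alpha_i , \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N
\end{aligned}
```

Under this reading, the kernel matrix never requires Φ to be evaluated explicitly, and each eigenvector α_i has one entry per feature vector, so the first k eigenvectors supply k values for each amino acid sequence fragment.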
Preferably, the random forest model must be trained and tested before it is applied. The process comprises the following steps:
According to the protein amino acid sequences corresponding to the protein phosphorylation sites in the data bank, obtaining the kernel principal component logical binary feature vectors corresponding to the protein phosphorylation sites as input data positive samples, and taking the protein phosphorylation site information as output data positive samples;
Obtaining from the data bank the protein amino acid sequences corresponding to non-phosphorylation sites, obtaining the kernel principal component logical binary feature vectors corresponding to the non-phosphorylation sites as input data negative samples, and taking the non-phosphorylation site information as output data negative samples;
Selecting part of the input data positive samples and input data negative samples, together with the corresponding outputs, to train the random forest model;
Selecting the remaining input data positive samples and input data negative samples, together with the corresponding outputs, to test the random forest model.
Specifically, the data bank comprises databases and various literature references. The database stores known protein phosphorylation site information, i.e. the amino acid residues that can be phosphorylated. The protein phosphorylation site information in the protein phosphorylation site database consists of amino acid residues whose phosphorylation has been verified experimentally. Therefore, the kernel principal component logical binary feature vectors corresponding to the protein phosphorylation site information in the database can serve as input data positive samples for training and testing the random forest model, and the corresponding site information can serve as output data positive samples for training and testing the random forest model.
Non-phosphorylation site information is not contained in the protein phosphorylation site database; it is the site information that lies outside that database. Each such site indicates that the amino acid cannot be phosphorylated, and this fact is likewise expressed quantitatively as non-phosphorylation site information. Therefore, the non-phosphorylation site information outside the protein phosphorylation site database can serve as output data negative samples for training and testing the random forest model, and the corresponding kernel principal component logical binary feature vectors of the non-phosphorylation sites can serve as input data negative samples for training and testing the random forest model.
Random Forest model is trained and is tested using data positive sample and data negative sample, makes Random Forest model
Prediction result is more acurrate, closer with truth.
The protein phosphorylation site database is obtained by screening the UniProtKB (The Universal Protein Resource Knowledgebase) database, as follows:
Collect human protein sequence information, i.e. the proteins annotated as Homo sapiens in the database;
Collect the protein sequences and site information that contain amino acid residues annotated as Phosphoserine; if an amino acid residue of a human protein is annotated as Phosphoserine in the UniProtKB database, the residue is a serine that can be phosphorylated;
Delete the protein phosphorylation site information whose amino acid residue annotation is ECO:0000250; this annotation means that the phosphorylation of the residue was inferred by protein sequence alignment rather than determined from specific experimental evidence. Therefore, to ensure the reliability of the positive sample data, the phosphorylation site information of amino acids carrying this annotation is deleted.
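The three screening rules can be illustrated with a short filter. This is only a sketch under stated assumptions: the record layout (`accession`, `organism`, `sites`, `evidence`) and the accession names are hypothetical and much simpler than real UniProtKB entries, and no UniProtKB API is used.

```python
# Hypothetical, simplified records; the field names are invented for this example.
records = [
    {
        "accession": "EXAMPLE1",
        "organism": "Homo sapiens",
        "sites": [
            {"position": 15, "annotation": "Phosphoserine", "evidence": "ECO:0000269"},
            {"position": 42, "annotation": "Phosphoserine", "evidence": "ECO:0000250"},
            {"position": 77, "annotation": "Phosphothreonine", "evidence": "ECO:0000269"},
        ],
    },
    {
        "accession": "EXAMPLE2",
        "organism": "Mus musculus",
        "sites": [
            {"position": 8, "annotation": "Phosphoserine", "evidence": "ECO:0000269"},
        ],
    },
]

def select_positive_sites(entries):
    """Keep human Phosphoserine sites that are not annotated with ECO:0000250."""
    selected = []
    for entry in entries:
        if entry["organism"] != "Homo sapiens":
            continue  # rule 1: human proteins only
        for site in entry["sites"]:
            if site["annotation"] != "Phosphoserine":
                continue  # rule 2: residues annotated as Phosphoserine
            if site["evidence"] == "ECO:0000250":
                continue  # rule 3: drop sites inferred by sequence similarity
            selected.append((entry["accession"], site["position"]))
    return selected

positive_sites = select_positive_sites(records)
print(positive_sites)
```

Only position 15 of the first record passes all three rules in this toy input.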
Preferably, the non-phosphorylation sites are obtained as follows:
Searching for all lysine residues in the protein sequence that contains the protein phosphorylation site;
When a lysine residue is determined not to be a protein phosphorylation site annotated in the data bank, labeling it as a non-phosphorylation site.
Specifically, human protein sequence data containing phosphorylation site information are randomly selected from the collected data, the positions of all serine residues in the selected protein sequences are determined, and the positions of the phosphorylated serine residues are removed; the remaining positions are the non-phosphorylation sites.
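The negative-sample rule amounts to a set difference over residue positions. The sketch below assumes 1-based positions and a made-up example sequence; the function name and the `residue` parameter are illustrative, with the default following the serine-based description in the preceding paragraph.

```python
def negative_sites(sequence, phosphorylated_positions, residue="S"):
    """Return 1-based positions of `residue` that are not annotated as phosphorylated."""
    annotated = set(phosphorylated_positions)
    return [
        position
        for position, amino_acid in enumerate(sequence, start=1)
        if amino_acid == residue and position not in annotated
    ]

# Hypothetical sequence with serines at positions 2, 4 and 6; position 4 is annotated.
sequence = "MSASTSPLK"
negatives = negative_sites(sequence, phosphorylated_positions=[4])
print(negatives)
```

Every position returned here would then be encoded in the same way as the positive sites before being used as a negative sample.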
Preferably, the training set and the test set are constructed by five-fold cross-validation. The data set sizes and the construction method for training and testing the random forest model are as follows. The input and output data positive samples and the input and output data negative samples are each randomly divided into five equal parts, so that each part accounts for 20% of the entire positive and negative sample data set. One part of the positive samples and one part of the negative samples are randomly selected to form the test set, and the remaining four parts of the positive samples and four parts of the negative samples form the training set. This random selection is repeated five times, so that every part of the positive samples and every part of the negative samples is used in the test set exactly once and in the training set four times. The predictive ability of the random forest model is evaluated using overall prediction accuracy, sensitivity, specificity, the Matthews correlation coefficient and the area under the receiver operating characteristic curve.
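The partitioning step can be sketched as follows. This is an illustration only: the sample representation, the fixed `seed`, and the helper names are assumptions, and the positive and negative samples are split separately so that each test set contains one part of each, as described above.

```python
import random

def split_into_folds(samples, n_folds, rng):
    """Shuffle the samples and deal them into n_folds parts of near-equal size."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]

def five_fold_splits(positives, negatives, seed=0):
    """Yield (train, test) pairs; each positive and negative part is the test part once."""
    rng = random.Random(seed)
    pos_folds = split_into_folds(positives, 5, rng)
    neg_folds = split_into_folds(negatives, 5, rng)
    for k in range(5):
        test = pos_folds[k] + neg_folds[k]
        train = [
            sample
            for i in range(5)
            if i != k
            for sample in pos_folds[i] + neg_folds[i]
        ]
        yield train, test

# Placeholder samples; real samples would be feature vectors with labels.
positives = [("pos", i) for i in range(10)]
negatives = [("neg", i) for i in range(10)]
splits = list(five_fold_splits(positives, negatives))
for train, test in splits:
    print(len(train), len(test))
```

With ten samples per class, each of the five rounds uses 16 samples for training and 4 for testing, and no sample is tested twice.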
In one embodiment, the number of trees in the random forest is set to 100, and the number of features randomly selected at each tree node is the square root of the total number of features, rounded to an integer. The random forest model is constructed by five-fold cross-validation, and its predictive ability is assessed by overall accuracy, sensitivity, specificity, the Matthews correlation coefficient and the area under the receiver operating characteristic curve. The results are shown in Table 1. The number of kernel principal components starts at 5 and increases in steps of 5 up to 100; as the number increases, evaluation parameters such as the overall prediction accuracy fluctuate within a small range. When the number of kernel principal components is set to 65, the random forest model achieves its highest overall prediction accuracy, 93.21%, with a corresponding sensitivity, specificity, Matthews correlation coefficient and area under the receiver operating characteristic curve of 96.82%, 89.61%, 0.8665 and 0.9835 respectively, all of which are relatively high. The final random forest model is therefore the model constructed with 65 kernel principal components, and it is used to identify potential protein phosphorylation sites.
Table 1. Five-fold cross-validation results of the random forest model for different numbers of kernel principal components
As shown in Figure 2, a protein phosphorylation site recognition system comprises:
A retrieval module, for obtaining the amino acid sequence fragment of a protein phosphorylation site to be identified;
A first vector acquisition module, for performing logical operations on the binary codes of the amino acids in the amino acid sequence fragment to obtain the logical binary feature vector of the amino acid sequence fragment;
A second vector acquisition module, for performing kernel principal component analysis on the logical binary feature vector according to a preset kernel function to obtain a kernel principal component logical binary feature vector;
A recognition module, for inputting the kernel principal component logical binary feature vector into a random forest model for processing to obtain the recognition result for the protein phosphorylation site.
As can be seen, the content of the above method embodiment applies to this system embodiment; the functions implemented by this system embodiment are the same as those of the method embodiment, and the beneficial effects achieved are likewise the same.
As shown in Figure 3, an embodiment of the invention also provides a protein phosphorylation site recognition device, comprising:
At least one processor;
At least one memory, for storing at least one program;
When the at least one program is executed by the at least one processor, the at least one processor implements the protein phosphorylation site recognition method.
As can be seen, the content of the above method embodiment applies to this device embodiment; the functions implemented by this device embodiment are the same as those of the method embodiment, and the beneficial effects achieved are likewise the same.
In addition, an embodiment of the invention also provides a storage medium storing processor-executable instructions which, when executed by a processor, carry out the protein phosphorylation site recognition method. Likewise, the content of the above method embodiment applies to this storage medium embodiment; the functions implemented by this storage medium embodiment are the same as those of the method embodiment, and the beneficial effects achieved are likewise the same.
As shown in Figure 4, an embodiment of the invention also provides a protein phosphorylation site recognition system, comprising an amino acid sequence acquisition device and a computer device connected to the amino acid sequence acquisition device; wherein,
The amino acid sequence acquisition device is used for acquiring the amino acid sequence fragment corresponding to a protein phosphorylation site to be identified;
The computer device comprises:
At least one processor;
At least one memory, for storing at least one program;
When the at least one program is executed by the at least one processor, the at least one processor implements the protein phosphorylation site recognition method.
Specifically, the computer device may be any of various types of electronic equipment, including but not limited to terminals such as desktop computers and laptop computers.
As can be seen, the content of the above method embodiment applies to this system embodiment; the functions implemented by this system embodiment are the same as those of the method embodiment, and the beneficial effects achieved are likewise the same.
The above describes preferred implementations of the present invention, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent variations or substitutions without departing from the spirit of the invention, and all such equivalent variations or substitutions fall within the scope defined by the claims of the present application.
Claims (10)
1. A protein phosphorylation site recognition method, characterized by comprising the following steps:
Obtaining the amino acid sequence fragment of a protein phosphorylation site to be identified;
Performing logical operations on the binary codes of the amino acids in the amino acid sequence fragment to obtain the logical binary feature vector of the amino acid sequence fragment;
Performing kernel principal component analysis on the logical binary feature vector according to a preset kernel function to obtain a kernel principal component logical binary feature vector;
Inputting the kernel principal component logical binary feature vector into a random forest model for processing to obtain the recognition result for the protein phosphorylation site.
2. The protein phosphorylation site recognition method according to claim 1, characterized in that performing logical operations on the binary codes of the amino acids in the amino acid sequence fragment to obtain the logical binary feature vector of the amino acid sequence fragment comprises the following steps:
Performing pairwise logical AND operations on the binary codes of the amino acids in the amino acid sequence fragment to obtain a first feature vector set;
Performing pairwise logical OR operations on the binary codes of the amino acids in the amino acid sequence fragment to obtain a second feature vector set;
Performing pairwise logical XOR operations on the binary codes of the amino acids in the amino acid sequence fragment to obtain a third feature vector set;
Concatenating the first feature vector set, the second feature vector set and the third feature vector set end to end to obtain the logical binary feature vector of the amino acid sequence fragment.
3. The protein phosphorylation site recognition method according to claim 1, characterized in that performing kernel principal component analysis on the logical binary feature vector according to the preset kernel function to obtain the kernel principal component logical binary feature vector comprises the following steps:
Mapping the logical binary feature vector to a high-dimensional space according to the preset kernel function to obtain a kernel matrix in the high-dimensional space;
Calculating the eigenvalues of the kernel matrix and the eigenvectors corresponding to the eigenvalues;
Selecting the eigenvectors corresponding to the k largest eigenvalues and concatenating them end to end to obtain the kernel principal component logical binary feature vector of the amino acid sequence fragment.
4. The protein phosphorylation site recognition method according to claim 3, characterized in that the kernel function is a Gaussian kernel function.
5. The protein phosphorylation site recognition method according to claim 1, characterized in that the construction method of the random forest model comprises the following steps:
According to the protein amino acid sequences corresponding to the protein phosphorylation sites in the data bank, obtaining the kernel principal component logical binary feature vectors corresponding to the protein phosphorylation sites as input data positive samples, and taking the protein phosphorylation site information as output data positive samples;
Obtaining from the data bank the protein amino acid sequences corresponding to non-phosphorylation sites, obtaining the kernel principal component logical binary feature vectors corresponding to the non-phosphorylation sites as input data negative samples, and taking the non-phosphorylation site information as output data negative samples;
Selecting part of the input data positive samples, input data negative samples, output data positive samples and output data negative samples to train the random forest model;
Selecting the remaining input data positive samples and input data negative samples, together with the corresponding outputs, to test the random forest model.
6. The protein phosphorylation site recognition method according to claim 5, characterized in that the non-phosphorylation sites are obtained as follows:
Searching for all lysine residues in the protein sequence that contains the protein phosphorylation site;
When a lysine residue is determined not to be a protein phosphorylation site annotated in the data bank, labeling it as a non-phosphorylation site.
7. A protein phosphorylation site recognition system, characterized by comprising:
A retrieval module, for obtaining the amino acid sequence fragment of a protein phosphorylation site to be identified;
A first vector acquisition module, for performing logical operations on the binary codes of the amino acids in the amino acid sequence fragment to obtain the logical binary feature vector of the amino acid sequence fragment;
A second vector acquisition module, for performing kernel principal component analysis on the logical binary feature vector according to a preset kernel function to obtain a kernel principal component logical binary feature vector;
A recognition module, for inputting the kernel principal component logical binary feature vector into a random forest model for processing to obtain the recognition result for the protein phosphorylation site.
8. A protein phosphorylation site recognition device, characterized by comprising:
At least one processor;
At least one memory, for storing at least one program;
When the at least one program is executed by the at least one processor, the at least one processor implements the protein phosphorylation site recognition method according to any one of claims 1 to 6.
9. A storage medium storing processor-executable instructions, characterized in that the processor-executable instructions, when executed by a processor, carry out the protein phosphorylation site recognition method according to any one of claims 1 to 6.
10. A protein phosphorylation site recognition system, characterized by comprising an amino acid sequence acquisition device and a computer device connected to the amino acid sequence acquisition device; wherein,
The amino acid sequence acquisition device is used for acquiring the amino acid sequence fragment corresponding to a protein phosphorylation site to be identified;
The computer device comprises:
At least one processor;
At least one memory, for storing at least one program;
When the at least one program is executed by the at least one processor, the at least one processor implements the protein phosphorylation site recognition method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910569671.2A CN110349628B (en) | 2019-06-27 | 2019-06-27 | Protein phosphorylation site recognition method, system, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110349628A (en) | 2019-10-18 |
CN110349628B (en) | 2021-06-15 |
Family
ID=68176723
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910569671.2A Active CN110349628B (en) | 2019-06-27 | 2019-06-27 | Protein phosphorylation site recognition method, system, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110349628B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN110349628B (en) | 2021-06-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||