CN112397138A - AI technology-based method for drawing strain protein two-dimensional spectrum - Google Patents
- Publication number
- CN112397138A (application number CN202010995311.1A)
- Authority
- CN
- China
- Prior art keywords
- protein
- strain
- dimensional spectrum
- music
- dimensional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses an AI-based method for drawing a two-dimensional spectrum (a two-dimensional music score) of strain proteins. Exploiting the parallels between the sequence and structure of strain proteins and the expressive form of music, the method uses AI technology to generate a two-dimensional music score from the structure of a strain protein, establishing a one-to-one correspondence between strain protein sequences and music to assist in the analysis and study of strain proteins. Once a strain protein is expressed as a two-dimensional spectrum, differences between strain proteins can be seen directly in their spectra; the spectra can also be played as music, so the differences can be perceived by ear as well. The method thus offers a new approach to strain protein research.
Description
Technical Field
The invention relates to the technical field of artificial intelligence (AI) applications, and in particular to a method for drawing a two-dimensional spectrum of strain proteins based on AI technology.
Background
In the life sciences, AI technology has gradually become an indispensable tool for data analysis. Proteins, as important components of living organisms, combine sequence diversity with complex functional structures, so protein research remains a field that scientists have yet to fully master.
At present, proteins are mainly characterized by their amino acid sequence, spatial structure and similar descriptors. Whether proteins can also be characterized in other forms that improve their visualization and make them easier to analyze has become a focus of research.
Disclosure of Invention
In view of the above, the invention provides an AI-based method for drawing a two-dimensional spectrum of strain proteins. The method uses AI technology to characterize strain proteins as two-dimensional spectra, so that different strain proteins correspond to different pieces of music. This improves the visualization of proteins and supports their analysis and study both visually and aurally.
The technical solution provided by the invention is a method for drawing a two-dimensional spectrum of strain proteins based on AI technology, characterized by comprising the following steps:
S1: acquiring the primary structure and secondary structure of a strain protein sample;
S2: treating the amino acid sequence in the primary structure of the strain protein sample as a linear arrangement to form one-dimensional single-channel data;
S3: projecting the four backbone (main-chain) atoms in the secondary structure of the strain protein sample onto each coordinate plane of three-dimensional space to form three-channel data of the backbone atoms;
S4: constructing a protein two-dimensional spectrum generation model based on a generative adversarial network (GAN), and training the model on a plurality of strain protein samples, with a music style and a protein sequence as constraint conditions, to obtain the model parameters;
S5: drawing the two-dimensional spectrum of a strain protein with the protein two-dimensional spectrum generation model, using the model parameters obtained in step S4.
Preferably, in step S2, treating the amino acid sequence in the primary structure of the strain protein sample as a linear arrangement to form one-dimensional single-channel data specifically comprises:
setting the values of the 20 amino acids that make up proteins to s1 to s20, within the image gray-value range 0-255;
and forming the one-dimensional single-channel data from the order of the amino acids in the strain protein sample and their corresponding values.
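As a minimal sketch of step S2, the encoding below assumes that the gray values s1 to s20 are spaced evenly across 0-255 in alphabetical order of the one-letter amino acid codes; the patent fixes neither the values nor the order, and the names `GRAY_VALUES` and `encode_primary` are hypothetical.

```python
# Hypothetical gray values s1..s20 for the 20 standard amino acids.
# The patent only states that the values lie in the image gray range 0-255;
# even spacing in alphabetical one-letter-code order is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

GRAY_VALUES = {
    aa: round(i * 255 / (len(AMINO_ACIDS) - 1))
    for i, aa in enumerate(AMINO_ACIDS)
}


def encode_primary(sequence: str) -> list[int]:
    """Map a one-letter amino acid sequence to 1-D single-channel gray data."""
    try:
        return [GRAY_VALUES[aa] for aa in sequence.upper()]
    except KeyError as exc:
        raise ValueError(f"non-standard residue: {exc.args[0]}") from None


if __name__ == "__main__":
    print(encode_primary("MFVFLV"))
```

Even spacing keeps the 20 values as far apart as possible within one byte, which is one reasonable way to keep neighbouring residues distinguishable in a gray image.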
Further preferably, in step S3, projecting the four backbone atoms in the secondary structure of the strain protein sample onto each coordinate plane of three-dimensional space to form three-channel data of the backbone atoms specifically comprises:
setting the values of the backbone atoms Cα, C, N and O to k1, k2, k3 and k4, respectively, within the image gray-value range 0-255;
and projecting the four backbone atoms onto each coordinate plane of three-dimensional space to form a three-channel gray image of the backbone atom distribution; this data is the three-channel data.
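The projection in step S3 can be sketched as below, under several assumptions not stated in the patent: backbone coordinates arrive as hypothetical `(atom_name, x, y, z)` tuples in ångströms, the three channels are the XY, YZ and XZ planes, the grid is square with a fixed resolution, the gray values in `BACKBONE_GRAY` are placeholders for k1 to k4, and a later atom overwrites an earlier one on the same pixel.

```python
from typing import Iterable

# Hypothetical gray values k1..k4 for the backbone atoms CA, C, N and O.
BACKBONE_GRAY = {"CA": 200, "C": 150, "N": 100, "O": 50}

# Each channel is the projection onto one coordinate plane: (row_axis, col_axis).
PLANES = ((0, 1), (1, 2), (0, 2))  # XY, YZ, XZ


def project_backbone(
    atoms: Iterable[tuple[str, float, float, float]],
    size: int = 64,
    resolution: float = 1.0,
) -> list[list[list[int]]]:
    """Return a 3 x size x size gray image, one channel per coordinate plane.

    Coordinates are shifted so the minimum on each axis maps to pixel 0 and
    divided by `resolution` (angstroms per pixel). Background pixels are 0,
    non-backbone atoms are ignored, and atoms falling outside the grid are
    dropped.
    """
    atoms = [a for a in atoms if a[0] in BACKBONE_GRAY]
    image = [[[0] * size for _ in range(size)] for _ in PLANES]
    if not atoms:
        return image
    mins = [min(a[i + 1] for a in atoms) for i in range(3)]
    for name, *xyz in atoms:
        pix = [int((xyz[i] - mins[i]) / resolution) for i in range(3)]
        for channel, (u, v) in enumerate(PLANES):
            row, col = pix[u], pix[v]
            if row < size and col < size:
                image[channel][row][col] = BACKBONE_GRAY[name]
    return image
```

Shifting by the per-axis minimum makes every pixel index non-negative; a real implementation would also need to choose a grid size large enough for the protein and decide how to handle overlapping atoms.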
Further preferably, the plurality of strain protein samples comprises natural strain protein samples and strain protein samples augmented by a generative adversarial network.
Further preferably, in step S4, the protein two-dimensional spectrum generation model constructed based on the generative adversarial network comprises:
a two-dimensional spectrum generator, a music discriminator, a music style discriminator, a protein inverse generator and a protein discriminator;
the training process of the protein two-dimensional spectrum generation model comprises:
S401: inputting the single-channel data of the primary structure of the strain protein sample, the three-channel data of its secondary structure and the single-channel music style constraint data into the two-dimensional spectrum generator, which generates a two-dimensional spectrum and outputs a piece of music;
S402: evaluating, with the music discriminator, the difference between the music generated by the two-dimensional spectrum generator and real music;
S403: judging, with the music style discriminator, whether the generated music conforms to the specified style constraint;
S404: adjusting the model parameters of the two-dimensional spectrum generator and the music discriminator according to the results of steps S402 and S403 until they meet the threshold requirement;
S405: generating an artificial protein sequence with the protein inverse generator, taking as inputs the two-dimensional spectrum produced by the two-dimensional spectrum generator and the protein sequence constraint;
S406: evaluating, with the protein discriminator, the difference between the artificial protein sequence and the real protein sequence; if the difference exceeds a threshold, adjusting the model parameters of the two-dimensional spectrum generator and the music discriminator and repeating steps S401 to S405 until the difference meets the threshold requirement.
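The control flow of steps S401 to S406 can be sketched as the nested loop below. It is schematic only: the five networks are hypothetical callables passed in as arguments, and it assumes (beyond the text) that each discriminator returns a non-negative "gap" score where 0 means indistinguishable, and that a hypothetical `adjust(stage)` callback updates the parameters of the generator and music discriminator in place.

```python
from typing import Any, Callable


def train_spectrum_model(
    x1: Any, c: Any, seq_constraint: Any, real_music: Any,
    g1: Callable, d1: Callable, d2: Callable, f1: Callable, d3: Callable,
    adjust: Callable[[str], None],
    music_tol: float, protein_tol: float, max_iters: int = 1000,
) -> Any:
    """Schematic S401-S406 loop; returns the spectrum Y that passed all checks."""
    for _ in range(max_iters):
        # Inner loop, S401-S404: tune G1 and D1 until the music checks pass.
        for _ in range(max_iters):
            y = g1(x1, c)                       # S401: generate spectrum Y
            music_gap = d1(y, real_music)       # S402: D1 vs. real music
            style_gap = d2(y, c)                # S403: D2 style check
            if music_gap <= music_tol and style_gap <= music_tol:
                break
            adjust("music")                     # S404: tune G1 and D1
        else:
            raise RuntimeError("music checks did not converge")
        z = f1(y, seq_constraint)               # S405: artificial protein
        if d3(z, x1) <= protein_tol:            # S406: D3 vs. real protein
            return y
        adjust("protein")                       # S406: tune G1, D1, then repeat
    raise RuntimeError("protein check did not converge")
```

The outer loop mirrors "repeating steps S401 to S405": a failed protein check sends training back to spectrum generation rather than only retraining the inverse generator.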
Further preferably, the first layer of the two-dimensional spectrum generator is a mixed convolution layer made up of single-channel and three-channel inputs, with separate branches convolving the different input data: for the input protein primary structure data, 20 types of 3 × 1 convolution kernels with a stride of 3 are used, according to the distribution characteristics of the amino acids; for the input protein secondary structure data, 4 types of 3 × 3 × 3 convolution kernels with a stride of 3 are set, one per backbone atom, according to the three-dimensional distribution characteristics of the amino acids; for the music style constraint data, m 3 × 1 convolution kernels with a stride of 1 are used, corresponding to the music style constraint;
the intermediate layers follow the CycleGAN model but, to retain as many features as possible, use no pooling layers, and every layer uses the Leaky ReLU (LReLU) activation function;
the output layer synthesizes the result with Softmax, completing the drawing of the two-dimensional spectrum.
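To make the layer description concrete, the plain-Python sketch below implements the operations it names: an unpadded strided 1-D convolution (3 × 1 kernel, stride 3, as in the primary-structure branch), Leaky ReLU and Softmax. The kernel weights, the 0.01 negative slope and the absence of padding are assumptions; this illustrates the operations, not the patented network.

```python
import math


def conv1d(signal: list[float], kernel: list[float], stride: int) -> list[float]:
    """Valid (unpadded) 1-D cross-correlation with the given stride."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def out_length(n: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded convolution: floor((n - k) / s) + 1."""
    return (n - kernel) // stride + 1


def leaky_relu(values: list[float], slope: float = 0.01) -> list[float]:
    """Keep positive values; scale negative values by `slope`."""
    return [v if v > 0 else slope * v for v in values]


def softmax(values: list[float]) -> list[float]:
    """Normalize values to probabilities that sum to 1."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```

With a kernel width equal to the stride, each residue contributes to exactly one output, so a 30-residue primary sequence gives out_length(30, 3, 3) = 10 outputs per kernel, and the 20 kernels together yield a 20 × 10 feature map.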
Further preferably, the objective function of the protein two-dimensional spectrum generation model is:
$$T = \min\left(G + L_1 + L_2 + F + L_3\right)$$
where G is the objective function of the two-dimensional spectrum generator G1:
$$G(X_1, C) = \max\, \mathbb{E}_p\left[f_p(X_1, C)\right]$$
L1 is the objective function of the music discriminator D1:
$$L_1(D, G) = \min_G \max_D \left( \mathbb{E}_x\left[\log D(X_2, C)\right] + \mathbb{E}_y\left[\log\left(1 - D(G(X_1, C))\right)\right] \right)$$
L2 is the objective function of the music style discriminator D2:
$$L_2(Y, C) = \max\, \mathbb{E}_p\left[f_p(Y, C)\right]$$
F is the objective function of the protein inverse generator F1:
$$F(Y, L) = \max\, \mathbb{E}_p\left[f_p(Y, L, X_3)\right]$$
L3 is the objective function of the protein discriminator D3:
$$L_3(Z, X_1) = \max\, \mathbb{E}_p\left[f_p(Z, X_1)\right]$$
where X1 is the mixed-channel data of the protein primary and secondary structures, Z is the artificial protein sequence generated by F1, X2 is real music data, X3 is the discrimination result for Z, C is the music style constraint, L is the protein sequence feature constraint, and Y is the two-dimensional spectrum music data generated by the two-dimensional spectrum generator G1.
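The adversarial term L1 has the standard conditional-GAN form. As a numerical illustration, if the discriminator outputs are probabilities and the expectations are approximated by batch means (an assumption; the text does not say how they are estimated), the value inside the min-max can be computed as follows. The function name `adversarial_value` is hypothetical.

```python
import math


def adversarial_value(d_real: list[float], d_fake: list[float]) -> float:
    """Sample estimate of E[log D(X2, C)] + E[log(1 - D(G(X1, C)))].

    d_real: D1 outputs on real music X2; d_fake: D1 outputs on generated Y.
    D1 is trained to maximize this value and G1 to minimize it.
    """
    if not d_real or not d_fake:
        raise ValueError("need at least one real and one generated sample")
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term
```

When D1 outputs 0.5 for every input, the value is 2 log 0.5 = -log 4 ≈ -1.386, the equilibrium of the min-max game in which D1 can no longer tell generated spectra from real music; a stronger discriminator pushes the value toward 0.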
Further preferably, the strain protein is a novel coronavirus protein.
The AI-based method for drawing two-dimensional spectra of strain proteins exploits the parallels between the sequence and structure of strain proteins and the expressive form of music to generate a two-dimensional music score from the structure of a strain protein, establishing a one-to-one correspondence between strain protein sequences and music to assist in their analysis and study. Once a strain protein is expressed as a two-dimensional spectrum, differences between strain proteins can be seen directly in their spectra, and because the spectra can be played as music, the differences can also be perceived by ear. The method thus offers a new approach to strain protein research.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as disclosed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below; those skilled in the art can obviously derive other drawings from these without creative effort.
FIG. 1 is a flow block diagram of an AI-based method for drawing a two-dimensional spectrum of strain proteins according to an embodiment of the invention;
FIG. 2 is a detailed flow diagram of the method according to an embodiment of the invention;
FIG. 3 is a diagram of the protein two-dimensional spectrum generation model built on a generative adversarial network in the method according to an embodiment of the invention;
FIG. 4 is a training flowchart of the protein two-dimensional spectrum generation model in the method according to an embodiment of the invention;
FIG. 5 is a schematic structural diagram of the two-dimensional spectrum generator G1 in the method according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of methods consistent with certain aspects of the invention, as detailed in the appended claims.
To characterize strain proteins from another perspective and thereby assist protein analysis and study, this embodiment provides an AI-based method for drawing a two-dimensional spectrum of strain proteins. The basic building blocks of strain proteins are generally the 20 amino acids, while the basic units of music are the seven notes of the scale; although the two sets differ in size, their elements can be matched by designing a suitable mapping.
Just as the 20 amino acids combine in different orders to form the primary structure, and proteins then adopt various spatial conformations through covalent and non-covalent bonds to form biological macromolecules of diverse shapes and functions, music forms basic tunes from different permutations and combinations of scale notes. These are then combined with rhythm, harmony, dynamics, timbre, musical form, texture and tonality to produce styles and melodies with distinct characteristics, giving listeners different sensory experiences.
Given these parallels between the sequence and structure of strain proteins and the representational form of music, AI technology can be used to generate a two-dimensional score from the structure of a strain protein, thereby establishing a relationship between the protein sequence of the novel coronavirus and music.
The AI-based method for drawing two-dimensional spectra of strain proteins mainly comprises the following: building training and test data sets for generating two-dimensional music scores from viral proteins and, because strain protein samples are scarce, augmenting the viral samples with a generative adversarial network; designing a mapping between amino acid structure and musical structure, establishing the correlation between the different representations, and building a two-dimensional spectrum music generation method based on generative adversarial network technology; feeding the primary and secondary protein structures and the music style into the generator as constraints to generate a two-dimensional spectrum that conforms to a specific music style, from which a protein conforming to the structure of the novel coronavirus can be generated by the protein generator; and sending the generated music as new input to the music discriminator and to the protein generator, respectively, where the music discriminator judges whether the generated music follows the rules of composition, and the protein generator generates protein-like secondary and tertiary structures and compares them with the original protein to ensure the correlation between the music and the protein.
Referring to FIG. 1 and to FIG. 2, which shows the overall framework flow, the AI-based method for drawing a two-dimensional spectrum of strain proteins provided in this embodiment specifically comprises the following steps:
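One possible mapping, shown only to illustrate that 20 amino acids can be matched to a seven-note scale, gives each amino acid a scale degree and an octave. The alphabetical ordering, the C major scale, the starting octave and the function names are all assumptions, not the mapping used in the patent.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SCALE = ["C", "D", "E", "F", "G", "A", "B"]  # C major: seven scale degrees
BASE_OCTAVE = 3  # hypothetical starting octave


def amino_acid_to_note(aa: str) -> str:
    """Map one amino acid to a note name with octave, e.g. 'C3'.

    The i-th amino acid takes scale degree i % 7 in octave BASE_OCTAVE + i // 7,
    so the 20 amino acids span just under three octaves (7 + 7 + 6 notes).
    """
    if len(aa) != 1 or aa.upper() not in AMINO_ACIDS:
        raise ValueError(f"not a standard amino acid: {aa!r}")
    i = AMINO_ACIDS.index(aa.upper())
    return f"{SCALE[i % 7]}{BASE_OCTAVE + i // 7}"


def sequence_to_melody(sequence: str) -> list[str]:
    """Translate a one-letter amino acid sequence into a list of note names."""
    return [amino_acid_to_note(aa) for aa in sequence]
```

Using the octave to absorb the surplus of 20 over 7 keeps the mapping one-to-one, so the original sequence can be recovered from the melody.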
S1: acquiring the primary structure and secondary structure of a strain protein sample;
S2: treating the amino acid sequence in the primary structure of the strain protein sample as a linear arrangement to form one-dimensional single-channel data;
S3: projecting the four backbone (main-chain) atoms in the secondary structure of the strain protein sample onto each coordinate plane of three-dimensional space to form three-channel data of the backbone atoms;
S4: constructing a protein two-dimensional spectrum generation model based on a generative adversarial network (GAN), and training the model on a plurality of strain protein samples, with a music style and a protein sequence as constraint conditions, to obtain the model parameters;
S5: drawing the two-dimensional spectrum of a strain protein with the protein two-dimensional spectrum generation model, using the model parameters obtained in step S4.
In the above method, the primary structure of the strain protein sample, the secondary structure of the strain protein sample, the musical style constraint and the protein sequence constraint are used as input data.
Specifically:
Primary structure data of strain protein samples: the primary amino acid sequence of the protein data is taken as input. Because the primary structure of a protein is formed by amino acids joined end to end in a linear chain through dehydration condensation, the amino acid sequence in the primary structure is treated as a linear arrangement, and the data form one-dimensional single-channel data, that is, one-dimensional data. The values of the 20 amino acids that make up proteins are set to s1 to s20 within the image gray-value range 0-255.
Secondary structure data of strain protein samples: because of α-helices and β-sheets, and of the amino acids attached to the main chain through secondary links, the secondary structure of a protein sequence is distributed in three-dimensional space. Neglecting the secondary-link amino acids, which have little influence on the spatial characteristics, the values of the backbone atoms Cα, C, N and O are set to k1, k2, k3 and k4 within the image gray-value range 0-255, representing k1%, k2%, k3% and k4% black, respectively. Projecting the four backbone atoms onto each coordinate plane of three-dimensional space forms a three-channel gray image of the backbone atom distribution; the data are three-channel data, that is, three-dimensional data.
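The text gives k1 to k4 both as gray values and as percentages of black. Under the common 8-bit convention in which 0 is black and 255 is white (an assumption), the two readings convert as sketched below; the function names are hypothetical.

```python
def percent_black_to_gray(percent: float) -> int:
    """Convert a blackness percentage (0-100) to an 8-bit gray value.

    Assumes 0 = black and 255 = white, so 0 % black is white (255)
    and 100 % black is black (0).
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be within 0-100")
    return round(255 * (1 - percent / 100))


def gray_to_percent_black(gray: int) -> float:
    """Inverse conversion: an 8-bit gray value to a blackness percentage."""
    if not 0 <= gray <= 255:
        raise ValueError("gray must be within 0-255")
    return 100 * (1 - gray / 255)
```

Note that Python's `round` uses round-half-to-even, so 50 % black (127.5) becomes 128.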
Music style constraint: the tonality, chords and rhythm characteristic of different music styles are combined into the music style constraint according to the category of musical work, the rules of music composition and the like.
Protein sequence constraint: comprehensive features abstracted from the one-dimensional and three-dimensional properties of the amino acid sequences in the primary and secondary structures of the novel coronavirus serve as the protein sequence constraint of the model.
Training the protein two-dimensional spectrum generation model requires a large amount of strain protein sample data, and for some viruses, such as the novel coronavirus, the available samples are too few to meet this requirement. The strain protein samples can then be augmented artificially to meet the training requirement of the model. Several augmentation methods are possible; in this scheme, the strain protein samples are augmented by deep learning with a generative adversarial network.
Referring to FIG. 3, the protein two-dimensional spectrum generation model provided in this embodiment is built on a generative adversarial model. Taking the two-dimensional score generated from a strain protein and music of a specific style as the objects of study, a protein-to-music generation model is designed, comprising the two-dimensional spectrum generator G1, the music discriminator D1, the music style discriminator D2, the protein inverse generator F1 and the protein discriminator D3. Given the primary and secondary protein structures X1, the music style constraint C and the protein sequence constraint L as inputs, the model generates a two-dimensional spectrum and outputs a musical composition.
Referring to FIG. 4, the training process of the protein two-dimensional spectrum generation model comprises:
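The text does not specify how tonality, chords and rhythm are encoded into the single-channel constraint C. One plausible encoding, sketched here with made-up category lists, concatenates a one-hot vector for each attribute into a single vector; the vocabularies and function names are hypothetical.

```python
# Hypothetical vocabularies; the patent does not list the categories.
TONALITIES = ["major", "minor"]
CHORD_STYLES = ["triad", "seventh", "power"]
RHYTHMS = ["4/4", "3/4", "6/8"]


def one_hot(value: str, vocabulary: list[str]) -> list[float]:
    """Return a vector with 1.0 at the position of `value`, 0.0 elsewhere."""
    if value not in vocabulary:
        raise ValueError(f"{value!r} not in {vocabulary}")
    return [1.0 if v == value else 0.0 for v in vocabulary]


def encode_style(tonality: str, chord: str, rhythm: str) -> list[float]:
    """Concatenate the one-hot codes into one single-channel style vector C."""
    return (
        one_hot(tonality, TONALITIES)
        + one_hot(chord, CHORD_STYLES)
        + one_hot(rhythm, RHYTHMS)
    )
```

A fixed-length vector like this can be fed to the 3 × 1 kernels of the style branch alongside the protein channels.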
S401: inputting the mixed-channel data X1, formed from the single-channel data of the primary structure and the three-channel data of the secondary structure of the strain protein sample, together with the single-channel music style constraint data C, into the two-dimensional spectrum generator G1, which generates a two-dimensional spectrum and outputs a piece of music;
S402: evaluating, with the music discriminator D1, the difference between the music generated by G1 and real music;
S403: judging, with the music style discriminator D2, whether the generated music conforms to the specified style constraint, so that the music style generated by G1 stays consistent with the specific viral disease sequence;
S404: adjusting the model parameters of G1 and D1 according to the results of steps S402 and S403 until they meet the threshold requirement;
S405: generating an artificial protein sequence X3 with the protein inverse generator F1, taking as inputs the two-dimensional spectrum generated by G1 and the protein sequence constraint L;
S406: evaluating, with the protein discriminator D3, the difference between the artificial protein sequence X3 and the real protein sequence X1; if the difference exceeds a threshold, adjusting the model parameters of G1 and D1 and repeating steps S401 to S405 until the difference meets the threshold requirement.
The input to the two-dimensional spectrum generator G1 is mixed multi-channel data X1 + C, made up of the mixed-channel data X1 (the single-channel primary-structure data plus the three-channel secondary-structure data) and the single-channel music style constraint data C.
Referring to FIG. 5, which shows the model structure of the two-dimensional spectrum generator G1, the first layer of G1 is a mixed convolution layer made up of single-channel and three-channel inputs, corresponding to the protein primary structure, the music style constraint and the protein secondary spatial structure, which collects the primary features of the viral protein. Separate branches convolve the different input data. To extract amino acid features from the input protein primary structure data, 20 types of 3 × 1 convolution kernels with a stride of 3 are used, according to the distribution characteristics of the amino acids; for the input protein secondary structure data, 4 types of 3 × 3 × 3 convolution kernels with a stride of 3 are set, one per backbone atom, according to the three-dimensional distribution characteristics of the amino acids; for the music style constraint data, m 3 × 1 convolution kernels with a stride of 1 are used, corresponding to the music style constraint;
the intermediate layers follow the CycleGAN model but, to retain as many features as possible, use no pooling layers, and every layer uses the Leaky ReLU (LReLU) activation function;
the output layer synthesizes the result with Softmax, completing the drawing of the two-dimensional spectrum.
The objective function of the protein two-dimensional spectrum generation model is:
$$T = \min\left(G + L_1 + L_2 + F + L_3\right)$$
where G is the objective function of the two-dimensional spectrum generator G1:
$$G(X_1, C) = \max\, \mathbb{E}_p\left[f_p(X_1, C)\right]$$
L1 is the objective function of the music discriminator D1:
$$L_1(D, G) = \min_G \max_D \left( \mathbb{E}_x\left[\log D(X_2, C)\right] + \mathbb{E}_y\left[\log\left(1 - D(G(X_1, C))\right)\right] \right)$$
L2 is the objective function of the music style discriminator D2:
$$L_2(Y, C) = \max\, \mathbb{E}_p\left[f_p(Y, C)\right]$$
F is the objective function of the protein inverse generator F1:
$$F(Y, L) = \max\, \mathbb{E}_p\left[f_p(Y, L, X_3)\right]$$
L3 is the objective function of the protein discriminator D3:
$$L_3(Z, X_1) = \max\, \mathbb{E}_p\left[f_p(Z, X_1)\right]$$
where X1 is the mixed-channel data of the protein primary and secondary structures, Z is the artificial protein sequence generated by F1, X2 is real music data, X3 is the discrimination result for Z, C is the music style constraint, L is the protein sequence feature constraint, and Y is the two-dimensional spectrum music data generated by the two-dimensional spectrum generator G1.
The AI-based method for drawing a two-dimensional spectrum of strain proteins provided in this embodiment is particularly suitable for studying novel coronavirus proteins.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the present invention is not limited to what has been described above, and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
Claims (8)
1. A method for drawing a two-dimensional spectrum of strain proteins based on AI technology, characterized by comprising the following steps:
S1: acquiring the primary structure and secondary structure of a strain protein sample;
S2: treating the amino acid sequence in the primary structure of the strain protein sample as a linear arrangement to form one-dimensional single-channel data;
S3: projecting the four backbone (main-chain) atoms in the secondary structure of the strain protein sample onto each coordinate plane of three-dimensional space to form three-channel data of the backbone atoms;
S4: constructing a protein two-dimensional spectrum generation model based on a generative adversarial network (GAN), and training the model on a plurality of strain protein samples, with a music style and a protein sequence as constraint conditions, to obtain the model parameters;
S5: drawing the two-dimensional spectrum of a strain protein with the protein two-dimensional spectrum generation model, using the model parameters obtained in step S4.
2. The AI-based method for drawing a two-dimensional spectrum of strain proteins as claimed in claim 1, wherein in step S2, treating the amino acid sequence in the primary structure of the strain protein sample as a linear arrangement to form one-dimensional single-channel data specifically comprises:
according to the image gray-value range of 0 to 255, setting the values of the 20 amino acids that make up proteins to s1 to s20;
and forming one-dimensional single-channel data from the order of the amino acids in the strain protein sample and their corresponding values.
3. The AI-based method for drawing a two-dimensional spectrum of strain proteins as claimed in claim 1, wherein in step S3, projecting the four main-chain atoms in the secondary structure of the strain protein sample in each coordinate system of three-dimensional space to form three-channel data of the backbone atoms specifically comprises:
according to the image gray-value range of 0 to 255, setting the values of the backbone atoms Cα, C, N and O to k1, k2, k3 and k4, respectively;
and projecting the four backbone atoms in each coordinate system of three-dimensional space to form a three-channel gray-scale distribution image of the backbone atoms, i.e., the three-channel data.
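Claim 3 gives neither the values k1 to k4 nor the image size, and "each coordinate system" admits several readings. A minimal sketch, assuming the three channels are projections onto the xy, yz and xz planes of a fixed pixel grid and using hypothetical gray values for Cα, C, N and O, could look like this:

```python
# Hypothetical gray values k1..k4 (the claim gives only the 0-255 range).
ATOM_GRAY = {"CA": 64, "C": 128, "N": 192, "O": 255}

# Assumption: one channel per orthogonal plane (xy, yz, xz), given as axis-index pairs.
PLANES = ((0, 1), (1, 2), (0, 2))


def backbone_to_three_channels(atoms, grid=32):
    """atoms: list of (name, (x, y, z)); returns 3 channels, each a grid x grid gray image."""
    coords = [xyz for _, xyz in atoms]
    lo = [min(c[d] for c in coords) for d in range(3)]
    hi = [max(c[d] for c in coords) for d in range(3)]

    def to_pixel(value, d):
        # Normalise the coordinate on axis d into a pixel index 0..grid-1.
        span = (hi[d] - lo[d]) or 1.0  # a flat axis maps every atom to index 0
        return round((value - lo[d]) / span * (grid - 1))

    channels = [[[0] * grid for _ in range(grid)] for _ in PLANES]
    for name, xyz in atoms:
        gray = ATOM_GRAY[name]
        for ch, (a, b) in enumerate(PLANES):
            row, col = to_pixel(xyz[a], a), to_pixel(xyz[b], b)
            # Where atoms overlap after projection, keep the brighter value (arbitrary choice).
            channels[ch][row][col] = max(channels[ch][row][col], gray)
    return channels
```

In practice the coordinates would come from a structure file; here the input is a plain list so the sketch stays self-contained.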
4. The AI-based method for drawing a two-dimensional spectrum of strain proteins as claimed in claim 1, wherein the plurality of strain protein samples comprise natural strain protein samples and strain protein samples augmented by a generative adversarial network.
5. The AI-based method for drawing a two-dimensional spectrum of strain proteins as claimed in claim 1, wherein in step S4, the protein generation two-dimensional spectrum model constructed based on the generative adversarial network comprises:
a two-dimensional spectrum generator (G1), a music generation discriminator (D1), a music style discriminator (D2), a protein inverse generator (F1), and a protein discriminator (D3);
the training process of the protein generation two-dimensional spectrum model comprises the following steps:
S401: inputting the single-channel data of the primary structure of the strain protein sample, the three-channel data of its secondary structure, and the single-channel music style constraint data into the two-dimensional spectrum generator (G1), which generates a two-dimensional spectrum and outputs a musical piece;
S402: determining, by the music discriminator (D1), the difference between the music generated by the two-dimensional spectrum generator (G1) and real music;
S403: determining, by the music style discriminator (D2), whether the generated music complies with the specified style constraint;
S404: according to the discrimination results of steps S402 and S403, adjusting the model parameters of the two-dimensional spectrum generator (G1) and the music discriminator (D1) until they meet the threshold requirement;
S405: generating, by the protein inverse generator (F1), an artificial protein sequence (X3), taking as inputs the two-dimensional spectrum generated by the two-dimensional spectrum generator (G1) and a protein sequence constraint (L);
S406: determining, by the protein discriminator (D3), the difference between the artificial protein sequence (X3) and the real protein sequence (X1); if the difference exceeds a threshold, adjusting the model parameters of the two-dimensional spectrum generator (G1) and the music discriminator (D1) and repeating steps S401 to S405 until the difference between the artificial protein sequence (X3) and the real protein sequence (X1) meets the threshold requirement.
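Claim 5 fixes the order of steps S401 to S406 but not the parameter-update rule or the threshold values. The control flow alone can be sketched as follows; the three callables are hypothetical stand-ins for the forward and backward passes of G1, D1, D2, F1 and D3, and `max_rounds` is an added safeguard that the claim does not mention:

```python
from typing import Callable


def train_protein_spectrum_model(
    music_step: Callable[[], float],    # S401-S403: run G1, D1, D2; return the music-side discrepancy
    update_g1_d1: Callable[[], None],   # S404 / S406: adjust the parameters of G1 and D1
    protein_step: Callable[[], float],  # S405-S406: run F1 and D3; return the protein-side discrepancy
    music_threshold: float,
    protein_threshold: float,
    max_rounds: int = 1000,             # hypothetical cap, not part of the claim
) -> int:
    """Return the number of outer rounds needed before both thresholds were met."""
    for round_idx in range(1, max_rounds + 1):
        # S401-S404: keep adjusting G1 and D1 until the music discrimination meets its threshold.
        # (Assumes the update eventually brings the discrepancy below the threshold.)
        while music_step() > music_threshold:
            update_g1_d1()
        # S405-S406: compare the artificial protein sequence with the real one.
        if protein_step() <= protein_threshold:
            return round_idx
        # Difference still too large: adjust G1 and D1, then repeat from S401.
        update_g1_d1()
    raise RuntimeError("thresholds not met within max_rounds")
```

The nesting mirrors the claim: the music-side check runs to completion inside every outer round, and only the protein-side check decides whether another round is needed.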
6. The AI-based method for drawing a two-dimensional spectrum of strain proteins as claimed in claim 5, wherein the first layer of the two-dimensional spectrum generator (G1) is a mixed convolution layer combining single-channel and three-channel inputs, with separate branches performing the convolution on each type of input data: for the input protein primary structure data, 20 types of 3 × 1 convolution kernels are used according to the amino acid distribution characteristics, with a stride of 3; for the input protein secondary structure data, 4 types of 3 × 3 × 3 convolution kernels corresponding to the main backbone atoms are set according to the three-dimensional distribution characteristics of the amino acids, with a stride of 3; for the music style constraint data, m 3 × 1 convolution kernels corresponding to the music style constraint are used, with a stride of 1;
the intermediate layers follow the CycleGAN model but, to preserve features to the greatest extent, use no pooling layers, and every layer uses the LReLU (Leaky ReLU) activation function;
and the output layer performs synthesis with Softmax, completing the drawing of the two-dimensional spectrum.
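Claim 6 specifies kernel sizes and strides but no input sizes, padding, or LReLU slope. A minimal sketch, assuming unpadded ("valid") convolutions, treating the 3 × 3 × 3 kernel as a 3 × 3 window spanning all three backbone channels, and using a hypothetical negative slope of 0.01, computes each branch's output shape and shows the two activation functions named in the claim:

```python
import math


def conv_out_len(n: int, kernel: int, stride: int) -> int:
    """Length of one axis after an unpadded ("valid") convolution."""
    return (n - kernel) // stride + 1


def first_layer_shapes(seq_len: int, grid: int, style_len: int, m: int) -> dict:
    """Output shape of each branch of the mixed first layer of G1 (sizes are hypothetical inputs)."""
    return {
        # 20 kernels of size 3 x 1, stride 3, over the 1-D primary-structure channel
        "primary": (20, conv_out_len(seq_len, 3, 3)),
        # 4 kernels of size 3 x 3 x 3 (3 x 3 window across all 3 channels), stride 3
        "backbone": (4, conv_out_len(grid, 3, 3), conv_out_len(grid, 3, 3)),
        # m kernels of size 3 x 1, stride 1, over the music-style constraint channel
        "style": (m, conv_out_len(style_len, 3, 1)),
    }


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """LReLU: identity for x >= 0, small linear slope for x < 0 (slope value assumed)."""
    return x if x >= 0 else slope * x


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable Softmax: subtract the maximum before exponentiating."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, a 30-residue sequence yields 10 positions per kernel in the primary branch: with kernel size and stride both equal to 3, that branch reads the sequence in non-overlapping windows of three residues.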
7. The AI-based method for drawing a two-dimensional spectrum of strain proteins as claimed in claim 5, wherein the objective function of the protein generation two-dimensional spectrum model is:
$T = \min(G + L_1 + L_2 + F + L_3)$;
where G is the objective function of the two-dimensional spectrum generator (G1):
$G(X_1, C) = \max\left(\mathbb{E}_p[f_p(X_1, C)]\right)$;
L1 is the objective function of the music generation discriminator (D1):
$L_1(D, G) = \min_G \max_D \left(\mathbb{E}_x[\log D(X_2, C)] + \mathbb{E}_y[\log(1 - D(G(X_1, C)))]\right)$;
L2 is the objective function of the music style discriminator (D2):
$L_2(Y, C) = \max\left(\mathbb{E}_p[f_p(Y, C)]\right)$;
F is the objective function of the protein inverse generator (F1):
$F(Y, L) = \max\left(\mathbb{E}_p[f_p(Y, L, X_3)]\right)$;
L3 is the objective function of the protein discriminator (D3):
$L_3(Z, X_1) = \max\left(\mathbb{E}_p[f_p(Z, X_1)]\right)$;
where X1 is the mixed-channel data of the protein's primary and secondary structures, Z is the artificial protein sequence generated by F1, X2 is real music data, X3 is the discrimination result for Z, C is the music style constraint, L is the protein sequence feature constraint, and Y is the two-dimensional spectrum (music) data generated by the two-dimensional spectrum generator G1.
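Claim 7 does not define the functions f_p, so only the adversarial term L1 can be written out concretely. The sketch below assumes D1 returns probabilities strictly between 0 and 1 and replaces the expectations E_x and E_y with sample means over a batch of real music (X2) and a batch of generated music (G(X1, C)); T is shown as the plain sum of the five terms, which training minimises:

```python
import math


def gan_value(d_real: list[float], d_fake: list[float]) -> float:
    """Sample estimate of the L1 expression: mean log D(X2, C) + mean log(1 - D(G(X1, C))).

    d_real: D1 outputs on real music; d_fake: D1 outputs on generated music.
    All values must lie strictly between 0 and 1, otherwise log() fails.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def total_objective(g: float, l1: float, l2: float, f: float, l3: float) -> float:
    """T as the sum of the five terms; the claim states that T is minimised."""
    return g + l1 + l2 + f + l3
```

D1 maximises `gan_value` by scoring real music near 1 and generated music near 0, while G1 minimises it; a discriminator that cannot tell the two apart (0.5 everywhere) gives 2 log 0.5.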
8. The AI-based method for drawing a two-dimensional spectrum of strain proteins as claimed in claim 1, wherein the strain proteins are novel coronavirus proteins.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010995311.1A CN112397138B (en) | 2020-09-21 | 2020-09-21 | Method for drawing two-dimensional spectrum of strain protein based on AI technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112397138A | 2021-02-23 |
CN112397138B | 2024-02-13 |
Family ID: 74596327
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010995311.1A Active CN112397138B (en) | 2020-09-21 | 2020-09-21 | Method for drawing two-dimensional spectrum of strain protein based on AI technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112397138B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023048251A1 (en) * | 2021-09-27 | 2023-03-30 | 国立大学法人筑波大学 | Structure estimation program, structure estimation device, and structure estimation method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1994025860A1 (en) * | 1993-04-28 | 1994-11-10 | Immunex Corporation | Method and system for protein modeling |
CN110517730A (en) * | 2019-09-02 | 2019-11-29 | 河南师范大学 | A method of thermophilic protein is identified based on machine learning |
CN111242922A (en) * | 2020-01-13 | 2020-06-05 | 上海极链网络科技有限公司 | Protein image classification method, device, equipment and medium |
US20200273541A1 (en) * | 2019-02-27 | 2020-08-27 | The Regents Of The University Of California | Unsupervised protein sequence generation |
2020
- 2020-09-21: CN application CN202010995311.1A (CN112397138B), Active
Non-Patent Citations (1)
Title |
---|
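王佳 (Wang Jia); 丁雄飞 (Ding Xiongfei): "Research on the Application of Data Mining in Predicting the Host Preference of Influenza A Virus Proteins" (数据挖掘在预测甲型流感病毒蛋白宿主偏好性中的应用研究), 数字技术与应用 (Digital Technology and Application), no. 06 * |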
Also Published As
Publication number | Publication date |
---|---|
CN112397138B (en) | 2024-02-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110378985B (en) | Animation drawing auxiliary creation method based on GAN | |
Sauquet | A practical guide to molecular dating | |
Wang et al. | STL rapid prototyping bio-CAD model for CT medical image segmentation | |
JP4721052B2 (en) | Feature change image creation method, feature change image creation device, and feature change image creation program | |
CN113343705B (en) | Text semantic based detail preservation image generation method and system | |
RU2009115198A (en) | METHODS OF CHARACTERISTIC SELECTION USING BASED ON THE CLASSIFIER GROUP GENETIC ALGORITHMS | |
CN106023195A (en) | BP neural network image segmentation method and device based on adaptive genetic algorithm | |
Lasserre et al. | A neuron membrane mesh representation for visualization of electrophysiological simulations | |
WO2010005119A1 (en) | Method for creating living body data model, living body data model creating device, device for storing data structure of living body data model and living body data model, method for dispersing load of three-dimensional data model and three-dimensional data model load dispersion device | |
CN110162475A (en) | A kind of Software Defects Predict Methods based on depth migration | |
CN109598279A (en) | Based on the zero sample learning method for generating network from coding confrontation | |
CN110175168A (en) | A kind of time series data complementing method and system based on generation confrontation network | |
CN111275613A (en) | Editing method for generating confrontation network face attribute by introducing attention mechanism | |
CN110322398B (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
WO2021132099A1 (en) | Learning support device, learning device, learning support method, and learning support program | |
CN112397138B (en) | Method for drawing two-dimensional spectrum of strain protein based on AI technology | |
CN106530383B (en) | The facial rendering intent of face based on Hermite interpolation neural net regression models | |
CN110532291A (en) | Model conversion method and system between deep learning frame based on minimum Executing Cost | |
CN105069794B (en) | A kind of total blindness's stereo image quality evaluation method competed based on binocular | |
Sansalone et al. | Homo sapiens and Neanderthals share high cerebral cortex integration into adulthood | |
CN110796594B (en) | Image generation method, device and equipment | |
Selig et al. | Three‐dimensional geometric morphometric analysis of treeshrew (Scandentia) lower molars: Insight into dental variation and systematics | |
US11836936B2 (en) | Method for generating a digital data set representing a target tooth arrangement | |
Isnanto et al. | Fractal batik motifs generation using variations of parameters in julia set function | |
CN113111906B (en) | Method for generating confrontation network model based on condition of single pair image training |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20231205. Address after: 363000 No. 36 straight street, Zhangzhou City, Fujian Province. Applicant after: MINNAN NORMAL University. Address before: 028000 No.22 Huolinhe street, Horqin district, Tongliao City, Inner Mongolia Autonomous Region. Applicant before: Inner Mongolia University For The Nationalities |
GR01 | Patent grant | |