CN112397138A - AI technology-based method for drawing strain protein two-dimensional spectrum - Google Patents

Info

Publication number
CN112397138A
Authority
CN
China
Prior art keywords
protein
strain
dimensional spectrum
music
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010995311.1A
Other languages
Chinese (zh)
Other versions
CN112397138B (en
Inventor
张辉
王利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Minnan Normal University
Original Assignee
Inner Mongolia University for Nationalities
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University for Nationalities
Priority to CN202010995311.1A
Publication of CN112397138A
Application granted
Publication of CN112397138B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses an AI-based method for drawing a two-dimensional spectrum of strain proteins. Exploiting the similarities in how strain protein sequences, protein structures, and music are represented, the method uses AI technology to generate a two-dimensional musical score from the strain protein structure, thereby establishing a one-to-one correspondence between strain protein sequences and music to assist the analysis and study of strain proteins. Once a strain protein is expressed as a two-dimensional spectrum, the differences between strain proteins can be seen directly in the spectrum; the spectrum can also be played as music so that the differences can be heard, which provides a new approach to strain protein research.

Description

AI technology-based method for drawing strain protein two-dimensional spectrum
Technical Field
The invention relates to the technical field of artificial intelligence application, in particular to a method for drawing a strain protein two-dimensional spectrum based on an AI (artificial intelligence) technology.
Background
In the life sciences, AI technology has gradually taken on an irreplaceable role in data analysis. Proteins, as important components of living organisms, are diverse in sequence and complex in functional structure, so protein research remains an area that scientists have not yet fully mastered.
At present, proteins are characterized mainly by their amino acid sequence, spatial structure and the like. Whether proteins can be characterized in other forms, so as to improve their visualization and make them easier to analyze, has become a focus of research.
Disclosure of Invention
In view of the above, the invention provides an AI-based method for drawing a two-dimensional spectrum of strain proteins, in which AI technology characterizes strain proteins in the form of a two-dimensional spectrum. This improves the visualization of the proteins and at the same time maps different strain proteins to different music, so that protein analysis and research are assisted both visually and aurally.
The technical scheme provided by the invention is a method for drawing a strain protein two-dimensional spectrum based on AI technology, characterized by comprising the following steps:
S1: acquiring the primary structure and the secondary structure of a strain protein sample;
S2: treating the amino acid sequence in the primary structure of the strain protein sample as a linear arrangement to form one-dimensional single-channel data;
S3: projecting the four main-chain atoms in the secondary structure of the strain protein sample onto each coordinate system of three-dimensional space to form three-channel data of the main-chain backbone atoms;
S4: constructing a two-dimensional spectrum generation model for proteins based on a generative adversarial network (GAN), and training the model on a plurality of strain protein samples, with a music style and a protein sequence as the respective constraint conditions, to obtain model parameters;
S5: drawing the two-dimensional spectrum of the strain protein with the two-dimensional spectrum generation model under the model parameters obtained in step S4.
Preferably, in step S2, treating the amino acid sequence in the primary structure of the strain protein sample as a linear arrangement to form one-dimensional single-channel data specifically comprises:
assigning the 20 amino acids that make up proteins the values s1 to s20 within the image gray-value range 0-255;
and forming one-dimensional single-channel data from the order of the amino acids in the strain protein sample and the value assigned to each amino acid.
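The encoding in step S2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not say which gray value each amino acid receives, so the even spacing, the alphabetical ordering of the one-letter codes, and the example sequence are all assumptions.

```python
# The 20 standard amino acids in one-letter code.
# Alphabetical ordering is an assumption; the source only names the values s1..s20.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_gray_table(levels=255):
    """Spread s1..s20 evenly over the gray range (hypothetical choice of values)."""
    step = levels // (len(AMINO_ACIDS) + 1)  # 255 // 21 = 12
    return {aa: step * (i + 1) for i, aa in enumerate(AMINO_ACIDS)}


def encode_primary_structure(sequence, table=None):
    """Turn an amino acid sequence into one-dimensional single-channel data."""
    table = table or build_gray_table()
    return [table[aa] for aa in sequence.upper()]


GRAY_TABLE = build_gray_table()
# A short example sequence, used only for illustration.
encoded = encode_primary_structure("MFVFLVLLPLVSSQ")
```

Each residue becomes one gray value, so the output has the same length and order as the input sequence.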
Further preferably, in step S3, projecting the four main-chain atoms in the secondary structure of the strain protein sample onto each coordinate system of three-dimensional space to form three-channel data of the main-chain backbone atoms specifically comprises:
assigning the main-chain backbone atoms Cα, C, N and O the values k1, k2, k3 and k4, respectively, within the image gray-value range 0-255;
and projecting the four main-chain atoms onto each coordinate system of three-dimensional space to form a three-channel gray-scale distribution image of the main-chain backbone atoms, the data being three-channel data.
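One way to read step S3 is as a projection of the backbone atoms onto the three coordinate planes (XY, XZ, YZ), each plane giving one gray-scale channel. The sketch below follows that reading, which is an assumption; the gray values in `ATOM_GRAY`, the image size, the coordinate extent, and the sample coordinates are likewise hypothetical.

```python
# Hypothetical gray values k1..k4 for the backbone atoms (the source does not fix them).
ATOM_GRAY = {"CA": 60, "C": 120, "N": 180, "O": 240}


def project_backbone(atoms, size=32, extent=16.0):
    """Project (name, x, y, z) atoms onto the XY, XZ and YZ planes.

    Returns three size x size gray images, one per plane. Coordinates are
    assumed to lie roughly within [-extent, extent]; values outside are clamped.
    """
    channels = [[[0] * size for _ in range(size)] for _ in range(3)]

    def to_pixel(value):
        pixel = int((value + extent) / (2 * extent) * (size - 1))
        return min(max(pixel, 0), size - 1)

    for name, x, y, z in atoms:
        gray = ATOM_GRAY[name]
        px, py, pz = to_pixel(x), to_pixel(y), to_pixel(z)
        # (row, col) for the XY, XZ and YZ planes respectively.
        for ch, (row, col) in enumerate(((py, px), (pz, px), (pz, py))):
            # If two atoms fall on the same pixel, keep the larger gray value.
            channels[ch][row][col] = max(channels[ch][row][col], gray)
    return channels


# Made-up coordinates for one residue's backbone, for illustration only.
backbone = [
    ("N", -1.2, 0.5, 0.0),
    ("CA", 0.0, 0.0, 0.0),
    ("C", 1.3, 0.6, 0.2),
    ("O", 1.5, 1.8, 0.3),
]
image = project_backbone(backbone)
```

Keeping the larger value on pixel collisions is a design choice made here for simplicity; the source does not say how overlapping projections are combined.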
Further preferably, the plurality of strain protein samples comprise natural strain protein samples and strain protein samples augmented by a generative adversarial network.
Further preferably, in step S4, the two-dimensional spectrum generation model for proteins constructed on the generative adversarial network comprises:
a two-dimensional spectrum generator, a music generation discriminator, a music style discriminator, a protein inverse generator and a protein discriminator;
the training process of the two-dimensional spectrum generation model comprises the following steps:
S401: inputting the single-channel data of the primary structure of a strain protein sample, the three-channel data of its secondary structure, and the single-channel data of the music style constraint into the two-dimensional spectrum generator, generating a two-dimensional spectrum, and outputting a musical piece;
S402: judging, by the music discriminator, the difference between the music generated by the two-dimensional spectrum generator and real music;
S403: judging, by the music style discriminator, whether the generated music satisfies the specified style constraint;
S404: adjusting the model parameters of the two-dimensional spectrum generator and the music discriminator according to the results of steps S402 and S403 until the threshold requirement is met;
S405: generating, by the protein inverse generator, an artificial protein sequence from the two-dimensional spectrum produced by the two-dimensional spectrum generator and a protein sequence constraint;
S406: judging, by the protein discriminator, the difference between the artificial protein sequence and the real protein sequence; if the difference exceeds a threshold, adjusting the model parameters of the two-dimensional spectrum generator and the music discriminator and repeating steps S401 to S405 until the difference between the artificial protein sequence and the real protein sequence meets the threshold requirement.
Further preferably, the first layer of the two-dimensional spectrum generator is a mixed convolution layer composed of single-channel and three-channel inputs, with separate branches convolving the different input data: for the input protein primary structure data, 20 types of 3 × 1 convolution kernels with a stride of 3 are used, according to the amino acid distribution characteristics; for the input protein secondary structure data, 4 types of 3 × 3 × 3 convolution kernels with a stride of 3 are used, corresponding to the main-chain backbone atoms, according to the three-dimensional distribution characteristics of the amino acids; for the music style constraint data, m 3 × 1 convolution kernels with a stride of 1 are used, corresponding to the music style constraint;
the intermediate layers follow the CycleGAN model, but to retain as many features as possible no pooling layer is used, and every layer uses the LReLU (leaky ReLU) activation function;
the output layer uses Softmax for synthesis, completing the drawing of the two-dimensional spectrum.
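The effect of the kernel sizes and strides named above can be checked with the standard output-length formula for an unpadded, undilated convolution. This is a sketch: zero padding is an assumption (the source does not mention padding), and both input lengths are example values.

```python
def conv_output_length(n, kernel, stride, padding=0):
    """Output length of a 1-D convolution without dilation."""
    if n + 2 * padding < kernel:
        return 0  # the kernel does not fit even once
    return (n + 2 * padding - kernel) // stride + 1


seq_len = 1273  # example primary-structure length, chosen for illustration
primary_out = conv_output_length(seq_len, kernel=3, stride=3)  # 3 x 1 kernel, stride 3
style_out = conv_output_length(16, kernel=3, stride=1)          # 3 x 1 kernel, stride 1
```

With kernel 3 and stride 3 the windows do not overlap, so the primary-structure branch reads the sequence in consecutive groups of three residues, whereas the stride-1 style branch slides one position at a time.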
Further preferably, the objective function of the two-dimensional spectrum generation model for proteins is:
$T = \min(G + L_1 + L_2 + F + L_3)$;
where $G$ is the objective function of the two-dimensional spectrum generator (G1):
$G(X_1, C) = \max(E_p[f_p(X_1, C)])$;
$L_1$ is the objective function of the music generation discriminator (D1):
$L_1(D, G) = \min_G \max_D \left( E_x[\log(D(X_2, C))] + E_y[\log(1 - D(G(X_1, C)))] \right)$;
$L_2$ is the objective function of the music style discriminator (D2):
$L_2(Y, C) = \max(E_p[f_p(Y, C)])$;
$F$ is the objective function of the protein inverse generator (F1):
$F(Y, L) = \max(E_p[f_p(Y, L, X_3)])$;
$L_3$ is the objective function of the protein discriminator (D3):
$L_3(Z, X_1) = \max(E_p[f_p(Z, X_1)])$;
where $X_1$ is the mixed-channel data of the protein primary and secondary structure, $Z$ is the artificial protein sequence generated by F1, $X_2$ is real music data, $X_3$ is the discrimination result for $Z$, $C$ is the music style constraint, $L$ is the protein sequence feature constraint, and $Y$ is the two-dimensional spectrum music data generated by the two-dimensional spectrum generator G1.
Further preferably, the strain protein is a novel coronavirus protein.
The AI-based method for drawing a strain protein two-dimensional spectrum provided by the invention exploits the similarities in how strain protein sequences, protein structures, and music are represented, and uses AI technology to generate a two-dimensional musical score from the strain protein structure, thereby establishing a one-to-one correspondence between strain protein sequences and music to assist the analysis and study of strain proteins. Once a strain protein is expressed as a two-dimensional spectrum, the differences between strain proteins can be seen directly in the spectrum; the spectrum can also be played as music so that the differences can be heard, which provides a new approach to strain protein research.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as disclosed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below; those skilled in the art can obtain other drawings from these without creative effort.
FIG. 1 is a block flow diagram of the AI-based method for drawing a strain protein two-dimensional spectrum according to an embodiment of the disclosure;
FIG. 2 is a detailed flow chart of the AI-based method for drawing a strain protein two-dimensional spectrum according to an embodiment of the disclosure;
FIG. 3 is a diagram of the two-dimensional spectrum generation model for proteins, constructed on a generative adversarial network, used in the method according to an embodiment of the disclosure;
FIG. 4 is a training flow chart of the two-dimensional spectrum generation model for proteins used in the method according to an embodiment of the disclosure;
FIG. 5 is a schematic diagram of the model structure of the two-dimensional spectrum generator G1 used in the method according to an embodiment of the disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of methods consistent with certain aspects of the invention, as detailed in the appended claims.
To characterize strain proteins from another perspective and thereby assist protein analysis and study, this embodiment provides an AI-based method for drawing a two-dimensional spectrum of strain proteins. The basic building blocks of strain proteins are generally the 20 amino acids, and the basic building blocks of music are the seven notes of the scale; although the numbers of basic elements differ, the two sets can be matched by designing a mapping.
Beyond the primary structure formed by different arrangements of the 20 amino acids, a protein can also adopt various spatial conformations through covalent and non-covalent bonds, forming biological macromolecules with diverse shapes and functions. Music likewise forms basic tunes from different permutations and combinations of scale notes, and then integrates and regulates rhythm, harmony, dynamics, mode, musical form, texture and timbre to form styles and melodies with different characteristics, giving listeners different sensory experiences.
Given these shared characteristics of strain protein sequences, structures, and musical representation, AI technology can be used to generate a two-dimensional musical score from the strain protein structure, thereby establishing a relationship between the protein sequence of the novel coronavirus and music.
The AI-based method for drawing a strain protein two-dimensional spectrum mainly comprises: establishing training and test data sets for generating two-dimensional scores from viral proteins and, because strain protein samples are scarce, augmenting the virus samples with a generative adversarial network; designing a mapping between amino acid structure and musical composition that links the two forms of expression, and building a two-dimensional spectrum music generation method on generative adversarial network technology; feeding the primary and secondary structure of the protein, together with a music style as a constraint, into the generator; and generating a two-dimensional spectrum in the specified musical style, from which a protein generator can in turn generate proteins that conform to the structure of the novel coronavirus. The generated music is then sent as a new input to a music discriminator and to the protein generator: the music discriminator judges whether the generated music obeys the rules of composition, and the protein generator produces a protein-like secondary and tertiary structure that is compared with the original protein to ensure that the music and the protein remain related.
FIG. 1 shows the overall framework of the AI-based method for drawing a strain protein two-dimensional spectrum, and FIG. 2 shows the detailed flow. Guided by this framework, the method provided in this embodiment specifically includes the following steps:
S1: acquiring the primary structure and the secondary structure of a strain protein sample;
S2: treating the amino acid sequence in the primary structure of the strain protein sample as a linear arrangement to form one-dimensional single-channel data;
S3: projecting the four main-chain atoms in the secondary structure of the strain protein sample onto each coordinate system of three-dimensional space to form three-channel data of the main-chain backbone atoms;
S4: constructing a two-dimensional spectrum generation model for proteins based on a generative adversarial network (GAN), and training the model on a plurality of strain protein samples, with a music style and a protein sequence as the respective constraint conditions, to obtain model parameters;
S5: drawing the two-dimensional spectrum of the strain protein with the two-dimensional spectrum generation model under the model parameters obtained in step S4.
In the above method, the primary structure of the strain protein sample, the secondary structure of the strain protein sample, the musical style constraint and the protein sequence constraint are used as input data.
Specifically:
Primary structure data of strain protein samples: the primary amino acid sequence of the protein is taken as input. Because the primary structure is formed by dehydration condensation of amino acids into a head-to-tail linear chain, the amino acid sequence is treated as a linear arrangement, and the data form one-dimensional single-channel data. The 20 amino acids that make up proteins are assigned the values s1 to s20 within the image gray-value range 0-255.
Secondary structure data of strain protein samples: because of α-helices and β-sheets, and because of the secondary-linked amino acids attached to the main chain, the secondary structure of a protein sequence is distributed in three-dimensional space. Ignoring the secondary-linked amino acids, which have little influence on the spatial characteristics, the main-chain backbone atoms Cα, C, N and O are assigned the values k1, k2, k3 and k4 within the image gray-value range 0-255, representing k1%, k2%, k3% and k4% black respectively. Projecting the four main-chain atoms onto each coordinate system of three-dimensional space forms a three-channel gray-scale distribution image of the main-chain backbone atoms; the data are three-channel data, i.e., three-dimensional data.
Music style constraint: the tonality, chords and rhythm characteristic of a specific music style are combined into a music style constraint according to the category of musical works, the rules of composition, and the like.
Protein sequence constraint: comprehensive features abstracted from the one-dimensional and three-dimensional properties of the amino acid sequences of the primary and secondary structures of the novel coronavirus serve as the protein sequence constraint of the model.
Training the two-dimensional spectrum generation model requires a large amount of strain protein sample data, and for some viruses, such as the novel coronavirus, the sample size is too small to meet the training requirement. In that case the strain protein samples can be artificially augmented. Several augmentation methods are available; this scheme augments the samples by deep learning with a generative adversarial network.
Referring to FIG. 3, the two-dimensional spectrum generation model for proteins provided in this embodiment is built on a generative adversarial model. Taking the generation of two-dimensional scores from strain proteins and music of a specific style as the objects of study, a protein-to-music generation model is designed, comprising: the two-dimensional spectrum generator G1, the music generation discriminator D1, the music style discriminator D2, the protein inverse generator F1 and the protein discriminator D3. Given the primary and secondary structure X1 of the input protein, the music style constraint C and the protein sequence constraint L, the model generates a two-dimensional spectrum and outputs a musical piece.
The training process of the two-dimensional spectrum generation model, shown in FIG. 4, includes:
S401: inputting the mixed-channel data X1, formed from the single-channel data of the primary structure of a strain protein sample and the three-channel data of its secondary structure, together with the single-channel data C of the music style constraint, into the two-dimensional spectrum generator G1, generating a two-dimensional spectrum, and outputting a musical piece;
S402: judging, by the music discriminator D1, the difference between the music generated by the two-dimensional spectrum generator G1 and real music;
S403: judging, by the music style discriminator D2, whether the generated music satisfies the specified style constraint, thereby keeping the music style generated by G1 consistent with the specific viral sequence;
S404: adjusting the model parameters of the two-dimensional spectrum generator G1 and the music discriminator D1 according to the results of steps S402 and S403 until the threshold requirement is met;
S405: generating, by the protein inverse generator F1, an artificial protein sequence X3 from the two-dimensional spectrum produced by the two-dimensional spectrum generator G1 and the protein sequence constraint L;
S406: judging, by the protein discriminator D3, the difference between the artificial protein sequence X3 and the real protein sequence X1; if the difference exceeds a threshold, adjusting the model parameters of the two-dimensional spectrum generator G1 and the music discriminator D1 and repeating steps S401 to S405 until the difference between the artificial protein sequence X3 and the real protein sequence X1 meets the threshold requirement.
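The control flow of steps S401 to S406 can be sketched with the networks abstracted away as plain callables. Everything named here is hypothetical: `train_two_stage`, its parameters, the tolerances, and the toy error model stand in for G1, D1, D2, F1 and D3 only to show the nesting of the two threshold checks, not how the patent's model is actually trained.

```python
def train_two_stage(music_gap, style_ok, protein_gap, update,
                    music_tol=0.1, protein_tol=0.1, max_rounds=100):
    """Skeleton of S401-S406.

    music_gap():   D1's measured difference between generated and real music (S401-S402)
    style_ok():    D2's verdict on the style constraint (S403)
    protein_gap(): D3's measured difference between artificial and real protein (S405-S406)
    update():      one adjustment of the G1 / D1 parameters (S404, S406)
    Returns the number of outer rounds used, or None if max_rounds is exhausted.
    """
    for outer in range(max_rounds):
        inner = 0
        # S401-S404: adjust until the music checks meet their thresholds.
        while (music_gap() > music_tol or not style_ok()) and inner < max_rounds:
            update()
            inner += 1
        # S405-S406: regenerate the protein and compare it with the real one.
        if protein_gap() <= protein_tol:
            return outer + 1
        update()
    return None


# Toy stand-in: a single error value that halves on every update.
state = {"err": 1.0}


def update():
    state["err"] *= 0.5


rounds = train_two_stage(
    music_gap=lambda: state["err"],
    style_ok=lambda: state["err"] < 0.5,
    protein_gap=lambda: state["err"] * 2,
    update=update,
)
```

In this toy run the music checks pass after four updates, the protein check fails once, and the second outer round succeeds.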
The input data of the two-dimensional spectrum generator G1 comprise the mixed-channel data X1, formed from the single-channel data of the protein primary structure and the three-channel data generated from the secondary structure, together with the single-channel data of the music style constraint C; that is, X1 + C constitutes mixed multi-channel data.
Referring to FIG. 5, which shows the model structure of the two-dimensional spectrum generator G1, the first layer of G1 is a mixed convolution layer composed of single-channel and three-channel inputs, corresponding respectively to the protein primary structure, the music style constraint, and the protein secondary spatial structure. It captures the primary features of the viral protein, with separate branches convolving the different input data. To extract the amino acid features of the protein, 20 types of 3 × 1 convolution kernels with a stride of 3 are used for the input protein primary structure data, according to the amino acid distribution characteristics; for the input protein secondary structure data, 4 types of 3 × 3 × 3 convolution kernels with a stride of 3 are used, corresponding to the main-chain backbone atoms, according to the three-dimensional distribution characteristics of the amino acids; for the music style constraint data, m 3 × 1 convolution kernels with a stride of 1 are used, corresponding to the music style constraint;
the intermediate layers follow the CycleGAN model, but to retain as many features as possible no pooling layer is used, and every layer uses the LReLU (leaky ReLU) activation function;
the output layer uses Softmax for synthesis, completing the drawing of the two-dimensional spectrum.
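For reference, the two functions named for the intermediate and output layers can be written in a few lines. This is a generic sketch of leaky ReLU and softmax, not the generator itself; the negative slope of 0.2 is an assumption, since the source does not give one.

```python
import math


def leaky_relu(x, slope=0.2):
    """Leaky ReLU: pass positive values through, scale negative ones by `slope`."""
    return x if x > 0 else slope * x


def softmax(values):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


activated = [leaky_relu(v) for v in (-2.0, 0.0, 3.0)]
probs = softmax(activated)
```

Unlike a plain ReLU, the leaky variant keeps a small gradient for negative inputs, which fits the stated aim of retaining as many features as possible.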
The objective function of the two-dimensional spectrum generation model for proteins is as follows:
$T = \min(G + L_1 + L_2 + F + L_3)$;
where $G$ is the objective function of the two-dimensional spectrum generator (G1):
$G(X_1, C) = \max(E_p[f_p(X_1, C)])$;
$L_1$ is the objective function of the music generation discriminator (D1):
$L_1(D, G) = \min_G \max_D \left( E_x[\log(D(X_2, C))] + E_y[\log(1 - D(G(X_1, C)))] \right)$;
$L_2$ is the objective function of the music style discriminator (D2):
$L_2(Y, C) = \max(E_p[f_p(Y, C)])$;
$F$ is the objective function of the protein inverse generator (F1):
$F(Y, L) = \max(E_p[f_p(Y, L, X_3)])$;
$L_3$ is the objective function of the protein discriminator (D3):
$L_3(Z, X_1) = \max(E_p[f_p(Z, X_1)])$;
where $X_1$ is the mixed-channel data of the protein primary and secondary structure, $Z$ is the artificial protein sequence generated by F1, $X_2$ is real music data, $X_3$ is the discrimination result for $Z$, $C$ is the music style constraint, $L$ is the protein sequence feature constraint, and $Y$ is the two-dimensional spectrum music data generated by the two-dimensional spectrum generator G1.
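The L1 term has the familiar GAN minimax form: an expectation of log D over real music plus an expectation of log(1 - D) over generated music. The sketch below estimates that value from lists of discriminator outputs. The function name `gan_value` and the sample probabilities are hypothetical, and the expectations are approximated by simple averages.

```python
import math


def gan_value(d_real, d_fake):
    """Sample estimate of E_x[log D(real)] + E_y[log(1 - D(generated))].

    d_real: discriminator outputs in (0, 1) for real music samples
    d_fake: discriminator outputs in (0, 1) for generated music samples
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from generated music well...
confident = gan_value([0.9, 0.8], [0.1, 0.2])
# ...versus one that the generator has fooled completely (all outputs 0.5).
fooled = gan_value([0.5, 0.5], [0.5, 0.5])
```

The discriminator tries to raise this value and the generator tries to lower it; when the discriminator outputs 0.5 everywhere, the value is -2 log 2.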
The AI-based method for drawing a strain protein two-dimensional spectrum provided by this embodiment is particularly suitable for research on novel coronavirus proteins.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the present invention is not limited to what has been described above, and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (8)

1. A method for drawing a strain protein two-dimensional spectrum based on AI technology, characterized by comprising the following steps:
S1: acquiring the primary structure and the secondary structure of a strain protein sample;
S2: treating the amino acid sequence in the primary structure of the strain protein sample as a linear arrangement to form one-dimensional single-channel data;
S3: projecting the four main-chain atoms in the secondary structure of the strain protein sample onto each coordinate system of three-dimensional space to form three-channel data of the main-chain backbone atoms;
S4: constructing a two-dimensional spectrum generation model for proteins based on a generative adversarial network (GAN), and training the model on a plurality of strain protein samples, with a music style and a protein sequence as the respective constraint conditions, to obtain model parameters;
S5: drawing the two-dimensional spectrum of the strain protein with the two-dimensional spectrum generation model under the model parameters obtained in step S4.
2. The AI-based method for drawing a strain protein two-dimensional spectrum as claimed in claim 1, wherein in step S2, treating the amino acid sequence in the primary structure of the strain protein sample as a linear arrangement to form one-dimensional single-channel data specifically comprises:
assigning the 20 amino acids that make up proteins the values s1 to s20 within the image gray-value range 0-255;
and forming one-dimensional single-channel data from the order of the amino acids in the strain protein sample and the value assigned to each amino acid.
3. The AI-based method for drawing a strain protein two-dimensional spectrum as claimed in claim 1, wherein in step S3, projecting the four main-chain atoms in the secondary structure of the strain protein sample onto each coordinate system of three-dimensional space to form three-channel data of the main-chain backbone atoms specifically comprises:
assigning the main-chain backbone atoms Cα, C, N and O the values k1, k2, k3 and k4, respectively, within the image gray-value range 0-255;
and projecting the four main-chain atoms onto each coordinate system of three-dimensional space to form a three-channel gray-scale distribution image of the main-chain backbone atoms, the data being three-channel data.
4. The AI-based method for drawing a strain protein two-dimensional spectrum as claimed in claim 1, wherein the plurality of strain protein samples comprise natural strain protein samples and strain protein samples augmented by a generative adversarial network.
5. The AI-technology-based method for drawing a two-dimensional spectrum of strain proteins as claimed in claim 1, wherein in step S4, the protein generation two-dimensional spectrum model constructed based on the generative adversarial network comprises:
a two-dimensional spectrum generator (G1), a music generation discriminator (D1), a music style discriminator (D2), a protein inverse generator (F1), and a protein discriminator (D3);
the training process of the protein generation two-dimensional spectrum model comprises the following steps:
S401: inputting the single-channel data of the primary structure of the strain protein sample, the three-channel data of the secondary structure of the strain protein sample, and the single-channel data of the music style constraint into the two-dimensional spectrum generator (G1), generating a two-dimensional spectrum, and outputting a musical piece;
S402: determining, by the music generation discriminator (D1), the difference between the music generated by the two-dimensional spectrum generator (G1) and real music;
S403: determining, by the music style discriminator (D2), whether the generated music complies with the specified style constraint;
S404: according to the discrimination results of steps S402 and S403, adjusting the model parameters corresponding to the two-dimensional spectrum generator (G1) and the music generation discriminator (D1) until the model parameters meet the threshold requirement;
S405: generating, by the protein inverse generator (F1), an artificial protein sequence (X3), taking as inputs the two-dimensional spectrum generated by the two-dimensional spectrum generator (G1) and a protein sequence constraint (L);
S406: determining, by the protein discriminator (D3), the difference between the artificial protein sequence (X3) and the real protein sequence (X1); if the difference exceeds a threshold value, adjusting the model parameters corresponding to the two-dimensional spectrum generator (G1) and the music generation discriminator (D1), and then repeating steps S401 to S405 until the difference between the artificial protein sequence (X3) and the real protein sequence (X1) meets the threshold requirement.
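The control flow of steps S401 to S406 can be sketched as a plain loop. This is a skeleton under stated assumptions, not the patented training procedure: the five networks are passed in as ordinary callables, a single hypothetical `update` callback stands in for the parameter adjustment, and `make_toy_components` builds trivial stand-ins (one scalar weight) purely so that the loop can be run.

```python
def train(sample, style, seq_constraint, g1, d1, d2, f1, d3, update,
          music_tol=0.05, protein_tol=0.05, max_steps=1000):
    """Run the S401-S406 loop; return (spectrum, steps) once both checks pass."""
    for step in range(1, max_steps + 1):
        spectrum = g1(sample, style)                # S401: generate spectrum
        music_gap = d1(spectrum)                    # S402: gap to real music
        style_ok = d2(spectrum, style)              # S403: style constraint met?
        if music_gap > music_tol or not style_ok:
            update(music_gap)                       # S404: adjust parameters
            continue
        artificial = f1(spectrum, seq_constraint)   # S405: inverse generation
        protein_gap = d3(artificial, sample)        # S406: compare with real
        if protein_gap <= protein_tol:
            return spectrum, step
        update(protein_gap)                         # adjust, repeat from S401
    raise RuntimeError("thresholds not met within max_steps")


def make_toy_components():
    """HYPOTHETICAL stand-ins: one scalar weight w that should reach 1.0."""
    state = {"w": 0.0}

    def g1(x, c):                 # "spectrum" is the input scaled by w
        return [state["w"] * v for v in x]

    def d1(y):                    # gap to "real music" shrinks as w -> 1
        return abs(1.0 - state["w"])

    def d2(y, c):                 # style is always accepted in this toy
        return True

    def f1(y, constraint):        # inverse generator is the identity here
        return list(y)

    def d3(z, x):                 # largest element-wise difference
        return max(abs(a - b) for a, b in zip(z, x))

    def update(gap):              # move w halfway towards its target
        state["w"] += 0.5 * (1.0 - state["w"])

    return g1, d1, d2, f1, d3, update


if __name__ == "__main__":
    print(train([1.0, 2.0], "calm", None, *make_toy_components()))
```

In a real implementation each callable would be a neural network and `update` would be a gradient step; the sketch only shows that the protein check in S406 is reached after the music checks in S402 and S403 pass, and that a failure at either stage sends the loop back to S401.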
6. The AI-technology-based method for drawing a two-dimensional spectrum of strain proteins as claimed in claim 5, wherein the first layer of the two-dimensional spectrum generator (G1) is a hybrid convolution layer combining single-channel and three-channel inputs, in which separate paths convolve the different input data: for the input protein primary structure data, 20 types of 3 × 1 convolution kernels are used according to the amino acid distribution characteristics, with a stride of 3; for the input protein secondary structure data, 4 types of 3 × 3 × 3 convolution kernels are set, corresponding to the main-chain backbone atoms, according to the three-dimensional distribution characteristics of the amino acids, with a stride of 3; for the music style constraint data, m 3 × 1 convolution kernels corresponding to the music style constraint are used, with a stride of 1;
the intermediate layers follow the CycleGAN model but, in order to retain the various features to the greatest extent, use no pooling layer, and each layer uses the LReLU function as its activation function;
and the output layer uses Softmax for synthesis, completing the drawing of the two-dimensional spectrum.
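The effect of a 3 × 1 kernel moved 3 positions at a time on the primary-structure channel can be checked with a short sketch. It assumes no padding, reads "LReLu" as the leaky rectified linear unit, and uses an arbitrary negative slope of 0.2; none of these are stated in the claim. The function names are hypothetical.

```python
def conv1d(signal, kernel, stride):
    """Valid (unpadded) 1-D convolution in the cross-correlation convention."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]


def output_length(n, k, stride):
    """Number of outputs for input length n, kernel size k, no padding."""
    return (n - k) // stride + 1 if n >= k else 0


def leaky_relu(x, slope=0.2):
    """ASSUMPTION: 'LReLu' means leaky ReLU; the slope 0.2 is arbitrary."""
    return x if x >= 0 else slope * x


if __name__ == "__main__":
    encoded = [1, 2, 3, 4, 5, 6, 7]          # stand-in for gray values
    features = conv1d(encoded, [1, 0, -1], stride=3)
    print([leaky_relu(v) for v in features])
```

With a kernel size of 3 and a stride of 3 the windows do not overlap, so each output covers exactly three consecutive residues and the sequence length shrinks by roughly a factor of three.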
7. The AI-technology-based method for drawing a two-dimensional spectrum of strain proteins as claimed in claim 5, wherein the objective function of the protein generation two-dimensional spectrum model is as follows:
$T = \min(G + L_1 + L_2 + F + L_3)$;
wherein $G$ is the objective function of the two-dimensional spectrum generator (G1):
$G(X1, C) = \max(E_p[f_p(X1, C)])$;
$L_1$ is the objective function of the music generation discriminator (D1):
$L_1(D, G) = \min_G \max_D \left(E_x[\log D(X2, C)] + E_y[\log(1 - D(G(X1, C)))]\right)$;
$L_2$ is the objective function of the music style discriminator (D2):
$L_2(Y, C) = \max(E_p[f_p(Y, C)])$;
$F$ is the objective function of the protein inverse generator (F1):
$F(Y, L) = \max(E_p[f_p(Y, L, X3)])$;
$L_3$ is the objective function of the protein discriminator (D3):
$L_3(Z, X1) = \max(E_p[f_p(Z, X1)])$;
wherein X1 is the mixed-channel data of the protein primary and secondary structures, Z is the artificial protein sequence generated by F1, X2 is real music data, X3 is the discrimination result for Z, C is the music style constraint, L is the protein sequence feature constraint, and Y is the two-dimensional spectrum music data generated by the two-dimensional spectrum generator G1.
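The L1 term in claim 7 has the form of the standard generative adversarial value function, which can be evaluated numerically as a sanity check. The sketch below assumes that the discriminator outputs are probabilities strictly between 0 and 1 and that the expectations are estimated by sample means; `gan_value` is a hypothetical name.

```python
import math


def gan_value(d_real, d_fake):
    """Estimate E[log D(real)] + E[log(1 - D(fake))] from sample outputs.

    d_real: discriminator outputs on real music, each in (0, 1).
    d_fake: discriminator outputs on generated music, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


if __name__ == "__main__":
    # An undecided discriminator (all outputs 0.5) gives 2 * log(0.5).
    print(gan_value([0.5, 0.5], [0.5, 0.5]))
    # A confident, correct discriminator gives a value closer to 0.
    print(gan_value([0.9, 0.95], [0.1, 0.05]))
```

The discriminator tries to push this value up towards 0 while the generator tries to push it down, which is the min over G and max over D written in the L1 formula.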
8. The AI-technology-based method for drawing a two-dimensional spectrum of strain proteins as claimed in claim 1, wherein the strain proteins are novel coronavirus proteins.
CN202010995311.1A 2020-09-21 2020-09-21 Method for drawing two-dimensional spectrum of strain protein based on AI technology Active CN112397138B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010995311.1A CN112397138B (en) 2020-09-21 2020-09-21 Method for drawing two-dimensional spectrum of strain protein based on AI technology

Publications (2)

Publication Number Publication Date
CN112397138A true CN112397138A (en) 2021-02-23
CN112397138B CN112397138B (en) 2024-02-13

Family

ID=74596327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010995311.1A Active CN112397138B (en) 2020-09-21 2020-09-21 Method for drawing two-dimensional spectrum of strain protein based on AI technology

Country Status (1)

Country Link
CN (1) CN112397138B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023048251A1 (en) * 2021-09-27 2023-03-30 国立大学法人筑波大学 Structure estimation program, structure estimation device, and structure estimation method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1994025860A1 (en) * 1993-04-28 1994-11-10 Immunex Corporation Method and system for protein modeling
CN110517730A (en) * 2019-09-02 2019-11-29 河南师范大学 A method of thermophilic protein is identified based on machine learning
CN111242922A (en) * 2020-01-13 2020-06-05 上海极链网络科技有限公司 Protein image classification method, device, equipment and medium
US20200273541A1 (en) * 2019-02-27 2020-08-27 The Regents Of The University Of California Unsupervised protein sequence generation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
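王佳; 丁雄飞: "Application of data mining in predicting the host preference of influenza A virus proteins", 数字技术与应用 (Digital Technology and Application), no. 06 *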

Also Published As

Publication number Publication date
CN112397138B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN110378985B (en) Animation drawing auxiliary creation method based on GAN
Sauquet A practical guide to molecular dating
Wang et al. STL rapid prototyping bio-CAD model for CT medical image segmentation
JP4721052B2 (en) Feature change image creation method, feature change image creation device, and feature change image creation program
CN113343705B (en) Text semantic based detail preservation image generation method and system
RU2009115198A (en) METHODS OF CHARACTERISTIC SELECTION USING BASED ON THE CLASSIFIER GROUP GENETIC ALGORITHMS
CN106023195A (en) BP neural network image segmentation method and device based on adaptive genetic algorithm
Lasserre et al. A neuron membrane mesh representation for visualization of electrophysiological simulations
WO2010005119A1 (en) Method for creating living body data model, living body data model creating device, device for storing data structure of living body data model and living body data model, method for dispersing load of three-dimensional data model and three-dimensional data model load dispersion device
CN110162475A (en) A kind of Software Defects Predict Methods based on depth migration
CN109598279A (en) Based on the zero sample learning method for generating network from coding confrontation
CN110175168A (en) A kind of time series data complementing method and system based on generation confrontation network
CN111275613A (en) Editing method for generating confrontation network face attribute by introducing attention mechanism
CN110322398B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
WO2021132099A1 (en) Learning support device, learning device, learning support method, and learning support program
CN112397138B (en) Method for drawing two-dimensional spectrum of strain protein based on AI technology
CN106530383B (en) The facial rendering intent of face based on Hermite interpolation neural net regression models
CN110532291A (en) Model conversion method and system between deep learning frame based on minimum Executing Cost
CN105069794B (en) A kind of total blindness's stereo image quality evaluation method competed based on binocular
Sansalone et al. Homo sapiens and Neanderthals share high cerebral cortex integration into adulthood
CN110796594B (en) Image generation method, device and equipment
Selig et al. Three‐dimensional geometric morphometric analysis of treeshrew (Scandentia) lower molars: Insight into dental variation and systematics
US11836936B2 (en) Method for generating a digital data set representing a target tooth arrangement
Isnanto et al. Fractal batik motifs generation using variations of parameters in julia set function
CN113111906B (en) Method for generating confrontation network model based on condition of single pair image training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231205

Address after: 363000 No. 36 straight street, Zhangzhou City, Fujian Province

Applicant after: MINNAN NORMAL University

Address before: 028000 No.22 Huolinhe street, Horqin district, Tongliao City, Inner Mongolia Autonomous Region

Applicant before: Inner Mongolia University For The Nationalities

GR01 Patent grant