CN116543839B - Phage construction method, device, equipment and storage medium - Google Patents

Phage construction method, device, equipment and storage medium

Info

Publication number
CN116543839B
Authority
CN
China
Prior art keywords
phage
sequence
target
time sequence
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310771128.7A
Other languages
Chinese (zh)
Other versions
CN116543839A (en
Inventor
李坚强
陈杰
肖敏凤
林子杰
张家骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN202310771128.7A
Publication of CN116543839A
Application granted
Publication of CN116543839B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20 Sequence assembly
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0475 Generative networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Software Systems (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the field of biotechnology and discloses a phage construction method, device, equipment and storage medium. The method comprises: obtaining a phage element sequence corresponding to an original phage; determining, from the phage element sequence, the element-sequence temporal features and biophysiological features corresponding to the original phage, and determining the element spatial structure corresponding to the original phage based on those temporal and biophysiological features; and fitting the feature distribution of the element-sequence temporal features, the biophysiological features and the element spatial structure with a time-series generative adversarial network to obtain the target phage. Because the target phage is obtained by fitting the feature distribution of these three kinds of information with a time-series generative adversarial network, the invention solves the technical problem in the prior art that constructing phages by phage genome rearrangement is inefficient.

Description

Phage construction method, device, equipment and storage medium
Technical Field
The invention relates to the field of biotechnology, and in particular to a phage construction method, device, equipment and storage medium.
Background
A phage (bacteriophage) is a virus that infects and destroys bacteria, and it coexists with bacteria in nature. An important advantage of phages is their high specificity for bacteria, so they are often used to detect and kill specific bacteria. Artificial phages in particular can counter anti-phage mechanisms that bacteria may develop in the future and thus maintain the effectiveness of phage therapy, which makes them of great importance to phage therapy.
Because genome rearrangement technology enables rapid evolution of organisms and provides a fast, new way to study the relationship between genome structural variation and phenotypic variation, existing schemes can construct artificial phages by phage genome rearrangement. However, the rearrangement that this technology induces in the phage genome can make the results unstable, so multiple experiments are needed to obtain reliable results, which lowers the efficiency of the technology in practical applications.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The main purpose of the invention is to provide a phage construction method, device, equipment and storage medium, so as to solve the technical problem in the prior art that constructing phages by phage genome rearrangement is inefficient.
To achieve the above object, the present invention provides a phage construction method comprising the steps of:
obtaining a phage element sequence corresponding to an original phage;
determining, according to the phage element sequence, element-sequence temporal features and biophysiological features corresponding to the original phage;
determining an element spatial structure corresponding to the original phage based on the element-sequence temporal features and the biophysiological features;
and fitting the feature distribution of the element-sequence temporal features, the biophysiological features and the element spatial structure through a time-series generative adversarial network to obtain the target phage.
Optionally, the phage construction method is implemented based on the time-series generative adversarial network, and an embedding component is provided in the time-series generative adversarial network;
correspondingly, the step of determining the element spatial structure corresponding to the original phage based on the element-sequence temporal features and the biophysiological features comprises:
converting, by the embedding component, the element-sequence temporal features and the biophysiological features into a temporal-feature latent code and a physiological-feature latent code, respectively;
determining a node attribute matrix corresponding to the phage elements according to the temporal-feature latent code and the physiological-feature latent code;
constructing an adjacency matrix between phage elements based on the phage element sequence;
and determining the element spatial structure corresponding to the original phage based on the adjacency matrix and the node attribute matrix.
Optionally, the step of constructing an adjacency matrix between phage elements based on the phage element sequence comprises:
combining the phage element sequence with standard phage element sequences obtained through bioinformatics analysis to obtain a combined phage element sequence;
and performing multiple sequence alignment between the combined phage element sequence and all phage element sequences of the original phage using preset sequence alignment software to obtain the adjacency matrix between phage elements.
Optionally, an encoding component is further provided in the time-series generative adversarial network;
correspondingly, obtaining the target phage further comprises the following steps:
converting the physiological-feature latent code into target biophysiological features through a physiological-feature recovery network;
converting the temporal-feature latent code into target element-sequence temporal features through a temporal-feature recovery network;
the step of fitting the feature distribution of the element-sequence temporal features, the biophysiological features and the element spatial structure through the time-series generative adversarial network to obtain the target phage comprises:
encoding the element spatial structure through a structure encoding network and the encoding component to obtain a structure latent code corresponding to the element spatial structure;
converting the structure latent code into a target element spatial structure through a structural-feature decoding network and a decoder;
and constructing the target phage according to the target biophysiological features, the target element spatial structure and the target element-sequence temporal features.
Optionally, after the step of converting the structure latent code into the target element spatial structure through the structural-feature decoding network and the decoder, the method further comprises:
updating a discriminator in the time-series generative adversarial network based on the structure latent code, the element-sequence temporal features and the biophysiological features;
and guiding, by the updated discriminator, the generator in the time-series generative adversarial network to learn the spatial features of the phage elements so as to update the target element spatial structure.
Optionally, after the step of converting the temporal-feature latent code into the target element-sequence temporal features through the temporal-feature recovery network, the method further comprises:
receiving, by the generator, the temporal-feature latent code to generate a target temporal-feature latent code;
and guiding the generator to learn the temporal features of the phage elements based on the temporal-feature latent code and the target temporal-feature latent code, so as to update the target element-sequence temporal features.
Optionally, after the step of constructing the target phage according to the target biophysiological features, the target element spatial structure and the target element-sequence temporal features, the method further comprises:
inputting the temporal-feature latent code into a temporal-feature discrimination function, the physiological-feature latent code into a physiological-feature discrimination function, and the structure latent code into a spatial-feature discrimination function, to obtain, respectively, the real-sample probability of the target element-sequence temporal features, of the target biophysiological features and of the target element spatial structure.
In addition, to achieve the above object, the present invention also provides a phage construction apparatus comprising:
an element sequence acquisition module, configured to obtain the phage element sequence corresponding to the original phage;
a feature determination module, configured to determine, according to the phage element sequence, the element-sequence temporal features and biophysiological features corresponding to the original phage;
a spatial structure determination module, configured to determine the element spatial structure corresponding to the original phage based on the element-sequence temporal features and the biophysiological features;
and a phage construction module, configured to fit the feature distribution of the element-sequence temporal features, the biophysiological features and the element spatial structure through a time-series generative adversarial network to obtain the target phage.
Furthermore, to achieve the above object, the present invention also provides a phage construction apparatus comprising: a memory, a processor, and a phage construction program stored on the memory and executable on the processor, the phage construction program being configured to implement the steps of the phage construction method described above.
In addition, to achieve the above object, the present invention also provides a storage medium on which a phage construction program is stored; when executed by a processor, the phage construction program implements the steps of the phage construction method described above.
In the invention, the phage element sequence corresponding to the original phage is obtained; the element-sequence temporal features and biophysiological features corresponding to the original phage are determined from the phage element sequence, and the element spatial structure corresponding to the original phage is determined based on those features; and the feature distribution of the element-sequence temporal features, the biophysiological features and the element spatial structure is fitted through a time-series generative adversarial network to obtain the target phage. In the prior art, constructing an artificial phage by phage genome rearrangement may cause rearrangement of the phage genome that makes the results unstable and the efficiency low; by obtaining the target phage through feature distribution fitting instead, the invention addresses this low construction efficiency.
Drawings
FIG. 1 is a schematic diagram of a phage construction apparatus of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of the phage construction method of the present invention;
FIG. 3 is a schematic flow chart of a second embodiment of the phage construction method of the present invention;
FIG. 4 is a schematic diagram of model training in a second embodiment of the phage construction method of the present invention;
FIG. 5 is a schematic flow chart of a third embodiment of the phage construction method of the present invention;
FIG. 6 is a schematic diagram of the PCA visualization results for each phage in a third embodiment of the phage construction method of the present invention;
FIG. 7 is a schematic diagram of the t-SNE visualization results for each phage in a third embodiment of the phage construction method of the present invention;
FIG. 8 is a graph showing the average nucleotide profile of each phage in a third embodiment of the phage construction method of the present invention;
FIG. 9 is a graph showing the GC content distribution of each phage in a third embodiment of the phage construction method of the present invention;
FIG. 10 is a block diagram showing the construction of a first embodiment of the phage construction apparatus according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic diagram of a phage construction apparatus of a hardware running environment according to an embodiment of the present invention.
As shown in fig. 1, the phage construction apparatus may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as disk storage. The memory 1005 may optionally also be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the structure shown in FIG. 1 does not constitute a limitation of the phage construction apparatus, and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in fig. 1, the memory 1005, as a storage medium, may contain an operating system, a network communication module, a user interface module and a phage construction program.
In the phage construction apparatus shown in FIG. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The processor 1001 and the memory 1005 may be provided in the phage construction apparatus, which calls the phage construction program stored in the memory 1005 through the processor 1001 and performs the phage construction method provided by the embodiments of the present invention.
The embodiment of the invention provides a phage construction method, referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the phage construction method of the invention.
In this embodiment, the phage construction method comprises the steps of:
Step S10: obtaining the phage element sequence corresponding to the original phage.
It should be noted that the method of this embodiment may be executed by a phage construction apparatus that constructs phages based on a time-series generative adversarial network, or by another phage construction system that includes such an apparatus and can achieve the same or similar functions. This embodiment and the following embodiments are described using a phage construction system (hereinafter "the system") as an example. Here, phage construction can be a process that uses known DNA sequence information to design and construct entirely new phages.
It should be appreciated that the original phage described above may be any phage in a given laboratory, and this example is not limiting.
It will be appreciated that the phage element sequence described above may be a sequence composed of the base elements of the original phage. For example, any given phage p can be expressed as a sequence of base elements (i.e., a phage element sequence) $e_{1:T} = (e_1, e_2, \dots, e_T)$, where $e_t = (b_1, b_2, \dots, b_{n_t})$, $n_t$ is the number of bases in element $e_t$, and $b_i \in \{A, T, C, G\}$. In the phage construction method of this embodiment, the base sequences of all phage elements can be stored in an element library, so that the phage element sequence can be obtained from the element library. In addition, after the phage element sequence is obtained, it may also be converted back into a base sequence.
In a specific implementation, because the phage genome is too long, any given phage p can be expressed as its corresponding phage element sequence $e_{1:T}$. The whole phage genome is thus componentized: each element in the genome consists of several bases, and the elements are divided into coding elements and non-coding elements, which converts the ultra-long phage gene sequence into a shorter phage element sequence.
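The componentization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `componentize`, the half-open interval format and the toy genome are all hypothetical, and in practice the coding regions would come from a gene-annotation step that the text does not specify.

```python
from typing import List, Tuple

def componentize(genome: str, coding: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Split a genome into an ordered list of (kind, bases) elements.

    `coding` holds half-open [start, end) intervals of coding regions,
    assumed sorted and non-overlapping (an assumption of this sketch).
    Every gap between coding regions becomes a non-coding element.
    """
    elements = []
    cursor = 0
    for start, end in coding:
        if start > cursor:
            elements.append(("non-coding", genome[cursor:start]))
        elements.append(("coding", genome[start:end]))
        cursor = end
    if cursor < len(genome):
        elements.append(("non-coding", genome[cursor:]))
    return elements

# Toy genome with two hypothetical coding regions.
genome = "ATGCGTACGTTAGCCGATAA"
elements = componentize(genome, [(0, 6), (10, 16)])

# Converting the element sequence back into a base sequence.
restored = "".join(bases for _, bases in elements)
```

Joining the element bases in order recovers the original base sequence, which corresponds to the conversion from element sequence back to base sequence mentioned in the text.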
Step S20: determining, according to the phage element sequence, the element-sequence temporal features and biophysiological features corresponding to the original phage.
The element-sequence temporal features described above (temporal features for short) may be features representing the dynamic transition pattern of the phage.
It should be understood that if there are $N$ phages, the phage dataset can be defined as $\mathcal{D} = \{e^{(n)}_{1:T_n}\}_{n=1}^{N}$. Correspondingly, the goal of phage construction can be to learn the latent distribution $p(e_{1:T})$ of the dataset $\mathcal{D}$, so that samples can be drawn from the model continuously to obtain artificial phages. To achieve this goal, this embodiment can rely on the training dataset $\mathcal{D}$ to approximate its distribution: $\hat{p}(e_{1:T}) \approx p(e_{1:T})$. However, the amount of experimental phage data available in practice is small, whereas generating a truly effective gene sequence requires a large amount of effective gene data. To cope with the small amount of phage data, this embodiment can add expert knowledge (the biophysiological features described above, physiological features for short) as a global feature of the model, which helps the model learn more physiological information about phages and improves its effectiveness and generalization ability.
In a specific implementation, for a phage element sequence $e_{1:T}$, a feature vector $x_t$ can be extracted to represent each element $e_t$, and the feature vector sequence $x_{1:T} = (x_1, \dots, x_T)$ is then used to represent the dynamic transition pattern of the phage; this embodiment defines it as the element-sequence temporal features of the phage. Here $x_t \in X$, where $X$ denotes the temporal features and $x_t$ is one instance of a temporal feature. The expert knowledge about the sequence pointed out by experts is referred to in this embodiment as the phage biophysiological features: a vector $s$ captures the important biological patterns of a specific phage, $s \in S$, where $S$ denotes the biophysiological features and $s$ is one instance of a biophysiological feature.
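As a rough illustration of the two kinds of features, the sketch below builds one feature vector per element and one global vector per phage. The text does not say which features are extracted, so everything here is an assumption: element length, GC fraction and base frequencies stand in for the temporal features, and element count plus whole-genome GC fraction stand in for the expert-provided physiological features.

```python
from typing import List

def element_features(bases: str) -> List[float]:
    """Hypothetical feature vector x_t: [length, GC fraction, freq A, C, G, T]."""
    n = len(bases)
    freqs = [bases.count(b) / n if n else 0.0 for b in "ACGT"]
    gc = freqs[1] + freqs[2]
    return [float(n), gc] + freqs

def temporal_features(elements: List[str]) -> List[List[float]]:
    """Sequence x_{1:T}: one feature vector per element, in genome order."""
    return [element_features(e) for e in elements]

def static_features(elements: List[str]) -> List[float]:
    """Hypothetical stand-in for s: [number of elements, whole-genome GC fraction]."""
    genome = "".join(elements)
    gc = (genome.count("G") + genome.count("C")) / len(genome) if genome else 0.0
    return [float(len(elements)), gc]

# Toy element sequence (base strings only).
elements = ["ATGCGT", "ACGT", "TAGCCG", "ATAA"]
x_seq = temporal_features(elements)   # varies along the sequence
s_vec = static_features(elements)     # one vector for the whole phage
```

The point of the split is that `x_seq` changes from element to element while `s_vec` is shared by the whole phage, which is what lets the physiological features act as a global feature of the model.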
Step S30: determining the element spatial structure corresponding to the original phage based on the element-sequence temporal features and the biophysiological features.
It will be appreciated that the element spatial structure described above may be the spatial structure of the phage element sequence of a phage.
In a specific implementation, the element spatial structure of a phage may be represented by an element graph $G = (A, N)$, i.e., a graph specified by an adjacency matrix $A$ and a node attribute matrix $N$. The node attribute matrix $N$ is composed of the latent codes corresponding to the element-sequence temporal features and the biophysiological features. Therefore, these latent codes can be obtained first, and the element spatial structure corresponding to the original phage can then be constructed from them together with the adjacency matrix.
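A graph specified by an adjacency matrix and a node attribute matrix can be assembled as in the sketch below. The details are assumptions made for illustration: the adjacency matrix is derived by thresholding a pairwise similarity matrix (a hypothetical stand-in for the alignment-based construction), and each node attribute row simply concatenates an element's temporal latent code with the shared physiological latent code.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def node_attributes(h_seq: Matrix, h_s: List[float]) -> Matrix:
    """Row t of N concatenates the temporal latent code h_t with the shared h_S."""
    return [list(h_t) + list(h_s) for h_t in h_seq]

def adjacency(similarity: Matrix, threshold: float = 0.5) -> List[List[int]]:
    """Symmetric 0/1 adjacency: connect two distinct elements when their
    similarity reaches the threshold (the threshold value is an assumption)."""
    n = len(similarity)
    return [
        [1 if i != j and max(similarity[i][j], similarity[j][i]) >= threshold else 0
         for j in range(n)]
        for i in range(n)
    ]

def element_graph(similarity: Matrix, h_seq: Matrix, h_s: List[float],
                  threshold: float = 0.5) -> Tuple[List[List[int]], Matrix]:
    """Return G = (A, N) after checking that both describe the same elements."""
    a = adjacency(similarity, threshold)
    n = node_attributes(h_seq, h_s)
    if len(a) != len(n):
        raise ValueError("adjacency and node attributes disagree on element count")
    return a, n

# Toy inputs: three elements, 2-dimensional temporal codes, 1-dimensional h_S.
sim = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.6], [0.1, 0.6, 1.0]]
A, N = element_graph(sim, [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0.9])
```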
Step S40: fitting the feature distribution of the element-sequence temporal features, the biophysiological features and the element spatial structure through a time-series generative adversarial network to obtain the target phage.
It should be noted that the time-series generative adversarial network described above may be a deep learning model. A generative adversarial network consists of a generator, which is responsible for generating samples, and a discriminator, which is responsible for judging whether a sample produced by the generator is real. The generator tries to fool the discriminator as far as possible, while the discriminator tries to distinguish generated samples from real samples as far as possible.
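The opposing goals of the two networks are commonly written as a pair of cross-entropy losses. The sketch below shows that standard formulation on plain probability lists; it is not taken from the patent, the function names are hypothetical, and the generator loss is the widely used non-saturating variant.

```python
import math
from typing import List

EPS = 1e-12  # guards against log(0)

def discriminator_loss(p_real: List[float], p_fake: List[float]) -> float:
    """Low when real samples score near 1 and generated samples score near 0."""
    real_term = -sum(math.log(p + EPS) for p in p_real) / len(p_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in p_fake) / len(p_fake)
    return real_term + fake_term

def generator_loss(p_fake: List[float]) -> float:
    """Non-saturating loss: low when the discriminator scores generated samples near 1."""
    return -sum(math.log(p + EPS) for p in p_fake) / len(p_fake)

# p_* are the discriminator's "probability of being real" outputs.
confident = discriminator_loss(p_real=[0.9, 0.8], p_fake=[0.1, 0.2])
confused = discriminator_loss(p_real=[0.5, 0.5], p_fake=[0.5, 0.5])
fooled = generator_loss([0.9, 0.8])
caught = generator_loss([0.1, 0.2])
```

A discriminator that separates the two sample sets has a lower loss than one that outputs 0.5 everywhere, and the generator's loss falls as its samples are scored closer to real.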
It will be appreciated that the target phage may be an artificial microorganism constructed on the basis of the time-series generative adversarial network. It is naturally infectious and toxic, and can be injected into or taken orally by a human or animal to treat disease or provide nutrition. Artificial phages can be used in many fields, including medicine, nutrition, agriculture and environmental protection, for example to treat bacterial and viral infections, improve the nutritional value of food, prevent and control crop diseases, and treat pollutants.
It should be understood that the phage construction method based on the time-series generative adversarial network provided in this embodiment can use known phage genome sequence and protein sequence information to guide design, so that phage genome rearrangement can be carried out without specialist skills. The method can also rapidly search a large number of genome rearrangement schemes to find an optimal recombination scheme, improving the efficiency of phage genome rearrangement; it can automate phage genome rearrangement, saving labor and time; and it can improve the diversity and success rate of phage genome rearrangement through a variety of algorithms and models.
In a specific implementation, in order to approach the true distribution of the data, this embodiment can construct the training dataset from three kinds of information: the element-sequence temporal features, the biophysiological features and the element spatial structure. Phage construction then amounts to learning the latent distribution $p(S, X_{1:T}, G)$ of the dataset, relying on the training dataset $\mathcal{D}$ to approximate it: $\hat{p}(S, X_{1:T}, G) \approx p(S, X_{1:T}, G)$. That is, the feature distribution of the element-sequence temporal features, the biophysiological features and the element spatial structure is fitted through the time-series generative adversarial network. Based on the autoregressive decomposition $p(S, X_{1:T}, G) = p(S, G) \prod_{t} p(X_t \mid S, G, X_{1:t-1})$, the approximation can be achieved by solving two objectives. The first is global and ensures that the statistics of the generated data are similar to those of the original data over time: $\min_{\hat{p}} D\big(p(S, X_{1:T}, G) \,\|\, \hat{p}(S, X_{1:T}, G)\big)$, where $D$ is some suitable measure of the distance between distributions. The second is local and ensures that the distributions of the generated data and the original data are similar between elements of the sequence: $\min_{\hat{p}} D\big(p(X_t \mid S, G, X_{1:t-1}) \,\|\, \hat{p}(X_t \mid S, G, X_{1:t-1})\big)$ for every $t$. Under the GAN (Generative Adversarial Network) framework, the KL divergence (Kullback-Leibler divergence) is used to compute the distribution distance $D$ of the global objective, and the JS divergence (Jensen-Shannon divergence) is used to compute the distribution distance of the local objective. By using both divergence measures, the GAN can generate artificial phage sequence data whose biological characteristics and element structure are similar to those of the original phage.
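For readers unfamiliar with the two distance measures named above, the sketch below computes them for discrete distributions using their textbook definitions. It only illustrates the measures themselves; the function names are hypothetical, and how the patent estimates these quantities during training is not shown here.

```python
import math
from typing import List

def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; requires q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p: List[float], q: List[float]) -> float:
    """JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m the midpoint mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Toy base-frequency distributions over (A, C, G, T) for real and generated data.
real = [0.30, 0.20, 0.25, 0.25]
generated = [0.25, 0.25, 0.25, 0.25]

kl_rg = kl_divergence(real, generated)
kl_gr = kl_divergence(generated, real)
js_rg = js_divergence(real, generated)
js_gr = js_divergence(generated, real)
```

The KL divergence is asymmetric, while the JS divergence is symmetric and bounded above by $\ln 2$, which is why it remains finite even when the two distributions do not overlap.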
This embodiment discloses obtaining the phage element sequence corresponding to the original phage; determining, from the phage element sequence, the element-sequence temporal features and biophysiological features corresponding to the original phage, and determining the element spatial structure corresponding to the original phage based on those features; and fitting the feature distribution of the element-sequence temporal features, the biophysiological features and the element spatial structure through a time-series generative adversarial network to obtain the target phage. In the prior art, constructing an artificial phage by phage genome rearrangement may cause rearrangement of the phage genome that makes the results unstable. In contrast, this embodiment determines the element-sequence temporal features and biophysiological features from the phage element sequence of the original phage, determines the element spatial structure from those features, and obtains the target phage by fitting the feature distribution of all three with a time-series generative adversarial network, thereby solving the technical problem in the prior art that constructing phages by phage genome rearrangement is inefficient.
Referring to FIG. 3, FIG. 3 is a schematic flow chart of a second embodiment of the phage construction method of the present invention.
Based on the first embodiment described above, in order to explore correlations in the phage element sequence space, in this embodiment the phage construction method is implemented based on the time sequence generation countermeasure network (that is, a time-series generative adversarial network), in which an embedding component is provided; the step S30 includes:
Step S301: the element sequence time sequence features and the biological physiological features are converted by the embedding component into a time sequence feature latent code and a physiological feature latent code, respectively.
It should be noted that, in order to perform feature distribution fitting on the element sequence time sequence features, the biological physiological features and the element space structure, this embodiment may provide an embedding component in the phage construction framework. The embedding component learns low-dimensional representations of the biological physiological features of the phage and of the phage element sequence, and at the same time provides the node attributes of the phage elements for the element space structure.
It should be understood that, referring to FIG. 4, FIG. 4 is a schematic diagram of model training in the second embodiment of the phage construction method of the present invention. As shown in FIG. 4, the embedding layer (Embedding) and the recovery layer (Recovery) may provide a mapping between the feature space (the element sequence time sequence features and the biological physiological features) and the latent space, allowing the time sequence generation countermeasure network to learn the underlying temporal dynamics of the data through a low-dimensional representation. In practical application, let $\mathcal{H}_S$ and $\mathcal{H}_X$ be the latent vector spaces corresponding to the feature spaces $\mathcal{S}$ and $\mathcal{X}$. The embedding layer $e: \mathcal{S} \times \prod_t \mathcal{X} \to \mathcal{H}_S \times \prod_t \mathcal{H}_X$ maps the physiological features and the time sequence features to their latent codes $h_S, h_{1:T} = e(s, x_{1:T})$. The embedding layer can be realized by a recurrent neural network, $h_S = e_S(s)$, $h_t = e_X(h_S, h_{t-1}, x_t)$, where $e_S$ is the embedding layer of the physiological features and $e_X$ is the recurrent embedding layer of the time sequence features. $h_t$ and $h_S$ are the time sequence feature latent code and the physiological feature latent code, respectively.
In a specific implementation, the encoder (Encoder) converts input data into the corresponding latent codes, so in this embodiment the element sequence time sequence features X and the biological physiological features S may be input to the encoder to obtain the corresponding time sequence feature latent code $h_{1:T}$ and physiological feature latent code $h_S$.
Step S302: and determining a node attribute matrix corresponding to the phage element according to the time sequence feature potential codes and the physiological feature potential codes.
It will be appreciated that the node attribute matrix described above may be a matrix formed from the latent codes of pre-trained phage elements. In practical application, after the time sequence feature latent code corresponding to the element sequence time sequence features and the physiological feature latent code corresponding to the biological physiological features are obtained, the node attribute matrix $N$ corresponding to the phage elements can be obtained.
Step S303: constructing an adjacency matrix between phage elements based on the phage element sequences.
The adjacency matrix may be a matrix constructed from the differences between phage element sequences.
It should be understood that the step S303 specifically includes: combining the phage element sequence with a standard phage element sequence obtained through bioinformatic analysis to obtain a combined phage element sequence; and performing multiple sequence alignment on the combined phage element sequence and all phage element sequences of the original phage through preset sequence alignment software to obtain the adjacency matrix between phage elements.
It will be appreciated that the standard phage element sequence after the bioinformatic analysis may be a standard phage element sequence obtained by functional analysis of phage element sequences using bioinformatic techniques to locate and study biomarkers in the genome.
It should be noted that the preset sequence alignment software may be software for performing multiple sequence alignment on the element sequences, for example MAFFT or other sequence alignment software having the same or a similar function as MAFFT; this embodiment is not limited in this respect. Multiple sequence alignment (MSA) aligns a plurality of (three or more) amino acid sequences or nucleic acid sequences that have a phylogenetic relationship, arranging identical bases or amino acid residues in the same column as far as possible.
In a specific implementation, the phage element sequence of the original phage may first be combined with the standard phage element sequence obtained through bioinformatic analysis; all element sequences of the original phage are then obtained, and the combined phage element sequence and these element sequences are subjected to multiple sequence alignment through the sequence alignment software MAFFT, so that a similarity-based adjacency matrix between phage elements is obtained.
Step S304: and determining the element space structure corresponding to the original phage based on the adjacency matrix and the node attribute matrix.
It should be understood that the element space structure of the phage may be represented by an element graph $G = (A, N)$, that is, a graph specified by an adjacency matrix $A$ and a node attribute matrix $N$. Therefore, after the adjacency matrix $A$ between phage elements and the node attribute matrix $N$ corresponding to the phage elements are acquired, the element space structure corresponding to the original phage can be generated.
In this embodiment, the element sequence time sequence features and the biological physiological features are converted by the embedding component into the time sequence feature latent code and the physiological feature latent code, respectively; the node attribute matrix corresponding to the phage elements is determined from the time sequence feature latent code and the physiological feature latent code; the phage element sequence is combined with the standard phage element sequence obtained through bioinformatic analysis to obtain a combined phage element sequence, on which multiple sequence alignment is performed by the sequence alignment software to obtain the adjacency matrix between phage elements; and finally the element space structure corresponding to the original phage is determined based on the adjacency matrix and the node attribute matrix. In this way, correlations in the phage element sequence space can be explored, the performance of the model is improved, and the loss of base sequence information caused by element-level dimension reduction is reduced.
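A graph specified by an adjacency matrix and a node attribute matrix could be held in a small container such as the one sketched below. The class name `ElementGraph`, its fields and the `neighbors` helper are hypothetical and only illustrate the shape constraints between the two matrices.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ElementGraph:
    """Element graph given by an adjacency matrix and a node attribute matrix."""

    adjacency: List[List[float]]        # n x n, entry (i, j) links element i and j
    node_attributes: List[List[float]]  # n x d, one latent vector per element

    def __post_init__(self) -> None:
        n = len(self.adjacency)
        if any(len(row) != n for row in self.adjacency):
            raise ValueError("adjacency matrix must be square")
        if len(self.node_attributes) != n:
            raise ValueError("one attribute row is required per element")
        if len({len(row) for row in self.node_attributes}) > 1:
            raise ValueError("attribute rows must share one dimension")

    def neighbors(self, i: int) -> List[int]:
        """Indices of elements connected to element i (self-loops ignored)."""
        return [j for j, w in enumerate(self.adjacency[i]) if w and j != i]


# Toy values for illustration only.
graph = ElementGraph(
    adjacency=[[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    node_attributes=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
)
```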
Referring to FIG. 5, FIG. 5 is a schematic flow chart of a third embodiment of the phage construction method of the present invention.
Based on the above embodiments, in this embodiment, the timing generation countermeasure network is further provided with an encoding component; prior to the step S40, the method further includes:
Step S041: the physiological feature latent code is converted into a target biological physiological feature through a physiological feature recovery network.
It should be noted that, the encoding component may be a component for learning a low-dimensional vector latent map of the element map G structure, for example: an Encoder (Encoder). Wherein the low-dimensional vector provides a new discrimination means for discriminators in the time-series generation countermeasure network, guiding the generator to learn characteristics on the phage element space.
It is to be appreciated that the physiological characteristic restoration network described above can be a network that restores a physiological characteristic latent code to a physiological characteristic. Accordingly, the target biological physiological characteristic may be a physiological characteristic recovered by the physiological characteristic recovery network.
Step S042: the timing characteristics are potentially transcoded into a target element sequence timing characteristics by a timing characteristics recovery network.
It should be appreciated that the timing characteristic recovery network described above may be a network that recovers a potential encoding of a timing characteristic to a timing characteristic. Accordingly, the sequence of timing characteristics of the target element may be the timing characteristics recovered by the timing characteristic recovery network. In practical application, as shown in FIG. 4, the recovery layer Restoring potential vectors of physiological and temporal features to their feature representation +.>. This embodiment can be implemented by feed-forward network per step +.>Wherein->And->Is a recovery network for physiological and temporal feature embedding.
It will be appreciated that the embedding and recovery layer functions may be autoregressive, and the output of each time step may depend on previous information, so the embedding and recovery layer functions in this embodiment may be implemented by an LSTM (Long Short-Term Memory) network. Purely as a reversible mapping between the feature space and the latent space, the embedding and recovery functions should be able to accurately reconstruct $\tilde{s}, \tilde{x}_{1:T}$ of the original data $s, x_{1:T}$ from its latent representation $h_S, h_{1:T}$. At this point, a reconstruction loss may be obtained:
$$\mathcal{L}_R = \mathbb{E}_{s, x_{1:T} \sim p}\Big[\lVert s - \tilde{s} \rVert_2 + \sum_t \lVert x_t - \tilde{x}_t \rVert_2\Big]$$
where $\tilde{s} = r_S(h_S)$ and $\tilde{x}_t = r_X(h_t)$, and $r_S$ and $r_X$ are the two reconstruction networks for the physiological and time sequence feature embeddings.
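A reconstruction loss of the kind described above could be computed as sketched below. The sketch assumes the loss sums Euclidean distances between the original and recovered physiological features and between the original and recovered time sequence features at each step; that exact form, like the function names, is an assumption rather than something fixed by this embodiment.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def l2_distance(u: Vector, v: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def reconstruction_loss(
    s: Vector,
    s_recovered: Vector,
    x: Sequence[Vector],
    x_recovered: Sequence[Vector],
) -> float:
    """Static-feature error plus the per-step time sequence error."""
    loss = l2_distance(s, s_recovered)
    for x_t, x_t_recovered in zip(x, x_recovered):
        loss += l2_distance(x_t, x_t_recovered)
    return loss


# Toy values: the static features are recovered exactly, one step is not.
example_loss = reconstruction_loss(
    s=[1.0, 0.0],
    s_recovered=[1.0, 0.0],
    x=[[0.0, 0.0], [3.0, 4.0]],
    x_recovered=[[0.0, 0.0], [0.0, 0.0]],
)
```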
Correspondingly, the step S40 includes:
Step S401: encoding the element space structure through a structure encoding network and the encoding component to obtain a structure latent code corresponding to the element space structure.
It should be noted that the above-mentioned structure coding network may be a network for converting a spatial structure of an element into a structure potential coding. Accordingly, the structure potential code may be a potential code obtained by the coding component after coding the element space structure through the structure coding network.
Step S402: the structure potential encoding is converted to a target element spatial structure by a structural feature decoding network and a decoder.
It should be appreciated that the structural feature decoding network described above may be a network for converting the structure latent code into an element space structure. Accordingly, the target element space structure may be the element space structure obtained through conversion by the structural feature decoding network and the decoder.
Further, in order to make the learned target element space structure more accurate, after the step S402, the method includes: updating a discriminator in the time sequence generation countermeasure network based on the structure latent code, the element sequence time sequence features and the biological physiological features; and guiding, by the updated discriminator, the generator in the time sequence generation countermeasure network to learn spatial features of the phage elements so as to update the target element space structure.
It will be appreciated that, based on the phage element sequence representation of the phage, phage elements can be subdivided into coding and non-coding elements, where the coding region of an element refers to the region of a gene that defines the sequence of a protein and is therefore a key factor affecting protein function. Comparing the coding regions of different species may reveal the evolutionary relationship of the species, since coding regions with similar characteristics indicate a common ancestor. Likewise, variations in the coding region may reveal how a protein has changed during evolution. In this example, an element similarity graph $G = (A, N)$ can be constructed using the distance matrix of the phage element sequences, where $A$ may be the adjacency matrix and $N$ may be the node attribute matrix; the attribute value of each element node is the pre-trained element latent vector.
In a specific implementation, as shown in FIG. 4, since the element graph $G$ is a graph specified by the adjacency matrix $A$ and the node attribute matrix $N$, this embodiment can learn an encoder and a decoder (Decoder) that map between the space of graphs $G$ and their continuous latent code $z$. In the probabilistic setting of a VAE (variational autoencoder), the encoder is defined by the variational posterior $q_\phi(z \mid G)$ and the decoder by the generative distribution $p_\theta(G \mid z)$, where $\phi$ and $\theta$ are learned parameters. In addition, a prior distribution $p(z)$ is imposed on the latent code representation as regularization; this embodiment can use a simple isotropic Gaussian prior $p(z) = \mathcal{N}(0, I)$. Both the encoder and the decoder can be realized in this embodiment by a graph convolutional neural network (Graph Convolutional Neural Network, GNN): one is the encoding network of the element graph structure (i.e., the above-mentioned structure encoding network), and the other is the decoding network of the element graph structure (i.e., the above-mentioned structural feature decoding network). The element graph structure is encoded using the network architecture of the variational autoencoder to obtain the latent variable $z$ (i.e., the above-mentioned target element space structure), which is then fused with the feature information of the phage element sequence to improve the discrimination capability of the discriminator (Discriminator) of the adversarial component and to guide the generator (Generator) to learn the spatial features of the elements. In this way the discriminator in the time sequence generation countermeasure network is updated, and the updated discriminator can guide the generator to learn the spatial features of the phage elements so as to update the target element space structure. The whole model is trained by minimizing (an upper bound on) the negative log-likelihood $-\log p_\theta(G)$, which gives a second loss function:
$$\mathcal{L}(\phi, \theta; G) = \mathbb{E}_{q_\phi(z \mid G)}\big[-\log p_\theta(G \mid z)\big] + \mathrm{KL}\big[q_\phi(z \mid G) \,\|\, p(z)\big]$$
where the first term of $\mathcal{L}$ is a reconstruction loss that forces graphs generated by sampling to be highly similar to the input graph $G$. The second term of $\mathcal{L}$ is a KL divergence that regularizes the latent code space, allowing $z$ to be sampled directly from $p(z)$ later rather than from $q_\phi(z \mid G)$.
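The two terms of such a variational loss could be evaluated as in the following sketch. It assumes a diagonal Gaussian posterior (so the KL divergence to the standard normal prior has a closed form) and a Bernoulli model over adjacency entries (so the reconstruction term is a binary cross-entropy). Both modelling choices and all function names are assumptions made for illustration.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def adjacency_bce(
    adj_true: Sequence[Sequence[int]],
    adj_prob: Sequence[Sequence[float]],
    eps: float = 1e-12,
) -> float:
    """Negative log-likelihood of the true edges under predicted edge probabilities."""
    total = 0.0
    for row_true, row_prob in zip(adj_true, adj_prob):
        for a, p in zip(row_true, row_prob):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total -= a * math.log(p) + (1 - a) * math.log(1.0 - p)
    return total


def graph_vae_loss(adj_true, adj_prob, mu, log_var) -> float:
    """Reconstruction term plus KL regularization of the latent code."""
    return adjacency_bce(adj_true, adj_prob) + kl_to_standard_normal(mu, log_var)


# Toy values: an uninformative decoder and a posterior equal to the prior.
toy_loss = graph_vae_loss(
    adj_true=[[0, 1], [1, 0]],
    adj_prob=[[0.5, 0.5], [0.5, 0.5]],
    mu=[0.0, 0.0],
    log_var=[0.0, 0.0],
)
```

When the posterior equals the prior the KL term vanishes, so the toy loss reduces to the reconstruction term alone.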
Further, to make the learned target element sequence timing characteristics more accurate, after the step S042, the method further includes: receiving, by the generator, a timing feature latent code to generate a target timing feature latent code; the generator is directed to learn timing characteristics of the phage element based on the timing characteristic latent encoding and the target timing characteristic latent encoding to update target element sequence timing characteristics.
It should be noted that in the time sequence generation countermeasure network, the generator does not produce its output directly in the feature space; it first outputs into the pre-trained embedding latent space. In practical application, as shown in FIG. 4, let $\mathcal{Z}_S$ and $\mathcal{Z}_X$ denote vector spaces on which known distributions are defined, from which random vectors are drawn as input for generating into $\mathcal{H}_S$ and $\mathcal{H}_X$. The generating function $g: \mathcal{Z}_S \times \prod_t \mathcal{Z}_X \to \mathcal{H}_S \times \prod_t \mathcal{H}_X$ takes a tuple of random vectors for the physiological features and the time sequence features and generates the latent vectors $\hat{h}_S, \hat{h}_{1:T} = g(z_S, z_{1:T})$, specifically: $\hat{h}_S = g_S(z_S)$, $\hat{h}_t = g_X(\hat{h}_S, \hat{h}_{t-1}, z_t)$, where $g_S$ is the generation network of the physiological features (which can be realized by a fully connected neural network in this embodiment) and $g_X$ is the generation network of the time sequence features (which can be realized by a recurrent neural network in this embodiment). The random vector $z_S$ can be sampled from a chosen distribution, and $z_t$ follows a stochastic process. However, relying solely on the binary adversarial feedback of the discriminator may not be sufficient to motivate the generator to capture the conditional distribution of the time sequence features in the phage sequence data, so this embodiment may introduce an additional loss to further guide the learning of the generator. In the latent space, the generator receives the latent representation $h_{1:t-1}$ of the real element data and generates the next element latent vector $\hat{h}_t$; that is, the time sequence feature latent code may be received by the generator to generate a target time sequence feature latent code. Gradients can then be computed on a loss that captures the difference between the distributions $p(H_t \mid H_S, H_{1:t-1})$ and $\hat{p}(H_t \mid H_S, H_{1:t-1})$; applying maximum likelihood yields the supervised loss function:
$$\mathcal{L}_S = \mathbb{E}_{s, x_{1:T} \sim p}\Big[\sum_t \lVert h_t - g_X(h_S, h_{t-1}, z_t) \rVert_2\Big]$$
where $g_X(h_S, h_{t-1}, z_t)$ approximates $\mathbb{E}_{z_t \sim \mathcal{N}}\big[\hat{p}(H_t \mid H_S, H_{1:t-1}, z_t)\big]$ using a single sample $z_t$, as is standard in stochastic gradient descent. In summary, at any step of a training sequence, the difference between the actual next latent vector (from the embedding component described above) and the next synthesized latent vector is evaluated, so $\mathcal{L}_S$ further ensures that the model produces similar transition patterns between consecutive elements. After the target time sequence feature latent code is generated, the supervised loss function can be determined based on the time sequence feature latent code and the target time sequence feature latent code and used to guide the generator to learn the time sequence features of the phage elements so as to update the target element sequence time sequence features, making the target element sequence time sequence features produced by the generator more accurate.
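The one-step-ahead comparison described above could be sketched as follows. The generator step is passed in as a callable; `toy_generator_step` is a hypothetical stand-in that simply carries the previous latent vector forward, and the use of a Euclidean distance per step is an assumption.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
GeneratorStep = Callable[[Vector, Vector, Vector], List[float]]


def supervised_loss(
    h_s: Vector,
    h_seq: Sequence[Vector],
    z_seq: Sequence[Vector],
    generator_step: GeneratorStep,
) -> float:
    """Sum over steps of the distance between the real next latent vector
    and the one the generator predicts from the real previous latent vector."""
    loss = 0.0
    for t in range(1, len(h_seq)):
        predicted = generator_step(h_s, h_seq[t - 1], z_seq[t])
        loss += math.sqrt(sum((a - b) ** 2 for a, b in zip(h_seq[t], predicted)))
    return loss


def toy_generator_step(h_s: Vector, h_prev: Vector, z_t: Vector) -> List[float]:
    """Hypothetical generator: repeats the previous latent vector unchanged."""
    return list(h_prev)


# Toy latent sequence: one jump between steps 0 and 1, then no change.
latents = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]
noise = [[0.0], [0.0], [0.0]]
toy_supervised = supervised_loss([1.0], latents, noise, toy_generator_step)
```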
Step S403: constructing a target phage according to the target biological physiological characteristic, the target element space structure and the target element sequence time sequence characteristic.
It should be appreciated that, after obtaining the target biological physiological feature recovered by the physiological feature recovery network, the target element sequence time sequence feature recovered by the time sequence feature recovery network, and the target element space structure decoded by the structural feature decoding network and the decoder, this embodiment can construct a phage from the target biological physiological feature, the target element space structure and the target element sequence time sequence feature to obtain an artificial phage.
In a specific implementation, the physiological feature latent code and the time sequence feature latent code may first be converted into the target biological physiological feature and the target element sequence time sequence feature, respectively; the element space structure is encoded to obtain the corresponding structure latent code, which is then converted into the target element space structure; and the target phage can then be constructed according to the target biological physiological feature, the target element space structure and the target element sequence time sequence feature.
Further, after the step S403, the method further includes: inputting a time sequence feature potential code into a time sequence feature discrimination function, inputting the physiological feature potential code into a physiological feature discrimination function, and inputting the structure potential code into a spatial feature discrimination function to respectively obtain the real sample probability of the time sequence feature of the target element sequence, the real sample probability of the physiological feature of the target organism and the real sample probability of the spatial structure of the target element.
It will be appreciated that in this embodiment, as shown in FIG. 4, the time sequence feature latent code, the physiological feature latent code and the structure latent code may each be input to the corresponding discriminant function. The discriminator receives the latent representations of the physiological, time sequence and spatial features and returns a classification for each. Each input may be either a real latent vector or a synthetic latent vector; similarly, each returned classification refers to either real data or synthetic data. The classification may be computed from the sequences of forward and backward hidden states produced by recurrent neural networks. The physiological feature discriminant function, the time sequence feature discriminant function and the spatial feature discriminant function are output-layer classification functions; the one operating on the forward and backward hidden state sequences can be realized by a bidirectional recurrent neural network, and the other two may be realized by fully connected neural networks. To achieve adversarial learning of the model, this embodiment may use an adversarial loss consisting of two terms.
The first term of the adversarial loss is the loss function for training the generator. The generator tries to make the discriminator judge wrongly by generating plausible samples; that is, when the discriminator judges samples generated by the generator, its predicted value should approach that of a real sample. To achieve this, the generator is trained so that the discriminator's predicted value for its generated samples deviates as little as possible from that target. Thus, the loss function of the generator is the logarithm of the discriminator's predicted values for the generated physiological, time sequence and spatial samples. The second term of the adversarial loss is used to evaluate the effect of the generator and represents the difference between the samples generated by the generator and the real samples, expressed in the form of a binary cross-entropy. Specifically, for a single time step it computes a logarithmic term from the probability with which the discriminator considers the sample generated at that step to be a real sample. Likewise, the discriminator produces the corresponding probabilities for the generated physiological and spatial samples, and taking the expectation over these terms gives an overall loss function that evaluates the quality of the samples generated by the generator. The time sequence feature latent code, the physiological feature latent code and the structure latent code can thus be input to the corresponding discriminant functions to obtain the real-sample probability of the target element sequence time sequence feature, the real-sample probability of the target biological physiological feature and the real-sample probability of the target element space structure, so that the authenticity of the constructed target phage can be judged and the accuracy of the constructed target phage is further improved.
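Because the exact adversarial loss of this embodiment is not reproduced here, the following is only a generic sketch of a binary cross-entropy adversarial objective over several discriminator outputs (for example one probability each for the physiological, time sequence and spatial branches). It uses the common non-saturating generator loss, which is a substituted, well-known technique and not necessarily the loss of this embodiment; all names are hypothetical.

```python
import math
from typing import Sequence


def _safe_log(p: float, eps: float = 1e-12) -> float:
    """Logarithm with the argument clamped away from 0 and 1."""
    return math.log(min(max(p, eps), 1.0 - eps))


def discriminator_loss(real_probs: Sequence[float], fake_probs: Sequence[float]) -> float:
    """Binary cross-entropy: real samples are labelled 1, synthetic samples 0."""
    total = -sum(_safe_log(p) for p in real_probs)
    total -= sum(_safe_log(1.0 - p) for p in fake_probs)
    return total / (len(real_probs) + len(fake_probs))


def generator_loss(fake_probs: Sequence[float]) -> float:
    """Non-saturating generator loss: small when synthetic samples are judged real."""
    return -sum(_safe_log(p) for p in fake_probs) / len(fake_probs)


# Hypothetical real-sample probabilities from three discriminant functions
# (physiological, time sequence, spatial) for synthetic inputs.
convincing = generator_loss([0.9, 0.9, 0.9])
unconvincing = generator_loss([0.1, 0.1, 0.1])
```

The generator loss is lower when the discriminator assigns high real-sample probabilities to synthetic inputs, which matches the qualitative behaviour described above.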
It should be noted that, for sequence generation models, verifying the quality of the generated data is difficult, all the more so because sequence data, unlike image data, cannot be judged intuitively by the human eye. Therefore, in order to verify the model and the artificial phage, this embodiment may verify the quality of the generated data from several aspects, including data distribution visualization, nucleotide content comparison and GC content. The data distribution visualization, the nucleotide content comparison and the base combination heat map all compare the base data distribution of the generated data. In practical application: for data visualization, PCA and t-SNE are used to reduce the dimensionality of the training data, the generated data and random data, and the distribution of each is inspected; for average nucleotide content comparison, the average contents of the A, G, C and T bases in all natural phages, artificial phages and randomly generated phages are compared, and the generated data should have a content profile similar to that of the natural phages; GC content is an important feature of DNA sequences with several kinds of biological significance. Phages with higher GC content may more readily infect hosts with higher GC content, while phages with lower GC content more readily infect hosts with lower GC content. A genome with high GC content also more readily encodes codons containing G and C bases; since codons with high GC content are more numerous than codons with high AT content, a genome with high GC content may have a higher gene density and coding capacity.
Through the research on GC content in phage base sequences, the biological characteristics of phage can be better understood, and important guidance is provided for developing new phage treatment and control strategies.
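The nucleotide content and GC content statistics mentioned above can be computed directly from a base sequence. The sketch below is a minimal illustration; the function names are hypothetical, and characters other than A, C, G and T are ignored as an assumption.

```python
from collections import Counter
from typing import Dict


def nucleotide_proportions(sequence: str) -> Dict[str, float]:
    """Proportion of each of A, C, G and T among the unambiguous bases."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in the sequence."""
    proportions = nucleotide_proportions(sequence)
    return proportions["G"] + proportions["C"]


# Toy sequence for illustration only.
toy_gc = gc_content("ATGCGC")
```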
Referring to FIG. 6 and FIG. 7, FIG. 6 is a schematic diagram of the PCA visualization result for each phage in the third embodiment of the phage construction method of the present invention, and FIG. 7 is a schematic diagram of the t-SNE visualization result for each phage in the third embodiment of the phage construction method of the present invention. In this example, dimensionality-reduction visual analysis (PCA and t-SNE, respectively) was performed on the latent representation of the natural phages, the latent representation of the model-generated data, and the randomly generated data. PCA and t-SNE may be implemented with Scikit-Learn, a machine learning toolkit. The random data are obtained by randomly sampling elements from an element library, combining them into element sequences, and converting these into latent representations of element sequences using the embedding component. In FIG. 6 and FIG. 7, Nature denotes natural phages, Synthetic denotes artificial phages and Random denotes randomly generated sequences, and both the abscissa x and the ordinate y represent positional information of the phages in the reduced two-dimensional space. The figures show that both PCA and t-SNE separate the randomly generated data well. In PCA, the data distributions of the natural phages and the artificial phages overlap slightly, while in t-SNE they are similar and essentially coincide. This indicates that the generated artificial phages are similar to natural phages to a certain degree and distinguishable from random generation, so the phage generation model is effective to a certain extent and can generate data similar to natural phages.
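A two-dimensional PCA projection of latent representations could be produced as sketched below. For self-containment the sketch computes PCA with a NumPy singular value decomposition rather than with Scikit-Learn, and the three groups of vectors are synthetic placeholders drawn from random number generators, not data or results from this embodiment.

```python
import numpy as np


def pca_project(data: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project rows of `data` onto their first principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Placeholder latent vectors (hypothetical; 50 samples of dimension 8 each).
rng = np.random.default_rng(0)
natural = rng.normal(0.0, 1.0, size=(50, 8))
synthetic = natural + rng.normal(0.0, 0.1, size=(50, 8))
random_sequences = rng.normal(3.0, 1.0, size=(50, 8))

points = pca_project(np.vstack([natural, synthetic, random_sequences]))
labels = ["Nature"] * 50 + ["Synthetic"] * 50 + ["Random"] * 50
```

Each row of `points` can then be plotted with its label. In Scikit-Learn, `sklearn.decomposition.PCA` and `sklearn.manifold.TSNE` provide the corresponding `fit_transform` methods.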
Referring to FIG. 8 and FIG. 9, FIG. 8 is a graph showing the average nucleotide distribution of each phage in the third embodiment of the phage construction method of the present invention, and FIG. 9 is a graph showing the GC content distribution of each phage in the third embodiment of the phage construction method of the present invention. In natural phage DNA sequences, the nucleotide distribution is approximately balanced, that is, the contents of A, T, C and G are close. FIG. 8 shows the average nucleotide distributions (Proportion distribution of ATCG) of natural phages, artificial phages and randomly generated phages; the abscissa shows A, T, C and G, and the ordinate shows the proportion of each. It can be seen from FIG. 8 that, in all three groups of phage data, each of the 4 nucleotides lies in the range of 20%-30%. The average nucleotide distribution nevertheless reflects the frequency and distribution of nucleotides in the sequences, and the average nucleotide distributions of the same species should be similar. As shown in FIG. 8, the absolute difference in the percentage content of the 4 nucleotides between the randomly generated phage sequences and the natural phages was 4.243% on average, whereas the frequency and distribution of nucleotides in the artificial phages were similar to those of the natural phages, with an average absolute difference of 0.725%, which indicates the effectiveness of the model. On the other hand, in FIG. 9 the abscissa represents GC content (GC content) and the ordinate represents frequency of occurrence (Frequency). GC content is an important feature of a DNA sequence and has various biological meanings. The GC content of phages is relatively low, about 35-45% on average, but the nucleotide distribution and content of phage genomes can vary among phage species.
As can be seen from the GC content distribution in FIG. 9, the GC content of most natural phages in the training data of this example is between 45% and 50%, the GC content of the phages generated by the construction model also falls in this interval, and the GC content distribution of the randomly generated phages differs markedly from that of the natural phages. From the analysis of base sequences, therefore, the data distribution of the generated artificial phages is approximately the same as that of the natural phages.
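The average absolute difference in nucleotide percentage content used in the comparison above could be computed as in the following sketch. The function name is hypothetical, and the proportions in the example are made-up illustrative values, not the values underlying FIG. 8.

```python
from typing import Dict


def mean_abs_percentage_difference(
    reference: Dict[str, float], candidate: Dict[str, float]
) -> float:
    """Mean over A, T, C, G of |reference - candidate|, in percentage points."""
    bases = "ATCG"
    total = sum(abs(reference[base] - candidate[base]) for base in bases)
    return total / len(bases) * 100.0


# Illustrative proportions only (each dictionary sums to 1).
natural_profile = {"A": 0.25, "T": 0.25, "C": 0.25, "G": 0.25}
generated_profile = {"A": 0.27, "T": 0.23, "C": 0.26, "G": 0.24}
difference = mean_abs_percentage_difference(natural_profile, generated_profile)
```

A smaller value means the candidate's average nucleotide profile is closer to the reference profile.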
In this embodiment, the physiological feature latent code is converted into the target biological physiological feature through the physiological feature recovery network, the time sequence feature latent code is converted into the target element sequence time sequence feature through the time sequence feature recovery network, the element space structure is encoded through the structure encoding network and the encoding component to obtain the structure latent code corresponding to the element space structure, the structure latent code is converted into the target element space structure through the structural feature decoding network and the decoder, and the target phage is then constructed according to the target biological physiological feature, the target element space structure and the target element sequence time sequence feature, so that a large number of genome rearrangement schemes can be searched rapidly and the phage rearrangement efficiency is improved.
Meanwhile, the discriminator in the time sequence generation countermeasure network is updated based on the structure latent code, the element sequence time sequence features and the biological physiological features, and the updated discriminator guides the generator in the time sequence generation countermeasure network to learn the spatial features of the phage elements so as to update the target element space structure; the generator receives the time sequence feature latent code to generate the target time sequence feature latent code, and is guided, based on the time sequence feature latent code and the target time sequence feature latent code, to learn the time sequence features of the phage elements so as to update the target element sequence time sequence features. As a result, the generated target element space structure and target element sequence time sequence features are more accurate, and the constructed target phage can be more accurate.
Furthermore, an embodiment of the present invention proposes a storage medium having stored thereon a phage building program which, when executed by a processor, implements the steps of the phage building method as described above.
FIG. 10 is a structural block diagram of a first embodiment of the phage construction apparatus according to the present invention.
As shown in FIG. 10, the phage construction apparatus according to the embodiment of the present invention includes:
an element sequence acquisition module 501, configured to acquire a phage element sequence corresponding to an original phage;
a feature determination module 502, configured to determine, according to the phage element sequence, an element sequence time sequence feature and a bio-physiological feature corresponding to the original phage;
a spatial structure determination module 503, configured to determine an element spatial structure corresponding to the original phage based on the element sequence time sequence feature and the bio-physiological feature;
a phage construction module 504, configured to perform feature distribution fitting on the element sequence time sequence feature, the bio-physiological feature and the element spatial structure through a time sequence generation countermeasure network to obtain a target phage.
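As an illustration only, the cooperation of modules 501 to 504 can be sketched in Python as follows. The class name `PhageConstructionApparatus`, the field names and all toy stand-in functions are assumptions made for this sketch and do not appear in the embodiment. In particular, the stand-in for module 504 merely packages its inputs and does not implement the time sequence generation countermeasure network (in more common terminology, a time-series generative adversarial network).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

import numpy as np


@dataclass
class Features:
    time_sequence: np.ndarray   # element sequence time sequence feature, one row per element
    physiological: np.ndarray   # bio-physiological feature, one row per element


@dataclass
class PhageConstructionApparatus:
    acquire_elements: Callable[[str], List[str]]             # module 501
    determine_features: Callable[[List[str]], Features]      # module 502
    determine_structure: Callable[[Features], np.ndarray]    # module 503
    construct_phage: Callable[[Features, np.ndarray], Dict]  # module 504

    def run(self, original_phage: str) -> Dict:
        # Data flows through the four modules in the order given in the text.
        elements = self.acquire_elements(original_phage)
        features = self.determine_features(elements)
        structure = self.determine_structure(features)
        return self.construct_phage(features, structure)


# Toy stand-ins (hypothetical), used only to show the data flow.
def toy_acquire(original_phage: str) -> List[str]:
    return ["ATGC", "GGCC", "TTAA"]


def toy_features(elements: List[str]) -> Features:
    # Position in the sequence stands in for the time sequence feature,
    # GC fraction stands in for the bio-physiological feature.
    order = np.arange(len(elements), dtype=float).reshape(-1, 1)
    gc = np.array([[(e.count("G") + e.count("C")) / len(e)] for e in elements])
    return Features(time_sequence=order, physiological=gc)


def toy_structure(features: Features) -> np.ndarray:
    # Chain graph: each element is linked to its neighbour in the sequence.
    n = features.time_sequence.shape[0]
    adjacency = np.zeros((n, n), dtype=int)
    for i in range(n - 1):
        adjacency[i, i + 1] = adjacency[i + 1, i] = 1
    return adjacency


def toy_construct(features: Features, structure: np.ndarray) -> Dict:
    return {"features": features, "structure": structure}


apparatus = PhageConstructionApparatus(
    toy_acquire, toy_features, toy_structure, toy_construct
)
result = apparatus.run("original-phage")
```

Passing the four modules in as callables keeps the data flow of FIG. 10 visible while leaving each module free to be replaced by a real implementation.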
The phage construction apparatus of this embodiment obtains the phage element sequence corresponding to the original phage; determines, according to the phage element sequence, the element sequence time sequence feature and the bio-physiological feature corresponding to the original phage; determines the element spatial structure corresponding to the original phage based on these two features; and performs feature distribution fitting on the element sequence time sequence feature, the bio-physiological feature and the element spatial structure through the time sequence generation countermeasure network to obtain the target phage. In the prior art, artificial phages are constructed by phage genome rearrangement technology, and rearranging the phage genome may produce unstable results. Because this embodiment instead derives the features and the element spatial structure from the phage element sequence of the original phage and obtains the target phage by fitting their distribution with the time sequence generation countermeasure network, it solves the technical problem of low phage construction efficiency with the prior-art phage genome rearrangement technology.
Based on the above-described first embodiment of the phage construction apparatus of the present invention, a second embodiment of the phage construction apparatus of the present invention is proposed.
In this embodiment, the spatial structure determination module 503 is further configured to: convert, through the embedding component, the element sequence time sequence feature and the bio-physiological feature into a time sequence feature latent code and a physiological feature latent code, respectively; determine a node attribute matrix corresponding to the phage elements according to the time sequence feature latent code and the physiological feature latent code; construct an adjacency matrix between phage elements based on the phage element sequence; and determine the element spatial structure corresponding to the original phage based on the adjacency matrix and the node attribute matrix.
Further, the spatial structure determination module 503 is further configured to: combine the phage element sequence with a standard phage element sequence that has undergone bioinformatics analysis to obtain a combined phage element sequence; and perform multiple sequence alignment on the combined phage element sequence and all phage element sequences of the original phage through preset sequence alignment software to obtain the adjacency matrix between phage elements.
In this embodiment, the embedding component converts the element sequence time sequence feature and the bio-physiological feature into a time sequence feature latent code and a physiological feature latent code, respectively, and the node attribute matrix corresponding to the phage elements is determined from these two latent codes. The phage element sequence is combined with the standard phage element sequence that has undergone bioinformatics analysis to obtain a combined phage element sequence, on which the sequence alignment software performs multiple sequence alignment to obtain the adjacency matrix between phage elements. Finally, the element spatial structure corresponding to the original phage is determined based on the adjacency matrix and the node attribute matrix. In this way, spatial correlations among phage element sequences can be explored, the performance of the model is improved, and the loss of base sequence information caused by element-based dimensionality reduction is reduced.
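The construction of the node attribute matrix and the adjacency matrix described above can be sketched as follows. This is a minimal sketch under stated assumptions: the element sequences are taken to be already aligned to equal length (the embodiment delegates alignment to preset sequence alignment software), and an edge is drawn when the pairwise identity reaches a hypothetical `threshold`. The embodiment does not state how the alignment result is turned into an adjacency matrix, so the thresholding rule and all function names here are illustrative only.

```python
from typing import List

import numpy as np


def node_attribute_matrix(time_sequence_latent: np.ndarray,
                          physiological_latent: np.ndarray) -> np.ndarray:
    """Concatenate the two latent codes row by row (one row per phage element)."""
    return np.concatenate([time_sequence_latent, physiological_latent], axis=1)


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of aligned columns where both sequences carry the same base.

    Gap characters ('-') never count as matches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return matches / len(a)


def adjacency_matrix(aligned: List[str], threshold: float = 0.5) -> np.ndarray:
    """Symmetric 0/1 matrix linking elements whose identity is >= threshold."""
    n = len(aligned)
    adjacency = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if pairwise_identity(aligned[i], aligned[j]) >= threshold:
                adjacency[i, j] = adjacency[j, i] = 1
    return adjacency


# Three toy aligned element sequences (hypothetical data).
aligned_elements = ["ATGCA", "ATGCT", "TTAA-"]
adjacency = adjacency_matrix(aligned_elements, threshold=0.5)

# Toy latent codes: 2 time sequence dimensions and 3 physiological dimensions.
attributes = node_attribute_matrix(np.zeros((3, 2)), np.ones((3, 3)))

# The element spatial structure is then the graph (adjacency, attributes).
element_spatial_structure = (adjacency, attributes)
```

In this toy input only the first two elements share enough aligned bases (4 of 5) to be linked; the third element remains isolated.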
Based on the above-described embodiments of the phage construction apparatus of the present invention, a third embodiment of the phage construction apparatus of the present invention is proposed.
In this embodiment, the phage construction module 504 is further configured to: convert the physiological feature latent code into a target bio-physiological feature through a physiological feature recovery network; convert the time sequence feature latent code into a target element sequence time sequence feature through a time sequence feature recovery network; encode the element spatial structure through a structure encoding network and the encoding component to obtain a structure latent code corresponding to the element spatial structure; convert the structure latent code into a target element spatial structure through a structural feature decoding network and a decoder; and construct the target phage according to the target bio-physiological feature, the target element spatial structure and the target element sequence time sequence feature.
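The relationship between an embedding component (feature to latent code) and a recovery network (latent code back to feature) can be illustrated with a deliberately simple linear sketch. The embodiment does not specify the network architectures, so everything below is an assumption: the class name `LinearEmbeddingComponent` is hypothetical, the embedding is a fixed random linear map rather than a trained network, and recovery uses the Moore-Penrose pseudo-inverse in place of a learned recovery network.

```python
import numpy as np


class LinearEmbeddingComponent:
    """Toy linear stand-in for an embedding component and its recovery network."""

    def __init__(self, feature_dim: int, latent_dim: int, rng: np.random.Generator):
        if latent_dim < feature_dim:
            # With fewer latent than feature dimensions a linear map loses
            # information and exact recovery is not possible.
            raise ValueError("latent_dim must be >= feature_dim in this sketch")
        self.weights = rng.normal(size=(feature_dim, latent_dim))

    def embed(self, features: np.ndarray) -> np.ndarray:
        """Map features (one row per element) to latent codes."""
        return features @ self.weights

    def recover(self, latent: np.ndarray) -> np.ndarray:
        """Map latent codes back to features via the pseudo-inverse."""
        return latent @ np.linalg.pinv(self.weights)


rng = np.random.default_rng(0)
component = LinearEmbeddingComponent(feature_dim=3, latent_dim=5, rng=rng)

features = rng.normal(size=(4, 3))      # 4 phage elements, 3 features each
latent = component.embed(features)      # latent codes of shape (4, 5)
recovered = component.recover(latent)   # back to shape (4, 3)
```

Because the random weight matrix has full row rank, the round trip reproduces the input up to floating-point error; a trained recovery network would only approximate this.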
Further, the phage construction module 504 is further configured to: update the discriminator in the time sequence generation countermeasure network based on the structure latent code, the element sequence time sequence feature and the bio-physiological feature; and guide, through the updated discriminator, the generator in the time sequence generation countermeasure network to learn the spatial features of the phage elements so as to update the target element spatial structure.
Further, the phage construction module 504 is further configured to: receive, through the generator, the time sequence feature latent code to generate a target time sequence feature latent code; and guide the generator, based on the time sequence feature latent code and the target time sequence feature latent code, to learn the time sequence features of the phage elements so as to update the target element sequence time sequence feature.
Further, the phage construction module 504 is further configured to input the time sequence feature latent code into a time sequence feature discrimination function, input the physiological feature latent code into a physiological feature discrimination function, and input the structure latent code into a spatial feature discrimination function, to obtain, respectively, the real-sample probability of the target element sequence time sequence feature, the real-sample probability of the target bio-physiological feature, and the real-sample probability of the target element spatial structure.
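The three discrimination functions can be pictured as three separate maps from a latent code to a probability that the sample is real. The sketch below assumes a logistic (sigmoid) output on a linear score with random, untrained weights; the embodiment does not specify the functional form, and the names `make_discrimination_function`, `latent_dim` and the toy inputs are hypothetical.

```python
import numpy as np


def make_discrimination_function(weights: np.ndarray, bias: float):
    """Return a function mapping latent codes (one per row) to real-sample probabilities."""
    def discriminate(latent_codes: np.ndarray) -> np.ndarray:
        logits = latent_codes @ weights + bias
        return 1.0 / (1.0 + np.exp(-logits))  # logistic sigmoid, values in (0, 1)
    return discriminate


rng = np.random.default_rng(0)
latent_dim = 4  # assumed common latent dimension for all three codes

# One discrimination function per kind of latent code (untrained, random weights).
time_sequence_discriminate = make_discrimination_function(rng.normal(size=latent_dim), 0.0)
physiological_discriminate = make_discrimination_function(rng.normal(size=latent_dim), 0.0)
spatial_discriminate = make_discrimination_function(rng.normal(size=latent_dim), 0.0)

# Toy latent codes for three phage elements.
time_sequence_latent = rng.normal(size=(3, latent_dim))
physiological_latent = rng.normal(size=(3, latent_dim))
structure_latent = rng.normal(size=(3, latent_dim))

p_time_sequence = time_sequence_discriminate(time_sequence_latent)
p_physiological = physiological_discriminate(physiological_latent)
p_spatial = spatial_discriminate(structure_latent)
```

Keeping the three functions separate mirrors the text, where each kind of latent code receives its own real-sample probability.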
In this embodiment, the physiological feature latent code is converted into the target bio-physiological feature through the physiological feature recovery network, and the time sequence feature latent code is converted into the target element sequence time sequence feature through the time sequence feature recovery network. The element spatial structure is encoded through the structure encoding network and the encoding component to obtain the structure latent code corresponding to the element spatial structure, and the structure latent code is converted into the target element spatial structure through the structural feature decoding network and the decoder. The target phage is then constructed according to the target bio-physiological feature, the target element spatial structure and the target element sequence time sequence feature, so that a large number of genome rearrangement schemes can be searched rapidly and phage rearrangement efficiency is improved.
Meanwhile, the discriminator in the time sequence generation countermeasure network is updated based on the structure latent code, the element sequence time sequence feature and the bio-physiological feature, and the updated discriminator guides the generator in the time sequence generation countermeasure network to learn the spatial features of the phage elements so as to update the target element spatial structure. The generator also receives the time sequence feature latent code and generates a target time sequence feature latent code; based on these two codes, the generator is guided to learn the time sequence features of the phage elements so as to update the target element sequence time sequence feature. As a result, the generated target element spatial structure and target element sequence time sequence feature are more accurate, and so is the constructed target phage.
For other embodiments or specific implementations of the phage construction apparatus of the present invention, reference may be made to the method embodiments described above; they are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. read-only memory/random-access memory, magnetic disk, optical disk), comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing describes only preferred embodiments of the present invention and does not limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents disclosed herein, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims (8)

1. A phage construction method, wherein said phage construction method comprises:
obtaining a phage element sequence corresponding to an original phage;
determining, according to the phage element sequence, an element sequence time sequence feature and a bio-physiological feature corresponding to the original phage;
determining an element spatial structure corresponding to the original phage based on the element sequence time sequence feature and the bio-physiological feature;
performing feature distribution fitting on the element sequence time sequence feature, the bio-physiological feature and the element spatial structure through a time sequence generation countermeasure network to obtain a target phage;
wherein the phage construction method is implemented based on the time sequence generation countermeasure network, and an embedding component is provided in the time sequence generation countermeasure network;
correspondingly, the step of determining the element spatial structure corresponding to the original phage based on the element sequence time sequence feature and the bio-physiological feature comprises:
converting, through the embedding component, the element sequence time sequence feature and the bio-physiological feature into a time sequence feature latent code and a physiological feature latent code, respectively;
determining a node attribute matrix corresponding to the phage elements according to the time sequence feature latent code and the physiological feature latent code;
combining the phage element sequence with a standard phage element sequence that has undergone bioinformatics analysis to obtain a combined phage element sequence;
performing multiple sequence alignment on the combined phage element sequence and all phage element sequences of the original phage through preset sequence alignment software to obtain an adjacency matrix between phage elements;
and determining the element spatial structure corresponding to the original phage based on the adjacency matrix and the node attribute matrix.
2. The phage construction method of claim 1, wherein the time sequence generation countermeasure network is further provided with an encoding component;
the step of obtaining the target phage further comprises:
converting the physiological feature latent code into a target bio-physiological feature through a physiological feature recovery network;
converting the time sequence feature latent code into a target element sequence time sequence feature through a time sequence feature recovery network;
the step of performing feature distribution fitting on the element sequence time sequence feature, the bio-physiological feature and the element spatial structure through the time sequence generation countermeasure network to obtain the target phage comprises:
encoding the element spatial structure through a structure encoding network and the encoding component to obtain a structure latent code corresponding to the element spatial structure;
converting the structure latent code into a target element spatial structure through a structural feature decoding network and a decoder;
constructing the target phage according to the target bio-physiological feature, the target element spatial structure and the target element sequence time sequence feature.
3. The phage construction method of claim 2, wherein after the step of converting the structure latent code into the target element spatial structure through the structural feature decoding network and the decoder, the method further comprises:
updating a discriminator in the time sequence generation countermeasure network based on the structure latent code, the element sequence time sequence feature and the bio-physiological feature;
guiding, through the updated discriminator, a generator in the time sequence generation countermeasure network to learn the spatial features of the phage elements so as to update the target element spatial structure.
4. The phage construction method of claim 3, wherein after the step of converting the time sequence feature latent code into the target element sequence time sequence feature through the time sequence feature recovery network, the method further comprises:
receiving, through the generator, the time sequence feature latent code to generate a target time sequence feature latent code;
guiding the generator, based on the time sequence feature latent code and the target time sequence feature latent code, to learn the time sequence features of the phage elements so as to update the target element sequence time sequence feature.
5. The phage construction method of claim 4, wherein after the step of constructing the target phage according to the target bio-physiological feature, the target element spatial structure and the target element sequence time sequence feature, the method further comprises:
inputting the time sequence feature latent code into a time sequence feature discrimination function, inputting the physiological feature latent code into a physiological feature discrimination function, and inputting the structure latent code into a spatial feature discrimination function, to obtain, respectively, the real-sample probability of the target element sequence time sequence feature, the real-sample probability of the target bio-physiological feature, and the real-sample probability of the target element spatial structure.
6. A phage construction apparatus, said apparatus comprising:
an element sequence acquisition module, configured to acquire a phage element sequence corresponding to an original phage;
a feature determination module, configured to determine, according to the phage element sequence, an element sequence time sequence feature and a bio-physiological feature corresponding to the original phage;
a spatial structure determination module, configured to determine an element spatial structure corresponding to the original phage based on the element sequence time sequence feature and the bio-physiological feature;
a phage construction module, configured to perform feature distribution fitting on the element sequence time sequence feature, the bio-physiological feature and the element spatial structure through a time sequence generation countermeasure network to obtain a target phage;
wherein the phage construction apparatus is provided with the time sequence generation countermeasure network, and an embedding component is provided in the time sequence generation countermeasure network;
the spatial structure determination module is further configured to: convert, through the embedding component, the element sequence time sequence feature and the bio-physiological feature into a time sequence feature latent code and a physiological feature latent code, respectively; determine a node attribute matrix corresponding to the phage elements according to the time sequence feature latent code and the physiological feature latent code; combine the phage element sequence with a standard phage element sequence that has undergone bioinformatics analysis to obtain a combined phage element sequence; perform multiple sequence alignment on the combined phage element sequence and all phage element sequences of the original phage through preset sequence alignment software to obtain an adjacency matrix between phage elements; and determine the element spatial structure corresponding to the original phage based on the adjacency matrix and the node attribute matrix.
7. An electronic device, the device comprising: a memory, a processor, and a phage construction program stored on the memory and executable on the processor, the phage construction program being configured to implement the steps of the phage construction method of any one of claims 1 to 5.
8. A storage medium having stored thereon a phage construction program which, when executed by a processor, implements the steps of the phage construction method according to any one of claims 1 to 5.
CN202310771128.7A 2023-06-28 2023-06-28 Phage construction method, device, equipment and storage medium Active CN116543839B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310771128.7A CN116543839B (en) 2023-06-28 2023-06-28 Phage construction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116543839A CN116543839A (en) 2023-08-04
CN116543839B CN116543839B (en) 2023-09-22

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant