CN114913922A - DNA sequence assembling method - Google Patents

Info

Publication number
CN114913922A
Authority
CN
China
Prior art keywords
dna
node
directed
path
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210406466.6A
Other languages
Chinese (zh)
Inventor
史舜阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Turing Intelligent Computing Quantum Technology Co Ltd
Original Assignee
Shanghai Turing Intelligent Computing Quantum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Turing Intelligent Computing Quantum Technology Co Ltd
Priority to CN202210406466.6A
Publication of CN114913922A

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Abstract

The present invention relates to a method for assembling DNA sequences. In the DNA sequence assembly method based on Ising quantum annealing, a very large number of sequenced DNA fragments are obtained, for example, by whole-genome shotgun sequencing; the sequenced DNA fragments are built into a directed graph; the parameters of an Ising Hamiltonian are constructed from the directed graph and supplied to Ising quantum annealing; the result of the quantum annealing evolution gives the position of each node in a loop, that is, the position of each DNA fragment in the assembly order; and the complete sequence of the whole DNA is reproduced according to the assembly order of the DNA fragments.

Description

DNA sequence assembling method
Technical Field
The invention relates to the technical field of deoxyribonucleic acid (DNA) sequencing, and in particular to a DNA sequence assembly method based on Ising quantum annealing.
Background
DNA sequencing is the technique for determining the sequence of deoxyribonucleic acid. In molecular biology research, sequence analysis of DNA is the basis for further study and modification of target genes. The techniques mainly used for sequencing are the dideoxy chain-termination method proposed by Sanger et al. and the chemical degradation method of Gilbert et al. The two methods differ greatly in principle, but both rely on nucleotide chains that start at a fixed point and terminate randomly at a specific base, producing four sets of fragments of different lengths ending in A, T, C and G, which are then separated by electrophoresis on a urea-denaturing PAGE gel to read out the DNA sequence.
Compared with traditional sequencing technologies such as the Sanger method, the second-generation nucleic acid sequencing technology developed in recent years offers high throughput, high accuracy and low running cost. It is a revolutionary change in DNA sequencing technology, has advanced research in many frontier fields of biology, and has very broad application prospects. GA sequencers and Applied Biology sequencers are two kinds of sequencers currently on the market. Because the nucleic acid sequences produced by both sequencers are characteristically short, an essential step between the data generated by the sequencer and its use in biological applications is mapping the short sequences back to the genome, that is, comparing the high-throughput short sequences generated by the sequencer with the long sequence of the genome. This creates the need to evaluate how well sequences match the genome.
The main objective is to find the most similar fragment on the genome sequence for each short sequence and to output the matching position, so mapping short sequences back to the genome is in essence a short-sequence-to-long-sequence alignment problem. Sequence alignment is the most basic and most commonly used algorithm in bioinformatics and is needed in almost all bioinformatics processing tasks. As the amount of biological sequence data available for comparative analysis has grown explosively, diverse new requirements for sequence comparison pose new challenges to sequence alignment approaches; one example is how to assemble DNA sequences.
At present, because it achieves extremely high detection reliability, DNA sequencing technology has become a focus of research in the life sciences and medicine and is developing rapidly. DNA sequence data, obtained by applying sequencing technology to deoxyribonucleic acid, is a basic object of research in genetics, genomics, bioinformatics, medicine and related fields, and has important scientific value and practical significance. Although new-generation high-throughput sequencing technologies have matured and are widely used to obtain DNA data in somewhat less time, they still struggle with single molecules containing hundreds of millions of nucleotides.
Disclosure of Invention
The application discloses a DNA sequence assembling method, which comprises the following steps:
establishing a directed graph of DNA fragments, constructing an Ising Hamiltonian according to the directed graph, and reproducing the complete sequence of the whole DNA from the annealing evolution result of the Ising Hamiltonian.
The method described above, wherein:
establishing the directed graph comprises: making multiple copies of the DNA to be sequenced, breaking each copy into a plurality of DNA fragments at random positions, performing nucleotide sequencing, and taking the sequenced DNA fragments as nodes of the directed graph.
The method described above, wherein:
a directed edge is assigned to each ordered pair of nodes in the directed graph.
The method described above, wherein:
each directed edge is assigned a weight value, and the weight value is used to evaluate the degree of overlap between the two DNA fragments represented by the pair of nodes at the two ends of the directed edge.
The method described above, wherein:
the smaller the weight value, the greater the degree of overlap, and the larger the weight value, the lower the degree of overlap.
The method described above, wherein:
quantum annealing of the Ising Hamiltonian is used to search for the path with the minimum sum of weight values in the directed graph, and the corresponding DNA fragments are arranged in the order of this minimum path so as to reproduce the complete nucleotide sequence structure of the DNA.
The method described above, wherein:
a zero node that does not represent any DNA fragment is added to the directed graph, the zero node and every other node are interconnected by directed edges, the weights of the directed edges between the zero node and all other nodes are zero, and a closed loop describing the minimum path is formed by connecting the zero node, the node representing the start of the path and the node representing the end of the path.
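The zero-node construction can be illustrated with a minimal Python sketch. The function names, the label "0" for the zero node and the example weights are hypothetical; the sketch only shows that closing a path through a zero-weight node leaves the sum of weights unchanged.

```python
def add_zero_node(weights, nodes, zero="0"):
    """Return a copy of the edge weights extended with a zero node.

    The zero node is linked to every other node in both directions
    by directed edges of weight zero.
    """
    extended = dict(weights)
    for v in nodes:
        extended[(zero, v)] = 0
        extended[(v, zero)] = 0
    return extended


def loop_weight(weights, loop):
    """Sum of edge weights around a closed loop (last node links back to first)."""
    n = len(loop)
    return sum(weights[(loop[k], loop[(k + 1) % n])] for k in range(n))


# Hypothetical weights for three fragments A, B, C.
weights = {("A", "B"): 2, ("B", "C"): 3, ("B", "A"): 5,
           ("C", "B"): 6, ("A", "C"): 7, ("C", "A"): 8}
extended = add_zero_node(weights, ["A", "B", "C"])
# The loop 0 -> A -> B -> C -> 0 costs the same as the open path A -> B -> C.
print(loop_weight(extended, ["0", "A", "B", "C"]))
```

Because both edges touching the zero node cost nothing, the minimum-weight closed loop corresponds to the minimum-weight open path, with the zero node marking where the path starts and ends.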
The present application also discloses another DNA sequence assembly method, comprising:
making multiple copies of the DNA to be sequenced, wherein each copy is broken into a plurality of DNA fragments at random positions and subjected to nucleotide sequencing, and the sequenced DNA fragments are regarded as nodes of a directed graph;
and constructing an Ising Hamiltonian according to the directed graph, wherein the Ising Hamiltonian is used in quantum annealing to search for the path with the minimum sum of weight values in the directed graph, and the DNA fragments are arranged in the order of this minimum path so as to reproduce the nucleotide sequence structure of the DNA.
The method described above, wherein:
a zero node that does not represent any DNA fragment is added to the directed graph, the zero node and every other node are interconnected by directed edges, the weights of the directed edges between the zero node and all other nodes are zero, and the zero node, the node representing the start of the path and the node representing the end of the path are connected to form a closed loop describing the minimum path.
The application also discloses another DNA sequence assembling method, which comprises the following steps:
S1, constructing a directed graph of the DNA fragments;
S2, constructing an Ising Hamiltonian according to the directed graph;
S3, carrying out quantum annealing evolution of the Ising Hamiltonian on an Ising machine;
S4, reproducing the complete sequence of the whole DNA according to the quantum annealing evolution result.
The method described above, wherein step S1 includes:
S11, making multiple copies of the DNA to be sequenced;
S12, breaking each copy of the DNA into a plurality of DNA fragments at random positions;
S13, selecting, from the DNA fragments, fragments with a length suitable for direct sequencing, and carrying out nucleotide sequencing;
S14, taking each DNA fragment whose nucleotide sequencing is completed as a node in the graph;
S15, assigning a directed edge (uv) to each ordered pair of nodes (u, v).
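Steps S14 and S15 can be sketched as follows, assuming the fragments have already been sequenced and are available as strings. The function names and the particular score (length of v minus the suffix/prefix overlap) are illustrative assumptions, not a form prescribed by the application.

```python
from itertools import permutations


def overlap_score(u, v):
    """Hypothetical S(u, v): len(v) minus the longest suffix of u that is a prefix of v."""
    for k in range(min(len(u), len(v)), 0, -1):
        if u[-k:] == v[:k]:
            return len(v) - k
    return len(v)


def build_directed_graph(fragments):
    """Nodes are fragment indices; every ordered pair (u, v) gets one weighted edge."""
    nodes = list(range(len(fragments)))
    weights = {(u, v): overlap_score(fragments[u], fragments[v])
               for u, v in permutations(nodes, 2)}
    return nodes, weights


fragments = ["ATTAGACC", "GACCTGCC", "TGCCGGAA"]
nodes, weights = build_directed_graph(fragments)
print(weights[(0, 1)], weights[(1, 0)])
```

With N nodes the graph has N(N-1) directed edges, since (u, v) and (v, u) are distinct edges with generally different weights.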
The purpose of the present application is to provide a novel solution for DNA sequence assembly, namely to solve the DNA sequence recovery problem efficiently by means of Ising quantum annealing.
In view of the above, the main objective of the present method is to provide a DNA sequence assembly method based on Ising quantum annealing, which uses the fast evolution of Ising quantum annealing to process a very large amount of DNA sample information quickly.
In order to solve the technical problems, the invention adopts the following technical scheme.
The directed edge (uv) is assigned a weight $W_{uv} = S(u, v)$. Here $S(u, v)$ is a known function that can be freely selected or preset by the user, such as an overlap score function, and is used to evaluate the degree of overlap between the tail end of DNA fragment u and the head end of DNA fragment v. The greater the overlap, the smaller $S(u, v)$; if u and v do not overlap at all, $S(u, v)$ is large.
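As an illustration of one possible overlap score function, the sketch below measures the longest suffix of u that is also a prefix of v and returns a score that shrinks as the overlap grows. The names `overlap_length` and `overlap_score` and the specific formula are assumptions chosen for the example; the application leaves the choice of $S(u, v)$ to the user.

```python
def overlap_length(u, v):
    """Length of the longest suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)), 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def overlap_score(u, v):
    """Hypothetical S(u, v): smaller means more overlap.

    Assumed form: the number of new bases v contributes when placed after u.
    With no overlap at all the score equals len(v), its largest value.
    """
    return len(v) - overlap_length(u, v)


print(overlap_score("ATTAGACC", "GACCTG"))   # tail "GACC" matches the head of v
print(overlap_score("GACCTG", "ATTAGACC"))   # reversed order: no overlap
```

The score is deliberately asymmetric, which is why the graph needs directed edges: the tail of u overlapping the head of v says nothing about the tail of v overlapping the head of u.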
The nodes v and u, the directed edges (uv) and the edge weights $W_{uv}$ together constitute a directed graph $G = (V, E)$.
The Ising Hamiltonian $H_{ising}$ is given by the following equation:
Figure BDA0003602335080000041
The graph path problem is as follows: given a directed graph with many possible paths, each edge in the directed graph is assigned a weight, and a path is sought for which the sum of the weights of the edges between all nodes is minimal. The graph path problem maps to a Hamiltonian path problem.
Because the graph path problem is mathematically equivalent to path finding, it can be solved efficiently by quantum annealing on an Ising machine. An Ising machine is a constructed physical system whose basic Hamiltonian has the following form:
$$H_{ising} = -\sum_{i<j} J_{ij} s_i s_j - \sum_i h_i s_i$$
In the Hamiltonian, $s_i$ is an Ising spin taking the value ±1. As long as the graph path problem is expressed as an Ising Hamiltonian of the corresponding form $H_{ising} = -\sum_{i<j} J_{ij} s_i s_j - \sum_i h_i s_i$, with the various constraints and optimization objectives embodied in the Ising Hamiltonian, the final result of the Ising quantum annealing evolution corresponds to the solution of the graph path problem. An Ising machine can therefore be used to solve efficiently the problem of the assembly order of DNA fragments (such as the DNA fragments produced by breaking the DNA for sequencing), which is essentially the problem of how to recover the complete DNA sequence after the nucleotide sequences of all DNA fragments have been determined.
In the basic Hamiltonian, $J_{ij}$ represents the coupling strength between the i-th spin $s_i$ and the j-th spin $s_j$.
In the basic Hamiltonian, $h_i$ represents the local field coefficient (local field term) acting on the i-th spin.
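To make the roles of the coupling strengths and local fields concrete, the following minimal sketch evaluates the basic Hamiltonian for a given spin configuration. The dictionary layout for the couplings (keys `(i, j)` with i < j) and the numerical values are assumptions made for the example.

```python
def ising_energy(spins, J, h):
    """Evaluate H = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i.

    spins: list of +1/-1 values
    J:     dict mapping (i, j) with i < j to the coupling strength J_ij
    h:     list of local field coefficients h_i
    """
    pair_term = sum(j_ij * spins[i] * spins[j] for (i, j), j_ij in J.items())
    field_term = sum(h_i * s_i for h_i, s_i in zip(h, spins))
    return -pair_term - field_term


# Hypothetical three-spin instance.
J = {(0, 1): 1.0, (1, 2): -0.5}
h = [0.1, 0.0, -0.2]
print(ising_energy([1, 1, -1], J, h))
```

A positive $J_{ij}$ lowers the energy when the two spins agree and a negative one when they disagree, while $h_i$ biases an individual spin toward +1 or -1.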
The quantum approximate optimization algorithm in quantum computing is a polynomial-time approximate optimization algorithm. It is mainly used for combinatorial optimization problems and has the potential to demonstrate quantum advantage. As the name implies, approximate optimization only seeks an approximate solution to the problem. It can be applied to NP-complete problems as well as to NP-hard problems of higher complexity.
The quantum approximate optimization algorithm (QAOA) is one of the most promising algorithms for demonstrating quantum advantage on near-term quantum computers. As an approximation algorithm, it does not give the best result but a "good enough" one, and its accuracy depends on the lower bound of the approximation ratio.
According to graph theory, the DNA recovery problem is equivalent to a graph path problem. The graph path problem is a well-known NP-complete problem. Although many heuristic and other algorithms have been developed for it, the quantum annealing method based on an Ising machine is a more efficient way of solving it. Each graph path problem corresponds to an Ising model, and the result of the natural evolution of the Ising model under physical law is the solution of the graph path problem, that is, the solution of the Hamiltonian path problem or the Hamiltonian loop problem.
The DNA sequence assembly method can be applied in place of all nucleic acid sequencers, including traditional GA sequencers and Applied Biology sequencers, as well as sequencing equipment, sequencing systems and the like. The present application also relates to a DNA sequence assembly apparatus comprising the aforementioned DNA sequence assembly module, in which an Ising machine is integrated into a sequencer, apparatus, system or the like, or is used as a sequencing carrier.
Drawings
To make the above objects, features and advantages more comprehensible, embodiments accompanied with figures are described in detail below, and features and advantages of the present application will become apparent upon reading the following detailed description and upon reference to the following figures.
FIG. 1 is a directed graph constructed by nodes and directed edges and weights of edges.
FIG. 2 is a flow chart of deoxyribonucleic acid assay and deoxyribonucleic acid fragment sequence assembly.
FIG. 3 shows a process of constructing a directed graph of deoxyribonucleic acid fragments.
Detailed Description
The present invention is described more fully below with reference to the accompanying examples, which are intended to illustrate and not to limit the invention. Solutions that those skilled in the art can obtain on this basis without inventive effort remain within the scope of the invention.
Referring to FIG. 1, a graph here does not mean a graphic image or a map. A graph is generally regarded as an abstract network of "vertices", in which vertices can be connected to each other by "edges" to indicate that two vertices are related. The two key terms in this definition, vertex and edge, are the most basic concepts and the core items of graph theory, so they are introduced first.
Referring to FIG. 1, a vertex describes a thing or an object. Since graph terminology is not standardized, vertices may also be called points, nodes or endpoints; the same applies to the vertex terms used in this application.
Referring to FIG. 1, an edge represents a relationship between objects, that is, the logical relationship between vertices.
Referring to FIG. 1, graphs are either directed or undirected. The most basic graph is usually defined as an undirected graph, and its counterpart is called a directed graph. The difference is that the edges of a directed graph have a direction.
Referring to FIG. 1, a weight, also called a cost or a length, is the value associated with an edge. For example, when vertices represent physical locations, the weight of the edge between two vertices may be set to the distance between them in the road network. To handle special cases, the weight of an edge may sometimes be zero or negative.
Referring to FIG. 1, the present application relates to the field of DNA sequencing, and in particular, in combination with the graph described above, to a method for assembling or restoring DNA sequences based on Ising quantum annealing.
Referring to FIG. 1, as background to the sequence assembly problem of DNA fragments, a single DNA molecule that can be sequenced in one pass is typically short. Even with the technological advances to date, DNA molecules of only a few thousand nucleotides can be read at a time, whereas a human chromosome is a single molecule of hundreds of millions of nucleotides.
Referring to FIG. 1, because a single molecule contains so many bases, it cannot be sequenced in one pass, and the chromosome has to be broken into DNA fragments that can be sequenced directly. However, the assembly order of the DNA fragments cannot be deduced directly, and given the enormous number of bases in a single molecule, computing the assembly order requires a huge amount of calculation.
Referring to FIG. 1, even under the very favorable assumption that the nucleotide sequence of every DNA fragment can be determined, the complete DNA sequence still cannot be recovered directly.
Referring to FIG. 1, shotgun sequencing was developed for this purpose. Multiple copies of the DNA to be sequenced are made, each copy is broken into DNA fragments at random positions, and fragments of suitable length are selected for sequencing. Since the fragments come from many copies of the DNA and the break points are random, a sufficiently large sample contains overlaps between many fragment sequences from different copies. This overlap information makes it possible to reconstruct the order of the DNA fragment sequences and to assemble them into the complete DNA sequence.
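The shotgun procedure can be mimicked on a string with a small simulation. This is only a toy model under stated assumptions: the function name, the uniform choice of break lengths, the length limits and the seed are all hypothetical, and real fragmentation and read selection are considerably more involved.

```python
import random


def shotgun_fragments(dna, copies, min_len, max_len, seed=0):
    """Break several copies of `dna` at random positions and keep usable pieces.

    Each copy is cut into consecutive pieces of random length between 1 and
    max_len; only pieces at least min_len long are kept for "sequencing".
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(copies):
        pos = 0
        while pos < len(dna):
            step = rng.randint(1, max_len)      # random break point
            piece = dna[pos:pos + step]
            pos += step
            if len(piece) >= min_len:           # select fragments of suitable length
                fragments.append(piece)
    return fragments


dna = "ATTAGACCTGCCGGAATACGGTCA"
for fragment in shotgun_fragments(dna, copies=3, min_len=4, max_len=8):
    print(fragment)
```

Because each copy is cut at different random positions, pieces from different copies straddle each other's break points, which is the source of the overlaps used for assembly.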
Referring to FIG. 1, how to process the huge amount of sample information (typically the overlaps between DNA fragment sequences from different copies and the resulting order relationships) is the problem to be solved.
Referring to FIG. 1, building the directed graph includes: making multiple copies of the DNA to be sequenced, breaking each copy into a plurality of DNA fragments at random positions, performing nucleotide sequencing, and regarding the sequenced DNA fragments as nodes of the directed graph, such as the exemplary nodes {A, B, C, D, …, N}. Further DNA fragments (nodes) are not shown.
Referring to FIG. 1, the nodes v and u, the directed edges (uv) and the edge weights $W_{uv}$ together constitute the directed graph $G = (V, E)$. For example, the graph $G = (V, E)$ may be assumed to have N nodes, where V is the set of all nodes and E is the set of all directed edges. Further, (uv) denotes the directed edge pointing from node u to node v, and the weight of this edge is $W_{uv} = S(u, v)$.
Referring to FIG. 1, in the illustrated example, (AB) denotes the directed edge pointing from node A to node B, with weight $W_{AB} = S(A, B)$; conversely, (BA) denotes the directed edge pointing from node B to node A, with weight $W_{BA} = S(B, A)$.
Referring to FIG. 1, in the illustrated example, (BC) denotes the directed edge pointing from node B to node C, with weight $W_{BC} = S(B, C)$; conversely, (CB) denotes the directed edge pointing from node C to node B, with weight $W_{CB} = S(C, B)$.
Referring to FIG. 1, in the illustrated example, (CD) denotes the directed edge pointing from node C to node D, with weight $W_{CD} = S(C, D)$; conversely, (DC) denotes the directed edge pointing from node D to node C, with weight $W_{DC} = S(D, C)$.
Referring to FIG. 1, in the illustrated example, (DN) denotes the directed edge pointing from node D to node N, with weight $W_{DN} = S(D, N)$; conversely, (ND) denotes the directed edge pointing from node N to node D, with weight $W_{ND} = S(N, D)$.
Referring to FIG. 1, in the illustrated example, (AN) denotes the directed edge pointing from node A to node N, with weight $W_{AN} = S(A, N)$; conversely, (NA) denotes the directed edge pointing from node N to node A, with weight $W_{NA} = S(N, A)$.
Referring to FIG. 1, in the illustrated example, (AD) denotes the directed edge pointing from node A to node D, with weight $W_{AD} = S(A, D)$; conversely, (DA) denotes the directed edge pointing from node D to node A, with weight $W_{DA} = S(D, A)$.
Referring to FIG. 1, in the illustrated example, (AC) denotes the directed edge pointing from node A to node C, with weight $W_{AC} = S(A, C)$; conversely, (CA) denotes the directed edge pointing from node C to node A, with weight $W_{CA} = S(C, A)$.
Referring to FIG. 1, in the illustrated example, (BD) denotes the directed edge pointing from node B to node D, with weight $W_{BD} = S(B, D)$; conversely, (DB) denotes the directed edge pointing from node D to node B, with weight $W_{DB} = S(D, B)$.
Referring to FIG. 1, the edges (BN) and (NB) have the corresponding weights $W_{BN} = S(B, N)$ and $W_{NB} = S(N, B)$.
Referring to FIG. 1, the edges (CN) and (NC) have the corresponding weights $W_{CN} = S(C, N)$ and $W_{NC} = S(N, C)$.
Referring to FIG. 1, N binary variables $x_{v,i}$ are assigned to each node v in the directed graph, where the subscripts range over v = 1, 2, …, N and i = 1, 2, …, N. $x_{v,i} = 1$ indicates that node v (DNA fragment v) appears at the i-th position of the path (a path characterizes an ordering of the DNA fragments). Since a node (DNA fragment) can appear at only one position in a path (an ordering of DNA fragments), there is constraint 1:
Figure BDA0003602335080000081
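Read literally, the statement that each node occupies exactly one position can be written as a sum over positions. The LaTeX below is a sketch of that reading together with a commonly used quadratic penalty that enforces it inside a Hamiltonian; the penalty constant $A$ and the name $H_1$ are assumptions for illustration and may differ from the expression in the original equation image.

```latex
% Constraint 1 (assumed form): each node v occupies exactly one position i.
\sum_{i=1}^{N} x_{v,i} = 1 \qquad \text{for every node } v \in V

% A common way to enforce it in an energy function is a quadratic penalty
% with a hypothetical positive constant A; H_1 = 0 exactly when the
% constraint holds for all nodes.
H_{1} = A \sum_{v=1}^{N} \Bigl( 1 - \sum_{i=1}^{N} x_{v,i} \Bigr)^{2}
```

For N = 3 and the ordering (B, A, C), for instance, the only nonzero variables would be $x_{B,1} = x_{A,2} = x_{C,3} = 1$, so every row sum equals 1 and the penalty vanishes.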
Referring to FIG. 1, the graph is a powerful framework in data structures and algorithms. Graphs can represent almost any type of structure or system and are widely applied, from traffic networks to communication networks, from chess games to optimal process solutions, and from task allocation to social networks. Clear and accurate basic concepts are a necessary foundation for working with graph theory, and vertices and edges are its core items.
Referring to FIG. 1, quantum computers are gradually maturing and scaling up. For example, simulation systems with quantum computing functions are built on conventional computers to provide a development path and tools for quantum algorithms. Existing quantum simulation systems are mainly deployed on supercomputers and cloud computing platforms, and for certain problems a quantum computer can offer exponential speedup over a conventional computer. The main objective of the present application is to find an approach based on Ising quantum annealing that not only solves the Ising problem but also allows the complete sequence of the whole DNA to be reproduced.
Referring to FIG. 1, a weight $W_{uv} = S(u, v)$ is assigned to each directed edge (uv) of, for example, the graph neural network $G = (V, E)$. The DNA is broken into fragments that can be sequenced directly, and the assembly order of the DNA fragments is mapped to an Ising Hamiltonian; solving this mapping yields the assembly result, as explained in detail below. For example, $S(u, v)$ may be a known function that can be freely selected, defined or preset by the user, such as an overlap score function, which is used to evaluate the degree of overlap between the tail end of DNA fragment u and the head end of DNA fragment v. The greater the overlap, the smaller $S(u, v)$; if u and v do not overlap at all, $S(u, v)$ is large.
Referring to FIG. 2, constructing a directed graph of DNA fragments includes the following steps S201-S206.
Referring to FIG. 2, in step S201, multiple copies of the DNA to be sequenced are made.
Referring to FIG. 2, in step S202, each copy of the DNA is broken into a plurality of DNA fragments at random positions.
Referring to FIG. 2, in step S203, fragments of a length suitable for direct sequencing are selected from these DNA fragments and subjected to nucleotide sequencing.
Referring to FIG. 2, in step S204, each DNA fragment whose nucleotide sequencing is completed is used as a node of the graph.
Referring to FIG. 2, in step S205, a directed edge (uv) is assigned to each ordered pair of nodes (u, v); each directed edge is assigned a weight, for example $W_{uv} = S(u, v)$ for the directed edge (uv) shown in FIG. 1.
Referring to FIG. 2, in step S206, the nodes v, u, etc., the directed edges (uv) and the edge weights $W_{uv}$ together constitute the directed graph $G = (V, E)$. The graphs defined by steps S201-S206 are directed graphs.
Referring to FIG. 3, the present application aims to provide a DNA sequence assembly method based on Ising quantum annealing, which uses the fast evolution of Ising quantum annealing to process a very large amount of DNA sample information quickly. To solve the above technical problem, the present application adopts the following technical solution of steps S101 to S104.
Referring to FIG. 3, in step S101, a directed graph of DNA fragments is constructed.
Referring to FIG. 3, in step S102, an Ising Hamiltonian $H_{ising}$ is constructed from the directed graph.
Referring to FIG. 3, in step S103, the Ising Hamiltonian $H_{ising}$ is supplied to Ising quantum annealing evolution.
Referring to FIG. 3, in step S104, the complete sequence of the whole DNA is reproduced according to the quantum annealing evolution result.
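Step S104 can be sketched in Python under the assumption that the annealing result is available as a binary matrix `x`, where `x[v][i] == 1` means fragment v occupies position i of the path. The function names and the merge rule (append only the non-overlapping tail of each fragment) are illustrative assumptions.

```python
def overlap_length(u, v):
    """Length of the longest suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)), 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def decode_order(x):
    """Turn the binary matrix x[v][i] into the list of nodes in path order."""
    n = len(x)
    order = [None] * n
    for v in range(n):
        for i in range(n):
            if x[v][i] == 1:
                order[i] = v
    return order


def merge_fragments(fragments, order):
    """Concatenate fragments in path order, collapsing tail/head overlaps."""
    sequence = fragments[order[0]]
    for v in order[1:]:
        k = overlap_length(sequence, fragments[v])
        sequence += fragments[v][k:]
    return sequence


fragments = ["GACCTGCC", "ATTAGACC", "TGCCGGAA"]
x = [[0, 1, 0],   # fragment 0 sits at position 1
     [1, 0, 0],   # fragment 1 sits at position 0
     [0, 0, 1]]   # fragment 2 sits at position 2
order = decode_order(x)
print(order, merge_fragments(fragments, order))
```

The decoding assumes the matrix satisfies the one-position-per-node constraint; an annealer output that violates it would need to be rejected or repaired before merging.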
Referring to FIG. 3, constructing the directed graph of DNA fragments in step S101 includes the steps of: making multiple copies of the DNA to be sequenced; breaking each copy of the DNA into a plurality of DNA fragments at random positions; selecting, from the DNA fragments, fragments with a length suitable for direct sequencing and carrying out nucleotide sequencing; taking each sequenced DNA fragment as a node of the graph; and assigning a directed edge (uv) to each ordered pair of nodes (u, v). The nodes v, the directed edges (uv) and the edge weights $W_{uv}$ together constitute the directed graph $G = (V, E)$.
Referring to FIG. 3, the Ising Hamiltonian $H_{ising}$ is given by the following equation:
Figure BDA0003602335080000091
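As an illustration of how an energy function can encode the path problem over the binary variables $x_{v,i}$, the sketch below uses a common penalty formulation: one penalty for each node occupying exactly one position, one for each position holding exactly one node, and an objective that adds $W_{uv}$ whenever u is immediately followed by v. The constants `A` and `B`, the function names and the example weights are assumptions, and this form is not necessarily the expression used in this application.

```python
from itertools import product


def path_energy(x, weights, A=10.0, B=1.0):
    """Penalty-style energy for an assignment x[v][i] in {0, 1}."""
    n = len(x)
    # Penalty: each node occupies exactly one position.
    energy = A * sum((1 - sum(x[v])) ** 2 for v in range(n))
    # Penalty: each position holds exactly one node.
    energy += A * sum((1 - sum(x[v][i] for v in range(n))) ** 2 for i in range(n))
    # Objective: weight of every edge joining consecutive positions.
    for (u, v), w in weights.items():
        energy += B * w * sum(x[u][i] * x[v][i + 1] for i in range(n - 1))
    return energy


def brute_force_minimum(weights, n):
    """Exhaustively search all 2**(n*n) assignments (tiny n only)."""
    best = None
    for bits in product((0, 1), repeat=n * n):
        x = [list(bits[v * n:(v + 1) * n]) for v in range(n)]
        e = path_energy(x, weights)
        if best is None or e < best[0]:
            best = (e, x)
    return best


weights = {(0, 1): 4, (1, 2): 4, (1, 0): 8, (2, 1): 8, (0, 2): 8, (2, 0): 7}
energy, x = brute_force_minimum(weights, 3)
print(energy, x)
```

The penalty constant has to exceed any saving obtainable by breaking a constraint; here A = 10 is larger than the best path weight, so the minimum-energy assignment is a valid ordering. The exhaustive search merely stands in for the annealer and is feasible only for very small N.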
The technical solution has the following beneficial effects:
With the DNA sequence assembly method based on Ising quantum annealing, a very large number of DNA sample fragments can be generated by shotgun sequencing, a directed graph is then constructed from the DNA sample fragments, an Ising Hamiltonian is constructed from the directed graph, and its parameters are supplied to Ising quantum annealing, whose fast evolution enables rapid DNA sequence assembly.
Referring to FIG. 1, the mathematical model of the DNA fragment ordering problem is equivalent to a Hamiltonian loop problem.
Referring to FIG. 1, consider the Hamiltonian loop problem and the Hamiltonian path problem. Given a graph, start from some node and advance along edges so as to visit every node in the graph without reaching any node more than once, while requiring that the sum of the weights of all edges on the path be minimal; this is the Hamiltonian path problem.
Referring to FIG. 1, the Hamiltonian loop problem adds the requirement that the path must finally return to its starting point. In this application, the Hamiltonian path problem is in practice equivalent to a Hamiltonian loop problem.
Referring to FIG. 1, the sequence assembly problem of DNA fragments is equivalent to finding the closed loop with the smallest sum of weights in the graph, that is, the Hamiltonian loop problem. If the overlap score $S(u, v)$ of the pair (u, v) is small, the tail of u and the head of v overlap to a large extent, that is, u and v are (most likely) adjacent, partially overlapping segments of the complete DNA sequence, with u immediately preceding v. If the path with the smallest sum of weights is found in the directed graph and the corresponding DNA fragments are arranged in the order of that path, the nucleotide sequence structure of the DNA can be reproduced completely (with the greatest degree of confidence).
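For a handful of fragments the minimum-weight path can be found by simply trying every ordering, which makes the objective concrete. The sketch below is a classical brute-force search, not the annealing method of the application; the function name and the weights are hypothetical.

```python
from itertools import permutations


def min_weight_path(nodes, weights):
    """Try every ordering of the nodes and return the cheapest one.

    The cost of an ordering is the sum of the weights of the directed
    edges between consecutive nodes. Runtime grows as N!, so this is
    only usable for very small graphs.
    """
    best_order, best_cost = None, float("inf")
    for order in permutations(nodes):
        cost = sum(weights[(order[k], order[k + 1])]
                   for k in range(len(order) - 1))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return list(best_order), best_cost


weights = {("A", "B"): 4, ("B", "C"): 4, ("B", "A"): 8,
           ("C", "B"): 8, ("A", "C"): 8, ("C", "A"): 7}
print(min_weight_path(["A", "B", "C"], weights))
```

The factorial growth of this search is the reason a different solution strategy is sought once the number of fragments becomes large.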
Referring to FIG. 1, the present application transforms the DNA fragment assembly problem into the equivalent problem of finding the path with the minimum sum of weights in a directed graph, that is, the Hamiltonian path problem.
Referring to FIG. 1, based on the teachings presented herein, the subject matter relates to quantum computing or quantum processing in which operations are performed on quantum devices. The Hamiltonian can be designed to meet at least the following objective: operations on the Ising Hamiltonian include various annealing and evolution processes, such as quantum annealing or simulated annealing.
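Of the two annealing processes named above, simulated annealing is easy to sketch on a conventional computer. The following minimal example is classical simulated annealing with single-spin flips and a geometric cooling schedule, not quantum annealing; the schedule, step count, seed and coupling layout (keys `(i, j)` with i < j) are assumptions chosen for illustration.

```python
import math
import random


def ising_energy(spins, J, h):
    """H = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i."""
    pair_term = sum(j_ij * spins[i] * spins[j] for (i, j), j_ij in J.items())
    field_term = sum(h_i * s_i for h_i, s_i in zip(h, spins))
    return -pair_term - field_term


def simulated_annealing(J, h, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Heuristically search for a low-energy spin configuration."""
    rng = random.Random(seed)
    n = len(h)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ising_energy(spins, J, h)
    best_spins, best_energy = spins[:], energy
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        i = rng.randrange(n)
        spins[i] = -spins[i]                    # propose a single spin flip
        new_energy = ising_energy(spins, J, h)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            energy = new_energy                 # accept the move
            if energy < best_energy:
                best_spins, best_energy = spins[:], energy
        else:
            spins[i] = -spins[i]                # reject: undo the flip
    return best_spins, best_energy


# Hypothetical ferromagnetic chain of four spins.
J = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
h = [0.0, 0.0, 0.0, 0.0]
print(simulated_annealing(J, h))
```

For this chain the lowest possible energy is -3, reached when all spins agree. The procedure is a heuristic and is not guaranteed to reach the minimum on harder instances.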
Referring to FIG. 1, the customary overall Ising Hamiltonian can be described as:
Figure BDA0003602335080000101
wherein J_ij represents the coupling strength between the i-th spin x_i and the j-th spin x_j in the Ising Hamiltonian.
Each spin x_i takes one of the binary values -1 and 1. The objective of solving the Ising problem is to minimize the Ising Hamiltonian and to obtain the value, 1 or -1, that each x_i takes at that minimum.
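As a small illustration of what "minimizing the Ising Hamiltonian" means, the sketch below enumerates every spin configuration of a three-spin toy problem. The energy form sum over i<j of J_ij x_i x_j (no leading minus sign, no local field) is an assumed convention, since the equation image is not reproduced in this text, and the coupling values are illustrative only.

```python
from itertools import product


def ising_energy(J, x):
    """Energy sum_{i<j} J[i][j] * x[i] * x[j] for spins x[i] in {-1, +1}."""
    n = len(x)
    return sum(J[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))


def ground_states(J):
    """Brute-force all 2^n configurations and return the minimum-energy ones."""
    configs = list(product((-1, 1), repeat=len(J)))
    best = min(ising_energy(J, x) for x in configs)
    return best, [x for x in configs if ising_energy(J, x) == best]


# Toy couplings: under the assumed sign convention a positive J_ij
# favours opposite values for spins i and j.
J = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
energy, states = ground_states(J)
print(energy, states)
```

Brute force is only feasible for a handful of spins; an annealer is meant to replace exactly this enumeration step.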
With respect to the quantum aspects herein, the following applies to quantum devices and quantum data:
The term "quantum device" as used herein includes known quantum computing devices, quantum chips, and the like, and the term "quantum hardware" may be used in its place. Typical "quantum devices" include, but are not limited to: quantum computers, quantum information processing systems or quantum cryptography systems, quantum simulators, and all kinds of devices, apparatuses and machines that process quantum data.
"quantum data" as used herein encompasses information or data carried, held or stored by a quantum system, the smallest nontrivial system being a qubit, i.e., a system that defines a unit of quantum information. It should be understood that the term "qubit" includes all quantum systems that can be appropriately approximated as two-level systems in the respective contexts. Such quantum systems typically include, for example, typical atomic, electronic, photonic, ionic, or superconducting qubits, among others.
Referring to fig. 1, a graph G = (V, E) is defined in set-theoretic terms: a graph is a set of vertices together with a set of edges, where V is the set of vertices and E is the set of edges. The directed graph used in the present application is a graph in this sense.
With respect to the quantum aspects herein, the following applies to quantum machines and quantum data:
The term "quantum machine" as used herein encompasses known quantum computing devices, quantum chips, and the like, and terms such as "quantum hardware" or "quantum device" may be used in its place. Typical "quantum machines" include, but are not limited to: quantum computers, quantum information processing systems or quantum cryptography systems, quantum simulators, and all kinds of devices, apparatuses and machines that process quantum data.
Quantum annealing has commercial applications; the most typical example is the quantum annealer of D-Wave, a Canadian company specializing in quantum computers. In the quantum computers sold commercially by D-Wave, each qubit is formed by a tiny current loop made of niobium metal, which realizes the quantum annealing phenomenon and can simulate the effect, in quantum computing, of bits storing a large number of values.
It is worth noting that, in commercial applications, quantum annealing can solve optimization problems efficiently by using superposition states to search the various possibilities, and thus meets the current need for efficiency gains and speed-ups in practical work.
To date, quantum annealing has been used to build a variety of early applications in fields such as logistics, artificial intelligence, materials science, drug discovery, network security and fault detection, and financial modeling. The annealing algorithms in common use today fall into two types, simulated annealing and quantum annealing, of which quantum annealing is regarded as superior.
Annealing is originally a heat treatment process for metal: the metal is slowly heated to a certain temperature, held there for a sufficient time, and then cooled at a suitable rate. In the case of semiconductors, annealing is required after ion implantation, because when impurity ions are implanted into the semiconductor, the high-energy incident ions collide with atoms of the semiconductor lattice and displace them; annealing restores the crystal structure and eliminates the defects. Physical annealing thus addresses process instability in materials, whereas simulated annealing and quantum annealing address non-optimal solutions in mathematical computations such as combinatorial optimization.
Quantum annealing is a form of adiabatic quantum computing (AQC). Informally, the adiabatic theorem states that if a quantum mechanical system starts in the ground state of some Hamiltonian and the Hamiltonian is changed slowly enough, the system ends up in the ground state of the final Hamiltonian. If the initial Hamiltonian is chosen to have a known ground state and the final Hamiltonian is chosen to be the problem Hamiltonian, whose ground state represents the solution of the optimization problem to be solved, then a computation based on this theorem yields the desired result. The annealing time scale (the expected time required for a single run to reach the solution) is set by the inverse of the minimum energy gap between the ground state and the first excited state encountered during the adiabatic evolution.
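The adiabatic evolution described above is often written as an interpolation between the two Hamiltonians. The linear schedule and the symbols H_init, H_problem, s, T, E_0, E_1 and Δ_min below are notational assumptions introduced for illustration, not notation taken from this application:

```latex
H(s) = (1 - s)\, H_{\mathrm{init}} + s\, H_{\mathrm{problem}},
\qquad s = t/T \in [0, 1],
\qquad
\Delta_{\min} = \min_{s \in [0, 1]} \bigl[ E_1(s) - E_0(s) \bigr]
```

Here E_0(s) and E_1(s) denote the ground-state and first-excited-state energies of H(s), and T is the total annealing time; the smaller Δ_min is, the longer T must be for the system to remain in the ground state.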
Referring to fig. 1, once the graph has been obtained, a new Hamiltonian is constructed. The graph G(V, E) has N nodes in total; V denotes the set of all nodes, E denotes the set of all directed edges, (uv) denotes the directed edge pointing from node u to node v, and the weight of that edge is W_uv = S(u, v). This is the basic structure of a directed graph.
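A minimal sketch of this data structure follows, with one node per fragment and one weighted directed edge per ordered pair of nodes. The helper names are hypothetical, and the weight W_uv = -(exact suffix/prefix overlap length) is an assumed stand-in for S(u, v).

```python
from itertools import permutations


def overlap_length(u: str, v: str) -> int:
    """Length of the longest suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)), 0, -1):
        if u.endswith(v[:k]):
            return k
    return 0


def build_directed_graph(fragments):
    """Return (nodes, W), where W[(u, v)] is the weight of the directed edge u -> v."""
    nodes = list(range(len(fragments)))
    W = {(u, v): -overlap_length(fragments[u], fragments[v])
         for u, v in permutations(nodes, 2)}
    return nodes, W


fragments = ["ATGCGT", "CGTTAC", "TACGGA"]
nodes, W = build_directed_graph(fragments)
for edge, weight in sorted(W.items()):
    print(edge, weight)
```

Note that the graph is asymmetric: the edge (0, 1) carries weight -3 while the reverse edge (1, 0) carries weight 0, which is why directed edges are needed.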
Referring to FIG. 1, N binary variables x_{v,i} are assigned to each DNA fragment, i.e., to each node v of the directed graph. The subscripts range over v = 1, 2, …, N and i = 1, 2, …, N, where N is a positive integer.
Referring to FIG. 1, x_{v,i} = 1 represents that node v (DNA fragment v) appears at the i-th position of the path (i.e., of the DNA fragment sequence). Because one node (DNA fragment) can appear at only one position in the path (DNA fragment sequence), constraint 1 is set:
Figure BDA0003602335080000121
which corresponds to the following penalty function:
Figure BDA0003602335080000122
where A > 0. The penalty function means that if the constraint is violated, the value of the penalty function increases, whereas the physical evolution of the Ising machine is characterized by finding and settling into the ground state with the lowest energy. The penalty function in the Hamiltonian thus ensures that the Ising machine does not evolve into a state that violates the constraint. Likewise, because only one node (DNA fragment) can be placed at each position of the path (DNA fragment sequence), constraint 2 is set:
Figure BDA0003602335080000123
which corresponds to the following penalty function:
Figure BDA0003602335080000124
In addition, the total weight along the path is required to be minimal, which corresponds to the penalty function:
Figure BDA0003602335080000125
where B > 0. The Ising Hamiltonian of the traveling salesman problem is then constructed based on graph G:
Figure BDA0003602335080000126
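The equation images referred to above are not reproduced in this text. A reconstruction consistent with the surrounding definitions might read as follows; it assumes the usual squared-penalty form for one-hot constraints, and the labels H_1, H_2 and H_W are introduced here only for readability:

```latex
\begin{aligned}
\text{Constraint 1:}\quad & \sum_{i=1}^{N} x_{v,i} = 1 \quad \text{for every node } v,
  & H_1 &= A \sum_{v=1}^{N} \Bigl( 1 - \sum_{i=1}^{N} x_{v,i} \Bigr)^{2}, \\
\text{Constraint 2:}\quad & \sum_{v=1}^{N} x_{v,i} = 1 \quad \text{for every position } i,
  & H_2 &= A \sum_{i=1}^{N} \Bigl( 1 - \sum_{v=1}^{N} x_{v,i} \Bigr)^{2}, \\
\text{Path weight:}\quad & & H_W &= B \sum_{(uv) \in E} W_{uv} \sum_{i=1}^{N} x_{u,i}\, x_{v,i+1}, \\
\text{Total:}\quad & & H &= H_1 + H_2 + H_W .
\end{aligned}
```

For a closed loop the position index is taken cyclically, so that position N + 1 is identified with position 1. H_1 and H_2 vanish exactly when both constraints hold, in which case H reduces to B times the weight sum of the loop.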
Referring to FIG. 1, x_{u,i} = 1 represents that node u (DNA fragment u) appears at the i-th position of the path (i.e., of the DNA fragment sequence); as explained above, one node (DNA fragment) can appear at only one position in the path (DNA fragment sequence). Likewise, x_{v,i+1} = 1 represents that node v (DNA fragment v) appears at the (i+1)-th position of the path.
In the Ising Hamiltonian, A > B > 0, where A is the predetermined coefficient of the first term and B is the predetermined coefficient of the second term.
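The interplay of the constraint penalties and the weight term can be checked on a toy instance by brute force. The sketch below assumes the squared-penalty form A·(1 - sum)² for both constraints and cyclic positions; the function names, the three-node weights, and the values A = 10, B = 1 (which satisfy A > B > 0 with a wide margin over the largest weight) are all illustrative assumptions.

```python
from itertools import product


def tour_energy(x, W, n, A, B):
    """Energy of 0/1 variables x[v][i] (node v at position i), positions cyclic."""
    # Constraint 1: every node occupies exactly one position.
    h1 = sum((1 - sum(x[v][i] for i in range(n))) ** 2 for v in range(n))
    # Constraint 2: every position holds exactly one node.
    h2 = sum((1 - sum(x[v][i] for v in range(n))) ** 2 for i in range(n))
    # Weight of each edge u -> v used between positions i and i + 1.
    hw = sum(w * x[u][i] * x[v][(i + 1) % n]
             for (u, v), w in W.items() for i in range(n))
    return A * (h1 + h2) + B * hw


def as_matrix(bits, n):
    """Reshape a flat tuple of n*n bits into rows x[v] = (x[v][0], ..., x[v][n-1])."""
    return [bits[v * n:(v + 1) * n] for v in range(n)]


n, A, B = 3, 10.0, 1.0
# Toy weights: the loop 0 -> 1 -> 2 -> 0 costs 3, the reverse loop costs 15.
W = {(0, 1): 1, (1, 2): 1, (2, 0): 1, (1, 0): 5, (2, 1): 5, (0, 2): 5}

# Exhaustive search over all 2^(n*n) assignments stands in for the annealer.
best_bits = min(product((0, 1), repeat=n * n),
                key=lambda bits: tour_energy(as_matrix(bits, n), W, n, A, B))
best_x = as_matrix(best_bits, n)
best_energy = tour_energy(best_x, W, n, A, B)
order = [next(v for v in range(n) if best_x[v][i] == 1) for i in range(n)]
print(best_energy, order)
```

In this instance every assignment that violates a constraint pays at least A = 10, so the minimum is reached only by valid loops, and among those by the cheaper direction.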
Only A > B > 0 is required, which means that a constraint must not be violated in pursuit of a smaller weight; satisfying the constraints is, after all, a precondition of the problem. Since scaling all coefficients of the Ising Hamiltonian up or down by a common positive factor does not change the final ground state, A and B can be chosen with great flexibility as long as A > B > 0 holds. By calculation, the Ising Hamiltonian constructed from graph G can be converted into the following two-body Ising Hamiltonian in standard form:
Figure BDA0003602335080000131
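The conversion from 0/1 variables to spins can be sketched with the substitution x = (1 + s) / 2, which is a standard identity. The function names and the dictionary layout of Q are hypothetical, and the sign convention of the resulting form (sum of J_ab s_a s_b plus sum of h_a s_a plus a constant) is an assumption, since the standard-form equation image is not reproduced here.

```python
from itertools import product


def qubo_to_ising(Q, n):
    """Map sum Q[(a, b)] x_a x_b (x in {0, 1}, a <= b) to J, h and a constant offset."""
    J, h, offset = {}, [0.0] * n, 0.0
    for (a, b), q in Q.items():
        if a == b:
            # x_a * x_a = x_a = (1 + s_a) / 2
            h[a] += q / 2
            offset += q / 2
        else:
            # x_a * x_b = (1 + s_a + s_b + s_a * s_b) / 4
            J[(a, b)] = J.get((a, b), 0.0) + q / 4
            h[a] += q / 4
            h[b] += q / 4
            offset += q / 4
    return J, h, offset


def qubo_energy(Q, x):
    return sum(q * x[a] * x[b] for (a, b), q in Q.items())


def ising_energy(J, h, offset, s):
    pair_term = sum(j * s[a] * s[b] for (a, b), j in J.items())
    field_term = sum(h_a * s_a for h_a, s_a in zip(h, s))
    return pair_term + field_term + offset


# (1 - x0 - x1)^2 - 1 expanded with x^2 = x: a one-hot penalty on two bits.
Q = {(0, 0): -1.0, (1, 1): -1.0, (0, 1): 2.0}
J, h, offset = qubo_to_ising(Q, 2)
print(J, h, offset)
for x in product((0, 1), repeat=2):
    s = [2 * x_a - 1 for x_a in x]
    print(x, qubo_energy(Q, x), ising_energy(J, h, offset, s))
```

The constant offset does not affect which configuration is the ground state, so only J and h need to be passed to the annealer.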
In quantum annealing, J_ij and h_i are typically input to the Ising machine to start the Ising quantum annealing evolution. When the quantum annealing evolution has finished, the output is the position of each node in the loop, i.e., the position of each DNA fragment in the assembly order. The complete sequence of the whole DNA is then reproduced from the assembly order of the DNA fragments.
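Reading the result back can be sketched as two steps: decode the assignment x_{v,i} into an order of fragments, then merge neighbouring fragments along their overlaps. The function names are hypothetical, and merging by exact suffix/prefix match is a simplifying assumption that ignores sequencing errors.

```python
def decode_order(x):
    """x[v][i] == 1 means fragment v sits at position i; return fragment ids by position."""
    n = len(x)
    order = []
    for i in range(n):
        placed = [v for v in range(n) if x[v][i] == 1]
        if len(placed) != 1:
            raise ValueError(f"position {i} does not hold exactly one node")
        order.append(placed[0])
    if len(set(order)) != n:
        raise ValueError("some node appears at more than one position")
    return order


def overlap_length(u: str, v: str) -> int:
    """Length of the longest suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)), 0, -1):
        if u.endswith(v[:k]):
            return k
    return 0


def merge_fragments(fragments, order):
    """Concatenate fragments in the given order, writing each overlap only once."""
    sequence = fragments[order[0]]
    for prev, cur in zip(order, order[1:]):
        k = overlap_length(fragments[prev], fragments[cur])
        sequence += fragments[cur][k:]
    return sequence


fragments = ["CGTTAC", "ATGCGT", "TACGGA"]
x = [[0, 1, 0],   # fragment 0 at position 1
     [1, 0, 0],   # fragment 1 at position 0
     [0, 0, 1]]   # fragment 2 at position 2
order = decode_order(x)
print(order, merge_fragments(fragments, order))
```

The validity checks in `decode_order` matter in practice, because an annealer may return a low-energy state that still violates a constraint.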
Referring to fig. 1, in the present application the problem of "finding the path with the smallest weight sum in the directed graph" (the Hamiltonian path problem) can always be converted in a simple way into the problem of "finding the closed loop with the smallest weight sum in the graph" (the Hamiltonian loop problem). For example, a node numbered 0 can be added to the graph. Node 0 and every other node are interconnected by two directed edges, one in each direction, and all of these directed edges have weight 0. A path with the smallest weight sum then only needs its start point and end point to be connected to node 0 to form a loop, and in this application that loop is the closed loop with the smallest weight sum in the directed graph. The zero node is drawn as the graph node "0".
Referring to FIG. 1, in the zero node example, the directed edge from node A to the zero node has weight W_{A0} = 0, and the directed edge from the zero node to node A has weight W_{0A} = 0 = W_{A0}.
Referring to FIG. 1, in the zero node example, the directed edge from node B to the zero node has weight W_{B0} = 0, and the directed edge from the zero node to node B has weight W_{0B} = 0 = W_{B0}.
Referring to FIG. 1, in the zero node example, the directed edge from node C to the zero node has weight W_{C0} = 0, and the directed edge from the zero node to node C has weight W_{0C} = 0 = W_{C0}.
Referring to FIG. 1, in the zero node example, the directed edge from node D to the zero node has weight W_{D0} = 0, and the directed edge from the zero node to node D has weight W_{0D} = 0 = W_{D0}.
Referring to FIG. 1, in the zero node example, the directed edge from node N to the zero node has weight W_{N0} = 0, and the directed edge from the zero node to node N has weight W_{0N} = 0 = W_{N0}.
Referring to fig. 1, in an alternative embodiment, a zero node that does not represent any DNA fragment is added to the directed graph, the zero node and all other nodes are interconnected by directed edges, the weights of the directed edges between the zero node and all other nodes are set to zero, and a closed loop describing the minimum path is formed by connecting the zero node to the node representing the start point of the path and to the node representing the end point of the path.
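The zero node construction can be sketched as follows. The function names, the node labels and the toy weights are hypothetical; the point illustrated is that cutting a closed loop at the zero node yields an open path with the same weight sum.

```python
def add_zero_node(nodes, W, zero="0"):
    """Return a copy of the graph with an extra node joined to every node by zero-weight edges."""
    W_ext = dict(W)
    for v in nodes:
        W_ext[(zero, v)] = 0
        W_ext[(v, zero)] = 0
    return [zero] + list(nodes), W_ext


def loop_weight(loop, W):
    """Weight sum of a closed loop given as a list of nodes (last node connects to first)."""
    return sum(W[(loop[i], loop[(i + 1) % len(loop)])] for i in range(len(loop)))


def loop_to_path(loop, zero="0"):
    """Cut the closed loop at the zero node to recover the open path."""
    k = loop.index(zero)
    return loop[k + 1:] + loop[:k]


nodes = ["A", "B", "C"]
W = {("A", "B"): 1, ("B", "A"): 4, ("B", "C"): 2,
     ("C", "B"): 5, ("A", "C"): 6, ("C", "A"): 3}
nodes_ext, W_ext = add_zero_node(nodes, W)

loop = ["B", "C", "0", "A"]          # closed loop B -> C -> 0 -> A -> B
path = loop_to_path(loop)            # open path that starts right after node 0
path_weight = sum(W[(a, b)] for a, b in zip(path, path[1:]))
print(path, loop_weight(loop, W_ext), path_weight)
```

Because both edges touching the zero node contribute nothing, minimizing over closed loops in the extended graph is the same as minimizing over open paths in the original graph.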
Referring to fig. 1, in an alternative embodiment, annealing of the Ising Hamiltonian is used to find the path with the smallest sum of weight values in the directed graph, and the corresponding DNA fragments are arranged in the order of that minimum path; the main purpose and function is to reproduce the nucleotide sequence of the DNA completely.
Referring to fig. 1, in the DNA sequence assembly method based on Ising quantum annealing, a method such as whole-genome shotgun sequencing yields a very large number of sequenced DNA fragments; the sequenced DNA fragments are built into a directed graph; the parameters of the Ising Hamiltonian are constructed from the directed graph and fed into Ising quantum annealing; the result of the annealing evolution is the position of each node in the loop, which is exactly the position of each DNA fragment in the assembly order; and the complete sequence of the whole DNA is reproduced from the assembly order of the DNA fragments. Quantum annealing and simulated annealing are alternatives to each other.
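Since simulated annealing is named as an alternative, the search step can be illustrated classically. The sketch below is a deliberately simplified stand-in: it anneals directly over fragment orderings with swap moves rather than over Ising spins, and it is not the Ising-machine procedure of this application. The function names, cooling schedule, step count and the weight -(exact overlap length) are all assumptions.

```python
import math
import random


def overlap_length(u: str, v: str) -> int:
    """Length of the longest suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)), 0, -1):
        if u.endswith(v[:k]):
            return k
    return 0


def path_weight(order, fragments):
    """Weight sum of the path, using the assumed weight -overlap for each step."""
    return sum(-overlap_length(fragments[a], fragments[b])
               for a, b in zip(order, order[1:]))


def anneal_order(fragments, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over orderings; returns the best order seen and its weight."""
    rng = random.Random(seed)
    order = list(range(len(fragments)))
    rng.shuffle(order)
    cost = path_weight(order, fragments)
    best, best_cost = order[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = path_weight(order, fragments)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]  # undo the rejected swap
    return best, best_cost


fragments = ["TACGGA", "ATGCGT", "GGATTC", "CGTTAC"]
best_order, best_cost = anneal_order(fragments)
print(best_order, best_cost)
```

Searching over permutations keeps both constraints satisfied by construction, so no penalty coefficient A is needed in this variant; the Ising formulation instead encodes the constraints in the energy so that generic annealing hardware can be used.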
While the above specification teaches the preferred embodiments with a certain degree of particularity, there is shown in the drawings and will herein be described in detail a presently preferred embodiment with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and is not intended to limit the invention to the specific embodiment illustrated. Various alterations and modifications will no doubt become apparent to those skilled in the art after having read the above description. Therefore, the appended claims should be construed to cover all such variations and modifications as fall within the true spirit and scope of the invention. Any and all equivalent ranges and contents within the scope of the claims should be considered to be within the intent and scope of the present invention.

Claims (11)

1. A method of DNA sequence assembly comprising:
establishing a directed graph of DNA fragments, constructing an Ising Hamiltonian according to the directed graph, and reproducing the complete sequence of the whole DNA from the annealing evolution result of the Ising Hamiltonian.
2. The method of claim 1, wherein:
establishing the directed graph comprises: copying the DNA to be sequenced into a plurality of copies, breaking each copy into a plurality of DNA fragments at random positions, performing nucleotide sequencing, and taking the DNA fragments for which nucleotide sequencing has been completed as nodes of the directed graph.
3. The method of claim 2, wherein:
a directed edge is assigned to each ordered pair of nodes in the directed graph.
4. The method of claim 3, wherein:
each directed edge is assigned a weight value, and the weight value is used for evaluating the degree of overlap of the two DNA fragments represented by the pair of nodes at the two ends of the directed edge.
5. The method of claim 4, wherein:
the smaller the weight value, the larger the degree of overlap, and the larger the weight value, the lower the degree of overlap.
6. The method of claim 4, wherein:
annealing of the Ising Hamiltonian is used for finding the path with the minimum sum of weight values in the directed graph, and the corresponding DNA fragments are arranged in the order of the minimum path so as to completely reproduce the nucleotide sequence structure of the DNA.
7. The method of claim 6, wherein:
adding a zero node which does not represent any DNA segment in the directed graph, wherein the zero node and all other nodes are interconnected by directed edges, the weights of the directed edges between the zero node and all other nodes are all zero, and a closed loop for describing a minimum path is formed by connecting the zero node, the node representing the starting point of the path and the node representing the end point of the path.
8. A method of DNA sequence assembly comprising:
copying the DNA to be sequenced into a plurality of copies, wherein each copy is broken into a plurality of DNA fragments at random positions and subjected to nucleotide sequencing, and the DNA fragments for which nucleotide sequencing has been completed are regarded as nodes of a directed graph;
constructing an Ising Hamiltonian from the directed graph, wherein annealing of the Ising Hamiltonian is used to find the path in the directed graph for which the sum of the weight values is the smallest, and the DNA fragments are arranged in the order of the minimum path to reproduce the nucleotide sequence structure of the DNA.
9. The method of claim 8, wherein:
adding a zero node which does not represent any DNA segment in the directed graph, wherein the zero node and all other nodes are interconnected by directed edges, the weights of the directed edges between the zero node and all other nodes are all zero values, and the zero node, the node representing the starting point of the path and the node representing the end point of the path are connected to form a closed loop for describing the minimum path.
10. A method of DNA sequence assembly comprising the steps of:
S1, constructing a directed graph of the DNA fragments;
S2, constructing an Ising Hamiltonian according to the directed graph;
S3, carrying out quantum annealing evolution on the Ising Hamiltonian by an Ising machine;
S4, reproducing the complete sequence of the whole DNA according to the quantum annealing evolution result.
11. The method according to claim 10, wherein step S1 includes:
copying a plurality of copies of DNA to be sequenced;
breaking each piece of DNA into a plurality of DNA fragments at random positions;
selecting a fragment with a length suitable for direct sequencing from the DNA fragments, and carrying out nucleotide sequencing;
taking a DNA fragment with completed nucleotide sequencing as a node in the graph;
assigning a directed edge to each ordered pair of nodes.
CN202210406466.6A 2022-04-18 2022-04-18 DNA sequence assembling method Pending CN114913922A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210406466.6A CN114913922A (en) 2022-04-18 2022-04-18 DNA sequence assembling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210406466.6A CN114913922A (en) 2022-04-18 2022-04-18 DNA sequence assembling method

Publications (1)

Publication Number Publication Date
CN114913922A true CN114913922A (en) 2022-08-16

Family

ID=82763893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210406466.6A Pending CN114913922A (en) 2022-04-18 2022-04-18 DNA sequence assembling method

Country Status (1)

Country Link
CN (1) CN114913922A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117497092A (en) * 2024-01-02 2024-02-02 合肥微观纪元数字科技有限公司 RNA structure prediction method and system based on dynamic programming and quantum annealing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination