CN116052759A - Hamiltonian volume construction method and related device - Google Patents

Info

Publication number
CN116052759A
Authority
CN
China
Prior art keywords
stems
overlapping
stem
hamiltonian
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211588569.5A
Other languages
Chinese (zh)
Inventor
Name withheld on request
窦猛汉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Origin Quantum Computing Technology Co Ltd
Original Assignee
Origin Quantum Computing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Origin Quantum Computing Technology Co Ltd
Priority to CN202211588569.5A
Publication of CN116052759A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/10 Nucleic acid folding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N10/00 Quantum computing, i.e. information processing based on quantum-mechanical phenomena

Abstract

The invention discloses a method for constructing a Hamiltonian and a related device. The method comprises: obtaining an overlap matrix constructed from all stems corresponding to a target RNA; determining all overlapping regions from the overlap matrix; and constructing a Hamiltonian based on all the overlapping regions. With the embodiments of the invention, the overlapping regions are used to remove some mutually overlapping stems from the feasible folding patterns of the RNA when the Hamiltonian is constructed, thereby reducing the workload of quantum computation.

Description

Hamiltonian construction method and related device
Technical Field
The invention belongs to the technical field of RNA structure prediction, and particularly relates to a Hamiltonian construction method and a related device.
Background
Ribonucleic acid (RNA) is a carrier of genetic information in living cells. It has a variety of functions and plays a role in processes such as genetic coding, decoding, regulation, and gene expression. The function of an RNA is closely related to its structure. Because RNA is single-stranded, the strand can fold on itself through pairing of its own complementary bases on the basis of the primary structure (the base sequence), forming a complex structure, namely the secondary structure of the RNA. Technologies in fields such as protein design, gene editing, and vaccine research and development all require knowledge of the RNA secondary structure. However, unlike identification and measurement of the primary structure, experimental techniques currently lack an efficient means of determining the secondary structure, so computational approaches are being used to solve the RNA secondary structure prediction problem (that is, RNA folding).
When quantum computation is used to solve the RNA folding problem, the primary structure of the RNA must be processed to construct a Hamiltonian, and quantum computation is then performed with the Hamiltonian to predict the secondary structure of the RNA. Conventionally, the Hamiltonian is constructed from an overlap matrix of stems and therefore includes stems that overlap each other. When quantum computation is performed with such a Hamiltonian, the stems required for a feasible folding pattern of the RNA (i.e., a combination of stems) must be identified from among the mutually overlapping stems, and this identification requires a relatively large amount of computation.
Disclosure of Invention
The invention aims to provide a Hamiltonian construction method and a related device, which use overlapping regions to remove some mutually overlapping stems from the feasible folding patterns of the RNA when the Hamiltonian is constructed, thereby reducing the workload of quantum computation.
One embodiment of the present application provides a method for constructing a Hamiltonian, the method comprising:
obtaining an overlap matrix constructed from all stems corresponding to a target RNA, wherein the target RNA is an RNA with a primary structure;
determining all overlapping regions from the overlap matrix, wherein each of the overlapping regions corresponds to a plurality of stems;
constructing a Hamiltonian based on all the overlapping regions, wherein the Hamiltonian contains no interaction terms between stems corresponding to the same overlapping region.
Optionally, obtaining the overlap matrix constructed from all stems corresponding to the target RNA includes:
determining all stem regions corresponding to the target RNA;
obtaining all stems contained in each stem region;
and constructing the overlap matrix according to the overlapping relations among all the stems.
Optionally, obtaining all stems contained in each stem region includes:
for each stem region, obtaining stems ordered by stem length.
Optionally, after the stems ordered by stem length are obtained for each stem region, obtaining all stems corresponding to the target RNA further includes:
reordering all stems according to the principle that adjacent stems overlap as much as possible.
Optionally, reordering all stems according to the principle that adjacent stems overlap as much as possible includes:
reordering all stems according to a sorting value of each stem, wherein the sorting value is calculated from a base position in the stem and the length of the stem.
Optionally, determining all overlapping regions from the overlap matrix includes:
determining all overlapping regions from regions of the overlap matrix that have a target shape and consist of target features, wherein a target feature indicates that two stems overlap, and the stems corresponding to an overlapping region all overlap one another.
Optionally, constructing the Hamiltonian based on all the overlapping regions includes:
constructing single-stem terms based on all stems corresponding to the target RNA;
constructing multi-stem interaction terms based on the stems corresponding to each of the overlapping regions;
and obtaining the Hamiltonian from the single-stem terms and the multi-stem interaction terms.
Optionally, obtaining the Hamiltonian from the single-stem terms and the multi-stem interaction terms includes:
obtaining the Hamiltonian by the following equation:
Figure BDA0003990615500000021 (equation image)
wherein:
H_C is the Hamiltonian;
the expression in Figure BDA0003990615500000022 is the single-stem term;
the expression in Figure BDA0003990615500000023 is the multi-stem interaction term;
q_i is stem i and q_j is stem j, where q_i and q_j do not belong to the same overlapping region;
k_i is the length of stem i and k_j is the length of stem j;
μ is the length of the longest stem among all stems corresponding to the target RNA;
c_B is the bonus coefficient for the total stem length;
c_L is the penalty coefficient for the number of stems;
the symbol in Figure BDA0003990615500000031 is the penalty coefficient for pseudoknots;
the symbol in Figure BDA0003990615500000032 is the penalty coefficient for overlapping stems.
Yet another embodiment of the present application provides a device for constructing a Hamiltonian, the device comprising:
an obtaining module, configured to obtain an overlap matrix constructed from all stems corresponding to a target RNA, wherein the target RNA is an RNA with a primary structure;
a determining module, configured to determine all overlapping regions from the overlap matrix, wherein each overlapping region corresponds to a plurality of stems;
and a construction module, configured to construct a Hamiltonian based on all the overlapping regions, wherein the Hamiltonian contains no interaction terms between stems corresponding to the same overlapping region.
Optionally, the obtaining module includes:
a determining unit, configured to determine all stem regions corresponding to the target RNA;
an obtaining unit for obtaining all stems contained in each stem region;
and a construction unit for constructing an overlap matrix according to the overlapping relation among all stems.
Optionally, the obtaining unit is specifically configured to:
for each stem region, stems ordered by stem length are obtained.
Optionally, the obtaining unit is further configured to:
reorder all stems according to the principle that adjacent stems overlap as much as possible.
Optionally, the obtaining unit is further specifically configured to:
reorder all stems according to a sorting value of each stem, wherein the sorting value is calculated from a base position in the stem and the length of the stem.
Optionally, the determining module is specifically configured to:
determine all overlapping regions from regions of the overlap matrix that have a target shape and consist of target features, wherein a target feature indicates that two stems overlap, and the stems corresponding to an overlapping region all overlap one another.
Optionally, the construction module is specifically configured to:
construct single-stem terms based on all stems corresponding to the target RNA;
construct multi-stem interaction terms based on the stems corresponding to each of the overlapping regions;
and obtain the Hamiltonian from the single-stem terms and the multi-stem interaction terms.
Optionally, the construction module is further specifically configured to:
obtain the Hamiltonian by the following equation:
Figure BDA0003990615500000033 (equation image)
wherein:
H_C is the Hamiltonian;
the expression in Figure BDA0003990615500000041 is the single-stem term;
the expression in Figure BDA0003990615500000042 is the multi-stem interaction term;
q_i is stem i and q_j is stem j, where q_i and q_j do not belong to the same overlapping region;
k_i is the length of stem i and k_j is the length of stem j;
μ is the length of the longest stem among all stems corresponding to the target RNA;
c_B is the bonus coefficient for the total stem length;
c_L is the penalty coefficient for the number of stems;
the symbol in Figure BDA0003990615500000043 is the penalty coefficient for pseudoknots;
the symbol in Figure BDA0003990615500000044 is the penalty coefficient for overlapping stems.
An embodiment of the present application provides a storage medium having a computer program stored therein, wherein the computer program is configured to implement, when run, the method of any one of the above.
An embodiment of the application provides an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to implement the method of any of the above.
Compared with the prior art, the Hamiltonian construction method and related device provided by the invention obtain an overlap matrix constructed from all stems corresponding to the target RNA; determine all overlapping regions from the overlap matrix, wherein each overlapping region corresponds to a plurality of stems; and then construct a Hamiltonian based on all the overlapping regions, wherein the Hamiltonian contains no interaction terms between stems corresponding to the same overlapping region. By using the overlapping regions, some mutually overlapping stems are removed from the feasible folding patterns of the RNA when the Hamiltonian is constructed, thereby reducing the workload of quantum computation.
Drawings
Fig. 1 is a hardware block diagram of a computer terminal for a Hamiltonian construction method according to an embodiment of the present invention;
Fig. 2 is a flow chart of a Hamiltonian construction method according to an embodiment of the invention;
Fig. 3 is a schematic diagram of a pairing matrix according to an embodiment of the invention;
Fig. 4 is a schematic diagram of an overlap matrix according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a Hamiltonian construction device according to an embodiment of the invention.
Detailed Description
The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
An embodiment of the invention first provides a Hamiltonian construction method, which can be applied to electronic devices such as computer terminals, in particular ordinary computers, quantum computers, and the like.
A quantum computer is a physical device that performs high-speed mathematical and logical operations and stores and processes quantum information according to the laws of quantum mechanics. A device that processes and computes quantum information and runs quantum algorithms is a quantum computer. Quantum computers are a key technology under investigation because they can handle certain mathematical problems more efficiently than ordinary computers, for example, reducing the time to crack RSA keys from hundreds of years to hours.
The following describes the operation of the computer terminal in detail by taking it as an example. Fig. 1 is a hardware block diagram of a computer terminal according to a method for constructing hamiltonian according to an embodiment of the present invention. As shown in fig. 1, the computer terminal may include one or more (only one is shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA) and a memory 104 for storing data, and optionally, a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the configuration shown in fig. 1 is merely illustrative and is not intended to limit the configuration of the computer terminal described above. For example, the computer terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the method of constructing hamiltonian in the embodiments of the present application, and the processor 102 executes the software programs and modules stored in the memory 104 to perform various functional applications and data processing, i.e., implement the above-mentioned methods. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the computer terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 106 is arranged to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of a computer terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
The quantum computing is a novel computing mode for regulating and controlling the quantum information unit to compute according to a quantum mechanical law, wherein the most basic principle based on the quantum computing is a quantum mechanical state superposition principle, and the quantum mechanical state superposition principle enables the state of the quantum information unit to be in a superposition state with multiple possibilities, so that quantum information processing has greater potential compared with classical information processing in efficiency. A quantum system comprises a plurality of particles which move according to the law of quantum mechanics, the system is in a certain quantum state in a state space, and for chemical molecules, quantum chemical simulation can be realized, so that research support is provided for quantum computing.
It should be noted that a real quantum computer is a hybrid structure comprising two major parts: one part is a classical computer responsible for classical computation and control; the other part is a quantum device responsible for running quantum programs to realize quantum computation. A quantum program is a sequence of instructions, written in a quantum language such as QRunes, that can run on a quantum computer; it supports quantum logic gate operations and ultimately realizes quantum computation. Specifically, a quantum program is a sequence of instructions that operates quantum logic gates in a certain time order.
In practical applications, because quantum device hardware is still under development, quantum computing simulation is often required to verify quantum algorithms, quantum applications, and the like. Quantum computing simulation is the process of simulating the execution of a quantum program corresponding to a specific problem on a virtual architecture (i.e., a quantum virtual machine) built from the resources of an ordinary computer. In general, a quantum program corresponding to the specific problem must be constructed. Here, a quantum program is a program written in a classical language that represents qubits and their evolution, in which the qubits, quantum logic gates, and other elements of quantum computation are all represented by corresponding classical code.
A quantum circuit, also called a quantum logic circuit, is one embodiment of a quantum program and the most commonly used general quantum computing model. It represents a circuit that operates on qubits at an abstract level and consists of qubits, wires (timelines), and various quantum logic gates; finally, the result usually has to be read out by a quantum measurement operation.
Unlike a conventional circuit, which is connected by metal wires that carry voltage or current signals, a quantum circuit can be regarded as connected by time: the state of a qubit evolves naturally over time, as governed by the Hamiltonian operator, until it encounters a logic gate and is operated on.
One quantum program is corresponding to one total quantum circuit, and the quantum program refers to the total quantum circuit, wherein the total number of quantum bits in the total quantum circuit is the same as the total number of quantum bits of the quantum program. It can be understood that: one quantum program may consist of a quantum circuit, a measurement operation for the quantum bits in the quantum circuit, a register to hold the measurement results, and a control flow node (jump instruction), and one quantum circuit may contain several tens to hundreds or even thousands of quantum logic gate operations. The execution process of the quantum program is a process of executing all quantum logic gates according to a certain time sequence. Note that the timing is the time sequence in which a single quantum logic gate is executed.
It should also be noted that, in contrast to the quantum computers to which the invention relates, in an ordinary silicon-chip-based computing device the unit of the processing chip is the CMOS transistor. Such a computing unit is not limited by time or coherence; that is, its usable duration is unrestricted and it is available at any time. Furthermore, the number of such computing units in a silicon chip is ample: a single chip currently contains thousands upon thousands of them. The computational logic available to a CMOS transistor is also fixed, for example AND logic. When CMOS transistors are used for computation, a large number of CMOS transistors are combined with a limited set of logic functions to achieve the computation.
Unlike such logic units in conventional computing devices, the basic computing unit in current quantum computers is the qubit, whose use is limited by coherence and by coherence time; that is, a qubit has a limited usable duration and is not available at any time. Making full use of qubits within their usable lifetime is a key challenge for quantum computing. Furthermore, the number of qubits in a quantum computer is one of the representative indicators of its performance. Each qubit realizes a computing function through logic functions configured as needed, and, given the limited number of qubits, the logic functions in the field of quantum computing are diverse, for example: Hadamard gates (H gates), Pauli-X gates (X gates), Pauli-Y gates (Y gates), Pauli-Z gates (Z gates), RX gates, RY gates, RZ gates, CNOT gates, CR gates, iSWAP gates, Toffoli gates, and the like. Quantum logic gates are typically represented by unitary matrices, which are not only matrices but also operations and transformations. The effect of a quantum logic gate on a quantum state is generally calculated by left-multiplying the ket (column vector) of the quantum state by the unitary matrix. In quantum computation, the computation is achieved by combining a limited number of qubits with a variety of logic functions.
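As a small illustration of the matrix-vector rule described above, the following Python sketch applies two single-qubit gates to the state |0⟩. The helper name `apply_gate` and the list-based representation of kets and matrices are illustrative assumptions, not part of the embodiment.

```python
import math

def apply_gate(unitary, ket):
    """Left-multiply a state vector (ket) by a gate's unitary matrix."""
    size = len(ket)
    return [sum(unitary[r][c] * ket[c] for c in range(size)) for r in range(size)]

S = 1 / math.sqrt(2)
H_GATE = [[S, S], [S, -S]]   # Hadamard gate
X_GATE = [[0, 1], [1, 0]]    # Pauli-X gate
KET0 = [1, 0]                # |0>

flipped = apply_gate(X_GATE, KET0)       # X|0> = |1>
superposed = apply_gate(H_GATE, KET0)    # H|0> = (|0> + |1>) / sqrt(2)
```

Applying `H_GATE` twice returns the original state up to floating-point rounding, which is one quick way to check that a matrix is entered correctly.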
Referring to fig. 2, fig. 2 is a flow chart of a method for constructing hamiltonian according to an embodiment of the present invention, which may include the following steps:
S201: Obtain an overlap matrix constructed from all stems corresponding to a target RNA, wherein the target RNA is an RNA with a primary structure.
The primary structure of RNA is a single strand. Because the strand is single, base pairing necessarily bonds one base to another base in the same strand, which physically folds the strand. The RNA folding problem is in fact the problem of finding how the bases in the strand pair. If single base pairs are considered, the number of possible pairings is very large; to solve the RNA folding problem more efficiently, the feasible stem combinations of the RNA are determined at the level of stems. The target RNA may contain a plurality of stems, and stems may overlap. The overlap matrix is constructed from the overlapping relations between stems; for example, if two stems overlap, the element at the corresponding position of the overlap matrix may be set to 1, and if they do not overlap, it may be set to 0.
In some possible embodiments of the present invention, the obtaining an overlap matrix constructed from all stems corresponding to the target RNA may include:
determining all stem regions corresponding to the target RNA;
obtaining all stems contained in each stem region;
and constructing an overlapping matrix according to the overlapping relation among all the stems.
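The construction of the overlap matrix can be sketched in Python as follows. This is a minimal sketch under stated assumptions: a stem is written as a triple (i, j, k) with 1-based base positions, two stems are taken to "overlap" when they share at least one base position, and the diagonal is left at 0. The function names `stem_bases` and `overlap_matrix` are hypothetical.

```python
def stem_bases(stem):
    """Return the set of base positions occupied by a stem (i, j, k).

    Assumption: the stem pairs bases i..i+k-1 with bases j-k+1..j.
    """
    i, j, k = stem
    return set(range(i, i + k)) | set(range(j - k + 1, j + 1))

def overlap_matrix(stems):
    """Build a symmetric 0/1 matrix; 1 means the two stems share a base."""
    bases = [stem_bases(s) for s in stems]
    n = len(stems)
    matrix = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            if bases[a] & bases[b]:
                matrix[a][b] = matrix[b][a] = 1
    return matrix

# Hypothetical stems: the first two share bases, the third is disjoint.
example = overlap_matrix([(1, 20, 5), (2, 19, 4), (7, 14, 3)])
```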
A stem region is essentially the longest stem with endpoint (i, j), where i, the endpoint, is the smallest base position in the stem and j is the position of the base that pairs with base i.
In some possible embodiments of the invention, all stem regions corresponding to the target RNA may be determined by constructing a pairing matrix from the target RNA and the base pairing rules and then determining all stem regions based on the pairing matrix.
RNA typically contains four bases, A, C, G and U, which can be treated as misinput if other bases are present, e.g., base T if present can be replaced with base U. The target RNA is composed of a base sequence, is generally in the form of a character string, and can be specifically obtained by adopting an RNA primary structure determination method.
According to the base pairing rules {C-G, G-U, U-A}, the bases can be mapped one-to-one to numbers in the order CGUA → 0123, so that the base pairing rules become {0-1, 1-2, 2-3}. Thus, two bases can pair if and only if the absolute difference between their numbers is exactly 1. Specifically, a matrix P may be used to represent base pairing in the sequence: an element P_ij = 1 in row i and column j means that the i-th and j-th bases can pair, whereas an element of 0 means they cannot. This pairing matrix is evidently symmetric, and its diagonal entries are always 0 because a base cannot pair with itself. For an RNA sequence of length n, the complexity of computing the pairing matrix is O(n^2). Because the pairing matrix is symmetric, only its upper or lower triangle needs to be processed when stem regions are obtained from it. Of course, the pairing matrix can also be constructed directly according to the base pairing rules. Other element values may also be used in the pairing matrix to represent base pairing, as determined by actual conditions.
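The numeric mapping and the "difference of exactly 1" test can be sketched in Python as follows. The names `BASE_CODE` and `pairing_matrix` are illustrative assumptions; the replacement of T by U follows the input handling described earlier.

```python
# Mapping CGUA -> 0123, so that pairable bases differ by exactly 1.
BASE_CODE = {"C": 0, "G": 1, "U": 2, "A": 3}

def pairing_matrix(sequence):
    """Return the n x n pairing matrix P as a list of lists (0-based indices).

    P[i][j] == 1 when bases i and j can pair under {C-G, G-U, U-A}.
    """
    codes = [BASE_CODE[b] for b in sequence.upper().replace("T", "U")]
    n = len(codes)
    return [[1 if abs(codes[i] - codes[j]) == 1 else 0 for j in range(n)]
            for i in range(n)]

P = pairing_matrix("CUACGAUAG")
```

Both loops run over all n positions, which matches the O(n^2) cost stated above; a base never pairs with itself because the difference of its own code is 0.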
Taking the upper triangle of the pairing matrix as an example, the following illustrates how stem regions are obtained from the pairing matrix. A stem is defined as a series of consecutively paired bases, so if the i-th base is the endpoint of a stem and pairs with the j-th base, the immediately following (i+1)-th base and the (j-1)-th base must also pair. Likewise, if the length of the stem is k, the stem appears in the pairing matrix as a continuous run of length k perpendicular to the diagonal. A triple (i, j, k) may be used to record a stem, where (i, j) is also the row and column index in the pairing matrix P of the upper-right end of the run corresponding to the stem.
All stem regions can be obtained by traversing the pairing matrix P. Take the upper triangle of P and traverse its elements row by row. When an element has value 1, record its row and column indices as (i, j) and set the stem region length k = 1; then visit the element to its lower left, and if that element is 1, increase the recorded length k by 1, continuing until a visited element has value 0. At this point a stem region with the starting element as its endpoint has been obtained, and it is recorded as the triple (i, j, k). In addition, by setting a minimum value for k, stem regions shorter than that minimum can be discarded when stem regions are recorded, so that the minimum length of a stem region is bounded. To avoid duplicate records, the visited elements and the starting element are set to 0 once the length k has been obtained. This process visits all elements of the upper triangle of P and has complexity O(n^2).
Illustratively, the target RNA may be CUACGAUAG, and the resulting pairing matrix may be as shown in FIG. 3, where the dots indicate that the corresponding bases can pair. One stem region, (1, 9, 3), is obtained from this pairing matrix.
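The traversal described above can be sketched in Python and checked against the CUACGAUAG example. This is a sketch under assumptions: the function name `find_stem_regions` is hypothetical, the minimum stem region length of 3 is an assumed default (the text only says a minimum may be set), and visited elements are cleared during the walk rather than afterwards, which has the same effect.

```python
BASE_CODE = {"C": 0, "G": 1, "U": 2, "A": 3}

def find_stem_regions(sequence, min_len=3):
    """Return stem regions as 1-based triples (i, j, k)."""
    codes = [BASE_CODE[b] for b in sequence.upper().replace("T", "U")]
    n = len(codes)
    # Pairing matrix: 1 when the two base codes differ by exactly 1.
    P = [[1 if abs(codes[r] - codes[c]) == 1 else 0 for c in range(n)]
         for r in range(n)]
    regions = []
    for i in range(n):                 # traverse the upper triangle row by row
        for j in range(i + 1, n):
            if P[i][j] != 1:
                continue
            k, r, c = 0, i, j
            # Walk toward the lower left while the bases keep pairing.
            while r < c and P[r][c] == 1:
                P[r][c] = 0            # clear visited elements to avoid repeats
                k += 1
                r += 1
                c -= 1
            if k >= min_len:
                regions.append((i + 1, j + 1, k))
    return regions

regions = find_stem_regions("CUACGAUAG")   # [(1, 9, 3)]
```

Because rows are visited from top to bottom, the first element met on any anti-diagonal run is its upper-right end, so each run is recorded once with the endpoint described in the text.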
Once the stem regions have been identified, each stem region contains at least one stem. If a stem region contains only one stem, the stem region is itself that stem, and its length equals the minimum length required to form a stem or stem region. If the length of a stem region is greater than the minimum stem length, the stem region may contain a plurality of stems; specifically, all stems contained in the stem region can be obtained by traversing it. Every stem is contained in exactly one stem region, so once all stems contained in each stem region have been obtained, all stems corresponding to the RNA have been obtained.
In some possible embodiments of the invention, the obtaining all stems contained in each stem region comprises:
for each stem region, stems ordered by stem length are obtained.
Stems are ordered so that mutually overlapping stems are, with high probability, placed next to each other, so that each overlapping region contains more stems. To obtain all stems contained in a stem region, the stems are screened in descending order of length: for a stem region of length k, the stem region itself is taken first, followed by the stems of length k-1, k-2, k-3, … in turn. The complexity of this process is O(mk^2), where m is the number of stem regions and k is the length of the longest stem. Illustratively, for a stem region of length 5 whose bases are denoted by positions 1-5 within the stem region, the stems obtained for that stem region are: 1-5, 1-4, 2-5, 1-3, 2-4, 3-5.
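The length-ordered enumeration can be sketched in Python as follows. The function name `stems_in_region` and the assumed minimum stem length of 3 (chosen because the example above stops at length-3 stems) are illustrative; the region (1, 20, 5) is a hypothetical input.

```python
def stems_in_region(region, min_len=3):
    """List every stem inside a stem region (i, j, k), longest first.

    A sub-stem that starts `offset` base pairs into the region and has
    length `length` is written (i + offset, j - offset, length).
    """
    i, j, k = region
    stems = []
    for length in range(k, min_len - 1, -1):      # k, k-1, ..., min_len
        for offset in range(k - length + 1):      # slide along the region
            stems.append((i + offset, j - offset, length))
    return stems

# A hypothetical stem region of length 5 yields six stems, in the same
# order as the pattern 1-5, 1-4, 2-5, 1-3, 2-4, 3-5.
stems = stems_in_region((1, 20, 5))
```

Each region contributes on the order of k^2 stems, which is consistent with the O(mk^2) bound quoted above for m regions.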
In some possible embodiments of the present invention, after the obtaining stems ordered by stem length for each stem region, the obtaining all stems corresponding to the target RNA further includes:
all stems are reordered according to the principle that adjacent stems overlap as much as possible.
The stems contained in one stem region have already been ordered so that overlapping stems within a stem region are adjacent. Considering two stems, there are only three possibilities for how they can overlap: (1) they belong to the same stem region; (2) only bases in the first half overlap; (3) only bases in the second half overlap. Case (1) is handled when the stems contained in each stem region are obtained. So that each overlapping region contains as many stems as possible, the latter two cases must also be handled, and all stems may be reordered so that adjacent stems overlap as much as possible. This is in fact a combinatorial optimization problem, and any of various methods for solving combinatorial optimization problems may be chosen according to efficiency and benefit, for example a greedy algorithm, a genetic algorithm, or a neural network; a sorting algorithm may also be used.
In some possible embodiments of the present invention, reordering all stems according to the principle that adjacent stems overlap as much as possible includes:
re-sorting all stems, which have already been ordered by stem length, according to a ranking value of each stem, wherein the ranking value is calculated from a base position in the stem and the stem length.
In the target RNA the bases are arranged in a fixed order. A base position is the index of a base in the base sequence, and the base positions in a stem are the indices of the bases that make up the stem. As shown in FIG. 3, the stem region (1,9,3) is itself also a stem, and the 1 and 9 in the stem (1,9,3) are base positions. The ranking value can be calculated with a preset formula from one or more base positions in the stem and the length of the stem.
Taking the stem s = (i, j, k) as an example, the ranking value can be calculated from the smallest base position in the stem and the length of the stem, e.g., the ranking value
Figure BDA0003990615500000101
That is, the ranking value is the index of the midpoint of the first half of the stem. Since every stem can be regarded as a base sequence extending continuously from its midpoint to both sides, the closer the midpoints of two stems are, the more likely the stems are to overlap. Alternatively, the index of the midpoint of the second half of the stem may be used as the ranking value, or the ranking value may be calculated from other base positions of the stem. The sorting algorithm is preferably a stable one, such as merge sort; depending on the algorithm used, the complexity of the sorting step ranges from O(N²) to O(N log N), where N is the number of stems.
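A minimal sketch of this re-sorting step is given below. The exact ranking formula appears only as an equation image in this text, so the expression i + (k - 1) / 2 used here (the midpoint of positions i … i+k-1) is an assumption derived from the verbal description, and the stem tuples are hypothetical. Python's built-in `sorted` is stable, so stems with equal ranking values keep their earlier length-based order.

```python
def ranking_value(stem):
    """Midpoint index of the first half of stem (i, j, k).

    Assumed formula: the first half occupies positions i .. i+k-1,
    whose midpoint is i + (k - 1) / 2.
    """
    i, _j, k = stem
    return i + (k - 1) / 2


def reorder(stems):
    """Stable re-sort by ranking value; ties keep their input order."""
    return sorted(stems, key=ranking_value)


# Hypothetical stems, already ordered by some earlier criterion.
stems = [(10, 20, 4), (1, 12, 3), (2, 8, 3), (1, 9, 3)]
print(reorder(stems))
# [(1, 12, 3), (1, 9, 3), (2, 8, 3), (10, 20, 4)]
```

The stems (1, 12, 3) and (1, 9, 3) share the ranking value 2.0 and therefore keep their original relative order, which is the property the text asks of a stable sort.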
S202: Determine all overlap regions from the overlap matrix, wherein each overlap region corresponds to a plurality of stems.
An overlap region is a region of the overlap matrix that identifies a set of mutually overlapping stems. The overlap regions may be determined by traversing the overlap matrix and taking each region that satisfies a preset condition as an overlap region.
In some possible embodiments of the present invention, determining all overlap regions from the overlap matrix includes:
determining all overlap regions from regions of the overlap matrix that have a target shape and are composed of target features, wherein a target feature indicates that two stems overlap, and the stems corresponding to an overlap region overlap one another.
In the overlap matrix, mutually overlapping stems tend to cluster, and the overlap regions can be obtained from the cluster regions. A cluster region is a region that includes part of the diagonal of the overlap matrix and in which target features are gathered. Specifically, once a cluster region composed of target features is identified, it is determined whether the region has the target shape or contains a region of the target shape; if so, the region of the target shape is taken as an overlap region. In the embodiment of the present invention, the target shape may be a square. An overlap region is then a square region that lies on the diagonal and is composed entirely of target features, so that any two of its corresponding stems overlap each other; the number of stems corresponding to an overlap region is determined by the maximum side length of the overlap region.
As an example, take the RNA numbered PKB079 in the PKB database. PKB079 has 61 bases in total, 166 stems (with a minimum stem length of 3) and 80 stem regions, and the resulting overlap matrix is shown in FIG. 4. Each point in the figure (i.e., a target feature) indicates that the two corresponding stems overlap, and a blank indicates that they do not. The figure shows that the points cluster along the diagonal and that a number of square regions can be marked off. These regions are completely filled with points, which means that the stems in each region overlap one another, so the marked-off square regions can serve as overlap regions. Specifically, for a region filled with points, the largest square region that can be marked off from it is taken as the overlap region.
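One way to mark off such diagonal squares can be sketched as below. This is an illustrative greedy scan under stated assumptions, not the procedure of the embodiment: the matrix is assumed to be a symmetric 0/1 list of lists indexed by the sorted stem order, the function name `overlap_regions` is hypothetical, and a greedy scan does not guarantee the largest possible squares in every case.

```python
def overlap_regions(overlap):
    """Greedily split the diagonal of a 0/1 overlap matrix into filled squares.

    Returns half-open index ranges (start, end); each range is a square block
    on the diagonal in which every pair of stems overlaps.
    """
    n = len(overlap)
    regions = []
    start = 0
    while start < n:
        end = start + 1
        # Grow the block while the next stem overlaps every stem already in it.
        while end < n and all(overlap[end][m] for m in range(start, end)):
            end += 1
        # An overlap region corresponds to a plurality of stems (at least two).
        if end - start >= 2:
            regions.append((start, end))
        start = end
    return regions


# Hypothetical 5-stem overlap matrix: stems 0-2 overlap mutually, as do 3-4.
matrix = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
print(overlap_regions(matrix))  # [(0, 3), (3, 5)]
```

Stems 2 and 3 overlap as well, but that entry lies outside both squares, so the pair is not placed in a common overlap region and would still be handled by an ordinary two-stem term.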
S203: Construct the Hamiltonian based on all the overlap regions, wherein the Hamiltonian contains no interaction term between stems corresponding to the same overlap region.
The Hamiltonian can be constructed using the Ising model, i.e., the RNA folding problem is approximately expressed in the form of an Ising model. Specifically, when the Hamiltonian is constructed, the interaction terms between stems corresponding to the same overlap region are excluded from the multi-stem interaction terms; that is, the Hamiltonian contains no interaction term between stems in the same overlap region.
In a feasible folding pattern of the RNA obtained by quantum computation with the Hamiltonian constructed in the present application, either one stem or zero stems are selected from the stems corresponding to each overlap region; when zero stems are selected from an overlap region, a pseudo stem is regarded as having been selected. Some stems are thus already removed from the secondary structure of the RNA at the Hamiltonian construction stage, which reduces the workload of the quantum computing stage.
At present, a quantum annealer can only separate mutually overlapping stems from the feasible solutions by means of penalty terms, which amounts to a screening process. In practice, however, combinations of mutually overlapping stems are not feasible solutions and need not take part in the computation at all. With the overlap regions, part of the infeasible solutions can be excluded from the entire computation, so that the quantum computer never processes them. For this reason, the Hamiltonian constructed in the present application is applicable to a wider range of quantum computers, including quantum annealers.
In some possible embodiments of the present invention, constructing the Hamiltonian based on all the overlap regions includes:
constructing single-stem terms based on all stems corresponding to the target RNA;
constructing multi-stem interaction terms based on the stems corresponding to each overlap region;
obtaining the Hamiltonian from the single-stem terms and the multi-stem interaction terms.
To obtain a feasible folding pattern, a single-stem term must be constructed for each stem; to obtain a better folding result, interaction terms between multiple stems must be constructed as well.
Specifically, the single-stem terms and the multi-stem interaction terms may be added directly to obtain the Hamiltonian, or each may first be multiplied by a weight and the weighted terms then added to obtain the Hamiltonian.
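The combination of single-stem terms and multi-stem terms can be sketched as follows. This is only an illustration under stated assumptions: the names `build_hamiltonian`, `energy`, `region_of`, `w_single` and `w_pair` are hypothetical, the coefficient values are made up, and the Hamiltonian is represented classically as a quadratic function of binary variables q_i (1 if stem i is selected, 0 otherwise).

```python
def build_hamiltonian(single, pair, region_of, w_single=1.0, w_pair=1.0):
    """Return (h, J): weighted single-stem and two-stem coefficients.

    single:    {stem index: coefficient}
    pair:      {(i, j): coefficient} for i < j
    region_of: {stem index: overlap-region id}; stems outside any region are absent
    Pairs whose two stems lie in the same overlap region are left out.
    """
    h = {i: w_single * c for i, c in single.items()}
    J = {}
    for (i, j), c in pair.items():
        ri, rj = region_of.get(i), region_of.get(j)
        if ri is not None and ri == rj:
            continue  # same overlap region: no interaction term is built
        J[(i, j)] = w_pair * c
    return h, J


def energy(h, J, q):
    """Evaluate the quadratic form for a 0/1 assignment q."""
    linear = sum(c * q[i] for i, c in h.items())
    quadratic = sum(c * q[i] * q[j] for (i, j), c in J.items())
    return linear + quadratic


# Hypothetical coefficients for three stems; stems 0 and 1 share a region.
single = {0: -9.0, 1: -16.0, 2: -9.0}
pair = {(0, 1): 50.0, (0, 2): -18.0, (1, 2): -24.0}
h, J = build_hamiltonian(single, pair, region_of={0: 0, 1: 0})
print(sorted(J))                          # [(0, 2), (1, 2)]
print(energy(h, J, {0: 1, 1: 0, 2: 1}))   # -36.0
```

Because the pair (0, 1) is dropped rather than penalized, any solver using these coefficients must separately ensure that at most one stem per overlap region is selected, as the text describes.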
In some possible embodiments of the present invention, obtaining the Hamiltonian from the single-stem terms and the multi-stem interaction terms comprises:
obtaining the Hamiltonian by the following equation:
Figure BDA0003990615500000121
where H_C is the Hamiltonian,
Figure BDA0003990615500000122
is the single-stem term,
Figure BDA0003990615500000123
is the multi-stem interaction term, q_i is stem i, q_j is stem j, q_i and q_j do not belong to the same overlap region, k_i is the length of stem i, k_j is the length of stem j, μ is the length of the longest of all stems corresponding to the target RNA, c_B is the bonus coefficient for the total stem length, c_L is the penalty coefficient for the number of stems,
Figure BDA0003990615500000124
is the penalty coefficient for pseudoknots, and
Figure BDA0003990615500000125
is the penalty coefficient for overlapping stems.
In the embodiment of the present invention, the RNA folding problem can be approximately described with an Ising model. Specifically, when the Hamiltonian is constructed, the quality of a folding pattern can be judged from the following three aspects:
(1) The selected stems must not overlap one another, i.e., they share no base
If stem q_i and stem q_j overlap, an overlap energy term is introduced into the Hamiltonian:
Figure BDA0003990615500000131
where k_i denotes the length of stem q_i and μ denotes the length of the longest stem of the RNA chain. If the two stems do not overlap, this term is 0, i.e.,
Figure BDA0003990615500000132
(2) The sum of the lengths of all selected stems should be as large as possible
The total length of the stems in a folding pattern is Σ_i k_i q_i. To increase the contribution of the length to the ground-state energy, this sum is squared, which gives:
Figure BDA0003990615500000133
(3) The number of stems in a feasible folding pattern of the RNA should be as small as possible, i.e., the average length of the selected stems should be as large as possible
In practical applications, encoding the average length directly into the Hamiltonian is relatively complicated, so the difference between the length of a selected stem and the length of the longest stem is used to estimate its contribution to the average length. Although this is not equivalent to the average stem length, its effect is to make the algorithm prefer a combination of fewer, longer stems over a combination of more, shorter stems. For example, selecting one stem of length 6 is in most cases better than selecting two stems of length 3.
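The preference for fewer, longer stems can be illustrated numerically. The sketch below assumes, purely for illustration, that each selected stem is charged the squared gap (μ - k_i)² between its length and the longest stem length; the exact expression used in the embodiment appears only as an equation image in this text, and the function name `length_gap_penalty` is hypothetical.

```python
def length_gap_penalty(selected_lengths, mu):
    """Sum of squared gaps between each selected stem length and mu.

    The quadratic form is an assumption made for this illustration.
    """
    return sum((mu - k) ** 2 for k in selected_lengths)


mu = 6  # length of the longest stem in this toy example
print(length_gap_penalty([6], mu))     # 0   (one stem of length 6)
print(length_gap_penalty([3, 3], mu))  # 18  (two stems of length 3)
```

Both selections pair the same total number of bases, yet the two short stems incur a larger penalty, which is the behaviour the text describes.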
Combining the above three aspects, and introducing the coefficients c_B and c_L to adjust the contributions of aspects (2) and (3), the total Hamiltonian is written as:
Figure BDA0003990615500000134
In addition, if stem (i1, j1, k1) does not overlap stem (i2, j2, k2) but their endpoint indices satisfy i1<i2<j1<j2 or i2<i1<j2<j1, the corresponding three-dimensional structure forms a pseudoknot. A pseudoknot has a certain influence on the energy of the RNA structure, and the total energy is adjusted empirically through
Figure BDA0003990615500000135
which is reflected in the two-body terms. The Hamiltonian contains only single-body terms and two-body terms. A single-body term can be simulated with one Rz gate, and a two-body term can be simulated with two CNOT gates and one Rz gate; the corresponding rotation angles depend on the parameters and on the coefficients of the Hamiltonian.
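The endpoint condition for pseudoknots and the two-CNOT-plus-Rz construction can both be checked with a short script. This is a sketch under stated assumptions: the function names are hypothetical, the Rz convention diag(e^{-iθ/2}, e^{iθ/2}) is assumed, and the two-body term is taken to be a Z⊗Z rotation, which is the standard form after mapping binary variables to Pauli-Z operators.

```python
import cmath


def forms_pseudoknot(s1, s2):
    """Endpoint test from the text; assumes the stems are known not to overlap."""
    (i1, j1, _k1), (i2, j2, _k2) = s1, s2
    return i1 < i2 < j1 < j2 or i2 < i1 < j2 < j1


def matmul(a, b):
    """Plain square-matrix product on lists of lists."""
    n = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]


def zz_rotation_from_gates(theta):
    """CNOT, then Rz(theta) on the target, then CNOT (basis 00, 01, 10, 11)."""
    lo, hi = cmath.exp(-1j * theta / 2), cmath.exp(1j * theta / 2)
    cnot = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
    rz_on_target = [[lo, 0, 0, 0], [0, hi, 0, 0], [0, 0, lo, 0], [0, 0, 0, hi]]
    return matmul(cnot, matmul(rz_on_target, cnot))


def zz_rotation_exact(theta):
    """exp(-i * theta / 2 * Z (x) Z), which is diagonal."""
    lo, hi = cmath.exp(-1j * theta / 2), cmath.exp(1j * theta / 2)
    diag = [lo, hi, hi, lo]
    return [[diag[r] if r == c else 0 for c in range(4)] for r in range(4)]


print(forms_pseudoknot((1, 10, 3), (5, 15, 3)))  # True: 1 < 5 < 10 < 15
print(forms_pseudoknot((1, 20, 3), (5, 15, 3)))  # False: nested stems

built, exact = zz_rotation_from_gates(0.7), zz_rotation_exact(0.7)
print(all(abs(built[r][c] - exact[r][c]) < 1e-12
          for r in range(4) for c in range(4)))  # True
```

The last check confirms that the three-gate sequence reproduces the two-body rotation exactly, with the rotation angle θ set by the corresponding Hamiltonian coefficient.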
Finally, since stems in the same overlap region can never appear together in a feasible folding pattern, the constructed Hamiltonian can in practice omit all two-body terms within the same overlap region. This rule reduces the number of quantum gates in the phase-separation circuit.
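The size of this saving can be estimated with a simple count. The sketch below assumes that the overlap regions are disjoint and that every pair of stems would otherwise carry a two-body term, so the totals are upper bounds; the stem count 166 is taken from the PKB079 example, while the region sizes and the function name `two_body_term_count` are hypothetical.

```python
from math import comb


def two_body_term_count(n_stems, region_sizes):
    """Return (all pairs, pairs kept after dropping same-region pairs)."""
    total = comb(n_stems, 2)
    omitted = sum(comb(size, 2) for size in region_sizes)
    return total, total - omitted


total, kept = two_body_term_count(166, [10, 8, 6])
print(total, kept)         # 13695 13607
print(2 * (total - kept))  # 176 CNOT gates saved, at two CNOTs per term
```

With two CNOT gates and one Rz gate per two-body term, each omitted pair removes three gates from every phase-separation layer.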
Thus, in the embodiment of the present invention, the overlap matrix constructed from all stems corresponding to the target RNA is first obtained, and all overlap regions are then determined from the overlap matrix; finally, the Hamiltonian is constructed based on all the overlap regions, wherein the Hamiltonian contains no interaction term between stems corresponding to the same overlap region. By using the overlap regions, some of the overlapping stems are excluded from the feasible folding patterns of the RNA when the Hamiltonian is constructed, thereby reducing the workload of quantum computation.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a Hamiltonian construction apparatus according to an embodiment of the present invention. Corresponding to the flow shown in FIG. 2, the apparatus includes:
an obtaining module 501, configured to obtain an overlap matrix constructed from all stems corresponding to a target RNA, where the target RNA is an RNA having a primary structure;
a determining module 502, configured to determine all overlap regions from the overlap matrix, where each overlap region corresponds to a plurality of stems;
a construction module 503, configured to construct a Hamiltonian based on all the overlap regions, wherein the Hamiltonian contains no interaction term between stems corresponding to the same overlap region.
In some possible embodiments of the present invention, the obtaining module 501 may include:
a determining unit, configured to determine all stem regions corresponding to the target RNA;
an obtaining unit, configured to obtain all stems contained in each stem region;
a construction unit, configured to construct the overlap matrix according to the overlap relations among all stems.
In some possible embodiments of the present invention, the obtaining unit may be specifically configured to:
obtain, for each stem region, the stems ordered by stem length.
In some possible embodiments of the present invention, the obtaining unit may be further configured to:
reorder all stems according to the principle that adjacent stems overlap as much as possible.
In some possible embodiments of the present invention, the obtaining unit may be further specifically configured to:
re-sort all stems according to a ranking value of each stem, wherein the ranking value is calculated from a base position in the stem and the stem length.
In some possible embodiments of the present invention, the determining module 502 may be specifically configured to:
determine all overlap regions from regions of the overlap matrix that have a target shape and are composed of target features, wherein a target feature indicates that two stems overlap, and the stems corresponding to an overlap region overlap one another.
In some possible embodiments of the present invention, the construction module 503 may be specifically configured to:
construct single-stem terms based on all stems corresponding to the target RNA;
construct multi-stem interaction terms based on the stems corresponding to each overlap region;
obtain the Hamiltonian from the single-stem terms and the multi-stem interaction terms.
In some possible embodiments of the present invention, the construction module 503 may be further specifically configured to:
obtain the Hamiltonian by the following equation:
Figure BDA0003990615500000151
where H_C is the Hamiltonian,
Figure BDA0003990615500000152
is the single-stem term,
Figure BDA0003990615500000153
is the multi-stem interaction term, q_i is stem i, q_j is stem j, q_i and q_j do not belong to the same overlap region, k_i is the length of stem i, k_j is the length of stem j, μ is the length of the longest of all stems corresponding to the target RNA, c_B is the bonus coefficient for the total stem length, c_L is the penalty coefficient for the number of stems,
Figure BDA0003990615500000154
is the penalty coefficient for pseudoknots, and
Figure BDA0003990615500000155
is the penalty coefficient for overlapping stems.
Thus, in the embodiment of the present invention, the overlap matrix constructed from all stems corresponding to the target RNA is first obtained, and all overlap regions are then determined from the overlap matrix; finally, the Hamiltonian is constructed based on all the overlap regions, wherein the Hamiltonian contains no interaction term between stems corresponding to the same overlap region. By using the overlap regions, some of the overlapping stems are excluded from the feasible folding patterns of the RNA when the Hamiltonian is constructed, thereby reducing the workload of quantum computation.
An embodiment of the present invention also provides a storage medium in which a computer program is stored, wherein the computer program is configured to carry out, when run, the steps of any of the method embodiments described above.
Specifically, in the present embodiment, the storage medium may be configured to store a computer program for carrying out the following steps:
S201: obtaining an overlap matrix constructed from all stems corresponding to a target RNA, wherein the target RNA is an RNA having a primary structure;
S202: determining all overlap regions from the overlap matrix, wherein each overlap region corresponds to a plurality of stems;
S203: constructing a Hamiltonian based on all the overlap regions, wherein the Hamiltonian contains no interaction term between stems corresponding to the same overlap region.
An embodiment of the present invention also provides an electronic device comprising a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to carry out the steps of any of the method embodiments described above.
Specifically, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor and the input/output device is connected to the processor.
Specifically, in this embodiment, the processor may be configured to carry out the following steps by means of the computer program:
S201: obtaining an overlap matrix constructed from all stems corresponding to a target RNA, wherein the target RNA is an RNA having a primary structure;
S202: determining all overlap regions from the overlap matrix, wherein each overlap region corresponds to a plurality of stems;
S203: constructing a Hamiltonian based on all the overlap regions, wherein the Hamiltonian contains no interaction term between stems corresponding to the same overlap region.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (11)

1. A method of constructing a Hamiltonian, the method comprising:
obtaining an overlap matrix constructed from all stems corresponding to a target RNA, wherein the target RNA is an RNA having a primary structure;
determining all overlap regions from the overlap matrix, wherein each overlap region corresponds to a plurality of stems;
constructing a Hamiltonian based on all the overlap regions, wherein the Hamiltonian contains no interaction term between stems corresponding to the same overlap region.
2. The method of claim 1, wherein obtaining the overlap matrix constructed from all stems corresponding to the target RNA comprises:
determining all stem regions corresponding to the target RNA;
obtaining all stems contained in each stem region;
constructing the overlap matrix according to the overlap relations among all stems.
3. The method of claim 2, wherein obtaining all stems contained in each stem region comprises:
obtaining, for each stem region, the stems ordered by stem length.
4. The method of claim 3, wherein after obtaining, for each stem region, the stems ordered by stem length, obtaining all stems corresponding to the target RNA further comprises:
reordering all stems according to the principle that adjacent stems overlap as much as possible.
5. The method of claim 4, wherein reordering all stems according to the principle that adjacent stems overlap as much as possible comprises:
re-sorting all stems according to a ranking value of each stem, wherein the ranking value is calculated from a base position in the stem and the stem length.
6. The method of claim 4, wherein determining all overlap regions from the overlap matrix comprises:
determining all overlap regions from regions of the overlap matrix that have a target shape and are composed of target features, wherein a target feature indicates that two stems overlap, and the stems corresponding to an overlap region overlap one another.
7. The method of claim 6, wherein constructing the Hamiltonian based on all the overlap regions comprises:
constructing single-stem terms based on all stems corresponding to the target RNA;
constructing multi-stem interaction terms based on the stems corresponding to each overlap region;
obtaining the Hamiltonian from the single-stem terms and the multi-stem interaction terms.
8. The method of claim 7, wherein obtaining the Hamiltonian from the single-stem terms and the multi-stem interaction terms comprises:
obtaining the Hamiltonian by the following equation:
Figure FDA0003990615490000021
where H_C is the Hamiltonian,
Figure FDA0003990615490000022
is the single-stem term,
Figure FDA0003990615490000023
is the multi-stem interaction term, q_i is stem i, q_j is stem j, q_i and q_j do not belong to the same overlap region, k_i is the length of stem i, k_j is the length of stem j, μ is the length of the longest of all stems corresponding to the target RNA, c_B is the bonus coefficient for the total stem length, c_L is the penalty coefficient for the number of stems,
Figure FDA0003990615490000024
is the penalty coefficient for pseudoknots, and
Figure FDA0003990615490000025
is the penalty coefficient for overlapping stems.
9. A Hamiltonian construction apparatus, the apparatus comprising:
an obtaining module, configured to obtain an overlap matrix constructed from all stems corresponding to a target RNA, wherein the target RNA is an RNA having a primary structure;
a determining module, configured to determine all overlap regions from the overlap matrix, wherein each overlap region corresponds to a plurality of stems;
a construction module, configured to construct a Hamiltonian based on all the overlap regions, wherein the Hamiltonian contains no interaction term between stems corresponding to the same overlap region.
10. A storage medium having a computer program stored therein, wherein the computer program is configured to carry out the method of any one of claims 1 to 8 when run.
11. An electronic device comprising a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to carry out the method of any one of claims 1 to 8.
CN202211588569.5A 2022-12-09 2022-12-09 Hamiltonian volume construction method and related device Pending CN116052759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211588569.5A CN116052759A (en) 2022-12-09 2022-12-09 Hamiltonian volume construction method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211588569.5A CN116052759A (en) 2022-12-09 2022-12-09 Hamiltonian volume construction method and related device

Publications (1)

Publication Number Publication Date
CN116052759A true CN116052759A (en) 2023-05-02

Family

ID=86122783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211588569.5A Pending CN116052759A (en) 2022-12-09 2022-12-09 Hamiltonian volume construction method and related device

Country Status (1)

Country Link
CN (1) CN116052759A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523066A (en) * 2023-07-03 2023-08-01 微观纪元(合肥)量子科技有限公司 Ground state energy calculating method based on stability stator and related equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523066A (en) * 2023-07-03 2023-08-01 微观纪元(合肥)量子科技有限公司 Ground state energy calculating method based on stability stator and related equipment
CN116523066B (en) * 2023-07-03 2023-09-12 微观纪元(合肥)量子科技有限公司 Ground state energy calculating method based on stability stator and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 230088 6th floor, E2 building, phase II, innovation industrial park, 2800 innovation Avenue, high tech Zone, Hefei City, Anhui Province

Applicant after: Benyuan Quantum Computing Technology (Hefei) Co.,Ltd.

Address before: 230088 6th floor, E2 building, phase II, innovation industrial park, 2800 innovation Avenue, high tech Zone, Hefei City, Anhui Province

Applicant before: ORIGIN QUANTUM COMPUTING COMPANY, LIMITED, HEFEI