CN111931939A

CN111931939A - Single-amplitude quantum computation simulation method

Info

Publication number: CN111931939A
Application number: CN201910394102.9A
Authority: CN
Inventors: 王晶; 窦猛汉
Original assignee: Origin Quantum Computing Technology Co Ltd
Current assignee: Origin Quantum Computing Technology Co Ltd
Priority date: 2019-05-13
Filing date: 2019-05-13
Publication date: 2020-11-13
Anticipated expiration: 2039-05-13
Also published as: CN111931939B

Abstract

The invention discloses a single-amplitude quantum computation simulation method, which comprises the following steps: for each computing node of a distributed cluster, obtaining a target quantum program; constructing an undirected graph corresponding to the target quantum program, wherein each vertex of the undirected graph represents the quantum state of an operated qubit before or after the operation of a quantum logic gate, and each edge of the undirected graph corresponds to a tensor; obtaining the quantum state corresponding to a target single amplitude to be measured, and calculating a sub-amplitude of the quantum state based on the quantum state and the undirected graph, in cooperation with the GPU corresponding to the computing node, wherein the sub-amplitude is the amplitude corresponding to the undirected graph; and returning the sub-amplitude to the master node of the distributed cluster, so that the master node reduces the sub-amplitudes to obtain the amplitude of the quantum state as the target single amplitude. With the embodiments of the present invention, quantum computation simulations involving 50 or more qubits can be achieved.

Description

Single-amplitude quantum computation simulation method

Technical Field

The invention belongs to the technical field of quantum computation, and particularly relates to a single-amplitude quantum computation simulation method.

Background

Quantum computers are physical devices that perform high-speed mathematical and logical operations, store and process quantum information in compliance with the laws of quantum mechanics. When a device processes and calculates quantum information and runs quantum algorithms, the device is a quantum computer.

Quantum computers can perform a variety of tasks that classical computers cannot accomplish, such as quantum simulation and factoring large numbers. On the road to quantum computing, achieving 'quantum supremacy' requires a high-fidelity quantum computer with more than 50 qubits. Until such a machine is realized, quantum computation simulation based on the theory of quantum computation can decouple the software of a quantum computer from its hardware and lay the foundation for the development of quantum programs and quantum applications.

Quantum computation simulation is a simulated computation that follows the laws of quantum mechanics by means of numerical computation and computer science. It is a simulation program that uses the high-speed computing capability of a computer to describe the space-time evolution of quantum states according to the basic quantum-mechanical laws governing qubits.

At present, quantum computation simulation usually adopts full-amplitude simulation, that is, all amplitudes of the final state of the qubits are simulated at once. Full-amplitude simulation is computed by unitary transformation, and its memory overhead grows exponentially with the number of qubits. For example, simulating a quantum computation involving 30 qubits requires 16 GBytes (gigabytes) of memory; 40 qubits require 16 TBytes (terabytes), i.e., 2¹⁰ × 16 GBytes; and 50 qubits require 16 PBytes (petabytes), i.e., 2¹⁰ × 16 TBytes. This is hard to bear for ordinary cloud platforms, and even for supercomputing platforms, that provide quantum computation simulation services. The academic community can currently simulate at most 49 qubits with a full-amplitude simulator, and that result relies on the largest supercomputer in the world, which does not offer cloud services externally; this does not facilitate the research and development of quantum programs and quantum applications. For this reason single-amplitude simulation, a scheme that simulates only one amplitude at a time, has been proposed, and its memory requirement is much smaller. Under the condition that the memory resources of current platforms are limited, research on and implementation of quantum computation simulation of the amplitude of a single quantum state component are therefore particularly important for the development of quantum computation.

Disclosure of Invention

The invention aims to provide a single-amplitude quantum computation simulation method to solve the defects in the prior art and realize computation simulation involving 50 or more qubits.

The technical scheme adopted by the invention is as follows:

a single amplitude quantum computational simulation method, the method comprising:

obtaining a target quantum program for each computing node of the distributed cluster;

constructing an undirected graph corresponding to the target quantum program; wherein, the vertex of the undirected graph represents the quantum state of the operated quantum bit before or after the operation of the quantum logic gate, and one edge of the undirected graph corresponds to a tensor;

obtaining the quantum state corresponding to a target single amplitude to be measured, and calculating a sub-amplitude of the quantum state based on the quantum state and the undirected graph, in cooperation with the GPU corresponding to the computing node; wherein the sub-amplitude is the amplitude corresponding to the undirected graph;

and returning the sub-amplitude to the master node of the distributed cluster, so that the master node reduces the sub-amplitudes to obtain the amplitude of the quantum state as the target single amplitude.

Optionally, the constructing an undirected graph corresponding to the target quantum program includes:

analyzing the target quantum program to obtain a linked list for recording quantum program information;

traversing the linked list, and creating an edge with a tensor order of 1 when the type of the quantum logic gate in the linked list is a first single quantum gate; wherein the edge is connected with the last vertex of the vertex chain corresponding to the quantum bit operated by the first single quantum gate, and the unitary matrix of the first single quantum gate is a diagonal matrix;

when the type of the quantum logic gate in the linked list is a second single quantum gate, creating an edge with the tensor order of 2 and a vertex connected with the edge; the edge is connected with the last vertex of the corresponding vertex chain of the quantum bit operated by the second single quantum gate, and the unitary matrix of the second single quantum gate is a non-diagonal matrix;

when the type of the quantum logic gate in the linked list is a first double quantum gate, an edge with the tensor order of 2 is created; wherein the edge is connected with the last vertex in the vertex chain respectively corresponding to the two qubits operated by the first dual-quantum gate, and the unitary matrix of the first dual-quantum gate is a diagonal matrix;

when the type of the quantum logic gate in the linked list is a second double quantum gate, an edge with the tensor order of 4 and two vertexes connected with the edge are created; the edge is connected with the last vertex in the vertex chain respectively corresponding to the two qubits operated by the second double-quantum gate, and the unitary matrix of the second double-quantum gate is a non-diagonal matrix;

and obtaining an undirected graph corresponding to the target quantum program.

Optionally, the calculating, based on the quantum state and the undirected graph and in cooperation with the GPU corresponding to the calculation node, the sub-amplitude of the quantum state includes:

calling the GPU corresponding to the computing node to perform, for each edge connected with a specific vertex of the undirected graph, order reduction of the edge's tensor by fixing the vertex value; wherein the specific vertices are the first and last vertices of the vertex chain corresponding to each qubit;

deleting the specific vertices;

receiving the values of the target vertices allocated by the master node, splitting the current undirected graph based on the values of the target vertices, and, for each sub-undirected graph obtained by splitting, calling the GPU to perform order reduction by fixing the vertex value on the tensors of the edges connected with the target vertices;

for each vertex in the sub-undirected graph, fusing all connecting edges of the vertex into a new edge in cooperation with the GPU, reducing the order of the tensor of the new edge, and deleting the vertex;

taking the product of the tensor values of all the reduced new edges to obtain a first sub-amplitude of the quantum state corresponding to the sub-undirected graph;

and summing the first sub-amplitudes of all the sub-undirected graphs to obtain the sub-amplitude of the quantum state.

Optionally, performing order reduction by fixing the vertex value on the tensors of the edges connected with the specific vertices of the undirected graph includes:

for the edge connected with each specific vertex, the GPU corresponding to the computing node sets the number of thread blocks according to the tensor order after reduction and the number of threads in each thread block of the GPU;

calculating a first element number of the tensor after reduction from the thread block serial number, the number of threads in each thread block and the thread number, and calculating the two second element numbers of the tensor before reduction that correspond to the first element number; wherein the bits of an element number correspond one to one with the vertices connected with the current edge, and the value of each bit of the element number is the value of the corresponding vertex;

determining, from the two second element numbers, the second element number whose bit at the position corresponding to the specific vertex equals the preset fixed value;

and acquiring the second element value corresponding to the determined second element number, and taking that second element value as the first element value corresponding to the first element number.
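The order reduction described above can be sketched in plain Python, with one loop iteration standing in for one GPU thread. This is only a minimal sketch under stated assumptions, not the patented kernel: the helper names `insert_bit` and `reduce_by_fixed_vertex`, the flat-list tensor layout, and the convention that the first vertex of an edge is the most significant bit of the element number are hypothetical choices made for illustration.

```python
def insert_bit(idx, pos, bit):
    """Insert `bit` at bit position `pos` (counted from the least significant bit) of idx."""
    low = idx & ((1 << pos) - 1)      # bits below the insertion point
    high = idx >> pos                 # bits at and above the insertion point
    return (high << (pos + 1)) | (bit << pos) | low


def reduce_by_fixed_vertex(tensor, order, vertex_pos, value):
    """Drop the vertex at `vertex_pos` (0 = first vertex) by fixing its value to 0 or 1."""
    pos = order - 1 - vertex_pos      # assumed layout: first vertex = most significant bit
    reduced = []
    for first_number in range(1 << (order - 1)):   # one iteration per emulated thread
        # The two second element numbers of the tensor before reduction.
        candidates = (insert_bit(first_number, pos, 0),
                      insert_bit(first_number, pos, 1))
        # Keep the candidate whose bit at the vertex position equals the fixed value.
        chosen = next(c for c in candidates if (c >> pos) & 1 == value)
        reduced.append(tensor[chosen])
    return reduced


# Order-2 tensor over vertices (a, b); element number = 2*a + b.
t = [10, 11, 12, 13]
print(reduce_by_fixed_vertex(t, 2, 0, 1))  # fix a = 1
print(reduce_by_fixed_vertex(t, 2, 1, 0))  # fix b = 0
```

Fixing a vertex halves the number of tensor elements, which is why the thread count in the text is derived from the order after reduction.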

Optionally, receiving the values of the target vertices allocated by the master node, splitting the current undirected graph based on the values of the target vertices, and calling the GPU to perform order reduction by fixing the vertex value on the tensors of the edges connected with the target vertices for each sub-undirected graph obtained by splitting, includes:

receiving one or more value combinations of the target vertices, divided equally by the master node; wherein the target vertices are the first m vertices with the largest number of connected edges in the current undirected graph, the m vertices have $2^m$ value combinations in total, the number of computing nodes is $2^n$, n is a positive integer, and n is greater than 0 and less than or equal to m;

splitting the undirected graph of the computing node into one or more sub-undirected graphs, one for each allocated value combination;

and, for each sub-undirected graph, traversing the edges connected with the target vertices and calling the GPU to perform order reduction by fixing the vertex value on the tensors of those edges.
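As an illustration of the equal division, the following sketch distributes the $2^m$ value combinations of m target vertices over $2^n$ computing nodes, so that each node receives $2^{m-n}$ combinations. The function name `assign_vertex_values` and the choice of handing each node a contiguous block of combinations are assumptions; the text only requires that the division be equal.

```python
def assign_vertex_values(m, n):
    """Map each node number to its share of the 2**m value combinations of m target vertices."""
    if not 0 < n <= m:
        raise ValueError("n must satisfy 0 < n <= m")
    per_node = 1 << (m - n)                     # combinations handled by each node
    plan = {}
    for node in range(1 << n):
        first = node * per_node                 # contiguous blocks are an assumed policy
        plan[node] = [format(c, f"0{m}b") for c in range(first, first + per_node)]
    return plan


# 3 target vertices split over 4 computing nodes: 2 combinations per node.
for node, combos in assign_vertex_values(3, 2).items():
    print(node, combos)
```

Each bit string gives the values of the m target vertices for one sub-undirected graph, so a node with several combinations computes several sub-undirected graphs.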

Optionally, fusing all the connecting edges of the vertex into a new edge in cooperation with the GPU includes:

determining, among all connecting edges of the vertex, a first edge and a second edge to be fused;

calling the GPU to raise the order of the first tensor of the first edge according to the vertices of the second edge that are not connected with the first edge, and updating the first tensor with the raised tensor;

deleting the second edge, and connecting the vertices of the second edge that were not connected with the first edge to the first edge, to obtain a fused middle edge;

calling the GPU to calculate the tensor elements of the middle edge according to the recorded correspondence between the vertex numbers of the first edge and the second edge;

and returning to the step of determining the first edge and the second edge to be fused, until the calculated tensor elements are those of the last remaining edge, and taking that last edge as the fused new edge.

Optionally, raising the order of the first tensor of the first edge includes:

the GPU calculates the tensor order after raising from the order of the first tensor and the number of orders added;

setting the number of thread blocks according to the tensor order after raising and the number of threads in each thread block of the GPU;

calculating the first element number of the raised tensor from the thread block serial number, the number of threads in each thread block and the thread number;

and calculating the elements of the raised tensor from the first element number, the number of orders added and the elements of the first tensor.

Optionally, calculating the tensor elements of the middle edge includes:

the GPU sets the number of thread blocks according to the updated order of the first tensor and the number of threads in each thread block of the GPU;

calculating the first element number of the tensor of the middle edge from the thread block serial number, the number of threads in each thread block and the thread number;

determining, according to the correspondence, the element of the second tensor of the second edge that corresponds to each element of the first tensor;

and traversing each element of the first tensor and updating it to its product with the corresponding element of the second tensor.
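The fusion of two edges (order raising followed by an element-wise product) can be sketched as follows. This is an illustrative sequential version, not the GPU implementation: the function name `fuse_edges`, the representation of a tensor as a flat list indexed by an element number whose most significant bit belongs to the first vertex, and the choice to append the newly attached vertices as the lowest bits are all assumptions.

```python
def fuse_edges(t1, verts1, t2, verts2):
    """Fuse a second edge into a first edge; returns (tensor, vertex list) of the fused edge."""
    extra = [v for v in verts2 if v not in verts1]   # vertices only the second edge touches
    verts = list(verts1) + extra
    k = len(extra)                                   # number of orders added
    n = len(verts)                                   # order after raising
    # Order raising: the first tensor does not depend on the new vertices,
    # so each old element is repeated for every value of the appended bits.
    raised = [t1[idx >> k] for idx in range(1 << n)]
    # Bit position (from the least significant bit) of every vertex in the fused edge.
    pos = {v: n - 1 - i for i, v in enumerate(verts)}
    fused = []
    for idx, value in enumerate(raised):             # one iteration per emulated thread
        j = 0
        for v in verts2:                             # element number in the second tensor
            j = (j << 1) | ((idx >> pos[v]) & 1)
        fused.append(value * t2[j])
    return fused, verts


# First edge on vertex "a", second edge on vertices ("a", "b").
print(fuse_edges([2, 3], ["a"], [1, 10, 100, 1000], ["a", "b"]))
```

The dictionary `pos` plays the role of the recorded correspondence between the vertex numbers of the two edges: it tells each element of the raised first tensor which element of the second tensor to multiply with.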

Optionally, the reducing the tensor of the new edge includes:

the GPU sets the number of thread blocks according to the tensor order of the new edge after order reduction;

and acquiring two second element values corresponding to the two second element numbers one to one, summing the two second element values, and determining the sum as a first element value corresponding to the first element number.
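Unlike the reduction by a fixed vertex value, this reduction sums over both values of the vertex being deleted. A minimal sketch, assuming a flat-list tensor whose element number has the first vertex as its most significant bit (the names `insert_bit` and `reduce_by_summing` are hypothetical):

```python
def insert_bit(idx, pos, bit):
    """Insert `bit` at bit position `pos` (counted from the least significant bit) of idx."""
    low = idx & ((1 << pos) - 1)
    high = idx >> pos
    return (high << (pos + 1)) | (bit << pos) | low


def reduce_by_summing(tensor, order, vertex_pos):
    """Delete the vertex at `vertex_pos` (0 = first vertex) by summing over its values 0 and 1."""
    pos = order - 1 - vertex_pos
    reduced = []
    for first_number in range(1 << (order - 1)):   # one iteration per emulated thread
        second_0 = insert_bit(first_number, pos, 0)
        second_1 = insert_bit(first_number, pos, 1)
        reduced.append(tensor[second_0] + tensor[second_1])
    return reduced


# Order-2 tensor over vertices (a, b); element number = 2*a + b.
t = [10, 11, 12, 13]
print(reduce_by_summing(t, 2, 0))   # sum over a
print(reduce_by_summing(t, 2, 1))   # sum over b
```

When the fused edge is connected only to the vertex being deleted, the result is an order-0 tensor, i.e. a single complex value that enters the product of tensor values.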

Optionally, the calculation formula of the first element number is:

Idx = block_id * num + thread_id

where Idx is the first element number, block_id is the thread block serial number, num is the number of threads in each thread block, and thread_id is the thread number.
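The formula is the usual mapping from a (thread block, thread) pair to a flat element number. The sketch below emulates such a launch sequentially in Python to show that every element of an order-n tensor ($2^n$ elements) is visited exactly once. The function names, the ceiling division used to choose the number of thread blocks, and the bounds check for a partially filled last block are assumptions, since the text does not spell them out.

```python
def thread_block_count(order, threads_per_block):
    """Blocks needed so each of the 2**order elements gets a thread (ceiling division, assumed)."""
    elements = 1 << order
    return (elements + threads_per_block - 1) // threads_per_block


def emulate_launch(order, threads_per_block):
    """Return the element numbers computed by all emulated threads, in launch order."""
    elements = 1 << order
    covered = []
    for block_id in range(thread_block_count(order, threads_per_block)):
        for thread_id in range(threads_per_block):
            idx = block_id * threads_per_block + thread_id   # Idx = block_id * num + thread_id
            if idx < elements:          # guard for a partially filled last block
                covered.append(idx)
    return covered


print(thread_block_count(4, 8))   # blocks for an order-4 tensor with 8 threads per block
print(emulate_launch(4, 8))
```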

Compared with the prior art, the method calculates only one target single amplitude of the involved qubits at a time. Specifically, the target quantum program is mapped onto an undirected graph, the undirected graph is split across a plurality of computing nodes by means of the path integration method, and each computing node cooperates with its subordinate GPU to compute the corresponding sub-undirected graphs. Most of the calculation consists of simple operations on tensor elements; compared with the unitary-matrix-based full-amplitude simulation of the prior art, the memory requirement is greatly reduced and the amount of computation does not grow exponentially with the number of qubits, so that quantum computation simulation involving 50 or more qubits can be realized. Because GPUs perform massively parallel computation well, the simulation is also more efficient. At present, quantum computation simulation involving at most 196 qubits can be realized by applying the technical solution provided by the embodiments of the present invention.

In addition, in practical application, sometimes only one or more amplitudes in the full amplitude of the qubits are needed, and in this case, if the full amplitude mode in the prior art is adopted, that is, all the amplitudes are simulated at one time, the waste of resources such as a memory and time is undoubtedly caused; by applying the method provided by the embodiment of the invention, one or more times of simulation can be performed in a targeted manner, and one or more single amplitudes required can be simulated, so that resources and time are greatly saved.

Drawings

FIG. 1 is a specific example of splitting the quantum circuit of a quantum program into different paths according to the present invention;

FIG. 2 is a flow chart of a single-amplitude quantum computation simulation method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the undirected graphs constructed for different types of quantum logic gates in the single-amplitude quantum computation simulation method according to the embodiment of the present invention.

Detailed Description

The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.

In order to realize computational simulation involving 50 or more qubits, embodiments of the present invention provide a single-amplitude quantum computational simulation method and apparatus.

First, a single-amplitude quantum computation simulation method provided by the embodiment of the present invention is described below.

As will be appreciated by those skilled in the art, each qubit may be in a superposition of $|0\rangle$ and $|1\rangle$ simultaneously. The quantum state $\psi$ of one qubit can be represented as $\psi = a|0\rangle + b|1\rangle$, where $a$ and $b$ are the amplitudes of $|0\rangle$ and $|1\rangle$ respectively, both complex numbers. After measurement, the quantum state collapses to a fixed state: it collapses to $|0\rangle$ with probability $|a|^2$ and to $|1\rangle$ with probability $|b|^2$, where $|a|^2 + |b|^2 = 1$. The quantum state of $n$ qubits is a superposition of $2^n$ quantum states. For example, the quantum state $\psi$ of 3 qubits is a superposition of $2^3$ (i.e., 8) quantum states, namely $|000\rangle$, $|001\rangle$, $|010\rangle$, $|011\rangle$, $|100\rangle$, $|101\rangle$, $|110\rangle$ and $|111\rangle$, and can be expressed as

$$\psi = c_0|000\rangle + c_1|001\rangle + c_2|010\rangle + c_3|011\rangle + c_4|100\rangle + c_5|101\rangle + c_6|110\rangle + c_7|111\rangle.$$

Each of the 8 quantum states is referred to as a quantum state component, and each quantum state component has an amplitude; the complex numbers $c_0$ to $c_7$ may each be referred to as a single amplitude. Full-amplitude simulation simulates the amplitudes of all $2^n$ quantum state components of $n$ qubits at once; single-amplitude simulation simulates, in one run, the amplitude of any one of the $2^n$ quantum state components.
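To make the terminology concrete, the following sketch stores a full 3-qubit state as a list of 8 complex amplitudes and reads out one single amplitude. The example state and the helper names are hypothetical; a single-amplitude simulator would compute only the requested entry instead of holding the whole list.

```python
import math

# Hypothetical 3-qubit state: only |000> and |111> have non-zero amplitude.
state = [0j] * 8
state[0b000] = 1 / math.sqrt(2)
state[0b111] = 1j / math.sqrt(2)


def single_amplitude(full_state, bitstring):
    """Amplitude of one quantum state component, e.g. '111' -> c_7."""
    return full_state[int(bitstring, 2)]


def probability(amplitude):
    """Probability of collapsing to the component with this amplitude."""
    return abs(amplitude) ** 2


c7 = single_amplitude(state, "111")
print(c7, probability(c7))
print(sum(probability(c) for c in state))   # the probabilities sum to 1 (up to rounding)
```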

Currently, the full-amplitude mode is mostly adopted in the industry for quantum computation simulation. However, the memory occupied by the full-amplitude mode grows exponentially with the number of simulated qubits: for example, simulating 10 qubits needs only 16 KB of memory, simulating 20 qubits needs 16 MB, simulating 30 qubits needs 16 GB, and simulating 50 qubits needs as much as 16 PB. Even the combined memory of all the computers in the world could not support full-amplitude simulation of 50 or more qubits.
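These figures are consistent with storing $2^n$ amplitudes at 16 bytes each (for instance two 64-bit floating-point numbers per complex amplitude). The 16-byte size is an assumption inferred from the numbers quoted, and the function names are illustrative.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def full_amplitude_bytes(num_qubits, bytes_per_amplitude=16):
    """Memory needed to hold all 2**num_qubits amplitudes at once."""
    return bytes_per_amplitude * (1 << num_qubits)


def human_readable(num_bytes):
    """Format a power-of-two byte count using binary (1024-based) units."""
    unit = 0
    while num_bytes >= 1024 and unit < len(UNITS) - 1:
        num_bytes //= 1024
        unit += 1
    return f"{num_bytes} {UNITS[unit]}"


for n in (10, 20, 30, 40, 50):
    print(n, human_readable(full_amplitude_bytes(n)))
```

Every additional 10 qubits multiplies the requirement by $2^{10}$, which is the exponential growth the text refers to.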

In view of this, the embodiment of the present invention provides a single-amplitude quantum computation simulation method applied to the computing nodes of a distributed cluster. All computing nodes correspond to one master node, and each computing node controls one or more GPUs. The number of GPUs controlled by one computing node may be a positive integer power of 2, which is convenient for processing quantum information because the number of quantum states of the qubits is a power of 2. The following description takes as an example the case in which one computing node controls one GPU. Preferably, the distributed cluster may be a supercomputer cluster (e.g., the Sunway TaihuLight supercomputing platform).

It should be noted that the quantum program is a string of instruction sequences written by a quantum language and capable of running on a quantum computer, so that the support of the operation of the quantum logic gate is realized, and the simulation of the quantum computation is finally realized. In particular, a quantum program is a sequence of instructions that operate quantum logic gates in a time sequence.

A quantum logic gate is a basic quantum circuit that operates on a small number of qubits. It is the basis of quantum circuits, just as conventional logic gates are the basis of conventional digital circuits. Quantum logic gates include single-qubit logic gates (single quantum logic gates), two-qubit logic gates (double quantum logic gates) and multi-qubit logic gates.

A quantum program may contain tens or hundreds of quantum logic gate operations, or may contain thousands or millions of quantum logic gate operations. The execution process of the quantum program is a process executed for all the quantum logic gates according to a certain time sequence. The timing is a time sequence in which the quantum logic gates are executed.

It should be noted that quantum logic gates are generally represented by unitary matrices, and a unitary matrix is not only a matrix but also an operation and a transformation. The effect of a quantum logic gate on a quantum state is generally calculated by left-multiplying the right vector (ket) corresponding to the quantum state by the unitary matrix.

For example, the right vector corresponding to the quantum state $|0\rangle$ is $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, and the right vector corresponding to the quantum state $|1\rangle$ is $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.
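As a small worked example of a gate acting on a quantum state, the sketch below multiplies a unitary matrix by a right vector using plain Python lists. The X (NOT) and Hadamard matrices are standard; the helper name `apply_gate` is illustrative.

```python
import math

KET0 = [1, 0]                     # right vector of |0>
KET1 = [0, 1]                     # right vector of |1>

X = [[0, 1],
     [1, 0]]                      # NOT gate
S = 1 / math.sqrt(2)
H = [[S, S],
     [S, -S]]                     # Hadamard gate


def apply_gate(unitary, ket):
    """Return the matrix-vector product unitary * ket."""
    return [sum(row[c] * ket[c] for c in range(len(ket))) for row in unitary]


print(apply_gate(X, KET0))        # X|0> is |1>
print(apply_gate(H, KET0))        # H|0> is an equal superposition of |0> and |1>
```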

The quantum logic gates can be further divided into diagonal quantum logic gates and non-diagonal quantum logic gates according to the unitary matrix type. The diagonal quantum logic gate refers to a quantum logic gate with a unitary matrix being a diagonal matrix. As is known, a diagonal matrix is a matrix whose elements outside the main diagonal are all 0, and the elements on the diagonal can be 0 or other values.

For example, the identity matrix $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ is a typical diagonal matrix.

In contrast, a matrix with non-zero elements outside the main diagonal is a non-diagonal matrix, and a quantum logic gate whose unitary matrix is a non-diagonal matrix is a non-diagonal quantum logic gate.
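The distinction can be checked mechanically. The sketch below classifies a few standard gate matrices; the helper name `is_diagonal` and the numerical tolerance are assumptions made for illustration.

```python
import math


def is_diagonal(matrix, tol=1e-12):
    """True if every element off the main diagonal is (numerically) zero."""
    return all(abs(matrix[r][c]) <= tol
               for r in range(len(matrix))
               for c in range(len(matrix[r]))
               if r != c)


S = 1 / math.sqrt(2)
Z = [[1, 0], [0, -1]]                                            # single-qubit, diagonal
H = [[S, S], [S, -S]]                                            # single-qubit, non-diagonal
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]   # two-qubit, diagonal
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # two-qubit, non-diagonal

for name, gate in [("Z", Z), ("H", H), ("CZ", CZ), ("CNOT", CNOT)]:
    print(name, "diagonal" if is_diagonal(gate) else "non-diagonal")
```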

It will be understood by those skilled in the art that, assuming the target quantum program involves $M_1$ quantum states other than the initial quantum state $|0 \ldots 0\rangle$ of the qubits and the final state, then, since each qubit can be in a superposition of $|0\rangle$ and $|1\rangle$, each of the $M_1$ quantum states can be split into two quantum state components, $|0\rangle$ and $|1\rangle$. This yields $2^{M_1}$ possible transformation paths from the initial quantum state to the final state component $|x\rangle = |x_0 \ldots x_{n-1}\rangle$. Calculating the amplitude of each path and summing the results gives the amplitude of the final state component, i.e., of the target quantum state component. Here $M_1$ is a positive integer.

For example, a target quantum program involves 2 qubits, $q_0$ and $q_1$, with initial state $s_0 = |00\rangle$ and target quantum state component $|11\rangle$. The quantum program includes 2 H gates (Hadamard gates), $H_1$ and $H_2$, and 1 CNOT gate (controlled-NOT gate).

FIG. 1a gives a simple illustration of the quantum circuit of this quantum program. It can be seen that, apart from the initial quantum state $|00\rangle$ and the target quantum state component $|11\rangle$, the quantum circuit involves 4 quantum states: $s_0^1$, $s_0^2$, $s_1^1$ and $s_1^2$, each of which can be represented as a superposition of $|0\rangle$ and $|1\rangle$. If $s_1^1$ is split into the two parts $|0\rangle$ and $|1\rangle$, the transformation from the initial quantum state $|00\rangle$ to $|11\rangle$ can be split into the two paths shown in (1b) and (1c). The sub-amplitudes with which the initial quantum state $|00\rangle$ is transformed into $|11\rangle$ via the two paths are obtained and summed to give the amplitude corresponding to the target quantum program, thereby completing the simulation.

It will be appreciated that if all $M_1$ quantum states in a target quantum program are split into the two parts $|0\rangle$ and $|1\rangle$, then $2^{M_1}$ possible transformation paths from the initial quantum state to the final state component are obtained; the amplitude of each path is calculated, and the amplitudes are summed to obtain the target single amplitude of the quantum state.
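The path summation can be illustrated on a deliberately simplified case: a single qubit acted on by a sequence of gates, where every intermediate state is split into $|0\rangle$ and $|1\rangle$. This is not the two-qubit circuit of FIG. 1; the function name `path_amplitudes` and the use of two Hadamard gates are assumptions chosen to keep the sketch short.

```python
import itertools
import math

S = 1 / math.sqrt(2)
H = [[S, S],
     [S, -S]]                      # Hadamard gate


def path_amplitudes(gates, x_in, x_out):
    """Amplitude of every path from |x_in> to |x_out> through the intermediate states."""
    paths = {}
    # With G gates there are G - 1 intermediate states, hence 2**(G - 1) paths.
    for mid in itertools.product((0, 1), repeat=len(gates) - 1):
        states = (x_in, *mid, x_out)
        amplitude = 1
        for gate, before, after in zip(gates, states, states[1:]):
            amplitude *= gate[after][before]      # matrix element <after|U|before>
        paths[mid] = amplitude
    return paths


paths = path_amplitudes([H, H], 0, 0)
print(paths)                       # one sub-amplitude per path
print(sum(paths.values()))         # close to 1, since H applied twice is the identity
```

For the target component $|1\rangle$ the two paths carry sub-amplitudes of opposite sign and cancel, which is how interference appears in the path picture.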

In a quantum program containing only single-quantum logic gates and diagonal double-quantum logic gates, when the initial quantum state of the qubits is $|0 \ldots 0\rangle$ and the final state component, i.e., the target quantum state component, takes the value $|x\rangle = |x_0 \ldots x_{n-1}\rangle$, the amplitude can be expressed by formula (1) [equation image not reproduced in the source text].

Formula (1) is a basic formula of the quantum-mechanical path integration method.

It should be noted that the ψ functions in formula (1) are all complex functions of Boolean variables and represent the contribution of the quantum logic gates to the quantum state; for clarity, formula (1) writes out only one ψ function of each of the three types and omits the others. Each Boolean variable takes a value in {0, 1} and, for qubit number j, is the quantum state component obtained after the qubit is acted on by the k-th quantum logic gate. The value of a ψ function depends mainly on two factors: the quantum states, before and after the gate is executed, of the qubits operated on by the quantum logic gate, and the unitary matrix of the quantum logic gate.

Specifically:

the first type of ψ function is a complex function of two Boolean variables; its value is determined by the values of the two variables and the unitary matrix of the corresponding diagonal double-quantum logic gate, the two variables being the quantum state components of the two qubits numbered v₁ and v₂ before the action of the v-th diagonal double-quantum logic gate;

the second type of ψ function is a complex function of one Boolean variable; its value is determined by the value of the variable and the unitary matrix of the corresponding diagonal single-quantum logic gate, the variable being the quantum state component of qubit number u before the action of the u′-th diagonal single-quantum logic gate;

the third type of ψ function is a complex function of two Boolean variables; its value is determined by the values of the two variables and the unitary matrix of the corresponding non-diagonal single-quantum logic gate, the two variables being the quantum state components of qubit number j before and after the action of the i-th non-diagonal single-quantum logic gate.

It is understood that j, k, v₁, v₂, v₁′, v₂′, u′ and i are all non-negative integers.

The single-amplitude quantum computation simulation method provided by the embodiment of the invention builds on formula (1), extends it to non-diagonal double-quantum logic gates, and maps it onto an undirected graph. Specifically, the Boolean variables of formula (1) correspond to the vertices of the undirected graph and the ψ functions correspond to its edges, so that solving formula (1) is converted into processing the undirected graph. It will be understood that the undirected graph can be split according to the values of its vertices. More specifically, an undirected graph corresponding to the target quantum program is constructed on a plurality of computing nodes; the undirected graph is split according to different vertex values to obtain different sub-undirected graphs; the amplitude of the corresponding path is obtained by computing each sub-undirected graph; and finally the amplitudes of all paths are combined to obtain the amplitude of the target quantum state component.

As shown in fig. 2, a single-amplitude quantum computation simulation method provided in an embodiment of the present invention may include the following steps:

S201, for each computing node of the distributed cluster, obtaining a target quantum program;

From the perspective of the computing nodes, each computing node obtains the same target quantum program, sent by the master node. The program is preferably written in the existing QRunes language, but may also be written in another feasible quantum language. A computing node is usually a CPU in a computer; the master node may also be a CPU, and the first computing node may serve as the master node, which receives the quantum program input by a user. The computing node is hereinafter described as a CPU.

S202, constructing an undirected graph corresponding to the target quantum program; wherein, the vertex of the undirected graph represents the quantum state of the operated quantum bit before or after the operation of the quantum logic gate, and one edge of the undirected graph corresponds to a tensor;

first, the target quantum program may be parsed to obtain a linked list of recorded quantum program information. And traversing the linked list, sequentially reading the types and unitary matrix forms of all the quantum logic gates in the linked list, adding vertexes and edges, and constructing an undirected graph of the quantum program. One specific implementation is as follows:

when the type of the quantum logic gate in the linked list is a first single quantum gate, creating an edge with a tensor order of 1; wherein the edge is connected with the last vertex of the vertex chain corresponding to the quantum bit operated by the first single quantum gate, and the unitary matrix of the first single quantum gate is a diagonal matrix;

When the traversal is finished and all vertices and edges have been added, the undirected graph corresponding to the target quantum program is obtained.

It should be noted that, in the process of constructing the undirected graph, each newly created vertex is recorded as the next vertex in the vertex chain of the qubit operated on by the current quantum logic gate.
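The construction rules of the method (an order-1 edge for a diagonal single quantum gate, an order-2 edge plus one new vertex for a non-diagonal single quantum gate, an order-2 edge for a diagonal double quantum gate, and an order-4 edge plus two new vertices for a non-diagonal double quantum gate) can be sketched as follows. The data layout is an assumption made for illustration: gates are given as hypothetical `(qubits, is_diagonal)` pairs rather than a parsed linked list, vertices are plain integers, and tensor contents are omitted so that only the graph structure is shown.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    order: int          # tensor order = number of vertices the edge connects
    vertices: tuple     # vertex identifiers, last chain vertices first, then new ones


def build_graph(num_qubits, gates):
    """Return (vertex chains per qubit, list of edges) for a list of (qubits, is_diagonal)."""
    chains = {q: [q] for q in range(num_qubits)}   # one initial vertex per qubit
    next_vertex = num_qubits
    edges = []
    for qubits, diagonal in gates:
        last = [chains[q][-1] for q in qubits]     # last vertex of each operated qubit
        if diagonal:
            # Diagonal gate: no new vertex, edge order equals the number of qubits.
            edges.append(Edge(order=len(qubits), vertices=tuple(last)))
        else:
            # Non-diagonal gate: one new vertex per operated qubit, appended to its chain.
            new = []
            for q in qubits:
                chains[q].append(next_vertex)
                new.append(next_vertex)
                next_vertex += 1
            edges.append(Edge(order=2 * len(qubits), vertices=tuple(last + new)))
    return chains, edges


# Hypothetical program: H on q0, CZ on (q0, q1), T on q1, CNOT on (q0, q1).
program = [((0,), False), ((0, 1), True), ((1,), True), ((0, 1), False)]
chains, edges = build_graph(2, program)
print(chains)
print([(e.order, e.vertices) for e in edges])
```

Diagonal gates add no vertices because they leave the computational-basis value of each operated qubit unchanged, so the same vertex describes the qubit before and after the gate.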

In the constructed undirected graph, each qubit corresponds to vertex chain information, which comprises the vertex values from the first vertex to the last vertex, the information on the edges connected with each vertex, and the vertex identifiers. A vertex identifier uniquely determines a vertex; from the identifier, the qubit to which the vertex belongs, the vertex value, the connected-edge information and so on can be determined. Once the value of a vertex has been determined, it is 0 or 1; while the value is undetermined, it may be null or any agreed number or character compatible with the vertex value type, such as -1, used to indicate the state of the vertex value during implementation. The vertex values may be stored as tensors, as variables or as other reasonable data types.

The undirected graph also includes tensor information for the edges, which may include an array of tensors and an identification of vertices connected by the edges corresponding to the tensors.

The vertices of the undirected graph correspond to the quantum state components of the operated qubits before or after the operation of a quantum logic gate; their values all lie in {0, 1} and correspond to the Boolean variables in formula (1). An operated qubit is the qubit on which the quantum logic gate operates. The edges of the undirected graph correspond to the quantum logic gates in the target quantum program: the edge corresponding to each quantum logic gate corresponds to a tensor whose elements are determined by the unitary matrix of the gate and the values of the vertices connected with the edge. It can be understood that the tensor corresponds to the ψ function in formula (1).

A tensor is a quantity defined in several linear spaces simultaneously, and is a generalization of the concepts of vector and matrix. Each tensor can be indexed using subscript notation, e.g., tensor T₁₂. The number of subscripts is the order (rank) of the tensor, i.e. the number of its dimensions. For example, a scalar is a 0th-order tensor, a vector is a 1st-order tensor, and a matrix is a 2nd-order tensor. The shape of a tensor is the number of elements in each dimension; the total number of tensor elements is determined by its shape.

For example, the tensor B₁₂₃₄ has the subscript "1234" and is a 4th-order tensor. If there are 3 elements in the dimension represented by subscript 1, 4 elements in the dimension represented by subscript 2, 5 elements in the dimension represented by subscript 3, and 6 elements in the dimension represented by subscript 4, then the shape of B₁₂₃₄ is [3, 4, 5, 6] and the number of elements is 3 × 4 × 5 × 6 = 360.

In the embodiment of the invention, the order of a tensor equals the number of vertices connected by the edge corresponding to the tensor, and the subscripts of the tensor are the numbers of those vertices in the undirected graph. Since each vertex can only take the value 0 or 1, the shape of an n-th order tensor is [2, 2, ..., 2] and its number of elements is 2ⁿ.

The number of each tensor element can be written as a binary number, each bit of which is the value of the corresponding vertex.

For example, if edge E_m connects 4 vertices, the corresponding tensor is a 4th-order tensor with 2⁴ = 16 elements, whose element numbers can be written as the binary numbers (0000)₂ to (1111)₂. Arrange the 4 vertices connected by E_m in a fixed order to obtain a vertex sequence. When all four vertices take the value 0, the combined value of the vertex sequence is "0000", which corresponds to the (0000)₂-th element of the tensor. When the first vertex takes the value 1, the second 0, the third 0 and the fourth 1, the combined value of the vertex sequence is "1001", which corresponds to the (1001)₂-th element of the tensor.
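As a minimal sketch of this numbering, the following Python snippet computes the number of elements from a tensor shape and the element number selected by a sequence of vertex values. The helper names `element_count` and `element_number` are illustrative only, and the first vertex of the sequence is assumed to be the highest bit.

```python
from math import prod


def element_count(shape):
    """Number of elements of a tensor with the given shape."""
    return prod(shape)


def element_number(vertex_values):
    """Element number selected by a sequence of 0/1 vertex values.

    Assumption: the first vertex in the sequence is the highest bit.
    """
    number = 0
    for value in vertex_values:
        number = (number << 1) | value
    return number


print(element_count([3, 4, 5, 6]))   # shape of a tensor such as B1234
print(element_count([2] * 4))        # 4th-order tensor over 0/1 vertices
print(element_number([1, 0, 0, 1]))  # vertex sequence "1001"
```

Choosing a different vertex order only permutes the element numbers; one value of the vertex sequence still selects exactly one element.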

In practical applications, when the quantum logic gate is a single-quantum logic gate or a double-quantum logic gate whose unitary matrix is diagonal, i.e. when a diagonal single-quantum logic gate or a diagonal double-quantum logic gate acts on the qubit, the gate usually changes only the amplitude; the quantum state component of the qubit, i.e. the corresponding variable in formula (1), typically does not change.

A typical single-quantum logic gate of this type is the Pauli-Z gate, whose unitary matrix is

$$Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

When a qubit is operated on by a Pauli-Z gate, the basis state |0> of the qubit is left unchanged and |1> is converted to -|1>.

A typical double-quantum logic gate of this type is the CZ gate, whose unitary matrix is

$$CZ = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}$$

When a CZ gate is applied to two qubits Q₀ and Q₁ (Q₀ being the control bit and Q₁ the target bit): if the quantum state of Q₀ is |0>, the quantum state of Q₁ is unchanged; if the quantum state of Q₀ is |1>, the |0> state of Q₁ remains unchanged and its |1> state is changed to -|1>.

It can be seen that both the Pauli-Z gate and the CZ gate change only the amplitude of a quantum state component, leaving the quantum state component itself unchanged. Therefore, when the undirected graph is constructed for such a quantum logic gate, i.e. a single-quantum or double-quantum logic gate whose unitary matrix is diagonal, only an edge corresponding to the quantum logic gate needs to be added to the undirected graph.

As shown in FIG. 3a, for a diagonal single-quantum logic gate, only one edge E₁ needs to be added when constructing the undirected graph, with one end of the edge connected to the vertex V₀. Here V₀ is the current last vertex of the corresponding qubit, i.e. the vertex corresponding to the quantum state on which the quantum logic gate directly acts.

As can be appreciated, edge E₁ connects only one vertex, so its tensor is a 1st-order tensor with 2¹ = 2 elements. This tensor corresponds to the ψ_u function in formula (1) and represents the contribution of the diagonal single-quantum logic gate to a quantum state. Specifically, assume that the unitary matrix of the diagonal single-quantum logic gate is

$$U = \begin{pmatrix} U_{00} & 0 \\ 0 & U_{11} \end{pmatrix}$$

Then the tensor of edge E₁ is [U₀₀, U₁₁]: when the vertex V₀ takes the value "0", it corresponds to the (0)₂-th element of the tensor, i.e. U₀₀; when V₀ takes the value "1", it corresponds to the (1)₂-th element of the tensor, i.e. U₁₁.

For example, when edge E₁ is the edge corresponding to the diagonal single-quantum logic gate Pauli-Z, its tensor follows from the unitary matrix of the Pauli-Z gate as [1, -1].

As shown in FIG. 3b, for a diagonal double-quantum logic gate, only one edge E₁₂ needs to be added when constructing the undirected graph, with one end of the edge connected to the vertex V₁ and the other end connected to the vertex V₂. Here V₁ and V₂ are the current last vertices of the two qubits operated on by the diagonal double-quantum logic gate, i.e. the vertices corresponding to the quantum states on which the gate directly acts.

It will be appreciated that, as shown in FIG. 3b, edge E₁₂ is connected to the two vertices V₁ and V₂, so its tensor is a 2nd-order tensor with 2² = 4 elements. This tensor corresponds to the ψ_v function in formula (1) and represents the contribution of the diagonal double-quantum logic gate to a quantum state. Specifically, assume that the unitary matrix of the diagonal double-quantum logic gate is

$$U = \begin{pmatrix} U_{00} & 0 & 0 & 0 \\ 0 & U_{11} & 0 & 0 \\ 0 & 0 & U_{22} & 0 \\ 0 & 0 & 0 & U_{33} \end{pmatrix}$$

Then the tensor of edge E₁₂ is [U₀₀, U₁₁, U₂₂, U₃₃]: when V₁V₂ takes the value "00", it corresponds to the (00)₂-th element of the tensor; when V₁V₂ takes the value "01", to the (01)₂-th element; when V₁V₂ takes the value "10", to the (10)₂-th element; and when V₁V₂ takes the value "11", to the (11)₂-th element. Of course, the vertex sequence "V₂V₁" may be used instead to index the elements of the tensor; in that order the (01)₂-th and (10)₂-th elements are exchanged.

It should be noted that one value of the vertex sequence uniquely determines one element of the tensor, but the positional relationship between the value of the vertex sequence and the corresponding element of the tensor is not unique and can be chosen according to actual requirements.

For example, when edge E₁₂ is the edge corresponding to the diagonal double-quantum logic gate CZ, its tensor follows from the unitary matrix of the CZ gate as [1, 1, 1, -1].
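The rule for diagonal gates can be sketched in Python as follows. The function name `diagonal_gate_tensor` is hypothetical, and the sketch assumes the vertex order "V₁V₂" described above, so that the edge tensor is simply the diagonal of the unitary matrix.

```python
def diagonal_gate_tensor(unitary):
    """Return the edge tensor of a diagonal gate: the diagonal of its matrix."""
    size = len(unitary)
    for i in range(size):
        for j in range(size):
            if i != j and unitary[i][j] != 0:
                raise ValueError("gate is not diagonal")
    return [unitary[i][i] for i in range(size)]


# Unitary matrices of the Pauli-Z and CZ gates.
PAULI_Z = [[1, 0],
           [0, -1]]
CZ = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 1, 0],
      [0, 0, 0, -1]]

print(diagonal_gate_tensor(PAULI_Z))  # 1st-order tensor, 2 elements
print(diagonal_gate_tensor(CZ))       # 2nd-order tensor, 4 elements
```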

When the quantum logic gate is a single-quantum logic gate whose unitary matrix is not diagonal, i.e. an off-diagonal single-quantum logic gate such as the H gate, whose unitary matrix is

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

then under the operation of the H gate |0> becomes (|0> + |1>)/√2 and |1> becomes (|0> - |1>)/√2, so both the amplitude and the quantum state component change. For such a quantum logic gate, when the undirected graph is constructed, both a vertex corresponding to the new quantum state after the operation of the single-quantum logic gate and an edge corresponding to the quantum logic gate need to be added to the undirected graph.

As shown in FIG. 3c, for an off-diagonal single-quantum logic gate, a vertex V₄ and an edge E₃₄ are added when constructing the undirected graph, with one end of the edge connected to the vertex V₃ and the other end connected to the vertex V₄. Here V₃ is the current last vertex of the qubit operated on by the off-diagonal single-quantum logic gate, i.e. the vertex corresponding to the quantum state on which the gate directly acts; V₄ is the new vertex, corresponding to the new quantum state after the operation of the off-diagonal single-quantum logic gate.

As can be appreciated, edge E₃₄ is connected to the two vertices V₃ and V₄, so its tensor is a 2nd-order tensor with 2² = 4 elements. This tensor corresponds to the ψ_i function in formula (1) and represents the contribution of the off-diagonal single-quantum logic gate to a quantum state. Specifically, assume that the unitary matrix of the off-diagonal single-quantum logic gate is

$$U = \begin{pmatrix} U_{00} & U_{01} \\ U_{10} & U_{11} \end{pmatrix}$$

Then the four elements of the tensor of edge E₃₄ are the four elements of U, indexed by the value of the vertex sequence "V₃V₄": when V₃V₄ takes the value "00", it corresponds to the (00)₂-th element of the tensor; when V₃V₄ takes the value "01", to the (01)₂-th element; when V₃V₄ takes the value "10", to the (10)₂-th element; and when V₃V₄ takes the value "11", to the (11)₂-th element.

For example, when edge E₃₄ is the edge corresponding to the off-diagonal single-quantum logic gate H, its tensor follows from the unitary matrix of the H gate as [1/√2, 1/√2, 1/√2, -1/√2].

When the quantum logic gate is a double-quantum logic gate whose unitary matrix is not diagonal, i.e. an off-diagonal double-quantum logic gate such as the CNOT gate, its unitary matrix is

$$CNOT = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

Under the operation of the CNOT gate, when the quantum state of the control bit is |0>, the quantum state of the controlled bit is unchanged, i.e. when the quantum states of the control bit and the controlled bit are |00> and |01>, they remain |00> and |01> respectively after the CNOT operation. When the quantum state of the control bit is |1>, the quantum state of the controlled bit is inverted, i.e. |0> becomes |1> and |1> becomes |0>, so the states |10> and |11> become |11> and |10> respectively. Since two qubits are usually in a superposition of |00>, |01>, |10> and |11>, both the amplitudes and the quantum state components of the two qubits may change after the CNOT operation. For such a quantum logic gate, when the undirected graph is constructed, a vertex corresponding to the new quantum state after the operation of the double-quantum logic gate is added for each of the two qubits, i.e. two new vertices, together with one edge corresponding to the double-quantum logic gate.

As shown in FIG. 3d, for an off-diagonal double-quantum logic gate, an edge E₅₆₇₈ needs to be added when constructing the undirected graph. For clarity two edges are drawn in the figure, but it should be noted that the two edges in FIG. 3d are actually the same edge E₅₆₇₈. One end of the newly added edge is connected to the vertices V₅ and V₇, and the other end to the vertices V₆ and V₈. Here V₅ and V₆ are the current last vertices of the two qubits operated on by the off-diagonal double-quantum logic gate, and V₇ and V₈ are the new vertices corresponding to the new quantum states of the two qubits after the operation of the gate. In particular, when the off-diagonal double-quantum logic gate is a controlled logic gate, V₅ and V₇ correspond to the control bit, and V₆ and V₈ correspond to the controlled bit.
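The four construction rules of FIG. 3a to FIG. 3d can be sketched in Python as follows. This is an illustration only: the function `build_graph`, the integer vertex numbering and the `DIAGONAL` set of gate names are assumptions made for the sketch, and the edge tensors themselves are omitted.

```python
DIAGONAL = {"Z", "CZ"}  # assumed set of gate names with diagonal unitary matrices


def build_graph(num_qubits, gates):
    """Build vertex chains and edges for a list of (gate name, qubits) pairs.

    Each qubit starts with one vertex. A diagonal gate adds only an edge on
    the current last vertices; an off-diagonal gate also appends one new
    vertex per operated qubit.
    """
    chains = [[q] for q in range(num_qubits)]  # vertex chain of each qubit
    next_vertex = num_qubits
    edges = []
    for name, qubits in gates:
        before = [chains[q][-1] for q in qubits]
        if name in DIAGONAL:
            edges.append((name, tuple(before)))
            continue
        after = []
        for q in qubits:
            chains[q].append(next_vertex)  # recorded as the next vertex of q
            after.append(next_vertex)
            next_vertex += 1
        edges.append((name, tuple(before + after)))
    return chains, edges


program = [("H", [0]), ("CZ", [0, 1]), ("CNOT", [0, 1]), ("Z", [1])]
chains, edges = build_graph(2, program)
print(chains)
print(edges)
```

In this sketch the edge of an off-diagonal double-quantum logic gate lists the two current last vertices first and the two new vertices second, mirroring the order V₅, V₆, V₇, V₈.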

It will be appreciated that, as shown in FIG. 3d, edge E₅₆₇₈ is connected to the four vertices V₅, V₆, V₇ and V₈, so its tensor is a 4th-order tensor with 2⁴ = 16 elements. Let the values "0000" to "1111" of the vertex sequence "V₅V₆V₇V₈" correspond to the (0000)₂-th to (1111)₂-th elements of the tensor. Specifically, assume that the unitary matrix of the off-diagonal double-quantum logic gate is U₄ = (U_ij), where the elements U_ij with i ≠ j are not all 0. The correspondence between the values of the vertex sequence "V₅V₆V₇V₈" and the elements of U₄ is shown in Table 1:

Table 1. Correspondence between the values of V₅V₆V₇V₈ and the elements of U₄

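From the above, the tensor of the edge is obtained from the elements of U₄.

For example, when edge E₅₆₇₈ is the edge corresponding to the off-diagonal double-quantum logic gate CNOT, its unitary matrix is

$$CNOT = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

and its tensor then follows from this matrix according to Table 1.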

It will be understood by those skilled in the art that any multi-qubit gate can be constructed from single-quantum logic gates plus a double-quantum logic gate, for which the CNOT gate is chosen in most cases. In a sense, the CNOT gate and the single-quantum logic gates are the prototypes of all other gates. Therefore, the method provided by the application is also applicable to quantum programs containing multi-qubit gates: in practical applications, the multi-qubit gates in the quantum program can first be converted into combinations of single-quantum and double-quantum logic gates, after which the single-amplitude quantum computation simulation method provided by the invention is applied.

S203, obtaining a quantum state corresponding to the target single amplitude to be measured, and calculating the sub-amplitude of the quantum state based on the quantum state and the undirected graph and by matching with the GPU corresponding to the calculation node; wherein the sub-amplitude is an amplitude corresponding to the undirected graph;

In practical applications, when the simulation of quantum computation involves many qubits, directly using the Dirac notation |> combined with a binary representation to denote each quantum state would be very inconvenient. Therefore, a quantum state is usually denoted by the decimal number corresponding to its binary representation; for example, |000> is the 0 state and |0100> is the 4 state. It will be appreciated that if the target quantum state component is given as a decimal number, it needs to be converted by the master node into a binary string and then sent to each computing node. Each bit of the binary string gives the value of one qubit, with the low-order to high-order bits corresponding to the low-order to high-order qubits. It should be noted that, for a quantum computer as for a classical computer, bits are arranged from low order to high order from right to left. For example, assuming that the target quantum state component is the 5 state, its binary form is "101"; with 3 qubits (q₀, q₁, q₂ in order from low to high), the bits of "101" correspond to "q₂q₁q₀" respectively.
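The conversion performed by the master node can be sketched in a few lines of Python. The function name `target_bits` and the dictionary keys are illustrative; the only convention taken from the text is that the rightmost bit belongs to the lowest qubit q₀.

```python
def target_bits(state, num_qubits):
    """Convert a decimal state label into per-qubit values.

    The rightmost bit of the binary string belongs to q0, the lowest qubit.
    """
    bits = format(state, "0{}b".format(num_qubits))
    values = {}
    for i in range(num_qubits):
        values["q{}".format(i)] = int(bits[num_qubits - 1 - i])
    return bits, values


bits, values = target_bits(5, 3)
print(bits)    # binary string sent to the computing nodes
print(values)  # value of each qubit, q0 being the lowest bit
```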

Specifically, calculating the sub-amplitude of the quantum state based on the quantum state and the undirected graph and by cooperating with the GPU corresponding to the calculation node may include:

calling the GPU corresponding to the computing node to perform determined-value order reduction on the tensor of each edge connected to a specific vertex of the undirected graph, wherein the specific vertices are the first and last vertices of the vertex chain corresponding to each qubit; deleting the specific vertices; receiving the values of the target vertices assigned by the master node, splitting the current undirected graph based on the values of the target vertices, and, for each sub-undirected graph obtained by splitting, calling the GPU to perform determined-value order reduction on the tensor of each edge connected to a target vertex; for each vertex in the sub-undirected graph, using the GPU to fuse all connecting edges of the vertex into one new edge, reducing the order of the tensor of the new edge, and deleting the vertex; taking the product of the tensor values of all the reduced new edges to obtain a first sub-amplitude of the quantum state corresponding to the sub-undirected graph; and summing the first sub-amplitudes of all the sub-undirected graphs in the computing node to obtain the sub-amplitude of the quantum state.

Finally, all vertices in each sub-undirected graph are deleted, leaving only edges with tensor order 0, the 0-order tensor being a scalar. The tensor values of the edges are multiplied to obtain a first sub-amplitude of the quantum state in the computational node.

For example, suppose that in a sub-undirected graph the edges connecting all the vertices are S₁₂, S₁₆, S₂₅, S₃ and S₄:

for vertex 1, S₁₂ and S₁₆ are fused into a new edge S₁₂₆, its tensor A₁₂₆ is reduced to A₂₆, the corresponding edge becomes S₂₆, and vertex 1 is deleted;

for vertex 2, the current connecting edges S₂₅ and S₂₆ are fused into a new edge S₂₅₆, its tensor A₂₅₆ is reduced to A₅₆, the corresponding edge becomes S₅₆, and vertex 2 is deleted;

for vertex 3, only edge S₃ is connected; its tensor A₃ is reduced to A = x₃ (a scalar), the corresponding edge becomes the edge s₃ with tensor order 0, and vertex 3 is deleted;

for vertex 4, only edge S₄ is connected; its tensor A₄ is reduced to A' = x₄ (a scalar), the corresponding edge becomes the edge s₄ with tensor order 0, and vertex 4 is deleted;

for vertex 5, currently only edge S₅₆ is connected, so the tensor is unchanged by fusion; A₅₆ is reduced to A₆, the corresponding edge becomes S₆, and vertex 5 is deleted;

for vertex 6, currently only edge S₆ is connected; its tensor A₆ is reduced to A'' = x₆ (a scalar), the corresponding edge becomes the edge s₆ with tensor order 0, and vertex 6 is deleted.

The first sub-amplitude corresponding to the sub-undirected graph is then calculated as x₃ * x₄ * x₆.

As another example, suppose that in a sub-undirected graph the edges connecting all the vertices are S₁₂₃, S₁₂₄, S₁₅ and S₄₆:

for vertex 1, S₁₂₃, S₁₂₄ and S₁₅ are fused into a new edge S₁₂₃₄₅, its tensor A₁₂₃₄₅ is reduced to A₂₃₄₅, the corresponding edge becomes S₂₃₄₅, and vertex 1 is deleted;

for vertex 2, currently only edge S₂₃₄₅ is connected; its tensor A₂₃₄₅ is reduced to A₃₄₅, the corresponding edge becomes S₃₄₅, and vertex 2 is deleted;

for vertex 3, currently only edge S₃₄₅ is connected; its tensor A₃₄₅ is reduced to A₄₅, the corresponding edge becomes S₄₅, and vertex 3 is deleted;

for vertex 4, the current connecting edges S₄₅ and S₄₆ are fused into a new edge S₄₅₆, its tensor A₄₅₆ is reduced to A₅₆, the corresponding edge becomes S₅₆, and vertex 4 is deleted;

for vertex 5, currently only edge S₅₆ is connected; its tensor A₅₆ is reduced to A₆, the corresponding edge becomes S₆, and vertex 5 is deleted;

for vertex 6, currently only edge S₆ is connected; its tensor A₆ is reduced to B = y₆ (a scalar), the corresponding edge becomes the edge s₆ with tensor order 0, and vertex 6 is deleted.

The first sub-amplitude corresponding to the sub-undirected graph is then calculated as y₆.
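The elimination procedure in the two examples above can be sketched on the CPU in plain Python. This is only an illustration under stated assumptions: a tensor is stored as a pair (vertex tuple, flat list) with the first vertex as the highest bit, fusing two edges is taken to be an element-wise product over the union of their vertices, and reducing the order at an undetermined vertex is taken to be a sum over its two values. All function names are hypothetical.

```python
from itertools import product


def element(tensor, assignment):
    """Element of `tensor` selected by a {vertex: 0 or 1} assignment."""
    vertices, data = tensor
    number = 0
    for v in vertices:  # first vertex is the highest bit
        number = (number << 1) | assignment[v]
    return data[number]


def fuse(a, b):
    """Fuse two edges into one edge over the union of their vertices."""
    vertices = tuple(dict.fromkeys(a[0] + b[0]))  # ordered union
    data = []
    for bits in product((0, 1), repeat=len(vertices)):
        asg = dict(zip(vertices, bits))
        data.append(element(a, asg) * element(b, asg))
    return vertices, data


def sum_out(tensor, vertex):
    """Lower the order by one by summing over both values of `vertex`."""
    rest = tuple(v for v in tensor[0] if v != vertex)
    data = []
    for bits in product((0, 1), repeat=len(rest)):
        asg = dict(zip(rest, bits))
        data.append(sum(element(tensor, {**asg, vertex: x}) for x in (0, 1)))
    return rest, data


def eliminate(edges, order):
    """Delete the vertices in `order`; return the product of the scalars left."""
    edges = list(edges)
    for v in order:
        touching = [e for e in edges if v in e[0]]
        edges = [e for e in edges if v not in e[0]]
        if not touching:
            continue
        merged = touching[0]
        for e in touching[1:]:
            merged = fuse(merged, e)
        edges.append(sum_out(merged, v))
    amplitude = 1
    for _, data in edges:  # every remaining edge has tensor order 0
        amplitude *= data[0]
    return amplitude


# Same edge structure as the first example: S12, S16, S25, S3, S4.
# The element values are arbitrary integers chosen for illustration.
example = [((1, 2), [1, 2, 3, 4]), ((1, 6), [1, 0, 2, 1]),
           ((2, 5), [1, 1, 0, 2]), ((3,), [2, 3]), ((4,), [1, 2])]
print(eliminate(example, [1, 2, 3, 4, 5, 6]))
```

The result does not depend on the elimination order, but the size of the intermediate fused tensors does, which is why the order of the fused edges matters for memory use.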

Specifically, in order to reduce the amount of computation on the subsequent undirected graph, performing determined-value order reduction on the tensor of each edge connected to a specific vertex of the undirected graph may be:

calling the GPU corresponding to the computing node and, for the edge connected to each specific vertex, setting the number of thread blocks according to the order of the tensor after reduction and the number of threads per thread block in the GPU; calculating a first element number of the reduced tensor from the thread block number, the number of threads per thread block and the thread number, and calculating the two second element numbers of the tensor before reduction that correspond to the first element number, wherein the bit positions of an element number correspond one-to-one to the vertex positions connected by the current edge, and the value of each bit of the element number is the value of the vertex at the corresponding vertex position; determining, from the two second element numbers, the one whose bit at the position of the specific vertex equals the preset determined value; and acquiring the second element value corresponding to that second element number and taking it as the first element value corresponding to the first element number.

The calculation formula for the first element number (the same below) is:

idx = block_id * num + thread_id (2)

where idx is the first element number, block_id is the thread block number, num is the number of threads in each thread block, and thread_id is the thread number.

From the GPU's point of view, a specific implementation of the determined-value order reduction is as follows. First, the elements of a tensor may be complex numbers. Since the GPU, unlike the CPU, has no representation of a complex number x + yi, a new real-part tensor space and a new imaginary-part tensor space need to be allocated, the former storing x and the latter storing y, so as to represent the tensor.

Obtain the reduced order n sent by the computing node (CPU) and determine whether 2ⁿ is less than num: if so, set the number of GPU thread blocks to 1; otherwise, set the number of GPU thread blocks to 2ⁿ/num, where num is a non-negative integer power of 2. As is customary in computing, the thread block number block_id and the thread number thread_id are numbered from 0.
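The thread-block rule and formula (2) can be simulated on the CPU with the short Python sketch below. The function names are illustrative, the default num = 1024 is just a typical value, and nested loops stand in for the GPU threads; threads whose idx falls outside the tensor are assumed to do nothing.

```python
def thread_blocks(n, num=1024):
    """Number of thread blocks for a tensor of order n (2**n elements)."""
    total = 1 << n
    return 1 if total < num else total // num


def element_numbers(n, num=1024):
    """First element numbers idx = block_id * num + thread_id, in launch order."""
    total = 1 << n
    numbers = []
    for block_id in range(thread_blocks(n, num)):
        for thread_id in range(num):
            idx = block_id * num + thread_id  # formula (2)
            if idx < total:  # surplus threads are idle
                numbers.append(idx)
    return numbers


print(thread_blocks(4))        # small tensor: a single block
print(thread_blocks(12))       # 4096 elements spread over several blocks
print(element_numbers(4)[:5])  # first few element numbers
```

Because num is a power of 2, 2ⁿ/num is an integer whenever 2ⁿ is at least num, so every element number is covered exactly once.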

Taking the GTX 1080 Ti as an example GPU, its num is 1024 and thread_id = 0, 1, 2, ..., 1023. For example, consider performing determined-value order reduction on the 5th-order tensor A₁₃₂₅₄ of the edge E₁₃₂₅₄ connected to the specific vertex numbered 2 (vertex 2 for short), i.e. reducing the 3rd dimension of A₁₃₂₅₄, in which vertex 2 is located. Dimensions are numbered from right to left, e.g. vertex 1 is in dimension 5 and vertex 4 is in dimension 1.

Assume that the determined value is 0 (it could also be 1, set as needed), meaning that vertex 2 takes the value 0. One reduction operation usually lowers the order by one, so n = 4; since 2ⁿ = 16 is less than 1024, there is 1 thread block with block_id = 0. The reduced 4th-order tensor A₁₃₅₄ has only 16 elements; starting from thread 0, each thread calculates one element number, so only threads 0 to 15 are used, and the first element numbers calculated are idx = 0, 1, 2, ..., 15.

For the first element number 10 (binary 1010), the two corresponding second element numbers before reduction are calculated as follows:

Bitwise AND 2^(3-1) - 1 = 3 (binary 11; to reduce the m-th dimension, the value 2^(m-1) - 1 is calculated and used to split the element number) with 10, giving 2 (binary 10). Bitwise invert 3 (after inversion, binary 1111...1100, where the number of binary digits depends on the data type, e.g. 64 bits) and bitwise AND the result with 10, giving 8 (binary 1000). This splits the binary form of 10 into its upper two bits and its lower two bits. The two second element numbers before reduction, for which the vertex corresponding to subscript 2 has the determined value 0 and 1 respectively, are then obtained as follows: shift 8 (1000) left by one bit (10000) and OR it with 2 (10), giving 18 (10010); since the reduced dimension is the 3rd, OR 18 (10010) with 4 (i.e. 1 << (3-1); to reduce the m-th dimension, the shift 1 << (m-1) is used, with 1 ≤ m ≤ n+1), giving 22 (10110).

Since the determined value is set to 0, i.e. the 3rd bit of the element number, which corresponds to the specific vertex in the 3rd vertex position, is 0 (the bit positions and bit values of the binary element number correspond one-to-one to the vertex positions and vertex values), the second element number selected for the first element number 10 is 18. The element value p of element number 18 of A₁₃₂₅₄ is taken as the first element value of element number 10 of the reduced A₁₃₅₄. The first element values corresponding to the remaining first element numbers of A₁₃₅₄ are calculated in the same way, finally giving the reduced tensor A₁₃₅₄ = {p₀, p₁, p₂, ..., p, ..., p₁₄, p₁₅}.
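The index arithmetic of this example can be checked with a short CPU-side Python sketch. The function names are illustrative; on the GPU each first element number would be handled by one thread, whereas here a plain loop stands in for the threads, and real numbers stand in for the separate real-part and imaginary-part arrays.

```python
def second_element_numbers(idx, m):
    """Element numbers before reduction for first element number `idx`.

    `m` is the dimension being reduced, counted from the right starting at 1.
    Returns the numbers whose m-th bit is 0 and 1 respectively.
    """
    mask = (1 << (m - 1)) - 1       # 2^(m-1) - 1, selects the low bits
    low = idx & mask                # bits to the right of the removed dimension
    high = idx & ~mask              # bits to the left of the removed dimension
    zero = (high << 1) | low        # re-insert the dimension with value 0
    one = zero | (1 << (m - 1))     # same element number with value 1
    return zero, one


def reduce_fixed(tensor, m, value):
    """Determined-value order reduction of a flat tensor along dimension m."""
    reduced = []
    for idx in range(len(tensor) // 2):
        zero, one = second_element_numbers(idx, m)
        reduced.append(tensor[one if value else zero])
    return reduced


print(second_element_numbers(10, 3))  # the pair computed in the text
a = list(range(32))                   # stand-in for a 5th-order tensor
print(reduce_fixed(a, 3, 0)[10])      # element copied into position 10
```

Two successive calls, for example `reduce_fixed(reduce_fixed(t, 3, 0), 2, 0)`, reduce a 3rd-order tensor to a 1st-order tensor one dimension at a time.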

Then, the video memory occupied by the original tensor A₁₃₂₅₄ before reduction is released.

Specifically, in order to reduce the computational complexity, one or more values of the target vertices, divided equally by the master node, may be received, wherein the target vertices are the first m vertices with the largest number of connecting edges in the current undirected graph, the m vertices have 2^m value combinations, the number of computing nodes is 2ⁿ, n is a positive integer, and 0 < n ≤ m; the undirected graph of the computing node is split into one or more sub-undirected graphs according to the received vertex values; and for each sub-undirected graph, the edges connected to the target vertices are traversed and the GPU is called to perform determined-value order reduction on the tensor of each such edge.

The number of computing nodes is a preset value, i.e. n is preset, and depends on the quantum program written and the available computing resources configured for it; the value of m is preset as needed. If n = 1 and m = 1, the value of the 1 vertex with the largest number of connecting edges is initially undetermined and has 2 possibilities: 0 or 1. These 2 cases are assigned to the 2 computing nodes respectively: in one computing node the vertex value in the undirected graph is fixed to 0, giving one sub-undirected graph, and in the other computing node the vertex value is fixed to 1, giving the other sub-undirected graph. The undirected graph is thereby split.

As another example, if n = 1 and m = 2, the first 2 vertices with the most connecting edges have 4 value combinations: 00, 01, 10 and 11. The 2 computing nodes divide these 4 values equally, generally in sequence, so that one computing node receives 00 and 01 and the other receives 10 and 11. Equivalently, the undirected graph in one computing node is split into 2 sub-undirected graphs: in one the 2 vertices take the values 0 and 0, and in the other they take 0 and 1. Similarly, in the other computing node the two vertices take the values 1 and 0 in one sub-undirected graph and 1 and 1 in the other.
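The equal, in-sequence division of the 2^m target-vertex values among the 2ⁿ computing nodes can be sketched as follows. The function name `assign_values` is hypothetical, and the sketch assumes 0 < n ≤ m as stated above.

```python
def assign_values(m, n):
    """Divide the 2**m values of m target vertices among 2**n nodes in sequence."""
    if not 0 < n <= m:
        raise ValueError("requires 0 < n <= m")
    values = [format(v, "0{}b".format(m)) for v in range(1 << m)]
    per_node = len(values) >> n  # 2**(m - n) values per computing node
    return [values[i * per_node:(i + 1) * per_node] for i in range(1 << n)]


print(assign_values(1, 1))  # one value per node
print(assign_values(2, 1))  # two values, i.e. two sub-undirected graphs, per node
```

Each string received by a node fixes the values of the target vertices in one sub-undirected graph of that node.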

Across all computing nodes, each node initially holds the same undirected graph. After splitting, the undirected graph of each computing node becomes one or more distinct sub-undirected graphs, each of which can be regarded as a path. The first sub-amplitude corresponding to each path within a computing node is calculated, and these first sub-amplitudes are summed to obtain the sub-amplitude of the quantum state for the current node. This embodies the idea of the Feynman path integral method rather than the unitary-matrix transformation method, since the latter causes memory usage to grow exponentially.

Taking n = 1 and m = 2 as an example, suppose vertex 3 and vertex 5 are the vertices with the most connecting edges in the current undirected graph, and that all the edges connected to vertex 3 or vertex 5 are E₃₂₄₁, E₃₆₁, E₃₂, E₃₅₄, E₁₂₅, E₄₅ and E₅₆. These edges are traversed, and the GPU of the computing node that received the value 00 is called to perform determined-value order reduction on the tensors of these edges in the sub-undirected graph. The principle is the same as for the determined-value order reduction of the tensors of edges connected to the specific vertices of the undirected graph, as illustrated below:

For edge E₃₂₄₁, which corresponds to the 4th-order tensor T₃₂₄₁, in the case where vertices 3 and 5 take the value 00: since the edge is not connected to vertex 5, only the 4th dimension of T₃₂₄₁, in which vertex 3 is located, needs determined-value order reduction. The GPU again first allocates a new real-part tensor and imaginary-part tensor, obtains the reduced order n sent by its computing node (CPU), and determines whether 2ⁿ is less than num: if so, the number of GPU thread blocks is set to 1; otherwise, it is set to 2ⁿ/num.

Taking the GTX 1080 Ti as an example, its num is 1024 and thread_id = 0, 1, 2, ..., 1023. Vertex 3 takes the value 0, i.e. the determined value is 0. One reduction operation usually lowers the order by one, so n = 3; since 2ⁿ = 8 is less than 1024, there is 1 thread block with block_id = 0. The reduced 3rd-order tensor T₂₄₁ has only 8 elements; starting from thread 0, each thread calculates one element number, so only threads 0 to 7 are used, and the first element numbers calculated with formula (2) are idx = 0, 1, 2, ..., 7.

For the first element number 5 (binary 101), the two corresponding second element numbers before reduction are calculated as follows:

To reduce the 4th dimension, bitwise AND 2^(4-1) - 1 = 7 (binary 111) with 5, giving 5 (binary 101). Bitwise invert 7 (after inversion, binary 1111...1000) and bitwise AND the result with 5, giving 0; this splits the binary form of 5. The two second element numbers before reduction, for which vertex 3 (subscript 3) has the determined value 0 and 1 respectively, are then: shift 0 left by one bit and OR it with 5, giving 5 (0101); OR 5 (0101) with 8 (1000) (i.e. 1 << (4-1); to reduce the m-th dimension, the shift 1 << (m-1) is used, with 1 ≤ m ≤ n+1), giving 13 (1101).

Because each binary digit of an element number represents the value of the corresponding vertex, 0101 means that vertex 3, vertex 2, vertex 4 and vertex 1 of the edge E₃₂₄₁ of T₃₂₄₁ take the values 0, 1, 0 and 1 in turn. The second element number selected for the first element number 5 is therefore 5, and the second element value with element number 5 in T₃₂₄₁ is taken as the first element value w₅ with element number 5 in the reduced T₂₄₁. The first element values corresponding to the remaining first element numbers are calculated in the same way, finally giving the reduced tensor T₂₄₁ = {w₀, w₁, w₂, w₃, w₄, w₅, w₆, w₇}.

Then, the video memory occupied by the original tensor T₃₂₄₁ before reduction is released.

In addition, for edge E₃₅₄, in the case where vertex 3 and vertex 5 take the value 00, since each reduction lowers the order by only one, the 3rd dimension of the tensor T₃₅₄ of edge E₃₅₄, in which vertex 3 is located, can first be reduced to give T₅₄, and then the 2nd dimension of T₅₄, in which vertex 5 is located, can be reduced to give T₄.

The order reduction calculation of the edges and the value taking conditions in the other sub-undirected graphs is the same as the above, and is not described again.

Specifically, using the GPU to fuse all connecting edges of the vertex into one new edge includes:

determining, among all the connecting edges of the vertex, a first edge and a second edge to be fused; calling the GPU to raise the order of the first tensor of the first edge according to the vertices of the second edge that are not connected to the first edge, and updating the first tensor with the raised tensor; deleting the second edge, and connecting the vertices of the second edge that are not connected to the first edge to the first edge, to obtain a fused intermediate edge; calling the GPU to calculate the tensor elements of the intermediate edge according to the recorded correspondence between the vertex numbers of the first edge and the second edge; and returning to the step of determining a first edge and a second edge to be fused, until the tensor elements calculated are those of the last remaining edge, which is determined as the fused new edge.

In practical applications, fusing any two edges generally requires first raising the order of the tensor of one of them, so the edge with the highest order among all the connecting edges of the vertex is generally selected as the first edge.

It is understood that the second edge is one of the edges, other than the first edge, that have not yet been fused.

Specifically, an edge can be selected directly from the remaining edges as the second edge; alternatively, the connecting edges of the vertex can be sorted by order, either ascending or descending without limitation, after which the edge with the highest order is determined as the first edge and the edge with the highest order among the remaining unfused edges is determined as the second edge.

For example, vertex V_n has 4 connecting edges, E_n1, E_n2, E_n3 and E_n4, whose orders are 3, 2, 4 and 2 respectively. Sorting the edges by order from large to small gives E_n3, E_n1, E_n2, E_n4 (or E_n3, E_n1, E_n4, E_n2). E_n3 has the largest order and is determined as the first edge; of the three remaining unfused edges, E_n1 has the largest order and is determined as the second edge.
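The selection rule in this example can be sketched as below. The function name `pick_edges_to_fuse` and the dictionary representation of the edges are assumptions made for illustration only.

```python
def pick_edges_to_fuse(orders):
    """Return (first, second): the highest-order edge and the highest-order
    edge among the rest. `orders` maps an edge name to its tensor order."""
    # sorted() is stable, so edges of equal order keep their insertion order.
    ranked = sorted(orders, key=lambda name: orders[name], reverse=True)
    return ranked[0], ranked[1]


edges = {"E_n1": 3, "E_n2": 2, "E_n3": 4, "E_n4": 2}
first, second = pick_edges_to_fuse(edges)
print(first, second)  # E_n3 E_n1
```

Ties between edges of equal order (E_n2 and E_n4 here) may be broken either way, consistent with the two orderings given in the example.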

The vertices connected by the first edge and the second edge are then compared to determine the vertex that is connected to the second edge but not to the first edge; the order-raising operation is performed on the first tensor according to that vertex, and the first tensor is updated with the raised tensor.

For example, suppose the vertex is numbered 2; the first edge is E₁₂, connecting vertices 1 and 2, with first tensor A₁₂ = {1, 2, 3, 4}; the second edge is E₂₃, connecting vertices 2 and 3, with second tensor B₂₃ = {5, 6, 7, 8}. Comparing the vertices of the two edges shows that vertex 3 is connected to the second edge but not to the first edge. Accordingly, A₁₂ is raised to A₁₂₃; as will be understood from what follows, A₁₂₃ = {1, 1, 2, 2, 3, 3, 4, 4}, and the first tensor corresponding to the first edge is then updated to A₁₂₃.

The order raising of the first tensor of the first edge is performed by the GPU; the principle may be as follows:

the GPU raises the order of the first tensor of the first edge, and may calculate the tensor order after raising, t = r + s, from the order r of the first tensor and the number of orders s to be added, where r and s are both positive integers; the number of thread blocks is set according to the raised tensor order t and the number num of threads in each thread block of the GPU: if 2^t is less than num, 1 thread block is set, otherwise the number of GPU thread blocks is set to 2^t/num; the first element number idx of the raised tensor is calculated from the thread block number block_id, the number num of threads in each thread block and the thread number thread_id, using the same formula as formula (2); the elements of the raised tensor are calculated from the first element number idx, the added order s and the elements of the first tensor.

Continuing with the GTX1080ti example: after the order-raising instruction is received from the computing node, the first tensor before raising is A₁₂ = {1, 2, 3, 4}, with element numbers 0, 1, 2, 3 (binary 00, 01, 10, 11), and it is raised to A₁₂₃, so r = 2, s = 1 and t = 3. Since 2^t is less than num (1024), 1 thread block is configured, with block_id = 0. Likewise, thread_id is 0, 1, ..., 1023; because A₁₂₃ has only 8 elements, only threads 0 to 7 need to be called, and each thread calculates by formula (2) the idx of A₁₂₃, which is 0, 1, 2, ..., 7 in order. The element numbered idx is assigned as dst[idx] = src[idx/2^s], meaning that the element value dst[idx] of A₁₂₃ with first element number idx equals the element value of A₁₂ whose element number is the integer part of idx/2^s, the integer part being what remains after the fractional part is discarded.

From the idx values of A₁₂₃, idx/2^s = idx/2 is 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5 in order, which gives:

dst[0] = src[0] = 1, dst[1] = src[0.5] = src[0] = 1;

dst[2] = src[1] = 2, dst[3] = src[1.5] = src[1] = 2;

dst[4] = src[2] = 3, dst[5] = src[2.5] = src[2] = 3;

dst[6] = src[3] = 4, dst[7] = src[3.5] = src[3] = 4.

thereby obtaining A₁₂₃ = {1, 1, 2, 2, 3, 3, 4, 4}. Similarly, raising A₁₂ by 2 orders to A₁₂₃₄ gives A₁₂₃₄ = {1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4}.
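The assignment dst[idx] = src[idx/2^s] can be checked with a small CPU-side sketch. The function name `raise_order` is hypothetical, and the sequential loop only stands in for the GPU threads; it is not the kernel described in the source.

```python
def raise_order(src, s):
    """Emulate the order-raising assignment dst[idx] = src[idx // 2**s].

    Each loop iteration plays the role of one GPU thread whose element
    number idx would be obtained from formula (2)."""
    dst = [None] * (len(src) << s)     # 2**(r + s) elements after raising
    for idx in range(len(dst)):
        dst[idx] = src[idx >> s]       # integer part of idx / 2**s
    return dst


a12 = [1, 2, 3, 4]
print(raise_order(a12, 1))  # [1, 1, 2, 2, 3, 3, 4, 4]
print(raise_order(a12, 2))
```

The right shift `idx >> s` is the integer part of idx/2^s, so no fractional indices ever need to be formed.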

As a further example, take the first tensor before raising to be the 10th-order tensor A_{12345678910} = {1, 2, 3, ..., 1024}, with element numbers 0, 1, 2, ..., 1023 (binary 0000000000, 0000000001, ..., 1111111111), raised to A_{1234567891011}, so r = 10, s = 1 and t = 11. Since 2^t is greater than num (1024), 2^t/num = 2 thread blocks are configured, with block_id 0 and 1; thread_id in each thread block is 0, 1, ..., 1023, and the raised tensor has 2^11 elements. The idx calculated by thread block 0 is 0, 1, 2, ..., 1023 in order, and the idx calculated by thread block 1 is 1024, 1025, ..., 2047 in order; that is, the idx of A_{1234567891011} runs from 0 to 2047.

Calculating idx/2^s = idx/2 gives 0, 0.5, 1, 1.5, ..., 1023, 1023.5 in order, so that:

dst[0] = src[0] = 1, dst[1] = src[0.5] = src[0] = 1;

dst[2] = src[1] = 2, dst[3] = src[1.5] = src[1] = 2;

...

dst[2046] = src[1023] = 1024, dst[2047] = src[1023.5] = src[1023] = 1024.

thereby obtaining A_{1234567891011} = {1, 1, 2, 2, ..., 1024, 1024}.
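The thread-block configuration and element numbering used in these two examples can be sketched as follows, assuming formula (2) is the expression given in claim 10 (idx = block_id * num + thread_id). The function names `launch_config` and `element_number` are hypothetical, and the nested loops emulate the GPU grid on the CPU.

```python
def launch_config(t, num=1024):
    """Number of thread blocks for a tensor of order t (2**t elements)."""
    total = 1 << t
    return 1 if total < num else total // num


def element_number(block_id, thread_id, num=1024):
    """Formula (2): idx = block_id * num + thread_id."""
    return block_id * num + thread_id


blocks = launch_config(11)                       # 10th-order tensor raised by 1
idxs = [element_number(b, th) for b in range(blocks) for th in range(1024)]
src = list(range(1, 1025))                       # {1, 2, ..., 1024}
dst = [src[idx >> 1] for idx in idxs]            # s = 1
print(blocks, idxs[0], idxs[-1], dst[:4], dst[-2:])
```

For the 3rd-order case, `launch_config(3)` returns 1 and only the first 8 threads of that single block would do any work.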

After the raised tensor is obtained, the video memory occupied by the original first tensor is released.

Still taking the first edge E₁₂ and the second edge E₂₃ as an example: after the first tensor is updated to A₁₂₃, the second edge E₂₃ is deleted, and vertex 3 of the second edge, which is not connected to the first edge, is connected to the first edge to obtain the fused intermediate edge E₁₂₃. Then, according to the recorded correspondence between the vertex numbers of the first edge and the second edge, the GPU is called to calculate the elements of the tensor C₁₂₃ of the intermediate edge.

The correspondence can be obtained from the first tensor A₁₂₃ and the second tensor B₂₃. The vertex sequence corresponding to A₁₂₃ is "123", and the vertex sequence corresponding to B₂₃ is "23". It can be seen that vertex 2 is at the 2nd position in the vertex sequence of A₁₂₃ and at the 1st position in the vertex sequence of B₂₃, while vertex 3 is at the 3rd position in the vertex sequence of A₁₂₃ and at the 2nd position in the vertex sequence of B₂₃; this positional correspondence is recorded. Specifically, it may be stored in an array, for example an array Maskarray whose element structure is: struct { position of the vertex in the vertex sequence of A₁₂₃, position of the vertex in the vertex sequence of B₂₃ }.
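A minimal sketch of how such a positional record could be built is shown below. The function name `build_mask_array` and the use of tuples in place of the struct are assumptions for illustration; positions are 1-based and counted from the left, as in the text.

```python
def build_mask_array(seq_a, seq_b):
    """For each vertex present in both sequences, record its 1-based position
    (counted from the left) in seq_a and in seq_b."""
    return [(seq_a.index(v) + 1, seq_b.index(v) + 1)
            for v in seq_b if v in seq_a]


mask_array = build_mask_array([1, 2, 3], [2, 3])
print(mask_array)  # [(2, 1), (3, 2)]
```

The first tuple records vertex 2 (2nd in "123", 1st in "23") and the second records vertex 3 (3rd in "123", 2nd in "23").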

Specifically, the tensor elements of the intermediate edge may be calculated as follows:

the GPU first allocates an array, denoted array, and the Maskarray held by the CPU of the computing node is copied to the GPU and stored in array;

the number of thread blocks is set according to the order q of the updated first tensor and the number num of threads in each thread block of the GPU: if 2^q is less than num, 1 thread block is set, otherwise 2^q/num thread blocks are set; the first element number idx of the tensor of the intermediate edge is calculated from the thread block number block_id, the number num of threads in each thread block and the thread number thread_id, using the same formula as formula (2);

determining, according to the correspondence stored in array, the element of the second tensor of the second edge that corresponds to each element of the first tensor; and traversing each element of the first tensor, updating it with the product of that element and its corresponding element in the second tensor.

Taking the first tensor A₁₂₃ = {1, 1, 2, 2, 3, 3, 4, 4} and the second tensor B₂₃ = {5, 6, 7, 8} as an example, the correspondence shows that vertex 2 is at the 2nd and 1st positions in the vertex sequences of A₁₂₃ and B₂₃ respectively, and vertex 3 is at the 3rd and 2nd positions in the vertex sequences of A₁₂₃ and B₂₃ respectively.

First, the position numbers of the elements of the first tensor are expressed as binary numbers (the values of the vertices "123"): (000)₂, (001)₂, (010)₂, (011)₂, (100)₂, (101)₂, (110)₂, (111)₂; the position numbers of the elements of the second tensor are expressed as binary numbers (the values of the vertices "23"): (00)₂, (01)₂, (10)₂, (11)₂. From the positional correspondence of vertices 2 and 3 in the vertex sequences of A₁₂₃ and B₂₃, the element of B₂₃ corresponding to each element of A₁₂₃ can be determined, as shown in Table 1:

Table 1: Element of B₂₃ corresponding to each element of A₁₂₃

In the second row of Table 1, the underlining beneath the binary numbers marks the values that the vertices of B₂₃ take within the position number of each element of A₁₂₃; it is for clarity only and has no limiting meaning.

Each element in the first tensor is traversed, multiplied by a corresponding element in the second tensor, and the product is then used to update the element.

Taking A₁₂₃ and B₂₃ as an example, each element of A₁₂₃ is first multiplied by its corresponding element in B₂₃, giving the products 5, 6, 14, 16, 15, 18, 28, 32 in order; these products are then used to update A₁₂₃, yielding the tensor C₁₂₃ = {5, 6, 14, 16, 15, 18, 28, 32} of the intermediate edge E₁₂₃.
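The element-wise product for the intermediate edge can be reproduced with the following sketch. The function name `fuse_elements` is hypothetical, the loop over idx stands in for the GPU threads, and the code assumes every vertex of the second tensor already appears in the vertex sequence of the raised first tensor.

```python
def fuse_elements(a, seq_a, b, seq_b):
    """Multiply each element of `a` by its corresponding element of `b`.

    The k-th bit from the left of an element number holds the value of the
    k-th vertex in that tensor's vertex sequence."""
    # (position in seq_a, position in seq_b), 0-based from the left
    mask = [(seq_a.index(v), j) for j, v in enumerate(seq_b)]
    na, nb = len(seq_a), len(seq_b)
    out = []
    for idx in range(len(a)):                    # one GPU thread per idx
        j = 0
        for pa, pb in mask:
            bit = (idx >> (na - 1 - pa)) & 1     # value of the shared vertex
            j |= bit << (nb - 1 - pb)            # place it in b's numbering
        out.append(a[idx] * b[j])
    return out


a123 = [1, 1, 2, 2, 3, 3, 4, 4]
b23 = [5, 6, 7, 8]
print(fuse_elements(a123, [1, 2, 3], b23, [2, 3]))
# [5, 6, 14, 16, 15, 18, 28, 32]
```

For instance, element number (101)₂ of A₁₂₃ has vertices 2 and 3 equal to 0 and 1, so it pairs with element (01)₂ of B₂₃, giving 3 × 6 = 18.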

The order reduction of the tensor of the new edge is performed by the GPU, and the process may be as follows:

firstly, the GPU sets the number of thread blocks according to the tensor order of the new edge after order reduction;

secondly, calculating a first element number of the tensor after reduction from the thread block number, the number of threads in each thread block and the thread number, and calculating the two second element numbers of the tensor before reduction that correspond to the first element number; the bits of an element number correspond one-to-one to the vertices connected by the current edge, and the value of each bit is the value taken by the corresponding vertex.

It should be noted that these two steps follow the same calculation principle as the corresponding steps of the determined-value order reduction, and are not repeated here.

Then, two second element values corresponding to the two second element numbers one to one are obtained, the two second element values are summed, and the sum is determined as the first element value corresponding to the first element number.

It can be seen that the difference from the determined-value order reduction is as follows. The determined-value order reduction must find, among the two second element numbers L1 and L2, the second element number L1 whose bit at the position of the vertex equals the preset determined value, and take the second element value corresponding to L1 as the first element value corresponding to the first element number. The order reduction here instead adds the two second element values corresponding to L1 and L2, and takes the sum as the first element value corresponding to the first element number.

Referring to the specific implementation and example of the determined-value order reduction above, assume that the 5th-order tensor A₁₃₂₅₄ is the tensor of the new edge obtained after the final fusion at vertex 2, with the other conditions unchanged. For the first element number 10 (binary 1010), the two corresponding second element numbers before reduction are calculated as 18 (10010) and 22 (10110). Because vertex 2, at the 3rd order, has no determined value (it may be 0 or 1), both 18 and 22 are element numbers corresponding to 10; the element values p and p' corresponding to 18 and 22 are obtained and added, and the sum p + p' is the first element value of the first element number 10. Proceeding in the same way finally gives the tensor A₁₃₅₄ obtained by reducing A₁₃₂₅₄.

For another example, assume the vertex is vertex 2 and that, after all its connecting edges are fused, a new edge E₁₂₃₄ is obtained whose tensor is A₁₂₃₄ = {5, 5, 6, 6, 14, 14, 16, 16, 15, 15, 18, 18, 28, 28, 32, 32}. The vertex sequence corresponding to the tensor of the new edge is "1234"; removing vertex 2 gives the new vertex sequence "134". The correspondence between the values of the new vertex sequence and the elements of A₁₂₃₄ is shown in Table 2:

Table 2: Correspondence between the values of the new vertex sequence "134" and the elements of A₁₂₃₄

Thus, after the order of the target edge E₁₂₃₄ is reduced at vertex 2, vertex 2 is deleted, and the new edge E₁₃₄ after reduction is obtained, with corresponding tensor A₁₃₄ = {19, 19, 22, 22, 43, 43, 50, 50}.
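The summing order reduction can be sketched as follows and reproduces both examples above. The function name `reduce_by_sum` is hypothetical, the loop over idx stands in for the GPU threads, and bit positions are counted from the right starting at 0, so the 3rd order corresponds to k = 2.

```python
def reduce_by_sum(src, k):
    """Remove the vertex held in bit k (0-based from the right) of the
    element numbers by summing over its two possible values."""
    dst = []
    for idx in range(len(src) // 2):       # one GPU thread per first element number
        low = idx & ((1 << k) - 1)         # bits below the removed vertex
        high = idx >> k                    # bits above the removed vertex
        l1 = (high << (k + 1)) | low       # second element number, vertex = 0
        l2 = l1 | (1 << k)                 # second element number, vertex = 1
        dst.append(src[l1] + src[l2])
    return dst


a1234 = [5, 5, 6, 6, 14, 14, 16, 16, 15, 15, 18, 18, 28, 28, 32, 32]
print(reduce_by_sum(a1234, 2))  # [19, 19, 22, 22, 43, 43, 50, 50]
```

With a 5th-order tensor and k = 2, the first element number 10 maps to the second element numbers 18 and 22, matching the A₁₃₂₅₄ example.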

S204, returning the sub-amplitude to the main node of the distributed cluster, so that the main node reduces the sub-amplitudes to obtain the amplitude of the quantum state as the target single amplitude.

The reduction is data reduction, which means reducing the data volume as far as possible while preserving the original character of the data as far as possible. The main node reduces the sub-amplitudes calculated by the computing nodes, that is, it sums all the sub-amplitudes, to obtain the target single amplitude of the measured quantum state.
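The main-node step amounts to a sum over the returned values, as in the sketch below. The function name `reduce_sub_amplitudes` and the sample complex values are illustrative assumptions; how the sub-amplitudes are transported between nodes is not shown.

```python
def reduce_sub_amplitudes(sub_amplitudes):
    """Main-node reduction: the target single amplitude is the sum of the
    sub-amplitudes returned by the computing nodes."""
    return sum(sub_amplitudes, 0j)


# Hypothetical sub-amplitudes from three computing nodes.
returned = [0.25 + 0.5j, -0.125j, 0.25 + 0.125j]
print(reduce_sub_amplitudes(returned))  # (0.5+0.5j)
```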

The existing CPU micro-architecture is designed for efficient instruction execution and is strong at logic processing (instruction execution), whereas the GPU has a large number of threads (hundreds of thousands), is dedicated to large-scale concurrent computation, and usually achieves a numerical computation efficiency about 5 to 10 times that of the CPU. On this basis, in the embodiment of the present invention, the main computation tasks involved, including the determined-value order reduction, the order raising, the tensor element computation for the intermediate edges in the fusion process, and the like, are all distributed to the GPU subordinate to the CPU of each computing node for execution, while the CPU mainly executes the logic processing tasks. The two cooperate with each other, so that the efficiency of single-amplitude quantum computation simulation is at a higher level.

Therefore, the method calculates only one target single amplitude of the involved qubits at a time. Specifically, the target quantum program is mapped onto an undirected graph, the undirected graph is split across a plurality of computing nodes by combining the path integral method, and each computing node calculates its sub-undirected graph in cooperation with its subordinate GPU. Most of the calculation consists of simple operations on tensor elements. Compared with the unitary-matrix-based full-amplitude simulation of the prior art, the memory requirement is greatly reduced and the amount of calculation does not grow exponentially with the number of qubits, so quantum computation simulation involving 50 or more qubits can be realized; and because the GPU is stronger at massively parallel computation, the overall efficiency of the quantum computation simulation is higher. At present, quantum computation simulation involving up to 196 qubits can be realized by applying the technical solution provided by the embodiments of the present invention.

The construction, features and functions of the present invention are described in detail in the embodiments illustrated in the drawings, which are only preferred embodiments of the present invention, but the present invention is not limited by the drawings, and all equivalent embodiments modified or changed according to the idea of the present invention should fall within the protection scope of the present invention without departing from the spirit of the present invention covered by the description and the drawings.

Claims

1. A single amplitude quantum computational simulation method, the method comprising:

2. The single-amplitude quantum computation simulation method of claim 1, characterized in that: the constructing of the undirected graph corresponding to the target quantum program comprises:

when the type of the quantum logic gate in the linked list is a first double quantum gate, an edge with the tensor order of 2 is created; the edge is connected with the last vertex in the vertex chain respectively corresponding to the two qubits operated by the first double-quantum gate, and the unitary matrix of the first double-quantum gate is a diagonal matrix;

and obtaining an undirected graph corresponding to the target quantum program.

3. The single-amplitude quantum computation simulation method of claim 2, characterized in that: the calculating the sub-amplitude of the quantum state based on the quantum state and the undirected graph and matched with the GPU corresponding to the calculating node comprises the following steps:

deleting the particular vertex;

4. The single-amplitude quantum computation simulation method of claim 3, wherein: the determining the values of the tensors of the edges connected to the specific vertexes of the undirected graph and reducing the orders respectively comprises:

5. The single-amplitude quantum computation simulation method of claim 4, wherein: the receiving of the value of the target vertex allocated by the master node, splitting the current undirected graph based on the value of the target vertex, and calling the GPU to determine value reduction of the tensors of the connecting edges of the target vertex for each sub-undirected graph obtained by the splitting, includes:

6. The single-amplitude quantum computation simulation method of claim 3, wherein: the matching with the GPU to fuse all the connecting edges of the vertex into a new edge includes:

7. The single-amplitude quantum computation simulation method of claim 6, wherein: the raising of the order of the first tensor of the first edge comprises:

8. The single-amplitude quantum computation simulation method of claim 6, wherein: the computing tensor elements for the intermediate edges includes:

9. The single-amplitude quantum computation simulation method of claim 6, wherein: the reducing the tensor of the new edge includes:

10. The single amplitude quantum computation simulation method of any one of claims 4 to 9, wherein: the calculation formula of the first element number is as follows:

idx = block_id * num + thread_id