CN113569511A - Quantum circuit simulation method and device - Google Patents

Quantum circuit simulation method and device

Info

Publication number
CN113569511A
Authority
CN
China
Prior art keywords
circuit
quantum
sub
qubits
dividing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110657016.XA
Other languages
Chinese (zh)
Inventor
翟季冬
张晨
宋泽宇
王豪杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110657016.XA
Publication of CN113569511A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G06F30/33 Design verification, e.g. functional simulation or model checking
    • G06F30/3308 Design verification, e.g. functional simulation or model checking using simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/36 Circuit design at the analogue level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N10/00 Quantum computing, i.e. information processing based on quantum-mechanical phenomena

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Condensed Matter Physics & Semiconductors (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a simulation method and device for a quantum circuit. The method comprises: dividing the quantum circuit based on a mixed division rule to obtain at least one divided sub-circuit and a simulation method corresponding to each divided sub-circuit; and simulating each divided sub-circuit according to its corresponding simulation method to obtain the simulation result of the quantum circuit. The device is used for executing the method. The quantum circuit simulation method and device provided by the embodiments of the invention improve the simulation efficiency of the quantum circuit by improving the simulation efficiency of each divided sub-circuit.

Description

Quantum circuit simulation method and device
Technical Field
The invention relates to the technical field of quantum information, in particular to a quantum circuit simulation method and a quantum circuit simulation device.
Background
At present, the common approach to simulating quantum circuits is to represent the quantum state as a vector and to treat each quantum gate acting on the quantum state as a set of small matrix-vector multiplications. However, a quantum circuit simulator implemented directly in this way performs poorly: simulating the quantum gates requires a large number of small matrix operations with poor data locality, which results in a very high cache miss rate and leaves the computing resources underutilized.
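As a rough illustration of the state-vector approach described above, the following Python sketch applies one single-qubit gate to a state of $2^n$ amplitudes as a series of 2×2 matrix-vector products. The function name `apply_single_qubit_gate`, the bit-ordering convention (qubit 0 is the least-significant index bit), and the Hadamard example are assumptions made for illustration only.

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector with 2**n amplitudes."""
    stride = 1 << target  # distance between the two amplitudes of a pair
    out = list(state)
    for base in range(len(state)):
        if base & stride:
            continue  # visit each amplitude pair once, from its lower index
        a0, a1 = state[base], state[base | stride]
        # One small matrix-vector product per pair of amplitudes.
        out[base] = gate[0][0] * a0 + gate[0][1] * a1
        out[base | stride] = gate[1][0] * a0 + gate[1][1] * a1
    return out


# Hadamard gate, used here only as an example.
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

state = [1.0, 0.0, 0.0, 0.0]  # two qubits, initial state |00>
state = apply_single_qubit_gate(state, H, target=0)
```

The two amplitudes of each pair lie `2**target` positions apart, so gates on high-index qubits touch widely separated memory locations, which is one way to picture the poor data locality mentioned above.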
In the prior art, two methods are mainly used to improve the performance of a quantum circuit simulator: the ShareMem method and the BatchMV method. The ShareMem method caches the quantum state to be simulated in the high-speed shared memory of the GPU to reduce memory access latency. Assuming that the quantum state to be simulated contains n qubits, the simulator needs $2^n$ values in total to represent the state. The ShareMem method divides the $2^n$ values into several segments of size $2^k$, where k represents the number of target qubits of the sub-circuit to be applied. Each segment is processed by one thread block of the GPU. Each thread block copies a segment from the main memory (Global Memory) of the GPU to the shared memory, applies the gates of the sub-circuit to the segment in sequence, and then stores the segment back to the main memory. However, this method suffers from repeated computation, slow index access, and bank conflicts when accessing GPU memory.
The BatchMV method reduces the total number of quantum gates in a circuit by directly computing merged quantum gates: when the target and control qubits of all gates in a sub-circuit belong to a certain set of k qubits, those gates can be merged into a single k-qubit gate. The merged k-qubit gate can be represented by a matrix of dimension $2^k \times 2^k$ whose target qubits are those k qubits. Applying the merged quantum gate can be seen as $2^{n-k}$ matrix-vector multiplication tasks, each of which multiplies the $2^k \times 2^k$ gate matrix by a vector of $2^k$ values taken from the quantum state. However, this method needs to copy the entire gate matrix for each computing unit, which incurs a large amount of memory access, and it is difficult to determine the optimal number of qubits to merge.
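To make the segment arithmetic above concrete, the sketch below enumerates how the $2^n$ state-vector positions split into $2^{n-k}$ groups of $2^k$ positions for a given set of k target qubits. It is only an index-level illustration in Python, not GPU code; the function name `segment_indices` and the convention that qubit q corresponds to bit q of the index are assumptions.

```python
def segment_indices(n, targets):
    """Return 2**(n-k) index groups, each holding 2**k state-vector positions."""
    k = len(targets)
    others = [q for q in range(n) if q not in targets]
    groups = []
    for outer in range(1 << (n - k)):
        # Spread the bits of `outer` over the non-target qubit positions.
        base = 0
        for i, q in enumerate(others):
            if (outer >> i) & 1:
                base |= 1 << q
        group = []
        for inner in range(1 << k):
            # Spread the bits of `inner` over the target qubit positions.
            idx = base
            for j, q in enumerate(targets):
                if (inner >> j) & 1:
                    idx |= 1 << q
            group.append(idx)
        groups.append(group)
    return groups


# Example: 4 qubits, target qubits 0 and 2, giving 4 groups of 4 positions.
groups = segment_indices(n=4, targets=[0, 2])
```

Each group corresponds to one segment (ShareMem) or one matrix-vector task (BatchMV); every position of the state vector appears in exactly one group.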
Disclosure of Invention
In view of the problems in the prior art, embodiments of the present invention provide a method and an apparatus for simulating a quantum circuit, which can at least partially solve the problems in the prior art.
In one aspect, the present invention provides a method for simulating a quantum circuit, including:
dividing the quantum circuit based on a mixed division rule to obtain at least one division sub-circuit and a simulation method corresponding to each division sub-circuit;
and simulating each divided sub-circuit according to the simulation method corresponding to each divided sub-circuit to obtain the simulation result of the quantum circuit.
In another aspect, the present invention provides a simulation device for a quantum circuit, including:
a dividing module, configured to divide the quantum circuit based on a mixed division rule to obtain at least one divided sub-circuit and a simulation method corresponding to each divided sub-circuit;
and a simulation module, configured to simulate each divided sub-circuit according to the simulation method corresponding to each divided sub-circuit to obtain the simulation result of the quantum circuit.
In another aspect, the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the quantum circuit simulation method described in any of the above embodiments are implemented.
In yet another aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the method for simulating a quantum circuit according to any of the above embodiments.
The quantum circuit simulation method and device provided by the embodiments of the invention divide the quantum circuit based on the mixed division rule to obtain at least one divided sub-circuit and a simulation method corresponding to each divided sub-circuit, and then simulate each divided sub-circuit according to its corresponding simulation method to obtain the simulation result of the quantum circuit, thereby improving the simulation efficiency of the quantum circuit by improving the simulation efficiency of each divided sub-circuit.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:
Fig. 1 is a schematic flow chart of a simulation method of a quantum circuit according to a first embodiment of the present invention.
Fig. 2 is a schematic flow chart of a simulation method of a quantum circuit according to a second embodiment of the present invention.
Fig. 3 is a schematic flow chart of a simulation method of a quantum circuit according to a third embodiment of the present invention.
Fig. 4 is a schematic flowchart of a simulation method of a quantum circuit according to a fourth embodiment of the present invention.
Fig. 5 is a schematic flowchart of a simulation method of a quantum circuit according to a fifth embodiment of the present invention.
Fig. 6a is a schematic diagram of the distribution of computing tasks in the prior art according to the sixth embodiment of the present invention.
Fig. 6b is a schematic diagram of the distribution of the computing tasks according to the present application according to the sixth embodiment of the present invention.
Fig. 7 is a schematic flowchart of a simulation method of a quantum circuit according to a seventh embodiment of the present invention.
Fig. 8a is a schematic diagram of original data storage of a segment provided by an eighth embodiment of the present invention.
Fig. 8b is a schematic diagram of data storage of remapped segments according to an eighth embodiment of the present invention.
Fig. 9 is a schematic flowchart of a simulation method of a quantum circuit according to a ninth embodiment of the present invention.
Fig. 10 is a schematic flowchart of a simulation method of a quantum circuit according to a tenth embodiment of the present invention.
Fig. 11 is a schematic flowchart of a simulation method of a quantum circuit according to an eleventh embodiment of the present invention.
Fig. 12 is a schematic flowchart of a simulation method of a quantum circuit according to a twelfth embodiment of the present invention.
Fig. 13 is a schematic diagram of data allocated by each GPU according to the thirteenth embodiment of the present invention.
Fig. 14 is a schematic diagram of a quantum circuit according to a fourteenth embodiment of the present invention.
Fig. 15 is a schematic diagram of a phase transition of a quantum circuit according to a fifteenth embodiment of the present invention.
Fig. 16 is a diagram illustrating comparison of simulation results of a single-GPU quantum circuit according to a sixteenth embodiment of the present invention.
Fig. 17 is a diagram illustrating comparison of simulation results of a quantum circuit with multiple GPUs according to a seventeenth embodiment of the present invention.
Fig. 18 is a diagram illustrating comparison of simulation results of two quantum circuits of a quantum circuit simulator according to an eighteenth embodiment of the present invention.
Fig. 19 is a schematic structural diagram of a quantum circuit simulation apparatus according to a nineteenth embodiment of the present invention.
Fig. 20 is a schematic structural diagram of a quantum circuit simulation device according to a twentieth embodiment of the present invention.
Fig. 21 is a schematic structural diagram of a quantum circuit simulation device according to a twenty-first embodiment of the present invention.
Fig. 22 is a schematic structural diagram of a quantum circuit simulation device according to a twenty-second embodiment of the present invention.
Fig. 23 is a schematic structural diagram of a quantum circuit simulation device according to a twenty-third embodiment of the present invention.
Fig. 24 is a schematic structural diagram of a quantum circuit simulation device according to a twenty-fourth embodiment of the present invention.
Fig. 25 is a schematic structural diagram of a quantum circuit simulation device according to a twenty-fifth embodiment of the present invention.
Fig. 26 is a schematic structural diagram of a quantum circuit simulation device according to a twenty-sixth embodiment of the present invention.
Fig. 27 is a schematic structural diagram of a quantum circuit simulation device according to a twenty-seventh embodiment of the present invention.
Fig. 28 is a schematic structural diagram of a quantum circuit simulation device according to a twenty-eighth embodiment of the present invention.
Fig. 29 is a schematic structural diagram of a quantum circuit simulation device according to a twenty-ninth embodiment of the present invention.
Fig. 30 is a schematic structural diagram of a quantum circuit simulation device according to a thirtieth embodiment of the present invention.
Fig. 31 is a schematic physical structure diagram of an electronic device according to a thirty-first embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
The execution body of the simulation method of the quantum circuit provided by the embodiment of the invention comprises but is not limited to a computer.
In order to facilitate understanding of the technical solutions provided in the present application, the following first describes relevant contents of the technical solutions in the present application.
Quantum computing: a technique that uses quantum properties such as entanglement to complete computations rapidly.
Quantum bit (qubit): a physical device used for quantum computing.
Quantum gate: an operation applied to qubits; all quantum operations can be completed by using single-qubit gates and controlled single-qubit gates.
Target qubit: a qubit on which a quantum gate acts. For a controlled quantum gate, the target qubits do not include the control qubits.
Quantum circuit: a set of quantum gates.
Quantum circuit simulation: the process of computing the quantum state obtained after a series of quantum gate operations is applied to an initial quantum state.
GPU (Graphics Processing Unit): a graphics processor, also called a display chip.
cuBLAS: an efficient matrix operation library in the GPU software stack.
Tensor Core (tensor computation core): a computation unit in the GPU dedicated to matrix multiplication.
Diagonal gate: a quantum gate whose matrix representation is a diagonal matrix.
Fig. 1 is a schematic flow chart of a simulation method of a quantum circuit according to a first embodiment of the present invention, and as shown in fig. 1, the simulation method of a quantum circuit according to the embodiment of the present invention includes:
S101, dividing the quantum circuit based on a mixed division rule to obtain at least one divided sub-circuit and a simulation method corresponding to each divided sub-circuit;
Specifically, a quantum circuit that needs to be simulated may be divided based on the mixed division rule to obtain at least one divided sub-circuit; that is, one divided sub-circuit may be obtained, or two or more. Each divided sub-circuit corresponds to one simulation method. Each divided sub-circuit includes at least one quantum gate; that is, it may include one quantum gate or two or more quantum gates. The mixed division rule is preset and comprises at least two division rules. Each divided sub-circuit has the same number of qubits as the quantum circuit.
S102, simulating each divided sub-circuit according to the simulation method corresponding to each divided sub-circuit to obtain the simulation result of the quantum circuit.
Specifically, after the simulation method corresponding to a divided sub-circuit is obtained, the divided sub-circuit may be simulated by that method to obtain its simulation result. The divided sub-circuits are simulated in sequence, each by its corresponding simulation method, and the result obtained is taken as the simulation result of the quantum circuit.
The quantum circuit simulation method provided by the embodiment of the invention can be used for dividing the quantum circuit based on the mixed division rule to obtain at least one division sub-circuit and a simulation mode corresponding to each division sub-circuit, and then simulating each division sub-circuit according to the simulation method corresponding to each division sub-circuit to obtain the simulation result of the quantum circuit, so that the simulation efficiency of the quantum circuit is improved by improving the simulation efficiency of each division sub-circuit.
Fig. 2 is a schematic flow chart of a simulation method of a quantum circuit according to a second embodiment of the present invention. As shown in Fig. 2, on the basis of the foregoing embodiments, the hybrid division rule further includes a first division rule and a second division rule; correspondingly, dividing the quantum circuit based on the hybrid division rule to obtain at least one divided sub-circuit and the simulation method corresponding to each divided sub-circuit comprises:
S201, dividing the remaining circuit according to the first division rule to obtain a first sub-circuit, and dividing the remaining circuit according to the second division rule to obtain a second sub-circuit; wherein the remaining circuit is obtained by removing the already divided sub-circuits from the quantum circuit;
Specifically, the hybrid division rule includes a first division rule and a second division rule. The first division rule corresponds to a first simulation method, that is, a divided sub-circuit obtained by the first division rule is simulated by the first simulation method; the second division rule corresponds to a second simulation method, that is, a divided sub-circuit obtained by the second division rule is simulated by the second simulation method.
Dividing the remaining circuit based on the first division rule yields a first sub-circuit comprising at least one quantum gate. Dividing the remaining circuit based on the second division rule yields a second sub-circuit comprising zero or more quantum gates. The remaining circuit is obtained by removing the already divided sub-circuits from the quantum circuit. Understandably, when the quantum circuit is divided for the first time, the remaining circuit is the quantum circuit itself; when it is divided for the second time, the remaining circuit is the quantum circuit with the divided sub-circuit from the first division removed; when it is divided for the third time, the remaining circuit is the quantum circuit with the divided sub-circuits from the first and second divisions removed; and so on, until the quantum circuit is completely divided.
S202, comparing the execution efficiency of the first sub-circuit with the execution efficiency of the second sub-circuit to obtain a first sub-circuit or a second sub-circuit with higher execution efficiency as the dividing sub-circuit; wherein the execution efficiency of the first sub-circuit and the execution efficiency of the second sub-circuit are obtained in advance.
Specifically, after obtaining the first sub-circuit and the second sub-circuit, comparing the execution efficiency of the first sub-circuit with the execution efficiency of the second sub-circuit, and if the execution efficiency of the first sub-circuit is higher than the execution efficiency of the second sub-circuit, regarding the first sub-circuit as the dividing sub-circuit; if the execution efficiency of the second sub-circuit is higher than the execution efficiency of the first sub-circuit, then treating the second sub-circuit as the dividing sub-circuit; if the execution efficiency of the first sub-circuit is equal to the execution efficiency of the second sub-circuit, then one of the first sub-circuit and the second sub-circuit may be randomly selected as the dividing sub-circuit.
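The selection loop of S201 and S202 can be pictured with the following Python sketch. It is a minimal illustration under stated assumptions: the two rule functions are toy stand-ins (hypothetical, not the division rules of the invention) that return a leading group of gates together with a made-up efficiency, and ties are resolved in favor of the first rule, whereas the text above allows a random choice.

```python
def hybrid_partition(gates, rules):
    """Greedily split `gates`, keeping the more efficient candidate in each round.

    `rules` maps a simulation-method name to a function that takes the remaining
    gates and returns (chosen_gates, efficiency).
    """
    remaining = list(gates)
    plan = []
    while remaining:
        candidates = []
        for method, rule in rules.items():
            chosen, eff = rule(remaining)
            if chosen:  # a rule may return an empty sub-circuit
                candidates.append((eff, method, chosen))
        if not candidates:
            raise ValueError("no rule produced a non-empty sub-circuit")
        # max() keeps the first candidate on ties (an assumption of this sketch).
        eff, method, chosen = max(candidates, key=lambda c: c[0])
        plan.append((method, chosen))
        for g in chosen:
            remaining.remove(g)
    return plan


def toy_first_rule(remaining):
    """Hypothetical rule: take up to two leading gates, with a fixed cost of 1.0."""
    chosen = remaining[:2]
    return chosen, len(chosen) / 1.0


def toy_second_rule(remaining):
    """Hypothetical rule: take the leading run of two-qubit 'cx' gates."""
    chosen = []
    for g in remaining:
        if not g.startswith("cx"):
            break
        chosen.append(g)
    return chosen, len(chosen) / 0.25


gates = ["h0", "h1", "cx01", "cx12", "h2"]
plan = hybrid_partition(gates, {"first": toy_first_rule, "second": toy_second_rule})
```

Each entry of `plan` pairs a divided sub-circuit with the simulation method of the rule that produced it, so the sub-circuits can then be simulated in sequence.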
Fig. 3 is a schematic flow chart of a quantum circuit simulation method according to a third embodiment of the present invention, and as shown in fig. 3, based on the foregoing embodiments, the first division rule further includes:
S301, obtaining a first active qubit set from the qubits of the remaining circuit, wherein the first active qubit set comprises a first number of active qubits, and the first number of active qubits include the qubits corresponding to the second number of lowest dimensions of the quantum state; wherein the second number is less than the first number;
Specifically, the remaining circuit includes at least one quantum gate, and each quantum gate has target qubits on which it acts. A first number of qubits may be obtained from the qubits of the remaining circuit to form the first active qubit set. Each qubit in the first active qubit set is referred to as an active qubit. The first number of active qubits include the qubits corresponding to the second number of lowest dimensions of the quantum state, so that accesses to the GPU main memory (Global Memory) can be coalesced. The second number is smaller than the first number, and both are set according to actual needs. For example, the first number is set to 10 and the second number is set to 3.
Wherein obtaining at least one first set of active qubits from the qubits of the remaining circuit comprises:
obtaining a set of executable quantum gates from the quantum gates of the remaining circuitry, the set of executable quantum gates comprising at least one quantum gate;
obtaining qubits that satisfy a first addition rule from the qubits of the remaining circuit and adding them to a first qubit set, so that the number of qubits in the first qubit set does not exceed a first number; the first qubit set includes the qubits corresponding to the second number of lowest dimensions of the quantum state;
if the number of qubits in the first qubit set is equal to the first number, taking the first qubit set as the first active qubit set; if it is smaller than the first number, randomly selecting as many additional qubits as the difference between the first number and the number of qubits in the first qubit set, and forming the first active qubit set from these together with the qubits in the first qubit set.
When obtaining the executable quantum gate set from the quantum gates of the remaining circuit, it is necessary to determine whether each quantum gate of the remaining circuit can be executed. For a quantum gate of the remaining circuit, if the number of qubits corresponding to the quantum gate is greater than the first number, the quantum gate is not executable, and the control qubits and target qubits of the quantum gate are added to the non-executable qubit set; if the number of qubits corresponding to the quantum gate is less than or equal to the first number and the related qubits of the quantum gate share no qubit with the non-executable qubit set, the quantum gate is added to the executable quantum gate set as an executable quantum gate. The non-executable qubit set is initially empty; determining whether a quantum gate can be executed requires first determining whether the quantum gates on which it depends can be executed.
It should be noted that the qubits corresponding to a quantum gate are the union of the related qubits of the quantum gate and the qubits corresponding to the second number of lowest dimensions of the quantum state. The related qubits of a quantum gate are the union of its own target and control qubits and the control and target qubits of the quantum gates on which it depends. The quantum gates on which a quantum gate depends are all quantum gates that precede it and whose set of acted-on qubits (including control and target qubits) has a non-empty intersection with the set of qubits on which the quantum gate acts. When selecting the related qubits of a quantum gate, the target qubit must be selected if the quantum gate is a non-diagonal gate.
The first addition rule includes:
First, if the number of qubits in the first qubit set is smaller than the first number, counting the number of quantum gates corresponding to each qubit in the remaining qubit set; wherein the remaining qubit set consists of the qubits of the remaining circuit that have not been added to the first qubit set; each qubit in the remaining qubit set is a control qubit or a target qubit of its corresponding quantum gates; and the quantum gates corresponding to each qubit in the remaining qubit set belong to the executable quantum gate set and exclude the quantum gates corresponding to qubits already in the first qubit set;
Second, from the counts for the qubits in the remaining qubit set, obtaining the qubit with the largest number of corresponding quantum gates, adding that qubit to the first qubit set, and adding its corresponding quantum gates to the first quantum gate set;
Third, repeating from the first step until the first qubit set cannot be extended further, that is, until adding another qubit would make the number of qubits in the first qubit set exceed the first number, or adding a qubit would not increase the number of quantum gates in the first quantum gate set.
S302, taking the quantum gates corresponding to the first active qubit set as the first sub-circuit;
Specifically, the quantum gates acting on the first active qubit set are taken as the quantum gates corresponding to the first active qubit set, and there is at least one such quantum gate. The quantum gates corresponding to the first active qubit set may be obtained and then used as the first sub-circuit.
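Returning to the first addition rule used in S301, its greedy qubit selection might be sketched in Python as follows. This is one literal reading of the three steps, under several assumptions: the executable quantum gate set is taken as given, each gate is a tuple of its control and target qubit indices, qubits are numbered so that the lowest dimensions are indices 0, 1, and so on, and the final padding uses the lowest unused indices for reproducibility whereas the text selects the remainder at random. The function name `select_active_qubits` is hypothetical, and the sketch returns only the qubits without tracking the first quantum gate set.

```python
def select_active_qubits(n, executable_gates, first_number, second_number):
    """Greedy qubit selection; each gate is a tuple of its control and target qubits."""
    # Start from the qubits of the lowest dimensions of the quantum state.
    active = list(range(second_number))
    while len(active) < first_number:
        covered = set(active)
        counts = {}
        for q in range(n):
            if q in covered:
                continue
            # Count executable gates on q that touch no qubit already selected.
            counts[q] = sum(1 for g in executable_gates
                            if q in g and not covered.intersection(g))
        if not counts:
            break
        best = max(counts, key=counts.get)
        if counts[best] == 0:
            break  # adding a qubit would not bring in any new gate
        active.append(best)
    # Pad up to first_number (lowest unused indices here; the text chooses at random).
    for q in range(n):
        if len(active) >= first_number:
            break
        if q not in active:
            active.append(q)
    return sorted(active)


gates = [(3,), (3, 4), (4,), (5,), (0, 3)]
active = select_active_qubits(n=6, executable_gates=gates,
                              first_number=4, second_number=2)
```

In this small example, qubit 3 is picked first because two uncovered gates act on it, and qubit 4 follows; the gates acting only on the resulting active set would then form the first sub-circuit.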
S303, performing execution time prediction on the first sub-circuit to obtain the predicted execution time of the first sub-circuit, and counting the number of quantum gates of the first sub-circuit;
Specifically, execution time prediction may be performed on the first sub-circuit to obtain its predicted execution time. The number of quantum gates of the first sub-circuit may be obtained by counting its quantum gates.
S304, obtaining the execution efficiency of the first sub-circuit based on the number of quantum gates and the predicted execution time of the first sub-circuit.
Specifically, the execution efficiency of the first sub-circuit may be obtained according to the number of quantum gates of the first sub-circuit and the predicted execution time of the first sub-circuit. The execution efficiency may be defined as the number of quantum gates that can be processed per second; the more quantum gates that can be processed per second, the higher the execution efficiency.
For example, the execution efficiency Eff of the first sub-circuit may be calculated according to the formula $\mathrm{Eff} = Q/t$, where Q represents the number of quantum gates of the first sub-circuit, and t represents the predicted execution time of the first sub-circuit.
Fig. 4 is a schematic flowchart of a simulation method of a quantum circuit according to a fourth embodiment of the present invention. As shown in Fig. 4, on the basis of the foregoing embodiments, performing execution time prediction on the first sub-circuit to obtain the predicted execution time of the first sub-circuit includes:
S401, querying the predicted execution time of each quantum gate of the first sub-circuit according to the type of each quantum gate of the first sub-circuit and the number of qubits of the quantum circuit;
Specifically, the first sub-circuit obtained by the first division rule is simulated only by the first simulation method corresponding to the first division rule, where the first simulation method is an improvement on the ShareMem method. When execution time prediction is performed for the first sub-circuit, the predicted execution time is the sum of two parts: the first part is the sum of the times taken to apply each quantum gate of the first sub-circuit in turn to the n qubits, and the second part is a fixed overhead time, including but not limited to the time to copy the quantum gates of the first sub-circuit to the constant memory of the GPU, the time to copy data between the GPU main memory and the shared memory, and some scheduling overhead.
Each quantum gate has a unique corresponding type. For different quantum circuits, the execution time of each type of quantum gate can be measured experimentally with the first simulation method; each execution time is associated with the number of qubits of the quantum circuit and the type of the quantum gate, and the results form a quantum gate execution time table that is stored in a database. The execution time of each quantum gate of the first sub-circuit can then be looked up in this table according to the type of the quantum gate and the number of qubits of the quantum circuit.
S402, inquiring and obtaining fixed overhead time according to the number of the quantum bits of the quantum circuit;
specifically, for the quantum circuit, the fixed overhead time depends only on the number of qubits of the quantum circuit, and quantum circuits with the same number of qubits can be considered to have equal fixed overhead times. The first simulation method can be used to measure the fixed overhead time of quantum circuits with different numbers of qubits, obtaining the fixed overhead time corresponding to each number of qubits; these form a fixed overhead time table stored in a database. The fixed overhead time may then be queried from the fixed overhead time table according to the number of qubits of the quantum circuit.
And S403, calculating the sum of the predicted execution time of each quantum gate of the first sub-circuit and the fixed overhead time as the execution time of the first sub-circuit.
Specifically, the sum of the predicted execution times of the quantum gates of the first sub-circuit is taken as the first part, the fixed overhead time is taken as the second part, and the sum of the first part and the second part is taken as the predicted execution time of the first sub-circuit.
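Steps S401 to S403 amount to two table lookups followed by a sum. The following is a minimal sketch of that calculation; the table names, the dictionary layout and the timing values are hypothetical placeholders chosen for illustration and are not measurements from this document.

```python
# Hypothetical lookup tables; the numbers are made-up placeholders.
GATE_TIME_TABLE = {
    # (number of qubits of the quantum circuit, gate type) -> time per gate
    (28, "H"): 1.0,
    (28, "CNOT"): 1.5,
    (28, "RZ"): 0.5,
}
FIXED_OVERHEAD_TABLE = {
    # number of qubits of the quantum circuit -> fixed overhead time
    28: 10.0,
}


def predict_first_subcircuit_time(gate_types, num_qubits):
    """Sum of the per-gate predicted times plus the fixed overhead time."""
    # S401: look up the predicted time of every gate by (qubit count, type).
    gate_part = sum(GATE_TIME_TABLE[(num_qubits, g)] for g in gate_types)
    # S402: look up the fixed overhead by qubit count only.
    fixed_part = FIXED_OVERHEAD_TABLE[num_qubits]
    # S403: the prediction is the sum of both parts.
    return gate_part + fixed_part
```

For a sub-circuit with the gates H, CNOT, RZ, H on a 28-qubit circuit, the sketch adds the four per-gate entries to the single fixed overhead entry.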
Fig. 5 is a schematic flowchart of a simulation method of a quantum circuit according to a fifth embodiment of the present invention, and as shown in fig. 5, in addition to the above embodiments, the obtaining a corresponding simulation method according to a division rule corresponding to each divided sub-circuit further includes:
S501, representing the divided sub-circuit by $2^n$ values, where n is the number of qubits included in the divided sub-circuit;
specifically, the divided sub-circuit has the same number n of qubits as the quantum circuit, and the divided sub-circuit is represented by $2^n$ values, where n is the number of qubits included in the divided sub-circuit. The divided sub-circuit in this embodiment of the invention is a first sub-circuit.
S502, dividing the $2^n$ values into $2^{n-k}$ segments, each of size $2^k$; where k is the number of active qubits in the first active qubit set corresponding to the divided sub-circuit;
specifically, the $2^n$ values are divided into $2^{n-k}$ segments, each containing $2^k$ values, where k is the number of active qubits in the first active qubit set corresponding to the divided sub-circuit.
S503, processing each segment through a thread block of the graphics processor; where each thread block processes a fragment.
Specifically, each segment is assigned to a thread block of the graphics processor for processing, one segment per thread block. The corresponding segments can be stored in a shared memory of the GPU through the thread blocks, then each quantum gate of the segmentation sub-circuit is sequentially acted on the corresponding segment, and then the processed segment is copied to a main memory of the GPU from the shared memory.
The method used to simulate the divided sub-circuit in steps S501 to S503 is the first simulation method, which is obtained by improving the ShareMem method. Simulating the divided sub-circuit in steps S501 to S503 is the process of simulating the divided sub-circuit according to the first simulation method corresponding to the first segmentation rule of that sub-circuit.
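The segmentation in S501 and S502 can be illustrated on the CPU. The sketch below assumes that a segment collects the amplitudes that agree on every non-active qubit, so that each segment holds the $2^k$ values spanned by the active qubits; the text does not spell out this grouping, and the function name is hypothetical. In the described method each such segment would be handled by one GPU thread block.

```python
def split_into_segments(state, active):
    """Group 2**n amplitudes into 2**(n-k) segments of 2**k amplitudes each.

    `active` lists the k active qubits as bit positions (0 = least significant).
    Amplitudes in the same segment agree on every non-active bit.
    """
    n = len(state).bit_length() - 1
    assert len(state) == 1 << n, "state length must be a power of two"
    inactive = [q for q in range(n) if q not in active]
    segments = []
    for seg_id in range(1 << len(inactive)):
        # Scatter the bits of seg_id onto the non-active qubit positions.
        base = 0
        for b, q in enumerate(inactive):
            base |= ((seg_id >> b) & 1) << q
        segment = []
        for local in range(1 << len(active)):
            # Scatter the bits of the in-segment offset onto the active qubits.
            idx = base
            for b, q in enumerate(active):
                idx |= ((local >> b) & 1) << q
            segment.append(state[idx])
        segments.append(segment)
    return segments
```

With n = 3 and active qubits 0 and 2, the eight values split into two segments of four values each.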
On the basis of the foregoing embodiments, further, the processing, by the thread block of the graphics processor, each segment includes:
and dividing the calculation task corresponding to each segment into a preset number of threads for processing.
Specifically, when each thread block processes the corresponding segment, the segment is divided into a plurality of computing tasks for processing, and the computing tasks corresponding to each segment can be divided into a preset number of threads for processing, so that the amount of repeated computation is reduced, and the computing efficiency is improved. The preset number is set according to actual needs, and the embodiment of the invention is not limited.
For example, as shown in fig. 6a, $T_0$ to $T_8$ represent computing tasks. Each computing task comprises two subtasks, which are drawn as blocks in fig. 6a: identical subtasks are drawn with the same block pattern, and different subtasks with different block patterns. Every computing task includes one common subtask, namely the subtask drawn as a single-diagonal-hatched block. When each computing task is assigned to its own thread, the 9 computing tasks are processed by 9 threads, and every thread must process the common subtask.
As shown in fig. 6b, to reduce repeated computation, 3 computing tasks are assigned to each thread, so that the 9 computing tasks are processed by 3 threads. For the computing tasks processed by the same thread, identical subtasks can be merged, so each thread needs to process the subtask drawn as a single-diagonal-hatched block only once. Although the execution time of each thread becomes longer, the total execution time is reduced because both the number of threads and the amount of repeated computation are reduced.
For each computation task there is an identical operation (subtask), for example, the loading of parameters of a quantum gate, a jump to the corresponding execution code according to the parameters of the quantum gate, etc. When each thread executes a plurality of computing tasks, the same operation only needs to be executed once on each thread, and the operation can be repeatedly used among the computing tasks, so that the total operation times can be reduced. By setting the appropriate number of threads, each thread executes a plurality of computing tasks, and repeated operations are reduced, thereby improving the processing efficiency.
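The saving can be counted directly. The sketch below is only an illustration: it assumes that tasks are dealt to threads round-robin and that each thread runs the common subtask once, regardless of how many tasks it owns; the function name is hypothetical.

```python
def count_shared_subtask_runs(num_tasks, num_threads):
    """Count executions of the common subtask when each thread runs it once."""
    tasks_per_thread = [[] for _ in range(num_threads)]
    for task in range(num_tasks):
        # Round-robin assignment of tasks to threads (an assumption here).
        tasks_per_thread[task % num_threads].append(task)
    # One execution of the common subtask per thread that has any work.
    return sum(1 for tasks in tasks_per_thread if tasks)
```

For the 9 tasks of fig. 6a and fig. 6b, this gives 9 executions with 9 threads and 3 executions with 3 threads.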
Fig. 7 is a schematic flowchart of a quantum circuit simulation method according to a seventh embodiment of the present invention, and as shown in fig. 7, based on the foregoing embodiments, further, the processing, by the thread block of the graphics processor, each segment includes:
s701, acquiring subscript increment of adjacent calculation tasks of the threads; wherein each thread is assigned a plurality of computing tasks;
specifically, the subscript increment lookup table is queried according to the number of computing tasks allocated to the current thread, the number of threads of the thread block to which the current thread belongs, and the target qubit of the quantum gate currently being processed, to obtain the subscript increments between adjacent computing tasks of the thread. Each thread is assigned a plurality of computing tasks, and the subscript increment lookup table is obtained in advance.
The subscript increment lookup table may be built by enumerating the number of computing tasks allocated to each thread, the number of threads included in each thread block, and the target qubit of the quantum gate, and by using the index lookup pseudo code to calculate the subscript increment corresponding to each combination of these three values.
S702, calculating a subscript of a first calculation task of the thread;
specifically, the index of the first computational task of the thread may be computed.
For example, the subscript of the first computing task of the current thread may be calculated according to the formula (pair_id >> t << (t + 1)) | (pair_id & ((1 << t) - 1)), where pair_id represents the task number of the current thread and t represents the target qubit on which the currently processed quantum gate acts.
S703, calculating the sum of the subscript of the last calculation task of the thread and the subscript increment as the subscript of the current calculation task; wherein the current computing task refers to a computing task subsequent to the first computing task.
Specifically, the sum of the subscript of the first computing task and the subscript increment may be calculated as the subscript of the second computing task, the sum of the subscript of the second computing task and the subscript increment may be calculated as the subscript of the third computing task, the sum of the subscript of the third computing task and the subscript increment may be calculated as the subscript of the fourth computing task, and so on until the subscript calculation of each computing task in the thread is completed. The current computing task is a computing task after the first computing task, that is, a second computing task, a third computing task, and a fourth computing task, until a last computing task.
For example, when processing a quantum gate acting on the t-th qubit, each thread needs to complete x computing tasks, where x is greater than or equal to 2. When a quantum gate is applied to the t-th qubit, the task numbered pair_id, with $0 \le \mathrm{pair\_id} < 2^{k-1}$, updates the following two positions:
$lo = \lfloor \mathrm{pair\_id} / 2^{t} \rfloor \cdot 2^{t+1} + (\mathrm{pair\_id} \bmod 2^{t})$
$hi = lo + 2^{t}$
where lo represents the first position to be updated, hi represents the second position to be updated, and k represents the number of active qubits. The two updated positions are obtained by inserting 0 and 1, respectively, at the t-th bit of pair_id.
In the prior art, in a general-purpose programming language (e.g. C or C++), the above position update can be implemented by the following index lookup pseudo code, which requires 8 integer operations in total (1 << t is calculated only once):
lo = (pair_id >> t << (t + 1)) | (pair_id & ((1 << t) - 1));
hi = lo | (1 << t);
However, if computing tasks are assigned to threads in a round-robin manner, the subscript increments between the positions processed by the multiple tasks assigned to a thread are independent of the thread's number within the thread block, so a lookup table may be used to store the subscript increments between the tasks processed by a thread. When a quantum gate acting on the t-th qubit is processed, the num_task computing tasks that each thread needs to complete can be implemented by the following optimized index lookup pseudo code:
[Figure: optimized index lookup pseudo code]
where num_task is the number of computing tasks of each thread, a single thread block has num_thread threads, and the task numbers of each thread, i.e. pair_id, are as follows:
pair_id=thread_id+i×num_thread,i∈[0,num_task-1]
With the optimized index lookup pseudo code, recalculating the subscript of a computing task takes only two integer operations each time, and this way of recalculating the subscript is also more amenable to compiler optimization.
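The idea of replacing the full index formula by a table of increments can be sketched as follows. This is an illustrative CPU-side reconstruction, not the pseudo code of the figure: it assumes that num_thread is a power of two and that tasks follow pair_id = thread_id + i × num_thread, and all function names are hypothetical.

```python
def first_index(pair_id, t):
    """Insert a 0 bit at position t of pair_id (the `lo` formula)."""
    return (pair_id >> t << (t + 1)) | (pair_id & ((1 << t) - 1))


def build_increment_table(num_task, num_thread, t):
    """Increments of `lo` between consecutive tasks of one thread.

    With pair_id = thread_id + i * num_thread and num_thread a power of two,
    the increments do not depend on thread_id, so thread 0 is used here.
    """
    return [first_index((i + 1) * num_thread, t) - first_index(i * num_thread, t)
            for i in range(num_task - 1)]


def task_indices(thread_id, num_task, num_thread, t, increments):
    """Yield (lo, hi) for every task of one thread using the increment table."""
    mask = 1 << t
    lo = first_index(thread_id, t)   # full formula only for the first task
    yield lo, lo | mask
    for delta in increments:
        lo += delta                  # integer operation 1: add the increment
        yield lo, lo | mask          # integer operation 2: set bit t for hi
```

After the first task, each further pair of positions costs one addition and one bitwise OR, which matches the two integer operations stated above.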
On the basis of the foregoing embodiments, further, the processing, by the thread block of the graphics processor, each segment includes:
performing data remapping on the data of each segment according to the formula
$j = i \oplus \lfloor i / q \rfloor$
to obtain the remapped data of each segment; where i represents the original position of the i-th data of each segment, j represents the position of the i-th data of each segment after remapping, $\oplus$ denotes bitwise XOR, $q = u / w$, u is a constant, and w is the number of bits corresponding to the data type of the data of each segment.
Specifically, after the data of each segment is copied into the shared memory of the GPU, it is stored sequentially in the shared memory. Data remapping may then be performed on the data of each segment according to the formula $j = i \oplus \lfloor i / q \rfloor$, adjusting the storage position of the data of each segment to reduce bank access conflicts in the shared memory. A GPU usually has 32 shared memory banks, each 32 bits wide, so u may be set to 32 × 32 = 1024.
For example, as shown in fig. 8a, each row has 8 cells, forming 8 columns, and each cell holds a 128-bit double-precision complex number representing one amplitude of the quantum state. Cell 0 stores the 0th data, cell 1 stores the 1st data, cell 2 stores the 2nd data, and so on; the numbers in fig. 8a are both the data numbers and the cell numbers. The least significant bit is written at the leftmost side of the index. Since each cell holds a 128-bit double-precision complex number, the GPU can be regarded as having 8 banks, each 128 bits wide. Cells in the same column belong to the same bank of the GPU shared memory, and this storage layout causes bank access conflicts when processing quantum gates acting on the 0th, 1st and 2nd qubits. To process a quantum gate acting on the 0th qubit, the 32 threads within a warp first access positions {0, 2, 4, 6, ..., 60, 62} (the gray cells in fig. 8a) and then access positions {1, 3, 5, 7, ..., 61, 63} (the white cells in fig. 8a). Both accesses touch only half of the columns of the matrix in the figure, so half of the banks are wasted on each access, and the access is serialized into 8 shared memory accesses.
Applying the formula $j = i \oplus \lfloor i / q \rfloor$ to remap the original data in fig. 8a yields the result shown in fig. 8b, where the numbers in the cells of fig. 8b are the data numbers of fig. 8a, and the cells of fig. 8b are numbered in the same way as the cells of fig. 8a. Since each cell stores a double-precision complex number, w = 128 and q = 1024/128 = 8. When i = 0, $j = 0 \oplus \lfloor 0/8 \rfloor = 0$, so the position of the 0th data does not move; when i = 8, $j = 8 \oplus \lfloor 8/8 \rfloor = 9$, so the 8th data in fig. 8a is remapped from the 8th cell to the 9th cell in fig. 8b. Since the storage locations of the data accessed each time are now evenly distributed over the 8 columns, all banks can be utilized and only 4 shared memory accesses are needed. Bank access conflicts are reduced and data access efficiency is improved.
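The effect of the remapping on bank usage can be checked with a small CPU-side model. The sketch assumes the remapping has the form j = i XOR floor(i / q) with q = u / w, which is consistent with the worked values i = 0 → 0 and i = 8 → 9 given for fig. 8b, and it models a bank as the position modulo the number of banks; the function names are hypothetical.

```python
from collections import Counter


def remap(i, q):
    """Assumed remapping j = i XOR floor(i / q)."""
    return i ^ (i // q)


def max_bank_load(positions, num_banks):
    """Largest number of accesses landing in one bank (bank = position mod num_banks)."""
    return max(Counter(p % num_banks for p in positions).values())


Q = 1024 // 128                  # q = u / w = 8 for 128-bit amplitudes
even = list(range(0, 64, 2))     # positions read first for a gate on qubit 0
before = max_bank_load(even, Q)  # layout of fig. 8a
after = max_bank_load([remap(i, Q) for i in even], Q)  # layout of fig. 8b
```

In this model the 32 accesses fall on only 4 of the 8 banks before the remapping and are spread over all 8 banks afterwards, so the worst-case load per bank drops from 8 to 4. The remapping is also a permutation of the 64 positions, so no two data items collide.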
Fig. 9 is a schematic flow chart of a quantum circuit simulation method according to a ninth embodiment of the present invention, and as shown in fig. 9, in addition to the above embodiments, the second division rule includes:
s901, acquiring a plurality of second active qubit sets from the qubits of the remaining circuits, wherein the second active qubit sets comprise m active qubits, and m is greater than or equal to a third quantity and less than or equal to a fourth quantity;
in particular, the remaining circuit includes at least one quantum gate, and each quantum gate includes a control qubit and the target qubit of that control qubit. m qubits may be obtained from the qubits of the remaining circuit to form a second active qubit set, and a plurality of second active qubit sets may be obtained in this way; each qubit included in a second active qubit set is referred to as an active qubit. m is greater than or equal to the third number and less than or equal to the fourth number. The second simulation method corresponding to the second segmentation rule is obtained based on the BatchMV method; because an increase in the number of active qubits causes an exponential increase in the number of multiplications in the subsequent quantum circuit simulation, the number of active qubits cannot be too large. The third number and the fourth number are set according to actual needs and are not limited by the embodiment of the invention.
For example, the third number is set to 4 and the fourth number is set to 7.
Wherein obtaining a plurality of second active qubit sets from the qubits of the remaining circuitry comprises:
obtaining a set of executable quantum gates from the quantum gates of the remaining circuitry, the set of executable quantum gates comprising at least one quantum gate;
obtaining, from the qubits of the remaining circuit, the qubits that satisfy a second addition rule and adding them to a second qubit set, so that the number of qubits included in the second qubit set is just less than or equal to m;
if the number of qubits included in the second qubit set is equal to m, taking the second qubit set as a second active qubit set; if the number of qubits included in the second qubit set is less than m, randomly selecting further qubits, their number being the difference between m and the number of qubits included in the second qubit set, and forming a second active qubit set from them together with the qubits included in the second qubit set.
When obtaining the executable quantum gate set from the quantum gates of the remaining circuit, it is necessary to determine whether each quantum gate of the remaining circuit can be executed. For a quantum gate of the remaining circuit, if the number of relevant qubits of the quantum gate is greater than m, the quantum gate is not executable, and the control qubit and the target qubit of the quantum gate are added to the non-executable qubit set; if the number of relevant qubits of the quantum gate is less than or equal to m and the relevant qubits of the quantum gate share no qubit with the non-executable qubit set, the quantum gate is added to the executable quantum gate set as an executable quantum gate. The non-executable qubit set is initially empty; determining whether a quantum gate can be executed requires first determining whether the quantum gates on which it depends can be executed.
It should be noted that the relevant qubits of a quantum gate are the union of the target qubit and control qubit of the quantum gate and the control qubits and target qubits of the quantum gates on which it depends. The quantum gates on which a quantum gate depends are all quantum gates that are located before it and whose set of acted-on qubits (including control qubits and target qubits) has a non-empty intersection with the set of qubits on which the quantum gate acts. When selecting the relevant qubits of a quantum gate, if the quantum gate is a diagonal gate, all the qubits acted on by the diagonal gate and the control bit of the controlled quantum gate must be selected; if the quantum gate is a non-diagonal gate, the target qubit of the non-diagonal gate must be selected.
The second addition rule includes:
step one, if the number of qubits included in the second qubit set is less than m, counting, for each qubit in the remaining qubit set, the number of quantum gates corresponding to it; where the remaining qubit set consists of the qubits of the remaining circuit that have not been added to the second qubit set; each qubit in the remaining qubit set is a control qubit or a target qubit of its corresponding quantum gates, and the quantum gates corresponding to each qubit in the remaining qubit set are quantum gates in the executable quantum gate set, excluding quantum gates corresponding to qubits already in the second qubit set;
step two, from the counts of quantum gates corresponding to the qubits in the remaining qubit set, obtaining the qubit with the largest number of quantum gates, adding that qubit to the second qubit set, and adding the quantum gates corresponding to that qubit to the second quantum gate set;
step three, repeating step one and step two until the number of qubits included in the second qubit set is just less than or equal to m, that is, until adding a further qubit to the second qubit set would make the number of qubits in the second qubit set greater than m, or adding a further qubit to the second qubit set would not increase the number of quantum gates in the second quantum gate set.
S902, taking the quantum gates corresponding to each second active qubit set as a candidate second sub-circuit;
specifically, the quantum gates acting on the second active qubit set are used as quantum gates corresponding to the second active qubit set, and there is at least one quantum gate corresponding to the second active qubit set. The corresponding quantum gate of each second active quantum bit set can be obtained, and then the corresponding quantum gate of each second active quantum bit set is used as a candidate second sub-circuit. Selecting one candidate second sub-circuit from the candidate second sub-circuits as the second sub-circuit.
S903, performing execution time prediction on each candidate second sub-circuit to obtain the predicted execution time of each candidate second sub-circuit, and counting the number of quantum gates of each candidate second sub-circuit;
specifically, for each candidate second sub-circuit, a prediction of execution time may be performed, obtaining a predicted execution time for each candidate second sub-circuit. The number of quantum gates of each candidate second sub-circuit may be counted to obtain the number of quantum gates of each candidate second sub-circuit.
S904, obtaining the execution efficiency of each candidate second sub-circuit based on the number of quantum gates and the predicted execution time of each candidate second sub-circuit;
specifically, for each candidate second sub-circuit, the execution efficiency of the candidate second sub-circuit may be obtained from the number of quantum gates of the candidate second sub-circuit and the predicted execution time of the candidate second sub-circuit. In this way the execution efficiency of every candidate second sub-circuit can be obtained.
For example, the execution efficiency Eff of the candidate second sub-circuit may be calculated according to the formula Eff = Q/t, where Q represents the number of quantum gates of the candidate second sub-circuit and t represents the predicted execution time of the candidate second sub-circuit.
S905, comparing the execution efficiency of each candidate second sub-circuit, and acquiring the candidate second sub-circuit with the highest execution efficiency as the second sub-circuit.
Specifically, after the execution efficiency of each candidate second sub-circuit is obtained, the execution efficiencies are compared and the maximum value is found; the candidate second sub-circuit corresponding to the maximum execution efficiency is the candidate second sub-circuit with the highest execution efficiency, and it is taken as the second sub-circuit.
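Steps S904 and S905 reduce to computing Eff = Q/t for every candidate and keeping the maximum. The following minimal sketch assumes each candidate is given as a tuple of a label, its number of quantum gates and its predicted execution time; this tuple layout and the function name are hypothetical.

```python
def select_second_subcircuit(candidates):
    """Return the candidate with the largest execution efficiency Eff = Q / t.

    Each candidate is a (label, num_gates, predicted_time) tuple.
    """
    return max(candidates, key=lambda c: c[1] / c[2])
```

A candidate with more gates is not automatically preferred: what matters is the number of gates processed per unit of predicted time.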
Fig. 10 is a schematic flowchart of a quantum circuit simulation method according to a tenth embodiment of the present invention, and as shown in fig. 10, based on the foregoing embodiments, the performing a time prediction on each candidate second sub-circuit further includes:
s1001, inquiring to obtain first time according to the number of quantum bits of the quantum circuit;
specifically, the candidate second sub-circuits obtained by the second segmentation rule are simulated only by the second simulation method corresponding to the second segmentation rule, where the second simulation method is an improvement based on the BatchMV method. Simulating a candidate second sub-circuit with the second simulation method takes two steps: in the first step, the active qubits are swapped to the lowest bits through matrix transposition; in the second step, a general matrix multiplication of shape $2^{n-k} \times 2^{k} \times 2^{k}$ is computed, that is, a quantum state matrix of dimension $2^{n-k} \times 2^{k}$ is multiplied by a quantum gate matrix of dimension $2^{k} \times 2^{k}$, where n is the number of qubits of the quantum circuit and k is the number of active qubits. The predicted execution time of the candidate second sub-circuit is then the sum of the predicted times of these two steps.
For the first step, the time taken depends only on the number of qubits of the quantum circuit. The second simulation method can be used to measure the matrix transposition time of the first step for quantum circuits with different numbers of qubits, obtaining the first time corresponding to each number of qubits; these form a first-step time table stored in the database. The first time may be obtained by querying the first-step time table according to the number of qubits of the quantum circuit.
S1002, inquiring to obtain second time according to the number of the quantum bits of the quantum circuit and the number of active quantum bits in a second active quantum bit set corresponding to the candidate second sub-circuit;
specifically, for the second step described above, the second simulation method may be used to measure, for quantum circuits with different numbers of qubits and sub-circuits with different numbers of active qubits in the second active qubit set, the time taken to multiply a quantum state matrix of dimension $2^{n-k} \times 2^{k}$ by a quantum gate matrix of dimension $2^{k} \times 2^{k}$, obtaining the second times corresponding to different numbers of qubits and different numbers of active qubits; these form a second-step time table stored in a database. The second time may be obtained by querying the second-step time table according to the number of qubits of the quantum circuit and the number of active qubits in the second active qubit set corresponding to the candidate second sub-circuit.
And S1003, calculating the sum of the first time and the second time to be used as the predicted execution time of the candidate second sub-circuit.
Specifically, after obtaining a first time and a second time, calculating the sum of the first time and the second time, and taking the result of the sum as the predicted execution time of the candidate second sub-circuit.
Fig. 11 is a schematic flowchart of a simulation method of a quantum circuit according to an eleventh embodiment of the present invention, and as shown in fig. 11, in addition to the above embodiments, the obtaining a corresponding simulation method according to a division rule corresponding to each divided sub-circuit further includes:
s1101, obtaining tensors corresponding to the segmentation sub-circuits;
specifically, the divided sub-circuit and the quantum circuit have the same number n of qubits, and the quantum state corresponding to the divided sub-circuit can be regarded as an n-dimensional tensor in which every dimension has size 2, so that the tensor corresponding to the divided sub-circuit can be obtained. The divided sub-circuit in this embodiment of the invention is a second sub-circuit.
S1102, transposing the tensors corresponding to the dividing sub-circuits to obtain a quantum state matrix and a quantum gate matrix;
specifically, the tensor corresponding to the divided sub-circuit is transposed so that each vector of length $2^{k}$ is stored contiguously in memory and can thus be viewed as one row of a state matrix of dimension $2^{n-k} \times 2^{k}$. Through the transposition, $2^{n-k}$ matrix-vector products, each of size $1 \times 2^{k} \times 2^{k}$, are combined into a single general matrix multiplication of size $2^{n-k} \times 2^{k} \times 2^{k}$, from which a quantum state matrix of dimension $2^{n-k} \times 2^{k}$ and a quantum gate matrix of dimension $2^{k} \times 2^{k}$ can be obtained. Here n is the number of qubits of the divided sub-circuit, and k is the number of active qubits corresponding to the divided sub-circuit.
And S1103, obtaining a simulation result of the dividing sub-circuit according to the quantum state matrix, the quantum gate matrix and the matrix operation library.
Specifically, after the quantum state matrix and the quantum gate matrix are obtained, the simulation calculation may be performed by a matrix operation library: the quantum state matrix and the quantum gate matrix are passed to the matrix operation library and the calculation result is obtained directly from it, giving the simulation result of the divided sub-circuit and improving computational efficiency. The matrix operation library may be cuBLAS. After the matrix multiplication is completed, the quantum state does not need to be transposed back to the original memory layout, because the labels of the qubits on which subsequent quantum gates act can be renumbered to keep the operation equivalent.
The method used to simulate the divided sub-circuit in steps S1101 to S1103 is the second simulation method, which is obtained by improvement based on the BatchMV method. Simulating the divided sub-circuit in steps S1101 to S1103 is the process of simulating the divided sub-circuit according to the second simulation method corresponding to the second segmentation rule of that sub-circuit.
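Steps S1101 to S1103 can be illustrated with a small pure-Python sketch: the state is permuted so that the active qubits occupy the lowest bit positions, each contiguous run of $2^k$ amplitudes becomes one row of the state matrix, and every row is multiplied by the gate matrix. The function name and the returned qubit order are hypothetical; a real implementation would hand the whole product to a matrix library such as cuBLAS instead of looping.

```python
def apply_gate_batched(state, active, gate):
    """Apply a 2**k x 2**k gate matrix to the k active qubits of `state`.

    Returns (matrix, order): `matrix` has 2**(n-k) rows of 2**k amplitudes
    (the transposed layout), and `order` lists the original qubit stored at
    each bit position of the new layout, lowest bit first.
    """
    n = len(state).bit_length() - 1
    k = len(active)
    order = list(active) + [q for q in range(n) if q not in active]
    # Transpose: move the active qubits to the lowest bit positions.
    transposed = [0j] * len(state)
    for old in range(len(state)):
        new = 0
        for pos, q in enumerate(order):
            new |= ((old >> q) & 1) << pos
        transposed[new] = state[old]
    # Each contiguous run of 2**k amplitudes is one row of the state matrix;
    # multiply every row by the gate matrix.
    size = 1 << k
    matrix = []
    for r in range(len(state) // size):
        row = transposed[r * size:(r + 1) * size]
        matrix.append([sum(gate[out][c] * row[c] for c in range(size))
                       for out in range(size)])
    return matrix, order
```

The result is left in the transposed layout, and `order` records how the qubits were renumbered, which mirrors the remark above that the state need not be transposed back.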
Fig. 12 is a schematic flowchart of a method for simulating a quantum circuit according to a twelfth embodiment of the present invention, and as shown in fig. 12, on the basis of the foregoing embodiments, the method for simulating a quantum circuit according to the embodiment of the present invention further includes:
S1201, dividing the quantum gates of the quantum circuit into quantum gates of a plurality of stages based on a preset division rule, and, for each stage, dividing the n qubits of the quantum circuit into g global qubits and n-g local qubits; where the number of graphics processors is $2^{g}$;
specifically, when one quantum circuit is simulated by $2^{g}$ GPUs, the quantum gates of the quantum circuit can be divided into quantum gates of a plurality of stages according to a preset division rule; the quantum gates of different stages do not overlap, and together the quantum gates of all stages make up the quantum gates of the whole quantum circuit. For each stage, the n qubits of the quantum circuit are divided into g global qubits and n-g local qubits. The quantum gates of the first stage can be processed by each GPU separately, and the quantum gates of the remaining stages require communication among the GPUs before processing. Here g is a natural number. The preset division rule is set according to actual needs and is not limited by the embodiment of the invention.
Firstly, dividing a first stage to obtain a quantum gate of the first stage; then, the second stage of division is carried out to obtain a quantum gate of the second stage; and then, carrying out division of a third stage to obtain the quantum gate of the third stage, and so on until all the quantum gates of the quantum circuit are divided. And the quantum gate which is divided in the previous stage does not participate in the division of the quantum gate in the subsequent stage.
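The role of global and local qubits can be illustrated by mapping an amplitude index to a GPU and an offset. The sketch below rests on an assumption not stated in the text: the bits of the g global qubits select one of the $2^g$ GPUs, and the bits of the n-g local qubits give the offset inside that GPU's slice of the state. The function name is hypothetical.

```python
def locate_amplitude(index, n, global_qubits):
    """Return (gpu_id, local_offset) for an amplitude index.

    Assumption for illustration: the global-qubit bits select the GPU and
    the local-qubit bits give the offset within that GPU's slice.
    """
    local_qubits = [q for q in range(n) if q not in global_qubits]
    gpu_id = 0
    for pos, q in enumerate(global_qubits):
        gpu_id |= ((index >> q) & 1) << pos
    offset = 0
    for pos, q in enumerate(local_qubits):
        offset |= ((index >> q) & 1) << pos
    return gpu_id, offset
```

Under this assumption, a gate acting only on local qubits pairs amplitudes that live on the same GPU, which is why the gates of a stage can be processed without communication once its local qubits are fixed.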
Wherein the preset partition rule comprises:
obtaining a set of executable quantum gates from quantum gates of the quantum circuit, the set of executable quantum gates comprising a plurality of quantum gates;
obtaining a quantum bit meeting a third adding rule from the quantum bits of the quantum circuit, adding the quantum bit into a third quantum bit set, and obtaining a third quantum gate set;
if the number of qubits included in the third qubit set is equal to n-g, taking the qubits in the third qubit set as the n-g local qubits and taking the quantum gates in the third quantum gate set as the quantum gates of the first stage; if the number of qubits included in the third qubit set is less than n-g, randomly selecting further qubits, their number being the difference between n-g and the number of qubits included in the third qubit set, and taking them together with the qubits included in the third qubit set as the n-g local qubits.
When obtaining the executable quantum gate set from the quantum gates of the quantum circuit, it is necessary to determine whether each quantum gate of the quantum circuit can be executed. For a quantum gate of the quantum circuit, if the number of its relevant qubits is greater than n-g, the quantum gate is not executable, and the control qubit and the target qubit of the quantum gate are added to the non-executable qubit set; if the number of relevant qubits of the quantum gate is less than or equal to n-g and the relevant qubits of the quantum gate share no qubit with the non-executable qubit set, the quantum gate is added to the executable quantum gate set as an executable quantum gate. The non-executable qubit set is initially empty; determining whether a quantum gate can be executed requires first determining whether the quantum gates on which it depends can be executed.
It should be noted that the relevant qubits of a quantum gate are the union of the target qubit and control qubit of the quantum gate and the control qubits and target qubits of the quantum gates on which it depends. The quantum gates on which a quantum gate depends are all quantum gates that are located before it and whose set of acted-on qubits (including control qubits and target qubits) has a non-empty intersection with the set of qubits on which the quantum gate acts. When selecting the relevant qubits of a quantum gate, if the quantum gate is a non-diagonal gate, its target qubit must be selected.
The third addition rule includes:
step one: if the number of qubits in the third qubit set is less than n-g, counting the number of quantum gates corresponding to each qubit in the remaining qubit set; where the remaining qubit set consists of the qubits of the quantum circuit that have not been added to the third qubit set; each qubit in the remaining qubit set is a control qubit or a target qubit of its corresponding quantum gates; and the quantum gates corresponding to each qubit in the remaining qubit set are quantum gates in the set of executable quantum gates, excluding quantum gates that correspond to qubits already in the third qubit set;
step two: from these counts, obtaining the qubit with the largest number of corresponding quantum gates, adding this qubit to the third qubit set, and adding its corresponding quantum gates to the third quantum gate set;
step three: repeating step one until the third qubit set cannot be enlarged further, that is, until adding another qubit would make the number of qubits in the third qubit set greater than n-g, or until adding a qubit to the third qubit set no longer increases the number of quantum gates in the third quantum gate set.
S1202: evenly distributing the 2^n values of the quantum state among the 2^g graphics processors, so that each graphics processor performs first-stage data processing based on the n-g local qubits and performs data interaction for second-stage data processing.
Specifically, the quantum state of the quantum circuit can be represented by 2^n values, and the 2^n values are evenly distributed among the 2^g GPUs for processing. Each GPU performs the first-stage data processing based on the n-g local qubits of the first stage. After the first-stage data processing is complete, each GPU sends the unprocessed data that belongs to other GPUs to the corresponding GPUs and receives from the other GPUs the data that belongs to the local GPU; it then performs the second-stage data processing based on the n-g local qubits of the second stage.
When each GPU performs the first-stage data processing based on the n-g local qubits of the first stage, the data allocated to the GPU may be regarded as a quantum circuit whose qubits are the n-g local qubits, and the allocated data is processed according to steps S101 and S102. It should be noted that a global qubit of the first stage cannot be selected as an active qubit.
For each stage other than the first, when each GPU performs the data processing of that stage based on the n-g local qubits of the stage, the data belonging to the local GPU may be regarded as a quantum circuit whose qubits are the newly divided n-g local qubits, and the data belonging to the local GPU is processed according to steps S101 and S102. It should be noted that a global qubit of the current stage cannot be selected as an active qubit.
The following describes an implementation process of simulating one quantum circuit by a plurality of GPUs, taking the process of processing one quantum circuit a with 5 qubits by 4 GPUs as an example.
Based on the preset partition rule, 2 global qubits and 3 local qubits are selected from the quantum circuit A, with qubits 0, 1, and 2 as local qubits and qubits 3 and 4 as global qubits. The 2^5 values of the quantum circuit A are then evenly distributed among 4 GPUs for processing; the 4 GPUs are numbered 0, 1, 2, and 3.
As shown in Fig. 13, the data for which the bits of global qubits 3 and 4 correspond to 0 is allocated to GPU0, the data for which they correspond to 1 is allocated to GPU1, the data for which they correspond to 2 is allocated to GPU2, and the data for which they correspond to 3 is allocated to GPU3.
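The allocation described above can be illustrated with a minimal Python sketch. It rests on two assumptions that are not stated in the text: an index is written as a bit string with qubit 0 as the leftmost character (which is consistent with the examples 00010 and 10110 given for Fig. 15), and the GPU number is obtained by reading the global-qubit bits in increasing qubit order as a binary number. The function names `owner_gpu` and `distribute` are hypothetical.

```python
from itertools import product


def owner_gpu(index_bits, global_qubits):
    """GPU number encoded by the bits of the global qubits.

    Assumption: qubit q is character q of the bit string, and the global
    bits are read in increasing qubit order as a binary number.
    """
    return int("".join(index_bits[q] for q in global_qubits), 2)


def distribute(n, global_qubits):
    """Map each GPU number to the list of amplitude indices it stores."""
    shards = {}
    for bits in product("01", repeat=n):
        index = "".join(bits)
        shards.setdefault(owner_gpu(index, global_qubits), []).append(index)
    return shards


# Quantum circuit A: 5 qubits, global qubits 3 and 4, hence 4 GPUs.
shards = distribute(5, global_qubits=(3, 4))
print({gpu: len(indices) for gpu, indices in sorted(shards.items())})
```

Under these assumptions each of the 4 GPUs holds 2^3 = 8 of the 2^5 values, one for every combination of the local qubits 0, 1, and 2.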
Each GPU can independently process the quantum gates that act on local qubits 0, 1, and 2. When each GPU performs the first-stage data processing based on the 3 local qubits 0, 1, and 2, the quantum gates acting on local qubits 0, 1, and 2 may be regarded as a quantum circuit whose qubits are the 3 local qubits 0, 1, and 2, and the data allocated to the GPU is processed according to steps S101 and S102.
As shown in Fig. 13, the quantum gates that act on global qubits 3 and 4, that is, on the data in columns 4 and 5 of the figure, cannot be processed by a single GPU, because the data pairs that differ in global qubits 3 and 4 are assigned to different GPUs.
As shown in Fig. 14, the quantum circuit A includes 7 quantum gates: G1, G2, G3, G4, G5, G6, and G7. In order to process the quantum gates acting on all qubits, the data processing of the quantum circuit A can be divided into two stages. Stage 1: with qubits 0, 1, and 2 as local qubits and qubits 3 and 4 as global qubits, the quantum gates G1, G2, G3, and G4 can be processed in stage 1. Stage 2: with qubits 1, 3, and 4 as local qubits and qubits 0 and 2 as global qubits, the quantum gates G5, G6, and G7 can be processed in stage 2. Among them, G2 may be processed during the communication among the GPUs.
In order to re-divide the global qubits and the local qubits, the data needs to be rearranged when the second stage of data processing is performed. In the data rearrangement process, the target GPU of each data is determined by the bit of the new global qubit in the index. The rearrangement of the data is divided into the following steps:
(1) packaging all data to be sent to the same GPU in each GPU into a continuous fragment;
(2) sending the data belonging to other GPUs to the corresponding GPUs, and receiving the data belonging to the local GPU from the other GPUs.
Fig. 15 shows the workflow of the quantum circuit A from stage 1 to stage 2. Each GPU needs to send two data items to each of the other GPUs. The number of the target GPU is determined by the bits of the index at the positions of the new global qubits 0 and 2. For example, for the data item with index 00010, the bits of global qubits 0 and 2 are 00, so data item 00010 is sent to GPU0; for the data item with index 10110, the bits of global qubits 0 and 2 are 11, so data item 10110 is sent to GPU3. In the first step of Fig. 15, the data inside each GPU is transposed once so that data items with the same values in bits 0 and 2 are stored contiguously. In the second step of Fig. 15, the GPUs communicate to obtain the data to be processed, after which the second-stage data processing can be performed. When each GPU performs the second-stage data processing based on the 3 repartitioned local qubits 1, 3, and 4, the data belonging to the local GPU may be regarded as a quantum circuit whose qubits are the 3 repartitioned local qubits 1, 3, and 4, and the data belonging to the local GPU is processed according to steps S101 and S102. Each GPU can process the action of the quantum gate G2 on qubit 1 during communication.
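The packing and routing for the transition from stage 1 to stage 2 can be sketched as follows. This is only an illustration under assumptions: indices are bit strings with qubit 0 leftmost (consistent with the two examples above), the GPU number reads the global-qubit bits in increasing qubit order, and a single loop stands in for what would really run on separate devices. The names `gpu_of` and `pack_fragments` are hypothetical.

```python
from itertools import product


def gpu_of(index_bits, global_qubits):
    """GPU number from the global-qubit bits (assumed bit order, see lead-in)."""
    return int("".join(index_bits[q] for q in global_qubits), 2)


def pack_fragments(n, old_globals, new_globals):
    """Group every index by (source GPU, target GPU).

    Each group is one contiguous fragment that the source GPU sends to the
    target GPU in the communication step.
    """
    fragments = {}
    for bits in product("01", repeat=n):
        index = "".join(bits)
        src = gpu_of(index, old_globals)
        dst = gpu_of(index, new_globals)
        fragments.setdefault((src, dst), []).append(index)
    return fragments


fragments = pack_fragments(5, old_globals=(3, 4), new_globals=(0, 2))

# The two examples from the text: 00010 goes to GPU0, 10110 goes to GPU3.
print(gpu_of("00010", (0, 2)), gpu_of("10110", (0, 2)))
# Fragment sent from GPU1 to GPU3.
print(fragments[(1, 3)])
```

In this sketch every (source, target) pair yields a fragment of exactly two indices, which matches the statement that each GPU sends two data items to each of the other GPUs.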
It should be noted that if a qubit is a local qubit in both stage 1 and stage 2, the quantum gates applied to that qubit in the current stage may also be processed after the data reaches the target GPU, so these quantum gates can be processed during the second (communication) step of Fig. 15. If the indices of the two data items in a data pair differ only in the bit of that qubit, their current GPU and their target GPU are the same, so they are packed into the same fragment in the first step of Fig. 15 and sent together in the second step of Fig. 15. Once a fragment reaches the target GPU, processing of the quantum gates acting on the data pairs in the fragment can begin, so there is no need to wait until all communication has ended. For example, in the quantum circuit A, qubit 1 is a local qubit in both stages, so the quantum gate G2 acting on it can be processed during communication. As shown in Fig. 15, data items whose indices differ only in bit 1 are packed into the same fragment (marked with the same gray level in the figure) in the first step and sent to the same target GPU in the second step. Quantum gates acting on qubit 1 can be processed inside each fragment, so the processing of G2 on a fragment can begin as soon as the fragment reaches the target GPU.
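The following sketch illustrates why a gate on qubit 1 can be applied to a fragment as soon as it arrives. The fragment contents and amplitudes are made up for illustration, the bit-string convention (qubit 0 leftmost) is an assumption, and a Hadamard gate is used only as a stand-in because the text does not state the type of G2.

```python
import math


def differs_only_in(a, b, qubit):
    """True if bit strings a and b differ in exactly the given position."""
    return [i for i in range(len(a)) if a[i] != b[i]] == [qubit]


def apply_single_qubit_gate(gate, amp0, amp1):
    """Apply a 2x2 gate to the amplitude pair (bit = 0, bit = 1)."""
    (a, b), (c, d) = gate
    return a * amp0 + b * amp1, c * amp0 + d * amp1


# Hypothetical fragment: two indices with made-up amplitudes.
fragment = {"10101": 0.6, "11101": 0.8}
idx0, idx1 = sorted(fragment)  # the index whose qubit-1 bit is 0 sorts first
assert differs_only_in(idx0, idx1, qubit=1)

h = 1 / math.sqrt(2)
hadamard = ((h, h), (h, -h))   # stand-in for G2; its real type is not given
fragment[idx0], fragment[idx1] = apply_single_qubit_gate(
    hadamard, fragment[idx0], fragment[idx1])
```

Because both members of the pair travel in the same fragment, the update needs no data from any other fragment, which is what allows computation to overlap with the remaining communication.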
According to the quantum circuit simulation method provided by the embodiment of the present invention, when the quantum circuit is processed by multiple GPUs, the improvement in simulation efficiency comes from the reduction in communication volume brought by the partitioning and from the higher bandwidth of direct transmission among GPUs compared with collecting the data on the CPU and redistributing it. It also comes from the improved simulation efficiency of the individual divided sub-circuits.
Table 1. Versions of the existing quantum circuit simulators
[Table 1 is provided as image BDA0003113470370000231 in the original publication.]
In order to verify the efficiency of the quantum circuit simulation method provided by the embodiment of the present invention, a quantum circuit simulator (named HyQuas) built with this method and the existing quantum circuit simulators shown in Table 1 (QCGPU, Qibo, Qiskit, QuEST, Qulacs, and Yao) are used to simulate 7 commonly used quantum circuits (bc, bv, hs, qaoa, qft, qv, and sp) on GPUs; GeoMean denotes the geometric mean over these circuits. Information about the 7 quantum circuits is shown in Table 2.
Table 2. Information about the seven quantum circuits
[Table 2 is provided as image BDA0003113470370000241 in the original publication.]
The experimental platform included the following two clusters:
(1) V100 cluster: four V100-SXM2-16GB GPUs and two Intel Xeon E5-2620 v4 CPUs, connected using NVLink 2.0 (a high-speed point-to-point interconnect developed by NVIDIA). The compiler versions are GCC 8.3.0 and CUDA 10.2.89.
(2) A100 cluster: one A100-PCIE-40GB GPU and one AMD EPYC 7282 CPU. The compiler versions are GCC 8.3.0 and CUDA 11.0.2.
Fig. 16 shows the results, obtained on the A100 cluster, of simulating different quantum circuits on a single GPU with different quantum simulators. As can be seen from Fig. 16, the quantum simulator HyQuas provided by the embodiment of the present invention takes the shortest time to simulate the 7 different quantum circuits; compared with the prior-art quantum circuit simulators, its performance is improved by 2.20 times and by up to 9.82 times.
Fig. 17 shows the results, obtained on the V100 cluster, of simulating different quantum circuits on multiple GPUs with different quantum simulators. As can be seen from Fig. 17, the quantum simulator HyQuas provided by the embodiment of the present invention takes the shortest time to simulate the 7 different quantum circuits; compared with the prior-art quantum circuit simulators, its performance is improved by 2.73 times and by up to 10.71 times.
The quantum simulator HyQuas provided by the embodiment of the present invention and the existing quantum simulator Qibo were each used to simulate the 7 different quantum circuits on 1, 2, and 4 GPUs; the simulation results are shown in Fig. 18. In particular, with 4 GPUs, the performance of HyQuas is improved by 227 times compared with that of Qibo.
Fig. 19 is a schematic structural diagram of a quantum circuit simulation apparatus according to a nineteenth embodiment of the present invention. As shown in Fig. 19, the quantum circuit simulation apparatus provided by the embodiment of the present invention includes a divider 1 and an executor 2, where:
the divider 1 includes a hybrid division unit 11 and an execution time prediction unit 12; the hybrid division unit 11 is connected to the execution time prediction unit 12, and the hybrid division unit 11 is connected to the executor 2.
The hybrid division unit 11 is configured to divide the quantum circuit based on a hybrid division rule to obtain at least one divided sub-circuit and the division rule corresponding to each divided sub-circuit; the execution time prediction unit 12 is configured to predict the execution time of a divided sub-circuit.
The executor 2 is configured to obtain, according to the division rule corresponding to each divided sub-circuit, the corresponding simulation method and to simulate each divided sub-circuit, thereby obtaining the simulation result of the quantum circuit.
Fig. 20 is a schematic structural diagram of a quantum circuit simulation apparatus according to a twentieth embodiment of the present invention. As shown in Fig. 20, on the basis of the foregoing embodiments, the divider 1 further includes a stage partitioning unit 13, and the executor 2 includes a data packing unit 21 and a communication unit 22, where:
the stage partitioning unit 13 is connected to the executor 2, and the data packing unit 21 is connected to the communication unit 22.
The stage partitioning unit 13 is configured to divide the n qubits of the quantum circuit into g global qubits and n-g local qubits based on a preset partition rule, where the number of graphics processors is 2^g.
The executor 2 is further configured to evenly distribute the 2^n values of the quantum circuit among the 2^g graphics processors, so that each graphics processor performs first-stage data processing based on the n-g local qubits and performs data interaction, after which n-g local qubits are obtained again based on the preset partition rule for second-stage data processing. The data packing unit 21 is configured to pack the data to be exchanged. The communication unit 22 is configured to send the data to be exchanged.
Fig. 21 is a schematic structural diagram of a quantum circuit simulation apparatus according to a twenty-first embodiment of the present invention. As shown in Fig. 21, the quantum circuit simulation apparatus provided by the embodiment of the present invention includes a division module 2110 and a simulation module 2120, where:
the division module 2110 is configured to divide the quantum circuit based on a hybrid division rule to obtain at least one divided sub-circuit and the simulation method corresponding to each divided sub-circuit; the simulation module 2120 is configured to simulate each divided sub-circuit according to its corresponding simulation method to obtain the simulation result of the quantum circuit.
Specifically, for a quantum circuit to be simulated, the division module 2110 may divide the quantum circuit based on the hybrid division rule to obtain at least one divided sub-circuit; that is, one divided sub-circuit, or two or more divided sub-circuits, may be obtained. Each divided sub-circuit corresponds to one simulation method. Each divided sub-circuit includes at least one quantum gate; that is, a divided sub-circuit may include one quantum gate, or two or more quantum gates. The hybrid division rule is preset and comprises at least two division rules.
After the simulation method corresponding to a divided sub-circuit is obtained, the simulation module 2120 may simulate the divided sub-circuit using that simulation method to obtain a simulation result of the divided sub-circuit. The divided sub-circuits are simulated in sequence, each by its corresponding simulation method, and the result obtained is taken as the simulation result of the quantum circuit.
The quantum circuit simulation apparatus provided by the embodiment of the present invention can divide the quantum circuit based on the hybrid division rule to obtain at least one divided sub-circuit and the simulation method corresponding to each divided sub-circuit, and then simulate each divided sub-circuit according to its corresponding simulation method to obtain the simulation result of the quantum circuit. The simulation efficiency of the quantum circuit is improved by improving the simulation efficiency of each divided sub-circuit.
Fig. 22 is a schematic structural diagram of a quantum circuit simulation apparatus according to a twenty-second embodiment of the present invention. As shown in Fig. 22, on the basis of the foregoing embodiments, the hybrid division rule further includes a first division rule and a second division rule; accordingly, the division module 2110 includes a division submodule 21101 and a comparison submodule 21102, where:
the division submodule 21101 is configured to divide the remaining circuit according to the first division rule to obtain a first sub-circuit, and to divide the remaining circuit according to the second division rule to obtain a second sub-circuit, where the remaining circuit is obtained by removing the already divided sub-circuits from the quantum circuit; the comparison submodule 21102 is configured to compare the execution efficiency of the first sub-circuit with the execution efficiency of the second sub-circuit and to take whichever of the first sub-circuit and the second sub-circuit has the higher execution efficiency as the divided sub-circuit, where the execution efficiency of the first sub-circuit and the execution efficiency of the second sub-circuit are obtained in advance.
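The interplay of the division submodule and the comparison submodule can be sketched as a simple loop. The sketch is illustrative only: the two rule functions and their timing models are hypothetical placeholders, and execution efficiency is assumed here to be the number of quantum gates divided by the predicted execution time, a ratio the text does not spell out.

```python
def hybrid_divide(num_gates, rules):
    """Split gates 0..num_gates-1 into divided sub-circuits.

    At each step every rule proposes a candidate sub-circuit taken from the
    remaining circuit, and the candidate with the higher execution
    efficiency is kept together with the name of its rule.
    """
    remaining = list(range(num_gates))
    plan = []
    while remaining:
        candidates = []
        for name, rule in rules.items():
            sub, predicted_time = rule(remaining)
            efficiency = len(sub) / predicted_time  # assumed: gates per unit time
            candidates.append((efficiency, name, sub))
        _, name, sub = max(candidates, key=lambda c: c[0])
        plan.append((name, sub))
        remaining = [gate for gate in remaining if gate not in sub]
    return plan


# Hypothetical rules: each returns (gates taken from the front, predicted time).
def first_rule(remaining):
    sub = remaining[:3]
    return sub, 1.0 + 0.5 * len(sub)


def second_rule(remaining):
    sub = remaining[:5]
    return sub, 4.0


plan = hybrid_divide(7, {"first": first_rule, "second": second_rule})
```

With these made-up rules the first five gates go to the second rule (5/4.0 beats 3/2.5) and the last two gates go to the first rule (2/2.0 beats 2/4.0), so each divided sub-circuit carries the rule, and hence the simulation method, that suits it best.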
Fig. 23 is a schematic structural diagram of a quantum circuit simulation apparatus according to a twenty-third embodiment of the present invention. As shown in Fig. 23, on the basis of the foregoing embodiments, the division submodule 21101 further includes a first acquisition unit 211011, a first designation unit 211012, a first prediction unit 211013, and a first obtaining unit 211014, where:
the first acquisition unit 211011 is configured to obtain a first active qubit set from the qubits of the remaining circuit, where the first active qubit set includes a first number of active qubits, including the qubits corresponding to the second number of lowest dimensions of the quantum state, and the second number is less than the first number; the first designation unit 211012 is configured to take the quantum gates corresponding to the first active qubit set as the first sub-circuit; the first prediction unit 211013 is configured to predict the execution time of the first sub-circuit to obtain the predicted execution time of the first sub-circuit, and to count the number of quantum gates of the first sub-circuit; the first obtaining unit 211014 is configured to obtain the execution efficiency of the first sub-circuit based on the number of quantum gates and the predicted execution time of the first sub-circuit.
Fig. 24 is a schematic structural diagram of a quantum circuit simulation apparatus according to a twenty-fourth embodiment of the present invention. As shown in Fig. 24, on the basis of the foregoing embodiments, the first prediction unit 211013 includes a first query subunit 2110131, a second query subunit 2110132, and a first calculation subunit 2110133, where:
the first query subunit 2110131 is configured to look up the predicted execution time of each quantum gate of the first sub-circuit according to the type of each quantum gate of the first sub-circuit and the number of qubits of the quantum circuit; the second query subunit 2110132 is configured to look up a fixed overhead time according to the number of qubits of the quantum circuit; the first calculation subunit 2110133 is configured to calculate the sum of the predicted execution times of the quantum gates of the first sub-circuit and the fixed overhead time as the predicted execution time of the first sub-circuit.
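The lookup-and-sum prediction performed by the two query subunits and the calculation subunit can be written as a short sketch. The tables `GATE_TIME` and `FIXED_OVERHEAD` and all values in them are hypothetical; in practice they would presumably be filled by profiling. The efficiency formula (gates per predicted time unit) is likewise an assumption.

```python
# Hypothetical lookup tables; the numbers are made up for illustration.
GATE_TIME = {("H", 20): 0.10, ("CNOT", 20): 0.15, ("RZ", 20): 0.05}
FIXED_OVERHEAD = {20: 0.50}


def predict_first_subcircuit_time(gate_types, n):
    """Sum of per-gate predicted times plus the fixed overhead for n qubits."""
    per_gate = sum(GATE_TIME[(gate_type, n)] for gate_type in gate_types)
    return per_gate + FIXED_OVERHEAD[n]


def execution_efficiency(gate_types, n):
    """Assumed definition: number of gates divided by predicted time."""
    return len(gate_types) / predict_first_subcircuit_time(gate_types, n)


example = ["H", "CNOT", "RZ", "H"]
predicted = predict_first_subcircuit_time(example, 20)
```

Because the fixed overhead is paid once per sub-circuit, a sub-circuit with more gates tends to score a higher efficiency under this model.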
Fig. 25 is a schematic structural diagram of a quantum circuit simulation apparatus according to a twenty-fifth embodiment of the present invention. As shown in Fig. 25, on the basis of the foregoing embodiments, the simulation module 2120 includes a representation submodule 21201, a dividing submodule 21202, and a processing submodule 21203, where:
the representation submodule 21201 is configured to represent the divided sub-circuit by 2^n values, where n is the number of qubits included in the divided sub-circuit; the dividing submodule 21202 is configured to divide the 2^n values into 2^(n-k) segments, each of size 2^k, where k is the number of active qubits in the first active qubit set corresponding to the divided sub-circuit; the processing submodule 21203 is configured to process each segment with a thread block of the graphics processor, where each thread block processes one segment.
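A minimal sketch of the segmentation follows. It assumes, for illustration only, that qubit q corresponds to bit q of the integer index of a value, so that all indices within one segment agree on every inactive bit. The function name `split_into_segments` is hypothetical, and a Python list stands in for the work handed to one thread block.

```python
def split_into_segments(n, active_qubits):
    """Group the 2**n indices into 2**(n-k) segments of 2**k indices each.

    Assumption: qubit q is bit q of the integer index. Indices in one
    segment agree on every inactive bit and differ only in active bits.
    """
    active_mask = sum(1 << q for q in active_qubits)
    segments = {}
    for index in range(1 << n):
        key = index & ~active_mask  # the inactive bits identify the segment
        segments.setdefault(key, []).append(index)
    return list(segments.values())


# n = 4 qubits, k = 2 active qubits: 4 segments of 4 values each.
segments = split_into_segments(n=4, active_qubits=(0, 2))
```

A gate acting only on the active qubits mixes values within a segment but never across segments, which is why each segment can be handed to its own thread block.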
On the basis of the foregoing embodiments, the processing submodule 21203 is further specifically configured to:
divide the computing task corresponding to each segment among a preset number of threads for processing.
Fig. 26 is a schematic structural diagram of a quantum circuit simulation apparatus according to a twenty-sixth embodiment of the present invention. As shown in Fig. 26, on the basis of the foregoing embodiments, the processing submodule 21203 includes a second acquisition unit 212031, a calculation unit 212032, and a second obtaining unit 212033, where:
the second acquisition unit 212031 is configured to obtain the subscript increments between adjacent computing tasks of a thread, where each thread is assigned a plurality of computing tasks; the calculation unit 212032 is configured to calculate the subscript of the first computing task of the thread; the second obtaining unit 212033 is configured to calculate the sum of the subscript of the previous computing task of the thread and the subscript increment as the subscript of the current computing task, where the current computing task is any computing task after the first computing task.
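The incremental subscript computation can be sketched as follows. The function name and the example numbers are hypothetical; the only point illustrated is that the first subscript is computed once and every later subscript is obtained by adding a precomputed increment to the previous one.

```python
def task_subscripts(first_subscript, increments):
    """Subscripts of one thread's computing tasks, built incrementally.

    first_subscript: subscript of the thread's first computing task.
    increments: precomputed differences between adjacent tasks.
    """
    subscripts = [first_subscript]
    for step in increments:
        # Current subscript = previous subscript + subscript increment.
        subscripts.append(subscripts[-1] + step)
    return subscripts


# Made-up example: first task at subscript 3, then increments 4, 4, and 8.
example = task_subscripts(3, [4, 4, 8])
```

Replacing a full subscript calculation per task with a single addition is presumably what makes this arrangement cheaper inside a GPU thread.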
On the basis of the foregoing embodiments, further, the processing submodule 21203 is specifically configured to:
perform, according to the formula
[formula provided as image BDA0003113470370000281 in the original publication]
data remapping on the data of each segment to obtain the remapped data of each segment; where i denotes the original position of the i-th data item of each segment, j denotes the position of the i-th data item of each segment after remapping, the operator in the formula (image BDA0003113470370000282) denotes bitwise XOR, [expression provided as image BDA0003113470370000283 in the original publication], u is a constant, and w is the number of bits of the data type of the data of each segment.
Fig. 27 is a schematic structural diagram of a quantum circuit simulation apparatus according to a twenty-seventh embodiment of the present invention. As shown in Fig. 27, on the basis of the foregoing embodiments, the division submodule 21101 further includes a third acquisition unit 211015, a second designation unit 211016, a second prediction unit 211017, a third obtaining unit 211018, and a comparison unit 211019, where:
the third acquisition unit 211015 is configured to obtain a plurality of second active qubit sets from the qubits of the remaining circuit, where each second active qubit set includes m active qubits, and m is greater than or equal to a third number and less than or equal to a fourth number; the second designation unit 211016 is configured to take the quantum gates corresponding to each second active qubit set as a candidate second sub-circuit; the second prediction unit 211017 is configured to predict the execution time of each candidate second sub-circuit to obtain the execution time of each candidate second sub-circuit, and to count the number of quantum gates of each candidate second sub-circuit; the third obtaining unit 211018 is configured to obtain the execution efficiency of each candidate second sub-circuit based on its number of quantum gates and its execution time; the comparison unit 211019 is configured to compare the execution efficiencies of the candidate second sub-circuits and to take the candidate second sub-circuit with the highest execution efficiency as the second sub-circuit.
Fig. 28 is a schematic structural diagram of a quantum circuit simulation apparatus according to a twenty-eighth embodiment of the present invention. As shown in Fig. 28, on the basis of the foregoing embodiments, the third obtaining unit 211018 includes a third query subunit 2110181, a fourth query subunit 2110182, and a second calculation subunit 2110183, where:
the third query subunit 2110181 is configured to look up a first time according to the number of qubits of the quantum circuit; the fourth query subunit 2110182 is configured to look up a second time according to the number of qubits of the quantum circuit and the number of active qubits in the second active qubit set corresponding to the candidate second sub-circuit; the second calculation subunit 2110183 is configured to calculate the sum of the first time and the second time as the execution time of the candidate second sub-circuit.
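The time model for candidate second sub-circuits and the selection of the most efficient candidate can be sketched together. Both lookup tables and all of their values are hypothetical, the candidate list is made up, and efficiency is assumed to be the number of gates divided by the predicted time.

```python
# Hypothetical lookup tables; the numbers are made up for illustration.
FIRST_TIME = {20: 0.30}                                       # keyed by n
SECOND_TIME = {(20, 4): 0.20, (20, 5): 0.35, (20, 6): 0.90}   # keyed by (n, m)


def predict_second_subcircuit_time(n, m):
    """First time (depends on n) plus second time (depends on n and m)."""
    return FIRST_TIME[n] + SECOND_TIME[(n, m)]


def best_candidate(n, candidates):
    """Pick the candidate with the highest assumed execution efficiency.

    candidates: list of (m, number_of_gates) pairs, one per candidate
    second sub-circuit.
    """
    return max(candidates,
               key=lambda c: c[1] / predict_second_subcircuit_time(n, c[0]))


# Made-up candidates for m = 4, 5, and 6 active qubits.
chosen = best_candidate(20, [(4, 6), (5, 9), (6, 12)])
```

In this made-up example the candidate with m = 5 is chosen: a larger active qubit set covers more gates, but beyond some point the growth of the second time outweighs the extra gates.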
Fig. 29 is a schematic structural diagram of a quantum circuit simulation apparatus according to a twenty-ninth embodiment of the present invention. As shown in Fig. 29, on the basis of the foregoing embodiments, the simulation module 2120 includes an acquisition submodule 21204, a transposition submodule 21205, and an obtaining submodule 21206, where:
the acquisition submodule 21204 is configured to obtain the tensor corresponding to the divided sub-circuit; the transposition submodule 21205 is configured to transpose the tensor corresponding to the divided sub-circuit to obtain a quantum state matrix and a quantum gate matrix; the obtaining submodule 21206 is configured to obtain the simulation result of the divided sub-circuit according to the quantum state matrix, the quantum gate matrix, and a matrix operation library.
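The transpose-then-multiply idea can be illustrated in plain Python. This is a sketch under assumptions, not the apparatus itself: qubit q is taken to be bit q of the integer amplitude index, the function name is hypothetical, and a hand-written triple loop stands in for the call into a matrix operation library.

```python
def apply_gate_as_matmul(state, n, active_qubits, gate):
    """Apply a 2**k x 2**k gate matrix to the active qubits of an n-qubit state.

    Assumption: qubit q is bit q of the integer amplitude index.
    """
    k = len(active_qubits)
    inactive = [q for q in range(n) if q not in active_qubits]

    def index_of(row, col):
        # Scatter the bits of row/col to the active/inactive qubit positions.
        idx = 0
        for pos, q in enumerate(active_qubits):
            idx |= ((row >> pos) & 1) << q
        for pos, q in enumerate(inactive):
            idx |= ((col >> pos) & 1) << q
        return idx

    rows, cols = 1 << k, 1 << (n - k)
    # "Transpose": rearrange the state vector into a rows x cols matrix.
    matrix = [[state[index_of(r, c)] for c in range(cols)] for r in range(rows)]
    # Matrix product gate @ matrix (a matrix library would do this step).
    product = [[sum(gate[r][t] * matrix[t][c] for t in range(rows))
                for c in range(cols)] for r in range(rows)]
    # Scatter the result back into state-vector order.
    result = [0] * len(state)
    for r in range(rows):
        for c in range(cols):
            result[index_of(r, c)] = product[r][c]
    return result


pauli_x = [[0, 1], [1, 0]]
flipped = apply_gate_as_matmul([1, 0, 0, 0], n=2, active_qubits=[0], gate=pauli_x)
```

Once the state is laid out as a matrix whose rows are indexed by the active qubits, the whole sub-circuit reduces to one dense matrix product, which is the kind of operation that tuned matrix libraries handle well.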
Fig. 30 is a schematic structural diagram of a quantum circuit simulation apparatus according to a thirtieth embodiment of the present invention. As shown in Fig. 30, on the basis of the foregoing embodiments, the quantum circuit simulation apparatus provided by the embodiment of the present invention further includes a partitioning module 2130 and an allocation module 2140, where:
the partitioning module 2130 is configured to divide the quantum gates of the quantum circuit into quantum gates of multiple stages based on a preset partition rule and to divide, for each stage, the n qubits of the quantum circuit into g global qubits and n-g local qubits, where the number of graphics processors is 2^g; the allocation module 2140 is configured to evenly distribute the 2^n values of the quantum state among the 2^g graphics processors, so that each graphics processor performs first-stage data processing based on the n-g local qubits and performs data interaction for the data processing of the remaining stages.
The apparatus embodiments provided by the present invention may be used to execute the processing flows of the foregoing method embodiments; their functions are not described again here, and reference may be made to the detailed description of the method embodiments.
Fig. 31 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 31, the electronic device may include a processor 3101, a communications interface 3102, a memory 3103, and a communication bus 3104, where the processor 3101, the communications interface 3102, and the memory 3103 communicate with one another via the communication bus 3104. The processor 3101 may call logic instructions in the memory 3103 to perform the following method: dividing the quantum circuit based on a hybrid division rule to obtain at least one divided sub-circuit and the simulation method corresponding to each divided sub-circuit; and simulating each divided sub-circuit according to its corresponding simulation method to obtain the simulation result of the quantum circuit.
Furthermore, the logic instructions in the memory 3103 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method provided by the above-mentioned method embodiments, for example, comprising: dividing the quantum circuit based on a mixed division rule to obtain at least one division sub-circuit and a simulation method corresponding to each division sub-circuit; and simulating each divided sub-circuit according to the simulation method corresponding to each divided sub-circuit to obtain the simulation result of the quantum circuit.
The present embodiment provides a computer-readable storage medium, which stores a computer program, where the computer program causes the computer to execute the method provided by the above method embodiments, for example, the method includes: dividing the quantum circuit based on a mixed division rule to obtain at least one division sub-circuit and a simulation method corresponding to each division sub-circuit; and simulating each divided sub-circuit according to the simulation method corresponding to each divided sub-circuit to obtain the simulation result of the quantum circuit.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the description herein, reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (26)

1. A method of simulating a quantum circuit, comprising:
dividing the quantum circuit based on a mixed division rule to obtain at least one division sub-circuit and a simulation method corresponding to each division sub-circuit;
and simulating each divided sub-circuit according to the simulation method corresponding to each divided sub-circuit to obtain the simulation result of the quantum circuit.
2. The method of claim 1, wherein the mixed division rule comprises a first division rule and a second division rule; correspondingly, dividing the quantum circuit based on the mixed division rule to obtain at least one divided sub-circuit and the simulation method corresponding to each divided sub-circuit comprises:
dividing the remaining circuit according to the first division rule to obtain a first sub-circuit, and dividing the remaining circuit according to the second division rule to obtain a second sub-circuit; wherein the remaining circuit is obtained by removing from the quantum circuit the divided sub-circuits that have already been obtained;
comparing the execution efficiency of the first sub-circuit with the execution efficiency of the second sub-circuit, and taking whichever of the first sub-circuit and the second sub-circuit has the higher execution efficiency as the divided sub-circuit; wherein the execution efficiency of the first sub-circuit and the execution efficiency of the second sub-circuit are obtained in advance.
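The selection loop of claim 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `Candidate`, `hybrid_partition`, `toy_first_rule`, and `toy_second_rule` are hypothetical, the cost numbers are made up, and efficiency is assumed to be the number of gates divided by the predicted time (the claims only state that efficiency is based on these two quantities).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Candidate:
    gates: List[str]       # identifiers of the gates in this candidate sub-circuit
    predicted_time: float  # predicted execution time, arbitrary units
    method: str            # simulation method associated with the producing rule

    @property
    def efficiency(self) -> float:
        # Assumed metric: gates simulated per unit of predicted time.
        return len(self.gates) / self.predicted_time


Rule = Callable[[Sequence[str]], Candidate]


def hybrid_partition(circuit: Sequence[str], first_rule: Rule,
                     second_rule: Rule) -> List[Candidate]:
    """Greedily peel sub-circuits off the remaining circuit."""
    remaining = list(circuit)
    parts: List[Candidate] = []
    while remaining:
        a = first_rule(remaining)
        b = second_rule(remaining)
        # Keep the candidate with the higher execution efficiency.
        best = a if a.efficiency >= b.efficiency else b
        if not best.gates:
            raise ValueError("a rule must cover at least one gate")
        parts.append(best)
        covered = set(best.gates)
        remaining = [g for g in remaining if g not in covered]
    return parts


# Toy rules with made-up cost numbers, for illustration only.
def toy_first_rule(remaining: Sequence[str]) -> Candidate:
    gates = list(remaining[:3])
    return Candidate(gates, 1.0 + 0.5 * len(gates), "first")


def toy_second_rule(remaining: Sequence[str]) -> Candidate:
    gates = list(remaining[:2])
    return Candidate(gates, 1.5, "second")


parts = hybrid_partition(["g0", "g1", "g2", "g3", "g4"],
                         toy_first_rule, toy_second_rule)
print([(p.method, p.gates) for p in parts])
```

Each returned `Candidate` carries its own `method`, so the later simulation step can dispatch every divided sub-circuit to the simulation method of the rule that produced it.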
3. The method of claim 2, wherein the first division rule comprises:
obtaining a first active qubit set from the qubits of the remaining circuit, the first active qubit set comprising a first number of active qubits, the active qubits including the qubits corresponding to the second number of lowest dimensions of the quantum state; wherein the second number is less than the first number;
taking the quantum gates corresponding to the first active qubit set as the first sub-circuit;
performing execution time prediction on the first sub-circuit to obtain the predicted execution time of the first sub-circuit, and counting the number of quantum gates of the first sub-circuit;
obtaining an execution efficiency of the first sub-circuit based on the number of quantum gates and the predicted execution time of the first sub-circuit.
4. The method of claim 3, wherein performing execution time prediction on the first sub-circuit to obtain the predicted execution time of the first sub-circuit comprises:
obtaining a predicted execution time of each quantum gate of the first sub-circuit according to the type of each quantum gate of the first sub-circuit and the number of quantum bits of the quantum circuit;
looking up a fixed overhead time according to the number of qubits of the quantum circuit;
calculating a sum of the predicted execution time of each quantum gate of the first sub-circuit and the fixed overhead time as the predicted execution time of the first sub-circuit.
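The time model of claims 3 and 4 amounts to two table lookups and a sum. The Python sketch below illustrates it under stated assumptions: the function names, the table layout, and all timing values are hypothetical placeholders, and the efficiency is assumed to be the ratio of gate count to predicted time.

```python
from typing import Dict, Sequence, Tuple


def predict_first_subcircuit_time(
    gate_types: Sequence[str],
    n_qubits: int,
    gate_time_table: Dict[Tuple[str, int], float],
    fixed_overhead_table: Dict[int, float],
) -> float:
    """Sum of per-gate predicted times plus a fixed overhead.

    The per-gate time is looked up by (gate type, number of qubits of the
    whole circuit); the fixed overhead is looked up by number of qubits.
    """
    per_gate = sum(gate_time_table[(t, n_qubits)] for t in gate_types)
    return per_gate + fixed_overhead_table[n_qubits]


def execution_efficiency(num_gates: int, predicted_time: float) -> float:
    # Assumed definition: gates simulated per unit of predicted time.
    return num_gates / predicted_time


# Placeholder tables; real values would come from offline measurements.
gate_time_table = {("H", 28): 1.0, ("CNOT", 28): 2.0}
fixed_overhead_table = {28: 4.0}

gates = ["H", "CNOT", "H"]
t = predict_first_subcircuit_time(gates, 28, gate_time_table, fixed_overhead_table)
eff = execution_efficiency(len(gates), t)
print(t, eff)
```

Because both tables are keyed by the qubit count of the whole circuit, they can be filled once per machine and reused for every candidate sub-circuit.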
5. The method of claim 3, wherein simulating each of the partitioned sub-circuits according to the simulation methodology corresponding to each of the partitioned sub-circuits comprises:
representing the divided sub-circuit by 2^n values, where n is the number of qubits included in the divided sub-circuit;
dividing the 2^n values into 2^(n-k) segments, each segment having a size of 2^k; wherein k is the number of active qubits in the first active qubit set corresponding to the divided sub-circuit;
processing each segment by a thread block of a graphics processor; wherein each thread block processes one segment.
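One way to read the segmentation in claim 5 is that a segment collects the 2^k values that agree on all inactive qubits and differ only on the k active ones. The Python sketch below computes the global indices of such a segment. It is an assumption-laden illustration: `deposit_bits` and `segment_indices` are hypothetical names, qubit q is taken to be bit q of the index, and a plain loop stands in for the thread blocks.

```python
from typing import List, Sequence


def deposit_bits(value: int, positions: Sequence[int]) -> int:
    """Scatter the low bits of `value` onto the given bit positions."""
    out = 0
    for bit, pos in enumerate(positions):
        if (value >> bit) & 1:
            out |= 1 << pos
    return out


def segment_indices(n: int, active: Sequence[int], seg_id: int) -> List[int]:
    """Global indices of the 2^k values belonging to segment `seg_id`.

    `active` lists the k active qubits; the remaining n - k qubits are
    fixed by the bits of `seg_id`.
    """
    active = sorted(active)
    inactive = [q for q in range(n) if q not in active]
    base = deposit_bits(seg_id, inactive)
    return [base | deposit_bits(local, active) for local in range(1 << len(active))]


n, active = 4, [0, 2]
segments = [segment_indices(n, active, s) for s in range(1 << (n - len(active)))]
for seg_id, seg in enumerate(segments):
    # In a GPU implementation, one thread block would handle this segment.
    print(seg_id, seg)
```

Every gate that touches only active qubits maps a segment onto itself, which is why a segment can be processed by a single thread block without reading data from other segments.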
6. The method of claim 5, wherein processing each fragment by a thread block of a graphics processor comprises:
assigning the computing tasks corresponding to each segment to a preset number of threads for processing.
7. The method of claim 5, wherein processing each fragment by a thread block of a graphics processor comprises:
acquiring subscript increment of adjacent computing tasks of the thread; wherein each thread is assigned a plurality of computing tasks;
calculating the subscript of the first calculation task of the thread;
calculating the sum of the subscript of the previous computing task of the thread and the subscript increment as the subscript of the current computing task; wherein the current computing task refers to a computing task subsequent to the first computing task.
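The incremental subscript computation of claim 7 can be illustrated with a short Python generator. The layout is an assumption made for the example: the first task of a thread is taken to have the thread's own id as its subscript, and the increment is taken to be the number of threads. Only the pattern "compute the first subscript once, then add a fixed increment" comes from the claim.

```python
from typing import Iterator


def thread_task_subscripts(thread_id: int, num_threads: int,
                           tasks_per_thread: int) -> Iterator[int]:
    """Yield the task subscripts handled by one thread."""
    subscript = thread_id    # subscript of the first task (assumed layout)
    increment = num_threads  # assumed increment between adjacent tasks
    for _ in range(tasks_per_thread):
        yield subscript
        # Each later subscript is the previous one plus the increment.
        subscript += increment


print(list(thread_task_subscripts(3, 32, 4)))

# Four threads with four tasks each cover a 16-element segment exactly once.
covered = sorted(s for t in range(4) for s in thread_task_subscripts(t, 4, 4))
```

Only the first subscript needs a full computation; every later one costs a single addition per task.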
8. The method of claim 5, wherein processing each fragment by a thread block of a graphics processor comprises:
according to the formula
Figure FDA0003113470360000021
performing data remapping on the data of each segment to obtain the remapped data of each segment; wherein i denotes the original position of the i-th data item of each segment, j denotes the position of the i-th data item of each segment after remapping,
Figure FDA0003113470360000022
denotes the bitwise exclusive-OR (XOR) operation,
Figure FDA0003113470360000023
u is a constant, and w is the number of bits corresponding to the data type of the data of each segment.
9. The method of claim 2, wherein the second division rule comprises:
obtaining a plurality of second active qubit sets from the qubits of the remaining circuit, each second active qubit set comprising m active qubits, m being greater than or equal to a third number and less than or equal to a fourth number;
taking the quantum gates corresponding to each second active qubit set as a candidate second sub-circuit;
performing execution time prediction on each candidate second sub-circuit to obtain the execution time of each candidate second sub-circuit, and counting the number of quantum gates of each candidate second sub-circuit;
obtaining an execution efficiency of each candidate second sub-circuit based on the number and execution time of the quantum gates of each candidate second sub-circuit;
and comparing the execution efficiency of each candidate second sub-circuit, and acquiring the candidate second sub-circuit with the highest execution efficiency as the second sub-circuit.
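The candidate search of claim 9 can be sketched in Python as below. Several points are assumptions made for the illustration: the candidate sets are enumerated exhaustively (a practical implementation would likely prune the search), the gates "corresponding to" an active set are taken to be those that act only on active qubits and are not blocked by an earlier skipped gate, and efficiency is the gate count divided by the predicted time. The names `gates_within` and `best_second_subcircuit` are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Set, Tuple

Gate = Tuple[int, ...]  # a gate is described by the qubits it acts on


def gates_within(remaining: Sequence[Gate], active: Set[int]) -> List[int]:
    """Positions of gates executable using only the active qubits."""
    taken: List[int] = []
    blocked: Set[int] = set()  # qubits touched by an earlier skipped gate
    for idx, qubits in enumerate(remaining):
        qs = set(qubits)
        if qs <= active and not (qs & blocked):
            taken.append(idx)
        else:
            blocked |= qs
    return taken


def best_second_subcircuit(
    remaining: Sequence[Gate], n_qubits: int, m_min: int, m_max: int,
    predict_time: Callable[[int, int], float],
) -> Optional[Tuple[float, Tuple[int, ...], List[int]]]:
    """Return (efficiency, active set, gate positions) of the best candidate."""
    best = None
    for m in range(m_min, m_max + 1):
        for active in combinations(range(n_qubits), m):
            gate_ids = gates_within(remaining, set(active))
            if not gate_ids:
                continue
            eff = len(gate_ids) / predict_time(n_qubits, m)
            if best is None or eff > best[0]:
                best = (eff, active, gate_ids)
    return best


remaining = [(0,), (0, 1), (2,), (1, 2), (0,)]
# Made-up time model: cost grows with the number of active qubits.
best = best_second_subcircuit(remaining, 3, 1, 2, lambda n, m: 1.0 + m)
print(best)
```

The `blocked` set preserves gate order: once a gate is skipped, no later gate sharing a qubit with it may be pulled into the candidate.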
10. The method of claim 9, wherein performing the execution time prediction for each candidate second sub-circuit comprises:
looking up a first time according to the number of qubits of the quantum circuit;
looking up a second time according to the number of qubits of the quantum circuit and the number of active qubits in the second active qubit set corresponding to the candidate second sub-circuit;
and calculating the sum of the first time and the second time as the execution time of the candidate second sub-circuit.
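The time prediction of claim 10 reduces to two lookups, which the following Python sketch makes explicit. The function name, the table layout, and the numeric values are hypothetical; the claim does not say what each of the two times physically measures, so no interpretation is attached to them here.

```python
from typing import Dict, Tuple


def predict_second_time(
    n_qubits: int,
    m_active: int,
    first_time_table: Dict[int, float],
    second_time_table: Dict[Tuple[int, int], float],
) -> float:
    """First time (keyed by n) plus second time (keyed by n and m)."""
    return first_time_table[n_qubits] + second_time_table[(n_qubits, m_active)]


# Placeholder tables; real entries would be measured in advance.
first_time_table = {28: 3.0}
second_time_table = {(28, 5): 1.5, (28, 6): 2.5}

t5 = predict_second_time(28, 5, first_time_table, second_time_table)
t6 = predict_second_time(28, 6, first_time_table, second_time_table)
print(t5, t6)
```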
11. The method of claim 9, wherein the simulating each of the divided sub-circuits according to the simulation method corresponding to each of the divided sub-circuits, and obtaining the simulation result of the quantum circuit comprises:
acquiring tensors corresponding to the segmentation sub-circuits;
transposing the tensors corresponding to the dividing sub-circuits to obtain a quantum state matrix and a quantum gate matrix;
and obtaining the simulation result of the segmentation sub-circuit according to the quantum state matrix, the quantum gate matrix and the matrix operation library.
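The transpose-then-multiply idea of claim 11 can be illustrated with NumPy standing in for the matrix operation library. This is a sketch under assumptions, not the patented implementation: qubit q is taken to be bit q of the state index, the gates of the sub-circuit are assumed to be already fused into a single `gate` matrix, and `apply_fused_gate` is a hypothetical name.

```python
import numpy as np


def apply_fused_gate(state: np.ndarray, n: int, active: list,
                     gate: np.ndarray) -> np.ndarray:
    """Apply a 2^m x 2^m gate matrix to the `active` qubits of an n-qubit state.

    Bit j of the gate's local index corresponds to qubit active[j].
    """
    m = len(active)
    tensor = state.reshape((2,) * n)  # axis a corresponds to qubit n - 1 - a
    # Active axes go last, most significant local bit first.
    active_axes = [n - 1 - q for q in reversed(active)]
    other_axes = [a for a in range(n) if a not in active_axes]
    perm = other_axes + active_axes
    # Quantum state matrix: one row per assignment of the inactive qubits.
    matrix = tensor.transpose(perm).reshape(2 ** (n - m), 2 ** m)
    # One matrix product applies the gate to every row at once.
    matrix = matrix @ gate.T
    # Undo the transposition to restore the original index order.
    out = matrix.reshape((2,) * n).transpose(np.argsort(perm))
    return out.reshape(-1)


X = np.array([[0, 1], [1, 0]], dtype=complex)
state = np.zeros(4, dtype=complex)
state[0] = 1.0
flipped = apply_fused_gate(state, 2, [1], X)  # X on qubit 1 of |00>

# CNOT with control active[0] and target active[1], local index = c + 2*t.
CNOT = np.zeros((4, 4), dtype=complex)
CNOT[0, 0] = CNOT[2, 2] = CNOT[3, 1] = CNOT[1, 3] = 1.0
state3 = np.zeros(8, dtype=complex)
state3[1] = 1.0  # qubit 0 set
result3 = apply_fused_gate(state3, 3, [0, 2], CNOT)
print(np.nonzero(flipped)[0], np.nonzero(result3)[0])
```

Moving the active qubits to the trailing axes turns the update into a single dense matrix product, which is the kind of operation optimized matrix libraries handle well.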
12. The method of any one of claims 1 to 11, further comprising:
dividing the quantum gates of the quantum circuit into quantum gates of a plurality of stages based on a preset division rule, and dividing the n qubits of the quantum circuit into g global qubits and n-g local qubits for each stage; wherein the number of graphics processors is 2^g;
evenly distributing the 2^n values of the quantum state among the 2^g graphics processors, so that each graphics processor performs the first stage of data processing based on the n-g local qubits and performs data interaction to carry out the data processing of the remaining stages.
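The distribution in claim 12 can be pictured with a small Python sketch. It assumes, for illustration only, that the g global qubits are the g highest-order bits of the state index, so that the device number is given by those bits and the local offset by the remaining n-g bits. The function names are hypothetical, and Python lists stand in for device memory.

```python
from typing import List, Sequence, Tuple


def split_index(i: int, n: int, g: int) -> Tuple[int, int]:
    """Map a global state index to (device id, local offset)."""
    local_bits = n - g
    return i >> local_bits, i & ((1 << local_bits) - 1)


def shard_state(amplitudes: Sequence, n: int, g: int) -> List[Sequence]:
    """Evenly split 2^n values into 2^g shards of 2^(n-g) values each."""
    size = 1 << (n - g)
    return [amplitudes[d * size:(d + 1) * size] for d in range(1 << g)]


def needs_exchange(gate_qubits: Sequence[int], n: int, g: int) -> bool:
    """A gate touching a global qubit cannot run on local data alone."""
    return any(q >= n - g for q in gate_qubits)


n, g = 4, 1
shards = shard_state(list(range(1 << n)), n, g)
print(split_index(13, n, g), [len(s) for s in shards])
print(needs_exchange((0, 2), n, g), needs_exchange((3,), n, g))
```

Gates on local qubits can be applied by every device independently; a stage boundary is where devices would exchange data so that a different set of qubits becomes local.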
13. A simulation device for a quantum circuit, comprising:
a dividing module, configured to divide the quantum circuit based on a mixed division rule to obtain at least one divided sub-circuit and a simulation method corresponding to each divided sub-circuit;
and a simulation module, configured to simulate each divided sub-circuit according to the simulation method corresponding to each divided sub-circuit to obtain the simulation result of the quantum circuit.
14. The apparatus of claim 13, wherein the mixed division rule comprises a first division rule and a second division rule; correspondingly, the dividing module comprises:
a division submodule, configured to divide the remaining circuit according to the first division rule to obtain a first sub-circuit, and to divide the remaining circuit according to the second division rule to obtain a second sub-circuit; wherein the remaining circuit is obtained by removing from the quantum circuit the divided sub-circuits that have already been obtained;
a comparison submodule, configured to compare the execution efficiency of the first sub-circuit with the execution efficiency of the second sub-circuit, and to take whichever of the first sub-circuit and the second sub-circuit has the higher execution efficiency as the divided sub-circuit; wherein the execution efficiency of the first sub-circuit and the execution efficiency of the second sub-circuit are obtained in advance.
15. The apparatus of claim 14, wherein the division submodule comprises:
a first obtaining unit, configured to obtain a first active qubit set from the qubits of the remaining circuit, wherein the first active qubit set comprises a first number of active qubits, and the active qubits include the qubits corresponding to the second number of lowest dimensions of the quantum state; wherein the second number is less than the first number;
a first acting unit, configured to take the quantum gates corresponding to the first active qubit set as the first sub-circuit;
the first prediction unit is used for predicting the execution time of the first sub-circuit, obtaining the predicted execution time of the first sub-circuit and counting the number of quantum gates of the first sub-circuit;
a first obtaining unit for obtaining an execution efficiency of the first sub-circuit based on a number of quantum gates and a predicted execution time of the first sub-circuit.
16. The apparatus of claim 15, wherein the first obtaining unit comprises:
a first query subunit, configured to obtain a predicted execution time of each quantum gate of the first sub-circuit according to a type of each quantum gate of the first sub-circuit and a number of qubits of the quantum circuit;
the second inquiry subunit is used for inquiring and obtaining the fixed overhead time according to the number of the quantum bits of the quantum circuit;
a first calculating sub-unit for calculating a sum of the predicted execution time of each quantum gate of the first sub-circuit and the fixed overhead time as the predicted execution time of the first sub-circuit.
17. The apparatus of claim 15, wherein the simulation module comprises:
a representation submodule, configured to represent the divided sub-circuit by 2^n values, where n is the number of qubits included in the divided sub-circuit;
a partitioning submodule, configured to divide the 2^n values into 2^(n-k) segments, each segment having a size of 2^k; wherein k is the number of active qubits in the first active qubit set corresponding to the divided sub-circuit;
a processing submodule, configured to process each segment by a thread block of the graphics processor; wherein each thread block processes one segment.
18. The apparatus of claim 17, wherein the processing submodule is specifically configured to:
and dividing the calculation task corresponding to each segment into a preset number of threads for processing.
19. The apparatus of claim 17, wherein the processing submodule comprises:
the second acquisition unit is used for acquiring subscript increments of adjacent calculation tasks of the threads; wherein each thread is assigned a plurality of computing tasks;
a computing unit for computing a subscript of a first computing task of the thread;
a second obtaining unit, configured to calculate the sum of the subscript of the previous computing task of the thread and the subscript increment as the subscript of the current computing task; wherein the current computing task refers to a computing task subsequent to the first computing task.
20. The apparatus of claim 17, wherein the processing submodule is specifically configured to:
according to the formula
Figure FDA0003113470360000051
perform data remapping on the data of each segment to obtain the remapped data of each segment; wherein i denotes the original position of the i-th data item of each segment, j denotes the position of the i-th data item of each segment after remapping,
Figure FDA0003113470360000052
denotes the bitwise exclusive-OR (XOR) operation,
Figure FDA0003113470360000053
u is a constant, and w is the number of bits corresponding to the data type of the data of each segment.
21. The apparatus of claim 14, wherein the partitioning sub-module comprises:
a third obtaining unit, configured to obtain a plurality of second active qubit sets from the qubits of the remaining circuits, where each of the second active qubit sets includes m active qubits, and m is greater than or equal to a third number and less than or equal to a fourth number;
a second acting unit, configured to take the quantum gate corresponding to each second active quantum bit set as a candidate second sub-circuit;
the second prediction unit is used for performing execution time prediction on each candidate second sub-circuit, obtaining the execution time of each candidate second sub-circuit and counting the number of quantum gates of each candidate second sub-circuit;
a third obtaining unit configured to obtain an execution efficiency of each candidate second sub-circuit based on the number of quantum gates and the execution time of each candidate second sub-circuit;
and the comparison unit is used for comparing the execution efficiency of each candidate second sub-circuit and acquiring the candidate second sub-circuit with the highest execution efficiency as the second sub-circuit.
22. The apparatus of claim 21, wherein the third obtaining unit comprises:
the third inquiry subunit is used for inquiring and obtaining the first time according to the number of the quantum bits of the quantum circuit;
the fourth inquiry subunit is used for inquiring and obtaining second time according to the number of the quantum bits of the quantum circuit and the number of the active quantum bits in the second active quantum bit set corresponding to the candidate second sub-circuit;
and the second calculating subunit is used for calculating the sum of the first time and the second time as the execution time of the candidate second sub-circuit.
23. The apparatus of claim 21, wherein the simulation module comprises:
the acquisition submodule is used for acquiring tensors corresponding to the segmentation subcircuits;
the transposition submodule is used for transposing the tensors corresponding to the dividing subcircuits to obtain a quantum state matrix and a quantum gate matrix;
and the obtaining submodule is used for obtaining a simulation result of the dividing sub-circuit according to the quantum state matrix, the quantum gate matrix and the matrix operation library.
24. The apparatus of any one of claims 13 to 23, further comprising:
a dividing module, configured to divide the quantum gates of the quantum circuit into quantum gates of a plurality of stages based on a preset division rule, and to divide the n qubits of the quantum circuit into g global qubits and n-g local qubits for each stage; wherein the number of graphics processors is 2^g;
an assignment module, configured to evenly distribute the 2^n values of the quantum state among the 2^g graphics processors, so that each graphics processor performs the first stage of data processing based on the n-g local qubits and performs data interaction to carry out the data processing of the remaining stages.
25. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 12 are implemented by the processor when executing the computer program.
26. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 12.
CN202110657016.XA 2021-06-11 2021-06-11 Quantum circuit simulation method and device Pending CN113569511A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110657016.XA CN113569511A (en) 2021-06-11 2021-06-11 Quantum circuit simulation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110657016.XA CN113569511A (en) 2021-06-11 2021-06-11 Quantum circuit simulation method and device

Publications (1)

Publication Number Publication Date
CN113569511A true CN113569511A (en) 2021-10-29

Family

ID=78162030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110657016.XA Pending CN113569511A (en) 2021-06-11 2021-06-11 Quantum circuit simulation method and device

Country Status (1)

Country Link
CN (1) CN113569511A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115470912A (en) * 2022-03-16 2022-12-13 合肥本源量子计算科技有限责任公司 Quantum task processing device and method and quantum computer
CN115470912B (en) * 2022-03-16 2024-04-05 本源量子计算科技(合肥)股份有限公司 Quantum task processing device and method and quantum computer
CN115130675A (en) * 2022-09-02 2022-09-30 之江实验室 Multi-amplitude simulation method and device of quantum random circuit
CN115130675B (en) * 2022-09-02 2023-01-24 之江实验室 Multi-amplitude simulation method and device of quantum random circuit
CN116127896A (en) * 2023-04-14 2023-05-16 北京芯愿景软件技术股份有限公司 Method, apparatus, device, medium and product for digitally modeling analog circuits
CN117236457A (en) * 2023-11-13 2023-12-15 国开启科量子技术(安徽)有限公司 Method, system and electronic device for operating and using quantum simulator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination