WO2021168798A1

WO2021168798A1 - Training method for quantum boltzmann machine, and hybrid computer

Info

Publication number: WO2021168798A1
Application number: PCT/CN2020/077208
Authority: WO
Inventors: 张文
Original assignee: 华为技术有限公司
Priority date: 2020-02-28
Filing date: 2020-02-28
Publication date: 2021-09-02
Also published as: CN114730385A

Abstract

Provided are a training method for a quantum Boltzmann machine, and a hybrid computer, relating to the field of quantum computing. The method can be applied to semi-supervised learning, and comprises: acquiring a first loss function of a quantum Boltzmann machine; acquiring a first partial derivative of the first loss function with respect to a predetermined parameter of the Hamiltonian of the quantum Boltzmann machine, wherein the predetermined parameter comprises the connection weight of two quantum units in the quantum Boltzmann machine or the bias of the quantum units; and executing a gradient algorithm on the first partial derivative to update the predetermined parameter, and acquiring an updated quantum Boltzmann machine, wherein the Hamiltonian of the updated quantum Boltzmann machine uses the updated predetermined parameter.

Description

A training method of quantum Boltzmann machine and hybrid computer

Technical field

This application relates to the field of quantum computing, in particular to a training method of a quantum Boltzmann machine and a hybrid computer.

Background technique

Quantum machine learning uses the high parallelism of quantum computing to achieve the purpose of further optimizing traditional machine learning. Among them, the quantum Boltzmann machine is a typical quantum machine learning model. At present, the model structure of the quantum Boltzmann machine with supervised learning and the quantum Boltzmann machine with unsupervised learning is not uniform, so it cannot be used for semi-supervised learning.

Summary of the invention

This application provides a method for training a quantum Boltzmann machine and a hybrid computer, which can be used for semi-supervised learning.

In order to achieve the foregoing objectives, the following technical solutions are adopted in the embodiments of this application:

In the first aspect, a training method of a quantum Boltzmann machine is provided. It includes the following steps: Obtain the first loss function of the quantum Boltzmann machine, where the model structure of the quantum Boltzmann machine includes the first layer and the second layer; the quantum unit of the first layer is used to assign the input of the labeled sample Sample, the quantum unit of the second layer is used to assign the output sample of the labeled sample; or the quantum unit of the first layer is used to assign the input sample of the unlabeled sample; the quantum unit of the first layer is fully connected with the quantum unit of the second layer ; The first loss function=α*the second loss function+β*the third loss function, where the second loss function is obtained by calculating the negative logarithmic conditional likelihood of the conditional probability of the output sample under the condition of the input sample of the labeled sample. The three loss function is obtained by calculating the negative logarithmic conditional likelihood of the marginal probability of the input sample of the unlabeled sample; where α and β are constants, and usually it needs to determine the value according to the characteristics of the sample data set. One example is α∈[0,1], β∈[0,1]; obtain the first partial derivative of the first loss function with respect to the predetermined parameter of the quantum Boltzmann machine's Hamiltonian, the predetermined parameter includes the quantum Boltz The connection weight of the two quantum units in the Mann machine or the bias of the quantum unit; perform a gradient algorithm on the first partial derivative to update the predetermined parameters to obtain an updated quantum Boltzmann machine, where the updated quantum Boltzmann machine The Hamiltonian uses the updated predetermined parameter. In the above solution, the model structure of the quantum Boltzmann machine includes a first layer and a second layer; among them, the quantum unit of the first layer is used to assign the input sample of the labeled sample, and the quantum unit of the second layer is used for Output samples assigned with labeled samples, or input samples used to assign unlabeled samples in the first layer; quantum units in the first layer are fully connected with quantum units in the second layer; models for supervised and unsupervised learning The structure is the same, and the total number of qubits required is the same. In addition, the loss function of the training quantum Boltzmann machine adopts the negative logarithmic conditional likelihood of the conditional probability of the output sample under the condition of the input sample of the labeled sample, and the negative logarithmic conditional likelihood of the marginal probability of the input sample of the unlabeled sample. However, the quantum Boltzmann machine obtained by training can be adapted to semi-supervised learning according to a certain ratio.

In a possible design, the calculation method of the second loss function and the third loss function are also provided; the calculation method of the second loss function is as follows: according to the conditional probability of the output sample under the condition of the labeled sample, the negative The logarithmic conditional likelihood is calculated to obtain the supervised learning loss function; the supervised learning loss function is converted into the second loss function using Gordon-Thompson Golden-Thompson inequality. The calculation method of the third loss function is as follows: perform negative logarithmic conditional likelihood calculation according to the marginal probability of the input sample of the unlabeled sample to obtain the loss function of unsupervised learning; use Gordon for the loss function of unsupervised learning -Thompson Golden-Thompson inequality is converted into the third loss function. Among them, since the form of the supervised learning loss function and the unsupervised learning loss function obtained through likelihood calculation will increase the computational complexity of the subsequent processing, the Golden-Thompson inequality conversion is performed on both here.

In a possible design, the first partial derivative is expressed as a polynomial, and the method further includes: determining a predetermined sample from a sample data set, the predetermined sample including the labeled sample or the unlabeled sample; preparing The first quantum state of the predetermined sample; the quantum approximation optimization QAOA algorithm is performed on the first quantum state to obtain the second quantum state; the second partial derivative of the Hamiltonian with respect to the predetermined parameter is measured for the second quantum state as The term of the first partial derivative. In the above process, the predetermined sample determined from the sample data set can be directly processed by the digital computer, and the subsequent processes that need to be processed in the quantum state can all be completed by the quantum computer.

In a possible design, the method further includes: calculating a first average value of the M second partial derivatives obtained by the predetermined sample, and using the first average value as a term of the first partial derivative. In order to improve the accuracy of calculation, M second partial derivatives can be calculated for the same predetermined sample, and the larger the value of M, the higher the calculation accuracy.

In a possible design, the method further includes: calculating a second average value of the second partial derivatives corresponding to the N samples obtained in the sample data set, and using the second average value as the term of the first partial derivative.

In a possible design, when the quantum unit of the first layer is used to assign the input sample of the labeled sample, and the quantum unit of the second layer is used to assign the output sample of the labeled sample, the first layer And the second layer is a visible layer; or, when the quantum unit of the first layer is used to assign input samples of unmarked samples, the first layer is the visible layer, and the second layer is the hidden layer. In this way, for the supervised learning of labeled samples, the first and second layers of the quantum Boltzmann machine are all visible layers, and the input and output are used as the visible layer, and there is no additional hidden layer. For unsupervised learning of unlabeled samples, based on the previous model, the second layer of the output sample is changed from the visible layer to the hidden layer, and no additional hidden layer is introduced. Ensure the unification of the supervised learning model and the unsupervised learning model.

In the second aspect, a hybrid computer is provided for implementing the above-mentioned various methods. The hybrid computer includes modules, units, or means corresponding to the foregoing methods, and the modules, units, or means can be implemented by hardware, software, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the above-mentioned functions; for example, a hybrid computer may include a quantum computer and a digital computer for implementing the above-mentioned method.

In a third aspect, a hybrid computer is provided, including: a processor and a memory; the memory is used to store computer instructions, and when the processor executes the instructions, the hybrid computer can execute the method of any one of the foregoing aspects.

In a fourth aspect, a hybrid computer is provided, including: a processor; the processor is configured to be coupled to a memory, and after reading an instruction in the memory, execute the method according to any one of the above aspects according to the instruction.

In a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer, enable the computer to execute the method in any of the above aspects.

In a sixth aspect, a computer program product containing instructions is provided, which when running on a computer, enables the computer to execute the method in any of the above aspects.

In a seventh aspect, a hybrid computer is provided (for example, the hybrid computer may be a chip or a chip system), and the hybrid computer includes a processor for implementing the functions involved in any of the above aspects. In a possible design, the hybrid computer further includes a memory for storing necessary program instructions and data. When the hybrid computer is a chip system, it may be composed of chips, or may include chips and other discrete devices.

Among them, the technical effects brought about by any one of the design methods of the second aspect to the seventh aspect can be referred to the technical effects brought about by the different design methods of the above-mentioned first aspect, which will not be repeated here.

Description of the drawings

FIG. 1 is a schematic structural diagram of a hybrid computer provided by an embodiment of this application;

2 is a schematic flowchart of a training method of a quantum Boltzmann machine provided by an embodiment of the application;

FIG. 3 is a schematic structural diagram of a quantum Boltzmann machine provided by an embodiment of the application;

4 is a schematic structural diagram of a quantum Boltzmann machine provided by an embodiment of the application;

FIG. 5 is a schematic structural diagram of a hybrid computer provided by another embodiment of this application.

Detailed ways

First, the technical terms used in the embodiments of this application are described as follows:

Supervised learning: The training data uses labeled samples, and the training data has both features and labels. Usually the input samples are the existing features and the output samples are the labels. Through training, the machine can find the features and labels by itself When facing data with only features and no labels, the label can be judged.

Unsupervised learning: The training data uses unlabeled samples, usually only input samples, and the label information of the input samples is unknown. The goal is to reveal the inherent properties and laws of the data through the learning of unlabeled samples, which is further data Analysis provides the basis. Among such learning tasks, clustering is the most studied and widely used. Other unsupervised algorithms include density estimation, anomaly detection, and so on.

Semi-supervised learning: The training data contains both labeled samples and unlabeled samples, without manual intervention, so that the machine does not rely on external interaction and automatically uses unlabeled samples to improve learning performance, which is semi-supervised learning.

Quantum computing, quantum computing uses the principles of quantum mechanics to perform general-purpose calculations. Classical computers (digital computers) use 0 and 1 to encode, store, and process data in binary, with each bit taking the value 0 or 1. Quantum computing is based on the manipulation of qubits. Each qubit can be in the superposition of the quantum state |0> state and |1> state. N qubits can be in the superposition state of 2N quantum states (|0...0>, |0...1>,..., |1...0>, |1...1> states) ( like

(|0...0>+|0...1>+...+|1...0>+|1...1>)), the manipulation of the superposition state is equivalent to the simultaneous action of the operation In these 2N states, quantum computers have powerful quantum parallel computing capabilities.

Boltzmann machine, Boltzmann machine is a neural network model. It contains two sets of variables: hidden variables and visible variables, all variables are binary (take 0 or 1). A Boltzmann machine with N variables satisfies the following three properties: 1. All variables (samples) can be represented by a binary random vector x∈{0,1} ^N ; 2. All variables are fully connected Yes, the value of each variable depends on all other variables; 3. The influence relationship between the variables is pairwise symmetrical. The joint probability of the variable X conforms to the Boltzmann distribution, P(x)=(1/Z)exp(-E(x)), where Z is the partition function Z=Σ _x exp(-E(x)), energy Function E(x)=-(Σ _i<j w _ij x _i x _j +Σ _i b _i x _i ), where w _ij is the connection weight between two variables x _i and x _j _{, x i} ∈ {0 , 1} denotes a state variable, is a variable x _i B _i is the offset. The loss function used in Boltzmann machine parameter training is negative log likelihood

Where v represents a visible variable,

Represents the actual probability of v in the training data set data, P _v is the marginal probability of the visible variable in the model P _v =(1/Z)Σ _h exp(-E(x)), where h represents the hidden variable. The parameter update formula of the Boltzmann machine is usually not accurately calculated, and it needs to be approximated by the Gibbs sampling method.

The quantum Boltzmann machine can be regarded as a quantum version of the classic Boltzmann machine. The variable in the quantum Boltzmann machine is qubit. The energy function in the classical Boltzmann machine is replaced by the conceptual hamiltonian in quantum mechanics. The Hamiltonian is a quantum mechanics operator, which can be represented by a matrix. For a system of N qubits, the Hamiltonian is a matrix ^{with a dimension of 2 N} × 2 ^N. The eigenvalue of the Hamiltonian is energy, so if the Hamiltonian only has diagonal elements (e.g.

in

Is of dimension 2 ^{^N} × 2 ^N matrix, w _ij and b _i are model parameters), Boltzmann machine of N classical quantum qubit Boltzmann machine and the N variables are equivalent; when Hamiltonian When non-diagonal elements exist (e.g.

in

Are all matrices with dimensions of 2 ^N × 2 ^N,

Are model parameters), the energy function of the classical Boltzmann machine cannot describe all the characteristics of the Hamiltonian, and the quantum Boltzmann machine can describe more complex models than the classical Boltzmann machine. Similar to the classical Boltzmann machine, the joint probability of the N qubit states x of the quantum Boltzmann machine satisfies the Boltzmann distribution P(x)=(1/Z)exp(-<x|H|x >), where Z=tr[exp(-H)] is the partition function, |x> is the quantum state of state x, expressed as a column vector, <x| is the conjugate transpose of |x>, expressed as a row vector . The loss function used in the parameter training of the quantum Boltzmann machine is also negative log likelihood

Marginal probability P _v =(1/Z)tr[Λ _v exp(-H)], where tr() represents the trace of the matrix,

h represents a hidden variable, and I represents the identity matrix. To update the parameters of the quantum Boltzmann machine, the Boltzmann distribution of the model can be obtained by a quantum computer, and sampled, and then the parameter update value is calculated. Some studies have shown that the increase of hidden variables of the quantum Boltzmann machine has a limited impact on its performance improvement, and its performance improvement is more significant by increasing the degree of freedom of the parameters in the Hamiltonian. Simulations and experiments of the quantum Boltzmann machine show that it can complete training faster and/or get a better model.

Quantum approximate optimization algorithm, quantum approximate optimization algorithm (QAOA) is a quantum algorithm, specifically it is a quantum-classical hybrid algorithm that combines classical parameter optimization and quantum computing. We can use QAOA to obtain the Boltzmann distribution of the quantum Boltzmann machine, and then perform sampling and calculations. QAOA involves two operators, called the mixed Hamiltonian H _M and the cost Hamiltonian H _C. QAOA specifically includes the following steps: first prepare a quantum state, its density operator is exp(-βH _M )/tr[exp(-βH _M )], where β is a constant; then perform the operator on the quantum state

Where v _l and γ _l are a series of constants to be optimized and are given random initial values; then the average value of the _{operator H C} _{is measured <H C} >, which is a numerical value; through classical computers and classical optimization methods such as gradient descent Method, etc., optimize v _l and γ _l until the minimum value of <H _C > is obtained, at this time v _l and γ _l take values respectively

with

Then the operator

Acting on the quantum state exp(-βH _M )/tr[exp(-βH _M )] will get the quantum state with the density operator exp(-βH _C )/tr[exp(-βH _{C )].}

In this application, "at least one" refers to one or more, and "multiple" refers to two or more. "And/or" describes the association relationship of the associated objects, indicating that there can be three relationships, for example, A and/or B, which can mean: A alone exists, A and B exist at the same time, and B exists alone, where A, B can be singular or plural. "The following at least one item (item)" or similar expressions refers to any combination of these items, including any combination of single item (item) or plural items (item). For example, at least one of a, b, or c can mean: a, b, c, ab, ac, bc, or abc, where a, b, and c can be single or multiple . In addition, the embodiments of the present application use words such as "first" and "second" to distinguish objects with similar names or functions or functions. Those skilled in the art can understand the words "first", "second" and the like. There is no limit to the number and execution order.

As shown in FIG. 1, an embodiment of the present application provides a hybrid computer 01, which includes a computing subsystem 20 and a digital computer 10 coupled to the computing subsystem 20. The computing subsystem 20 can provide professional functions. In some embodiments, in the examples provided in this application, the computing subsystem 20 is a quantum computer, and the digital computer 10 is a classical computer. In some embodiments, the quantum computer is a quantum annealing and/or adiabatic quantum computer. In some embodiments, the quantum computer is a gate-model quantum computer or another suitable type of quantum computer.

The digital computer 10 includes one or more digital processors 101, a communication line 102, and at least one communication interface (in FIG. 1 it is only an example that includes a communication interface 104 and a digital processor 101 as an example). Optionally, the memory 103 may also be included.

The digital processor 101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more programs for controlling the execution of the program of this application. Integrated circuits.

The communication line 102 may include a path for connecting different components.

The communication interface 104 may be a transceiver module for communicating with other devices or communication networks, such as Ethernet, radio access network (RAN), wireless local area networks (WLAN), etc. For example, the transceiver module may be a device such as a transceiver or a transceiver. Optionally, the communication interface 104 may also be a transceiver circuit located in the digital processor 101 to implement signal input and signal output of the processor.

The memory 103 may be a device having a storage function. For example, it can be read-only memory (ROM) or other types of static storage devices that can store static information and instructions, random access memory (RAM), or other types that can store information and instructions Dynamic storage devices can also be electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk storage, optical disc storage ( Including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or can be used to carry or store desired program codes in the form of instructions or data structures and can be stored by a computer Any other media taken, but not limited to this. The memory can exist independently and is connected to the processor through the communication line 102. The memory can also be integrated with the digital processor.

Wherein, the memory 103 is used to store computer-executable instructions for executing the solution of the present application, and the digital processor 101 controls the execution. The digital processor 101 is configured to execute computer execution instructions stored in the memory 103, so as to implement other classical digital processing calculations besides quantum calculation in the training method provided in the embodiment of the present application. The communication interface 104 is responsible for communicating with other devices, which is not specifically limited in the embodiment of the present application.

Optionally, the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.

In a specific implementation, as an embodiment, the digital processor 101 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 1.

In a specific implementation, as an embodiment, the digital computer 10 may include multiple digital processors, such as the digital processor 101 and the digital processor 108 in FIG. 1. Each of these digital processors can be a single-CPU (single-CPU) processor or a multi-core (multi-CPU) processor. The digital processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).

In a specific implementation, as an embodiment, the digital computer 10 may further include an output device 105 and an input device 106. The output device 105 communicates with the digital processor 101 and can display information in a variety of ways. For example, the output device 105 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector (projector), etc. The input device 106 communicates with the digital processor 101 and can receive user input in a variety of ways. For example, the input device 106 may be a mouse, a keyboard, a touch screen device, a sensor device, or the like. The above-mentioned digital computer 10 may be a general-purpose device or a special-purpose device. Those skilled in the relevant art will understand that when properly configured or programmed to form a dedicated device, and/or when communicatively coupled to control a quantum computer, other digital computer configurations can be used to practice the system and method of the present invention, including handheld Devices, multi-processor systems, microprocessor-based or programmable consumer electronic devices, personal computers ("PCs"), network PCs, minicomputers, mainframe computers, etc.

In this article, the digital computer 10 will sometimes be referred to in the singular form, but this is not intended to limit the application to a single digital computer. The system and method of the present invention can also be practiced in a distributed computing environment, where tasks or sets of instructions are performed or executed by remote processing devices linked through a communication network. In a distributed computing environment, computer-readable or processor-readable instructions (sometimes referred to as program modules), application programs, and/or data can be located in both local memory storage devices and remote memory storage devices (for example, non-transitory computer storage devices). Read or processor readable media) in both. As shown in FIG. 1, the digital computer 10 is coupled to the computing subsystem 20 through the controller 109, and the controller 109 is coupled to the communication line 102 in the digital computer 10. In some embodiments, the memory 103 may store a set of computer-readable or processor-readable computing instructions (ie, computing modules) to perform pre-processing, co-processing, and post-processing on the computing subsystem 20. According to the system and method of the present invention, the memory 103 can store a set of analog computers or quantum computer interface modules operable to interact with the computing subsystem 20.

In some embodiments, the memory 103 may store related instructions for training of the quantum Boltzmann machine to provide programs and parameters for the operation of the computing subsystem 20 as the quantum Boltzmann machine. For example, the training method of the quantum Boltzmann machine provided by the embodiment of the present application can be implemented on the digital computer 10 and the computing subsystem 20.

The computing subsystem 20 may be set in an isolated environment (not shown). For example, in the case where the computing subsystem 20 is a quantum computer, the environment can shield the internal components of the quantum computer from heat, magnetic fields, and the like. The computing subsystem 20 may include a quantum processor 201.

The quantum processor 201 includes programmable elements such as qubits, couplers, and other devices. The qubits are read through the read control system 202. These results are fed to the memory 103 of the digital computer 10. The qubit is controlled via the qubit control system 203. The coupler is controlled via the coupler control system 204. In some embodiments, the qubit control system 203 and the coupler control system 204 are used to implement quantum annealing as described herein on the quantum processor 201. According to at least some embodiments of the system and apparatus of the present application, the quantum processor may be designed to perform gate-level model quantum calculations. Alternatively or additionally, the quantum processor may be designed to perform quantum annealing and/or adiabatic quantum calculations.

Based on the foregoing hybrid computer, the embodiment of the present application provides a method for training a quantum Boltzmann machine, as shown in FIG. 2, including the following steps:

101. Obtain the first loss function of the quantum Boltzmann machine.

Among them, the model structure of the quantum Boltzmann machine includes a first layer and a second layer; the quantum unit of the first layer is used to assign the input sample of the marked sample, and the quantum unit of the second layer is used to assign the value of the marked sample The output sample; or the quantum unit of the first layer is used to assign the input sample of the unlabeled sample. The quantum unit of the first layer is fully connected with the quantum unit of the second layer. Combining Figure 3 and Figure 4, the model structure of the quantum Boltzmann machine is explained as follows: Figure 3 provides the model structure of the quantum Boltzmann machine for supervised learning, and Figure 4 provides the quantum Boltzmann machine for unsupervised learning The model structure of the man machine. Among them, referring to Figure 3, for the model structure of the quantum Boltzmann machine with supervised learning, the quantum unit of the first layer is used to assign the input sample of the labeled sample, and the quantum unit of the second layer is used to assign the labeled sample. The output samples of the sample, the first layer and the second layer are visible layers; referring to Figure 4, for the model structure of the quantum Boltzmann machine for unsupervised learning, the quantum unit of the first layer is used to assign the value of the unlabeled sample Input samples, the first layer is the visible layer, and the second layer is the hidden layer. Therefore, as shown in Figure 3 for the supervised learning of labeled samples, the first and second layers of the quantum Boltzmann machine are all visible layers, and the input and output are both visible layers, and there is no additional hidden layer. As shown in Figure 4, for unsupervised learning of unlabeled samples, based on the previous model, the output variable is changed from the visible layer to the hidden layer, and no additional hidden layer is introduced. As shown in Figure 3 and Figure 4, the model structure when performing supervised learning and unsupervised learning is the same, and the total number of qubits required is the same.

The form of the Hamiltonian of the quantum Boltzmann machine is not limited. As explained in the background introduction, it can be that only diagonal elements exist (e.g.

It can also have non-diagonal elements (e.g.

The first loss function=α*the second loss function+β*the third loss function, where the second loss function is obtained by calculating the negative logarithmic conditional likelihood of the conditional probability of the output sample under the condition of the input sample of the labeled sample, and the third The loss function is obtained by calculating the negative logarithmic conditional likelihood of the marginal probability of the input sample of the unlabeled sample; exemplary, for the convenience of calculation, the second loss function is obtained in the following manner: according to the input sample condition of the labeled sample The conditional probability of the output sample is calculated by negative logarithmic conditional likelihood to obtain the loss function of supervised learning; the loss function of supervised learning is converted into the second loss function by using Gordon-Thompson Golden-Thompson inequality. The third loss function is obtained in the following way: according to the marginal probability of the input sample of the unlabeled sample, the negative logarithmic conditional likelihood calculation is performed to obtain the loss function of the unsupervised learning; the loss function of the unsupervised learning is used Gordon-Thompson Golden- The Thompson inequality is converted into the third loss function.

The method for obtaining the first loss function is described as follows: The specific form of the Hamiltonian H of the quantum Boltzmann machine is not limited. For labeled samples, include input sample x and output sample y.

The marginal probability of the input sample x in the quantum Boltzmann machine model is

The joint probability of input sample x and output sample y is

Then the conditional probability of output sample y under the condition that the input sample is x is

Among them, for a quantum Boltzmann machine with a total number of samples N, the Hamiltonian H is a matrix of 2N×2N;

|> and <| are the right and left symbols of Dirac in quantum mechanics, respectively. Suppose there are n and Nn input samples x and output samples y, respectively, then |x> and <x| respectively represent 2 ⁿ -dimensional columns Vector and row vector, I _y represents a 2 ^Nn -dimensional identity matrix,

Is a tensor product symbol, and Λ _x is also a 2 ^N × 2 ^N matrix; the same is true

H _x =H-lnΛ _x . For supervised learning with labeled samples, the loss function is negative logarithmic conditional likelihood,

Where D ^lab represents a data set with labeled samples,

Represents the joint probability of x and y in the data set with labeled samples. For unsupervised learning of unlabeled samples, the loss function is negative log likelihood,

Where D ^unlab represents a data set of unlabeled samples,

Represents the probability of x in the data set of unlabeled samples. The above two likelihood functions are not convenient for subsequent calculation and processing, so the Golden-Thompson inequality is used to take

As the loss function of supervised learning (that is, the second loss function), take

As the loss function of unsupervised learning (that is, the third loss function), where H _x,y =H-lnΛ _x -lnΛ _y . The overall loss function of semi-supervised learning (that is, the first loss function) is obtained by adding the above two loss functions in a certain proportion, namely

Among them, α and β are unrestricted constants. An example is α ∈ [0, 1], β ∈ [0, 1]. Usually, it needs to determine the value according to the characteristics of the sample data set, for example, α is 0. When β is not 0, it is used as the loss function of unsupervised learning. When α is not 0, when β is 0, it is used as the loss function of supervised learning. When both α and β are not 0, it is used as the loss function of semi-supervised learning.

102. Obtain the first partial derivative of the first loss function with respect to a predetermined parameter of the quantum Boltzmann machine's Hamiltonian, where the predetermined parameter includes the connection weight of two quantum units in the quantum Boltzmann machine or the weight of the quantum unit Bias.

Let θ represent any parameter in the Hamiltonian of the quantum Boltzmann machine (e.g.

In w _ij and b _i ), there are

and

So

And can be written as a polynomial:

Among them, the polynomial includes the following four terms:

The four items of are respectively calculated by a hybrid computer of a quantum computer and a classical computer. Specifically, a method for calculating each item in the first partial derivative is provided:

S01. Determine a predetermined sample from a sample data set, where the predetermined sample includes a labeled sample or an unlabeled sample.

S02. Prepare the first quantum state of the predetermined sample;

S03. Perform quantum approximation optimization QAOA algorithm on the first quantum state to obtain the second quantum state;

S04. Measure the second partial derivative of the Hamiltonian with respect to the predetermined parameter for the second quantum state as the term of the first partial derivative.

In order to improve the calculation accuracy, it also includes S05, calculating the first average value of the M second partial derivatives obtained from the predetermined sample, and using the first average value as the term of the first partial derivative. The larger the value of M, the higher the calculation accuracy.

In addition, it is necessary to calculate all samples in the sample data set, which also includes: S06 calculates the second average value of the second partial derivatives corresponding to the N samples obtained in the sample data set, and uses the second average value as the first partial derivative Item.

Among them, the above step S01 can be calculated by a digital computer, and S02-S04 can be calculated by a quantum computer; in step S05, the first average value of M second partial derivatives obtained by a predetermined sample can be calculated by a digital computer, and In step S05, it is necessary to repeat the steps S02-S04 for each predetermined sample obtained. In addition, in step S06, the second average of the second partial derivatives corresponding to the N samples obtained in the sample data set can be calculated by a digital computer, and in step S06, steps S02-S04 need to be repeated for each sample. Or repeat steps S02-S05.

Below, specifically for the above

The description of obtaining the four items of is as follows:

1) For

The description of the acquisition process is as follows:

S1. Select a sample (x _i , y _i ) from the data set of labeled samples, and determine from this

form,

S2. Before the execution of QAOA, when preparing the initial state, the quantum state of the qubit corresponding to the _{sample y i is} _{prepared according to the density matrix exp(-βH M} )/tr[exp(-βH _M )], and the sample x _i corresponds to The quantum state of the qubit is prepared as a quantum state |x _i > according to the sample in S1.

S3. Define the cost Hamiltonian in the QAOA algorithm and set it to the value in S1

The mixed Hamiltonian is

in

Is a matrix with a dimension of 2 ^N × 2 ^N , N is the total number of input samples x and output samples y, and is also the total number of qubits required to execute the QAOA algorithm, n is the number of input samples y,

Then follow the remaining steps of the QAOA algorithm to complete the execution of the QAOA algorithm. After QAOA is completed, the quantum state will be obtained

Operator to this quantum state

Measurement, get

S4, repeat S2 and S3, get the results of multiple measurements _{on the sample (x i} , _{yi ), and then calculate}

M is the number of repetitions of steps S2 and S3. The value of M is not limited. The larger M, the higher the calculation accuracy.

S5. Repeat steps S1-S4. In S1, each time a different sample is obtained from the data set of labeled samples, until all the labeled samples are obtained.

Then calculated

N ^lab is the number of labeled samples and the number of repetitions of steps S1-S4.

2)

The description of the acquisition process is as follows:

S1. Select a sample x _i from the unlabeled sample data set, and determine from this

form,

For unsupervised learning, the output sample y represents the hidden layer variable.

S2. Before the execution of QAOA, when preparing the initial state, the quantum state of the qubit corresponding to the _{sample x i} _{is prepared as a quantum state |x i} > according to the sample in S1. The quantum state of the qubit corresponding to the sample y is prepared according to the density matrix exp(-βH _M )/tr[exp(-βH _M )].

The mixed Hamiltonian is

Then follow the remaining steps of QAOA to complete the QAOA algorithm, and after completing the QAOA algorithm to get the quantum state

Operator to this quantum state

Measurement of

Measured

S4. Repeat S2 and S3, _{get the results of multiple measurements on the sample x i} , and then calculate the results of repeated QAOA algorithm and measurement multiple times, and calculate

S5. Repeat steps S1-S4. In S1, each time a different sample is obtained from the data set of unlabeled samples, until all unlabeled samples are obtained.

Then calculated

N ^unlab is the number of unlabeled samples and the number of repetitions of steps S1-S4.

3)

The description of the acquisition process is as follows:

S1, execute the QAOA algorithm, where the cost Hamiltonian is set to H, and the mixed Hamiltonian is

After the QAOA algorithm is completed, perform an operator on the obtained quantum state

Measurement, get

S2, repeat step S1, get the results of multiple measurements, and then calculate

It should be noted that if the cost Hamiltonian and the mixed Hamiltonian in the QAOA algorithm are usually specified, then the QAOA algorithm can be executed. When the QAOA algorithm is executed, the sample must first be prepared to a quantum state related to the mixed Hamiltonian. But calculate

In, the quantum state of sample x is prepared in advance as |x _i >, instead of preparing according to the mixed Hamiltonian of the QAOA algorithm, the sample y is prepared according to the standard procedure of the QAOA algorithm for the initial state. While calculating

In the fourth item, since the fourth item has nothing to do with the sample, it is possible to prepare all qubits to a quantum state related to the mixed Hamiltonian at the beginning of the execution according to the general rules of the QAOA algorithm, and then continue to execute the QAOA algorithm.

The acquisition process did not elaborate on sample selection and quantum state preparation.

4)

The acquisition process is as follows:

S1, calculate for each labeled sample

Take the average of the results of all labeled samples to get

in,

The acquisition process can be completely implemented on a digital computer.

Finally, the calculation results of each item obtained according to the above 1), 2), 3) and 4) are calculated by a digital computer

103. Perform a gradient algorithm on the first partial derivative to update a predetermined parameter to obtain an updated quantum Boltzmann machine, where the Hamiltonian of the updated quantum Boltzmann machine uses the updated predetermined parameter.

In step 103, the gradient descent method or the ascending method is specifically applied to the first partial derivative to update the predetermined parameters, and the model training is completed.

In the above solution, the model structure of the quantum Boltzmann machine includes a first layer and a second layer; among them, the quantum unit of the first layer is used to assign the input sample of the labeled sample, and the quantum unit of the second layer is used for Output samples assigned with labeled samples, or input samples used to assign unlabeled samples in the first layer; quantum units in the first layer are fully connected with quantum units in the second layer; models for supervised and unsupervised learning The structure is the same, and the total number of qubits required is the same. In addition, the loss function of the training quantum Boltzmann machine adopts the negative logarithmic conditional likelihood of the conditional probability of the output sample under the condition of the input sample of the labeled sample, and the negative logarithmic conditional likelihood of the marginal probability of the input sample of the unlabeled sample. However, the quantum Boltzmann machine obtained by training can be adapted to semi-supervised learning according to a certain ratio.

It can be understood that, in the above embodiments, the methods and/or steps implemented by the hybrid computer can also be implemented by components (such as chips or circuits) that can be used in the hybrid computer.

The foregoing mainly introduces the solution provided by the embodiment of the present application from the perspective of the method flow implemented by the hybrid computer. Correspondingly, an embodiment of the present application also provides a hybrid computer, which is used to implement the above-mentioned various methods. It can be understood that, in order to realize the above-mentioned functions, the hybrid computer includes hardware structures and/or software modules corresponding to each function. Those skilled in the art should easily realize that in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software-driven hardware depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.

The embodiments of the present application may divide the functional modules of the hybrid computer according to the foregoing method embodiments. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.

FIG. 5 shows a schematic diagram of the structure of a hybrid computer 5. The hybrid computer includes: a digital computing unit 51 and a quantum computing unit 52.

The digital computing unit 51 is configured to obtain the first loss function of the quantum Boltzmann machine, wherein the model structure of the quantum Boltzmann machine includes a first layer and a second layer; the quantum unit of the first layer The quantum unit of the second layer is used to assign the input sample of the labeled sample, and the quantum unit of the second layer is used to assign the output sample of the labeled sample; or the quantum unit of the first layer is used to assign the input sample of the unlabeled sample; The quantum unit of one layer is fully connected with the quantum unit of the second layer; the first loss function=α*the second loss function+β*the third loss function, wherein the second loss function is marked by the The negative logarithmic conditional likelihood of the conditional probability of the output sample under the input sample condition of the sample is calculated, and the third loss function is calculated by the negative logarithmic conditional likelihood of the marginal probability of the input sample of the unlabeled sample; α; And β are constants; the quantum computing unit 52 acquires the first partial derivative of the first loss function acquired by the digital computing unit 51 with respect to the predetermined parameter of the Hamiltonian of the model of the quantum Boltzmann machine, so The predetermined parameter includes the connection weight of the two quantum units in the model of the quantum Boltzmann machine or the bias of the quantum unit; the digital computing unit 51 is also used to compare the quantum computing unit 52 The first partial derivative executes a gradient algorithm to update the predetermined parameter to obtain the updated quantum Boltzmann machine, wherein the Hamiltonian of the updated quantum Boltzmann machine uses the updated predetermined parameter.

Optionally, the digital calculation unit 51 is further configured to perform negative logarithmic conditional likelihood calculation according to the conditional probability of the output sample under the condition of the input sample of the labeled sample, and obtain the loss function of supervised learning; The loss function of the supervised learning is converted into the second loss function using the Golden-Thompson Golden-Thompson inequality.

Optionally, the digital calculation unit 51 is further configured to perform negative log conditional likelihood calculation according to the marginal probability of the input sample of the unlabeled sample to obtain the loss function of unsupervised learning; The function is converted into the third loss function using the Gordon-Thompson Golden-Thompson inequality.

Optionally, the first partial derivative is expressed as a polynomial, and the digital calculation unit 51 is configured to determine a predetermined sample from a sample data set, and the predetermined sample includes the labeled sample or the unlabeled sample; the quantum The calculation unit 52 is configured to prepare the first quantum state of the predetermined sample determined by the digital calculation unit; execute the QAOA algorithm on the first quantum state to obtain the second quantum state; measure the Hamiltonian for the second quantum state The second partial derivative of the quantity with respect to a predetermined parameter is used as the term of the first partial derivative.

Optionally, the digital calculation unit 51 is further configured to calculate a first average value of the M second partial derivatives obtained by the predetermined sample, and use the first average value as a term of the first partial derivative .

Optionally, the digital calculation unit 51 is further configured to calculate a second average value of the second partial derivatives corresponding to the N samples obtained in a predetermined sample set, and use the second average value as the first partial derivative Item.

Optionally, when the quantum unit of the first layer is used to assign the input sample of the labeled sample, and the quantum unit of the second layer is used to assign the output sample of the labeled sample, the first layer and the first layer The second layer is the visible layer; or, when the quantum unit of the first layer is used to assign input samples of unmarked samples, the first layer is the visible layer, and the second layer is the hidden layer.

Among them, all relevant content of the steps involved in the above method embodiments can be cited in the functional description of the corresponding functional module, which will not be repeated here.

In this embodiment, the hybrid computer is presented in the form of dividing various functional modules in an integrated manner. The "module" here can refer to a specific ASIC, circuit, processor and memory that executes one or more software or firmware programs, integrated logic circuit, and/or other devices that can provide the above-mentioned functions. In a simple embodiment, those skilled in the art can imagine that the hybrid computer can take the form of the hybrid computer shown in FIG. 1.

For example, the digital processor 101 and the computing subsystem 20 in the hybrid computer 01 shown in FIG. 1 can call the computer execution instructions stored in the memory 103, so that the hybrid computer 01 executes the method in the foregoing method embodiment; the computing subsystem 20 It can be a quantum computer.

Specifically, the function/implementation process of the digital computing unit 51 in FIG. 5 can be realized by the digital computer in the hybrid computer 01 shown in FIG. The function/implementation process of the quantum computing unit 52 can be implemented by the quantum computer in the hybrid computer 01 shown in FIG. Since the hybrid computer 01 provided in this embodiment can execute the above-mentioned method, the technical effects that can be obtained can refer to the above-mentioned method embodiment, which will not be repeated here.

Optionally, the embodiments of the present application also provide a hybrid computer (for example, the hybrid computer may be a chip or a chip system), the hybrid computer includes a processor and an interface, and the processor is used to read instructions to perform any of the above methods. The method in the embodiment. In one possible design, the hybrid computer also includes memory. The memory is used to store necessary program instructions and data, and the processor can call the program code stored in the memory to instruct the hybrid computer to execute the method in any of the foregoing method embodiments. Of course, the memory may not be in the computing device. When the hybrid computer is a chip system, it may be composed of chips, or may include chips and other discrete devices, which are not specifically limited in the embodiments of the present application.

In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, it can be implemented in the form of a computer program product in whole or in part. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center. Transmission to another website, computer, server, or data center via wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer or includes one or more data storage devices such as servers, data centers, etc. that can be integrated with the medium. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)). In the embodiment of the present application, the computer may include the aforementioned device.

Although this application has been described in conjunction with various embodiments, in the process of implementing the claimed application, those skilled in the art can understand and understand by viewing the drawings, the disclosure, and the appended claims in the process of implementing the claimed application. Implement other changes of the disclosed embodiment. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "one" does not exclude a plurality. A single processor or other unit may implement several functions listed in the claims. Certain measures are described in mutually different dependent claims, but this does not mean that these measures cannot be combined to produce good results.

Although the application has been described in combination with specific features and embodiments, it is obvious that various modifications and combinations can be made without departing from the spirit and scope of the application. Correspondingly, the specification and drawings are merely exemplary descriptions of the application as defined by the appended claims, and are deemed to cover any and all modifications, changes, combinations or equivalents within the scope of the application. Obviously, those skilled in the art can make various changes and modifications to the application without departing from the spirit and scope of the application. In this way, if these modifications and variations of this application fall within the scope of the claims of this application and their equivalent technologies, then this application is also intended to include these modifications and variations.

Finally, it should be noted that the above are only specific implementations of this application, but the scope of protection of this application is not limited to this. Any changes or substitutions within the technical scope disclosed in this application shall be covered by this application. Within the scope of protection applied for. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims

A training method of a quantum Boltzmann machine is characterized in that it comprises:

Obtain the first loss function of the quantum Boltzmann machine, wherein the model structure of the quantum Boltzmann machine includes the first layer and the second layer; the quantum unit of the first layer is used to assign the input of the labeled sample Sample, the quantum unit of the second layer is used to assign the output sample of the labeled sample; or the quantum unit of the first layer is used to assign the input sample of the unlabeled sample; the quantum unit of the first layer and the The quantum unit of the second layer is fully connected; the first loss function=α*the second loss function+β*the third loss function, wherein the second loss function is the output sample under the condition of the input sample of the labeled sample The negative logarithmic conditional likelihood of the conditional probability of is calculated, and the third loss function is calculated by the negative logarithmic conditional likelihood of the marginal probability of the input sample of the unlabeled sample; α and β are constants;

Obtain a first partial derivative of the first loss function to a predetermined parameter of the Hamiltonian of the quantum Boltzmann machine, where the predetermined parameter includes the connection weight of two quantum units in the quantum Boltzmann machine Or the bias of the quantum unit;

Perform a gradient algorithm on the first partial derivative to update the predetermined parameter to obtain an updated quantum Boltzmann machine, wherein the Hamiltonian of the updated quantum Boltzmann machine uses the updated predetermined parameter .
The training method of a quantum Boltzmann machine according to claim 1, wherein the method further comprises:

Perform negative logarithmic conditional likelihood calculation according to the conditional probability of the output sample under the condition of the input sample of the labeled sample, and obtain the supervised learning loss function;

The loss function of the supervised learning is converted into the second loss function by using Gordon-Thompson Golden-Thompson inequality.
The training method of a quantum Boltzmann machine according to claim 1, wherein the method further comprises:

Performing negative logarithmic conditional likelihood calculation according to the marginal probability of the input sample of the unlabeled sample to obtain the loss function of unsupervised learning;

The loss function of the unsupervised learning is converted into the third loss function by using Gordon-Thompson Golden-Thompson inequality.
The method for training a quantum Boltzmann machine according to claim 1, wherein the first partial derivative is expressed as a polynomial, and the method further comprises:

Determining a predetermined sample from a sample data set, where the predetermined sample includes the labeled sample or the unlabeled sample;

Preparing the first quantum state of the predetermined sample;

Performing a quantum approximation optimization QAOA algorithm on the first quantum state to obtain a second quantum state;

The second partial derivative of the Hamiltonian with respect to a predetermined parameter is measured for the second quantum state as a term of the first partial derivative.
The training method of a quantum Boltzmann machine according to claim 4, further comprising:

Calculate the first average value of the M second partial derivatives obtained by the predetermined sample, and use the first average value as the term of the first partial derivative.
The training method of a quantum Boltzmann machine according to claim 4 or 5, characterized in that it further comprises:

A second average value is calculated for the second partial derivatives corresponding to the N samples obtained in the sample data set, and the second average value is used as a term of the first partial derivative.
The training method of a quantum Boltzmann machine according to claim 1, wherein the quantum unit of the first layer is used for assigning input samples with labeled samples, and the quantum unit of the second layer is used for assigning values When there are output samples of labeled samples, the first layer and the second layer are visible layers;

Alternatively, when the quantum unit of the first layer is used to assign input samples of unmarked samples, the first layer is a visible layer and the second layer is a hidden layer.
A hybrid computer, characterized in that it includes:

A digital computer for obtaining the first loss function of a quantum Boltzmann machine, wherein the model structure of the quantum Boltzmann machine includes a first layer and a second layer; the quantum unit of the first layer is used for Input samples assigned with labeled samples, the quantum unit of the second layer is used to assign output samples with labeled samples; or the quantum unit of the first layer is used to assign input samples of unlabeled samples; the first layer The quantum unit of the second layer is fully connected with the quantum unit of the second layer; the first loss function=α*the second loss function+β*the third loss function, wherein the second loss function is determined by the labeled sample The negative logarithmic conditional likelihood of the conditional probability of the output sample under the input sample condition is calculated, and the third loss function is calculated by the negative logarithmic conditional likelihood of the marginal probability of the input sample of the unlabeled sample; α and β Is a constant

A quantum computer to obtain a first partial derivative of a predetermined parameter of the Hamiltonian of the quantum Boltzmann machine with respect to the first loss function acquired by the digital computer, the predetermined parameter including the quantum Boltzmann The connection weight of the two quantum units in the machine or the bias of the quantum unit;

The digital computer is further configured to perform a gradient algorithm on the first partial derivative obtained by the hybrid computer to update the predetermined parameters, and obtain the updated quantum Boltzmann machine, wherein the updated quantum Boltzmann machine The Hamiltonian of the Manchester machine uses the updated predetermined parameter.
The hybrid computer according to claim 8, wherein the digital computer is further configured to perform negative logarithmic conditional likelihood calculation according to the conditional probability of the output sample under the condition of the input sample of the labeled sample to obtain supervised learning The loss function of; the loss function of the supervised learning is converted into the second loss function using the Gordon-Thompson Golden-Thompson inequality.
The hybrid computer according to claim 8, wherein the digital computer is further configured to perform negative logarithmic conditional likelihood calculation according to the marginal probability of the input sample of the unlabeled sample to obtain the loss function of unsupervised learning; The loss function of the unsupervised learning is converted into the third loss function by using Gordon-Thompson Golden-Thompson inequality.
The hybrid computer according to claim 8, wherein the first partial derivative is expressed as a polynomial, and the digital computer is further configured to determine a predetermined sample from a sample data set, and the predetermined sample includes the labeled sample or The unlabeled sample; the quantum computer is used to prepare the first quantum state of the predetermined sample determined by the digital computer; the QAOA algorithm is executed on the first quantum state to obtain the second quantum state; The second partial derivative of the two-quantum state measurement Hamiltonian with respect to a predetermined parameter is used as the term of the first partial derivative.
The hybrid computer of claim 11, wherein:

The digital computer is further configured to obtain a first average value of the M second partial derivatives obtained by the predetermined sample, and use the first average value as a term of the first partial derivative.
The hybrid computer according to claim 11 or 12, wherein the digital computer is further configured to calculate a second average value of the second partial derivatives corresponding to the N samples obtained in a predetermined sample set, and calculate the The two average values are used as the term of the first partial derivative.
The hybrid computer according to claim 8, wherein the quantum unit of the first layer is used to assign the input sample of the labeled sample, and the quantum unit of the second layer is used to assign the output sample of the labeled sample. , The first layer and the second layer are visible layers;

Alternatively, when the quantum unit of the first layer is used to assign input samples of unmarked samples, the first layer is a visible layer and the second layer is a hidden layer.
A hybrid computer characterized by comprising: a processor and a memory;

The memory is used to store computer-executed instructions, and when the processor executes the computer-executed instructions, so that the hybrid computer executes the method according to any one of claims 1-7.
A chip, characterized in that it includes a processor and an interface;

The processor is configured to read instructions to execute the method according to any one of claims 1 to 7.
A computer-readable storage medium, characterized by comprising instructions, which when run on a computer, causes the computer to execute the method according to any one of claims 1-7.