CN109154798A

CN109154798A - Method and system for improving a policy for a stochastic control problem

Info

Publication number: CN109154798A
Application number: CN201780028555.9A
Authority: CN
Inventors: Daniel Crawford; Pooya Ronagh; Anna Levit
Original assignee: 1QB Information Technologies Inc
Current assignee: 1QB Information Technologies Inc
Priority date: 2016-05-09
Filing date: 2017-05-09
Publication date: 2019-01-04
Anticipated expiration: 2037-05-09
Also published as: WO2017195114A1; CA3022167A1; GB2569702A; US11017289B2; CN109154798B; JP2019515397A; GB201819448D0; US20170323195A1; CA3022167C; JP6646763B2

Abstract

Disclosed are a method and a system for improving a policy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure that is a function of states and actions, and a plurality of decision epochs. The method comprises: using a sampling device to obtain data representing sample configurations of a Boltzmann machine; obtaining initialization data and an initial policy for the stochastic control problem; assigning to the sampling device data representing an initial weight and bias of each coupler and each node of the Boltzmann machine, respectively, and a transverse field strength; and, until a stopping criterion is met: generating a current-epoch state-action pair; amending the data representing none or at least one coupler and at least one bias; performing a sampling corresponding to the current-epoch state-action pair to obtain first sampling empirical means; obtaining an approximation of the value of the Q-function at the current-epoch state-action pair; obtaining a future-epoch state-action pair, wherein the state is obtained through a stochastic state process and wherein obtaining the action comprises performing a stochastic optimization test on all state-action pairs comprising the future-epoch state and any possible action, thereby providing the action at the future epoch and updating the policy for the future-epoch state; amending the data representing none or at least one coupler and at least one bias; performing a sampling corresponding to the future-epoch state-action pair; obtaining an approximation of the value of the Q-function at the future-epoch state-action pair; and updating each weight and each bias. The policy is provided when the stopping criterion is met.

Description

Method and system for improving a policy for a stochastic control problem

Cross reference to related applications

This patent application claims priority to U.S. Provisional Patent Application No. 62/333,707, filed on May 9, 2016.

Technical field

The invention relates to computing. More precisely, the invention pertains to a method and system for improving a policy for a stochastic control problem.

Background

Markov decision processes

A stochastic control problem aims to design a policy that controls the state of a system evolving through a stochastic process so as to maximize utility.

A Markov decision process (MDP) is a specific type of stochastic control problem that satisfies the Markov property.

Markov decision processes are widely used to model sequential decision making under uncertainty.

Many problems involve Markov decision processes, such as population harvesting, controlling water resources for irrigation and power generation, equipment replacement in any industry, portfolio optimization in finance and investment, scheduling in queueing theory and operations research, generating credit and insurance policies, overbooking management, quarantine and treatment levels in health and pharmaceutical applications, generating sports strategies, and locating emergency response vehicles.

In fact, given a system with some intrinsic stochastic evolution, how should a decision maker make decisions over multiple epochs so as to maximize some utility function that depends on the system, when those decisions may influence the system?

Formally, a Markov decision process can be defined by the following components.

1. A set of decision epochs $T = \{n, n+1, \ldots, m\}$, where $m$ may be finite or infinite. The set of decision epochs represents the set of times at which a decision has to be made. For example, where the problem involving the Markov decision process is equipment replacement, the decision epochs may be the successive days on which the equipment is used.

2. A state space $S$. Any state in the state space comprises data representing a realization of the system. For example, where the problem involving the Markov decision process is an equipment replacement problem, the state space may be a set of integers representing the condition of the equipment.

3. An action space $A$. Any action in the action space comprises data representing a possible control of the system. For example, where the problem involving the Markov decision process is an equipment replacement problem, the action space may consist of two actions: replace the equipment or do not replace it.

4. Instantaneous rewards $r: S \times A \times T \to \mathbb{R}$. An instantaneous reward represents the consequence of taking an action when the system is in a given state at a given decision epoch. For example, where the problem involving the Markov decision process is an equipment replacement problem, the instantaneous reward may be a negative integer representing the equipment replacement cost if the action is to replace the equipment, and a positive integer otherwise. The positive integer is larger when the equipment operates in better condition.

The transition probabilities $P(s_{k+1} = s' \mid s_k = s, a_k = a)$ are the probabilities of a transition from a given state to another given state. The Markov property of a Markov decision process can be written as:

$P(s_{k+1} \mid s_k, a_k, s_{k-1}, a_{k-1}, \ldots, s_n, a_n) = P(s_{k+1} \mid s_k, a_k)$

For example, where the problem involving the Markov decision process is an equipment replacement problem and the equipment has three conditions (failed, poor, good), the transition probabilities may be time-independent and given by transition probability matrices.

[Transition probability matrices not reproduced in this text.]

5. A discount factor $\gamma \in [0, 1)$. The discount factor represents the difference in importance between future rewards and present rewards.

A policy is defined as a function $\alpha: S \times T \to A$. A policy thus assigns an action to the state of the system at each decision epoch. For example, where the problem involving the Markov decision process is an equipment replacement problem, the policy may be to replace the equipment only when it is in the failed condition, and not to replace it otherwise.

Those skilled in the art will further appreciate that a utility function can be defined as $U_n(s_n, \alpha) = \sum_{k=n}^{m} \gamma^{k-n}\, \mathbb{E}\left[ r(s_k, \alpha(s_k, k), k) \mid s_n \right]$, where, given the initial state $s_n$ and the policy $\alpha$, each summand is the discounted expected value of a future reward. Those skilled in the art will therefore appreciate that a decision maker may wish to maximize the utility function, that is, to find $U^*(s_n) = \max_\alpha U_n(s_n, \alpha)$. This in turn amounts to finding an optimal policy: $\alpha^* = \arg\max_\alpha U_n(s_n, \alpha)$.
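As an illustration of the definitions above, the equipment replacement example can be sketched in Python. The transition probabilities, reward values, and discount factor below are hypothetical placeholders chosen only for illustration (the matrices of the original example are not available here). The function `utility` evaluates the expected discounted reward of the replace-only-on-failure policy over a finite horizon by direct recursion.

```python
# Hypothetical equipment-replacement MDP; every number below is illustrative.
STATES = ("failed", "poor", "good")
ACTIONS = ("keep", "replace")
GAMMA = 0.9  # discount factor in [0, 1)

# TRANSITION[action][state][next_state] = probability (time-independent).
TRANSITION = {
    "keep": {
        "failed": {"failed": 1.0, "poor": 0.0, "good": 0.0},
        "poor": {"failed": 0.3, "poor": 0.7, "good": 0.0},
        "good": {"failed": 0.1, "poor": 0.3, "good": 0.6},
    },
    "replace": {s: {"failed": 0.0, "poor": 0.0, "good": 1.0} for s in STATES},
}


def reward(state, action):
    """Negative integer when replacing, otherwise larger for better condition."""
    if action == "replace":
        return -5
    return {"failed": 1, "poor": 4, "good": 10}[state]


def policy(state, epoch):
    """Replace the equipment only when it has failed."""
    return "replace" if state == "failed" else "keep"


def utility(state, epoch, last_epoch):
    """Expected discounted reward from `epoch` through `last_epoch` under `policy`."""
    action = policy(state, epoch)
    value = reward(state, action)
    if epoch < last_epoch:
        # Discounted expectation over the next state (cost grows with the horizon).
        value += GAMMA * sum(
            prob * utility(nxt, epoch + 1, last_epoch)
            for nxt, prob in TRANSITION[action][state].items()
        )
    return value
```

With these illustrative numbers, `utility("failed", 0, 1)` is the replacement cost of -5 plus the discounted reward 0.9 × 10 earned in the good condition at the next epoch.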

Those skilled in the art will appreciate that when the utility function is maximized over the action to be taken at the current decision epoch, with the optimal policy followed thereafter, the resulting function is called the Q-function and can be written as $Q(s_n, a_n) = \mathbb{E}\left[ r(s_n, a_n, n) + \gamma \max_\alpha U_{n+1}(s_{n+1}, \alpha) \mid s_n, a_n \right]$. When $Q(s_n, a_n)$ is maximized over $a_n$, the optimal state-action pair is obtained.

It will be appreciated that finding an optimal policy can be very cumbersome. Indeed, when the sets of states, actions, and/or decision epochs become too large, or when the transition probabilities are unknown, finding a solution to a Markov decision process problem may be problematic.

In the literature, an algorithm whose computational complexity has a lower bound that grows exponentially with the dimension of the problem is said to suffer from the curse of dimensionality. A common method for solving Markov decision process problems is value iteration [Richard Bellman, "A Markovian Decision Process", Journal of Mathematics and Mechanics, Vol. 6, No. 5 (1957)], which has exponential complexity in general; that is, $\Omega(2^d)$, where $d$ denotes the dimension of the Markov decision process problem.
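A minimal sketch of value iteration may help make the cost concrete. It assumes the infinite-horizon discounted variant with time-independent transitions and a fixed number of sweeps; the function names and the two-state toy problem are hypothetical and not taken from the cited reference. Each sweep touches every state-action pair, and the number of states itself grows exponentially with the number of state variables.

```python
def value_iteration(states, actions, transition, reward, gamma, sweeps=200):
    """Return Q[(s, a)] after a fixed number of Bellman backups.

    transition[a][s] maps next states to probabilities; reward(s, a) is a number.
    """
    value = {s: 0.0 for s in states}
    q = {}
    for _ in range(sweeps):
        q = {
            (s, a): reward(s, a)
            + gamma * sum(p * value[t] for t, p in transition[a][s].items())
            for s in states
            for a in actions
        }
        value = {s: max(q[(s, a)] for a in actions) for s in states}
    return q


def greedy_policy(q, states, actions):
    """Pick, for each state, the action with the largest Q value."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


# Hypothetical two-state problem used only to exercise the functions.
TOY_STATES = ("low", "high")
TOY_ACTIONS = ("stay", "move")
TOY_TRANSITION = {
    "stay": {"low": {"low": 1.0}, "high": {"high": 1.0}},
    "move": {"low": {"high": 1.0}, "high": {"low": 1.0}},
}


def toy_reward(state, action):
    """Reward 1 for being in the "high" state, regardless of the action."""
    return 1.0 if state == "high" else 0.0
```

For the toy problem with discount factor 0.5, the greedy policy moves out of "low" and stays in "high", since staying in "high" earns a reward at every epoch.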

Many methods have been proposed to overcome the curse of dimensionality, such as Q-learning methods [Richard S. Sutton, Andrew G. Barto]. However, these methods require storing the value of the Q-function for every possible state-action pair, which becomes infeasible for problems of a certain size. To overcome this drawback, a neural-network-based parametrization of the Q-function has been proposed (e.g., [Sallans, B., Hinton, G.E., Reinforcement Learning with Factored States and Actions, Journal of Machine Learning Research 5, 1063-1088, 2004]). This approach, however, involves training a neural network, which requires fitting the network; that is an open problem in its own right, and in some cases training the neural network requires solving an NP-hard problem.
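The storage issue mentioned above can be seen in a minimal tabular Q-learning sketch. The epsilon-greedy exploration rule, the learning rate, and the toy environment are assumptions made for illustration; the point is that the table `q` holds one entry per visited state-action pair.

```python
import random
from collections import defaultdict


def q_learning(step, states, actions, gamma, episodes=2000, horizon=20,
               learning_rate=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning; `step(state, action, rng)` returns (next_state, reward)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # one stored value per (state, action) pair
    for _ in range(episodes):
        state = rng.choice(states)
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            next_state, r = step(state, action, rng)
            target = r + gamma * max(q[(next_state, a)] for a in actions)
            q[(state, action)] += learning_rate * (target - q[(state, action)])
            state = next_state
    return q


def toy_step(state, action, rng):
    """Hypothetical deterministic environment: "move" toggles the state."""
    if action == "stay":
        next_state = state
    else:
        next_state = "high" if state == "low" else "low"
    return next_state, (1.0 if state == "high" else 0.0)
```

A call such as `q_learning(toy_step, ("low", "high"), ("stay", "move"), gamma=0.5)` fills a table of four entries; for a problem with many state variables the same table would need exponentially many entries.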

There is therefore a need for a method for improving a policy that overcomes at least one of the above-identified drawbacks.

Artificial neural networks

An artificial neural network (ANN) is a computational model inspired by biological neural networks and used for the approximation of functions. Artificial neural networks are represented as graphs, in which the nodes are also called neurons and the edges are also called synapses.

A general Boltzmann machine (GBM) is a type of artificial neural network in which the neurons represent random variables with linear biases attached to them, and each synapse between two neurons represents a quadratic term involving the random variables associated with those neurons. In particular, there is a global energy function associated with the general Boltzmann machine, consisting of contributions from all the linear and quadratic terms.

A general Boltzmann machine is therefore a graphical model used for approximating the joint distribution of dependent variables. The corresponding graph contains nodes referred to as visible nodes (or input variables) and non-visible nodes referred to as hidden nodes (or latent variables). General Boltzmann machines were developed to represent and solve certain combinatorial problems and can be used as probabilistic machine learning tools. Applications of general Boltzmann machines include, but are not limited to, visual object and speech recognition, classification, regression tasks, dimensionality reduction, information retrieval, and image reconstruction. For an overview of general Boltzmann machines, see D. Ackley, G. Hinton, T. Sejnowski, "A Learning Algorithm for Boltzmann Machines," Cognitive Science 9, 147-169 (1985).

Distribution approximation in a general Boltzmann machine is performed by encoding the dependent variables of interest as nodes of a larger graph. These nodes are the visible nodes, and all other nodes are hidden nodes. Weights and biases are assigned to the edges and vertices of the graph, respectively, and an energy function is assigned to the graph according to these weights and biases.
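The energy function and the distribution it induces can be sketched for a very small graph. The sketch assumes spin values of +1 and -1, an energy equal to the sum of the linear and quadratic terms, and probabilities proportional to exp(-beta * energy); these conventions, and all function names, are assumptions made for illustration. Exact enumeration is only feasible for a handful of nodes.

```python
import itertools
import math


def energy(config, biases, weights):
    """Global energy: linear bias terms plus quadratic coupling terms.

    config maps node -> +1/-1, biases maps node -> b, weights maps (u, v) -> w.
    """
    linear = sum(biases[n] * config[n] for n in biases)
    quadratic = sum(w * config[u] * config[v] for (u, v), w in weights.items())
    return linear + quadratic


def boltzmann_distribution(nodes, biases, weights, beta=1.0):
    """Exact Boltzmann probabilities by enumeration (small graphs only)."""
    configs = [dict(zip(nodes, values))
               for values in itertools.product((-1, 1), repeat=len(nodes))]
    factors = [math.exp(-beta * energy(c, biases, weights)) for c in configs]
    partition = sum(factors)
    return [(c, f / partition) for c, f in zip(configs, factors)]


def visible_marginal(distribution, visible):
    """Sum out the hidden nodes to obtain the distribution over visible nodes."""
    marginal = {}
    for config, prob in distribution:
        key = tuple(config[n] for n in visible)
        marginal[key] = marginal.get(key, 0.0) + prob
    return marginal
```

With one visible node "v", one hidden node "h", zero biases, and a coupling of -1.0 between them, the two aligned configurations are the most probable, and the marginal over "v" is uniform by symmetry.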

General Boltzmann machines with arbitrary connections have not proven particularly useful in a machine learning sense, because the approximate learning method is slow. When certain restrictions are imposed on the connections between hidden nodes, the general Boltzmann machine neural network becomes easier to train and useful for machine learning tasks. When no connections are allowed between hidden nodes and no connections are allowed between visible nodes, the resulting neural network is called a restricted Boltzmann machine (RBM), consisting of only one visible layer and one hidden layer.
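One reason the restriction helps is that, with no connections inside a layer, the units of one layer are conditionally independent given the other layer and can be sampled in a single pass. The following sketch of a block Gibbs step assumes +1/-1 units, an energy equal to the sum of the bias and coupling terms, and probabilities proportional to exp(-beta * energy); the function names are hypothetical.

```python
import math
import random


def sample_layer(fixed, biases, weights, rng, beta=1.0):
    """Sample every unit of one layer independently given the other layer.

    fixed: list of +1/-1 values of the conditioning layer.
    biases: biases of the layer being sampled.
    weights[i][j]: coupling between fixed unit i and sampled unit j.
    """
    sampled = []
    for j, bias in enumerate(biases):
        field = bias + sum(weights[i][j] * s for i, s in enumerate(fixed))
        # P(unit = +1) = exp(-beta*field) / (exp(-beta*field) + exp(beta*field))
        p_up = 1.0 / (1.0 + math.exp(2.0 * beta * field))
        sampled.append(1 if rng.random() < p_up else -1)
    return sampled


def gibbs_step(visible, vis_biases, hid_biases, weights, rng):
    """One block Gibbs sweep: hidden given visible, then visible given hidden."""
    hidden = sample_layer(visible, hid_biases, weights, rng)
    transposed = [list(col) for col in zip(*weights)]
    visible = sample_layer(hidden, vis_biases, transposed, rng)
    return visible, hidden
```

Here `weights[i][j]` couples visible unit `i` to hidden unit `j`, so the second call uses the transposed matrix. A strongly negative coupling makes the two layers align with near certainty.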

With no intra-visible or intra-hidden node connections, efficient training algorithms have been developed that readily learn a probability distribution over a set of inputs on the visible layer, so that restricted Boltzmann machines perform well in the field of machine learning. For applications, algorithms, and theory, see Section 6 of Y. Bengio et al., "Representation Learning: A Review and New Perspectives", arXiv 2014 (http://www.cl.uni-heidelberg.de/courses/ws14/deepl/BengioETAL12.pdf).

The idea of restricted Boltzmann machines has been generalized in order to create more efficient neural networks called deep belief networks (DBNs). A deep belief network is created by stacking restricted Boltzmann machines on top of each other, so that the hidden layer of the first restricted Boltzmann machine is used as the visible layer of the second restricted Boltzmann machine, the hidden layer of the second restricted Boltzmann machine is used as the visible layer of the third restricted Boltzmann machine, and so on. This structure has been widely studied and is the basis of deep learning. The advantage of this structure is that the network weights and biases can be trained one restricted Boltzmann machine at a time, from the top down, using the same training algorithms developed for standalone restricted Boltzmann machines. For applications, algorithms, and theory behind deep belief networks, see http://neuralnetworksanddeeplearning.com/chap6.html.

Training a deep belief network one restricted Boltzmann machine at a time comes at the cost of accumulating the errors that arise from approximating the distribution of each restricted Boltzmann machine. Another approach to training such a neural network is to treat it as a general Boltzmann machine and update all the weights in the same iteration rather than layer by layer. This approach, applied to this structure, is called a deep Boltzmann machine (DBM).

Quantum processors

A quantum processor is a quantum mechanical system of a plurality of qubits, measurements over which result in samples from a Boltzmann distribution defined by the global energy of the system.

A qubit is a physical implementation of a quantum mechanical system represented on a Hilbert space and realizing at least two distinct and distinguishable eigenstates that represent the two states of a quantum bit. A quantum bit is the analogue of a digital bit, where the ambient storing device may store the two states $|0\rangle$ and $|1\rangle$ of two-state quantum information, but also superpositions $\alpha|0\rangle + \beta|1\rangle$ of the two states. In various embodiments, such systems may have more than two eigenstates, in which case the additional eigenstates are used to represent the two logical states by degenerate measurements. Various embodiments of implementations of qubits have been proposed: for example, solid-state nuclear spins measured and controlled electronically or with nuclear magnetic resonance, trapped ions, atoms in optical cavities (cavity quantum electrodynamics), liquid-state nuclear spins, electronic charge or spin degrees of freedom in quantum dots, superconducting quantum circuits based on Josephson junctions [Barone and Paterno, 1982, Physics and Applications of the Josephson Effect, John Wiley and Sons, New York; Martinis et al., 2002, Physical Review Letters 89, 117901], and electrons on helium.

The source of bias that is inductively coupled to each qubit is called a local field bias. In one embodiment, the bias source is an electromagnetic device used to thread a magnetic flux through the qubit to provide control of the state of the qubit [US 2006/0225165].

The local field biases on the qubits are programmable and controllable. In one embodiment, a qubit control system comprising a digital processing unit is connected to the system of qubits and is capable of programming and tuning the local field biases on the qubits.

A quantum processor may further comprise a plurality of couplings between a plurality of pairs of the plurality of qubits. A coupling between two qubits is a device in proximity of both qubits that threads a magnetic flux to both qubits. In one embodiment, a coupling may consist of a superconducting circuit interrupted by a compound Josephson junction. A magnetic flux may thread the compound Josephson junction and consequently thread a magnetic flux on both qubits [US 2006/0225165]. The strength of this magnetic flux contributes quadratically to the energies of the quantum processor. In one embodiment, the coupling strength is enforced by tuning the coupling device in proximity of both qubits.

The coupling strengths are controllable and programmable. In one embodiment, a quantum device control system comprising a digital processing unit is connected to the plurality of couplings and is capable of programming the coupling strengths of the quantum processor.

A quantum annealer is a quantum processor that carries out quantum annealing, as described, for example, in Farhi, E. et al., "Quantum Adiabatic Evolution Algorithms versus Simulated Annealing", arXiv.org: quant-ph/0201031 (2002), pp. 1-16.

A quantum annealer performs a transformation of the quantum processor from an initial setup to a final one. The initial and final setups of the quantum processor provide quantum systems described by their corresponding initial and final Hamiltonians. For a quantum annealer with local field biases and couplings as described above, the final Hamiltonian can be expressed as a quadratic function $f(x) = \sum_i h_i x_i + \sum_{(i,j)} J_{(i,j)} x_i x_j$, where the first summation runs over the indices $i$ representing the qubits of the quantum annealer, and the second summation runs over the pairs $(i, j)$ for which a coupling exists between qubits $i$ and $j$.

Such a quadratic function, in which each variable $x_i$ takes one of the spin values $-1$ and $1$ of the $i$-th qubit, is also called an Ising model. In this case, the Ising model is also denoted by $H = \sum_i h_i \sigma_i^z + \sum_{(i,j)} J_{(i,j)} \sigma_i^z \sigma_j^z$. Here the superscript $z$ indicates that the spin $\sigma_i$ of qubit $i$ acts along only one of its three axes. The $z$ axis is therefore also called the measurement axis or measurement basis.

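In a more general embodiment, the Hamiltonian of the Ising model may also contain contributions from the spins of the qubits in other bases. For example, the Hamiltonian $H = \sum_i h_i \sigma_i^z + \sum_{(i,j)} J_{(i,j)} \sigma_i^z \sigma_j^z - \Gamma \sum_i \sigma_i^x$, where $\Gamma$ denotes the transverse field strength, is called a transverse field Ising model, in which each spin is affected by a nonzero transverse field along the $x$ axis.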
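For a handful of qubits, the transverse field Ising Hamiltonian can be written out as an explicit matrix, which makes the role of the transverse field visible: the Ising terms sit on the diagonal, while the transverse field connects configurations that differ by a single spin flip. The sketch below assumes the sign convention $-\Gamma \sum_i \sigma_i^x$ for the transverse term and a bit-string encoding of basis states; both are assumptions made for illustration, as is the function name.

```python
def tfim_matrix(h, couplings, gamma):
    """Dense matrix of sum_i h_i Z_i + sum_(i,j) J_ij Z_i Z_j - gamma * sum_i X_i.

    h: list of local field biases; couplings: dict mapping (i, j) -> J_ij.
    Basis states are bit strings; bit i = 0 means spin +1, bit i = 1 means spin -1.
    """
    n = len(h)
    dim = 2 ** n
    matrix = [[0.0] * dim for _ in range(dim)]
    for state in range(dim):
        spins = [1 - 2 * ((state >> i) & 1) for i in range(n)]
        # Diagonal entry: the classical Ising energy of this configuration.
        diag = sum(h[i] * spins[i] for i in range(n))
        diag += sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())
        matrix[state][state] = diag
        # Off-diagonal entries: X_i flips spin i, contributing -gamma.
        for i in range(n):
            matrix[state][state ^ (1 << i)] -= gamma
    return matrix
```

Setting `gamma` to zero leaves a diagonal matrix whose entries are the values of the quadratic function $f(x)$, that is, the classical Ising model. The matrix has $2^N$ rows, so this construction is limited to very small $N$.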

A quantum annealer can be used as a heuristic optimizer of its energy function. An embodiment of such an analog processor is disclosed in McGeoch, Catherine C. and Cong Wang (2013), "Experimental Evaluation of an Adiabatic Quantum System for Combinatorial Optimization", Computing Frontiers, May 14-16, 2013 (http://www.cs.amherst.edu/ccm/cf14-mcgeoch.pdf), and also in patent application US 2006/0225165.

With minor modifications to the quantum annealing process, a quantum processor can instead provide samples from the Boltzmann distribution of its Ising model at a finite temperature. The reader is referred to the technical report Bian, Z., Chudak, F., Macready, W.G. and Rose, G. (2010), "The Ising model: teaching an old problem new tricks", and also to Amin, M.H., Andriyash, E., Rolfe, J., Kulchytskyy, B., and Melko, R. (2016), "Quantum Boltzmann Machine", arXiv:1601.02036.

This method of sampling is called quantum sampling.

For a quantum processor with local field biases and couplings, quantum sampling provides samples from a distribution that is slightly different from the Boltzmann distribution of the Ising model it represents.

The reference Amin, M.H., Andriyash, E., Rolfe, J., Kulchytskyy, B., and Melko, R. (2016), "Quantum Boltzmann Machine", arXiv:1601.02036, studies how far quantum sampling is from Boltzmann sampling.
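For readers without access to such hardware, the target of the sampling can be illustrated classically. The sketch below draws samples from the finite-temperature Boltzmann distribution of an Ising model using single-spin-flip Metropolis updates. This is a classical stand-in, not quantum sampling, and the function names and parameters are assumptions made for illustration.

```python
import math
import random


def ising_energy(spins, h, couplings):
    """f(x) = sum_i h_i x_i + sum_(i,j) J_ij x_i x_j for spins in {-1, +1}."""
    return (sum(h[i] * s for i, s in enumerate(spins))
            + sum(j * spins[a] * spins[b] for (a, b), j in couplings.items()))


def metropolis_samples(h, couplings, beta, num_samples, sweeps_between=10, seed=0):
    """Return spin configurations approximately distributed as exp(-beta * f)."""
    rng = random.Random(seed)
    n = len(h)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    samples = []
    for _ in range(num_samples):
        for _ in range(sweeps_between * n):
            i = rng.randrange(n)
            before = ising_energy(spins, h, couplings)
            spins[i] = -spins[i]  # propose a single spin flip
            delta = ising_energy(spins, h, couplings) - before
            if delta > 0 and rng.random() >= math.exp(-beta * delta):
                spins[i] = -spins[i]  # reject: undo the flip
        samples.append(tuple(spins))
    return samples
```

For a single spin with bias 1 at inverse temperature 1, the configuration -1 has the lower energy and should appear in roughly 88% of the samples, matching the ratio exp(1) / (exp(1) + exp(-1)).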

Optical computing devices

Another embodiment of an analog system capable of sampling from the Boltzmann distribution of an Ising model near its equilibrium state is an optical device.

In one embodiment, the optical device comprises a network of optical parametric oscillators (OPOs), as disclosed in patent applications US20160162798 and WO2015006494 A1.

In this embodiment, each spin of the Ising model is simulated by an optical parametric oscillator operating at degeneracy.

A degenerate optical parametric oscillator is an open dissipative system that undergoes a second-order phase transition at the oscillation threshold. Owing to phase-sensitive amplification, a degenerate optical parametric oscillator can oscillate with a phase of either 0 or π with respect to the pump phase for amplitudes above the threshold. The phase is random, affected by the quantum noise associated with optical parametric down-conversion during the oscillation build-up. A degenerate optical parametric oscillator therefore naturally represents a binary digit specified by its output phase. Based on this property, a system of degenerate optical parametric oscillators can be used as an Ising machine. The phase of each degenerate optical parametric oscillator is identified with an Ising spin, with its amplitude and phase determined by the strength and sign of the Ising couplings between the relevant spins.

When pumped by a strong source, a degenerate optical parametric oscillator takes one of two phase states corresponding to spin 1 or -1 in the Ising model. A network of N substantially identical optical parametric oscillators with mutual coupling is pumped by the same source to simulate an Ising spin system. After the pump is introduced and following a transient period, the network of optical parametric oscillators gradually approaches a steady state close to thermal equilibrium.

The phase-state selection process depends on the vacuum fluctuations and the mutual coupling of the optical parametric oscillators. In some embodiments, the pump is pulsed at a constant amplitude; in other embodiments, the pump output is gradually increased; and in still further embodiments, the pump is controlled in another manner.

In one embodiment of the optical device, the plurality of couplings of the Ising model is simulated by a plurality of configurable couplers used for coupling the optical fields between the optical parametric oscillators. The configurable couplers may be configured to be off or configured to be on. Turning the couplers on and off may be gradual or abrupt. When configured to be on, the configuration may provide any phase or amplitude, depending on the coupling strength of the Ising problem.

The output of each optical parametric oscillator is interfered with a phase reference, and the result is captured at a photodetector. The optical parametric oscillator outputs represent a configuration of the Ising model. For example, a zero phase may represent a -1 spin state and a π phase may represent a 1 spin state in the Ising model.
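The readout convention in the example above can be sketched as a small decoding routine. It assumes that the measured phases are available as numbers in radians relative to the phase reference and that a tolerance is used to decide whether a phase is close enough to 0 or π; the function name and the tolerance value are hypothetical.

```python
import math


def phases_to_spins(phases, tolerance=0.5):
    """Map measured OPO phases (radians, relative to the reference) to Ising spins.

    Follows the example convention: phase 0 -> spin -1, phase pi -> spin +1.
    """
    spins = []
    for phase in phases:
        # Wrap the phase into (-pi, pi] so that 2*pi is treated as 0.
        wrapped = math.atan2(math.sin(phase), math.cos(phase))
        if abs(wrapped) <= tolerance:
            spins.append(-1)
        elif math.pi - abs(wrapped) <= tolerance:
            spins.append(1)
        else:
            raise ValueError(f"phase {phase!r} is not close to 0 or pi")
    return spins
```

Raising an error for phases far from both 0 and π is a design choice made here so that an oscillator operating below threshold or a noisy measurement is not silently mapped to a spin value.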

For an Ising model with N spins, and according to one embodiment, the resonant cavity of the plurality of optical parametric oscillators is configured to have a round-trip time equal to N times the period of N pulses from a pump source. Round-trip time, as used herein, indicates the time for light to propagate along one pass of the described recursive path. The N pulses of a pulse train with a period P equal to 1/N of the resonant cavity round-trip time may propagate through the N optical parametric oscillators concurrently without interfering with each other.

In one embodiment, the couplings of the optical parametric oscillators are provided by a plurality of delay lines allocated along the resonant cavity.

The plurality of delay lines comprises a plurality of modulators that synchronously control the strengths and phases of the couplings, allowing the optical device to be programmed to simulate the Ising model.

In a network of N optical parametric oscillators, N-1 delay lines and corresponding modulators are enough to control the amplitude and phase of the coupling between every two optical parametric oscillators.

In one embodiment, an optimal device capable of sampling from an Ising model can be manufactured as a network of optical parametric oscillators, as disclosed in U.S. patent application 20160162798.

In one embodiment, the network of optical parametric oscillators and the couplings of the optical parametric oscillators can be implemented using commercially available mode-locked lasers and optical elements such as telecom fiber delay lines, modulators, and other optical devices. Alternatively, the network of optical parametric oscillators and the couplings of the optical parametric oscillators can be implemented using optical fiber technologies, such as fiber technologies developed for telecommunication applications. The couplings can be realized with fibers and controlled by optical Kerr shutters.

Q-learning

Methods for approximating the optimal value function $U^*$ and the optimal policy $\alpha^*$ are referred to as neuro-dynamic programming or Q-learning algorithms. The reference [Sallans, B., Hinton, G.E., Reinforcement Learning with Factored States and Actions, Journal of Machine Learning Research 5, 1063-1088, 2004] proposes a method for Q-learning using Boltzmann machines. In particular, a general Boltzmann machine is used to approximate the joint distribution of states and actions in the stochastic control setting.

Features of the invention will be apparent from a review of the disclosure, drawings, and description of the invention below.

Summary of the invention

According to a broad aspect, there is disclosed a method for improving a policy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision epochs, wherein the evolution of an underlying stochastic state process depends on a plurality of actions in a policy. The method comprises: using a sampling device coupled to a digital computer and to a sampling device control system, the sampling device obtaining data representing a sample configuration of a Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases (each bias corresponding to a node in the plurality of nodes), a plurality of coupling weights (each coupling weight corresponding to a coupler in the plurality of couplers), and a transverse field strength; obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, and the reward structure of the stochastic control problem, and an initial policy for the stochastic control problem, the policy comprising a choice of at least one action for each state; assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and bias of each coupler and each node of the Boltzmann machine, respectively, and the transverse field strength; and, until a stopping criterion is met: generating a current-epoch state-action pair using the digital computer; amending, using the digital computer and the sampling device control system, the data representing none or at least one coupler and at least one bias using the generated current-epoch state-action pair; performing a sampling corresponding to the current-epoch state-action pair to obtain first sampling empirical means; obtaining, using the first sampling empirical means and the digital computer, an approximation of the value of the Q-function at the current-epoch state-action pair, the value of the Q-function representing the utility of the current-epoch state-action pair; obtaining, using the digital computer, a future-epoch state-action pair, wherein the state is obtained through a stochastic state process and wherein obtaining the action comprises performing a stochastic optimization test on the plurality of all state-action pairs comprising the future-epoch state and any possible action, thereby providing the action at the future epoch and updating the policy for the future-epoch state; amending, using the digital computer and the sampling device control system, the data representing none or at least one coupler and at least one bias using the generated future-epoch state-action pair; performing a sampling corresponding to the future-epoch state-action pair to obtain second sampling empirical means; obtaining, using the second sampling empirical means and the digital computer, an approximation of the value of the Q-function at the future-epoch state-action pair, the value of the Q-function representing the utility of the future-epoch state-action pair; and updating, using the digital computer, each weight and each bias of each coupler and each node of the Boltzmann machine, respectively, using the generated approximations of the value of the Q-function, the first sampling empirical means at the current-epoch state-action pair, and the corresponding reward at the current-epoch state-action pair obtained using the reward structure. The policy is provided using the digital computer when the stopping criterion is met.
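The loop described above can be sketched in Python under several loudly stated assumptions. The sampling device is replaced by exact computation on a small classical restricted Boltzmann machine (zero transverse field) whose visible nodes are clamped by the state-action pair; the Q-function is approximated by the negative free energy of the clamped machine, following the approach of the Sallans and Hinton reference cited in the background; the stochastic optimization test is replaced by an epsilon-greedy choice; and the stopping criterion is a fixed iteration budget. The toy environment and all names are hypothetical, and this is an illustrative sketch, not the claimed implementation.

```python
import math
import random


def encode(state, action):
    """Visible spins (+1/-1) clamped by a state-action pair with values in {0, 1}."""
    return [2 * state - 1, 2 * action - 1]


def hidden_fields(visible, weights, biases):
    """Effective field on each hidden node once the visible nodes are clamped."""
    return [b + sum(row[j] * v for row, v in zip(weights, visible))
            for j, b in enumerate(biases)]


def hidden_means(visible, weights, biases, beta=1.0):
    """Exact <h_j>; stands in for the sampling empirical means of the text."""
    return [-math.tanh(beta * f) for f in hidden_fields(visible, weights, biases)]


def q_value(visible, weights, biases, beta=1.0):
    """Assumed Q approximation: negative free energy of the clamped machine."""
    fields = hidden_fields(visible, weights, biases)
    return sum(math.log(2.0 * math.cosh(beta * f)) for f in fields) / beta


def toy_step(state, action, rng):
    """Hypothetical dynamics: action 1 toggles the state with probability 0.9."""
    return 1 - state if action == 1 and rng.random() < 0.9 else state


def toy_reward(state, action):
    """Hypothetical reward structure: reward 5 while in state 1."""
    return 5.0 if state == 1 else 0.0


def improve_policy(step, reward, gamma=0.5, iterations=5000, lr=0.01,
                   epsilon=0.1, hidden=2, seed=0):
    rng = random.Random(seed)
    # weights[i][j] couples visible node i to hidden node j.
    weights = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(2)]
    biases = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
    policy = {0: 0, 1: 0}  # initial policy: one action per state
    state = 0
    action = policy[state]
    for _ in range(iterations):  # stopping criterion: fixed iteration budget
        visible = encode(state, action)  # clamp the current state-action pair
        means = hidden_means(visible, weights, biases)
        q_now = q_value(visible, weights, biases)
        next_state = step(state, action, rng)  # stochastic state process
        greedy = max((0, 1),
                     key=lambda a: q_value(encode(next_state, a), weights, biases))
        policy[next_state] = greedy  # update the policy at the future state
        next_action = rng.choice((0, 1)) if rng.random() < epsilon else greedy
        q_next = q_value(encode(next_state, next_action), weights, biases)
        td_error = reward(state, action) + gamma * q_next - q_now
        # Gradient of Q with respect to w_ij is -v_i <h_j>; for b_j it is -<h_j>.
        for j in range(hidden):
            for i in range(2):
                weights[i][j] -= lr * td_error * visible[i] * means[j]
            biases[j] -= lr * td_error * means[j]
        state, action = next_state, next_action
    return policy, weights, biases
```

A call such as `improve_policy(toy_step, toy_reward)` returns the updated policy together with the learned weights and biases. With a quantum or optical sampling device, `hidden_means` and `q_value` would instead be estimated from sampled configurations.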

According to one embodiment, sampling apparatus includes quantum processor, and wherein sampling apparatus control system includes Quantum devices control system, and quantum processor is coupled to digital computer and quantum devices control system, in addition, wherein measuring Sub-processor includes multiple quantum bits and multiple couplers, and each coupler is provided for the infall between two quantum bits It is communicatively coupled.

According to one embodiment, sampling apparatus includes Optical devices, is configured as receiving energy simultaneously from optical energy source Multiple optical parametric oscillators and multiple coupling devices are generated, each coupling device controllably couples multiple optical parameters Optical parametric oscillator in oscillator.

According to one embodiment, the sampling device includes a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, and the implemented Boltzmann machine is a classical Boltzmann machine characterized by a zero transverse field strength; further, the memory unit includes an application for obtaining data representing each weight and each bias of each coupler and each node, respectively, of the classical Boltzmann machine, and the application is adapted to perform simulated quantum annealing of the classical Boltzmann machine.

According to one embodiment, the sampling device includes a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, and the implemented Boltzmann machine is a quantum Boltzmann machine characterized by a nonzero transverse field strength; the memory unit includes an application for obtaining data representing each weight and each bias of each coupler and each node, respectively, of the quantum Boltzmann machine; further, the application is adapted to perform simulated quantum annealing of the quantum Boltzmann machine.

According to one embodiment, performing the simulated quantum annealing of the quantum Boltzmann machine provides a plurality of sample configurations of an effective Hamiltonian representing the quantum Boltzmann machine.

According to one embodiment, the sampling device includes a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, and the implemented Boltzmann machine is a classical Boltzmann machine characterized by a zero transverse field strength; further, the memory unit includes an application for obtaining data representing each weight and each bias of each coupler and each node, respectively, of the classical Boltzmann machine, and the application is adapted to sample a plurality of instances of a Fortuin-Kasteleyn random cluster representation corresponding to the classical Boltzmann machine, to provide an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation.

According to one embodiment, the sampling device includes a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, and the implemented Boltzmann machine is a quantum Boltzmann machine characterized by a nonzero transverse field strength; the memory unit includes an application for obtaining data representing each weight and each bias of each coupler and each node, respectively, of the quantum Boltzmann machine, and the application is adapted to sample a plurality of instances of a Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, to provide an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation.

According to one embodiment, obtaining the approximation of the value of the Q-function at the current epoch and at the future epoch includes obtaining from the sampling device a plurality of sample configurations of the Boltzmann machine along a measurement axis, and calculating an empirical approximation of the free energy of the Boltzmann machine using the digital computer.

According to one embodiment, obtaining the approximation of the value of the Q-function at the current epoch and at the future epoch includes obtaining from the sampling device a plurality of sample configurations of the Boltzmann machine along a measurement axis, constructing from the obtained sample configurations a plurality of sample configurations of an effective Hamiltonian representing the quantum Boltzmann machine, and calculating an empirical approximation of the free energy of the quantum Boltzmann machine using the digital computer.

According to one embodiment, obtaining the approximation of the value of the Q-function at the current epoch and at the future epoch includes obtaining from the sampling device a plurality of sample configurations of an effective Hamiltonian representing the quantum Boltzmann machine, and calculating an empirical approximation of the free energy of the quantum Boltzmann machine using the digital computer.

According to one embodiment, obtaining the approximation of the value of the Q-function at the current epoch and at the future epoch includes obtaining from the sampling device an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, and calculating an empirical approximation of the free energy of the quantum Boltzmann machine using the digital computer.
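For the classical case, the empirical free-energy approximation can be illustrated with a minimal sketch. It assumes node values in {-1, +1}, an Ising-type energy E(c) = -sum_i b_i c_i - sum_(i,j) w_ij c_i c_j, and the estimate F = <E> + (1/beta) * sum_c p(c) log p(c), where p(c) is the empirical frequency of configuration c. Taking the Q-function value as the negative free energy is a convention assumed here for illustration; the function names and the `beta` parameter are hypothetical.

```python
import math
from collections import Counter

def energy(config, biases, weights):
    """Ising-type energy of one configuration of +/-1 node values (assumed form)."""
    e = -sum(b * s for b, s in zip(biases, config))
    e -= sum(w * config[i] * config[j] for (i, j), w in weights.items())
    return e

def empirical_free_energy(samples, biases, weights, beta=1.0):
    """Estimate F = <E> - (1/beta) * S from sampled configurations.

    S is the entropy of the empirical distribution of the samples.
    """
    n = len(samples)
    mean_energy = sum(energy(c, biases, weights) for c in samples) / n
    counts = Counter(tuple(c) for c in samples)
    # sum_c p(c) log p(c) is the negative of the empirical entropy.
    neg_entropy = sum((k / n) * math.log(k / n) for k in counts.values())
    return mean_energy + neg_entropy / beta

def q_estimate(samples, biases, weights, beta=1.0):
    """Assumed convention: Q is approximated by the negative free energy."""
    return -empirical_free_energy(samples, biases, weights, beta)
```

With few samples the entropy term is biased low, so in practice the number of sample configurations would need to be large relative to the number of distinct configurations that carry significant probability.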

According to one embodiment, calculating the first empirical means and the second empirical means corresponding to the nodes includes obtaining from the sampling device a plurality of sample configurations, along a measurement axis, of one of the quantum Boltzmann machine and the classical Boltzmann machine, and calculating an approximation of the empirical means of the nodes using the digital computer.

According to one embodiment, calculating the first empirical means and the second empirical means corresponding to the nodes includes obtaining from the sampling device a plurality of sample configurations of the effective Hamiltonian of the Boltzmann machine, and calculating an approximation of the empirical means of the nodes using the digital computer.
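The empirical means described above amount to averaging over the sample configurations returned by the sampling device. A minimal sketch follows, assuming each sample is a sequence of +/-1 node values; the pairwise averages over coupled nodes are included as an assumption, since an update of coupler weights would typically need them. Function names are hypothetical.

```python
def node_means(samples):
    """Average value of each node over all sample configurations."""
    n = len(samples)
    width = len(samples[0])
    return [sum(c[i] for c in samples) / n for i in range(width)]

def pair_means(samples, couplers):
    """Average product of node values for each coupled pair (i, j)."""
    n = len(samples)
    return {(i, j): sum(c[i] * c[j] for c in samples) / n for (i, j) in couplers}
```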

According to one embodiment, performing the stochastic optimization test on the plurality of state-action pairs includes: using the digital computer and the sampling device control system, modifying the data representing none or at least one coupler and at least one bias using each state-action pair corresponding to the future-epoch state; performing a sampling corresponding to each state-action pair corresponding to the future-epoch state to provide empirical means; using the digital computer, obtaining an approximation of the value of the Q-function at each state-action pair corresponding to the future-epoch state; and, using the digital computer and all of the approximated Q-function values of the state-action pairs corresponding to the future-epoch state, updating the policy for the future-epoch state by sampling from the corresponding distribution.

According to one embodiment, performing the stochastic optimization test on the plurality of all state-action pairs includes obtaining a temperature parameter; obtaining the future-epoch state; and sampling from a Boltzmann distribution associated with the approximations of the Q-function values, with the state variable fixed at the future-epoch state and the temperature fixed at the provided temperature parameter.
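Sampling an action from a Boltzmann distribution over approximate Q-function values at a fixed state can be sketched as below. The dictionary layout, the function names and the `rng` parameter are assumptions made for illustration; the maximum is subtracted before exponentiating only for numerical stability and does not change the distribution.

```python
import math
import random

def boltzmann_action_probs(q_values, temperature):
    """Map {action: approximate Q at the fixed future-epoch state} to probabilities."""
    m = max(q_values.values())
    weights = {a: math.exp((q - m) / temperature) for a, q in q_values.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

def sample_action(q_values, temperature, rng=random):
    """Draw one action; the draw can then be recorded as the policy for that state."""
    probs = boltzmann_action_probs(q_values, temperature)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
```

A high temperature makes the choice close to uniform, while a low temperature concentrates the choice on the action with the largest approximate Q-function value.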

According to one embodiment, the plurality of qubits of the quantum processor includes a first group of qubits and a second group of qubits, and the plurality of couplers of the quantum processor includes at least one coupler (each of the at least one coupler providing a communicative coupling at a crossing between a qubit of the first group of qubits and at least one qubit of the second group of qubits) and a plurality of couplers (each of the plurality of couplers providing a communicative coupling at a crossing between a qubit of the second group of qubits and another qubit of the second group of qubits).

According to one embodiment, the first group of qubits represents the set of actions of the stochastic control problem.

According to one embodiment, modifying the data representing none or at least one coupler and at least one bias using the generated current-epoch state-action pair includes switching OFF all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and a qubit of the second group of qubits, and modifying at least one bias of the second group of qubits using the generated current-epoch state-action pair.

According to one embodiment, modifying the data representing none or at least one coupler and at least one bias using the generated future-epoch state-action pair includes switching OFF all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and a qubit of the second group of qubits, and modifying at least one bias of the second group of qubits using the generated future-epoch state-action pair.
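One way to picture the effect of switching the cross-group couplers OFF while modifying the biases of the second group is to fold the fixed (clamped) values into effective biases. The sketch below is an assumption about how such a bias modification could be computed, not a description of the hardware control interface; the function name and the data layout are hypothetical.

```python
def clamp_state_action(hidden_biases, visible_hidden_weights, visible_values):
    """Return effective biases for the second group with the visible values fixed.

    visible_hidden_weights maps (visible index, hidden index) -> coupling weight.
    visible_values holds the fixed encoding of the state-action pair.
    The input bias list is left unmodified.
    """
    effective = list(hidden_biases)
    for (v, h), w in visible_hidden_weights.items():
        # A clamped neighbour acts on the hidden node like an extra local field.
        effective[h] += w * visible_values[v]
    return effective
```

Once the contribution of the clamped nodes is absorbed into the biases, the couplers across the two groups carry no further information for that sampling run, which is consistent with switching them OFF.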

According to one embodiment, performing the stochastic optimization test on the plurality of all state-action pairs including the future-epoch state and any possible action includes switching ON all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and a qubit of the second group of qubits; modifying at least one bias of the second group of qubits using the future-epoch state of the future-epoch state-action pair; performing a quantum sampling to obtain empirical means corresponding to the first group of qubits; and updating, using the digital computer, the policy for the future-epoch state by assigning an action to the future-epoch state according to the distribution of the obtained empirical means corresponding to the first group of qubits.

According to one embodiment, the stopping criterion includes reaching a maximum number of training steps.

According to one embodiment, the stopping criterion includes reaching a maximum runtime.

According to one embodiment, the stopping criterion includes convergence of a function of the weights of the couplings and the biases of the local fields.

According to one embodiment, the stopping criterion includes convergence of the policy to a fixed policy.
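The four stopping criteria above can be sketched as a single check. Combining them with a logical OR, the default thresholds, and the use of a count of consecutive steps without a policy change as a proxy for convergence to a fixed policy are all assumptions made for illustration; the parameter names are hypothetical.

```python
import time

def should_stop(step, started_at, weight_change, stable_policy_steps,
                max_steps=10_000, max_seconds=3600.0,
                tolerance=1e-6, patience=100):
    """Return True as soon as any one of the stopping criteria is met."""
    if step >= max_steps:                            # maximum number of training steps
        return True
    if time.monotonic() - started_at >= max_seconds:  # maximum runtime
        return True
    if weight_change < tolerance:                    # weights and biases have converged
        return True
    if stable_policy_steps >= patience:              # policy has stopped changing
        return True
    return False
```

Here `weight_change` stands for any scalar function of the change in weights and biases between iterations, for example the largest absolute change.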

According to an embodiment, providing the policy includes at least one of displaying the policy to a user interacting with the digital computer; storing the policy in the digital computer; and transmitting the policy to another processing unit operatively connected to the digital computer.

According to one embodiment, the digital computer includes a memory unit; further, the initialization data is obtained from the memory unit of the digital computer.

According to one embodiment, the initialization data is obtained from one of a user interacting with the digital computer and a remote processing unit operatively connected to the digital computer.

According to a broad aspect, a digital computer is disclosed, comprising a central processing unit; a display device; a communication port for operatively connecting the digital computer to a sampling device coupled to the digital computer and to a sampling device control system; and a memory unit comprising an application for improving a policy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision epochs, wherein the evolution of an underlying stochastic state process depends on a plurality of actions in a policy. The application comprises: instructions for using the sampling device coupled to the digital computer and to the sampling device control system, the sampling device obtaining data representing sample configurations of a Boltzmann machine, the Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases (each bias corresponding to a node in the plurality of nodes), a plurality of coupling weights (each coupling weight corresponding to a coupler in the plurality of couplers), and a transverse field strength; instructions for obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, the reward structure of the stochastic control problem, and an initial policy for the stochastic control problem, the policy comprising a choice of at least one action for each state; instructions for assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and bias of each coupler and each node, respectively, of the Boltzmann machine, and the transverse field strength; instructions for, until a stopping criterion is met: generating a current-epoch state-action pair using the digital computer; modifying, using the digital computer and the sampling device control system, the data representing none or at least one coupler and at least one bias using the generated current-epoch state-action pair; performing a sampling corresponding to the current-epoch state-action pair to obtain first sampling empirical means; obtaining, using the digital computer and the first sampling empirical means, an approximation of the value of the Q-function at the current-epoch state-action pair, the value of the Q-function representing the utility of the current-epoch state-action pair; obtaining a future-epoch state-action pair using the digital computer, wherein the state is obtained through the stochastic state process and wherein obtaining the action includes performing a stochastic optimization test on a plurality of state-action pairs comprising the future-epoch state and any action that may be taken, thereby providing the action at the future epoch and updating the policy for the future-epoch state; modifying, using the digital computer and the sampling device control system, the data representing none or at least one coupler and at least one bias using the generated future-epoch state-action pair; performing a sampling corresponding to the future-epoch state-action pair to obtain second sampling empirical means; obtaining, using the digital computer and the second sampling empirical means, an approximation of the value of the Q-function at the future-epoch state-action pair, the value of the Q-function representing the utility of the future-epoch state-action pair; and updating, using the digital computer, each weight and each bias of each coupler and each node, respectively, of the Boltzmann machine, using the generated approximations of the values of the Q-function, the first sampling empirical means of the current-epoch state-action pair, and the corresponding reward for the current-epoch state-action pair obtained using the reward structure; and instructions for providing the policy using the digital computer when the stopping criterion is met.

According to a broad aspect, a non-transitory computer-readable storage medium storing computer-executable instructions is disclosed, the computer-executable instructions, when executed, causing a digital computer to perform a method for improving a policy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision epochs, wherein the evolution of an underlying stochastic state process depends on a plurality of actions in a policy. The method comprises: using a sampling device coupled to the digital computer and to a sampling device control system, the sampling device obtaining data representing sample configurations of a Boltzmann machine, the Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases (each bias corresponding to a node in the plurality of nodes), a plurality of coupling weights (each coupling weight corresponding to a coupler in the plurality of couplers), and a transverse field strength; obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, the reward structure of the stochastic control problem, and an initial policy for the stochastic control problem, the policy comprising a choice of at least one action for each state; assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and bias of each coupler and each node, respectively, of the Boltzmann machine, and the transverse field strength; until a stopping criterion is met: generating a current-epoch state-action pair using the digital computer; modifying, using the digital computer and the sampling device control system, the data representing none or at least one coupler and at least one bias using the generated current-epoch state-action pair; performing a sampling corresponding to the current-epoch state-action pair to obtain first sampling empirical means; obtaining, using the digital computer and the first sampling empirical means, an approximation of the value of the Q-function at the current-epoch state-action pair, the value of the Q-function representing the utility of the current-epoch state-action pair; obtaining a future-epoch state-action pair using the digital computer, wherein the state is obtained through the stochastic state process and wherein obtaining the action includes performing a stochastic optimization test on a plurality of state-action pairs comprising the future-epoch state and any action that may be taken, thereby providing the action at the future epoch and updating the policy for the future-epoch state; modifying, using the digital computer and the sampling device control system, the data representing none or at least one coupler and at least one bias using the generated future-epoch state-action pair; performing a sampling corresponding to the future-epoch state-action pair to obtain second sampling empirical means; obtaining, using the digital computer and the second sampling empirical means, an approximation of the value of the Q-function at the future-epoch state-action pair, the value of the Q-function representing the utility of the future-epoch state-action pair; and updating, using the digital computer, each weight and each bias of each coupler and each node, respectively, of the Boltzmann machine, using the generated approximations of the values of the Q-function, the first sampling empirical means of the current-epoch state-action pair, and the corresponding reward for the current-epoch state-action pair obtained using the reward structure; and providing the policy using the digital computer when the stopping criterion is met.
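The final update step recited in the aspects above uses the two Q-function approximations, the first sampling empirical means and the reward. A minimal sketch of one such update is given below, assuming a temporal-difference rule of the form commonly used with free-energy function approximators. The exact update rule, the discount and learning-rate parameters, and all names are assumptions made for illustration.

```python
def td_update(weights, biases, pair_means, node_means, reward,
              q_current, q_future, discount=0.9, learning_rate=0.01):
    """Return updated coupler weights and node biases after one training step.

    weights and pair_means map (i, j) -> value; biases and node_means are lists.
    """
    # Temporal-difference error between the bootstrapped target and the current estimate.
    td_error = reward + discount * q_future - q_current
    new_weights = {k: w + learning_rate * td_error * pair_means[k]
                   for k, w in weights.items()}
    new_biases = [b + learning_rate * td_error * m
                  for b, m in zip(biases, node_means)]
    return new_weights, new_biases
```

When the temporal-difference error is zero the parameters are left unchanged, which is the fixed point one would expect once the Q-function approximation is consistent with the observed rewards.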

One advantage of the method disclosed herein is that it overcomes the dimensionality limitation of value-iteration methods for solving Markov decision problems.

Another advantage of the method disclosed herein is that it overcomes the memory storage problem of common Q-learning methods for solving Markov decision problems.

Another advantage of the method disclosed herein is that quantum sampling is used to provide an efficient method for finding the empirical means of the qubits of the system, thereby providing an efficient method for training a neural network.

Another advantage of the method disclosed herein is that, in one embodiment, sampling from the Fortuin-Kasteleyn random cluster representation is used to provide an efficient method for finding the empirical means of the qubits of the system, thereby providing an efficient method for training a neural network.

Another advantage of the method disclosed herein is that it is not restricted to a specific graph layout of the qubits of the quantum processor or optical device.

Brief description of the drawings

In order that the invention may be readily understood, embodiments of the invention are illustrated by way of example in the accompanying drawings.

Fig. 1 is a diagram showing an embodiment of a system including a digital computer coupled to an analog computer.

Fig. 2 is a flowchart showing an embodiment of a method for improving a policy for a stochastic control problem.

Further details of the invention and its advantages will be apparent from the detailed description included below.

Detailed description

In the following description of the embodiments, references to the accompanying drawings are by way of illustration of examples by which the invention may be practiced.

Terms

Unless expressly specified otherwise, the term "invention" and the like mean "the one or more inventions disclosed in this application".

Unless expressly specified otherwise, the terms "an aspect", "an embodiment", "embodiment", "one or more embodiments", "some embodiments", "certain embodiments", "one embodiment", "another embodiment" and the like mean "one or more (but not all) embodiments of the disclosed invention".

Unless expressly specified otherwise, a reference to "another embodiment" or "another aspect" in describing an embodiment does not imply that the referenced embodiment is mutually exclusive with another embodiment (for example, an embodiment described before the referenced embodiment).

Unless expressly specified otherwise, the terms "including", "comprising" and variations thereof mean "including but not limited to".

Unless expressly specified otherwise, the terms "a", "an", "the" and "at least one" mean "one or more".

Unless expressly specified otherwise, the term "plurality" means "two or more".

Unless expressly specified otherwise, the term "herein" means "in the present application, including anything which may be incorporated by reference".

The term "whereby" is used herein only to precede a clause or other set of words that expresses only the intended result, objective or consequence of something that has previously been explicitly recited. Thus, when the term "whereby" is used in a claim, the clause or other words that the term "whereby" modifies do not establish specific further limitations of the claim or otherwise restrict the meaning or scope of the claim.

The term "e.g." and like terms mean "for example", and thus do not limit the terms or phrases they explain. For example, in the sentence "the computer sends data (e.g., instructions, a data structure) over the Internet", the term "e.g." explains that "instructions" are an example of "data" that the computer may send over the Internet, and also explains that "a data structure" is an example of "data" that the computer may send over the Internet. However, both "instructions" and "a data structure" are merely examples of "data", and other things besides "instructions" and "a data structure" can be "data".

The term "i.e." and like terms mean "that is", and thus limit the terms or phrases they explain.

In one embodiment, the term "analog computer" means a system comprising a quantum processor, a qubit control system, coupling devices, and a readout system, all connected to each other through a communication bus.

In an alternative embodiment, "analog computer" means a system comprising an optical device, the optical device comprising a network of optical parametric oscillators; a control system of the optical parametric oscillators; one or more coupling devices comprising delay lines and modulators; and a readout system comprising one or more photodetectors.

Neither the title nor the abstract is to be taken as limiting in any way the scope of the disclosed invention. The title of the present application and the headings of sections provided herein are for convenience only and are not to be taken as limiting the disclosure in any way.

Numerous embodiments are described in the present application and are presented for illustrative purposes only. The described embodiments are not, and are not intended to be, limiting in any sense. As is readily apparent from the disclosure, the presently disclosed invention is widely applicable to numerous embodiments. One of ordinary skill in the art will recognize that the disclosed invention may be practiced with various modifications and alterations, such as structural and logical modifications. Although particular features of the disclosed invention may be described with reference to one or more particular embodiments and/or drawings, it should be understood that such features are not limited to usage in the one or more particular embodiments or drawings with reference to which they are described, unless expressly specified otherwise.

It will be appreciated that the invention may be implemented in numerous ways. In this specification, these implementations, or any other form that the invention may take, may be referred to as systems or techniques. A component such as a processor or a memory described as being configured to perform a task includes either a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task.

With all this in mind, the present invention is directed to a method and system for improving a policy for a stochastic control problem.

As mentioned above, the stochastic control problem may be of various types. In one embodiment, the stochastic control problem is portfolio optimization in financial investment.

In an alternative embodiment, the stochastic control problem is an equipment replacement problem.

In an alternative embodiment, the stochastic control problem is scheduling in queueing theory and operations research.

In an alternative embodiment, the stochastic control problem is a problem involving motion generation.

Referring now to Fig. 1, there is shown a diagram of an embodiment of a system that may be used to implement the method for improving a policy for a stochastic control problem.

It will be appreciated that a quantum processor is used in the embodiment disclosed in Fig. 1.

It will be appreciated that, alternatively, other sampling devices may be used, such as a simulator of a quantum or classical Ising model, or an optical device comprising a network of optical parametric oscillators.

More precisely, the system comprises a digital computer 8 coupled to an analog computer 10.

It should be appreciated that digital computer 8 can be any kind of digital computer.

In one embodiment, the digital computer 8 is selected from a group consisting of desktop computers, laptop computers, tablet computers, servers, smartphones, etc. It will also be appreciated that, in the foregoing, the digital computer 8 may also be broadly referred to as a processor.

In the embodiment shown in Fig. 1, the digital computer 8 comprises a central processing unit 12 (also referred to as a microprocessor), a display device 14, input devices 16, communication ports 20, a data bus 18, and a memory unit 22.

The central processing unit 12 is used for processing computer instructions. The skilled addressee will appreciate that various embodiments of the central processing unit 12 may be provided.

In one embodiment, the central processing unit 12 comprises a CPU Core i5 3210 running at 2.5 GHz and manufactured by Intel(TM).

The display device 14 is used for displaying data to a user. The skilled addressee will appreciate that various types of display device 14 may be used.

In one embodiment, the display device 14 is a standard liquid crystal display (LCD) monitor.

Input unit 16 is for entering data into digital computer 8.

The communication ports 20 are used for sharing data with the digital computer 8.

The communication ports 20 may comprise, for example, a universal serial bus (USB) port for connecting a keyboard and a mouse to the digital computer 8.

The communication ports 20 may further comprise a data network communication port (such as an IEEE 802.3 port) for enabling a connection of the digital computer 8 with the analog computer 10.

It will be understood by those skilled in the art that the various optional embodiments of communication port 20 can be provided.

The memory unit 22 is used for storing computer-executable instructions.

The memory unit 22 may comprise a system memory, such as a high-speed random access memory (RAM) for storing system control programs (e.g., BIOS, operating system modules, applications, etc.) and a read-only memory (ROM).

It will be appreciated that, in one embodiment, the memory unit 22 comprises an operating system module.

It will be appreciated that the operating system module may be of various types.

In one embodiment, the operating system module is OS X Yosemite manufactured by Apple(TM).

The memory unit 22 further comprises an application for improving a policy for a stochastic control problem.

The memory unit 22 may further comprise an application for using the analog computer 10.

The memory unit 22 may further comprise quantum processor data, such as a corresponding weight for each coupler of the quantum processor 28 and a corresponding bias for each qubit of the quantum processor 28.

The analog computer 10 comprises a qubit control system 24, a readout control system 26, a quantum processor 28, and a coupling device control system 30.

The quantum processor 28 may be of various types. In one embodiment, the quantum processor comprises superconducting qubits.

The readout control system 26 is used for reading the qubits of the quantum processor 28. In fact, it will be appreciated that in order for a quantum processor to be used in the method disclosed herein, a readout system that measures the qubits of the quantum system in their quantum mechanical states is required. Multiple measurements provide a sample of the states of the qubits. The results from the readings are fed to the digital computer 8. The biases of the qubits of the quantum processor 28 are controlled via the qubit control system 24. The couplers are controlled via the coupling device control system 30.

It will be appreciated that the readout control system 26 may be of various types. For example, the readout control system 26 may comprise a plurality of dc-SQUID magnetometers, each inductively connected to a different qubit of the quantum processor 28. The readout control system 26 may provide voltage or current values. In one embodiment, the dc-SQUID magnetometer comprises a loop of superconducting material interrupted by at least one Josephson junction, as is well known in the art.

The coupling device control system 30 may comprise one or more coupling controllers for the coupling devices, also referred to as "couplers". Each coupling controller may be configured to tune the coupling weight of a corresponding coupling device from zero to a maximum value. It will be appreciated that the coupling devices may be tuned, for example, to provide ferromagnetic or antiferromagnetic coupling between the qubits of the quantum processor 28. Examples of such analog computers are disclosed in U.S. Patent No. 8,421,053 and in U.S. Patent Application Publication No. 2015/0046681.

In the embodiment of figure 1, the sampling apparatus for being coupled to digital computer is quantum processor.

In alternate embodiments, sampling apparatus is the Optical devices for including optical parametric oscillator network.

In a third embodiment, the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit, the memory unit comprising an application for obtaining data representing the transverse field strength and each weight and each bias of each coupler and each node, respectively, of a Boltzmann machine, wherein a zero transverse field strength corresponds to a classical Boltzmann machine and a nonzero transverse field strength corresponds to a quantum Boltzmann machine (QBM), and for performing a simulated quantum annealing method of the Boltzmann machine to provide a plurality of sample configurations of the Boltzmann machine along a measurement axis.

In a fourth embodiment, the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit, the memory unit comprising an application for obtaining, from the digital computer, data representing the transverse field strength and each weight and each bias of each coupler and each node, respectively, of a Boltzmann machine, wherein the transverse field strength has a nonzero value corresponding to a quantum Boltzmann machine, and for performing a simulated quantum annealing method on the quantum Boltzmann machine to provide a plurality of sample configurations of an effective Hamiltonian representing the quantum Boltzmann machine.

In a fifth embodiment, the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit, the memory unit comprising an application for obtaining, from the digital computer, data representing the transverse field strength and each weight and each bias of each coupler and each node, respectively, of a Boltzmann machine, wherein the transverse field strength has a nonzero value corresponding to a quantum Boltzmann machine, and for sampling a plurality of instances of a Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, to provide an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation.
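For a single instance of a random cluster representation, the number of clusters is the number of connected components formed by the bonds that are open in that instance. The sketch below only counts clusters for one given bond configuration using a union-find structure; how the instances themselves are sampled is outside its scope, and the function name and input layout are assumptions made for illustration.

```python
def count_clusters(num_nodes, open_bonds):
    """Count connected components given the list of open bonds (i, j)."""
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    clusters = num_nodes
    for i, j in open_bonds:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # merging two components removes one cluster
            clusters -= 1
    return clusters
```

Averaging this count over many sampled instances would give one possible approximation of the number of clusters.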

Referring now to Figure 2, showing the embodiment of the method for the strategy for improving Stochastic Control Problem.

As mentioned above, the stochastic control problem is characterized by a set of actions, a set of states, a discount factor, a reward structure as a function of states and actions, and a plurality of decision epochs, wherein the evolution of the underlying stochastic state process depends on a plurality of actions in a policy.

A sampling device is used. More precisely, a sampling device coupled to the digital computer and to a sampling device control system is used to obtain data. The obtained data represent sample configurations of a Boltzmann machine, the Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases (each bias corresponding to a node in the plurality of nodes), a plurality of coupling weights (each coupling weight corresponding to a coupler in the plurality of couplers), and a transverse field strength.
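The parameters that define the Boltzmann machine can be gathered in a small container, as in the sketch below. The class and field names are hypothetical; the only facts taken from the description are that there is one bias per node, one coupling weight per coupler, and a transverse field strength whose zero or nonzero value distinguishes the classical from the quantum case.

```python
from dataclasses import dataclass

@dataclass
class BoltzmannMachine:
    num_nodes: int
    biases: list               # one bias per node
    coupler_weights: dict      # (i, j) -> coupling weight, one entry per coupler
    transverse_field: float = 0.0

    def __post_init__(self):
        if len(self.biases) != self.num_nodes:
            raise ValueError("expected exactly one bias per node")
        for i, j in self.coupler_weights:
            if i == j or not (0 <= i < self.num_nodes and 0 <= j < self.num_nodes):
                raise ValueError(f"invalid coupler ({i}, {j})")

    @property
    def is_quantum(self):
        """A nonzero transverse field strength corresponds to a quantum Boltzmann machine."""
        return self.transverse_field != 0.0
```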

According to processing step 52, initialization data is obtained. It will be appreciated that the initialization data may be obtained using the digital computer 8. It will be further appreciated that the initialization data comprises the set of actions, the set of states, the discount factor, the reward structure of the stochastic control problem, and an initial policy for the stochastic control problem, the policy comprising a choice of at least one action for each state.

It will be appreciated that, in one embodiment, the initialization data may be stored in the memory unit 22 of the digital computer 8.

In an alternative embodiment, the initialization data may be provided by a user interacting with the digital computer 8.

In another alternative embodiment, the initialization data may be obtained from a remote processing unit operatively coupled with the digital computer 8.

Still referring to Figure 2 and according to processing step 54, data representing an initial weight and bias of respectively each coupler and each node of the Boltzmann machine, and the transverse field strength, are assigned to the sampling device. In an embodiment in which the sampling device comprises a quantum processor, the data representing the initial weights and biases are assigned respectively to each coupler and each qubit of the quantum processor, and the value of the transverse field strength is assigned using the control system.

In an embodiment in which the sampling device comprises a network of optical parametric oscillator pulses, the data representing the initial weights and biases are sent to the energy source and the modulator respectively. In an embodiment in which the sampling device comprises a simulated quantum annealing application, the data representing the initial weights and biases are passed to the application as parameters.

It will be appreciated that the quantum processor may be of various types.

In one embodiment, the quantum processor comprises a first group of qubits and a second group of qubits. In this embodiment, the quantum processor comprises a group of couplers. The group of couplers comprises at least one coupler, each of which provides a communicative coupling at a crossing between a qubit of the first group and at least one qubit of the second group. The group of couplers further comprises a plurality of couplers, each of which provides a communicative coupling at a crossing between a qubit of the second group and other qubits of the second group.

In this embodiment, the first group of qubits represents the set of actions of the stochastic control problem.

In another embodiment, the quantum processor is a D-Wave 2X system manufactured by D-Wave Systems, Ltd.

It will be appreciated that the initial weight and bias of respectively each coupler and each qubit of the quantum processor may be assigned using the digital computer 8 and the quantum device control system.

The device control system comprises the qubit control system 24 and the coupling device control system 30.

It will be appreciated that the initial weights and biases may be stored in the memory unit 22 of the digital computer 8.

In an alternative embodiment, the initial weights and biases are provided by a user interacting with the digital computer 8.

In a further embodiment, the initial weights and biases are provided by a remote processing unit operatively coupled with the digital computer 8.

It will be appreciated that, in one embodiment, the initial weights and biases are generated randomly.

Setting up the sampling device

In one embodiment in which a quantum processor is used as the sampling device, it will be appreciated that the qubits of the quantum processor represent a plurality of nodes of a corresponding general Boltzmann machine (GBM).

In one embodiment in which the sampling device comprises an optical device, the network of optical parametric oscillators represents the general Boltzmann machine.

The visible nodes of the general Boltzmann machine consist of two groups of nodes. The first group of nodes represents the states of the stochastic control problem. The second group of nodes represents the actions of the stochastic control problem. The hidden nodes of the general Boltzmann machine consist of all nodes not included in the first group or the second group.

In one embodiment in which a quantum processor is used as the sampling device, the quantum processor comprises a plurality of qubits representing the hidden nodes of the general Boltzmann machine. In this embodiment, the quantum processor comprises a plurality of qubits and a plurality of couplers, each coupler providing a communicative coupling at a crossing between two qubits.

In one embodiment in which an optical device is used as the sampling device, the optical parametric oscillators represent the hidden nodes of the general Boltzmann machine.

In another embodiment in which simulated quantum annealing is used as the sampling device, the simulated spins represent the hidden nodes of the general Boltzmann machine.

In another embodiment in which simulated quantum annealing is used as the sampling device, a first group of simulated spins represents the action nodes of the general Boltzmann machine, and a second group of simulated spins represents the hidden nodes of the general Boltzmann machine.

In another embodiment in which a quantum processor is used as the sampling device, a first group of qubits of the quantum processor represents the action nodes of the general Boltzmann machine, and a second group of qubits of the quantum processor represents the hidden nodes of the general Boltzmann machine. In this embodiment, the quantum processor comprises a group of couplers. The group of couplers comprises at least one coupler, each of which provides a communicative coupling at a crossing between a qubit of the first group and at least one qubit of the second group. The group of couplers further comprises a plurality of couplers, each of which provides a communicative coupling at a crossing between a qubit of the second group and other qubits of the second group. In this embodiment, the first group of qubits represents the set of actions of the stochastic control problem, and the second group of qubits represents the set of hidden nodes of the general Boltzmann machine.

Each node of the general Boltzmann machine takes a value in {0, 1}, unless the node is used to represent the set of states or the set of actions of the stochastic control problem.

The nodes of the general Boltzmann machine that represent the set of states and the set of actions of the stochastic control problem may take values in {0, 1}, values in a finite or infinite set of discrete values, or real values represented by floating-point numbers.

In one embodiment in which a quantum processor is used as the sampling device, an ON coupling between any two qubits is regarded as a weight between the two corresponding nodes of the general Boltzmann machine.

In the same embodiment, each ON coupling has a floating-point strength that approximates the corresponding weight. A non-zero weight between two nodes indicates that the nodes are connected.

Still in the same embodiment, each OFF coupling has an effective strength of zero and indicates that the two corresponding nodes of the general Boltzmann machine are disconnected.
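The ON/OFF convention can be illustrated with a minimal sketch. It assumes the coupler strengths and ON/OFF flags are available as arrays; the function name `effective_weights` and its arguments are hypothetical and do not appear in the disclosure.

```python
import numpy as np

def effective_weights(strengths, coupler_on):
    """Map coupler strengths and ON/OFF flags to Boltzmann machine weights.

    Hypothetical helper: both arguments are (n_nodes, n_nodes) arrays.
    """
    strengths = np.asarray(strengths, dtype=float)
    coupler_on = np.asarray(coupler_on, dtype=bool)
    # An OFF coupler contributes an effective strength of zero.
    weights = np.where(coupler_on, strengths, 0.0)
    # A non-zero weight marks the two nodes as connected.
    connected = weights != 0.0
    return weights, connected

weights, connected = effective_weights(
    [[0.0, 0.7, -0.3], [0.7, 0.0, 0.2], [-0.3, 0.2, 0.0]],
    [[False, True, False], [True, False, True], [False, True, False]],
)
```

In this example the coupler between nodes 0 and 2 is OFF, so its strength of -0.3 is ignored and the two nodes are treated as disconnected.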

Training

According to processing step 56, a current-epoch state-action pair is generated.

It will be appreciated that the current-epoch state-action pair comprises a state and a corresponding action.

In one embodiment, the current-epoch state-action pair is generated randomly using the digital computer 8.

In an alternative embodiment, the current-epoch state-action pair is generated from the environment.

In another alternative embodiment, the current-epoch state-action pair is generated from the policy.

According to processing step 58, the generated current-epoch state-action pair is used to modify data representing none or at least one coupler and at least one bias. It will be appreciated that these data are modified using the digital computer 8.

In the case where the sampling device comprises a quantum processor, if any qubits of the quantum processor represent action nodes, this processing step comprises switching OFF every coupling between any qubit representing an action node and any other qubit. The generated current-epoch state-action pair is then used to update the biases of the qubits corresponding to those hidden nodes of the general Boltzmann machine that are connected to visible nodes.

In the case where the sampling device comprises a simulated quantum annealing application, if any spins of the simulated quantum annealing application represent action nodes, this processing step comprises setting to zero the weight between any spin representing an action node and any other spin. The generated current-epoch state-action pair is then used to update the biases of the spins corresponding to those hidden nodes of the general Boltzmann machine that are connected to visible nodes.

If the current-epoch state-action pair is represented by the vector v = (s, a) on the visible nodes, and the weight connecting state node i to a hidden node j is w_ij, the bias on hidden node j is modified by adding w_ij s_i. If the weight connecting action node k to a hidden node j is w_kj, the bias on hidden node j is modified by adding w_kj a_k.
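The bias modification for a visible vector v = (s, a) can be sketched as follows. This is a minimal illustration under the assumption that the state-to-hidden and action-to-hidden weights are stored as dense matrices with zeros where no coupler exists; the function name `clamp_visible` and the array layout are hypothetical.

```python
import numpy as np

def clamp_visible(hidden_bias, w_state, w_action, s, a):
    """Return hidden-node biases after fixing the visible vector v = (s, a).

    hidden_bias: shape (n_hidden,), the unmodified biases b_j
    w_state:     shape (n_state, n_hidden), weights w_ij (zero if unconnected)
    w_action:    shape (n_action, n_hidden), weights w_kj (zero if unconnected)
    """
    s = np.asarray(s, dtype=float)
    a = np.asarray(a, dtype=float)
    # Each hidden node j receives sum_i w_ij * s_i + sum_k w_kj * a_k.
    return np.asarray(hidden_bias, dtype=float) + s @ w_state + a @ w_action

b = np.array([0.1, -0.2])
w_s = np.array([[0.5, 0.0], [0.0, 1.0]])
w_a = np.array([[0.25, -0.5]])
new_b = clamp_visible(b, w_s, w_a, s=[1, 0], a=[1])
```

Returning a new array rather than updating in place keeps the original biases available for the next state-action pair.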

In the case where the sampling device comprises an optical device, the data representing the weights and biases, modified as described above, are sent to the energy source and the modulator respectively.

In one embodiment in which the sampling device comprises a quantum processor, it will be appreciated that the couplings and biases of the quantum processor are modified using the digital computer 8 and the quantum device control system comprising the qubit control system 24 and the coupling device control system 30.

According to processing step 60, a sampling is performed. It will be appreciated that, in the case where the sampling device comprises a quantum processor or an optical device, the sampling is quantum in nature owing to the properties of these devices.

It will be appreciated that the sampling corresponding to the current-epoch state-action pair is performed to obtain first empirical means.

In the case where the sampling device comprises a quantum processor, the sampling corresponding to the current-epoch state-action pair is performed to obtain first quantum-sampling empirical means corresponding to the qubits of the quantum processor.

In the case where the sampling device comprises an optical device, the sampling corresponding to the current-epoch state-action pair is performed to obtain first empirical means corresponding to the optical parametric oscillators of the optical device.

More precisely, the first empirical means comprise three pluralities of values.

In the case where the sampling device comprises a quantum processor, the first plurality of values are the averages of the states of each qubit corresponding to a hidden node, as measured in the quantum sampling. In the case where the sampling device comprises an optical device, the first plurality of values are the averages of the spin values corresponding to the measured phases of the optical parametric oscillators. In the case where the sampling device comprises a simulated quantum annealing application, the first plurality of values are the averages of the spin values. It will be appreciated by the skilled addressee that, for a hidden node j, this value may be denoted <h_j>_v, where v = (s, a) is the vector representing the visible nodes corresponding to the current-epoch state-action pair.

In the case where the sampling device comprises a quantum processor, the second plurality of values are the averages of the products of the states of each pair of qubits corresponding to a pair of hidden nodes, as measured in the quantum sampling. In the case where the sampling device comprises an optical device, the second plurality of values are the averages of the products of the spin values corresponding to the measured phases of the optical parametric oscillators. In the case where the sampling device comprises a simulated quantum annealing application, the second plurality of values are the averages of the products of the spin values. It will be appreciated by the skilled addressee that, for a pair of hidden nodes j and k, this value may be denoted <h_j h_k>_v.
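The first two pluralities of values can be estimated from a batch of measured hidden-node configurations. The following is a minimal sketch, assuming the samples arrive as a 0/1 array with one row per measurement; the function name `hidden_empirical_means` is hypothetical.

```python
import numpy as np

def hidden_empirical_means(samples):
    """Estimate <h_j>_v and <h_j h_k>_v from hidden-node samples.

    samples: array of shape (n_samples, n_hidden) with entries in {0, 1},
             all drawn with the visible vector v held fixed.
    """
    h = np.asarray(samples, dtype=float)
    mean_h = h.mean(axis=0)              # entry j is <h_j>_v
    mean_hh = h.T @ h / h.shape[0]       # entry (j, k) is <h_j h_k>_v
    return mean_h, mean_hh

samples = [[1, 0], [1, 1], [0, 1], [1, 1]]
mean_h, mean_hh = hidden_empirical_means(samples)
```

For binary nodes the diagonal of `mean_hh` coincides with `mean_h`, since h_j squared equals h_j.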

In the case where the sampling device comprises a quantum processor, the third plurality of values are the frequencies of occurrence, denoted P(h|v), of each configuration of the qubits of the quantum processor, where h is a binary vector representing the states of all qubits measured in each sample of the quantum sampling.

In the case where the sampling device comprises a simulated quantum annealing application for a classical Boltzmann machine, the third plurality of values are the frequencies of occurrence, denoted P(h|v), of each configuration of the spins, where h is a binary vector representing the states of all spins measured in each sample.

In the case where the sampling device comprises an optical device, the third plurality of values are the frequencies of occurrence, denoted P(h|v), of each spin configuration corresponding to the phases of the optical parametric oscillators, where h is a binary vector representing the spin values corresponding to the phase measurements of the optical parametric oscillators in each sample.

In the case where the sampling device comprises a quantum processor that performs sampling from a quantum Hamiltonian representing a quantum Boltzmann machine, the third plurality of values are the frequencies of occurrence, denoted P(c|v), of each sample configuration of the classical effective Hamiltonian representing the quantum Boltzmann machine, where c is a binary vector representing the states of all effective spins.
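The frequency of occurrence of each measured configuration can be tallied directly from the samples. The sketch below assumes each sample is a sequence of 0/1 values; the function name `configuration_frequencies` is hypothetical, and the notation P(h|v) for the resulting frequencies is an assumption made for this illustration.

```python
from collections import Counter

def configuration_frequencies(samples):
    """Return the empirical frequency of each distinct configuration h.

    samples: iterable of binary vectors measured with the visible vector fixed.
    The result maps each configuration (as a tuple) to its relative frequency.
    """
    counts = Counter(tuple(int(x) for x in h) for h in samples)
    total = sum(counts.values())
    return {h: n / total for h, n in counts.items()}

freqs = configuration_frequencies([[1, 0], [1, 1], [1, 0], [0, 1]])
```

Configurations that never occur in the samples are simply absent from the dictionary, which is convenient when the number of possible configurations is large.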

In one embodiment, the quantum Hamiltonian representing the quantum Boltzmann machine is

[equation not reproduced in this text]

with n spins σ₁, ..., σ_n.

In a further embodiment, the classical effective Hamiltonian comprises m replicas of the spins of the quantum Hamiltonian of the quantum Boltzmann machine.

The number m of replicas of the effective classical Hamiltonian corresponding to the quantum Boltzmann machine with a transverse field is provided.

In one embodiment, the number m of replicas of the effective classical Ising model is obtained using the digital computer 8, and more precisely from the memory 22 of the digital computer 8.

In an alternative embodiment, the number m of replicas of the effective classical Ising model is provided to the digital computer 8 by a remote processing unit operatively coupled with the digital computer 8.

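Each spin σ_i is associated with m spins, one for each replica. For i = 1, ..., n and k = 1, ..., m, the bias on each replica spin is set to [value not reproduced in this text]. For 1 ≤ i ≠ j ≤ n, the coupling between every two replica spins is set to [value not reproduced in this text]. For each k = 1, ..., m-1, the coupling between every two corresponding spins is set to [value not reproduced in this text]. Therefore, the effective Hamiltonian in one dimension higher is [equation not reproduced in this text].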

In the case where the sampling device comprises a quantum processor that performs sampling from the quantum Hamiltonian representing the quantum Boltzmann machine, the sample configurations of the classical effective Hamiltonian are constructed by attaching the measured values of the qubits to the effective spins, wherein each measured configuration of the qubits corresponds to one replica in the effective Hamiltonian.
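One way to picture this construction is to stack m measured qubit configurations into a single effective configuration, one measurement per replica. The sketch below is only an illustration: grouping consecutive measurements is an assumption, since the text does not state how measurements are assigned to replicas, and the function name `build_effective_samples` is hypothetical.

```python
import numpy as np

def build_effective_samples(qubit_measurements, m):
    """Group measured qubit configurations into effective configurations.

    qubit_measurements: array of shape (n_measurements, n_qubits)
    m: number of replicas in the effective Hamiltonian
    Returns an array of shape (n_effective, m, n_qubits); leftover
    measurements that do not fill a whole group of m are discarded.
    """
    x = np.asarray(qubit_measurements)
    n_effective = x.shape[0] // m
    # Assumption: consecutive measurements fill replicas 1..m of one sample.
    return x[: n_effective * m].reshape(n_effective, m, x.shape[1])

measurements = [[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 1, 0], [1, 1, 0]]
effective = build_effective_samples(measurements, m=2)
```

With five measurements and m = 2, two effective configurations are produced and the fifth measurement is left unused.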

In the case where the sampling device comprises a simulated quantum annealing application that performs sampling from the quantum Hamiltonian representing the quantum Boltzmann machine, the third plurality of values are the frequencies of occurrence, denoted P(c|v), of each configuration of the effective spins of the effective Hamiltonian, where c is a binary vector representing the states of all effective spins.

Still referring to Figure 2 and according to processing step 62, an approximation of a value of the Q-function is determined.

It will be appreciated that the approximation of the value of the Q-function at the current-epoch state-action pair is determined using the obtained first empirical means.

It will be appreciated that, in the case where the sampling device comprises a quantum processor, the approximation of the value of the Q-function at the current-epoch state-action pair is determined using the obtained first quantum-sampling empirical means.

It will be further appreciated that the approximation of the value of the Q-function is determined using the digital computer 8.

It will be appreciated by the skilled addressee that the value of the Q-function represents the utility of the current-epoch state-action pair.

According to processing step 64, a future-epoch state is obtained. It will be appreciated that the state is obtained through the stochastic state process.

In one embodiment, the future-epoch state is obtained through a stochastic test involving known Markov transition probabilities. In another embodiment, the future-epoch state is obtained through observations from the environment. In yet another embodiment, the future-epoch state is obtained from provided training data.
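For the embodiment that relies on known Markov transition probabilities, obtaining the future-epoch state amounts to drawing from a categorical distribution. The following is a minimal sketch, assuming the probabilities are stored in a three-dimensional array indexed by state, action, and next state; the array layout and the function name `sample_next_state` are hypothetical.

```python
import numpy as np

def sample_next_state(transition, state, action, rng):
    """Draw the future-epoch state from known Markov transition probabilities.

    transition[s, a, s2] is assumed to hold the probability of moving
    to state s2 after taking action a in state s.
    """
    probs = np.asarray(transition)[state, action]
    return int(rng.choice(probs.shape[0], p=probs))

rng = np.random.default_rng(seed=0)
transition = np.zeros((2, 2, 2))
transition[0, 0] = [0.0, 1.0]   # action 0 in state 0 always leads to state 1
transition[0, 1] = [1.0, 0.0]   # action 1 in state 0 always stays in state 0
transition[1, :] = [0.5, 0.5]   # state 1 moves uniformly at random
next_state = sample_next_state(transition, state=0, action=0, rng=rng)
```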

It will be appreciated that the future-epoch state is obtained using the digital computer 8.

In one embodiment, the future-epoch state is obtained using the digital computer 8, and more precisely from the memory 22 of the digital computer 8.

In an alternative embodiment, the future-epoch state is provided to the digital computer 8 by a remote processing unit operatively coupled with the digital computer 8.

According to processing step 66, a future-epoch action is obtained. Obtaining the action comprises performing a stochastic optimization test on the plurality of all state-action pairs comprising the future-epoch state and any possible action, thereby providing the action at the future epoch.

In one embodiment, performing the stochastic optimization test on the plurality of all state-action pairs comprises obtaining a temperature parameter, obtaining the future-epoch state, and sampling from the Boltzmann distribution associated with the approximations of the values of the Q-function, with the state variable fixed at the future-epoch state and at the provided temperature.

In one embodiment, the Boltzmann distribution is sampled over the action nodes. In this embodiment, for the current-epoch state s and each action a_i ∈ A, the corresponding value of the Q-function is approximated and denoted Q_i. An action a_i ∈ A is then sampled from the distribution P(a_i) = exp(Q_i / T) / Σ_j exp(Q_j / T), where T is the temperature parameter. The resulting action is assumed to be the best action for the current-epoch state s.
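The action-selection step can be sketched on a classical computer as a softmax draw over the approximated Q-values. This is an illustration only: it assumes the Boltzmann distribution takes the usual form in which the probability of action a_i is proportional to exp(Q_i / T), and the function names are hypothetical.

```python
import numpy as np

def boltzmann_action_probabilities(q_values, temperature):
    """Return probabilities proportional to exp(Q_i / T) (assumed form)."""
    q = np.asarray(q_values, dtype=float)
    # Subtracting the maximum avoids overflow and leaves the result unchanged.
    z = (q - q.max()) / temperature
    p = np.exp(z)
    return p / p.sum()

def sample_action(q_values, temperature, rng):
    """Draw an action index according to the Boltzmann probabilities."""
    p = boltzmann_action_probabilities(q_values, temperature)
    return int(rng.choice(len(p), p=p))

rng = np.random.default_rng(0)
probs = boltzmann_action_probabilities([1.0, 2.0, 3.0], temperature=1.0)
action = sample_action([1.0, 2.0, 3.0], 1.0, rng)
```

A high temperature flattens the distribution and favours exploration, while a low temperature concentrates the probability on the action with the largest approximated Q-value.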

In another embodiment in which the sampling device comprises a quantum processor, wherein the first group of qubits represents the set of actions of the stochastic control problem and the second group of qubits represents the hidden nodes of the corresponding general Boltzmann machine, the update of the policy for the current-epoch state may be performed by quantum sampling. In one embodiment, performing the stochastic optimization test on the plurality of all state-action pairs comprising the future-epoch state and any possible action comprises: switching ON all couplers that provide a communicative coupling at a crossing between a qubit of the first group and a qubit of the second group; modifying at least one bias of the second group of qubits using the future-epoch state of the future-epoch state-action pair; performing quantum sampling to obtain empirical means corresponding to the first group of qubits; and updating the policy for the future-epoch state, using the digital computer 8, by assigning to the future-epoch state an action according to the distribution of the obtained empirical means corresponding to the first group of qubits.

Still referring to Figure 2 and according to processing step 68, the policy for the future-epoch state is updated using the action obtained in processing step 66.

According to processing step 70, the generated future-epoch state-action pair is used to modify data representing none or at least one coupler and at least one bias. It will be appreciated that these data are modified using the digital computer 8.

In the case where the sampling device comprises a quantum processor, if any qubits of the quantum processor represent action nodes, this processing step comprises switching OFF every coupling between any qubit representing an action node and any other qubit. The generated future-epoch state-action pair is then used to update the biases of the qubits corresponding to those hidden nodes of the general Boltzmann machine that are connected to visible nodes.

In the case where the sampling device comprises a simulated quantum annealing application, if any spins of the simulated quantum annealing application represent action nodes, this processing step comprises setting to zero the weight between any spin representing an action node and any other spin. The generated future-epoch state-action pair is then used to update the biases of the spins corresponding to those hidden nodes of the general Boltzmann machine that are connected to visible nodes.

If the future-epoch state-action pair is represented by the vector v = (s, a) on the visible nodes, and the weight connecting state node i to a hidden node j is w_ij, the bias on hidden node j is modified by adding w_ij s_i. If the weight connecting action node k to a hidden node j is w_kj, the bias on hidden node j is modified by adding w_kj a_k.

In the case where the sampling device comprises an optical device, the data representing the coupling weights and biases, modified as described above, are sent to the energy source and the modulator respectively.

According to processing step 72, a sampling is performed. In the case where the sampling device comprises a quantum processor or an optical device, it will be appreciated that the sampling is quantum in nature owing to the properties of these devices. It will be appreciated that the sampling corresponding to the future-epoch state-action pair is performed to obtain second empirical means.

In the case where the sampling device comprises a quantum processor, the sampling corresponding to the future-epoch state-action pair is performed to obtain second quantum-sampling empirical means corresponding to the qubits of the quantum processor.

In the case where the sampling device comprises an optical device, the sampling corresponding to the future-epoch state-action pair is performed to obtain second empirical means corresponding to the optical parametric oscillators of the optical device.

More precisely, the second empirical means comprise three pluralities of values.

In the case where the sampling device comprises a quantum processor, the second plurality of values are the averages of the products of the states of each pair of qubits corresponding to a pair of hidden nodes, as measured in the quantum sampling. In the case where the sampling device comprises an optical device, the second plurality of values are the averages of the products of the spin values corresponding to the measured phases of the optical parametric oscillators. In the case where the sampling device comprises a simulated quantum annealing application, the second plurality of values are the averages of the products of the spin values. It will be appreciated by the skilled addressee that, for a pair of hidden nodes j and k, this value may be denoted <h_j h_k>_v.

In the case where the sampling device comprises a quantum processor, the third plurality of values are the frequencies of occurrence, denoted P(h|v), of each configuration of the qubits of the quantum processor, where h is a binary vector representing the states of all qubits measured in each sample of the quantum sampling.

In the case where the sampling device comprises a simulated quantum annealing application for a classical Boltzmann machine, the third plurality of values are the frequencies of occurrence, denoted P(h|v), of each configuration of the spins, where h is a binary vector representing the states of all spins measured in each sample of the sampling.

In the case where the sampling device comprises an optical device, the third plurality of values are the frequencies of occurrence, denoted P(h|v), of each configuration of the spins corresponding to the phases of the optical parametric oscillators, where h is a binary vector representing the spin values corresponding to the phase measurements of the optical parametric oscillators in each sample of the sampling.

In the case where the sampling device comprises a quantum processor in an embodiment of a quantum Boltzmann machine, the third plurality of values are the frequencies of occurrence, denoted P(c|v), of each sample configuration of the classical effective Hamiltonian representing the quantum Boltzmann machine, where c is a binary vector representing the states of all effective spins.

It will be further appreciated that, in the case where the sampling device comprises a quantum processor in an embodiment of a quantum Boltzmann machine, the sample configurations of the effective Hamiltonian are constructed by attaching the measured values of the qubits to the effective spins, wherein each measured configuration of the qubits corresponds to one replica in the effective Hamiltonian.

In the case where the sampling device comprises a simulated quantum annealing application for a quantum Boltzmann machine, the third plurality of values are the frequencies of occurrence of each configuration of the effective spins of the effective Hamiltonian.

Still referring to Figure 2 and according to processing step 74, a new approximation of the value of the Q-function is determined. It will be appreciated that the new approximation of the value of the Q-function at the future-epoch state-action pair is determined using the obtained second empirical means. It will be appreciated that the Q-function represents the utility of the future-epoch state-action pair. In the case where the sampling device comprises a quantum processor, the approximation of the value of the Q-function at the future-epoch state-action pair is determined using the obtained second quantum-sampling empirical means corresponding to the qubits of the quantum processor.

It will be appreciated that the approximation of the value of the Q-function is determined using the digital computer 8.

In one embodiment, the approximation of the value of the Q-function is determined using a remote processing unit operatively connected to the digital computer 8.

It will be appreciated that, in one embodiment and in the case where the sampling device comprises a quantum processor, obtaining the approximation of the value of the Q-function at the current and future epochs comprises obtaining from the sampling device a plurality of sample configurations of the Boltzmann machine along a measurement axis, constructing from the obtained sample configurations a plurality of sample configurations of the effective Hamiltonian of the quantum Boltzmann machine described above, and calculating, using the digital computer 8, an empirical approximation of the negative free energy of the quantum Boltzmann machine, given by [equation not reproduced in this text].

It will be appreciated that, in one embodiment and in the case where the sampling device comprises simulated quantum annealing for a quantum Boltzmann machine, obtaining the approximation of the value of the Q-function at the current and future epochs comprises obtaining from the sampling device a plurality of sample configurations of the effective Hamiltonian representing the quantum Boltzmann machine described above, and calculating, using the digital computer, an empirical approximation of the negative free energy of the quantum Boltzmann machine, given by [equation not reproduced in this text].

It will be appreciated that, in another embodiment and in the case where the sampling device comprises a quantum processor, an optical device, or simulated quantum annealing, obtaining the approximation of the value of the Q-function at the current and future epochs comprises obtaining from the sampling device a plurality of sample configurations of a classical Boltzmann machine along a measurement axis, and calculating, using the digital computer 8, an empirical approximation of the negative free energy of the classical Boltzmann machine, given by [equation not reproduced in this text].
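Because the free-energy expression itself is not reproduced here, the following sketch uses the textbook identity F = <E> - S / β as a stand-in for the classical case, where S is the entropy of the empirical distribution over hidden configurations. The energy sign convention, the use of hidden values in {0, 1}, the parameter `beta`, and the function name `negative_free_energy` are all assumptions of this illustration rather than the formula of the disclosure.

```python
import math
from collections import Counter

import numpy as np

def negative_free_energy(samples, clamped_bias, hidden_weights, beta=1.0):
    """Estimate -F = -<E> + S / beta from hidden-node samples (assumed form).

    samples:        (n_samples, n_hidden) array of 0/1 hidden configurations
    clamped_bias:   hidden biases after the visible vector has been added in
    hidden_weights: symmetric (n_hidden, n_hidden) hidden-to-hidden weights
    Assumed energy: E(h) = -sum_{k<j} u_kj h_k h_j - sum_k b_k h_k
    """
    h = np.asarray(samples, dtype=float)
    b = np.asarray(clamped_bias, dtype=float)
    # Keep the strict upper triangle so each pair of nodes is counted once.
    u = np.triu(np.asarray(hidden_weights, dtype=float), k=1)
    energies = -(np.einsum("nk,kj,nj->n", h, u, h) + h @ b)
    # Entropy of the empirical distribution over observed configurations.
    counts = Counter(tuple(int(x) for x in row) for row in h)
    n = h.shape[0]
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return -energies.mean() + entropy / beta

q_estimate = negative_free_energy([[1], [0]], [0.0], [[0.0]])
```

For a single unbiased hidden node sampled evenly in both states, the estimate equals log 2, which matches the exact value of the log partition function in that case.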

It will be appreciated that, in another embodiment, obtaining the approximation of the value of the Q-function at the current and future epochs comprises obtaining from the sampling device an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation corresponding to the Boltzmann machine, and calculating, using the digital computer 8, an empirical approximation of the negative free energy of the Boltzmann machine from the approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation. The negative free energy is given by [equation not reproduced in this text].

Here the constant ρ depends on the weights and biases of the Boltzmann machine in the case of a classical Boltzmann machine, and on the weights, the biases, and the transverse field strength of the Boltzmann machine in the case of a quantum Boltzmann machine. The exponent #c denotes the number of free clusters in the Fortuin-Kasteleyn random cluster representation.

Still referring to Figure 2 and according to processing step 76, each weight and each bias of respectively each coupler and each node of the Boltzmann machine is updated using the generated approximation of the value of the Q-function at the current-epoch state-action pair, the first empirical means, and the corresponding reward at the current-epoch state-action pair obtained using the reward structure. In the case where the sampling device comprises a quantum processor, each weight and each bias of the quantum processor is updated.

More precisely, each weight and each bias of respectively each coupler and each qubit of the quantum processor is updated using the generated approximation of the value of the Q-function at the current-epoch state-action pair, the first empirical means, and the corresponding reward at the current-epoch state-action pair obtained using the reward structure.

If r denotes the value of the reward for the current-epoch state-action pair, the weight connecting visible node i to hidden node k is updated by the following formula

Δw_ik = ε_n (r + γQ₂ - Q₁) v_i <h_k>_v.

The weight connecting hidden node k to hidden node j is updated by the following formula

Δu_kj = ε_n (r + γQ₂ - Q₁) <h_k h_j>_v.

The bias on hidden node k is updated by the following formula

Δb_k = ε_n (r + γQ₂ - Q₁) <h_k>_v.

Here Q₁ is the approximation of the Q-function at the current-epoch state-action pair, Q₂ is the approximation of the Q-function at the future-epoch state-action pair, and γ is the discount factor.
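The three update formulas share the same scalar factor and can be computed together. The sketch below reads the coefficient in front of the bracket as a step size `eps` and γ as the discount factor; both readings, along with the function name `td_updates`, are assumptions of this illustration.

```python
import numpy as np

def td_updates(eps, r, gamma, q1, q2, v, mean_h, mean_hh):
    """Compute the weight and bias increments from the update formulas.

    eps:     step size (assumed meaning of the leading coefficient)
    r:       reward at the current-epoch state-action pair
    q1, q2:  Q-function approximations at the current and future epochs
    v:       visible vector (s, a) of the current-epoch state-action pair
    mean_h:  vector of <h_k>_v;  mean_hh: matrix of <h_k h_j>_v
    """
    factor = eps * (r + gamma * q2 - q1)
    delta_w = factor * np.outer(v, mean_h)     # visible-to-hidden weights
    delta_u = factor * np.asarray(mean_hh)     # hidden-to-hidden weights
    delta_b = factor * np.asarray(mean_h)      # hidden biases
    return delta_w, delta_u, delta_b

dw, du, db = td_updates(
    eps=0.1, r=1.0, gamma=0.9, q1=2.0, q2=3.0,
    v=[1.0, 0.0], mean_h=[0.5, 1.0], mean_hh=[[0.5, 0.4], [0.4, 1.0]],
)
```

Here the bracketed term is 1 + 0.9 × 3 - 2 = 1.7, so every increment is the corresponding empirical mean scaled by 0.17.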

According to the same processing step, the bias on any qubit of the quantum processor is updated by the update amount on the hidden node that the qubit represents.

According to the same processing step, the weight of any coupler of the quantum processor is updated by the update amount of the weight u_kj or w_ik that the coupler represents.

In one embodiment, each weight and each bias of the quantum processor is updated using the digital computer 8.

Still referring to Figure 2, and according to processing step 78, a test is performed to determine whether a stopping criterion is met. A person skilled in the art will appreciate that the stopping criterion may be of various types.

It should be appreciated that, in one embodiment, the stopping criterion may include reaching a maximum number of training steps.

It should be appreciated that, in an alternative embodiment, the stopping criterion may include reaching a maximum runtime.

It should be appreciated that, in an alternative embodiment, the stopping criterion may include convergence of a function of the weights and biases of the couplers and local fields.

It should be appreciated that, in an alternative embodiment, the stopping criterion may include convergence of the strategy to a fixed strategy.

In an alternative embodiment, the test includes at least one stopping criterion.
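The four kinds of stopping criterion listed above can be combined into a single test. The following Python sketch is illustrative only: the function name `should_stop`, its parameters, and the default thresholds are assumptions, and "convergence" is reduced here to a caller-supplied norm of the latest weight and bias change and a caller-supplied flag indicating that the strategy has stopped changing.

```python
def should_stop(step, elapsed_seconds, delta_norm, strategy_stable,
                max_steps=10_000, max_seconds=3600.0, tol=1e-6):
    """Return True when at least one stopping criterion is met.

    step            : number of training steps performed so far
    elapsed_seconds : runtime so far
    delta_norm      : size of the latest weight/bias update (assumed proxy
                      for convergence of a function of weights and biases)
    strategy_stable : True if the strategy is judged to have converged
    """
    if step >= max_steps:              # maximum number of training steps
        return True
    if elapsed_seconds >= max_seconds: # maximum runtime
        return True
    if delta_norm < tol:               # weights and biases have converged
        return True
    return bool(strategy_stable)       # strategy converged to a fixed strategy
```

How `strategy_stable` is decided (for example, no change over a number of consecutive steps) is left to the caller in this sketch.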

In the case where the at least one stopping criterion is not met, and according to processing step 56, a current-period state-action pair is generated from provided training data or from the environment.

In the case where the at least one stopping criterion is met, the strategy is provided according to processing step 80.

It should be appreciated that the strategy may be provided according to various embodiments. In fact, it will be appreciated that the best-known strategy is provided using the digital computer 8.

In one embodiment, the strategy is stored in the digital computer 8, and more precisely in the memory 22 of the digital computer 8.

In an alternative embodiment, the strategy is displayed, via the display device 14, to a user interacting with the digital computer 8.

In another alternative embodiment, the strategy is transmitted to a remote processing unit operatively coupled with the digital computer 8.

It should be appreciated that a non-transitory computer-readable storage medium for storing computer-executable instructions is further disclosed. The computer-executable instructions, when executed, cause a digital computer to perform a method for improving a strategy of a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision periods, wherein the evolution of an underlying stochastic state process depends on a plurality of actions in the strategy. The method comprises: using a sampling device coupled to the digital computer and to a sampling device control system, the sampling device obtaining data representing a sample configuration of a Boltzmann machine, the Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases (each bias corresponding to a node of the plurality of nodes), a plurality of coupling weights (each coupling weight corresponding to a coupler of the plurality of couplers), and a transverse field strength; obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, the reward structure of the stochastic control problem, and an initial strategy for the stochastic control problem, the strategy comprising a selection of at least one action for each state; assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and bias of each coupler and each node of the Boltzmann machine, respectively, and the transverse field strength; until a stopping criterion is met: generating a current-period state-action pair using the digital computer; modifying, using the digital computer and the sampling device control system, data representing no coupler or at least one coupler, and at least one bias, using the generated current-period state-action pair; performing a sampling corresponding to the current-period state-action pair to obtain a first sampling empirical mean; obtaining, using the digital computer and the first sampling empirical mean, an approximation of the value of the Q function at the current-period state-action pair, the value of the Q function representing the utility of the current-period state-action pair; obtaining a future-period state-action pair using the digital computer, wherein the state is obtained through a stochastic state process, and further wherein obtaining the action comprises performing a stochastic optimization test on a plurality of all state-action pairs comprising the future-period state and any possible action, thereby providing the action at the future period and updating the strategy for the future-period state; modifying, using the digital computer and the sampling device control system, data representing no coupler or at least one coupler, and at least one bias, using the generated future-period state-action pair; performing a sampling corresponding to the future-period state-action pair to obtain a second sampling empirical mean; obtaining, using the digital computer and the second sampling empirical mean, an approximation of the value of the Q function at the future-period state-action pair, the value of the Q function representing the utility of the future-period state-action pair; updating, using the digital computer, each weight and each bias of each coupler and each node of the Boltzmann machine, respectively, using the approximation of the value of the Q function generated at the current-period state-action pair, the first sampling empirical mean, and the corresponding reward for the current-period state-action pair obtained using the reward structure; and providing the strategy using the digital computer when the stopping criterion is met.

It should be appreciated that, in one embodiment, the application for improving a strategy of a stochastic control problem, contained in the memory unit 22, comprises instructions for using a sampling device coupled to the digital computer and to a sampling device control system, the sampling device obtaining data representing a sample configuration of a Boltzmann machine, the Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases (each bias corresponding to a node of the plurality of nodes), a plurality of coupling weights (each coupling weight corresponding to a coupler of the plurality of couplers), and a transverse field strength. The application further comprises instructions for obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, the reward structure of the stochastic control problem, and an initial strategy for the stochastic control problem, the strategy comprising a selection of at least one action for each state. The application further comprises instructions for assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and bias of each coupler and each node of the Boltzmann machine, respectively, and the transverse field strength. The application further comprises instructions for performing the following until a stopping criterion is met: generating a current-period state-action pair using the digital computer; modifying, using the digital computer and the sampling device control system, data representing no coupler or at least one coupler, and at least one bias, using the generated current-period state-action pair; performing a sampling corresponding to the current-period state-action pair to obtain a first sampling empirical mean; obtaining, using the digital computer and the first sampling empirical mean, an approximation of the value of the Q function at the current-period state-action pair, the value of the Q function representing the utility of the current-period state-action pair; obtaining a future-period state-action pair using the digital computer, wherein the state is obtained through a stochastic state process, and further wherein obtaining the action comprises performing a stochastic optimization test on a plurality of all state-action pairs comprising the future-period state and any possible action, thereby providing the action at the future period and updating the strategy for the future-period state; modifying, using the digital computer and the sampling device control system, data representing no coupler or at least one coupler, and at least one bias, using the generated future-period state-action pair; performing a sampling corresponding to the future-period state-action pair to obtain a second sampling empirical mean; obtaining, using the digital computer and the second sampling empirical mean, an approximation of the value of the Q function at the future-period state-action pair, the value of the Q function representing the utility of the future-period state-action pair; and updating, using the digital computer, each weight and each bias of each coupler and each node of the Boltzmann machine, respectively, using the approximation of the value of the Q function generated at the current-period state-action pair, the first empirical mean, and the corresponding reward for the current-period state-action pair obtained using the reward structure. The application further comprises instructions for providing the strategy using the digital computer when the stopping criterion is met.

It should be appreciated that an advantage of the method disclosed herein is that it uses quantum sampling to calculate the empirical means of the action nodes and the hidden nodes, the components involved in approximating the Q function, the components involved in updating the weights between qubits, and the components involved in updating their biases, thereby providing a faster Q-learning method.

It will be further appreciated that another advantage of the method disclosed herein is that it overcomes the dimensionality limitation experienced by traditional solutions for Markov decision processes.

Although the above description relates to specific embodiments as presently contemplated by the inventors, it will be understood that the invention in its broad aspect includes functional equivalents of the elements described herein.

Item 1. A method for improving a strategy of a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision periods, wherein the evolution of an underlying stochastic state process depends on a plurality of actions in the strategy, the method comprising:

using a sampling device coupled to a digital computer and to a sampling device control system, the sampling device obtaining data representing a sample configuration of a Boltzmann machine, the Boltzmann machine comprising:

a plurality of nodes,

a plurality of couplers,

a plurality of biases, each bias corresponding to a node of the plurality of nodes,

a plurality of coupling weights, each coupling weight corresponding to a coupler of the plurality of couplers, and

a transverse field strength;

obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, the reward structure of the stochastic control problem, and an initial strategy of the stochastic control problem, the strategy comprising a selection of at least one action for each state;

assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and bias of each coupler and each node of the Boltzmann machine, respectively, and the transverse field strength;

performing the following until a stopping criterion is met:

generating a current-period state-action pair using the digital computer,

modifying, using the digital computer and the sampling device control system, data representing no coupler or at least one coupler, and at least one bias, using the generated current-period state-action pair,

performing a sampling corresponding to the current-period state-action pair to obtain a first sampling empirical mean,

obtaining, using the digital computer and the first sampling empirical mean, an approximation of the value of the Q function at the current-period state-action pair, the value of the Q function representing the utility of the current-period state-action pair,

obtaining a future-period state-action pair using the digital computer, wherein the state is obtained through a stochastic state process, and further wherein obtaining the action comprises performing a stochastic optimization test on a plurality of all state-action pairs comprising the future-period state and any possible action, thereby providing the action at the future period and updating the strategy for the future-period state,

modifying, using the digital computer and the sampling device control system, data representing no coupler or at least one coupler, and at least one bias, using the generated future-period state-action pair,

performing a sampling corresponding to the future-period state-action pair to obtain a second sampling empirical mean,

obtaining, using the digital computer and the second sampling empirical mean, an approximation of the value of the Q function at the future-period state-action pair, the value of the Q function representing the utility of the future-period state-action pair, and

updating, using the digital computer, each weight and each bias of each coupler and each node of the Boltzmann machine, respectively, using the approximation of the value of the Q function generated at the current-period state-action pair, the first sampling empirical mean, and the corresponding reward at the current-period state-action pair obtained using the reward structure; and

providing the strategy using the digital computer when the stopping criterion is met.
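The order of the steps in item 1 can be sketched as a short Python loop. This is a classical stand-in rather than the disclosed method: the sampling device and Boltzmann machine are replaced by a caller-supplied function `q_approx`, the stochastic optimization test is replaced by sampling from a Boltzmann distribution over the approximate Q values, and the only stopping criterion is a maximum number of steps. All names (`improve_strategy`, `step`, `reward`, `q_approx`, `update`) are hypothetical.

```python
import numpy as np

def improve_strategy(n_states, n_actions, step, reward, q_approx, update,
                     max_steps=1000, temperature=1.0, seed=0):
    """Sketch of the training loop with the sampler abstracted away.

    step(s, a, rng) -> next state (stands in for the stochastic state process)
    reward(s, a)    -> reward from the reward structure
    q_approx(s, a)  -> approximate Q value (stands in for sampling)
    update(s, a, r, q1, q2) -> applies the weight and bias updates
    """
    rng = np.random.default_rng(seed)
    strategy = np.zeros(n_states, dtype=int)      # initial strategy
    s, a = 0, int(strategy[0])                    # current-period pair
    for _ in range(max_steps):                    # stopping criterion
        q1 = q_approx(s, a)                       # Q at the current pair
        s2 = step(s, a, rng)                      # future-period state
        qs = np.array([q_approx(s2, b) for b in range(n_actions)])
        p = np.exp((qs - qs.max()) / temperature) # unnormalized Boltzmann weights
        a2 = int(rng.choice(n_actions, p=p / p.sum()))
        strategy[s2] = a2                         # update strategy at s2
        update(s, a, reward(s, a), q1, qs[a2])    # weight and bias update
        s, a = s2, a2                             # advance to the next period
    return strategy
```

A caller could, for example, back `q_approx` and `update` with a plain lookup table to check the control flow before substituting a sampler.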

Item 2. The method according to item 1, wherein the sampling device comprises a quantum processor, and wherein the sampling device control system comprises a quantum device control system; further wherein the quantum processor is coupled to the digital computer and to the quantum device control system; further wherein the quantum processor comprises a plurality of qubits and a plurality of couplers, each coupler providing a communicative coupling at a crossing of two qubits.

Item 3. The method according to item 1, wherein the sampling device comprises an optical device configured to receive energy from an optical energy source and to generate a plurality of optical parametric oscillators, and a plurality of coupling devices, each of the plurality of coupling devices controllably coupling an optical parametric oscillator of the plurality of optical parametric oscillators.

Item 4. The method according to item 1, wherein the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a classical Boltzmann machine characterized by a zero-valued transverse field strength; further wherein the memory unit comprises an application for obtaining data representing each weight and each bias of each coupler and each node of the classical Boltzmann machine, respectively; further wherein the application is adapted to perform a simulated quantum annealing of the classical Boltzmann machine.

Item 5. The method according to item 1, wherein the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a quantum Boltzmann machine characterized by a non-zero-valued transverse field strength; further wherein the memory unit comprises an application for obtaining data representing each weight and each bias of each coupler and each node of the quantum Boltzmann machine, respectively; further wherein the application is adapted to perform a simulated quantum annealing of the quantum Boltzmann machine.

Item 6. The method according to item 5, wherein performing the simulated quantum annealing of the quantum Boltzmann machine provides a plurality of sample configurations representing an effective Hamiltonian of the quantum Boltzmann machine.

Item 7. The method according to item 1, wherein the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a classical Boltzmann machine characterized by a zero-valued transverse field strength; further wherein the memory unit comprises an application for obtaining data representing each weight and each bias of each coupler and each node of the classical Boltzmann machine, respectively; further wherein the application is adapted to sample a plurality of instances of a Fortuin-Kasteleyn random cluster representation corresponding to the classical Boltzmann machine, to provide an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation.

Item 8. The method according to item 1, wherein the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a quantum Boltzmann machine characterized by a non-zero-valued transverse field strength; further wherein the memory unit comprises an application for obtaining data representing each weight and each bias of each coupler and each node of the quantum Boltzmann machine, respectively; further wherein the application is adapted to sample a plurality of instances of a Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, to provide an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation.

Item 9. The method according to any one of items 2, 3, 4 and 5, wherein obtaining the approximation of the value of the Q function at both the current period and the future period comprises: obtaining, from the sampling device, a plurality of samples of configurations of the Boltzmann machine along a measurement axis, and calculating, using the digital computer, an empirical approximation of the free energy of the Boltzmann machine.
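One common way to form an empirical approximation of a free energy from sampled configurations is F ≈ ⟨E⟩ + (1/β) Σ_c p(c) log p(c), where p(c) is the observed frequency of configuration c. The Python sketch below assumes a classical Boltzmann machine whose energy can be evaluated for each sampled configuration; the function name, the `energy` callable, and the inverse temperature `beta` are hypothetical, and how the resulting free energy is mapped to a Q value is outside the sketch.

```python
from collections import Counter
import math

def empirical_free_energy(samples, energy, beta=1.0):
    """Estimate the free energy from a list of sampled configurations.

    samples : sequence of configurations (each an iterable of spin values)
    energy  : callable returning the energy of one configuration (assumed)
    beta    : inverse temperature (assumed parameter)
    """
    n = len(samples)
    counts = Counter(tuple(s) for s in samples)         # empirical frequencies
    mean_energy = sum(energy(s) for s in samples) / n   # <E> over the samples
    # Sum of p log p, i.e. the negative of the empirical entropy.
    neg_entropy = sum((c / n) * math.log(c / n) for c in counts.values())
    return mean_energy + neg_entropy / beta
```

With few samples the entropy term is biased, so this estimator is only a rough approximation for large machines.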

Item 10. The method according to any one of items 2 and 5, wherein obtaining the approximation of the value of the Q function at both the current period and the future period comprises: obtaining, from the sampling device, a plurality of sample configurations of the Boltzmann machine along a measurement axis, constructing, from the obtained sample configurations, samples of a plurality of configurations representing an effective Hamiltonian of the quantum Boltzmann machine, and calculating, using the digital computer, an empirical approximation of the free energy of the quantum Boltzmann machine.

Item 11. The method according to item 6, wherein obtaining the approximation of the value of the Q function at both the current period and the future period comprises: obtaining, from the sampling device, samples of a plurality of configurations representing the effective Hamiltonian of the quantum Boltzmann machine, and calculating, using the digital computer, an empirical approximation of the free energy of the quantum Boltzmann machine.

Item 12. The method according to item 8, wherein obtaining the approximation of the value of the Q function at both the current period and the future period comprises: obtaining, from the sampling device, the approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, and calculating, using the digital computer, an empirical approximation of the free energy of the quantum Boltzmann machine.

Item 13. The method according to any one of items 2, 3, 4 and 5, wherein calculating both the first empirical mean and the second empirical mean corresponding to the nodes comprises: obtaining, from the sampling device, samples of a plurality of configurations of one of the quantum and the classical Boltzmann machine along a measurement axis, and calculating, using the digital computer, an approximation of the empirical means of the nodes.

Item 14. The method according to item 6, wherein calculating both the first empirical mean and the second empirical mean corresponding to the nodes comprises: obtaining, from the sampling device, samples of a plurality of configurations of the effective Hamiltonian of the Boltzmann machine, and calculating, using the digital computer, an approximation of the empirical means of the nodes.

Item 15. The method according to item 1, wherein performing the stochastic optimization test on the plurality of all state-action pairs comprises:

modifying, using the digital computer and the sampling device control system, data representing no coupler or at least one coupler, and at least one bias, using each state-action pair corresponding to the future-period state,

performing a sampling corresponding to each state-action pair corresponding to the future-period state to provide empirical means, and obtaining, using the digital computer, an approximation of the value of the Q function for each state-action pair corresponding to the future-period state,

updating the strategy for the future-period state by sampling, using the digital computer, from a distribution corresponding to the values of all the approximated Q functions for the state-action pairs corresponding to the future-period state.

Item 16. The method according to item 1, wherein performing the stochastic optimization test on the plurality of all state-action pairs comprises:

obtaining a temperature parameter;

obtaining the future-period state;

sampling from a Boltzmann distribution associated with the approximation of the value of the Q function, with the state variable fixed at the future-period state and at the provided temperature.
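Sampling an action from a Boltzmann distribution over approximate Q values at a fixed state can be sketched in Python as follows. The sketch assumes that the approximate Q values for every action at the future-period state are already available as an array, and that higher Q values should be more probable; the function names and the random generator argument are hypothetical.

```python
import numpy as np

def boltzmann_action_probs(q_values, temperature):
    """Probabilities proportional to exp(Q / T) for a fixed state."""
    q = np.asarray(q_values, dtype=float)
    z = (q - q.max()) / temperature   # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def sample_action(q_values, temperature, rng):
    """Draw one action index from the Boltzmann distribution."""
    p = boltzmann_action_probs(q_values, temperature)
    return int(rng.choice(len(p), p=p))
```

A lower temperature concentrates the distribution on the highest-valued action, while a higher temperature makes the choice closer to uniform.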

Item 17. The method according to item 2, wherein the plurality of qubits of the quantum processor comprises:

a first group of qubits; and

a second group of qubits;

and wherein the plurality of couplers of the quantum processor comprises:

at least one coupler, each of the at least one coupler providing a communicative coupling at a crossing between a qubit of the first group of qubits and at least one qubit of the second group of qubits, and

a plurality of couplers, each of the plurality of couplers providing a communicative coupling at a crossing between a qubit of the second group of qubits and another qubit of the second group of qubits.

Item 18. The method according to item 17, wherein the first group of qubits represents the set of actions of the stochastic control problem.

Item 19. The method according to item 17, wherein modifying the data representing no coupler or at least one coupler, and at least one bias, using the generated current-period state-action pair comprises:

switching to OFF all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and the second group of qubits, and

modifying at least one bias of the second group of qubits using the generated current-period state-action pair.

Item 20. The method according to item 17, wherein modifying the data representing no coupler or at least one coupler, and at least one bias, using the generated future-period state-action pair comprises:

modifying at least one bias of the second group of qubits using the generated future-period state-action pair.

Item 21. The method according to item 17, wherein performing the stochastic optimization test on the plurality of all state-action pairs comprising the future-period state and any possible action comprises:

switching to ON all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and a qubit of the second group of qubits;

modifying at least one bias of the second group of qubits using the future-period state corresponding to the future-period state-action pair;

performing a quantum sampling to obtain empirical means corresponding to the first group of qubits; and

updating, using the digital computer, the strategy for the future-period state by assigning an action to the future-period state according to a distribution of the obtained empirical means corresponding to the first group of qubits.

Item 22. The method according to any one of items 1 to 21, wherein the stopping criterion comprises reaching a maximum number of training steps.

Item 23. The method according to any one of items 1 to 21, wherein the stopping criterion comprises reaching a maximum runtime.

Item 24. The method according to any one of items 1 to 21, wherein the stopping criterion comprises convergence of a function of the weights and biases of the couplers and local fields.

Item 25. The method according to any one of items 1 to 21, wherein the stopping criterion comprises convergence of the strategy to a fixed strategy.

Item 26. The method according to any one of items 1 to 25, wherein providing the strategy comprises at least one of: displaying the strategy to a user interacting with the digital computer; storing the strategy in the digital computer; and transmitting the strategy to another processing unit operatively connected to the digital computer.

Item 27. The method according to any one of items 1 to 26, wherein the digital computer comprises a memory unit; further wherein the initialization data is obtained from the memory unit of the digital computer.

Item 28. The method according to any one of items 1 to 26, wherein the initialization data is obtained from one of a user interacting with the digital computer and a remote processing unit operatively connected with the digital computer.

Item 29. A digital computer, comprising:

a central processing unit;

a display device;

a communication port for operatively connecting the digital computer to a sampling device, the sampling device being coupled to the digital computer and to a sampling device control system;

a memory unit comprising an application for improving a strategy of a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision periods, wherein the evolution of an underlying stochastic state process depends on a plurality of actions in the strategy, the application comprising:

instructions for using the sampling device coupled to the digital computer and to the sampling device control system, the sampling device obtaining data representing a sample configuration of a Boltzmann machine, the Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases, a plurality of coupling weights, and a transverse field strength, each bias corresponding to a node of the plurality of nodes and each coupling weight corresponding to a coupler of the plurality of couplers;

instructions for obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, the reward structure of the stochastic control problem, and an initial strategy of the stochastic control problem, the strategy comprising a selection of at least one action for each state;

instructions for assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and bias of each coupler and each node of the Boltzmann machine, respectively, and the transverse field strength;

instructions for performing the following until a stopping criterion is met:

generating a current-period state-action pair using the digital computer,

modifying, using the digital computer and the sampling device control system, data representing no coupler or at least one coupler, and at least one bias, using the generated current-period state-action pair,

obtaining a future-period state-action pair using the digital computer, wherein the state is obtained through a stochastic state process, and further wherein obtaining the action comprises performing a stochastic optimization test on a plurality of all state-action pairs comprising the future-period state and any possible action, thereby providing the action at the future period and updating the strategy for the future-period state,

obtaining, using the digital computer and the second sampling empirical mean, an approximation of the value of the Q function at the future-period state-action pair, the value of the Q function representing the utility of the future-period state-action pair, and

instructions for providing the strategy using the digital computer when the stopping criterion is met.

Item 30. A non-transitory computer-readable storage medium for storing computer-executable instructions which, when executed, cause a digital computer to perform a method for improving a strategy of a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision periods, wherein the evolution of an underlying stochastic state process depends on a plurality of actions in the strategy, the method comprising:

using a sampling device coupled to the digital computer and to a sampling device control system, the sampling device obtaining data representing a sample configuration of a Boltzmann machine, the Boltzmann machine comprising:

a plurality of nodes,

a plurality of couplers,

a plurality of biases, each bias corresponding to a node of the plurality of nodes,

a transverse field strength;

assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and bias of each coupler and each node of the Boltzmann machine, respectively, and the transverse field strength;

performing the following until a stopping criterion is met:

generating a current-period state-action pair using the digital computer,

obtaining, using the digital computer and the second sampling empirical mean, an approximation of the value of the Q function at the future-period state-action pair, the value of the Q function representing the utility of the future-period state-action pair,

providing the strategy using the digital computer when the stopping criterion is met.

Claims

1. a kind of method for improving the strategy of Stochastic Control Problem, the Stochastic Control Problem is by set of actions, state set It closes, as the incentive structure and characterization of multiple decision periods of state and the function of movement, wherein basic stochastic regime processing Evolution depend on it is described strategy in multiple movements, which comprises

Using the sampling apparatus for being coupled to digital computer He being coupled to sampling apparatus control system, the sampling apparatus obtains table Show the data of the sample configuration of Boltzmann machine, the Boltzmann machine includes:

Multiple nodes,

Multiple couplers,

Multiple biasings, each biasing correspond to a node in the multiple node,

Multiple coupled weights, each coupled weight correspond to a coupler in the multiple coupler, and

Lateral field strength；

It the use of digital computer acquisition include the set of actions of the Stochastic Control Problem, the state set, institute The initialization data of the initial policy of incentive structure and the Stochastic Control Problem is stated, the strategy includes selecting for each state Select at least one movement；

Using the digital computer and the sampling apparatus control system, each institute of the Boltzmann machine will be respectively indicated The data of the initial weight and the biasing and the lateral field strength of stating coupler and each node distribute to described adopt Sampling device；

It performs the following operation until meeting stopping criterion:

Current period state action pair is generated using the digital computer,

Using the digital computer and the sampling apparatus control system using the current period state action of generation to repairing Change the data for indicating no coupler or at least one coupler and at least one biasing,

Execute correspond to the current period state action pair sampling to obtain the first sampling empirical mean,

The Q letter at the current period state action is obtained using the first sampling empirical mean using the digital computer The approximation of several values, the value of the Q function indicate the effectiveness of the current period state action pair,

Future period state action pair is obtained using the digital computer, wherein the state is handled by stochastic regime It obtains, and further wherein, obtaining the movement includes: to including the future period state and any possible movement Multiple all state actions to random optimization test is executed, to provide the movement in the future period and update and be used for The strategy of the future period state,

Modifying, using the digital computer and the sampling apparatus control system, data representing none or at least one of the couplers and at least one bias, based on the generated future-period state-action pair,

Performing a sampling corresponding to the future-period state-action pair to obtain second sampling empirical means,

Obtaining, using the digital computer and the second sampling empirical means, an approximation of the value of the Q function at the future-period state-action pair, the value of the Q function representing the utility of the future-period state-action pair, and

Updating, using the digital computer, each weight of each coupler and each bias of each node of the Boltzmann machine, using the generated approximations of the value of the Q function, the first sampling empirical means at the current-period state-action pair, and the corresponding reward at the current-period state-action pair obtained using the incentive structure; and

Providing the policy using the digital computer when the stopping criterion is met.
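Illustrative sketch (not part of the claims): the loop recited in claim 1 can be pictured with a small classical stand-in. Everything below is an assumption made for illustration: the Boltzmann machine is restricted (no couplers among hidden nodes) with zero lateral field, the sampling apparatus is replaced by ordinary pseudo-random sampling, the Q function is approximated by the negative free energy of the clamped machine, the stochastic optimization test is Boltzmann sampling over actions, and the names `encode`, `fields`, `sample_means`, `q_value`, and `improve_policy` are hypothetical.

```python
import math
import random

def encode(state, action, n_states, n_actions):
    """One-hot encoding of a state-action pair, used to clamp the visible nodes."""
    v = [0.0] * (n_states + n_actions)
    v[state] = 1.0
    v[n_states + action] = 1.0
    return v

def fields(v, W, b):
    """Effective bias on each hidden node once the visible nodes are clamped to v."""
    return [b[j] + sum(W[i][j] * v[i] for i in range(len(v))) for j in range(len(b))]

def sample_means(v, W, b, n_samples, rng):
    """Empirical mean of each hidden node over n_samples draws (classical sampler)."""
    means = []
    for f in fields(v, W, b):
        p = 1.0 / (1.0 + math.exp(-f))
        means.append(sum(rng.random() < p for _ in range(n_samples)) / n_samples)
    return means

def q_value(v, W, b, means):
    """Assumed convention: Q is the negative free energy (negative mean energy plus entropy)."""
    q = 0.0
    for f, m in zip(fields(v, W, b), means):
        q += m * f
        q -= sum(p * math.log(p) for p in (m, 1.0 - m) if p > 0.0)
    return q

def improve_policy(step, reward, n_states, n_actions, n_hidden=4, max_steps=200,
                   lr=0.05, gamma=0.9, temperature=0.5, n_samples=50, seed=0):
    rng = random.Random(seed)
    n_vis = n_states + n_actions
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_vis)]
    b = [0.0] * n_hidden
    policy = [0] * n_states                      # initial policy (assumed: action 0)
    s = rng.randrange(n_states)
    a = policy[s]
    for _ in range(max_steps):                   # stopping criterion: step budget
        v = encode(s, a, n_states, n_actions)
        h = sample_means(v, W, b, n_samples, rng)        # first empirical means
        q = q_value(v, W, b, h)
        s2 = step(s, a, rng)                     # stochastic state process
        # Stochastic optimization test: Boltzmann sampling over all actions at s2.
        qs = []
        for cand in range(n_actions):
            vc = encode(s2, cand, n_states, n_actions)
            qs.append(q_value(vc, W, b, sample_means(vc, W, b, n_samples, rng)))
        top = max(qs)
        probs = [math.exp((x - top) / temperature) for x in qs]
        a2 = rng.choices(range(n_actions), weights=probs)[0]
        policy[s2] = a2                          # update policy for the future state
        v2 = encode(s2, a2, n_states, n_actions)
        h2 = sample_means(v2, W, b, n_samples, rng)      # second empirical means
        q2 = q_value(v2, W, b, h2)
        td = reward(s, a) + gamma * q2 - q       # temporal-difference error
        for j in range(n_hidden):
            b[j] += lr * td * h[j]
            for i in range(n_vis):
                W[i][j] += lr * td * v[i] * h[j]
        s, a = s2, a2
    return policy, W, b
```

A caller supplies `step(state, action, rng)` for the state transition and `reward(state, action)` for the incentive structure; the learning rate, discount factor, and temperature are free parameters of the sketch and are not taken from the claims.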

2. The method according to claim 1, wherein the sampling apparatus comprises a quantum processor and the sampling apparatus control system comprises a quantum device control system; further wherein the quantum processor is coupled to the digital computer and to the quantum device control system; further wherein the quantum processor comprises multiple quantum bits and multiple couplers, each coupler providing a communicative coupling at a crossing of two quantum bits.

3. The method according to claim 1, wherein the sampling apparatus comprises an optical device configured to receive energy from an optical energy source and to generate multiple optical parametric oscillators, and multiple coupling devices, each of the multiple coupling devices controllably coupling an optical parametric oscillator of the multiple optical parametric oscillators.

4. The method according to claim 1, wherein the sampling apparatus comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a classical Boltzmann machine characterized by a zero-valued lateral field strength; further wherein the memory unit comprises an application for obtaining data representing each weight of each coupler and each bias of each node of the classical Boltzmann machine; further wherein the application is adapted to perform simulated quantum annealing of the classical Boltzmann machine.

5. The method according to claim 1, wherein the sampling apparatus comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a quantum Boltzmann machine characterized by a nonzero-valued lateral field strength; further wherein the memory unit comprises an application for obtaining data representing each weight of each coupler and each bias of each node of the quantum Boltzmann machine; further wherein the application is adapted to perform simulated quantum annealing of the quantum Boltzmann machine.

6. The method according to claim 5, wherein performing the simulated quantum annealing of the quantum Boltzmann machine provides multiple sample configurations representing the effective Hamiltonian of the quantum Boltzmann machine.

7. The method according to claim 1, wherein the sampling apparatus comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a classical Boltzmann machine characterized by a zero-valued lateral field strength; further wherein the memory unit comprises an application for obtaining data representing each weight of each coupler and each bias of each node of the classical Boltzmann machine; further wherein the application is adapted to sample multiple instances of a Fortuin-Kasteleyn random cluster representation corresponding to the classical Boltzmann machine, thereby providing an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation.

8. The method according to claim 1, wherein the sampling apparatus comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a quantum Boltzmann machine characterized by a nonzero-valued lateral field strength; further wherein the memory unit comprises an application for obtaining data representing each weight of each coupler and each bias of each node of the quantum Boltzmann machine; further wherein the application is adapted to sample multiple instances of a Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, thereby providing an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation.

9. The method according to any one of claims 2, 3, 4 and 5, wherein obtaining the approximation of the value of the Q function at both the current period and the future period comprises: obtaining from the sampling apparatus multiple sample configurations of the Boltzmann machine along a measurement axis, and computing, using the digital computer, an empirical approximation of the free energy of the Boltzmann machine.
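Illustrative sketch (not part of the claims): one way to picture an empirical free-energy approximation from sampled configurations is the identity F = <E> - S/beta, with the mean energy and the entropy S both estimated from sample frequencies. The Ising-type energy function, the inverse temperature `beta`, and the names `energy` and `empirical_free_energy` are assumptions for illustration; taking Q as the negative of this free energy is likewise an assumed sign convention.

```python
import math
from collections import Counter

def energy(config, biases, weights):
    """Assumed Ising-type energy: -sum_i b_i s_i - sum_(i,j) w_ij s_i s_j."""
    e = -sum(b * s for b, s in zip(biases, config))
    e -= sum(w * config[i] * config[j] for (i, j), w in weights.items())
    return e

def empirical_free_energy(samples, biases, weights, beta=1.0):
    """Estimate F = <E> + (1/beta) * sum_c p(c) log p(c) from sample frequencies."""
    counts = Counter(tuple(s) for s in samples)
    n = len(samples)
    mean_energy = sum(c * energy(cfg, biases, weights) for cfg, c in counts.items()) / n
    neg_entropy = sum((c / n) * math.log(c / n) for c in counts.values())
    return mean_energy + neg_entropy / beta
```

With few samples the frequency-based entropy estimate is biased low, so the sketch is only meaningful when the number of samples is large relative to the number of distinct configurations observed.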

10. The method according to any one of claims 2 and 5, wherein obtaining the approximation of the value of the Q function at both the current period and the future period comprises: obtaining from the sampling apparatus multiple sample configurations of the Boltzmann machine along a measurement axis, constructing from the obtained sample configurations samples of multiple configurations representing the effective Hamiltonian of the quantum Boltzmann machine, and computing, using the digital computer, an empirical approximation of the free energy of the quantum Boltzmann machine.

11. The method according to claim 6, wherein obtaining the approximation of the value of the Q function at both the current period and the future period comprises: obtaining from the sampling apparatus the multiple sample configurations representing the effective Hamiltonian of the quantum Boltzmann machine, and computing, using the digital computer, an empirical approximation of the free energy of the quantum Boltzmann machine.

12. The method according to claim 8, wherein obtaining the approximation of the value of the Q function at both the current period and the future period comprises: obtaining from the sampling apparatus the approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, and computing, using the digital computer, an empirical approximation of the free energy of the quantum Boltzmann machine.

13. The method according to any one of claims 2, 3, 4 and 5, wherein computing both the first empirical means and the second empirical means corresponding to the nodes comprises: obtaining from the sampling apparatus, along a measurement axis, samples of multiple configurations of one of the quantum Boltzmann machine and the classical Boltzmann machine, and computing, using the digital computer, an approximation of the empirical means of the nodes.

14. The method according to claim 6, wherein computing both the first empirical means and the second empirical means corresponding to the nodes comprises: obtaining from the sampling apparatus samples of multiple configurations of the effective Hamiltonian of the Boltzmann machine, and computing, using the digital computer, an approximation of the empirical means of the nodes.

15. The method according to claim 1, wherein performing the stochastic optimization test on all of the state-action pairs comprises:

Modifying, using the digital computer and the sampling apparatus control system, data representing none or at least one of the couplers and at least one bias, based on each state-action pair corresponding to the future-period state,

Performing a sampling corresponding to each state-action pair corresponding to the future-period state to provide empirical means,

Obtaining, using the digital computer, an approximation of the value of the Q function for each state-action pair corresponding to the future-period state,

Updating the policy for the future-period state, using the digital computer, by sampling from a distribution corresponding to all of the approximated values of the Q function for the state-action pairs corresponding to the future-period state.

16. The method according to claim 1, wherein performing the stochastic optimization test on all of the state-action pairs comprises:

Obtaining a temperature parameter;

Obtaining the future-period state;

Sampling from a Boltzmann distribution associated with the approximations of the value of the Q function, with the state variable fixed at the future-period state and at the provided temperature.
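Illustrative sketch (not part of the claims): sampling an action from a Boltzmann distribution over approximate Q values at a fixed state can be written as a softmax draw. The function name `boltzmann_action`, the list-of-floats representation of the Q values, and the use of Python's pseudo-random generator are assumptions for illustration.

```python
import math
import random

def boltzmann_action(q_values, temperature, rng=random):
    """Sample an action index with probability proportional to exp(Q / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    threshold = rng.random() * sum(weights)
    running = 0.0
    for index, w in enumerate(weights):
        running += w
        if threshold < running:
            return index
    return len(weights) - 1  # guard against floating-point round-off
```

A high temperature makes the draw close to uniform over actions, while a temperature near zero makes it nearly greedy with respect to the approximate Q values.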

17. The method according to claim 2, wherein the multiple quantum bits of the quantum processor comprise:

A first group of quantum bits;

A second group of quantum bits; and

Wherein the multiple couplers of the quantum processor comprise:

At least one coupler, each of the at least one coupler providing a communicative coupling at a crossing between a quantum bit of the first group of quantum bits and at least one quantum bit of the second group of quantum bits, and

Multiple couplers, each of the multiple couplers providing a communicative coupling at a crossing between a quantum bit of the second group of quantum bits and another quantum bit of the second group of quantum bits.

18. The method according to claim 17, wherein the first group of quantum bits represents the set of actions of the stochastic control problem.

19. The method according to claim 17, wherein modifying the data representing none or at least one of the couplers and at least one bias, using the generated current-period state-action pair, comprises:

Switching off all couplers that provide a communicative coupling at a crossing between a quantum bit of the first group of quantum bits and a quantum bit of the second group of quantum bits, and

Modifying at least one bias of the second group of quantum bits using the generated current-period state-action pair.
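Illustrative sketch (not part of the claims): one plausible reading of this clamping step is that, once the first group is fixed to the values encoding the state-action pair, the inter-group couplers are switched off and their contribution is folded into the biases of the second group. The claim does not state how the biases are modified, so the folding rule below, the nested-list weight layout, and the name `clamp_first_group` are all assumptions.

```python
def clamp_first_group(inter_weights, second_group_biases, clamped_values):
    """Fold clamped first-group values into second-group biases.

    inter_weights[i][j] is the assumed weight of the coupler between
    first-group node i and second-group node j. Returns the modified
    biases and the inter-group weights with every coupler switched off.
    """
    n_second = len(second_group_biases)
    effective_biases = [
        second_group_biases[j]
        + sum(inter_weights[i][j] * clamped_values[i] for i in range(len(clamped_values)))
        for j in range(n_second)
    ]
    switched_off = [[0.0] * n_second for _ in clamped_values]
    return effective_biases, switched_off
```

Under this reading, the sampling apparatus only needs to sample the second group, since the first group is no longer coupled to it.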

20. The method according to claim 17, wherein modifying the data representing none or at least one of the couplers and at least one bias, using the generated future-period state-action pair, comprises:

Modifying at least one bias of the second group of quantum bits using the generated future-period state-action pair.

21. The method according to claim 17, wherein performing the stochastic optimization test on all state-action pairs that comprise the future-period state and any possible action comprises:

Switching on all couplers that provide a communicative coupling at a crossing between a quantum bit of the first group of quantum bits and a quantum bit of the second group of quantum bits;

Modifying at least one bias of the second group of quantum bits using the future-period state of the future-period state-action pair;

Updating the policy for the future-period state, using the digital computer, by assigning an action to the future-period state according to a distribution of the empirical means obtained for the first group of quantum bits.

22. The method according to any one of claims 1 to 21, wherein the stopping criterion comprises reaching a maximum number of training steps.

23. The method according to any one of claims 1 to 21, wherein the stopping criterion comprises reaching a maximum runtime.

24. The method according to any one of claims 1 to 21, wherein the stopping criterion comprises convergence of a function of the weights and biases of the couplers and local fields.

25. The method according to any one of claims 1 to 21, wherein the stopping criterion comprises convergence of the policy to a fixed policy.
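Illustrative sketch (not part of the claims): the four stopping criteria of claims 22 to 25 can be checked in one helper. The name `stop_reason`, the flat list of parameters, the tolerance `tol`, and the use of "policy unchanged since the previous check" as a proxy for convergence to a fixed policy are assumptions for illustration.

```python
import time

def stop_reason(step, max_steps, started_at, max_seconds,
                prev_params, params, tol, prev_policy, policy):
    """Return the name of the first stopping criterion that is met, else None."""
    if step >= max_steps:                                # claim 22
        return "max_training_steps"
    if time.monotonic() - started_at >= max_seconds:     # claim 23
        return "max_runtime"
    if prev_params is not None:                          # claim 24
        change = max((abs(p - q) for p, q in zip(prev_params, params)), default=0.0)
        if change < tol:
            return "parameters_converged"
    if prev_policy is not None and prev_policy == policy:  # claim 25 (crude proxy)
        return "policy_converged"
    return None
```

In practice the policy check would usually require the policy to stay unchanged over many consecutive steps rather than a single one; the one-step comparison is kept here only for brevity.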

26. The method according to any one of claims 1 to 25, wherein providing the policy comprises at least one of: displaying the policy to a user interacting with the digital computer; storing the policy in the digital computer; and transmitting the policy to another processing unit operatively connected to the digital computer.

27. The method according to any one of claims 1 to 26, wherein the digital computer comprises a memory unit; further wherein the initialization data is obtained from the memory unit of the digital computer.

28. The method according to any one of claims 1 to 26, wherein the initialization data is obtained from one of a user interacting with the digital computer and a remote processing unit operatively connected to the digital computer.

29. A digital computer, comprising:

A central processing unit;

A display device;

A communication port for operatively connecting the digital computer to a sampling apparatus, the sampling apparatus being coupled to the digital computer and to a sampling apparatus control system;

A memory unit comprising an application for improving a policy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a state set, an incentive structure as a function of the states and actions, and multiple decision periods, wherein the evolution of an underlying stochastic state process depends on multiple actions in the policy, the application comprising:

Instructions for using the sampling apparatus coupled to the digital computer and to the sampling apparatus control system, the sampling apparatus obtaining data representing sample configurations of a Boltzmann machine, the Boltzmann machine comprising multiple nodes, multiple couplers, multiple biases, multiple coupling weights, and a lateral field strength, each bias corresponding to one of the multiple nodes and each coupling weight corresponding to one of the multiple couplers;

Instructions for obtaining, using the digital computer, initialization data comprising the set of actions, the state set, and the incentive structure of the stochastic control problem, together with an initial policy for the stochastic control problem, the policy comprising a selection of at least one action for each state;

Instructions for assigning to the sampling apparatus, using the digital computer and the sampling apparatus control system, data representing the initial weight of each coupler and the initial bias of each node of the Boltzmann machine, together with the lateral field strength;

Instructions for performing the following operations until a stopping criterion is met:

Generating a current-period state-action pair using the digital computer,

Modifying, using the digital computer and the sampling apparatus control system, data representing none or at least one of the couplers and at least one bias, based on the generated current-period state-action pair,

Obtaining a future-period state-action pair using the digital computer, wherein the state is obtained through a stochastic state process, and further wherein obtaining the action comprises performing a stochastic optimization test on all state-action pairs that comprise the future-period state and any possible action, thereby providing the action for the future period and updating the policy for the future-period state,

Obtaining, using the digital computer and second sampling empirical means, an approximation of the value of the Q function at the future-period state-action pair, the value of the Q function representing the utility of the future-period state-action pair, and

Instructions for providing the policy using the digital computer when the stopping criterion is met.

30. A non-transitory computer-readable storage medium storing computer-executable instructions which, when executed, cause a digital computer to perform a method for improving a policy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a state set, an incentive structure as a function of the states and actions, and multiple decision periods, wherein the evolution of an underlying stochastic state process depends on multiple actions in the policy, the method comprising:

Using a sampling apparatus coupled to the digital computer and to a sampling apparatus control system, the sampling apparatus obtaining data representing sample configurations of a Boltzmann machine, the Boltzmann machine comprising:

Multiple nodes,

Multiple couplers,

Multiple biases, each bias corresponding to one of the multiple nodes,

A lateral field strength;

Assigning to the sampling apparatus, using the digital computer and the sampling apparatus control system, data representing the initial weight of each coupler and the initial bias of each node of the Boltzmann machine, together with the lateral field strength;

Performing the following operations until a stopping criterion is met:

Generating a current-period state-action pair using the digital computer,

Obtaining, using the digital computer and second sampling empirical means, an approximation of the value of the Q function at the future-period state-action pair, the value of the Q function representing the utility of the future-period state-action pair,