CN109154798A - Method and system for improving a policy for a stochastic control problem - Google Patents
Method and system for improving a policy for a stochastic control problem
- Publication number
- CN109154798A (application CN201780028555.9A)
- Authority
- CN
- China
- Prior art keywords
- quantum
- digital computer
- coupler
- period state
- sampling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06E—OPTICAL COMPUTING DEVICES; COMPUTING DEVICES USING OTHER RADIATIONS WITH SIMILAR PROPERTIES
- G06E3/00—Devices not provided for in group G06E1/00, e.g. for processing analogue or hybrid data
- G06E3/001—Analogue devices in which mathematical operations are carried out with the aid of optical or electro-optical elements
- G06E3/005—Analogue devices in which mathematical operations are carried out with the aid of optical or electro-optical elements using electro-optical or opto-electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N10/00—Quantum computing, i.e. information processing based on quantum-mechanical phenomena
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Condensed Matter Physics & Semiconductors (AREA)
- Nonlinear Science (AREA)
- Optics & Photonics (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Algebra (AREA)
- Complex Calculations (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Disclosed are a method and a system for improving a policy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure that is a function of states and actions, and a plurality of decision epochs. The method comprises: using a sampling device to obtain data representing sample configurations of a Boltzmann machine; obtaining initialization data and an initial policy for the stochastic control problem; assigning to the sampling device data representing an initial weight for each coupler and an initial bias for each node of the Boltzmann machine, and a transverse field strength; and, until a stopping criterion is met: generating a current-epoch state-action pair; amending data representing none or at least one coupler and at least one bias; performing a sampling corresponding to the current-epoch state-action pair to obtain a first sampling empirical mean; obtaining an approximation of the value of the Q-function at the current-epoch state-action pair; obtaining a future-epoch state-action pair, wherein the state is obtained through a stochastic state process, and wherein obtaining the action comprises performing a stochastic optimization test on all state-action pairs comprising the future-epoch state and any possible action, thereby providing the action at the future epoch and updating the policy for the future-epoch state; amending data representing none or at least one coupler and at least one bias; performing a sampling corresponding to the future-epoch state-action pair; obtaining an approximation of the value of the Q-function at the future-epoch state-action pair; and updating each weight and each bias. The policy is provided when the stopping criterion is met.
Description
Cross reference to related applications
This patent application claims priority to U.S. Provisional Patent Application No. 62/333,707, filed on May 9, 2016.
Technical field
The present invention relates to computing. More precisely, the invention relates to a method and a system for improving a policy for a stochastic control problem.
Background
Markov decision processes
A stochastic control problem aims to design a policy for controlling the state of a system that evolves through a stochastic process, so as to maximize utility.
A Markov decision process (MDP) is a particular type of stochastic control problem that satisfies the Markov property.
Markov decision processes are widely used to model sequential decision-making under uncertainty.
Many problems involve Markov decision processes, such as population harvesting, controlling water resources for irrigation and power generation, equipment replacement in any industry, portfolio optimization in finance and investment, scheduling in queueing theory and operations research, generating credit and insurance policies, overbooking management, quarantine and treatment levels in health and pharmaceutical applications, generating sports strategies, and locating emergency response vehicles.
In effect, given a system with some inherent stochastic evolution, how should a decision-maker make decisions over multiple epochs to maximize some utility function that depends on the system, when those decisions may affect the system?
Formally, a Markov decision process can be defined by the following components.
1. A set of decision epochs T = {n, n+1, ..., m}, where m may be finite or infinite. It will be appreciated that the set of decision epochs represents the set of times at which decisions have to be made. For example, where the problem involving a Markov decision process is equipment replacement, the set of decision epochs may be each consecutive day on which the equipment is used.
2. A state space S. It will be appreciated that any state in the state space comprises data representing a realization of the system. For example, in the equipment replacement problem, the state space may be a set of integers representing the condition of the equipment.
3. An action space A. It will be appreciated that any action in the action space comprises data representing a control of the system. For example, in the equipment replacement problem, the action space may comprise two actions: replacing or not replacing the equipment.
4. Instantaneous rewards. It will be appreciated that an instantaneous reward represents the consequence of taking an action when the system is in a given state at a given decision epoch. For example, in the equipment replacement problem, the instantaneous reward may be a negative integer representing the replacement cost if the action is to replace the equipment, and a positive integer otherwise. The positive integer is larger when the equipment is operating in better condition.
It will be appreciated that a transition probability is the probability of a transition from a given state to another given state. The Markov property of a Markov decision process states that this probability depends only on the current state and action, and not on the earlier history.
For example, where the equipment in the equipment replacement problem has three conditions (failed, poor, good), the transition probabilities may be time-independent and given by transition probability matrices.
5. A discount factor γ ∈ [0, 1). It will be appreciated that the discount factor represents the difference in importance between future rewards and present rewards.
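The displayed formula for the Markov property does not survive in this text. As a hedged reminder, the standard textbook form is given below; the symbols s_k and a_k for the state and action at decision epoch k are notation assumed here, not taken from the source.

```latex
P\left(s_{n+1} \mid s_n, a_n, s_{n-1}, a_{n-1}, \ldots\right)
  = P\left(s_{n+1} \mid s_n, a_n\right)
```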
A policy is defined as a function α: S × T → A. It will therefore be appreciated that a policy assigns an action to the state of the system at each decision epoch. For example, in the equipment replacement problem, the policy may be to replace the equipment only when it is in the failed condition, and not to replace it otherwise.
It will further be appreciated by the skilled addressee that a utility function $U_n(s_n, \alpha)$ may be defined as a sum over decision epochs in which each summand is the discounted expected value of a future reward, given the initial state $s_n$ and the policy α. The decision-maker may therefore wish to maximize the utility function over all policies, which in turn means finding the optimal policy: $\alpha^* = \arg\max_{\alpha} U_n(s_n, \alpha)$.
The skilled addressee will appreciate that when the utility function is regarded as a function of the action to be taken at the current decision epoch, with the optimal policy followed thereafter, it is referred to as the Q-function and written $Q(s_n, a_n)$. When $Q(s_n, a_n)$ is maximized over $a_n$, the optimal state-action pair is obtained.
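The displayed formulas for the utility function and the Q-function are missing from this text. The block below gives the standard textbook forms that match the surrounding description; the reward symbol r and the exact indexing are assumptions made for illustration, not the patent's own notation.

```latex
% Discounted expected utility of policy \alpha from initial state s_n
U_n(s_n, \alpha) = \sum_{k=n}^{m} \gamma^{\,k-n}\,
    \mathbb{E}\!\left[\, r\big(s_k, \alpha(s_k, k)\big) \,\middle|\, s_n, \alpha \right]

% Optimal utility and optimal policy
U_n^*(s_n) = \max_{\alpha} U_n(s_n, \alpha), \qquad
\alpha^* = \arg\max_{\alpha} U_n(s_n, \alpha)

% Q-function: take a_n now, follow the optimal policy afterwards
Q(s_n, a_n) = \mathbb{E}\!\left[\, r(s_n, a_n) + \gamma\, U_{n+1}^*(s_{n+1})
    \,\middle|\, s_n, a_n \right]
```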
It will be appreciated that finding an optimal policy can be very cumbersome. In fact, when the sets of states, actions and/or decision epochs become too large, or when the transition probabilities are unknown, finding a solution to a Markov decision process problem can be problematic.
In the literature, an algorithm whose computational complexity has a lower bound that grows exponentially with the dimension of the problem is said to suffer from the curse of dimensionality. A common method for solving Markov decision process problems is value iteration [Richard Bellman, "A Markovian Decision Process", Journal of Mathematics and Mechanics, Vol. 6, No. 5 (1957)], which has exponential complexity in general, i.e. $\Omega(2^d)$, where d denotes the dimension of the Markov decision process problem.
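As an illustration of value iteration on the equipment replacement example, the following is a minimal sketch. All numbers (transition probabilities, rewards, discount factor) are hypothetical, since the source does not give them, and the function names are chosen here for illustration only.

```python
# Value iteration for a toy equipment replacement MDP.
# Every number below is hypothetical; the patent text does not provide them.
STATES = ["failed", "poor", "good"]
ACTIONS = ["keep", "replace"]
GAMMA = 0.9  # discount factor in [0, 1)

# P[action][s][s2]: probability of moving from state s to state s2.
P = {
    "keep": [[1.0, 0.0, 0.0],
             [0.3, 0.7, 0.0],
             [0.1, 0.3, 0.6]],
    "replace": [[0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0]],
}
# R[action][s]: instantaneous reward. Replacing has a cost (negative reward);
# keeping earns more when the equipment is in better condition.
R = {"keep": [0.0, 2.0, 5.0], "replace": [-4.0, -4.0, -4.0]}


def q_value(s, a, values):
    """One-step lookahead: reward now plus discounted expected future value."""
    return R[a][s] + GAMMA * sum(p * v for p, v in zip(P[a][s], values))


def value_iteration(tol=1e-9, max_iter=10_000):
    """Repeat the Bellman optimality update until the values stop changing."""
    values = [0.0] * len(STATES)
    for _ in range(max_iter):
        new_values = [max(q_value(s, a, values) for a in ACTIONS)
                      for s in range(len(STATES))]
        converged = max(abs(n - v) for n, v in zip(new_values, values)) < tol
        values = new_values
        if converged:
            break
    # Greedy policy with respect to the converged values.
    policy = [max(ACTIONS, key=lambda a: q_value(s, a, values))
              for s in range(len(STATES))]
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    for name, v, a in zip(STATES, values, policy):
        print(f"{name}: value={v:.3f}, action={a}")
```

Each sweep visits every state and action, which is why the cost grows with the size of the state space; when a state is described by d variables, the number of states typically grows exponentially in d.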
Many methods have been proposed to overcome the curse of dimensionality, such as Q-learning [Richard S. Sutton, Andrew G. Barto]. However, these methods need to store the value of the Q-function for every possible state-action pair, which becomes infeasible for problems of a certain size. To overcome this drawback, neural-network-based parameterizations of the Q-function have been proposed (e.g. [Sallans, B., Hinton, G.E., Reinforcement Learning with Factored States and Actions, Journal of Machine Learning Research 5, 1063-1088, 2004]). This, however, involves training a neural network, which requires fitting the network; this is itself an open problem and, in some cases, training a neural network requires solving NP-hard problems.
There is therefore a need for a method for improving a policy that overcomes at least one of the drawbacks identified above.
Artificial neural networks
Artificial neural networks (ANNs) are computational models inspired by biological neural networks and are used for function approximation. Artificial neural networks are represented as graphs, in which the nodes are also called neurons and the edges are also called synapses.
A general Boltzmann machine (GBM) is a type of artificial neural network in which each neuron represents a random variable with a linear bias attached to it, and each synapse between two neurons represents a quadratic term involving the random variables associated with those neurons. In particular, there is a global energy function associated with the general Boltzmann machine, consisting of the contributions of all the linear and quadratic terms.
A general Boltzmann machine is therefore a graphical model used for approximating the joint distribution of dependent variables. The corresponding graph contains nodes referred to as visible nodes (or input variables) and non-visible nodes referred to as hidden nodes (or latent variables). General Boltzmann machines were developed to represent and solve certain combinatorial problems, and can be used as a probabilistic machine learning tool. Applications of general Boltzmann machines include, but are not limited to, visual object and speech recognition, classification, regression tasks, dimensionality reduction, information retrieval, and image reconstruction. For an overview of general Boltzmann machines, see D. Ackley, G. Hinton, T. Sejnowski, "A Learning Algorithm for Boltzmann Machines," Cognitive Science 9, 147-169 (1985).
Distribution approximation in a general Boltzmann machine is performed by encoding the dependent variables of interest as nodes of a larger graph. These nodes are the visible nodes, and all other nodes are hidden nodes. A weight is assigned to each edge and a bias to each vertex of the graph, and an energy function is assigned to the graph according to these weights and biases.
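The energy function and the distribution it induces can be sketched for a very small machine as follows. The sign convention, the 0/1 node values and the helper names are assumptions made for illustration; the source does not fix them. Exhaustive enumeration is used, which is only practical for a handful of nodes.

```python
import itertools
import math


def energy(x, biases, weights):
    """E(x) = -(sum_i b_i x_i + sum_(i,j) w_ij x_i x_j); the sign is an assumed convention."""
    linear = sum(b * xi for b, xi in zip(biases, x))
    quadratic = sum(w * x[i] * x[j] for (i, j), w in weights.items())
    return -(linear + quadratic)


def boltzmann_distribution(biases, weights, beta=1.0):
    """Exact Boltzmann distribution p(x) proportional to exp(-beta * E(x)), by enumeration."""
    configs = list(itertools.product([0, 1], repeat=len(biases)))
    unnormalized = [math.exp(-beta * energy(x, biases, weights)) for x in configs]
    z = sum(unnormalized)  # partition function
    return {x: u / z for x, u in zip(configs, unnormalized)}


def visible_marginal(dist, visible):
    """Sum out the hidden nodes to get the distribution over the visible nodes."""
    marginal = {}
    for x, p in dist.items():
        key = tuple(x[i] for i in visible)
        marginal[key] = marginal.get(key, 0.0) + p
    return marginal


if __name__ == "__main__":
    biases = [0.5, -0.2, 0.1]                 # one bias per node (hypothetical values)
    weights = {(0, 1): 1.0, (1, 2): -0.5}     # one weight per edge (hypothetical values)
    dist = boltzmann_distribution(biases, weights)
    print(visible_marginal(dist, visible=[0, 1]))  # node 2 plays the hidden node
```

The marginal over the visible nodes is the distribution that the machine uses to approximate the joint distribution of the variables of interest.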
General Boltzmann machines with arbitrary connections have not proven to be especially useful for machine learning, because approximate learning methods for them are slow. When certain restrictions are placed on the connections between hidden nodes, the network becomes easier to train and useful for machine learning tasks. When no connections are allowed between hidden nodes and no connections are allowed between visible nodes, the resulting neural network is referred to as a restricted Boltzmann machine (RBM), consisting of only one visible layer and one hidden layer.
In the absence of intra-visible and intra-hidden node connections, efficient training algorithms have been developed that easily learn the probability distribution over a set of inputs on the visible layer, which makes restricted Boltzmann machines perform well in the field of machine learning. For applications, algorithms and theory, see section 6 of Y. Bengio et al., "Representation Learning: A Review and New Perspectives", arXiv 2014 (http://www.cl.uni-heidelberg.de/courses/ws14/deepl/BengioETAL12.pdf).
The idea of restricted Boltzmann machines has been extended to create more effective neural networks called deep belief networks (DBNs). A deep belief network is created by stacking restricted Boltzmann machines on top of each other, so that the hidden layer of the first restricted Boltzmann machine is used as the visible layer of the second, the hidden layer of the second is used as the visible layer of the third, and so on. This structure has been studied extensively and is the basis of deep learning. Its advantage is that the network weights and biases can be trained one restricted Boltzmann machine at a time, from the top down, using the same training algorithms developed for stand-alone restricted Boltzmann machines. For applications, algorithms and theory behind deep belief networks, see http://neuralnetworksanddeeplearning.com/chap6.html.
Training a deep belief network one restricted Boltzmann machine at a time comes at the cost of accumulating the errors that arise from the approximate distribution of each restricted Boltzmann machine. Another way of training such a neural network is to treat it as a general Boltzmann machine and update all the weights in the same iteration rather than layer by layer. This method applied to this structure is referred to as a deep Boltzmann machine (DBM).
Quantum processors
A quantum processor is a quantum mechanical system of a plurality of qubits, measurements on which yield samples from the Boltzmann distribution defined by the global energy of the system.
A qubit is a physical implementation of a quantum mechanical system that is represented on a Hilbert space and realizes at least two distinct and distinguishable eigenstates representing the two states of a quantum bit. A quantum bit is the analogue of a digital bit: the storage device may store the two states |0> and |1> of two-state quantum information, but may also store a superposition α|0> + β|1> of the two states. In various embodiments, such a system may have more than two eigenstates, in which case the additional eigenstates are used to represent the two logical states by degenerate measurements. Various embodiments of implementations of qubits have been proposed, for example: solid-state nuclear spins measured and controlled electronically or with nuclear magnetic resonance, trapped ions, atoms in optical cavities (cavity quantum electrodynamics), liquid-state nuclear spins, electronic charge or spin degrees of freedom in quantum dots, superconducting quantum circuits based on Josephson junctions [Barone and Paterno, 1982, Physics and Applications of the Josephson Effect, John Wiley and Sons, New York; Martinis et al., 2002, Physical Review Letters 89, 117901], and electrons on helium.
A bias source inductively coupled to each qubit is referred to as a local field bias. In one embodiment, the bias source is an electromagnetic device used to thread a magnetic flux through the qubit to provide control of the state of the qubit [US 2006/0225165].
The local field biases on the qubits are programmable and controllable. In one embodiment, a qubit control system comprising a digital processing unit is connected to the system of qubits and is capable of programming and tuning the local field biases on the qubits.
A quantum processor may further comprise a plurality of couplings between pairs of the plurality of qubits. A coupling between two qubits is a device in the proximity of both qubits that threads a magnetic flux through both of them. In one embodiment, a coupling may consist of a superconducting circuit interrupted by a compound Josephson junction. A magnetic flux may thread the compound Josephson junction and consequently thread a magnetic flux through both qubits [US 2006/0225165]. The strength of this magnetic flux contributes quadratically to the energy of the quantum processor. In one embodiment, the coupling strength is enforced by tuning the coupling device in the proximity of both qubits.
The coupling strengths are controllable and programmable. In one embodiment, a quantum device control system comprising a digital processing unit is connected to the plurality of couplings and is capable of programming the coupling strengths of the quantum processor.
A quantum annealer is a quantum processor that carries out quantum annealing as described, for example, in Farhi, E. et al., "Quantum Adiabatic Evolution Algorithms versus Simulated Annealing", arXiv.org: quant-ph/0201031 (2002), pp. 1-16.
A quantum annealer performs a transformation of the quantum processor from an initial setup to a final one. The initial and final setups of the quantum processor provide quantum systems described by their corresponding initial and final Hamiltonians. For a quantum annealer with local field biases and couplings as described above, the final Hamiltonian can be expressed as a quadratic function $f(x) = \sum_i h_i x_i + \sum_{(i,j)} J_{(i,j)} x_i x_j$, where the first summation runs over the indices i of the qubits of the quantum annealer and the second summation runs over the pairs (i, j) for which there is a coupling between qubits i and j.
The quadratic function above, in which each variable $x_i$ takes one of the spin values -1 and 1 of the i-th qubit, is also referred to as an Ising model. In this case the Ising model is also written in operator form, with each $x_i$ replaced by the spin operator $\sigma_i^z$. The superscript z indicates that the spin $\sigma_i$ of qubit i acts only along one of its three axes. The z axis is therefore also called the measurement axis or measurement basis.
In more general embodiments, the Hamiltonian of the Ising model may also include contributions of the qubit spins in other bases. For example, a Hamiltonian that also includes a term in $\sigma_i^x$ for each qubit is referred to as a transverse field Ising model, in which each spin is affected by a nonzero transverse field along the x axis.
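The two Hamiltonians referred to above are missing from this text as displayed formulas. The standard forms consistent with the quadratic function f(x) are sketched below; the symbol Γ for the transverse field strength and the sign of the transverse term are conventions assumed here, not taken from the source.

```latex
% Ising model in operator form (x_i replaced by \sigma_i^z)
H = \sum_i h_i\, \sigma_i^z + \sum_{(i,j)} J_{(i,j)}\, \sigma_i^z \sigma_j^z

% Transverse field Ising model: an extra term along the x axis
H = \sum_i h_i\, \sigma_i^z + \sum_{(i,j)} J_{(i,j)}\, \sigma_i^z \sigma_j^z
    - \Gamma \sum_i \sigma_i^x
```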
A quantum annealer can be used as a heuristic optimizer of its energy function. An embodiment of such an analog processor is disclosed in McGeoch, Catherine C. and Cong Wang (2013), "Experimental Evaluation of an Adiabatic Quantum System for Combinatorial Optimization", Computing Frontiers, May 14-16, 2013 (http://www.cs.amherst.edu/ccm/cf14-mcgeoch.pdf), and also in patent application US 2006/0225165.
With minor modifications to the quantum annealing process, a quantum processor can instead provide samples from the Boltzmann distribution of its Ising model at a finite temperature. The reader is referred to the technical report Bian, Z., Chudak, F., Macready, W.G. and Rose, G. (2010), "The Ising model: teaching an old problem new tricks", and also to Amin, M.H., Andriyash, E., Rolfe, J., Kulchytskyy, B., and Melko, R. (2016), "Quantum Boltzmann Machine", arXiv:1601.02036.
This method of sampling is referred to as quantum sampling.
For a quantum processor with local field biases and couplings, quantum sampling provides samples from a distribution that is slightly different from the Boltzmann distribution of the Ising model it represents.
The reference Amin, M.H., Andriyash, E., Rolfe, J., Kulchytskyy, B., and Melko, R. (2016), "Quantum Boltzmann Machine", arXiv:1601.02036 studies how far quantum sampling is from Boltzmann sampling.
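To make the notion of sampling from the Boltzmann distribution of an Ising model concrete, the sketch below uses a classical Metropolis sampler as a stand-in. It is not quantum sampling and does not model any quantum hardware; the field and coupling values, the seed and the function names are hypothetical. The empirical mean of the energy over the samples is compared with the exact value obtained by enumeration.

```python
import itertools
import math
import random


def ising_energy(x, h, J):
    """f(x) = sum_i h_i x_i + sum_(i,j) J_ij x_i x_j with spins x_i in {-1, +1}."""
    return (sum(hi * xi for hi, xi in zip(h, x))
            + sum(w * x[i] * x[j] for (i, j), w in J.items()))


def metropolis_samples(h, J, beta=1.0, n_samples=20000, burn_in=1000, seed=0):
    """Single-spin-flip Metropolis sampling of p(x) proportional to exp(-beta * f(x))."""
    rng = random.Random(seed)
    n = len(h)
    x = [1 if rng.random() < 0.5 else -1 for _ in range(n)]
    samples = []
    for step in range(burn_in + n_samples):
        i = int(rng.random() * n)          # pick a spin uniformly at random
        flipped = list(x)
        flipped[i] = -x[i]
        delta = ising_energy(flipped, h, J) - ising_energy(x, h, J)
        # Accept downhill moves always, uphill moves with probability exp(-beta * delta).
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            x = flipped
        if step >= burn_in:
            samples.append(tuple(x))
    return samples


def exact_mean_energy(h, J, beta=1.0):
    """Exact Boltzmann average of the energy, by enumerating all configurations."""
    configs = list(itertools.product([-1, 1], repeat=len(h)))
    weights = [math.exp(-beta * ising_energy(x, h, J)) for x in configs]
    z = sum(weights)
    return sum(w * ising_energy(x, h, J) for w, x in zip(weights, configs)) / z


if __name__ == "__main__":
    h = [0.5, -0.3, 0.0]                 # local field biases (hypothetical)
    J = {(0, 1): -1.0, (1, 2): 0.5}      # coupling strengths (hypothetical)
    samples = metropolis_samples(h, J)
    empirical = sum(ising_energy(x, h, J) for x in samples) / len(samples)
    print("empirical mean energy:", empirical)
    print("exact mean energy:    ", exact_mean_energy(h, J))
```

Averages of this kind, taken over samples returned by a sampling device, are what the text later calls sampling empirical means.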
Optical computing devices
Another embodiment of an analog system capable of sampling from the Boltzmann distribution of an Ising model near its equilibrium state is an optical device.
In one embodiment, the optical device comprises a network of optical parametric oscillators (OPOs), as disclosed in patent applications US20160162798 and WO2015006494 A1.
In this embodiment, each spin of the Ising model is simulated by an optical parametric oscillator operating at degeneracy.
Degenerate optical parametric oscillators are open dissipative systems that undergo a second-order phase transition at the oscillation threshold. Because of phase-sensitive amplification, a degenerate optical parametric oscillator can oscillate with a phase of either 0 or π relative to the pump phase for amplitudes above the threshold. The phase is random, affected by the quantum noise associated with optical parametric down-conversion during the oscillation build-up. A degenerate optical parametric oscillator therefore naturally represents a binary digit specified by its output phase. Based on this property, a system of degenerate optical parametric oscillators may be used as an Ising machine. The phase of each degenerate optical parametric oscillator is identified with an Ising spin, with its amplitude and phase determined by the strength and sign of the Ising couplings between the relevant spins.
When pumped by a strong source, a degenerate optical parametric oscillator takes one of two phase states, corresponding to spin 1 or -1 in the Ising model. A network of N substantially identical optical parametric oscillators with mutual coupling is pumped with the same source to simulate an Ising spin system. After a transient period following the introduction of the pump, the network of optical parametric oscillators approaches a steady state close to thermal equilibrium.
The phase state selection process depends on the vacuum fluctuations and the mutual coupling of the optical parametric oscillators. In some embodiments the pump is pulsed at a constant amplitude, in other embodiments the pump output is gradually increased, and in yet further embodiments the pump is controlled in other ways.
In one embodiment of an optical device, the plurality of couplings of the Ising model are simulated by a plurality of configurable couplers used for coupling the optical fields between optical parametric oscillators. The configurable couplers may be configured to be off or configured to be on. Turning the couplers on and off may be performed gradually or abruptly. When configured to be on, a coupler may provide any phase or amplitude, depending on the coupling strengths of the Ising problem.
The output of each optical parametric oscillator is interfered with a phase reference and the result is captured at a photodetector. The outputs of the optical parametric oscillators represent a configuration of the Ising model. For example, a zero phase may represent a -1 spin state and a π phase may represent a 1 spin state in the Ising model.
For an Ising model with N spins, and according to one embodiment, the resonant cavity of the plurality of optical parametric oscillators is configured to have a round-trip time equal to N times the period of N pulses from a pump source. Round-trip time as used herein indicates the time for light to propagate along one pass of the described recursive path. The N pulses of a pulse train with a period P equal to 1/N of the round-trip time of the resonant cavity may propagate through the N optical parametric oscillators concurrently without interfering with each other.
In one embodiment, the couplings of the optical parametric oscillators are provided by a plurality of delay lines allocated along the resonant cavity.
The plurality of delay lines comprise a plurality of modulators that synchronously control the strengths and phases of the couplings, allowing the optical device to be programmed to simulate the Ising model.
In a network of N optical parametric oscillators, N-1 delay lines and corresponding modulators are enough to control the amplitude and phase of the coupling between every two optical parametric oscillators.
In one embodiment, a device capable of sampling from an Ising model can be manufactured as a network of optical parametric oscillators, as disclosed in U.S. patent application 20160162798.
In one embodiment, the network of optical parametric oscillators and the couplings between them can be implemented using commercially available mode-locked lasers and optical elements such as telecom fiber delay lines, modulators and other optical devices. Alternatively, the network of optical parametric oscillators and the couplings between them can be implemented using optical fiber technologies, such as fiber technologies developed for telecommunications applications. The couplings can be realized with fibers and controlled by optical Kerr shutters.
Q-learning
Methods for approximating the optimal value function $U^*$ and the optimal policy $\alpha^*$ are referred to as neuro-dynamic programming or Q-learning algorithms. The reference [Sallans, B., Hinton, G.E., Reinforcement Learning with Factored States and Actions, Journal of Machine Learning Research 5, 1063-1088, 2004] proposes a method for Q-learning using Boltzmann machines. In particular, general Boltzmann machines are used to approximate the joint distribution of states and actions in a stochastic control setting.
Features of the invention will be apparent from review of the disclosure, drawings and description of the invention below.
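For orientation, plain tabular Q-learning, which stores one value per state-action pair and does not use a Boltzmann machine, can be sketched as follows on a toy equipment replacement problem. The transition probabilities, rewards, learning rate and exploration rate are all hypothetical values chosen for illustration.

```python
import random

# Toy equipment replacement MDP; every number here is hypothetical.
STATES = ["failed", "poor", "good"]
ACTIONS = ["keep", "replace"]
P = {
    "keep": [[1.0, 0.0, 0.0],
             [0.3, 0.7, 0.0],
             [0.1, 0.3, 0.6]],
    "replace": [[0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0]],
}
R = {"keep": [-1.0, 2.0, 5.0], "replace": [-4.0, -4.0, -4.0]}


def step(s, a, rng):
    """Simulate one decision epoch: return (next state index, instantaneous reward)."""
    action = ACTIONS[a]
    s_next = rng.choices(range(len(STATES)), weights=P[action][s])[0]
    return s_next, R[action][s]


def q_learning(n_steps=300_000, gamma=0.9, lr=0.01, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in STATES]
    s = 2  # start with equipment in good condition
    for _ in range(n_steps):
        if rng.random() < epsilon:
            a = int(rng.random() * len(ACTIONS))                   # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda k: q[s][k])    # exploit
        s_next, r = step(s, a, rng)
        # Temporal-difference update towards r + gamma * max_a' Q(s', a').
        q[s][a] += lr * (r + gamma * max(q[s_next]) - q[s][a])
        s = s_next
    return q


if __name__ == "__main__":
    q = q_learning()
    for name, row in zip(STATES, q):
        best = ACTIONS[max(range(len(ACTIONS)), key=lambda k: row[k])]
        print(name, [round(v, 2) for v in row], "->", best)
```

The table `q` has one entry per state-action pair, which is the storage requirement that motivates replacing the table with a parameterized approximation such as a Boltzmann machine.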
Summary of the invention
According to extensive aspect, a kind of method for improving the strategy of Stochastic Control Problem is disclosed, STOCHASTIC CONTROL is asked
Topic be characterized in that set of actions, state set, as state and the incentive structure and multiple decisions of the function of movement when
epochs, wherein the evolution of an underlying stochastic state process depends on a plurality of actions in a policy, the method comprising: using a sampling device coupled to a digital computer and to a sampling device control system, the sampling device obtaining data representative of sample configurations of a Boltzmann machine, the Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases (each bias corresponding to a node of the plurality of nodes), a plurality of coupling weights (each coupling weight corresponding to a coupler of the plurality of couplers), and a transverse field strength; obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, the reward structure of the stochastic control problem and an initial policy for the stochastic control problem, the policy comprising a choice of at least one action for each state; assigning to the sampling device, using the digital computer and the sampling device control system, data representative of an initial weight of each coupler, an initial bias of each node, and the transverse field strength of the Boltzmann machine; until a stopping criterion is met: generating a current-epoch state-action pair using the digital computer; amending, using the digital computer and the sampling device control system, data representative of none or at least one coupler and at least one bias using the generated current-epoch state-action pair; performing a sampling corresponding to the current-epoch state-action pair to obtain first sampling empirical means; obtaining, using the digital computer and the first sampling empirical means, an approximation of the value of a Q-function at the current-epoch state-action pair, the value of the Q-function being representative of the utility of the current-epoch state-action pair; obtaining, using the digital computer, a future-epoch state-action pair, wherein the state is obtained through the stochastic state process and wherein obtaining the action comprises performing a stochastic optimization test on all state-action pairs comprising the future-epoch state and any possible action, thereby providing the action at the future epoch and updating the policy for the future-epoch state; amending, using the digital computer and the sampling device control system, data representative of none or at least one coupler and at least one bias using the generated future-epoch state-action pair; performing a sampling corresponding to the future-epoch state-action pair to obtain second sampling empirical means; obtaining, using the digital computer and the second sampling empirical means, an approximation of the value of the Q-function at the future-epoch state-action pair, the value of the Q-function being representative of the utility of the future-epoch state-action pair; and updating, using the digital computer, each weight of each coupler and each bias of each node of the Boltzmann machine using the generated approximations of the value of the Q-function, the first sampling empirical means at the current-epoch state-action pair, and the corresponding reward at the current-epoch state-action pair obtained using the reward structure; and providing the policy using the digital computer when the stopping criterion is met.
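By way of illustration only, the loop described above can be sketched in Python on a purely classical machine. The sketch rests on several assumptions that are not part of the disclosure: the Boltzmann machine is restricted (no couplers among hidden nodes) with zero transverse field, so the hidden-node means are available in closed form and stand in for the sampled empirical means; the Q-function is approximated by the negative free energy of the clamped machine; states and actions use a one-hot encoding; and the names `encode`, `hidden_means`, `q_value`, `improve_policy`, `step` and `reward` are hypothetical.

```python
import numpy as np

def encode(s, a, n_states, n_actions):
    """One-hot visible vector for a state-action pair (an assumed encoding)."""
    v = np.zeros(n_states + n_actions)
    v[s] = 1.0
    v[n_states + a] = 1.0
    return v

def hidden_means(v, W, b):
    """Exact hidden-node means of the clamped machine; a sampling device
    would return empirical means estimated from samples instead."""
    return 1.0 / (1.0 + np.exp(-(b + v @ W)))

def q_value(v, W, b):
    """Negative free energy of the clamped machine, used as the Q approximation."""
    return float(np.sum(np.logaddexp(0.0, b + v @ W)))

def improve_policy(step, reward, n_states, n_actions, n_hidden=4,
                   max_steps=2000, lr=0.05, gamma=0.9, temperature=0.5, seed=0):
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((n_states + n_actions, n_hidden))  # initial weights
    b = np.zeros(n_hidden)                                            # initial biases
    policy = np.zeros(n_states, dtype=int)                            # initial policy
    s = int(rng.integers(n_states))
    a = int(policy[s])
    for _ in range(max_steps):                  # stopping criterion: step budget
        v = encode(s, a, n_states, n_actions)   # clamp the current-epoch pair
        h = hidden_means(v, W, b)               # stands in for first empirical means
        q = q_value(v, W, b)
        s_next = step(s, a, rng)                # stochastic state process
        qs = np.array([q_value(encode(s_next, k, n_states, n_actions), W, b)
                       for k in range(n_actions)])
        p = np.exp((qs - qs.max()) / temperature)
        p /= p.sum()
        a_next = int(rng.choice(n_actions, p=p))    # stochastic optimization test
        policy[s_next] = a_next                     # update policy at future state
        td = reward(s, a) + gamma * qs[a_next] - q  # temporal-difference error
        W += lr * td * np.outer(v, h)               # weight update
        b += lr * td * h                            # bias update
        s, a = s_next, a_next
    return policy, W, b
```

Here `step(s, a, rng)` is expected to return the next state and `reward(s, a)` the reward of the pair. Because the negative free energy of this toy machine is never negative, the sketch is only meaningful for non-negative rewards.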
According to one embodiment, the sampling device comprises a quantum processor and the sampling device control system comprises a quantum device control system; the quantum processor is coupled to the digital computer and to the quantum device control system, and the quantum processor comprises a plurality of qubits and a plurality of couplers, each coupler providing a communicative coupling at a crossing between two qubits.
According to one embodiment, the sampling device comprises an optical device configured to receive energy from an optical energy source and to generate a plurality of optical parametric oscillators, and a plurality of coupling devices, each of which controllably couples optical parametric oscillators of the plurality of optical parametric oscillators.
According to one embodiment, the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine; the implemented Boltzmann machine is a classical Boltzmann machine characterized by a zero transverse field strength; the memory unit comprises an application for obtaining data representative of each weight of each coupler and each bias of each node of the classical Boltzmann machine, and the application is adapted for performing simulated quantum annealing of the classical Boltzmann machine.
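As a rough illustration of a software sampling device for the classical (zero transverse field) case, the following Python sketch draws sample configurations with single-spin Gibbs sampling. This is a deliberately simpler technique swapped in for simulated quantum annealing, which the sketch does not implement. The ±1 spin convention, the energy E = -Σ J_ij s_i s_j - Σ h_i s_i, and the name `gibbs_samples` are assumptions made for the example.

```python
import numpy as np

def gibbs_samples(weights, biases, n_samples, beta=1.0, sweeps=20, seed=0):
    """Draw spin configurations (entries -1 or +1) of a classical Boltzmann machine."""
    rng = np.random.default_rng(seed)
    upper = np.triu(np.asarray(weights, dtype=float), k=1)
    J = upper + upper.T                      # symmetric couplings, zero diagonal
    h = np.asarray(biases, dtype=float)
    n = len(h)
    out = np.empty((n_samples, n), dtype=int)
    for k in range(n_samples):
        s = rng.choice([-1, 1], size=n)      # random initial configuration
        for _ in range(sweeps):
            for i in range(n):
                field = J[i] @ s + h[i]      # local field acting on spin i
                p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
                s[i] = 1 if rng.random() < p_up else -1
        out[k] = s                           # keep the final configuration
    return out
```

Each returned row is one sample configuration; independent restarts are used instead of a single long chain to keep the sketch short.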
According to one embodiment, the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine; the implemented Boltzmann machine is a quantum Boltzmann machine characterized by a non-zero transverse field strength; the memory unit comprises an application for obtaining data representative of each weight of each coupler and each bias of each node of the quantum Boltzmann machine, and the application is adapted for performing simulated quantum annealing of the quantum Boltzmann machine.
According to one embodiment, performing the simulated quantum annealing of the quantum Boltzmann machine provides a plurality of sample configurations of an effective Hamiltonian representative of the quantum Boltzmann machine.
According to one embodiment, the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine; the implemented Boltzmann machine is a classical Boltzmann machine characterized by a zero transverse field strength; the memory unit comprises an application for obtaining data representative of each weight of each coupler and each bias of each node of the classical Boltzmann machine, and the application is adapted for sampling a plurality of instances of a Fortuin-Kasteleyn random cluster representation corresponding to the classical Boltzmann machine, to provide an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation.
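To make the notion of counting clusters concrete, the sketch below opens bonds on a given spin configuration in the manner of a Swendsen-Wang step and counts the resulting clusters with a union-find structure. It is an assumption-laden illustration rather than the disclosed sampler: bonds are opened only where J_ij s_i s_j > 0, with probability 1 - exp(-2β|J_ij|), and the names `sample_bonds` and `count_clusters` are hypothetical.

```python
import numpy as np

def sample_bonds(spins, couplings, beta, rng):
    """Open each satisfied bond (J_ij * s_i * s_j > 0) with probability
    1 - exp(-2 * beta * |J_ij|); couplings maps (i, j) to J_ij."""
    bonds = []
    for (i, j), J in couplings.items():
        if J * spins[i] * spins[j] > 0:
            if rng.random() < 1.0 - np.exp(-2.0 * beta * abs(J)):
                bonds.append((i, j))
    return bonds

def count_clusters(n_nodes, bonds):
    """Number of connected components induced by the open bonds."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in bonds:
        parent[find(i)] = find(j)           # merge the two clusters
    return len({find(x) for x in range(n_nodes)})
```

Averaging `count_clusters` over many sampled bond configurations gives an empirical estimate of the number of clusters.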
According to one embodiment, the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine; the implemented Boltzmann machine is a quantum Boltzmann machine characterized by a non-zero transverse field strength; the memory unit comprises an application for obtaining data representative of each weight of each coupler and each bias of each node of the quantum Boltzmann machine, and the application is adapted for sampling a plurality of instances of a Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, to provide an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation.
According to one embodiment, obtaining the approximation of the value of the Q-function at the current epoch and at the future epoch comprises obtaining from the sampling device a plurality of sample configurations of the Boltzmann machine along a measurement axis, and calculating, using the digital computer, an empirical approximation of the free energy of the Boltzmann machine.
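One way to form such an empirical approximation, sketched here under stated assumptions, is to combine the mean sample energy with the entropy of the empirical distribution over observed configurations, F ≈ ⟨E⟩ - (1/β)·H. The Ising energy convention E = -Σ_{i<j} J_ij s_i s_j - Σ_i h_i s_i and the names `ising_energy` and `empirical_free_energy` are assumptions introduced for the example.

```python
from collections import Counter
import numpy as np

def ising_energy(config, weights, biases):
    """Energy of one spin configuration (entries -1 or +1)."""
    s = np.asarray(config, dtype=float)
    J = np.triu(np.asarray(weights, dtype=float), k=1)   # count each pair once
    h = np.asarray(biases, dtype=float)
    return float(-(s @ J @ s) - h @ s)

def empirical_free_energy(samples, weights, biases, beta=1.0):
    """Mean sample energy minus (1/beta) times the empirical entropy."""
    energies = [ising_energy(c, weights, biases) for c in samples]
    counts = Counter(tuple(int(x) for x in c) for c in samples)
    probs = np.array(list(counts.values()), dtype=float) / len(samples)
    entropy = -np.sum(probs * np.log(probs))
    return float(np.mean(energies) - entropy / beta)
```

The negative of the returned value would then serve as the Q-function approximation. The entropy term is a plug-in estimate and is biased when few samples are available.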
According to one embodiment, obtaining the approximation of the value of the Q-function at the current epoch and at the future epoch comprises obtaining from the sampling device a plurality of sample configurations of the Boltzmann machine along a measurement axis, constructing from the obtained sample configurations a plurality of sample configurations of an effective Hamiltonian representative of the quantum Boltzmann machine, and calculating, using the digital computer, an empirical approximation of the free energy of the quantum Boltzmann machine.
According to one embodiment, obtaining the approximation of the value of the Q-function at the current epoch and at the future epoch comprises obtaining from the sampling device a plurality of sample configurations of an effective Hamiltonian representative of the quantum Boltzmann machine, and calculating, using the digital computer, an empirical approximation of the free energy of the quantum Boltzmann machine.
According to one embodiment, obtaining the approximation of the value of the Q-function at the current epoch and at the future epoch comprises obtaining from the sampling device an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, and calculating, using the digital computer, an empirical approximation of the free energy of the quantum Boltzmann machine.
According to one embodiment, calculating the first and second empirical means corresponding to the nodes comprises obtaining from the sampling device a plurality of sample configurations, along a measurement axis, of one of the quantum Boltzmann machine and the classical Boltzmann machine, and calculating, using the digital computer, an approximation of the empirical means of the nodes.
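For instance, if the sample configurations are collected as rows of an array, the empirical means of the nodes reduce to column averages. The short Python sketch below shows this; the pairwise products are included only as an assumption about what a weight update might additionally require, and the name `empirical_means` is hypothetical.

```python
import numpy as np

def empirical_means(samples):
    """Per-node means and pairwise product means over sampled configurations."""
    s = np.asarray(samples, dtype=float)    # shape (n_samples, n_nodes)
    node_means = s.mean(axis=0)             # <s_i>
    pair_means = (s.T @ s) / s.shape[0]     # <s_i s_j>, an assumed extra statistic
    return node_means, pair_means
```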
According to one embodiment, calculating the first and second empirical means corresponding to the nodes comprises obtaining from the sampling device a plurality of sample configurations of an effective Hamiltonian of the Boltzmann machine, and calculating, using the digital computer, an approximation of the empirical means of the nodes.
According to one embodiment, performing the stochastic optimization test on the plurality of state-action pairs comprises: amending, using the digital computer and the sampling device control system, data representative of none or at least one coupler and at least one bias using each state-action pair corresponding to the future-epoch state; performing a sampling corresponding to each state-action pair corresponding to the future-epoch state to provide empirical means; obtaining, using the digital computer, an approximation of the value of the Q-function at each state-action pair corresponding to the future-epoch state; and updating, using the digital computer, the policy for the future-epoch state by sampling from the distribution corresponding to all the approximated values of the Q-function at the state-action pairs corresponding to the future-epoch state.
According to one embodiment, performing the stochastic optimization test on the plurality of all state-action pairs comprises obtaining a temperature parameter; obtaining the future-epoch state; and sampling from the Boltzmann distribution associated with the approximations of the value of the Q-function, with the state variable fixed at the future-epoch state and at the provided temperature.
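A minimal sketch of such a test, assuming the approximated Q-values for every action at the fixed future-epoch state are already available as a list, is a Boltzmann (softmax) draw at the given temperature. The function name `boltzmann_action` is hypothetical.

```python
import numpy as np

def boltzmann_action(q_values, temperature, rng):
    """Sample an action index with probability proportional to exp(Q / T)."""
    q = np.asarray(q_values, dtype=float)
    logits = (q - q.max()) / temperature    # shift by the maximum for stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(q), p=probs)), probs
```

A high temperature makes the draw close to uniform, while a temperature near zero makes it close to a greedy choice of the action with the largest approximated Q-value.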
According to one embodiment, the plurality of qubits of the quantum processor comprises a first group of qubits and a second group of qubits, and the plurality of couplers of the quantum processor comprises at least one coupler, each providing a communicative coupling at a crossing between a qubit of the first group of qubits and at least one qubit of the second group of qubits, and a plurality of couplers, each providing a communicative coupling at a crossing between a qubit of the second group of qubits and another qubit of the second group of qubits.
According to one embodiment, the first group of qubits represents the set of actions of the stochastic control problem.
According to one embodiment, amending the data representative of none or at least one coupler and at least one bias using the generated current-epoch state-action pair comprises switching OFF all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and a qubit of the second group of qubits, and amending at least one bias of the second group of qubits using the generated current-epoch state-action pair.
According to one embodiment, amending the data representative of none or at least one coupler and at least one bias using the generated future-epoch state-action pair comprises switching OFF all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and a qubit of the second group of qubits, and amending at least one bias of the second group of qubits using the generated future-epoch state-action pair.
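One plausible reading of this amendment, offered only as an assumption, is that clamping the first-group qubits to fixed values lets their couplings to the second group be folded into the second-group biases, after which the cross-group couplers can be set to zero. The Python sketch below shows that arithmetic; the function name `clamp_first_group` and the folding formula are illustrative and not taken from the disclosure.

```python
import numpy as np

def clamp_first_group(cross_weights, second_group_biases, clamped_values):
    """Fold clamped first-group values into second-group biases.

    cross_weights[i, j] couples first-group qubit i to second-group qubit j.
    Returns the switched-OFF cross couplers and the amended biases.
    """
    w = np.asarray(cross_weights, dtype=float)
    v = np.asarray(clamped_values, dtype=float)
    amended_biases = np.asarray(second_group_biases, dtype=float) + v @ w
    switched_off = np.zeros_like(w)          # all cross-group couplers OFF
    return switched_off, amended_biases
```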
According to one embodiment, performing the stochastic optimization test on all state-action pairs comprising the future-epoch state and any possible action comprises: switching ON all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and a qubit of the second group of qubits; amending at least one bias of the second group of qubits using the future-epoch state of the future-epoch state-action pair; performing a quantum sampling to obtain empirical means corresponding to the first group of qubits; and updating, using the digital computer, the policy for the future-epoch state by assigning to the future-epoch state an action according to the distribution of the obtained empirical means corresponding to the first group of qubits.
According to one embodiment, the stopping criterion comprises reaching a maximum number of training steps.
According to one embodiment, the stopping criterion comprises reaching a maximum runtime.
According to one embodiment, the stopping criterion comprises convergence of a function of the weights and biases of the couplers and local fields.
According to one embodiment, the stopping criterion comprises convergence of the policy to a fixed policy.
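The four stopping criteria listed above can be checked together in a small helper, sketched here in Python. The thresholds, the use of the largest parameter change as the convergence function, and the idea of counting steps during which the policy has not changed are assumptions chosen for illustration; the name `stop_reason` is hypothetical.

```python
def stop_reason(step, elapsed_seconds, max_param_change, steps_policy_unchanged,
                max_steps=10_000, max_seconds=3600.0, tol=1e-6, patience=500):
    """Return the name of the first stopping criterion met, or None."""
    if step >= max_steps:
        return "max_steps"                 # maximum number of training steps
    if elapsed_seconds >= max_seconds:
        return "max_runtime"               # maximum runtime
    if max_param_change < tol:
        return "parameters_converged"      # weights and biases stopped moving
    if steps_policy_unchanged >= patience:
        return "policy_converged"          # policy settled on a fixed policy
    return None
```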
According to one embodiment, providing the policy comprises at least one of displaying the policy to a user interacting with the digital computer, storing the policy in the digital computer, and transmitting the policy to another processing unit operatively connected to the digital computer.
According to one embodiment, the digital computer comprises a memory unit, and the initialization data is obtained from the memory unit of the digital computer.
According to one embodiment, the initialization data is obtained from one of a user interacting with the digital computer and a remote processing unit operatively connected to the digital computer.
According to a broad aspect, there is disclosed a digital computer comprising: a central processing unit; a display device; a communication port for operatively connecting the digital computer to a sampling device coupled to the digital computer and to a sampling device control system; and a memory unit comprising an application for improving a policy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision epochs, wherein the evolution of an underlying stochastic state process depends on a plurality of actions in a policy, the application comprising: instructions for using the sampling device coupled to the digital computer and to the sampling device control system, the sampling device obtaining data representative of sample configurations of a Boltzmann machine, the Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases (each bias corresponding to a node of the plurality of nodes), a plurality of coupling weights (each coupling weight corresponding to a coupler of the plurality of couplers), and a transverse field strength; instructions for obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, the reward structure of the stochastic control problem and an initial policy for the stochastic control problem, the policy comprising a choice of at least one action for each state; instructions for assigning to the sampling device, using the digital computer and the sampling device control system, data representative of an initial weight of each coupler, an initial bias of each node, and the transverse field strength of the Boltzmann machine; instructions for, until a stopping criterion is met: generating a current-epoch state-action pair using the digital computer; amending, using the digital computer and the sampling device control system, data representative of none or at least one coupler and at least one bias using the generated current-epoch state-action pair; performing a sampling corresponding to the current-epoch state-action pair to obtain first sampling empirical means; obtaining, using the digital computer and the first sampling empirical means, an approximation of the value of a Q-function at the current-epoch state-action pair, the value of the Q-function being representative of the utility of the current-epoch state-action pair; obtaining, using the digital computer, a future-epoch state-action pair, wherein the state is obtained through the stochastic state process and wherein obtaining the action comprises performing a stochastic optimization test on all state-action pairs comprising the future-epoch state and any possible action, thereby providing the action at the future epoch and updating the policy for the future-epoch state; amending, using the digital computer and the sampling device control system, data representative of none or at least one coupler and at least one bias using the generated future-epoch state-action pair; performing a sampling corresponding to the future-epoch state-action pair to obtain second sampling empirical means; obtaining, using the digital computer and the second sampling empirical means, an approximation of the value of the Q-function at the future-epoch state-action pair, the value of the Q-function being representative of the utility of the future-epoch state-action pair; and updating, using the digital computer, each weight of each coupler and each bias of each node of the Boltzmann machine using the generated approximations of the value of the Q-function, the first sampling empirical means at the current-epoch state-action pair, and the corresponding reward at the current-epoch state-action pair obtained using the reward structure; and instructions for providing the policy using the digital computer when the stopping criterion is met.
According to a broad aspect, there is disclosed a non-transitory computer-readable storage medium storing computer-executable instructions which, when executed, cause a digital computer to perform a method for improving a policy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision epochs, wherein the evolution of an underlying stochastic state process depends on a plurality of actions in a policy, the method comprising: using a sampling device coupled to the digital computer and to a sampling device control system, the sampling device obtaining data representative of sample configurations of a Boltzmann machine, the Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases (each bias corresponding to a node of the plurality of nodes), a plurality of coupling weights (each coupling weight corresponding to a coupler of the plurality of couplers), and a transverse field strength; obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, the reward structure of the stochastic control problem and an initial policy for the stochastic control problem, the policy comprising a choice of at least one action for each state; assigning to the sampling device, using the digital computer and the sampling device control system, data representative of an initial weight of each coupler, an initial bias of each node, and the transverse field strength of the Boltzmann machine; until a stopping criterion is met: generating a current-epoch state-action pair using the digital computer; amending, using the digital computer and the sampling device control system, data representative of none or at least one coupler and at least one bias using the generated current-epoch state-action pair; performing a sampling corresponding to the current-epoch state-action pair to obtain first sampling empirical means; obtaining, using the digital computer and the first sampling empirical means, an approximation of the value of a Q-function at the current-epoch state-action pair, the value of the Q-function being representative of the utility of the current-epoch state-action pair; obtaining, using the digital computer, a future-epoch state-action pair, wherein the state is obtained through the stochastic state process and wherein obtaining the action comprises performing a stochastic optimization test on all state-action pairs comprising the future-epoch state and any possible action, thereby providing the action at the future epoch and updating the policy for the future-epoch state; amending, using the digital computer and the sampling device control system, data representative of none or at least one coupler and at least one bias using the generated future-epoch state-action pair; performing a sampling corresponding to the future-epoch state-action pair to obtain second sampling empirical means; obtaining, using the digital computer and the second sampling empirical means, an approximation of the value of the Q-function at the future-epoch state-action pair, the value of the Q-function being representative of the utility of the future-epoch state-action pair; and updating, using the digital computer, each weight of each coupler and each bias of each node of the Boltzmann machine using the generated approximations of the value of the Q-function, the first sampling empirical means at the current-epoch state-action pair, and the corresponding reward at the current-epoch state-action pair obtained using the reward structure; and providing the policy using the digital computer when the stopping criterion is met.
An advantage of the method disclosed herein is that it overcomes the dimensionality limitation of value-iteration methods for solving Markov decision problems.
Another advantage of the method disclosed herein is that it overcomes the memory storage problem of common Q-learning methods for solving Markov decision problems.
Another advantage of the method disclosed herein is that quantum sampling provides an efficient way of finding the empirical means of the qubits of the system, and thereby an efficient way of training the neural network.
Another advantage of the method disclosed herein is that, in one embodiment, sampling from the Fortuin-Kasteleyn random cluster representation provides an efficient way of finding the empirical means of the qubits of the system, and thereby an efficient way of training the neural network.
Another advantage of the method disclosed herein is that it is not limited to a particular graph layout of the qubits of a quantum processor or an optical device.
Brief description of the drawings
In order that the invention may be readily understood, embodiments of the invention are illustrated by way of example in the accompanying drawings.
Fig. 1 is a diagram showing an embodiment of a system comprising a digital computer coupled to an analog computer.
Fig. 2 is a flowchart showing an embodiment of a method for improving a policy for a stochastic control problem.
Further details of the invention and its advantages will be apparent from the detailed description included below.
Detailed description
In the following description of the embodiments, references to the accompanying drawings are by way of illustration of an example by which the invention may be practiced.
Terms
Unless expressly specified otherwise, the term "invention" and the like mean "the one or more inventions disclosed in this application".
Unless expressly specified otherwise, the terms "an aspect", "an embodiment", "embodiment", "one or more embodiments", "some embodiments", "certain embodiments", "one embodiment", "another embodiment" and the like mean "one or more (but not all) embodiments of the disclosed invention(s)".
Unless expressly specified otherwise, a reference to "another embodiment" or "another aspect" in describing an embodiment does not imply that the referenced embodiment is mutually exclusive with another embodiment (e.g., an embodiment described before the referenced embodiment).
Unless expressly specified otherwise, the terms "including", "comprising" and variations thereof mean "including but not limited to".
Unless expressly specified otherwise, the terms "a", "an", "the" and "at least one" mean "one or more".
Unless expressly specified otherwise, the term "plurality" means "two or more".
Unless expressly specified otherwise, the term "herein" means "in the present application, including anything which may be incorporated by reference".
The term "whereby" is used herein only to precede a clause or other set of words that expresses only the intended result, objective or consequence of something that is previously and explicitly recited. Thus, when the term "whereby" is used in a claim, the clause or other words that the term "whereby" modifies do not establish specific further limitations of the claim or otherwise restrict the meaning or scope of the claim.
The term "e.g." and like terms mean "for example", and thus do not limit the terms or phrases they explain. For example, in the sentence "the computer sends data (e.g., instructions, a data structure) over the Internet", the term "e.g." explains that "instructions" are an example of "data" that the computer may send over the Internet, and also explains that "a data structure" is an example of "data" that the computer may send over the Internet. However, both "instructions" and "a data structure" are merely examples of "data", and other things besides "instructions" and "a data structure" can be "data".
The term "i.e." and like terms mean "that is", and thus limit the terms or phrases they explain.
In one embodiment, the term "analog computer" means a system comprising a quantum processor, a qubit control system, coupling devices and a readout system, all connected to one another through a communication bus.
In an alternative embodiment, "analog computer" means a system comprising an optical device, the optical device comprising a network of optical parametric oscillators; a control system for the optical parametric oscillators; one or more coupling devices comprising delay lines and modulators; and a readout system comprising one or more photodetectors.
Neither the title nor the abstract is to be taken as limiting in any way the scope of the disclosed invention(s). The title of the present application and the headings of sections provided herein are for convenience only, and are not to be taken as limiting the disclosure in any way.
Numerous embodiments are described in the present application, and are presented for illustrative purposes only. The described embodiments are not, and are not intended to be, limiting in any sense. The presently disclosed invention(s) are widely applicable to numerous embodiments, as is readily apparent from the disclosure. One of ordinary skill in the art will recognize that the disclosed invention(s) may be practiced with various modifications and alterations, such as structural and logical modifications. Although particular features of the disclosed invention(s) may be described with reference to one or more particular embodiments and/or drawings, it should be understood that such features are not limited to usage in the one or more particular embodiments or drawings with reference to which they are described, unless expressly specified otherwise.
It will be appreciated that the invention may be implemented in numerous ways. In this specification, these implementations, or any other form that the invention may take, may be referred to as systems or techniques. A component such as a processor or a memory described as being configured to perform a task includes either a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task.
With all this in mind, the present invention is directed to a method and system for improving a policy for a stochastic control problem.
As mentioned above, the stochastic control problem may be of various types. In one embodiment, the stochastic control problem is portfolio optimization in financial investment.
In an alternative embodiment, the stochastic control problem is an equipment replacement problem.
In an alternative embodiment, the stochastic control problem is a scheduling problem in queueing theory and operations research.
In an alternative embodiment, the stochastic control problem is a problem involving the generation of motion.
Referring now to Fig. 1, there is shown a diagram of an embodiment of a system that may be used to implement the method for improving a policy for a stochastic control problem.
It will be appreciated that a quantum processor is used in the embodiment disclosed in Fig. 1.
It will be appreciated that, alternatively, other sampling devices may be used, such as a simulator of a quantum or classical Ising model, or an optical device comprising a network of optical parametric oscillators.
More precisely, the system comprises a digital computer 8 coupled to an analog computer 10.
It will be appreciated that the digital computer 8 may be any type of digital computer.
In one embodiment, the digital computer 8 is selected from a group consisting of desktop computers, laptop computers, tablet computers, servers, smartphones, and the like. It will also be appreciated that, in the foregoing, the digital computer 8 may also be broadly referred to as a processor.
In the embodiment shown in Fig. 1, the digital computer 8 comprises a central processing unit 12 (also referred to as a microprocessor), a display device 14, input devices 16, communication ports 20, a data bus 18 and a memory unit 22.
The central processing unit 12 is used for processing computer instructions. The skilled addressee will appreciate that various embodiments of the central processing unit 12 may be provided.
In one embodiment, the central processing unit 12 comprises a Core i5 3210 CPU manufactured by Intel(TM) and running at 2.5 GHz.
The display device 14 is used for displaying data to a user. The skilled addressee will appreciate that various types of display device 14 may be used.
In one embodiment, the display device 14 is a standard liquid crystal display (LCD) monitor.
The input devices 16 are used for inputting data into the digital computer 8.
The communication ports 20 are used for sharing data with the digital computer 8.
The communication ports 20 may comprise, for instance, a universal serial bus (USB) port for connecting a keyboard and a mouse to the digital computer 8.
The communication ports 20 may further comprise a data network communication port, such as an IEEE 802.3 port, for enabling a connection of the digital computer 8 with the analog computer 10.
The skilled addressee will appreciate that various alternative embodiments of the communication ports 20 may be provided.
The memory unit 22 is used for storing computer-executable instructions.
The memory unit 22 may comprise a system memory, such as a high-speed random access memory (RAM) for storing system control programs (e.g., BIOS, operating system modules, applications, etc.), and a read-only memory (ROM).
It will be appreciated that, in one embodiment, the memory unit 22 comprises an operating system module.
It will be appreciated that the operating system module may be of various types.
In one embodiment, the operating system module is OS X Yosemite manufactured by Apple(TM).
The memory unit 22 further comprises an application for improving a policy for a stochastic control problem.
The memory unit 22 may further comprise an application for using the analog computer 10.
The memory unit 22 may further comprise quantum processor data, such as a corresponding weight for each coupler of the quantum processor 28 and a corresponding bias for each qubit of the quantum processor 28.
The analog computer 10 comprises a qubit control system 24, a readout control system 26, a quantum processor 28 and a coupling device control system 30.
The quantum processor 28 may be of various types. In one embodiment, the quantum processor comprises superconducting qubits.
The readout control system 26 is used for reading the qubits of the quantum processor 28. In fact, it will be appreciated that, in order for a quantum processor to be used in the method disclosed herein, a readout system that measures the qubits of the quantum system in their quantum mechanical states is required. Multiple measurements provide a sample of the states of the qubits. The results from the readings are fed to the digital computer 8. The biases of the qubits of the quantum processor 28 are controlled via the qubit control system 24. The couplers are controlled via the coupling device control system 30.
It will be appreciated that the readout control system 26 may be of various types. For instance, the readout control system 26 may comprise a plurality of dc-SQUID magnetometers, each inductively connected to a different qubit of the quantum processor 28. The readout control system 26 may provide voltage or current values. In one embodiment, the dc-SQUID magnetometer comprises a loop of superconducting material interrupted by at least one Josephson junction, as is well known in the art.
The coupling device control system 30 may comprise one or more coupling controllers for the coupling devices, also referred to as "couplers". Each coupling controller may be configured to tune the coupling weight of a corresponding coupling device from zero to a maximum value. It will be appreciated that the coupling devices may be tuned, for instance, to provide ferromagnetic or antiferromagnetic coupling between the qubits of the quantum processor 28. An example of such an analog computer is disclosed in U.S. Patent No. 8,421,053 and in U.S. Patent Application Publication No. 2015/0046681.
In the embodiment of Figure 1, the sampling device coupled to the digital computer is a quantum processor.
In an alternative embodiment, the sampling device is an optical device including a network of optical parametric oscillators.
In a third embodiment, the sampling device includes a central processing unit and a memory unit coupled to the central processing unit. The memory unit includes an application for obtaining data representing the transverse field strength and each weight and each bias of the respective couplers and nodes of a Boltzmann machine, where a zero transverse field strength corresponds to a classical Boltzmann machine and a nonzero transverse field strength corresponds to a quantum Boltzmann machine (QBM), and for performing a simulated quantum annealing method on the Boltzmann machine to provide a plurality of sample configurations of the Boltzmann machine along a measurement axis.
In a fourth embodiment, the sampling device includes a central processing unit and a memory unit coupled to the central processing unit. The memory unit includes an application for obtaining, from the digital computer, data representing the transverse field strength and each weight and each bias of the respective couplers and nodes of a Boltzmann machine, where the transverse field strength has a nonzero value corresponding to a quantum Boltzmann machine, and for performing a simulated quantum annealing method on the quantum Boltzmann machine to provide a plurality of sample configurations of an effective Hamiltonian representing the quantum Boltzmann machine.
In a fifth embodiment, the sampling device includes a central processing unit and a memory unit coupled to the central processing unit. The memory unit includes an application for obtaining, from the digital computer, data representing the transverse field strength and each weight and each bias of the respective couplers and nodes of a Boltzmann machine, where the transverse field strength has a nonzero value corresponding to a quantum Boltzmann machine, and for sampling a plurality of instances of a Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, to provide an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation.
Referring now to Figure 2, an embodiment of a method for improving a policy for a stochastic control problem is shown.
As described above, the stochastic control problem is characterized by a set of actions, a set of states, a discount factor, a reward structure that is a function of states and actions, and a plurality of decision periods, where the evolution of the underlying stochastic state process depends on a plurality of actions in the policy.
A sampling device is used. More precisely, a sampling device coupled to the digital computer and to a sampling device control system is used to obtain data. The obtained data represent sample configurations of a Boltzmann machine that includes a plurality of nodes, a plurality of couplers, a plurality of biases (each bias corresponding to a node of the plurality of nodes), a plurality of coupling weights (each coupling weight corresponding to a coupler of the plurality of couplers), and a transverse field strength.
According to processing step 52, initialization data are obtained. It will be appreciated that the initialization data may be obtained using the digital computer 8. It will also be appreciated that the initialization data include the set of actions, the set of states, the discount factor and the reward structure of the stochastic control problem, together with an initial policy for the stochastic control problem, the policy including a selection of at least one action for each state.
It will be appreciated that, in one embodiment, the initialization data may be stored in the memory unit 22 of the digital computer 8.
In an alternative embodiment, the initialization data may be provided by a user interacting with the digital computer 8.
In another alternative embodiment, the initialization data may be obtained from a remote processing unit operatively coupled with the digital computer 8.
Still referring to Figure 2 and according to processing step 54, data representing an initial weight and bias of each respective coupler and node of the Boltzmann machine, together with the transverse field strength, are assigned to the sampling device. In the embodiment in which the sampling device includes a quantum processor, the data representing the initial weights and biases are assigned to each coupler and each qubit of the quantum processor, respectively, and the value of the transverse field strength is assigned using the control system.
In the embodiment in which the sampling device includes a network of optical parametric oscillator pulses, the data representing the initial weights and biases are sent to the energy source and the modulator, respectively. In the embodiment in which the sampling device includes a simulated quantum annealing application, the data representing the initial weights and biases are passed to the application as parameters.
It will be appreciated that the quantum processor may be of various types.
In one embodiment, the quantum processor includes a first group of qubits and a second group of qubits. In this embodiment, the quantum processor includes a group of couplers. The group of couplers includes at least one coupler, each of which provides a communicative coupling at a crossing between a qubit of the first group of qubits and at least one qubit of the second group of qubits. The group of couplers further includes a plurality of couplers, each of which provides a communicative coupling at a crossing between a qubit of the second group of qubits and another qubit of the second group of qubits.
In this embodiment, the first group of qubits represents the set of actions of the stochastic control problem.
In another embodiment, the quantum processor is a D-Wave 2X system manufactured by D-Wave Systems, Ltd.
It will be appreciated that the initial weight and bias of each respective coupler and qubit of the quantum processor may be assigned using the digital computer 8 and the quantum device control system.
The device control system includes the qubit control system 24 and the coupling device control system 30.
It will be appreciated that the initial weights and biases may be stored in the memory unit 22 of the digital computer 8.
In an alternative embodiment, the initial weights and biases are provided by a user interacting with the digital computer 8.
In a further embodiment, the initial weights and biases are provided by a remote processing unit operatively coupled with the digital computer 8.
It will be appreciated that, in one embodiment, the initial weights and biases are generated randomly.
Setting up the sampling device
In one embodiment in which a quantum processor is used as the sampling device, it will be appreciated that the qubits of the quantum processor represent a plurality of nodes of a corresponding general Boltzmann machine (GBM).
In one embodiment in which the sampling device includes an optical device, the network of optical parametric oscillators represents the general Boltzmann machine.
The visible nodes of the general Boltzmann machine consist of two groups of nodes. The first group of nodes represents the states of the stochastic control problem. The second group of nodes represents the actions of the stochastic control problem. The hidden nodes of the general Boltzmann machine consist of all nodes that belong to neither the first group nor the second group.
In one embodiment in which a quantum processor is used as the sampling device, the quantum processor includes a plurality of qubits representing the hidden nodes of the general Boltzmann machine. In this embodiment, the quantum processor includes a plurality of qubits and a plurality of couplers, each coupler providing a communicative coupling at a crossing between two qubits.
In one embodiment in which an optical device is used as the sampling device, the optical parametric oscillators represent the hidden nodes of the general Boltzmann machine.
In another embodiment in which simulated quantum annealing is used as the sampling device, the simulated spins represent the hidden nodes of the general Boltzmann machine.
In another embodiment in which simulated quantum annealing is used as the sampling device, a first group of simulated spins represents the action nodes of the general Boltzmann machine, and a second group of simulated spins represents the hidden nodes of the general Boltzmann machine.
In another embodiment in which a quantum processor is used as the sampling device, a first group of qubits of the quantum processor represents the action nodes of the general Boltzmann machine, and a second group of qubits of the quantum processor represents the hidden nodes of the general Boltzmann machine. In this embodiment, the quantum processor includes a group of couplers. The group of couplers includes at least one coupler, each of which provides a communicative coupling at a crossing between a qubit of the first group of qubits and at least one qubit of the second group of qubits. The group of couplers further includes a plurality of couplers, each of which provides a communicative coupling at a crossing between a qubit of the second group of qubits and another qubit of the second group of qubits. In this embodiment, the first group of qubits represents the set of actions of the stochastic control problem, and the second group of qubits represents the set of hidden nodes of the general Boltzmann machine.
Each node of the general Boltzmann machine takes a value in {0, 1}, unless the node is used to represent the set of states or the set of actions of the stochastic control problem.
The nodes of the general Boltzmann machine that represent the set of states and the set of actions of the stochastic control problem may take values in {0, 1}, in a finite or infinite set of discrete values, or real values represented by floating-point numbers.
In one embodiment in which a quantum processor is used as the sampling device, an ON coupling between any two qubits is regarded as a weight between the two corresponding nodes of the general Boltzmann machine.
In the same embodiment, each ON coupling has a floating-point strength that approximates the corresponding weight. A nonzero weight between two nodes indicates that the nodes are connected.
Still in the same embodiment, each OFF coupling has an effectively zero strength and indicates that the two corresponding nodes of the general Boltzmann machine are disconnected.
Training
According to processing step 56, a current-period state-action pair is generated.
It will be appreciated that the current-period state-action pair includes a state and a corresponding action.
In one embodiment, the current-period state-action pair is generated randomly using the digital computer 8.
In an alternative embodiment, the current-period state-action pair is generated from the environment.
In another alternative embodiment, the current-period state-action pair is generated from the policy.
According to processing step 58, the generated current-period state-action pair is used to modify data representing none or at least one coupler and at least one bias. It will be appreciated that these data are modified using the digital computer 8.
Where the sampling device includes a quantum processor, if any qubit of the quantum processor represents an action node, this processing step includes switching OFF every coupling between any qubit representing an action node and any other qubit. The generated current-period state-action pair is then used to update the biases of the qubits corresponding to those hidden nodes of the general Boltzmann machine that are connected to visible nodes.
Where the sampling device includes a simulated quantum annealing application, if any spin of the simulated quantum annealing application represents an action node, this processing step includes setting to zero the weight between any spin representing an action node and any other spin. The generated current-period state-action pair is then used to update the biases of the spins corresponding to those hidden nodes of the general Boltzmann machine that are connected to visible nodes.
If the current-period state-action pair is represented by the vector v = (s, a) on the visible nodes, and state node i is connected to hidden node j by the weight w_ij, the bias on hidden node j is modified by adding w_ij s_i. Likewise, if action node k is connected to hidden node j by the weight w_kj, the bias on hidden node j is modified by adding w_kj a_k.
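The bias modification described above can be illustrated with a minimal Python sketch. The function name, the nested-list weight layout and the numeric values are assumptions made for illustration only; a weight of zero stands for a pair of nodes that is not connected.

```python
def clamp_hidden_biases(biases, w_state, w_action, s, a):
    """Return the hidden-node biases after fixing the visible nodes to v = (s, a).

    biases[j]      : bias of hidden node j before the update
    w_state[i][j]  : weight between state node i and hidden node j (0 if unconnected)
    w_action[k][j] : weight between action node k and hidden node j (0 if unconnected)
    """
    updated = list(biases)
    for j in range(len(updated)):
        # Add w_ij * s_i for every state node i connected to hidden node j.
        updated[j] += sum(w_state[i][j] * s[i] for i in range(len(s)))
        # Add w_kj * a_k for every action node k connected to hidden node j.
        updated[j] += sum(w_action[k][j] * a[k] for k in range(len(a)))
    return updated


# Hypothetical example: two state nodes, one action node, two hidden nodes.
biases = [0.1, -0.2]
w_state = [[0.5, 0.0], [0.0, 1.0]]
w_action = [[0.25, -0.5]]
new_biases = clamp_hidden_biases(biases, w_state, w_action, s=[1, 0], a=[1])
```

After this update the visible nodes no longer need to be represented on the sampling device, which is consistent with the action-node couplings being switched OFF or set to zero.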
Where the sampling device includes an optical device, the data representing the weights and biases, modified as described above, are sent to the energy source and the modulator, respectively.
In one embodiment in which the sampling device includes a quantum processor, it will be appreciated that the couplings and biases of the quantum processor are modified using the digital computer 8 and the quantum device control system, which includes the qubit control system 24 and the coupling device control system 30.
According to processing step 60, sampling is performed. It will be appreciated that, where the sampling device includes a quantum processor or an optical device, the sampling is quantum by the nature of these devices.
It will be appreciated that the sampling corresponding to the current-period state-action pair is performed to obtain first sampling empirical means.
Where the sampling device includes a quantum processor, the sampling corresponding to the current-period state-action pair is performed to obtain first quantum sampling empirical means corresponding to the qubits of the quantum processor.
Where the sampling device includes an optical device, the sampling corresponding to the current-period state-action pair is performed to obtain first sampling empirical means corresponding to the optical parametric oscillators of the optical device.
More precisely, the first sampling empirical means include three pluralities of values.
Where the sampling device includes a quantum processor, the first plurality of values is the average of the measured states of each qubit corresponding to a hidden node over the quantum sampling. Where the sampling device includes an optical device, the first plurality of values is the average of the spin values corresponding to the measured phases of the optical parametric oscillators. Where the sampling device includes a simulated quantum annealing application, the first plurality of values is the average of the spin values. Those skilled in the art will appreciate that, for hidden node j, this value may be denoted ⟨h_j⟩_v, where v = (s, a) is the vector representing the visible nodes corresponding to the current-period state-action pair.
Where the sampling device includes a quantum processor, the second plurality of values is the average of the product of the measured states of each pair of qubits corresponding to a pair of hidden nodes over the quantum sampling. Where the sampling device includes an optical device, the second plurality of values is the average of the product of the spin values corresponding to the measured phases of the optical parametric oscillators. Where the sampling device includes a simulated quantum annealing application, the second plurality of values is the average of the product of the spin values. Those skilled in the art will appreciate that, for a pair of hidden nodes j and k, this value may be denoted ⟨h_j h_k⟩_v.
Where the sampling device includes a quantum processor, the third plurality of values is the frequency of occurrence of each configuration of the qubits of the quantum processor, denoted P(h|v), where h is a binary vector representing the states of all qubits measured in each sample of the quantum sampling.
Where the sampling device includes a simulated quantum annealing application for a classical Boltzmann machine, the third plurality of values is the frequency of occurrence of each configuration of the spins, denoted P(h|v), where h is a binary vector representing the states of all spins measured in each sample.
Where the sampling device includes an optical device, the third plurality of values is the frequency of occurrence of each configuration of the spins corresponding to the phases of the optical parametric oscillators, denoted P(h|v), where h is a binary vector representing the spin values corresponding to the phase measurements of the optical parametric oscillators in each sample.
Where the sampling device includes a quantum processor that performs sampling from a quantum Hamiltonian representing a quantum Boltzmann machine, the third plurality of values is the frequency of occurrence of each sample configuration of the classical effective Hamiltonian representing the quantum Boltzmann machine, denoted P(c|v), where c is a binary vector representing the states of all effective spins.
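As an illustration of the three pluralities of values, the following Python sketch computes them from a list of measured hidden-node configurations. The function name and the representation of samples as tuples of 0/1 values are assumptions; the sampling device itself is not modelled.

```python
from collections import Counter
from itertools import combinations


def empirical_statistics(samples):
    """Compute the three pluralities of values from hidden-node samples.

    samples: list of equal-length tuples of 0/1 values, one tuple per sample.
    Returns (mean_h, mean_hh, freq):
      mean_h[j]       approximates <h_j>_v
      mean_hh[(j, k)] approximates <h_j h_k>_v for j < k
      freq[h]         is the relative frequency of configuration h
    """
    n_samples = len(samples)
    n_hidden = len(samples[0])
    mean_h = [sum(h[j] for h in samples) / n_samples for j in range(n_hidden)]
    mean_hh = {
        (j, k): sum(h[j] * h[k] for h in samples) / n_samples
        for j, k in combinations(range(n_hidden), 2)
    }
    counts = Counter(tuple(h) for h in samples)
    freq = {config: count / n_samples for config, count in counts.items()}
    return mean_h, mean_hh, freq


# Hypothetical measurements of three hidden nodes over four samples.
samples = [(1, 0, 1), (1, 1, 0), (1, 0, 1), (0, 0, 1)]
mean_h, mean_hh, freq = empirical_statistics(samples)
```

The same routine applies to effective-spin configurations c when the samples come from an effective Hamiltonian.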
In one embodiment, the quantum Hamiltonian representing the quantum Boltzmann machine is defined on n spins σ_1, ..., σ_n.
In a further embodiment, the classical effective Hamiltonian includes m replicas of the spins of the quantum Hamiltonian of the quantum Boltzmann machine.
The number m of replicas of the effective classical Hamiltonian corresponding to the quantum Boltzmann machine with a transverse field is provided.
In one embodiment, the number m of replicas of the effective classical Ising model is obtained using the digital computer 8, and more precisely using the memory 22 of the digital computer 8.
In an alternative embodiment, the number m of replicas of the effective classical Ising model is provided to the digital computer 8 by a remote processing unit operatively coupled with the digital computer 8.
Each spin σ_i is associated with m spins, denoted σ_i^k for k = 1, ..., m. For i = 1, ..., n and k = 1, ..., m, a bias is set on each spin σ_i^k. For 1 ≤ i ≠ j ≤ n, a coupling is set between every two spins σ_i^k and σ_j^k. For each k = 1, ..., m-1, a coupling is set between every two spins σ_i^k and σ_i^(k+1). The result is an effective Hamiltonian of one dimension higher.
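For orientation, the standard Suzuki-Trotter construction for a transverse-field Ising model is sketched below. This is an assumption offered as a reference form, not a transcription of the formulas of this passage: the symbols w_{ij} (coupling weights), b_i (biases), Γ (transverse field strength), β (inverse temperature) and J^{+} (coupling between adjacent replicas) are introduced here for illustration, and some formulations add a periodic term coupling replica m back to replica 1.

```latex
% Quantum Hamiltonian on n spins (assumed standard transverse-field Ising form)
H = -\sum_{i<j} w_{ij}\,\sigma^{z}_{i}\sigma^{z}_{j}
    - \sum_{i} b_{i}\,\sigma^{z}_{i}
    - \Gamma \sum_{i} \sigma^{x}_{i}

% Effective classical Hamiltonian on n x m spins (m replicas)
H^{\mathrm{eff}} = -\sum_{k=1}^{m}\Big(
      \sum_{i<j} \frac{w_{ij}}{m}\,\sigma_{i}^{k}\sigma_{j}^{k}
    + \sum_{i} \frac{b_{i}}{m}\,\sigma_{i}^{k}\Big)
    - J^{+}\sum_{i}\sum_{k=1}^{m-1} \sigma_{i}^{k}\sigma_{i}^{k+1},
\qquad
J^{+} = \frac{1}{2\beta}\,\ln\coth\!\Big(\frac{\beta\,\Gamma}{m}\Big)
```

In this form each replica carries the original biases and couplings scaled by 1/m, and the transverse field appears only through the coupling J^{+} between consecutive replicas of the same spin.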
Where the sampling device includes a quantum processor that performs sampling from the quantum Hamiltonian representing the quantum Boltzmann machine, the sample configurations of the classical effective Hamiltonian are constructed by attaching the measured values of the qubits to the effective spins, each measured configuration of the qubits corresponding to one replica in the effective Hamiltonian.
Where the sampling device includes a simulated quantum annealing application that performs sampling from the quantum Hamiltonian representing the quantum Boltzmann machine, the third plurality of values is the frequency of occurrence of each configuration of the effective spins of the effective Hamiltonian, denoted P(c|v), where c is a binary vector representing the states of all effective spins.
Still referring to Figure 2 and according to processing step 62, an approximation of the value of the Q-function is performed.
It will be appreciated that the approximation of the value of the Q-function at the current-period state-action pair is determined using the obtained first sampling empirical means.
It will be appreciated that, where the sampling device includes a quantum processor, the approximation of the value of the Q-function at the current-period state-action pair is determined using the obtained first quantum sampling empirical means.
It will be further appreciated that the approximation of the value of the Q-function is determined using the digital computer 8.
Those skilled in the art will appreciate that the value of the Q-function represents the utility of the current-period state-action pair.
According to processing step 64, a future-period state is obtained. It will be appreciated that the state is obtained through the stochastic state process.
In one embodiment, the future-period state is obtained through a stochastic test involving known Markov transition probabilities. In another embodiment, the future-period state is obtained by observation from the environment. In another embodiment, the future-period state is obtained from provided training data.
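The first option, a stochastic test with known Markov transition probabilities, might look like the following Python sketch. The dictionary layout of the transition table, the state and action labels and the function name are hypothetical.

```python
import random


def sample_next_state(transition, state, action, rng=random):
    """Draw a future-period state from known Markov transition probabilities.

    transition[(state, action)] maps each possible next state to its probability.
    """
    probs = transition[(state, action)]
    states = list(probs)
    weights = [probs[s] for s in states]
    # random.choices draws one element with probability proportional to its weight.
    return rng.choices(states, weights=weights, k=1)[0]


# Hypothetical two-state problem.
transition = {
    ("s0", "a0"): {"s0": 0.2, "s1": 0.8},
    ("s0", "a1"): {"s0": 1.0},
}
rng = random.Random(0)
deterministic_next = sample_next_state(transition, "s0", "a1", rng)
random_next = sample_next_state(transition, "s0", "a0", rng)
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which is convenient when comparing training runs.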
It will be appreciated that the future-period state is obtained using the digital computer 8.
In one embodiment, the future-period state is obtained using the digital computer 8, and more precisely using the memory 22 of the digital computer 8.
In an alternative embodiment, the future-period state is provided to the digital computer 8 by a remote processing unit operatively coupled with the digital computer 8.
According to processing step 66, a future-period action is obtained. Obtaining the action includes performing a stochastic optimization test on the plurality of all state-action pairs that include the future-period state and any possible action, thereby providing the action at the future period.
In one embodiment, performing the stochastic optimization test on the plurality of all state-action pairs includes obtaining a temperature parameter, obtaining the future-period state, and sampling a Boltzmann distribution associated with the approximation of the value of the Q-function, with the state variable fixed at the future-period state and at the provided temperature.
In one embodiment, the Boltzmann distribution is sampled over the action nodes. In this embodiment, for the current-period state s and each action a_i ∈ A, the corresponding value of the Q-function is approximated and denoted Q_i. An action a_i ∈ A is then sampled from the Boltzmann distribution over the values Q_i at the provided temperature. The action obtained is assumed to be the best action for the current-period state s.
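A minimal Python sketch of this sampling step follows. It assumes the common softmax form in which the probability of action a_i is proportional to exp(Q_i / T) for temperature T; the exact distribution is not legible in this passage, so the formula, the function names and the sample values are assumptions.

```python
import math
import random


def boltzmann_action_probabilities(q_values, temperature):
    """Return probabilities proportional to exp(Q_i / T) (assumed softmax form)."""
    q_max = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(actions, q_values, temperature, rng=random):
    """Sample one action from the Boltzmann distribution over its Q-value estimates."""
    probs = boltzmann_action_probabilities(q_values, temperature)
    return rng.choices(actions, weights=probs, k=1)[0]


# Hypothetical Q-value approximations for three actions in a fixed state.
actions = ["left", "right", "stay"]
q_values = [1.0, 2.0, 0.5]
probs = boltzmann_action_probabilities(q_values, temperature=0.5)
action = sample_action(actions, q_values, 0.5, random.Random(1))
```

A low temperature concentrates the probability on the action with the largest Q_i, while a high temperature makes the choice closer to uniform.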
In another embodiment in which the sampling device includes a quantum processor, where the first group of qubits represents the set of actions of the stochastic control problem and the second group of qubits represents the hidden nodes of the corresponding general Boltzmann machine, the update of the policy for the current-period state may be performed by quantum sampling. In one embodiment, performing the stochastic optimization test on the plurality of all state-action pairs that include the future-period state and any possible action includes: switching ON all couplers that provide a communicative coupling at a crossing between a qubit of the first group of qubits and a qubit of the second group of qubits; modifying at least one bias of the second group of qubits using the future-period state of the future-period state-action pair; performing quantum sampling to obtain empirical means corresponding to the first group of qubits; and updating, using the digital computer 8, the policy for the future-period state by assigning an action to the future-period state according to the distribution of the obtained empirical means corresponding to the first group of qubits.
Still referring to Figure 2 and according to processing step 68, the policy for the future-period state is updated using the action obtained in processing step 66.
According to processing step 70, the generated future-period state-action pair is used to modify data representing none or at least one coupler and at least one bias. It will be appreciated that these data are modified using the digital computer 8.
Where the sampling device includes a quantum processor, if any qubit of the quantum processor represents an action node, this processing step includes switching OFF every coupling between any qubit representing an action node and any other qubit. The generated future-period state-action pair is then used to update the biases of the qubits corresponding to those hidden nodes of the general Boltzmann machine that are connected to visible nodes.
Where the sampling device includes a simulated quantum annealing application, if any spin of the simulated quantum annealing application represents an action node, this processing step includes setting to zero the weight between any spin representing an action node and any other spin. The generated future-period state-action pair is then used to update the biases of the spins corresponding to those hidden nodes of the general Boltzmann machine that are connected to visible nodes.
If the future-period state-action pair is represented by the vector v = (s, a) on the visible nodes, and state node i is connected to hidden node j by the weight w_ij, the bias on hidden node j is modified by adding w_ij s_i. Likewise, if action node k is connected to hidden node j by the weight w_kj, the bias on hidden node j is modified by adding w_kj a_k.
Where the sampling device includes an optical device, the data representing the coupling weights and biases, modified as described above, are sent to the energy source and the modulator, respectively.
In one embodiment in which the sampling device includes a quantum processor, it will be appreciated that the couplings and biases of the quantum processor are modified using the digital computer 8 and the quantum device control system, which includes the qubit control system 24 and the coupling device control system 30.
According to processing step 72, sampling is performed. Where the sampling device includes a quantum processor or an optical device, it will be appreciated that the sampling is quantum by the nature of these devices. It will be appreciated that the sampling corresponding to the future-period state-action pair is performed to obtain second sampling empirical means.
Where the sampling device includes a quantum processor, the sampling corresponding to the future-period state-action pair is performed to obtain second quantum sampling empirical means corresponding to the qubits of the quantum processor.
Where the sampling device includes an optical device, the sampling corresponding to the future-period state-action pair is performed to obtain second sampling empirical means corresponding to the optical parametric oscillators of the optical device.
More precisely, the second sampling empirical means include three pluralities of values.
Where the sampling device includes a quantum processor, the first plurality of values is the average of the measured states of each qubit corresponding to a hidden node over the quantum sampling. Where the sampling device includes an optical device, the first plurality of values is the average of the spin values corresponding to the measured phases of the optical parametric oscillators. Where the sampling device includes a simulated quantum annealing application, the first plurality of values is the average of the spin values. Those skilled in the art will appreciate that, for hidden node j, this value may be denoted ⟨h_j⟩_v, where v = (s, a) is the vector representing the visible nodes corresponding to the current-period state-action pair.
Where the sampling device includes a quantum processor, the second plurality of values is the average of the product of the measured states of each pair of qubits corresponding to a pair of hidden nodes over the quantum sampling. Where the sampling device includes an optical device, the second plurality of values is the average of the product of the spin values corresponding to the measured phases of the optical parametric oscillators. Where the sampling device includes a simulated quantum annealing application, the second plurality of values is the average of the product of the spin values. Those skilled in the art will appreciate that, for a pair of hidden nodes j and k, this value may be denoted ⟨h_j h_k⟩_v.
Where the sampling device includes a quantum processor, the third plurality of values is the frequency of occurrence of each configuration of the qubits of the quantum processor, denoted P(h|v), where h is a binary vector representing the states of all qubits measured in each sample of the quantum sampling.
Where the sampling device includes a simulated quantum annealing application for a classical Boltzmann machine, the third plurality of values is the frequency of occurrence of each configuration of the spins, denoted P(h|v), where h is a binary vector representing the states of all spins measured in each sample of the sampling.
Where the sampling device includes an optical device, the third plurality of values is the frequency of occurrence of each configuration of the spins corresponding to the phases of the optical parametric oscillators, denoted P(h|v), where h is a binary vector representing the spin values corresponding to the phase measurements of the optical parametric oscillators in each sample of the sampling.
Where the sampling device includes a quantum processor in the embodiment with a quantum Boltzmann machine, the third plurality of values is the frequency of occurrence of each sample configuration of the classical effective Hamiltonian representing the quantum Boltzmann machine, denoted P(c|v), where c is a binary vector representing the states of all effective spins.
It will be further appreciated that, where the sampling device includes a quantum processor in the embodiment with a quantum Boltzmann machine, the sample configurations of the effective Hamiltonian are constructed by attaching the measured values of the qubits to the effective spins, each measured configuration of the qubits corresponding to one replica in the effective Hamiltonian.
Where the sampling device includes a simulated quantum annealing application for a quantum Boltzmann machine, the third plurality of values is the frequency of occurrence of each configuration of the effective spins of the effective Hamiltonian.
Still referring to Figure 2 and according to processing step 74, a new approximation of the value of the Q-function is determined. It will be appreciated that the new approximation of the value of the Q-function at the future-period state-action pair is determined using the obtained second sampling empirical means. It will be appreciated that the Q-function represents the utility of the future-period state-action pair. Where the sampling device includes a quantum processor, the approximation of the value of the Q-function at the future-period state-action pair is determined using the obtained second quantum sampling empirical means corresponding to the qubits of the quantum processor.
It will be appreciated that the approximation of the value of the Q-function is performed using the digital computer 8.
In one embodiment, the approximation of the value of the Q-function is performed using a remote processing unit operatively connected to the digital computer 8.
It will be appreciated that, in one embodiment and where the sampling device comprises a quantum processor, obtaining the approximation of the value of the Q function at the current period and the future period comprises: obtaining from the sampling device a plurality of sample configurations of the Boltzmann machine along a measurement axis; constructing, from the obtained sample configurations, a plurality of sample configurations of the effective Hamiltonian of the above-mentioned quantum Boltzmann machine; and calculating, using digital computer 8, an empirical approximation of the negative free energy of the quantum Boltzmann machine (the formula is omitted in the source text).
It will be appreciated that, in one embodiment and where the sampling device comprises simulated quantum annealing for a quantum Boltzmann machine, obtaining the approximation of the value of the Q function at the current period and the future period comprises: obtaining from the sampling device a plurality of sample configurations representing the effective Hamiltonian of the above-mentioned quantum Boltzmann machine; and calculating, using the digital computer, an empirical approximation of the negative free energy of the quantum Boltzmann machine (the formula is omitted in the source text).
It will be appreciated that, in another embodiment and where the sampling device comprises a quantum processor, an optical device or simulated quantum annealing, obtaining the approximation of the value of the Q function at the current period and the future period comprises: obtaining from the sampling device a plurality of sample configurations of a classical Boltzmann machine along a measurement axis; and calculating, using digital computer 8, an empirical approximation of the negative free energy of the classical Boltzmann machine (the formula is omitted in the source text).
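By way of illustration only, an empirical approximation of the negative free energy of a classical Boltzmann machine can be sketched in Python as follows. The sketch assumes the standard thermodynamic identity F = <E> - S/β, with the mean energy <E> and the entropy S both estimated from the frequencies of occurrence of the sampled configurations. The energy sign convention, the function names `energy` and `negative_free_energy`, the inverse temperature `beta` and the dictionary layout of the weights are assumptions made for this example and are not taken from the present description.

```python
import math
from collections import Counter


def energy(config, weights, biases):
    """Energy of one spin configuration (assumed sign convention).

    E(s) = -sum_{(i,j)} w_ij * s_i * s_j - sum_i b_i * s_i
    """
    coupling = sum(w * config[i] * config[j] for (i, j), w in weights.items())
    field = sum(b * s for b, s in zip(biases, config))
    return -(coupling + field)


def negative_free_energy(samples, weights, biases, beta=1.0):
    """Empirical estimate of -F = -<E> + S / beta from sampled configurations."""
    if not samples:
        raise ValueError("at least one sample configuration is required")
    # Frequency of occurrence of each distinct configuration.
    counts = Counter(tuple(s) for s in samples)
    total = len(samples)
    mean_energy = sum(n * energy(c, weights, biases) for c, n in counts.items()) / total
    # Shannon entropy of the empirical distribution over configurations.
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return -mean_energy + entropy / beta
```

In such a sketch the returned value would play the role of the approximate Q function value for the state-action pair whose sampling produced `samples`. The entropy estimate is only meaningful when the number of samples is large relative to the number of distinct configurations observed.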
It will be appreciated that, in another embodiment, obtaining the approximation of the value of the Q function at the current period and the future period comprises: obtaining from the sampling device an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation corresponding to the Boltzmann machine; and calculating, using digital computer 8 and the approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation, an empirical approximation of the negative free energy of the Boltzmann machine (the formula for the negative free energy is omitted in the source text).
Here the constant ρ depends on the weights and biases of the Boltzmann machine in the case of a classical Boltzmann machine, and on the weights, the biases and the transverse field strength of the Boltzmann machine in the case of a quantum Boltzmann machine. The index #c denotes the number of free clusters in the Fortuin-Kasteleyn random cluster representation.
Still referring to Figure 2 and according to processing step 76, each weight and each bias of, respectively, each coupler and each node of the Boltzmann machine is updated using the approximation of the value of the Q function generated at the current period state-action pair, the first sampling empirical mean, and the corresponding reward at the current period state-action pair obtained using the reward structure. Where the sampling device is a quantum processor, each weight and each bias of the quantum processor is updated.
More precisely, each weight and each bias of, respectively, each coupler and each qubit of the quantum processor is updated using the approximation of the value of the Q function generated at the current period state-action pair, the first empirical mean, and the corresponding reward at the current period state-action pair obtained using the reward structure.
If r denotes the value of the reward for the current period state-action pair, the weight connecting visible node i to hidden node k is updated by
$$\Delta w_{ik} = \epsilon_n \left(r + \gamma Q_2 - Q_1\right) v_i \langle h_k \rangle_v .$$
The weight connecting hidden node k to hidden node j is updated by
$$\Delta u_{kj} = \epsilon_n \left(r + \gamma Q_2 - Q_1\right) \langle h_k h_j \rangle_v .$$
The bias on hidden node k is updated by
$$\Delta b_k = \epsilon_n \left(r + \gamma Q_2 - Q_1\right) \langle h_k \rangle_v .$$
Here $Q_1$ is the approximation of the Q function at the current period state-action pair, and $Q_2$ is the approximation of the Q function at the future period state-action pair.
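The three update formulas above can be sketched in Python as follows. This is a minimal illustration only: it assumes that ε_n acts as a step size and γ as a discount factor, as is conventional in Q-learning, and the function name `td_updates`, its parameter names and the data layout (lists for node values, a dictionary keyed by hidden node pairs) are hypothetical choices made for this example.

```python
def td_updates(reward, q_current, q_future, visible, hidden_mean, hidden_pair_mean,
               step_size=0.01, discount=0.9):
    """Return (dw, du, db) following the three update formulas.

    visible          -- values v_i of the visible nodes for the current pair
    hidden_mean      -- empirical means <h_k>_v of the hidden nodes
    hidden_pair_mean -- dict {(k, j): <h_k h_j>_v} for coupled hidden nodes
    """
    # Common factor eps_n * (r + gamma * Q2 - Q1) shared by all three updates.
    td_error = reward + discount * q_future - q_current
    scale = step_size * td_error
    # dw[i][k]: update of the weight between visible node i and hidden node k.
    dw = [[scale * v_i * h_k for h_k in hidden_mean] for v_i in visible]
    # du[(k, j)]: update of the weight between hidden nodes k and j.
    du = {pair: scale * m for pair, m in hidden_pair_mean.items()}
    # db[k]: update of the bias on hidden node k.
    db = [scale * h_k for h_k in hidden_mean]
    return dw, du, db
```

In this sketch the empirical means passed in are those obtained from the sampling for the current period state-action pair, and the returned increments would be added to the stored weights and biases before they are reassigned to the sampling device.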
According to the same processing step, the bias on any qubit of the quantum processor is updated by the update amount of the hidden node that it represents.
According to the same processing step, the weight of any coupler of the quantum processor is updated by the update amount of the weight $u_{kj}$ or $w_{ik}$ that it represents.
In one embodiment, each weight and each bias of the quantum processor is updated using digital computer 8.
Still referring to Figure 2 and according to processing step 78, a test is performed to determine whether a stopping criterion is met. Those skilled in the art will appreciate that the stopping criterion may be of various types.
It will be appreciated that, in one embodiment, the stopping criterion may comprise reaching a maximum number of training steps.
It will be appreciated that, in an alternative embodiment, the stopping criterion may comprise reaching a maximum runtime.
It will be appreciated that, in an alternative embodiment, the stopping criterion may comprise convergence of a function of the weights and biases of the couplings and local fields.
It will be appreciated that, in an alternative embodiment, the stopping criterion may comprise convergence of the strategy to a fixed strategy.
In an alternative embodiment, the test comprises at least one stopping criterion.
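A test combining the stopping criteria listed above can be sketched in Python as follows. The sketch is illustrative only: the function name `stopping_criterion_met`, the default thresholds, the scalar `parameter_change` (standing in for whatever function of the weights and biases is monitored for convergence) and the flag `strategy_changed` are all hypothetical and are not specified by the present description.

```python
import time


def stopping_criterion_met(step, started_at, parameter_change, strategy_changed,
                           max_steps=10_000, max_seconds=3600.0, tolerance=1e-6):
    """Return True if at least one of the four stopping criteria holds."""
    # Maximum number of training steps reached.
    reached_max_steps = step >= max_steps
    # Maximum runtime reached (started_at comes from time.monotonic()).
    reached_max_runtime = time.monotonic() - started_at >= max_seconds
    # The monitored function of weights and biases has stopped changing.
    parameters_converged = parameter_change < tolerance
    # The strategy did not change during the last iteration.
    strategy_fixed = not strategy_changed
    return (reached_max_steps or reached_max_runtime
            or parameters_converged or strategy_fixed)
```

Treating a single unchanged iteration as convergence of the strategy is a simplification; a practical test would more likely require the strategy to remain unchanged over several consecutive iterations.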
Where the at least one stopping criterion is not met, and according to processing step 56, a current period state-action pair is generated from the provided training data or from the environment.
Where the at least one stopping criterion is met, the strategy is provided according to processing step 80.
It will be appreciated that the strategy may be provided according to various embodiments. Indeed, it will be appreciated that the best-known strategy is provided using digital computer 8.
In one embodiment, the strategy is stored in the digital computer, more specifically in memory 22 of digital computer 8.
In an alternative embodiment, the strategy is displayed via display device 14 to a user interacting with digital computer 8.
In another alternative embodiment, the strategy is transmitted to a remote processing unit operatively coupled to digital computer 8.
It will be appreciated that a non-transitory computer-readable storage medium for storing computer-executable instructions is further disclosed, the computer-executable instructions, when executed, causing a digital computer to perform a method for improving a strategy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision periods, wherein the evolution of an underlying stochastic state process depends on a plurality of actions in the strategy, the method comprising: using a sampling device coupled to the digital computer and to a sampling device control system, the sampling device obtaining data representing sample configurations of a Boltzmann machine, the Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases (each bias corresponding to a node of the plurality of nodes), a plurality of coupling weights (each coupling weight corresponding to a coupler of the plurality of couplers), and a transverse field strength; obtaining, using the digital computer, the set of actions, the set of states and the reward structure of the stochastic control problem and an initial strategy for the stochastic control problem, the strategy comprising a selection of at least one action for each state; assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and bias of, respectively, each coupler and each node of the Boltzmann machine, and the transverse field strength; until a stopping criterion is met: generating a current period state-action pair using the digital computer; modifying, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated current period state-action pair; performing a sampling corresponding to the current period state-action pair to obtain a first sampling empirical mean; obtaining, using the digital computer and the first sampling empirical mean, an approximation of the value of the Q function at the current period state-action pair, the value of the Q function representing the utility of the current period state-action pair; obtaining a future period state-action pair using the digital computer, wherein the state is obtained through the stochastic state process, and further wherein obtaining the action comprises performing a stochastic optimization test on the plurality of all state-action pairs comprising the future period state and any possible action, thereby providing the action at the future period and updating the strategy for the future period state; modifying, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated future period state-action pair; performing a sampling corresponding to the future period state-action pair to obtain a second sampling empirical mean; obtaining, using the digital computer and the second sampling empirical mean, an approximation of the value of the Q function at the future period state-action pair, the value of the Q function representing the utility of the future period state-action pair; updating, using the digital computer, each weight and each bias of, respectively, each coupler and each node of the Boltzmann machine, using the approximation of the value of the Q function generated at the current period state-action pair, the first sampling empirical mean, and the corresponding reward at the current period state-action pair obtained using the reward structure; and providing the strategy using the digital computer when the stopping criterion is met.
It will be appreciated that, in one embodiment, the application for improving the strategy for the stochastic control problem contained in memory unit 22 comprises instructions for using a sampling device coupled to the digital computer and to a sampling device control system, the sampling device obtaining data representing sample configurations of a Boltzmann machine, the Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases (each bias corresponding to a node of the plurality of nodes), a plurality of coupling weights (each coupling weight corresponding to a coupler of the plurality of couplers), and a transverse field strength. The application further comprises instructions for obtaining, using the digital computer, initialization data comprising the set of actions, the set of states and the reward structure of the stochastic control problem and an initial strategy for the stochastic control problem, the strategy comprising a selection of at least one action for each state. The application further comprises instructions for assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and bias of, respectively, each coupler and each node of the Boltzmann machine, and the transverse field strength. The application further comprises instructions for, until a stopping criterion is met: generating a current period state-action pair using the digital computer; modifying, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated current period state-action pair; performing a sampling corresponding to the current period state-action pair to obtain a first sampling empirical mean; obtaining, using the digital computer and the first sampling empirical mean, an approximation of the value of the Q function at the current period state-action pair, the value of the Q function representing the utility of the current period state-action pair; obtaining a future period state-action pair using the digital computer, wherein the state is obtained through the stochastic state process, and further wherein obtaining the action comprises performing a stochastic optimization test on the plurality of all state-action pairs comprising the future period state and any possible action, thereby providing the action at the future period and updating the strategy for the future period state; modifying, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated future period state-action pair; performing a sampling corresponding to the future period state-action pair to obtain a second sampling empirical mean; obtaining, using the digital computer and the second sampling empirical mean, an approximation of the value of the Q function at the future period state-action pair, the value of the Q function representing the utility of the future period state-action pair; and updating, using the digital computer, each weight and each bias of, respectively, each coupler and each node of the Boltzmann machine, using the approximation of the value of the Q function generated at the current period state-action pair, the first empirical mean, and the corresponding reward at the current period state-action pair obtained using the reward structure. The application further comprises instructions for providing the strategy using the digital computer when the stopping criterion is met.
It will be appreciated that an advantage of the method disclosed herein is that quantum sampling is used to compute the empirical means involving the action nodes and hidden nodes, to approximate the Q function, and to compute the quantities involved in updating the weights between the qubits and in updating their biases, thereby providing a faster Q-learning method.
It will be further appreciated that another advantage of the method disclosed herein is that it overcomes the dimensionality limitations experienced by conventional solutions for Markov decision processes.
Although the above description relates to specific embodiments presently contemplated by the inventors, it will be understood that the invention in its broad aspect includes functional equivalents of the elements described herein.
Item 1. A method for improving a strategy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision periods, wherein the evolution of an underlying stochastic state process depends on a plurality of actions in the strategy, the method comprising:
using a sampling device coupled to a digital computer and to a sampling device control system, the sampling device obtaining data representing sample configurations of a Boltzmann machine, the Boltzmann machine comprising:
a plurality of nodes,
a plurality of couplers,
a plurality of biases, each bias corresponding to a node of the plurality of nodes,
a plurality of coupling weights, each coupling weight corresponding to a coupler of the plurality of couplers, and
a transverse field strength;
obtaining, using the digital computer, initialization data comprising the set of actions, the set of states and the reward structure of the stochastic control problem and an initial strategy for the stochastic control problem, the strategy comprising a selection of at least one action for each state;
assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and bias of, respectively, each coupler and each node of the Boltzmann machine, and the transverse field strength;
until a stopping criterion is met:
generating a current period state-action pair using the digital computer,
modifying, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated current period state-action pair,
performing a sampling corresponding to the current period state-action pair to obtain a first sampling empirical mean,
obtaining, using the digital computer and the first sampling empirical mean, an approximation of the value of the Q function at the current period state-action pair, the value of the Q function representing the utility of the current period state-action pair,
obtaining a future period state-action pair using the digital computer, wherein the state is obtained through the stochastic state process, and further wherein obtaining the action comprises performing a stochastic optimization test on the plurality of all state-action pairs comprising the future period state and any possible action, thereby providing the action at the future period and updating the strategy for the future period state,
modifying, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated future period state-action pair,
performing a sampling corresponding to the future period state-action pair to obtain a second sampling empirical mean,
obtaining, using the digital computer and the second sampling empirical mean, an approximation of the value of the Q function at the future period state-action pair, the value of the Q function representing the utility of the future period state-action pair, and
updating, using the digital computer, each weight and each bias of, respectively, each coupler and each node of the Boltzmann machine, using the approximation of the value of the Q function generated at the current period state-action pair, the first sampling empirical mean, and the corresponding reward at the current period state-action pair obtained using the reward structure; and
providing the strategy using the digital computer when the stopping criterion is met.
Item 2. The method according to item 1, wherein the sampling device comprises a quantum processor, and wherein the sampling device control system comprises a quantum device control system; further wherein the quantum processor is coupled to the digital computer and to the quantum device control system; further wherein the quantum processor comprises a plurality of qubits and a plurality of couplers, each coupler for providing a communicative coupling at a crossing of two qubits.
Item 3. The method according to item 1, wherein the sampling device comprises an optical device configured to receive energy from an optical energy source and to generate a plurality of optical parametric oscillators, and a plurality of coupling devices, each of the plurality of coupling devices controllably coupling an optical parametric oscillator of the plurality of optical parametric oscillators.
Item 4. The method according to item 1, wherein the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a classical Boltzmann machine characterized by a zero transverse field strength; further wherein the memory unit comprises an application for obtaining data representing each weight and each bias of, respectively, each coupler and each node of the classical Boltzmann machine; further wherein the application is adapted to perform simulated quantum annealing of the classical Boltzmann machine.
Item 5. The method according to item 1, wherein the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a quantum Boltzmann machine characterized by a non-zero transverse field strength; further wherein the memory unit comprises an application for obtaining data representing each weight and each bias of, respectively, each coupler and each node of the quantum Boltzmann machine; further wherein the application is adapted to perform simulated quantum annealing of the quantum Boltzmann machine.
Item 6. The method according to item 5, wherein performing the simulated quantum annealing of the quantum Boltzmann machine provides a plurality of sample configurations representing the effective Hamiltonian of the quantum Boltzmann machine.
Item 7. The method according to item 1, wherein the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a classical Boltzmann machine characterized by a zero transverse field strength; further wherein the memory unit comprises an application for obtaining data representing each weight and each bias of, respectively, each coupler and each node of the classical Boltzmann machine; further wherein the application is adapted to sample a plurality of instances of the Fortuin-Kasteleyn random cluster representation corresponding to the classical Boltzmann machine, to provide an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation.
Item 8. The method according to item 1, wherein the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a quantum Boltzmann machine characterized by a non-zero transverse field strength; further wherein the memory unit comprises an application for obtaining data representing each weight and each bias of, respectively, each coupler and each node of the quantum Boltzmann machine; further wherein the application is adapted to sample a plurality of instances of the Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, to provide an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation.
Item 9. The method according to any one of items 2, 3, 4 and 5, wherein obtaining the approximation of the value of the Q function at both the current period and the future period comprises: obtaining from the sampling device a plurality of sample configurations of the Boltzmann machine along a measurement axis, and calculating, using the digital computer, an empirical approximation of the free energy of the Boltzmann machine.
Item 10. The method according to any one of items 2 and 5, wherein obtaining the approximation of the value of the Q function at both the current period and the future period comprises: obtaining from the sampling device a plurality of sample configurations of the Boltzmann machine along a measurement axis, constructing from the obtained sample configurations a plurality of sample configurations representing the effective Hamiltonian of the quantum Boltzmann machine, and calculating, using the digital computer, an empirical approximation of the free energy of the quantum Boltzmann machine.
Item 11. The method according to item 6, wherein obtaining the approximation of the value of the Q function at both the current period and the future period comprises: obtaining from the sampling device a plurality of sample configurations representing the effective Hamiltonian of the quantum Boltzmann machine, and calculating, using the digital computer, an empirical approximation of the free energy of the quantum Boltzmann machine.
Item 12. The method according to item 8, wherein obtaining the approximation of the value of the Q function at both the current period and the future period comprises: obtaining from the sampling device an approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, and calculating, using the digital computer, an empirical approximation of the free energy of the quantum Boltzmann machine.
Item 13. The method according to any one of items 2, 3, 4 and 5, wherein calculating both the first empirical mean and the second empirical mean corresponding to the nodes comprises: obtaining from the sampling device a plurality of sample configurations of one of a quantum Boltzmann machine and a classical Boltzmann machine along a measurement axis, and calculating, using the digital computer, an approximation of the empirical means of the nodes.
Item 14. The method according to item 6, wherein calculating both the first empirical mean and the second empirical mean corresponding to the nodes comprises: obtaining from the sampling device a plurality of sample configurations of the effective Hamiltonian of the Boltzmann machine, and calculating, using the digital computer, an approximation of the empirical means of the nodes.
Item 15. The method according to item 1, wherein performing the stochastic optimization test on the plurality of all state-action pairs comprises:
modifying, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using each state-action pair corresponding to the future period state,
performing a sampling corresponding to each state-action pair corresponding to the future period state to provide empirical means,
obtaining, using the digital computer, an approximation of the value of the Q function at each state-action pair corresponding to the future period state, and
updating the strategy for the future period state, using the digital computer, by sampling from a distribution corresponding to all the approximated values of the Q function at the state-action pairs corresponding to the future period state.
Item 16. The method according to item 1, wherein performing the stochastic optimization test on the plurality of all state-action pairs comprises:
obtaining a temperature parameter;
obtaining the future period state; and
sampling from a Boltzmann distribution associated with the approximations of the value of the Q function, with the state variable fixed at the future period state and at the provided temperature.
Item 17. The method according to item 2, wherein the plurality of qubits of the quantum processor comprises:
a first group of qubits;
a second group of qubits; and
wherein the plurality of couplers of the quantum processor comprises:
at least one coupler, each of the at least one coupler for providing a communicative coupling at a crossing between a qubit of the first group of qubits and at least one qubit of the second group of qubits, and
a plurality of couplers, each of the plurality of couplers for providing a communicative coupling at a crossing between a qubit of the second group of qubits and another qubit of the second group of qubits.
Item 18. The method according to item 17, wherein the first group of qubits represents the set of actions of the stochastic control problem.
Item 19. The method according to item 17, wherein modifying the data representing none or at least one coupler and at least one bias using the generated current period state-action pair comprises:
switching off all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and the second group of qubits, and
modifying at least one bias of the second group of qubits using the generated current period state-action pair.
Item 20. The method according to item 17, wherein modifying the data representing none or at least one coupler and at least one bias using the generated future period state-action pair comprises:
switching off all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and the second group of qubits, and
modifying at least one bias of the second group of qubits using the generated future period state-action pair.
Item 21. The method according to item 17, wherein performing the stochastic optimization test on the plurality of all state-action pairs comprising the future period state and any possible action comprises:
switching on all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and a qubit of the second group of qubits;
modifying at least one bias of the second group of qubits using the future period state corresponding to the future period state-action pair;
performing a quantum sampling to obtain empirical means corresponding to the first group of qubits; and
updating the strategy for the future period state, using the digital computer, by assigning an action to the future period state according to the distribution of the obtained empirical means corresponding to the first group of qubits.
Item 22. The method according to any one of items 1 to 21, wherein the stopping criterion comprises reaching a maximum number of training steps.
Item 23. The method according to any one of items 1 to 21, wherein the stopping criterion comprises reaching a maximum runtime.
Item 24. The method according to any one of items 1 to 21, wherein the stopping criterion comprises convergence of a function of the weights and biases of the couplings and local fields.
Item 25. The method according to any one of items 1 to 21, wherein the stopping criterion comprises convergence of the strategy to a fixed strategy.
Item 26. The method according to any one of items 1 to 25, wherein providing the strategy comprises at least one of: displaying the strategy to a user interacting with the digital computer; storing the strategy in the digital computer; and transmitting the strategy to another processing unit operatively connected to the digital computer.
Item 27. The method according to any one of items 1 to 26, wherein the digital computer comprises a memory unit; further wherein the initialization data is obtained from the memory unit of the digital computer.
Item 28. The method according to any one of items 1 to 26, wherein the initialization data is obtained from one of a user interacting with the digital computer and a remote processing unit operatively connected to the digital computer.
Item 29. A digital computer, comprising:
a central processing unit;
a display device;
a communication port for operatively connecting the digital computer to a sampling device, the sampling device being coupled to the digital computer and to a sampling device control system;
a memory unit comprising an application for improving a strategy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision periods, wherein the evolution of an underlying stochastic state process depends on a plurality of actions in the strategy, the application comprising:
instructions for using the sampling device coupled to the digital computer and to the sampling device control system, the sampling device obtaining data representing sample configurations of a Boltzmann machine, the Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases, a plurality of coupling weights, and a transverse field strength, each bias corresponding to a node of the plurality of nodes and each coupling weight corresponding to a coupler of the plurality of couplers;
instructions for obtaining, using the digital computer, initialization data comprising the set of actions, the set of states and the reward structure of the stochastic control problem and an initial strategy for the stochastic control problem, the strategy comprising a selection of at least one action for each state;
instructions for assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and bias of, respectively, each coupler and each node of the Boltzmann machine, and the transverse field strength;
instructions for, until a stopping criterion is met:
generating a current period state-action pair using the digital computer,
modifying, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated current period state-action pair,
performing a sampling corresponding to the current period state-action pair to obtain a first sampling empirical mean,
obtaining, using the digital computer and the first sampling empirical mean, an approximation of the value of the Q function at the current period state-action pair, the value of the Q function representing the utility of the current period state-action pair,
obtaining a future period state-action pair using the digital computer, wherein the state is obtained through the stochastic state process, and further wherein obtaining the action comprises performing a stochastic optimization test on the plurality of all state-action pairs comprising the future period state and any possible action, thereby providing the action at the future period and updating the strategy for the future period state,
modifying, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated future period state-action pair,
performing a sampling corresponding to the future period state-action pair to obtain a second sampling empirical mean,
obtaining, using the digital computer and the second sampling empirical mean, an approximation of the value of the Q function at the future period state-action pair, the value of the Q function representing the utility of the future period state-action pair, and
updating, using the digital computer, each weight and each bias of, respectively, each coupler and each node of the Boltzmann machine, using the approximation of the value of the Q function generated at the current period state-action pair, the first sampling empirical mean, and the corresponding reward at the current period state-action pair obtained using the reward structure; and
instructions for providing the strategy using the digital computer when the stopping criterion is met.
Item 30. A non-transitory computer-readable storage medium for storing computer-executable instructions which, when executed, cause a digital computer to perform a method for improving a policy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision epochs, wherein evolution of an underlying stochastic state process depends on a plurality of actions in a policy, the method comprising:
using a sampling device coupled to the digital computer and to a sampling device control system, the sampling device obtaining data representing sample configurations of a Boltzmann machine, the Boltzmann machine comprising:
a plurality of nodes,
a plurality of couplers,
a plurality of biases, each bias corresponding to a node in the plurality of nodes,
a plurality of coupling weights, each coupling weight corresponding to a coupler of the plurality of couplers, and
a transverse field strength;
obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, and the reward structure of the stochastic control problem and an initial policy for the stochastic control problem, the policy comprising a choice of at least one action for each state;
assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and a bias of respectively each coupler and each node of the Boltzmann machine, and the transverse field strength;
performing the following operations until a stopping criterion is met:
generating a present-epoch state-action pair using the digital computer,
amending, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated present-epoch state-action pair,
performing a sampling corresponding to the present-epoch state-action pair to obtain first sampling empirical means,
obtaining, using the digital computer and the first sampling empirical means, an approximation of a value of a Q-function at the present-epoch state-action pair, the value of the Q-function representing a utility of the present-epoch state-action pair,
obtaining a future-epoch state-action pair using the digital computer, wherein the state is obtained through the stochastic state process, and further wherein obtaining the action comprises performing a stochastic optimization test on all state-action pairs comprising the future-epoch state and any possible action, thereby providing the action at the future epoch and updating the policy for the future-epoch state,
amending, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated future-epoch state-action pair,
performing a sampling corresponding to the future-epoch state-action pair to obtain second sampling empirical means,
obtaining, using the digital computer and the second sampling empirical means, an approximation of a value of the Q-function at the future-epoch state-action pair, the value of the Q-function representing a utility of the future-epoch state-action pair,
updating, using the digital computer, each weight and each bias of respectively each coupler and each node of the Boltzmann machine, using the generated approximations of the value of the Q-function and the first sampling empirical means at the present-epoch state-action pair and a corresponding reward at the present-epoch state-action pair obtained using the reward structure; and
providing the policy using the digital computer when the stopping criterion is met.
Claims (30)
1. A method for improving a policy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision epochs, wherein evolution of an underlying stochastic state process depends on a plurality of actions in the policy, the method comprising:
using a sampling device coupled to a digital computer and to a sampling device control system, the sampling device obtaining data representing sample configurations of a Boltzmann machine, the Boltzmann machine comprising:
a plurality of nodes,
a plurality of couplers,
a plurality of biases, each bias corresponding to a node in the plurality of nodes,
a plurality of coupling weights, each coupling weight corresponding to a coupler of the plurality of couplers, and
a transverse field strength;
obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, and the reward structure of the stochastic control problem and an initial policy for the stochastic control problem, the policy comprising a choice of at least one action for each state;
assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and a bias of respectively each coupler and each node of the Boltzmann machine, and the transverse field strength;
performing the following operations until a stopping criterion is met:
generating a present-epoch state-action pair using the digital computer,
amending, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated present-epoch state-action pair,
performing a sampling corresponding to the present-epoch state-action pair to obtain first sampling empirical means,
obtaining, using the digital computer and the first sampling empirical means, an approximation of a value of a Q-function at the present-epoch state-action pair, the value of the Q-function representing a utility of the present-epoch state-action pair,
obtaining a future-epoch state-action pair using the digital computer, wherein the state is obtained through the stochastic state process, and further wherein obtaining the action comprises performing a stochastic optimization test on all state-action pairs comprising the future-epoch state and any possible action, thereby providing the action at the future epoch and updating the policy for the future-epoch state,
amending, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated future-epoch state-action pair,
performing a sampling corresponding to the future-epoch state-action pair to obtain second sampling empirical means,
obtaining, using the digital computer and the second sampling empirical means, an approximation of a value of the Q-function at the future-epoch state-action pair, the value of the Q-function representing a utility of the future-epoch state-action pair, and
updating, using the digital computer, each weight and each bias of respectively each coupler and each node of the Boltzmann machine, using the generated approximations of the value of the Q-function and the first sampling empirical means at the present-epoch state-action pair and a corresponding reward at the present-epoch state-action pair obtained using the reward structure; and
providing the policy using the digital computer when the stopping criterion is met.
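The final update step of claim 1 can be illustrated with a minimal sketch. It assumes a temporal-difference rule in which each weight and bias moves in proportion to the TD error times the matching sampled empirical mean. The claim itself does not fix this rule: the function name `td_update`, the parameters `learning_rate` and `discount`, and the sign convention are all hypothetical choices made for illustration.

```python
def td_update(weights, biases, q_now, q_next, reward,
              mean_spin, mean_pair, learning_rate=0.01, discount=0.9):
    """Return updated (weights, biases) after one temporal-difference step.

    weights   : dict mapping a coupler (i, j) to its coupling weight
    biases    : list of node biases
    q_now     : approximate Q value at the present-epoch state-action pair
    q_next    : approximate Q value at the future-epoch state-action pair
    reward    : reward at the present-epoch state-action pair
    mean_spin : first sampling empirical means of the nodes
    mean_pair : first sampling empirical means of node products, per coupler
    """
    # TD error (assumed form): how much better or worse the outcome was than predicted.
    td_error = reward + discount * q_next - q_now
    # Each coupler weight is nudged by the TD error times its empirical mean.
    new_weights = {edge: w + learning_rate * td_error * mean_pair[edge]
                   for edge, w in weights.items()}
    # Each node bias is nudged by the TD error times the node's empirical mean.
    new_biases = [b + learning_rate * td_error * m
                  for b, m in zip(biases, mean_spin)]
    return new_weights, new_biases
```

Only the first sampling empirical means enter the update, which matches the claim wording; the second sampling is used solely to obtain `q_next`.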
2. The method of claim 1, wherein the sampling device comprises a quantum processor and the sampling device control system comprises a quantum device control system; further wherein the quantum processor is coupled to the digital computer and to the quantum device control system; further wherein the quantum processor comprises a plurality of qubits and a plurality of couplers, each coupler for providing a communicative coupling at a crossing of two qubits.
3. The method of claim 1, wherein the sampling device comprises an optical device configured to receive energy from an optical energy source and to generate a plurality of optical parametric oscillators, and a plurality of coupling devices, each of the plurality of coupling devices controllably coupling an optical parametric oscillator of the plurality of optical parametric oscillators.
4. The method of claim 1, wherein the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a classical Boltzmann machine characterized by a zero-value transverse field strength; further wherein the memory unit comprises an application for obtaining data representing each weight and each bias of respectively each coupler and each node of the classical Boltzmann machine; further wherein the application is adapted for performing a simulated quantum annealing of the classical Boltzmann machine.
5. The method of claim 1, wherein the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a quantum Boltzmann machine characterized by a non-zero-value transverse field strength; further wherein the memory unit comprises an application for obtaining data representing each weight and each bias of respectively each coupler and each node of the quantum Boltzmann machine; further wherein the application is adapted for performing a simulated quantum annealing of the quantum Boltzmann machine.
6. The method of claim 5, wherein performing the simulated quantum annealing of the quantum Boltzmann machine provides a plurality of sample configurations of an effective Hamiltonian representing the quantum Boltzmann machine.
7. The method of claim 1, wherein the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a classical Boltzmann machine characterized by a zero-value transverse field strength; further wherein the memory unit comprises an application for obtaining data representing each weight and each bias of respectively each coupler and each node of the classical Boltzmann machine; further wherein the application is adapted for sampling a plurality of instances of a Fortuin-Kasteleyn random cluster representation corresponding to the classical Boltzmann machine, thereby providing an approximation of a number of clusters in the Fortuin-Kasteleyn random cluster representation.
8. The method of claim 1, wherein the sampling device comprises a central processing unit and a memory unit coupled to the central processing unit and implementing the Boltzmann machine, wherein the implemented Boltzmann machine is a quantum Boltzmann machine characterized by a non-zero-value transverse field strength; further wherein the memory unit comprises an application for obtaining data representing each weight and each bias of respectively each coupler and each node of the quantum Boltzmann machine; further wherein the application is adapted for sampling a plurality of instances of a Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, thereby providing an approximation of a number of clusters in the Fortuin-Kasteleyn random cluster representation.
9. The method of any one of claims 2, 3, 4 and 5, wherein obtaining the approximation of the value of the Q-function at both the present epoch and the future epoch comprises: obtaining from the sampling device a plurality of samples of configurations of the Boltzmann machine along a measurement axis, and calculating, using the digital computer, an empirical approximation of a free energy of the Boltzmann machine.
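The free-energy estimate of claim 9 can be sketched for the classical case as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: configurations are tuples of ±1 spins, the energy sign convention is chosen here, the entropy is estimated from sample frequencies, and the names `energy`, `empirical_free_energy`, and `beta` are hypothetical. One common choice, also an assumption here, is to take the negative of this free energy as the Q-function approximation.

```python
import math
from collections import Counter

def energy(config, weights, biases):
    """Classical energy E(s) = -sum_ij w_ij s_i s_j - sum_i b_i s_i (assumed convention)."""
    e = -sum(b * s for b, s in zip(biases, config))
    for (i, j), w in weights.items():
        e -= w * config[i] * config[j]
    return e

def empirical_free_energy(samples, weights, biases, beta=1.0):
    """Estimate F = <E> - S / beta from a list of sampled configurations.

    samples : list of tuples of +1/-1 values read from the sampling device
    beta    : inverse temperature (hypothetical parameter)
    """
    counts = Counter(samples)
    n = len(samples)
    # Average energy over the observed configurations.
    mean_energy = sum(c * energy(cfg, weights, biases)
                      for cfg, c in counts.items()) / n
    # Entropy estimated from the empirical frequency of each configuration.
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return mean_energy - entropy / beta
```

For example, four samples split evenly between `(1, 1)` and `(-1, -1)` with a single coupler of weight 1 and zero biases give a mean energy of -1 and an entropy of ln 2.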
10. The method of any one of claims 2 and 5, wherein obtaining the approximation of the value of the Q-function at both the present epoch and the future epoch comprises: obtaining from the sampling device a plurality of sample configurations of the Boltzmann machine along a measurement axis, constructing from the obtained sample configurations a plurality of samples of configurations of an effective Hamiltonian representing the quantum Boltzmann machine, and calculating, using the digital computer, an empirical approximation of a free energy of the quantum Boltzmann machine.
11. The method of claim 6, wherein obtaining the approximation of the value of the Q-function at both the present epoch and the future epoch comprises: obtaining from the sampling device the plurality of samples of configurations of the effective Hamiltonian representing the quantum Boltzmann machine, and calculating, using the digital computer, an empirical approximation of a free energy of the quantum Boltzmann machine.
12. The method of claim 8, wherein obtaining the approximation of the value of the Q-function at both the present epoch and the future epoch comprises: obtaining from the sampling device the approximation of the number of clusters in the Fortuin-Kasteleyn random cluster representation corresponding to the quantum Boltzmann machine, and calculating, using the digital computer, an empirical approximation of a free energy of the quantum Boltzmann machine.
13. The method of any one of claims 2, 3, 4 and 5, wherein calculating both the first empirical means and the second empirical means corresponding to the nodes comprises: obtaining from the sampling device a plurality of samples of configurations of one of the quantum and the classical Boltzmann machine along a measurement axis, and calculating, using the digital computer, an approximation of the empirical means of the nodes.
14. The method of claim 6, wherein calculating both the first empirical means and the second empirical means corresponding to the nodes comprises: obtaining from the sampling device a plurality of samples of configurations of the effective Hamiltonian of the Boltzmann machine, and calculating, using the digital computer, an approximation of the empirical means of the nodes.
15. The method of claim 1, wherein performing the stochastic optimization test on all the state-action pairs comprises:
amending, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using each state-action pair corresponding to the future-epoch state,
performing a sampling corresponding to each state-action pair corresponding to the future-epoch state to provide empirical means,
obtaining, using the digital computer, an approximation of a value of the Q-function at each state-action pair corresponding to the future-epoch state, and
updating, using the digital computer, the policy for the future-epoch state by sampling from the corresponding distribution, using all the approximated values of the Q-function corresponding to each state-action pair corresponding to the future-epoch state.
16. The method of claim 1, wherein performing the stochastic optimization test on all the state-action pairs comprises:
obtaining a temperature parameter;
obtaining the future-epoch state; and
sampling a Boltzmann distribution associated with the approximations of the value of the Q-function, with the state variable fixed at the future-epoch state and at the provided temperature.
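The test of claim 16 amounts to drawing an action from a Boltzmann (softmax) distribution over the approximate Q values at a fixed future-epoch state. A minimal sketch, assuming the Q values for every candidate action have already been approximated; the name `boltzmann_action` and its parameters are hypothetical.

```python
import math
import random

def boltzmann_action(q_values, temperature, rng=random):
    """Sample an action index with probability proportional to exp(Q / T).

    q_values    : approximate Q values, one per candidate action
    temperature : positive temperature parameter
    rng         : the random module or a random.Random instance
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Draw one action index according to the computed probabilities.
    action = rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return action, probs
```

A high temperature makes the draw close to uniform, which favors exploration; a low temperature concentrates the probability on the action with the largest approximate Q value.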
17. The method of claim 2, wherein the plurality of qubits of the quantum processor comprises:
a first group of qubits;
a second group of qubits; and
wherein the plurality of couplers of the quantum processor comprises:
at least one coupler, each of the at least one coupler for providing a communicative coupling at a crossing between a qubit of the first group of qubits and at least one qubit of the second group of qubits, and
a plurality of couplers, each of the plurality of couplers for providing a communicative coupling at a crossing between a qubit of the second group of qubits and another qubit of the second group of qubits.
18. The method of claim 17, wherein the first group of qubits represents the set of actions of the stochastic control problem.
19. The method of claim 17, wherein amending the data representing none or at least one coupler and at least one bias using the generated present-epoch state-action pair comprises:
switching OFF all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and a qubit of the second group of qubits, and
amending at least one bias of the second group of qubits using the generated present-epoch state-action pair.
20. The method of claim 17, wherein amending the data representing none or at least one coupler and at least one bias using the generated future-epoch state-action pair comprises:
switching OFF all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and a qubit of the second group of qubits, and
amending at least one bias of the second group of qubits using the generated future-epoch state-action pair.
21. The method of claim 17, wherein performing the stochastic optimization test on all state-action pairs comprising the future-epoch state and any possible action comprises:
switching ON all couplers providing a communicative coupling at a crossing between a qubit of the first group of qubits and a qubit of the second group of qubits;
amending at least one bias in the second group of qubits using the future-epoch state of the future-epoch state-action pair;
performing a quantum sampling to obtain empirical means corresponding to the first group of qubits; and
updating, using the digital computer, the policy for the future-epoch state by assigning an action to the future-epoch state according to a distribution of the obtained empirical means corresponding to the first group of qubits.
22. The method of any one of claims 1 to 21, wherein the stopping criterion comprises reaching a maximum number of training steps.
23. The method of any one of claims 1 to 21, wherein the stopping criterion comprises reaching a maximum runtime.
24. The method of any one of claims 1 to 21, wherein the stopping criterion comprises convergence of a function of the weights and biases of the couplers and local fields.
25. The method of any one of claims 1 to 21, wherein the stopping criterion comprises convergence of the policy to a fixed policy.
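The four stopping criteria of claims 22 to 25 can be combined in a single check, sketched below. The thresholds `max_steps`, `max_seconds`, and `tol`, the function name `should_stop`, and the use of "no policy change in the latest step" as a crude proxy for policy convergence are all assumptions made for illustration.

```python
import time

def should_stop(step, start_time, param_change, policy_changed,
                max_steps=1000, max_seconds=60.0, tol=1e-6, now=None):
    """Return the name of the first stopping criterion met, or None.

    step           : number of training steps completed so far
    start_time     : time.monotonic() value recorded when training began
    param_change   : size of the latest change in weights and biases
    policy_changed : whether the latest step altered the policy
    now            : optional clock override, mainly for testing
    """
    now = time.monotonic() if now is None else now
    if step >= max_steps:
        return "max_steps"             # claim 22: maximum number of training steps
    if now - start_time >= max_seconds:
        return "max_runtime"           # claim 23: maximum runtime
    if param_change < tol:
        return "parameters_converged"  # claim 24: weights and biases converged
    if not policy_changed:
        return "policy_converged"      # claim 25: policy reached a fixed policy
    return None
```

In practice a convergence check would usually look at several consecutive steps rather than one; the single-step test keeps the sketch short.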
26. The method of any one of claims 1 to 25, wherein providing the policy comprises at least one of: displaying the policy to a user interacting with the digital computer; storing the policy in the digital computer; and transmitting the policy to another processing unit operatively connected to the digital computer.
27. The method of any one of claims 1 to 26, wherein the digital computer comprises a memory unit; further wherein the initialization data is obtained from the memory unit of the digital computer.
28. The method of any one of claims 1 to 26, wherein the initialization data is obtained from one of a user interacting with the digital computer and a remote processing unit operatively connected with the digital computer.
29. A digital computer, comprising:
a central processing unit;
a display device;
a communication port for operatively connecting the digital computer to a sampling device, the sampling device being coupled to the digital computer and to a sampling device control system;
a memory unit comprising an application for improving a policy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision epochs, wherein evolution of an underlying stochastic state process depends on a plurality of actions in the policy, the application comprising:
instructions for using the sampling device coupled to the digital computer and to the sampling device control system, the sampling device obtaining data representing sample configurations of a Boltzmann machine, the Boltzmann machine comprising a plurality of nodes, a plurality of couplers, a plurality of biases, a plurality of coupling weights, and a transverse field strength, each bias corresponding to a node in the plurality of nodes and each coupling weight corresponding to a coupler of the plurality of couplers;
instructions for obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, and the reward structure of the stochastic control problem and an initial policy for the stochastic control problem, the policy comprising a choice of at least one action for each state;
instructions for assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and a bias of respectively each coupler and each node of the Boltzmann machine, and the transverse field strength;
instructions for performing the following operations until a stopping criterion is met:
generating a present-epoch state-action pair using the digital computer,
amending, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated present-epoch state-action pair,
performing a sampling corresponding to the present-epoch state-action pair to obtain first sampling empirical means,
obtaining, using the digital computer and the first sampling empirical means, an approximation of a value of a Q-function at the present-epoch state-action pair, the value of the Q-function representing a utility of the present-epoch state-action pair,
obtaining a future-epoch state-action pair using the digital computer, wherein the state is obtained through the stochastic state process, and further wherein obtaining the action comprises performing a stochastic optimization test on all state-action pairs comprising the future-epoch state and any possible action, thereby providing the action at the future epoch and updating the policy for the future-epoch state,
amending, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated future-epoch state-action pair,
performing a sampling corresponding to the future-epoch state-action pair to obtain second sampling empirical means,
obtaining, using the digital computer and the second sampling empirical means, an approximation of a value of the Q-function at the future-epoch state-action pair, the value of the Q-function representing a utility of the future-epoch state-action pair, and
updating, using the digital computer, each weight and each bias of respectively each coupler and each node of the Boltzmann machine, using the generated approximations of the value of the Q-function and the first sampling empirical means at the present-epoch state-action pair and a corresponding reward at the present-epoch state-action pair obtained using the reward structure; and
instructions for providing the policy using the digital computer when the stopping criterion is met.
30. A non-transitory computer-readable storage medium for storing computer-executable instructions which, when executed, cause a digital computer to perform a method for improving a policy for a stochastic control problem, the stochastic control problem being characterized by a set of actions, a set of states, a reward structure as a function of states and actions, and a plurality of decision epochs, wherein evolution of an underlying stochastic state process depends on a plurality of actions in the policy, the method comprising:
using a sampling device coupled to the digital computer and to a sampling device control system, the sampling device obtaining data representing sample configurations of a Boltzmann machine, the Boltzmann machine comprising:
a plurality of nodes,
a plurality of couplers,
a plurality of biases, each bias corresponding to a node in the plurality of nodes,
a plurality of coupling weights, each coupling weight corresponding to a coupler of the plurality of couplers, and
a transverse field strength;
obtaining, using the digital computer, initialization data comprising the set of actions, the set of states, and the reward structure of the stochastic control problem and an initial policy for the stochastic control problem, the policy comprising a choice of at least one action for each state;
assigning to the sampling device, using the digital computer and the sampling device control system, data representing an initial weight and a bias of respectively each coupler and each node of the Boltzmann machine, and the transverse field strength;
performing the following operations until a stopping criterion is met:
generating a present-epoch state-action pair using the digital computer,
amending, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated present-epoch state-action pair,
performing a sampling corresponding to the present-epoch state-action pair to obtain first sampling empirical means,
obtaining, using the digital computer and the first sampling empirical means, an approximation of a value of a Q-function at the present-epoch state-action pair, the value of the Q-function representing a utility of the present-epoch state-action pair,
obtaining a future-epoch state-action pair using the digital computer, wherein the state is obtained through the stochastic state process, and further wherein obtaining the action comprises performing a stochastic optimization test on all state-action pairs comprising the future-epoch state and any possible action, thereby providing the action at the future epoch and updating the policy for the future-epoch state,
amending, using the digital computer and the sampling device control system, data representing none or at least one coupler and at least one bias using the generated future-epoch state-action pair,
performing a sampling corresponding to the future-epoch state-action pair to obtain second sampling empirical means,
obtaining, using the digital computer and the second sampling empirical means, an approximation of a value of the Q-function at the future-epoch state-action pair, the value of the Q-function representing a utility of the future-epoch state-action pair,
updating, using the digital computer, each weight and each bias of respectively each coupler and each node of the Boltzmann machine, using the generated approximations of the value of the Q-function and the first sampling empirical means at the present-epoch state-action pair and a corresponding reward at the present-epoch state-action pair obtained using the reward structure; and
providing the policy using the digital computer when the stopping criterion is met.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662333707P | 2016-05-09 | 2016-05-09 | |
US62/333,707 | 2016-05-09 | ||
PCT/IB2017/052702 WO2017195114A1 (en) | 2016-05-09 | 2017-05-09 | Method and system for improving a policy for a stochastic control problem |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109154798A (en) | 2019-01-04 |
CN109154798B (en) | 2022-02-25 |
Family
ID=60242617
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201780028555.9A Active CN109154798B (en) | 2016-05-09 | 2017-05-09 | Method and system for improving strategies for stochastic control problems |
Country Status (6)
Country | Link |
---|---|
US (1) | US11017289B2 (en) |
JP (1) | JP6646763B2 (en) |
CN (1) | CN109154798B (en) |
CA (1) | CA3022167C (en) |
GB (1) | GB2569702A (en) |
WO (1) | WO2017195114A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109855766A (en) * | 2019-01-21 | 2019-06-07 | Zhejiang University of Technology | Heat dissipation rate measurement method based on the hot light generation of optical microresonator |
CN110263433A (en) * | 2019-06-19 | 2019-09-20 | Suzhou University of Science and Technology | Fuse failure alarm method and system |
CN111812972A (en) * | 2019-04-11 | 2020-10-23 | Fujitsu Limited | Optimization device and method for controlling an optimization device |
CN112016667A (en) * | 2019-05-29 | 2020-12-01 | Fujitsu Limited | Optimization device and optimization method |
CN115398211A (en) * | 2020-04-20 | 2022-11-25 | Sony Group Corporation | Information processing system, information processing method, program, information processing device, and computing device |
Families Citing this family (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11797641B2 (en) | 2015-02-03 | 2023-10-24 | 1Qb Information Technologies Inc. | Method and system for solving the lagrangian dual of a constrained binary quadratic programming problem using a quantum annealer |
CA2881033C (en) | 2015-02-03 | 2016-03-15 | 1Qb Information Technologies Inc. | Method and system for solving lagrangian dual of a constrained binary quadratic programming problem |
US11086966B2 (en) * | 2015-09-08 | 2021-08-10 | Hewlett Packard Enterprise Development Lp | Apparatus for solving Ising problems |
US10599988B2 (en) | 2016-03-02 | 2020-03-24 | D-Wave Systems Inc. | Systems and methods for analog processing of problem graphs having arbitrary size and/or connectivity |
CA3017275C (en) | 2016-03-11 | 2023-05-23 | 1Qb Information Technologies Inc. | Methods and systems for quantum computing |
US9870273B2 (en) | 2016-06-13 | 2018-01-16 | 1Qb Information Technologies Inc. | Methods and systems for quantum ready and quantum enabled computations |
US10044638B2 (en) | 2016-05-26 | 2018-08-07 | 1Qb Information Technologies Inc. | Methods and systems for quantum computing |
EP3548910A4 (en) * | 2016-12-05 | 2020-09-02 | 1QB Information Technologies Inc. | Method for estimating the thermodynamic properties of a quantum ising model with transverse field |
US11164077B2 (en) * | 2017-11-02 | 2021-11-02 | Siemens Aktiengesellschaft | Randomized reinforcement learning for control of complex systems |
CN111670438B (en) * | 2017-12-01 | 2023-12-29 | 1Qb信息技术公司 | System and method for randomly optimizing robust reasoning problem |
US20210201138A1 (en) * | 2018-05-25 | 2021-07-01 | Nec Corporation | Learning device, information processing system, learning method, and learning program |
JP6985997B2 (en) * | 2018-08-27 | 2021-12-22 | 株式会社日立製作所 | Machine learning system and Boltzmann machine calculation method |
JP7111966B2 (en) * | 2018-09-03 | 2022-08-03 | 富士通株式会社 | Optimization device and control method for optimization device |
US11816594B2 (en) * | 2018-09-24 | 2023-11-14 | International Business Machines Corporation | Stochastic control with a quantum computer |
US11593174B2 (en) | 2018-10-16 | 2023-02-28 | D-Wave Systems Inc. | Systems and methods for scheduling programs for dedicated execution on a quantum processor |
US10504033B1 (en) | 2018-11-13 | 2019-12-10 | Atom Computing Inc. | Scalable neutral atom based quantum computing |
US11580435B2 (en) | 2018-11-13 | 2023-02-14 | Atom Computing Inc. | Scalable neutral atom based quantum computing |
US11995512B2 (en) | 2018-11-13 | 2024-05-28 | Atom Computing Inc. | Scalable neutral atom based quantum computing |
WO2020150156A1 (en) | 2019-01-17 | 2020-07-23 | D-Wave Systems, Inc. | Systems and methods for hybrid algorithms using cluster contraction |
US11593695B2 (en) | 2019-03-26 | 2023-02-28 | D-Wave Systems Inc. | Systems and methods for hybrid analog and digital processing of a computational problem using mean fields |
US20200349453A1 (en) * | 2019-05-01 | 2020-11-05 | 1Qb Information Technologies Inc. | Method and system for solving a dynamic programming problem |
WO2020255076A1 (en) | 2019-06-19 | 2020-12-24 | 1Qb Information Technologies Inc. | Method and system for mapping a dataset from a hilbert space of a given dimension to a hilbert space of a different dimension |
US11714730B2 (en) | 2019-08-20 | 2023-08-01 | D-Wave Systems Inc. | Systems and methods for high availability, failover and load balancing of heterogeneous resources |
US11615293B2 (en) * | 2019-09-23 | 2023-03-28 | Adobe Inc. | Reinforcement learning with a stochastic action set |
CN115516469A (en) | 2020-03-02 | 2022-12-23 | 原子计算公司 | Scalable neutral atom based quantum computing |
JP7359287B2 (en) * | 2020-03-13 | 2023-10-11 | 日本電気株式会社 | Information processing device, control method and program |
CN113991641B (en) * | 2021-09-28 | 2023-07-28 | 广西大学 | Novel power system distributed collaborative quantum Q learning power generation control method |
US20230153148A1 (en) * | 2021-11-18 | 2023-05-18 | Red Hat, Inc. | Quantum isolation zones |
CN114454160B (en) * | 2021-12-31 | 2024-04-16 | 中国人民解放军国防科技大学 | Mechanical arm grabbing control method and system based on kernel least square soft Belman residual error reinforcement learning |
US11875227B2 (en) | 2022-05-19 | 2024-01-16 | Atom Computing Inc. | Devices and methods for forming optical traps for scalable trapped atom computing |
US20230391373A1 (en) * | 2022-06-03 | 2023-12-07 | Mitsubishi Electric Research Laboratories, Inc. | System and Method for Controlling Autonomous Vehicle in Uncertain Environment |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5608843A (en) * | 1994-08-01 | 1997-03-04 | The United States Of America As Represented By The Secretary Of The Air Force | Learning controller with advantage updating algorithm |
US6473851B1 (en) * | 1999-03-11 | 2002-10-29 | Mark E Plutowski | System for combining plurality of input control policies to provide a compositional output control policy |
US20090306866A1 (en) * | 2008-06-10 | 2009-12-10 | The Regents Of The University Of Michigan | Method, control apparatus and powertrain system controller for real-time, self-learning control based on individual operating style |
CN101673354A (en) * | 2009-06-12 | 2010-03-17 | 北京工业大学 | Operant conditioning reflex automatic machine and application thereof in control of biomimetic autonomous learning |
CN102207928A (en) * | 2011-06-02 | 2011-10-05 | 河海大学常州校区 | Reinforcement learning-based multi-Agent sewage treatment decision support system |
CN103034473A (en) * | 2012-12-17 | 2013-04-10 | 中国科学院高能物理研究所 | Pseudo-random number generator |
CN103150595A (en) * | 2011-12-06 | 2013-06-12 | 腾讯科技(深圳)有限公司 | Automatic pair selection method and device in data processing system |
US20140156031A1 (en) * | 2011-08-11 | 2014-06-05 | The Trustees Of Columbia University In The City Of New York | Adaptive Stochastic Controller for Dynamic Treatment of Cyber-Physical Systems |
CN103906893A (en) * | 2011-07-12 | 2014-07-02 | 因格瑞恩股份有限公司 | Method for simulating fractional multi-phase/multi-component flow through porous media |
CN104200073A (en) * | 2014-08-19 | 2014-12-10 | 浙江工业大学 | Self-adaptation group global optimization method based on local Lipschitz estimation |
WO2015006494A1 (en) * | 2013-07-09 | 2015-01-15 | Board Of Trustees Of The Leland Stanford Junior University | Computation using a network of optical parametric oscillators |
CN104463248A (en) * | 2014-12-09 | 2015-03-25 | 西北工业大学 | High-resolution remote sensing image airplane detecting method based on high-level feature extraction of depth boltzmann machine |
CN104751227A (en) * | 2013-12-31 | 2015-07-01 | 安徽科大讯飞信息科技股份有限公司 | Method and system for constructing deep neural network |
JP2015125198A (en) * | 2013-12-25 | 2015-07-06 | Kddi株式会社 | Interactive program, server and method for controlling inserting action of dynamic interactive node to interactive scenario |
US20160034814A1 (en) * | 2014-08-01 | 2016-02-04 | University Of Southern California | Noise-boosted back propagation and deep learning neural networks |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7251636B2 (en) * | 2003-12-10 | 2007-07-31 | Microsoft Corporation | Scalable methods for learning Bayesian networks |
US7533068B2 (en) | 2004-12-23 | 2009-05-12 | D-Wave Systems, Inc. | Analog processor comprising quantum devices |
US7800395B2 (en) * | 2007-05-02 | 2010-09-21 | D-Wave Systems Inc. | Systems, devices, and methods for controllably coupling qubits |
JP5296189B2 (en) | 2008-03-24 | 2013-09-25 | ディー−ウェイブ システムズ,インコーポレイテッド | System, apparatus, and method for analog processing |
US9015093B1 (en) * | 2010-10-26 | 2015-04-21 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
US9152746B2 (en) * | 2013-03-26 | 2015-10-06 | Microsoft Technology Licensing, Llc | Quantum annealing simulator |
US9183508B2 (en) | 2013-08-07 | 2015-11-10 | D-Wave Systems Inc. | Systems and devices for quantum processor architectures |
EP3362952A4 (en) * | 2015-10-16 | 2018-10-03 | D-Wave Systems Inc. | Systems and methods for creating and using quantum boltzmann machines |
US20170132699A1 (en) * | 2015-11-10 | 2017-05-11 | Astir Technologies, Inc. | Markov decision process-based decision support tool for financial planning, budgeting, and forecasting |
US10817796B2 (en) * | 2016-03-07 | 2020-10-27 | D-Wave Systems Inc. | Systems and methods for machine learning |
- 2017
  - 2017-05-09 WO PCT/IB2017/052702 patent/WO2017195114A1/en active Application Filing
  - 2017-05-09 GB GB1819448.0A patent/GB2569702A/en not_active Withdrawn
  - 2017-05-09 CN CN201780028555.9A patent/CN109154798B/en active Active
  - 2017-05-09 JP JP2018558696A patent/JP6646763B2/en active Active
  - 2017-05-09 US US15/590,614 patent/US11017289B2/en active Active
  - 2017-05-09 CA CA3022167A patent/CA3022167C/en active Active
Non-Patent Citations (5)
Title |
---|
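BRIAN SALLANS: "Reinforcement Learning with Factored States and Actions", 《JOURNAL OF MACHINE LEARNING RESEARCH 5 (2004)》 * |
HAO XU: "Model-free H∞ stochastic optimal design for unknown linear networked control system zero-sum games via Q-learning", 《2011 IEEE INTERNATIONAL SYMPOSIUM ON INTELLIGENT CONTROL》 * |
仲伟俊: "A Boltzmann machine method for two-level decentralized decision-making with integer variables", 《Advances in Management Science and Systems Science: Proceedings of the National Youth Conference on Management Science and Systems Science》 * |
杜波: "Heuristic probabilistic value iteration algorithm: an approximate framework for solving POMDP problems", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
段艳杰: "Deep learning in the control field: current research status and prospects", 《Acta Automatica Sinica》 * |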
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109855766A (en) * | 2019-01-21 | 2019-06-07 | 浙江工业大学 | A kind of heat dissipation rate measurement method based on the hot light generation of optical microresonator |
CN111812972A (en) * | 2019-04-11 | 2020-10-23 | 富士通株式会社 | Optimization device and method for controlling an optimization device |
CN112016667A (en) * | 2019-05-29 | 2020-12-01 | 富士通株式会社 | Optimization device and optimization method |
CN110263433A (en) * | 2019-06-19 | 2019-09-20 | 苏州科技大学 | A kind of fuse failure alarm method and system |
CN110263433B (en) * | 2019-06-19 | 2024-03-05 | 苏州科技大学 | Fuse fault alarm method and system |
CN115398211A (en) * | 2020-04-20 | 2022-11-25 | 索尼集团公司 | Information processing system, information processing method, program, information processing device, and computing device |
Also Published As
Publication number | Publication date |
---|---|
WO2017195114A1 (en) | 2017-11-16 |
CA3022167A1 (en) | 2017-11-16 |
GB2569702A (en) | 2019-06-26 |
US11017289B2 (en) | 2021-05-25 |
CN109154798B (en) | 2022-02-25 |
JP2019515397A (en) | 2019-06-06 |
GB201819448D0 (en) | 2019-01-16 |
US20170323195A1 (en) | 2017-11-09 |
CA3022167C (en) | 2021-07-20 |
JP6646763B2 (en) | 2020-02-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109154798A (en) | Method and system for improving a policy for a stochastic control problem | |
Movassagh et al. | Artificial neural networks training algorithm integrating invasive weed optimization with differential evolutionary model | |
Ross | A review of quantum-inspired metaheuristics: Going from classical computers to real quantum computers | |
Crawford et al. | Reinforcement learning using quantum Boltzmann machines | |
CN109074520A (en) | Quantum processor and its purposes for realizing neural network | |
Tümer et al. | Prediction of team league’s rankings in volleyball by artificial neural network method | |
Uyar et al. | B-spline curve fitting with invasive weed optimization | |
CN109697504A (en) | Time Series Forecasting Methods and device based on intuition circulation fuzzy neural network | |
Beloborodov et al. | Reinforcement learning enhanced quantum-inspired algorithm for combinatorial optimization | |
Zoufal | Generative quantum machine learning | |
Beer | Quantum neural networks | |
Sahoo et al. | A novel variant of moth flame optimizer for higher dimensional optimization problems | |
Kapoor et al. | Bayesian neuroevolution using distributed swarm optimization and tempered MCMC | |
Dehghani et al. | A hybrid MGA-MSGD ANN training approach for approximate solution of linear elliptic PDEs | |
Allori et al. | What is Bohmian mechanics | |
Srivastava et al. | A Review of Optimization Algorithms for Training Neural Networks | |
Torres et al. | Dissipative Quantum Hopfield Network: A numerical analysis | |
Saosaovaphak et al. | Causal statistics of structural dependence space-based trend simulations for the coalition of rice exporters: the cases of India, Thailand, and Vietnam | |
Escudero et al. | Assessing the impact of noise on quantum neural networks: An experimental analysis | |
Rakhshani et al. | From feature selection to continuous optimization | |
Faury et al. | Rover descent: Learning to optimize by learning to navigate on prototypical loss surfaces | |
Piatkowski et al. | How to Trust Generative Probabilistic Models for Time-Series Data? | |
Peluso et al. | International Trade: a Reinforced Urn Network Model | |
Vo et al. | Causal Modeling with Stochastic Confounders | |
Yin et al. | A Reinforcement Learning Method for Inventory Control Under State-based Stochastic Demand |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: British Columbia; Applicant after: 1QB Information Technology Company; Address before: Columbia Canada; Applicant before: 1QB Information Technology Company |
GR01 | Patent grant | |