CN107315572A - Control method for a building electromechanical system, storage medium, and terminal device - Google Patents

Control method for a building electromechanical system, storage medium, and terminal device

Info

Publication number
CN107315572A
CN107315572A (application CN201710592114.3A)
Authority
CN
China
Prior art keywords
policy
update
state
value function
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710592114.3A
Other languages
Chinese (zh)
Other versions
CN107315572B (en)
Inventor
孙凫
孙一凫
吴若飒
张豪
王宗祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Geyun Technology Co Ltd
Original Assignee
Beijing Geyun Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Geyun Technology Co Ltd filed Critical Beijing Geyun Technology Co Ltd
Priority to CN201710592114.3A priority Critical patent/CN107315572B/en
Publication of CN107315572A publication Critical patent/CN107315572A/en
Application granted granted Critical
Publication of CN107315572B publication Critical patent/CN107315572B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003 Arrangements for executing specific machine instructions
    • G06F 9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 19/00 Programme-control systems
    • G05B 19/02 Programme-control systems electric
    • G05B 19/04 Programme control other than numerical control, i.e. in sequence controllers or logic controllers
    • G05B 19/042 Programme control other than numerical control using digital processors

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

This application provides a control method for a building electromechanical system, a storage medium, and a terminal device. The method includes: obtaining sensor data and determining a current state according to preset target data; predicting, according to a policy-based value function, the value of performing a corresponding action under the current state according to a current policy, and iteratively updating the value function and its policy according to a preset algorithm, until the updated policy is identical to the policy before the update; and determining, according to the updated policy, the action corresponding to the current state and performing it. Control efficiency is improved, reliance on human experience is removed, and low energy consumption can be achieved.

Description

Control method for a building electromechanical system, storage medium, and terminal device
Technical field
The present application relates to the field of control technology for building electromechanical systems, and in particular to a control method for a building electromechanical system, a storage medium, and a terminal device.
Background technology
Building electromechanical equipment is an indispensable component of buildings, including industrial, civil, and public buildings, and covers water supply and drainage, electrical systems, heating, ventilation, fire protection, communications, automatic control, and the like.
Modern building electromechanical equipment generally uses traditional algorithms such as proportional-integral-derivative (PID) control or fuzzy control. These scale poorly: a large number of parameters must be adjusted manually for a specific building or room, or set to empirical values based on experience. The control effect finally achieved is also coarse, and energy consumption is high.
Summary of the invention
In view of this, embodiments of the present application provide a control method for a building electromechanical system, a storage medium, and a terminal device, to solve the technical problems in the prior art that the automatic control of building electromechanical systems is coarse, its precision is too low, and it relies heavily on human experience.
According to one aspect of the embodiments of the present application, a control method for a building electromechanical system is provided, the method including: obtaining sensor data and determining a current state according to preset target data; predicting, according to a policy-based value function, the value of performing a corresponding action under the current state according to a current policy, and iteratively updating the value function and its policy according to a preset algorithm, until the updated policy is identical to the policy before the update; and determining, according to the updated policy, the action corresponding to the current state and performing it.
According to another aspect of the embodiments of the present application, a terminal device is provided, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to: obtain sensor data and determine a current state according to preset target data; predict, according to a policy-based value function, the value of performing a corresponding action under the current state according to a current policy, and iteratively update the value function and its policy according to a preset algorithm, until the updated policy is identical to the policy before the update; and determine, according to the updated policy, the action corresponding to the current state and perform it.
According to yet another aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which computer instructions are stored; when executed by a processor, the instructions implement the steps of the above control method for a building electromechanical system.
Beneficial effects of the embodiments of the present application include: the control policy is optimized in real time using measured data, which improves control efficiency, removes the reliance on human experience, and reduces energy consumption; policy-based control helps find the globally optimal solution of the building electromechanical system, so that optimal control of the system over multiple devices and multiple targets can be achieved.
Brief description of the drawings
The above and other objects, features, and advantages of the present application will become apparent from the following description of its embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flow diagram of the control method for a building electromechanical system provided by an embodiment of the present application;
Fig. 2 is a flow diagram of iteratively updating the value function and its policy in an embodiment of the present application;
Fig. 3 is a flow diagram of the control method for a building electromechanical system provided by an embodiment of the present application.
Detailed description of embodiments
The application is described below on the basis of embodiments, but is not restricted to them. Some specific details are set forth in the detailed description below; a person skilled in the art can fully understand the application without these details. To avoid obscuring the essence of the application, well-known methods, processes, flows, elements, and circuits are not described in detail.
In addition, a person skilled in the art should understand that the accompanying drawings are provided for the purpose of illustration and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, words such as "comprise" and "include" throughout the specification and the claims should be construed in an inclusive rather than an exclusive or exhaustive sense; that is, in the sense of "including, but not limited to".
In the description of the present application, it should be understood that terms such as "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance. In addition, unless otherwise indicated, "multiple" means two or more.
Embodiments of the present application use a reinforcement learning method to update a policy-based value function: the building electromechanical system continuously learns from measured environmental data, optimizes the control policy, controls the equipment with the learned optimal policy, and finally reaches the set target. Through reinforcement learning, the system iterates the policy-based value function and updates the policy until the policy converges, thereby finding the optimal policy and, for each state, the action with the greatest value under that policy. This not only improves control efficiency and reduces energy consumption, but also removes the reliance on human experience, saving substantial labor; the approach is also highly extensible and reproducible, and can be applied to other building electromechanical systems.
Fig. 1 shows the control method for a building electromechanical system provided by an embodiment of the present application. It is applicable to a terminal device, which can be a computer, a console, a server, or the like; the method comprises the following steps.
S10: obtain sensor data and determine the current state according to preset target data.
The data collected by the sensors can include the ambient conditions inside the building, the power supply and water supply conditions, the operation of pipeline equipment, and the like. A target value to be reached can be preset for each item of data, so that the state inside the building can be managed and controlled. The current state can be determined according to the difference between the sensor data and the preset target values.
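For illustration, the step above can be sketched in Python as deriving a discrete state from the difference between sensor readings and preset targets. The function names, the bucketing step, and the example readings are assumptions for illustration, not details from the patent.

```python
# Illustrative sketch (not from the patent): derive a discrete state from
# the difference between sensor data and preset target values.

def bucket(diff, step=1.0):
    """Map a continuous deviation from its target into an integer bucket."""
    return int(round(diff / step))

def current_state(sensor_data, targets, step=1.0):
    """State = tuple of bucketed (measured - target) deviations, key-sorted."""
    return tuple(bucket(sensor_data[k] - targets[k], step) for k in sorted(targets))

state = current_state({"temp_c": 23.4, "humidity": 52.0},
                      {"temp_c": 22.0, "humidity": 50.0})
```

Bucketing keeps the state space finite, which matches the enumerated state space described below.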
S11: predict, according to the policy-based value function, the value of performing the corresponding action under the current state according to the current policy, and iteratively update the value function and its policy according to a preset algorithm, until the updated policy is identical to the policy before the update.
A policy is a set of actions, one for each state: it contains the correspondence between every state that may occur and the action to be performed next. If a state contains multiple data variables, all states that may occur can be determined by enumerating all combinations of those variables; likewise, each corresponding action may contain multiple controlled variables.
The value function reflects the correspondence between states, actions, and values, and defines the state space and the action space. If a state contains multiple data variables, the whole state space is defined by enumerating all combinations of those variables; if an action contains multiple controlled quantities, the whole action space is defined by enumerating all combinations of those controlled quantities. Value refers to the benefit of performing each action in each state: the greater the value, the better the effect of performing that action in that state, which helps approach the preset control target faster. The value function can be a Q-value matrix or an approximating function.
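As a concrete, hypothetical illustration of the Q-value-matrix form, the value function can be held as a table over the enumerated state and action spaces; the grids and initial values below are assumptions, not values from the patent.

```python
# Illustrative sketch: a Q-value matrix over enumerated state and action
# spaces, randomly initialized, plus a fixed initial action per state.
import random

random.seed(0)
states = [(s,) for s in range(-2, 3)]        # e.g. bucketed temperature deviation
actions = [(u,) for u in (0.0, 0.5, 1.0)]    # e.g. one valve-opening controlled quantity

Q = {(x, u): random.uniform(-1.0, 1.0) for x in states for u in actions}
policy = {x: actions[0] for x in states}     # initial policy: one action per state
```

With multiple data variables or controlled quantities, the tuples simply grow, which is the enumeration of combinations described above.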
When the policy is initialized, an action to be performed can be configured for every state. When the value function is initialized, the value of performing each action in each state can be assigned a random value. In addition, a reward function needs to be initialized: according to the preset target values of the building's indicator variables (for example, environmental indicators, power supply indicators, water supply indicators), the distance between the current value of each indicator and its target value is computed and negated as the reward for the corresponding state:
r(y) = -(y1 - y10)² - (y2 - y20)² - (y3 - y30)² - ...; where r(y) is the reward, y1, y2, y3, ... are the current values of the indicator variables, and y10, y20, y30, ... are their target values.
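The reward above is the negated squared distance between current indicator values and their targets; a direct Python transcription follows (the indicator names are illustrative assumptions):

```python
def reward(current, target):
    """r(y) = -(y1 - y10)^2 - (y2 - y20)^2 - ...: negated squared distance
    between each indicator's current value and its preset target value."""
    return -sum((current[k] - target[k]) ** 2 for k in target)

r = reward({"temp_c": 23.0, "co2_ppm": 600.0},
           {"temp_c": 22.0, "co2_ppm": 550.0})   # -(1.0)**2 - (50.0)**2
```

The reward is 0 exactly at the targets and grows more negative the further any indicator drifts, so maximizing value drives all indicators toward their targets at once.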
After the current state is determined, the identical or closest state is matched in the policy, so as to determine the action corresponding to the current state, and the value of performing that action under the current state is then determined according to the value function. The value function is then updated according to the preset algorithm, the action with the greatest value under the current state is determined according to the updated value function, and the action with the greatest value is updated into the current policy, bound to the current state.
As shown in Fig. 2, S11 can further comprise the following steps:
S110: update the value function according to the preset algorithm.
S111: determine, according to the updated value function, the action with the greatest value under the current state, and update the action with the greatest value into the current policy of the value function.
S112: judge whether the updated policy is identical to the policy before the update. If they are identical, perform S113; if they differ, return to S110.
S113: stop the iteration and take the updated policy as the current optimal policy of the value function.
If the action with the greatest value is identical to the action originally corresponding to the current state in the policy, the policy is not substantially changed by the update; if it differs from the action originally corresponding to the current state, the policy has changed.
If the policy has changed, the value function continues to be updated according to the preset algorithm, the action with the greatest value under the current state is re-determined according to the updated value function, and the policy is updated again, until the updated policy is identical to the policy before the update, i.e. the action corresponding to the current state in the policy no longer changes; at this point the optimal policy for the current state is considered to have been found.
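The S110-S113 loop is a form of policy iteration; a minimal sketch follows, with a generic `update_q` callable standing in for the preset algorithm of S110 (all names and the iteration cap are illustrative assumptions):

```python
def policy_iteration(states, actions, update_q, max_iter=100):
    """Alternate value-function update (S110) and greedy policy improvement
    (S111) until the policy stops changing (S112/S113)."""
    policy = {x: actions[0] for x in states}            # initial policy
    Q = {(x, u): 0.0 for x in states for u in actions}  # initial value function
    for _ in range(max_iter):
        Q = update_q(Q, policy)                         # S110: update values
        new_policy = {x: max(actions, key=lambda u: Q[(x, u)])
                      for x in states}                  # S111: greatest-value action
        if new_policy == policy:                        # S112: policy unchanged?
            break                                       # S113: stop iterating
        policy = new_policy
    return policy, Q
```

Convergence is detected purely by comparing the policy before and after the update, exactly as in S112.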
In one embodiment, the policy-based value function Q_{h_l} can be updated based on the Bellman equation:
Q_{h_l}(x, u) = r(x, u) + γ·Q_{h_l}(f(x, u), h_l(f(x, u))); where l is the iteration number, Q_{h_l}(x, u) is the Q value obtained by performing action u in state x according to policy h_l, r(x, u) is the reward obtained by performing action u in state x, γ is the discount factor, and f(x, u) is the transition equation giving the next state reached by performing action u in state x.
The action corresponding to the maximum Q value is then found according to the updated value function and updated into the policy, i.e. h_{l+1}(x) ∈ argmax_u Q_{h_l}(x, u). When the action corresponding to the current state no longer changes under the policy, i.e. h_{l+1} = h_l, the iteration stops; otherwise the value function Q_{h_l} and its policy continue to be iterated until h_{l+1} = h_l.
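The Bellman update and the greedy improvement step can be sketched as follows for a deterministic transition model f; this is an illustrative reading of the equations above (repeated sweeps to a fixed point), not code from the patent:

```python
def evaluate_policy(Q, policy, r, f, gamma=0.9, sweeps=200):
    """Iterate Q(x,u) = r(x,u) + gamma * Q(f(x,u), h(f(x,u))) to a fixed point."""
    for _ in range(sweeps):
        Q = {(x, u): r(x, u) + gamma * Q[(f(x, u), policy[f(x, u)])]
             for (x, u) in Q}
    return Q

def improve_policy(Q, states, actions):
    """h_{l+1}(x) in argmax_u Q_{h_l}(x, u)."""
    return {x: max(actions, key=lambda u: Q[(x, u)]) for x in states}
```

Each sweep contracts the error by the discount factor γ, so enough sweeps bring Q arbitrarily close to the fixed point of the Bellman equation for the current policy.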
S12: determine, according to the updated policy, the action corresponding to the current state and perform it.
In this embodiment, the control policy is optimized in real time using measured data, which improves control efficiency, removes the reliance on human experience, and reduces energy consumption. Policy-based control helps find the globally optimal solution of the building electromechanical system, so that optimal control of a complex system over multiple devices and multiple targets can be achieved.
In one embodiment, besides being pre-configured, the initial policy can also be obtained by training a neural network with accumulated historical data. The states accumulated over a preset duration and the actions performed in them can be used as training data; alternatively, training of the pre-built neural network starts once the accumulated data reaches a predetermined amount, and continues until the error between the action predicted by the network and the action actually performed in the accumulated training data falls below a preset threshold. The neural network is divided into an input layer, a hidden layer, and an output layer; its input is the state and its output is the predicted action. The hidden layer is configured with 10 hidden nodes, and this embodiment preferably uses the rectified linear unit (ReLU) activation function, whose expression is f(x) = max(0, x). The ReLU activation function has two advantages. First, its gradient does not saturate: the gradient is 1{x > 0}, which alleviates the vanishing-gradient problem during back-propagation. Second, it is fast to compute: during forward propagation, the sigmoid and tanh activation functions require computing an exponential when evaluating the activation, whereas the ReLU function only needs a threshold (if x < 0 then f(x) = 0; if x > 0 then f(x) = x), which speeds up forward propagation.
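A small NumPy sketch of the described network shape (state in, predicted action out, 10 hidden ReLU nodes) follows; the input/output dimensions and the random weights are illustrative assumptions, since the patent does not fix them:

```python
import numpy as np

def relu(x):
    """f(x) = max(0, x); its gradient is 1{x > 0}, so it does not saturate."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Subgradient of ReLU: 1 where x > 0, 0 elsewhere."""
    return (x > 0).astype(float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(10, 3)), np.zeros(10)   # 10 hidden nodes, 3 state variables
W2, b2 = rng.normal(size=(2, 10)), np.zeros(2)    # 2 controlled quantities out

def predict_action(state):
    """Forward pass: state -> hidden ReLU layer -> predicted action."""
    return W2 @ relu(W1 @ state + b1) + b2
```

Only the thresholding in `relu` is needed per activation, which is the forward-propagation speed advantage the description notes over sigmoid and tanh.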
In addition, when the initial policy is obtained by training a neural network, if the process of iteratively updating the value function and its policy described in the above embodiment still fails to reach the preset target values within a preset duration (for example, 30 minutes), the neural network can be further trained with the states and their action data accumulated during that duration. As shown in Fig. 3, the method further comprises:
S13: after the preset duration has elapsed, judge, according to the data obtained from the sensors, whether the preset target state has been reached. When the preset target state has not been reached, perform step S14.
S14: continue training the neural network with the states and their action data accumulated during the preset duration. After training yields an updated and more timely initial control policy, return to step S10 and continue controlling the building electromechanical system according to the new initial policy, so as to reach the preset target state as soon as possible.
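Putting S10, S12, S13, and S14 together, the outer loop might look like the following sketch; the helper callables and the step-count stand-in for the preset duration are assumptions for illustration:

```python
def control_loop(get_state, at_target, policy, execute, retrain, max_steps=100):
    """S10: read state; S12: act per policy; S13: check the target; S14:
    retrain on the accumulated (state, action) data if the target is missed."""
    history = []
    for _ in range(max_steps):
        state = get_state()                 # S10: current state from sensors
        if at_target(state):                # S13: preset target state reached
            return history
        action = policy[state]              # S12: action under the current policy
        execute(action)
        history.append((state, action))
    retrain(history)                        # S14: continue training the network
    return history
```

The accumulated `history` is exactly the state/action data S14 says should feed further training when the target is not reached in time.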
In addition, in embodiments of the present application, the terminal device can implement each of the above functional steps by a hardware processor. The terminal device includes: a processor, and a memory for storing processor-executable instructions; wherein the processor is configured to: obtain sensor data and determine the current state according to preset target data; predict, according to the policy-based value function, the value of performing the corresponding action under the current state according to the current policy, and iteratively update the value function and its policy according to a preset algorithm, until the updated policy is identical to the policy before the update; and determine, according to the updated policy, the action corresponding to the current state and perform it.
In one embodiment, iteratively updating the value function and its policy according to the preset algorithm until the updated policy is identical to the policy before the update includes:
updating the value function according to the preset algorithm; determining, according to the updated value function, the action with the greatest value under the current state, and updating the action with the greatest value into the current policy of the value function;
judging whether the updated policy is identical to the policy before the update; when they differ, returning to the above step of iteratively updating the value function and its policy; when they are identical, stopping the iteration and taking the updated policy as the current optimal policy of the value function.
In one embodiment, updating the value function according to the preset algorithm includes:
updating the policy-based value function Q_{h_l} based on the Bellman equation Q_{h_l}(x, u) = r(x, u) + γ·Q_{h_l}(f(x, u), h_l(f(x, u))), where l is the iteration number, Q_{h_l}(x, u) is the Q value obtained by performing action u in state x according to policy h_l, r(x, u) is the reward obtained by performing action u in state x, γ is the discount factor, and f(x, u) is the transition equation giving the next state reached by performing action u in state x.
In one embodiment, the processor is further configured to: train a neural network with accumulated historical states and their action data to obtain the policy, the input of the neural network being a state and the output being an action.
In one embodiment, the activation function of the neural network is the ReLU function.
In one embodiment, the processor is further configured to: if the data obtained from the sensors after a preset duration has not reached the preset target data, continue training the neural network with the states and their action data accumulated during the preset duration.
A person skilled in the art will understand that embodiments of the present application may be provided as a method, an apparatus (device), or a computer program product. Accordingly, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The application is described with reference to flowcharts and/or block diagrams of the method, apparatus (device), and computer program product according to embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of the present application and are not intended to limit it; for a person skilled in the art, the application may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the application shall be included within its scope of protection.

Claims (8)

1. A control method for a building electromechanical system, characterized in that the method comprises:
obtaining sensor data and determining a current state according to preset target data;
predicting, according to a policy-based value function, the value of performing a corresponding action under the current state according to a current policy, and iteratively updating the value function and its policy according to a preset algorithm, until the updated policy is identical to the policy before the update;
determining, according to the updated policy, the action corresponding to the current state and performing it.
2. The method according to claim 1, characterized in that iteratively updating the value function and its policy according to the preset algorithm until the updated policy is identical to the policy before the update comprises:
updating the value function according to the preset algorithm;
determining, according to the updated value function, the action with the greatest value under the current state, and updating the action with the greatest value into the current policy of the value function;
judging whether the updated policy is identical to the policy before the update;
when the updated policy differs from the policy before the update, returning to the above step of iteratively updating the value function and its policy;
when the updated policy is identical to the policy before the update, stopping the iteration and taking the updated policy as the current optimal policy of the value function.
3. The method according to claim 2, characterized in that updating the value function according to the preset algorithm comprises:
updating the policy-based value function Q_{h_l} based on the Bellman equation Q_{h_l}(x, u) = r(x, u) + γ·Q_{h_l}(f(x, u), h_l(f(x, u))), where l is the iteration number, Q_{h_l}(x, u) is the Q value obtained by performing action u in state x according to policy h_l, r(x, u) is the reward obtained by performing action u in state x, γ is the discount factor, and f(x, u) is the transition equation giving the next state reached by performing action u in state x.
4. The method according to claim 1, characterized in that the method further comprises:
training a neural network with accumulated historical states and their action data to obtain the policy, the input of the neural network being a state and the output being an action.
5. The method according to claim 4, characterized in that the activation function of the neural network is a ReLU function.
6. The method according to claim 4, characterized in that the method further comprises:
if the data obtained from the sensors after a preset duration has not reached the preset target data, continuing to train the neural network with the states and their action data accumulated during the preset duration.
7. A terminal device, characterized by comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the control method for a building electromechanical system according to any one of claims 1 to 6.
8. A computer-readable storage medium having computer instructions stored thereon, characterized in that the instructions, when executed by a processor, implement the steps of the control method for a building electromechanical system according to any one of claims 1 to 6.
CN201710592114.3A 2017-07-19 2017-07-19 Control method of building electromechanical system, storage medium and terminal equipment Active CN107315572B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710592114.3A CN107315572B (en) 2017-07-19 2017-07-19 Control method of building electromechanical system, storage medium and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710592114.3A CN107315572B (en) 2017-07-19 2017-07-19 Control method of building electromechanical system, storage medium and terminal equipment

Publications (2)

Publication Number Publication Date
CN107315572A true CN107315572A (en) 2017-11-03
CN107315572B CN107315572B (en) 2020-08-11

Family

ID=60178838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710592114.3A Active CN107315572B (en) 2017-07-19 2017-07-19 Control method of building electromechanical system, storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN107315572B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111505944A (en) * 2019-01-30 2020-08-07 珠海格力电器股份有限公司 Energy-saving control strategy learning method, and method and device for realizing air conditioning energy control
CN117970819A (en) * 2024-04-01 2024-05-03 北京邮电大学 Optimal control method and system for nonlinear electromechanical system under state constraint

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982344A (en) * 2012-11-12 2013-03-20 浙江大学 Support vector machine sorting method based on simultaneously blending multi-view features and multi-label information
CN105652754A (en) * 2016-03-18 2016-06-08 江苏联宏自动化系统工程有限公司 Comprehensive electricity consumption measurement and control management terminal
CN105959353A (en) * 2016-04-22 2016-09-21 广东石油化工学院 Cloud operation access control method based on average reinforcement learning and Gaussian process regression
CN106125595A (en) * 2016-06-22 2016-11-16 北京小米移动软件有限公司 Control the method and device of terminal applies
CN106228314A (en) * 2016-08-11 2016-12-14 电子科技大学 The workflow schedule method of study is strengthened based on the degree of depth
CN106842925A (en) * 2017-01-20 2017-06-13 清华大学 A kind of locomotive smart steering method and system based on deeply study

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LEEMON BAIRD: "Residual Algorithms: Reinforcement Learning with Function Approximation", 《PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING》 *
SUI XIANCHAO: "Research on voltage and reactive power control methods for power systems", 《China Master's Theses Full-text Database, Engineering Science and Technology II》 *

Also Published As

Publication number Publication date
CN107315572B (en) 2020-08-11

Similar Documents

Publication Publication Date Title
US20150206050A1 (en) Configuring neural network for low spiking rate
CN105637540A (en) Methods and apparatus for reinforcement learning
WO2017091629A1 (en) Reinforcement learning using confidence scores
KR102596158B1 Reinforcement learning through dual actor-critic algorithm
CN106709565A (en) Optimization method and device for neural network
TW201602807A (en) COLD neuron spike timing back propagation
US20050273296A1 (en) Neural network model for electric submersible pump system
CN110781969B (en) Air conditioner air volume control method, device and medium based on deep reinforcement learning
CN108133085B (en) Method and system for predicting equipment temperature in electronic equipment cabin
KR20160062052A (en) Automated method for modifying neural dynamics
KR20160145636A (en) Modulating plasticity by global scalar values in a spiking neural network
TWI550530B (en) Method, apparatus, computer readable medium, and computer program product for generating compact representations of spike timing-dependent plasticity curves
TW201602923A (en) Probabilistic representation of large sequences using spiking neural network
CN105335375B Topic crawling method and apparatus
CN107315572A Control method of building electromechanical system, storage medium and terminal equipment
CN116627027A Optimal robust control method based on improved PID
JP6902487B2 (en) Machine learning system
Mousavi et al. Applying Q(λ)-learning in deep reinforcement learning to play Atari games
CN116050505A (en) Partner network-based intelligent agent deep reinforcement learning method
CN107367929A Update method of Q-value matrix, storage medium and terminal equipment
WO2020121494A1 (en) Arithmetic device, action determination method, and non-transitory computer-readable medium storing control program
CN115906673B (en) Combat entity behavior model integrated modeling method and system
GB2595833A (en) System and method for applying artificial intelligence techniques to reservoir fluid geodynamics
CN107315573A Control method of building electromechanical system, storage medium and terminal equipment
US9342782B2 (en) Stochastic delay plasticity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant