CN107315572A - Control method for a building electromechanical system, storage medium, and terminal device - Google Patents
Control method for a building electromechanical system, storage medium, and terminal device Download PDF
- Publication number
- CN107315572A (application CN201710592114.3A)
- Authority
- CN
- China
- Prior art keywords
- policy
- update
- state
- value function
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B19/00—Programme-control systems
- G05B19/02—Programme-control systems electric
- G05B19/04—Programme control other than numerical control, i.e. in sequence controllers or logic controllers
- G05B19/042—Programme control other than numerical control, i.e. in sequence controllers or logic controllers using digital processors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Feedback Control In General (AREA)
Abstract
This application provides a control method, storage medium, and terminal device for a building electromechanical system. The method includes: obtaining sensor data and determining a current state from preset target data; predicting, with a policy-based value function, the value of performing the corresponding action under the current policy in the current state, and iteratively updating the value function and its policy according to a preset algorithm until the updated policy is identical to the policy before the update; and determining and executing the action corresponding to the current state according to the updated policy. The method improves control efficiency, removes the dependence on human experience, and achieves low-loss, energy-saving operation.
Description
Technical field
This application relates to the field of control technology for building electromechanical systems, and in particular to a control method, storage medium, and terminal device for a building electromechanical system.
Background technology
Building electromechanical equipment is an indispensable component of buildings, including industrial, civil, and utility buildings, and covers plumbing, electrical supply, heating, ventilation, fire protection, communications, automatic control, and so on.
Modern building electromechanical systems are generally controlled by traditional algorithms such as proportional-integral-derivative (PID) control or fuzzy control. These algorithms scale poorly: a large number of parameters must be tuned manually for each specific building or room, or empirical values must be set by experience. The control effect finally achieved is coarse, and energy consumption is high.
The content of the invention
In view of this, embodiments of the present application provide a control method, storage medium, and terminal device for a building electromechanical system, to solve the technical problems in the prior art that automatic control of building electromechanical systems is coarse, its precision is low, and it relies heavily on human experience.
According to one aspect of the embodiments of the present application, a control method for a building electromechanical system is provided. The method includes: obtaining sensor data and determining a current state from preset target data; predicting, with a policy-based value function, the value of performing the corresponding action under the current policy in the current state, and iteratively updating the value function and its policy according to a preset algorithm until the updated policy is identical to the policy before the update; and determining and executing the action corresponding to the current state according to the updated policy.
According to another aspect of the embodiments of the present application, a terminal device is provided, including: a processor; and a memory for storing processor-executable instructions. The processor is configured to: obtain sensor data and determine a current state from preset target data; predict, with a policy-based value function, the value of performing the corresponding action under the current policy in the current state, and iteratively update the value function and its policy according to a preset algorithm until the updated policy is identical to the policy before the update; and determine and execute the action corresponding to the current state according to the updated policy.
According to another aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which computer instructions are stored. When executed by a processor, the instructions implement the steps of the above control method for a building electromechanical system.
The beneficial effects of the embodiments of the present application include: the control policy is optimized in real time using measured data, which improves control efficiency, removes the dependence on human experience, and achieves low-loss, energy-saving operation. Policy-based control also helps to find the global optimum of the building electromechanical system, so that the system can achieve optimal control over multiple devices and multiple targets.
Brief description of the drawings
Through the following description of the embodiments of the present application with reference to the accompanying drawings, the above and other objects, features, and advantages of the application will become apparent. In the drawings:
Fig. 1 is a schematic flowchart of the control method for a building electromechanical system provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of iteratively updating the value function and its policy in an embodiment of the present application;
Fig. 3 is a schematic flowchart of the control method for a building electromechanical system provided by an embodiment of the present application.
Embodiment
The application is described below on the basis of embodiments, but it is not restricted to these embodiments. The following detailed description of the application covers certain specific details; a person skilled in the art can fully understand the application even without these details. To avoid obscuring the essence of the application, well-known methods, procedures, flows, elements, and circuits are not described in detail.
In addition, a person of ordinary skill in the art should understand that the accompanying drawings are provided for purposes of illustration and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the description and the claims, words such as "comprise" and "include" should be construed in an inclusive rather than an exclusive or exhaustive sense; that is, in the sense of "including, but not limited to".
In the description of the present application, it should be understood that the terms "first", "second", and the like are used for description only and should not be understood as indicating or implying relative importance. Furthermore, in the description of the present application, unless otherwise indicated, "a plurality of" means two or more.
Embodiments of the present application use reinforcement learning to update a policy-based value function: the building electromechanical system continuously learns from measured environmental data, optimizes its control policy, and uses the learned optimal policy to control devices until the set target is reached. Through reinforcement learning, the building electromechanical system iterates the policy-based value function and updates the policy until the policy converges, thereby finding the optimal policy and, in each state, the action with the maximum value under that policy. This not only improves control efficiency but also achieves low-loss, energy-saving operation, removes the dependence on human experience, and saves a large amount of manual effort. The approach is highly extensible and reproducible, and can be applied to other building electromechanical systems.
Fig. 1 shows the control method for a building electromechanical system provided by an embodiment of the present application. The method is applicable to a terminal device, which may be a computer, a console, a server, or the like, and comprises the following steps.
S10: obtain sensor data and determine the current state from preset target data.
The data collected by the sensors may include the indoor environmental conditions of the building, the power-supply and water-supply conditions, the operating state of pipeline equipment, and so on. A target value to be reached can be preset for each item of data, so that the indoor state of the building can be regulated. The current state can be determined from the difference between the sensor data and the preset target values.
S11: predict, with a policy-based value function, the value of performing the corresponding action under the current policy in the current state, and iteratively update the value function and its policy according to a preset algorithm until the updated policy is identical to the policy before the update.
A policy is a set of actions, one for each state: a policy records, for every state that may occur, the corresponding action to be executed next. If a state contains several data variables, all states that may occur can be determined by enumerating all combinations of those variables; likewise, each corresponding action may contain several control variables.
The value function reflects the correspondence between states, actions, and values, and defines the state space and the action space. If a state contains several data variables, the whole state space is defined by enumerating all combinations of those variables; if an action contains several control quantities, the whole action space is defined by enumerating all combinations of those control quantities. The value is the benefit of performing each action in each state: the larger the value, the better the effect of performing that action in that state, and the faster the system approaches the preset control target. The value function may be a Q-value matrix or an approximating function.
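As an illustrative sketch (the variable names and discretized value ranges are assumptions, not from the patent), the exhaustive enumeration of data-variable and control-quantity combinations described above, and a randomly initialized Q-value matrix over the resulting spaces, can be written as:

```python
from itertools import product
import random

# Illustrative discretized data variables and control quantities.
temp_levels = ["low", "ok", "high"]
humidity_levels = ["low", "ok", "high"]
fan_speeds = [0, 1, 2]
valve_positions = ["closed", "open"]

# All combinations of the data variables define the state space;
# all combinations of the control quantities define the action space.
states = list(product(temp_levels, humidity_levels))
actions = list(product(fan_speeds, valve_positions))

# Q-value matrix assigned random values, as described for initialization.
random.seed(0)
Q = {(x, u): random.random() for x in states for u in actions}

print(len(states), len(actions), len(Q))  # 9 6 54
```

Because both spaces are finite products of discretized variables, the Q-value matrix has one entry per state-action pair.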
When the policy is initialized, the action to be executed in each state may be configured. When the value function is initialized, the value of performing each action in each state may be assigned at random. A reward function must also be initialized: from the preset target value of each indoor index variable of the building (for example, environmental indices, power-supply indices, water-supply indices), the distance between the current value of each index and its target value is computed and negated to give the return value of the corresponding state:
r(y) = -(y1 - y10)² - (y2 - y20)² - (y3 - y30)² - ...
where r(y) denotes the return value, y1, y2, y3, ... denote the current values of the index variables, and y10, y20, y30, ... denote the target values of the index variables.
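As a minimal sketch (the index values below are made up for illustration), the return value above is simply the negated squared distance between the current index values and their targets:

```python
def reward(current, target):
    """Negated squared distance between each index variable and its target.

    `current` and `target` are equal-length sequences of index values
    (e.g. temperature, humidity); the names are illustrative only.
    """
    return -sum((y - y0) ** 2 for y, y0 in zip(current, target))

# The return is 0 only when every index hits its target, and grows
# more negative the further the measured state is from the targets.
print(reward([22.0, 50.0], [24.0, 45.0]))  # -(2**2) - (5**2) = -29.0
```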
After the current state is determined, the identical or closest state is matched in the policy, which determines the action corresponding to the current state. The value of performing that action in the current state is then determined from the value function. The value function is then updated with the preset algorithm, the action with the maximum value in the current state is determined from the updated value function, and the maximum-value action is written into the current policy, bound to the current state.
As shown in Fig. 2, S11 may further comprise the following steps.
S110: update the value function according to the preset algorithm.
S111: determine, from the updated value function, the action with the maximum value in the current state, and write the maximum-value action into the current policy of the value function.
S112: judge whether the updated policy is identical to the policy before the update. If they are identical, perform S113; if they differ, return to S110.
S113: stop the iteration and take the updated policy as the current optimal policy of the value function.
If the action with the maximum value is identical to the action originally associated with the current state in the policy, the policy is substantially unchanged by the update; if it differs from the original action, the policy has changed.
If the policy has changed, the value function continues to be updated according to the preset algorithm, the maximum-value action in the current state is re-determined from the updated value function, and the policy is updated again, until the updated policy is identical to the policy before the update, i.e., the action corresponding to the current state no longer changes. At that point the optimal policy for the current state is considered to have been found.
In one embodiment, the policy-based value function Q_{h_l} can be updated on the basis of the Bellman equation:
Qh(x, u) = r(x, u) + γQh(f(x, u), h(f(x, u)))
where l denotes the iteration count, Qh(x, u) denotes the Q value obtained by executing action u in state x according to policy h, r(x, u) denotes the return obtained by executing action u in state x, γ denotes the discount factor, and f(x, u) denotes the transition function that maps state x and executed action u to the next state.
The action with the maximum Q value is then found from the updated value function and written into the policy, i.e., h_{l+1}(x) ∈ argmax_u Q_{h_l}(x, u). When the action corresponding to the current state no longer changes under the policy, i.e., h_{l+1} = h_l, the iteration stops; otherwise the value function Q_{h_l} and its policy continue to be iterated until h_{l+1} = h_l.
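The loop of steps S110 to S113, using the Bellman update, can be sketched as tabular policy iteration. This is an illustrative stand-in, not the patent's implementation: the state set, transition function `f`, and reward `r` are hypothetical inputs, and the evaluation step uses a fixed number of Bellman sweeps.

```python
def policy_iteration(states, actions, f, r, gamma=0.9, sweeps=50):
    """Tabular policy iteration matching steps S110-S113.

    f(x, u) -> next state; r(x, u) -> return value. Both are
    illustrative stand-ins for the patent's transition and reward.
    """
    h = {x: actions[0] for x in states}  # arbitrarily initialized policy
    while True:
        # S110: evaluate the current policy via repeated Bellman backups
        # Qh(x, u) = r(x, u) + gamma * Qh(f(x, u), h(f(x, u)))
        Q = {(x, u): 0.0 for x in states for u in actions}
        for _ in range(sweeps):
            for x in states:
                for u in actions:
                    nxt = f(x, u)
                    Q[(x, u)] = r(x, u) + gamma * Q[(nxt, h[nxt])]
        # S111: write the maximum-value action into the policy
        new_h = {x: max(actions, key=lambda u: Q[(x, u)]) for x in states}
        # S112/S113: stop once the policy no longer changes
        if new_h == h:
            return h, Q
        h = new_h
```

For example, with states 0..2, actions -1/+1, a clamped transition f(x, u) = min(max(x + u, 0), 2), and a reward that penalizes distance from the target state 2, the loop converges to the policy that always moves toward the target.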
S12: determine the action corresponding to the current state according to the updated policy, and execute it.
In the present embodiment, the control policy is optimized in real time using measured data, which improves control efficiency, removes the dependence on human experience, and achieves low-loss, energy-saving operation. Policy-based control helps to find the global optimum of the building electromechanical system and enables optimal control of a complex system over multiple devices and multiple targets.
In one embodiment, instead of being pre-configured, the initial policy may be obtained by training a neural network on accumulated historical data. The state and executed-action data accumulated over a preset period, or accumulated up to a preset quantity, are used as training data for a pre-built neural network, which is trained until the error between the actions predicted by the network and the actions actually executed in the accumulated training data falls below a preset threshold. The neural network consists of an input layer, a hidden layer, and an output layer; its input is a state and its output is a predicted action. The hidden layer is configured with 10 hidden nodes, and this embodiment preferably uses the rectified linear unit (ReLU) activation function, whose expression is f(x) = max(0, x). The ReLU activation function has two advantages. First, its gradient does not saturate: the gradient is 1{x > 0}, which alleviates the vanishing-gradient problem during back-propagation. Second, it is fast to compute: during forward propagation, the sigmoid and hyperbolic tangent (tanh) activation functions require computing exponentials, whereas the ReLU function only needs a threshold test, f(x) = 0 if x < 0 and f(x) = x if x > 0, which speeds up forward propagation.
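A minimal sketch of the described network follows: one hidden layer of 10 ReLU nodes mapping a state vector to a predicted action vector. The layer dimensions, weight initialization, and function names are illustrative assumptions; the patent specifies only the layer structure, the 10 hidden nodes, and the ReLU activation.

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x); its gradient is 1 where x > 0 and 0 elsewhere
    return np.maximum(0.0, x)

def init_policy_net(state_dim, action_dim, hidden=10, seed=0):
    """One hidden layer of 10 ReLU nodes, as described in the embodiment.

    Weight shapes and random initialization are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(state_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(hidden, action_dim)),
        "b2": np.zeros(action_dim),
    }

def predict_action(net, state):
    """Forward pass: state vector in, predicted action vector out."""
    hidden = relu(state @ net["W1"] + net["b1"])
    return hidden @ net["W2"] + net["b2"]
```

Training would then minimize the error between `predict_action` outputs and the actions actually executed in the accumulated data, stopping once that error falls below the preset threshold.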
In addition, when the initial policy is obtained from a trained neural network, if the process of iteratively updating the value function and its policy described in the above embodiment still fails to reach the preset target value within a preset period (for example, 30 minutes), the neural network can be further trained with the state and action data accumulated during that period. This process is shown in Fig. 3, and the method further comprises:
S13: judge, from the data obtained from the sensors, whether the preset target state has been reached after the preset period. When the preset target state has not been reached, perform step S14.
S14: continue to train the neural network with the state and action data accumulated during the preset period. After training yields an updated, more timely initial control policy, return to step S10 and continue to control the building electromechanical system according to the new initial policy, so as to reach the preset target state as soon as possible.
In addition, in the embodiments of the present application, the terminal device may implement each of the above functional steps by a hardware processor. The terminal device includes a processor and a memory for storing processor-executable instructions, wherein the processor is configured to: obtain sensor data and determine a current state from preset target data; predict, with a policy-based value function, the value of performing the corresponding action under the current policy in the current state, and iteratively update the value function and its policy according to a preset algorithm until the updated policy is identical to the policy before the update; and determine and execute the action corresponding to the current state according to the updated policy.
In one embodiment, iteratively updating the value function and its policy according to the preset algorithm until the updated policy is identical to the policy before the update includes: updating the value function according to the preset algorithm; determining, from the updated value function, the action with the maximum value in the current state, and writing the maximum-value action into the current policy of the value function; judging whether the updated policy is identical to the policy before the update; when the updated policy differs from the policy before the update, returning to the above step of iteratively updating the value function and its policy; and when the updated policy is identical to the policy before the update, stopping the iteration and taking the updated policy as the current optimal policy of the value function.
In one embodiment, updating the value function according to the preset algorithm includes: updating the policy-based value function Q_{h_l} on the basis of the Bellman equation Qh(x, u) = r(x, u) + γQh(f(x, u), h(f(x, u))), where l denotes the iteration count, Qh(x, u) denotes the Q value obtained by executing action u in state x according to policy h, r(x, u) denotes the return obtained by executing action u in state x, γ denotes the discount factor, and f(x, u) denotes the transition function that maps state x and executed action u to the next state.
In one embodiment, the processor is further configured to: train a neural network with accumulated historical state and action data to obtain the policy, the input of the neural network being a state and the output being an action.
In one embodiment, the activation function of the neural network is the ReLU function.
In one embodiment, the processor is further configured to: if the data obtained from the sensors after a preset period have not reached the preset target data, continue to train the neural network with the state and action data accumulated during the preset period.
A person skilled in the art will understand that the embodiments of the present application may be provided as a method, an apparatus (device), or a computer program product. Accordingly, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical memory) containing computer-usable program code.
The application is described with reference to flowcharts and/or block diagrams of the method, apparatus (device), and computer program product according to the embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data-processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data-processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data-processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data-processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The foregoing describes only the preferred embodiments of the application and does not limit the application. For a person skilled in the art, the application may have various changes and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the application shall be included within the scope of protection of the application.
Claims (8)
1. A control method for a building electromechanical system, characterized in that the method comprises:
obtaining sensor data and determining a current state from preset target data;
predicting, with a policy-based value function, the value of performing the corresponding action under the current policy in the current state, and iteratively updating the value function and its policy according to a preset algorithm until the updated policy is identical to the policy before the update; and
determining and executing the action corresponding to the current state according to the updated policy.
2. The method according to claim 1, characterized in that iteratively updating the value function and its policy according to the preset algorithm until the updated policy is identical to the policy before the update comprises:
updating the value function according to the preset algorithm;
determining, from the updated value function, the action with the maximum value in the current state, and writing the maximum-value action into the current policy of the value function;
judging whether the updated policy is identical to the policy before the update;
when the updated policy differs from the policy before the update, returning to the above step of iteratively updating the value function and its policy; and
when the updated policy is identical to the policy before the update, stopping the iteration and taking the updated policy as the current optimal policy of the value function.
3. The method according to claim 2, characterized in that updating the value function according to the preset algorithm comprises:
updating the policy-based value function Q_{h_l} on the basis of the Bellman equation Qh(x, u) = r(x, u) + γQh(f(x, u), h(f(x, u))), where l denotes the iteration count, Qh(x, u) denotes the Q value obtained by executing action u in state x according to policy h, r(x, u) denotes the return obtained by executing action u in state x, γ denotes the discount factor, and f(x, u) denotes the transition function that maps state x and executed action u to the next state.
4. The method according to claim 1, characterized in that the method further comprises:
training a neural network with accumulated historical state and action data to obtain the policy, the input of the neural network being a state and the output being an action.
5. The method according to claim 4, characterized in that the activation function of the neural network is the ReLU function.
6. The method according to claim 4, characterized in that the method further comprises:
if the data obtained from the sensors after a preset period have not reached the preset target data, continuing to train the neural network with the state and action data accumulated during the preset period.
7. A terminal device, characterized by comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the control method for a building electromechanical system according to any one of claims 1 to 6.
8. A computer-readable storage medium on which computer instructions are stored, characterized in that the instructions, when executed by a processor, implement the steps of the control method for a building electromechanical system according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710592114.3A CN107315572B (en) | 2017-07-19 | 2017-07-19 | Control method of building electromechanical system, storage medium and terminal equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107315572A true CN107315572A (en) | 2017-11-03 |
CN107315572B CN107315572B (en) | 2020-08-11 |
Family
ID=60178838
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710592114.3A Active CN107315572B (en) | 2017-07-19 | 2017-07-19 | Control method of building electromechanical system, storage medium and terminal equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107315572B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111505944A (en) * | 2019-01-30 | 2020-08-07 | 珠海格力电器股份有限公司 | Energy-saving control strategy learning method, and method and device for realizing air conditioning energy control |
CN117970819A (en) * | 2024-04-01 | 2024-05-03 | 北京邮电大学 | Optimal control method and system for nonlinear electromechanical system under state constraint |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102982344A (en) * | 2012-11-12 | 2013-03-20 | 浙江大学 | Support vector machine sorting method based on simultaneously blending multi-view features and multi-label information |
CN105652754A (en) * | 2016-03-18 | 2016-06-08 | 江苏联宏自动化系统工程有限公司 | Comprehensive electricity consumption measurement and control management terminal |
CN105959353A (en) * | 2016-04-22 | 2016-09-21 | 广东石油化工学院 | Cloud operation access control method based on average reinforcement learning and Gaussian process regression |
CN106125595A (en) * | 2016-06-22 | 2016-11-16 | 北京小米移动软件有限公司 | Control the method and device of terminal applies |
CN106228314A (en) * | 2016-08-11 | 2016-12-14 | 电子科技大学 | The workflow schedule method of study is strengthened based on the degree of depth |
CN106842925A (en) * | 2017-01-20 | 2017-06-13 | 清华大学 | A kind of locomotive smart steering method and system based on deeply study |
- 2017-07-19: application CN201710592114.3A filed in CN, granted as CN107315572B (active)
Non-Patent Citations (2)
Title |
---|
LEEMON BAIRD: "Residual Algorithms: Reinforcement Learning with Function Approximation", Proceedings of the Twelfth International Conference on Machine Learning *
SUI Xianchao: "Research on Voltage and Reactive Power Control Methods for Power Systems", China Master's Theses Full-Text Database, Engineering Science and Technology II *
Also Published As
Publication number | Publication date |
---|---|
CN107315572B (en) | 2020-08-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20150206050A1 (en) | Configuring neural network for low spiking rate | |
CN105637540A (en) | Methods and apparatus for reinforcement learning | |
WO2017091629A1 (en) | Reinforcement learning using confidence scores | |
KR102596158B1 (en) | Reinforcement learning through dual actor critical algorithm | |
CN106709565A (en) | Optimization method and device for neural network | |
TW201602807A (en) | COLD neuron spike timing back propagation | |
US20050273296A1 (en) | Neural network model for electric submersible pump system | |
CN110781969B (en) | Air conditioner air volume control method, device and medium based on deep reinforcement learning | |
CN108133085B (en) | Method and system for predicting equipment temperature in electronic equipment cabin | |
KR20160062052A (en) | Automated method for modifying neural dynamics | |
KR20160145636A (en) | Modulating plasticity by global scalar values in a spiking neural network | |
TWI550530B (en) | Method, apparatus, computer readable medium, and computer program product for generating compact representations of spike timing-dependent plasticity curves | |
TW201602923A (en) | Probabilistic representation of large sequences using spiking neural network | |
CN105335375B (en) | Topics Crawling method and apparatus | |
CN107315572A (en) | Build control method, storage medium and the terminal device of Mechatronic Systems | |
CN116627027A (en) | Optimal robustness control method based on improved PID | |
JP6902487B2 (en) | Machine learning system | |
Mousavi et al. | Applying q (λ)-learning in deep reinforcement learning to play atari games | |
CN116050505A (en) | Partner network-based intelligent agent deep reinforcement learning method | |
CN107367929A (en) | Update method, storage medium and the terminal device of Q value matrixs | |
WO2020121494A1 (en) | Arithmetic device, action determination method, and non-transitory computer-readable medium storing control program | |
CN115906673B (en) | Combat entity behavior model integrated modeling method and system | |
GB2595833A (en) | System and method for applying artificial intelligence techniques to reservoir fluid geodynamics | |
CN107315573A (en) | Build control method, storage medium and the terminal device of Mechatronic Systems | |
US9342782B2 (en) | Stochastic delay plasticity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||