CN117692487A - Remote control method and device of equipment, electronic equipment and storage medium - Google Patents

Remote control method and device of equipment, electronic equipment and storage medium

Info

Publication number
CN117692487A
Authority
CN
China
Prior art keywords
output signal
control information
network
determining
sensor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311684441.3A
Other languages
Chinese (zh)
Inventor
王坤柠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China filed Critical Agricultural Bank of China
Priority to CN202311684441.3A priority Critical patent/CN117692487A/en
Publication of CN117692487A publication Critical patent/CN117692487A/en
Pending legal-status Critical Current

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 - Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Feedback Control In General (AREA)

Abstract

The invention discloses a remote control method and device of equipment, an electronic device and a storage medium. The method comprises the following steps: acquiring an output signal output by a sensor; transmitting the output signal through a first network based on an event trigger mechanism; determining an observation value from the output signal based on discrete-time feedback; determining control information from the observation value and the output signal based on a reinforcement learning model; and executing the control information on the object through an actuator, and capturing a new output signal of the object through the sensor. Because the output signal is transmitted through the first network only when an event is triggered, unnecessary transmissions are avoided and the amount of transmitted data is reduced compared with existing instant (real-time) transmission. Compared with the prior art in which control information is transmitted directly, the control information obtained through the reinforcement learning model is more accurate, and the control effect is optimized.

Description

Remote control method and device of equipment, electronic equipment and storage medium
Technical Field
The present invention relates to the field of intelligent control, and in particular, to a remote control method and apparatus for a device, an electronic device, and a storage medium.
Background
With the development of computer network technology and communication technology, the structure of current control systems is gradually becoming modularized and distributed, system layouts are growing larger, the information exchange between system elements is becoming more complex, and information needs to be transmitted through local area networks and the Internet.
Networked control systems are increasingly the focus of research and use by many enterprises and research institutions. A networked control system is a spatially distributed closed-loop control system in which data is transmitted between system nodes through a communication network.
As network control tasks increase, more and more information of the closed-loop system is transmitted through the network. The information communication mode of existing networked control systems is instant (real-time) transmission; however, this mode involves a large data volume and yields a poor control effect. How to reduce the amount of transmitted data and improve the control effect has become a problem to be solved.
Disclosure of Invention
The invention provides a remote control method and device of equipment, an electronic device and a storage medium, which are used to solve the problems of a large data volume in real-time transmission and a poor control effect.
According to an aspect of the present invention, there is provided a remote control method of an apparatus, including:
acquiring an output signal output by a sensor;
transmitting the output signal through a first network based on an event trigger mechanism;
determining an observation value from the output signal based on discrete time feedback;
determining control information based on the reinforcement learning model from the observations and the output signals;
the control information is executed in the object by an actuator, and a new output signal of the object is captured by a sensor.
According to another aspect of the present invention, there is provided a remote control apparatus of a device, comprising:
the acquisition module is used for acquiring an output signal output by the sensor;
the event trigger mechanism module is used for transmitting the output signal through a first network based on an event trigger mechanism;
the state observation module is used for determining an observation value according to the output signal based on discrete time feedback;
the control information determining module is used for determining control information according to the observed value and the output signal based on the reinforcement learning model;
and the execution module is used for executing the control information on the object through an actuator and capturing a new output signal of the object through a sensor.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform a method of remote control of the apparatus according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a remote control method of an apparatus according to any of the embodiments of the present invention.
According to this technical scheme, an output signal output by the sensor is acquired; the output signal is transmitted through a first network based on an event trigger mechanism; an observation value is determined from the output signal based on discrete-time feedback; control information is determined from the observation value and the output signal based on a reinforcement learning model; the control information is executed on the object by an actuator, and a new output signal of the object is captured by the sensor. Because the output signal is transmitted through the first network only when an event is triggered, unnecessary transmissions are avoided and the amount of transmitted data is reduced compared with existing instant (real-time) transmission. Based on the discrete-time feedback, the observation value is determined, i.e. predicted, from the output signal, and control information is then determined from the observation value and the output signal based on the reinforcement learning model. Compared with the prior art in which control information is transmitted directly, the control information obtained through the reinforcement learning model is more accurate, and the control effect is optimized.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a remote control method of a device according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a remote control system of a device according to a first embodiment of the present invention;
fig. 3 is a schematic structural diagram of a remote control device of an apparatus according to a second embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device implementing a remote control method of a device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of users comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
Example 1
Fig. 1 is a schematic flow chart of a remote control method of a device according to an embodiment of the present invention, where the method may be performed by a remote control apparatus of a device, and the remote control apparatus of the device may be implemented in hardware and/or software, and the remote control apparatus of the device may be configured in an electronic device. As shown in fig. 1, the method includes:
s110, obtaining an output signal output by the sensor.
Fig. 2 is a schematic diagram of a remote control system of a device according to an embodiment of the present invention, including an actuator, an object (plant), a sensor, a first network, a state observer and an adaptive reinforcement learning controller. The actuator controls the object to operate according to the input data u(k), and the sensor measures the monitored quantity on the object to obtain an output signal y(k). Whether the output signal is transmitted through the first network is decided by the event trigger mechanism. If transmission is decided, the signal is transmitted through the first network; if no transmission is decided, the output signal of the current period is regenerated by a zero-order holder on the other side of the first network. This event-triggered process reduces the amount of data transmitted over the first network. The state observer determines an observation value from the output signal based on discrete-time feedback; by comparing the output signals and the observation values at discrete time points, the observation value used in the current period is obtained and the observation efficiency is improved. The adaptive reinforcement learning controller determines control information from the observation value and the output signal based on a reinforcement learning model, generating an accurate input signal through the reinforcement learning model and optimizing the control effect.
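For illustration, the following is a minimal Python sketch of the closed-loop cycle of Fig. 2, under stated assumptions: the scalar plant, the sensor, the fixed trigger threshold, the observer gain and the fixed feedback gain standing in for the reinforcement learning controller are all illustrative placeholders and are not taken from the patent.

```python
# Minimal closed-loop sketch of Fig. 2 (all dynamics, gains and thresholds are assumed).
def plant_step(x, u):          # object: toy scalar discrete-time plant
    return 0.9 * x + 0.1 * u

def sensor(x):                 # sensor: direct measurement of the state
    return x

x, u = 1.0, 0.0                # plant state and control input u(k)
y_held = sensor(x)             # output held by the zero-order holder (ZOH)
x_hat, delta = 0.0, 0.05       # observer estimate and static trigger threshold

for k in range(50):
    y = sensor(x)                           # S110: acquire the sensor output y(k)
    if abs(y - y_held) > delta:             # S120: event trigger fires
        y_held = y                          #       transmit over the first network, update ZOH
    x_hat += 0.5 * (y_held - x_hat)         # S130: discrete-time feedback observer (placeholder gain)
    u = -1.2 * x_hat                        # S140: controller (placeholder for the RL controller)
    x = plant_step(x, u)                    # S150: actuator applies u(k) to the object
print(f"final state after 50 steps: {x:.4f}")
```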
S120, transmitting the output signal through a first network based on an event triggering mechanism.
Optionally, transmitting the output signal through the first network based on an event trigger mechanism (Event-Triggered Mechanism, ETM) may be implemented as follows (a sketch follows this list):
in response to an event trigger check, judging whether the error between the output signal and the previously transmitted output signal exceeds an error range;
if the error exceeds the error range, transmitting the output signal through the first network;
if the error does not exceed the error range, maintaining the previous output signal through the zero-order holder.
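A minimal sketch of this decision is given below, assuming a static error bound (the dynamic condition (6-12) described later refines the simple threshold used here); the function name and signature are illustrative only.

```python
def event_triggered_transmit(y, y_last_sent, error_bound):
    """Decide whether to send y(k) over the first network or keep the ZOH value.

    Returns (value seen by the state observer, whether a transmission occurred).
    error_bound is an assumed static threshold, not the patent's condition."""
    if abs(y - y_last_sent) > error_bound:   # error exceeds the allowed range
        return y, True                        # transmit y(k) through the first network
    return y_last_sent, False                 # zero-order holder keeps the previous output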
For example, define the trigger times as $k_0 = 0 < k_1 < k_2 < \dots$ and the corresponding trigger sequence $\{k_s\}$. As shown in Fig. 2, the sensor and the state observer communicate via the first network. During a trigger interval $k = k_s+1, k_s+2, \dots, k_{s+1}-1$, a zero-order holder (ZOH) placed before the state observer holds the information transmitted over the network at the last trigger time, so the signal at the input of the state observer remains unchanged during the trigger interval. At a trigger time, the sensor information is transmitted via the communication network to the state observer and the zero-order holder is updated at the same time. In this scheme the initial time is by default a trigger time.
The following trigger error is defined: $y_e(k) = y(k) - y(k_s)$ (6-1)
where $y_e(k)$ is the trigger error of the system output. Based on the above analysis, the trigger error can be expressed in a further equivalent form.
in the embodiment of the invention, two assumptions are given, and a state observer and a model in the adaptive reinforcement learning controller are constructed based on the assumptions.
Suppose 1: unknown functions in system modelsMeeting Lipschitz conditions, i.e. there is a Lipohshing constant L i > 0 (i=1, 2) such that |f i (x 1 (k))-f i (x 2 (k))|≤L i ||x 1 (k)-x 2 (k)||。
Suppose 2: the neural network basis function satisfies Lipschitz conditions, i.e. there is a Lipohshing constant L s > 0, such that S (x 1 (k))-S(x 2 (k))||≤L s ||x 1 (k)-x 2 (k)||。
S130, based on discrete time feedback, determining an observation value according to the output signal.
Alternatively, based on discrete-time feedback, determining an observed value from the output signal may be implemented as:
constructing a first neural network model of errors and observations based on discrete-time feedback;
and determining an observed value according to the output signal and the first neural network model.
The dynamic event-triggered state observer in the embodiment of the invention is constructed as Eq. (6-3), where $\hat{x}_i(k)$ is the estimate of the state $x_i(k)$, the unknown dynamics of the system are approximated by a neural network with basis function $S_i(\cdot)$, estimated weight $\hat{\theta}_i(k)$ and ideal weight $\theta_i$, $\tilde{\theta}_i(k)$ denotes the weight error, $\varepsilon_i(k)$ the approximation error, and $k_i$ the observer gain, $i = 1, 2, \dots, n$.
Notably, at a trigger time $y(k_s) = y(k)$, so the latest information $y(k)$ can be used in the state observer to compute the state information of the next time instant; during a trigger interval, the state observer has to use $y(k_s) = y(k) - y_e(k)$ held in the ZOH to compute the state information of the next time instant.
Define the observation error $e(k) = [e_1(k), e_2(k), \dots, e_n(k)]^T$, where $e_i(k)$ is the estimation error of the $i$-th state, $i = 1, 2, \dots, n$. From the system model (2-1) and the state observer (6-3), the observation error model is obtained as follows:
where $K = [k_1, \dots, k_n]^T$, $\varepsilon(k) = [\varepsilon_1(k), \varepsilon_2(k), \dots, \varepsilon_n(k)]^T$ and $G' = \mathrm{diag}\{g_1, g_2, \dots, g_{n-1}\}$.
Because of the effect of the ETM, the $y(k)$ appearing in the observation error cannot always be acquired in time. The available observation error is therefore defined in terms of the output $y(k_s)$ held by the ZOH, and the state observer neural network update law is designed on this basis.
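Since Eqs. (6-3) and (6-5) are not reproduced in this text, the following is only a structural sketch of a neural-network state observer driven by the ETM-held output: the Gaussian basis, the gradient-like weight update and the gain values are all assumptions, not the patent's formulas.

```python
import numpy as np

def observer_step(x_hat, theta_hat, u, y_held, K, gamma, centers, width):
    """One illustrative step of a neural-network state observer.

    x_hat: state estimate (n,); theta_hat: NN weight estimates (m,);
    y_held: output available after the ETM / zero-order holder;
    K: output-injection gains (n,); gamma: weight-update step size;
    centers (m, n), width: parameters of an assumed Gaussian basis S(x_hat)."""
    e1_avail = y_held - x_hat[0]                                  # available observation error
    S = np.exp(-np.sum((x_hat - centers) ** 2, axis=1) / width)   # NN basis S(x_hat)
    f_hat = theta_hat @ S                                         # NN estimate of the unknown dynamics
    x_next = np.roll(x_hat, -1)                                   # chained (strict-feedback-like) prediction
    x_next[-1] = f_hat + u
    x_next = x_next + K * e1_avail                                # correction by the available output error
    theta_next = theta_hat + gamma * S * e1_avail                 # gradient-like neural-network update law
    return x_next, theta_next

# example call with illustrative dimensions
x_hat, theta_hat = np.zeros(2), np.zeros(5)
centers = np.random.default_rng(0).normal(size=(5, 2))
x_hat, theta_hat = observer_step(x_hat, theta_hat, u=0.1, y_held=0.3,
                                 K=np.array([0.4, 0.2]), gamma=0.05,
                                 centers=centers, width=1.0)
```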
The Lyapunov function is designed as:
$V_o(k) = V_e(k) + \mu V_\theta(k)$ (6-6)
where $V_e(k) = e^T(k) P e(k)$, $V_\theta(k)$ is a quadratic form of the observer neural network weight errors, and $\mu$ is a positive constant.
Taking the first-order difference of the above expression yields:
From Assumption 1, a bound on the unknown dynamics can be obtained; substituting it into the above formula gives:
where $Q = P - 5\|P\|L_f - 5A^T P A$ and $\kappa_1 = 5\|P\|\,\|\varepsilon\|^2$.
Computing the difference of $V_\theta(k)$, the following can be obtained:
where $\beta_i$ are design parameters, $\lambda_{2i} = \sigma_i(1 - 2\sigma_i)$ and $\kappa_{2i} = \sigma_i\|\theta_i\|^2 + (0.5 + \beta_i)\varepsilon_i^2$, with the parameters satisfying $1 - 2\sigma_i > 0$, $i = 1, 2, \dots, n$. In conclusion, the difference of $V_o(k)$ is obtained as:
thus, the Lyapunov function of the observation error and the observer neural network weight error is calculated. Note that the state observer lyapunov function also includes a trigger error term 5K caused by ETM T Ky e 2 (k) A. The invention relates to a method for producing a fibre-reinforced plastic composite It is next necessary to design a suitable ETM according to the characteristics of equations 6-10 to ensure good observability and control performance while reducing network load.
Define a triggering dynamic variable $h(k)$ and design its update law as follows:
$h(k+1) = q_1 h(k) + \Gamma|e_1(k)|^2 - 5K^T K|y_e(k)|^2$ (6-11)
where $h(0) > 0$ and $q_1$ is a design parameter satisfying $0 < q_1 < 1$.
Based on this, the dynamic event-triggering condition (6-12) is designed, where $q_2$ is a design parameter. During the trigger interval, condition (6-12) holds; when it ceases to hold, a trigger occurs.
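A sketch of the dynamic mechanism is given below, assuming a stand-in inequality for condition (6-12), which is not reproduced in this text; only the $h(k)$ update follows Eq. (6-11), and the parameter names mirror the ones above.

```python
def dynamic_etm_step(h, e1, y_e, q1, q2, Gamma, KtK):
    """Update the triggering dynamic variable h(k) per Eq. (6-11) and evaluate
    an assumed dynamic trigger test (the exact condition (6-12) is not
    reproduced in the text; the inequality below is a typical stand-in).

    Returns (h(k+1), triggered)."""
    h_next = q1 * h + Gamma * abs(e1) ** 2 - 5.0 * KtK * abs(y_e) ** 2   # Eq. (6-11)
    # assumed condition: trigger once the weighted trigger error outgrows the
    # allowance Gamma*|e1(k)|^2 plus the share q2*h(k) of the dynamic variable
    triggered = 5.0 * KtK * abs(y_e) ** 2 > Gamma * abs(e1) ** 2 + q2 * h
    return h_next, triggered
```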
Defining $V_h(k) = h(k)$, the Lyapunov function is designed as:
$V_1(k) = V_o(k) + V_h(k)$ (6-13)
The first-order difference of $V_o(k)$ is given in Eq. (6-10). The first-order difference of $V_h(k)$ is:
$\Delta V_h(k) = V_h(k+1) - V_h(k)$
$= h(k+1) - h(k)$
$= q_1 h(k) + \Gamma|e_1(k)|^2 - 5K^T K|y_e(k)|^2 - h(k)$
$= -(1 - q_1)h(k) + \Gamma|e_1(k)|^2 - 5K^T K|y_e(k)|^2$ (6-14)
from formulas 6 to 10 and 6 to 14, V is found 1 (k) The first order difference of (a) is:
namely:
where c=diag {1,0,., 0}, D B =κ 1 +μκ 2 -(1-q 1 )h(k)。
It can be shown that, when the parameters satisfy condition (6-17), the observation error of the state observer and the weight error of the observer neural network are uniformly ultimately bounded.
where $P$ is a symmetric positive definite matrix, $l$ is the upper bound of the squared norm of the HONN basis function, i.e. $\|S_i(k)\|^2 < l$, $\lambda_{\min} = \min\{\lambda_{11}, \lambda_{12}, \dots, \lambda_{1n}\}$, $i = 1, 2, \dots, n$, and $\Gamma$ is a design parameter.
Alternatively, the construction and application of the first neural network model can be achieved through Eqs. (6-3), (6-5), (6-1) and (6-12).
And S140, determining control information according to the observed value and the output signal based on a reinforcement learning model.
Alternatively, determining control information according to the observed value and the output signal may be implemented as:
estimating performance indexes according to the observed values by constructing an evaluation network;
and determining the optimized performance index and control information according to the performance index through the constructed execution network.
An evaluation network is constructed first. The role of the evaluation network is to estimate the long-term performance index function $Q(k)$ of the system. To describe the performance of the system at a single instant, a binary performance index function $p(k)$ is defined:
where $c$ is a threshold parameter describing the current performance. By superposition of $p(k)$ over individual time instants, the long-term performance index function $Q(k)$ of the system can be expressed as:
$Q(k) = \alpha^N p(k+1) + \alpha^{N-1} p(k+2) + \dots + \alpha^{k+1} p(N) + \dots$ (6-19)
q (k) can be further written as:
Q(k)=min u(k) {αQ(k-1)-α N+1 p(k)} (6-20)
estimating the data by adopting an evaluation neural network, and writing:
wherein the method comprises the steps ofThe Q (k) practically available is:
and then obtaining an update law of the evaluation network weight through a gradient descent algorithm. According to equation (6-20), an evaluation network error function is defined:
defining an evaluation network error index function:
by means of ETM and gradient descent algorithm, it is possible to obtain:
wherein the method comprises the steps ofα c To evaluate network parameters.
In summary, the applicable evaluation network update law is given by Eq. (6-26).
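Since the intermediate expressions and Eq. (6-26) are not reproduced in this text, the following sketch only illustrates a generic gradient-descent critic update built on the recursion of Eq. (6-20): the linear-in-basis critic, the concrete form of the binary index $p(k)$ (here 0 when a tracking error is within the threshold $c$, 1 otherwise) and the learning rate are all assumptions.

```python
import numpy as np

def binary_performance_index(tracking_error, c):
    """Assumed form of the binary index p(k): 0 if performance is acceptable."""
    return 0.0 if abs(tracking_error) <= c else 1.0

def critic_update(w_c, phi_k, phi_km1, p_k, alpha, alpha_c, N):
    """One gradient-descent step for the evaluation (critic) network weights.

    Q_hat(k) = w_c @ phi_k with an assumed basis phi; the error uses the
    recursion Q(k) = alpha*Q(k-1) - alpha**(N+1)*p(k) of Eq. (6-20) and a
    semi-gradient with respect to Q_hat(k) only. alpha_c is the step size."""
    Q_k = w_c @ phi_k
    Q_km1 = w_c @ phi_km1
    e_c = Q_k - (alpha * Q_km1 - alpha ** (N + 1) * p_k)   # evaluation network error
    w_c_next = w_c - alpha_c * e_c * phi_k                  # descent on 0.5 * e_c**2
    return w_c_next, e_c
```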
for simplicity of description hereafter, definitions are providedAn execution network is then constructed. The purpose of the execution network is to reduce the output of the evaluation network>And simultaneously, the optimal control performance is achieved. Defining the execution network error is:
wherein g=g 1 g 2 …g nQ d (k) =0. It can be seen that->Wherein->The above can be further written as:
The execution network error index function is defined accordingly. According to the ETM and the gradient descent algorithm, the update law of the execution network is:
where $\alpha_a$ is a design parameter of the execution network. The available execution network update law is:
By an iterative transformation method, the unavailable terms in the execution network update law (6-30) are replaced with available quantities; since $z_1(k+n)$ is not directly available, its estimate is used instead. In summary, the execution network update law can be implemented as Eq. (6-31).
alternatively, the evaluation network may be implemented by formulas 6-26 and the execution network by formulas 6-31.
S150, executing the control information on the object through an actuator, and capturing a new output signal of the object through the sensor.
Alternatively, the control information is executed in the object by the actuator, and the new output signal of the object is captured by the sensor, which may be implemented as:
transmitting the control information through a second network;
the control information is executed in the object by an actuator, and a new output signal of the object is captured by a sensor.
The actuator and the adaptive reinforcement learning controller may be located on different hardware entities. After the control information is obtained, the control information is sent to the actuator through the second network so as to be applied to the object to perform closed-loop control of the next period.
According to the technical scheme of this embodiment, an output signal output by the sensor is acquired; the output signal is transmitted through a first network based on an event trigger mechanism; an observation value is determined from the output signal based on discrete-time feedback; control information is determined from the observation value and the output signal based on a reinforcement learning model; the control information is executed on the object by an actuator, and a new output signal of the object is captured by the sensor. Because the output signal is transmitted through the first network only when an event is triggered, unnecessary transmissions are avoided and the amount of transmitted data is reduced compared with existing instant (real-time) transmission. Based on the discrete-time feedback, the observation value is determined, i.e. predicted, from the output signal, and control information is then determined from the observation value and the output signal based on the reinforcement learning model. Compared with the prior art in which control information is transmitted directly, the control information obtained through the reinforcement learning model is more accurate, and the control effect is optimized.
Example two
Fig. 3 is a schematic structural diagram of a remote control device of an apparatus according to a second embodiment of the present invention. As shown in fig. 3, the apparatus includes:
an acquisition module 21 for acquiring an output signal output from the sensor;
an event trigger mechanism module 22, configured to transmit the output signal through a first network based on an event trigger mechanism;
a state observation module 23 for determining an observation value from the output signal based on discrete time feedback;
a control information determining module 24 for determining control information based on the reinforcement learning model from the observed value and the output signal;
an execution module 25 for executing the control information in the object by means of an actuator, capturing a new output signal of the object by means of a sensor.
Based on the above embodiment, the event trigger mechanism module 22 is optionally configured to:
responding to event triggering, and judging whether the error of the output signal and the previous output signal exceeds an error range or not;
if the error exceeds the error range, transmitting the output signal through a first network;
if the error does not exceed the error range, the previous output signal is maintained by the zero-order holder.
On the basis of the above embodiment, optionally, the state observation module 23 is configured to:
constructing a first neural network model of errors and observations based on discrete-time feedback;
and determining an observed value according to the output signal and the first neural network model.
On the basis of the above embodiment, optionally, the control information determining module 24 is configured to:
estimating performance indexes according to the observed values by constructing an evaluation network;
and determining the optimized performance index and control information according to the performance index through the constructed execution network.
Based on the above embodiment, optionally, the execution module 25 is configured to:
transmitting the control information through a second network;
the control information is executed in the object by an actuator, and a new output signal of the object is captured by a sensor.
According to the technical scheme of this embodiment of the invention, the acquisition module 21 acquires an output signal output by the sensor; the event trigger mechanism module 22 transmits the output signal through a first network based on an event trigger mechanism; the state observation module 23 determines an observation value from the output signal based on discrete-time feedback; the control information determining module 24 determines control information from the observation value and the output signal based on a reinforcement learning model; and the execution module 25 executes the control information on the object through an actuator and captures a new output signal of the object through a sensor. Because the output signal is transmitted through the first network only when an event is triggered, unnecessary transmissions are avoided and the amount of transmitted data is reduced compared with existing instant (real-time) transmission. Based on the discrete-time feedback, the observation value is determined, i.e. predicted, from the output signal, and control information is then determined from the observation value and the output signal based on the reinforcement learning model. Compared with the prior art in which control information is transmitted directly, the control information obtained through the reinforcement learning model is more accurate, and the control effect is optimized.
The remote control device of the equipment provided by the embodiment of the invention can execute the remote control method of the equipment provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example III
Fig. 4 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. The electronic device 10 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as a remote control method of the device.
In some embodiments, the remote control method of the device may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the remote control method of the apparatus described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the remote control method of the device in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
The computer program for implementing the remote control method of the device of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
Example IV
The fourth embodiment of the present invention also provides a computer-readable storage medium storing computer instructions for causing a processor to execute a remote control method of an apparatus, the method comprising:
acquiring an output signal output by a sensor;
transmitting the output signal through a first network based on an event trigger mechanism;
determining an observation value from the output signal based on discrete time feedback;
determining control information based on the reinforcement learning model from the observations and the output signals;
the control information is executed in the object by an actuator, and a new output signal of the object is captured by a sensor.
Based on the above embodiment, optionally, transmitting the output signal through the first network based on an event trigger mechanism includes:
responding to event triggering, and judging whether the error of the output signal and the previous output signal exceeds an error range or not;
if the error exceeds the error range, transmitting the output signal through a first network;
if the error does not exceed the error range, the previous output signal is maintained by the zero-order holder.
On the basis of the above embodiment, optionally, determining the observed value according to the output signal based on discrete time feedback includes:
constructing a first neural network model of errors and observations based on discrete-time feedback;
and determining an observed value according to the output signal and the first neural network model.
On the basis of the above embodiment, optionally, determining control information based on the observed value and the output signal based on a reinforcement learning model includes:
estimating performance indexes according to the observed values by constructing an evaluation network;
and determining the optimized performance index and control information according to the performance index through the constructed execution network.
On the basis of the above embodiment, optionally, the control information is executed in the object by an actuator, and a new output signal of the object is captured by a sensor, including:
transmitting the control information through a second network;
the control information is executed in the object by an actuator, and a new output signal of the object is captured by a sensor.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a developer, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a developer; and a keyboard and pointing device (e.g., a mouse or a trackball) by which a developer can provide input to the electronic device. Other kinds of devices can also be used to provide for interaction with a developer; for example, feedback provided to the developer may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the developer may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a developer computer having a graphical developer interface or a web browser through which a developer can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A remote control method of a device, comprising:
acquiring an output signal output by a sensor;
transmitting the output signal through a first network based on an event trigger mechanism;
determining an observation value from the output signal based on discrete time feedback;
determining control information based on the reinforcement learning model from the observations and the output signals;
the control information is executed in the object by an actuator, and a new output signal of the object is captured by a sensor.
2. The method of claim 1, wherein transmitting the output signal over the first network based on an event trigger mechanism comprises:
responding to event triggering, and judging whether the error of the output signal and the previous output signal exceeds an error range or not;
if the error exceeds the error range, transmitting the output signal through a first network;
if the error does not exceed the error range, the previous output signal is maintained by the zero-order holder.
3. The method of claim 1, wherein determining observations from the output signal based on discrete-time feedback comprises:
constructing a first neural network model of errors and observations based on discrete-time feedback;
and determining an observed value according to the output signal and the first neural network model.
4. The method of claim 1, wherein determining control information from the observations and the output signals based on a reinforcement learning model comprises:
estimating performance indexes according to the observed values by constructing an evaluation network;
and determining the optimized performance index and control information according to the performance index through the constructed execution network.
5. The method of claim 1, wherein executing the control information in the object by an actuator, capturing a new output signal of the object by a sensor, comprises:
transmitting the control information through a second network;
the control information is executed in the object by an actuator, and a new output signal of the object is captured by a sensor.
6. A data transmission-based control device, comprising:
the acquisition module is used for acquiring an output signal output by the sensor;
the event trigger mechanism module is used for transmitting the output signal through a first network based on an event trigger mechanism;
the state observation module is used for determining an observation value according to the output signal based on discrete time feedback;
the control information determining module is used for determining control information according to the observed value and the output signal based on the reinforcement learning model;
and the execution module is used for executing the control information on the object through an actuator and capturing a new output signal of the object through a sensor.
7. The apparatus of claim 6, wherein the state observation module is configured to:
constructing a first neural network model of errors and observations based on discrete-time feedback;
and determining an observed value according to the output signal and the first neural network model.
8. The apparatus of claim 6, wherein the control information determination module is configured to:
estimating performance indexes according to the observed values by constructing an evaluation network;
and determining the optimized performance index and control information according to the performance index through the constructed execution network.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the remote control method of the apparatus of any one of claims 1-5.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the method of remote control of the apparatus of any one of claims 1-5.
CN202311684441.3A 2023-12-08 2023-12-08 Remote control method and device of equipment, electronic equipment and storage medium Pending CN117692487A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311684441.3A CN117692487A (en) 2023-12-08 2023-12-08 Remote control method and device of equipment, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117692487A true CN117692487A (en) 2024-03-12

Family

ID=90134600


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination