CN110456644B - Method and device for determining execution action information of automation equipment and electronic equipment - Google Patents


Info

Publication number
CN110456644B
Authority
CN
China
Prior art keywords: time point, state quantity, real environment, determining, neural network
Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Application number
CN201910744441.5A
Other languages
Chinese (zh)
Other versions
CN110456644A (en)
Inventor
Li Jiangtao (李江涛)
Current Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201910744441.5A
Publication of CN110456644A
Application granted
Publication of CN110456644B


Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Abstract

Disclosed is a method for determining execution action information of an automation device in a real environment, comprising: determining an observable state quantity of the automation device at a pre-execution time point in the real environment; determining an unobservable state quantity of the automation device at the pre-execution time point in the real environment based on the observable state quantity; and determining, through a first neural network model, the execution action information of the automation device at the pre-execution time point in the real environment based on the unobservable state quantity and the observable state quantity. Because the unobservable state quantity and the observable state quantity together contain the information the automation device requires in the real environment, the first neural network model has sufficient observation information: it can converge quickly during training to obtain an optimal solution, the training succeeds quickly, and the execution action information it determines for the pre-execution time point in the real environment is closer to reality.

Description

Method and device for determining execution action information of automation equipment and electronic equipment
Technical Field
The invention relates to the technical field of automation control, and in particular to a method, an apparatus, and an electronic device for determining execution action information of an automation device in a real environment.
Background
In recent years, when reinforcement learning is used to solve control problems in relatively complicated real environments, many state quantities of an automation apparatus are difficult to observe due to the size, cost, and complexity of the sensors in the apparatus.
Because the observation information is limited, a neural network model trained in a simulator on such limited observation information cannot be applied in a real environment to control the automation equipment to execute actions.
Disclosure of Invention
In order to solve the technical problem, embodiments of the present application provide a method, an apparatus, and an electronic device for determining execution action information of an automation device in a real environment.
According to one aspect of the application, a method for determining execution action information of an automation device in a real environment is provided, the method comprising: determining an observable state quantity of the automation device at a pre-execution time point in a real environment; determining an unobservable state quantity of the automation device at the pre-execution time point in the real environment based on the observable state quantity of the pre-execution time point; and determining, through a first neural network model, the execution action information of the automation device at the pre-execution time point in the real environment based on the unobservable state quantity of the pre-execution time point and the observable state quantity of the pre-execution time point.
According to another aspect of the application, an apparatus for determining execution action information of an automation device in a real environment is provided, comprising: an observable state quantity determining module for determining an observable state quantity of the automation device at a pre-execution time point in a real environment; an unobservable state quantity determining module for determining, based on the observable state quantity of the pre-execution time point, an unobservable state quantity of the automation device at the pre-execution time point in the real environment; and an execution action information determining module for determining, through a first neural network model, the execution action information of the automation device at the pre-execution time point in the real environment based on the unobservable state quantity of the pre-execution time point and the observable state quantity of the pre-execution time point.
According to another aspect of the present application, there is provided a computer-readable storage medium having stored thereon a computer program for executing the method of any one of the above.
According to another aspect of the present application, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; the processor is configured to perform any of the above methods.
The method for determining execution action information of an automation device in a real environment provided by the embodiments of the application determines an observable state quantity of the automation device at a pre-execution time point in the real environment, determines an unobservable state quantity of the automation device at the pre-execution time point in the real environment based on the observable state quantity of the pre-execution time point, and determines, through a first neural network model, the execution action information of the automation device at the pre-execution time point in the real environment based on the unobservable state quantity of the pre-execution time point and the observable state quantity of the pre-execution time point. Together, the unobservable state quantity and the observable state quantity of the pre-execution time point contain the information the automation device requires in the real environment, so the first neural network model has sufficient observation information, can converge quickly during training to obtain an optimal solution, and the training succeeds quickly; and because the first neural network model is trained on sufficient observation information, the execution action information it determines for the automation device at the pre-execution time point in the real environment is closer to reality.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally indicate like parts or steps.
Fig. 1 is a flowchart illustrating a method for determining information of an action performed by an automation device in a real environment according to an exemplary embodiment of the present application.
Fig. 2 is a schematic flowchart of determining an unobservable state quantity of a pre-execution time point of an automation device in a real environment based on an observable state quantity according to an exemplary embodiment of the present application.
Fig. 3 is a flowchart illustrating a method for determining information of an action performed by an automation device in a real environment according to still another exemplary embodiment of the present application.
FIG. 4 is a schematic diagram of a training data flow in a simulator provided by an exemplary embodiment of the present application.
FIG. 5 is a schematic diagram of data flow in a real environment provided by an exemplary embodiment of the present application.
Fig. 6 is a schematic structural diagram of an apparatus for determining information of an action performed by an automation device in a real environment according to an exemplary embodiment of the present application.
Fig. 7 is a schematic structural diagram of an unobservable state quantity determining module in an apparatus for determining information of an action performed by an automation device in a real environment according to an exemplary embodiment of the present application.
Fig. 8 is a schematic structural diagram of an apparatus for determining information of an action performed by an automation device in a real environment according to still another exemplary embodiment of the present application.
Fig. 9 is a block diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Summary of the application
At present, when the first neural network model is trained in a simulator, the full state quantities of the automation device's real environment are not used as observation information; only the state quantities that can be acquired by sensors in the automation device are selected as observation information. As a result, the first neural network model converges slowly during training for lack of necessary information, cannot obtain an optimal solution, and the training may even fail; furthermore, a first neural network model trained on such limited observation information cannot be applied in a real environment to control the automation device to execute actions. The simulator in this application refers to a computer program whose inputs are the motion model and control quantity of an object, and which simulates the performance of the object in a virtual physical environment by calculating physical quantities such as motion, collision, gravity, and friction.
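By way of illustration only, such a simulator reduces to a state-transition program. The sketch below makes the stated inputs concrete; the motion-model interface, the explicit Euler integration step, and all names are assumptions chosen for illustration, not the simulator of this application.

```python
import numpy as np

class PhysicsSimulator:
    """Minimal sketch of the simulator described above: its inputs are a
    motion model and a control quantity, and it steps the object forward
    in a virtual physical environment."""

    def __init__(self, motion_model, dt=0.01):
        self.motion_model = motion_model           # dynamics of the simulated object (assumed interface)
        self.dt = dt                               # integration time step
        self.state = motion_model.initial_state()  # full state; every component is known to the simulator

    def step(self, control):
        # Account for motion, collision, gravity, friction, etc. by letting the
        # motion model compute the state derivative, then integrate one step.
        derivative = self.motion_model.dynamics(self.state, control)
        self.state = self.state + self.dt * derivative
        return self.state
```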
In order to solve the above technical problems, the basic concept of the application is to provide a method, an apparatus, and an electronic device for determining execution action information of an automation device in a real environment: determine an observable state quantity of the automation device at a pre-execution time point in the real environment; determine an unobservable state quantity of the automation device at the pre-execution time point in the real environment based on the observable state quantity of the pre-execution time point; and determine, through a first neural network model, the execution action information of the automation device at the pre-execution time point in the real environment based on the unobservable state quantity and the observable state quantity of the pre-execution time point. Together, the unobservable state quantity and the observable state quantity of the pre-execution time point contain the information the automation device requires in the real environment, so the first neural network model has sufficient observation information, can converge quickly during training to obtain an optimal solution, and the training succeeds quickly; and because the first neural network model is trained on sufficient observation information, the execution action information it determines for the automation device at the pre-execution time point in the real environment is closer to reality, and the model can be applied in the real environment to control the automation device to execute actions.
It should be noted that the application scope of the present application is not limited to the field of robotics. For example, the technical solution mentioned in the embodiment of the present application may also be applied to other intelligent mobile automation devices, and is specifically used to provide control technical support for the intelligent mobile automation devices.
Various non-limiting embodiments of the present application will now be described with reference to the accompanying drawings.
Exemplary method
Fig. 1 is a flowchart illustrating a method for determining information of an action performed by an automation device in a real environment according to an exemplary embodiment of the present application. The method for determining the execution action information of the automation equipment in the real environment can be applied to the technical field of robots and can also be applied to the field of other intelligent movable automation equipment. As shown in fig. 1, a method for determining information of an action performed by an automation device in a real environment according to an embodiment of the present application includes the following steps:
step 101, an observable state quantity of a pre-execution time point of the automation equipment in a real environment is determined.
The automation equipment may be a robot or the like; the real environment is the actual environment in which the automation equipment is located; the pre-execution time point is the time point corresponding to an action that the automation equipment is about to execute; and an observable state quantity is state information of the automation equipment that can be obtained by a sensor in the automation equipment in the real environment.
And 102, determining the unobservable state quantity of the pre-execution time point of the automation equipment in the real environment based on the observable state quantity of the pre-execution time point.
An unobservable state quantity, in other words, is state information of the automation device that cannot be obtained by sensors in the automation device in the real environment.
Wherein, the observable state quantity comprises at least any one, or any combination, of: the center-of-gravity position, center-of-gravity velocity, center-of-gravity acceleration, posture, rotational torque vector, and magnitude and direction of external force of a rigid body component of the automation device; and/or the unobservable state quantity comprises at least any one, or any combination, of: the magnitude of an external force applied to the automation device and the direction of motion of the automation device.
For example, when the automation device is a robot, the state quantities required for its walking control include: the center-of-gravity position, center-of-gravity velocity, center-of-gravity acceleration, posture (rotation matrix), and rotational torque vectors of its rigid body components, as well as the magnitude of external force, the direction of the robot's motion, and so on. Of these, the center-of-gravity position, center-of-gravity velocity, center-of-gravity acceleration, posture (rotation matrix), rotational torque vectors, and the like can be obtained through the robot's sensors and are observable state quantities; the magnitude of external force, the direction of motion, and the like cannot be obtained through the robot's sensors and are unobservable state quantities.
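To make the division concrete, the robot example above might be encoded as follows. This is a sketch only; the field names, shapes, and the flat input layout are illustrative assumptions rather than structures defined by the application.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ObservableState:
    # Quantities the robot's own sensors can measure in the real environment.
    com_position: np.ndarray      # center-of-gravity position
    com_velocity: np.ndarray      # center-of-gravity velocity
    com_acceleration: np.ndarray  # center-of-gravity acceleration
    posture: np.ndarray           # posture as a rotation matrix
    torque_vectors: np.ndarray    # rotational torque vectors

@dataclass
class UnobservableState:
    # Quantities no on-board sensor can provide in the real environment.
    external_force: np.ndarray    # magnitude/direction of the external force
    motion_direction: np.ndarray  # direction of the robot's motion

def flatten_observation(obs: ObservableState, unobs: UnobservableState) -> np.ndarray:
    # The first neural network model consumes both parts as one input vector.
    parts = (obs.com_position, obs.com_velocity, obs.com_acceleration,
             obs.posture, obs.torque_vectors,
             unobs.external_force, unobs.motion_direction)
    return np.concatenate([np.ravel(p) for p in parts])
```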
And 103, determining the execution action information of the pre-execution time point of the automation equipment in the real environment through the first neural network model based on the unobservable state quantity of the pre-execution time point and the observable state quantity of the pre-execution time point.
The execution action information of the pre-execution time point is the action to be executed by the automation equipment at the pre-execution time point. Based on the unobservable state quantity of the pre-execution time point and the observable state quantity of the pre-execution time point, the trained first neural network model determines the execution action information of the automation equipment at the pre-execution time point in the real environment, making the execution action information more accurate.
According to the method for determining execution action information of an automation device in a real environment provided above, an observable state quantity of the automation device at a pre-execution time point in the real environment is determined; an unobservable state quantity of the automation device at the pre-execution time point in the real environment is determined based on the observable state quantity of the pre-execution time point; and the execution action information of the automation device at the pre-execution time point in the real environment is determined through the first neural network model based on the unobservable state quantity of the pre-execution time point and the observable state quantity of the pre-execution time point. Together, these two state quantities contain the information the automation device requires in the real environment, so the first neural network model has sufficient observation information, can converge quickly during training to obtain an optimal solution, and the training succeeds quickly. Moreover, because the first neural network model is trained on sufficient observation information, the execution action information it determines for the automation device at the pre-execution time point in the real environment is closer to reality, so the model can be applied in the real environment to control the automation device to execute actions.
An exemplary embodiment of the present application provides a method for determining observable state quantities of an automation device at pre-execution time points in a real environment. The embodiment shown in the present application is extended based on the embodiment shown in fig. 1 of the present application, and the differences between the embodiment shown in the present application and the embodiment shown in fig. 1 are mainly described below, and the descriptions of the same parts are omitted.
In the method for determining execution action information of an automation device in a real environment provided by an embodiment of the present application, determining an observable state quantity of the automation device at a pre-execution time point in the real environment includes:
determining, through a sensor of the automation device, the observable state quantity of the automation device at the pre-execution time point in the real environment.
In particular, sensors in the automation device observe the state of the automation device in the real environment to obtain corresponding information.
According to the method for determining execution action information of an automation device in a real environment mentioned above, the observable state quantity of the automation device at the pre-execution time point in the real environment is determined through a sensor of the automation device. Because the observable state quantity is actual information observed by a sensor in the automation device, it reflects the real situation, which further improves the accuracy of the first neural network model.
Fig. 2 is a flowchart illustrating a process for determining an unobservable state quantity of an automation device at a pre-execution time point in a real environment based on an observable state quantity according to an exemplary embodiment of the present application. The embodiment shown in fig. 2 of the present application is extended based on the embodiment shown in fig. 1 of the present application, and the differences between the embodiment shown in fig. 2 and the embodiment shown in fig. 1 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 2, in the method for determining information of an execution action of an automation device in a real environment according to an embodiment of the present application, determining an unobservable state quantity of a time point, which is pre-executed by the automation device in the real environment, based on an observable state quantity includes:
step 1021, determining the observable state quantity and the unobservable state quantity of the executed time point of the automation device in the real environment, and the executed action information of the executed time point.
Specifically, the executed time point is a time point before the pre-execution time point; it may be a single time point before the pre-execution time point or a certain number of time points before it, and may be set according to the actual application, which is not limited herein.
And step 1022, determining, through a second neural network model, the unobservable state quantity of the automation device at the pre-execution time point in the real environment based on the observable state quantity and the unobservable state quantity of the executed time point, the execution action information of the executed time point, and the observable state quantity of the pre-execution time point.
Specifically, a second neural network model is trained; based on the observable state quantity and the unobservable state quantity of the executed time point, the execution action information of the executed time point, and the observable state quantity of the pre-execution time point, it determines the unobservable state quantity of the automation device at the pre-execution time point in the real environment.
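In code, the interface of the second neural network model might look like the following sketch, where the function and argument names are assumptions and `estimator` stands for any trained network with this input layout.

```python
import torch

def estimate_unobservable(estimator, prev_obs, prev_unobs, prev_action, cur_obs):
    """Sketch of the second model's interface: from the observable and
    unobservable state quantities of the executed time point, the execution
    action information of the executed time point, and the observable state
    quantity of the pre-execution time point, estimate the unobservable
    state quantity of the pre-execution time point."""
    x = torch.cat([prev_obs, prev_unobs, prev_action, cur_obs], dim=-1)
    return estimator(x)
```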
According to the method for determining execution action information of an automation device in a real environment mentioned in this embodiment, the observable state quantity and the unobservable state quantity of the executed time point of the automation device in the real environment, and the execution action information of the executed time point, are determined; the unobservable state quantity of the automation device at the pre-execution time point in the real environment is then determined through the second neural network model based on the observable state quantity and the unobservable state quantity of the executed time point, the execution action information of the executed time point, and the observable state quantity of the pre-execution time point. The unobservable state quantity can be determined quickly through the second neural network model, which improves the speed of determining the unobservable state quantity.
Fig. 3 is a flowchart illustrating a method for determining information of an action performed by an automation device in a real environment according to still another exemplary embodiment of the present application. The embodiment shown in fig. 3 of the present application is extended based on the embodiment shown in the previous application, and the differences between the embodiment shown in fig. 3 and the previous embodiment are mainly described below, and the description of the same parts is omitted.
As shown in fig. 3, in the method for determining information of an action performed by an automation device in a real environment according to an embodiment of the present application, the method further includes:
and 104, training a first neural network model by using a deep reinforcement learning method.
The first neural network model is trained by using a deep reinforcement learning method, and the method is an unsupervised learning method.
In particular, the first neural network model may be constructed by a deep neural network.
The first neural network model constructed from a deep neural network may use structures such as an MLP (Multi-Layer Perceptron, a fully connected network), an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory network), and the like.
And 105, training a second neural network model by using a supervised learning method while training the first neural network model.
Wherein, while training the first neural network model, the second neural network model is trained using a supervised learning method on the state-action-observation sequences generated by the first neural network model.
In particular, a second neural network model may be constructed by a deep neural network.
The second neural network model constructed by the deep neural network may use MLP (fully-connected), RNN, LSTM, or the like.
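As one possible construction, both models might be built as follows. This is a sketch only: the dimensions and layer widths are illustrative assumptions, and choosing an MLP for the first model and an LSTM for the second is just one of the combinations the embodiments allow.

```python
import torch
import torch.nn as nn

OBS_DIM, UNOBS_DIM, ACTION_DIM = 24, 6, 12   # illustrative dimensions

# First neural network model: (unobservable + observable state of the
# pre-execution time point) -> execution action information.
policy = nn.Sequential(
    nn.Linear(UNOBS_DIM + OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACTION_DIM), nn.Tanh(),
)

class Estimator(nn.Module):
    """Second neural network model: (observable + unobservable state and
    action of the executed time point, observable state of the pre-execution
    time point) -> unobservable state of the pre-execution time point."""

    def __init__(self, hidden=128):
        super().__init__()
        in_dim = OBS_DIM + UNOBS_DIM + ACTION_DIM + OBS_DIM
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, UNOBS_DIM)

    def forward(self, x, hidden_state=None):
        out, hidden_state = self.lstm(x, hidden_state)
        return self.head(out), hidden_state
```

A recurrent structure such as the LSTM suits the estimator because the unobservable state depends on a history of states and actions rather than on a single snapshot; a plain MLP over a stacked window of past steps would be an equally valid choice under the embodiments.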
According to the method for determining the action execution information of the automation equipment in the real environment, the first neural network model is trained by using a deep reinforcement learning method, the second neural network model is trained by using a supervised learning method while the first neural network model is trained, the second neural network model can determine unobservable state quantities, and the first neural network model and the second neural network model can be migrated to the real environment lacking partial observation for use.
Yet another exemplary embodiment of the present application provides a method of determining performance action information of an automation device in a real environment. The embodiment shown in the present application is extended based on the embodiment shown in fig. 3 of the present application, and the differences between the embodiment shown in the present application and the embodiment shown in fig. 3 are mainly described below, and the descriptions of the same parts are omitted.
In the method for determining the information of the action performed by the automation device in the real environment provided by the embodiment of the application, the method further includes:
and applying the first neural network model and the second neural network model to the automation equipment of the real environment.
Specifically, the first neural network model and the second neural network model are applied to the automation equipment in the real environment, and the automation equipment can be controlled to act in the real environment according to the execution action information of the pre-execution time point.
According to the method for determining execution action information of an automation device in a real environment mentioned above, the first neural network model and the second neural network model are applied to the automation device in the real environment, so that the automation device can be controlled to act in the real environment according to the execution action information of the pre-execution time point, which improves the accuracy of the actions the automation device executes in the real environment.
Yet another exemplary embodiment of the present application provides a method of determining information of an execution action of an automation device in a real environment. The embodiments shown in the present application are extended from the embodiments shown in the previous application, and the differences between the embodiments shown in the present application and the embodiments shown in the previous application are mainly described below, and the descriptions of the same parts are omitted.
In the method for determining the information of the action performed by the automation device in the real environment provided by the embodiment of the application, the method further includes:
and controlling, through the second neural network model and the first neural network model, the automation equipment to act in the real environment according to the execution action information of the pre-execution time point.
Specifically, the unobservable state quantity of the automation equipment at the pre-execution time point in the real environment is determined through the second neural network model; the execution action information of the automation equipment at the pre-execution time point in the real environment is then determined through the first neural network model based on the unobservable state quantity of the pre-execution time point and the observable state quantity of the pre-execution time point; and the automation equipment is thereby controlled to act in the real environment according to the execution action information of the pre-execution time point.
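A sketch of this inference loop follows; the `robot.read_sensors()` and `robot.execute()` interfaces, the zero initialization of the not-yet-estimated quantities, and the treatment of `policy` and `estimator` as plain callables returning tensors are all assumptions for illustration.

```python
import torch

@torch.no_grad()
def control_loop(robot, policy, estimator, unobs_dim, action_dim, steps=1000):
    prev_obs = torch.as_tensor(robot.read_sensors(), dtype=torch.float32)
    prev_unobs = torch.zeros(unobs_dim)    # no estimate exists yet: defaults to 0
    prev_action = torch.zeros(action_dim)  # no action has been executed yet
    for _ in range(steps):
        cur_obs = torch.as_tensor(robot.read_sensors(), dtype=torch.float32)
        # Second neural network model: recover the unobservable state of the
        # pre-execution time point from the executed time point's data.
        cur_unobs = estimator(torch.cat([prev_obs, prev_unobs, prev_action, cur_obs]))
        # First neural network model: determine the execution action
        # information of the pre-execution time point.
        action = policy(torch.cat([cur_unobs, cur_obs]))
        robot.execute(action)              # act in the real environment
        prev_obs, prev_unobs, prev_action = cur_obs, cur_unobs, action
```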
According to the method for determining the action execution information of the automation equipment in the real environment, the automation equipment is controlled to act in the real environment according to the action execution information of the pre-execution time point through the second neural network model and the first neural network model, the action of the automation equipment in the real environment can be realized, and the action accuracy in the real environment can be improved.
To facilitate understanding of the present application, a further example is given below. In this embodiment, the automation device is a robot, the pre-execution time point is step N, and the executed time point is step N-1 (in practical applications, the executed time point may also cover the previous k steps, i.e., steps N-1, N-2, ..., N-k, where k is a natural number greater than 1). The method for determining execution action information of an automation device in a real environment according to the present application is integrated into a simulator to pre-train the first neural network model. The inputs of the simulator are the motion model and control quantity of the automation device; it simulates the performance of the automation device in a virtual physical environment by calculating physical quantities such as motion, collision, gravity, and friction, yielding both observable and unobservable state quantities.
All state quantities in the simulator are computed values, so all of them can be observed. First, the set of state quantities required by the current task is selected from all state quantities, specifically those relevant to the task; for example, gait walking control of a robot generally requires the position and posture of each joint, the center-of-gravity position and velocity, the force vectors on part of the limbs, and so on. The selected set is then divided into two parts: observable state quantities (the part that can be obtained by sensors in the real environment) and unobservable state quantities (the part that cannot be obtained by sensors in the real environment).
Referring to fig. 4, a schematic diagram of the training data flow in the simulator provided by an exemplary embodiment of the present application: the action executed at step N-1 is input to the simulator 10, which outputs the observable state quantity and the unobservable state quantity of step N; the observable state quantity and the unobservable state quantity of step N are input to the first neural network model 20, which outputs the action to be executed at step N. The data flow used to train the second neural network model 30 is: the observable state quantity and the unobservable state quantity of step N-1, the action executed at step N-1, and the observable state quantity of step N. The output of the second neural network model is the unobservable state quantity of step N. While the first neural network model is trained by reinforcement learning using the simulator 10 according to fig. 4, the second neural network model 30 is trained with a supervised learning method on the state-action-observation sequences the first model produces, until convergence.
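The fig. 4 data flow might be realized along the following lines. This is a hedged sketch, not the implementation of the application: the simulator and optimizer interfaces, the `rl_update` placeholder (any deep reinforcement learning algorithm could fill it), the mean-squared-error loss for the supervised branch, and the estimator treated as a plain feed-forward callable are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pretrain_in_simulator(simulator, policy, estimator, est_opt, rl_update,
                          episodes=1000):
    for _ in range(episodes):
        obs, unobs = simulator.reset()   # the simulator computes the full state
        done = False
        while not done:
            # First model: action of step N from the observable and
            # unobservable state quantities of step N.
            action = policy(torch.cat([unobs, obs]))
            next_obs, next_unobs, reward, done = simulator.step(action)
            rl_update(policy, obs, unobs, action, reward)   # deep RL step
            # Second model: supervised learning on the state-action-observation
            # sequence the first model generates, i.e. predict the unobservable
            # state of step N from step N-1 data plus the observable state of step N.
            x = torch.cat([obs, unobs, action.detach(), next_obs])
            loss = F.mse_loss(estimator(x), next_unobs)
            est_opt.zero_grad(); loss.backward(); est_opt.step()
            obs, unobs = next_obs, next_unobs
```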
Referring to fig. 5, a schematic diagram of the data flow in a real environment provided by an exemplary embodiment of the present application: (observable state quantity of step N-1, unobservable state quantity of step N-1, and action executed at step N-1) + (observable state quantity of step N acquired by the robot's sensors in the real environment) -> (second neural network model 30) -> unobservable state quantity of step N. Then: (observable state quantity of step N acquired by the robot's sensors in the real environment) + (unobservable state quantity of step N calculated by the second neural network model) -> (first neural network model 20) -> action to be executed at step N.
It should be noted that when this procedure runs for the first time, the unobservable state quantity of the executed time point (i.e., the unobservable state quantity of step N-1) has not yet been determined by the second neural network model and defaults to 0; in subsequent executions, the unobservable state quantity of the executed time point is the value actually calculated.
Exemplary devices
Fig. 6 is a schematic structural diagram of an apparatus for determining information of an action performed by an automation device in a real environment according to an exemplary embodiment of the present application. The device for determining the execution action information of the automation equipment in the real environment can be applied to the technical field of robots and can also be applied to the field of other intelligent movable automation equipment. As shown in fig. 6, an apparatus for determining information of an action performed by an automation device in a real environment according to an embodiment of the present application includes:
an observable state quantity determining module 201, configured to determine an observable state quantity of the automation device at a pre-execution time point in a real environment;
the unobservable state quantity determining module 202 is configured to determine, based on the observable state quantity of the pre-execution time point, an unobservable state quantity of the pre-execution time point of the automation device in the real environment;
and the execution action information determining module 203 is used for determining the execution action information of the pre-execution time point of the automation equipment in the real environment through the first neural network model based on the unobservable state quantity of the pre-execution time point and the observable state quantity of the pre-execution time point.
An exemplary embodiment of the present application provides an observable state quantity determination module 201. The embodiment shown in the present application is extended based on the embodiment shown in fig. 6 of the present application, and the differences between the embodiment shown in the present application and the embodiment shown in fig. 6 are mainly described below, and the descriptions of the same parts are omitted.
In the apparatus for determining execution action information of an automation device in a real environment according to the embodiment of the present application, the observable state quantity determining module 201 is specifically configured to determine, based on a sensor of the automation device, the observable state quantity of the automation device at the pre-execution time point in the real environment.
Fig. 7 is a schematic structural diagram of the module 202 for determining unobservable state quantities in the apparatus for determining information of actions performed by an automation device in a real environment according to an exemplary embodiment of the present application. The embodiment shown in fig. 7 of the present application is extended based on the embodiment shown in fig. 6 of the present application, and the differences between the embodiment shown in fig. 7 and the embodiment shown in fig. 6 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 7, in the apparatus for determining information of an action performed by an automation device in a real environment according to an embodiment of the present application, the unobservable state quantity determining module 202 includes:
a first determining unit 2021, configured to determine the observable state quantity and the unobservable state quantity of the executed time point of the automation device in the real environment, and the execution action information of the executed time point;
a second determining unit 2022, configured to determine, through the second neural network model, the unobservable state quantity of the automation device at the pre-execution time point in the real environment based on the observable state quantity and the unobservable state quantity of the executed time point, the execution action information of the executed time point, and the observable state quantity of the pre-execution time point.
Fig. 8 is a schematic structural diagram of an apparatus for determining information of an action performed by an automation device in a real environment according to still another exemplary embodiment of the present application. The embodiment shown in fig. 8 of the present application is extended based on the embodiment shown in fig. 7 of the present application, and the differences between the embodiment shown in fig. 8 and the embodiment shown in fig. 7 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 8, in the apparatus for determining information of an action performed by an automation device in a real environment according to an embodiment of the present application, the apparatus further includes:
a first training module 204 for training a first neural network model using a deep reinforcement learning method;
and a second training module 205, configured to train the second neural network model using a supervised learning method while training the first neural network model.
The application further provides a schematic structural diagram of a device for determining information of an action performed by an automation device in a real environment according to another exemplary embodiment. The embodiment shown in the present application is extended based on the embodiment shown in fig. 8 of the present application, and the differences between the embodiment shown in the present application and the embodiment shown in fig. 8 are mainly described below, and the descriptions of the same parts are omitted.
In the apparatus for determining information of an execution action of an automation device in a real environment provided by an embodiment of the present application, the apparatus further includes:
and the application module is used for applying the first neural network model and the second neural network model to the automation equipment in the real environment.
The application further provides a schematic structural diagram of a device for determining information of executing actions of an automation device in a real environment according to another exemplary embodiment. The embodiments shown in the present application are extended from the embodiments shown in the previous figures of the present application, and the differences between the embodiments shown in the present application and the embodiments shown in the previous figures are emphasized below, and the descriptions of the same parts are omitted.
In the apparatus for determining information of an execution action of an automation device in a real environment provided by an embodiment of the present application, the apparatus further includes:
and the control module is used for controlling the automation equipment to act in the real environment according to the execution action information of the pre-execution time point through the second neural network model and the first neural network model.
It should be understood that the operations and functions of the observable state quantity determining module 201, the unobservable state quantity determining module 202, the execution action information determining module 203, the first training module 204, the second training module 205 in the apparatus for determining execution action information of an automation device in a real environment provided in fig. 6 to 8, and the first determining unit 2021 and the second determining unit 2022 included in the unobservable state quantity determining module 202 may refer to the method for determining execution action information of an automation device in a real environment provided in fig. 1 to 5, and are not repeated herein to avoid repetition.
Exemplary electronic device
FIG. 9 illustrates a block diagram of an electronic device in accordance with an embodiment of the application.
As shown in fig. 9, the electronic device 11 includes one or more processors 111 and memory 112.
The processor 111 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 11 to perform desired functions.
Memory 112 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 111 to implement the methods for determining execution action information of an automation device in a real environment of the various embodiments of the application described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 11 may further include: an input device 113 and an output device 114, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, the input device 113 may be a camera or a microphone, a microphone array, or the like, for capturing an input signal of an image or a sound source. When the electronic device is a stand-alone device, the input means 113 may be a communication network connector for receiving the acquired input signal from a network processor.
The input device 113 may also include, for example, a keyboard, a mouse, and the like.
The output device 114 may output various information to the outside, including the determined output voltage, output current information, and the like. The output devices 114 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for the sake of simplicity, only some of the components related to the present application in the electronic device 11 are shown in fig. 9, and components such as a bus, an input/output interface, and the like are omitted. In addition, the electronic device 11 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and devices, embodiments of the present application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in the method of determining information of an execution action of an automation device in a real environment described in the above-mentioned "exemplary methods" section of this specification.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions, which, when executed by a processor, cause the processor to perform the steps of the method of determining execution action information of an automation device in a real environment according to various embodiments of the present application described in the above section "exemplary methods" of the present specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present application have been described above with reference to specific embodiments, but it should be noted that advantages, effects, etc. mentioned in the present application are only examples and are not limiting, and the advantages, effects, etc. must not be considered to be possessed by various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including", "comprising", and "having" are open-ended words that mean "including, but not limited to" and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or", unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (9)

1. A method of determining execution action information of an automation device in a real environment, comprising:
determining an observable state quantity of an automation device at a pre-execution time point in a real environment;
determining an unobservable state quantity of the automation device at the pre-execution time point in a real environment based on the observable state quantity of the pre-execution time point;
determining, by a first neural network model, execution action information of the automation device at the pre-execution time point in a real environment based on an unobservable state quantity of the pre-execution time point and an observable state quantity of the pre-execution time point;
wherein determining the unobservable state quantity of the automation device at the pre-execution time point in the real environment based on the observable state quantity of the pre-execution time point comprises:
determining an observable state quantity and an unobservable state quantity of an executed time point of the automation device in a real environment, and execution action information of the executed time point;
determining, by a second neural network model, an unobservable state quantity of the automation device at the pre-execution time point in a real environment based on the observable state quantity and the unobservable state quantity of the executed time point, the execution action information of the executed time point, and the observable state quantity of the pre-execution time point.
2. The method of claim 1, wherein determining an observable state quantity of an automation device at a pre-execution time point in a real environment comprises:
determining, through a sensor of the automation device, the observable state quantity of the automation device at the pre-execution time point in the real environment.
3. The method of claim 1, wherein the observable state quantity comprises at least any one, or any combination, of: the center-of-gravity position, center-of-gravity velocity, center-of-gravity acceleration, posture, rotational torque vector, and magnitude and direction of external force of a rigid body component of the automation device; and/or the unobservable state quantity comprises at least any one, or any combination, of: the magnitude of an external force applied to the automation device and the direction of motion of the automation device.
4. The method of claim 1, further comprising:
training the first neural network model using a deep reinforcement learning method;
training the second neural network model using a supervised learning approach while training the first neural network model.
5. The method of claim 4, further comprising:
applying the first neural network model and the second neural network model to the automation device in a real environment.
6. The method of claim 5, further comprising:
controlling, through the second neural network model and the first neural network model, the automation device to act in the real environment according to the execution action information of the pre-execution time point.
7. An apparatus for determining information of an execution action of an automation device in a real environment, comprising:
the observable state quantity determining module is used for determining the observable state quantity of the automation equipment in a real environment at a pre-execution time point;
an unobservable state quantity determining module, configured to determine, based on the observable state quantity of the pre-execution time point, an unobservable state quantity of the automation device at the pre-execution time point in a real environment, including: determining an observable state quantity and an unobservable state quantity of an executed time point of the automation device in a real environment, and execution action information of the executed time point; and determining, by a second neural network model, the unobservable state quantity of the automation device at the pre-execution time point in the real environment based on the observable state quantity and the unobservable state quantity of the executed time point, the execution action information of the executed time point, and the observable state quantity of the pre-execution time point; and
an execution action information determining module, configured to determine, through a first neural network model, the execution action information of the automation device at the pre-execution time point in the real environment based on the unobservable state quantity of the pre-execution time point and the observable state quantity of the pre-execution time point.
8. A computer-readable storage medium storing a computer program for executing the method for determining execution action information of an automation device in a real environment according to any one of claims 1 to 6.
9. An electronic device, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the method for determining execution action information of an automation device in a real environment according to any one of claims 1 to 6.
CN201910744441.5A 2019-08-13 2019-08-13 Method and device for determining execution action information of automation equipment and electronic equipment Active CN110456644B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910744441.5A CN110456644B (en) 2019-08-13 2019-08-13 Method and device for determining execution action information of automation equipment and electronic equipment

Publications (2)

Publication Number Publication Date
CN110456644A CN110456644A (en) 2019-11-15
CN110456644B true CN110456644B (en) 2022-12-06

Family

ID=68486241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910744441.5A Active CN110456644B (en) 2019-08-13 2019-08-13 Method and device for determining execution action information of automation equipment and electronic equipment

Country Status (1)

Country Link
CN (1) CN110456644B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106393102A (en) * 2015-07-31 2017-02-15 发那科株式会社 Machine learning device, robot system, and machine learning method
CN108614419A (en) * 2018-03-28 2018-10-02 贵州大学 A kind of adaptive neural network control method of arc MEMS
CN109155004A (en) * 2016-05-20 2019-01-04 渊慧科技有限公司 Model free control for intensified learning agency
CN109272108A (en) * 2018-08-22 2019-01-25 深圳市亚博智能科技有限公司 Control method for movement, system and computer equipment based on neural network algorithm
CN109491494A (en) * 2018-11-26 2019-03-19 北京地平线机器人技术研发有限公司 Method of adjustment, device and the intensified learning model training method of power parameter
CN109760050A (en) * 2019-01-12 2019-05-17 鲁班嫡系机器人(深圳)有限公司 Robot behavior training method, device, system, storage medium and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2016297852C1 (en) * 2015-07-24 2019-12-05 Deepmind Technologies Limited Continuous control with deep reinforcement learning

Similar Documents

Publication Publication Date Title
US20220355472A1 (en) Neural networks for selecting actions to be performed by a robotic agent
US11741334B2 (en) Data-efficient reinforcement learning for continuous control tasks
CN112631128B (en) Robot assembly skill learning method and system based on multi-mode heterogeneous information fusion
WO2019241798A1 (en) Self-supervised robotic object interaction
CN107553496B (en) Method and device for determining and correcting errors of inverse kinematics solving method of mechanical arm
US20210107144A1 (en) Learning method, learning apparatus, and learning system
JP2022543926A (en) System and Design of Derivative-Free Model Learning for Robotic Systems
JP2023537300A (en) Offline Learning for Robot Control Using Reward Prediction Models
JP2022061022A (en) Technique of assembling force and torque guidance robot
CN111185909B (en) Robot operation condition acquisition method and device, robot and storage medium
CN113449840A (en) Neural network training method and device and image classification method and device
CN110456644B (en) Method and device for determining execution action information of automation equipment and electronic equipment
CN107423515B (en) Mechanical arm friction identification method, device, equipment and storage medium
US11607806B2 (en) Techniques for generating controllers for robots
CN110147891B (en) Method and device applied to reinforcement learning training process and electronic equipment
CN114529010A (en) Robot autonomous learning method, device, equipment and storage medium
US20210122038A1 (en) Method and device for ascertaining control parameters in a computer-assisted manner for a favorable action of a technical system
US20240025035A1 (en) Robotic simulations using multiple levels of fidelity
CN117348577B (en) Production process simulation detection method, device, equipment and medium
CN111401564A (en) Model updating method and device for machine learning, electronic equipment and storage medium
Luz et al. Model Predictive Control for Assistive Robotics Manipulation
CN115107031A (en) Robot dragging teaching method, device, equipment and medium based on neural network
CN116882481A (en) Data processing method and related equipment
JP2023156751A (en) Information processing device, information processing method, program, and learned model
WO2023057518A1 (en) Demonstration-driven reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant