WO2023082949A1 - Agent control method and apparatus, electronic device, program, and storage medium - Google Patents

Agent control method and apparatus, electronic device, program, and storage medium Download PDF

Info

Publication number
WO2023082949A1
WO2023082949A1 (PCT/CN2022/125695; CN2022125695W)
Authority
WO
WIPO (PCT)
Prior art keywords
digital twin
agent
target task
world
control
Prior art date
Application number
PCT/CN2022/125695
Other languages
French (fr)
Chinese (zh)
Inventor
黄晓庆
马世奎
彭飞
Original Assignee
达闼科技(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 达闼科技(北京)有限公司 filed Critical 达闼科技(北京)有限公司
Publication of WO2023082949A1 publication Critical patent/WO2023082949A1/en

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras

Definitions

  • The embodiments of the present application relate to the technical field of intelligent control, and in particular to an agent control method and apparatus, an electronic device, a program, and a storage medium.
  • In the field of artificial intelligence, the data collected by smart devices is typically used as the input for learning and training, and the output is used to control the actions of an agent, for example by collecting RGBD (RGB-Depth Map, i.e. RGB color plus depth map) information as input data.
  • RGBD information is usually obtained by a camera performing image acquisition and recognition. However, the data acquired by the camera includes not only the RGBD information but also a variety of unnecessary parameters, such as lighting and shadow conditions and image data of nearby obstacles. To obtain the target RGBD information, the captured images must therefore be screened and processed, which inevitably involves a large amount of computation; that is, when RGBD information is used as the input data for learning and training, data acquisition is difficult and high computing power is required of the data processing equipment.
  • The large amount of data to be processed also leads to slow training convergence, and in some execution processes there is the further problem of complex migration between virtual and real data during computation. Because the data processing is so complex, the control efficiency of this training and learning process over the agent is low.
  • The purpose of the embodiments of the present application is to provide an agent control method, apparatus, electronic device, and storage medium that reduce the complexity of data processing and thereby improve the control efficiency of the agent.
  • An embodiment of the present application provides an agent control method including the following steps: obtaining a target task; generating, according to environment data of a digital twin world, the pose of the agent, and a reinforcement learning network, a control instruction for controlling a digital twin to complete the target task, wherein the digital twin world is obtained by simulation mapping of the physical world, the digital twin is located in the digital twin world, and the agent is located in the physical world and corresponds to the digital twin; and controlling, according to the control instruction for completing the target task, the agent to execute the target task.
  • An embodiment of the present application also provides an agent control apparatus, including: an acquisition module configured to acquire a target task; a generation module configured to generate, according to environment data of a digital twin world, the pose of the agent, and a reinforcement learning network, a control instruction for controlling a digital twin to complete the target task, wherein the digital twin world is obtained by simulation mapping of the physical world, the digital twin is located in the digital twin world, and the agent is located in the physical world and corresponds to the digital twin; and an execution module configured to control the agent to execute the target task according to the control instruction for completing the target task.
  • An embodiment of the present application also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the above agent control method.
  • An embodiment of the present application also provides a computer program that implements the above agent control method when executed by a processor.
  • An embodiment of the present application also provides a computer-readable storage medium storing a computer program that implements the above agent control method when the program is executed by a processor.
  • In the embodiments of the present application, the physical world is simulated by the digital twin world, and the digital twin world contains a digital twin corresponding to the agent in the physical world. Operating the digital twin with control instructions in the digital twin world simulates the result of using those control instructions to operate the agent, and suitable control instructions for making the agent perform the target task are obtained through training. There is no need to preprocess input parameters such as RGBD data, the complexity of computing the control instructions output to the agent is reduced, and the control efficiency of the agent is improved.
  • In some embodiments, generating, according to the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network, the control instruction for controlling the digital twin to complete the target task includes: inputting the pose of the agent and a spatial semantic map representing the environment data into the reinforcement learning network, which outputs control instructions for controlling the actions of the digital twin; the reinforcement learning network is trained, based on the results of the digital twin executing the control instructions, to obtain the control instruction that completes the target task. That is, simulation training is carried out in the digital twin world using the environment data, the pose of the agent, and the reinforcement learning network, with continuous adjustment according to feedback until the control instruction that completes the target task is obtained.
  • In some embodiments, the initial control instruction output by the reinforcement learning network is generated from prior data, where the prior data is obtained from the actions of a user controlling the digital twin through an interactive device.
  • The prior data is data that achieves, or comes close to achieving, the target task; using it as the initial control instruction reduces the number of training iterations and the complexity of data processing.
  • In some embodiments, the digital twin world is loaded on a cloud server, and generating the control instruction for controlling the digital twin to complete the target task according to the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network includes: generating, through interaction with the cloud server, the control instruction for controlling the digital twin to complete the target task according to the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network. Loading the digital twin world on the cloud greatly reduces the computing requirements on the agent itself and the complexity of the device setup; at the same time, the data processing capability of a cloud server is generally high, which further improves the efficiency of obtaining the control instruction that completes the target task.
  • In some embodiments, after the target task is obtained and before the control instruction for controlling the digital twin to complete the target task is generated, the method further includes: disabling the rendering function; after the control instruction for completing the target task is generated, the method further includes: enabling the rendering function.
  • The rendering function is used for display to the user and generally occupies considerable computing resources; the data produced before the control instruction that completes the target task is generated is generally of no practical use to the user, so the rendering function is disabled during this period and the device's data processing resources are devoted entirely to generating control instructions, which improves the efficiency of control instruction generation.
  • After the control instruction is obtained, the rendering function is enabled so that the process of the digital twin executing the control instruction is visualized for the user, who can then observe the simulated execution of the control instruction.
  • Fig. 1 is a flowchart of the agent control method provided according to an embodiment of the present application;
  • Fig. 2 is a schematic diagram of the agent control apparatus provided according to an embodiment of the present application;
  • Fig. 3 is a schematic diagram of an electronic device provided according to an embodiment of the present application.
  • The terms “first” and “second” in the embodiments of the present application are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined with “first” or “second” may explicitly or implicitly include at least one such feature.
  • The terms “including” and “having” and any variations thereof are intended to cover non-exclusive inclusion. For example, a system, product, or device comprising a series of components or units is not limited to the listed components or units, but may also include components or units that are not listed, or other components or units inherent in such a product or device.
  • “Plurality” means at least two, for example two or three, unless otherwise specifically defined.
  • An embodiment of the present application relates to a method for controlling an agent. The specific process is shown in Figure 1.
  • Step 101: acquire a target task.
  • Step 102: generate, according to the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network, control instructions for controlling the digital twin to complete the target task, where the digital twin world is obtained by simulation mapping of the physical world, the digital twin is located in the digital twin world, and the agent is located in the physical world and corresponds to the digital twin.
  • Step 103: control the agent to execute the target task according to the control instruction for completing the target task.
  • In this embodiment, the physical world is simulated by the digital twin world, and the digital twin world contains a digital twin corresponding to the agent in the physical world. Operating the digital twin with control instructions in the digital twin world simulates the result of using those instructions to operate the agent, and suitable control instructions are obtained through training so that the agent performs the target task.
  • The agent control method of this embodiment is described in detail below. The following content is provided only to aid understanding and is not required to implement this solution. In the following, “training” always refers to the process of obtaining the control instruction that completes the target task.
  • In step 101, a target task is acquired.
  • The target task may be obtained from the user, from another interactive device, or from the cloud; the target task is, for example, a task involving spatial position, such as moving or grasping a specified item.
  • The target task does not necessarily involve a three-dimensional positional relationship; it may also be independent of three-dimensional position, such as (two-dimensional) image recognition, audio processing, or image-to-text conversion, as long as it can be performed by the robot.
  • In step 102, control instructions for controlling the digital twin to complete the target task are generated according to the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network; the digital twin world is obtained by simulation mapping of the physical world, the digital twin is located in the digital twin world, and the agent is located in the physical world and corresponds to the digital twin.
  • The digital twin world is obtained by mapping the real physical world, transforming the physical environment into digital content for display, and can simulate the positional relationships of objects in the physical world and related environmental information.
  • As for how the digital twin world is obtained, it may be built by a modeler or obtained by directly scanning the physical world. The agent in the physical world may be a robot, and the digital twin world contains a digital twin corresponding to the agent (robot) that can simulate the agent's behavior in the digital twin world. Because the digital twin world is a digital embodiment of the physical world, the interaction between the digital twin and its surroundings as it moves about the digital twin world can simulate the consequences of the agent performing the same activity in the physical world. The digital twin world models the geometric structure corresponding to the physical world, spatial positions, the physical structural constraints of the agent, and physical characteristics such as friction coefficients and gravity.
  • In some examples, the control instructions for controlling the digital twin to complete the target task are generated as follows: the pose of the agent and the spatial semantic map representing the environment data are input into the reinforcement learning network, which outputs control instructions for controlling the actions of the digital twin; the reinforcement learning network is trained, based on the results of the digital twin executing the control instructions, to obtain the control instruction that completes the target task.
  • By inputting the pose of the agent in the current physical world into the reinforcement learning network, the network can also obtain the spatial semantic map representing the environment data of the digital twin world. Because the digital twin corresponds to the agent in the digital twin world, the pose of the agent in the physical world acquired by the reinforcement learning network is the initial state of the digital twin, and the reinforcement learning network outputs control instructions for changing the actions of the digital twin. The digital twin changes its actions in the digital twin world according to the control instructions of the reinforcement learning network; the network obtains the result produced by the digital twin after it acts on a control instruction, compares the result with the target task, and adaptively adjusts the control instruction according to this comparison, until the digital twin completes the target task in simulation under a particular control instruction of the reinforcement learning network. That control instruction is then the control instruction used to control the agent to perform the target task. In other words, simulation training is carried out in the digital twin world using the environment data, the pose of the agent, and the reinforcement learning network, with continuous adjustment until the control instruction that completes the target task is obtained.
  • The results of the digital twin acting on the control instructions include, but are not limited to, the chassis and whole-body posture of the digital twin, whether a collision has occurred, and whether the target task has been completed; the content of the control instructions includes, but is not limited to, controlling the movement of the digital twin, limb movements, and so on.
  • The reinforcement learning network has different interfaces for obtaining the relevant information. A state observation interface collects the state of the digital twin world, including the chassis and whole-body pose of the agent and the spatial semantic map; for example, when the target task is picking up a cup, it can collect the distance to the target cup. An action control interface outputs the control instructions of the reinforcement learning network and applies them to the digital twin world, for example controlling the digital twin's body movement and limb movements. A feedback interface collects the result feedback when the digital twin performs actions in the digital twin world according to the control instructions, for example whether a collision occurred and whether the target task was completed. A minimal sketch of these interfaces follows.
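  • As an illustration only, the three interfaces described above can be organized like a standard reinforcement-learning environment. The sketch below is an assumption, not the application's implementation; the class and method names (TwinWorldEnv, observe, apply_action, feedback, reset_to_checkpoint) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Observation:
    """State collected through the state observation interface."""
    chassis_pose: Tuple[float, float, float]   # x, y, yaw of the agent's chassis
    joint_positions: List[float]               # whole-body pose, e.g. 7 arm joint angles
    semantic_map: Dict[str, dict]              # spatial semantic map of the twin world


class TwinWorldEnv:
    """Hypothetical wrapper around the digital twin world (names are illustrative)."""

    def observe(self) -> Observation:
        """State observation interface: read the current state of the twin world."""
        raise NotImplementedError

    def apply_action(self, joint_deltas: List[int]) -> None:
        """Action control interface: apply a control instruction to the digital twin."""
        raise NotImplementedError

    def feedback(self) -> Dict[str, bool]:
        """Feedback interface: report collision status and task completion."""
        return {"collision": False, "task_done": False}

    def reset_to_checkpoint(self) -> None:
        """Return the digital twin to its pose before the last instruction (used on failure)."""
        raise NotImplementedError
```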
  • During training, the reinforcement learning network gradually outputs control instructions to the digital twin in the digital twin world according to the target task, so that the digital twin becomes able to complete the target task.
  • For example, the reinforcement learning network may divide the target task into multiple sub-steps. After the control instruction for each sub-step is sent to the digital twin, the network obtains the feedback produced by the digital twin executing that instruction and judges whether the sub-step has been completed, thereby gradually acquiring a set of control instructions that completes the target task.
  • If a sub-step fails, the reinforcement learning network adjusts according to the feedback from the digital twin executing the control instruction. For example, if a collision with the environment occurs while a movement instruction is being executed, the digital twin can be returned to its initial position before the movement instruction, the movement instruction can be updated by shortening the movement distance or adjusting the movement angle, and the digital twin then executes the updated instruction, until the sub-step is completed. Completing the sub-step of a movement instruction means, for example, reaching the destination of the movement instruction, or reaching it without colliding with the surrounding environment.
  • The reinforcement learning network obtains the result of the digital twin successfully executing a sub-step, for example through feedback from the digital twin or by monitoring the digital twin world and observing that the sub-step has been completed. Once the network knows that a sub-step has been executed successfully, it can proceed to train the control instruction for the next sub-step. If the sub-step is the last one of the target task, or the target task consists of only that one step, the control instructions of all successfully executed sub-steps are combined to obtain the control instruction that completes the target task. In other words, by having the digital twin execute the control instructions output by the reinforcement learning network and by repeated trial and error, a set of control instructions that completes the target task is obtained; a simplified sketch of this loop follows.
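  • A minimal sketch of the trial-and-error loop over sub-steps, assuming the hypothetical TwinWorldEnv interface above; the policy object and the sub-step decomposition are stand-ins for whatever the reinforcement learning network actually does, not the application's algorithm.

```python
def train_on_task(env, policy, sub_steps, max_tries_per_step=100):
    """Collect one control instruction per sub-step by trial and error in the twin world."""
    plan = []                                   # accumulated instructions that complete the task
    for goal in sub_steps:                      # e.g. ["approach cup", "close gripper", "lift"]
        for _attempt in range(max_tries_per_step):
            obs = env.observe()
            action = policy.act(obs, goal)      # control instruction proposed by the network
            env.apply_action(action)
            result = env.feedback()
            policy.update(obs, action, result)  # adjust according to the simulated outcome
            if result["collision"]:
                env.reset_to_checkpoint()       # go back to the pose before this instruction
                continue
            if result["task_done"]:
                plan.append(action)             # keep the instruction that completed the sub-step
                break
        else:
            raise RuntimeError(f"sub-step {goal!r} not solved in {max_tries_per_step} tries")
    return plan                                 # combined instructions for the whole target task
```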
  • In some examples, the initial control instruction output by the reinforcement learning network is generated from prior data, where the prior data is obtained from the actions of a user controlling the digital twin through an interactive device. That is, to reduce the number of adjustments to the control instructions of the reinforcement learning network, or to reduce its memory usage during computation, the initial control instruction is generated from prior data: manual control instructions obtained from the actions of a user controlling the digital twin through an interactive device, or control instructions from historical records that completed, or came close to completing, the target task.
  • This makes the process of obtaining the control instruction that completes the target task through reinforcement learning training more efficient, reduces the amount of intermediate debugging, and reduces the memory used by data operations.
  • The interactive device includes a mouse, a keyboard, a motion-sensing device, or any combination thereof. The prior data is therefore data that achieves, or comes close to achieving, the target task, and using it as the initial control instruction reduces the number of training iterations and the complexity of data processing.
  • For example, instructions input by the trainer through the mouse, keyboard, or motion-sensing device can be obtained to control the digital twin to interact with the environment, objects, or other data in the digital twin world, generating high-quality expert control instructions. Compared with control instructions generated independently by the reinforcement learning network, the control instructions obtained from the trainer greatly improve the completion rate of the target task.
  • Left to itself, the reinforcement learning network can only generate control instructions randomly according to the target task, or generate different types of control instructions from partial label information; that is, it cannot guarantee that the initially generated control instructions are relevant to the target task. With the control instructions obtained from the trainer used as prior data, training starts from instructions that are highly correlated with the target task, which greatly reduces how much the control instructions need to be adjusted and reduces the storage space and time required for computation.
  • For example, suppose the trainer has input control instructions with which the digital twin successfully picked up the cup at position a1. When the target task is to pick up the cup at position a2, a query finds that executable control instructions already exist for the similar task of picking up the cup at a1; training on the basis of those trainer-provided control instructions, compared with directly training the control instruction for picking up the cup at a2, significantly reduces the required computation time, lowers the computational complexity, and improves the user experience.
  • In some examples, the spatial semantic map includes the poses of objects in the digital twin world, 3D collision boxes, object classification information, and object material information.
  • The pose of each object in the digital twin world is used to simulate the positions of the objects surrounding the agent in its physical-world environment; the 3D collision boxes specify or constrain collision relationships in the digital twin world so that the movement of objects is closer to the physical world, reflecting, for example, the physical structure of each object; and the object material information is used to simulate fine-grained physical characteristics of the physical environment before and after the agent moves, such as friction coefficients and sliding. One possible data layout is sketched below.
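  • Purely as an illustration of how such a map could be laid out, the structure below is an assumption; the field names (pose, collision_box, category, material) and units are not specified by the application.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ObjectEntry:
    pose: Tuple[float, float, float, float, float, float]  # x, y, z, roll, pitch, yaw
    collision_box: Tuple[float, float, float]               # 3D collision box extents (m)
    category: str                                            # object classification, e.g. "cup"
    material: Dict[str, float]                               # material info, e.g. friction


# Spatial semantic map: object id -> its entry in the digital twin world.
SemanticMap = Dict[str, ObjectEntry]

example_map: SemanticMap = {
    "cup_01": ObjectEntry(
        pose=(1.2, 0.4, 0.9, 0.0, 0.0, 0.0),
        collision_box=(0.08, 0.08, 0.12),
        category="cup",
        material={"friction": 0.6},
    )
}
```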
  • In some examples, the reinforcement learning network includes a DQN (Deep Q Network) model; the input of the DQN model is an RGBD image together with the pose of the agent and the spatial semantic map, and the output of the DQN model is the action of each joint of the robotic arm.
  • Taking the DQN model as an example, the input of the model is an RGBD image and the output is the movement of each joint of the robotic arm; each joint's movement in a frame is one of three actions (for example, rotating by -1°, 0°, or +1°), and these three actions are represented as [-1, 0, 1] in the network.
  • The robotic arm in this example has a total of 7 joints, so for each frame the DQN takes an RGBD image as input and outputs a 7×3 array.
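  • As a rough illustration of a network with this input/output shape, the PyTorch sketch below is an assumed architecture, not the one used by the application; only the 7×3 output (one row of Q-values per joint, one column per discrete action in [-1, 0, 1]) follows from the text above.

```python
import torch
import torch.nn as nn


class JointDQN(nn.Module):
    """Illustrative DQN: an RGBD frame in, a 7x3 grid of Q-values out (7 joints x 3 actions)."""

    def __init__(self, num_joints: int = 7, num_actions: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(           # 4 input channels: R, G, B, depth
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((7, 7)), nn.Flatten(),
        )
        self.head = nn.Linear(64 * 7 * 7, num_joints * num_actions)
        self.num_joints, self.num_actions = num_joints, num_actions

    def forward(self, rgbd: torch.Tensor) -> torch.Tensor:
        q = self.head(self.backbone(rgbd))                    # (batch, 21)
        return q.view(-1, self.num_joints, self.num_actions)  # (batch, 7, 3)


# Choosing one of the three actions {-1, 0, +1} for each joint in a single frame:
q_values = JointDQN()(torch.zeros(1, 4, 120, 160))            # dummy RGBD frame
joint_actions = q_values.argmax(dim=-1) - 1                   # tensor of 7 values in {-1, 0, 1}
```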
  • In some examples, the prior data is obtained as follows: operation instructions for controlling the robotic arm, input by the user on the basis of the collected RGBD image, are received through the interactive device; the action of each joint of the robotic arm is obtained from those operation instructions; and the RGBD image and the action of each joint are saved as prior data.
  • For example, the trainer completes a cup-grasping task by observing the collected RGBD images and operating a keyboard, mouse, or motion-sensing device to control the robotic arm. While the task is being completed, the rotation of each joint is recorded automatically, and these rotations are combined with the RGBD images to form the prior data, which serves as the initial data for the DQN.
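  • A minimal sketch of how such teleoperation frames might be recorded as prior data; the file format and the helper names (get_rgbd_frame, read_joint_deltas, task_done) are assumptions for illustration only.

```python
import json
from typing import Callable, List


def record_prior_data(
    get_rgbd_frame: Callable[[], list],          # returns the current RGBD frame as nested lists
    read_joint_deltas: Callable[[], List[int]],  # 7 values in {-1, 0, 1} from the operator input
    task_done: Callable[[], bool],               # True once the trainer has finished the task
    out_path: str = "prior_data.jsonl",
) -> None:
    """Save (RGBD frame, joint actions) pairs while the trainer teleoperates the digital twin."""
    with open(out_path, "w") as f:
        while not task_done():
            frame = get_rgbd_frame()
            actions = read_joint_deltas()        # rotation of each joint for this frame
            f.write(json.dumps({"rgbd": frame, "joint_actions": actions}) + "\n")
```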
  • Note that this embodiment can target different tasks and reinforcement learning networks, and is not limited to acquiring RGBD images, or to acquiring only RGBD image information and agent poses.
  • In some examples, the digital twin world is loaded on a cloud server, and generating the control instructions for controlling the digital twin to complete the target task according to the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network includes: generating those control instructions through interaction with the cloud server. That is, processing the digital twin world requires high-complexity hardware support and occupies considerable computing resources; loading the digital twin world on the cloud server reduces the computing power required of the agent's device, and the relatively strong computing power of the cloud server improves the efficiency with which the control instruction that completes the target task is generated.
  • The reinforcement learning network may also be located on the cloud server, which further reduces the data computing resources required of the agent and improves the efficiency of generating the control instruction that completes the target task. A sketch of this interaction is given below.
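  • Illustration only: one way the agent-side device could delegate instruction generation to a cloud-hosted twin world. The endpoint, payload fields, and the use of HTTP are assumptions; the application does not specify the interaction protocol.

```python
import requests


def request_control_instructions(server_url: str, target_task: str, agent_pose: list) -> list:
    """Ask the cloud-hosted digital twin world to generate instructions for the target task."""
    resp = requests.post(
        f"{server_url}/generate_instructions",    # hypothetical endpoint on the cloud server
        json={"task": target_task, "agent_pose": agent_pose},
        timeout=600,                              # training in the twin world may take a while
    )
    resp.raise_for_status()
    return resp.json()["instructions"]            # control instructions to execute on the agent
```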
  • In step 103, the agent is controlled to execute the target task according to the control instruction for completing the target task.
  • The agent receives the control instruction and executes it to complete the target task. As noted above, loading the digital twin world on the cloud greatly reduces the computing requirements on the agent itself and reduces the complexity of the device setup; since the data processing capability of cloud servers is generally high, this further improves the efficiency of obtaining the control instruction that completes the target task.
  • In some examples, after the target task is obtained and before the control instruction for controlling the digital twin to complete the target task is generated, the method further includes disabling the rendering function; after the control instruction for controlling the digital twin to complete the target task is generated, the method further includes enabling the rendering function.
  • The rendering function is used for display to the user and, before the control instruction that completes the target task is obtained, generally occupies considerable computing resources; the data produced before that control instruction is generated is generally of no practical use to the user.
  • During training, the training process data can therefore be placed in storage that simply guarantees access, for example handled by the CPU (central processing unit), without being rendered and displayed, which lowers the training complexity; since rendering takes a long time, reducing rendering also improves training efficiency. When training is complete or nearly complete, the training process data is rendered and displayed, which on the one hand lets users perceive the training results and on the other hand allows checking whether the control instructions obtained from training conform to human behavioral habits; the usual habit, for example, is to pick up a cup with its rim facing up.
  • If the control instructions obtained in training do achieve the purpose of picking up the cup but the cup ends up with its rim facing down, this does not conform to the behavioral habits of ordinary people; such results are hard to detect accurately during training itself, but they are easy to spot, and then further optimize, by observing the rendered execution. A sketch of the rendering toggle follows.
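  • A sketch of how the rendering toggle might wrap the training call, reusing the train_on_task sketch above and assuming a simulator handle with set_rendering and replay methods (illustrative names, not an API from the application).

```python
def generate_instructions_without_rendering(sim, env, policy, sub_steps):
    """Disable rendering while instructions are trained, then render the result for the user."""
    sim.set_rendering(False)                     # free rendering resources during training
    try:
        plan = train_on_task(env, policy, sub_steps)
    finally:
        sim.set_rendering(True)                  # re-enable display once instructions exist
    sim.replay(plan)                             # visualize the digital twin executing the plan
    return plan
```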
  • In some examples, after the agent is controlled to perform the target task according to the control instruction for completing the target task, the method further includes: when the agent fails to perform the target task, receiving an auxiliary instruction input by the user through the interactive device, the auxiliary instruction being used to control the agent so that it successfully executes the target task; and, after the target task is executed successfully, updating the prior data according to the actions of the joints of the robotic arm during execution of the auxiliary instruction.
  • That is, if failures occur during later use after the reinforcement learning network has converged, human intervention can provide assistance and generate new prior data. This prior data can be used to update the DQN network model so that the agent is more robust the next time it faces the same situation, achieving the purpose of learning from failure. A sketch of this update step follows.
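  • One hypothetical way to fold the human-assisted recovery back into the prior data and the DQN; the replay-buffer update shown here is an assumption about how the "learning from failure" step could be realized, and fit_on is a placeholder for the actual DQN update.

```python
def learn_from_failure(dqn, replay_buffer, assisted_frames):
    """Append frames recorded during human assistance and fine-tune the DQN on them.

    assisted_frames: list of (rgbd_frame, joint_actions) pairs recorded while the user
    steered the robotic arm through the situation the agent had failed on.
    """
    for rgbd, joint_actions in assisted_frames:
        replay_buffer.append({"rgbd": rgbd, "joint_actions": joint_actions})
    dqn.fit_on(replay_buffer)                    # placeholder for the actual DQN update step
```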
  • In some examples, the digital twin world is updated synchronously with the physical world in real time. Because the digital twin world simulates movement in the physical world for the purpose of feedback training, if the physical world changes, the data of the digital twin world must change synchronously to ensure that the simulation results in the digital twin world match the actual motion states and outcomes in the physical world.
  • For example, this embodiment can use 3D reconstruction technology to virtually reconstruct the real physical world, obtaining a digital twin world that restores the real world at a 1:1 scale, and add to it the digital twin corresponding to the agent in the physical world.
  • For instance, ElasticFusion can be used to scan the environment with a depth camera to obtain the digital twin world, and the result can be refined manually.
  • The trainer can control the digital twin in the digital twin world through a keyboard and mouse so that it completes target tasks (such as grasping a cup, pouring a drink, or opening a cabinet door), generating sufficient prior data for a specific task; the reinforcement learning network is then trained starting from this prior data. The training process takes place in the digital twin world, and after training converges, the reinforcement learning network can be used to control the real-world agent to complete the corresponding task.
  • In this way, the physical world is simulated by the digital twin world, which contains a digital twin corresponding to the agent in the physical world; operating the digital twin with control instructions in the digital twin world simulates the result of operating the agent with those instructions, and suitable control instructions for making the agent perform the target task are obtained through training.
  • The agent may be a robot; that is, through simulation in the digital twin world, the complexity of controlling the robot is reduced and the control efficiency of the robot is improved.
  • The control of the agent can also be divided into three stages. First, digital twin technology is used to achieve a 1:1 simulation mapping between the physical world and the digital twin world, with the virtual world updated synchronously in real time. Second, in the digital twin world, the reinforcement learning network takes the spatial semantic map of the twin world and the pose of the agent as input for training and decision-making, and controls the digital twin corresponding to the agent. Finally, the behavior of the digital twin is synchronized to control the agent in the physical world. This effectively avoids the complexity of training directly on RGB-D data, so the algorithm converges quickly; at the same time, the algorithm output does not directly control the physical equipment, which effectively reduces the cost of virtual-to-real migration.
  • The division of the above methods into steps is only for clarity of description; in implementation, steps may be combined into one step, or a step may be split into multiple steps, and as long as the same logical relationship is included, they all fall within the protection scope of this patent. Adding insignificant modifications to, or introducing insignificant designs into, an algorithm or process without changing its core design also falls within the protection scope of this patent.
  • An embodiment of the present application relates to an agent control apparatus, as shown in Fig. 2, including:
  • An acquisition module 201 configured to acquire a target task.
  • A generation module 202 configured to generate, according to the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network, control instructions for controlling the digital twin to complete the target task, where the digital twin world is obtained by simulation mapping of the physical world, the digital twin is located in the digital twin world, and the agent is located in the physical world and corresponds to the digital twin.
  • An execution module 203 configured to control the agent to execute the target task according to the control instruction for completing the target task.
  • The agent control apparatus of this embodiment is described in detail below. The following content is provided only to aid understanding and is not required to implement this solution.
  • In some examples, generating the control instructions for controlling the digital twin to complete the target task includes: inputting the pose of the agent and the spatial semantic map representing the environment data into the reinforcement learning network, which outputs control instructions for controlling the actions of the digital twin; the reinforcement learning network is trained, based on the results of the digital twin executing the control instructions, to obtain the control instruction that completes the target task.
  • the initial control instruction output by the reinforcement learning network is generated based on prior data; wherein the prior data is obtained based on the actions of the user controlling the digital twin through the interactive device.
  • the spatial semantic map includes: poses of objects in the digital twin world, 3D collision boxes, object classification information, and object material information.
  • In some examples, the digital twin world is loaded on the cloud server, and generating the control instructions for controlling the digital twin to complete the target task according to the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network includes: generating those control instructions through interaction with the cloud server.
  • In some examples, the reinforcement learning network includes a DQN network model; the input of the DQN model is an RGBD image together with the pose of the agent and the spatial semantic map, and the output of the DQN model is the action of each joint of the robotic arm.
  • In some examples, after the control instructions for controlling the digital twin to complete the target task are generated, the rendering function is enabled.
  • the digital twin world is updated synchronously with the physical world in real time.
  • The agent may be a robot.
  • In this embodiment, the physical world is simulated by the digital twin world, which contains a digital twin corresponding to the agent in the physical world; operating the digital twin with control instructions in the digital twin world simulates the result of operating the agent with those instructions, so that suitable control instructions for making the agent perform the target task are obtained.
  • This embodiment is a system embodiment corresponding to the above method embodiment and can be implemented in cooperation with it.
  • the relevant technical details mentioned in the foregoing embodiments are still valid in this embodiment, and will not be repeated here in order to reduce repetition.
  • the relevant technical details mentioned in this embodiment can also be applied in the above embodiments.
  • The modules involved in this embodiment are logical modules.
  • A logical unit may be a single physical unit, a part of a physical unit, or a combination of multiple physical units.
  • units that are not closely related to solving the technical problem proposed in the present application are not introduced in this embodiment, but this does not mean that there are no other units in this embodiment.
  • An embodiment of the present application relates to an electronic device, as shown in Fig. 3, including at least one processor 301 and a memory 302 communicatively connected to the at least one processor 301, where the memory 302 stores instructions executable by the at least one processor 301, and the instructions are executed by the at least one processor 301 so that the at least one processor 301 can perform the above agent control method.
  • the memory and the processor are connected by a bus
  • the bus may include any number of interconnected buses and bridges, and the bus connects one or more processors and various circuits of the memory together.
  • the bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore will not be further described herein.
  • the bus interface provides an interface between the bus and the transceivers.
  • a transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing means for communicating with various other devices over a transmission medium.
  • the data processed by the processor is transmitted on the wireless medium through the antenna, further, the antenna also receives the data and transmits the data to the processor.
  • The processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions, while the memory can be used to store data that the processor uses when performing operations.
  • An embodiment of the present application relates to a computer program; when the computer program is executed by a processor, the agent control method described in any of the above embodiments is implemented.
  • An embodiment of the present application relates to a computer-readable storage medium storing a computer program; the above method embodiments are implemented when the computer program is executed by a processor.
  • The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present application relate to the technical field of intelligent control. Disclosed are an agent control method and device, an electronic device, a program, and a storage medium. The method comprises: obtaining a target task; generating, according to environment data of a digital twin world, the pose of an agent, and a reinforcement learning network, a control instruction for controlling a digital twin to complete the target task, the digital twin world being obtained by means of simulation mapping of a physical world, the digital twin being located in the digital twin world, and the agent being located in the physical world and corresponding to the digital twin; and controlling, according to the control instruction for completing the target task, the agent to execute the target task.

Description

Agent Control Method and Apparatus, Electronic Device, Program, and Storage Medium
This application is based on, and claims priority to, the Chinese patent application with application number 202111329240.2, filed on November 10, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of intelligent control, and in particular to an agent control method and apparatus, an electronic device, a program, and a storage medium.
Background
In the field of artificial intelligence, the data collected by smart devices is typically used as the input for learning and training, and the output is used to control the actions of an agent, for example by collecting RGBD (RGB-Depth Map, i.e. RGB color plus depth map) information as input data.
RGBD information is usually obtained by a camera performing image acquisition and recognition. However, the data acquired by the camera includes not only the RGBD information but also a variety of unnecessary parameters, such as lighting and shadow conditions and image data of nearby obstacles. To obtain the target RGBD information, the captured images must therefore be screened and processed, which inevitably involves a large amount of computation; that is, when RGBD information is used as the input data for learning and training, data acquisition is difficult and high computing power is required of the data processing equipment. The large amount of data to be processed also leads to slow training convergence, and in some execution processes there is the further problem of complex migration between virtual and real data during computation. Because the data processing is so complex, the control efficiency of this training and learning process over the agent is low.
Technical Solution
The purpose of the embodiments of the present application is to provide an agent control method, apparatus, electronic device, and storage medium that reduce the complexity of data processing and thereby improve the control efficiency of the agent.
To solve the above technical problem, an embodiment of the present application provides an agent control method including the following steps: obtaining a target task; generating, according to environment data of a digital twin world, the pose of the agent, and a reinforcement learning network, a control instruction for controlling a digital twin to complete the target task, wherein the digital twin world is obtained by simulation mapping of the physical world, the digital twin is located in the digital twin world, and the agent is located in the physical world and corresponds to the digital twin; and controlling, according to the control instruction for completing the target task, the agent to execute the target task.
An embodiment of the present application also provides an agent control apparatus, including: an acquisition module configured to acquire a target task; a generation module configured to generate, according to environment data of a digital twin world, the pose of the agent, and a reinforcement learning network, a control instruction for controlling a digital twin to complete the target task, wherein the digital twin world is obtained by simulation mapping of the physical world, the digital twin is located in the digital twin world, and the agent is located in the physical world and corresponds to the digital twin; and an execution module configured to control the agent to execute the target task according to the control instruction for completing the target task.
An embodiment of the present application also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the above agent control method.
An embodiment of the present application also provides a computer program that implements the above agent control method when executed by a processor.
An embodiment of the present application also provides a computer-readable storage medium storing a computer program that implements the above agent control method when the program is executed by a processor.
In the embodiments of the present application, the physical world is simulated by the digital twin world, and the digital twin world contains a digital twin corresponding to the agent in the physical world. Operating the digital twin with control instructions in the digital twin world simulates the result of using those control instructions to operate the agent, and suitable control instructions for making the agent perform the target task are obtained through training. There is no need to preprocess input parameters such as RGBD data, the complexity of computing the control instructions output to the agent is reduced, and the control efficiency of the agent is improved.
In some embodiments, generating, according to the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network, the control instruction for controlling the digital twin to complete the target task includes: inputting the pose of the agent and a spatial semantic map representing the environment data into the reinforcement learning network, which outputs control instructions for controlling the actions of the digital twin; the reinforcement learning network is trained, based on the results of the digital twin executing the control instructions, to obtain the control instruction that completes the target task. That is, simulation training is carried out in the digital twin world using the environment data, the pose of the agent, and the reinforcement learning network, with continuous adjustment according to feedback until the control instruction that completes the target task is obtained.
In some embodiments, the initial control instruction output by the reinforcement learning network is generated from prior data, where the prior data is obtained from the actions of a user controlling the digital twin through an interactive device. The prior data is data that achieves, or comes close to achieving, the target task, and using it as the initial control instruction reduces the number of training iterations and the complexity of data processing.
In some embodiments, the digital twin world is loaded on a cloud server, and generating the control instruction for controlling the digital twin to complete the target task according to the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network includes: generating, through interaction with the cloud server, the control instruction for controlling the digital twin to complete the target task according to the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network. Loading the digital twin world on the cloud greatly reduces the computing requirements on the agent itself and the complexity of the device setup; at the same time, the data processing capability of a cloud server is generally high, which further improves the efficiency of obtaining the control instruction that completes the target task.
In some embodiments, after the target task is obtained and before the control instruction for controlling the digital twin to complete the target task is generated, the method further includes: disabling the rendering function; after the control instruction for completing the target task is generated, the method further includes: enabling the rendering function. The rendering function is used for display to the user and generally occupies considerable computing resources, and the data produced before the control instruction that completes the target task is generated is generally of no practical use to the user, so the rendering function is disabled during this period and the device's data processing resources are devoted entirely to generating control instructions, which improves the efficiency of control instruction generation. After the control instruction is obtained, the rendering function is enabled so that the process of the digital twin executing the control instruction is visualized for the user, who can then observe the simulated execution of the control instruction.
Brief Description of the Drawings
One or more embodiments are illustrated by the figures in the corresponding drawings; these illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings represent similar elements, and unless otherwise stated, the figures are not drawn to scale.
Fig. 1 is a flowchart of the agent control method provided according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the agent control apparatus provided according to an embodiment of the present application;
Fig. 3 is a schematic diagram of an electronic device provided according to an embodiment of the present application.
本发明的实施方式Embodiments of the present invention
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合附图对本申请的各实施方式进行详细的阐述。然而,本领域的普通技术人员可以理解,在本申请各实施方式中,为了使读者更好地理解本申请而提出了许多技术细节。但是,即使没有这些技术细节和基于以下各实施方式的种种变化和修改,也可以实现本申请所要求保护的技术方案。以下各个实施例的划分是为了描述方便,不应对本申请的具体实现方式构成任何限定,各个实施例在不矛盾的前提下可以相互结合相互引用。In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, various implementations of the present application will be described in detail below in conjunction with the accompanying drawings. However, those of ordinary skill in the art can understand that, in each implementation manner of the present application, many technical details are provided for readers to better understand the present application. However, even without these technical details and various changes and modifications based on the following implementation modes, the technical solution claimed in this application can also be realized. The division of the following embodiments is for the convenience of description, and should not constitute any limitation to the specific implementation of the present application, and the embodiments can be combined and referred to each other on the premise of no contradiction.
The terms "first" and "second" in the embodiments of the present application are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or the number of the technical features indicated. Accordingly, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, the terms "include" and "have", and any variants thereof, are intended to cover a non-exclusive inclusion. For example, a system, product, or device comprising a series of components or units is not limited to the listed components or units; it may optionally include components or units that are not listed, or components or units that are inherent to the product or device. In the description of the present application, "a plurality of" means at least two, for example two or three, unless explicitly and specifically defined otherwise.
An embodiment of the present application relates to an agent control method. The specific flow is shown in Fig. 1.
Step 101: acquire a target task.
Step 102: generate, from the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network, control instructions for controlling the digital twin to complete the target task. The digital twin world is obtained by simulation mapping of the physical world, the digital twin is located in the digital twin world, and the agent is located in the physical world and corresponds to the digital twin.
Step 103: control the agent to execute the target task according to the control instructions for completing the target task.
In this embodiment, the physical world is simulated by the digital twin world, in which a digital twin corresponding to the agent in the physical world exists. By driving the digital twin with control instructions inside the digital twin world, the outcome of applying those instructions to the agent can be simulated, and suitable control instructions for making the agent execute the target task are obtained through training. There is no need to preprocess input parameters such as RGBD data, the complexity of processing the control instructions output for the agent is reduced, and the control efficiency of the agent is improved.
Implementation details of the agent control method of this embodiment are described below. The following details are provided only to facilitate understanding and are not required for implementing the solution. In the following, "training" refers to the process of obtaining the control instructions that complete the target task.
In step 101, a target task is acquired. The target task may be obtained from a user, from another interactive device, or from the cloud. The target task is, for example, a task related to spatial position, such as moving or grasping a specified object. The target task is not limited to tasks requiring a three-dimensional positional relationship; it may also be unrelated to three-dimensional position, such as (two-dimensional) image recognition, audio processing, or image-to-text conversion, as long as the robot can perform it.
In step 102, control instructions for controlling the digital twin to complete the target task are generated from the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network. The digital twin world is obtained by simulation mapping of the physical world; the digital twin is located in the digital twin world, and the agent is located in the physical world and corresponds to the digital twin. Specifically, the digital twin world is mapped from the real physical world: the environment of the physical world is converted into digital content for presentation, so that object positional relationships and related environmental information in the physical world can be simulated. This embodiment does not restrict how the digital twin world is obtained; it may, for example, be built by a modeler or obtained by directly scanning the physical world. The agent in the physical world may be a robot, and a digital twin corresponding to the agent (robot) exists in the digital twin world, where the agent's behavior can be simulated. Because the digital twin world is a digital embodiment of the physical world, the interactions of the digital twin with its surroundings in the digital twin world can simulate the outcomes the agent would cause when performing the same activities in the physical world. The digital twin world involves the geometric structures and spatial positions corresponding to the physical world, the physical structural constraints of the agent, and the simulation of physical properties such as friction coefficient and gravity.
In one example, the control instructions for controlling the digital twin to complete the target task are generated from the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network as follows: the pose of the agent and a spatial semantic map representing the environment data are input into the reinforcement learning network, which outputs control instructions for controlling the actions of the digital twin; based on the results of the digital twin executing those instructions, the reinforcement learning network is trained to obtain the control instructions that complete the target task. That is, the pose of the agent in the current physical world is input into the reinforcement learning network, which can also obtain the spatial semantic map representing the environment data of the digital twin world. Since the digital twin corresponds to the agent in the digital twin world, the agent's pose in the physical world acquired by the reinforcement learning network serves as the initial pose of the digital twin, and the reinforcement learning network outputs control instructions that change the digital twin's actions. The digital twin changes its actions in the digital twin world according to these instructions; the reinforcement learning network obtains the result of each action change, compares it with the target task, and adaptively adjusts the control instructions based on that comparison, until the digital twin completes the target task in simulation under a given set of instructions, which then become the control instructions for making the agent execute the target task. In other words, simulation training is performed in the digital twin world using the environment data, the agent's pose, and the reinforcement learning network, with continual adjustment based on feedback until the control instructions that complete the target task are obtained.
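The closed loop described above (observe the twin world, issue an instruction, compare the result with the target task, adjust, repeat) can be summarized with the minimal Python sketch below. All names (plan_in_twin_world, twin_world, policy, and their methods) are illustrative assumptions rather than an API defined by this application.

```python
def plan_in_twin_world(twin_world, policy, agent_pose, target_task, max_steps=10_000):
    """Iterate in simulation until the twin completes the task; return the instruction sequence."""
    twin_world.reset_twin_to(agent_pose)          # the twin starts from the real agent's pose
    twin_world.set_task(target_task)
    state = twin_world.observe()                  # agent pose + spatial semantic map
    executed = []                                 # control instructions issued so far
    for _ in range(max_steps):
        command = policy.act(state)               # RL network proposes a control instruction
        feedback = twin_world.step(command)       # the twin executes it in the twin world
        policy.learn(state, command, feedback)    # adjust based on the comparison with the task
        executed.append(command)
        state = feedback.next_state
        if feedback.task_done:
            return executed                       # instructions to replay on the physical agent
    raise RuntimeError("target task not completed within the step budget")
```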
It can be understood that, in addition to obtaining the result of the digital twin changing its actions according to a given control instruction, the reinforcement learning network can simultaneously obtain the change in the spatial semantic map caused by that action change, and combine it with the action result to judge whether the target task has been completed.
The result of the digital twin changing its actions according to a control instruction includes, but is not limited to, the pose of the digital twin's chassis and whole-body limbs, whether a collision occurred, and whether the target task was completed; the content of a control instruction includes, but is not limited to, controlling the movement of the digital twin and the motion of its limbs. In one example, the reinforcement learning network has different interfaces for obtaining relevant information: a state observation interface for collecting the state of the digital twin world, covering the agent's chassis and whole-body limb poses and the spatial semantic map (for example, when the target task is picking up a cup, the distance to the target cup can be collected); an action control interface for outputting the control instructions of the reinforcement learning network into the digital twin world, such as controlling the movement of the digital twin or its limbs; and a feedback interface for collecting the result feedback when the digital twin executes an action according to a control instruction, such as whether a collision occurred or whether the target task was completed.
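A minimal sketch of how the three interfaces named above could be grouped is given below. The class and field names are assumptions introduced only for illustration; the application does not prescribe this layout.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class StepFeedback:
    collision: bool           # whether the twin collided with its surroundings
    task_done: bool           # whether the target task (or sub-goal) was completed
    twin_pose: tuple          # chassis and whole-body limb pose after the action


class TwinWorldInterfaces(Protocol):
    """Hypothetical grouping of the state observation, action control, and feedback interfaces."""

    def observe_state(self) -> dict: ...            # state observation: pose + spatial semantic map
    def apply_command(self, command) -> None: ...   # action control: move chassis / limbs
    def read_feedback(self) -> StepFeedback: ...    # feedback: collision, task completion, etc.
```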
In some examples, after the digital twin acquires the agent's pose and takes it as its own initial pose, the reinforcement learning network, based on the digital twin world, starts issuing control instructions to the digital twin step by step according to the target task so that the task can be completed. The reinforcement learning network may divide the target task into multiple sub-steps; after the control instruction corresponding to each sub-step is sent to the digital twin, the feedback from executing that instruction is obtained to judge whether the sub-step has been completed, so that a set of control instructions capable of completing the target task is acquired gradually. It can be understood that, during adjustment based on feedback, not only can the control instruction of each sub-step be adjusted so that the digital twin can complete that sub-step; if, for a certain sub-step, no workable control instruction is obtained within a (presettable) period, the sub-step itself may be unreasonable and can be adjusted or discarded, and the preceding and following steps can then be adjusted accordingly, which this embodiment does not limit. In addition, adjustment of the sub-steps can be considered when complexity is too high, the occupied computation space is too large, or the error rate exceeds a preset threshold; that is, adjustment can be considered whenever a preset condition is not satisfied, and this embodiment imposes no specific limits. The trial-and-error handling of a single sub-step is illustrated in the sketch after the next paragraph.
For the execution of a given sub-step, in one example the reinforcement learning network adjusts based on the feedback from the digital twin executing the control instruction. For instance, if a collision with the environment occurs while a movement instruction is being executed, the digital twin can be restored to its initial position before that instruction, the instruction can be updated by reducing the movement distance or adjusting the movement angle, and the digital twin then executes the updated instruction, until the sub-step is completed, for example the destination of the movement instruction is reached, or reached without colliding with the surroundings. The reinforcement learning network then obtains the result of the digital twin successfully executing that sub-step, for example through feedback from the digital twin, or by monitoring the digital twin world and finding that the sub-step has been completed. After the reinforcement learning network learns that the sub-step was executed successfully, it can proceed to train the control instruction for the next sub-step; if that sub-step was the last one needed to complete the target task, or the target task contains only that single step, the successfully executed control instructions of all sub-steps can be combined to obtain the control instructions that complete the target task. That is, as the digital twin executes the control instructions output by the reinforcement learning network, a set of control instructions capable of completing the target task is obtained through repeated trial and error.
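As a rough illustration of the collision-rollback-and-retry loop for one sub-step, consider the following sketch. The names (solve_substep, save_state, restore_state, and command.scaled) are hypothetical, and scaling the command is just one possible stand-in for "reduce the movement distance or adjust the angle".

```python
def solve_substep(twin_world, command, max_trials=50):
    """Retry one sub-step, shrinking the move after each collision, until it succeeds."""
    for _ in range(max_trials):
        checkpoint = twin_world.save_state()      # remember the pose before trying
        feedback = twin_world.step(command)
        if feedback.task_done:                    # sub-goal of this sub-step reached
            return command
        if feedback.collision:
            twin_world.restore_state(checkpoint)  # roll back to the pre-move pose
            command = command.scaled(0.8)         # e.g. reduce distance or adjust the angle
    return None                                   # caller may re-plan or drop this sub-step
```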
In one example, the initial control instruction output by the reinforcement learning network is generated from prior data, where the prior data is obtained from the actions of the user controlling the digital twin through an interactive device. That is, to reduce the number of adjustments to the reinforcement learning network's control instructions, or to reduce its memory footprint, the initial control instruction is generated from prior data: manually entered control instructions, or control instructions from historical records that completed or nearly completed the target task. When the initial control instruction is generated from prior data, it already comes close to completing the target task, so the process by which the reinforcement learning network is trained to obtain the control instructions that complete the target task is more efficient, intermediate debugging iterations are reduced, and the memory occupied by data computation is lowered. The interactive device includes one of, or any combination of, a mouse, a keyboard, and a motion-sensing device. The prior data is thus data that achieves, or nearly achieves, the target task; using it as the initial control instruction reduces the number of training iterations and the complexity of data processing.
For example, in the digital twin world, instructions entered by a trainer via mouse, keyboard, motion-sensing devices, and the like can be obtained to control the digital twin's interaction with the environment, objects, or other digital entities in the digital twin world, generating high-quality, professional prior data and thereby improving the learning efficiency and quality of the reinforcement learning network. Control instructions obtained from a trainer achieve a much higher completion rate for the target task than instructions generated autonomously by the reinforcement learning network. Without intervention from external control instructions (such as the trainer's instructions here), the reinforcement learning network may generate control instructions randomly according to the target task, or generate instructions of different categories from partial label information; in other words, the relevance of the initially generated instructions to the target task cannot be guaranteed. If the relevance is low, a large number of instructions will need adjustment during training, demanding large processing space and long processing time. If control instructions obtained from the trainer are available, they can serve as prior data, and training on the basis of trainer-entered instructions that are highly relevant to the target task greatly reduces the need for adjustment and lowers the storage space and time required for computation. For example, for the task of picking up a cup at position a1, a trainer entered control instructions and the digital twin completed the task successfully; when the target task becomes picking up a cup at position a2, a query finds that executable control instructions exist for the similar task at a1, and training on top of those trainer-entered instructions significantly reduces the computation time compared with training the pick-up-at-a2 instructions from scratch, lowering computational complexity and improving the user experience.
In one example, the spatial semantic map includes: the pose of each object in the digital twin world, its 3D collision box, object classification information, and object material information. The poses of the objects in the digital twin world are used to simulate the positions of the surrounding objects in the environment where the agent is located in the physical world; the 3D collision boxes are used to specify or constrain collision relationships in the digital twin world so that they better approximate motion in the physical world; the object classification information includes, for example, the physical structure of the object; and the object material information is used to simulate the detailed physical characteristics of the environment before and after the agent moves, such as friction coefficient and sliding.
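One way such a map entry could be represented in code is sketched below, assuming a simple list of per-object records; the field names, pose convention, and example values are illustrative assumptions, not part of the application.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticObject:
    """One entry of the spatial semantic map (names and conventions are assumed)."""
    pose: tuple            # (x, y, z, roll, pitch, yaw) of the object in the twin world
    collision_box: tuple   # 3D collision box extents (dx, dy, dz)
    category: str          # object classification, e.g. "cup"
    material: dict = field(default_factory=dict)  # e.g. {"friction": 0.4}


semantic_map = [
    SemanticObject(pose=(1.2, 0.3, 0.8, 0.0, 0.0, 0.0),
                   collision_box=(0.08, 0.08, 0.12),
                   category="cup",
                   material={"friction": 0.4}),
]
```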
In one example, the reinforcement learning network includes a DQN (Deep Q Network) model. The input of the DQN model is an RGBD image that contains the pose of the agent and the spatial semantic map, and the output of the DQN model is the action of each joint of the robotic arm. Taking a DQN model for a robotic arm autonomously grasping a cup as an example, the input of the model is an RGBD image and the output is the action of each joint of the arm, where each joint's action is one of [rotate 1° counterclockwise, hold still, rotate 1° clockwise], represented in the network as [-1, 0, 1]. The robotic arm in this example has 7 joints, so for each frame the DQN takes one RGBD image as input and outputs a 7×3 array.
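A minimal PyTorch sketch of such a network is shown below. The 4-channel RGBD input, 7 joints, and 3 actions per joint follow the text; the convolutional layer sizes and the greedy action selection are assumptions added only to make the example concrete.

```python
import torch
import torch.nn as nn


class JointDQN(nn.Module):
    """Minimal DQN sketch for the 7-joint arm described above (layer sizes are assumptions)."""

    def __init__(self, num_joints=7, actions_per_joint=3):
        super().__init__()
        self.features = nn.Sequential(               # 4-channel RGBD input
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((7, 7)), nn.Flatten(),
        )
        self.head = nn.Linear(64 * 7 * 7, num_joints * actions_per_joint)
        self.num_joints = num_joints
        self.actions_per_joint = actions_per_joint

    def forward(self, rgbd):                         # rgbd: (batch, 4, H, W)
        q = self.head(self.features(rgbd))
        return q.view(-1, self.num_joints, self.actions_per_joint)  # one 7x3 array per frame


# greedy action per joint: indices 0, 1, 2 map to -1 (counterclockwise), 0 (hold), +1 (clockwise)
q_values = JointDQN()(torch.zeros(1, 4, 120, 160))
joint_actions = q_values.argmax(dim=-1) - 1          # tensor of values in {-1, 0, 1}
```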
In one example, the prior data is obtained as follows: operation instructions for controlling the robotic arm, entered by the user based on the captured RGBD images, are received through the interactive device; the actions of each joint of the robotic arm during execution of those operation instructions are recorded; and the RGBD images and the joint actions are saved as prior data. For example, regarding the acquisition and use of prior data, in an already-built digital twin world the trainer observes the captured RGBD images and operates a keyboard, mouse, or motion-sensing device to control the robotic arm to complete the cup-grasping task; during the task, the rotation of each joint is recorded automatically, and these rotations, combined with the RGBD images, form the prior data that serves as the initial data of the DQN.
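The recording step could look roughly like the sketch below, which logs (RGBD frame, joint deltas) pairs while a trainer tele-operates the twin arm. The object names, the capture/read methods, and the JSON-lines file format are all assumptions made for illustration.

```python
import json


def record_demonstration(twin_world, input_device, out_path="prior_data.jsonl"):
    """Save (RGBD frame id, joint deltas) pairs while a trainer controls the twin arm."""
    with open(out_path, "a", encoding="utf-8") as f:
        while not twin_world.task_done():
            frame = twin_world.capture_rgbd()         # the image the trainer is looking at
            command = input_device.read_command()     # keyboard / mouse / motion-sensing input
            twin_world.apply_command(command)
            f.write(json.dumps({"frame_id": frame.id,
                                "joint_deltas": command.joint_deltas}) + "\n")
```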
In addition, for different target tasks and reinforcement learning networks, this embodiment is not limited to necessarily acquiring RGBD images, nor to acquiring only RGBD image information and the agent's pose.
In one example, the digital twin world is hosted on a cloud server, and generating the control instructions for controlling the digital twin to complete the target task from the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network includes: generating those control instructions through interaction with the cloud server, based on the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network. Processing the digital twin world requires highly complex equipment support and considerable computing resources; hosting it on a cloud server reduces the computing-power requirements on the agent device, and the strong computing power of the cloud server improves the efficiency of generating the control instructions that complete the target task. The reinforcement learning network may also reside on the cloud server, which further reduces the data-computation resources the agent needs and further improves the generation efficiency of the control instructions that complete the target task.
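On the agent side, the interaction with such a cloud server might reduce to a single request that submits the task and pose and receives the planned instructions back. The sketch below assumes a hypothetical REST endpoint (/plan) and response field names; the application does not specify any particular protocol.

```python
import requests


def request_control_instructions(server_url, target_task, agent_pose):
    """Ask the cloud-hosted twin world to plan and return the control instructions."""
    resp = requests.post(f"{server_url}/plan",
                         json={"task": target_task, "pose": agent_pose},
                         timeout=600)
    resp.raise_for_status()
    return resp.json()["control_instructions"]
```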
In step 103, the agent is controlled to execute the target task according to the control instructions for completing the target task. In some examples, after the reinforcement learning network has generated the control instructions for completing the target task through simulation training, the agent receives and executes those instructions to complete the target task. Hosting the digital twin world in the cloud greatly reduces the data-computation requirements on the agent itself and the complexity of the device setup; since cloud servers generally have strong data-processing capability, it also further improves the efficiency of obtaining the control instructions that complete the target task.
In one example, after the target task is acquired and before the control instructions for controlling the digital twin to complete the target task are generated, the method further includes disabling the rendering function; after those control instructions are generated, the method further includes enabling the rendering function. Rendering is used to present results to the user and generally consumes considerable computing resources, and the data produced before the control instructions are obtained is of little practical use to the user. Therefore, the rendering function is disabled: the training process is not shown to the user, the space required for data computation is reduced, and the device's processing resources are devoted entirely to generating control instructions, improving generation efficiency. After the control instructions for completing the target task are obtained, rendering is enabled while the digital twin executes them, so that execution is visualized and the user can accurately perceive the execution process. For example, manual intervention can then be applied based on the observed execution, improving the efficiency of generating the control instructions that complete the target task. Specifically, during training the process data can simply be kept in storage that supports access, for example handled by the CPU (central processing unit), without being rendered and displayed, which lowers training complexity; since rendering takes a long time, reducing it also improves training efficiency. When training is complete or nearly complete, the process data is rendered and displayed: on the one hand, the user can genuinely perceive the training results; on the other hand, it can be observed whether the trained control instructions match human behavioral habits. For example, when executing an instruction to pick up a cup, people normally pick it up with the opening facing upward; the control instructions obtained in a given training run may indeed accomplish picking up the cup, but with the opening facing downward, which does not match normal behavior. The training process itself cannot reliably detect such non-conforming results, whereas they are easy to spot and further optimize through observation after rendering.
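One simple way to realize the "render off during training, render on for playback" pattern is a context manager around the training phase, as sketched below. The set_rendering and replay methods are assumed names for whatever toggle the simulator actually exposes.

```python
from contextlib import contextmanager


@contextmanager
def rendering_disabled(twin_world):
    """Turn rendering off for the training phase and back on afterwards (names assumed)."""
    twin_world.set_rendering(False)
    try:
        yield twin_world
    finally:
        twin_world.set_rendering(True)


# usage: train head-less, then replay the result with rendering on
# with rendering_disabled(world):
#     instructions = plan_in_twin_world(world, policy, pose, task)
# world.replay(instructions)
```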
In one example, after the agent is controlled to execute the target task according to the control instructions for completing the target task, the method further includes: when the agent fails to execute the target task, receiving an auxiliary instruction entered by the user through the interactive device, the auxiliary instruction being used to control the agent to execute the target task successfully; and after the target task is executed successfully, updating the prior data according to the actions of each joint of the robotic arm during execution of the auxiliary instruction. If a failure case arises during later use after the reinforcement learning network has converged, manual intervention can provide assistance for the failed situation and generate prior data once more; this prior data can be fed back into the DQN model, improving the agent's robustness the next time it faces such a situation and achieving the goal of learning from failure.
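A rough outline of that learn-from-failure loop is sketched below: a human finishes the failed task through the interactive device, the demonstration is recorded, and the prior data and policy are updated. Every name here (add_prior_data, retrain, and the device methods) is an assumption made purely for illustration.

```python
def recover_with_assist(twin_world, policy, input_device):
    """After a real-world failure, record a human-assisted demo and fold it back into the model."""
    demo = []
    while not twin_world.task_done():
        frame = twin_world.capture_rgbd()
        command = input_device.read_command()     # human assist via the interactive device
        twin_world.apply_command(command)
        demo.append((frame.id, command.joint_deltas))
    policy.add_prior_data(demo)                   # update the prior data with the new demonstration
    policy.retrain()                              # improve robustness for the next encounter
```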
In one example, the digital twin world is updated synchronously with the physical world in real time. Because the purpose of the digital twin world is to simulate motion processes in the physical world for feedback training, any change in the physical world requires a synchronized update of the digital twin world's data, so that the simulation results in the digital twin world match the actual motion states and outcomes in the physical world.
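In the simplest case, such synchronization could be a polling loop that mirrors observed changes into the twin world, as in the sketch below; the sensor and update methods are hypothetical, and a real system might instead push updates event-driven.

```python
import time


def sync_twin_world(twin_world, sensors, period_s=0.1):
    """Keep the twin world aligned with the physical world (a simplified polling sketch)."""
    while True:
        update = sensors.read_changes()        # object poses that changed in the physical world
        if update:
            twin_world.apply_update(update)    # mirror the change into the digital twin world
        time.sleep(period_s)
```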
In one example, this embodiment may virtually reconstruct the real physical world using 3D reconstruction techniques to obtain a digital twin world that restores the real world at 1:1 scale, and add into it a digital twin corresponding to the agent in the physical world. Alternatively, ElasticFusion can be used with a depth camera to scan the environment and obtain the digital twin world, with the result then refined manually. In the digital twin world, a trainer can control the digital twin via keyboard and mouse to complete target tasks (such as grasping a cup, pouring a drink, or opening a cabinet door), generating sufficient prior data for a specific task, after which the reinforcement learning network is trained on the basis of that prior data. The training process takes place in the digital twin world. Once training has converged, the reinforcement learning network can be used to control the real-world agent to complete the corresponding task.
In this embodiment, the physical world is simulated by the digital twin world, in which a digital twin corresponding to the agent in the physical world exists. Driving the digital twin with control instructions in the digital twin world simulates the outcome of applying those instructions to the agent, and suitable control instructions are obtained through training so that the agent executes the target task. There is no need to preprocess input parameters such as RGBD data, the complexity of data computation for the control instructions output for the agent is reduced, and the control efficiency of the agent is improved. The agent may be a robot; that is, through simulation in the digital twin world, the complexity of controlling the robot is reduced and its control efficiency is improved.
In some embodiments, agent control can also be divided into three stages: first, digital twin technology is used to realize a 1:1 simulation mapping between the physical world and the digital twin world, with the virtual world updated synchronously in real time; second, in the digital twin world, the reinforcement learning network takes the spatial semantic map of the twin world and the agent's pose as input for training and decision-making, and controls the digital twin corresponding to the agent; finally, the behavior of the digital twin is used to synchronously control the agent in the physical world. This effectively avoids the complexity of training directly on RGB-D data, the algorithm converges quickly, and because the algorithm output does not directly control the physical device, the cost of sim-to-real transfer is effectively reduced.
The division of steps in the above methods is only for clarity of description; during implementation, steps may be merged into one step, or a step may be split into multiple steps, and as long as the same logical relationship is included, they fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the protection scope of this patent.
An embodiment of the present application relates to an agent control apparatus, as shown in Fig. 2, including:
an acquisition module 201, configured to acquire a target task;
a generation module 202, configured to generate, from the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network, control instructions for controlling the digital twin to complete the target task, wherein the digital twin world is obtained by simulation mapping of the physical world, the digital twin is located in the digital twin world, and the agent is located in the physical world and corresponds to the digital twin; and
an execution module 203, configured to control the agent to execute the target task according to the control instructions for completing the target task.
Implementation details of the agent control apparatus of this embodiment are described below. The following details are provided only to facilitate understanding and are not required for implementing the solution.
For the generation module 202, in one example, generating the control instructions for controlling the digital twin to complete the target task from the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network includes: inputting the pose of the agent and a spatial semantic map representing the environment data into the reinforcement learning network, which outputs control instructions for controlling the actions of the digital twin; and training the reinforcement learning network, based on the results of the digital twin executing those instructions, to obtain the control instructions that complete the target task.
In one example, the initial control instruction output by the reinforcement learning network is generated from prior data, where the prior data is obtained from the actions of the user controlling the digital twin through an interactive device.
In one example, the spatial semantic map includes: the pose of each object in the digital twin world, its 3D collision box, object classification information, and object material information.
In one example, the digital twin world is hosted on a cloud server, and generating the control instructions for controlling the digital twin to complete the target task from the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network includes: generating those control instructions through interaction with the cloud server, based on the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network.
In one example, after the target task is acquired and before the control instructions for controlling the digital twin to complete the target task are generated, the apparatus further disables the rendering function.
In one example, the reinforcement learning network includes a DQN network model; the input of the DQN network model is an RGBD image that contains the pose of the agent and the spatial semantic map, and the output of the DQN network model is the action of each joint of the robotic arm.
For the execution module 203, in one example, after the control instructions for controlling the digital twin to complete the target task are generated, the apparatus further enables the rendering function.
In addition, the digital twin world is updated synchronously with the physical world in real time.
In the physical world, the agent may be a robot.
In this embodiment, the physical world is simulated by the digital twin world, in which a digital twin corresponding to the agent in the physical world exists. Driving the digital twin with control instructions in the digital twin world simulates the outcome of applying those instructions to the agent, from which suitable control instructions are obtained so that the agent executes the target task. There is no need to preprocess input parameters such as RGBD data, the complexity of processing the control instructions output for the agent is reduced, and the control efficiency of the agent is improved.
It is readily seen that this embodiment is a system embodiment corresponding to the above embodiments, and it can be implemented in cooperation with them. The relevant technical details mentioned in the above embodiments remain valid in this embodiment and are not repeated here; correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the above embodiments.
It is worth mentioning that all the modules involved in this embodiment are logical modules; in practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present application, units that are not closely related to solving the technical problem proposed in the present application are not introduced in this embodiment, which does not mean that no other units exist in this embodiment.
An embodiment of the present application relates to an electronic device, as shown in Fig. 3, including at least one processor 301, and a memory 302 communicatively connected to the at least one processor 301, where the memory 302 stores instructions executable by the at least one processor 301, and the instructions are executed by the at least one processor 301 so that the at least one processor 301 can perform the above agent control method.
The memory and the processor are connected by a bus, which may include any number of interconnected buses and bridges and connects the various circuits of one or more processors and the memory together. The bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and are therefore not described further herein. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over a wireless medium through an antenna, and the antenna further receives data and passes it to the processor.
The processor is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions, while the memory may be used to store data used by the processor when performing operations.
An embodiment of the present application relates to a computer program which, when executed by a processor, implements the agent control method of any of the above embodiments.
An embodiment of the present application relates to a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method embodiments.
That is, those skilled in the art can understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware; the program is stored in a storage medium and includes several instructions to cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art can understand that the above embodiments are specific embodiments for implementing the present application, and that in practical applications various changes in form and detail may be made to them without departing from the spirit and scope of the present application.

Claims (14)

  1. An agent control method, comprising:
    acquiring a target task;
    generating, from environment data of a digital twin world, a pose of an agent, and a reinforcement learning network, control instructions for controlling a digital twin to complete the target task, wherein the digital twin world is obtained by simulation mapping of a physical world, the digital twin is located in the digital twin world, and the agent is located in the physical world and corresponds to the digital twin; and
    controlling the agent to execute the target task according to the control instructions for completing the target task.
  2. The agent control method according to claim 1, wherein generating, from the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network, the control instructions for controlling the digital twin to complete the target task comprises:
    inputting the pose of the agent and a spatial semantic map representing the environment data into the reinforcement learning network, the reinforcement learning network outputting control instructions for controlling actions of the digital twin; and
    training the reinforcement learning network, according to results of the digital twin executing the control instructions, to obtain the control instructions for completing the target task.
  3. The agent control method according to claim 2, wherein the reinforcement learning network comprises a deep Q network (DQN) model;
    an input of the DQN model is an RGBD image comprising the pose of the agent and the spatial semantic map, and an output of the DQN model is an action of each joint of a robotic arm.
  4. The agent control method according to claim 2 or 3, wherein an initial control instruction output by the reinforcement learning network is generated from prior data;
    wherein the prior data is obtained from actions of a user controlling the digital twin through an interactive device.
  5. The agent control method according to claim 4, wherein the prior data is obtained by:
    receiving, through the interactive device, operation instructions for controlling the robotic arm entered by the user based on captured RGBD images;
    recording actions of each joint of the robotic arm while the robotic arm executes the operation instructions; and
    saving the RGBD images and the actions of each joint of the robotic arm as the prior data.
  6. The agent control method according to claim 4 or 5, wherein, after controlling the agent to execute the target task according to the control instructions for completing the target task, the method further comprises:
    when the agent fails to execute the target task, receiving an auxiliary instruction entered by the user through the interactive device, the auxiliary instruction being used to control the agent to execute the target task successfully; and
    after the target task is executed successfully, updating the prior data according to the actions of each joint of the robotic arm during execution of the auxiliary instruction.
  7. The agent control method according to any one of claims 2 to 6, wherein the spatial semantic map comprises:
    a pose of each object in the digital twin world, a 3D collision box, object classification information, and object material information.
  8. The agent control method according to any one of claims 1 to 7, wherein the digital twin world is hosted on a cloud server;
    generating, from the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network, the control instructions for controlling the digital twin to complete the target task comprises:
    generating, through interaction with the cloud server and according to the environment data of the digital twin world, the pose of the agent, and the reinforcement learning network, the control instructions for controlling the digital twin to complete the target task.
  9. The agent control method according to any one of claims 1 to 8, wherein, after acquiring the target task and before generating the control instructions for controlling the digital twin to complete the target task, the method further comprises:
    disabling a rendering function; and
    after generating the control instructions for controlling the digital twin to complete the target task, the method further comprises:
    enabling the rendering function.
  10. The agent control method according to any one of claims 1 to 9, wherein the digital twin world is updated synchronously with the physical world in real time.
  11. An agent control apparatus, comprising:
    an acquisition module, configured to acquire a target task;
    a generation module, configured to generate, from environment data of a digital twin world, a pose of an agent, and a reinforcement learning network, control instructions for controlling a digital twin to complete the target task, wherein the digital twin world is obtained by simulation mapping of a physical world, the digital twin is located in the digital twin world, and the agent is located in the physical world and corresponds to the digital twin; and
    an execution module, configured to control the agent to execute the target task according to the control instructions for completing the target task.
  12. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor so that the at least one processor can perform the agent control method according to any one of claims 1 to 10.
  13. A computer program which, when executed by a processor, implements the agent control method according to any one of claims 1 to 10.
  14. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the agent control method according to any one of claims 1 to 10.
PCT/CN2022/125695 2021-11-10 2022-10-17 Agent control method and apparatus, electronic device, program, and storage medium WO2023082949A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111329240.2 2021-11-10
CN202111329240.2A CN114310870A (en) 2021-11-10 2021-11-10 Intelligent agent control method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023082949A1 (en)

Family

ID=81045682

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125695 WO2023082949A1 (en) 2021-11-10 2022-10-17 Agent control method and apparatus, electronic device, program, and storage medium

Country Status (2)

Country Link
CN (1) CN114310870A (en)
WO (1) WO2023082949A1 (en)

Cited By (1)

Publication number Priority date Publication date Assignee Title
CN116388893A (en) * 2023-06-02 2023-07-04 中国信息通信研究院 High-precision electromagnetic environment digital twin method and electronic equipment

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN114310870A (en) * 2021-11-10 2022-04-12 达闼科技(北京)有限公司 Intelligent agent control method and device, electronic equipment and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN102402712A (en) * 2011-08-31 2012-04-04 山东大学 Robot reinforced learning initialization method based on neural network
CN111461338A (en) * 2020-03-06 2020-07-28 北京仿真中心 Intelligent system updating method and device based on digital twin
CN111680893A (en) * 2020-05-25 2020-09-18 北京科技大学 Digital twin system of multi-self-addressing robot picking system and scheduling method
CN112668687A (en) * 2020-12-01 2021-04-16 达闼机器人有限公司 Cloud robot system, cloud server, robot control module and robot
EP3865257A1 (en) * 2020-02-11 2021-08-18 Ingenieurbüro Hannweber GmbH Device and method for monitoring and controlling a technical working system
CN114310870A (en) * 2021-11-10 2022-04-12 达闼科技(北京)有限公司 Intelligent agent control method and device, electronic equipment and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102083007B (en) * 2011-01-25 2013-03-20 上海交通大学 GSM-R (Global System for Mobile Communications-Railway) cluster scheduling service analysis system and implementing method thereof
EP2685810B1 (en) * 2011-03-17 2020-09-09 Mirobot Ltd. Human assisted milking robot and method
CN107870600B (en) * 2017-10-17 2018-10-19 广东工业大学 A kind of transparent monitoring method in intelligence workshop and system
US20190122146A1 (en) * 2017-10-23 2019-04-25 Artificial Intelligence Foundation, Inc. Dynamic and Intuitive Aggregation of a Training Dataset
CN109829543B (en) * 2019-01-31 2020-05-26 中国科学院空间应用工程与技术中心 Space effective load data flow online anomaly detection method based on ensemble learning
WO2021092263A1 (en) * 2019-11-05 2021-05-14 Strong Force Vcn Portfolio 2019, Llc Control tower and enterprise management platform for value chain networks
CN111461431B (en) * 2020-03-31 2022-05-27 广东工业大学 Optimization method and system based on screw locking process in mobile phone manufacturing
CN112171669B (en) * 2020-09-21 2021-10-08 西安交通大学 Brain-computer cooperation digital twin reinforcement learning control method and system
CN112440281A (en) * 2020-11-16 2021-03-05 浙江大学 Robot trajectory planning method based on digital twins
CN112632778B (en) * 2020-12-22 2023-07-18 达闼机器人股份有限公司 Operation method and device of digital twin model and electronic equipment
CN113111006A (en) * 2021-05-06 2021-07-13 上海三一重机股份有限公司 Debugging method and system for operating machine control system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102402712A (en) * 2011-08-31 2012-04-04 山东大学 Robot reinforced learning initialization method based on neural network
EP3865257A1 (en) * 2020-02-11 2021-08-18 Ingenieurbüro Hannweber GmbH Device and method for monitoring and controlling a technical working system
CN111461338A (en) * 2020-03-06 2020-07-28 北京仿真中心 Intelligent system updating method and device based on digital twin
CN111680893A (en) * 2020-05-25 2020-09-18 北京科技大学 Digital twin system of multi-self-addressing robot picking system and scheduling method
CN112668687A (en) * 2020-12-01 2021-04-16 达闼机器人有限公司 Cloud robot system, cloud server, robot control module and robot
CN114310870A (en) * 2021-11-10 2022-04-12 达闼科技(北京)有限公司 Intelligent agent control method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116388893A (en) * 2023-06-02 2023-07-04 中国信息通信研究院 High-precision electromagnetic environment digital twin method and electronic equipment
CN116388893B (en) * 2023-06-02 2023-08-08 中国信息通信研究院 High-precision electromagnetic environment digital twin method and electronic equipment

Also Published As

Publication number Publication date
CN114310870A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN110026987B (en) Method, device and equipment for generating grabbing track of mechanical arm and storage medium
WO2023082949A1 (en) Agent control method and apparatus, electronic device, program, and storage medium
US11580724B2 (en) Virtual teach and repeat mobile manipulation system
US20180260685A1 (en) Hierarchical robotic controller apparatus and methods
CN109483534B (en) Object grabbing method, device and system
US11823048B1 (en) Generating simulated training examples for training of machine learning model used for robot control
JPWO2003019475A1 (en) Robot device, face recognition method, and face recognition device
WO2020058669A1 (en) Task embedding for device control
CN108229678B (en) Network training method, operation control method, device, storage medium and equipment
WO2014201422A2 (en) Apparatus and methods for hierarchical robotic control and robotic training
CN114516060A (en) Apparatus and method for controlling a robotic device
CN110524531A (en) A kind of robot control system and its workflow based on Internet of Things cloud service
Gonzalez et al. Asap: A semi-autonomous precise system for telesurgery during communication delays
US20240118667A1 (en) Mitigating reality gap through training a simulation-to-real model using a vision-based robot task model
Ogawara et al. Acquiring hand-action models in task and behavior levels by a learning robot through observing human demonstrations
WO2023051706A1 (en) Gripping control method and apparatus, and server, device, program and medium
CN116977506A (en) Model action redirection method, device, electronic equipment and storage medium
CN210121851U (en) Robot
Liu et al. Real-world robot reaching skill learning based on deep reinforcement learning
Jagersand Image based predictive display for tele-manipulation
CN111360819B (en) Robot control method and device, computer device and storage medium
US20230154160A1 (en) Mitigating reality gap through feature-level domain adaptation in training of vision-based robot action model
Yang et al. Data-driven Grasping and Pre-grasp Manipulation Using Hierarchical Reinforcement Learning with Parameterized Action Primitives
Saghour Vision-based robotic dual arm manipulation of soft objects
CN115357115A (en) Object interaction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22891746

Country of ref document: EP

Kind code of ref document: A1