CN112782980B - Multifunctional workshop robot based on DQN - Google Patents

Multifunctional workshop robot based on DQN

Info

Publication number
CN112782980B
CN112782980B (application CN202011615034.3A)
Authority
CN
China
Prior art keywords
dqn
module
robot
neural network
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011615034.3A
Other languages
Chinese (zh)
Other versions
CN112782980A (en)
Inventor
敖邦乾
梁定勇
敖帮桃
令狐金卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zunyi Normal University
Original Assignee
Zunyi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zunyi Normal University filed Critical Zunyi Normal University
Priority to CN202011615034.3A
Publication of CN112782980A
Application granted
Publication of CN112782980B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to the technical field of robot control, in particular to a multifunctional workshop robot based on DQN, which comprises: the DQN neural network model building module is used for building a DQN neural network model according to the algorithm model; the training module is used for training the DQN neural network model; the SLAM image construction module is used for generating an SLAM image according to the sensor data; an environment state generation module for generating a state according to the SLAM image; the feasible action set construction module is used for dividing the coordinate plane into N directions to form a feasible action set; and the control module is used for taking the feasible action set and the SLAM image data as input and outputting a behavior control decision through the DQN neural network model. The DQN-based workshop multifunctional robot can meet the requirements of various different tasks through self-learning, and solves the problems of high cost and low benefit of the current robot due to single application scene and task.

Description

Multifunctional workshop robot based on DQN
Technical Field
The invention relates to the technical field of robot control, in particular to a multifunctional workshop robot based on DQN.
Background
With the development of the internet of things and internet technology, the intelligent robot is widely applied to scenes such as exhibition hall navigation, workshop management, automatic production, warehouse management, smart home and the like.
Generally, a workshop robot or intelligent machine has a single application scene and function: it can complete only a certain part of the work in a specific scene and cannot work when assigned other tasks or placed in other scenes, which greatly limits how much such robots can reduce the workload of workers. Taking robot movement as an example, movement control and path planning are the basis of a robot's operation. Existing workshop robots generally control the movement and path of the intelligent vehicle by following a fixed route or by recognizing markers set in the scene, so the intelligent vehicle cannot complete work under different tasks in different scenes, and the application cost is high while the benefit is low.
Disclosure of Invention
The invention aims to provide a DQN-based workshop multifunctional robot, which can meet the requirements of various different tasks through self-learning and solve the problems of high cost and low benefit of the current robot due to single application scene and task.
The application provides the following technical scheme:
a DQN-based workshop multi-function robot, comprising:
the DQN neural network model building module is used for building the DQN neural network model according to the following algorithm model:
Q(s_t, a_t) ← Q(s_t, a_t) + α[R_t + γQ(s_t', a_t') - Q(s_t, a_t)]
where γ represents a discount factor, α represents a learning rate, R_t represents the accumulated return value, and s_t represents the state at the current time t;
the training module is used for training the DQN neural network model;
the SLAM image construction module is used for generating an SLAM image according to the sensor data;
an environment state generation module for generating a state from the SLAM image;
the feasible action set construction module is used for dividing the coordinate plane into N directions to form a feasible action set;
and the control module is used for taking the feasible action set and the SLAM image data as input and outputting a behavior control decision through the DQN neural network model.
Further, N is 64.
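For illustration, a minimal Python sketch of one possible construction of the feasible action set is given below; representing each of the N = 64 directions as an evenly spaced unit heading is an assumption, since the patent only states that the coordinate plane is divided into N directions.

```python
import math

N = 64
# Each feasible action is a unit vector along one of N evenly spaced headings in the coordinate plane.
feasible_actions = [
    (math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
    for k in range(N)
]
```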
Further, the DQN neural network model comprises two convolution layers, each with a stride (step length) of 3, and two fully connected layers, wherein the first fully connected layer has 256 nodes and the second fully connected layer has 8 nodes.
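For illustration, a minimal PyTorch sketch of a network with this shape is given below; the kernel sizes, channel counts, and the 84x84 single-channel SLAM-image input are assumptions, since the patent only specifies two stride-3 convolution layers and fully connected layers of 256 and 8 nodes.

```python
import torch
import torch.nn as nn

class DQNNetwork(nn.Module):
    """Two stride-3 convolution layers followed by 256-node and 8-node fully connected layers."""
    def __init__(self, n_outputs: int = 8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=3),   # assumed: single-channel 84x84 input -> 28x28
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=3),  # 28x28 -> 9x9
            nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 9 * 9, 256),  # first fully connected layer: 256 nodes
            nn.ReLU(),
            nn.Linear(256, n_outputs),   # second fully connected layer: 8 nodes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x))

# Example: a batch of one SLAM image yields 8 Q-values.
q_values = DQNNetwork()(torch.zeros(1, 1, 84, 84))
```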
Further, R_t is calculated according to the following formula:
[formula provided as an image in the original document]
wherein r_t represents the reward value after taking action a in state s at the current time t.
Further, r_t is defined as follows: the value is 5 when the robot moves towards the destination without collision; -5 when the robot moves in the reverse direction to the destination or collides with a surrounding obstacle; otherwise, the value is 0.
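For illustration, a minimal Python sketch of this reward is given below; how "moving towards the destination" is detected (here, by whether the distance to the destination decreased) is an assumption, since the patent does not state the measurement.

```python
def immediate_reward(collided: bool, prev_distance: float, new_distance: float) -> float:
    """r_t as defined above: +5 towards the destination without collision, -5 on collision or reverse motion, 0 otherwise."""
    if collided or new_distance > prev_distance:   # collision, or moving away from the destination
        return -5.0
    if new_distance < prev_distance:               # moving towards the destination without collision
        return 5.0
    return 0.0
```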
Further, the training module continuously adjusts the network weights using a stochastic gradient descent method to minimize a loss function.
Further, the loss function employed by the training module is defined as follows:
L_i(θ_i) = E[(TargetQ - Q(s, a; θ_i))^2]
TargetQ = r + γQ(s_i, a_i; θ_i^-)
wherein θ_i^- represents the target network parameter of the i-th iteration and θ_i the Q-network parameter.
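For illustration, a minimal PyTorch sketch of one such training step is given below; the use of torch.optim.SGD, the minibatch format, and taking the target network's maximum Q-value at the next state as TargetQ are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

def train_step(q_net, target_net, optimizer, batch, gamma=0.9):
    """One stochastic-gradient-descent step on L_i(theta_i) = E[(TargetQ - Q(s, a; theta_i))^2]."""
    states, actions, rewards, next_states = batch                  # tensors drawn from the experience pool
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                                          # target network parameters theta^- are held fixed
        target_q = rewards + gamma * target_net(next_states).max(dim=1).values
    loss = F.mse_loss(q_sa, target_q)                              # squared error between TargetQ and Q(s, a; theta_i)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch: optimizer = torch.optim.SGD(q_net.parameters(), lr=0.01),
# with theta^- copied from theta every fixed number of steps (copy interval assumed).
```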
Further, the SLAM image construction module comprises an RPLIDAR data acquisition module, an RGBD data acquisition module and a data fusion module, wherein the RPLIDAR data acquisition module is used for acquiring environment one-dimensional composition data, the RGBD data acquisition module is used for acquiring environment two-dimensional composition data, and the data fusion module is used for generating an SLAM image of the environment space according to the environment one-dimensional composition data and the environment two-dimensional composition data.
Further, random Gaussian noise is added when the RPLIDAR data acquisition module acquires the environment one-dimensional composition data. This prevents data distortion caused by overfitting and makes the acquired data closer to the actual situation.
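For illustration, a minimal NumPy sketch of adding such noise to a one-dimensional RPLIDAR scan is given below; the zero mean and the 0.01 standard deviation are assumptions, since the patent does not give the noise parameters.

```python
import numpy as np

def add_gaussian_noise(ranges: np.ndarray, sigma: float = 0.01) -> np.ndarray:
    """Return the one-dimensional range readings with random Gaussian noise added."""
    return ranges + np.random.normal(loc=0.0, scale=sigma, size=ranges.shape)
```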
The technical scheme of the invention has the beneficial effects that:
In the technical scheme of the invention, the use of the DQN theory gives the robot a self-learning capability, makes it more intelligent, and allows it to adapt to different application scenes and complete different tasks. The experience pool introduced in DQN greatly reduces the correlation between samples when the network is trained; by retaining past experience it prevents the neural network from learning only from its most recent actions and encourages it to learn from a variety of random past experiences, thereby improving decision accuracy. Dividing the DQN into a network that is trained and updates its weights and a target network that calculates the target Q value resolves the resulting instability problem. With this technical scheme, the robot can realize a self-tracking function from a starting point to an end point in a variety of different scenes and can meet various carrying requirements, solving the current robots' problems of high cost and low benefit caused by a single applicable scene and task.
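For illustration, a minimal Python sketch of such an experience pool is given below; the capacity of 10000 transitions and the batch size of 32 are assumptions, since the patent does not specify them.

```python
import random
from collections import deque

class ExperiencePool:
    """Stores past transitions and returns random minibatches, breaking the correlation between consecutive samples."""
    def __init__(self, capacity: int = 10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int = 32):
        return random.sample(self.buffer, batch_size)
```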
Drawings
FIG. 1 is a control model structure diagram in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 2 is a training flowchart in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 3 is a schematic diagram of a simple simulation scene in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 4 is a schematic diagram of a complex simulation scene in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 5 is a graph of the change in the return value in a simple simulation environment in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 6 is a graph of the change in the return value in a complex simulation environment in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 7 is a graph of the change in iteration steps in a simple simulation environment in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 8 is a graph of the change in iteration steps in a complex simulation environment in an embodiment of the DQN-based workshop multifunctional robot of the present application.
Detailed Description
The technical scheme of the application is further explained in detail through the following specific implementation modes:
example one
As shown in fig. 1, the DQN-based multi-function robot for a workshop disclosed in this embodiment includes:
the DQN neural network model building module is used for building the DQN neural network model according to the following algorithm model:
Q(s_t, a_t) ← Q(s_t, a_t) + α[R_t + γQ(s_t', a_t') - Q(s_t, a_t)]
where γ represents a discount factor, α represents a learning rate, R_t represents the accumulated return value, and s_t represents the state at the current time t. In this embodiment, the DQN neural network model comprises two convolution layers, each with a stride (step length) of 3, and two fully connected layers, wherein the first fully connected layer has 256 nodes and the second fully connected layer has 8 nodes.
In this example, R_t is calculated according to the following formula:
[formula provided as an image in the original document]
wherein r_t represents the reward value after taking action a in state s at the current time t.
r_t is defined as follows: the value is 5 when the robot moves towards the destination without collision; -5 when the robot moves in the reverse direction to the destination or collides with a surrounding obstacle; otherwise, the value is 0.
The training module is used for training the DQN neural network model;
the SLAM image construction module is used for generating an SLAM image according to the sensor data; the environment state generation module is used for generating a state according to the SLAM image. In this embodiment, the SLAM image construction module comprises an RPLIDAR data acquisition module, an RGBD data acquisition module and a data fusion module, wherein the RPLIDAR data acquisition module is configured to acquire environment one-dimensional composition data, with random Gaussian noise added during acquisition; the RGBD data acquisition module is used for acquiring environment two-dimensional composition data, and the data fusion module is used for generating an SLAM image of the environment space according to the environment one-dimensional composition data and the environment two-dimensional composition data.
The feasible action set construction module is used for dividing the coordinate plane into N directions to form a feasible action set; in this embodiment, N is preferably 64.
And the control module is used for taking the feasible action set and the SLAM image data as input and outputting a behavior control decision through the DQN neural network model.
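For illustration, a minimal PyTorch sketch of the action selection performed by the control module is given below; how the 8 network outputs correspond to the N feasible directions is not spelled out in the patent, and the epsilon-greedy exploration (used during training only) is an assumption.

```python
import random
import torch

def select_action(q_net, slam_image: torch.Tensor, epsilon: float = 0.0) -> int:
    """Return the index of the behavior chosen for the current SLAM image."""
    if random.random() < epsilon:                    # assumed epsilon-greedy exploration during training
        return random.randrange(8)                   # 8 network outputs
    with torch.no_grad():
        q_values = q_net(slam_image.unsqueeze(0))    # add a batch dimension
    return int(q_values.argmax(dim=1).item())        # index of the highest-valued output
```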
As shown in fig. 2, the training module continuously adjusts the network weights by minimizing the loss function using a stochastic gradient descent method. The loss function used by the training module is defined as follows:
L_i(θ_i) = E[(TargetQ - Q(s, a; θ_i))^2]
TargetQ = r + γQ(s_i, a_i; θ_i^-)
wherein θ_i^- represents the target network parameter of the i-th iteration and θ_i the Q-network parameter.
In this embodiment, a simulator is used and environments of different difficulties are designed; a plurality of models are trained simultaneously, and the experience learned in each training is summarized, accumulated and shared, providing a basis for subsequent training. In each training environment, after many iterations of training, an optimized obstacle-free path from the starting point to the end point can be obtained, so that when the optimal training result is applied to an actual environment, tasks are not left uncompleted because of changes in the environment. Fig. 3 and fig. 4 show two simulation environments of the training, from simple to complex. Fig. 5 to 8 show the change of the return value and of the iteration steps in the two training scenarios respectively. It can be seen that in the simple environment the return value stabilizes quickly, after approximately 10000 iterations, while in the complex environment approximately 30000 iterations are needed because more judgments and selections are required. In either the simple or the complex case, a machine based on the invention can realize the self-tracking function from the starting point to the end point after being trained in the simulator, and can meet various carrying functions and the like.
Example two
This embodiment differs from the first embodiment in that it further comprises a task receiving module and a target planning adjustment module. The task receiving module is configured to obtain other transport tasks near each point of the path according to the forward path of the current target; it is further configured to determine, according to the load condition of the current robot and the task requirements of the other transport tasks, whether a corresponding task can be accepted, and to accept the corresponding transport task if so. The target planning adjustment module is used for readjusting the target position according to the accepted transport task. In the technical scheme of this embodiment, while carrying out a transport task the robot can also acquire other transport tasks near the path points and selectively accept them according to its own load, so that multiple tasks can be completed in a single run and work efficiency is improved.
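For illustration, a minimal Python sketch of the task acceptance decision described above is given below; the Task and RobotState fields and the weight-based load check are assumptions, since the patent only states that acceptance depends on the robot's load condition and the task requirements.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Task:
    pickup: Tuple[float, float]   # assumed: position of the task near the current path
    weight: float                 # assumed: load the task adds to the robot

@dataclass
class RobotState:
    capacity: float
    current_load: float = 0.0

def accept_nearby_tasks(robot: RobotState, nearby_tasks: List[Task]) -> List[Task]:
    """Accept tasks near the path whose weight still fits within the robot's remaining capacity."""
    accepted = []
    for task in nearby_tasks:
        if robot.current_load + task.weight <= robot.capacity:
            robot.current_load += task.weight
            accepted.append(task)   # the target planning adjustment module then re-plans the target position
    return accepted
```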
The above are merely examples of the present invention, and the present invention is not limited to the application field of these embodiments. The common general knowledge of specific structures and characteristics known in the schemes is not described here in detail; a person skilled in the art knows the common technical knowledge in the technical field before the application date or the priority date, can access the prior art in this field, and has the ability to apply conventional experimental means, so that, in light of the teaching provided in the present application, such a person can perfect and implement the scheme with his or her own abilities, and certain typical known structures or known methods should not become obstacles to the implementation of the present invention. It should be noted that several changes and modifications can be made by those skilled in the art without departing from the structure of the present invention; these should also be regarded as falling within the protection scope of the present invention and do not affect the effect of the implementation of the invention or the practicability of the patent. The scope of protection of this application is defined by the claims, and the description, including the specific embodiments, serves to explain the content of the claims.

Claims (4)

1. A multifunctional workshop robot based on DQN, characterized by comprising:
the DQN neural network model building module is used for building the DQN neural network model according to the following algorithm model:
Q(S_t, a_t) ← Q(S_t, a_t) + α[R_t + γQ(S_t', a_t') - Q(S_t, a_t)]
where γ represents a discount factor, α represents a learning rate, R_t represents the accumulated return value, and S_t represents the state at the current time t;
the training module is used for training the DQN neural network model;
the SLAM image construction module is used for generating an SLAM image according to the sensor data; the SLAM image construction module comprises an RPLIDAR data acquisition module, an RGBD data acquisition module and a data fusion module, wherein the RPLIDAR data acquisition module is used for acquiring environment one-dimensional composition data, the RGBD data acquisition module is used for acquiring environment two-dimensional composition data, and the data fusion module is used for generating an SLAM image of an environment space according to the environment one-dimensional composition data and the environment two-dimensional composition data;
an environment state generation module for generating a state from the SLAM image;
the feasible action set construction module is used for dividing the coordinate plane into N directions to form a feasible action set;
the control module is used for taking the feasible action set and the SLAM image data as input and outputting a behavior control decision through the DQN neural network model;
the N is 64; the DQN neural network model comprises two convolution layers, each with a stride (step length) of 3, and two fully connected layers, wherein the first fully connected layer has 256 nodes and the second fully connected layer has 8 nodes;
the R_t is calculated according to the following formula:
[formula provided as an image in the original document]
wherein r_t represents the reward value after taking action a in state s at the current time t;
r_t is defined as follows: the value is 5 when the robot moves towards the destination without collision; -5 when the robot moves in the reverse direction to the destination or collides with a surrounding obstacle; otherwise, the value is 0.
2. The DQN-based workshop multifunctional robot according to claim 1, wherein: the training module continuously adjusts the network weights using a stochastic gradient descent method to minimize a loss function.
3. The DQN-based workshop multifunctional robot according to claim 2, characterized in that: the loss function employed by the training module is defined as follows:
L_i(θ_i) = E[(TargetQ - Q(s, a; θ_i))^2]
TargetQ = r + γQ(s_i, a_i; θ_i^-)
wherein θ_i^- represents the target network parameter of the i-th iteration, and θ_i is a Q-network parameter.
4. The DQN-based workshop multifunctional robot of claim 3, wherein: random Gaussian noise is added when the RPLIDAR data acquisition module acquires the environment one-dimensional composition data.
CN202011615034.3A 2020-12-31 2020-12-31 Multifunctional workshop robot based on DQN Active CN112782980B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011615034.3A CN112782980B (en) 2020-12-31 2020-12-31 Multifunctional workshop robot based on DQN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011615034.3A CN112782980B (en) 2020-12-31 2020-12-31 Multifunctional workshop robot based on DQN

Publications (2)

Publication Number Publication Date
CN112782980A CN112782980A (en) 2021-05-11
CN112782980B true CN112782980B (en) 2022-09-13

Family

ID=75754083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011615034.3A Active CN112782980B (en) 2020-12-31 2020-12-31 Multifunctional workshop robot based on DQN

Country Status (1)

Country Link
CN (1) CN112782980B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109658445A (en) * 2018-12-14 2019-04-19 北京旷视科技有限公司 Network training method, increment build drawing method, localization method, device and equipment
CN109782600A (en) * 2019-01-25 2019-05-21 东华大学 A method of autonomous mobile robot navigation system is established by virtual environment
CN110321666B (en) * 2019-08-09 2022-05-03 重庆理工大学 Multi-robot path planning method based on priori knowledge and DQN algorithm
CN110764416A (en) * 2019-11-11 2020-02-07 河海大学 Humanoid robot gait optimization control method based on deep Q network
CN110977967A (en) * 2019-11-29 2020-04-10 天津博诺智创机器人技术有限公司 Robot path planning method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN112782980A (en) 2021-05-11

Similar Documents

Publication Publication Date Title
US11747155B2 (en) Global path planning method and device for an unmanned vehicle
US20220363259A1 (en) Method for generating lane changing decision-making model, method for lane changing decision-making of unmanned vehicle and electronic device
CN109540150B (en) Multi-robot path planning method applied to hazardous chemical environment
WO2021208771A1 (en) Reinforced learning method and device
Wang et al. Autonomous navigation of UAVs in large-scale complex environments: A deep reinforcement learning approach
CN105137967B (en) The method for planning path for mobile robot that a kind of depth autocoder is combined with Q learning algorithms
CN110991972B (en) Cargo transportation system based on multi-agent reinforcement learning
CN109489667A (en) A kind of improvement ant colony paths planning method based on weight matrix
CN112119404A (en) Sample efficient reinforcement learning
CN112433525A (en) Mobile robot navigation method based on simulation learning and deep reinforcement learning
CN109782600A (en) A method of autonomous mobile robot navigation system is established by virtual environment
CN112651374B (en) Future trajectory prediction method based on social information and automatic driving system
JP7448683B2 (en) Learning options for action selection using meta-gradient in multi-task reinforcement learning
CN112930541A (en) Determining a control strategy by minimizing delusional effects
CN112947591A (en) Path planning method, device, medium and unmanned aerial vehicle based on improved ant colony algorithm
CN114185339A (en) Mobile robot path planning method in dynamic environment
CN114167865A (en) Robot path planning method based on confrontation generation network and ant colony algorithm
CN114037050B (en) Robot degradation environment obstacle avoidance method based on internal plasticity of pulse neural network
CN115629607A (en) Reinforced learning path planning method integrating historical information
CN112782980B (en) Multifunctional workshop robot based on DQN
CN114493013A (en) Smart agent path planning method based on reinforcement learning, electronic device and medium
CN111221318B (en) Multi-robot state estimation method based on model predictive control algorithm
CN113110101A (en) Production line mobile robot gathering type recovery and warehousing simulation method and system
CN116360454A (en) Robot path collision avoidance planning method based on deep reinforcement learning in pedestrian environment
Deb Single and multi-objective dynamic optimization: two tales from an evolutionary perspective

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant