CN112782980B - Multifunctional workshop robot based on DQN - Google Patents

Multifunctional workshop robot based on DQN

Info

Publication number
CN112782980B
CN112782980B (application CN202011615034.3A)
Authority
CN
China
Prior art keywords
dqn
module
robot
neural network
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011615034.3A
Other languages
Chinese (zh)
Other versions
CN112782980A (en)
Inventor
敖邦乾
梁定勇
敖帮桃
令狐金卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zunyi Normal University
Original Assignee
Zunyi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zunyi Normal University filed Critical Zunyi Normal University
Priority to CN202011615034.3A
Publication of CN112782980A
Application granted
Publication of CN112782980B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to the technical field of robot control, in particular to a multifunctional workshop robot based on DQN, which comprises: the DQN neural network model building module is used for building a DQN neural network model according to the algorithm model; the training module is used for training the DQN neural network model; the SLAM image construction module is used for generating an SLAM image according to the sensor data; an environment state generation module for generating a state according to the SLAM image; the feasible action set construction module is used for dividing the coordinate plane into N directions to form a feasible action set; and the control module is used for taking the feasible action set and the SLAM image data as input and outputting a behavior control decision through the DQN neural network model. The DQN-based workshop multifunctional robot can meet the requirements of various different tasks through self-learning, and solves the problems of high cost and low benefit of the current robot due to single application scene and task.

Description

Multifunctional workshop robot based on DQN
Technical Field
The invention relates to the technical field of robot control, in particular to a multifunctional workshop robot based on DQN.
Background
With the development of the internet of things and internet technology, the intelligent robot is widely applied to scenes such as exhibition hall navigation, workshop management, automatic production, warehouse management, smart home and the like.
Generally, a workshop robot or intelligent machine has a single application scene and function: it can complete only a certain part of the work in a specific scene and cannot work when assigned other tasks or placed in other scenes, which greatly limits how much such robots can reduce the workload of workers. Taking robot movement as an example, movement control and path planning are the basis of a robot's operation. Existing workshop robots generally control the movement and path of the intelligent vehicle by following a fixed route or by recognizing markers set in the scene, so the intelligent vehicle cannot complete work under different tasks in different scenes, and the application cost is high while the benefit is low.
Disclosure of Invention
The invention aims to provide a DQN-based workshop multifunctional robot, which can meet the requirements of various different tasks through self-learning and solve the problems of high cost and low benefit of the current robot due to single application scene and task.
The application provides the following technical scheme:
a DQN-based workshop multi-function robot, comprising:
the DQN neural network model building module is used for building the DQN neural network model according to the following algorithm model:
Q(s_t, a_t) ← Q(s_t, a_t) + α[R_t + γQ(s_t', a_t') - Q(s_t, a_t)]
where γ represents a discount factor, α represents a learning rate, R_t represents the accumulated return value, and s_t represents the state at the current time t;
the training module is used for training the DQN neural network model;
the SLAM image construction module is used for generating an SLAM image according to the sensor data;
an environment state generation module for generating a state from the SLAM image;
the feasible action set construction module is used for dividing the coordinate plane into N directions to form a feasible action set;
and the control module is used for taking the feasible action set and the SLAM image data as input and outputting a behavior control decision through the DQN neural network model.
Further, N is 64.
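For illustration, a minimal Python sketch of one possible construction of the feasible action set is given below; representing each of the N = 64 directions as an evenly spaced unit heading is an assumption, since the patent only states that the coordinate plane is divided into N directions.

```python
import math

N = 64
# Each feasible action is a unit vector along one of N evenly spaced headings in the coordinate plane.
feasible_actions = [
    (math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
    for k in range(N)
]
```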
Further, the DQN neural network model comprises two convolution layers, each with a stride (step length) of 3, and two fully connected layers, wherein the first fully connected layer has 256 nodes and the second fully connected layer has 8 nodes.
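For illustration, a minimal PyTorch sketch of a network with this shape is given below; the kernel sizes, channel counts, and the 84x84 single-channel SLAM-image input are assumptions, since the patent only specifies two stride-3 convolution layers and fully connected layers of 256 and 8 nodes.

```python
import torch
import torch.nn as nn

class DQNNetwork(nn.Module):
    """Two stride-3 convolution layers followed by 256-node and 8-node fully connected layers."""
    def __init__(self, n_outputs: int = 8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=3),   # assumed: single-channel 84x84 input -> 28x28
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=3),  # 28x28 -> 9x9
            nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 9 * 9, 256),  # first fully connected layer: 256 nodes
            nn.ReLU(),
            nn.Linear(256, n_outputs),   # second fully connected layer: 8 nodes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x))

# Example: a batch of one SLAM image yields 8 Q-values.
q_values = DQNNetwork()(torch.zeros(1, 1, 84, 84))
```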
Further, R_t is calculated according to the following formula:
[formula provided as an image in the original document]
wherein r_t represents the reward value after taking action a in state s at the current time t.
Further, r_t is defined as follows: the value is 5 when the robot moves towards the destination without collision; -5 when the robot moves in the reverse direction to the destination or collides with a surrounding obstacle; otherwise, the value is 0.
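For illustration, a minimal Python sketch of this reward is given below; how "moving towards the destination" is detected (here, by whether the distance to the destination decreased) is an assumption, since the patent does not state the measurement.

```python
def immediate_reward(collided: bool, prev_distance: float, new_distance: float) -> float:
    """r_t as defined above: +5 towards the destination without collision, -5 on collision or reverse motion, 0 otherwise."""
    if collided or new_distance > prev_distance:   # collision, or moving away from the destination
        return -5.0
    if new_distance < prev_distance:               # moving towards the destination without collision
        return 5.0
    return 0.0
```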
Further, the training module continuously adjusts the network weights using a stochastic gradient descent method to minimize a loss function.
Further, the loss function employed by the training module is defined as follows:
L_i(θ_i) = E[(TargetQ - Q(s, a; θ_i))^2]
TargetQ = r + γQ(s_i, a_i; θ_i^-)
wherein θ_i^- represents the target network parameter of the i-th iteration and θ_i the Q-network parameter.
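For illustration, a minimal PyTorch sketch of one such training step is given below; the use of torch.optim.SGD, the minibatch format, and taking the target network's maximum Q-value at the next state as TargetQ are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

def train_step(q_net, target_net, optimizer, batch, gamma=0.9):
    """One stochastic-gradient-descent step on L_i(theta_i) = E[(TargetQ - Q(s, a; theta_i))^2]."""
    states, actions, rewards, next_states = batch                  # tensors drawn from the experience pool
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                                          # target network parameters theta^- are held fixed
        target_q = rewards + gamma * target_net(next_states).max(dim=1).values
    loss = F.mse_loss(q_sa, target_q)                              # squared error between TargetQ and Q(s, a; theta_i)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch: optimizer = torch.optim.SGD(q_net.parameters(), lr=0.01),
# with theta^- copied from theta every fixed number of steps (copy interval assumed).
```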
Further, the SLAM image construction module comprises an RPLIDAR data acquisition module, an RGBD data acquisition module and a data fusion module, wherein the RPLIDAR data acquisition module is used for acquiring environment one-dimensional composition data, the RGBD data acquisition module is used for acquiring environment two-dimensional composition data, and the data fusion module is used for generating an SLAM image of the environment space according to the environment one-dimensional composition data and the environment two-dimensional composition data.
Further, random Gaussian noise is added when the RPLIDAR data acquisition module acquires the environment one-dimensional composition data. This prevents data distortion caused by overfitting and makes the acquired data closer to the actual situation.
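For illustration, a minimal NumPy sketch of adding such noise to a one-dimensional RPLIDAR scan is given below; the zero mean and the 0.01 standard deviation are assumptions, since the patent does not give the noise parameters.

```python
import numpy as np

def add_gaussian_noise(ranges: np.ndarray, sigma: float = 0.01) -> np.ndarray:
    """Return the one-dimensional range readings with random Gaussian noise added."""
    return ranges + np.random.normal(loc=0.0, scale=sigma, size=ranges.shape)
```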
The technical scheme of the invention has the beneficial effects that:
In the technical scheme of the invention, the use of the DQN theory gives the robot a self-learning capability, makes it more intelligent, and allows it to adapt to different application scenes and complete different tasks. The experience pool introduced in DQN greatly reduces the correlation between samples when the network is trained; by retaining past experience it prevents the neural network from learning only from its most recent actions and encourages it to learn from a variety of random past experiences, thereby improving decision accuracy. Dividing the DQN into a network that is trained and updates its weights and a target network that calculates the target Q value resolves the resulting instability problem. With this technical scheme, the robot can realize a self-tracking function from a starting point to an end point in a variety of different scenes and can meet various carrying requirements, solving the current robots' problems of high cost and low benefit caused by a single applicable scene and task.
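For illustration, a minimal Python sketch of such an experience pool is given below; the capacity of 10000 transitions and the batch size of 32 are assumptions, since the patent does not specify them.

```python
import random
from collections import deque

class ExperiencePool:
    """Stores past transitions and returns random minibatches, breaking the correlation between consecutive samples."""
    def __init__(self, capacity: int = 10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int = 32):
        return random.sample(self.buffer, batch_size)
```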
Drawings
FIG. 1 is a control model structure diagram in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 2 is a training flowchart in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 3 is a schematic diagram of a simple simulation scene in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 4 is a schematic diagram of a complex simulation scene in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 5 is a graph of the change in the return value in a simple simulation environment in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 6 is a graph of the change in the return value in a complex simulation environment in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 7 is a graph of the change in iteration steps in a simple simulation environment in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 8 is a graph of the change in iteration steps in a complex simulation environment in an embodiment of the DQN-based workshop multifunctional robot of the present application.
Detailed Description
The technical scheme of the application is further explained in detail through the following specific implementation modes:
example one
As shown in fig. 1, the DQN-based multi-function robot for a workshop disclosed in this embodiment includes:
the DQN neural network model building module is used for building the DQN neural network model according to the following algorithm model:
Q(s_t, a_t) ← Q(s_t, a_t) + α[R_t + γQ(s_t', a_t') - Q(s_t, a_t)]
where γ represents a discount factor, α represents a learning rate, R_t represents the accumulated return value, and s_t represents the state at the current time t. In this embodiment, the DQN neural network model comprises two convolution layers, each with a stride (step length) of 3, and two fully connected layers, wherein the first fully connected layer has 256 nodes and the second fully connected layer has 8 nodes.
In this example, R_t is calculated according to the following formula:
[formula provided as an image in the original document]
wherein r_t represents the reward value after taking action a in state s at the current time t.
r_t is defined as follows: the value is 5 when the robot moves towards the destination without collision; -5 when the robot moves in the reverse direction to the destination or collides with a surrounding obstacle; otherwise, the value is 0.
The training module is used for training the DQN neural network model;
the SLAM image construction module is used for generating an SLAM image according to the sensor data; the environment state generation module is used for generating a state according to the SLAM image. In this embodiment, the SLAM image construction module comprises an RPLIDAR data acquisition module, an RGBD data acquisition module and a data fusion module, wherein the RPLIDAR data acquisition module is configured to acquire environment one-dimensional composition data, with random Gaussian noise added during acquisition; the RGBD data acquisition module is used for acquiring environment two-dimensional composition data, and the data fusion module is used for generating an SLAM image of the environment space according to the environment one-dimensional composition data and the environment two-dimensional composition data.
The feasible action set construction module is used for dividing the coordinate plane into N directions to form a feasible action set; in this embodiment, N is preferably 64.
And the control module is used for taking the feasible action set and the SLAM image data as input and outputting a behavior control decision through the DQN neural network model.
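For illustration, a minimal PyTorch sketch of the action selection performed by the control module is given below; how the 8 network outputs correspond to the N feasible directions is not spelled out in the patent, and the epsilon-greedy exploration (used during training only) is an assumption.

```python
import random
import torch

def select_action(q_net, slam_image: torch.Tensor, epsilon: float = 0.0) -> int:
    """Return the index of the behavior chosen for the current SLAM image."""
    if random.random() < epsilon:                    # assumed epsilon-greedy exploration during training
        return random.randrange(8)                   # 8 network outputs
    with torch.no_grad():
        q_values = q_net(slam_image.unsqueeze(0))    # add a batch dimension
    return int(q_values.argmax(dim=1).item())        # index of the highest-valued output
```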
As shown in fig. 2, the training module continuously adjusts the network weights by minimizing the loss function using a stochastic gradient descent method. The loss function used by the training module is defined as follows:
L_i(θ_i) = E[(TargetQ - Q(s, a; θ_i))^2]
TargetQ = r + γQ(s_i, a_i; θ_i^-)
wherein θ_i^- represents the target network parameter of the i-th iteration and θ_i the Q-network parameter.
In this embodiment, a simulator is used and environments of different difficulties are designed; a plurality of models are trained simultaneously, and the experience learned in each training is summarized, accumulated and shared, providing a basis for subsequent training. In each training environment, after many iterations of training, an optimized obstacle-free path from the starting point to the end point can be obtained, so that when the optimal training result is applied to an actual environment, tasks are not left uncompleted because of changes in the environment. Fig. 3 and fig. 4 show two simulation environments of the training, from simple to complex. Fig. 5 to 8 show the change of the return value and of the iteration steps in the two training scenarios respectively. It can be seen that in the simple environment the return value stabilizes quickly, after approximately 10000 iterations, while in the complex environment approximately 30000 iterations are needed because more judgments and selections are required. In either the simple or the complex case, a machine based on the invention can realize the self-tracking function from the starting point to the end point after being trained in the simulator, and can meet various carrying functions and the like.
Example two
This embodiment differs from the first embodiment in that it further comprises a task receiving module and a target planning adjustment module. The task receiving module is configured to obtain other transport tasks near each point of the path according to the forward path of the current target; it is further configured to determine, according to the load condition of the current robot and the task requirements of the other transport tasks, whether a corresponding task can be accepted, and to accept the corresponding transport task if so. The target planning adjustment module is used for readjusting the target position according to the accepted transport task. In the technical scheme of this embodiment, while carrying out a transport task the robot can also acquire other transport tasks near the path points and selectively accept them according to its own load, so that multiple tasks can be completed in a single run and work efficiency is improved.
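For illustration, a minimal Python sketch of the task acceptance decision described above is given below; the Task and RobotState fields and the weight-based load check are assumptions, since the patent only states that acceptance depends on the robot's load condition and the task requirements.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Task:
    pickup: Tuple[float, float]   # assumed: position of the task near the current path
    weight: float                 # assumed: load the task adds to the robot

@dataclass
class RobotState:
    capacity: float
    current_load: float = 0.0

def accept_nearby_tasks(robot: RobotState, nearby_tasks: List[Task]) -> List[Task]:
    """Accept tasks near the path whose weight still fits within the robot's remaining capacity."""
    accepted = []
    for task in nearby_tasks:
        if robot.current_load + task.weight <= robot.capacity:
            robot.current_load += task.weight
            accepted.append(task)   # the target planning adjustment module then re-plans the target position
    return accepted
```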
The above are merely examples of the present invention, and the present invention is not limited to the application field of these embodiments. The common general knowledge of specific structures and characteristics known in the schemes is not described here in detail; a person skilled in the art knows the common technical knowledge in the technical field before the application date or the priority date, can access the prior art in this field, and has the ability to apply conventional experimental means, so that, in light of the teaching provided in the present application, such a person can perfect and implement the scheme with his or her own abilities, and certain typical known structures or known methods should not become obstacles to the implementation of the present invention. It should be noted that several changes and modifications can be made by those skilled in the art without departing from the structure of the present invention; these should also be regarded as falling within the protection scope of the present invention and do not affect the effect of the implementation of the invention or the practicability of the patent. The scope of protection of this application is defined by the claims, and the description, including the specific embodiments, serves to explain the content of the claims.

Claims (4)

1. A multifunctional workshop robot based on DQN, characterized by comprising:
the DQN neural network model building module is used for building the DQN neural network model according to the following algorithm model:
Q(S_t, a_t) ← Q(S_t, a_t) + α[R_t + γQ(S_t', a_t') - Q(S_t, a_t)]
where γ represents a discount factor, α represents a learning rate, R_t represents the accumulated return value, and S_t represents the state at the current time t;
the training module is used for training the DQN neural network model;
the SLAM image construction module is used for generating an SLAM image according to the sensor data; the SLAM image construction module comprises an RPLIDAR data acquisition module, an RGBD data acquisition module and a data fusion module, wherein the RPLIDAR data acquisition module is used for acquiring environment one-dimensional composition data, the RGBD data acquisition module is used for acquiring environment two-dimensional composition data, and the data fusion module is used for generating an SLAM image of an environment space according to the environment one-dimensional composition data and the environment two-dimensional composition data;
an environment state generation module for generating a state from the SLAM image;
the feasible action set construction module is used for dividing the coordinate plane into N directions to form a feasible action set;
the control module is used for taking the feasible action set and the SLAM image data as input and outputting a behavior control decision through the DQN neural network model;
the N is 64; the DQN neural network model comprises two convolution layers, each with a stride (step length) of 3, and two fully connected layers, wherein the first fully connected layer has 256 nodes and the second fully connected layer has 8 nodes;
the R_t is calculated according to the following formula:
[formula provided as an image in the original document]
wherein r_t represents the reward value after taking action a in state s at the current time t;
r_t is defined as follows: the value is 5 when the robot moves towards the destination without collision; -5 when the robot moves in the reverse direction to the destination or collides with a surrounding obstacle; otherwise, the value is 0.
2. The DQN-based workshop multifunctional robot according to claim 1, wherein: the training module continuously adjusts the network weights using a stochastic gradient descent method to minimize a loss function.
3. The DQN-based workshop multifunctional robot according to claim 2, characterized in that: the loss function employed by the training module is defined as follows:
L_i(θ_i) = E[(TargetQ - Q(s, a; θ_i))^2]
TargetQ = r + γQ(s_i, a_i; θ_i^-)
wherein θ_i^- represents the target network parameter of the i-th iteration, and θ_i is a Q-network parameter.
4. The DQN-based workshop multifunctional robot of claim 3, wherein: random Gaussian noise is added when the RPLIDAR data acquisition module acquires the environment one-dimensional composition data.
CN202011615034.3A 2020-12-31 2020-12-31 Multifunctional workshop robot based on DQN Active CN112782980B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011615034.3A CN112782980B (en) 2020-12-31 2020-12-31 Multifunctional workshop robot based on DQN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011615034.3A CN112782980B (en) 2020-12-31 2020-12-31 Multifunctional workshop robot based on DQN

Publications (2)

Publication Number Publication Date
CN112782980A CN112782980A (en) 2021-05-11
CN112782980B true CN112782980B (en) 2022-09-13

Family

ID=75754083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011615034.3A Active CN112782980B (en) 2020-12-31 2020-12-31 Multifunctional workshop robot based on DQN

Country Status (1)

Country Link
CN (1) CN112782980B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109658445A (en) * 2018-12-14 2019-04-19 北京旷视科技有限公司 Network training method, increment build drawing method, localization method, device and equipment
CN109782600A (en) * 2019-01-25 2019-05-21 东华大学 A method of autonomous mobile robot navigation system is established by virtual environment
CN110321666B (en) * 2019-08-09 2022-05-03 重庆理工大学 Multi-robot path planning method based on priori knowledge and DQN algorithm
CN110764416A (en) * 2019-11-11 2020-02-07 河海大学 Humanoid robot gait optimization control method based on deep Q network
CN110977967A (en) * 2019-11-29 2020-04-10 天津博诺智创机器人技术有限公司 Robot path planning method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN112782980A (en) 2021-05-11

Similar Documents

Publication Publication Date Title
US11747155B2 (en) Global path planning method and device for an unmanned vehicle
US20220363259A1 (en) Method for generating lane changing decision-making model, method for lane changing decision-making of unmanned vehicle and electronic device
CN109540150B (en) Multi-robot path planning method applied to hazardous chemical environment
WO2021208771A1 (en) Reinforced learning method and device
Wang et al. Autonomous navigation of UAVs in large-scale complex environments: A deep reinforcement learning approach
CN105137967B (en) The method for planning path for mobile robot that a kind of depth autocoder is combined with Q learning algorithms
CN110991972B (en) Cargo transportation system based on multi-agent reinforcement learning
CN109489667A (en) A kind of improvement ant colony paths planning method based on weight matrix
CN112119404A (en) Sample efficient reinforcement learning
CN112433525A (en) Mobile robot navigation method based on simulation learning and deep reinforcement learning
CN109782600A (en) A method of autonomous mobile robot navigation system is established by virtual environment
CN112651374B (en) Future trajectory prediction method based on social information and automatic driving system
JP7448683B2 (en) Learning options for action selection using meta-gradient in multi-task reinforcement learning
CN112930541A (en) Determining a control strategy by minimizing delusional effects
CN112947591A (en) Path planning method, device, medium and unmanned aerial vehicle based on improved ant colony algorithm
CN114185339A (en) Mobile robot path planning method in dynamic environment
CN114167865A (en) Robot path planning method based on confrontation generation network and ant colony algorithm
CN114037050B (en) Robot degradation environment obstacle avoidance method based on internal plasticity of pulse neural network
CN115629607A (en) Reinforced learning path planning method integrating historical information
CN112782980B (en) Multifunctional workshop robot based on DQN
CN114493013A (en) Smart agent path planning method based on reinforcement learning, electronic device and medium
CN111221318B (en) Multi-robot state estimation method based on model predictive control algorithm
CN113110101A (en) Production line mobile robot gathering type recovery and warehousing simulation method and system
CN116360454A (en) Robot path collision avoidance planning method based on deep reinforcement learning in pedestrian environment
Deb Single and multi-objective dynamic optimization: two tales from an evolutionary perspective

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant