CN112782980B - Multifunctional workshop robot based on DQN - Google Patents
- Publication number
- CN112782980B CN112782980B CN202011615034.3A CN202011615034A CN112782980B CN 112782980 B CN112782980 B CN 112782980B CN 202011615034 A CN202011615034 A CN 202011615034A CN 112782980 B CN112782980 B CN 112782980B
- Authority
- CN
- China
- Prior art keywords
- dqn
- module
- robot
- neural network
- network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Manipulator (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The invention relates to the technical field of robot control, and in particular to a multifunctional workshop robot based on DQN, comprising: a DQN neural network model building module for building a DQN neural network model according to the algorithm model; a training module for training the DQN neural network model; a SLAM image construction module for generating a SLAM image from sensor data; an environment state generation module for generating a state from the SLAM image; a feasible action set construction module for dividing the coordinate plane into N directions to form a feasible action set; and a control module for taking the feasible action set and the SLAM image data as input and outputting a behavior control decision through the DQN neural network model. Through self-learning, the DQN-based workshop multifunctional robot can meet the requirements of many different tasks, solving the high-cost, low-benefit problems that arise when a robot is limited to a single application scene and task.
Description
Technical Field
The invention relates to the technical field of robot control, in particular to a multifunctional workshop robot based on DQN.
Background
With the development of the internet of things and internet technology, the intelligent robot is widely applied to scenes such as exhibition hall navigation, workshop management, automatic production, warehouse management, smart home and the like.
Generally, a workshop robot or intelligent machine has a single application scene and function: it can only complete a certain part of the work in a specific scene and cannot operate when assigned other tasks or placed in different scenes, which limits how much it can reduce the workload of workers. Taking robot movement as an example, movement control and path planning are the basis of a robot's operation. Existing workshop robots generally rely on fixed routes or markers placed in the scene for recognition to control the movement and path of the intelligent vehicle, so the vehicle cannot complete work under different tasks in different scenes, leading to high application cost and low benefit.
Disclosure of Invention
The invention aims to provide a DQN-based workshop multifunctional robot that can meet the requirements of many different tasks through self-learning, solving the high-cost, low-benefit problems caused by the single application scene and task of current robots.
The application provides the following technical scheme:
a DQN-based workshop multi-function robot, comprising:
the DQN neural network model building module is used for building the DQN neural network model according to the following algorithm model:
Q(s_t, a_t) ← Q(s_t, a_t) + α[R_t + γQ(s_t', a_t') − Q(s_t, a_t)]

where γ represents the discount factor, α represents the learning rate, R_t represents the accumulated return value, and s_t represents the state at the current time t;
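For reference, the update rule above can be sketched in tabular form. This is a minimal illustration only: the state and action indices, table sizes, and the α and γ values are assumptions for the toy example, and in the actual DQN the table lookup is replaced by the neural network's output.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # One application of Q(s,a) <- Q(s,a) + alpha*[R + gamma*Q(s',a') - Q(s,a)].
    # Q is a (num_states, num_actions) table; the DQN replaces it with a network.
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy usage: 4 states, 2 actions, one reward of 5 for moving toward the goal.
Q = np.zeros((4, 2))
Q = q_update(Q, s=0, a=1, r=5.0, s_next=1, a_next=0)
print(Q[0, 1])  # 0.1 * (5 + 0.9*0 - 0) = 0.5
```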
the training module is used for training the DQN neural network model;
the SLAM image construction module is used for generating an SLAM image according to the sensor data;
an environment state generation module for generating a state from the SLAM image;
the feasible action set construction module is used for dividing the coordinate plane into N directions to form a feasible action set;
and the control module is used for taking the feasible action set and the SLAM image data as input and outputting a behavior control decision through the DQN neural network model.
Further, N is 64.
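A hypothetical sketch of the feasible action set construction: the patent only states that the coordinate plane is divided into N = 64 directions, so representing each direction as a unit vector at an equally spaced heading is an assumed encoding for illustration.

```python
import math

def build_action_set(n=64):
    # Split the plane into n equally spaced headings; each action is the
    # unit direction vector of one heading (an assumed encoding).
    actions = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        actions.append((math.cos(theta), math.sin(theta)))
    return actions

actions = build_action_set(64)
print(len(actions))   # 64 feasible actions
print(actions[16])    # heading at 90 degrees, approximately (0.0, 1.0)
```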
Further, the DQN neural network model comprises two convolutional layers, each with a stride of 3, and further comprises two fully connected layers, wherein the first fully connected layer is provided with 256 nodes and the second fully connected layer is provided with 8 nodes.
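The spatial size reaching the first fully connected layer depends on the kernel sizes and input resolution, which the patent does not specify; only the stride (3) and the fully connected widths (256 and 8) are given. The sketch below works the convolution output-size arithmetic through under assumed values (an 84×84 input and kernel sizes of 8 and 4).

```python
def conv_out(size, kernel, stride, padding=0):
    # Standard convolution output size: floor((size - kernel + 2p) / stride) + 1.
    return (size - kernel + 2 * padding) // stride + 1

s = 84                        # assumed input resolution (not given in the patent)
for kernel in (8, 4):         # assumed kernel sizes for the two conv layers
    s = conv_out(s, kernel, stride=3)   # stride 3, as stated in the patent
print(s)  # spatial size per channel entering the 256-node fully connected layer
```

Under these assumptions the feature map is 8×8 per channel before flattening into the 256-node layer; different kernels or input sizes would change this figure.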
Further, R_t is calculated according to the following formula:

R_t = Σ_{k=0}^{∞} γ^k r_{t+k}

where r_t represents the reward value after taking action a in state s at the current time t.
Further, r_t is defined as follows: the value is 5 when the robot moves toward the destination without collision; −5 when the robot moves away from the destination or collides with a surrounding obstacle; and 0 otherwise.
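The piecewise reward above can be expressed directly; the boolean flags below are hypothetical helper signals, since the patent defines only the three cases.

```python
def step_reward(moved_toward_goal, moved_away_from_goal, collided):
    # r_t as defined above: +5 for moving toward the destination without
    # collision, -5 for moving away or colliding, 0 otherwise.
    if collided or moved_away_from_goal:
        return -5
    if moved_toward_goal:
        return 5
    return 0

print(step_reward(True, False, False))   # 5
print(step_reward(True, False, True))    # -5: a collision overrides progress
print(step_reward(False, False, False))  # 0
```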
Further, the training module continuously adjusts the network weights using a stochastic gradient descent method to minimize a loss function.
Further, the loss function employed by the training module is defined as follows:
L_i(θ_i) = E[(TargetQ − Q(s, a; θ_i))²]

TargetQ = r + γQ(s_i, a_i; θ_i⁻)

where θ_i⁻ represents the target network parameter of the ith iteration, and θ_i the Q-network parameter.
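A minimal sketch of evaluating this loss with a separate target network, under the assumption that both networks are callables mapping a state to a vector of action values; the toy tables and batch contents are illustrative only.

```python
import numpy as np

def dqn_loss(q_net, target_net, batch, gamma=0.9):
    # Mean squared error between TargetQ = r + gamma * Q(s', a'; theta^-)
    # (computed by the frozen target network) and Q(s, a; theta).
    errors = []
    for s, a, r, s_next, a_next in batch:
        target_q = r + gamma * target_net(s_next)[a_next]
        errors.append((target_q - q_net(s)[a]) ** 2)
    return float(np.mean(errors))

# Toy "networks": fixed action-value tables keyed by state index.
q_table = np.array([[1.0, 2.0], [0.5, 0.0]])
q_net = lambda s: q_table[s]
target_net = lambda s: q_table[s]          # freshly synchronised copy
batch = [(0, 1, 5.0, 1, 0)]                # (s, a, r, s', a')
print(dqn_loss(q_net, target_net, batch))  # (5 + 0.9*0.5 - 2)^2 = 11.9025
```

In training, the gradient of this loss with respect to θ_i is followed by stochastic gradient descent while θ_i⁻ is held fixed between periodic synchronisations.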
Further, the SLAM image construction module comprises an RPLIDAR data acquisition module, an RGBD data acquisition module and a data fusion module, wherein the RPLIDAR data acquisition module is used for acquiring environment one-dimensional composition data, the RGBD data acquisition module is used for acquiring environment two-dimensional composition data, and the data fusion module is used for generating a SLAM image of the environment space according to the environment one-dimensional composition data and the environment two-dimensional composition data.
Further, random Gaussian noise is added when the RPLIDAR data acquisition module acquires the environment one-dimensional composition data. This prevents data distortion caused by overfitting and makes the acquired data better match the actual situation.
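A sketch of the noise injection, assuming a zero-mean Gaussian perturbation of each one-dimensional range reading; the noise scale σ is an assumed parameter, as the patent does not give one.

```python
import random

def add_gaussian_noise(scan, sigma=0.01, seed=None):
    # Perturb each lidar range reading with zero-mean Gaussian noise.
    # sigma (metres) is an assumption; the patent does not specify a scale.
    rng = random.Random(seed)
    return [r + rng.gauss(0.0, sigma) for r in scan]

scan = [1.0, 2.5, 3.2]                          # hypothetical range readings
noisy = add_gaussian_noise(scan, sigma=0.01, seed=42)
print(noisy)  # each reading shifted by a small random amount
```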
The technical scheme of the invention has the beneficial effects that:
In the technical scheme of the invention, the use of DQN gives the robot a self-learning function, making it more intelligent and able to adapt to different application scenes and complete different tasks. The experience pool introduced in DQN greatly reduces the correlation between samples during network training; by retaining past experience it prevents the neural network from learning only from its latest actions and encourages it to learn from a variety of random past experiences, thereby improving decision accuracy. Splitting the DQN into a network that is trained and has its weights updated and a target network that computes the target Q value alleviates the training instability problem. The technical scheme can realize self-tracking from a starting point to an end point in many different scenes and can meet various carrying function requirements, solving the high-cost, low-benefit problems caused by the single applicable scene and task of current robots.
Drawings
FIG. 1 is a structural diagram of the control model in an embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 2 is a training flowchart in the embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 3 is a schematic diagram of a simple simulation scene in the embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 4 is a schematic diagram of a complex simulation scene in the embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 5 is a graph of the change of the return value in the simple simulation environment in the embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 6 is a graph of the change of the return value in the complex simulation environment in the embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 7 is a graph of the change of the iteration step count in the simple simulation environment in the embodiment of the DQN-based workshop multifunctional robot of the present application;
FIG. 8 is a graph of the change of the iteration step count in the complex simulation environment in the embodiment of the DQN-based workshop multifunctional robot of the present application.
Detailed Description
The technical scheme of the application is explained in further detail through the following specific embodiments:
example one
As shown in fig. 1, the DQN-based workshop multifunctional robot disclosed in this embodiment comprises:
the DQN neural network model building module is used for building the DQN neural network model according to the following algorithm model:
Q(s_t, a_t) ← Q(s_t, a_t) + α[R_t + γQ(s_t', a_t') − Q(s_t, a_t)]

where γ represents the discount factor, α represents the learning rate, R_t represents the accumulated return value, and s_t represents the state at the current time t. In this embodiment, the DQN neural network model comprises two convolutional layers, each with a stride of 3, and two fully connected layers, the first with 256 nodes and the second with 8 nodes.
In this embodiment, R_t is calculated according to the following formula:

R_t = Σ_{k=0}^{∞} γ^k r_{t+k}

where r_t represents the reward value after taking action a in state s at the current time t.
r_t is defined as follows: the value is 5 when the robot moves toward the destination without collision; −5 when the robot moves away from the destination or collides with a surrounding obstacle; and 0 otherwise.
The training module is used for training the DQN neural network model;
The SLAM image construction module is used for generating a SLAM image according to the sensor data; the environment state generation module is used for generating a state according to the SLAM image. In this embodiment, the SLAM image construction module comprises an RPLIDAR data acquisition module, an RGBD data acquisition module and a data fusion module. The RPLIDAR data acquisition module is used for acquiring environment one-dimensional composition data, and random Gaussian noise is added when it acquires these data. The RGBD data acquisition module is used for acquiring environment two-dimensional composition data, and the data fusion module is used for generating a SLAM image of the environment space according to the environment one-dimensional composition data and the environment two-dimensional composition data.
The feasible action set construction module is used for dividing the coordinate plane into N directions to form a feasible action set; in this embodiment, N is preferably 64.
And the control module is used for taking the feasible action set and the SLAM image data as input and outputting a behavior control decision through the DQN neural network model.
As shown in fig. 2, the training module continuously adjusts the network weights using a stochastic gradient descent method to minimize the loss function, which is defined as follows:

L_i(θ_i) = E[(TargetQ − Q(s, a; θ_i))²]

TargetQ = r + γQ(s_i, a_i; θ_i⁻)

where θ_i⁻ represents the target network parameter of the ith iteration, and θ_i the Q-network parameter.
In this embodiment, a simulator is used to design environments of different difficulties and train several models simultaneously; the experience learned in each round of training is summarized, accumulated and shared, providing a basis for subsequent training. In each training environment, after many iterations of training, an optimized obstacle-free path from the starting point to the end point is obtained, so that when the best training result is applied to the actual environment, the task does not fail because of changes in the environment. Fig. 3 and 4 show two simulated training environments, from simple to complex. Figs. 5-8 show the change of the return value and of the iteration step count in the two training scenes respectively. It can be seen that in the simple environment the return value stabilizes quickly after roughly 10000 iterations, while in the complex case roughly 30000 iterations are needed because more judgments and selections are required. In either the simple or the complex case, a machine based on the invention, after training in the simulator, can realize self-tracking from the starting point to the end point and can meet various carrying function requirements.
Example two
This embodiment differs from the first in that it further comprises a task receiving module and a target planning adjustment module. The task receiving module is configured to obtain other transport tasks near each point along the forward path toward the current target, to determine, according to the current robot's load condition and the requirements of those tasks, whether each task can be accepted, and to accept the corresponding transport task if so. The target planning adjustment module is configured to readjust the target position according to the accepted transport tasks. In the technical scheme of this embodiment, while carrying out a transport task the robot can also acquire other transport tasks near the path points and selectively accept them according to its own load, so that several tasks can be completed in one trip, improving work efficiency.
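A greedy sketch of the acceptance decision in the task receiving module, under assumed field names and a simple weight-versus-remaining-capacity rule; the patent states only that acceptance depends on the robot's load condition and the task requirements.

```python
def accept_tasks(capacity_left, nearby_tasks):
    # Accept nearby transport tasks, lightest first, while the remaining
    # load capacity covers the task weight ("weight"/"id" are assumed fields).
    accepted = []
    for task in sorted(nearby_tasks, key=lambda t: t["weight"]):
        if task["weight"] <= capacity_left:
            accepted.append(task["id"])
            capacity_left -= task["weight"]
    return accepted

tasks = [{"id": "A", "weight": 3}, {"id": "B", "weight": 8}, {"id": "C", "weight": 2}]
print(accept_tasks(10, tasks))  # ['C', 'A'] fit the remaining capacity; 'B' does not
```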
The above are merely embodiments of the present invention. Common general knowledge, such as well-known specific structures and characteristics of the scheme, is not described here in detail; a person skilled in the art knows the common technical knowledge and prior art in the field before the application date or priority date, can apply the conventional experimental means of that time, and can perfect and implement the scheme in light of the teaching of this application, with some typical known structures or methods posing no barrier to implementation. It should be noted that several changes and modifications may be made by those skilled in the art without departing from the structure of the invention; these should also be regarded as falling within the protection scope of the invention and do not affect the effect or practicability of the patent. The scope of protection of this application is defined by the claims, and the description of the embodiments in the specification may be used to interpret the content of the claims.
Claims (4)
1. A multifunctional workshop robot based on DQN, characterized by comprising:
the DQN neural network model building module is used for building the DQN neural network model according to the following algorithm model:
Q(S_t, a_t) ← Q(S_t, a_t) + α[R_t + γQ(S_t', a_t') − Q(S_t, a_t)]

where γ represents a discount factor, α represents a learning rate, R_t represents the accumulated return value, and S_t represents the state at the current time t;
the training module is used for training the DQN neural network model;
the SLAM image construction module is used for generating an SLAM image according to the sensor data; the SLAM image construction module comprises an RPLIDAR data acquisition module, an RGBD data acquisition module and a data fusion module, wherein the RPLIDAR data acquisition module is used for acquiring environment one-dimensional composition data, the RGBD data acquisition module is used for acquiring environment two-dimensional composition data, and the data fusion module is used for generating an SLAM image of an environment space according to the environment one-dimensional composition data and the environment two-dimensional composition data;
an environment state generation module for generating a state from the SLAM image;
the feasible action set construction module is used for dividing the coordinate plane into N directions to form a feasible action set;
the control module is used for taking the feasible action set and the SLAM image data as input and outputting a behavior control decision through the DQN neural network model;
the N is 64; the DQN neural network model comprises two convolution layers, the step length of each convolution layer is 3, and the DQN neural network model further comprises two full-connection layers, wherein the first full-connection layer is provided with 256 nodes, and the second full-connection layer is provided with 8 nodes;
the R is t Calculated according to the following formula:
wherein r is t Representing the reward value after taking action a at the current time t, state s;
r t the definition is as follows: when the robot moves towards the destination and has no collision, the value is 5; -5 when the robot is in reverse direction with the destination or when the robot collides with a surrounding obstacle; otherwise, the value is 0.
2. The DQN-based workshop multifunctional robot according to claim 1, wherein: the training module continuously adjusts the network weights using a stochastic gradient descent method to minimize a loss function.
3. The DQN-based workshop multifunctional robot according to claim 2, characterized in that: the loss function employed by the training module is defined as follows:
L_i(θ_i) = E[(TargetQ − Q(s, a; θ_i))²]

TargetQ = r + γQ(s_i, a_i; θ_i⁻)

wherein θ_i⁻ represents the target network parameter of the ith iteration, and θ_i is a Q-network parameter.
4. The DQN-based workshop multifunctional robot of claim 3, wherein: random Gaussian noise is added when the RPLIDAR data acquisition module acquires the environment one-dimensional composition data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011615034.3A CN112782980B (en) | 2020-12-31 | 2020-12-31 | Multifunctional workshop robot based on DQN |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112782980A CN112782980A (en) | 2021-05-11 |
CN112782980B true CN112782980B (en) | 2022-09-13 |
Family
ID=75754083
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011615034.3A Active CN112782980B (en) | 2020-12-31 | 2020-12-31 | Multifunctional workshop robot based on DQN |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112782980B (en) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109658445A (en) * | 2018-12-14 | 2019-04-19 | 北京旷视科技有限公司 | Network training method, increment build drawing method, localization method, device and equipment |
CN109782600A (en) * | 2019-01-25 | 2019-05-21 | 东华大学 | A method of autonomous mobile robot navigation system is established by virtual environment |
CN110321666B (en) * | 2019-08-09 | 2022-05-03 | 重庆理工大学 | Multi-robot path planning method based on priori knowledge and DQN algorithm |
CN110764416A (en) * | 2019-11-11 | 2020-02-07 | 河海大学 | Humanoid robot gait optimization control method based on deep Q network |
CN110977967A (en) * | 2019-11-29 | 2020-04-10 | 天津博诺智创机器人技术有限公司 | Robot path planning method based on deep reinforcement learning |
2020-12-31 — CN202011615034.3A filed; granted as CN112782980B (Active)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11747155B2 (en) | Global path planning method and device for an unmanned vehicle | |
US20220363259A1 (en) | Method for generating lane changing decision-making model, method for lane changing decision-making of unmanned vehicle and electronic device | |
CN109540150B (en) | Multi-robot path planning method applied to hazardous chemical environment | |
WO2021208771A1 (en) | Reinforced learning method and device | |
Wang et al. | Autonomous navigation of UAVs in large-scale complex environments: A deep reinforcement learning approach | |
CN105137967B (en) | The method for planning path for mobile robot that a kind of depth autocoder is combined with Q learning algorithms | |
CN110991972B (en) | Cargo transportation system based on multi-agent reinforcement learning | |
CN109489667A (en) | A kind of improvement ant colony paths planning method based on weight matrix | |
CN112119404A (en) | Sample efficient reinforcement learning | |
CN112433525A (en) | Mobile robot navigation method based on simulation learning and deep reinforcement learning | |
CN109782600A (en) | A method of autonomous mobile robot navigation system is established by virtual environment | |
CN112651374B (en) | Future trajectory prediction method based on social information and automatic driving system | |
JP7448683B2 (en) | Learning options for action selection using meta-gradient in multi-task reinforcement learning | |
CN112930541A (en) | Determining a control strategy by minimizing delusional effects | |
CN112947591A (en) | Path planning method, device, medium and unmanned aerial vehicle based on improved ant colony algorithm | |
CN114185339A (en) | Mobile robot path planning method in dynamic environment | |
CN114167865A (en) | Robot path planning method based on confrontation generation network and ant colony algorithm | |
CN114037050B (en) | Robot degradation environment obstacle avoidance method based on internal plasticity of pulse neural network | |
CN115629607A (en) | Reinforced learning path planning method integrating historical information | |
CN112782980B (en) | Multifunctional workshop robot based on DQN | |
CN114493013A (en) | Smart agent path planning method based on reinforcement learning, electronic device and medium | |
CN111221318B (en) | Multi-robot state estimation method based on model predictive control algorithm | |
CN113110101A (en) | Production line mobile robot gathering type recovery and warehousing simulation method and system | |
CN116360454A (en) | Robot path collision avoidance planning method based on deep reinforcement learning in pedestrian environment | |
Deb | Single and multi-objective dynamic optimization: two tales from an evolutionary perspective |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |