CN110084307B - Mobile robot vision following method based on deep reinforcement learning - Google Patents
Mobile robot vision following method based on deep reinforcement learning
- Publication number
- CN110084307B · CN201910361528.4A
- Authority
- CN
- China
- Prior art keywords
- model
- robot
- cnn
- image
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/12—Target-seeking control
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention provides a mobile robot visual following method based on deep reinforcement learning. The method adopts a framework of "simulated-image supervised pre-training + model migration + RL". First, a small amount of data is collected in a real environment, and the data set is expanded automatically with a computer program and image processing techniques, yielding in a short time a large simulated data set adapted to real scenes for the supervised training of the following robot's direction control model. Second, a CNN model for controlling the robot's direction is built and trained in a supervised manner on the automatically constructed simulated data set, serving as a pre-trained model. Finally, the knowledge of the pre-trained model is migrated to a control model based on deep reinforcement learning (DRL) so that the robot can perform the following task in a real environment; combined with the reinforcement learning mechanism, the robot keeps following while interacting with the environment and improves its direction control performance. The method is highly robust and greatly reduces cost.
Description
Technical Field
The invention belongs to the technical field of intelligent robots, and relates to a mobile robot vision following method based on deep reinforcement learning.
Background
With the progress of technology and the development of society, more and more intelligent robots are appearing in people's lives. The following robot is one of the new systems that has attracted much attention in recent years: it can serve as an assistant that moves along with its owner in complex environments such as hospitals, shopping malls, or schools, bringing great convenience to people's lives. A following robot has the capabilities of autonomous perception, recognition, decision-making, and movement; it can recognize a specific target and, combined with a corresponding control system, follow that target in a complex scene.
At present, following robot systems are generally based either on a visual sensor alone or on a combination of multiple sensors. The former usually acquires visual images with a stereo camera, which requires complicated calibration steps and adapts poorly to strong outdoor illumination; the latter increases system cost through the additional sensors and introduces a complex data fusion process. To ensure robust tracking in dynamic, unknown environments, complex features typically have to be designed by hand, which greatly increases human cost, time cost, and computational resources. In addition, a conventional following robot system usually splits the whole system into a target tracking module and a robot motion control module; in such a pipeline design, errors produced in an earlier module propagate to the later modules, so error accumulation is gradually amplified and ultimately has a large impact on system performance.
In summary, the traditional following robot system suffers from high hardware and design costs, and with simple hardware it cannot fully adapt to the variability and complexity of indoor and outdoor environments. As a result the robot easily loses the target person, the robustness of the following system is reduced, and the application and popularization of following robots in everyday life are seriously hindered.
Disclosure of Invention
To address the defects of current traditional following robot designs, the invention provides a mobile robot visual following method based on deep reinforcement learning.
The invention uses a monocular color camera as the robot's only input sensor and introduces a Convolutional Neural Network (CNN) and Deep Reinforcement Learning (DRL) into the following robot system. This removes the tedious manual feature design of traditional following robot systems, lets the robot learn the control strategy directly from its field-of-view images, greatly reduces the chance of losing the tracked target, and adapts better to illumination changes, background object interference, and distractor pedestrians in complex environments. Meanwhile, the introduction of deep reinforcement learning enables the following robot to keep learning from experience while interacting with the environment, continuously raising its level of intelligence.
The method adopts a framework of "simulated-image supervised pre-training + model migration + RL". First, a small amount of data is collected in a real environment, and the data set is expanded automatically with a computer program and image processing techniques, yielding in a short time a large simulated data set adapted to real scenes for the supervised training of the following robot's direction control model. Second, a CNN model for controlling the robot's direction is built and trained in a supervised manner on the automatically constructed simulated data set, serving as a pre-trained model. Finally, the knowledge of the pre-trained model is migrated to a DRL-based control model so that the robot can perform the following task in a real environment; combined with a Reinforcement Learning (RL) mechanism, the robot keeps following while interacting with the environment and improves its direction control performance.
The specific technical scheme is as follows:
a mobile robot vision following method based on deep reinforcement learning comprises the following steps:
the method comprises the following steps: automated construction of a data set;
in order to reduce the cost of data collection and quickly obtain large-scale training data, the invention designs an automatic data set construction method by utilizing a computer program and an image processing technology. A small amount of data is collected in a simple experiment scene, then the obtained small amount of experiment data is expanded in a large scale by using an image mask technology, and a large amount of data which can adapt to complex indoor and outdoor scenes can be obtained in a short time, so that the cost of manually collecting and marking the data is greatly reduced.
(1) Preparing a simple scene with a followed object easily distinguished from the background; acquiring view images of different positions of a target person in the robot view from the view of the following robot in a simple scene;
(2) Prepare images of the application scenes of the following robot as complex scene images, such as indoor scenes, outdoor scenes, and street views. Because the followed target person in the simple scene is easily separated from the background, the target person can be extracted from the simple-scene background with an image mask technique and then superimposed on a complex scene to obtain an image of the target person in that complex scene; the corresponding action space label from the simple scene is assigned directly to the synthesized complex scene image;
the image mask technology is mainly to multiply a two-dimensional matrix (namely a mask) designed for an interested area of an image and an image to be processed, and an obtained result is the interested area to be extracted.
Step two: constructing and training a direction control model based on the CNN;
the CNN-based direction control model is responsible for outputting direction prediction of actions to be taken for the robot visual field image. The model is supervised and trained by utilizing a large-scale simulation data set which is automatically constructed, so that the model has a high direction control level. Knowledge learned in this model will be migrated by means of model migration into the DRL-based directional control model as a priori knowledge of the latter with respect to the directional control strategy.
Before an image collected by the robot's monocular color camera is input to the CNN, its three RGB channels are converted into HSV channels; the HSV image is then fed to the CNN as input. The CNN model is then trained in a supervised manner with the data set automatically constructed in step one, so that the CNN can output the corresponding action state from the robot's view input image;
step three: model migration;
the invention takes the learned strategy in the CNN direction control model as a model migration method based on the prior knowledge of the DRL direction control model. Although the output of the CNN model and the DRL model have different meanings: the output of the CNN model is the probability of each directional motion, while the output of the DRL model is typically a value estimate of each directional motion, but they have the same output dimensions. Generally, the CNN model has a higher value estimate corresponding to the direction of motion with a higher output probability.
Transferring the CNN parameter weight trained in the step two to a DRL model as an initial parameter so that the DRL model obtains the same control level as the CNN model;
step four: building and training a direction control model based on the DRL;
the DRL model is responsible for further improving the performance of the model by utilizing an RL mechanism on the basis of obtaining the prior knowledge of the CNN model. The introduction of the RL mechanism enables the robot to collect experience and improve own knowledge in the process of interacting with the environment, so that the robot following direction control level higher than that of a CNN model is obtained.
Apply the DRL model with the initial parameters migrated in step three on the robot, and let the robot continuously update the model, learn, and adapt to the current environment through continuous interaction with the environment.
Further, the second step: the size of an image collected by a monocular color camera of the robot is 640 x 480, RGB three channels of the image are converted into HSV channels before the image is input to a neural network, the size of the image of 640 x 480 is adjusted to be 60 x 80, the images collected at 4 adjacent moments are combined together to be used as the input of the network, a final input layer comprises 12 channels of 4 x 3, and the size of each channel is 60 x 80.
Further, in step two: the CNN structure consists of 8 layers: three convolutional layers, two pooling layers, two fully connected layers, and an output layer. The convolutional layers extract features from the input image, and the pooling layers reduce the dimensionality of the extracted features to cut the computation required for forward propagation. From front to back, the convolution kernel sizes of the three convolutional layers are 8 × 8, 4 × 4, and 2 × 2; both pooling layers use max pooling with a size of 2 × 2. After the third convolution, the data is fed into two fully connected layers of 384 nodes each; the output layer follows the fully connected layers and produces a multi-dimensional output, each dimension representing the action in the corresponding direction. Three directional actions are included: forward, left, and right. A ReLU activation function is added after each of the three convolutional layers and the two fully connected layers to apply a nonlinear transformation. The CNN parameters are updated with a cross-entropy loss function, expressed as:

L(y', f(x)) = −Σ_i y'_i log f_i(x)
where y' is the label of the sample, a three-dimensional one-hot vector in which the dimension equal to 1 marks the correct action, and f(x) is the probability predicted by the CNN model for each action dimension.
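A minimal numeric check of this loss for the three-action case; the prediction probabilities are made up for the example:

```python
import math

def cross_entropy(y_true, y_pred):
    """H(y', f(x)) = -sum_i y'_i * log f_i(x) for a one-hot label."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

# One-hot label: the correct action is "forward" (first dimension).
y_true = [1.0, 0.0, 0.0]
# CNN output probabilities for (forward, left, right).
y_pred = [0.7, 0.2, 0.1]

loss = cross_entropy(y_true, y_pred)
print(round(loss, 4))  # 0.3567, i.e. -log(0.7)
```

Because the label is one-hot, only the term for the correct action survives, so the loss reduces to the negative log-probability the model assigns to that action.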
Further, the DRL model in the third step is specifically a DQN model, and the migration process is: and removing the Softmax layer of the trained CNN network, and directly endowing the weight parameters of the previous layers to the DQN model.
Further, in step four: DQN uses a neural network to approximate the value function, i.e., the input of the neural network is the current state s and the output is the predicted value Q_θ(s, a). At each time step the environment gives a state s; the agent obtains from the value function network a value Q_θ(s, a) for s and every action, then selects an action with the ε-greedy algorithm and makes a decision. After receiving the action a, the environment returns a reward r and the next state s'; this constitutes one step. The parameters of the value function network are updated according to r. DQN defines its objective function with the mean squared error:

L(θ) = E[(r + γ max_{a'} Q_θ(s', a') − Q_θ(s, a))²]
where s' and a' are the state and action at the next moment, γ is a hyper-parameter, and θ is the model parameter;
During training, the parameters are updated as:

θ ← θ + α [r + γ max_{a'} Q_θ(s', a') − Q_θ(s, a)] ∇_θ Q_θ(s, a)

where α is the learning rate.
when the final depth reinforcement learning algorithm is applied to a physical robot, a real-time view image acquired by a monocular color camera carried by the robot is used as a state input value of a DRL algorithm, an algorithm output action space is a set of direction control signals, and the robot can move along with a target person in real time by executing a direction control instruction.
Addressing the problems of following robot systems in practical applications, the method provides an intelligent following robot system based on a deep reinforcement learning algorithm. The end-to-end design fuses the tracking module and the direction control module of a traditional following robot system, preventing error transmission and accumulation between modules and letting the robot learn the mapping from target to behavior strategy directly. Compared with a traditional following robot system, this system is more robust, greatly reduces hardware and labor costs, and makes the popularization of following robots in everyday life more feasible.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic diagram of an automated data set construction process of the present invention.
FIG. 3 is a diagram of the effect of the invention on the automated construction of a composite image from a data set.
Wherein, each subgraph is described as follows:
(a) (b) (c) (d) are examples of pictures of different positions of a target person acquired by the robot in a simple scene;
(e) (f) (g) (h) are examples of complex scene pictures collected on the Internet;
(i) (j) (k) (l) are examples of composite data set images processed by the image mask technique;
(a)(e)(i), (b)(f)(j), (c)(g)(k), and (d)(h)(l) show the complete image mask synthesis process and effect for the target person at different positions in the simple image, respectively.
Fig. 4 is a diagram showing correspondence between an input image and an operation space according to the present invention.
FIG. 5 is a following robot system architecture diagram of the present invention.
Detailed Description
The software environment of this embodiment is the Ubuntu 14.04 system, the mobile robot is a TurtleBot2 robot, and the robot's input sensor is a monocular color camera with a resolution of 640 × 480.
The method comprises the following steps: automatic data set construction process
For the direction control model of the supervised following robot, a camera view image of the following robot is input, and an action which the robot should take at the current moment is output. The construction process of the entire data set includes two parts: and obtaining an input visual field image and labeling an output action.
A simple scene is prepared in which the followed object needs to be more easily distinguished from the background. In a simple scene, a plurality of visual field images of the target person at different positions in the visual field of the robot are collected from the visual field of the following robot. A certain number of complex scene images are downloaded from the Internet, and the scene images mainly comprise common application scenes of the following robot, such as indoor and outdoor scenes, street scenes and the like. Because the followed target person in the simple scene can be easily separated from the background, the target person can be extracted from the background by utilizing an image mask technology and then superposed with the complex scene obtained on the Internet, so that the image of the target person in the complex scene can be obtained, and a corresponding action space label in the simple scene can be directly given to the synthesized complex scene image. A schematic diagram of an automated data set construction process is shown in fig. 2. The effect diagrams of the simple scene image, the internet complex scene image and the data set automatic construction process are shown in fig. 3.
After collecting images containing the target person at different positions in the simple scene, and because the color of the tracked target differs greatly from the background color of the simple scene, an image mask is designed directly by setting a color threshold. Applying the mask to the robot's view image yields a binary image separating the tracked target from the background, from which the outline of the target person is extracted: the image values of the background are all 0 and those of the tracked target person are 1. The target person's image region can then be superimposed on a complex scene picture. The action label is obtained by averaging the horizontal positions of the pixels with value 1 in the masked binary image.
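The action-label rule just described (averaging the horizontal positions of the target's value-1 pixels in the binary mask) can be sketched like this; the fractional thresholds that split the view into left/forward/right zones are assumptions for illustration, not values from the patent:

```python
import numpy as np

def action_label(mask, left_frac=1/3, right_frac=2/3):
    """Derive the action label from the mean horizontal position of
    the mask's foreground (value-1) pixels."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # target lost: no foreground pixels
    mean_x = xs.mean() / mask.shape[1]  # normalised horizontal position
    if mean_x < left_frac:
        return "left"
    if mean_x > right_frac:
        return "right"
    return "forward"

mask = np.zeros((60, 80), dtype=np.uint8)
mask[20:40, 5:15] = 1          # target blob near the left edge
print(action_label(mask))       # left
```

A blob centred in the view would yield "forward", and an all-zero mask returns None, signalling a lost target.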
Step two: CNN-based direction control model building and training process
The size of the image collected from the monocular color camera is 640 × 480, the RGB three channels are converted into HSV channels before being input to the neural network, and the 640 × 480 image is adjusted to 60 × 80 and then is used as an input image to be input to the neural network. The design of the invention combines the images collected at 4 adjacent moments together as the input of the network, and because a single image is a three-channel HSV image, the final input layer comprises 12 channels of 4 multiplied by 3, and the size of each channel is 60 multiplied by 80. And then, carrying out supervised training on the CNN model by using the automatically constructed data set, so that the CNN network can achieve the effect of outputting a corresponding action state through a robot visual field input image.
Step three: model migration process
The DQN model finally used in the invention has a structure similar to the CNN direction control network described above, but the last Softmax layer is removed, and value predictions for each state-action pair are output directly instead of a probability distribution over actions. The model migration strategy of the invention is therefore: remove the Softmax layer of the trained CNN network and assign the weight parameters of the preceding layers directly to the DQN model, achieving the transfer of prior knowledge.
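A minimal sketch of this parameter hand-over, representing both networks as name→array dictionaries; the layer names and shapes are invented stand-ins for the real model checkpoints:

```python
import numpy as np

# Trained CNN weights (random stand-ins); "softmax" is the output head.
cnn_weights = {
    "conv1": np.random.randn(8, 8, 12, 32),
    "conv2": np.random.randn(4, 4, 32, 64),
    "conv3": np.random.randn(2, 2, 64, 64),
    "fc1": np.random.randn(384, 384),
    "fc2": np.random.randn(384, 384),
    "softmax": np.random.randn(384, 3),
}

def migrate(cnn, drop=("softmax",)):
    """Copy every layer except the softmax head into the DQN model."""
    return {name: w.copy() for name, w in cnn.items() if name not in drop}

dqn_weights = migrate(cnn_weights)
print(sorted(dqn_weights))  # the softmax head is gone
```

The DQN then adds its own value-output head on top of the copied layers, so it starts from the CNN's control level rather than from random initialisation.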
Step four: DRL-based directional control model training process
After model migration is complete, the DRL model can be used on the robot and continuously interacts with the environment, so that the robot keeps updating the model, learns the current environment, and improves following robustness. In this process the algorithm outputs a discrete action space to control the robot; the action space of the following robot is the set of left, right, and forward instructions, and the correspondence between the action space and the input image is shown in fig. 4.
In RL the data has no explicit labels; only the externally fed reward signal indicates how good an action is, so the design of the reward function is crucial to a successful RL application. The direction control reward function of the following robot is designed as follows: a user remotely connects to the local end of the following robot to observe the robot's view image; initially STOP = 0, meaning following has not ended. When the user finds that his or her position in the robot's view image deviates from the center, the user sends a STOP message through a handheld device; upon receiving it the robot knows that following has failed, sets STOP to 1, and stops moving. This design both simplifies the user's operation and provides a more accurate reward signal that accelerates model convergence. The reward function can then be expressed as:

r = C, if STOP = 1; r = 0, otherwise
wherein C is a negative number.
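A sketch of this reward signal; the particular value of C is an assumption (the patent only requires it to be negative):

```python
def reward(stop_signal, C=-1.0):
    """Return C (< 0) when the user sends STOP (following failed),
    otherwise 0: the robot is only punished for losing the target."""
    return C if stop_signal == 1 else 0.0

print(reward(0), reward(1))  # 0.0 -1.0
```

Because the only non-zero reward is the penalty on failure, the learned policy is pushed to keep the target person centred in the view for as long as possible.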
Through a verification experiment on the TurtleBot2 robot, the method can accurately follow a specific target person and has high robustness.
Claims (3)
1. A mobile robot vision following method based on deep reinforcement learning is characterized by comprising the following steps:
the method comprises the following steps: automated construction of a data set;
(1) preparing a simple scene with a followed object easily distinguished from the background; acquiring view images of different positions of a target person in the robot view from the view of the following robot in a simple scene;
(2) preparing an application scene following the robot as a complex scene image, extracting a target person from the background of a simple scene by using an image mask technology, and then overlapping the target person with the complex scene to obtain an image of the target person in the complex scene, and directly endowing a corresponding action space label under the simple scene to the synthesized complex scene image;
step two: constructing and training a direction control model based on the CNN;
performing supervised training on the CNN model with the data set automatically constructed in step one, so that the CNN can output the corresponding action state from the robot's view input image; the three RGB channels of the image collected from the robot's monocular color camera are converted into HSV channels before being input to the CNN, the HSV image is then fed to the CNN as the input image, and the network outputs the corresponding action state;
the CNN structure consists of 8 layers, including a convolution layer 3 layer, a pooling layer 2 layer, a full-communication layer 2 layer and an output layer; from front to back, the convolution kernel parameter settings for the three convolution layers are: 8 × 8, 4 × 4, 2 × 2; the two pooling layers are both subjected to maximum pooling, and the sizes of the two pooling layers are both 2 multiplied by 2; after the third convolution, the data is input to two full-connection layers, each layer is provided with 384 nodes, the output layer is arranged behind the full-connection layer, the data is output in a multi-dimensional mode after passing through the output layer, each dimension represents the action in the corresponding direction, and the actions in three directions are contained: forward, left, right; a Relu activation function is added after the three convolutional layers and the two full-link layers to carry out nonlinear transformation on the result of the input layer; the CNN parameter is updated by adopting a cross entropy loss function, which is specifically expressed as:
wherein y' is the label of the sample, a three-dimensional one-hot vector in which the dimension equal to 1 marks the correct action; f(x) represents the predicted probability of the CNN model for each action dimension;
step three: model migration;
transferring the CNN parameter weight trained in the step two to a DRL model as an initial parameter so that the DRL model obtains the same control level as the CNN model; the DRL model is a DQN model, and the migration process is as follows: removing the Softmax layer of the trained CNN network, and directly endowing the weight parameters of the previous layers to the DQN model;
step four: building and training a direction control model based on the DRL;
and (4) applying the DRL model after the initial parameter migration in the step three to a robot end for use, and enabling the robot to continuously update the model and learn the current environment through continuous interaction with the environment.
2. The mobile robot visual following method based on deep reinforcement learning according to claim 1, wherein in step two: the image collected by the robot's monocular color camera is 640 × 480; before the image is input to the neural network, its RGB channels are converted to HSV channels and the 640 × 480 image is resized to 60 × 80; the images collected at 4 adjacent moments are stacked together as the network input, so the final input layer comprises 4 × 3 = 12 channels, each of size 60 × 80.
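The preprocessing in this claim can be sketched as follows. To keep the example dependency-free, the resize is a naive stride-8 subsampling and the RGB-to-HSV conversion is omitted (in practice a library call such as OpenCV's `cvtColor` would perform it); the random frames stand in for real camera images:

```python
import numpy as np

def downsample(img, out_h=60, out_w=80):
    # Naive stride-based resize from 480x640 to 60x80 (factor 8 per axis).
    # A real pipeline would use proper interpolation and would convert
    # RGB to HSV first; both are omitted to keep the sketch minimal.
    h, w, _ = img.shape
    return img[:: h // out_h, :: w // out_w, :]

# Four hypothetical frames from adjacent moments, each 480x640 RGB.
frames = [np.random.rand(480, 640, 3) for _ in range(4)]
small = [downsample(f) for f in frames]    # each (60, 80, 3)
stacked = np.concatenate(small, axis=-1)   # (60, 80, 12): 4 frames x 3 channels
print(stacked.shape)
```

Stacking 4 adjacent frames gives the network short-term motion information that a single frame cannot provide, which is what makes the 12-channel input useful for following a moving target.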
3. The mobile robot visual following method based on deep reinforcement learning according to claim 1, wherein in step four: the DQN approximates the value function with a neural network, i.e. the input of the neural network is the current state value s and the output is the predicted value Qθ(s, a); at each time step, the environment gives a state value s, the agent derives the values Qθ(s, a) for s and all actions from the value-function network, then selects an action by the ε-greedy algorithm and makes the decision, and after receiving the action a the environment returns a reward value r and the next state s'; this constitutes one step; the parameters of the value-function network are then updated according to r; the DQN uses the mean square error to define the objective function:

L(θ) = E[(r + γ max_{a'} Qθ(s', a') − Qθ(s, a))²]
wherein s' and a' are the state and action at the next moment, γ is a hyper-parameter (the discount factor), and θ is the model parameter;
during training, the parameters are updated as follows:

θ ← θ + α [r + γ max_{a'} Qθ(s', a') − Qθ(s, a)] ∇θ Qθ(s, a)

where α is the learning rate.
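One ε-greedy decision and one TD update from this claim can be sketched with a tabular stand-in for the Q-network (the tabular form, state encoding, and the hyper-parameter values α, γ, ε are illustrative assumptions; the patent's actual Q-function is the migrated CNN):

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 3            # actions: forward, left, right
Q = np.zeros((n_states, n_actions))   # tabular stand-in for Q_theta(s, a)
alpha, gamma, eps = 0.1, 0.9, 0.1     # learning rate, discount, exploration

def select_action(Q, s, eps):
    # epsilon-greedy: explore with probability eps, otherwise act greedily.
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def update(Q, s, a, r, s_next):
    # TD target r + gamma * max_a' Q(s', a'); move Q(s, a) toward it by alpha.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# One environment step: in state 0, action 0 yields reward 1.0, next state 1.
s, a, r, s_next = 0, 0, 1.0, 1
update(Q, s, a, r, s_next)
print(Q[0, 0])
```

After the update, Q(0, 0) has moved a fraction α of the way from 0 toward the TD target, which is exactly the gradient step of the mean-square objective above for a tabular Q.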
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910361528.4A CN110084307B (en) | 2019-04-30 | 2019-04-30 | Mobile robot vision following method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110084307A CN110084307A (en) | 2019-08-02 |
CN110084307B true CN110084307B (en) | 2021-06-18 |
Family
ID=67418184
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910361528.4A Expired - Fee Related CN110084307B (en) | 2019-04-30 | 2019-04-30 | Mobile robot vision following method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110084307B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114270370A (en) * | 2019-09-05 | 2022-04-01 | 三菱电机株式会社 | Inference device, device control system, and learning device |
CN110728368B (en) * | 2019-10-25 | 2022-03-15 | 中国人民解放军国防科技大学 | Acceleration method for deep reinforcement learning of simulation robot |
CN112731804A (en) * | 2019-10-29 | 2021-04-30 | 北京京东乾石科技有限公司 | Method and device for realizing path following |
CN111578940B (en) * | 2020-04-24 | 2021-05-11 | 哈尔滨工业大学 | Indoor monocular navigation method and system based on cross-sensor transfer learning |
CN111539979B (en) * | 2020-04-27 | 2022-12-27 | 天津大学 | Human body front tracking method based on deep reinforcement learning |
CN111523495B (en) * | 2020-04-27 | 2023-09-01 | 天津中科智能识别产业技术研究院有限公司 | End-to-end active human body tracking method in monitoring scene based on deep reinforcement learning |
CN112297012B (en) * | 2020-10-30 | 2022-05-31 | 上海交通大学 | Robot reinforcement learning method based on self-adaptive model |
CN112702423B (en) * | 2020-12-23 | 2022-05-03 | 杭州比脉科技有限公司 | Robot learning system based on Internet of things interactive entertainment mode |
CN112799401A (en) * | 2020-12-28 | 2021-05-14 | 华南理工大学 | End-to-end robot vision-motion navigation method |
CN113031441B (en) * | 2021-03-03 | 2022-04-08 | 北京航空航天大学 | Rotary mechanical diagnosis network automatic search method based on reinforcement learning |
CN113158778A (en) * | 2021-03-09 | 2021-07-23 | 中国电子科技集团公司第五十四研究所 | SAR image target detection method |
CN113011526B (en) * | 2021-04-23 | 2024-04-26 | 华南理工大学 | Robot skill learning method and system based on reinforcement learning and unsupervised learning |
CN113156959B (en) * | 2021-04-27 | 2024-06-04 | 东莞理工学院 | Self-supervision learning and navigation method for autonomous mobile robot in complex scene |
CN113485326A (en) * | 2021-06-28 | 2021-10-08 | 南京深一科技有限公司 | Autonomous mobile robot based on visual navigation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549928A (en) * | 2018-03-19 | 2018-09-18 | 清华大学 | Visual tracking method and device based on continuous moving under deeply learning guide |
CN108932735A (en) * | 2018-07-10 | 2018-12-04 | 广州众聚智能科技有限公司 | A method of generating deep learning sample |
CN109242882A (en) * | 2018-08-06 | 2019-01-18 | 北京市商汤科技开发有限公司 | Visual tracking method, device, medium and equipment |
CN109341689A (en) * | 2018-09-12 | 2019-02-15 | 北京工业大学 | Vision navigation method of mobile robot based on deep learning |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180018562A1 (en) * | 2016-07-14 | 2018-01-18 | Cside Japan Inc. | Platform for providing task based on deep learning |
AU2017101165A4 (en) * | 2017-08-25 | 2017-11-02 | Liu, Yichen MR | Method of Structural Improvement of Game Training Deep Q-Network |
Non-Patent Citations (4)
Title |
---|
Bridging the Gap Between Value and Policy Based Reinforcement Learning;Ofir Nachum et al;《arXiv:1702.08892v1》;20170228;1-16 * |
Human-level control through deep reinforcement learning;Volodymyr Mnih et al;《NATURE》;20150226;Vol. 518;p. 1 right column, p. 6 right column, p. 7 right column *
Research on Robust Object Tracking Methods Based on Multimodal Video;Li Chang;《China Master's Theses Full-text Database, Information Science and Technology》;20180815;Vol. 2018(No. 8);Chapter 4 *
Research on Visual Object Tracking Algorithms Based on Machine Learning;Yang Yuesheng;《China Master's Theses Full-text Database, Information Science and Technology》;20190215;Vol. 2019(No. 2);I138-2106 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20210618 |