CN110084307B - Mobile robot vision following method based on deep reinforcement learning - Google Patents


Info

Publication number
CN110084307B
CN110084307B
Authority
CN
China
Prior art keywords
model, robot, CNN, image, layer
Prior art date
Legal status
Expired - Fee Related
Application number
CN201910361528.4A
Other languages
Chinese (zh)
Other versions
CN110084307A (en)
Inventor
张云洲
王帅
庞琳卓
刘及惟
王磊
Current Assignee
Northeastern University China
Original Assignee
Northeastern University China
Priority date
Filing date
Publication date
Application filed by Northeastern University China
Priority to CN201910361528.4A
Publication of CN110084307A
Application granted
Publication of CN110084307B
Status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/12 Target-seeking control
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Computational Linguistics (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Manipulator (AREA)

Abstract

The invention provides a mobile robot vision following method based on deep reinforcement learning. The method adopts a framework of "simulated-image supervised pre-training + model migration + RL". First, a small amount of data is collected in a real environment, and the data set is expanded automatically with a computer program and image processing techniques, yielding in a short time a large simulated data set adapted to real scenes for the supervised training of the following robot's direction control model. Second, a CNN model for controlling the robot's direction is built and trained with supervision on the automatically constructed simulated data set, serving as a pre-trained model. Finally, the knowledge of the pre-trained model is migrated to a control model based on deep reinforcement learning (DRL), so that the robot performs the following task in a real environment; combined with the reinforcement learning mechanism, the robot improves its direction control performance while following during interaction with the environment. The method is highly robust and greatly reduces cost.

Description

Mobile robot vision following method based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of intelligent robots, and relates to a mobile robot vision following method based on deep reinforcement learning.
Background
With the progress of technology and the development of society, more and more intelligent robots are appearing in people's lives. The following robot is one of the new systems that has attracted much attention in recent years: deployed in complex environments such as hospitals, shopping malls, or schools, it can accompany its owner as an assistant, bringing great convenience to daily life. A following robot has the capabilities of autonomous perception, recognition, decision-making, and movement; it can recognize a specific target and, combined with a corresponding control system, follow that target in complex scenes.
At present, following robot systems are generally studied on the basis of either a visual sensor alone or a combination of multiple sensors. The former usually acquires visual images with a stereo camera, which requires complicated calibration steps and adapts poorly to strong outdoor illumination; the latter increases system cost through the additional sensors and also introduces a complex data fusion process. To ensure robust tracking in dynamic, unknown environments, complex features typically have to be designed by hand, which greatly increases labor cost, time cost, and computational resources. In addition, a conventional following robot system usually splits the whole system into a target tracking module and a robot motion control module; in such a pipeline design, errors arising in an earlier module are passed on to later modules, so accumulated errors are gradually amplified and ultimately have a large impact on system performance.
In summary, traditional following robot systems suffer from high hardware and design costs and, with simple hardware alone, cannot fully adapt to the variability and complexity of indoor and outdoor environments. The robot therefore easily loses the target person and the robustness of the following system drops, severely hindering the application and popularization of following robots in real life.
Disclosure of Invention
Aiming at the defects of current traditional following robot designs, the invention provides a mobile robot vision following method based on deep reinforcement learning.
The invention uses a monocular color camera as the robot's only input sensor and introduces a convolutional neural network (CNN) and deep reinforcement learning (DRL) into the following robot system. This removes the laborious manual feature design of traditional following robot systems, lets the robot learn the control strategy directly from its view images, greatly reduces the chance of losing the tracked target, and adapts better to illumination changes, background object interference, and occlusion or interference from other pedestrians in complex environments. Meanwhile, the introduction of deep reinforcement learning enables the following robot to keep learning from experience while interacting with the environment, continuously raising its level of intelligence.
The method adopts a framework of "simulated-image supervised pre-training + model migration + RL". First, a small amount of data is collected in a real environment and the data set is expanded automatically with a computer program and image processing techniques, yielding in a short time a large simulated data set adapted to real scenes for the supervised training of the robot's direction control model. Second, a CNN model for controlling the robot's direction is built and trained with supervision on the automatically constructed simulated data set, serving as a pre-trained model. Finally, the knowledge of the pre-trained model is migrated to a DRL-based control model so that the robot performs the following task in a real environment, and a reinforcement learning (RL) mechanism lets the robot improve its direction control performance while following during interaction with the environment.
The specific technical scheme is as follows:
a mobile robot vision following method based on deep reinforcement learning comprises the following steps:
Step one: automated construction of a data set;
in order to reduce the cost of data collection and quickly obtain large-scale training data, the invention designs an automatic data set construction method by utilizing a computer program and an image processing technology. A small amount of data is collected in a simple experiment scene, then the obtained small amount of experiment data is expanded in a large scale by using an image mask technology, and a large amount of data which can adapt to complex indoor and outdoor scenes can be obtained in a short time, so that the cost of manually collecting and marking the data is greatly reduced.
(1) Prepare a simple scene in which the followed person is easily distinguished from the background, and acquire, from the following robot's viewpoint, view images of the target person at different positions in the robot's field of view;
(2) Prepare images of the robot's application scenes as complex scene images, such as indoor scenes, outdoor scenes, and street views. Because the followed target person in the simple scene is easily separated from the background, the target person can be extracted from the simple scene background with an image mask technique and superimposed on a complex scene, producing an image of the target person in the complex scene; the corresponding action space label from the simple scene is assigned directly to the synthesized complex scene image;
the image mask technology is mainly to multiply a two-dimensional matrix (namely a mask) designed for an interested area of an image and an image to be processed, and an obtained result is the interested area to be extracted.
Step two: constructing and training a direction control model based on the CNN;
the CNN-based direction control model is responsible for outputting direction prediction of actions to be taken for the robot visual field image. The model is supervised and trained by utilizing a large-scale simulation data set which is automatically constructed, so that the model has a high direction control level. Knowledge learned in this model will be migrated by means of model migration into the DRL-based directional control model as a priori knowledge of the latter with respect to the directional control strategy.
Before an image collected from the robot's monocular color camera is input to the CNN, its three RGB channels are converted into HSV channels, and the HSV image is fed to the CNN as input. The CNN model is then trained with supervision on the data set automatically constructed in step one, so that the CNN outputs the corresponding action state from the robot's view input image;
step three: model migration;
the invention takes the learned strategy in the CNN direction control model as a model migration method based on the prior knowledge of the DRL direction control model. Although the output of the CNN model and the DRL model have different meanings: the output of the CNN model is the probability of each directional motion, while the output of the DRL model is typically a value estimate of each directional motion, but they have the same output dimensions. Generally, the CNN model has a higher value estimate corresponding to the direction of motion with a higher output probability.
The CNN parameter weights trained in step two are transferred to the DRL model as its initial parameters, so that the DRL model starts from the same control level as the CNN model;
step four: building and training a direction control model based on the DRL;
the DRL model is responsible for further improving the performance of the model by utilizing an RL mechanism on the basis of obtaining the prior knowledge of the CNN model. The introduction of the RL mechanism enables the robot to collect experience and improve own knowledge in the process of interacting with the environment, so that the robot following direction control level higher than that of a CNN model is obtained.
The DRL model with the initial parameters migrated in step three is deployed at the robot end; through continuous interaction with the environment, the robot keeps updating the model, learning, and adapting to the current environment.
Further, in step two: the image collected by the robot's monocular color camera is 640 × 480. Before the image is input to the neural network, its three RGB channels are converted into HSV channels and the 640 × 480 image is resized to 60 × 80. The images collected at 4 adjacent moments are combined as the network input, so the final input layer contains 4 × 3 = 12 channels, each of size 60 × 80.
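A sketch of this preprocessing pipeline under stated assumptions: OpenCV delivers BGR frames (hence the BGR-to-HSV conversion), the 60 × 80 size is read as 60 rows by 80 columns (which preserves the 640 × 480 aspect ratio), and the start-up padding policy of repeating the first frame is my own choice, as the patent does not specify one:

    import cv2
    import numpy as np
    from collections import deque

    frame_buffer = deque(maxlen=4)  # holds the 4 most recent preprocessed frames

    def preprocess(bgr_frame):
        # Convert one 640x480 camera frame to a normalised 60x80 HSV image.
        hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
        small = cv2.resize(hsv, (80, 60))   # cv2.resize takes (width, height)
        return small.astype(np.float32) / 255.0

    def network_input(bgr_frame):
        # Stack the 4 most recent HSV frames into a 12-channel input (12 x 60 x 80).
        frame_buffer.append(preprocess(bgr_frame))
        while len(frame_buffer) < 4:        # pad at start-up by repeating the first frame
            frame_buffer.append(frame_buffer[-1])
        stacked = np.concatenate(list(frame_buffer), axis=2)  # (60, 80, 12)
        return stacked.transpose(2, 0, 1)                     # HWC -> CHW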
Further, in step two: the CNN structure consists of 8 layers: 3 convolutional layers, 2 pooling layers, 2 fully connected layers, and an output layer. The convolutional layers extract features from the input image, and the pooling layers reduce the dimensionality of the extracted features to cut the computation needed for forward propagation. From front to back, the convolution kernel sizes of the three convolutional layers are 8 × 8, 4 × 4, and 2 × 2; both pooling layers use max pooling with a size of 2 × 2. After the third convolution, the data are fed into two fully connected layers of 384 nodes each; the output layer follows the fully connected layers and produces a multi-dimensional output in which each dimension represents the action in the corresponding direction, covering three directions: forward, left, and right. A ReLU activation function is added after each of the three convolutional layers and the two fully connected layers to apply a nonlinear transformation to that layer's output. The CNN parameters are updated with a cross-entropy loss function, specifically:
$L = -\sum_{i} y'_{i} \log f(x)_{i}$
where y' is the sample's label, a three-dimensional one-hot vector in which the dimension equal to 1 marks the correct action, and f(x) is the CNN model's predicted probability for each action dimension.
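A sketch of this architecture in PyTorch; the convolution strides (1), the absence of padding, and the channel widths (32, 64, 64) are assumptions, since the patent fixes only the kernel sizes, the pooling sizes, the two 384-node fully connected layers, and the 3-way output. Under these assumptions the feature map entering the fully connected layers is 64 × 10 × 15; note that nn.CrossEntropyLoss applies the softmax internally, so the Softmax output layer mentioned in step three appears here only implicitly:

    import torch
    import torch.nn as nn

    class DirectionCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(12, 32, kernel_size=8), nn.ReLU(),  # 8x8 convolution
                nn.MaxPool2d(2),                              # 2x2 max pooling
                nn.Conv2d(32, 64, kernel_size=4), nn.ReLU(),  # 4x4 convolution
                nn.MaxPool2d(2),                              # 2x2 max pooling
                nn.Conv2d(64, 64, kernel_size=2), nn.ReLU(),  # 2x2 convolution
                nn.Flatten(),
                nn.Linear(64 * 10 * 15, 384), nn.ReLU(),      # fully connected, 384 nodes
                nn.Linear(384, 384), nn.ReLU(),               # fully connected, 384 nodes
                nn.Linear(384, 3),                            # logits: forward / left / right
            )

        def forward(self, x):  # x: (batch, 12, 60, 80)
            return self.net(x)

    model = DirectionCNN()
    criterion = nn.CrossEntropyLoss()  # softmax + the cross-entropy loss above
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # optimiser choice is illustrative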
Further, the DRL model in step three is specifically a DQN model, and the migration process is: remove the Softmax layer of the trained CNN network and assign the weight parameters of the remaining layers directly to the DQN model.
Further, in step four: DQN uses a neural network to approximate the value function, i.e. the network input is the current state s and the output is the predicted value Q_θ(s, a). At each time step the environment provides a state s; the agent computes the values Q_θ(s, a) for s and all actions from the value function network, selects an action with the ε-greedy strategy, and makes a decision; after receiving action a, the environment returns a reward value r and the next state s'. This constitutes one step, after which the parameters of the value function network are updated according to r. DQN defines the objective function with the mean squared error:
$L(\theta) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a)\right)^2\right]$
where s' and a' are the state and action at the next moment, γ is a hyperparameter (the discount factor), and θ denotes the model parameters;
during training, the updating mode of the parameters is as follows:
$\theta \leftarrow \theta + \alpha\left(r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a)\right)\nabla_\theta Q_\theta(s, a)$

where α is the learning rate.
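A minimal sketch of one DQN action-selection and update step, following the single-network formulation above (function names are illustrative; practical DQN implementations usually add an experience replay buffer and a separate target network, which the patent does not describe):

    import torch
    import torch.nn.functional as F

    def select_action(q_net, state, epsilon, n_actions=3):
        # epsilon-greedy: explore with probability epsilon, otherwise argmax_a Q(s, a).
        if torch.rand(1).item() < epsilon:
            return torch.randint(0, n_actions, (1,)).item()
        with torch.no_grad():
            return q_net(state.unsqueeze(0)).argmax(dim=1).item()

    def dqn_update(q_net, optimizer, s, a, r, s_next, gamma=0.99):
        # One gradient step on L(theta) = (r + gamma * max_a' Q(s',a') - Q(s,a))^2.
        # s, s_next: (batch, 12, 60, 80); a: (batch,) long; r: (batch,) float.
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)          # Q_theta(s, a)
        with torch.no_grad():
            td_target = r + gamma * q_net(s_next).max(dim=1).values  # TD target, held fixed
        loss = F.mse_loss(q_sa, td_target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()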
when the final depth reinforcement learning algorithm is applied to a physical robot, a real-time view image acquired by a monocular color camera carried by the robot is used as a state input value of a DRL algorithm, an algorithm output action space is a set of direction control signals, and the robot can move along with a target person in real time by executing a direction control instruction.
Addressing the problems that following robot systems face in practical applications, the method of the invention provides an intelligent following robot system based on a deep reinforcement learning algorithm. The end-to-end design fuses the tracking module and the direction control module of a traditional following robot system, preventing error transmission and accumulation between modules and letting the robot learn the mapping from target to behavior strategy directly. Compared with a traditional following robot system, this system is more robust, greatly reduces hardware and labor costs, and improves the prospects for popularizing following robots in real life.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic diagram of an automated data set construction process of the present invention.
FIG. 3 shows example composite images produced by the automated data set construction of the invention.
The subgraphs are described as follows:
(a) (b) (c) (d) are example pictures of the target person at different positions, acquired by the robot in a simple scene;
(e) (f) (g) (h) are example complex scene pictures collected from the Internet;
(i) (j) (k) (l) are example composite data set images produced with the image mask technique;
(a)(e)(i), (b)(f)(j), (c)(g)(k), and (d)(h)(l) each show the complete image mask synthesis process and its effect for the target person at a different position in the simple image.
FIG. 4 is a diagram showing the correspondence between input images and the action space according to the present invention.
FIG. 5 is a following robot system architecture diagram of the present invention.
Detailed Description
The software environment of this embodiment is the Ubuntu 14.04 system; the mobile robot is a TurtleBot2 robot, and the robot's input sensor is a monocular color camera with a resolution of 640 × 480.
Step one: automated data set construction process
For supervised training of the following robot's direction control model, the input is the robot's camera view image and the output is the action the robot should take at the current moment. The construction of the entire data set therefore has two parts: obtaining the input view images, and labeling the output actions.
A simple scene is prepared in which the followed person is easily distinguished from the background. In the simple scene, a number of view images of the target person at different positions in the robot's field of view are collected from the following robot's viewpoint. A certain number of complex scene images are downloaded from the Internet, mainly covering common application scenes of the following robot, such as indoor and outdoor scenes and street views. Because the followed target person in the simple scene is easily separated from the background, the target person can be extracted from the background with the image mask technique and superimposed on the complex scenes obtained from the Internet, producing images of the target person in complex scenes; the corresponding action space label from the simple scene is assigned directly to each synthesized complex scene image. A schematic diagram of the automated data set construction process is shown in FIG. 2. Examples of simple scene images, Internet complex scene images, and results of the automated construction process are shown in FIG. 3.
After the images containing the target person at different positions in the simple scene have been collected, the image mask is designed directly by setting a color threshold, since the color of the tracked target differs greatly from the simple scene's background color. Applying the mask to the robot view image yields a binary image separating the tracked target from the background and extracts the target person's outline: the background pixels are all 0 and the tracked person's pixels are all 1. The image region of the target person can then be superimposed on a complex scene picture. The action label is obtained by averaging the horizontal positions of the pixels with value 1 in the binary mask image.
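A sketch of one step of this construction under stated assumptions: the HSV threshold pair, the one-third/two-thirds region boundaries, and the action index order (0 = forward, 1 = left, 2 = right) are all illustrative choices, not values given by the patent:

    import cv2
    import numpy as np

    def build_sample(simple_img, complex_bg, lower_hsv, upper_hsv):
        # Binary mask from a colour threshold: person pixels -> 1, background -> 0.
        hsv = cv2.cvtColor(simple_img, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, lower_hsv, upper_hsv) // 255

        # Superimpose the masked person onto the complex background.
        bg = cv2.resize(complex_bg, (simple_img.shape[1], simple_img.shape[0]))
        composite = np.where(mask[:, :, None] == 1, simple_img, bg)

        # Action label from the mean horizontal position of the person pixels
        # (assumes the person is visible, i.e. the mask is non-empty).
        mean_x = np.where(mask == 1)[1].mean() / mask.shape[1]
        if mean_x < 1 / 3:
            label = 1          # target left of centre -> turn left
        elif mean_x > 2 / 3:
            label = 2          # target right of centre -> turn right
        else:
            label = 0          # target centred -> move forward
        return composite, label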
Step two: CNN-based direction control model building and training process
The image collected from the monocular color camera is 640 × 480. Before it is input to the neural network, the three RGB channels are converted into HSV channels and the 640 × 480 image is resized to 60 × 80. The invention combines the images collected at 4 adjacent moments as the network input; since a single image is a three-channel HSV image, the final input layer contains 4 × 3 = 12 channels, each of size 60 × 80. The automatically constructed data set is then used for supervised training of the CNN model, so that the CNN network outputs the corresponding action state from the robot's view input image.
Step three: model migration process
The DQN model finally used in the invention has a structure similar to the CNN direction control network described above, but with the last Softmax layer removed: it directly outputs a value prediction for each state-action pair instead of a probability distribution over actions. The model migration strategy adopted by the invention is therefore: remove the Softmax layer of the trained CNN network and assign the weight parameters of the remaining layers directly to the DQN model, achieving the migration of prior knowledge.
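A sketch of this migration step, assuming both networks are instances of the DirectionCNN module sketched in step two (the Softmax layer has no weights of its own, so once it is dropped the two state dicts align layer by layer):

    cnn_model = DirectionCNN()   # trained with supervision in step two
    dqn_model = DirectionCNN()   # same topology; its outputs are now read as Q-values
    dqn_model.load_state_dict(cnn_model.state_dict())  # copy all learned weights as initial parameters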
Step four: DRL-based directional control model training process
After model migration is complete, the DRL model can be deployed at the robot end and interact continuously with the environment, so that the robot keeps updating the model, learning the current environment, and improving its following robustness. In this process the algorithm outputs a discrete action space to control the robot; the following robot's action space is the set of left, right, and forward commands, and the correspondence between the action space and the input image is shown in FIG. 4.
Data in RL carry no separate labels; only the externally supplied reward signal indicates how good an action is, so the design of the reward function is crucial to a successful RL application. The following robot's direction control reward function in the invention is designed as follows: a user remotely connects to the following robot's local end to observe the robot's view image; STOP is initially 0, indicating that following has not failed. When the user finds that his or her position in the robot's view image deviates from the center, the user sends a STOP message from a handheld device; on receiving the STOP message, the robot end knows that following has failed, sets STOP to 1, and stops the robot's movement. This design both makes operation convenient for the user and yields a more accurate reward signal, accelerating model convergence. The reward function can then be expressed as:
$r = \begin{cases} 0, & \text{STOP} = 0 \\ C, & \text{STOP} = 1 \end{cases}$

where C is a negative constant.
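Read literally, the reward above yields 0 while following proceeds and the negative constant C once the user reports failure; a minimal sketch (the value of C and the function name are illustrative):

    C = -1.0  # the patent only requires C to be negative; -1.0 is an illustrative choice

    def reward(stop):
        # stop is the STOP flag described above: 0 while following succeeds,
        # 1 once the user's handheld device reports that following has failed.
        return C if stop == 1 else 0.0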
Verification experiments on the TurtleBot2 robot show that the method can accurately follow a specific target person with high robustness.

Claims (3)

1. A mobile robot vision following method based on deep reinforcement learning is characterized by comprising the following steps:
step one: automated construction of a data set;
(1) preparing a simple scene in which the followed person is easily distinguished from the background, and acquiring, from the following robot's viewpoint in the simple scene, view images of the target person at different positions in the robot's field of view;
(2) preparing images of the robot's application scenes as complex scene images, extracting the target person from the simple scene background with an image mask technique and superimposing the target person on the complex scene to obtain an image of the target person in the complex scene, and assigning the corresponding action space label from the simple scene directly to the synthesized complex scene image;
step two: constructing and training a direction control model based on the CNN;
carrying out supervised training of the CNN model with the data set automatically constructed in step one, so that the CNN outputs the corresponding action state from the robot's view input image; before an image collected from the robot's monocular color camera is input to the CNN, converting its three RGB channels into HSV channels and feeding the HSV image to the CNN as input, the network then outputting the corresponding action state;
the CNN structure consists of 8 layers, including a convolution layer 3 layer, a pooling layer 2 layer, a full-communication layer 2 layer and an output layer; from front to back, the convolution kernel parameter settings for the three convolution layers are: 8 × 8, 4 × 4, 2 × 2; the two pooling layers are both subjected to maximum pooling, and the sizes of the two pooling layers are both 2 multiplied by 2; after the third convolution, the data is input to two full-connection layers, each layer is provided with 384 nodes, the output layer is arranged behind the full-connection layer, the data is output in a multi-dimensional mode after passing through the output layer, each dimension represents the action in the corresponding direction, and the actions in three directions are contained: forward, left, right; a Relu activation function is added after the three convolutional layers and the two full-link layers to carry out nonlinear transformation on the result of the input layer; the CNN parameter is updated by adopting a cross entropy loss function, which is specifically expressed as:
$L = -\sum_{i} y'_{i} \log f(x)_{i}$
wherein y' is the sample's label, a three-dimensional one-hot vector in which the dimension equal to 1 marks the correct action, and f(x) is the CNN model's predicted probability for each action dimension;
step three: model migration;
transferring the CNN parameter weights trained in step two to a DRL model as its initial parameters, so that the DRL model starts from the same control level as the CNN model; the DRL model is a DQN model, and the migration process is: removing the Softmax layer of the trained CNN network and assigning the weight parameters of the remaining layers directly to the DQN model;
step four: building and training a direction control model based on the DRL;
applying the DRL model with the initial parameters migrated in step three at the robot end, and enabling the robot to continuously update the model and learn the current environment through continuous interaction with the environment.
2. The mobile robot vision following method based on deep reinforcement learning according to claim 1, wherein in step two: the image collected by the robot's monocular color camera is 640 × 480; before the image is input to the neural network, its three RGB channels are converted into HSV channels and the 640 × 480 image is resized to 60 × 80; the images collected at 4 adjacent moments are combined as the network input, so the final input layer contains 4 × 3 = 12 channels, each of size 60 × 80.
3. The mobile robot vision following method based on deep reinforcement learning according to claim 1, wherein in step four: DQN uses a neural network to approximate the value function, i.e. the network input is the current state s and the output is the predicted value Q_θ(s, a); at each time step the environment provides a state s, the agent computes the values Q_θ(s, a) for s and all actions from the value function network, selects an action with the ε-greedy strategy, and makes a decision; after receiving action a, the environment returns a reward value r and the next state s'; this constitutes one step; the parameters of the value function network are updated according to r; DQN defines the objective function with the mean squared error:
$L(\theta) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a)\right)^2\right]$
wherein s' and a' are the state and action at the next moment, γ is a hyperparameter (the discount factor), and θ denotes the model parameters;
during training, the updating mode of the parameters is as follows:
$\theta \leftarrow \theta + \alpha\left(r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a)\right)\nabla_\theta Q_\theta(s, a)$

where α is the learning rate.
CN201910361528.4A (filed 2019-04-30, priority 2019-04-30): Mobile robot vision following method based on deep reinforcement learning. Granted as CN110084307B. Status: Expired - Fee Related.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910361528.4A (CN110084307B) | 2019-04-30 | 2019-04-30 | Mobile robot vision following method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910361528.4A (CN110084307B) | 2019-04-30 | 2019-04-30 | Mobile robot vision following method based on deep reinforcement learning

Publications (2)

Publication Number | Publication Date
CN110084307A (en) | 2019-08-02
CN110084307B (en) | 2021-06-18

Family

ID=67418184

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910361528.4A (CN110084307B, Expired - Fee Related) | Mobile robot vision following method based on deep reinforcement learning | 2019-04-30 | 2019-04-30

Country Status (1)

Country Link
CN (1) CN110084307B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114270370A (en) * 2019-09-05 2022-04-01 三菱电机株式会社 Inference device, device control system, and learning device
CN110728368B (en) * 2019-10-25 2022-03-15 中国人民解放军国防科技大学 Acceleration method for deep reinforcement learning of simulation robot
CN112731804A (en) * 2019-10-29 2021-04-30 北京京东乾石科技有限公司 Method and device for realizing path following
CN111578940B (en) * 2020-04-24 2021-05-11 哈尔滨工业大学 Indoor monocular navigation method and system based on cross-sensor transfer learning
CN111539979B (en) * 2020-04-27 2022-12-27 天津大学 Human body front tracking method based on deep reinforcement learning
CN111523495B (en) * 2020-04-27 2023-09-01 天津中科智能识别产业技术研究院有限公司 End-to-end active human body tracking method in monitoring scene based on deep reinforcement learning
CN112297012B (en) * 2020-10-30 2022-05-31 上海交通大学 Robot reinforcement learning method based on self-adaptive model
CN112702423B (en) * 2020-12-23 2022-05-03 杭州比脉科技有限公司 Robot learning system based on Internet of things interactive entertainment mode
CN112799401A (en) * 2020-12-28 2021-05-14 华南理工大学 End-to-end robot vision-motion navigation method
CN113031441B (en) * 2021-03-03 2022-04-08 北京航空航天大学 Rotary mechanical diagnosis network automatic search method based on reinforcement learning
CN113158778A (en) * 2021-03-09 2021-07-23 中国电子科技集团公司第五十四研究所 SAR image target detection method
CN113011526B (en) * 2021-04-23 2024-04-26 华南理工大学 Robot skill learning method and system based on reinforcement learning and unsupervised learning
CN113156959B (en) * 2021-04-27 2024-06-04 东莞理工学院 Self-supervision learning and navigation method for autonomous mobile robot in complex scene
CN113485326A (en) * 2021-06-28 2021-10-08 南京深一科技有限公司 Autonomous mobile robot based on visual navigation


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180018562A1 (en) * 2016-07-14 2018-01-18 Cside Japan Inc. Platform for providing task based on deep learning
AU2017101165A4 (en) * 2017-08-25 2017-11-02 Liu, Yichen MR Method of Structural Improvement of Game Training Deep Q-Network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549928A (en) * 2018-03-19 2018-09-18 清华大学 Visual tracking method and device based on continuous moving under deeply learning guide
CN108932735A (en) * 2018-07-10 2018-12-04 广州众聚智能科技有限公司 A method of generating deep learning sample
CN109242882A (en) * 2018-08-06 2019-01-18 北京市商汤科技开发有限公司 Visual tracking method, device, medium and equipment
CN109341689A (en) * 2018-09-12 2019-02-15 北京工业大学 Vision navigation method of mobile robot based on deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Ofir Nachum et al., "Bridging the Gap Between Value and Policy Based Reinforcement Learning," arXiv:1702.08892v1, 2017-02-28, pp. 1-16. *
Volodymyr Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, 2015-02-26, p. 1 right column, p. 6 right column, p. 7 right column. *
Li Chang, "Research on robust target tracking methods based on multimodal video," China Master's Theses Full-text Database, Information Science and Technology, vol. 2018, no. 8, 2018-08-15, chapter 4. *
Yang Yuesheng, "Research on visual target tracking algorithms based on machine learning," China Master's Theses Full-text Database, Information Science and Technology, vol. 2019, no. 2, 2019-02-15, I138-2106. *

Also Published As

Publication number Publication date
CN110084307A (en) 2019-08-02

Similar Documents

Publication Publication Date Title
CN110084307B (en) Mobile robot vision following method based on deep reinforcement learning
Li et al. Building and optimization of 3D semantic map based on Lidar and camera fusion
CN108491880B (en) Object classification and pose estimation method based on neural network
CN110705448B (en) Human body detection method and device
CN111190981B (en) Method and device for constructing three-dimensional semantic map, electronic equipment and storage medium
CN108734120A (en) Mark method, apparatus, equipment and the computer readable storage medium of image
CN111079561A (en) Robot intelligent grabbing method based on virtual training
CN109215067A (en) High-resolution 3-D point cloud is generated based on CNN and CRF model
CN111062326B (en) Self-supervision human body 3D gesture estimation network training method based on geometric driving
CN101154289A (en) Method for tracing three-dimensional human body movement based on multi-camera
JP7110884B2 (en) LEARNING DEVICE, CONTROL DEVICE, LEARNING METHOD, AND LEARNING PROGRAM
CN111598951A (en) Method, device and storage medium for identifying space target
CN105976395B (en) A kind of video target tracking method based on rarefaction representation
Passalis et al. Deep reinforcement learning for controlling frontal person close-up shooting
WO2022052782A1 (en) Image processing method and related device
CN115147488A (en) Workpiece pose estimation method based on intensive prediction and grasping system
CN111531546B (en) Robot pose estimation method, device, equipment and storage medium
CN111078008B (en) Control method of early education robot
CN112270211A (en) Stage lighting control method and system based on somatosensory interaction
CN112215766A (en) Image defogging method integrating image restoration and image enhancement and convolution network thereof
CN113255514B (en) Behavior identification method based on local scene perception graph convolutional network
Xia et al. Modality translation and fusion for event-based semantic segmentation
CN117916773A (en) Method and system for simultaneous pose reconstruction and parameterization of 3D mannequins in mobile devices
Xue et al. An end-to-end multi-resolution feature fusion defogging network
CN117236433B (en) Intelligent communication perception method, system, equipment and medium for assisting blind person life

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210618