CN110084307A - Mobile robot visual following method based on deep reinforcement learning - Google Patents
Mobile robot visual following method based on deep reinforcement learning
- Publication number
- CN110084307A (application CN201910361528.4A)
- Authority
- CN
- China
- Prior art keywords
- model
- robot
- cnn
- training
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 31
- 230000000007 visual effect Effects 0.000 title claims abstract description 16
- 238000012549 training Methods 0.000 claims abstract description 22
- 238000010276 construction Methods 0.000 claims abstract description 15
- 230000005012 migration Effects 0.000 claims abstract description 10
- 238000013508 migration Methods 0.000 claims abstract description 10
- 230000003993 interaction Effects 0.000 claims abstract description 5
- 238000013527 convolutional neural network Methods 0.000 claims description 38
- 230000006870 function Effects 0.000 claims description 13
- 230000008569 process Effects 0.000 claims description 12
- 238000013528 artificial neural network Methods 0.000 claims description 8
- 230000009471 action Effects 0.000 claims description 7
- 238000004422 calculation algorithm Methods 0.000 claims description 7
- 230000000694 effects Effects 0.000 claims description 7
- 238000005516 engineering process Methods 0.000 claims description 7
- 230000015572 biosynthetic process Effects 0.000 claims description 4
- 238000003786 synthesis reaction Methods 0.000 claims description 4
- 230000007704 transition Effects 0.000 claims description 3
- 230000004913 activation Effects 0.000 claims description 2
- 230000000875 corresponding effect Effects 0.000 claims 3
- 230000001276 controlling effect Effects 0.000 claims 2
- 230000007613 environmental effect Effects 0.000 abstract description 4
- 238000004590 computer program Methods 0.000 abstract description 3
- 230000007246 mechanism Effects 0.000 abstract description 3
- 238000012545 processing Methods 0.000 abstract description 3
- 238000013461 design Methods 0.000 description 11
- 238000010586 diagram Methods 0.000 description 3
- 238000009825 accumulation Methods 0.000 description 2
- 238000013480 data collection Methods 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 238000005286 illumination Methods 0.000 description 2
- 230000002787 reinforcement Effects 0.000 description 2
- 238000000926 separation method Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 230000003542 behavioural effect Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000011217 control strategy Methods 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000007499 fusion processing Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 230000007659 motor function Effects 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 230000000644 propagated effect Effects 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000032258 transport Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/12—Target-seeking control
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Automation & Control Theory (AREA)
- Image Analysis (AREA)
- Manipulator (AREA)
Abstract
The invention proposes a mobile robot visual following method based on deep reinforcement learning. It adopts a framework of "supervised pre-training on simulated images + model transfer + RL". First, a small amount of data is collected in the real environment and the data set is expanded automatically with computer programs and image processing techniques, so that a large simulated data set adapted to real scenes is obtained in a short time and used for supervised training of the direction control model of the following robot. Second, a CNN model for robot direction control is built and trained in a supervised manner on the automatically constructed simulated data set, serving as the pre-trained model. The knowledge of the pre-trained model is then transferred into a DRL-based control model, and the robot performs the following task in the real environment; combined with the reinforcement learning mechanism, the robot improves its direction control while following the target and interacting with the environment. The method is not only highly robust but also substantially reduces cost.
Description
Technical field
The invention belongs to the field of intelligent robot technology and relates to a mobile robot visual following method based on deep reinforcement learning.
Background art
With the advance of technology and the development of society, more and more intelligent robots appear in people's lives. The following robot is one of the novel systems that has received significant attention in recent years; it can serve as an assistant to its owner in complex environments such as hospitals, shopping malls or schools, following the owner's movement, which brings great convenience to people's lives. A following robot should have autonomous perception, recognition, decision-making and motion capabilities, be able to identify a specific target, and follow that target in complex scenes through a corresponding control system.
Current research on following robot systems is typically based on either a visual sensor alone or a combination of multiple sensors. The former usually uses a stereo camera to acquire visual images, which requires cumbersome calibration steps and adapts poorly to strong outdoor illumination; the latter raises system cost because of the additional sensors and also introduces a complicated data fusion process. To guarantee robust tracking in dynamic, unknown environments, complex hand-designed features are usually required, which considerably increases labor cost, time cost and computing resources. In addition, traditional following robot systems usually split the whole system into two parts, a target tracking module and a robot motion control module; in such a pipeline design, errors arising in an earlier module are passed on to subsequent modules and gradually amplified, causing error accumulation that ultimately degrades system performance.
In summary, current traditional following robot systems suffer from excessive hardware and design costs, cannot fully adapt to the variability and complexity of indoor and outdoor environments with simple hardware, and easily lose the target person, which reduces the robustness of the following system and has seriously limited the application of following robots in real life.
Summary of the invention
To address the deficiencies of current traditional following robot designs, the present invention provides a mobile robot visual following method based on deep reinforcement learning.
The present invention uses a monocular color camera as the only input sensor of the robot and introduces convolutional neural networks (Convolutional Neural Network, CNN) and deep reinforcement learning (Deep Reinforcement Learning, DRL) into the following robot system. This eliminates the complicated hand-crafted feature design of traditional following robot systems and allows the robot to learn a control strategy directly from field-of-view images, which greatly reduces the possibility of losing the target and adapts better to illumination changes, background object interference, target disappearance and pedestrian interference in complex environments. Meanwhile, the introduction of deep reinforcement learning enables the following robot to keep learning from experience while interacting with the environment and to continuously improve its own level of intelligence.
The present invention adopts the framework of "supervised pre-training on simulated images + model transfer + RL". First, a small amount of data is collected in the real environment and the data set is expanded automatically with computer programs and image processing techniques, so that a large simulated data set adapted to real scenes is obtained in a short time and used for supervised training of the direction control model of the following robot. Second, a CNN model for robot direction control is built and trained in a supervised manner on the automatically constructed simulated data set, serving as the pre-trained model. Then the knowledge of the pre-trained model is transferred into a DRL-based control model, and the robot performs the following task in the real environment; combined with the reinforcement learning (Reinforcement Learning, RL) mechanism, the robot improves its direction control while following the target and interacting with the environment.
The specific technical solution is as follows:
A mobile robot visual following method based on deep reinforcement learning comprises the following steps:
Step 1: automated construction of the data set.
To reduce the cost of data collection and quickly obtain large-scale training data, the present invention designs an automated data set construction method using computer programs and image processing techniques. A small amount of data is first collected in a simple experimental scene, and the image mask technique is then used to expand this small amount of experimental data on a large scale, so that data adapted to a wide range of complex indoor and outdoor scenes can be obtained in a short time, greatly reducing the cost of manually collecting and labeling data.
(1) Prepare a simple scene in which the followed target is easy to distinguish from the background. In this simple scene, acquire field-of-view images of the target person at different locations in the robot's field of view from the perspective of the following robot.
(2) Prepare complex scene images representing the application scenarios of the following robot, such as indoor and outdoor scenes and street views. Since the followed target person is easy to separate from the background in the simple scene, the image mask technique can be used to extract the target person from the simple-scene background and superimpose the target person onto a complex scene, yielding an image of the target person in the complex scene; the synthesized complex scene image is directly assigned the action-space label of the corresponding simple scene.
The image mask technique first multiplies a two-dimensional matrix designed for the region of interest of the image (i.e., the mask) element-wise with the image to be processed; the result is the region of interest to be extracted.
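As a minimal illustration of the mask operation described above (an illustrative sketch, not code from the patent), the region of interest can be extracted by element-wise multiplication of a binary mask with the image:

```python
import numpy as np

# Hypothetical 640x480 BGR image and a binary mask marking the region of interest.
image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:300, 200:400] = 1          # pixels of interest set to 1, background stays 0

roi = image * mask[:, :, None]      # element-wise multiply: background pixels become 0
```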
Step 2: construction and training of the CNN-based direction control model.
The CNN-based direction control model is responsible for predicting, from the robot's field-of-view image, the direction of the action to be taken next. Supervised training of the model on the large-scale simulated data set of automated construction gives it a high level of direction control. The knowledge learned by this model is then transferred, by means of model transfer, into the DRL-based direction control model as the latter's prior knowledge of the direction control strategy.
For images acquired by the robot's monocular color camera, the RGB channels are first converted to HSV channels before the images are fed to the CNN as input. The CNN model is then trained in a supervised manner on the data set of automated construction in Step 1, so that the CNN can output the corresponding action state from the robot's field-of-view input image.
Step 3: model transfer.
The present invention uses a model transfer method in which the strategy learned by the CNN direction control model serves as prior knowledge for the DRL-based direction control model. Although the outputs of the CNN model and the DRL model have different meanings (the CNN model outputs the probability of each direction of action, whereas the DRL model usually outputs a value estimate for each direction of action), they have the same output dimension. In general, a direction of action with a larger output probability in the CNN model also has a higher value estimate.
The CNN parameter weights trained in Step 2 are transferred to the DRL model as its initial parameters, so that the DRL model starts with the same control level as the CNN model.
Step 4: construction and training of the DRL-based direction control model.
Building on the prior knowledge obtained from the CNN model, the DRL model further improves its performance through the RL mechanism. The introduction of RL allows the robot to collect experience while interacting with the environment and to keep improving its own knowledge, so that the robot reaches a higher level of following direction control than the CNN model.
The DRL model initialized by the parameter transfer of Step 3 is deployed on the robot side and, by continuously interacting with the environment, the robot keeps updating the model, learning and adapting to its current environment.
Further, in Step 2: images acquired by the robot's monocular color camera are 640 × 480 in size. Before they are fed to the neural network, the RGB channels are first converted to HSV channels and the 640 × 480 image is resized to 60 × 80. The images acquired at 4 adjacent time steps are merged into the network input, so the final input layer contains 4 × 3 = 12 channels, each of size 60 × 80.
Further, in Step 2: the CNN structure consists of 8 layers, including 3 convolutional layers, 2 pooling layers, 2 fully connected layers and an output layer. The convolutional layers extract features from the input image; the pooling layers reduce the dimensionality of the extracted features so as to lower the computation required for forward propagation. From front to back, the convolution kernel sizes of the three convolutional layers are 8 × 8, 4 × 4 and 2 × 2; both pooling layers use max pooling of size 2 × 2. After the three convolutions, the features are fed into two fully connected layers with 384 nodes each, followed by the output layer, which produces a multi-dimensional output; each dimension corresponds to one direction of action, covering three directions in total: forward, left and right. A ReLU activation function is applied after each of the three convolutional layers and the two fully connected layers to apply a non-linearity to the layer input. The CNN parameters are updated with the cross-entropy loss function, specifically:
L(y′, f(x)) = −Σᵢ y′ᵢ · log fᵢ(x)
where y′ is the label of the sample, a three-dimensional one-hot vector whose dimension equal to 1 indicates the correct action, and f(x) is the CNN model's predicted probability for each action dimension.
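A minimal PyTorch sketch of this architecture and its supervised training step is given below. The kernel sizes (8 × 8, 4 × 4, 2 × 2), the 2 × 2 max pooling, the two 384-node fully connected layers, the three-action output and the cross-entropy loss follow the description above; the number of filters per convolutional layer, the placement of the pooling layers, the optimizer and the learning rate are assumptions made for illustration. Note that the patent describes an explicit Softmax layer at the output, whereas this sketch follows the common PyTorch idiom of folding the softmax into nn.CrossEntropyLoss, so the network itself returns raw logits.

```python
import torch
import torch.nn as nn

class DirectionCNN(nn.Module):
    """Sketch of the 8-layer direction-control CNN: 3 conv layers (kernels 8x8, 4x4,
    2x2), 2 max-pooling layers (2x2), 2 fully connected layers of 384 nodes and a
    3-action output (forward / left / right). Filter counts and pooling placement
    are assumptions; the patent does not specify them."""

    def __init__(self, in_channels=12, num_actions=3, height=60, width=80):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=4), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=2), nn.ReLU(),
        )
        with torch.no_grad():  # infer the flattened feature size from a dummy pass
            n_flat = self.features(torch.zeros(1, in_channels, height, width)).numel()
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_flat, 384), nn.ReLU(),
            nn.Linear(384, 384), nn.ReLU(),
            nn.Linear(384, num_actions),   # raw logits for forward / left / right
        )

    def forward(self, x):                  # x: (batch, 12, 60, 80)
        return self.classifier(self.features(x))

# Supervised pre-training on the automatically constructed data set.
model = DirectionCNN()
criterion = nn.CrossEntropyLoss()          # softmax + negative log-likelihood
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def supervised_step(images, labels):
    """images: (B, 12, 60, 80) float tensor; labels: (B,) action indices (0/1/2)."""
    loss = criterion(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```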
Further, the DRL model in Step 3 is specifically a DQN model, and the transfer process is as follows: remove the Softmax layer of the trained CNN network and assign the weight parameters of the preceding layers directly to the DQN model.
Further, in Step 4: the DQN approximates the value function with a neural network, i.e., the input of the neural network is the current state value s and the output is the predicted value Qθ(s, a). At each time step the environment provides a state value s; the agent obtains from the value function network the values Qθ(s, a) of this s and all actions, then selects an action with the ε-greedy algorithm and makes a decision; after receiving this action a, the environment returns a reward value r and the next state s′. This constitutes one step. The parameters of the value function network are updated according to r. The DQN uses the mean squared error objective:
L(θ) = E[(r + γ · max_{a′} Qθ(s′, a′) − Qθ(s, a))²]
where s′ and a′ are the state and action at the next time step, γ is a hyperparameter (the discount factor) and θ denotes the model parameters. During training, the parameters are updated by gradient descent on this objective:
θ ← θ + α · (r + γ · max_{a′} Qθ(s′, a′) − Qθ(s, a)) · ∇θ Qθ(s, a)
where α is the learning rate. When the deep reinforcement learning algorithm is finally applied to the physical robot, the real-time field-of-view image collected by the robot's monocular color camera serves as the state input of the DRL algorithm; the action space output by the algorithm is the set of direction control signals, and by executing the direction control commands the robot follows the target person in real time.
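A minimal sketch of the ε-greedy action selection and the temporal-difference update described above is shown below (Python/PyTorch, continuing the DirectionCNN sketch). The discount factor and exploration rate are assumed values; the replay buffer and target network commonly used with DQN are not specified in the text and are omitted here.

```python
import random
import torch
import torch.nn.functional as F

GAMMA = 0.99       # discount factor gamma (value assumed for illustration)
EPSILON = 0.1      # exploration rate for epsilon-greedy (value assumed)

def select_action(q_net, state, num_actions=3, epsilon=EPSILON):
    """Epsilon-greedy selection over the predicted values Q_theta(s, a)."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())

def dqn_step(q_net, optimizer, s, a, r, s_next, done):
    """One gradient step on (r + gamma * max_a' Q(s', a') - Q(s, a))^2."""
    q_sa = q_net(s.unsqueeze(0))[0, a]
    with torch.no_grad():
        target = r if done else r + GAMMA * q_net(s_next.unsqueeze(0)).max().item()
    loss = F.mse_loss(q_sa, torch.as_tensor(target, dtype=q_sa.dtype))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```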
Aimed at the problems of following robot systems in practical applications, the method of the present invention proposes an intelligent following robot system based on a deep reinforcement learning algorithm. The end-to-end design merges the target tracking module and the direction control module of traditional following robot systems, preventing error propagation and accumulation between modules and allowing the robot to directly learn the mapping from targets to behavior strategies. Compared with traditional following robot systems, the system is not only more robust but also substantially reduces hardware and labor cost, which can facilitate the popularization and use of following robots in real life.
Brief description of the drawings
Fig. 1 is the flow chart of the invention.
Fig. 2 is the schematic diagram of the automated data set construction process of the invention.
Fig. 3 shows example images synthesized by the automated data set construction of the invention.
The subfigures are as follows:
(a)(b)(c)(d) are example images of the target person at different locations collected by the robot in the simple scene;
(e)(f)(g)(h) are example complex scene images collected from the Internet;
(i)(j)(k)(l) are example images of the synthesized data set after image mask processing;
(a)(e)(i), (b)(f)(j), (c)(g)(k) and (d)(h)(l) respectively show the complete image mask synthesis process and result for the target person at different locations in the simple image.
Fig. 4 shows the correspondence between input images and the action space of the invention.
Fig. 5 is the architecture diagram of the following robot system of the invention.
Specific embodiment
The software environment of this embodiment is the Ubuntu 14.04 system; the mobile robot is a TurtleBot2, and the robot's input sensor is a monocular color camera with 640 × 480 resolution.
Step 1: automated construction process of the data set
For the supervised direction control model of the following robot in the present invention, the input is the camera field-of-view image of the following robot and the output is the action the robot should take at the current time. The construction of the entire data set therefore consists of two parts: acquisition of the input field-of-view images and labeling of the output actions.
Prepare a simple scene in which the followed target is easy to distinguish from the background. In this simple scene, acquire field-of-view images of the target person at multiple locations in the robot's field of view from the perspective of the following robot. Download a certain number of more complex scene images from the Internet, mainly covering the common application scenes of the following robot, such as indoor and outdoor scenes and street views. Since the followed target person is easy to separate from the background in the simple scene, the image mask technique can be used to extract the target person from the background and superimpose the target person onto the complex scenes obtained from the Internet, yielding images of the target person in complex scenes; the synthesized complex scene images can be directly assigned the action-space labels of the corresponding simple scenes. The schematic diagram of the automated data set construction process is shown in Fig. 2. The simple scene images, the Internet complex scene images and the results after the automated construction process are shown in Fig. 3.
After the images containing the target person at different locations have been collected in the simple scene, the image mask is designed directly by setting a color threshold, since the color of the tracked target differs considerably from the background color of the simple scene. Applying this mask to the robot field-of-view image yields a binary image of the tracked target and the background, from which the contour of the target person is extracted; at this point the image value of the background is 0 everywhere and the image value of the tracked target person is 1. The target-person image region can then be superimposed onto a complex scene image. The action label is obtained from the mean horizontal position of the pixels with value 1 (the tracked target person) in the binary image after masking.
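A hedged sketch of this construction step is given below (Python with OpenCV/NumPy). The color-threshold mask, the compositing and the label derived from the mean horizontal position follow the description above; the HSV threshold values and the three-way split of the image width into left / forward / right regions are assumptions made for illustration.

```python
import cv2
import numpy as np

def synthesize_sample(simple_img, complex_img, lower_hsv, upper_hsv):
    """Build a color-threshold mask for the target person in the simple scene,
    paste the person onto a complex background, and derive the action label
    from the mean horizontal position of the masked pixels."""
    hsv = cv2.cvtColor(simple_img, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower_hsv, upper_hsv)          # 255 on the target, 0 elsewhere
    mask01 = (mask > 0).astype(np.uint8)

    # Resize the background to the simple-scene size and composite the person onto it.
    bg = cv2.resize(complex_img, (simple_img.shape[1], simple_img.shape[0]))
    composite = bg * (1 - mask01)[:, :, None] + simple_img * mask01[:, :, None]

    # Action label from the mean horizontal position of the target pixels
    # (three-way split of the image width is an assumption for illustration).
    xs = np.where(mask01 > 0)[1]
    width = simple_img.shape[1]
    mean_x = xs.mean() if xs.size else width / 2
    if mean_x < width / 3:
        label = "left"
    elif mean_x > 2 * width / 3:
        label = "right"
    else:
        label = "forward"
    return composite.astype(np.uint8), label
```

Running this function over each simple-scene frame and many downloaded backgrounds expands a small collection of labeled frames into a large synthetic data set, as described in Step 1.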
Step 2: construction and training process of the CNN-based direction control model
Images acquired by the monocular color camera are 640 × 480 in size. Before they are fed to the neural network, the RGB channels are first converted to HSV channels and the 640 × 480 image is resized to 60 × 80 before being used as the input image for the neural network. In the present invention, the images acquired at 4 adjacent time steps are merged as the network input; since a single image is a three-channel HSV image, the final input layer contains 4 × 3 = 12 channels, each of size 60 × 80. The CNN model is then trained in a supervised manner on the automatically constructed data set, so that the CNN network can output the corresponding action state from the robot's field-of-view input image.
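A minimal sketch of this preprocessing and frame-stacking step is shown below (Python with OpenCV/NumPy). The 80-wide by 60-tall interpretation of the "60 × 80" resize, which preserves the 4:3 aspect ratio of the 640 × 480 input, and the normalization to [0, 1] are assumptions for illustration.

```python
from collections import deque
import cv2
import numpy as np

frame_buffer = deque(maxlen=4)   # HSV images of the 4 most recent time steps

def preprocess(bgr_frame):
    """Convert a 640x480 camera frame to HSV and resize it; (80, 60) in cv2.resize
    means 80 pixels wide by 60 pixels tall, matching the 4:3 aspect of the input."""
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    return cv2.resize(hsv, (80, 60))                       # shape (60, 80, 3)

def build_network_input(bgr_frame):
    """Stack the 4 most recent HSV frames into a 12-channel tensor of shape (12, 60, 80)."""
    frame_buffer.append(preprocess(bgr_frame))
    while len(frame_buffer) < 4:                           # at start-up, pad with copies
        frame_buffer.append(frame_buffer[-1])
    stacked = np.concatenate(list(frame_buffer), axis=2)   # (60, 80, 12)
    return np.transpose(stacked, (2, 0, 1)).astype(np.float32) / 255.0
```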
Step 3: model transfer process
The DQN model finally used in the present invention has a structure similar to the CNN direction control network of the present invention described above, but the last Softmax layer is removed, so that the network directly outputs a value prediction for each state-action pair instead of a probability distribution over the actions. The model transfer strategy used in the present invention is therefore: remove the Softmax layer of the trained CNN network and assign the weight parameters of the preceding layers directly to the DQN model, thereby achieving the transfer of prior knowledge.
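Continuing the earlier sketches (and under the same assumptions about the network), the transfer can be expressed as copying the pre-trained weights into a structurally identical value network; because the DirectionCNN sketch keeps the softmax inside the training loss rather than as an explicit layer, its final linear layer can be reused directly as the Q-value head.

```python
import copy

def transfer_cnn_to_dqn(trained_cnn):
    """Initialize the DQN with the pre-trained CNN weights (minus any softmax layer)."""
    dqn = copy.deepcopy(trained_cnn)   # same layers, initialized with the CNN weights
    return dqn

# Equivalent state_dict form when the two networks are separate modules with matching
# layer names (strict=False tolerates a dropped softmax/output head):
# dqn.load_state_dict(trained_cnn.state_dict(), strict=False)
```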
Step 4: training process of the DRL-based direction control model
After the model transfer is completed, the DRL model can be deployed on the robot side and, by continuously interacting with the environment, the robot keeps updating the model, learns the environment it is currently in, and improves the robustness of following. During this process, the algorithm outputs a discrete action space to control the robot; in the present invention the action space of the following robot is a set containing the commands left, right and forward, and the correspondence between the action space and input images is shown in Fig. 4.
Data in RL carry no separate labels; only the reward signal fed back by the environment implies how good or bad an action is, so the design of the reward function is the most important link in the successful application of RL. The reward function for the direction control of the following robot in the present invention is designed as follows: the user connects remotely to the local side of the following robot and observes the robot's field-of-view image. Initially STOP is 0, meaning that following has not terminated. When the user finds that their own position in the robot's field-of-view image has deviated from the center, they send a stop message through a handheld device; when the robot side receives this message it knows that following has failed, sets STOP to 1 and stops the robot. On the one hand such a design is convenient for the user; on the other hand it also provides a more accurate reward signal, thereby accelerating the convergence of the model. The reward function then assigns a value C whenever STOP is 1, where C is negative.
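A hedged sketch of this reward design is shown below. The text only fixes the negative penalty C given when the user signals failure (STOP = 1); the concrete value of C and the small non-negative per-step reward while following continues are assumptions added for illustration.

```python
C = -10.0   # negative constant given when following fails (value assumed)

def reward(stop_flag, step_reward=0.1):
    """Return C when the user has signalled failure (STOP = 1); otherwise return a small
    per-step reward. step_reward is an illustrative assumption, not from the patent."""
    return C if stop_flag == 1 else step_reward
```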
Verification experiments on the TurtleBot2 robot show that this method can accurately follow a specific target person and has high robustness.
Claims (5)
1. A mobile robot visual following method based on deep reinforcement learning, characterized by comprising the following steps:
Step 1: automated construction of the data set;
(1) prepare a simple scene in which the followed target is easy to distinguish from the background; in the simple scene, acquire field-of-view images of the target person at different locations in the robot's field of view from the perspective of the following robot;
(2) prepare application scene images of the following robot as complex scene images, extract the target person from the background of the simple scene using the image mask technique, and superimpose the target person onto the complex scenes to obtain images of the target person in complex scenes, directly assigning the synthesized complex scene images the action-space labels of the corresponding simple scenes;
Step 2: construction and training of the CNN-based direction control model;
train the CNN model in a supervised manner on the data set of automated construction in Step 1, so that the CNN achieves the effect of outputting the corresponding action state from the robot's field-of-view input image; for images acquired by the robot's monocular color camera, first convert the RGB channels to HSV channels before feeding the images to the CNN as input, after which the network can output the corresponding action state;
Step 3: model transfer;
transfer the CNN parameter weights trained in Step 2 to the DRL model as initial parameters, so that the DRL model obtains the same control level as the CNN model;
Step 4: construction and training of the DRL-based direction control model;
deploy the DRL model after the initial parameter transfer of Step 3 on the robot side and, by continuously interacting with the environment, enable the robot to keep updating the model and learning the environment it is currently in.
2. The mobile robot visual following method based on deep reinforcement learning according to claim 1, characterized in that in Step 2: images acquired by the robot's monocular color camera are 640 × 480 in size; before they are fed to the neural network, the RGB channels are first converted to HSV channels and the 640 × 480 image is resized to 60 × 80; the images acquired at 4 adjacent time steps are merged as the network input, so that the final input layer contains 4 × 3 = 12 channels, each of size 60 × 80.
3. The mobile robot visual following method based on deep reinforcement learning according to claim 1, characterized in that in Step 2: the CNN structure consists of 8 layers, including 3 convolutional layers, 2 pooling layers, 2 fully connected layers and an output layer; from front to back, the convolution kernel sizes of the three convolutional layers are 8 × 8, 4 × 4 and 2 × 2; both pooling layers use max pooling of size 2 × 2; after the third convolution, the features are fed into two fully connected layers with 384 nodes each, followed by the output layer, which produces a multi-dimensional output in which each dimension corresponds to one direction of action, covering three directions in total: forward, left and right; a ReLU activation function is applied after each of the three convolutional layers and the two fully connected layers to apply a non-linearity to the layer input; the CNN parameters are updated with the cross-entropy loss function, specifically expressed as:
L(y′, f(x)) = −Σᵢ y′ᵢ · log fᵢ(x)
where y′ is the label of the sample, a three-dimensional one-hot vector whose dimension equal to 1 indicates the correct action, and f(x) is the CNN model's predicted probability for each action dimension.
4. The mobile robot visual following method based on deep reinforcement learning according to claim 1, characterized in that the DRL model in Step 3 is specifically a DQN model, and the transfer process is as follows: remove the Softmax layer of the trained CNN network and assign the weight parameters of the preceding layers directly to the DQN model.
5. The mobile robot visual following method based on deep reinforcement learning according to claim 4, characterized in that in Step 4: the DQN approximates the value function with a neural network, i.e., the input of the neural network is the current state value s and the output is the predicted value Qθ(s, a); at each time step the environment provides a state value s, the agent obtains from the value function network the values Qθ(s, a) of this s and all actions, then selects an action with the ε-greedy algorithm and makes a decision; after receiving this action a, the environment returns a reward value r and the next state s′; this constitutes one step; the parameters of the value function network are updated according to r; the DQN uses the mean squared error objective:
L(θ) = E[(r + γ · max_{a′} Qθ(s′, a′) − Qθ(s, a))²]
where s′ and a′ are the state and action at the next time step, γ is a hyperparameter and θ denotes the model parameters;
during training the parameters are updated as:
θ ← θ + α · (r + γ · max_{a′} Qθ(s′, a′) − Qθ(s, a)) · ∇θ Qθ(s, a)
where α is the learning rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910361528.4A CN110084307B (en) | 2019-04-30 | 2019-04-30 | Mobile robot vision following method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910361528.4A CN110084307B (en) | 2019-04-30 | 2019-04-30 | Mobile robot vision following method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110084307A true CN110084307A (en) | 2019-08-02 |
CN110084307B CN110084307B (en) | 2021-06-18 |
Family
ID=67418184
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910361528.4A Expired - Fee Related CN110084307B (en) | 2019-04-30 | 2019-04-30 | Mobile robot vision following method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110084307B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110728368A (en) * | 2019-10-25 | 2020-01-24 | 中国人民解放军国防科技大学 | Acceleration method for deep reinforcement learning of simulation robot |
CN111523495A (en) * | 2020-04-27 | 2020-08-11 | 天津中科智能识别产业技术研究院有限公司 | End-to-end active human body tracking method in monitoring scene based on deep reinforcement learning |
CN111539979A (en) * | 2020-04-27 | 2020-08-14 | 天津大学 | Human body front tracking method based on deep reinforcement learning |
CN111578940A (en) * | 2020-04-24 | 2020-08-25 | 哈尔滨工业大学 | Indoor monocular navigation method and system based on cross-sensor transfer learning |
CN112297012A (en) * | 2020-10-30 | 2021-02-02 | 上海交通大学 | Robot reinforcement learning method based on self-adaptive model |
CN112702423A (en) * | 2020-12-23 | 2021-04-23 | 杭州比脉科技有限公司 | Robot learning system based on Internet of things interactive entertainment mode |
CN112731804A (en) * | 2019-10-29 | 2021-04-30 | 北京京东乾石科技有限公司 | Method and device for realizing path following |
CN112799401A (en) * | 2020-12-28 | 2021-05-14 | 华南理工大学 | End-to-end robot vision-motion navigation method |
CN113011526A (en) * | 2021-04-23 | 2021-06-22 | 华南理工大学 | Robot skill learning method and system based on reinforcement learning and unsupervised learning |
CN113031441A (en) * | 2021-03-03 | 2021-06-25 | 北京航空航天大学 | Rotary mechanical diagnosis network automatic search method based on reinforcement learning |
CN113158778A (en) * | 2021-03-09 | 2021-07-23 | 中国电子科技集团公司第五十四研究所 | SAR image target detection method |
CN113156959A (en) * | 2021-04-27 | 2021-07-23 | 东莞理工学院 | Self-supervision learning and navigation method of autonomous mobile robot in complex scene |
CN113485326A (en) * | 2021-06-28 | 2021-10-08 | 南京深一科技有限公司 | Autonomous mobile robot based on visual navigation |
TWI751511B (en) * | 2019-09-05 | 2022-01-01 | 日商三菱電機股份有限公司 | Inference device, machine control system and learning device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2017101165A4 (en) * | 2017-08-25 | 2017-11-02 | Liu, Yichen MR | Method of Structural Improvement of Game Training Deep Q-Network |
US20180018562A1 (en) * | 2016-07-14 | 2018-01-18 | Cside Japan Inc. | Platform for providing task based on deep learning |
CN108549928A (en) * | 2018-03-19 | 2018-09-18 | 清华大学 | Visual tracking method and device based on continuous moving under deeply learning guide |
CN108932735A (en) * | 2018-07-10 | 2018-12-04 | 广州众聚智能科技有限公司 | A method of generating deep learning sample |
CN109242882A (en) * | 2018-08-06 | 2019-01-18 | 北京市商汤科技开发有限公司 | Visual tracking method, device, medium and equipment |
CN109341689A (en) * | 2018-09-12 | 2019-02-15 | 北京工业大学 | Vision navigation method of mobile robot based on deep learning |
- 2019
- 2019-04-30 CN CN201910361528.4A patent/CN110084307B/en not_active Expired - Fee Related
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180018562A1 (en) * | 2016-07-14 | 2018-01-18 | Cside Japan Inc. | Platform for providing task based on deep learning |
AU2017101165A4 (en) * | 2017-08-25 | 2017-11-02 | Liu, Yichen MR | Method of Structural Improvement of Game Training Deep Q-Network |
CN108549928A (en) * | 2018-03-19 | 2018-09-18 | 清华大学 | Visual tracking method and device based on continuous moving under deeply learning guide |
CN108932735A (en) * | 2018-07-10 | 2018-12-04 | 广州众聚智能科技有限公司 | A method of generating deep learning sample |
CN109242882A (en) * | 2018-08-06 | 2019-01-18 | 北京市商汤科技开发有限公司 | Visual tracking method, device, medium and equipment |
CN109341689A (en) * | 2018-09-12 | 2019-02-15 | 北京工业大学 | Vision navigation method of mobile robot based on deep learning |
Non-Patent Citations (5)
Title |
---|
OFIR NACHUM ET AL: "Bridging the Gap Between Value and Policy Based Reinforcement Learning", 《ARXIV:1702.08892V1》 * |
VOLODYMYR MNIH ET AL: "Human-level control through deep reinforcement learning", 《NATURE》 * |
李昌: "Research on robust target tracking methods based on multimodal video", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
杨杰 et al.: "Medical Image Analysis, 3D Reconstruction and Their Applications", 31 January 2015 * |
阳岳生: "Research on machine-learning-based visual target tracking algorithms", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI751511B (en) * | 2019-09-05 | 2022-01-01 | 日商三菱電機股份有限公司 | Inference device, machine control system and learning device |
CN110728368A (en) * | 2019-10-25 | 2020-01-24 | 中国人民解放军国防科技大学 | Acceleration method for deep reinforcement learning of simulation robot |
CN110728368B (en) * | 2019-10-25 | 2022-03-15 | 中国人民解放军国防科技大学 | Acceleration method for deep reinforcement learning of simulation robot |
CN112731804A (en) * | 2019-10-29 | 2021-04-30 | 北京京东乾石科技有限公司 | Method and device for realizing path following |
CN111578940A (en) * | 2020-04-24 | 2020-08-25 | 哈尔滨工业大学 | Indoor monocular navigation method and system based on cross-sensor transfer learning |
CN111578940B (en) * | 2020-04-24 | 2021-05-11 | 哈尔滨工业大学 | Indoor monocular navigation method and system based on cross-sensor transfer learning |
CN111523495A (en) * | 2020-04-27 | 2020-08-11 | 天津中科智能识别产业技术研究院有限公司 | End-to-end active human body tracking method in monitoring scene based on deep reinforcement learning |
CN111539979A (en) * | 2020-04-27 | 2020-08-14 | 天津大学 | Human body front tracking method based on deep reinforcement learning |
CN111523495B (en) * | 2020-04-27 | 2023-09-01 | 天津中科智能识别产业技术研究院有限公司 | End-to-end active human body tracking method in monitoring scene based on deep reinforcement learning |
CN111539979B (en) * | 2020-04-27 | 2022-12-27 | 天津大学 | Human body front tracking method based on deep reinforcement learning |
CN112297012A (en) * | 2020-10-30 | 2021-02-02 | 上海交通大学 | Robot reinforcement learning method based on self-adaptive model |
CN112297012B (en) * | 2020-10-30 | 2022-05-31 | 上海交通大学 | Robot reinforcement learning method based on self-adaptive model |
CN112702423A (en) * | 2020-12-23 | 2021-04-23 | 杭州比脉科技有限公司 | Robot learning system based on Internet of things interactive entertainment mode |
CN112702423B (en) * | 2020-12-23 | 2022-05-03 | 杭州比脉科技有限公司 | Robot learning system based on Internet of things interactive entertainment mode |
CN112799401A (en) * | 2020-12-28 | 2021-05-14 | 华南理工大学 | End-to-end robot vision-motion navigation method |
CN113031441B (en) * | 2021-03-03 | 2022-04-08 | 北京航空航天大学 | Rotary mechanical diagnosis network automatic search method based on reinforcement learning |
CN113031441A (en) * | 2021-03-03 | 2021-06-25 | 北京航空航天大学 | Rotary mechanical diagnosis network automatic search method based on reinforcement learning |
CN113158778A (en) * | 2021-03-09 | 2021-07-23 | 中国电子科技集团公司第五十四研究所 | SAR image target detection method |
CN113011526A (en) * | 2021-04-23 | 2021-06-22 | 华南理工大学 | Robot skill learning method and system based on reinforcement learning and unsupervised learning |
CN113011526B (en) * | 2021-04-23 | 2024-04-26 | 华南理工大学 | Robot skill learning method and system based on reinforcement learning and unsupervised learning |
CN113156959A (en) * | 2021-04-27 | 2021-07-23 | 东莞理工学院 | Self-supervision learning and navigation method of autonomous mobile robot in complex scene |
CN113156959B (en) * | 2021-04-27 | 2024-06-04 | 东莞理工学院 | Self-supervision learning and navigation method for autonomous mobile robot in complex scene |
CN113485326A (en) * | 2021-06-28 | 2021-10-08 | 南京深一科技有限公司 | Autonomous mobile robot based on visual navigation |
Also Published As
Publication number | Publication date |
---|---|
CN110084307B (en) | 2021-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110084307A (en) | Mobile robot visual following method based on deep reinforcement learning | |
Ruan et al. | Mobile robot navigation based on deep reinforcement learning | |
CN106648103B (en) | A kind of the gesture tracking method and VR helmet of VR helmet | |
CN109800864B (en) | Robot active learning method based on image input | |
CN106966298B (en) | Assembled architecture intelligence hanging method based on machine vision and system | |
CN104589356B (en) | The Dextrous Hand remote operating control method caught based on Kinect human hand movement | |
CN113110482B (en) | Indoor environment robot exploration method and system based on priori information heuristic method | |
CN110135249A (en) | Human bodys' response method based on time attention mechanism and LSTM | |
CN109045676B (en) | Chinese chess recognition learning algorithm and robot intelligent system and method based on algorithm | |
CN108198221A (en) | A kind of automatic stage light tracking system and method based on limb action | |
CN101127078A (en) | Unmanned machine vision image matching method based on ant colony intelligence | |
CN111176309B (en) | Multi-unmanned aerial vehicle self-group mutual inductance understanding method based on spherical imaging | |
CN101154289A (en) | Method for tracing three-dimensional human body movement based on multi-camera | |
CN107818333A (en) | Robot obstacle-avoiding action learning and Target Searching Method based on depth belief network | |
CN112651262A (en) | Cross-modal pedestrian re-identification method based on self-adaptive pedestrian alignment | |
CN115147488B (en) | Workpiece pose estimation method and grabbing system based on dense prediction | |
CN113255514B (en) | Behavior identification method based on local scene perception graph convolutional network | |
CN113370217A (en) | Method for recognizing and grabbing object posture based on deep learning for intelligent robot | |
CN109508686A (en) | A kind of Human bodys' response method based on the study of stratification proper subspace | |
CN110059597A (en) | Scene recognition method based on depth camera | |
CN114036969A (en) | 3D human body action recognition algorithm under multi-view condition | |
Cheng et al. | A grasp pose detection scheme with an end-to-end CNN regression approach | |
Chen et al. | Improving registration of augmented reality by incorporating DCNNS into visual SLAM | |
CN114998573A (en) | Grabbing pose detection method based on RGB-D feature depth fusion | |
Liu et al. | Robotic picking in dense clutter via domain invariant learning from synthetic dense cluttered rendering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | | Granted publication date: 20210618 |
CF01 | Termination of patent right due to non-payment of annual fee |