CN110084307A - Mobile robot visual following method based on deep reinforcement learning - Google Patents
Mobile robot visual following method based on deep reinforcement learning
- Publication number
- CN110084307A (application CN201910361528.4A)
- Authority
- CN
- China
- Prior art keywords
- model
- robot
- cnn
- training
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 31
- 230000000007 visual effect Effects 0.000 title claims abstract description 16
- 238000012549 training Methods 0.000 claims abstract description 22
- 238000010276 construction Methods 0.000 claims abstract description 15
- 230000005012 migration Effects 0.000 claims abstract description 10
- 238000013508 migration Methods 0.000 claims abstract description 10
- 230000003993 interaction Effects 0.000 claims abstract description 5
- 238000013527 convolutional neural network Methods 0.000 claims description 38
- 230000006870 function Effects 0.000 claims description 13
- 230000008569 process Effects 0.000 claims description 12
- 238000013528 artificial neural network Methods 0.000 claims description 8
- 230000009471 action Effects 0.000 claims description 7
- 238000004422 calculation algorithm Methods 0.000 claims description 7
- 230000000694 effects Effects 0.000 claims description 7
- 238000005516 engineering process Methods 0.000 claims description 7
- 230000015572 biosynthetic process Effects 0.000 claims description 4
- 238000003786 synthesis reaction Methods 0.000 claims description 4
- 230000007704 transition Effects 0.000 claims description 3
- 230000004913 activation Effects 0.000 claims description 2
- 230000000875 corresponding effect Effects 0.000 claims 3
- 230000001276 controlling effect Effects 0.000 claims 2
- 230000007613 environmental effect Effects 0.000 abstract description 4
- 238000004590 computer program Methods 0.000 abstract description 3
- 230000007246 mechanism Effects 0.000 abstract description 3
- 238000012545 processing Methods 0.000 abstract description 3
- 238000013461 design Methods 0.000 description 11
- 238000010586 diagram Methods 0.000 description 3
- 238000009825 accumulation Methods 0.000 description 2
- 238000013480 data collection Methods 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 238000005286 illumination Methods 0.000 description 2
- 230000002787 reinforcement Effects 0.000 description 2
- 238000000926 separation method Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 230000003542 behavioural effect Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000011217 control strategy Methods 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000007499 fusion processing Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 230000007659 motor function Effects 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 230000000644 propagated effect Effects 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000032258 transport Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/12—Target-seeking control
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Automation & Control Theory (AREA)
- Image Analysis (AREA)
- Manipulator (AREA)
Abstract
The invention proposes a mobile robot visual following method based on deep reinforcement learning. It adopts a framework of "supervised pre-training on simulated images + model transfer + RL". First, a small amount of data is collected in the real environment and the data set is expanded automatically with computer programs and image processing techniques, so that a large simulated data set adapted to real scenes is obtained in a short time and used for supervised training of the direction control model of the following robot. Second, a CNN model for robot direction control is built and trained in a supervised manner on the automatically constructed simulated data set, serving as the pre-trained model. The knowledge of the pre-trained model is then transferred into a DRL-based control model, and the robot performs the following task in the real environment; combined with the reinforcement learning mechanism, the robot improves its direction control while following the target and interacting with the environment. The method is not only highly robust but also substantially reduces cost.
Description
Technical field
The invention belongs to the field of intelligent robot technology and relates to a mobile robot visual following method based on deep reinforcement learning.
Background art
With the advance of technology and the development of society, more and more intelligent robots appear in people's lives. The following robot is one of the novel systems that has received significant attention in recent years; it can serve as an assistant to its owner in complex environments such as hospitals, shopping malls or schools, following the owner's movement, which brings great convenience to people's lives. A following robot should have autonomous perception, recognition, decision-making and motion capabilities, be able to identify a specific target, and follow that target in complex scenes through a corresponding control system.
Current research on following robot systems is typically based on either a visual sensor alone or a combination of multiple sensors. The former usually uses a stereo camera to acquire visual images, which requires cumbersome calibration steps and adapts poorly to strong outdoor illumination; the latter raises system cost because of the additional sensors and also introduces a complicated data fusion process. To guarantee robust tracking in dynamic, unknown environments, complex hand-designed features are usually required, which considerably increases labor cost, time cost and computing resources. In addition, traditional following robot systems usually split the whole system into two parts, a target tracking module and a robot motion control module; in such a pipeline design, errors arising in an earlier module are passed on to subsequent modules and gradually amplified, causing error accumulation that ultimately degrades system performance.
In summary, current traditional following robot systems suffer from excessive hardware and design costs, cannot fully adapt to the variability and complexity of indoor and outdoor environments with simple hardware, and easily lose the target person, which reduces the robustness of the following system and has seriously limited the application of following robots in real life.
Summary of the invention
To address the deficiencies of current traditional following robot designs, the present invention provides a mobile robot visual following method based on deep reinforcement learning.
The present invention uses a monocular color camera as the only input sensor of the robot and introduces convolutional neural networks (Convolutional Neural Network, CNN) and deep reinforcement learning (Deep Reinforcement Learning, DRL) into the following robot system. This eliminates the complicated hand-crafted feature design of traditional following robot systems and allows the robot to learn a control strategy directly from field-of-view images, which greatly reduces the possibility of losing the target and adapts better to illumination changes, background object interference, target disappearance and pedestrian interference in complex environments. Meanwhile, the introduction of deep reinforcement learning enables the following robot to keep learning from experience while interacting with the environment and to continuously improve its own level of intelligence.
The present invention adopts the framework of "supervised pre-training on simulated images + model transfer + RL". First, a small amount of data is collected in the real environment and the data set is expanded automatically with computer programs and image processing techniques, so that a large simulated data set adapted to real scenes is obtained in a short time and used for supervised training of the direction control model of the following robot. Second, a CNN model for robot direction control is built and trained in a supervised manner on the automatically constructed simulated data set, serving as the pre-trained model. Then the knowledge of the pre-trained model is transferred into a DRL-based control model, and the robot performs the following task in the real environment; combined with the reinforcement learning (Reinforcement Learning, RL) mechanism, the robot improves its direction control while following the target and interacting with the environment.
The specific technical solution is as follows:
A mobile robot visual following method based on deep reinforcement learning comprises the following steps:
Step 1: automated construction of the data set.
To reduce the cost of data collection and quickly obtain large-scale training data, the present invention designs an automated data set construction method using computer programs and image processing techniques. A small amount of data is first collected in a simple experimental scene, and the image mask technique is then used to expand this small amount of experimental data on a large scale, so that data adapted to a wide range of complex indoor and outdoor scenes can be obtained in a short time, greatly reducing the cost of manually collecting and labeling data.
(1) Prepare a simple scene in which the followed target is easy to distinguish from the background. In this simple scene, acquire field-of-view images of the target person at different locations in the robot's field of view from the perspective of the following robot.
(2) Prepare complex scene images representing the application scenarios of the following robot, such as indoor and outdoor scenes and street views. Since the followed target person is easy to separate from the background in the simple scene, the image mask technique can be used to extract the target person from the simple-scene background and superimpose the target person onto a complex scene, yielding an image of the target person in the complex scene; the synthesized complex scene image is directly assigned the action-space label of the corresponding simple scene.
The image mask technique first multiplies a two-dimensional matrix designed for the region of interest of the image (i.e., the mask) element-wise with the image to be processed; the result is the region of interest to be extracted.
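As a minimal illustration of the mask operation described above (an illustrative sketch, not code from the patent), the region of interest can be extracted by element-wise multiplication of a binary mask with the image:

```python
import numpy as np

# Hypothetical 640x480 BGR image and a binary mask marking the region of interest.
image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:300, 200:400] = 1          # pixels of interest set to 1, background stays 0

roi = image * mask[:, :, None]      # element-wise multiply: background pixels become 0
```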
Step 2: construction and training of the CNN-based direction control model.
The CNN-based direction control model is responsible for predicting, from the robot's field-of-view image, the direction of the action to be taken next. Supervised training of the model on the large-scale simulated data set of automated construction gives it a high level of direction control. The knowledge learned by this model is then transferred, by means of model transfer, into the DRL-based direction control model as the latter's prior knowledge of the direction control strategy.
For images acquired by the robot's monocular color camera, the RGB channels are first converted to HSV channels before the images are fed to the CNN as input. The CNN model is then trained in a supervised manner on the data set of automated construction in Step 1, so that the CNN can output the corresponding action state from the robot's field-of-view input image.
Step 3: model transfer.
The present invention uses a model transfer method in which the strategy learned by the CNN direction control model serves as prior knowledge for the DRL-based direction control model. Although the outputs of the CNN model and the DRL model have different meanings (the CNN model outputs the probability of each direction of action, whereas the DRL model usually outputs a value estimate for each direction of action), they have the same output dimension. In general, a direction of action with a larger output probability in the CNN model also has a higher value estimate.
The CNN parameter weights trained in Step 2 are transferred to the DRL model as its initial parameters, so that the DRL model starts with the same control level as the CNN model.
Step 4: construction and training of the DRL-based direction control model.
Building on the prior knowledge obtained from the CNN model, the DRL model further improves its performance through the RL mechanism. The introduction of RL allows the robot to collect experience while interacting with the environment and to keep improving its own knowledge, so that the robot reaches a higher level of following direction control than the CNN model.
The DRL model initialized by the parameter transfer of Step 3 is deployed on the robot side and, by continuously interacting with the environment, the robot keeps updating the model, learning and adapting to its current environment.
Further, in Step 2: images acquired by the robot's monocular color camera are 640 × 480 in size. Before they are fed to the neural network, the RGB channels are first converted to HSV channels and the 640 × 480 image is resized to 60 × 80. The images acquired at 4 adjacent time steps are merged into the network input, so the final input layer contains 4 × 3 = 12 channels, each of size 60 × 80.
Further, in Step 2: the CNN structure consists of 8 layers, including 3 convolutional layers, 2 pooling layers, 2 fully connected layers and an output layer. The convolutional layers extract features from the input image; the pooling layers reduce the dimensionality of the extracted features so as to lower the computation required for forward propagation. From front to back, the convolution kernel sizes of the three convolutional layers are 8 × 8, 4 × 4 and 2 × 2; both pooling layers use max pooling of size 2 × 2. After the three convolutions, the features are fed into two fully connected layers with 384 nodes each, followed by the output layer, which produces a multi-dimensional output; each dimension corresponds to one direction of action, covering three directions in total: forward, left and right. A ReLU activation function is applied after each of the three convolutional layers and the two fully connected layers to apply a non-linearity to the layer input. The CNN parameters are updated with the cross-entropy loss function, specifically:
L(y′, f(x)) = −Σᵢ y′ᵢ · log fᵢ(x)
where y′ is the label of the sample, a three-dimensional one-hot vector whose dimension equal to 1 indicates the correct action, and f(x) is the CNN model's predicted probability for each action dimension.
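A minimal PyTorch sketch of this architecture and its supervised training step is given below. The kernel sizes (8 × 8, 4 × 4, 2 × 2), the 2 × 2 max pooling, the two 384-node fully connected layers, the three-action output and the cross-entropy loss follow the description above; the number of filters per convolutional layer, the placement of the pooling layers, the optimizer and the learning rate are assumptions made for illustration. Note that the patent describes an explicit Softmax layer at the output, whereas this sketch follows the common PyTorch idiom of folding the softmax into nn.CrossEntropyLoss, so the network itself returns raw logits.

```python
import torch
import torch.nn as nn

class DirectionCNN(nn.Module):
    """Sketch of the 8-layer direction-control CNN: 3 conv layers (kernels 8x8, 4x4,
    2x2), 2 max-pooling layers (2x2), 2 fully connected layers of 384 nodes and a
    3-action output (forward / left / right). Filter counts and pooling placement
    are assumptions; the patent does not specify them."""

    def __init__(self, in_channels=12, num_actions=3, height=60, width=80):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=4), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=2), nn.ReLU(),
        )
        with torch.no_grad():  # infer the flattened feature size from a dummy pass
            n_flat = self.features(torch.zeros(1, in_channels, height, width)).numel()
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_flat, 384), nn.ReLU(),
            nn.Linear(384, 384), nn.ReLU(),
            nn.Linear(384, num_actions),   # raw logits for forward / left / right
        )

    def forward(self, x):                  # x: (batch, 12, 60, 80)
        return self.classifier(self.features(x))

# Supervised pre-training on the automatically constructed data set.
model = DirectionCNN()
criterion = nn.CrossEntropyLoss()          # softmax + negative log-likelihood
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def supervised_step(images, labels):
    """images: (B, 12, 60, 80) float tensor; labels: (B,) action indices (0/1/2)."""
    loss = criterion(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```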
Further, the DRL model in Step 3 is specifically a DQN model, and the transfer process is as follows: remove the Softmax layer of the trained CNN network and assign the weight parameters of the preceding layers directly to the DQN model.
Further, in Step 4: the DQN approximates the value function with a neural network, i.e., the input of the neural network is the current state value s and the output is the predicted value Qθ(s, a). At each time step the environment provides a state value s; the agent obtains from the value function network the values Qθ(s, a) of this s and all actions, then selects an action with the ε-greedy algorithm and makes a decision; after receiving this action a, the environment returns a reward value r and the next state s′. This constitutes one step. The parameters of the value function network are updated according to r. The DQN uses the mean squared error objective:
L(θ) = E[(r + γ · max_{a′} Qθ(s′, a′) − Qθ(s, a))²]
where s′ and a′ are the state and action at the next time step, γ is a hyperparameter (the discount factor) and θ denotes the model parameters. During training, the parameters are updated by gradient descent on this objective:
θ ← θ + α · (r + γ · max_{a′} Qθ(s′, a′) − Qθ(s, a)) · ∇θ Qθ(s, a)
where α is the learning rate. When the deep reinforcement learning algorithm is finally applied to the physical robot, the real-time field-of-view image collected by the robot's monocular color camera serves as the state input of the DRL algorithm; the action space output by the algorithm is the set of direction control signals, and by executing the direction control commands the robot follows the target person in real time.
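A minimal sketch of the ε-greedy action selection and the temporal-difference update described above is shown below (Python/PyTorch, continuing the DirectionCNN sketch). The discount factor and exploration rate are assumed values; the replay buffer and target network commonly used with DQN are not specified in the text and are omitted here.

```python
import random
import torch
import torch.nn.functional as F

GAMMA = 0.99       # discount factor gamma (value assumed for illustration)
EPSILON = 0.1      # exploration rate for epsilon-greedy (value assumed)

def select_action(q_net, state, num_actions=3, epsilon=EPSILON):
    """Epsilon-greedy selection over the predicted values Q_theta(s, a)."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())

def dqn_step(q_net, optimizer, s, a, r, s_next, done):
    """One gradient step on (r + gamma * max_a' Q(s', a') - Q(s, a))^2."""
    q_sa = q_net(s.unsqueeze(0))[0, a]
    with torch.no_grad():
        target = r if done else r + GAMMA * q_net(s_next.unsqueeze(0)).max().item()
    loss = F.mse_loss(q_sa, torch.as_tensor(target, dtype=q_sa.dtype))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```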
Aimed at the problems of following robot systems in practical applications, the method of the present invention proposes an intelligent following robot system based on a deep reinforcement learning algorithm. The end-to-end design merges the target tracking module and the direction control module of traditional following robot systems, preventing error propagation and accumulation between modules and allowing the robot to directly learn the mapping from targets to behavior strategies. Compared with traditional following robot systems, the system is not only more robust but also substantially reduces hardware and labor cost, which can facilitate the popularization and use of following robots in real life.
Brief description of the drawings
Fig. 1 is the flow chart of the invention.
Fig. 2 is the schematic diagram of the automated data set construction process of the invention.
Fig. 3 shows example images synthesized by the automated data set construction of the invention.
The subfigures are as follows:
(a)(b)(c)(d) are example images of the target person at different locations collected by the robot in the simple scene;
(e)(f)(g)(h) are example complex scene images collected from the Internet;
(i)(j)(k)(l) are example images of the synthesized data set after image mask processing;
(a)(e)(i), (b)(f)(j), (c)(g)(k) and (d)(h)(l) respectively show the complete image mask synthesis process and result for the target person at different locations in the simple image.
Fig. 4 shows the correspondence between input images and the action space of the invention.
Fig. 5 is the architecture diagram of the following robot system of the invention.
Specific embodiment
The software environment of this embodiment is the Ubuntu 14.04 system; the mobile robot is a TurtleBot2, and the robot's input sensor is a monocular color camera with 640 × 480 resolution.
Step 1: automated construction process of the data set
For the supervised direction control model of the following robot in the present invention, the input is the camera field-of-view image of the following robot and the output is the action the robot should take at the current time. The construction of the entire data set therefore consists of two parts: acquisition of the input field-of-view images and labeling of the output actions.
Prepare a simple scene in which the followed target is easy to distinguish from the background. In this simple scene, acquire field-of-view images of the target person at multiple locations in the robot's field of view from the perspective of the following robot. Download a certain number of more complex scene images from the Internet, mainly covering the common application scenes of the following robot, such as indoor and outdoor scenes and street views. Since the followed target person is easy to separate from the background in the simple scene, the image mask technique can be used to extract the target person from the background and superimpose the target person onto the complex scenes obtained from the Internet, yielding images of the target person in complex scenes; the synthesized complex scene images can be directly assigned the action-space labels of the corresponding simple scenes. The schematic diagram of the automated data set construction process is shown in Fig. 2. The simple scene images, the Internet complex scene images and the results after the automated construction process are shown in Fig. 3.
After the images containing the target person at different locations have been collected in the simple scene, the image mask is designed directly by setting a color threshold, since the color of the tracked target differs considerably from the background color of the simple scene. Applying this mask to the robot field-of-view image yields a binary image of the tracked target and the background, from which the contour of the target person is extracted; at this point the image value of the background is 0 everywhere and the image value of the tracked target person is 1. The target-person image region can then be superimposed onto a complex scene image. The action label is obtained from the mean horizontal position of the pixels with value 1 (the tracked target person) in the binary image after masking.
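A hedged sketch of this construction step is given below (Python with OpenCV/NumPy). The color-threshold mask, the compositing and the label derived from the mean horizontal position follow the description above; the HSV threshold values and the three-way split of the image width into left / forward / right regions are assumptions made for illustration.

```python
import cv2
import numpy as np

def synthesize_sample(simple_img, complex_img, lower_hsv, upper_hsv):
    """Build a color-threshold mask for the target person in the simple scene,
    paste the person onto a complex background, and derive the action label
    from the mean horizontal position of the masked pixels."""
    hsv = cv2.cvtColor(simple_img, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower_hsv, upper_hsv)          # 255 on the target, 0 elsewhere
    mask01 = (mask > 0).astype(np.uint8)

    # Resize the background to the simple-scene size and composite the person onto it.
    bg = cv2.resize(complex_img, (simple_img.shape[1], simple_img.shape[0]))
    composite = bg * (1 - mask01)[:, :, None] + simple_img * mask01[:, :, None]

    # Action label from the mean horizontal position of the target pixels
    # (three-way split of the image width is an assumption for illustration).
    xs = np.where(mask01 > 0)[1]
    width = simple_img.shape[1]
    mean_x = xs.mean() if xs.size else width / 2
    if mean_x < width / 3:
        label = "left"
    elif mean_x > 2 * width / 3:
        label = "right"
    else:
        label = "forward"
    return composite.astype(np.uint8), label
```

Running this function over each simple-scene frame and many downloaded backgrounds expands a small collection of labeled frames into a large synthetic data set, as described in Step 1.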
Step 2: construction and training process of the CNN-based direction control model
Images acquired by the monocular color camera are 640 × 480 in size. Before they are fed to the neural network, the RGB channels are first converted to HSV channels and the 640 × 480 image is resized to 60 × 80 before being used as the input image for the neural network. In the present invention, the images acquired at 4 adjacent time steps are merged as the network input; since a single image is a three-channel HSV image, the final input layer contains 4 × 3 = 12 channels, each of size 60 × 80. The CNN model is then trained in a supervised manner on the automatically constructed data set, so that the CNN network can output the corresponding action state from the robot's field-of-view input image.
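A minimal sketch of this preprocessing and frame-stacking step is shown below (Python with OpenCV/NumPy). The 80-wide by 60-tall interpretation of the "60 × 80" resize, which preserves the 4:3 aspect ratio of the 640 × 480 input, and the normalization to [0, 1] are assumptions for illustration.

```python
from collections import deque
import cv2
import numpy as np

frame_buffer = deque(maxlen=4)   # HSV images of the 4 most recent time steps

def preprocess(bgr_frame):
    """Convert a 640x480 camera frame to HSV and resize it; (80, 60) in cv2.resize
    means 80 pixels wide by 60 pixels tall, matching the 4:3 aspect of the input."""
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    return cv2.resize(hsv, (80, 60))                       # shape (60, 80, 3)

def build_network_input(bgr_frame):
    """Stack the 4 most recent HSV frames into a 12-channel tensor of shape (12, 60, 80)."""
    frame_buffer.append(preprocess(bgr_frame))
    while len(frame_buffer) < 4:                           # at start-up, pad with copies
        frame_buffer.append(frame_buffer[-1])
    stacked = np.concatenate(list(frame_buffer), axis=2)   # (60, 80, 12)
    return np.transpose(stacked, (2, 0, 1)).astype(np.float32) / 255.0
```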
Step 3: model transfer process
The DQN model finally used in the present invention has a structure similar to the CNN direction control network of the present invention described above, but the last Softmax layer is removed, so that the network directly outputs a value prediction for each state-action pair instead of a probability distribution over the actions. The model transfer strategy used in the present invention is therefore: remove the Softmax layer of the trained CNN network and assign the weight parameters of the preceding layers directly to the DQN model, thereby achieving the transfer of prior knowledge.
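Continuing the earlier sketches (and under the same assumptions about the network), the transfer can be expressed as copying the pre-trained weights into a structurally identical value network; because the DirectionCNN sketch keeps the softmax inside the training loss rather than as an explicit layer, its final linear layer can be reused directly as the Q-value head.

```python
import copy

def transfer_cnn_to_dqn(trained_cnn):
    """Initialize the DQN with the pre-trained CNN weights (minus any softmax layer)."""
    dqn = copy.deepcopy(trained_cnn)   # same layers, initialized with the CNN weights
    return dqn

# Equivalent state_dict form when the two networks are separate modules with matching
# layer names (strict=False tolerates a dropped softmax/output head):
# dqn.load_state_dict(trained_cnn.state_dict(), strict=False)
```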
Step 4: training process of the DRL-based direction control model
After the model transfer is completed, the DRL model can be deployed on the robot side and, by continuously interacting with the environment, the robot keeps updating the model, learns the environment it is currently in, and improves the robustness of following. During this process, the algorithm outputs a discrete action space to control the robot; in the present invention the action space of the following robot is a set containing the commands left, right and forward, and the correspondence between the action space and input images is shown in Fig. 4.
Data in RL carry no separate labels; only the reward signal fed back by the environment implies how good or bad an action is, so the design of the reward function is the most important link in the successful application of RL. The reward function for the direction control of the following robot in the present invention is designed as follows: the user connects remotely to the local side of the following robot and observes the robot's field-of-view image. Initially STOP is 0, meaning that following has not terminated. When the user finds that their own position in the robot's field-of-view image has deviated from the center, they send a stop message through a handheld device; when the robot side receives this message it knows that following has failed, sets STOP to 1 and stops the robot. On the one hand such a design is convenient for the user; on the other hand it also provides a more accurate reward signal, thereby accelerating the convergence of the model. The reward function then assigns a value C whenever STOP is 1, where C is negative.
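A hedged sketch of this reward design is shown below. The text only fixes the negative penalty C given when the user signals failure (STOP = 1); the concrete value of C and the small non-negative per-step reward while following continues are assumptions added for illustration.

```python
C = -10.0   # negative constant given when following fails (value assumed)

def reward(stop_flag, step_reward=0.1):
    """Return C when the user has signalled failure (STOP = 1); otherwise return a small
    per-step reward. step_reward is an illustrative assumption, not from the patent."""
    return C if stop_flag == 1 else step_reward
```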
Verification experiments on the TurtleBot2 robot show that this method can accurately follow a specific target person and has high robustness.
Claims (5)
1. A mobile robot visual following method based on deep reinforcement learning, characterized by comprising the following steps:
Step 1: automated construction of the data set;
(1) prepare a simple scene in which the followed target is easy to distinguish from the background; in the simple scene, acquire field-of-view images of the target person at different locations in the robot's field of view from the perspective of the following robot;
(2) prepare application scene images of the following robot as complex scene images, extract the target person from the background of the simple scene using the image mask technique, and superimpose the target person onto the complex scenes to obtain images of the target person in complex scenes, directly assigning the synthesized complex scene images the action-space labels of the corresponding simple scenes;
Step 2: construction and training of the CNN-based direction control model;
train the CNN model in a supervised manner on the data set of automated construction in Step 1, so that the CNN achieves the effect of outputting the corresponding action state from the robot's field-of-view input image; for images acquired by the robot's monocular color camera, first convert the RGB channels to HSV channels before feeding the images to the CNN as input, after which the network can output the corresponding action state;
Step 3: model transfer;
transfer the CNN parameter weights trained in Step 2 to the DRL model as initial parameters, so that the DRL model obtains the same control level as the CNN model;
Step 4: construction and training of the DRL-based direction control model;
deploy the DRL model after the initial parameter transfer of Step 3 on the robot side and, by continuously interacting with the environment, enable the robot to keep updating the model and learning the environment it is currently in.
2. The mobile robot visual following method based on deep reinforcement learning according to claim 1, characterized in that in Step 2: images acquired by the robot's monocular color camera are 640 × 480 in size; before they are fed to the neural network, the RGB channels are first converted to HSV channels and the 640 × 480 image is resized to 60 × 80; the images acquired at 4 adjacent time steps are merged as the network input, so that the final input layer contains 4 × 3 = 12 channels, each of size 60 × 80.
3. The mobile robot visual following method based on deep reinforcement learning according to claim 1, characterized in that in Step 2: the CNN structure consists of 8 layers, including 3 convolutional layers, 2 pooling layers, 2 fully connected layers and an output layer; from front to back, the convolution kernel sizes of the three convolutional layers are 8 × 8, 4 × 4 and 2 × 2; both pooling layers use max pooling of size 2 × 2; after the third convolution, the features are fed into two fully connected layers with 384 nodes each, followed by the output layer, which produces a multi-dimensional output in which each dimension corresponds to one direction of action, covering three directions in total: forward, left and right; a ReLU activation function is applied after each of the three convolutional layers and the two fully connected layers to apply a non-linearity to the layer input; the CNN parameters are updated with the cross-entropy loss function, specifically expressed as:
L(y′, f(x)) = −Σᵢ y′ᵢ · log fᵢ(x)
where y′ is the label of the sample, a three-dimensional one-hot vector whose dimension equal to 1 indicates the correct action, and f(x) is the CNN model's predicted probability for each action dimension.
4. The mobile robot visual following method based on deep reinforcement learning according to claim 1, characterized in that the DRL model in Step 3 is specifically a DQN model, and the transfer process is as follows: remove the Softmax layer of the trained CNN network and assign the weight parameters of the preceding layers directly to the DQN model.
5. The mobile robot visual following method based on deep reinforcement learning according to claim 4, characterized in that in Step 4: the DQN approximates the value function with a neural network, i.e., the input of the neural network is the current state value s and the output is the predicted value Qθ(s, a); at each time step the environment provides a state value s, the agent obtains from the value function network the values Qθ(s, a) of this s and all actions, then selects an action with the ε-greedy algorithm and makes a decision; after receiving this action a, the environment returns a reward value r and the next state s′; this constitutes one step; the parameters of the value function network are updated according to r; the DQN uses the mean squared error objective:
L(θ) = E[(r + γ · max_{a′} Qθ(s′, a′) − Qθ(s, a))²]
where s′ and a′ are the state and action at the next time step, γ is a hyperparameter and θ denotes the model parameters;
during training the parameters are updated as:
θ ← θ + α · (r + γ · max_{a′} Qθ(s′, a′) − Qθ(s, a)) · ∇θ Qθ(s, a)
where α is the learning rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910361528.4A CN110084307B (en) | 2019-04-30 | 2019-04-30 | Mobile robot vision following method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910361528.4A CN110084307B (en) | 2019-04-30 | 2019-04-30 | Mobile robot vision following method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110084307A true CN110084307A (en) | 2019-08-02 |
CN110084307B CN110084307B (en) | 2021-06-18 |
Family
ID=67418184
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910361528.4A Expired - Fee Related CN110084307B (en) | 2019-04-30 | 2019-04-30 | Mobile robot vision following method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110084307B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110728368A (en) * | 2019-10-25 | 2020-01-24 | 中国人民解放军国防科技大学 | Acceleration method for deep reinforcement learning of simulation robot |
CN111523495A (en) * | 2020-04-27 | 2020-08-11 | 天津中科智能识别产业技术研究院有限公司 | End-to-end active human body tracking method in monitoring scene based on deep reinforcement learning |
CN111539979A (en) * | 2020-04-27 | 2020-08-14 | 天津大学 | Human body front tracking method based on deep reinforcement learning |
CN111578940A (en) * | 2020-04-24 | 2020-08-25 | 哈尔滨工业大学 | Indoor monocular navigation method and system based on cross-sensor transfer learning |
CN112297012A (en) * | 2020-10-30 | 2021-02-02 | 上海交通大学 | Robot reinforcement learning method based on self-adaptive model |
CN112702423A (en) * | 2020-12-23 | 2021-04-23 | 杭州比脉科技有限公司 | Robot learning system based on Internet of things interactive entertainment mode |
CN112731804A (en) * | 2019-10-29 | 2021-04-30 | 北京京东乾石科技有限公司 | Method and device for realizing path following |
CN112799401A (en) * | 2020-12-28 | 2021-05-14 | 华南理工大学 | End-to-end robot vision-motion navigation method |
CN113011526A (en) * | 2021-04-23 | 2021-06-22 | 华南理工大学 | Robot skill learning method and system based on reinforcement learning and unsupervised learning |
CN113031441A (en) * | 2021-03-03 | 2021-06-25 | 北京航空航天大学 | Rotary mechanical diagnosis network automatic search method based on reinforcement learning |
CN113158778A (en) * | 2021-03-09 | 2021-07-23 | 中国电子科技集团公司第五十四研究所 | SAR image target detection method |
CN113156959A (en) * | 2021-04-27 | 2021-07-23 | 东莞理工学院 | Self-supervision learning and navigation method of autonomous mobile robot in complex scene |
CN113485326A (en) * | 2021-06-28 | 2021-10-08 | 南京深一科技有限公司 | Autonomous mobile robot based on visual navigation |
TWI751511B (en) * | 2019-09-05 | 2022-01-01 | 日商三菱電機股份有限公司 | Inference device, machine control system and learning device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2017101165A4 (en) * | 2017-08-25 | 2017-11-02 | Liu, Yichen MR | Method of Structural Improvement of Game Training Deep Q-Network |
US20180018562A1 (en) * | 2016-07-14 | 2018-01-18 | Cside Japan Inc. | Platform for providing task based on deep learning |
CN108549928A (en) * | 2018-03-19 | 2018-09-18 | 清华大学 | Visual tracking method and device based on continuous moving under deeply learning guide |
CN108932735A (en) * | 2018-07-10 | 2018-12-04 | 广州众聚智能科技有限公司 | A method of generating deep learning sample |
CN109242882A (en) * | 2018-08-06 | 2019-01-18 | 北京市商汤科技开发有限公司 | Visual tracking method, device, medium and equipment |
CN109341689A (en) * | 2018-09-12 | 2019-02-15 | 北京工业大学 | Vision navigation method of mobile robot based on deep learning |
- 2019
- 2019-04-30 CN CN201910361528.4A patent/CN110084307B/en not_active Expired - Fee Related
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180018562A1 (en) * | 2016-07-14 | 2018-01-18 | Cside Japan Inc. | Platform for providing task based on deep learning |
AU2017101165A4 (en) * | 2017-08-25 | 2017-11-02 | Liu, Yichen MR | Method of Structural Improvement of Game Training Deep Q-Network |
CN108549928A (en) * | 2018-03-19 | 2018-09-18 | 清华大学 | Visual tracking method and device based on continuous moving under deeply learning guide |
CN108932735A (en) * | 2018-07-10 | 2018-12-04 | 广州众聚智能科技有限公司 | A method of generating deep learning sample |
CN109242882A (en) * | 2018-08-06 | 2019-01-18 | 北京市商汤科技开发有限公司 | Visual tracking method, device, medium and equipment |
CN109341689A (en) * | 2018-09-12 | 2019-02-15 | 北京工业大学 | Vision navigation method of mobile robot based on deep learning |
Non-Patent Citations (5)
Title |
---|
OFIR NACHUM ET AL: "Bridging the Gap Between Value and Policy Based Reinforcement Learning", 《ARXIV:1702.08892V1》 * |
VOLODYMYR MNIH ET AL: "Human-level control through deep reinforcement learning", 《NATURE》 * |
李昌: "Research on robust target tracking methods based on multimodal video", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
杨杰 et al.: "Medical Image Analysis, 3D Reconstruction and Their Applications", 31 January 2015 * |
阳岳生: "Research on machine-learning-based visual target tracking algorithms", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI751511B (en) * | 2019-09-05 | 2022-01-01 | 日商三菱電機股份有限公司 | Inference device, machine control system and learning device |
CN110728368A (en) * | 2019-10-25 | 2020-01-24 | 中国人民解放军国防科技大学 | Acceleration method for deep reinforcement learning of simulation robot |
CN110728368B (en) * | 2019-10-25 | 2022-03-15 | 中国人民解放军国防科技大学 | Acceleration method for deep reinforcement learning of simulation robot |
CN112731804A (en) * | 2019-10-29 | 2021-04-30 | 北京京东乾石科技有限公司 | Method and device for realizing path following |
CN111578940A (en) * | 2020-04-24 | 2020-08-25 | 哈尔滨工业大学 | Indoor monocular navigation method and system based on cross-sensor transfer learning |
CN111578940B (en) * | 2020-04-24 | 2021-05-11 | 哈尔滨工业大学 | Indoor monocular navigation method and system based on cross-sensor transfer learning |
CN111523495A (en) * | 2020-04-27 | 2020-08-11 | 天津中科智能识别产业技术研究院有限公司 | End-to-end active human body tracking method in monitoring scene based on deep reinforcement learning |
CN111539979A (en) * | 2020-04-27 | 2020-08-14 | 天津大学 | Human body front tracking method based on deep reinforcement learning |
CN111523495B (en) * | 2020-04-27 | 2023-09-01 | 天津中科智能识别产业技术研究院有限公司 | End-to-end active human body tracking method in monitoring scene based on deep reinforcement learning |
CN111539979B (en) * | 2020-04-27 | 2022-12-27 | 天津大学 | Human body front tracking method based on deep reinforcement learning |
CN112297012A (en) * | 2020-10-30 | 2021-02-02 | 上海交通大学 | Robot reinforcement learning method based on self-adaptive model |
CN112297012B (en) * | 2020-10-30 | 2022-05-31 | 上海交通大学 | Robot reinforcement learning method based on self-adaptive model |
CN112702423A (en) * | 2020-12-23 | 2021-04-23 | 杭州比脉科技有限公司 | Robot learning system based on Internet of things interactive entertainment mode |
CN112702423B (en) * | 2020-12-23 | 2022-05-03 | 杭州比脉科技有限公司 | Robot learning system based on Internet of things interactive entertainment mode |
CN112799401A (en) * | 2020-12-28 | 2021-05-14 | 华南理工大学 | End-to-end robot vision-motion navigation method |
CN113031441B (en) * | 2021-03-03 | 2022-04-08 | 北京航空航天大学 | Rotary mechanical diagnosis network automatic search method based on reinforcement learning |
CN113031441A (en) * | 2021-03-03 | 2021-06-25 | 北京航空航天大学 | Rotary mechanical diagnosis network automatic search method based on reinforcement learning |
CN113158778A (en) * | 2021-03-09 | 2021-07-23 | 中国电子科技集团公司第五十四研究所 | SAR image target detection method |
CN113011526A (en) * | 2021-04-23 | 2021-06-22 | 华南理工大学 | Robot skill learning method and system based on reinforcement learning and unsupervised learning |
CN113011526B (en) * | 2021-04-23 | 2024-04-26 | 华南理工大学 | Robot skill learning method and system based on reinforcement learning and unsupervised learning |
CN113156959A (en) * | 2021-04-27 | 2021-07-23 | 东莞理工学院 | Self-supervision learning and navigation method of autonomous mobile robot in complex scene |
CN113156959B (en) * | 2021-04-27 | 2024-06-04 | 东莞理工学院 | Self-supervision learning and navigation method for autonomous mobile robot in complex scene |
CN113485326A (en) * | 2021-06-28 | 2021-10-08 | 南京深一科技有限公司 | Autonomous mobile robot based on visual navigation |
Also Published As
Publication number | Publication date |
---|---|
CN110084307B (en) | 2021-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110084307A (en) | Mobile robot visual following method based on deep reinforcement learning | |
Ruan et al. | Mobile robot navigation based on deep reinforcement learning | |
CN106648103B (en) | A kind of the gesture tracking method and VR helmet of VR helmet | |
CN109800864B (en) | Robot active learning method based on image input | |
CN106966298B (en) | Assembled architecture intelligence hanging method based on machine vision and system | |
CN104589356B (en) | The Dextrous Hand remote operating control method caught based on Kinect human hand movement | |
CN113110482B (en) | Indoor environment robot exploration method and system based on priori information heuristic method | |
CN110135249A (en) | Human bodys' response method based on time attention mechanism and LSTM | |
CN109045676B (en) | Chinese chess recognition learning algorithm and robot intelligent system and method based on algorithm | |
CN108198221A (en) | A kind of automatic stage light tracking system and method based on limb action | |
CN101127078A (en) | Unmanned machine vision image matching method based on ant colony intelligence | |
CN111176309B (en) | Multi-unmanned aerial vehicle self-group mutual inductance understanding method based on spherical imaging | |
CN101154289A (en) | Method for tracing three-dimensional human body movement based on multi-camera | |
CN107818333A (en) | Robot obstacle-avoiding action learning and Target Searching Method based on depth belief network | |
CN112651262A (en) | Cross-modal pedestrian re-identification method based on self-adaptive pedestrian alignment | |
CN115147488B (en) | Workpiece pose estimation method and grabbing system based on dense prediction | |
CN113255514B (en) | Behavior identification method based on local scene perception graph convolutional network | |
CN113370217A (en) | Method for recognizing and grabbing object posture based on deep learning for intelligent robot | |
CN109508686A (en) | A kind of Human bodys' response method based on the study of stratification proper subspace | |
CN110059597A (en) | Scene recognition method based on depth camera | |
CN114036969A (en) | 3D human body action recognition algorithm under multi-view condition | |
Cheng et al. | A grasp pose detection scheme with an end-to-end CNN regression approach | |
Chen et al. | Improving registration of augmented reality by incorporating DCNNS into visual SLAM | |
CN114998573A (en) | Grabbing pose detection method based on RGB-D feature depth fusion | |
Liu et al. | Robotic picking in dense clutter via domain invariant learning from synthetic dense cluttered rendering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | | Granted publication date: 20210618 |
CF01 | Termination of patent right due to non-payment of annual fee |