CN110084850A - Dynamic scene visual localization method based on image semantic segmentation - Google Patents
- Publication number
- CN110084850A (application number CN201910270280.0A)
- Authority
- CN
- China
- Prior art keywords
- size
- image
- feature map
- point
- original image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a dynamic scene visual localization method based on image semantic segmentation, belonging to the field of SLAM (Simultaneous Localization and Mapping). The method first uses supervised deep learning to segment the dynamic objects in the original image and obtain a semantic image. On this basis, ORB feature points are extracted from the original image, and those lying on dynamic objects are rejected according to the semantic image. Finally, camera motion is localized and tracked from the remaining feature points using a point-feature-based monocular SLAM method. Localization results show that, compared with the conventional method, the localization accuracy of the disclosed method in dynamic scenes improves by 13% to 30%.
Description
Technical field
The present invention relates to the application of deep learning to visual SLAM and belongs to the field of SLAM (Simultaneous Localization and Mapping).
Background
Simultaneous localization and mapping (SLAM) is a key technology for autonomous robot operation in unknown environments. Based on environmental data detected by the robot's external sensors, SLAM constructs a map of the robot's surroundings and simultaneously gives the robot's position within that map. Compared with ranging instruments such as radar and sonar, visual sensors are small, consume little power, collect rich information, and can provide abundant texture information about the external environment. Visual SLAM has therefore become a focus of current research and is applied in fields such as autonomous navigation and VR/AR.
Traditional point-feature-based visual SLAM algorithms rely on a static-environment assumption when recovering scene information and camera motion, so dynamic objects in the scene degrade localization accuracy. At present, such algorithms handle simple dynamic scenes by detecting dynamic points and marking them as outliers. ORB-SLAM reduces the influence of dynamic objects on localization accuracy through RANSAC, the chi-square test, keyframe selection, and a local map. Direct methods handle the occlusion caused by dynamic objects by optimizing a cost function. In 2013, researchers proposed a new keyframe representation and update method that adaptively models dynamic environments and effectively detects and handles appearance or structural changes in them. In the same year, other researchers introduced a multi-camera pose estimation and mapping method for handling dynamic scenes. Nevertheless, the localization accuracy and robustness of traditional SLAM methods in dynamic scenes still need to be improved.
Summary of the invention
The technical problem to be solved by the present invention is as follows:
To improve the localization accuracy and robustness of traditional SLAM methods in dynamic scenes, a dynamic scene visual localization method based on image semantic segmentation is provided, which segments the dynamic objects in the scene and rejects the feature points of dynamic objects.
The present invention adopts the following technical solution to solve the above technical problem:
The present invention proposes a dynamic scene visual localization method based on image semantic segmentation, comprising the following steps:
Step 1: acquire an original image, construct a convolutional neural network, and use the convolutional neural network to segment the dynamic objects in the original image to obtain a semantic image;
Step 2: extract ORB feature points from the original image;
Step 3: according to the semantic image obtained in step 1, reject the dynamic-object feature points among the ORB feature points obtained in step 2, retaining only static-object feature points;
Step 4: based on the static-object feature points obtained in step 3, localize and track the camera motion using a traditional point-feature-based SLAM method.
In the dynamic scene visual localization method based on image semantic segmentation described above, further, the step of constructing the convolutional neural network in step 1 comprises the following, where F1, F2 and F3 denote the feature maps output by the three branches and F_1, F_2 and F_3 denote the fused feature maps:
Step 1.1.1: downsample the original image to 1/4 size and input it into PSPNet, obtaining feature maps of 1/8 and 1/16 size in turn and finally outputting a feature map F1 of 1/32 size;
Step 1.1.2: downsample the original image to 1/2 size and input it into the PSPNet, obtaining feature maps of 1/4 and 1/8 size in turn and finally outputting a feature map F2 of 1/16 size;
Step 1.1.3: input the feature maps F1 and F2, together with a ground-truth label whose size is 1/16 of the original image, into a first CFF unit for fusion, outputting a fused feature map F_1 of 1/16 size and the loss term L_1 of the first branch;
Step 1.1.4: input the original image into the PSPNet, obtaining feature maps of 1/2 and 1/4 size in turn and finally outputting a feature map F3 of 1/8 size; input the feature map F_1 and the feature map F3, together with a ground-truth label whose size is 1/8 of the original image, into a second CFF unit for fusion, outputting a fused feature map F_2 of 1/8 size and the loss term L_2 of the second branch;
Step 1.1.5: upsample the feature map F_2 to obtain a feature map F_3 of 1/4 size; the feature map F_3 is processed with a ground-truth label of 1/4 size to output the loss term L_3 of the third branch;
Step 1.1.6: sum the loss terms L_1, L_2 and L_3 and use the result to train the convolutional neural network.
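The size ratios in steps 1.1.1 to 1.1.4 can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that each branch reduces its input by a further factor of 8 inside the network (which is what the ratios 1/4 to 1/32, 1/2 to 1/16 and 1 to 1/8 imply), and the 1024 × 2048 image size is an assumed example value rather than one stated in the text.

```python
from fractions import Fraction

# (input scale relative to the original image, downsampling factor inside the network)
# The factor 8 is inferred from the text: 1/4 -> 1/32, 1/2 -> 1/16, 1 -> 1/8.
BRANCHES = {
    "F1": (Fraction(1, 4), 8),  # top branch
    "F2": (Fraction(1, 2), 8),  # middle branch
    "F3": (Fraction(1, 1), 8),  # bottom branch
}

def output_scale(input_scale, network_stride):
    """Scale of a branch output relative to the original image."""
    return input_scale / network_stride

def feature_map_size(height, width, scale):
    """Spatial size (rows, columns) of a feature map at the given scale."""
    return int(height * scale), int(width * scale)

# Assumed example resolution (height, width); not taken from the source text.
IMAGE_HEIGHT, IMAGE_WIDTH = 1024, 2048

branch_scales = {name: output_scale(s, k) for name, (s, k) in BRANCHES.items()}
branch_sizes = {
    name: feature_map_size(IMAGE_HEIGHT, IMAGE_WIDTH, scale)
    for name, scale in branch_scales.items()
}

for name in BRANCHES:
    print(name, branch_scales[name], branch_sizes[name])
```

Under these assumptions, F1 is half the spatial size of F2, and the first fused map (1/16) is half the size of F3 (1/8), which is what each CFF unit expects of its two inputs.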
In the dynamic scene visual localization method based on image semantic segmentation described above, further, the image processing performed by the CFF units of steps 1.1.3 and 1.1.4 comprises:
The smaller of the two input feature maps is upsampled by a factor of 2 and then fed separately into a classification convolutional layer and a dilated convolutional layer; the classification convolutional layer has a kernel size of 1*1*1, and the dilated convolutional layer has a kernel size of 3*3*C_3 and a dilation rate of 2. The larger of the two input feature maps is fed into a projection convolutional layer with a kernel size of 1*1*C_3. The outputs of the dilated convolutional layer and the projection convolutional layer are each batch-normalized and then summed; the sum is passed through a ReLU function to output the feature map F_c. The output of the classification convolutional layer and the ground-truth label are substituted into a Softmax function to obtain the loss term of the branch corresponding to the CFF unit.
In the dynamic scene visual localization method based on image semantic segmentation described above, further, in step 1.1.6, summing the loss terms L_1, L_2 and L_3 for training the convolutional neural network specifically comprises:
Summing the loss terms L_1, L_2 and L_3 to obtain the final loss term L_total:
$$L_{total} = -\sum_{i=1}^{3} \omega_i \frac{1}{Y_i X_i} \sum_{y=1}^{Y_i} \sum_{x=1}^{X_i} \log \frac{e^{F^{i}_{\hat{n},y,x}}}{\sum_{n=1}^{N} e^{F^{i}_{n,y,x}}}$$
where $i$ is the branch index, $\omega_i$ is the weight of the loss term of each branch, $F^{i}$ is the feature map used to compute the loss function in each branch, $Y_i \times X_i$ is the size of $F^{i}$, $N$ is the preset number of object classes to be segmented in the image, $F^{i}_{n,y,x}$ is the value of feature map $F^{i}$ at position $(n, y, x)$, and $\hat{n}$ is the ground-truth label corresponding to $F^{i}$ at $(y, x)$.
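As an illustration of how such a weighted sum of branch losses behaves, the following minimal sketch computes a per-pixel softmax cross-entropy for each branch and combines the branches with weights. It assumes that each branch loss is the mean softmax cross-entropy over the pixels of that branch (consistent with the Softmax function mentioned for the CFF unit); the function names, the nested-list data layout and the weight values are hypothetical.

```python
import math

def softmax_cross_entropy(scores, labels):
    """Mean softmax cross-entropy over all pixels of one branch.

    scores[n][y][x] is the score of class n at pixel (y, x);
    labels[y][x] is the ground-truth class index at pixel (y, x).
    """
    n_classes = len(scores)
    height, width = len(labels), len(labels[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            logits = [scores[n][y][x] for n in range(n_classes)]
            peak = max(logits)  # subtract the maximum for numerical stability
            log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
            total += log_sum - logits[labels[y][x]]
    return total / (height * width)

def total_loss(branches, weights):
    """Weighted sum of branch losses; branches is a list of (scores, labels)."""
    return sum(w * softmax_cross_entropy(s, l) for w, (s, l) in zip(weights, branches))

# Toy example: 2 classes, one row of 2 pixels, uniform scores in every branch.
uniform_scores = [[[0.0, 0.0]], [[0.0, 0.0]]]
toy_labels = [[0, 1]]
toy_branches = [(uniform_scores, toy_labels)] * 3
toy_weights = [1.0, 1.0, 1.0]  # hypothetical weights, not taken from the source
print(total_loss(toy_branches, toy_weights))
```

With uniform scores over two classes every pixel contributes log 2, so the toy total equals three times log 2 when all weights are 1.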
In the dynamic scene visual localization method based on image semantic segmentation described above, further, using the convolutional neural network to segment the dynamic objects in the original image to obtain a semantic image, as described in step 1, comprises the following steps, where F1, F2 and F3 denote the feature maps output by the three branches and F_1, F_2 and F_3 denote the fused feature maps:
Step 1.2.1: downsample the original image to 1/4 size and input it into PSPNet, obtaining feature maps of 1/8 and 1/16 size in turn and finally outputting a feature map F1 of 1/32 size;
Step 1.2.2: downsample the original image to 1/2 size and input it into the PSPNet, obtaining feature maps of 1/4 and 1/8 size in turn and finally outputting a feature map F2 of 1/16 size;
Step 1.2.3: input the feature maps F1 and F2, together with a ground-truth label whose size is 1/16 of the original image, into the first CFF unit for fusion, outputting a fused feature map F_1 of 1/16 size;
Step 1.2.4: input the original image into the PSPNet, obtaining feature maps of 1/2 and 1/4 size in turn and finally outputting a feature map F3 of 1/8 size; input the feature map F_1 and the feature map F3 into the second CFF unit for fusion, outputting a fused feature map F_2 of 1/8 size;
Step 1.2.5: upsample the feature map F_2 to obtain a feature map F_3 of 1/4 size; at test time, upsample F_3 to output a feature map of size 1 (the size of the original image), which is the semantic segmentation map;
Step 1.2.6: binarize the semantic segmentation map: mark the dynamic objects in the semantic segmentation map with black pixels (value 0) and all other objects with white pixels (value 1), obtaining a black-and-white semantic image i′_t containing only the dynamic objects;
Step 1.2.7: perform steps 1.2.1 to 1.2.6 on the image sequence composed of original images, finally obtaining the semantic image sequence I′ = {i′_1, i′_2, i′_3, i′_4, …, i′_t} containing only the dynamic objects.
In the dynamic scene visual localization method based on image semantic segmentation described above, further, in step 2, extracting ORB feature points from the original image specifically comprises:
According to the complexity of the scene, set the number of feature points to extract, and use an ORB feature extractor to extract the feature points i_t(x, y) of the input image i_t, where x and y are the horizontal and vertical coordinates of a feature point.
In the dynamic scene visual localization method based on image semantic segmentation described above, further, in step 3, rejecting the dynamic-object feature points among the ORB feature points obtained in step 2 according to the semantic image obtained in step 1, and retaining only static-object feature points, comprises:
For each feature point i_t(x, y) of the original image i_t, determine the corresponding position i′_t(x, y) in its semantic image i′_t;
If i′_t(x, y) = 0, the point is a black pixel, i.e. it belongs to a dynamic object, and it is rejected;
If i′_t(x, y) = 1, the point is a white pixel, i.e. it belongs to a static object, and it is retained.
In the dynamic scene visual localization method based on image semantic segmentation described above, further, in step 4, localizing and tracking the camera motion based on the static-object feature points obtained in step 3, using a traditional point-feature-based SLAM method, is specifically:
For the image sequence I = {i_1, i_2, i_3, i_4, …, i_t}, based on the ORB feature points remaining after the rejection in step 3, a traditional point-feature-based SLAM framework is used to compute and optimize the camera pose, completing the localization and tracking of the camera.
Compared with the prior art, the present invention, by adopting the above technical solution, has the following technical effects:
1. The present invention first uses supervised deep learning to segment the dynamic objects in the original image and obtain a semantic image; on this basis, ORB feature points are extracted from the original image and dynamic-object feature points are rejected according to the semantic image, thereby improving the localization accuracy and robustness of traditional SLAM methods in dynamic scenes;
2. The localization results of the proposed method are better than those of traditional ORB-SLAM, with localization accuracy improved by 13% to 30%.
Brief description of the drawings
Fig. 1 is the flow chart of the method;
Fig. 2 is the structure diagram of the image semantic segmentation network of the method;
Fig. 3 is the structure diagram of the cascade feature fusion unit of the method;
Fig. 4 is the flow chart of dynamic object segmentation in the method;
Fig. 5 shows the image semantic segmentation results of the method;
Fig. 6 shows the results of rejecting dynamic-object feature points in the method;
Fig. 7 shows plan views of the localization trajectories of the method and the complete ORB-SLAM on the four sequences;
Fig. 8 shows plan views of the localization trajectories of the method and the incomplete ORB-SLAM on the four sequences.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general-purpose dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless defined as herein, will not be interpreted in an idealized or overly formal sense.
With the development of deep learning, researchers have explored the semantic information of images and used it to improve the performance of visual SLAM. Semantic segmentation is a basic task in computer vision, in which the visual input is divided into different semantically interpretable categories. The present invention proposes a dynamic scene visual localization method based on image semantic segmentation, which aims to improve SLAM localization accuracy in dynamic scenes by rejecting dynamic-object feature points, while also obtaining rich semantic information about the scene.
The present invention proposes a dynamic scene visual localization method based on image semantic segmentation; Fig. 1 is the flow chart of the method, and Fig. 4 is the flow chart of its dynamic object segmentation. First, supervised deep learning is used to segment the dynamic objects in the original image and obtain a semantic image. On this basis, ORB feature points are extracted from the original image, and dynamic-object feature points are rejected according to the semantic image. Finally, based on the remaining feature points, a point-feature-based monocular SLAM method is used to localize and track the camera motion.
Step 1: construct a convolutional neural network and segment the dynamic objects in the original image to obtain a semantic image:
Step 1.1: construct the convolutional neural network for semantic segmentation
The constructed network structure is shown in Fig. 2. The network comprises three branches: top, middle and bottom. The numbers in parentheses are size ratios relative to the original input image; 'CFF' denotes the cascade feature fusion unit; the first three layers of the top and middle branches share the same parameters.
The network structure is now described in further detail:
Cascaded image input: In the top branch of the network shown in Fig. 2, the original image is first downsampled to 1/4 size and then input into PSPNet, which outputs a feature map of 1/32 size. This is a coarse segmentation result that lacks many details and boundaries. The middle and bottom branches use the 1/2-size image and the original image to recover and refine the details of this coarse result. Although the segmentation result of the top branch is coarse, it contains rich semantic information; the middle and bottom branch networks used for detail recovery and refinement are therefore lightweight. Cascade feature fusion (CFF) units fuse the output feature maps of the different branches, and cascade label guidance strengthens the learning process of the different branches.
Cascade feature fusion: Fig. 3 shows the structure of the cascade feature fusion unit, where F1 and F2 are feature maps output by different branches and the spatial size of F2 is twice that of F1. The unit fuses the feature maps output by different branches. Its input comprises the two feature maps F1 and F2 and a ground-truth label; F1 has size Y_1 × X_1 × C_1, F2 has size Y_2 × X_2 × C_2, and the label has size Y_2 × X_2 × 1, matching the upsampled F1. F1 is first upsampled by a factor of 2 to produce a feature map with the same spatial size as F2. A dilated convolutional layer with kernel size 3 × 3 × C_3 and dilation rate 2 then refines the upsampled feature map, so the size of F1 becomes Y_2 × X_2 × C_3. F2 passes through a convolutional layer with kernel size 1 × 1 × C_3, which outputs a feature map of size Y_2 × X_2 × C_3. The two outputs are each batch-normalized and then passed through a summation layer and a ReLU layer, finally producing the fused feature map F2′.
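The shape bookkeeping of the fusion unit can be traced with a short sketch. It is not the network itself, only a check that the two paths end up with identical shapes before summation. It assumes that the dilated convolution is zero-padded so that it preserves spatial size (a padding of 2 for a 3 × 3 kernel with dilation rate 2); the function name and the channel counts in the example are hypothetical.

```python
def cff_output_shape(f1_shape, f2_shape, c3):
    """Trace feature-map shapes (rows, columns, channels) through one CFF unit."""
    y1, x1, c1 = f1_shape
    y2, x2, c2 = f2_shape

    # Path 1: upsample the smaller map F1 by a factor of 2.
    upsampled = (2 * y1, 2 * x1, c1)
    if upsampled[:2] != (y2, x2):
        raise ValueError("F2 must be exactly twice the spatial size of F1")

    # Dilated 3x3 convolution, dilation rate 2, C3 output channels.
    kernel, dilation = 3, 2
    effective_kernel = dilation * (kernel - 1) + 1  # 5 pixels of spatial extent
    padding = (effective_kernel - 1) // 2           # assumed "same" padding: 2
    refined = (
        upsampled[0] + 2 * padding - effective_kernel + 1,
        upsampled[1] + 2 * padding - effective_kernel + 1,
        c3,
    )

    # Path 2: 1x1 projection convolution on F2, C3 output channels.
    projected = (y2, x2, c3)

    # Batch normalization, summation and ReLU all preserve shape,
    # so the two paths must agree element for element.
    if refined != projected:
        raise ValueError("the two paths must have identical shapes before summation")
    return refined

# Hypothetical channel counts, chosen only for illustration.
print(cff_output_shape((32, 64, 256), (64, 128, 256), c3=128))
```

The fused map keeps the spatial size of the larger input and has C_3 channels, which is why the output of the first unit can be fed directly into the second unit as its smaller input.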
Cascade label guidance: In the network structure shown in Fig. 2, three ground-truth labels of different sizes (1/16, 1/8 and 1/4 of the original image, respectively) are used to generate three independent loss terms in the top, middle and bottom branches of the network. The three loss terms are summed to obtain the final loss term:
$$L_{total} = -\sum_{i=1}^{3} \omega_i \frac{1}{Y_i X_i} \sum_{y=1}^{Y_i} \sum_{x=1}^{X_i} \log \frac{e^{F^{i}_{\hat{n},y,x}}}{\sum_{n=1}^{N} e^{F^{i}_{n,y,x}}}$$
where $i$ is the branch index, $\omega_i$ is the weight of the loss term of each branch, $F^{i}$ is the feature map output by each branch, $Y_i \times X_i$ is the size of $F^{i}$, $N$ is the preset number of object classes to be segmented in the image, $F^{i}_{n,y,x}$ is the value of feature map $F^{i}$ at position $(n, y, x)$, and $\hat{n}$ is the ground-truth label corresponding to $F^{i}$ at $(y, x)$.
Step 1.2: segment the dynamic objects in the original input image:
Fig. 4 illustrates how this step is carried out. For a given image sequence I = {i_1, i_2, i_3, i_4, …, i_t}, where i_t is the image captured by the camera at time t:
(1) Input an image i_t into the semantic segmentation network constructed in step 1.1, which outputs a segmented color semantic image in which the pixels of objects such as cars, pedestrians, buildings and signs are labeled in different colors;
(2) Binarize the semantic image from (1): mark the dynamic objects (pedestrians, cars) with black pixels (value 0) and all other objects with white pixels (value 1), obtaining a black-and-white semantic image i′_t containing only the dynamic objects;
(3) Repeat steps (1) and (2) for every image in the image sequence I;
This finally yields the semantic image sequence I′ = {i′_1, i′_2, i′_3, i′_4, …, i′_t} containing only the dynamic objects.
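The binarization in (2) amounts to a per-pixel lookup. The sketch below assumes that the segmentation result is available as a grid of class names; the class names, the function name and the data layout are hypothetical and only illustrate the rule of 0 for dynamic objects and 1 for everything else.

```python
# Classes treated as dynamic; the text names pedestrians and cars.
# The string labels themselves are hypothetical.
DYNAMIC_CLASSES = {"person", "car"}

def binarize(label_map, dynamic_classes=DYNAMIC_CLASSES):
    """Return a mask with 0 (black) on dynamic objects and 1 (white) elsewhere."""
    return [
        [0 if label in dynamic_classes else 1 for label in row]
        for row in label_map
    ]

# Toy 2 x 4 segmentation result.
segmentation = [
    ["building", "building", "car", "car"],
    ["road", "person", "car", "road"],
]
mask = binarize(segmentation)
print(mask)
```

Applying the same function to every frame of the sequence I gives the mask sequence I′ that is used in step 3.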
Step 2: extract ORB feature points from the original image:
According to the complexity of the scene, set the number of feature points to extract, and use an ORB feature extractor to extract the feature points i_t(x, y) of the input image i_t, where x and y are the horizontal and vertical coordinates of a feature point.
Step 3: reject dynamic-object feature points according to the semantic image, retaining only static-object feature points:
(1) For each feature point i_t(x, y) of i_t, determine the corresponding position i′_t(x, y) in the semantic image i′_t;
(2) If i′_t(x, y) = 0, the point is a black pixel, i.e. it belongs to a dynamic object, and it is rejected;
(3) If i′_t(x, y) = 1, the point is a white pixel, i.e. it belongs to a static object, and it is retained.
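The rejection rule can be sketched as a simple filter over keypoint coordinates. The sketch assumes that keypoints are given as (x, y) pairs with x the column and y the row, that sub-pixel coordinates are rounded to the nearest pixel, and that points falling outside the mask are dropped; these choices and the function name are assumptions, not details given in the text.

```python
def reject_dynamic_keypoints(keypoints, mask):
    """Keep only keypoints that land on white (value 1) pixels of the mask.

    keypoints: iterable of (x, y) coordinates, x = column, y = row.
    mask: binary semantic image as rows of 0 (dynamic) and 1 (static).
    """
    height, width = len(mask), len(mask[0])
    static_points = []
    for x, y in keypoints:
        col, row = int(round(x)), int(round(y))
        inside = 0 <= row < height and 0 <= col < width
        if inside and mask[row][col] == 1:
            static_points.append((x, y))
    return static_points

# Toy 2 x 3 mask: the right-hand region is a dynamic object.
toy_mask = [
    [1, 1, 0],
    [1, 0, 0],
]
toy_keypoints = [(0.2, 0.1), (2.0, 0.0), (1.1, 0.9), (0.0, 1.0), (5.0, 5.0)]
static_keypoints = reject_dynamic_keypoints(toy_keypoints, toy_mask)
print(static_keypoints)
```

If the feature points come from OpenCV's ORB detector, each keypoint's `pt` attribute provides the (x, y) pair used here; the descriptors of rejected keypoints would need to be dropped in the same pass.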
Step 4: based on the ORB feature points remaining after the rejection in step 3, use a traditional point-feature-based SLAM framework to localize and track the camera:
For the image sequence I = {i_1, i_2, i_3, i_4, …, i_t}, based on the ORB feature points remaining after the rejection in step 3, a traditional point-feature-based SLAM framework is used to compute and optimize the camera pose, completing the localization and tracking of the camera.
Embodiment one
The present invention is evaluated on the Frankfurt monocular image sequence, which is part of the Cityscapes dataset. The entire Frankfurt sequence provides more than 100,000 frames of outdoor images, together with localization results that can serve as ground truth. The sequence is divided into several shorter sequences of 1300 to 2500 frames that contain dynamic objects such as moving cars or pedestrians. The experimental platform is configured with an Intel Xeon E5-2690 v4 CPU, 128 GB of RAM, and an NVIDIA Titan V GPU.
The sequences extracted from the original Frankfurt sequence are as follows:
Seq.01: frankfurt_000001_054140_leftImg8bit.png - frankfurt_000001_056555_leftImg8bit.png
Seq.02: frankfurt_000001_012745_leftImg8bit.png - frankfurt_000001_014100_leftImg8bit.png
Seq.03: frankfurt_000001_003311_leftImg8bit.png - frankfurt_000001_005555_leftImg8bit.png
Seq.04: frankfurt_000001_010580_leftImg8bit.png - frankfurt_000001_012739_leftImg8bit.png
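The length of each sequence can be estimated from the file names alone. The sketch below assumes that the six-digit field before `_leftImg8bit.png` is a frame index and that frames within a sequence are numbered consecutively without gaps; both are assumptions about the naming scheme, and the helper names are hypothetical.

```python
import re

# First and last file of each sequence, as listed in the text.
SEQUENCES = {
    "Seq.01": ("frankfurt_000001_054140_leftImg8bit.png",
               "frankfurt_000001_056555_leftImg8bit.png"),
    "Seq.02": ("frankfurt_000001_012745_leftImg8bit.png",
               "frankfurt_000001_014100_leftImg8bit.png"),
    "Seq.03": ("frankfurt_000001_003311_leftImg8bit.png",
               "frankfurt_000001_005555_leftImg8bit.png"),
    "Seq.04": ("frankfurt_000001_010580_leftImg8bit.png",
               "frankfurt_000001_012739_leftImg8bit.png"),
}

FILENAME_PATTERN = re.compile(r"frankfurt_\d{6}_(\d{6})_leftImg8bit\.png")

def frame_index(filename):
    """Extract the assumed frame index from a file name."""
    match = FILENAME_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return int(match.group(1))

def frame_count(first, last):
    """Number of frames between two files, inclusive, assuming no gaps."""
    return frame_index(last) - frame_index(first) + 1

frame_counts = {name: frame_count(first, last) for name, (first, last) in SEQUENCES.items()}
for name, count in frame_counts.items():
    print(name, count)
```

Under these assumptions the four sequences contain 2416, 1356, 2245 and 2160 frames, all within the stated range of 1300 to 2500 frames.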
Fig. 5 shows the semantic segmentation results. The middle column shows that trees, buildings, roads, traffic signs and other objects in the scene are finely segmented. The right column retains only the segmentation results for dynamic objects (cars and pedestrians). Although the boundaries are not perfectly accurate, the results are sufficient for rejecting feature points.
Fig. 6 shows the results of rejecting dynamic-object feature points. The white car is a dynamic object traveling on the road. The two images in the left column show the results before rejection, in which many feature points belong to the moving vehicle. The right column shows the results after rejection, in which the feature points on the car have been completely rejected.
Fig. 7 shows plan views of the localization trajectories of this method built on the complete ORB-SLAM and of the complete ORB-SLAM itself on the four video sequences Seq.01, Seq.02, Seq.03 and Seq.04. The four plots show that the trajectory obtained by the proposed method (Ours) deviates less from the real trajectory (Ground Truth) than the trajectory computed by the complete ORB-SLAM (ORB-SLAM Full). Because Seq.01 contains many moving vehicles and pedestrians, the results of both methods deviate considerably from the ground truth, but this method still achieves better localization accuracy than the complete ORB-SLAM. Because the system tracks position on keyframes, the localization trajectories are discontinuous in places.
The complete ORB-SLAM uses the chi-square test, which reduces the influence of dynamic feature points on localization accuracy to some extent. Fig. 8 shows plan views of the localization trajectories of this method built on the incomplete ORB-SLAM (with the chi-square test removed) and of the incomplete ORB-SLAM itself on the four video sequences Seq.01, Seq.02, Seq.03 and Seq.04. The four plots show that the trajectory obtained by the proposed method (Ours) deviates less from the real trajectory (Ground Truth) than the trajectory computed by the incomplete ORB-SLAM (ORB-SLAM Incomplete). Because the scene in Seq.02 contains many pedestrians and therefore a large number of dynamic feature points, the incomplete ORB-SLAM fails to localize at all on that sequence, which shows that the proposed method is more robust. Because the system tracks position on keyframes, the localization trajectories are discontinuous in places.
Finally, the localization results of the four image sequences under the complete ORB-SLAM, the incomplete ORB-SLAM and this method are given. Tables 1 and 2 show that the localization results of the proposed method are better than those of traditional ORB-SLAM, with localization accuracy improved by 13% to 30%.
Table 1: Comparison of the localization results of the two methods on the Seq.01 to Seq.04 image sequences
Table 2: Comparison of the localization results of the two methods on the Seq.01 to Seq.04 image sequences
The above are only some embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (8)
1. A dynamic scene visual localization method based on image semantic segmentation, characterized by comprising the following steps:
Step 1: acquire an original image, construct a convolutional neural network, and use the convolutional neural network to segment the dynamic objects in the original image to obtain a semantic image;
Step 2: extract ORB feature points from the original image;
Step 3: according to the semantic image obtained in step 1, reject the dynamic-object feature points among the ORB feature points obtained in step 2, retaining only static-object feature points;
Step 4: based on the static-object feature points obtained in step 3, localize and track the camera motion using a traditional point-feature-based SLAM method.
2. The dynamic scene visual localization method based on image semantic segmentation according to claim 1, characterized in that in step 1, the step of constructing the convolutional neural network comprises the following, where F1, F2 and F3 denote the feature maps output by the three branches and F_1, F_2 and F_3 denote the fused feature maps:
Step 1.1.1: downsample the original image to 1/4 size and input it into PSPNet, obtaining feature maps of 1/8 and 1/16 size in turn and finally outputting a feature map F1 of 1/32 size;
Step 1.1.2: downsample the original image to 1/2 size and input it into the PSPNet, obtaining feature maps of 1/4 and 1/8 size in turn and finally outputting a feature map F2 of 1/16 size;
Step 1.1.3: input the feature maps F1 and F2, together with a ground-truth label whose size is 1/16 of the original image, into a first CFF unit for fusion, outputting a fused feature map F_1 of 1/16 size and the loss term L_1 of the first branch;
Step 1.1.4: input the original image into the PSPNet, obtaining feature maps of 1/2 and 1/4 size in turn and finally outputting a feature map F3 of 1/8 size; input the feature map F_1 and the feature map F3, together with a ground-truth label whose size is 1/8 of the original image, into a second CFF unit for fusion, outputting a fused feature map F_2 of 1/8 size and the loss term L_2 of the second branch;
Step 1.1.5: upsample the feature map F_2 to obtain a feature map F_3 of 1/4 size; the feature map F_3 is processed with a ground-truth label of 1/4 size to output the loss term L_3 of the third branch;
Step 1.1.6: sum the loss terms L_1, L_2 and L_3 and use the result to train the convolutional neural network.
3. The dynamic scene visual localization method based on image semantic segmentation according to claim 2, characterized in that the image processing performed by the CFF units of steps 1.1.3 and 1.1.4 comprises:
The smaller of the two input feature maps is upsampled by a factor of 2 and then fed separately into a classification convolutional layer and a dilated convolutional layer; the classification convolutional layer has a kernel size of 1*1*1, and the dilated convolutional layer has a kernel size of 3*3*C_3 and a dilation rate of 2. The larger of the two input feature maps is fed into a projection convolutional layer with a kernel size of 1*1*C_3. The outputs of the dilated convolutional layer and the projection convolutional layer are each batch-normalized and then summed; the sum is passed through a ReLU function to output the feature map F_c. The output of the classification convolutional layer and the ground-truth label are substituted into a Softmax function to obtain the loss term of the branch corresponding to the CFF unit.
4. The dynamic scene visual localization method based on image semantic segmentation according to claim 2, characterized in that in step 1.1.6, summing the loss terms L_1, L_2 and L_3 for training the convolutional neural network specifically comprises:
Summing the loss terms L_1, L_2 and L_3 to obtain the final loss term L_total:
$$L_{total} = -\sum_{i=1}^{3} \omega_i \frac{1}{Y_i X_i} \sum_{y=1}^{Y_i} \sum_{x=1}^{X_i} \log \frac{e^{F^{i}_{\hat{n},y,x}}}{\sum_{n=1}^{N} e^{F^{i}_{n,y,x}}}$$
where $i$ is the branch index, $\omega_i$ is the weight of the loss term of each branch, $F^{i}$ is the feature map used to compute the loss function in each branch, $Y_i \times X_i$ is the size of $F^{i}$, $N$ is the preset number of object classes to be segmented in the image, $F^{i}_{n,y,x}$ is the value of feature map $F^{i}$ at position $(n, y, x)$, and $\hat{n}$ is the ground-truth label corresponding to $F^{i}$ at $(y, x)$.
5. The dynamic scene visual positioning method based on image semantic segmentation according to claim 1, characterized in that:
segmenting the dynamic objects in the original image using the convolutional neural network described in step 1 to obtain the semantic image comprises the following steps:
Step 1.2.1, downsampling the original image to 1/4 size and inputting it into a PSPNet, obtaining feature maps of 1/8 and 1/16 size in turn, and finally outputting a feature map F1 of 1/32 size;
Step 1.2.2, downsampling the original image to 1/2 size and inputting it into the PSPNet, obtaining feature maps of 1/4 and 1/8 size in turn, and finally outputting a feature map F2 of 1/16 size;
Step 1.2.3, inputting the feature maps F1 and F2, together with a ground-truth label whose size is 1/16 of the original image, into a first CFF unit for fusion, and outputting a fused feature map F_1 of 1/16 size;
Step 1.2.4, inputting the original image into the PSPNet, obtaining feature maps of 1/2 and 1/4 size in turn, and finally outputting a feature map F3 of 1/8 size; inputting the fused feature map F_1 and the feature map F3 into a second CFF unit for fusion, and outputting a fused feature map F_2 of 1/8 size;
Step 1.2.5, upsampling the fused feature map F_2 to obtain a feature map F_3 of 1/4 size; during testing, upsampling F_3 and outputting a feature map of size 1 (the size of the original image), this feature map being the semantic segmentation map;
Step 1.2.6, binarizing the semantic segmentation map: marking the dynamic objects in the semantic segmentation map with black pixels (value 0) and all other objects with white pixels (value 1), to obtain a black-and-white semantic image i'_t that contains only the dynamic objects;
Step 1.2.7, performing the operations of steps 1.2.1 to 1.2.6 on the image sequence composed of original images, to finally obtain the semantic image sequence I' = {i'_1, i'_2, i'_3, i'_4, ..., i'_t} that contains only the dynamic objects.
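The binarization of steps 1.2.6 and 1.2.7 can be sketched as follows, assuming the segmentation map is available as a 2-D grid of integer class labels. The function names and the class indices used to denote dynamic objects are hypothetical; the claim does not say which classes count as dynamic.

```python
def binarize(label_map, dynamic_classes):
    """Return a mask with 0 (black) for dynamic-object pixels and 1 (white) elsewhere."""
    dynamic = set(dynamic_classes)
    return [[0 if label in dynamic else 1 for label in row] for row in label_map]


def binarize_sequence(label_maps, dynamic_classes):
    """Apply the same binarization to every frame of an image sequence."""
    return [binarize(label_map, dynamic_classes) for label_map in label_maps]
```

For example, if classes 11 and 13 were assumed to stand for pedestrians and cars, `binarize(label_map, {11, 13})` would black out exactly those pixels.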
6. The dynamic scene visual positioning method based on image semantic segmentation according to claim 1, characterized in that:
in step 2, the specific steps of extracting ORB feature points from the original image comprise:
setting the number of features to be extracted according to the complexity of the scene, and using an ORB feature extractor to extract the feature points i_t(x, y) of the input image i_t, where x and y are the horizontal and vertical coordinates of the feature point.
7. The dynamic scene visual positioning method based on image semantic segmentation according to claim 1, characterized in that:
in step 3, the step of rejecting the dynamic-object feature points among the ORB feature points obtained in step 2 according to the semantic image obtained in step 1, and retaining only the static-object feature points, comprises:
for each feature point i_t(x, y) of the original image i_t, determining the corresponding position i'_t(x, y) in its semantic image i'_t;
if i'_t(x, y) = 0, the point is a black pixel, that is, it belongs to a dynamic object, and the rejection operation is performed;
if i'_t(x, y) = 1, the point is a white pixel, that is, it belongs to a static object, and the retention operation is performed.
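The rejection rule above amounts to a mask lookup per feature point. A minimal sketch, assuming feature points are given as (x, y) tuples and the semantic image as a 2-D grid indexed `[row][column]`, that is `[y][x]`. The function name, the rounding of sub-pixel coordinates, and the dropping of out-of-bounds points are assumptions added for illustration.

```python
def keep_static_points(points, semantic_image):
    """Keep only feature points that fall on white (value 1) pixels of the mask."""
    height, width = len(semantic_image), len(semantic_image[0])
    kept = []
    for x, y in points:
        col, row = int(round(x)), int(round(y))  # assumed handling of sub-pixel coordinates
        if not (0 <= row < height and 0 <= col < width):
            continue  # assumption: points outside the mask are discarded
        if semantic_image[row][col] == 1:  # 1 = static object, 0 = dynamic object
            kept.append((x, y))
    return kept
```

Note the index order: x selects the column and y the row, so swapping them silently samples the wrong pixel on non-square images.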
8. The dynamic scene visual positioning method based on image semantic segmentation according to claim 1, characterized in that:
in step 4, locating and tracking the camera motion with a conventional point-feature-based SLAM method, based on the static-object feature points obtained in step 3, specifically comprises:
for the image sequence I = {i_1, i_2, i_3, i_4, ..., i_t}, computing and optimizing the camera pose with a conventional point-feature-based SLAM framework, using the ORB feature points that remain after the rejection in step 3, thereby completing the positioning and tracking of the camera.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910270280.0A CN110084850B (en) | 2019-04-04 | 2019-04-04 | Dynamic scene visual positioning method based on image semantic segmentation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110084850A (en) | 2019-08-02 |
CN110084850B (en) | 2023-05-23 |
Family
ID=67414356
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910270280.0A (Active, CN110084850B) | Dynamic scene visual positioning method based on image semantic segmentation | 2019-04-04 | 2019-04-04 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110084850B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015180368A1 (en) * | 2014-05-27 | 2015-12-03 | 江苏大学 | Variable factor decomposition method for semi-supervised speech features |
CN107169974A (en) * | 2017-05-26 | 2017-09-15 | 中国科学技术大学 | Image segmentation method based on multi-supervised fully convolutional neural network |
CN107833236A (en) * | 2017-10-31 | 2018-03-23 | 中国科学院电子学研究所 | Visual positioning system and method combining semantics in a dynamic environment |
CN109186586A (en) * | 2018-08-23 | 2019-01-11 | 北京理工大学 | Simultaneous localization and hybrid map construction method for dynamic parking environments |
Non-Patent Citations (1)
Title |
---|
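Zheng Tenghui et al.: "Semantic segmentation algorithm for surgical instrument images based on fully convolutional neural networks", Modern Computer (Professional Edition) * |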
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110706269B (en) * | 2019-08-30 | 2021-03-19 | 武汉斌果科技有限公司 | Binocular vision SLAM-based dynamic scene dense modeling method |
CN110706269A (en) * | 2019-08-30 | 2020-01-17 | 武汉斌果科技有限公司 | Binocular vision SLAM-based dynamic scene dense modeling method |
CN110673607A (en) * | 2019-09-25 | 2020-01-10 | 优地网络有限公司 | Feature point extraction method and device in dynamic scene and terminal equipment |
CN110673607B (en) * | 2019-09-25 | 2023-05-16 | 优地网络有限公司 | Feature point extraction method and device under dynamic scene and terminal equipment |
CN110610521A (en) * | 2019-10-08 | 2019-12-24 | 云海桥(北京)科技有限公司 | Positioning system and method adopting distance measurement mark and image recognition matching |
CN110827305A (en) * | 2019-10-30 | 2020-02-21 | 中山大学 | Semantic segmentation and visual SLAM tight coupling method oriented to dynamic environment |
CN110827305B (en) * | 2019-10-30 | 2021-06-08 | 中山大学 | Semantic segmentation and visual SLAM tight coupling method oriented to dynamic environment |
CN111311708A (en) * | 2020-01-20 | 2020-06-19 | 北京航空航天大学 | Visual SLAM method based on semantic optical flow and inverse depth filtering |
CN111340881A (en) * | 2020-02-18 | 2020-06-26 | 东南大学 | Direct method visual positioning method based on semantic segmentation in dynamic scene |
CN111340881B (en) * | 2020-02-18 | 2023-05-19 | 东南大学 | Direct method visual positioning method based on semantic segmentation in dynamic scene |
CN111488882B (en) * | 2020-04-10 | 2020-12-25 | 视研智能科技(广州)有限公司 | High-precision image semantic segmentation method for industrial part measurement |
CN111488882A (en) * | 2020-04-10 | 2020-08-04 | 视研智能科技(广州)有限公司 | High-precision image semantic segmentation method for industrial part measurement |
CN111950561A (en) * | 2020-08-25 | 2020-11-17 | 桂林电子科技大学 | Semantic SLAM dynamic point removing method based on semantic segmentation |
CN112163502A (en) * | 2020-09-24 | 2021-01-01 | 电子科技大学 | Visual positioning method under indoor dynamic scene |
CN112163502B (en) * | 2020-09-24 | 2022-07-12 | 电子科技大学 | Visual positioning method under indoor dynamic scene |
CN112734845A (en) * | 2021-01-08 | 2021-04-30 | 浙江大学 | Outdoor monocular synchronous mapping and positioning method fusing scene semantics |
CN112766136A (en) * | 2021-01-14 | 2021-05-07 | 华南理工大学 | Space parking space detection method based on deep learning |
CN112766136B (en) * | 2021-01-14 | 2024-03-19 | 华南理工大学 | Space parking space detection method based on deep learning |
CN112435278B (en) * | 2021-01-26 | 2021-05-04 | 华东交通大学 | Visual SLAM method and device based on dynamic target detection |
CN112435278A (en) * | 2021-01-26 | 2021-03-02 | 华东交通大学 | Visual SLAM method and device based on dynamic target detection |
CN112967317A (en) * | 2021-03-09 | 2021-06-15 | 北京航空航天大学 | Visual odometry method based on convolutional neural network architecture in dynamic environment |
CN113673524A (en) * | 2021-07-05 | 2021-11-19 | 北京物资学院 | Method and device for removing dynamic characteristic points of warehouse semi-structured environment |
CN113516664A (en) * | 2021-09-02 | 2021-10-19 | 长春工业大学 | Visual SLAM method based on semantic segmentation dynamic points |
Also Published As
Publication number | Publication date |
---|---|
CN110084850B (en) | 2023-05-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110084850A (en) | A kind of dynamic scene vision positioning method based on image, semantic segmentation | |
CN111339903B (en) | Multi-person human body posture estimation method | |
Li et al. | A deep learning-based hybrid framework for object detection and recognition in autonomous driving | |
Garcia-Garcia et al. | A review on deep learning techniques applied to semantic segmentation | |
Yang et al. | Deep detection network for real-life traffic sign in vehicular networks | |
CN107038448B (en) | Target detection model construction method | |
CN106599773B (en) | Deep learning image identification method and system for intelligent driving and terminal equipment | |
CN109035293B (en) | Method suitable for segmenting remarkable human body example in video image | |
Neubert et al. | Superpixel-based appearance change prediction for long-term navigation across seasons | |
Liu et al. | FG-Net: Fast large-scale LiDAR point clouds understanding network leveraging correlated feature mining and geometric-aware modelling | |
CN108734194B (en) | Virtual reality-oriented single-depth-map-based human body joint point identification method | |
CN112200111A (en) | Global and local feature fused occlusion robust pedestrian re-identification method | |
CN106709568A (en) | RGB-D image object detection and semantic segmentation method based on deep convolution network | |
CN105956560A (en) | Vehicle model identification method based on pooling multi-scale depth convolution characteristics | |
CN114037833B (en) | Semantic segmentation method for image of germchit costume | |
CN112950645B (en) | Image semantic segmentation method based on multitask deep learning | |
CN112131908A (en) | Action identification method and device based on double-flow network, storage medium and equipment | |
CN108062569A (en) | Unmanned vehicle driving decision-making method based on infrared and radar | |
CN108921850B (en) | Image local feature extraction method based on image segmentation technology | |
CN111582232A (en) | SLAM method based on pixel-level semantic information | |
CN112434723B (en) | Day/night image classification and object detection method based on attention network | |
CN112381045A (en) | Lightweight human body posture recognition method for mobile terminal equipment of Internet of things | |
CN111027586A (en) | Target tracking method based on novel response map fusion | |
CN112396655A (en) | Point cloud data-based ship target 6D pose estimation method | |
Li | Vehicle detection in foggy weather based on an enhanced YOLO method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||