CN110084850A - Dynamic scene visual localization method based on image semantic segmentation - Google Patents

Dynamic scene visual localization method based on image semantic segmentation

Info

Publication number
CN110084850A
Authority
CN
China
Prior art keywords
size
image
feature map
point
original image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910270280.0A
Other languages
Chinese (zh)
Other versions
CN110084850B (en)
Inventor
潘树国
盛超
曾攀
黄砺枭
赵涛
王帅
高旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN201910270280.0A
Publication of CN110084850A
Application granted
Publication of CN110084850B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dynamic scene visual localization method based on image semantic segmentation, belonging to the field of SLAM (Simultaneous Localization and Mapping). The method first uses supervised deep learning to segment the dynamic objects in the original image, yielding a semantic image. ORB feature points are then extracted from the original image, and those lying on dynamic objects are rejected according to the semantic image. Finally, the camera motion is localized and tracked from the remaining feature points using a point-feature-based monocular SLAM method. Localization results show that, compared with the conventional method, the disclosed method improves localization accuracy in dynamic scenes by 13% to 30%.

Description

Dynamic scene visual localization method based on image semantic segmentation
Technical field
The present invention relates to the application of deep learning in visual SLAM and belongs to the field of SLAM (Simultaneous Localization and Mapping).
Background art
Simultaneous localization and mapping (SLAM) is a key technology for autonomous robot operation in unknown environments. Using the environmental data detected by the robot's external sensors, SLAM constructs a map of the robot's surroundings while giving the robot's position within that map. Compared with ranging devices such as radar and sonar, visual sensors are small, consume little power, and collect rich information, and they provide abundant texture information about the external environment. Visual SLAM has therefore become a focus of current research and is applied in fields such as autonomous navigation and VR/AR.
Traditional point-feature-based visual SLAM algorithms recover scene information and camera motion under a static-environment assumption, so dynamic objects in the scene degrade localization accuracy. At present, such algorithms handle simple dynamic scenes by detecting dynamic points and marking them as outliers. ORB-SLAM reduces the influence of dynamic objects on localization accuracy through RANSAC, the chi-square test, keyframe selection, and a local map. Direct methods handle occlusions caused by dynamic objects by optimizing a cost function. In 2013, researchers proposed a new keyframe representation and update method for adaptively modeling dynamic environments, which effectively detects and handles appearance or structural changes in such environments. In the same year, other researchers introduced pose estimation and mapping across multiple cameras to handle dynamic scenes. Nevertheless, the localization accuracy and robustness of traditional SLAM methods in dynamic scenes still need improvement.
Summary of the invention
The technical problem to be solved by the present invention is as follows:
To improve the localization accuracy and robustness of traditional SLAM methods in dynamic scenes, a dynamic scene visual localization method based on image semantic segmentation is provided, which segments the dynamic objects in the scene and rejects the feature points lying on them.
The present invention adopts the following technical solution to solve the above technical problem:
The present invention proposes a dynamic scene visual localization method based on image semantic segmentation, comprising the following steps:
Step 1: acquire original images, construct a convolutional neural network, and use the convolutional neural network to segment the dynamic objects in the original image, obtaining a semantic image;
Step 2: extract ORB feature points from the original image;
Step 3: according to the semantic image obtained in step 1, reject the dynamic-object feature points among the ORB feature points obtained in step 2, retaining only static-object feature points;
Step 4: based on the static-object feature points obtained in step 3, localize and track the camera motion using a traditional point-feature-based SLAM method.
In the dynamic scene visual localization method based on image semantic segmentation described above, further, constructing the convolutional neural network in step 1 comprises (all feature-map sizes are relative to the original image):
Step 1.1.1: downsample the original image to 1/4 size and input it to PSPNet, obtaining feature maps of size 1/8 and 1/16 in turn and finally outputting a feature map F1 of size 1/32;
Step 1.1.2: downsample the original image to 1/2 size and input it to the PSPNet, obtaining feature maps of size 1/4 and 1/8 in turn and finally outputting a feature map F2 of size 1/16;
Step 1.1.3: input feature maps F1 and F2, together with a ground-truth label 1/16 the size of the original image, to a first CFF unit for fusion, outputting a fused feature map F_1 of size 1/16 and the loss term L_1 of the first branch;
Step 1.1.4: input the original image to the PSPNet, obtaining feature maps of size 1/2 and 1/4 in turn and finally outputting a feature map F3 of size 1/8; input the fused feature map F_1 and feature map F3, together with a ground-truth label 1/8 the size of the original image, to a second CFF unit for fusion, outputting a fused feature map F_2 of size 1/8 and the loss term L_2 of the second branch;
Step 1.1.5: upsample the feature map F_2 to obtain a feature map F_3 of size 1/4; the feature map F_3 is processed with a ground-truth label of size 1/4 to output the loss term L_3 of the third branch;
Step 1.1.6: sum the loss terms L_1, L_2, and L_3 for training the convolutional neural network.
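As an illustrative check of the size ratios in steps 1.1.1 to 1.1.4, the sketch below tracks the scale of each branch's feature maps relative to the original image. It is not part of the disclosed method: the function names and the 1024 × 2048 example resolution are assumptions chosen for illustration, and each branch is modelled simply as three successive halvings of its input.

```python
from fractions import Fraction


def branch_scales():
    """Return, per branch, the feature-map scales relative to the original image."""
    # Input scale of each branch: 1/4 (top), 1/2 (middle), 1 (bottom).
    inputs = (("F1", Fraction(1, 4)), ("F2", Fraction(1, 2)), ("F3", Fraction(1, 1)))
    scales = {}
    for name, input_scale in inputs:
        # Each branch reduces its input by 2, then 4, then 8 overall.
        scales[name] = [input_scale / 2, input_scale / 4, input_scale / 8]
    return scales


def feature_size(height, width, scale):
    """Spatial size of a feature map at the given scale (hypothetical helper)."""
    return int(height * scale), int(width * scale)


scales = branch_scales()
for name, stages in scales.items():
    # Example resolution 1024 x 2048 is an assumption, not taken from the text.
    print(name, [str(s) for s in stages], feature_size(1024, 2048, stages[-1]))
```

Under these assumptions the final scales come out as 1/32, 1/16, and 1/8, matching the sizes stated for F1, F2, and F3.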
In the dynamic scene visual localization method based on image semantic segmentation described above, further, the image processing performed by the CFF units of step 1.1.3 and step 1.1.4 comprises:
The smaller of the two input feature maps is upsampled by a factor of 2 and then fed separately to a classification convolution layer and a dilated convolution layer; the classification convolution layer has a kernel size of 1*1*1, and the dilated convolution layer has a kernel size of 3*3*C_3 and a dilation rate of 2. The larger of the two input feature maps is fed to a projection convolution layer with a kernel size of 1*1*C_3. The outputs of the dilated convolution layer and the projection convolution layer are each batch-normalized and then summed; the sum is passed through a ReLU function to output the feature map F_c. The output of the classification convolution layer and the ground-truth label are substituted into a Softmax function to obtain the loss term of the branch corresponding to the CFF unit.
In the dynamic scene visual localization method based on image semantic segmentation described above, further, summing the loss terms L_1, L_2, and L_3 for training the convolutional neural network in step 1.1.6 specifically comprises:
Summing the loss terms L_1, L_2, and L_3 to obtain the final loss term L_total:
$$L_{total} = -\sum_{i=1}^{3} \omega_i \frac{1}{Y_i X_i} \sum_{y=1}^{Y_i} \sum_{x=1}^{X_i} \log \frac{\exp\left(\tilde{F}^{i}_{\hat{n},y,x}\right)}{\sum_{n=1}^{N} \exp\left(\tilde{F}^{i}_{n,y,x}\right)}$$
where i is the branch index, \omega_i is the weight of the loss term of each branch, \tilde{F}^{i} is the feature map used to compute the loss function in each branch, Y_i × X_i is the size of \tilde{F}^{i}, N is the preset number of object classes to be segmented in the image, \tilde{F}^{i}_{n,y,x} is the value at position (n, y, x) of the feature map \tilde{F}^{i}, and \hat{n} is the ground-truth label corresponding to position (y, x).
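The weighted sum of per-branch Softmax losses described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the function names are hypothetical, each branch is assumed to provide an (N, Y, X) array of class scores with a (Y, X) array of integer ground-truth class ids, and the per-branch loss is taken to be the mean softmax cross-entropy over all pixels.

```python
import numpy as np


def branch_loss(scores, labels):
    """Mean softmax cross-entropy of one branch.

    scores: (N, Y, X) class scores; labels: (Y, X) integer class ids in [0, N).
    """
    # Subtract the per-pixel maximum for numerical stability.
    shifted = scores - scores.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Pick the log-probability of the ground-truth class at every pixel.
    y_idx, x_idx = np.indices(labels.shape)
    picked = log_probs[labels, y_idx, x_idx]
    return -picked.mean()


def total_loss(branch_scores, branch_labels, weights):
    """Weighted sum of the branch losses (weights are assumed hyperparameters)."""
    return sum(
        w * branch_loss(s, l)
        for s, l, w in zip(branch_scores, branch_labels, weights)
    )
```

The branch weights are left as inputs because the text does not state their values.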
In the dynamic scene visual localization method based on image semantic segmentation described above, further, segmenting the dynamic objects in the original image with the convolutional neural network in step 1 to obtain a semantic image comprises the following steps:
Step 1.2.1: downsample the original image to 1/4 size and input it to PSPNet, obtaining feature maps of size 1/8 and 1/16 in turn and finally outputting a feature map F1 of size 1/32;
Step 1.2.2: downsample the original image to 1/2 size and input it to the PSPNet, obtaining feature maps of size 1/4 and 1/8 in turn and finally outputting a feature map F2 of size 1/16;
Step 1.2.3: input feature maps F1 and F2, together with a ground-truth label 1/16 the size of the original image, to the first CFF unit for fusion, outputting a fused feature map F_1 of size 1/16;
Step 1.2.4: input the original image to the PSPNet, obtaining feature maps of size 1/2 and 1/4 in turn and finally outputting a feature map F3 of size 1/8; input the fused feature map F_1 and feature map F3 to the second CFF unit for fusion, outputting a fused feature map F_2 of size 1/8;
Step 1.2.5: upsample the feature map F_2 to obtain a feature map F_3 of size 1/4; at test time, upsample F_3 to output a feature map of size 1 (the size of the original image), which is the semantic segmentation map;
Step 1.2.6: binarize the semantic segmentation map: dynamic objects in the semantic segmentation map are marked with black pixels (value 0) and all other objects with white pixels (value 1), yielding a black-and-white semantic image i'_t that contains only the dynamic objects;
Step 1.2.7: apply steps 1.2.1 to 1.2.6 to every image of the image sequence composed of original images, finally obtaining a semantic image sequence I' = {i'_1, i'_2, i'_3, i'_4, ..., i'_t} containing only dynamic objects.
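The binarization of step 1.2.6 can be sketched as below. This is an illustration only: it assumes the segmentation map is available as a 2-D array of integer class ids, and the ids listed in DYNAMIC_CLASS_IDS are hypothetical placeholders for whatever ids the network assigns to pedestrians and cars.

```python
import numpy as np

# Hypothetical class ids for dynamic objects; the real ids depend on the
# label set the segmentation network was trained with.
DYNAMIC_CLASS_IDS = (11, 13)


def binarize(class_map, dynamic_ids=DYNAMIC_CLASS_IDS):
    """Return a mask with 0 on dynamic-object pixels and 1 everywhere else."""
    is_dynamic = np.isin(class_map, dynamic_ids)
    return np.where(is_dynamic, 0, 1).astype(np.uint8)


def binarize_sequence(class_maps, dynamic_ids=DYNAMIC_CLASS_IDS):
    """Apply the binarization to every segmentation map of a sequence."""
    return [binarize(m, dynamic_ids) for m in class_maps]
```

Keeping the dynamic ids as a parameter makes it easy to treat further classes as dynamic without touching the masking logic.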
In the dynamic scene visual localization method based on image semantic segmentation described above, further, extracting ORB feature points from the original image in step 2 specifically comprises:
Setting the number of features to extract according to the complexity of the scene, and using an ORB feature extractor to extract the feature points i_t(x, y) from the input image i_t, where x and y are the horizontal and vertical coordinates of a feature point.
In the dynamic scene visual localization method based on image semantic segmentation described above, further, rejecting in step 3 the dynamic-object feature points among the ORB feature points obtained in step 2 according to the semantic image obtained in step 1, retaining only static-object feature points, comprises:
For each feature point i_t(x, y) of the original image i_t, determine the corresponding position i'_t(x, y) in its semantic image i'_t;
If i'_t(x, y) = 0, the point is a black pixel, i.e., it belongs to a dynamic object, and is rejected;
If i'_t(x, y) = 1, the point is a white pixel, i.e., it belongs to a static object, and is retained.
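The rejection rule above amounts to a mask lookup at each feature-point location. The following sketch assumes feature points are given as (x, y) pixel coordinates and that the semantic image is a 2-D array with 0 for dynamic objects and 1 elsewhere; the function name is hypothetical, and rounding sub-pixel coordinates to the nearest pixel is an assumption not stated in the text.

```python
import numpy as np


def reject_dynamic_points(points_xy, mask):
    """Keep only feature points that fall on static (value 1) mask pixels.

    points_xy: iterable of (x, y) coordinates; mask: (H, W) array of 0/1.
    """
    pts = np.asarray(points_xy, dtype=float).reshape(-1, 2)
    height, width = mask.shape
    # Round to the nearest pixel and clamp to the image bounds.
    cols = np.clip(np.rint(pts[:, 0]).astype(int), 0, width - 1)
    rows = np.clip(np.rint(pts[:, 1]).astype(int), 0, height - 1)
    # Images are indexed as mask[row, column], i.e. mask[y, x].
    keep = mask[rows, cols] == 1
    return pts[keep]
```

Note the index order: a point with coordinates (x, y) is looked up at mask[y, x], which is a common source of bugs when mixing keypoint and array conventions.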
In the dynamic scene visual localization method based on image semantic segmentation described above, further, in step 4, localizing and tracking the camera motion based on the static-object feature points obtained in step 3 using a traditional point-feature-based SLAM method is specifically:
For the image sequence I = {i_1, i_2, i_3, i_4, ..., i_t}, based on the ORB feature points remaining after the rejection in step 3, a traditional point-feature-based SLAM framework computes and optimizes the camera pose, completing the localization and tracking of the camera.
Compared with the prior art, the present invention, by adopting the above technical solution, has the following technical effects:
1. The present invention first uses supervised deep learning to segment the dynamic objects in the original image, obtaining a semantic image; on this basis, ORB feature points are extracted from the original image and the dynamic-object feature points are rejected according to the semantic image, thereby improving the localization accuracy and robustness of traditional SLAM methods in dynamic scenes;
2. The localization results of the proposed method are better than those of traditional ORB-SLAM, with localization accuracy improved by 13% to 30%.
Brief description of the drawings
Fig. 1 is a flowchart of the method;
Fig. 2 shows the structure of the image semantic segmentation network of the method;
Fig. 3 shows the structure of the cascade feature fusion unit of the method;
Fig. 4 is a flowchart of the dynamic object segmentation of the method;
Fig. 5 shows image semantic segmentation results of the method;
Fig. 6 shows dynamic-object feature point rejection results of the method;
Fig. 7 shows plan views of the localization trajectories of the method and of complete ORB-SLAM on four sequences;
Fig. 8 shows plan views of the localization trajectories of the method and of incomplete ORB-SLAM on four sequences.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless defined as here, will not be interpreted in an idealized or overly formal sense.
With the development of deep learning, researchers have explored the semantic information of images to improve the performance of visual SLAM. Semantic segmentation is a fundamental task in computer vision, in which the visual input is partitioned into different semantically interpretable categories. The present invention proposes a dynamic scene visual localization method based on image semantic segmentation, which aims to improve SLAM localization accuracy in dynamic scenes by rejecting dynamic-object feature points while also obtaining rich semantic information about the scene.
The present invention proposes a dynamic scene visual localization method based on image semantic segmentation; Fig. 1 is a flowchart of the method and Fig. 4 is a flowchart of its dynamic object segmentation. The method first uses supervised deep learning to segment the dynamic objects in the original image, obtaining a semantic image. On this basis, ORB feature points are extracted from the original image and the dynamic-object feature points are rejected according to the semantic image. Finally, the camera motion is localized and tracked from the remaining feature points using a point-feature-based monocular SLAM method.
Step 1: construct a convolutional neural network and segment the dynamic objects in the original image to obtain a semantic image:
Step 1.1: construct the convolutional neural network for semantic segmentation
The structure of the constructed neural network is shown in Fig. 2. The network comprises three branches: top, middle, and bottom. The numbers in brackets are size ratios relative to the original input image; 'CFF' denotes a cascade feature fusion unit. The first three layers of the top and middle branches share the same parameters.
The network structure is described in further detail below:
Cascade image input: In the top branch of the network shown in Fig. 2, the original image is first downsampled to 1/4 size and then input to PSPNet, which outputs a feature map of 1/32 size. This is a coarse segmentation result that lacks many details and boundaries. The middle and bottom branches use the 1/2-size image and the original image to recover and refine the details of this coarse result. Although the segmentation result of the top branch is coarse, it contains rich semantic information, so the middle and bottom branch networks used for detail recovery and refinement are lightweight. Cascade feature fusion (CFF) units fuse the output feature maps of the different branches, and cascade label guidance strengthens the learning of each branch.
Cascade feature fusion: Fig. 3 shows the structure of the cascade feature fusion unit, where F1 and F2 are the feature maps output by different branches and the spatial size of F2 is twice that of F1. The unit fuses the feature maps output by different branches. Its inputs are the two feature maps F1 and F2 and a ground-truth label; F1 has size Y_1 × X_1 × C_1, F2 has size Y_2 × X_2 × C_2, and the label has size Y_2 × X_2 × 1, matching the spatial size of F2. F1 is first upsampled by a factor of 2, giving a feature map with the same spatial size as F2. A dilated convolution layer with kernel size 3 × 3 × C_3 and dilation rate 2 then refines the upsampled feature map, so the size of F1 becomes Y_2 × X_2 × C_3. F2 passes through a convolution layer with kernel size 1 × 1 × C_3, giving a feature map of size Y_2 × X_2 × C_3. The two outputs are each batch-normalized and then passed through a summation layer and a ReLU layer, finally producing the fused feature map F2'.
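The data flow of the cascade feature fusion unit can be sketched in NumPy as follows. This is a simplified illustration under several assumptions, not the disclosed implementation: nearest-neighbour interpolation is used for the 2x upsampling (the text does not name the interpolation), a per-channel normalization without learned scale and shift stands in for batch normalization, the weights are supplied by the caller, and the auxiliary classification layer used for the loss is omitted. All function names are hypothetical.

```python
import numpy as np


def upsample2x(f):
    """Nearest-neighbour 2x upsampling of a (C, Y, X) feature map."""
    return f.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(f, w):
    """1x1 convolution; w has shape (C_out, C_in)."""
    return np.einsum("oc,cyx->oyx", w, f)


def dilated_conv3x3(f, w, rate=2):
    """3x3 convolution with dilation `rate` and zero 'same' padding.

    f: (C_in, Y, X); w: (C_out, C_in, 3, 3).
    """
    _, y, x = f.shape
    padded = np.pad(f, ((0, 0), (rate, rate), (rate, rate)))
    out = np.zeros((w.shape[0], y, x))
    for i in range(3):
        for j in range(3):
            # Tap (i, j) reads the input shifted by (i - 1, j - 1) * rate.
            patch = padded[:, i * rate:i * rate + y, j * rate:j * rate + x]
            out += np.einsum("oc,cyx->oyx", w[:, :, i, j], patch)
    return out


def normalize(f, eps=1e-5):
    """Per-channel normalization standing in for batch normalization."""
    mean = f.mean(axis=(1, 2), keepdims=True)
    var = f.var(axis=(1, 2), keepdims=True)
    return (f - mean) / np.sqrt(var + eps)


def cff(f1, f2, w_dilated, w_proj):
    """Fuse a low-resolution map f1 with a map f2 of twice its spatial size."""
    refined = dilated_conv3x3(upsample2x(f1), w_dilated)  # (C3, Y2, X2)
    projected = conv1x1(f2, w_proj)                       # (C3, Y2, X2)
    # Normalize each path, sum, then apply ReLU.
    return np.maximum(normalize(refined) + normalize(projected), 0.0)
```

With f1 of shape (C_1, Y_1, X_1) and f2 of shape (C_2, 2·Y_1, 2·X_1), the fused output has shape (C_3, 2·Y_1, 2·X_1), which mirrors the size bookkeeping in the paragraph above.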
Cascade label guidance: In the network structure shown in Fig. 2, three ground-truth labels of different sizes (1/16, 1/8, and 1/4 of the original image) are used to generate three independent loss terms in the top, middle, and bottom branches of the network, and the three loss terms are summed to obtain the final loss term:
$$L_{total} = -\sum_{t=1}^{3} \omega_t \frac{1}{Y_t X_t} \sum_{y=1}^{Y_t} \sum_{x=1}^{X_t} \log \frac{\exp\left(F^{t}_{\hat{n},y,x}\right)}{\sum_{n=1}^{N} \exp\left(F^{t}_{n,y,x}\right)}$$
where \omega_t is the weight of the loss term of each branch, F^{t} is the feature map output by each branch, Y_t × X_t is the size of F^{t}, N is the preset number of object classes to be segmented in the image, F^{t}_{n,y,x} is the value at position (n, y, x) of the feature map F^{t}, and \hat{n} is the ground-truth label corresponding to position (y, x).
Step 1.2: segment the dynamic objects in the original input image:
Fig. 4 illustrates how this step is carried out. For a given image sequence I = {i_1, i_2, i_3, i_4, ..., i_t}, where i_t is the image captured by the camera at time t:
(1) Input an image i_t to the semantic segmentation network constructed in step 1.1, which outputs a segmented color semantic image in which the pixels of objects such as cars, pedestrians, buildings, and signs are labeled in different colors;
(2) Binarize the semantic image from (1): the dynamic objects in the image (pedestrians, cars) are marked with black pixels (value 0) and all other objects with white pixels (value 1), yielding a black-and-white semantic image i'_t containing only the dynamic objects;
(3) Repeat steps (1) and (2) for every image in the image sequence I;
This finally yields the semantic image sequence I' = {i'_1, i'_2, i'_3, i'_4, ..., i'_t} containing only dynamic objects.
Step 2: extract ORB feature points from the original image:
Set the number of features to extract according to the complexity of the scene, and use an ORB feature extractor to extract the feature points i_t(x, y) from the input image i_t, where x and y are the horizontal and vertical coordinates of a feature point.
Step 3: reject the dynamic-object feature points according to the semantic image, retaining only static-object feature points:
(1) For each feature point i_t(x, y) of i_t, determine the corresponding position i'_t(x, y) in the semantic image i'_t;
(2) If i'_t(x, y) = 0, the point is a black pixel, i.e., it belongs to a dynamic object, and is rejected;
(3) If i'_t(x, y) = 1, the point is a white pixel, i.e., it belongs to a static object, and is retained.
Step 4: based on the ORB feature points remaining after the rejection in step 3, localize and track the camera using a traditional point-feature-based SLAM framework:
For the image sequence I = {i_1, i_2, i_3, i_4, ..., i_t}, based on the ORB feature points remaining after the rejection in step 3, a traditional point-feature-based SLAM framework computes and optimizes the camera pose, completing the localization and tracking of the camera.
Embodiment 1
The present invention is evaluated on the Frankfurt monocular image sequence, which is part of the Cityscapes dataset. The full Frankfurt sequence provides more than 100,000 frames of outdoor images together with localization results that can serve as ground truth. The sequence is divided into several shorter sequences, each containing 1300 to 2500 frames with dynamic objects such as moving cars or pedestrians. The experimental platform is configured with an Intel Xeon E5-2690 v4, 128 GB of RAM, and an NVIDIA Titan V GPU.
The sequences extracted from the original Frankfurt sequence are as follows:
Seq.01: frankfurt_000001_054140_leftImg8bit.png to frankfurt_000001_056555_leftImg8bit.png
Seq.02: frankfurt_000001_012745_leftImg8bit.png to frankfurt_000001_014100_leftImg8bit.png
Seq.03: frankfurt_000001_003311_leftImg8bit.png to frankfurt_000001_005555_leftImg8bit.png
Seq.04: frankfurt_000001_010580_leftImg8bit.png to frankfurt_000001_012739_leftImg8bit.png
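As a quick consistency check on the sequence boundaries listed above, the sketch below parses the frame index from the first and last file name of each sequence and counts the frames in between. It assumes that frame indices are consecutive within a sequence, which the text does not state explicitly; the helper names are hypothetical.

```python
import re

# First and last file of each sequence, as listed above.
SEQUENCES = {
    "Seq.01": ("frankfurt_000001_054140_leftImg8bit.png",
               "frankfurt_000001_056555_leftImg8bit.png"),
    "Seq.02": ("frankfurt_000001_012745_leftImg8bit.png",
               "frankfurt_000001_014100_leftImg8bit.png"),
    "Seq.03": ("frankfurt_000001_003311_leftImg8bit.png",
               "frankfurt_000001_005555_leftImg8bit.png"),
    "Seq.04": ("frankfurt_000001_010580_leftImg8bit.png",
               "frankfurt_000001_012739_leftImg8bit.png"),
}

FILENAME_PATTERN = re.compile(r"frankfurt_\d{6}_(\d{6})_leftImg8bit\.png")


def frame_index(filename):
    """Extract the six-digit frame index from a file name."""
    match = FILENAME_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return int(match.group(1))


def frame_count(first, last):
    """Number of frames from first to last, assuming consecutive indices."""
    return frame_index(last) - frame_index(first) + 1


counts = {name: frame_count(*files) for name, files in SEQUENCES.items()}
print(counts)
```

Under the consecutive-index assumption the four sequences contain 2416, 1356, 2245, and 2160 frames, which is consistent with the stated range of 1300 to 2500 frames per sequence.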
Fig. 5 shows the semantic segmentation results. The middle column shows that the trees, buildings, roads, traffic signs, and other objects in the scene are finely segmented. The right column keeps only the segmentation results for dynamic objects (cars and pedestrians). Although the boundaries are not perfectly accurate, the results are sufficient for rejecting feature points.
Fig. 6 shows the results of dynamic-object feature point rejection. The white car is a dynamic object travelling on the road. The two images in the left column show the result before rejection, where many feature points lie on the moving vehicle. The right column shows the result after rejection, where the feature points on the car have been completely removed.
Fig. 7 shows plan views of the localization trajectories of the proposed method built on complete ORB-SLAM and of complete ORB-SLAM itself on the four video sequences Seq.01, Seq.02, Seq.03, and Seq.04. The four plots show that the trajectory obtained by the proposed method (Ours) deviates less from the ground-truth trajectory (Ground Truth) than the trajectory computed by complete ORB-SLAM (ORB-SLAM Full). Because Seq.01 contains many moving vehicles and pedestrians, the results of both methods deviate substantially from the ground truth there, but the proposed method is still more accurate than complete ORB-SLAM. Because the system tracks position on keyframes, the trajectories are discontinuous in places.
Complete ORB-SLAM uses the chi-square test, which reduces the influence of dynamic feature points on localization accuracy to some extent. Fig. 8 shows plan views of the localization trajectories of the proposed method built on incomplete ORB-SLAM (with the chi-square test removed) and of incomplete ORB-SLAM itself on the four video sequences Seq.01, Seq.02, Seq.03, and Seq.04. The four plots show that the trajectory obtained by the proposed method (Ours) deviates less from the ground-truth trajectory (Ground Truth) than the trajectory computed by incomplete ORB-SLAM (ORB-SLAM Incomplete). Because the scene in Seq.02 contains many pedestrians and therefore a large number of dynamic feature points, incomplete ORB-SLAM fails completely on that sequence, which demonstrates that the proposed method is more robust. Because the system tracks position on keyframes, the trajectories are discontinuous in places.
Finally, the localization results of the four image sequences under complete ORB-SLAM, incomplete ORB-SLAM, and the proposed method are given. Tables 1 and 2 show that the localization results of the proposed method are better than those of traditional ORB-SLAM, with localization accuracy improved by 13% to 30%.
Table 1: Comparison of the localization results of the two methods on image sequences Seq.01 to Seq.04
Table 2: Comparison of the localization results of the two methods on image sequences Seq.01 to Seq.04
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (8)

1. A dynamic scene visual localization method based on image semantic segmentation, characterized by comprising the following steps:
Step 1: acquire original images, construct a convolutional neural network, and use the convolutional neural network to segment the dynamic objects in the original image, obtaining a semantic image;
Step 2: extract ORB feature points from the original image;
Step 3: according to the semantic image obtained in step 1, reject the dynamic-object feature points among the ORB feature points obtained in step 2, retaining only static-object feature points;
Step 4: based on the static-object feature points obtained in step 3, localize and track the camera motion using a traditional point-feature-based SLAM method.
2. The dynamic scene visual localization method based on image semantic segmentation according to claim 1, characterized in that, in step 1, constructing the convolutional neural network comprises:
Step 1.1.1: downsample the original image to 1/4 size and input it to PSPNet, obtaining feature maps of size 1/8 and 1/16 in turn and finally outputting a feature map F1 of size 1/32;
Step 1.1.2: downsample the original image to 1/2 size and input it to the PSPNet, obtaining feature maps of size 1/4 and 1/8 in turn and finally outputting a feature map F2 of size 1/16;
Step 1.1.3: input feature maps F1 and F2, together with a ground-truth label 1/16 the size of the original image, to a first CFF unit for fusion, outputting a fused feature map F_1 of size 1/16 and the loss term L_1 of the first branch;
Step 1.1.4: input the original image to the PSPNet, obtaining feature maps of size 1/2 and 1/4 in turn and finally outputting a feature map F3 of size 1/8; input the fused feature map F_1 and feature map F3, together with a ground-truth label 1/8 the size of the original image, to a second CFF unit for fusion, outputting a fused feature map F_2 of size 1/8 and the loss term L_2 of the second branch;
Step 1.1.5: upsample the feature map F_2 to obtain a feature map F_3 of size 1/4; the feature map F_3 is processed with a ground-truth label of size 1/4 to output the loss term L_3 of the third branch;
Step 1.1.6: sum the loss terms L_1, L_2, and L_3 for training the convolutional neural network.
3. The dynamic scene visual localization method based on image semantic segmentation according to claim 2, characterized in that the image processing performed by the CFF units of step 1.1.3 and step 1.1.4 comprises:
The smaller of the two input feature maps is upsampled by a factor of 2 and then fed separately to a classification convolution layer and a dilated convolution layer; the classification convolution layer has a kernel size of 1*1*1, and the dilated convolution layer has a kernel size of 3*3*C_3 and a dilation rate of 2. The larger of the two input feature maps is fed to a projection convolution layer with a kernel size of 1*1*C_3. The outputs of the dilated convolution layer and the projection convolution layer are each batch-normalized and then summed; the sum is passed through a ReLU function to output the feature map F_c. The output of the classification convolution layer and the ground-truth label are substituted into a Softmax function to obtain the loss term of the branch corresponding to the CFF unit.
4. The dynamic scene visual positioning method based on image semantic segmentation according to claim 2, characterized in that summing the loss terms L_1, L_2, and L_3 in step 1.1.6 to train the convolutional neural network specifically comprises:
The loss terms L_1, L_2, and L_3 are combined in a weighted sum to obtain the final loss term L_total,
where i is the branch index; ω_i is the weight of the loss term of branch i; Y_i×X_i is the size of the feature map used to compute the loss function in branch i; N is the preset number of object classes to be segmented in the image; the loss is evaluated from the value of that feature map at position (n, y, x) and from the corresponding ground-truth label at (y, x).
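The formula for L_total is not reproduced in this text. One form consistent with the definitions above, assuming the weighted softmax cross-entropy commonly used for cascaded segmentation branches, is sketched below. The symbols \hat{F}_i (the feature map of branch i used for the loss) and \hat{n} (the ground-truth label at (y, x)) are placeholders introduced here and may differ from the patent's own notation.

```latex
L_{total} = -\sum_{i=1}^{3} \omega_i \,\frac{1}{Y_i X_i}
            \sum_{y=1}^{Y_i} \sum_{x=1}^{X_i}
            \log \frac{\exp\!\left(\hat{F}_i^{\,\hat{n},y,x}\right)}
                      {\sum_{n=1}^{N} \exp\!\left(\hat{F}_i^{\,n,y,x}\right)}
```

Under this reading, each inner double sum is the per-pixel softmax loss of one branch, averaged over its Y_i×X_i positions, and ω_i sets how strongly that branch contributes to training.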
5. The dynamic scene visual positioning method based on image semantic segmentation according to claim 1, characterized in that segmenting the dynamic objects in the original image with the convolutional neural network in step 1 to obtain the semantic image comprises the following steps:
Step 1.2.1: the original image is downsampled to 1/4 and input into the PSPNet, which produces feature maps of 1/8 and 1/16 size in turn and finally outputs a branch feature map F1 of 1/32 size;
Step 1.2.2: the original image is downsampled to 1/2 and input into the PSPNet, which produces feature maps of 1/4 and 1/8 size in turn and finally outputs a branch feature map F2 of 1/16 size;
Step 1.2.3: the feature maps F1 and F2 and the ground-truth label at 1/16 of the original image size are input into the first CFF unit for fusion, which outputs a fused feature map F_1 of 1/16 size;
Step 1.2.4: the original image is input into the PSPNet, which produces feature maps of 1/2 and 1/4 size in turn and finally outputs a branch feature map F3 of 1/8 size; the feature map F_1 and the feature map F3 are input into the second CFF unit for fusion, which outputs a fused feature map F_2 of 1/8 size;
Step 1.2.5: the feature map F_2 is upsampled to obtain a feature map F_3 of 1/4 size; during testing, F_3 is upsampled again to output a feature map of size 1 (the original image size), which is the semantic segmentation map;
Step 1.2.6: the semantic segmentation map is binarized: dynamic objects in the segmentation map are marked with black pixels (value 0) and all other objects with white pixels (value 1), giving a black-and-white semantic image i'_t in which only the dynamic objects are marked;
Step 1.2.7: steps 1.2.1 to 1.2.6 are applied to the image sequence composed of original images, finally giving the semantic image sequence I' = {i'_1, i'_2, i'_3, i'_4, ..., i'_t}, which contains only dynamic objects.
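The binarization in step 1.2.6 and its application to a whole sequence in step 1.2.7 can be sketched as follows. The set of class ids treated as dynamic (`DYNAMIC_CLASSES`) and the class numbering are hypothetical; the source does not list them.

```python
# Hypothetical class ids for dynamic objects (for example person, car); not from the source.
DYNAMIC_CLASSES = {11, 13}

def binarize(segmentation, dynamic_classes=DYNAMIC_CLASSES):
    """Map a per-pixel class-id image to a mask: 0 (black) = dynamic, 1 (white) = other."""
    return [[0 if cls in dynamic_classes else 1 for cls in row] for row in segmentation]

def binarize_sequence(segmentations):
    """Apply the same binarization to every frame, giving the sequence I'."""
    return [binarize(seg) for seg in segmentations]

# Toy 3x3 segmentation map holding class ids.
seg = [[0, 11, 11],
       [0, 0, 13],
       [2, 2, 2]]
mask = binarize(seg)
masks = binarize_sequence([seg, seg])
```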
6. The dynamic scene visual positioning method based on image semantic segmentation according to claim 1, characterized in that, in step 2, extracting the ORB feature points from the original image specifically comprises:
The number of features to extract is set according to the complexity of the scene, and an ORB feature extractor is used to extract the feature points i_t(x, y) of the input image i_t, where x and y are the horizontal and vertical coordinates of the feature point.
7. The dynamic scene visual positioning method based on image semantic segmentation according to claim 1, characterized in that, in step 3, rejecting the dynamic-object feature points among the ORB feature points obtained in step 2 according to the semantic image obtained in step 1, and retaining only the static-object feature points, comprises:
For each feature point i_t(x, y) of the original image i_t, the corresponding position i'_t(x, y) is located in its semantic image i'_t;
If i'_t(x, y) = 0, the point is a black pixel, i.e. it belongs to a dynamic object, and the feature point is rejected;
If i'_t(x, y) = 1, the point is a white pixel, i.e. it belongs to a static object, and the feature point is retained.
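The rejection rule above amounts to a mask lookup per feature point. A minimal sketch, assuming feature points are given as (x, y) tuples and the semantic image as a nested list indexed [row][column]; rounding sub-pixel coordinates and discarding out-of-range points are choices made here, not requirements from the source.

```python
def filter_static_keypoints(keypoints, mask):
    """Keep keypoints whose mask pixel is 1 (static); drop those on 0 (dynamic)."""
    kept = []
    for x, y in keypoints:
        col, row = int(round(x)), int(round(y))
        inside = 0 <= row < len(mask) and 0 <= col < len(mask[0])
        if inside and mask[row][col] == 1:
            kept.append((x, y))
    return kept

# Toy semantic image: 0 marks dynamic-object pixels, 1 marks everything else.
mask = [[1, 0, 0],
        [1, 1, 0],
        [1, 1, 1]]
keypoints = [(0.0, 0.0), (1.2, 0.0), (2.0, 1.0), (1.0, 2.0)]
static_points = filter_static_keypoints(keypoints, mask)
print(static_points)
```

Note the index order: x is the column and y is the row, so the lookup is `mask[row][col]`, not `mask[x][y]`.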
8. The dynamic scene visual positioning method based on image semantic segmentation according to claim 1, characterized in that, in step 4, positioning and tracking the camera motion from the static-object feature points obtained in step 3 with a conventional point-feature-based SLAM method specifically comprises:
For the image sequence I = {i_1, i_2, i_3, i_4, ..., i_t}, the camera pose is computed and optimized from the ORB feature points remaining after the rejection in step 3, using a conventional point-feature-based SLAM framework, thereby completing the positioning and tracking of the camera.
CN201910270280.0A 2019-04-04 2019-04-04 Dynamic scene visual positioning method based on image semantic segmentation Active CN110084850B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910270280.0A CN110084850B (en) 2019-04-04 2019-04-04 Dynamic scene visual positioning method based on image semantic segmentation

Publications (2)

Publication Number Publication Date
CN110084850A (en) 2019-08-02
CN110084850B (en) 2023-05-23

Family

ID=67414356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910270280.0A Active CN110084850B (en) 2019-04-04 2019-04-04 Dynamic scene visual positioning method based on image semantic segmentation

Country Status (1)

Country Link
CN (1) CN110084850B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015180368A1 (en) * 2014-05-27 2015-12-03 江苏大学 Variable factor decomposition method for semi-supervised speech features
CN107169974A (en) * 2017-05-26 2017-09-15 中国科学技术大学 Image segmentation method based on a multi-supervised fully convolutional neural network
CN107833236A (en) * 2017-10-31 2018-03-23 中国科学院电子学研究所 Visual positioning system and method combining semantics in a dynamic environment
CN109186586A (en) * 2018-08-23 2019-01-11 北京理工大学 Simultaneous localization and hybrid map construction method for dynamic parking environments

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
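郑腾辉 (Zheng Tenghui) et al.: "Semantic segmentation algorithm for surgical instrument images based on fully convolutional neural networks", 《现代计算机(专业版)》 (Modern Computer, Professional Edition) *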

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110706269B (en) * 2019-08-30 2021-03-19 武汉斌果科技有限公司 Binocular vision SLAM-based dynamic scene dense modeling method
CN110706269A (en) * 2019-08-30 2020-01-17 武汉斌果科技有限公司 Binocular vision SLAM-based dynamic scene dense modeling method
CN110673607A (en) * 2019-09-25 2020-01-10 优地网络有限公司 Feature point extraction method and device in dynamic scene and terminal equipment
CN110673607B (en) * 2019-09-25 2023-05-16 优地网络有限公司 Feature point extraction method and device under dynamic scene and terminal equipment
CN110610521A (en) * 2019-10-08 2019-12-24 云海桥(北京)科技有限公司 Positioning system and method adopting distance measurement mark and image recognition matching
CN110827305A (en) * 2019-10-30 2020-02-21 中山大学 Semantic segmentation and visual SLAM tight coupling method oriented to dynamic environment
CN110827305B (en) * 2019-10-30 2021-06-08 中山大学 Semantic segmentation and visual SLAM tight coupling method oriented to dynamic environment
CN111311708A (en) * 2020-01-20 2020-06-19 北京航空航天大学 Visual SLAM method based on semantic optical flow and inverse depth filtering
CN111340881A (en) * 2020-02-18 2020-06-26 东南大学 Direct method visual positioning method based on semantic segmentation in dynamic scene
CN111340881B (en) * 2020-02-18 2023-05-19 东南大学 Direct method visual positioning method based on semantic segmentation in dynamic scene
CN111488882B (en) * 2020-04-10 2020-12-25 视研智能科技(广州)有限公司 High-precision image semantic segmentation method for industrial part measurement
CN111488882A (en) * 2020-04-10 2020-08-04 视研智能科技(广州)有限公司 High-precision image semantic segmentation method for industrial part measurement
CN111950561A (en) * 2020-08-25 2020-11-17 桂林电子科技大学 Semantic SLAM dynamic point removing method based on semantic segmentation
CN112163502A (en) * 2020-09-24 2021-01-01 电子科技大学 Visual positioning method under indoor dynamic scene
CN112163502B (en) * 2020-09-24 2022-07-12 电子科技大学 Visual positioning method under indoor dynamic scene
CN112734845A (en) * 2021-01-08 2021-04-30 浙江大学 Outdoor monocular synchronous mapping and positioning method fusing scene semantics
CN112766136A (en) * 2021-01-14 2021-05-07 华南理工大学 Space parking space detection method based on deep learning
CN112766136B (en) * 2021-01-14 2024-03-19 华南理工大学 Space parking space detection method based on deep learning
CN112435278B (en) * 2021-01-26 2021-05-04 华东交通大学 Visual SLAM method and device based on dynamic target detection
CN112435278A (en) * 2021-01-26 2021-03-02 华东交通大学 Visual SLAM method and device based on dynamic target detection
CN112967317A (en) * 2021-03-09 2021-06-15 北京航空航天大学 Visual odometry method based on convolutional neural network architecture in dynamic environment
CN113673524A (en) * 2021-07-05 2021-11-19 北京物资学院 Method and device for removing dynamic characteristic points of warehouse semi-structured environment
CN113516664A (en) * 2021-09-02 2021-10-19 长春工业大学 Visual SLAM method based on semantic segmentation dynamic points

Also Published As

Publication number Publication date
CN110084850B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
CN110084850A (en) Dynamic scene visual positioning method based on image semantic segmentation
CN111339903B (en) Multi-person human body posture estimation method
Li et al. A deep learning-based hybrid framework for object detection and recognition in autonomous driving
Garcia-Garcia et al. A review on deep learning techniques applied to semantic segmentation
Yang et al. Deep detection network for real-life traffic sign in vehicular networks
CN107038448B (en) Target detection model construction method
CN106599773B (en) Deep learning image identification method and system for intelligent driving and terminal equipment
CN109035293B (en) Method suitable for segmenting remarkable human body example in video image
Neubert et al. Superpixel-based appearance change prediction for long-term navigation across seasons
Liu et al. FG-Net: Fast large-scale LiDAR point clouds understanding network leveraging correlated feature mining and geometric-aware modelling
CN108734194B (en) Virtual reality-oriented single-depth-map-based human body joint point identification method
CN112200111A (en) Global and local feature fused occlusion robust pedestrian re-identification method
CN106709568A (en) RGB-D image object detection and semantic segmentation method based on deep convolution network
CN105956560A (en) Vehicle model identification method based on pooling multi-scale depth convolution characteristics
CN114037833B (en) Semantic segmentation method for image of germchit costume
CN112950645B (en) Image semantic segmentation method based on multitask deep learning
CN112131908A (en) Action identification method and device based on double-flow network, storage medium and equipment
CN108062569A (en) Unmanned vehicle driving decision-making method based on infrared and radar
CN108921850B (en) Image local feature extraction method based on image segmentation technology
CN111582232A (en) SLAM method based on pixel-level semantic information
CN112434723B (en) Day/night image classification and object detection method based on attention network
CN112381045A (en) Lightweight human body posture recognition method for mobile terminal equipment of Internet of things
CN111027586A (en) Target tracking method based on novel response map fusion
CN112396655A (en) Point cloud data-based ship target 6D pose estimation method
Li Vehicle detection in foggy weather based on an enhanced YOLO method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant