CN111626417B - Closed loop detection method based on unsupervised deep learning - Google Patents


Info

Publication number
CN111626417B
CN111626417B (application number CN202010360548.2A)
Authority
CN
China
Prior art keywords
landmark
image
scene
loop detection
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010360548.2A
Other languages
Chinese (zh)
Other versions
CN111626417A (en)
Inventor
石朝侠 (Shi Chaoxia)
汪丹 (Wang Dan)
Current Assignee
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Science and Technology
Priority to CN202010360548.2A
Publication of CN111626417A
Application granted
Publication of CN111626417B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a closed-loop detection method based on unsupervised deep learning. It exploits the observation that the last few convolutional layers of a convolutional neural network usually embed very rich semantic information: regions of interest are identified directly from the deep convolutional layers to generate landmarks, and convolutional features are then extracted from each landmark to generate the final representation of the image. This representation is invariant to both appearance and viewpoint, and even under extreme changes it can detect whether the current position is one the robot has visited before, thereby eliminating the accumulated error in simultaneous localization and mapping and enabling relocalization after tracking is lost. The method can be applied in the field of mobile robots, for example unmanned vehicles, unmanned aerial vehicles, virtual reality and augmented reality, improving their localization capability while constructing a globally consistent map.

Description

Closed loop detection method based on unsupervised deep learning
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a closed loop detection method based on unsupervised deep learning.
Background
Mobile robotics is currently a frontier field worldwide, with wide application and great promise. It integrates theoretical results from many disciplines, including artificial intelligence, sensor technology, signal processing, automatic control engineering, computer technology and industrial design, and is widely applied across industry, agriculture, the service sector, medical care and national defense. Mobile robots can assist or replace human labor, and their application is especially important in places humans cannot reach or where humans would be endangered, such as space and underwater exploration. Mobile robot technology therefore receives broad attention and heavy investment from countries around the world, and has become an important index of a nation's scientific research capability. For a mobile robot, autonomous navigation and path planning rely on good localization and mapping. Loop (closure) detection is a method for eliminating the error accumulated over the robot's long-term motion. Its key idea is to recognize that the current position is one the robot has already visited, pull the robot's trajectory back to the correct position, and allow relocalization when camera tracking is lost, thereby achieving more accurate localization and constructing a globally consistent map.
There are two significant challenges in closed-loop detection: 1) appearance changes caused by weather, occlusion and dynamic objects; 2) viewpoint changes caused by the camera's shooting position. The current mainstream methods are: (1) generating an image representation from locally hand-crafted features extracted from the image, then accelerating image-descriptor matching with a bag-of-words model; (2) extracting globally designed features of the image and matching them directly.
Method (1) is robust to viewpoint changes but ill-suited to appearance changes. Method (2) performs well under environmental change, but poorly when viewpoint changes and occlusions are present. Neither approach provides satisfactory performance under combined variations of lighting, occlusion, viewpoint and other factors.
Disclosure of Invention
The invention aims to provide a closed loop detection method based on unsupervised deep learning.
The technical scheme adopted by the invention is as follows: a closed loop detection method based on unsupervised deep learning comprises the following steps:
1) Inputting the scene query frame and the scene database images into a pre-trained vgg-16 convolutional neural network, and identifying regions of interest directly from a convolutional layer of the convolutional neural network;
2) For each scene query frame and scene database image, generating 100 landmarks from the identified regions of interest;
3) Extracting a convolution feature descriptor from each landmark generated from the image by using an unsupervised deep neural network to obtain a corresponding feature vector;
4) Cross-matching landmark regions of the two frames by calculating cosine distances between landmark vectors of the scene query frames and landmark vectors of each scene database image, and reserving landmarks which are matched with each other;
5) Calculating the overall similarity between the scene query frame and each scene database image according to the matched landmark pairs, so as to determine whether a scene similar to the scene query frame exists in the scene database and thereby judge whether a loop has appeared.
In addition, the closed-loop detection method based on unsupervised deep learning according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, according to the input requirement of the vgg-16 network, all images are resized to 224 × 224 as the input of the vgg-16 network; a deep convolutional layer of the network is used to obtain the feature map corresponding to the image, and then every non-zero activation value, together with its 8 surrounding adjacent activation values, is gathered into a cluster, yielding the regions of interest identified in the image.
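The clustering of non-zero activations described above can be sketched as follows. This is a minimal NumPy illustration, assuming a single-channel 2-D feature map as input (in the patent the map comes from a deep vgg-16 convolutional layer); non-zero activations are grouped with their 8-neighbours into connected clusters. `cluster_activations` is an illustrative name, not from the patent.

```python
import numpy as np

def cluster_activations(fmap):
    """Group non-zero activations of a 2-D feature map into 8-connected
    clusters, following the region-of-interest step described above.
    Returns a list of clusters, each a list of (row, col) coordinates."""
    visited = np.zeros(fmap.shape, dtype=bool)
    clusters = []
    rows, cols = fmap.shape
    for r in range(rows):
        for c in range(cols):
            if fmap[r, c] != 0 and not visited[r, c]:
                stack, comp = [(r, c)], []
                visited[r, c] = True
                while stack:                       # flood fill over 8-neighbours
                    y, x = stack.pop()
                    comp.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and fmap[ny, nx] != 0 and not visited[ny, nx]):
                                visited[ny, nx] = True
                                stack.append((ny, nx))
                clusters.append(comp)
    return clusters
```

On a real feature map, each resulting cluster corresponds to one candidate region of interest.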
Further, in one embodiment of the invention, the energy value $E_i$ of each cluster $C_i$ is calculated as

$$E_i = \frac{1}{|C_i|} \sum_{j=1}^{|C_i|} a_j^{(i)}$$

where $|C_i|$ is the size of the $i$-th cluster and $a_j^{(i)}$ is the $j$-th activation value of $C_i$. The first 100 clusters with the largest energy values are then selected as the landmarks generated for the image.
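The energy ranking can be sketched as below, assuming the per-activation-mean reading of $E_i$ (the original formula image is not reproduced in this copy, so the normalization is an assumption); `top_landmarks` and its signature are illustrative names.

```python
import numpy as np

def top_landmarks(clusters, fmap, k=100):
    """Rank clusters by mean activation energy and keep the k highest.
    clusters: list of lists of (row, col) coordinates; fmap: 2-D array."""
    # E_i = (1/|C_i|) * sum_j a_j^(i), read as the mean activation (assumed)
    energies = [np.mean([fmap[y, x] for y, x in comp]) for comp in clusters]
    order = np.argsort(energies)[::-1][:k]   # descending by energy
    return [clusters[i] for i in order]
```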
Further, in one embodiment of the invention, an unsupervised deep neural network is utilized to extract the convolutional feature descriptor of each landmark. The unsupervised deep neural network is specially designed for the closed-loop detection task and is trained so that the network learns to extract HOG features. When training is finished, the network is able to learn and reconstruct HOG features; only the three convolutional layers and the corresponding pooling layers are retained, and all other network layers are discarded when extracting the convolutional features of the image.
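As a rough sketch of the retained encoder's geometry, the spatial size after three convolution-plus-pooling stages can be computed with standard output-size arithmetic. The 3 × 3 kernels, unit padding and 2 × 2 pooling below are assumptions; the patent does not disclose the layer hyperparameters.

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Standard convolution output-size arithmetic.
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # Output size of a pooling layer.
    return (size - kernel) // stride + 1

size = 224                        # vgg-16 input resolution from the patent
for k in (3, 3, 3):               # three 3x3 'same'-padded conv layers (assumed)
    size = conv_out(size, k, pad=1)   # padding of 1 keeps the spatial size
    size = pool_out(size)             # each 2x2/stride-2 pooling halves it
print(size)  # 224 -> 112 -> 56 -> 28
```

With these assumed hyperparameters, a 224 × 224 landmark crop yields a 28 × 28 spatial grid of convolutional features.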
Further, in one embodiment of the present invention, all landmarks extracted from the two images are cross-matched. The cosine distance is used to measure the similarity between a landmark $u$ of the scene query frame $I_q$ and a landmark $v$ of each scene database frame $I_d^i$:

$$d_{u,v} = \frac{f_u \cdot f_v}{\|f_u\| \, \|f_v\|}$$

where $d_{u,v}$ is the cosine distance between $u$ and $v$, $f_u$ and $f_v$ denote the convolutional feature vectors extracted from landmark $u$ of $I_q$ and landmark $v$ of $I_d^i$ respectively, and $\|\cdot\|$ denotes the length of a vector.
A simple linear search is then used to determine the similarity between all landmarks of $I_q$ and $I_d^i$, and cross-checking is applied to accept only landmarks that match each other. For each matched landmark pair $(u, v)$, a weight $W_{u,v}$ is determined from the sizes of their regions:

$$W_{u,v} = \left(1 - \frac{|h_u - h_v|}{\max(h_u, h_v)}\right)\left(1 - \frac{|w_u - w_v|}{\max(w_u, w_v)}\right)$$

where $h_u, h_v, w_u, w_v$ are the heights and widths of the regions of $u$ and $v$ respectively, and $|h_u - h_v|$ and $|w_u - w_v|$ are the absolute values of the differences between the heights and widths of the two regions.

The final global similarity score $S(I_q, I_d^i)$ between $I_q$ and $I_d^i$ is:

$$S(I_q, I_d^i) = \sum_{(u,v)} W_{u,v} \, d_{u,v}$$

where the sum runs over all mutually matched landmark pairs.
Further, in one embodiment of the present invention, for each query image $I_q$, its score $S(I_q, I_d^i)$ against every image in the database is traversed and calculated, and the image with the highest score is the best match of $I_q$:

$$z_q = \arg\max_i S(I_q, I_d^i)$$

where $z_q$ denotes the reference frame with the highest similarity score to $I_q$. A scene similar to the scene query frame is thereby obtained from the scene database.
Compared with the prior art, the invention has the following remarkable advantages:
(1) Building on the successful application of deep learning to scene recognition, the disclosed method combines deep learning with closed-loop detection, greatly improving a mobile robot's loop-detection capability in environments with extreme appearance and viewpoint changes.
(2) The invention exploits the very rich semantic information embedded in the last several convolutional layers of a convolutional neural network; this information corresponds to image regions that are meaningful for the closed-loop detection task and can directly generate landmark representations of the image.
(3) The invention extracts landmark feature descriptors with an unsupervised deep neural network model specially designed for closed-loop detection; the resulting convolutional features are lighter and more compact than those extracted by a general-purpose neural network.
Drawings
FIG. 1 is an overall block diagram of the method of the present invention.
Detailed Description
With the wide application of mobile robots, localization and mapping capability has become an important factor limiting their application scenarios, and a good closed-loop detection algorithm can greatly improve a mobile robot's localization and mapping capability in unknown environments. To overcome the deficiencies of the prior art, the invention provides a closed-loop detection method based on unsupervised deep learning.
The invention is further described in the following with reference to the drawings.
The specific steps of the present invention are further described in detail with reference to FIG. 1; this implementation takes the CampusLoop dataset as an example to illustrate the closed-loop detection process.
Step 1, the CampusLoop dataset is used as the input of the method.
The CampusLoop dataset is read and each image is resized to 224 × 224. A sequence of 100 images shot in clear weather is used as the scene query set, and another sequence of 100 images shot in snowy weather is used as the scene database set. Both are respectively input into a pre-trained vgg-16 convolutional neural network.
Step 2, generating 100 landmarks for each image frame.
For each image frame, a specific convolutional layer of the vgg-16 convolutional neural network is first used to obtain the corresponding feature map; then every non-zero activation value, together with its 8 surrounding adjacent activation values, is gathered into a cluster, denoted $C_i$ ($i \in \{1, 2, \ldots, N\}$), where $N$ ($N \geq 100$) is the number of clusters in one image. The energy value $E_i$ of each cluster $C_i$ can be calculated as:

$$E_i = \frac{1}{|C_i|} \sum_{j=1}^{|C_i|} a_j^{(i)}$$

where $|C_i|$ is the size of the $i$-th cluster and $a_j^{(i)}$ is the $j$-th activation value of $C_i$. After the energy values of the $N$ clusters are obtained, the 100 clusters with the largest energy values are taken as the detected landmarks, denoted $L_s$, $s \in \{1, 2, \ldots, 100\}$.
Step 3, training the unsupervised deep neural network to learn and extract HOG features.
In this network, $X$ denotes the HOG feature of the input and $\hat{X}$ the feature descriptor reconstructed by the network. In the autoencoder model, linear rectification (ReLU) activations are used for the three convolutional layers, and a sigmoid activation is used for the fully connected layer so that the network can better reconstruct the HOG features. When training is finished, the network is able to learn and reconstruct HOG features; only the three convolutional layers and the corresponding pooling layers are retained, and all other network layers are discarded when extracting the convolutional features of an image. Furthermore, since HOG features extracted from inputs of the same size have the same dimension, the Euclidean distance can be used as the distance metric for HOG descriptors, and an $l_2$ loss function compares $X$ with its reconstruction $\hat{X}$:

$$L = \|X - \hat{X}\|_2^2$$
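The training objective above amounts to a squared $l_2$ reconstruction loss, which can be written directly (the function name is illustrative):

```python
import numpy as np

def hog_reconstruction_loss(X, X_hat):
    """Squared l2 loss between a HOG descriptor X and its reconstruction
    X_hat, matching the training objective described above."""
    X, X_hat = np.asarray(X, dtype=float), np.asarray(X_hat, dtype=float)
    return float(np.sum((X - X_hat) ** 2))
```

A perfect reconstruction yields zero loss; the gradient of this loss with respect to the reconstruction is what the autoencoder's backpropagation would use.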
and 4, extracting convolution characteristics from each landmark.
For each detected landmark, the convolutional feature descriptors are extracted with the trained unsupervised convolutional autoencoder network described above, which is specially designed for the closed-loop detection task. The network is fast and reliable, enables real-time loop detection without reducing the dimensionality of the extracted convolutional features, and can replace a general-purpose neural network in convolution-feature-based closed-loop detection systems. Moreover, the network requires no environment-specific training. For any one image, the total feature dimension is 106400.
Step 5, cross-matching the landmarks between frames.
The landmark regions of two frames are cross-matched by computing the cosine distances between the landmark vectors of the scene query frame $I_q$ and those of each database image $I_d^i$. The similarity between a landmark $u$ ($u \in I_q$) and a landmark $v$ ($v \in I_d^i$), i.e. the cosine distance, is:

$$d_{u,v} = \frac{f_u \cdot f_v}{\|f_u\| \, \|f_v\|}$$

where $d_{u,v}$ is the cosine distance between $u$ and $v$, $f_u$ and $f_v$ denote the convolutional feature vectors extracted from landmark $u$ of $I_q$ and landmark $v$ of $I_d^i$ respectively, and $\|\cdot\|$ denotes the length of a vector.
A simple linear search is used to determine the matches between all landmarks of $I_q$ and $I_d^i$, and only mutually matched landmarks are kept.
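The linear search with cross-checking can be sketched as follows, assuming each image's landmarks are stacked into a matrix of convolutional feature vectors; `mutual_matches` is an illustrative name, not from the patent.

```python
import numpy as np

def mutual_matches(Q, D):
    """Cross-match landmark descriptors by cosine similarity and keep only
    mutual nearest neighbours (the cross-check described above).
    Q: (m, d) query landmark vectors; D: (n, d) database landmark vectors.
    Returns a list of (query_index, database_index) pairs."""
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)   # unit-normalize rows
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    S = Qn @ Dn.T                      # m x n cosine similarity matrix
    best_for_q = S.argmax(axis=1)      # linear search from the query side
    best_for_d = S.argmax(axis=0)      # and from the database side
    return [(u, v) for u, v in enumerate(best_for_q) if best_for_d[v] == u]
```

Only pairs that are each other's nearest neighbour survive, which is what the cross-check guarantees.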
Step 6, generating the final representation of the image for image retrieval.
For each matched landmark pair $(u, v)$, a weight $W_{u,v}$ is determined from the sizes of their regions:

$$W_{u,v} = \left(1 - \frac{|h_u - h_v|}{\max(h_u, h_v)}\right)\left(1 - \frac{|w_u - w_v|}{\max(w_u, w_v)}\right)$$

where $h_u, h_v, w_u, w_v$ are the heights and widths of the regions of $u$ and $v$ respectively, and $|h_u - h_v|$ and $|w_u - w_v|$ are the absolute values of the differences between the heights and widths of the two regions.

Finally, the global similarity score $S(I_q, I_d^i)$ between $I_q$ and $I_d^i$ is:

$$S(I_q, I_d^i) = \sum_{(u,v)} W_{u,v} \, d_{u,v}$$

where the sum runs over all mutually matched landmark pairs.
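A sketch of the weighted scoring follows. Note the size-consistency weight is an assumed form (equal to 1 for identically sized regions and decreasing as the height and width differences grow), since the original formula image is not reproduced in this copy; the function names are illustrative.

```python
def pair_weight(hu, wu, hv, wv):
    """Hypothetical size-consistency weight for a matched landmark pair:
    1 when the two regions have identical height and width, shrinking as
    |h_u - h_v| and |w_u - w_v| grow. Assumed form, not the patent's
    exact expression."""
    return (1 - abs(hu - hv) / max(hu, hv)) * (1 - abs(wu - wv) / max(wu, wv))

def global_score(matched_pairs):
    """Global similarity as the weighted sum of cosine similarities.
    matched_pairs: list of (d_uv, (h_u, w_u), (h_v, w_v)) tuples."""
    return sum(d * pair_weight(hu, wu, hv, wv)
               for d, (hu, wu), (hv, wv) in matched_pairs)
```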
For each query image $I_q$, its score $S(I_q, I_d^i)$ against every image in the database is traversed and calculated, and the image with the highest score is the best match of $I_q$:

$$z_q = \arg\max_i S(I_q, I_d^i)$$

where $z_q$ denotes the reference frame with the highest similarity score to $I_q$.
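Retrieval then reduces to an argmax over the database scores; the acceptance threshold below is hypothetical (the embodiment instead checks the result against the dataset's ground-truth correspondences, as step 7 describes).

```python
def detect_loop(scores, threshold=0.5):
    """Return (z_q, is_loop): the database frame with the highest global
    similarity score, and whether it exceeds a hypothetical acceptance
    threshold for declaring a loop closure."""
    z = max(range(len(scores)), key=scores.__getitem__)   # argmax over frames
    return z, scores[z] >= threshold
```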
Step 7, judging whether a loop appears.
According to the retrieval result of step 6, whether a loop is detected is judged in combination with the ground-truth scene correspondences of the dataset.

Claims (6)

1. A closed-loop detection method based on unsupervised deep learning, characterized by comprising the following steps: 1) inputting the scene query frame and the scene database images into a pre-trained vgg-16 convolutional neural network, and identifying regions of interest directly from a convolutional layer of the convolutional neural network; 2) generating landmarks for each scene query frame and scene database image from the identified regions of interest; 3) extracting a convolutional feature descriptor from each landmark generated from the image by using an unsupervised deep neural network to obtain the corresponding feature vector; 4) cross-matching the landmark regions of the two frames by calculating the cosine distances between the landmark vectors of the scene query frame and those of each scene database image, and keeping the landmarks that match each other; 5) calculating the overall similarity between the scene query frame and each scene database image according to the matched landmark pairs to obtain the best match, and judging whether the closed loop is correctly detected according to the correspondence of the real scenes;
the unsupervised deep neural network in step 3) is trained so that the network learns to extract HOG features; when training is finished, the network is able to learn and reconstruct HOG features, only three convolutional layers and the corresponding pooling layers are retained, and all other network layers are discarded when extracting the convolutional features of the image;
in step 4), all landmarks extracted from the two images are cross-matched, and the cosine distance is used to measure the similarity between a landmark $u$ of the scene query frame $I_q$ and a landmark $v$ of each scene database frame $I_d^i$:

$$d_{u,v} = \frac{f_u \cdot f_v}{\|f_u\| \, \|f_v\|}$$

where $d_{u,v}$ is the cosine distance between $u$ and $v$, $f_u$ and $f_v$ denote the convolutional feature vectors extracted from landmark $u$ of $I_q$ and landmark $v$ of $I_d^i$ respectively, and $\|\cdot\|$ denotes the length of a vector;

a linear search is used to determine the matches between all landmarks of $I_q$ and $I_d^i$, and cross-checking is applied so that only mutually matched landmarks are accepted; for each matched landmark pair $(u, v)$, a weight $W_{u,v}$ is determined from the sizes of their regions:

$$W_{u,v} = \left(1 - \frac{|h_u - h_v|}{\max(h_u, h_v)}\right)\left(1 - \frac{|w_u - w_v|}{\max(w_u, w_v)}\right)$$

where $h_u, h_v, w_u, w_v$ are the heights and widths of the regions of $u$ and $v$ respectively, and $|h_u - h_v|$ and $|w_u - w_v|$ are the absolute values of the differences between the heights and widths of the two regions.
2. The closed-loop detection method based on unsupervised deep learning according to claim 1, characterized in that: in step 1, according to the input requirement of the vgg-16 network, all images are resized to 224 × 224 as the input of the vgg-16 network; a deep convolutional layer of the convolutional neural network is used to obtain the feature map corresponding to the image, and then every non-zero activation value, together with its 8 surrounding adjacent activation values, is gathered into a cluster as the identified regions of interest of the image.
3. The closed-loop detection method based on unsupervised deep learning according to claim 2, characterized in that: in step 2, the energy value $E_i$ of each cluster $C_i$ is calculated as

$$E_i = \frac{1}{|C_i|} \sum_{j=1}^{|C_i|} a_j^{(i)}$$

where $|C_i|$ is the size of the $i$-th cluster and $a_j^{(i)}$ is the $j$-th activation value of $C_i$.
4. The closed-loop detection method based on unsupervised deep learning according to claim 3, characterized in that: in step 2, the first 100 clusters with the largest energy values are selected as the landmarks generated for the current image.
5. The closed-loop detection method based on unsupervised deep learning according to claim 1, characterized in that the final global similarity score $S(I_q, I_d^i)$ between $I_q$ and $I_d^i$ is:

$$S(I_q, I_d^i) = \sum_{(u,v)} W_{u,v} \, d_{u,v}$$

where the sum runs over all mutually matched landmark pairs $(u, v)$.
6. The closed-loop detection method based on unsupervised deep learning according to claim 1, characterized in that: in step 5, for each query image $I_q$, the similarity scores $S(I_q, I_d^i)$ against all images in the database are traversed and calculated, and the image with the highest score is the best match of $I_q$:

$$z_q = \arg\max_i S(I_q, I_d^i)$$

where $z_q$ denotes the reference frame with the highest similarity score to $I_q$, whereby a scene similar to the scene query frame is obtained from the scene database.
CN202010360548.2A 2020-04-30 2020-04-30 Closed loop detection method based on unsupervised deep learning Active CN111626417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010360548.2A CN111626417B (en) 2020-04-30 2020-04-30 Closed loop detection method based on unsupervised deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010360548.2A CN111626417B (en) 2020-04-30 2020-04-30 Closed loop detection method based on unsupervised deep learning

Publications (2)

Publication Number Publication Date
CN111626417A CN111626417A (en) 2020-09-04
CN111626417B true CN111626417B (en) 2022-10-28

Family

ID=72259758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010360548.2A Active CN111626417B (en) 2020-04-30 2020-04-30 Closed loop detection method based on unsupervised deep learning

Country Status (1)

Country Link
CN (1) CN111626417B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767905A (en) * 2020-09-01 2020-10-13 南京晓庄学院 Improved image method based on landmark-convolution characteristics
CN114018271A (en) * 2021-10-08 2022-02-08 北京控制工程研究所 Accurate fixed-point landing autonomous navigation method and system based on landmark images

Citations (2)

Publication number Priority date Publication date Assignee Title
CN109583371A (en) * 2018-11-29 2019-04-05 北京航天自动控制研究所 Landmark information based on deep learning extracts and matching process
CN110555881A (en) * 2019-08-29 2019-12-10 桂林电子科技大学 Visual SLAM testing method based on convolutional neural network


Also Published As

Publication number Publication date
CN111626417A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
CN109508654B (en) Face analysis method and system fusing multitask and multi-scale convolutional neural network
Hu et al. Dasgil: Domain adaptation for semantic and geometric-aware image-based localization
CN107967457A (en) A kind of place identification for adapting to visual signature change and relative positioning method and system
Xia et al. Loop closure detection for visual SLAM using PCANet features
Komorowski et al. Minkloc++: lidar and monocular image fusion for place recognition
Yue et al. Robust loop closure detection based on bag of superpoints and graph verification
CN112562081B (en) Visual map construction method for visual layered positioning
CN109272577B (en) Kinect-based visual SLAM method
CN111626417B (en) Closed loop detection method based on unsupervised deep learning
CN111767905A (en) Improved image method based on landmark-convolution characteristics
Yin et al. Pse-match: A viewpoint-free place recognition method with parallel semantic embedding
Liu et al. Loop closure detection using CNN words
Ma et al. 3D convolutional auto-encoder based multi-scale feature extraction for point cloud registration
CN113592015B (en) Method and device for positioning and training feature matching network
CN112861808B (en) Dynamic gesture recognition method, device, computer equipment and readable storage medium
Barroso-Laguna et al. Scalenet: A shallow architecture for scale estimation
CN114067128A (en) SLAM loop detection method based on semantic features
Li et al. Sparse-to-local-dense matching for geometry-guided correspondence estimation
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
Zhou et al. Retrieval and localization with observation constraints
Tsintotas et al. The revisiting problem in simultaneous localization and mapping
Tsintotas et al. Visual place recognition for simultaneous localization and mapping
Munoz et al. Improving Place Recognition Using Dynamic Object Detection
CN115496859A (en) Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning
Zhao et al. Attention-enhanced cross-modal localization between spherical images and point clouds

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Shi Chaoxia
Inventor after: Wang Dan
Inventor before: Wang Dan
Inventor before: Shi Chaoxia
GR01 Patent grant