CN111767854B - SLAM loop detection method combined with scene text semantic information

SLAM loop detection method combined with scene text semantic information

Info

Publication number
CN111767854B
CN111767854B (application CN202010608535.2A)
Authority
CN
China
Prior art keywords
text
feature
model
slam
current frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010608535.2A
Other languages
Chinese (zh)
Other versions
CN111767854A (en)
Inventor
杨国青 (Yang Guoqing)
李夷奇 (Li Yiqi)
李红 (Li Hong)
吕攀 (Lyu Pan)
吴朝晖 (Wu Zhaohui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010608535.2A
Publication of CN111767854A
Application granted
Publication of CN111767854B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/10 - Terrestrial scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/14 - Image acquisition
    • G06V 30/148 - Segmentation of character regions
    • G06V 30/153 - Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a SLAM loop detection method combining scene text semantic information. A deep neural network is used to extract features from the images provided by the sensor, the texts appearing in the images are detected and recognized, and the feature-point similarity and the text semantic similarity are fused by weighting. To meet the real-time requirement of SLAM under the limited computing resources of embedded platforms, the invention further provides a lightweight text detection model, EAST-light, built on the EAST model, in which the VGG16 model in the feature extraction module is replaced by a ShuffleNet V2 model; this greatly increases the running speed of the model and achieves a better balance between speed and accuracy.

Description

SLAM loop detection method combined with scene text semantic information
Technical Field
The invention belongs to the technical field of simultaneous localization and mapping (SLAM), and particularly relates to a SLAM loop detection method combined with scene text semantic information.
Background
Intelligent mobile robots have attracted wide attention because of their broad application prospects. With the development of artificial intelligence, advances in machine learning and related fields have been integrated into robotics, improving the mobility and intelligence of robots. To play a greater role in industry and everyday life, an intelligent mobile robot needs the capability of autonomous movement, i.e., localization and navigation based on perceived environmental information, which is the problem addressed by Simultaneous Localization and Mapping (SLAM). A robot based on SLAM can localize itself from pose estimates and sensor data while moving, incrementally build a map of the surrounding environment, and on that basis realize path planning, navigation and other functions.
Loop detection is an important link in SLAM: by enabling the robot to recognize a scene it has visited before, it mitigates the drift of pose estimation over time. In visual SLAM, loop detection amounts to finding the similarity between two frame images. Traditional loop detection usually computes similarity with a Bag-of-Words (BoW) model: after hand-crafted visual features are extracted from the images, the BoW model clusters the feature descriptors into words and builds a dictionary, the words contained in each frame are then assembled into a description vector, and whether a loop occurs is judged by the similarity between these vectors. The drawback of the BoW model is that it only considers whether a word appears in an image and ignores the relative spatial position of the word; moreover, it relies entirely on hand-crafted visual features and is prone to error under illumination changes or camera shake.
The vigorous development of deep learning has driven great progress in computer vision; features extracted by neural networks are more robust than hand-crafted ones and represent the original data better. Advances in text detection and recognition also make it possible to exploit text, an element that frequently appears in SLAM scenes, and its semantic information offers a new idea for loop detection. Gaoyang et al., in "Loop Detection for Visual SLAM Systems Using Deep Neural Networks", propose using a deep neural network structure, a stacked auto-encoder, to learn how to extract features from images and to use the learned features for detecting loops. The Chinese patent application No. 201910999570.9 proposes a visual SLAM method based on instance segmentation, which uses Mask R-CNN to perform instance segmentation and builds a semantic map from the image classification results, thereby implementing loop detection. Boying Li et al., in "TextSLAM with Planar Text Features", propose a way of using text information from the scene in SLAM, but text is only treated as a planar feature, and the semantic information contained in the text itself is not well exploited.
In some application scenes of visual SLAM, such as supermarkets, parking lots and shopping malls, text signs appear frequently and contain rich texture features and semantic information. Existing methods do not make full use of the texture and semantic features of such text; if text features could be incorporated into the SLAM method, its performance in these scenes could be expected to improve significantly.
Disclosure of Invention
In view of the above, the invention provides a SLAM loop detection method combined with scene text semantic information, which addresses the problems of loop detection methods based on the bag-of-words model: image features are extracted automatically with a neural network and fused with the semantic information of the text landmarks in the scene and their relative positions in space.
A SLAM loop detection method combined with scene text semantic information comprises the following steps:
(1) building and training a text detection model and a text recognition model based on a lightweight neural network;
(2) acquiring an environment image with a monocular camera, detecting the text in the image with the text detection model, outputting the text box coordinates, and saving the feature map output by the second stage of the feature extraction part of the text detection model;
(3) recognizing the detected text area by using a text recognition model;
(4) calculating the feature information vector and the semantic information vector of the current frame according to the text detection result and the recognition result obtained in step (2) and step (3), and obtaining a total information vector through weighted fusion;
(5) calculating the cosine similarity between the total information vector of each key frame in the key frame set and the total information vector of the current frame, and taking every key frame whose similarity exceeds a certain threshold and that is not directly adjacent to the current frame as a loop candidate frame (a sketch of this decision logic is given after the list of steps);
(6) when three consecutive adjacent loop candidate frames appear, the loop is judged to appear.
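For concreteness, a minimal sketch of the decision logic in steps (5) and (6) is given below, assuming the total information vectors have already been computed for the current frame and the key frame set. The threshold value of 0.9, the use of frame indices to test adjacency, and the reading of step (6) as three candidate key frames with consecutive indices are illustrative assumptions rather than values or definitions fixed by the invention.

```python
import numpy as np

def cosine_similarity(m, n):
    # cos(m, n) = (m . n) / (|m| * |n|)
    return float(np.dot(m, n) / (np.linalg.norm(m) * np.linalg.norm(n)))

def detect_loop(keyframes, current, threshold=0.9, run_length=3):
    """keyframes: list of (frame_index, total_info_vector); current: (frame_index, total_info_vector)."""
    cur_idx, cur_vec = current
    candidates = []
    for idx, vec in keyframes:
        if abs(idx - cur_idx) <= 1:          # skip key frames directly adjacent to the current frame
            continue
        if cosine_similarity(vec, cur_vec) > threshold:
            candidates.append(idx)           # step (5): loop candidate frame
    # step (6), read here as: a loop is declared when `run_length` candidates
    # with consecutive frame indices are found
    candidates.sort()
    run = 1
    for prev, nxt in zip(candidates, candidates[1:]):
        run = run + 1 if nxt == prev + 1 else 1
        if run >= run_length:
            return True
    return False
```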
Further, to meet the real-time requirement of SLAM under the limited computing resources of embedded platforms, step (1) improves on the EAST (Efficient and Accurate Scene Text) model to obtain a text detection model based on a lightweight neural network: the input is a picture, a full convolution network directly predicts the regions of the picture corresponding to text information, non-maximum suppression is applied to the predicted regions whose score exceeds a set threshold, and the result of the non-maximum suppression is the final output of the model, namely the text box coordinates on the picture.
Further, step (1) adopts a CRNN (Convolutional Recurrent Neural Network) model as the text recognition model based on a lightweight neural network.
Further, the full convolution network comprises three parts: feature extraction, feature fusion and an output layer. The feature extraction part adopts a ShuffleNet V2 model and outputs feature maps of four levels, f1, f2, f3 and f4, whose sizes are 1/32, 1/16, 1/8 and 1/4 of the original image, respectively.
Further, the feature fusion part performs stage-by-stage feature fusion on the four levels of feature maps f1, f2, f3 and f4 output by the ShuffleNet V2 model. There are three feature fusion stages. In each stage, the feature map from the previous stage is first up-sampled to the same size as the current feature map, the two are then concatenated along the channel direction, a 1 × 1 convolutional layer reduces the number of channels of the concatenated feature map to cut the amount of computation, and finally a 3 × 3 convolutional layer fuses the information to produce the result of the current stage. After the last feature fusion stage, a 3 × 3 convolutional layer generates the final feature map, which is fed to the output layer. The numbers of channels of the 1 × 1 convolutional layers in the three feature fusion stages are 1256, 244 and 88, respectively; the numbers of channels of the 3 × 3 convolutional layers in the three stages are 128, 1256 and 32, respectively; and the 3 × 3 convolutional layer after the last feature fusion stage has 32 channels.
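To make the fusion procedure concrete, a minimal PyTorch sketch of one feature fusion stage is given below: upsample the incoming map, concatenate it with the current-level map along the channel dimension, reduce the channel count with a 1 × 1 convolution, then fuse with a 3 × 3 convolution. The channel counts are passed in as parameters; the class name and the use of bilinear upsampling are illustrative assumptions rather than details fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionStage(nn.Module):
    """One feature fusion stage of the EAST-light full convolution network (sketch)."""
    def __init__(self, in_channels, mid_channels, out_channels):
        super().__init__()
        # 1x1 convolution reduces the channel count of the concatenated map
        self.reduce = nn.Conv2d(in_channels, mid_channels, kernel_size=1)
        # 3x3 convolution fuses the information and produces the stage result
        self.fuse = nn.Conv2d(mid_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, previous, current):
        # upsample the map from the previous stage to the size of the current map
        x = F.interpolate(previous, size=current.shape[2:], mode='bilinear', align_corners=False)
        # concatenate along the channel direction, then reduce and fuse
        x = torch.cat([x, current], dim=1)
        x = self.reduce(x)
        return self.fuse(x)
```

Chaining three such stages from f1 (the 1/32-scale map) through f2, f3 and f4, followed by a final 3 × 3 convolution with 32 output channels, reproduces the fusion branch described above.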
Further, in step (4), for the current frame, the feature map f2 output by the second stage of the feature extraction part of the text detection model is taken and global average pooling is performed over each of its channels to obtain the feature information vector f of the current frame, in which each element is the average value of the corresponding channel of f2.
Further, in step (4), for the current frame, its semantic information is described by a vector, denoted t = [e1, e2, …, eN], where ei = [pi, x1i, y1i, x2i, y2i], N is the number of text landmarks, and ei describes the information of the i-th text landmark in the current frame: pi indicates whether the i-th text landmark appears in the current frame, pi = 1 if it appears and pi = 0 otherwise; (x1i, y1i) and (x2i, y2i) are the coordinates of the upper-left and lower-right corners of the text box corresponding to the i-th text landmark in the current frame. This information is output by the trained text detection model and text recognition model.
Further, in step (4), the feature information vector f and the semantic information vector t are weighted and fused by the formula s = λt + f to obtain the total information vector s, where λ is the weight of the semantic information vector t and may be set to 0.1.
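To make step (4) concrete, a minimal numerical sketch is given below, assuming the feature map f2 is available as an array of shape (C, H, W) and that the detected and recognized texts have already been matched to a fixed list of known text landmarks. The dictionary-based detection format, the zero box coordinates for absent landmarks, and the zero-padding that brings f and t to a common length before computing s = λt + f are illustrative assumptions; the patent text does not spell out how the two vectors are aligned.

```python
import numpy as np

def feature_vector(f2):
    # global average pooling over each channel of the stage-2 feature map, shape (C, H, W)
    return f2.mean(axis=(1, 2))

def semantic_vector(detections, num_landmarks):
    # detections: {landmark_index: (x1, y1, x2, y2)} for the text landmarks recognized in the frame
    t = []
    for i in range(num_landmarks):
        if i in detections:
            x1, y1, x2, y2 = detections[i]
            t.extend([1.0, x1, y1, x2, y2])      # p_i = 1 plus the box corners
        else:
            t.extend([0.0, 0.0, 0.0, 0.0, 0.0])  # p_i = 0; zero box coordinates are an assumption
    return np.asarray(t, dtype=np.float64)

def total_vector(f, t, lam=0.1):
    # assumption: the shorter vector is zero-padded so that s = lam * t + f is well defined
    size = max(f.size, t.size)
    f = np.pad(f, (0, size - f.size))
    t = np.pad(t, (0, size - t.size))
    return lam * t + f
```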
Further, in step (5), for two total information vectors m and n, their cosine similarity cos(m, n) is calculated by the following formula:
cos(m, n) = (m · n) / (‖m‖ ‖n‖)
The first step in using text information from the scene in visual SLAM is to extract it from the images captured by the camera sensor. To avoid the limitations of hand-crafted visual features, the invention uses a deep neural network to extract image features automatically, and, given the limited computing resources of the embedded platforms on which visual SLAM algorithms are usually deployed, designs a text detection model EAST-light based on a lightweight neural network that satisfies the real-time requirement of SLAM. The method uses the EAST-light model to extract image features and detect text at the same time, uses another neural network model, CRNN, to recognize the text, and applies the extracted image features and the text semantic information to SLAM loop detection. The coordinate information of every detected text object is added to the feature vector, which overcomes the limitation that semantic information alone cannot fully represent the image and improves the accuracy of loop detection.
Compared with the prior art, the invention has the following advantages:
1. The invention provides a SLAM loop detection method combined with scene text semantic information, which uses a deep neural network to extract image features.
2. The method detects and recognizes the text objects appearing in the scene and extracts their semantic information. Compared with visual features such as ORB (Oriented FAST and Rotated BRIEF), the semantic information in an image is a more stable quantity: when dynamic interference exists in the scene, its influence on the semantic information is smaller than on the visual features. Visual loop detection is essentially an algorithm that computes the similarity of image data; properly weighted fusion of the feature-point similarity and the semantic similarity improves the accuracy of the similarity judgment, thereby improving the accuracy of loop detection and enhancing the robustness of the SLAM system.
3. For the real-time requirement of SLAM and the limited computing resources of embedded platforms, the invention provides the lightweight text detection model EAST-light by improving the EAST model: EAST-light replaces the VGG16 feature extraction network of EAST with a ShuffleNet V2 network, which greatly increases the running speed. Processing an image with a resolution of 512 × 512 on a Jetson TX2 development board takes 0.42 s with the EAST model but only 0.06 s with EAST-light; on the public ICDAR2015 test set, the accuracy of EAST is 80.46% and that of EAST-light is 71.54%, so EAST-light achieves a better balance between speed and accuracy.
Drawings
FIG. 1 is a schematic flow chart of EAST-light model according to the present invention.
FIG. 2 is a schematic diagram of a full convolutional network structure in EAST-light of the present invention.
Detailed Description
In order to more specifically describe the present invention, the following detailed description is provided for the technical solution of the present invention with reference to the accompanying drawings and the specific embodiments.
The invention relates to a SLAM loop detection method combined with scene text semantic information, which comprises the following steps:
step 1: and constructing and training a text detection and recognition model based on the lightweight neural network model.
The invention provides the lightweight text detection model EAST-light on the basis of the EAST model. As shown in Fig. 1, EAST-light consists of two parts, a multi-channel full convolution network and non-maximum suppression: the full convolution network directly predicts the text regions, non-maximum suppression is applied to those predicted regions whose score exceeds a preset threshold, and the result of the non-maximum suppression is the final output of the model, namely the coordinates of the detected text boxes in the picture.
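As an illustration of the thresholding and non-maximum suppression step, a simplified sketch is given below. It treats the predicted boxes as axis-aligned rectangles and uses plain IoU-based suppression; the original EAST pipeline also decodes a rotation angle and uses locality-aware NMS, so this is an approximation under stated assumptions rather than the exact procedure of the model.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def postprocess(boxes, scores, score_thresh=0.8, nms_thresh=0.2):
    """boxes: (N, 4) array predicted by the FCN; scores: (N,) text-region probabilities."""
    keep_mask = scores > score_thresh          # keep only predictions above the score threshold
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = np.argsort(-scores)                # highest score first
    kept = []
    for idx in order:
        if all(iou(boxes[idx], boxes[j]) < nms_thresh for j in kept):
            kept.append(idx)
    return boxes[kept]                         # final text box coordinates
```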
As shown in Fig. 2, the full convolution network is divided into three parts: a feature extraction part, a feature fusion part and an output layer.
The feature extraction part adopts a ShuffleNet V2 model and outputs feature maps of four levels, f1, f2, f3 and f4, whose sizes are 1/32, 1/16, 1/8 and 1/4 of the original image, respectively.
The feature fusion part performs stage-by-stage feature fusion on the four levels of feature maps f1, f2, f3 and f4 output by the ShuffleNet V2 model; there are 3 feature fusion stages in total. In each stage, the feature map from the previous stage is first up-sampled to the same size as the current feature map, the two are then concatenated along the channel direction, a 1 × 1 convolutional layer reduces the number of channels and the amount of computation, and finally a 3 × 3 convolutional layer fuses the information to generate the result of the stage. After the last feature fusion stage, a 3 × 3 convolutional layer generates the final feature map, which is fed to the output layer. The numbers of channels of the 1 × 1 convolutional layers in the 3 feature fusion stages are 1256, 244 and 88, respectively; the numbers of channels of the 3 × 3 convolutional layers in the 3 stages are 128, 1256 and 32, respectively; the 3 × 3 convolutional layer after the last feature fusion stage has 32 channels. The specific network settings and output sizes of each stage are shown in Table 1:
TABLE 1
(Table 1, listing the network settings and output sizes of each stage, is provided as an image in the original publication.)
The output layer outputs, for each pixel of the image, the probability that it belongs to a text region together with the geometric information of the text box. The geometry is represented by a 4-dimensional axis-aligned bounding box (AABB) parameter R and a 1-dimensional rotation angle θ, where the 4 components of R are the distances from the pixel to the top, right, bottom and left boundaries of the rectangular box, respectively.
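A minimal PyTorch sketch of such an output head is given below: a 1-channel score map giving the per-pixel probability of belonging to a text region, a 4-channel geometry map giving the distances to the top, right, bottom and left box boundaries, and a 1-channel rotation angle θ. The default of 32 input channels matches the final feature map described above; the maximum distance used to scale the regression output and the angle range are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class OutputLayer(nn.Module):
    """EAST-style output head: score map, AABB distances and rotation angle (sketch)."""
    def __init__(self, in_channels=32, max_distance=512.0):
        super().__init__()
        self.score = nn.Conv2d(in_channels, 1, kernel_size=1)  # per-pixel text-region probability
        self.geo = nn.Conv2d(in_channels, 4, kernel_size=1)    # distances to top/right/bottom/left
        self.angle = nn.Conv2d(in_channels, 1, kernel_size=1)  # rotation angle theta
        self.max_distance = max_distance

    def forward(self, x):
        score = torch.sigmoid(self.score(x))
        # distances are non-negative and bounded by an assumed maximum box size
        distances = torch.sigmoid(self.geo(x)) * self.max_distance
        # angle mapped to (-pi/2, pi/2); the exact range is an assumption
        theta = (torch.sigmoid(self.angle(x)) - 0.5) * math.pi
        return score, distances, theta
```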
After the model is built with the open-source deep learning framework PyTorch, images of the application scene are collected with a monocular camera to make a data set; the data set is used to train the text detection model EAST-light and the text recognition model CRNN on a computer with a GPU, and the trained model weights are saved.
Step 2: an NVIDIA Jetson TX2 development board is used as the computing platform of the SLAM loop detection method. It receives as input the environment images collected by the monocular camera sensor, detects the text in the image with the text detection model EAST-light, outputs the text box coordinates, and saves the feature map f2 output by the second stage of the feature extraction network ShuffleNet V2 of the text detection model.
Step 3: the detected text regions are recognized with the text recognition model CRNN.
Step 4: according to the text detection and recognition results obtained in steps 2 and 3, global average pooling is performed over each channel of the feature map f2 from the second stage of the ShuffleNet V2 model to obtain the feature information vector f, in which each element is the average value of the corresponding channel of f2.
The semantic information of the image is described by a vector; the text semantic information vector is denoted t = [e1, e2, …, eN], where ei = [pi, x1i, y1i, x2i, y2i], N is the number of text landmarks, ei describes the information of the i-th text landmark in the image, pi indicates whether the i-th text landmark appears in the image, and (x1i, y1i) and (x2i, y2i) are the coordinates of the upper-left and lower-right corners of its text box. This information is output by the trained text detection and recognition models.
The total information vector is obtained by weighted fusion of the feature information vector and the semantic information vector of the current frame: s = λt + f, where λ is the weight of the semantic information vector and may be set to 0.1. Similarity is measured by the cosine value; the cosine similarity between a vector m and a vector n is
cos(m, n) = (m · n) / (‖m‖ ‖n‖)
Step 5: for each key frame in the key frame set, the cosine similarity between its total information vector si and the total information vector sj of the current frame is calculated as
cos(si, sj) = (si · sj) / (‖si‖ ‖sj‖)
and every key frame whose similarity exceeds a certain threshold and that is not directly adjacent to the current frame is taken as a loop candidate frame.
Step 6: if three consecutive adjacent loop candidate frames occur, a loop is considered to occur.
The method uses a deep neural network to extract image features, which are more robust than hand-crafted features and represent the original data better when the scene changes slightly. At the same time it detects and recognizes the text objects appearing in the scene, extracts their semantic information, and performs weighted fusion of the feature-point similarity and the semantic similarity, which improves the precision of the similarity judgment, raises the accuracy of loop detection and enhances the robustness of the SLAM system. Compared with EAST, the EAST-light model greatly increases the running speed and achieves a better balance between speed and accuracy under the real-time requirement of SLAM and the limited computing resources of embedded platforms.
The embodiments described above are presented to enable a person of ordinary skill in the art to make and use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without inventive effort. Therefore, the present invention is not limited to the above embodiments; improvements and modifications made by those skilled in the art based on this disclosure fall within the protection scope of the present invention.

Claims (5)

1. A SLAM loop detection method combined with scene text semantic information comprises the following steps:
(1) building and training a text detection model and a text recognition model based on a lightweight neural network;
aiming at the real-time requirement of SLAM and the limited computing resources of embedded platforms, the EAST model is improved to obtain a text detection model based on a lightweight neural network, specifically: the input is a picture, a full convolution network directly predicts the regions of the picture corresponding to text information, non-maximum suppression is applied to the predicted regions whose score exceeds a set threshold, and the result of the non-maximum suppression is the final output of the model, namely the coordinates of the text boxes in the picture;
the full convolution network comprises three parts: feature extraction, feature fusion and an output layer; the feature extraction part adopts a ShuffleNet V2 model and outputs feature maps of four levels, f1, f2, f3 and f4, whose sizes are 1/32, 1/16, 1/8 and 1/4 of the original image, respectively; the feature fusion part performs stage-by-stage feature fusion on the four levels of feature maps f1, f2, f3 and f4 output by the ShuffleNet V2 model, with three feature fusion stages in total; in each stage, the feature map from the previous stage is first up-sampled to the same size as the current feature map, the two are then concatenated along the channel direction, a 1 × 1 convolutional layer reduces the number of channels of the concatenated feature map to cut the amount of computation, and finally a 3 × 3 convolutional layer fuses the information to produce the result of the current stage; after the last feature fusion stage, a 3 × 3 convolutional layer generates the final feature map, which is fed to the output layer; the numbers of channels of the 1 × 1 convolutional layers in the three feature fusion stages are 1256, 244 and 88, respectively, the numbers of channels of the 3 × 3 convolutional layers in the three stages are 128, 1256 and 32, respectively, and the 3 × 3 convolutional layer after the last feature fusion stage has 32 channels;
(2) acquiring an environment image with a monocular camera, detecting the text in the image with the text detection model, outputting the text box coordinates, and saving the feature map output by the second stage of the feature extraction part of the text detection model;
(3) recognizing the detected text area by using a text recognition model;
(4) calculating the feature information vector and the semantic information vector of the current frame according to the text detection result and the recognition result obtained in step (2) and step (3), and obtaining a total information vector through weighted fusion;
for the current frame, the feature map f2 output by the second stage of the feature extraction part of the text detection model is taken, and global average pooling is performed over each of its channels to obtain the feature information vector f of the current frame, in which each element is the average value of the corresponding channel of f2;
(5) calculating the cosine similarity of the total information vector of any key frame in the key frame set and the total information vector of the current frame, and taking the key frame which has the similarity larger than a certain threshold and is not directly adjacent to the current frame as a loop candidate frame;
(6) when three consecutive adjacent loop candidate frames appear, the loop is judged to appear.
2. The SLAM loop detection method according to claim 1, wherein: step (1) adopts a CRNN model as the text recognition model based on a lightweight neural network.
3. The SLAM loop detection method according to claim 1, wherein: in step (4), for the current frame, its semantic information is described by a vector, denoted t = [e1, e2, …, eN], where ei = [pi, x1i, y1i, x2i, y2i], N is the number of text landmarks, ei describes the information of the i-th text landmark in the current frame, pi indicates whether the i-th text landmark appears in the current frame, pi = 1 if it appears and pi = 0 otherwise, and (x1i, y1i) and (x2i, y2i) are the coordinates of the upper-left and lower-right corners of the text box corresponding to the i-th text landmark in the current frame; this information is output by the trained text detection model and text recognition model.
4. The SLAM loop detection method according to claim 1, wherein: in step (4), the feature information vector f and the semantic information vector t are weighted and fused by the formula s = λt + f to obtain the total information vector s, where λ is the weight of the semantic information vector t.
5. The SLAM loop detection method according to claim 1, wherein: in step (5), for two total information vectors m and n, their cosine similarity cos(m, n) is calculated by the following formula:
cos(m, n) = (m · n) / (‖m‖ ‖n‖)
CN202010608535.2A 2020-06-29 2020-06-29 SLAM loop detection method combined with scene text semantic information Active CN111767854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010608535.2A CN111767854B (en) 2020-06-29 2020-06-29 SLAM loop detection method combined with scene text semantic information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010608535.2A CN111767854B (en) 2020-06-29 2020-06-29 SLAM loop detection method combined with scene text semantic information

Publications (2)

Publication Number Publication Date
CN111767854A CN111767854A (en) 2020-10-13
CN111767854B true CN111767854B (en) 2022-07-01

Family

ID=72724102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010608535.2A Active CN111767854B (en) 2020-06-29 2020-06-29 SLAM loop detection method combined with scene text semantic information

Country Status (1)

Country Link
CN (1) CN111767854B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784942B (en) * 2020-12-29 2022-08-23 浙江大学 Special color block coding method for positioning navigation in large-scale scene
CN112990220B (en) * 2021-04-19 2022-08-05 烟台中科网络技术研究所 Intelligent identification method and system for target text in image
CN113723379A (en) * 2021-11-02 2021-11-30 深圳市普渡科技有限公司 Artificial intelligence device, visual positioning method, device and readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784232A (en) * 2018-12-29 2019-05-21 佛山科学技术学院 A kind of vision SLAM winding detection method and device merging depth information
CN111126404A (en) * 2019-12-11 2020-05-08 杭州电子科技大学 Ancient character and font identification method based on improved YOLO v3

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784232A (en) * 2018-12-29 2019-05-21 佛山科学技术学院 A kind of vision SLAM winding detection method and device merging depth information
CN111126404A (en) * 2019-12-11 2020-05-08 杭州电子科技大学 Ancient character and font identification method based on improved YOLO v3

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Loop Closure Detection for Visual SLAM Fusing Semantic Information; Mingyue Hu et al.; Proceedings of the 38th Chinese Control Conference; 2019-07-30; pp. 4136-4238 *

Also Published As

Publication number Publication date
CN111767854A (en) 2020-10-13

Similar Documents

Publication Publication Date Title
Yuan et al. VSSA-NET: Vertical spatial sequence attention network for traffic sign detection
CN110084850B (en) Dynamic scene visual positioning method based on image semantic segmentation
Neubert et al. Superpixel-based appearance change prediction for long-term navigation across seasons
Mattyus et al. Enhancing road maps by parsing aerial images around the world
CN111767854B (en) SLAM loop detection method combined with scene text semantic information
US20200364554A1 (en) Systems and methods for deep localization and segmentation with a 3d semantic map
CN110738673A (en) Visual SLAM method based on example segmentation
CN111080659A (en) Environmental semantic perception method based on visual information
Matzen et al. Nyc3dcars: A dataset of 3d vehicles in geographic context
CN106951830B (en) Image scene multi-object marking method based on prior condition constraint
CN111368846B (en) Road ponding identification method based on boundary semantic segmentation
CN108564120B (en) Feature point extraction method based on deep neural network
CN111738055B (en) Multi-category text detection system and bill form detection method based on same
Lu et al. Cascaded multi-task road extraction network for road surface, centerline, and edge extraction
CN106407978B (en) Method for detecting salient object in unconstrained video by combining similarity degree
CN115273032A (en) Traffic sign recognition method, apparatus, device and medium
Lowphansirikul et al. 3D Semantic segmentation of large-scale point-clouds in urban areas using deep learning
Zhang et al. Improved Lane Detection Method Based on Convolutional Neural Network Using Self-attention Distillation.
Wilson et al. Visual and object geo-localization: A comprehensive survey
Abdigapporov et al. Joint multiclass object detection and semantic segmentation for autonomous driving
Rong et al. Guided text spotting for assistive blind navigation in unfamiliar indoor environments
CN117370498B (en) Unified modeling method for 3D open vocabulary detection and closed caption generation
Park et al. Estimating the camera direction of a geotagged image using reference images
Chen et al. Occlusion and multi-scale pedestrian detection A review
CN116563553B (en) Unmanned aerial vehicle image segmentation method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant