CN112261719B

CN112261719B - Area positioning method combining SLAM technology with deep learning

Info

Publication number: CN112261719B
Application number: CN202011121186.8A
Authority: CN
Inventors: 冷阳; 牟海涛; 唐琪; 康斌
Original assignee: Dalian Institute Of Artificial Intelligence Dalian University Of Technology; Dalian Sanli Technology Co ltd
Current assignee: Dalian Institute Of Artificial Intelligence Dalian University Of Technology; Dalian Sanli Technology Co ltd
Priority date: 2020-07-24
Filing date: 2020-10-20
Publication date: 2022-02-11
Anticipated expiration: 2040-10-20
Also published as: CN112261719A

Abstract

The invention belongs to the technical field of image recognition and area positioning and provides an area positioning method combining SLAM technology with deep learning. The proposed method overcomes several shortcomings of existing indoor positioning technologies. Positioning the target with SLAM combined with deep learning gives higher positioning accuracy, wider coverage, lower cost, and greater practical value. In addition, because visual SLAM is based on computer vision, it requires a large amount of computation and introduces positioning delay; the method addresses both of these problems.

Description

Area positioning method combining SLAM technology with deep learning

Technical Field

The invention belongs to the technical field of image recognition and area positioning, and relates to a method that uses deep learning to obtain positioning coordinates within a given area.

Background

SLAM is an abbreviation for simultaneous localization and mapping. With no prior information about the environment, a designated sensor builds a map of the environment incrementally while moving and, at the same time, estimates its own position within that map; when a camera is used as the sensor, the approach is called "visual SLAM".

Deep learning is a relatively new field of machine learning research whose concept derives from the study of artificial neural networks. It builds neural networks that mimic the analysis and learning mechanisms of the human brain to interpret data, extracting features layer by layer from the "low-level" to the "high-level" representations of the input and thereby learning a good mapping from input to output. In terms of specific research content, three main types of methods are involved: convolutional neural networks, autoencoder (self-coding) neural networks, and deep belief networks. Deep learning can improve the accuracy of classification and prediction, overcome the limited ability of traditional neural network algorithms to represent complex functions, and process nonlinear natural signals, as in natural language processing, big-data feature extraction, image recognition, and speech recognition.

With the development of positioning technology, GPS and base-station positioning have achieved accurate positioning in most outdoor scenes. In some outdoor scenes and indoors, however, obstacles cause GNSS signals to be severely attenuated or blocked entirely, so accurate positioning cannot be achieved there. At the same time, indoor scenes such as shopping malls have heavy foot traffic, complex environments, and strict real-time requirements, which place higher demands on positioning. The technologies currently used for indoor positioning mainly include Wi-Fi, Bluetooth, infrared, ultrasonic, geomagnetic, RFID, and ultra-wideband positioning. Each of these has shortcomings in positioning accuracy, coverage, reliability, power consumption, or cost. For example, Wi-Fi hotspots are strongly affected by the surrounding environment, giving low positioning accuracy and high maintenance cost; RFID positioning has a small coverage range and no communication capability; and ultra-wideband positioning is expensive and requires complex network deployment. To address these shortcomings, the invention uses deep learning to combine SLAM technology with positioning technology, improving positioning accuracy in special indoor or outdoor areas, expanding coverage, and reducing cost.

Disclosure of Invention

The main purpose of the invention is to provide a positioning method for indoor areas, or for outdoor areas such as construction sites, that improves positioning accuracy, enlarges coverage, and reduces cost.

The technical scheme of the invention is as follows:

An area positioning method combining SLAM technology with deep learning comprises the following steps:

First, base stations are installed in the specific area; they can receive the signal strength of a target in the area. The positioning area is divided into m small areas according to the size of the actual scene and the distribution of obstacles, and sampling points are laid out at a fixed interval. The target to be measured then wears a device that can obtain signal strength. Finally, the signal strength is collected at the sampling points in each of the m areas, and the collected RSSI values are recorded and stored.
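The sampling layout of the first step can be sketched as below. This is a minimal illustration under stated assumptions, not the patented procedure: the site is taken to be a rectangle split into m equal strips along its length (the method itself splits by scene size and obstacles), and one sampling point is placed at the centre of each interval-by-interval cell. The names `SamplePoint` and `make_sampling_grid` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SamplePoint:
    area: int                                  # 1-based index i of the small area
    x: float                                   # metres along the length of the site
    y: float                                   # metres along the width of the site
    rssi: list = field(default_factory=list)   # RSSI readings recorded at this point


def make_sampling_grid(length: float, width: float, m: int, interval: float) -> list:
    """Hypothetical helper: split a length x width site into m equal strips along x
    (an assumption) and place a sampling point at the centre of every
    interval x interval cell inside each strip."""
    strip = length / m
    nx = math.floor(strip / interval)   # sampling columns per strip
    ny = math.floor(width / interval)   # sampling rows across the width
    points = []
    for i in range(m):
        x0 = i * strip
        for a in range(nx):
            for b in range(ny):
                points.append(SamplePoint(i + 1, x0 + (a + 0.5) * interval, (b + 0.5) * interval))
    return points
```

For example, `make_sampling_grid(90, 30, 3, 5.0)` yields 36 points per area, and each point's `rssi` list can be appended to as readings arrive from the worn device.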

Second, positioning is performed with a binocular camera. The binocular camera is mounted on the target to be measured, the target walks through the sampling points of the m small areas from the first step, and the coordinates Y = (x, y) of each sampling point in each area are obtained using the classic visual SLAM framework, which consists of five modules: sensor data processing, front-end visual odometry, back-end optimization, loop closure detection, and mapping.

Third, a classification model is trained. Its input is the signal strength X obtained in the first step, and its output is one of the m divided positioning areas; that is, from the filtered signal strength it determines that the positioning target lies in the i-th area (i = 1, 2, ..., m). A separate prediction model is then trained for each of the m positioning areas: the signal strength measured at the target's position in the i-th area is the input X, the coordinate obtained in the second step is the output Y, and the corresponding pairs (X, Y) are used to train the i-th prediction model.
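How the data from the first two steps might be paired into training sets is sketched below. It assumes, for illustration only, that each sampling point has a hypothetical identifier shared by the RSSI records, the SLAM coordinates, and the area assignment; points without a SLAM coordinate are skipped. The function name `build_training_sets` is hypothetical.

```python
from collections import defaultdict


def build_training_sets(rssi_records, slam_coords, area_of):
    """Hypothetical pairing step.
    rssi_records: {point_id: [RSSI per base station]} from the first step
    slam_coords:  {point_id: (x, y)} from visual SLAM in the second step
    area_of:      {point_id: area index i in 1..m}
    Returns the classifier set [(X, i)] and per-area regression sets {i: [(X, Y)]}."""
    cls_set = []
    reg_sets = defaultdict(list)
    for pid, X in rssi_records.items():
        if pid not in slam_coords:
            continue  # no SLAM coordinate for this point, so no label Y
        i = area_of[pid]
        cls_set.append((X, i))                     # classifier: signal strength -> area
        reg_sets[i].append((X, slam_coords[pid]))  # i-th predictor: signal strength -> (x, y)
    return cls_set, dict(reg_sets)
```

One classifier is then fitted on `cls_set`, and one prediction model per key of the returned dictionary.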

A GA-BP neural network is used as the classification model: a genetic algorithm (GA) optimizes the initial weights and thresholds of a back-propagation (BP) neural network, which effectively keeps the network from falling into a local optimum during training. In this method the classification model has 2 hidden layers, and its output layer is connected to a softmax classifier with m output nodes. Output 0 indicates that the positioning target is in the 1st area, output 1 that it is in the 2nd area, and so on, with output m-1 indicating the m-th area.
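The genetic-algorithm stage of GA-BP can be sketched in NumPy as follows. This is a minimal illustration, not the inventors' implementation: each chromosome is the flattened vector of all weights and thresholds of a network with two tanh hidden layers and a softmax over m outputs, fitness is the cross-entropy on the training data, and the population size, elitism, uniform crossover, and Gaussian mutation are all assumptions. The subsequent BP (gradient) training that starts from the returned vector is not shown.

```python
import numpy as np


def layer_shapes(n_in, hidden, m):
    """(rows, cols) of each weight matrix: input -> hidden layers -> m outputs."""
    sizes = [n_in, *hidden, m]
    return [(sizes[k], sizes[k + 1]) for k in range(len(sizes) - 1)]


def unpack(vec, shapes):
    """Split one flat chromosome into (weights, thresholds) per layer."""
    params, pos = [], 0
    for r, c in shapes:
        W = vec[pos:pos + r * c].reshape(r, c)
        pos += r * c
        b = vec[pos:pos + c]
        pos += c
        params.append((W, b))
    return params


def forward(vec, shapes, X):
    """tanh hidden layers followed by a softmax over the m areas."""
    params = unpack(vec, shapes)
    a = X
    for W, b in params[:-1]:
        a = np.tanh(a @ W + b)
    W, b = params[-1]
    z = a @ W + b
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)


def cross_entropy(vec, shapes, X, labels):
    p = forward(vec, shapes, X)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))


def ga_initial_weights(X, labels, m, hidden=(8, 8), pop=30, gens=40, mut=0.1, seed=0):
    """GA stage of GA-BP: search for low-loss initial weights and thresholds.
    labels are output-node indices 0..m-1 (node k means area k+1)."""
    rng = np.random.default_rng(seed)
    shapes = layer_shapes(X.shape[1], hidden, m)
    dim = sum(r * c + c for r, c in shapes)
    population = rng.normal(0.0, 0.5, size=(pop, dim))
    for _ in range(gens):
        loss = np.array([cross_entropy(ind, shapes, X, labels) for ind in population])
        elite = population[np.argsort(loss)[: pop // 2]]  # keep the fitter half unchanged
        children = []
        while len(children) < pop - len(elite):
            pa, pb = elite[rng.integers(len(elite), size=2)]
            mask = rng.random(dim) < 0.5                   # uniform crossover
            children.append(np.where(mask, pa, pb) + rng.normal(0.0, mut, dim))  # Gaussian mutation
        population = np.vstack([elite, np.array(children)])
    loss = np.array([cross_entropy(ind, shapes, X, labels) for ind in population])
    return population[np.argmin(loss)], shapes


def predict_area(vec, shapes, X):
    return forward(vec, shapes, X).argmax(axis=1) + 1  # output node k -> area k+1
```

Because the fitter half is carried over unchanged, the best loss never increases from one generation to the next, which is what lets the GA hand BP a better starting point than a single random draw.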

The prediction model uses denoising autoencoders stacked into a stacked autoencoder network. Denoising adds a certain level of Gaussian noise to the input signal strength during encoding, which gives the model stronger generalization. Stacking autoencoders allows the network to be deepened to any number of layers, giving a closer fit to the data.

The prediction model has n network layers, structured as follows:

(1) Layer 1: an autoencoder is first trained to obtain the first-order feature $h^{(1)}$ of the signal strength;

(2) Layer i: the feature $h^{(i-1)}$ output by layer i-1 is taken as input and autoencoded to obtain the feature $h^{(i)}$;

(3) Layer n: the feature $h^{(n)}$ from the previous step is passed through a linear output to obtain the coordinate. Combining the n layers yields a prediction network that obtains coordinates from the input signal strength.
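The layer-by-layer construction of the prediction model can be sketched in NumPy as greedy layer-wise training of denoising autoencoders followed by a linear output. This is a minimal sketch under assumptions, not the patented network: each layer uses a sigmoid encoder and a linear decoder trained by plain gradient descent to reconstruct its clean input from a Gaussian-corrupted copy, and the final linear layer is fitted by least squares on the last feature. Layer sizes, noise level, learning rate, and epoch count are illustrative, and no end-to-end fine-tuning is shown. All function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_dae(H, n_hidden, noise_std=0.1, lr=0.1, epochs=200, seed=0):
    """One denoising autoencoder layer on features H (samples x dims).
    Encoder sigmoid(x W1 + b1), linear decoder h W2 + b2, MSE against the clean input."""
    rng = np.random.default_rng(seed)
    n, d = H.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        noisy = H + rng.normal(0, noise_std, H.shape)  # Gaussian corruption of the input
        h = sigmoid(noisy @ W1 + b1)
        err = (h @ W2 + b2) - H                        # reconstruct the CLEAN input
        g_out = 2 * err / n                            # gradient of the mean squared error
        gW2 = h.T @ g_out; gb2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * h * (1 - h)             # back through the sigmoid
        gW1 = noisy.T @ g_h; gb1 = g_h.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1                                      # only the encoder is kept


def fit_stacked_dae_regressor(X, Y, hidden_sizes=(16, 8), **kw):
    encoders, H = [], X
    for k, size in enumerate(hidden_sizes):
        W, b = train_dae(H, size, seed=k, **kw)
        encoders.append((W, b))
        H = sigmoid(H @ W + b)                         # feature h^(k) feeds the next layer
    A = np.hstack([H, np.ones((len(H), 1))])           # linear output layer with intercept
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return encoders, coef


def predict_coords(model, X):
    encoders, coef = model
    H = X
    for W, b in encoders:
        H = sigmoid(H @ W + b)
    return np.hstack([H, np.ones((len(H), 1))]) @ coef  # (x, y) per row
```

Signal strengths are assumed to be normalised (for example to [0, 1]) before training; one such model would be fitted per area.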

Fourth, once the classification and prediction models are trained, the binocular camera is removed. The signal strength of the target to be measured is then fed into the classification model, which determines that the target is in the i-th area, and the signal strength is input into the i-th prediction model to obtain the target coordinate (x, y). The environment in the area is not constant: static objects may be moved, parts of the building may be modified, and people move around, and such changes affect the signal strength. The network models are therefore retrained periodically so that they adapt to changes in the environment and positioning accuracy is maintained.
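The two-stage lookup of the fourth step can be sketched as plain function composition. The sketch assumes, hypothetically, that the trained classifier is exposed as a callable returning the area index i (1..m) and that the m prediction models are callables keyed by that index; the name `locate` is illustrative.

```python
from typing import Callable, Dict, Sequence, Tuple

Coord = Tuple[float, float]


def locate(rssi: Sequence[float],
           classify: Callable[[Sequence[float]], int],
           regressors: Dict[int, Callable[[Sequence[float]], Coord]]) -> Tuple[int, Coord]:
    """Hypothetical inference path: classifier picks area i, then the i-th model gives (x, y)."""
    i = classify(rssi)             # stage 1: which of the m areas
    return i, regressors[i](rssi)  # stage 2: coordinate from that area's prediction model
```

Keeping one regressor per area means each model only has to fit the signal-to-coordinate mapping of a small region, and retraining after an environmental change can be limited to the affected areas.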

In the first step, the number of base stations increases with the size of the area, and they should be installed as symmetrically as possible. When the positioning area is large, it can first be divided into a grid of sub-areas, and sampling points are then laid out at intervals within each sub-area. During signal strength acquisition, the sub-area containing the target and several adjacent sub-areas are determined, and the signal values received by the base stations of those sub-areas are taken as representative of the point. To improve the accuracy of the received signal values, the collected data can be Gaussian-filtered before recording, which reduces the effect of signal fluctuation.
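The document does not define its Gaussian filtering of RSSI; one common reading in RSSI-based positioning is sketched below as an assumption. The repeated readings at a point are modelled as normally distributed, readings farther than k standard deviations from the mean are discarded as fluctuations, and the rest are averaged. The choice k = 1 and the name `gaussian_filter_rssi` are hypothetical.

```python
import statistics


def gaussian_filter_rssi(samples, k=1.0):
    """Hypothetical Gaussian filter: keep readings within k standard deviations
    of the mean and return their average as the recorded RSSI value."""
    if not samples:
        raise ValueError("no RSSI samples")
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    if sigma == 0:
        return mu
    kept = [s for s in samples if abs(s - mu) <= k * sigma]
    return statistics.fmean(kept or samples)  # fall back if k is so small nothing is kept
```

For the readings `[-60, -61, -59, -60, -90]` the outlier `-90` lies outside one standard deviation and is dropped, so the recorded value is `-60.0`.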

In the second step, information is read from the vision sensor. The distance between the camera and an object is determined by comparing the images from the left and right viewpoints: each landmark in the field of view forms pixels in the imaging plane, and the distance from each pixel to the camera is obtained through the imaging model of the binocular camera. The camera data is then passed to the front-end visual odometry, which estimates the change in pose between adjacent images and recovers local environment information; feature points are extracted and matched across the video stream to obtain the approximate motion trajectory of the target. The preliminarily optimized result is passed to the visual SLAM back end, which eliminates the accumulated error of the visual odometry and further analyzes information the front end left out; a nonlinear graph optimization algorithm produces a map and trajectory that better match reality. Loop closure detection then checks the front-end and back-end results and corrects errors in regions that were not yet optimized, further improving positioning accuracy. Finally, a real-time map model of the small area is built using monocular dense reconstruction. The mapping is driven by the task the target has to perform, namely positioning each sampling point from the first step to obtain its coordinates Y = (x, y).
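The binocular imaging model mentioned above can be illustrated with the standard relations for a rectified stereo pair under the pinhole model: depth Z = f·B/d, where f is the focal length in pixels, B the baseline in metres, and d the disparity in pixels, followed by back-projection X = (u − c_x)Z/f, Y = (v − c_y)Z/f. The sketch assumes rectified images, square pixels, and known intrinsics; the function names are hypothetical and the numbers in the usage note are made up.

```python
def stereo_depth(f_px: float, baseline_m: float, u_left: float, u_right: float) -> float:
    """Depth of a point seen at column u_left / u_right in a rectified stereo pair."""
    d = u_left - u_right  # disparity in pixels
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return f_px * baseline_m / d


def back_project(u: float, v: float, depth: float, f_px: float, cx: float, cy: float):
    """Pixel (u, v) with known depth -> 3-D point in the left-camera frame (metres)."""
    return ((u - cx) * depth / f_px, (v - cy) * depth / f_px, depth)
```

With f = 700 px, B = 0.12 m and a disparity of 20 px, the point lies about 4.2 m from the camera; the front end tracks such points across frames to estimate pose changes.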

Beneficial effects of the invention: because satellite signals are severely attenuated by the shielding of buildings, effective positioning is impossible in some areas, whereas the present method can position targets effectively there. The method also overcomes several shortcomings of existing indoor positioning technologies: combining SLAM with deep learning gives higher positioning accuracy, wider coverage, lower cost, and greater practical value. Finally, visual SLAM, being based on computer vision, requires a large amount of computation and introduces positioning delay; because the binocular camera is used only to collect training coordinates and is removed once the models are trained, the method avoids both problems.

Drawings

Fig. 1 is a schematic flow chart of the area location method combining SLAM technology with deep learning according to the present invention.

Fig. 2 is a flow chart of the model algorithm.

Fig. 3 shows the classic visual SLAM framework.

Detailed Description

The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.

The following example applies the area positioning method combining SLAM technology and deep learning in a factory:

First, base stations are installed in a factory 90 m long and 30 m wide; they can receive the signal strength of a target in the area. The positioning area is divided into three small areas according to the size and actual conditions of the factory, and sampling points are laid out at a fixed interval. The target to be measured then wears a device that can obtain signal strength. Finally, the signal strength is collected at the sampling points in each of the three areas, and the collected RSSI values are recorded and stored.
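For the 90 m by 30 m factory, the area label used later as the classifier's target could be derived from a position as sketched below. The embodiment only says the division follows the factory's size and actual conditions; splitting it into three equal 30 m strips along the length is an assumption made purely for illustration, and `area_of` is a hypothetical name.

```python
FACTORY_LENGTH = 90.0  # metres, from the embodiment
FACTORY_WIDTH = 30.0   # metres, from the embodiment
N_AREAS = 3            # three small areas, from the embodiment


def area_of(x: float, y: float, length: float = FACTORY_LENGTH,
            width: float = FACTORY_WIDTH, m: int = N_AREAS) -> int:
    """Area index 1..m, assuming equal strips along the length (hypothetical split)."""
    if not (0 <= x <= length and 0 <= y <= width):
        raise ValueError("point outside the factory floor")
    strip = length / m
    return min(int(x // strip) + 1, m)  # x == length falls in the last area
```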

Second, starting with the first of the three small areas, a binocular camera is mounted on the target to be measured. The distance between the camera and an object is determined by comparing the images from the left and right viewpoints: each landmark in the field of view forms pixels in the imaging plane, and the distance from each pixel to the camera is obtained through the imaging model of the binocular camera. The camera data is then passed to the front-end visual odometry, which estimates the change in pose between adjacent images and recovers local environment information; feature points are extracted and matched across the video stream to obtain the approximate motion trajectory of the target. The preliminarily optimized result is passed to the visual SLAM back end, which eliminates the accumulated error of the visual odometry and further analyzes information the front end left out; a nonlinear graph optimization algorithm produces a map and trajectory that better match reality. Loop closure detection then checks the front-end and back-end results for gaps and corrects errors in regions that were not yet optimized, further improving positioning accuracy. Finally, a real-time map model of the small area is built using monocular dense reconstruction. The mapping is driven by the task the target has to perform, namely positioning each sampling point from the first step to obtain its coordinates Y = (x, y).
The same procedure is repeated for the second and third small areas.

Third, a classification model is trained. Its input is the signal strength X obtained in the first step, and its output is one of the 3 divided positioning areas; that is, from the filtered signal strength it determines that the positioning target is in the i-th area (i = 1, 2, 3). Prediction models are then trained separately for each of the 3 positioning areas: the signal strength measured at the target's position in the i-th area is the input X, the coordinate obtained in the second step is the output Y, and the corresponding pairs (X, Y) are used to train the i-th prediction model.

A GA-BP neural network is used as the classification model: a genetic algorithm optimizes the initial weights and thresholds of the network, which effectively keeps it from falling into a local optimum during training. With the factory divided into 3 positioning areas, the classification model has 2 hidden layers, and its output layer is connected to a softmax classifier with 3 output nodes. Output 0 indicates that the positioning target is in the first area, output 1 that it is in the second area, and output 2 that it is in the third area.
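The mapping from the three softmax output nodes to area numbers (node 0 to the first area, node 1 to the second, node 2 to the third) can be sketched as follows. The raw output scores passed in are assumed to come from the trained classifier; the function names are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of raw output scores."""
    mx = max(z)
    e = [math.exp(v - mx) for v in z]
    s = sum(e)
    return [v / s for v in e]


def decode_area(logits):
    """Return (area number, probability); output node k corresponds to area k + 1."""
    p = softmax(logits)
    k = max(range(len(p)), key=p.__getitem__)  # index of the most probable output node
    return k + 1, p[k]
```

For example, scores `[0.1, 2.0, -1.0]` select output node 1, i.e. the second area.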

The prediction model uses denoising autoencoders stacked into a stacked autoencoder network. It has 10 network layers and obtains coordinates directly from the input signal strength.

Fourth, once the classification and prediction models are trained, the binocular camera is removed. The signal strength of the target to be measured is fed into the classification model, which determines that the target is in the i-th area, and the signal strength is then input into the i-th prediction model to obtain the target coordinate (x, y). The network models are retrained every six months to adapt to changes in the environment and maintain positioning accuracy.
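The six-month retraining interval can be expressed as a simple date check, sketched below. Treating "six months" as a fixed 182 days is an assumption made for illustration; `retraining_due` and `RETRAIN_PERIOD` are hypothetical names.

```python
from datetime import date, timedelta

RETRAIN_PERIOD = timedelta(days=182)  # roughly half a year (assumed fixed length)


def retraining_due(last_trained: date, today: date) -> bool:
    """True once at least RETRAIN_PERIOD has passed since the last training run."""
    return today - last_trained >= RETRAIN_PERIOD
```

A calendar-based rule (same day six months later) would behave slightly differently around month ends; the fixed-length version keeps the check trivial.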

Claims

1. An area positioning method combining SLAM technology with deep learning, characterized by comprising the following steps:

first, installing base stations in a specific area, the base stations being required to receive the signal strength of a target in the area; dividing the positioning area into m small areas according to the size of the actual scene and the distribution of obstacles, and laying out sampling points at a fixed interval; having the target to be measured wear a device capable of obtaining signal strength; and finally collecting the signal strength X at the sampling points in each of the m small areas, and recording and storing each collected RSSI value;

second, performing positioning with a binocular camera: mounting the binocular camera on the target to be measured, having the target pass through all sampling points of the m small areas from the first step, and obtaining the coordinates Y = (x, y) of each sampling point in each small area using the classic visual SLAM framework, which consists of five modules: sensor data processing, front-end visual odometry, back-end optimization, loop closure detection, and mapping;

third, training a classification model whose input is the signal strength X obtained in the first step and whose output is one of the m divided small areas, that is, determining from the filtered signal strength that the target to be measured is in the i-th small area, i = 1, 2, ..., m; and training a prediction model for each of the m small areas, taking the signal strength at the position of the target in the i-th small area as input X and the coordinate obtained in the second step as output Y, and training the i-th prediction model with X and Y as corresponding training data;

introducing a GA-BP neural network into the classification model; optimizing the initial weights and thresholds of the GA-BP neural network with a genetic algorithm, thereby effectively preventing the network from falling into a local optimum during training; the classification model having 2 hidden layers, its output layer being connected to a softmax classifier, and the number of output nodes being m; output 0 indicating that the target to be measured is in the 1st area, output 1 indicating that it is in the 2nd area, and so on, with output m-1 indicating that it is in the m-th area;

introducing denoising autoencoders and a stacked autoencoder network into the prediction model;

(1) layer 1 of the prediction model: first training an autoencoder to obtain the first-order feature $h^{(1)}$ of the signal strength;

(2) layer i of the prediction model: taking the feature $h^{(i-1)}$ output by layer i-1 as input and autoencoding it to obtain the feature $h^{(i)}$;

(3) layer n of the prediction model: passing the feature $h^{(n)}$ obtained in step (2) through a linear output to obtain the coordinate;

(4) combining the n layers to form the prediction model, which obtains coordinates from the input signal strength;

fourth, after the classification model and the prediction models are trained, removing the binocular camera; feeding the signal strength of the target to be measured into the classification model to determine that the target is in the i-th area, and then inputting the signal strength into the i-th prediction model to obtain the target coordinate (x, y).