CN115115669A - Terminal sensing positioning method and system based on edge device self-supervision learning


Info

Publication number
CN115115669A
Authority
CN
China
Prior art keywords: feature, characteristic, supervision, self, points
Prior art date
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN202210745389.7A
Other languages
Chinese (zh)
Inventor
谢水生
陈放
丁磊
柏晓乐
Current Assignee (the listed assignees may be inaccurate)
Smart Dynamics Co ltd
Original Assignee
Smart Dynamics Co ltd
Priority date (the priority date is an assumption and is not a legal conclusion)
Filing date
Publication date
Application filed by Smart Dynamics Co ltd filed Critical Smart Dynamics Co ltd
Priority to CN202210745389.7A
Publication of CN115115669A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning


Abstract

The application discloses a terminal perception and positioning method and system based on edge device self-supervision learning, belonging to the field of artificial intelligence. The method comprises: constructing a self-supervision feature detection network and performing self-supervised learning feature mining on image data to obtain self-supervision features; constructing a local map, performing tracking based on the local map and the self-supervision features, inserting key frames, generating feature landmark points from the features in the key frames, and updating the landmark points of the local map; performing self-supervision-feature closed-loop detection according to the key frames and the feature landmark points; when a closed loop occurs, invoking global optimization in global map optimization and post-processing to correct the camera motion pose trajectory and the positions of the key frames and feature landmark points; and, after operation, updating the global map and saving the perception positioning map. This solves the technical problems of low map-feature recognition efficiency and low accuracy of positioning and of the global map, and achieves the technical effect of improving the synchronization of positioning and map construction.

Description

Terminal sensing positioning method and system based on edge device self-supervision learning
Technical Field
The application relates to the field of artificial intelligence, in particular to a terminal perception positioning method and system based on edge device self-supervision learning.
Background
With the rapid development of the economy and the steady improvement of living standards, artificial intelligence has advanced quickly and brought many changes and conveniences to people's daily lives. At present, demand for intelligent robots has grown greatly, and research on how to optimize the motion of intelligent robots is of great significance.
Currently, robot motion is determined by performing simultaneous localization and mapping: during motion through an unknown environment, the position and posture of the robot are located according to repeatedly observed map features, the actual localization is evaluated, and the environmental state at the moment of use is compared with the environmental state at the moment the map was drawn, so that mapping is carried out while localizing.
However, because localization and mapping must be performed simultaneously in practical applications, localization requires an unbiased map, while building an unbiased map in turn requires accurate localization estimates. This interdependence ultimately results in low precision of both localization and mapping, which cannot provide practical guidance for robot motion. There are thus the technical problems of low map-feature recognition efficiency and low accuracy of positioning and of the global map.
Disclosure of Invention
The application aims to provide a terminal perception and positioning method and system based on edge device self-supervision learning, so as to solve the technical problems in the prior art of low map-feature recognition efficiency and low accuracy of positioning and of the global map.
In view of the above problems, the present application provides a terminal sensing and positioning method and system based on edge device self-supervision learning.
In a first aspect, the present application provides a terminal sensing and positioning method based on edge device self-supervision learning, where the method includes: constructing a self-supervision feature detection network; performing self-supervised learning feature mining on image data by using the self-supervision feature detection network based on the edge device terminal to obtain self-supervision features; constructing a local map, performing tracking based on the local map and the self-supervision features, inserting key frames, generating feature landmark points from the features in the key frames, and updating the landmark points of the local map; performing self-supervision-feature closed-loop detection according to the key frames and the feature landmark points; and, if a closed loop occurs during camera tracking, invoking global optimization in global map optimization and post-processing to correct the camera motion pose trajectory and the positions of the key frames and feature landmark points, then updating the global map after operation and saving the perception positioning map, where the global map includes the feature landmark points generated from all self-supervision features during operation.
On the other hand, the application also provides a terminal perception and positioning system based on edge device self-supervision learning, where the system includes: a network construction module for constructing a self-supervision feature detection network; a self-supervision feature obtaining module for performing self-supervised learning feature mining on image data by using the self-supervision feature detection network based on the edge device terminal to obtain self-supervision features; an updating module for constructing a local map, performing tracking based on the local map and the self-supervision features, inserting key frames, generating feature landmark points from the features in the key frames, and updating the landmark points of the local map; a closed-loop detection module for performing self-supervision-feature closed-loop detection according to the key frames and the feature landmark points; and a correction module for invoking, if a closed loop occurs during camera tracking, global optimization in global map optimization and post-processing to correct the camera motion pose trajectory and the positions of the key frames and feature landmark points, updating the global map after operation, and saving the perception positioning map, where the global map includes the feature landmark points generated from all self-supervision features during operation.
One or more technical solutions provided in the present application have at least the following technical effects or advantages:
A self-supervision feature detection network is constructed, and self-supervised learning feature mining is performed on image data by the network on the edge device terminal to obtain self-supervision features. A local map is then constructed; tracking is performed based on the local map and the self-supervision features; key frames are inserted to obtain feature landmark points, and the landmark points of the local map are updated. Self-supervision-feature closed-loop detection is performed according to the key frames and the feature landmark points. Further, if a closed loop occurs during camera tracking, global optimization is invoked in global map optimization and post-processing to correct the camera motion pose trajectory and the positions of the key frames and feature landmark points, after which the global map is updated and the perception positioning map is saved. The technical effects of improving the accuracy of map feature recognition and improving map and positioning precision are thereby achieved.
Drawings
In order to illustrate the technical solutions of the present application or of the prior art more clearly, the drawings required in the description of the embodiments or of the prior art are briefly introduced below. It is obvious that the drawings in the following description are only exemplary, and that those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a terminal sensing and positioning method based on edge device self-supervision learning according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart illustrating the process of obtaining the self-supervision features in the terminal sensing and positioning method based on edge device self-supervision learning according to an embodiment of the present application;
fig. 3 is a schematic flowchart illustrating a process of updating landmark points of the local map in the terminal-aware positioning method based on edge device self-supervised learning according to the embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal sensing and positioning system based on edge device self-supervision learning according to the present application;
description of reference numerals: network construction module 11, self-supervision feature obtaining module 12, updating module 13, closed-loop detection module 14, correction module 15.
Detailed Description
The application provides the terminal perception and positioning method and system based on edge device self-supervision learning, solving the technical problems in the prior art of low map-feature recognition efficiency and low accuracy of positioning and of the global map. The technical effects of improving the accuracy of feature recognition and improving the synchronization of positioning and map construction are achieved.
In the technical solution of the present application, the acquisition, storage, use, and processing of data comply with the relevant provisions of national laws and regulations.
In the following, the technical solutions in the present application will be clearly and completely described with reference to the accompanying drawings. It is to be understood that the described embodiments are only a part of the embodiments of the present application, not all of them, and that the present application is not limited by the example embodiments described herein. All other embodiments obtained by a person skilled in the art from the embodiments of the present application without creative effort shall fall within the protection scope of the present application. It should be further noted that, for convenience of description, only some but not all of the elements relevant to the present application are shown in the drawings.
Example one
As shown in fig. 1, the present application provides a terminal sensing and positioning method based on edge device self-supervised learning, wherein the method includes:
step S100: constructing an automatic supervision characteristic detection network;
further, step S100 in the embodiment of the present application further includes:
step S110: the self-supervision feature detection network comprises an input layer, a shared feature extraction layer, a description sub-network layer and a key point network layer, wherein the input layer is connected with the shared feature extraction layer, a feature tensor is extracted from an input image through the shared feature extraction layer, the feature tensor is transmitted to the description sub-network layer and the key point network layer respectively, and the description sub-network layer and the key point network layer process the feature tensor respectively and extract self-supervision features.
Specifically, the self-supervision feature detection network is a detection network deployed on the edge computing device terminal and used to extract pixel-accurate feature-point positions and their descriptors from the original image; it adapts well to factors such as viewing angle, illumination, and noise. The input layer feeds the image data into the self-supervision feature detection network. The shared feature extraction layer, located after the input layer, extracts features from the input image data: different features can be extracted by the convolution layers, and local features are then qualitatively screened against a loss function and ground-truth data, effectively eliminating useless features and thereby reducing interference from factors such as noise and illumination.
Specifically, the shared feature extraction layer converts input image data into a feature tensor after feature extraction, so that the feature tensor is transmitted to the description sub-network layer and the key point network layer for processing, and the self-supervision features are obtained. Wherein the feature tensor is an array of features of various dimensions in the image data. And the key point network layer is used for identifying and processing the key points in the characteristic tensor to obtain the positions of the key points which are uniformly distributed. The key point network layer comprises an encoding layer and a decoding layer, the encoding layer is used for identifying key point features in the feature tensor, and the decoding layer is used for decoding the processed tensor back to the size of an input image to obtain a key point score map so as to determine the position of the key point.
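As a minimal sketch (not part of the patent text), the last stage of the key point network layer, reading keypoint positions out of the decoded score map, might look like the following; the function name, threshold, and point limit are illustrative assumptions:

```python
import numpy as np

def extract_keypoints(score_map, threshold=0.5, max_points=500):
    """Pick keypoint pixel positions from a decoded score map.

    score_map: (H, W) array of per-pixel keypoint scores, as produced by
    the key point network layer after decoding back to the input size.
    Returns an (N, 2) array of (row, col) positions, strongest first.
    The threshold and max_points values are assumed for illustration.
    """
    rows, cols = np.where(score_map >= threshold)
    scores = score_map[rows, cols]
    order = np.argsort(-scores)[:max_points]  # strongest responses first
    return np.stack([rows[order], cols[order]], axis=1)

# toy score map with two confident responses
sm = np.zeros((8, 8))
sm[2, 3] = 0.9
sm[5, 6] = 0.7
pts = extract_keypoints(sm, threshold=0.5)
```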
Specifically, the description sub-network layer is configured to convert the feature tensor into a descriptor tensor, where the descriptor tensor is used to describe pixels around the key points, and for example, when it is determined whether the key points at two different positions are similar, the determination is performed by calculating a distance between descriptors between the two key points. The self-supervision feature is used for providing an object for subsequent self-supervision sensing and positioning and comprises the key point position and the feature point descriptor. Therefore, the goal of constructing an automatic supervision feature detection network and acquiring the automatic supervision features is achieved, the technical effects of improving the accuracy of feature identification, improving the identification efficiency and laying a cushion for updating and optimizing the whole-area map for subsequent automatic supervision learning are achieved.
Step S200: performing self-supervision learning characteristic mining on image data by using the self-supervision characteristic detection network based on the edge equipment terminal to obtain self-supervision characteristics;
further, as shown in fig. 2, based on the edge device terminal performing the feature mining on the image data by using the self-supervised feature detection network to obtain the self-supervised feature, step S200 in the embodiment of the present application further includes:
step S210: inputting image data, and performing image projection transformation on the image data to obtain a projection image;
step S220: reasoning the image data and the projected image by a twin network respectively to obtain a descriptor tensor;
step S230: mining and fusing positive and negative samples based on the image data and the descriptor tensor of the projection image to obtain a training sample;
step S240: performing feature extraction on input image data through a shared feature extraction layer to convert the input image data into a feature tensor, and supplying the feature tensor to a key point network layer and a description sub-network layer;
step S250: the key point network layer carries out key point feature identification processing on the feature tensor to determine the position of a key point;
step S260: the description sub-network layer converts the feature tensor into a descriptor tensor, and performs bicubic interpolation and normalization processing on the descriptor tensor to obtain a feature point descriptor;
step S270: and obtaining the self-supervision characteristic according to the key point position and the characteristic point descriptor.
In particular, the edge device terminal is a terminal that provides computation, storage, and network bandwidth close to the data input or user. The image projective transformation means transforming the input image I ∈ R^{1024×512×1} by a randomly generated projection matrix H ∈ R^{3×3×1} to obtain the projected image I' ∈ R^{1024×512×1}; H is retained for computing the loss value. Twin-network inference means that the input image I and the projected image I' are each passed through the twin network to obtain the descriptor tensors D ∈ R^{60×80×32} and D' ∈ R^{60×80×32}. By the nature of convolutional neural networks, each spatial pixel in the descriptor tensor describes a region of the original image, and these regions do not coincide with one another, so the descriptor tensor can be regarded as a semi-dense description of the input image.
Specifically, a square grid is created over the input image I and the projected image I', so that each spatial pixel of the descriptor tensor can be regarded as the descriptor vector of the pixel at the corresponding grid centre. Because the projection transformation matrix between I and I' is known, the coordinates g' of a grid centre point g on I after projection can be computed; the nearest neighbour of g' among the grid centres of I' is taken as the matching point of g, yielding a positive sample pair, while g and the other grid centres of I' form negative sample pairs. If the distance between g' and its nearest neighbour is too large, that group of samples is discarded. By establishing an appropriate loss function, samples with excessive distances are eliminated, and the obtained positive and negative sample pairs are combined into training samples, which serve as the image data used in self-supervised learning.
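The grid-centre matching described above can be sketched as follows; this is an illustrative reading of the text, and the helper name `mine_pairs`, the example grid coordinates, and the distance threshold `max_dist` are assumptions:

```python
import numpy as np

def mine_pairs(centers, H, max_dist=4.0):
    """Mine positive and negative pairs between the grid centres of image I
    and those of its projection I'.

    centers: (N, 2) grid-centre pixel coordinates (x, y) on I; the same
    grid is assumed on I'. H is the known 3x3 projection matrix. Each
    centre g is warped to g'; the nearest grid centre of I' becomes its
    positive match, the remaining centres form negative pairs, and the
    group is discarded when the nearest-neighbour distance is too large.
    """
    ones = np.ones((centers.shape[0], 1))
    proj = (H @ np.hstack([centers, ones]).T).T   # homogeneous warp of the centres
    proj = proj[:, :2] / proj[:, 2:3]             # back to pixel coordinates
    pos, neg = [], []
    for i, p in enumerate(proj):
        d = np.linalg.norm(centers - p, axis=1)   # distances to the centres of I'
        j = int(np.argmin(d))
        if d[j] <= max_dist:                      # otherwise discard the group
            pos.append((i, j))
            neg.extend((i, k) for k in range(len(centers)) if k != j)
    return pos, neg

# identity projection: every centre matches itself
grid = np.array([[8.0, 8.0], [24.0, 8.0], [8.0, 24.0]])
pos, neg = mine_pairs(grid, np.eye(3))
```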
Specifically, each spatial pixel in the descriptor tensor can be regarded as a multi-dimensional descriptor vector of the image, corresponding to a regional feature of the input image data. Bicubic interpolation can be used to increase the resolution of the image corresponding to the descriptor vectors. Normalization means that the data are limited to a certain range, which can speed up data processing. Through bicubic interpolation and normalization, feature-point descriptors with high matching performance are obtained. Then, from the key point positions and the feature-point descriptors, the feature-point positions and surrounding pixel information are obtained, giving the self-supervision features. This achieves the goal of self-supervised learning feature mining on the image data, improves the accuracy of the self-supervision features, and provides accurate initial data for the subsequent nonlinear optimization.
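A sketch of the descriptor lookup and normalisation step; to keep the example dependency-free, a nearest-cell lookup stands in for the bicubic interpolation described in the text, and the cell size is an assumed value:

```python
import numpy as np

def describe_keypoints(desc_tensor, keypoints_px, cell=16):
    """Look up and L2-normalise a descriptor for each keypoint position.

    desc_tensor: (Hc, Wc, D) semi-dense descriptor tensor (e.g. 60x80x32),
    each spatial cell describing one cell x cell patch of the input image.
    Nearest-cell lookup replaces the bicubic upsampling of the text here
    (an assumption for brevity); the L2 normalisation matches the
    normalisation step described above.
    """
    out = []
    for r, c in keypoints_px:
        v = desc_tensor[min(r // cell, desc_tensor.shape[0] - 1),
                        min(c // cell, desc_tensor.shape[1] - 1)].astype(float)
        out.append(v / (np.linalg.norm(v) + 1e-8))  # unit-length descriptor
    return np.array(out)

rng = np.random.default_rng(0)
D = rng.standard_normal((60, 80, 32))      # toy descriptor tensor
descs = describe_keypoints(D, [(5, 5), (100, 200)])
```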
Step S300: constructing a local map, performing tracking operation based on the local map and self-supervision characteristics, inserting a key frame, generating characteristic landmark points according to the characteristics in the key frame, and updating the landmark points of the local map;
further, as shown in fig. 3, a local map is constructed, tracking operation is performed based on the local map and an auto-supervision feature, a key frame is inserted, a feature landmark point is generated according to a feature in the key frame, and the landmark point of the local map is updated, in step S300 in this embodiment of the present application, the method further includes:
step S310: obtaining a key frame discrimination requirement;
step S320: determining a key frame based on the key frame discrimination requirement, and inserting the key frame through a local map construction thread;
step S330: inserting the characteristic points of the key frames into a local map as characteristic road marking points;
step S340: and based on the local map, utilizing the feature landmark points to project into the current frame to search for matched feature points, and updating the key frame of the local map.
Further, before inserting the feature point of the key frame as a feature landmark point into the local map, step S330 in this embodiment of the present application further includes:
step S331: judging whether the characteristic points of the key frames meet the requirement of presetting local map insertion, and if so, inserting the characteristic points of the key frames into the local map as the characteristic landmark points;
step S332: and when the key frame does not meet the requirement, removing the characteristic points of the key frame.
Further, based on the local map, the feature landmark points are projected into the current frame to find matching feature points and the key frames of the local map are updated, and step S340 in the embodiment of the present application includes:
step S341: calculating the position of the feature landmark point projected to the current frame, judging whether the projected position exceeds the image boundary, and abandoning the feature landmark point when the projected position exceeds the image boundary;
step S342: calculating an included angle between a visual angle vector of the current frame and an observation visual angle vector of the characteristic landmark point, and if the included angle does not meet the preset requirement, discarding the included angle;
step S343: calculating the distance between the characteristic road sign point and the center of the current frame camera, and discarding the characteristic road sign point if the distance is not within the range of the visible distance;
step S344: matching the characteristic road mark point with the characteristic point which is not matched with the current frame and has a similar three-dimensional coordinate range, and taking the minimum distance as a new characteristic road mark point;
step S345: and determining newly added characteristic landmark points, and optimizing the pose according to the newly added characteristic landmark points.
Specifically, the key frame discrimination requirement is the set of conditions for judging, during camera tracking, whether the current frame is a key frame, and includes: (1) if camera tracking was lost, the camera has re-entered the tracking state through relocalization and 20 frames of images have been processed; (2) in the tracking state, 20 frames have passed since the last key frame was inserted, or the local mapping thread has no task; (3) the number of feature points of the current frame matched and tracked against the local map is less than 25% of the number of feature landmark points contained in the key frame; (4) the number of feature landmark points tracked for the current frame is less than 75% of the feature landmark points of the reference key frame. The current frame may be inserted as a key frame only when it satisfies condition (4) together with at least one of the first three conditions. After a key frame is determined based on the key frame discrimination requirement, the local mapping thread updates the covisibility graph and generates the feature points in the key frame as feature landmark points of the local map.
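The four discrimination conditions above can be collected into a single predicate; this is an illustrative sketch, with the function and argument names chosen for the example rather than taken from the patent:

```python
def need_keyframe(frames_since_reloc, frames_since_kf, mapping_idle,
                  matched_ratio, tracked_ratio):
    """Key frame decision following the four conditions in the text.

    matched_ratio: current-frame matches / landmark points of the key frame.
    tracked_ratio: tracked landmarks / landmarks of the reference key frame.
    A frame is inserted when condition (4) holds together with at least
    one of conditions (1)-(3). Argument names are assumptions.
    """
    c1 = frames_since_reloc >= 20               # (1) 20 frames processed after relocalisation
    c2 = frames_since_kf >= 20 or mapping_idle  # (2) 20 frames since last insertion, or idle mapper
    c3 = matched_ratio < 0.25                   # (3) matches < 25% of key-frame landmarks
    c4 = tracked_ratio < 0.75                   # (4) tracked < 75% of reference key frame
    return c4 and (c1 or c2 or c3)
```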
Specifically, the preset local-map insertion requirement is the preset set of conditions deciding whether a feature point in the key frame can be inserted into the local map as a feature landmark point, and includes: (1) more than 25% of the key frames able to observe the feature landmark point have feature points that match it; (2) more than three key frames have the feature landmark point within their observation range; (3) the current key frame is fewer than 3 key frames away from the key frame that first observed the point. A feature point in the key frame can be inserted into the local map as a feature landmark point only when all 3 conditions are satisfied simultaneously.
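A minimal sketch of the three insertion conditions (all must hold); the function and argument names are assumptions made for illustration:

```python
def can_insert_landmark(observing_kfs, matched_kfs, kf_gap):
    """Check the preset local-map insertion requirement of the text.

    observing_kfs: number of key frames able to observe the candidate point;
    matched_kfs: those among them with a matching feature point;
    kf_gap: key-frame distance from the current key frame to the one that
    first observed the point. All three conditions must hold at once.
    """
    c1 = matched_kfs > 0.25 * observing_kfs  # (1) matched in more than 25% of observers
    c2 = observing_kfs > 3                   # (2) observed by more than three key frames
    c3 = kf_gap < 3                          # (3) first observation fewer than 3 key frames away
    return c1 and c2 and c3
```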
Specifically, based on the local map, the feature landmark points in the local map are projected into the current key frame to find more matching feature points, so that the key frames of the local map can be updated. New landmark points for updating the local map are judged by processing the feature landmark points contained in the key frame set. The position at which a feature landmark point projects into the current frame is calculated; if it exceeds the image boundary, the feature landmark point has no matching point in the current frame and is discarded. The viewing-direction vector of the current frame is the viewing direction and view length at the moment the current frame was captured; the observation-direction vector of a feature landmark point is the viewing direction and view length from which the point can be observed. The preset requirement means that the angle between the two vectors must fall within a certain range; if it does not, the current frame and the feature landmark point share too few common feature points, and the point must be discarded. Illustratively, the angle between the viewing-direction vector v of the current frame and the observation-direction vector n of the feature landmark point is calculated; if v · n is less than cos 60°, the angle does not meet the condition and the feature landmark point must be discarded.
Specifically, if the distance between a feature landmark point and the current-frame camera centre exceeds the distance range, the feature landmark point cannot find a matching feature point in the current frame and is therefore discarded. Among the points matched to the feature landmark point within a similar three-dimensional coordinate range, the point with the minimum distance to the current-frame camera is taken as the new feature landmark point. The key frames in the local map are then updated according to the new feature landmark points, and the pose is optimized to improve its accuracy.
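Steps S341 to S343 amount to three rejection tests per landmark. The following sketch assumes an illustrative pinhole camera model (R, t, K) that the text does not spell out, and uses the cos 60° bound from the example above:

```python
import numpy as np

def landmark_visible(pt_w, R, t, K, img_w, img_h, view_dir,
                     d_min, d_max, cos_limit=np.cos(np.radians(60))):
    """Apply the three rejection tests of steps S341-S343 to one landmark.

    pt_w: landmark position in world coordinates; R, t, K: current-frame
    rotation, translation and intrinsics (an assumed pinhole model).
    view_dir: unit observation-direction vector n of the landmark.
    """
    pc = R @ pt_w + t                      # landmark in the camera frame
    if pc[2] <= 0:                         # behind the camera
        return False
    u, v, w = K @ pc
    x, y = u / w, v / w
    if not (0 <= x < img_w and 0 <= y < img_h):
        return False                       # S341: projects outside the image boundary
    ray = pc / np.linalg.norm(pc)          # viewing direction of the current frame
    if ray @ view_dir < cos_limit:
        return False                       # S342: view angle too large (v . n < cos 60 deg)
    return d_min <= np.linalg.norm(pc) <= d_max   # S343: visible-distance range

K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
ok = landmark_visible(np.array([0.0, 0.0, 5.0]), np.eye(3), np.zeros(3),
                      K, 100, 100, np.array([0.0, 0.0, 1.0]), 1.0, 10.0)
```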
Step S400: performing self-supervision-feature closed-loop detection according to the key frames and the feature landmark points;
further, performing an auto-supervised feature closed-loop detection according to the key frame and the feature landmark points, where step S400 in the embodiment of the present application further includes:
step S410: the self-supervision characteristic closed-loop detection comprises a scene identification module and a closed-loop detection and fusion thread, and the scene identification module is used for detecting the latest key frame in an identification database to determine a closed-loop candidate frame;
step S420: and performing BA optimization after solving the position and posture of the closed-loop candidate frame, determining that the closed loop is detected if the optimized features are successfully matched with the feature landmark points, performing fusion optimization through closed-loop detection and a fusion thread, and converting the closed-loop candidate frame matched with the key frame into a closed-loop frame.
Specifically, self-supervision-feature closed-loop detection identifies whether the currently observed scene has appeared among the historically observed scenes; if it has, the motion trajectory and the map can be closed. The scene recognition module comprises an offline-trained bag-of-words model and a recognition database. The key frame set in the recognition database is searched against the latest key frame; by computing the similarity between the latest key frame and its adjacent key frames on the covisibility graph, key frames in the set with low similarity, and those directly connected to the latest key frame, are removed, and the remaining frames of the set serve as closed-loop candidate frames. The closed-loop candidate frames are used for rigid-body transformation and optimization; because a closed-loop candidate frame is a key frame with high similarity to the latest key frame, the historical observation scene can be recovered from it.
Specifically, the transformation between the camera coordinate system and the world coordinate system is obtained by solving the pose of the closed-loop candidate frame, and BA optimization is then performed on this transformation to obtain the candidate-frame features after the camera pose and the feature points have been adjusted. If these features match the feature landmark points successfully, the current scene has appeared in the historical motion and the loop can be closed. After the closed loop is confirmed, the closed-loop frame is obtained through the closed-loop fusion and optimization process, namely adding sequence edges and closed-loop edges to the closed-loop frame that has been fixed the longest, and then starting the optimization. Thus the goal of determining the closed-loop frame is achieved, with the technical effect of improving the accuracy of the self-supervised learning.
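As an illustration of the candidate-selection step, the sketch below scores bag-of-words vectors of database key frames against the latest key frame and drops low-similarity and directly connected frames. The cosine similarity measure, the 0.75 ratio threshold, and the function names are assumptions for illustration, not details from the patent:

```python
import numpy as np

def loop_candidates(latest_bow, db_bows, connected_ids, min_sim_ratio=0.75):
    """Return ids of loop-closure candidate keyframes.

    latest_bow    : (V,) bag-of-words vector of the newest keyframe
    db_bows       : {keyframe_id: (V,) bow vector} recognition database
    connected_ids : ids directly connected to the newest keyframe on the
                    covisibility graph (these share the scene trivially)
    min_sim_ratio : keep frames scoring at least this fraction of the best
                    adjacent-frame similarity (threshold is an assumption)
    """
    def sim(a, b):  # cosine similarity between two BoW histograms
        n = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / n) if n > 0 else 0.0

    # baseline: best similarity among directly connected (adjacent) keyframes
    base = max((sim(latest_bow, db_bows[i]) for i in connected_ids
                if i in db_bows), default=0.0)
    thresh = base * min_sim_ratio
    return [i for i, v in db_bows.items()
            if i not in connected_ids and sim(latest_bow, v) >= thresh]
```

A loop is then confirmed only if a candidate's features also match the feature landmark points after pose solving and BA optimization, as described above.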
Step S500: if a closed loop occurs during camera tracking, global optimization is invoked in global map optimization and post-processing to correct the positions of the camera motion pose trajectory, the key frames, and the feature landmark points; after this operation the global map is updated and the perception positioning map is stored, wherein the global map comprises the feature landmark points generated by all the self-supervised features during operation.
Further, in the global map optimization, step S500 in the embodiment of the present application further includes:
(a) initializing the state-space count num_x to 0; when entering a new local map, obtaining a new relative transformation structure (the formula appears only as an image in the original) and letting num_x = 1 + num_x;
(b) updating an estimated M matrix, and estimating the M matrix through a block diagonal matrix formed by the relative confidence coefficient between each pose in the local map;
(c) calculating the gradient descent direction vector Δx according to the update formula (shown only as an image in the original), judging whether the value of Δx is smaller than a given constraint error value a, and finishing the iteration when it is;
(d) when the condition in step (c) is not satisfied, calculating the learning rate λ and iterating it through the formula λ/(1+λ), with the initial value of λ set to 1/3 in the experiments, and obtaining a new state-space variable x_i by letting x_i = x_{i-1} + Δx;
(e) judging whether the difference of the limiting cost f(x) between the local maps at adjacent moments meets a given error b, and if so, ending the iteration;
(f) when the requirement is not met, adding 1 to the iteration count and going to step (b) for a new iterative solution;
(g) when the set number of iterations is reached, ending the optimization algorithm and the iteration.
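The iteration of steps (a) through (g) can be sketched as follows. Since the update formulas appear only as images in the source, a preconditioned gradient step dx = -λ·M⁻¹·∇f(x) is assumed here; only the λ/(1+λ) decay, the initial value λ = 1/3, the stopping test on |Δx| against a, and the cost-difference test against b come from the text:

```python
import numpy as np

def optimize_local_map(x0, grad_f, f, M,
                       a=1e-6, b=1e-9, lam=1/3.0, max_iter=100):
    """Sketch of steps (a)-(g) of the global map optimization.

    x0     : initial state-space vector (stacked relative transformations)
    grad_f : gradient of the limiting cost f
    f      : limiting cost between local maps at adjacent moments
    M      : block-diagonal matrix of relative confidences between poses
    a, b   : constraint error on |dx| and on the cost difference
    lam    : learning rate, initial value 1/3 as in the text
    """
    x = np.asarray(x0, dtype=float)
    M_inv = np.linalg.inv(M)                    # (b) estimated M matrix
    prev_cost = f(x)
    for _ in range(max_iter):                   # (g) iteration cap
        dx = -lam * M_inv @ grad_f(x)           # (c) assumed descent step
        if np.linalg.norm(dx) < a:              # (c) |dx| < a -> done
            break
        lam = lam / (1.0 + lam)                 # (d) decay the learning rate
        x = x + dx                              # (d) x_i = x_{i-1} + dx
        cost = f(x)
        if abs(prev_cost - cost) < b:           # (e) cost change < b -> done
            break
        prev_cost = cost                        # (f) next iteration
    return x
```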
Specifically, the global map includes the feature landmark points, the key frames, and the key frame relationships generated by all the self-supervised features during operation. If a closed loop occurs during camera tracking, the current scene has appeared in the historical observation scenes; the camera is then adjusted through the global optimization and post-processing thread, so that the positions of the camera motion pose trajectory, the key frames, and the feature landmark points are corrected, the accuracy of the global map is improved, and the global map is updated and stored.
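As a minimal illustration of the pose correction, the 2D sketch below re-anchors the drifted tail of the trajectory once a loop is confirmed; this rigid re-anchoring is common SLAM practice assumed for illustration, not the patent's exact global optimization:

```python
import numpy as np

def correct_trajectory(poses, loop_idx, T_corr):
    """Propagate a loop-closure correction to keyframe poses (2D sketch).

    poses    : list of (3, 3) homogeneous 2D keyframe poses along the track
    loop_idx : index where the closed loop was detected
    T_corr   : the pose the loop keyframe should have had; poses from
               loop_idx onward are re-anchored by the accumulated drift.
    Feature landmark points would be moved with the same correction; this
    sketch only shows the pose part.
    """
    T_old = poses[loop_idx]
    delta = T_corr @ np.linalg.inv(T_old)     # drift accumulated at the loop
    out = list(poses[:loop_idx])
    for T in poses[loop_idx:]:
        out.append(delta @ T)                 # shift the drifted tail
    return out
```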
In summary, the terminal sensing and positioning method based on edge device self-supervision learning provided by the present application has the following technical effects:
1. The method comprises: constructing a self-supervised feature detection network; performing self-supervised learning feature mining on image data with this network on the edge device terminal to obtain self-supervised features; constructing a local map, performing tracking operation based on the local map and the self-supervised features, and inserting key frames to obtain feature landmark points used to update the landmark points of the local map; performing self-supervised feature closed-loop detection according to the key frames and the feature landmark points; and, if a closed loop occurs during camera tracking, correcting the positions of the camera motion pose trajectory, the key frames, and the feature landmark points by invoking global optimization in global map optimization and post-processing, updating the global map after operation, and storing the perception positioning map. The technical effects of improving map and positioning precision and improving the accuracy of feature identification are thus achieved.
2. The method includes: performing image projection transformation on the input image data to obtain a projected image; reasoning over the image data and the projected image through a twin network to obtain descriptor tensors; performing positive and negative sample mining and fusion on the descriptor tensors of the image data and the projected image to obtain training samples; performing feature extraction on the input image data through the shared feature extraction layer to convert it into a feature tensor; performing key point feature identification on the feature tensor through the key point network layer to determine the key point positions; converting the feature tensor into a descriptor tensor through the description sub-network layer, and performing bicubic interpolation and normalization on the descriptor tensor to obtain the feature point descriptors; and combining the key point positions and the feature point descriptors to obtain the self-supervised features. The goal of obtaining the self-supervised features during operation is thus achieved, improving feature accuracy and providing reliable features for the subsequent determination of key frames.
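The descriptor interpolation and normalization step can be sketched as follows; bilinear sampling is used in place of the bicubic interpolation named in the text to keep the sketch dependency-free, and the 8-pixel cell size is an assumption borrowed from typical self-supervised detectors:

```python
import numpy as np

def describe_keypoints(desc_coarse, keypoints, cell=8):
    """Sample per-keypoint descriptors from the coarse descriptor tensor.

    desc_coarse : (D, Hc, Wc) descriptor tensor from the description
                  sub-network layer (one cell per `cell` x `cell` pixels)
    keypoints   : (N, 2) array of (x, y) pixel positions from the key
                  point network layer
    Bilinear interpolation stands in for the patent's bicubic step;
    the L2 normalization matches the text.
    """
    D, Hc, Wc = desc_coarse.shape
    out = np.empty((len(keypoints), D))
    for n, (x, y) in enumerate(keypoints):
        # continuous position in coarse-grid coordinates
        cx = np.clip(x / cell, 0, Wc - 1)
        cy = np.clip(y / cell, 0, Hc - 1)
        x0, y0 = int(cx), int(cy)
        x1, y1 = min(x0 + 1, Wc - 1), min(y0 + 1, Hc - 1)
        fx, fy = cx - x0, cy - y0
        d = (desc_coarse[:, y0, x0] * (1 - fx) * (1 - fy)
             + desc_coarse[:, y0, x1] * fx * (1 - fy)
             + desc_coarse[:, y1, x0] * (1 - fx) * fy
             + desc_coarse[:, y1, x1] * fx * fy)
        out[n] = d / max(np.linalg.norm(d), 1e-12)   # L2 normalization
    return out
```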
Example two
Based on the same inventive concept as the terminal sensing and positioning method based on the edge device self-supervised learning in the foregoing embodiment, as shown in fig. 4, the present application further provides a terminal sensing and positioning system based on the edge device self-supervised learning, wherein the system includes:
the network construction module 11 is used for constructing a self-supervised feature detection network;
the self-supervision characteristic acquisition module 12 is used for carrying out self-supervision learning characteristic mining on image data by utilizing the self-supervision characteristic detection network based on an edge device terminal to acquire self-supervision characteristics;
the updating module 13 is configured to construct a local map, perform tracking operation based on the local map and an auto-supervision feature, insert a key frame, generate a feature landmark point according to a feature in the key frame, and update the landmark point of the local map;
a closed loop detection module 14, wherein the closed loop detection module 14 is configured to perform an auto-supervision feature closed loop detection according to the key frame and the feature landmark points;
and the correction module 15 is configured to, if a closed loop occurs during camera tracking, invoke global optimization in global map optimization and post-processing to correct the positions of the camera motion pose trajectory, the key frames, and the feature landmark points, and after operation update the global map and store the perception positioning map, where the global map includes the feature landmark points generated by all the self-supervised features during operation.
Further, the system further comprises:
the self-supervision feature detection network comprises an input layer, a shared feature extraction layer, a description sub-network layer and a key point network layer, wherein the input layer is connected with the shared feature extraction layer, a feature tensor is extracted from an input image through the shared feature extraction layer, the feature tensor is transmitted to the description sub-network layer and the key point network layer respectively, and the description sub-network layer and the key point network layer process the feature tensor respectively and extract self-supervision features.
Further, the system further comprises:
the projection conversion unit is used for inputting image data, and performing image projection conversion on the image data to obtain a projection image;
the descriptor tensor obtaining unit is used for reasoning the image data and the projection image through a twin network respectively to obtain a descriptor tensor;
the sample mining unit is used for mining and fusing positive and negative samples based on the image data and the descriptor tensor of the projection image to obtain a training sample;
the feature conversion unit is used for performing feature extraction on input image data through the shared feature extraction layer to convert it into a feature tensor, the feature tensor being supplied to the key point network layer and the description sub-network layer;
a key point position determining unit, configured to perform key point feature identification processing on the feature tensor by the key point network layer to determine a key point position;
the feature point descriptor obtaining unit is used for converting the feature tensor into a descriptor tensor by the descriptor sub-network layer, and performing bicubic interpolation and normalization processing on the descriptor tensor to obtain a feature point descriptor;
and the self-supervision characteristic obtaining unit is used for obtaining the self-supervision characteristic according to the key point position and the characteristic point descriptor.
Further, the system further comprises:
a discrimination request obtaining unit for obtaining a key frame discrimination request;
the key frame inserting unit is used for determining a key frame based on the key frame judging requirement and inserting the key frame through a local map building thread;
the feature landmark point inserting unit is used for inserting the feature points of the key frames into a local map as feature landmark points;
and the key frame updating unit is used for utilizing the feature landmark points to project into the current frame to search for matched feature points based on the local map and updating the key frame of the local map.
Further, the system further comprises:
the judging unit is used for judging whether the feature points of the key frame meet the preset local map insertion requirement, and when they do, inserting the feature points of the key frame into the local map as feature landmark points;
and the rejecting unit is used for rejecting the characteristic points of the key frame when the characteristic points do not meet the requirement.
Further, the system further comprises:
the position judging unit is used for calculating the position of the feature landmark point projected into the current frame, judging whether the projected position exceeds the image boundary, and discarding the feature landmark point when it does;
the included angle calculation unit is used for calculating the included angle between the view-angle vector of the current frame and the observation view-angle vector of the feature landmark point, and discarding the feature landmark point if the included angle does not meet the preset requirement;
the distance calculation unit is used for calculating the distance between the feature landmark point and the camera center of the current frame, and discarding the feature landmark point if the distance is not within the visible range;
the matching unit is used for matching the feature landmark point with the unmatched feature points of the current frame whose three-dimensional coordinates are close, taking the match with the minimum distance as the new feature landmark point;
and the pose optimization unit is used for determining the newly added characteristic landmark points and optimizing the pose according to the newly added characteristic landmark points.
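The checks performed by the position judging, included angle, and distance units can be sketched together as below; the 60-degree angle limit and the visible-distance range [0.1, 40] are illustrative thresholds, not values from the patent:

```python
import numpy as np

def visible_landmarks(points_w, R, t, K, img_w, img_h,
                      view_dirs, max_angle_deg=60.0,
                      d_min=0.1, d_max=40.0):
    """Filter feature landmark points against the current frame.

    points_w  : (N, 3) landmark positions in world coordinates
    R, t      : world-to-camera rotation (3, 3) and translation (3,)
    K         : (3, 3) camera intrinsics
    view_dirs : (N, 3) unit mean observation direction of each landmark
    The angle limit and distance range are illustrative assumptions.
    """
    keep = []
    cam_center = -R.T @ t                     # camera center in world frame
    cos_max = np.cos(np.radians(max_angle_deg))
    for i, pw in enumerate(points_w):
        pc = R @ pw + t                       # point in camera frame
        if pc[2] <= 0:                        # behind the camera
            continue
        u, v, w = K @ pc
        u, v = u / w, v / w                   # projected pixel position
        if not (0 <= u < img_w and 0 <= v < img_h):
            continue                          # outside the image boundary
        ray = pw - cam_center
        dist = np.linalg.norm(ray)
        if not (d_min <= dist <= d_max):      # outside visible distance
            continue
        if ray @ view_dirs[i] / dist < cos_max:
            continue                          # view angle too large
        keep.append(i)
    return keep
```

Surviving landmarks would then be matched against unmatched feature points of the current frame, as the matching unit describes.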
Further, the system further comprises:
the closed-loop candidate frame determining unit is used for the self-supervised feature closed-loop detection, which comprises a scene recognition module and a closed-loop detection and fusion thread, the scene recognition module detecting the latest key frame against a recognition database to determine closed-loop candidate frames;
and the fusion optimization unit is used for performing BA optimization after solving the pose of the closed-loop candidate frame, determining that a closed loop is detected if the optimized features are successfully matched with the feature landmark points, performing fusion optimization through the closed-loop detection and fusion thread, and converting the closed-loop candidate frame matched with the key frame into a closed-loop frame.
Further, the system further comprises:
an optimization setting unit configured to set the global map optimization to include:
(a) initializing the state-space count num_x to 0; when entering a new local map, obtaining a new relative transformation structure (the formula appears only as an image in the original) and letting num_x = 1 + num_x;
(b) updating an estimated M matrix, and estimating the M matrix through a block diagonal matrix formed by the relative confidence coefficient between each pose in the local map;
(c) calculating the gradient descent direction vector Δx according to the update formula (shown only as an image in the original), judging whether the value of Δx is smaller than a given constraint error value a, and finishing the iteration when it is;
(d) when the condition in step (c) is not satisfied, calculating the learning rate λ and iterating it through the formula λ/(1+λ), with the initial value of λ set to 1/3 in the experiments, and obtaining a new state-space variable x_i by letting x_i = x_{i-1} + Δx;
(e) Judging whether the difference value of the limiting cost f (x) between the local maps at the adjacent moments meets a given error b or not, and if so, ending the iteration;
(f) when the requirement is not met, adding 1 to the iteration count and going to step (b) for a new iterative solution;
(g) when the set number of iterations is reached, ending the optimization algorithm and the iteration.
In this specification, each embodiment is described in a progressive manner, with each embodiment focusing on its differences from the others. The terminal perception positioning method based on edge device self-supervised learning in the first embodiment of fig. 1 and its specific examples also apply to the terminal perception positioning system of this embodiment. Through the foregoing detailed description of the method, those skilled in the art can clearly understand the system of this embodiment, so for brevity of the description it is not described in detail again. Since the device disclosed in this embodiment corresponds to the method disclosed in the embodiment, its description is simple, and the relevant points can be found in the description of the method.

Claims (9)

1. The terminal perception positioning method based on the edge device self-supervision learning is characterized by comprising the following steps:
constructing a self-supervised feature detection network;
performing self-supervision learning characteristic mining on image data by using the self-supervision characteristic detection network based on the edge equipment terminal to obtain self-supervision characteristics;
constructing a local map, performing tracking operation based on the local map and self-supervision characteristics, inserting a key frame, generating characteristic landmark points according to the characteristics in the key frame, and updating the landmark points of the local map;
performing self-supervised feature closed-loop detection according to the key frames and the feature landmark points;
if a closed loop occurs during camera tracking, invoking global optimization in global map optimization and post-processing to correct the positions of the camera motion pose trajectory, the key frames, and the feature landmark points, and after operation updating the global map and storing the perception positioning map, wherein the global map comprises the feature landmark points generated by all the self-supervised features during operation.
2. The method of claim 1, wherein the self-supervised feature detection network comprises an input layer, a shared feature extraction layer, a description sub-network layer and a key point network layer, wherein the input layer is connected with the shared feature extraction layer, feature tensors are extracted from an input image through the shared feature extraction layer, and the feature tensors are transmitted to the description sub-network layer and the key point network layer respectively, and the description sub-network layer and the key point network layer process the feature tensors respectively to extract self-supervised features.
3. The method of claim 2, wherein performing an unsupervised learning feature mining on image data using the unsupervised feature detection network based on an edge device terminal to obtain an unsupervised feature comprises:
inputting image data, and performing image projection transformation on the image data to obtain a projected image;
reasoning the image data and the projected image by a twin network respectively to obtain a descriptor tensor;
mining and fusing positive and negative samples based on the image data and the descriptor tensor of the projection image to obtain a training sample;
performing feature extraction on input image data through a shared feature extraction layer to convert the input image data into a feature tensor, and supplying the feature tensor to a key point network layer and a description sub-network layer;
the key point network layer carries out key point feature identification processing on the feature tensor to determine the position of a key point;
the description sub network layer converts the feature tensor into a descriptor tensor, and performs bicubic interpolation and normalization processing on the descriptor tensor to obtain a feature point descriptor;
and obtaining the self-supervision characteristic according to the key point position and the characteristic point descriptor.
4. The method of claim 1, wherein constructing a local map, performing a tracking operation based on the local map and an auto-supervised feature, inserting a key frame, generating a feature landmark point according to a feature in the key frame, and updating the landmark point of the local map comprises:
obtaining a key frame discrimination requirement;
determining a key frame based on the key frame discrimination requirement, and inserting the key frame through a local map construction thread;
inserting the feature points of the key frames into a local map as feature landmark points;
and based on the local map, utilizing the feature landmark points to project into the current frame to search for matched feature points, and updating the key frame of the local map.
5. The method of claim 4, wherein inserting the feature points of the keyframe as feature landmark points into a local map comprises:
judging whether the feature points of the key frame meet the preset local map insertion requirement, and if so, inserting the feature points of the key frame into the local map as feature landmark points;
and when the key frame does not meet the requirement, removing the characteristic points of the key frame.
6. The method of claim 4, wherein updating the keyframe of the local map based on the local map using the feature landmark projection into the current frame to find matching feature points comprises:
calculating the position of the feature landmark point projected to the current frame, judging whether the projected position exceeds the image boundary, and abandoning the feature landmark point when the projected position exceeds the image boundary;
calculating the included angle between the view-angle vector of the current frame and the observation view-angle vector of the feature landmark point, and discarding the feature landmark point if the included angle does not meet the preset requirement;
calculating the distance between the feature landmark point and the camera center of the current frame, and discarding the feature landmark point if the distance is not within the visible range;
matching the feature landmark point with the unmatched feature points of the current frame whose three-dimensional coordinates are close, and taking the match with the minimum distance as the new feature landmark point;
and determining newly added characteristic landmark points, and optimizing the pose according to the newly added characteristic landmark points.
7. The method of claim 1, wherein performing an auto-supervised feature closed-loop detection based on the keyframes, feature landmark points, comprises:
the self-supervised feature closed-loop detection comprises a scene recognition module and a closed-loop detection and fusion thread, and the scene recognition module detects the latest key frame against a recognition database to determine closed-loop candidate frames;
and performing BA optimization after solving the pose of the closed-loop candidate frame, determining that a closed loop is detected if the optimized features are successfully matched with the feature landmark points, performing fusion optimization through the closed-loop detection and fusion thread, and converting the closed-loop candidate frame matched with the key frame into a closed-loop frame.
8. The method of claim 1, wherein the global map optimization comprises:
(a) initializing the state-space count num_x to 0; when entering a new local map, obtaining a new relative transformation structure (the formula appears only as an image in the original) and letting num_x = 1 + num_x;
(b) updating an estimated M matrix, and estimating the M matrix through a block diagonal matrix formed by the relative confidence coefficient between each pose in the local map;
(c) calculating the gradient descent direction vector Δx according to the update formula (shown only as an image in the original), judging whether the value of Δx is smaller than a given constraint error value a, and finishing the iteration when it is;
(d) when the condition in step (c) is not satisfied, calculating the learning rate λ and iterating it through the formula λ/(1+λ), with the initial value of λ set to 1/3 in the experiments, and obtaining a new state-space variable x_i by letting x_i = x_{i-1} + Δx;
(e) judging whether the difference of the limiting cost f(x) between the local maps at adjacent moments meets a given error b, and if so, ending the iteration;
(f) when the requirement is not met, adding 1 to the iteration count and going to step (b) for a new iterative solution;
(g) when the set number of iterations is reached, ending the optimization algorithm and the iteration.
9. Terminal perception positioning system based on edge device self-supervision learning, characterized in that, the system includes:
a network construction module for constructing an auto-supervised feature detection network;
the self-supervision characteristic acquisition module is used for carrying out self-supervision learning characteristic mining on image data by utilizing the self-supervision characteristic detection network based on the edge equipment terminal to acquire self-supervision characteristics;
the updating module is used for constructing a local map, performing tracking operation based on the local map and self-supervised features, inserting key frames, generating feature landmark points according to the features in the key frames, and updating the landmark points of the local map;
the closed-loop detection module is used for performing self-supervised feature closed-loop detection according to the key frames and the feature landmark points;
and the correction module is used for, if a closed loop occurs during camera tracking, invoking global optimization in global map optimization and post-processing to correct the positions of the camera motion pose trajectory, the key frames, and the feature landmark points, updating the global map after operation, and storing the perception positioning map, wherein the global map comprises the feature landmark points generated by all the self-supervised features during operation.
CN202210745389.7A 2022-06-28 2022-06-28 Terminal sensing positioning method and system based on edge device self-supervision learning Pending CN115115669A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210745389.7A CN115115669A (en) 2022-06-28 2022-06-28 Terminal sensing positioning method and system based on edge device self-supervision learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210745389.7A CN115115669A (en) 2022-06-28 2022-06-28 Terminal sensing positioning method and system based on edge device self-supervision learning

Publications (1)

Publication Number Publication Date
CN115115669A true CN115115669A (en) 2022-09-27

Family

ID=83330184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210745389.7A Pending CN115115669A (en) 2022-06-28 2022-06-28 Terminal sensing positioning method and system based on edge device self-supervision learning

Country Status (1)

Country Link
CN (1) CN115115669A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104374395A (en) * 2014-03-31 2015-02-25 南京邮电大学 Graph-based vision SLAM (simultaneous localization and mapping) method
CN111462207A (en) * 2020-03-30 2020-07-28 重庆邮电大学 RGB-D simultaneous positioning and map creation method integrating direct method and feature method
US20210065391A1 (en) * 2019-08-27 2021-03-04 Nec Laboratories America, Inc. Pseudo rgb-d for self-improving monocular slam and depth prediction
CN112767480A (en) * 2021-01-19 2021-05-07 中国科学技术大学 Monocular vision SLAM positioning method based on deep learning
CN114255323A (en) * 2021-12-22 2022-03-29 深圳市普渡科技有限公司 Robot, map construction method, map construction device and readable storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI SHENGHAO: "Research on Visual Simultaneous Localization and Mapping Based on Self-Supervised Learning Features", China Master's Theses Full-text Database, Information Science and Technology series, pages 1-3 *

Similar Documents

Publication Publication Date Title
CN109993113B (en) Pose estimation method based on RGB-D and IMU information fusion
CN104200495B (en) A kind of multi-object tracking method in video monitoring
CN110570453B (en) Binocular vision-based visual odometer method based on closed-loop tracking characteristics
Williams et al. An image-to-map loop closing method for monocular SLAM
CN111462207A (en) RGB-D simultaneous positioning and map creation method integrating direct method and feature method
CN110110694B (en) Visual SLAM closed-loop detection method based on target detection
WO2012177336A2 (en) Systems and methods for estimating the geographic location at which image data was captured
CN104077760A (en) Rapid splicing system for aerial photogrammetry and implementing method thereof
CN110322507B (en) Depth reprojection and space consistency feature matching based method
CN111882602B (en) Visual odometer implementation method based on ORB feature points and GMS matching filter
CN111998862B (en) BNN-based dense binocular SLAM method
CN110570474B (en) Pose estimation method and system of depth camera
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
CN110119768B (en) Visual information fusion system and method for vehicle positioning
CN112507056A (en) Map construction method based on visual semantic information
CN112541423A (en) Synchronous positioning and map construction method and system
CN113706381A (en) Three-dimensional point cloud data splicing method and device
CN111664845B (en) Traffic sign positioning and visual map making method and device and positioning system
Kottath et al. Mutual information based feature selection for stereo visual odometry
CN112270748B (en) Three-dimensional reconstruction method and device based on image
CN117870659A (en) Visual inertial integrated navigation algorithm based on dotted line characteristics
KR102249381B1 (en) System for generating spatial information of mobile device using 3D image information and method therefor
CN115235455B (en) Pedestrian positioning method based on smart phone PDR and vision correction
CN115115669A (en) Terminal sensing positioning method and system based on edge device self-supervision learning
CN116071491A (en) Multi-view three-dimensional point cloud reconstruction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination