CN110163095B - Loop detection method, loop detection device and terminal equipment - Google Patents

Loop detection method, loop detection device and terminal equipment

Info

Publication number
CN110163095B
CN110163095B (application number CN201910303060.3A)
Authority
CN
China
Prior art keywords
image
current frame
historical
frame
descriptor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910303060.3A
Other languages
Chinese (zh)
Other versions
CN110163095A (en)
Inventor
张锲石
刘袁
程俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201910303060.3A priority Critical patent/CN110163095B/en
Publication of CN110163095A publication Critical patent/CN110163095A/en
Application granted granted Critical
Publication of CN110163095B publication Critical patent/CN110163095B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application is applicable to the technical field of loop detection, and provides a loop detection method, a loop detection device, a terminal device and a computer-readable storage medium, comprising: acquiring a current frame and a plurality of historical frames corresponding to the current frame; inputting the current frame and the plurality of historical frames into a trained convolutional self-coding structure, and outputting a feature descriptor of the current frame and a feature descriptor of each historical frame in the plurality of historical frames; calculating the Euclidean distance between the current frame and each historical frame in the plurality of historical frames according to the feature descriptor of the current frame and the feature descriptor of each historical frame; and determining the historical frame with the shortest Euclidean distance to the current frame as a loop. By performing loop detection with an unsupervised-learning convolutional self-coding structure, the method and the device can improve the success rate of loop detection and the robustness in complex environments.

Description

Loop detection method, loop detection device and terminal equipment
Technical Field
The present application belongs to the field of loop detection technologies, and in particular, to a loop detection method, a loop detection apparatus, a terminal device, and a computer-readable storage medium.
Background
Currently, loop detection mainly faces two problems. The first is perceptual deviation, also called a false positive: different scenes look similar and are wrongly judged to be a loop. The second is perceptual variation, also called a false negative: the same scene is not judged to be a loop because of changes in illumination, viewing angle, dynamic objects, and so on. A good loop detection algorithm should be able to overcome both problems. Many appearance-based loop detection algorithms adopt bag-of-words models and achieve good results, but their image features are manually designed, so these methods easily fail when the illumination in the environment changes significantly. A deep neural network can automatically learn feature representations of images from large amounts of data, and a large body of research shows that the features learned by convolutional neural networks are robust to illumination changes in the environment. However, a convolutional neural network extracts global features, so the success rate of loop detection is low when the viewing angle of the image changes greatly; moreover, the convolutional neural network belongs to supervised learning and can only be trained with a large number of labeled samples.
Disclosure of Invention
In view of this, embodiments of the present application provide a loop detection method, a loop detection apparatus, a terminal device, and a computer-readable storage medium, so as to perform loop detection using an unsupervised-learning convolutional self-coding structure, thereby improving the success rate of loop detection and the robustness in complex environments.
A first aspect of an embodiment of the present application provides a loop detection method, where the loop detection method includes:
acquiring a current frame and a plurality of historical frames corresponding to the current frame;
inputting the current frame and the plurality of historical frames into a trained convolutional self-coding structure, and outputting a feature descriptor of the current frame and a feature descriptor of each historical frame in the plurality of historical frames;
calculating the Euclidean distance between the current frame and each historical frame in the plurality of historical frames according to the feature descriptor of the current frame and the feature descriptor of each historical frame in the plurality of historical frames;
and determining the frame with the shortest Euclidean distance from the current frame to the plurality of historical frames as a loop.
A second aspect of the embodiments of the present application provides a loop detection apparatus, including:
the frame acquisition module is used for acquiring a current frame and a plurality of historical frames corresponding to the current frame;
the feature output module is used for inputting the current frame and the plurality of historical frames into a trained convolutional self-coding structure and outputting a feature descriptor of the current frame and a feature descriptor of each historical frame in the plurality of historical frames;
a distance calculation module, configured to calculate an euclidean distance between the current frame and each of the plurality of historical frames according to the feature descriptors of the current frame and the feature descriptors of each of the plurality of historical frames;
and the loop determining module is used for determining the frame with the shortest Euclidean distance from the current frame to the plurality of historical frames as a loop.
A third aspect of embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the loop detection method according to the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the loop detection method according to the first aspect.
A fifth aspect of the present application provides a computer program product comprising a computer program which, when executed by one or more processors, performs the steps of the loop detection method as described in the first aspect above.
From the above, after a current frame and a plurality of historical frames corresponding to the current frame are obtained, the current frame and the historical frames are input into a trained convolutional self-coding structure, which outputs a feature descriptor of the current frame and a feature descriptor of each historical frame. The Euclidean distance between the current frame and each historical frame is then calculated from these feature descriptors, so that the frame representing the same scene or the same place as the current frame can be selected according to the Euclidean distance, completing loop detection. Through the unsupervised-learning convolutional self-coding structure, the method and the device can extract feature descriptors that are more robust and adapt to more complex environmental changes; performing loop detection with these feature descriptors improves the success rate of loop detection and the robustness in complex environments.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without inventive effort.
Fig. 1 is a schematic flow chart illustrating an implementation of a loop detection method according to an embodiment of the present application;
FIG. 2 is an exemplary diagram of a convolutional self-coding structure;
FIG. 3 is an exemplary diagram of a four-scale pyramid pooling structure;
fig. 4 is a schematic flow chart illustrating an implementation of a loop detection method according to a second embodiment of the present application;
FIG. 5 is an exemplary graph of a random projective transformation;
fig. 6 is a schematic view of a loop detection device provided in the third embodiment of the present application;
fig. 7 is a schematic diagram of a terminal device according to a fourth embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In particular implementations, the terminal devices described in embodiments of the present application include, but are not limited to, portable devices such as mobile phones, laptop computers, or tablet computers having touch-sensitive surfaces (e.g., touch screen displays and/or touch pads). It should also be understood that in some embodiments, the device is not a portable communication device, but is a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or touchpad).
In the discussion that follows, a terminal device that includes a display and a touch-sensitive surface is described. However, it should be understood that the terminal device may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal device supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the terminal device may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical architecture (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
It should be understood that the sequence numbers of the steps in this embodiment do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of this embodiment.
In order to explain the technical means described in the present application, the following description will be given by way of specific examples.
Referring to fig. 1, which is a schematic flow chart of an implementation of the loop detection method provided in the first embodiment of the present application, the loop detection method is applied to a terminal device and, as shown in the figure, may include the following steps:
step S101, a current frame and a plurality of historical frames corresponding to the current frame are obtained.
In the embodiment of the present application, loop detection, also called closed-loop detection, refers to the capability of the terminal device to recognize a scene it has already visited; if the detection succeeds, the accumulated error can be significantly reduced. The current frame may be a frame to be subjected to loop detection; the plurality of historical frames corresponding to the current frame may be frames for which loop detection has already been performed, and they occur before the current frame. For example, if there are five video frames and the fifth video frame is the current frame, the first four video frames are historical frames of the current frame.
Step S102, inputting the current frame and the plurality of historical frames into a trained convolutional self-coding structure, and outputting a feature descriptor of the current frame and a feature descriptor of each historical frame in the plurality of historical frames.
In the embodiment of the application, a feature descriptor is a representation of an image that extracts useful information and discards irrelevant information. For example, to detect buttons on clothes in an image: buttons are usually round and have holes, so the image can first be reduced by edge detection to an image containing only edges; for this task the edge information is useful while the color information is not, and a good feature can usually distinguish a button from other round objects. A feature descriptor converts an image of size w x h x 3 (width x height x 3 color channels) into a vector or matrix of length n. For example, for a 64 x 128 x 3 image, the output feature vector may have a length of 3780.
In this embodiment of the present application, the current frame and the plurality of history frames may be sequentially input to the convolutional self-coding structure, or the current frame and the plurality of history frames may be input to the convolutional self-coding structure together, which is not limited herein. In outputting the feature descriptor for the current frame and the feature descriptor for each of the plurality of historical frames, the convolutional self-coding structure may output the feature descriptor for the current frame and the feature descriptor for each of the plurality of historical frames, respectively.
The convolutional self-coding structure is an unsupervised learning algorithm whose output reproduces the input data; it is essentially a data compression algorithm. This unsupervised deep learning structure based on self-coding takes the spatial locality of the image into account and performs well in terms of generalization, robustness, and so on. Compared with a convolutional neural network, the convolutional self-coding structure can be trained without labeled data, which effectively reduces the labeling workload, simplifies the complexity of training the model, and improves the training efficiency of the convolutional self-coding structure.
Optionally, the convolutional self-coding structure includes a plurality of convolutional layers, a pyramid pooling structure and a plurality of full-connection layers, and the pyramid pooling structure is connected to the last convolutional layer and the first full-connection layer, respectively.
In the embodiment of the application, the plurality of convolutional layers and the pyramid pooling structure in the convolutional self-coding structure implement the encoding process to realize data compression, while the plurality of fully-connected layers implement the decoding process to realize decompression. The encoding stage maps high-dimensional data to low-dimensional data to reduce the amount of data, whereas the decoding stage does the opposite and reproduces the input data. Fig. 2 is an exemplary diagram of a convolutional self-coding structure: the structure in fig. 2 includes four convolutional layers, a pyramid pooling structure (i.e., a spatial pyramid structure), and three fully-connected layers; the last convolutional layer is connected to the input of the pyramid pooling structure, and the output of the pyramid pooling structure is connected to the first fully-connected layer.
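For illustration, a minimal PyTorch sketch of this layout follows. The framework, channel counts, kernel sizes, and layer widths are all assumptions; the description above only fixes the arrangement of four convolutional layers, a pyramid pooling structure, and three fully-connected layers:

```python
# Illustrative sketch only: layer sizes are assumptions; the layout (four convolutional
# layers, a pyramid pooling structure, three fully-connected layers) follows the text above.
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    def __init__(self, pooling, descriptor_dim, output_dim):
        super().__init__()
        # Encoding stage: four convolutional layers compress the input image.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1), nn.ReLU(),
        )
        # Pyramid pooling turns the last feature map into a fixed-length descriptor
        # (see the pyramid pooling sketch below); descriptor_dim must match its output length.
        self.pooling = pooling
        # Decoding stage: three fully-connected layers reconstruct the target vector
        # (during training, a HOG descriptor of the paired image).
        self.fc = nn.Sequential(
            nn.Linear(descriptor_dim, 2048), nn.ReLU(),
            nn.Linear(2048, 2048), nn.ReLU(),
            nn.Linear(2048, output_dim),
        )

    def forward(self, x):
        feature_map = self.conv(x)
        descriptor = self.pooling(feature_map)   # used as the frame's feature descriptor
        reconstruction = self.fc(descriptor)
        return descriptor, reconstruction
```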
Pooling is an important step in convolutional networks: it reduces the complexity of the network by reducing the dimensionality of the features, and it maintains some invariance (for example rotation invariance, translation invariance, and scaling invariance), producing features that are insensitive to minor changes and distortions in the image. However, local feature information is easily lost by the pooling operation, so the pyramid pooling structure in the convolutional self-coding structure of this embodiment uses several pyramid pooling operations at different scales to extract more comprehensive information. Fig. 3 is an exemplary diagram of a four-scale pyramid pooling structure, which uses pyramid pooling operations at four different scales.
The pyramid pooling structure can extract a fixed-size feature vector from multi-scale features. Pyramid pooling differs from conventional pooling in that it produces a fixed-length output regardless of the input size. When it is used with a convolutional neural network, pyramid pooling replaces the pooling layer after the last convolutional layer, so the network can accept input of any size and gains a certain robustness to target deformation. Because features are extracted from each image at multiple scales after the convolutional layers, the precision of the task can be improved; similar to training a network on images of different sizes, the accuracy of the model is greatly improved.
Due to the parameter-sharing mechanism of convolution operations, the convolutional feature maps can be interpreted as detection scores obtained by applying convolution filters to the input image, and locations with high activation values indicate that the visual patterns the filters are searching for are present around them. It is observed that a convolutional feature map is typically sparse, since only a few locations have high activation and only some visual patterns are present. This indicates that a convolution filter is highly selective for certain visual patterns. When the same place is viewed from different angles, some of its visual patterns remain and can be detected by the same convolution filter. Based on this observation, a multi-scale pooling method can be applied to search for the most prominent visual patterns at multiple locations of an image, in order to match images across different viewpoints. For each convolutional feature map, pyramid pooling operations at several different scales can be applied: the feature map is first divided into H cells (where H corresponds to the m, n, p, and q grids in FIG. 3), and within each spatial cell the features are pooled using max pooling. After this pooling operation, feature maps of any size are reduced to low-dimensional vectors, further reducing the computational complexity. Finally, the feature descriptors obtained by the pyramid pooling operations at the different scales are concatenated to form the feature descriptor of the image.
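A minimal sketch of this multi-scale max pooling, assuming PyTorch; the grid sizes (1, 2, 4, 8) merely stand in for the m, n, p, and q of FIG. 3, which are not fixed by the description:

```python
import torch
import torch.nn.functional as F

def pyramid_pool(feature_map, grid_sizes=(1, 2, 4, 8)):
    """feature_map: (batch, channels, H, W) -> (batch, channels * sum(g * g))."""
    pooled = []
    for g in grid_sizes:
        # Divide the feature map into g x g spatial cells and max-pool inside each cell.
        cells = F.adaptive_max_pool2d(feature_map, output_size=(g, g))
        pooled.append(cells.flatten(start_dim=1))
    # Concatenate the per-scale results into one fixed-length feature descriptor.
    return torch.cat(pooled, dim=1)
```

With a 256-channel feature map and these assumed grid sizes, the descriptor has 256 * (1 + 4 + 16 + 64) = 21760 dimensions regardless of the input resolution.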
Optionally, the outputting the feature descriptor of the current frame and the feature descriptor of each of the plurality of historical frames includes:
and outputting the feature descriptors of the current frame and the feature descriptors of each historical frame in the plurality of historical frames through the pyramid pooling structure.
Step S103, calculating Euclidean distance between the current frame and each historical frame in the plurality of historical frames according to the feature descriptors of the current frame and the feature descriptors of each historical frame in the plurality of historical frames.
And step S104, determining the frame with the shortest Euclidean distance between the current frame and the plurality of historical frames as a loop.
In the embodiment of the application, the plurality of historical frames correspond to a plurality of Euclidean distances. After the Euclidean distance between the current frame and each historical frame is calculated, the Euclidean distances can be compared and the shortest one selected; the historical frame corresponding to the shortest Euclidean distance is the frame that forms a loop with the current frame, that is, the current frame and that historical frame are a loop. Optionally, in order to further improve the accuracy of loop detection, a distance threshold may be preset and used to decide whether a historical frame and the current frame form a loop. For example, after the shortest Euclidean distance is selected, it is judged whether the shortest Euclidean distance is smaller than the distance threshold; if it is smaller than the distance threshold, the current frame and the historical frame corresponding to the shortest Euclidean distance are determined to be a loop; if it is not smaller than the distance threshold, they are determined not to be a loop, that is, there is no frame among the plurality of historical frames that forms a loop with the current frame.
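A small sketch of steps S103 and S104, including the optional distance threshold, assuming the feature descriptors are NumPy vectors (the threshold value itself is an implementation choice not fixed above):

```python
import numpy as np

def detect_loop(current_descriptor, history_descriptors, distance_threshold=None):
    """history_descriptors: (n, d) array, one row per historical frame.
    Returns the index of the loop frame, or None if no loop is detected."""
    distances = np.linalg.norm(history_descriptors - current_descriptor, axis=1)
    best = int(np.argmin(distances))  # historical frame with the shortest Euclidean distance
    if distance_threshold is not None and distances[best] >= distance_threshold:
        return None                   # shortest distance still too large: no loop
    return best
```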
According to this embodiment of the application, the unsupervised-learning convolutional self-coding structure can extract feature descriptors that are more robust and adapt to more complex environmental changes; performing loop detection with these feature descriptors improves the success rate of loop detection and the robustness in complex environments.
Referring to fig. 4, which is a schematic view of an implementation flow of a loop detection method provided in the second embodiment of the present application, where the loop detection method is applied to a terminal device, as shown in the figure, the loop detection method may include the following steps:
step S401, training a convolution self-coding structure.
In the embodiment of the application, when the feature descriptors of the current frame and the historical frames are obtained by using the convolutional self-coding structure, the convolutional self-coding structure needs to be trained in advance, so that the trained convolutional self-coding structure can extract feature descriptors that are invariant to both illumination change and viewing-angle change, i.e., more robust feature descriptors (also called image representations or feature representations).
Optionally, the training convolutional self-coding structure includes:
acquiring an image training set;
generating an image pair from each image in the image training set;
calculating a Histogram of Oriented Gradients (HOG) descriptor of one image in each image pair;
inputting the other image in each image pair into the convolutional self-coding structure, and outputting a feature descriptor of the other image in each image pair;
calculating a loss function between the HOG descriptor of one image in each image pair and the feature descriptor of the other image in each image pair;
and training the convolution self-coding structure according to the loss function.
In this embodiment of the present application, the image training set may refer to a plurality of images used for training the convolutional self-coding structure, that is, an image set for training the self-coding structure; a user may select the image set according to actual needs, which is not limited here. The two images of the image pair generated from each image represent the same scene and may differ only in viewing angle, so training the convolutional self-coding structure on the image pair generated from each image enables it to output or extract feature descriptors that are invariant to both illumination change and viewing-angle change.
In the embodiment of the present application, after an image pair is generated for each image, one image can be randomly selected from the pair and input into the convolutional self-coding structure, which automatically learns the image representation, while the Histogram of Oriented Gradients (HOG) descriptor of the other image is calculated. The HOG descriptor forms features by calculating and accumulating histograms of gradient directions over local areas of the image; it keeps good invariance to geometric and photometric deformation, that is, it is strongly robust to environmental change. The main idea of this feature is that the appearance and shape of a local object in an image can be well described by the density of gradient or edge directions; in essence the descriptor is statistical information about the gradients, and gradients mainly exist at edges. In practice, the image is divided into small cell units, and a histogram of gradient directions (or edge directions) is calculated for each cell unit. The gradient magnitude and direction can be found with the formulas

m(x, y) = sqrt( g_x(x, y)^2 + g_y(x, y)^2 )

and

θ(x, y) = arctan( g_y(x, y) / g_x(x, y) ),

where g_x denotes the gradient of the cell unit in the x direction, g_y denotes the gradient of the cell unit in the y direction, and (x, y) denotes the coordinates within the cell unit.
For better invariance to illumination and shading, contrast normalization of the gradient direction histogram is required, which can be achieved by grouping the cells into larger blocks and normalizing all the cells within a block, the HOG descriptors of all the blocks being combined to form the final HOG descriptor.
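As an illustration, the HOG target can be computed with scikit-image's hog() function (an assumed library choice; the description does not name one). With the cell, block, and bin settings below, a 128 x 64 grayscale image yields the 3780-dimensional descriptor mentioned earlier:

```python
from skimage.feature import hog

def hog_descriptor(gray_image):
    # 9-bin gradient-direction histograms over 8 x 8-pixel cells, contrast-normalized
    # over 2 x 2-cell blocks, concatenated into one fixed-length vector.
    return hog(
        gray_image,
        orientations=9,
        pixels_per_cell=(8, 8),
        cells_per_block=(2, 2),
        block_norm='L2-Hys',
        feature_vector=True,
    )
```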
In the embodiment of the application, the loss function measures the difference between the HOG descriptor of one image and the feature descriptor of the other image in each image pair. Parameters in the convolutional self-coding structure can be updated according to the loss function during back propagation, so that the convolutional self-coding structure is trained; when the convolutional self-coding structure becomes stable, its training is complete, and after training the output of the pyramid pooling structure is used as the image representation (i.e., the feature descriptor).
Optionally, the generating one image pair for each image in the image training set includes:
and carrying out random projection transformation on each image in the image training set to generate an image pair.
In the embodiment of the application, a projective transformation is a non-singular linear transformation in homogeneous coordinates, intended to model the geometric distortion produced when a plane is imaged by a perspective camera. Its matrix has nine entries but is defined only up to scale, so the projective transformation can be described by eight parameters, and the projective transformation between two planes can be determined by four pairs of matching points in which no three points on either plane are collinear. A random projective transformation means that the four points selected from the original image are chosen randomly. Fig. 5 shows an example of a random projective transformation: the two images in fig. 5 form an image pair, and the image on the right is obtained by applying a random projective transformation to the image on the left. Warping images with random projective transformations better simulates the natural viewpoint changes caused by the motion of the terminal device (e.g., a robot). The projective transformation can be decomposed into a cascade of a similarity transformation, an affine transformation, and a projective transformation.
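A minimal sketch of such a random projective warp, assuming OpenCV; the grayscale conversion and the corner-perturbation range (up to 25% of the image size) are illustrative assumptions:

```python
import cv2
import numpy as np

def random_projective_pair(image, max_shift=0.25):
    # Convert to grayscale first (the optional preprocessing described below).
    if image.ndim == 3:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    h, w = image.shape[:2]
    # Four matching point pairs (no three collinear) determine the projective transformation.
    src = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    jitter = np.random.uniform(-max_shift, max_shift, size=(4, 2)) * np.array([w, h])
    dst = np.float32(src + jitter)
    H = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(image, H, (w, h))
    return image, warped  # an image pair: the same scene under a simulated viewpoint change
```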
As an example, the training process of the convolutional self-coding structure may be as follows. For any input image, an image pair is first generated using the random projective transformation, giving the appearance of the same image under different viewing angles. One image of the pair is then randomly selected and its HOG descriptor is calculated (the HOG descriptor provides good invariance to illumination), while the other image is input into the convolutional self-coding structure to automatically learn image features; a pyramid pooling operation performs multi-scale feature extraction on the last convolutional layer, and several fully-connected layers then construct a feature vector of fixed dimensionality. Finally, the descriptors obtained by the two methods are compared. Since the HOG descriptor is a vector of fixed length, it can be compared through the Euclidean distance, which is easily integrated into a neural network as an L2 loss; the HOG descriptor is therefore compared with the feature descriptor reconstructed by the convolutional self-coding structure using an L2 loss function, and through repeated reconstruction learning the convolutional self-coding structure finally learns a more robust feature descriptor.
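Tying the sketches above together, one training step could look as follows; the optimizer, tensor shapes, and the use of mean-squared error as the L2 loss are assumptions consistent with the description rather than a prescribed implementation:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, image_pair, hog_fn):
    image_a, image_b = image_pair  # two views of the same scene from the random projective pair
    # The HOG descriptor of one image is the regression target; the model's output_dim
    # must therefore equal the HOG descriptor length (e.g., 3780).
    target = torch.as_tensor(hog_fn(image_a), dtype=torch.float32).unsqueeze(0)
    # The other image is fed through the convolutional self-coding structure.
    x = torch.as_tensor(image_b, dtype=torch.float32).unsqueeze(0).unsqueeze(0)
    _, reconstruction = model(x)
    loss = F.mse_loss(reconstruction, target)  # L2 loss between the two descriptors
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```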
Optionally, before generating an image pair for each image in the image training set, the method further includes:
and converting each image in the image training set into a gray scale map.
In the embodiment of the present application, in order to reduce the amount of raw data in each image of the image training set and to reduce the amount of computation in the subsequent processing of each image, each image may first be converted into a grayscale map; the image pair containing a deformed version of the original image (i.e., of each image in the image training set) is then obtained by applying the random projective transformation.
Step S402, a current frame and a plurality of historical frames corresponding to the current frame are obtained.
The step is the same as step S101, and reference may be made to the related description of step S101, which is not repeated herein.
Step S403, inputting the current frame and the plurality of history frames into a trained convolutional self-coding structure, and outputting a feature descriptor of the current frame and a feature descriptor of each of the plurality of history frames.
The step is the same as step S102, and reference may be made to the related description of step S102, which is not repeated herein.
Step S404, calculating an euclidean distance between the current frame and each of the plurality of historical frames according to the feature descriptors of the current frame and the feature descriptors of each of the plurality of historical frames.
The step is the same as step S103, and reference may be made to the related description of step S103, which is not described herein again.
Step S405, determining a frame with the shortest euclidean distance between the current frame and the plurality of historical frames as a loopback.
The step is the same as step S104, and reference may be made to the related description of step S104, which is not repeated herein.
On the basis of the first embodiment, this embodiment adds the training of the convolutional self-coding structure. An unsupervised convolutional self-coding structure is designed by combining the histogram of oriented gradients with the self-coding structure of a neural network: on one hand, the image representation is learned through the histogram of oriented gradients; on the other hand, the original image is automatically learned and reconstructed through the convolutional self-coding network. By combining the advantages of the two methods, the finally extracted features are invariant to both illumination change and viewing-angle change, so a more robust feature descriptor is extracted.
Fig. 6 is a schematic diagram of a loop detection apparatus provided in the third embodiment of the present application, and for convenience of description, only the portions related to the third embodiment of the present application are shown.
The loop detection device includes:
a frame obtaining module 61, configured to obtain a current frame and multiple historical frames corresponding to the current frame;
a feature output module 62, configured to input the current frame and the multiple historical frames into a trained convolutional self-coding structure, and output a feature descriptor of the current frame and a feature descriptor of each of the multiple historical frames;
a distance calculating module 63, configured to calculate an euclidean distance between the current frame and each of the plurality of historical frames according to the feature descriptors of the current frame and the feature descriptors of each of the plurality of historical frames;
a loop determining module 64, configured to determine that a frame with a shortest euclidean distance between the current frame and the plurality of historical frames is a loop.
Optionally, the convolutional self-coding structure includes multiple convolutional layers, a pyramid pooling structure and multiple fully-connected layers, where the pyramid pooling structure is connected to the last convolutional layer and the first fully-connected layer, respectively.
Optionally, the feature output module 62 is specifically configured to:
and outputting the feature descriptor of the current frame and the feature descriptor of each historical frame in the plurality of historical frames through the pyramid pooling structure.
Optionally, the loop detection apparatus further includes:
a structure training module 65 for training the convolutional self-coding structure.
Optionally, the structure training module 65 includes:
an acquisition unit for acquiring an image training set;
the generating unit is used for generating an image pair for each image in the image training set;
the descriptor calculation unit is used for calculating a histogram of oriented gradients (HOG) descriptor of one image in each image pair;
the output unit is used for inputting the other image in each image pair into the convolutional self-coding structure and outputting a feature descriptor of the other image in each image pair;
a loss calculating unit, configured to calculate a loss function between the HOG descriptor of one image in each image pair and the feature descriptor of the other image in each image pair;
and the training unit is used for training the convolutional self-coding structure according to the loss function.
Optionally, the generating unit is specifically configured to:
and carrying out random projection transformation on each image in the image training set to generate an image pair.
Optionally, the structure training module 65 further includes:
and the conversion unit is used for converting each image in the image training set into a gray scale map.
The loop detection device provided in the embodiment of the present application can be applied to the first method embodiment and the second method embodiment, and for details, reference is made to the description of the first method embodiment and the second method embodiment, and details are not repeated here.
Fig. 7 is a schematic diagram of a terminal device according to a fourth embodiment of the present application. As shown in fig. 7, the terminal device 7 of this embodiment includes: a processor 70, a memory 71 and a computer program 72 stored in said memory 71 and executable on said processor 70. The processor 70, when executing the computer program 72, implements the steps in the above-described embodiments of the loop detection method, such as the steps S101 to S104 shown in fig. 1. Alternatively, the processor 70, when executing the computer program 72, implements the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the modules 61 to 65 shown in fig. 6.
Illustratively, the computer program 72 may be partitioned into one or more modules/units that are stored in the memory 71 and executed by the processor 70 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 72 in the terminal device 7. For example, the computer program 72 may be divided into a frame acquisition module, a feature output module, a distance calculation module, a loop determination module, and a structure training module, and the specific functions of the modules are as follows:
the frame acquisition module is used for acquiring a current frame and a plurality of historical frames corresponding to the current frame;
the feature output module is used for inputting the current frame and the plurality of historical frames into a trained convolutional self-coding structure and outputting a feature descriptor of the current frame and a feature descriptor of each historical frame in the plurality of historical frames;
a distance calculation module, configured to calculate an euclidean distance between the current frame and each of the plurality of historical frames according to the feature descriptor of the current frame and the feature descriptor of each of the plurality of historical frames;
and the loop determining module is used for determining the frame with the shortest Euclidean distance from the current frame to the plurality of historical frames as a loop.
Optionally, the convolutional self-coding structure includes multiple convolutional layers, a pyramid pooling structure and multiple fully-connected layers, where the pyramid pooling structure is connected to the last convolutional layer and the first fully-connected layer, respectively.
Optionally, the feature output module is specifically configured to:
and outputting the feature descriptors of the current frame and the feature descriptors of each historical frame in the plurality of historical frames through the pyramid pooling structure.
Optionally, the structure training module is configured to train the convolutional self-coding structure.
Optionally, the structure training module includes:
an acquisition unit for acquiring an image training set;
the generating unit is used for generating an image pair for each image in the image training set;
the descriptor calculation unit is used for calculating a histogram of oriented gradients (HOG) descriptor of one image in each image pair;
the output unit is used for inputting the other image in each image pair into the convolutional self-coding structure and outputting a feature descriptor of the other image in each image pair;
a loss calculation unit for calculating a loss function between the HOG descriptor of one image in each image pair and the feature descriptor of the other image in each image pair;
and the training unit is used for training the convolution self-coding structure according to the loss function.
Optionally, the generating unit is specifically configured to:
and carrying out random projection transformation on each image in the image training set to generate an image pair.
Optionally, the structure training module further includes:
and the conversion unit is used for converting each image in the image training set into a gray scale map.
The terminal device 7 can be a robot, an unmanned aerial vehicle and other devices which need to perform loop detection. The terminal device may include, but is not limited to, a processor 70, a memory 71. It will be appreciated by those skilled in the art that fig. 7 is merely an example of a terminal device 7 and does not constitute a limitation of the terminal device 7, and may include more or fewer components than shown, or some of the components may be combined, or different components, e.g. the terminal device may also include input output devices, network access devices, buses, etc.
The Processor 70 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The storage 71 may be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the terminal device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used for storing the computer program and other programs and data required by the terminal device. The memory 71 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and can realize the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, read-Only Memory (ROM), random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (8)

1. A loop detection method, comprising:
acquiring a current frame and a plurality of historical frames corresponding to the current frame;
inputting the current frame and the plurality of historical frames into a trained convolutional self-coding structure, and outputting a feature descriptor of the current frame and a feature descriptor of each historical frame in the plurality of historical frames;
calculating the Euclidean distance between the current frame and each historical frame in the plurality of historical frames according to the feature descriptors of the current frame and the feature descriptors of each historical frame in the plurality of historical frames;
determining a frame with the shortest Euclidean distance from the current frame to the plurality of historical frames as a loop;
the loop detection method further comprises the following steps:
training the convolutional self-coding structure;
the training the convolutional self-coding structure comprises:
acquiring an image training set;
generating an image pair from each image in the image training set;
calculating a Histogram of Oriented Gradients (HOG) descriptor of one image in each image pair;
inputting the other image in each image pair into the convolutional self-coding structure, and outputting a feature descriptor of the other image in each image pair;
calculating a loss function between the HOG descriptor of one image in each image pair and the feature descriptor of the other image in each image pair;
and training the convolution self-coding structure according to the loss function.
2. The loop-back detection method of claim 1, wherein the convolutional self-coding structure comprises a plurality of convolutional layers, a pyramid-pooling structure and a plurality of fully-connected layers, the pyramid-pooling structure being connected to a last convolutional layer and a first fully-connected layer, respectively.
3. The loopback detection method as recited in claim 2, wherein said outputting a feature descriptor for the current frame and a feature descriptor for each of the plurality of historical frames comprises:
and outputting the feature descriptors of the current frame and the feature descriptors of each historical frame in the plurality of historical frames through the pyramid pooling structure.
4. The loop back detection method of claim 1, wherein said generating an image pair for each image in said training set of images comprises:
and carrying out random projection transformation on each image in the image training set to generate an image pair.
5. The loop back detection method of claim 1, further comprising, prior to generating an image pair for each image in the training set of images:
and converting each image in the image training set into a gray scale map.
6. A loop detection apparatus, comprising:
the frame acquisition module is used for acquiring a current frame and a plurality of historical frames corresponding to the current frame;
the feature output module is used for inputting the current frame and the plurality of historical frames into a trained convolutional self-coding structure and outputting a feature descriptor of the current frame and a feature descriptor of each historical frame in the plurality of historical frames;
a distance calculation module, configured to calculate an euclidean distance between the current frame and each of the plurality of historical frames according to the feature descriptors of the current frame and the feature descriptors of each of the plurality of historical frames;
a loop determining module, configured to determine that a frame with a shortest euclidean distance between the current frame and the plurality of historical frames is a loop;
the loop detection device further comprises:
the structure training module is used for training the convolution self-coding structure;
the structural training module includes:
an acquisition unit for acquiring an image training set;
the generating unit is used for generating an image pair for each image in the image training set;
the descriptor calculation unit is used for calculating a histogram of oriented gradients (HOG) descriptor of one image in each image pair;
the output unit is used for inputting the other image in each image pair into the convolutional self-coding structure and outputting a feature descriptor of the other image in each image pair;
a loss calculation unit for calculating a loss function between the HOG descriptor of one image in each image pair and the feature descriptor of the other image in each image pair;
and the training unit is used for training the convolutional self-coding structure according to the loss function.
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the loop detection method according to any of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the loop detection method according to any one of claims 1 to 5.
CN201910303060.3A 2019-04-16 2019-04-16 Loop detection method, loop detection device and terminal equipment Active CN110163095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910303060.3A CN110163095B (en) 2019-04-16 2019-04-16 Loop detection method, loop detection device and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910303060.3A CN110163095B (en) 2019-04-16 2019-04-16 Loop detection method, loop detection device and terminal equipment

Publications (2)

Publication Number Publication Date
CN110163095A CN110163095A (en) 2019-08-23
CN110163095B true CN110163095B (en) 2022-11-29

Family

ID=67639413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910303060.3A Active CN110163095B (en) 2019-04-16 2019-04-16 Loop detection method, loop detection device and terminal equipment

Country Status (1)

Country Link
CN (1) CN110163095B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503074B (en) 2019-08-29 2022-04-15 腾讯科技(深圳)有限公司 Information labeling method, device and equipment of video frame and storage medium
CN111598149B (en) * 2020-05-09 2023-10-24 鹏城实验室 Loop detection method based on attention mechanism
CN111862162B (en) * 2020-07-31 2021-06-11 湖北亿咖通科技有限公司 Loop detection method and system, readable storage medium and electronic device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104198752B (en) * 2014-08-18 2017-09-08 浙江大学 Many rate detection methods of high temperature billet steel motion state based on machine vision
CN106462940A (en) * 2014-10-09 2017-02-22 微软技术许可有限责任公司 Generic object detection in images
JP6323439B2 (en) * 2015-12-17 2018-05-16 カシオ計算機株式会社 Autonomous mobile device, autonomous mobile method and program
CN107330357A (en) * 2017-05-18 2017-11-07 东北大学 Vision SLAM closed loop detection methods based on deep neural network
CN107292949B (en) * 2017-05-25 2020-06-16 深圳先进技术研究院 Three-dimensional reconstruction method and device of scene and terminal equipment
CN109040747B (en) * 2018-08-06 2019-11-19 上海交通大学 Stereo-picture comfort level quality evaluating method and system based on convolution self-encoding encoder
CN109443382B (en) * 2018-10-22 2022-05-17 北京工业大学 Visual SLAM closed loop detection method based on feature extraction and dimension reduction neural network

Also Published As

Publication number Publication date
CN110163095A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
CN108509848B (en) The real-time detection method and system of three-dimension object
Bi et al. Fast copy-move forgery detection using local bidirectional coherency error refinement
Jian et al. The extended marine underwater environment database and baseline evaluations
US20210158023A1 (en) System and Method for Generating Image Landmarks
Tau et al. Dense correspondences across scenes and scales
CN111325271B (en) Image classification method and device
CN111680678B (en) Target area identification method, device, equipment and readable storage medium
CN110163095B (en) Loop detection method, loop detection device and terminal equipment
CN109948397A (en) A kind of face image correcting method, system and terminal device
CN111047509A (en) Image special effect processing method and device and terminal
CN111488937B (en) Image matching method based on multi-scale neighbor deep neural network
CN112084849A (en) Image recognition method and device
CN115311730B (en) Face key point detection method and system and electronic equipment
Zhou et al. Perceptually aware image retargeting for mobile devices
Xie et al. Deepmatcher: a deep transformer-based network for robust and accurate local feature matching
Spizhevoi et al. OpenCV 3 Computer Vision with Python Cookbook: Leverage the power of OpenCV 3 and Python to build computer vision applications
CN112132739A (en) 3D reconstruction and human face posture normalization method, device, storage medium and equipment
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
Zhang et al. Fine localization and distortion resistant detection of multi-class barcode in complex environments
CN112241789A (en) Structured pruning method, device, medium and equipment for lightweight neural network
CN112990318A (en) Continuous learning method, device, terminal and storage medium
CN111104941B (en) Image direction correction method and device and electronic equipment
Niu et al. Image retargeting quality assessment based on registration confidence measure and noticeability-based pooling
CN113298931B (en) Reconstruction method and device of object model, terminal equipment and storage medium
Qin et al. Depth estimation by parameter transfer with a lightweight model for single still images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant