CN112115286A - Robot environment identification method and system based on deep reinforcement learning - Google Patents

Robot environment identification method and system based on deep reinforcement learning

Info

Publication number
CN112115286A
CN112115286A
Authority
CN
China
Prior art keywords
image
hash code
visual
visual image
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010781668.XA
Other languages
Chinese (zh)
Inventor
朱太云 (Zhu Taiyun)
陈忠 (Chen Zhong)
杨为 (Yang Wei)
柯艳国 (Ke Yanguo)
胡迪 (Hu Di)
赵恒阳 (Zhao Hengyang)
蔡梦怡 (Cai Mengyi)
张国宝 (Zhang Guobao)
赵常威 (Zhao Changwei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd
State Grid Anhui Electric Power Co Ltd
Original Assignee
Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd
State Grid Anhui Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd, State Grid Anhui Electric Power Co Ltd filed Critical Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd
Priority to CN202010781668.XA priority Critical patent/CN112115286A/en
Publication of CN112115286A publication Critical patent/CN112115286A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The embodiment of the invention provides a robot environment recognition method and system based on deep reinforcement learning, belonging to the technical field of robot control. The method comprises the following steps: describing a visual image shot by the robot in real time based on global features; and matching the visual image in a preset database image set based on a deep reinforcement learning algorithm to obtain a current environment recognition result. By describing the visual image with global features and matching it in the preset database image set with the deep reinforcement learning algorithm, the method and system overcome the prior-art problem that visual descriptors depend on the designer's prior knowledge, and improve the accuracy of visual environment recognition.

Description

Robot environment identification method and system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of robot control, in particular to a robot environment identification method and system based on deep reinforcement learning.
Background
Environment recognition is a leading research hotspot in robotics, computer vision and related fields, and is a core technology for robots to achieve autonomous visual navigation and positioning. Although research on visual environment recognition for mobile robots has made great progress in recent years, its practical application still faces major challenges. The main reason is that the visual feature representation methods used to generate visual environment descriptors cannot adapt to changes in the external environment. The performance of a visual environment descriptor depends largely on the visual feature representation method, whereas current representation methods rely mainly on traditional hand-crafted features from machine vision. These hand-crafted features depend primarily on the designer's prior knowledge and on manual parameter tuning. As a result, such feature models capture mostly low-level features with insufficient expressive power, and the invariance of the extracted features is limited; in particular, it degrades severely when the environment changes. This substantially weakens the robustness of visual environment description algorithms and ultimately reduces the accuracy of visual environment recognition.
Disclosure of Invention
The embodiment of the invention aims to provide a robot environment recognition method and system based on deep reinforcement learning, which overcome the prior-art problem that visual descriptors depend on the designer's prior knowledge, and improve the accuracy of visual environment recognition.
In order to achieve the above object, an embodiment of the present invention provides a robot environment recognition method based on deep reinforcement learning, including:
describing a visual image shot by the robot in real time based on global features;
and matching the visual image in a preset database image set based on a deep reinforcement learning algorithm to obtain a current environment recognition result.
Optionally, describing the visual image based on the global features specifically includes:
inputting the visual image into a trained convolutional neural network for forward reasoning;
and carrying out a normalization operation on the global features of the convolutional neural network to obtain a global feature vector serving as a visual descriptor, wherein the global features comprise the network features output by each network layer in the convolutional neural network.
Optionally, the method further comprises:
scaling the visual image to a predetermined size before inputting the visual image into a trained convolutional neural network for forward inference.
Optionally, matching the visual image in a preset database image set based on a deep reinforcement learning algorithm to obtain a current environment recognition result specifically includes:
constructing an environment map according to the database image set to form a place memory.
Optionally, constructing the environment map according to the database image set to form a location memory specifically includes:
inputting each image in the database image set into the convolutional neural network respectively to obtain a corresponding global feature vector;
establishing a tree index based on the global feature vector;
associating the global feature vector, a corresponding image bounding box of the image, and a source of the image to form a lookup table;
respectively generating corresponding hash codes according to the global feature vectors to form a hash code database;
combining the lookup table, the hash code database, and the tree index to form the place memory.
Optionally, the method further comprises:
performing coarse matching on the visual image based on the tree index;
and performing fine matching on the visual image based on the lookup table and the hash code database.
Optionally, coarsely matching the visual image based on the tree index specifically includes:
querying the K_nn nearest neighbors with similar features from the tree index according to the global feature vector of the visual image to form a nearest neighbor set;
voting for each image for the first time according to the similarity distance between the global feature vector of each image in the nearest neighbor set and the global feature vector of the visual image;
selecting the top K_top images with the highest similarity as a candidate best-matching image set.
Optionally, finely matching the visual image based on the lookup table and the hash code database specifically includes:
generating a corresponding hash code according to the global feature vector of the visual image;
respectively calculating the Hamming distance between the hash code of the visual image and each hash code in the hash code database to form a Hamming space;
searching for the nearest neighbor hash codes in the Hamming space by adopting a linear nearest neighbor search method to form a hash code nearest neighbor set;
voting for each image for the second time according to the similarity between the hash code of each image in the hash code nearest neighbor set and the hash code of the visual image;
and selecting the image corresponding to the hash code with the highest vote count in the second voting as a final place recognition result.
In another aspect, the present invention further provides a robot environment recognition system based on deep reinforcement learning, the system including a processor configured to execute any one of the methods described above.
In yet another aspect, the present invention also provides a storage medium storing instructions for reading by a machine to cause the machine to perform a method as claimed in any one of the above.
According to the technical scheme, the robot environment recognition method and system based on deep reinforcement learning describe the visual image with global features and match it in the preset database image set based on the deep reinforcement learning algorithm, thereby overcoming the prior-art problem that visual descriptors depend on the designer's prior knowledge, and improving the accuracy of visual environment recognition.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:
FIG. 1 is a flow chart of a robot environment recognition method based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a process of obtaining a visual descriptor according to one embodiment of the invention;
FIG. 3 is a flow diagram of a process of forming a location memory according to one embodiment of the invention;
FIG. 4 is a flow diagram of a matching process according to one embodiment of the invention;
FIG. 5 is a flow diagram of coarse matching according to one embodiment of the present invention; and
FIG. 6 is a flow diagram of fine matching according to one embodiment of the invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.
In the embodiments of the present invention, unless otherwise specified, directional terms such as "upper", "lower", "top", and "bottom" are generally used with reference to the orientation shown in the drawings or to the positional relationship of the components in the vertical (gravitational) direction.
In addition, descriptions such as "first" and "second" in the embodiments of the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. Technical solutions of the various embodiments can be combined with each other, provided that the combination can be realized by a person skilled in the art; when technical solutions are contradictory or cannot be realized, the combination should be considered absent and outside the protection scope of the present invention.
Fig. 1 is a flowchart illustrating a robot environment recognition method based on deep reinforcement learning according to an embodiment of the present invention. In fig. 1, the method may include:
in step S10, a visual image that the robot takes on the spot is described based on the global features. The global feature may be a network feature output by a network layer of each layer after the visual image is input into the convolutional neural network. Accordingly, the step S10 may further include the step shown in fig. 2. In fig. 2, the step S10 may include:
in step S20, the visual image is input into the trained convolutional neural network for forward inference, so as to obtain the network characteristics output by each layer of the network layer of the convolutional neural network.
In addition, considering the number of network features and the subsequent computation cost, the features may be normalized; that is, step S10 may further include:
in step S21, a normalization process is performed on the global features of the convolutional neural network to obtain a global feature vector as a visual descriptor.
Further, convolutional neural network inference places a heavy computational load on the device. To reduce this load, the visual image may be scaled to a predetermined size before being input into the trained convolutional neural network. The predetermined size may be determined according to actual operating conditions, as would be known to one skilled in the art.
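By way of illustration only, the descriptor pipeline of steps S20 and S21 (plus the optional scaling) can be sketched in Python as below. The ResNet-18 backbone, the 224x224 predetermined size, and L2 normalization are all assumptions; the embodiment does not name a specific network, input size, or normalization scheme.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Assumed backbone; requires torchvision >= 0.13 for the weights argument.
backbone = models.resnet18(weights="IMAGENET1K_V1").eval()

preprocess = T.Compose([
    T.Resize((224, 224)),  # scale the visual image to the (assumed) predetermined size
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def global_descriptor(pil_image):
    """Run one forward inference and return an L2-normalized global feature vector."""
    x = preprocess(pil_image).unsqueeze(0)
    feats, h = [], x
    with torch.no_grad():
        for name, layer in backbone.named_children():
            if name == "fc":                      # stop before the classification head
                break
            h = layer(h)
            feats.append(h.mean(dim=(2, 3)))      # pool each network layer's output map
    v = torch.cat(feats, dim=1).squeeze(0)        # concatenate per-layer network features
    return v / v.norm()                           # normalization yields the visual descriptor
```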
In step S11, the visual image is matched in the preset database image set based on the deep reinforcement learning algorithm to obtain the current environment recognition result. In this embodiment, to facilitate the matching calculation, an environment map may first be constructed from the database image set to form a place memory, for example by the steps shown in fig. 3. In fig. 3, the step of forming the place memory may specifically include:
in step S30, each image in the database image set is input into a convolutional neural network to obtain a corresponding global feature vector. Specifically, the images in the database image set may be input into a convolutional neural network, and then normalized calculation is performed on the network features output by each layer of the convolutional neural network, so as to obtain the corresponding global feature vector.
In step S31, a tree index is built based on the global feature vector.
In step S32, the global feature vector, the image bounding box of the corresponding image, and the source of the image are associated to form a lookup table.
In step S33, a corresponding hash code is generated from each global feature vector to form a hash code database.
In step S34, the lookup table, hash code database, and tree index are combined to form a place memory.
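As a non-limiting sketch of steps S30 to S34, the place memory could be assembled as follows. The KD-tree stands in for the unspecified tree index and sign-of-random-projection hashing for the unspecified hash function; these choices, like the variable names, are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

def build_place_memory(descriptors, bounding_boxes, sources, n_bits=64):
    """Combine a tree index, a lookup table and a hash code database (steps S31-S34).

    descriptors: (N, D) array of global feature vectors, one per database image;
    bounding_boxes / sources: per-image metadata (illustrative names).
    """
    descriptors = np.asarray(descriptors, dtype=np.float32)
    tree = cKDTree(descriptors)  # S31: tree index over the global feature vectors

    # S32: lookup table associating each vector with its bounding box and source.
    lookup = {i: {"bbox": bounding_boxes[i], "source": sources[i]}
              for i in range(len(descriptors))}

    # S33: hash codes from the sign of random projections (an LSH-style choice;
    # the embodiment does not name a specific hashing function).
    proj = rng.standard_normal((descriptors.shape[1], n_bits))
    hash_db = (descriptors @ proj > 0).astype(np.uint8)

    # S34: the combined place memory.
    return {"tree": tree, "lookup": lookup, "hash_db": hash_db, "proj": proj}
```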
Note that in the method illustrated in fig. 3, both a hash code database and a tree index have been established from the database image set, so matching in step S11 should address both. In this embodiment, the inventors observe that matching with the tree index is relatively fast but less accurate, whereas hash code matching is slower but more precise. The inventors therefore devised the two-stage matching approach shown in fig. 4, which may specifically include:
in step S40, the visual image is roughly matched based on the tree index. In particular, the course of the coarse matching may be, for example, the steps shown in fig. 5. In fig. 5, the process of rough matching may include:
in step S50, K with similar features is searched for from the tree index according to the global feature vector of the visual imagenn-Nearest neighbors to form a set of nearest neighbors.
In step S51, a first vote is performed on each image in the nearest neighbor set according to the similarity distance between the global feature vector of each image and the global feature vector of the visual image.
In step S52, the top K_top images with the highest similarity are selected as the candidate best-matching image set.
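A minimal sketch of this coarse-matching stage (steps S50 to S52), reusing the hypothetical place memory above; the inverse-distance vote weight is an assumption, as the embodiment only states that votes follow the similarity distance.

```python
import numpy as np

def coarse_match(memory, query_vec, k_nn=20, k_top=5):
    """Query the K_nn nearest neighbors, vote by similarity distance, keep top K_top."""
    # S50: query the tree index for the K_nn nearest neighbors.
    dists, idx = memory["tree"].query(np.asarray(query_vec, np.float32), k=k_nn)
    votes = {}
    # S51: first vote, weighted so that closer neighbors vote more strongly.
    for d, i in zip(np.atleast_1d(dists), np.atleast_1d(idx)):
        votes[int(i)] = votes.get(int(i), 0.0) + 1.0 / (1.0 + d)
    # S52: keep the K_top highest-voted images as the candidate best matches.
    ranked = sorted(votes, key=votes.get, reverse=True)
    return ranked[:k_top]
```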
In step S41, the visual image is finely matched based on the lookup table and the hash code database. The fine matching may proceed, for example, through the steps shown in fig. 6. In fig. 6, the fine matching process may include:
in step S60, a corresponding hash code is generated from the global feature vector of the visual image.
In step S61, hamming distances of the hash code of the visual image and each hash code in the hash code database are respectively calculated to form a hamming space.
In step S62, the nearest neighbor hash codes are found in the Hamming space by using a linear nearest neighbor search method to form a hash code nearest neighbor set.
In step S63, each image is voted for a second time according to the similarity between the hash code of each image in the hash code nearest neighbor set and the hash code of the visual image.
In step S64, the image corresponding to the hash code with the highest vote count in the second voting is selected as the final place recognition result.
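The fine-matching round (steps S60 to S64) might then look as follows, reusing the hypothetical random-projection hashing from the place-memory sketch; restricting the linear Hamming-space search to the coarse candidates is one plausible reading of the two-stage design.

```python
import numpy as np

def fine_match(memory, query_vec, candidates):
    """Second-round voting in Hamming space over the coarse candidates."""
    # S60: hash code for the visual image, using the same projection matrix.
    q_code = (np.asarray(query_vec, np.float32) @ memory["proj"] > 0).astype(np.uint8)
    votes = {}
    for i in candidates:
        # S61-S62: linear (exhaustive) search by per-code Hamming distance.
        dist = int(np.count_nonzero(memory["hash_db"][i] != q_code))
        # S63: second vote; similarity counted as the number of matching bits.
        votes[i] = memory["hash_db"].shape[1] - dist
    # S64: the hash code with the highest second vote gives the recognized place.
    best = max(votes, key=votes.get)
    return best, memory["lookup"][best]
```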
In another aspect, the present invention also provides a deep reinforcement learning-based robot environment recognition system, which may include a processor configured to perform any one of the methods described above.
In yet another aspect, the present invention also provides a storage medium which may store instructions which are readable by a machine to cause the machine to perform any one of the methods described above.
According to the technical scheme, the robot environment recognition method and system based on deep reinforcement learning describe the visual image with global features and match it in the preset database image set based on the deep reinforcement learning algorithm, thereby overcoming the prior-art problem that visual descriptors depend on the designer's prior knowledge, and improving the accuracy of visual environment recognition.
Although the embodiments of the present invention have been described in detail with reference to the accompanying drawings, the embodiments of the present invention are not limited to the details of the above embodiments, and various simple modifications can be made to the technical solution of the embodiments of the present invention within the technical idea of the embodiments of the present invention, and the simple modifications all belong to the protection scope of the embodiments of the present invention.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, the embodiments of the present invention will not be described separately for the various possible combinations.
Those skilled in the art will understand that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In addition, various different embodiments of the present invention may be arbitrarily combined with each other, and the embodiments of the present invention should be considered as disclosed in the disclosure of the embodiments of the present invention as long as the embodiments do not depart from the spirit of the embodiments of the present invention.

Claims (10)

1. A robot environment recognition method based on deep reinforcement learning is characterized by comprising the following steps:
describing a visual image shot by the robot in real time based on global features;
and matching the visual image in a preset database image set based on a deep reinforcement learning algorithm to obtain a current environment recognition result.
2. The method according to claim 1, wherein describing the visual image based on global features specifically comprises:
inputting the visual image into a trained convolutional neural network for forward reasoning;
and carrying out a normalization operation on the global features of the convolutional neural network to obtain a global feature vector serving as a visual descriptor, wherein the global features comprise the network features output by each network layer in the convolutional neural network.
3. The method of claim 2, further comprising:
scaling the visual image to a predetermined size before inputting the visual image into a trained convolutional neural network for forward inference.
4. The method of claim 1, wherein matching the visual image in a preset database image set based on a deep reinforcement learning algorithm to obtain a current environment recognition result specifically comprises:
constructing an environment map according to the database image set to form a place memory.
5. The method of claim 4, wherein constructing an environment map from the database image set to form a location memory specifically comprises:
inputting each image in the database image set into the convolutional neural network respectively to obtain a corresponding global feature vector;
establishing a tree index based on the global feature vector;
associating the global feature vector, a corresponding image bounding box of the image, and a source of the image to form a lookup table;
respectively generating corresponding hash codes according to the global feature vectors to form a hash code database;
combining the lookup table, the hash code database, and the tree index to form the place memory.
6. The method of claim 5, further comprising:
performing coarse matching on the visual image based on the tree index;
and performing fine matching on the visual image based on the lookup table and the hash code database.
7. The method of claim 6, wherein coarsely matching the visual image based on the tree index specifically comprises:
querying the K_nn nearest neighbors with similar features from the tree index according to the global feature vector of the visual image to form a nearest neighbor set;
voting for each image for the first time according to the similarity distance between the global feature vector of each image in the nearest neighbor set and the global feature vector of the visual image;
selecting the top K_top images with the highest similarity as a candidate best-matching image set.
8. The method of claim 7, wherein finely matching the visual image based on the lookup table and the hash code database specifically comprises:
generating a corresponding hash code according to the global feature vector of the visual image;
respectively calculating the Hamming distance between the hash code of the visual image and each hash code in the hash code database to form a Hamming space;
searching for the nearest neighbor hash codes in the Hamming space by adopting a linear nearest neighbor search method to form a hash code nearest neighbor set;
voting for each image for the second time according to the similarity between the hash code of each image in the hash code nearest neighbor set and the hash code of the visual image;
and selecting the image corresponding to the hash code with the highest vote count in the second voting as a final place recognition result.
9. A deep reinforcement learning based robotic environment recognition system, the system comprising a processor configured to perform the method of any of claims 1 to 8.
10. A storage medium storing instructions for reading by a machine to cause the machine to perform a method according to any one of claims 1 to 8.
CN202010781668.XA 2020-08-06 2020-08-06 Robot environment identification method and system based on deep reinforcement learning Pending CN112115286A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010781668.XA CN112115286A (en) 2020-08-06 2020-08-06 Robot environment identification method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010781668.XA CN112115286A (en) 2020-08-06 2020-08-06 Robot environment identification method and system based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN112115286A (en) 2020-12-22

Family

ID=73799159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010781668.XA Pending CN112115286A (en) 2020-08-06 2020-08-06 Robot environment identification method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112115286A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794219A (en) * 2015-04-28 2015-07-22 杭州电子科技大学 Scene retrieval method based on geographical position information
CN109146640A (en) * 2018-08-30 2019-01-04 湖北工业大学 Product search method and system on a kind of line
CN109658445A (en) * 2018-12-14 2019-04-19 北京旷视科技有限公司 Network training method, increment build drawing method, localization method, device and equipment
CN110163079A (en) * 2019-03-25 2019-08-23 腾讯科技(深圳)有限公司 Video detecting method and device, computer-readable medium and electronic equipment
CN109947963A (en) * 2019-03-27 2019-06-28 山东大学 A kind of multiple dimensioned Hash search method based on deep learning
CN110427509A (en) * 2019-08-05 2019-11-08 山东浪潮人工智能研究院有限公司 A kind of multi-scale feature fusion image Hash search method and system based on deep learning

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988802A (en) * 2021-04-29 2021-06-18 电子科技大学 Relational database query optimization method and system based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN107577990B (en) Large-scale face recognition method based on GPU (graphics processing Unit) accelerated retrieval
Kejriwal et al. High performance loop closure detection using bag of word pairs
US20200167595A1 (en) Information detection method, apparatus, and device
CN108038122B (en) Trademark image retrieval method
US7620250B2 (en) Shape matching method for indexing and retrieving multimedia data
CN107341178B (en) Data retrieval method based on self-adaptive binary quantization Hash coding
JP2011008507A (en) Image retrieval method and system
CN111523610B (en) Article identification method for efficient labeling of samples
CN104199842A (en) Similar image retrieval method based on local feature neighborhood information
CN110146080B (en) SLAM loop detection method and device based on mobile robot
Zhou et al. A novel depth and color feature fusion framework for 6d object pose estimation
CN112115286A (en) Robot environment identification method and system based on deep reinforcement learning
Singh et al. Hierarchical loop closure detection for long-term visual slam with semantic-geometric descriptors
CN113592015A (en) Method and device for positioning and training feature matching network
CN102855279B (en) Target fingerprint fast searching method based on minutiae point carina shape
CN113420661A (en) Pose determination method, device and equipment
Wu et al. Visual loop closure detection by matching binary visual features using locality sensitive hashing
Wang et al. Visual Loop Closure Detection Based on Stacked Convolutional and Autoencoder Neural Networks
JP2010262546A (en) Two-dimensional graphic matching method
Luo et al. Research on Correction Method of Local Feature Descriptor Mismatch
Węgrzyn et al. A genetic algorithm-evolved 3D point cloud descriptor
CN115019071B (en) Optical image and SAR image matching method and device, electronic equipment and medium
Aiger et al. Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization
Yang et al. CNN-based place recognition technique for LIDAR SLAM
CN116946610B (en) Method and device for picking up goods in intelligent warehousing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201222