CN115393614A - Robot and matching method - Google Patents

Robot and matching method

Info

Publication number
CN115393614A
CN115393614A
Authority
CN
China
Prior art keywords
descriptor
image
feature
feature map
matching
Prior art date
Legal status
Pending
Application number
CN202110564841.5A
Other languages
Chinese (zh)
Inventor
徐宽
吴伟
陈超
Current Assignee
Beijing Jizhijia Technology Co Ltd
Original Assignee
Beijing Jizhijia Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jizhijia Technology Co Ltd
Priority to CN202110564841.5A
Publication of CN115393614A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods

Abstract

The present disclosure provides a robot and a matching method. The robot includes a vision sensor and a control assembly. The vision sensor is configured to acquire a first image while the robot is driving. The control assembly is configured to: perform feature extraction on the first image to obtain a feature map of the first image; obtain a first object descriptor of at least one first object in the first image based on the feature map; and match the first object with a second object based on the first object descriptor and a second object descriptor of at least one second object in a second image to obtain a matching result of the first object and the second object. The robot achieves higher matching accuracy when matching objects.

Description

Robot and matching method
Technical Field
The disclosure relates to the technical field of machine vision, in particular to a robot and a matching method.
Background
For semantic Simultaneous Localization and Mapping (SLAM) and Visual Place Recognition (VPR), object description and object matching play a crucial role. Existing approaches, however, suffer from low accuracy when matching objects.
Disclosure of Invention
The embodiment of the disclosure provides at least a robot and a matching method.
In a first aspect, an embodiment of the present disclosure provides a robot, including: a vision sensor, and a control assembly; wherein the vision sensor is configured to acquire a first image while the robot is driving; the control component is configured to: perform feature extraction on a first image to obtain a feature map of the first image; obtain a first object descriptor of at least one first object in the first image based on the feature map; and match the first object and the second object based on the first object descriptor and a second object descriptor of at least one second object in a second image to obtain a matching result of the first object and the second object.
In one possible embodiment, the feature map includes a first feature map and a second feature map; the control component, when deriving a first object descriptor for at least one first object in the first image based on the feature map, is configured to: performing feature point detection processing on the first feature map to obtain feature point position information and feature point descriptors corresponding to the feature points in the first feature map; carrying out object detection processing on the second feature map to obtain object position information of the first object in the second feature map; obtaining a first object descriptor of the first object based on the feature point position information, the object position information, and the feature point descriptor.
In one possible implementation, the feature points include: end points and/or fixed points of the contour corresponding to the first object.
In a possible implementation, the control component, when deriving the first object descriptor of the first object based on the feature point position information, the object position information, and the feature point descriptor, is configured to: determine a target feature point of the first object based on the object position information and the feature point position information; and obtain a first object descriptor of the first object based on the feature point position information corresponding to the target feature point and the feature point descriptor corresponding to the target feature point.
In one possible implementation, the control component, when obtaining the first object descriptor of the first object based on the feature point position information corresponding to the target feature point and the feature point descriptor corresponding to the target feature point, is configured to: performing attention processing on the feature point position information corresponding to the target feature point and the feature point descriptor corresponding to the target feature point by using a pre-trained graph neural network to obtain feature data representing the appearance feature and/or the structural feature of the first object; and performing feature aggregation processing on the feature data to obtain a first object descriptor of the first object.
In a possible implementation, before performing the feature aggregation process on the feature data to obtain the first object descriptor of the first object, the control component is further configured to: performing sparsification processing on the feature data; the performing feature aggregation processing on the feature data to obtain a first object descriptor of the first object includes: and performing feature aggregation processing on the feature data after the sparsification processing to obtain a first object descriptor of the first object.
In a possible embodiment, the control component, based on the first object descriptor and a second object descriptor of at least one second object in a second image, when matching the first object and the second object is configured to: determining similarity information of the first object and the second object based on the first object descriptor and the second object descriptor; comparing the similarity information with a preset similarity threshold; determining that the first object and the second object are the same object when the similarity information is greater than the similarity threshold.
In a possible embodiment, the second image comprises: a historical image whose timestamp is earlier than that of the first image.
In a possible embodiment, the control component is further configured to: based on the matching result, determining the position of the robot in the target scene when the first image is acquired.
In a second aspect, an embodiment of the present disclosure further provides a matching method, including: performing feature extraction on a first image to obtain a feature map of the first image; obtaining a first object descriptor of at least one first object in the first image based on the feature map; and matching the first object and the second object based on the first object descriptor and a second object descriptor of at least one second object in a second image to obtain a matching result of the first object and the second object.
In one possible embodiment, the feature map includes a first feature map and a second feature map; the obtaining a first object descriptor of at least one first object in the first image based on the feature map comprises: performing feature point detection processing on the first feature map to obtain feature point position information and feature point descriptors corresponding to the feature points in the first feature map; carrying out object detection processing on the second feature map to obtain object position information of the first object in the second feature map; obtaining a first object descriptor of the first object based on the feature point position information, the object position information, and the feature point descriptor.
In one possible implementation, the feature points include: end points and/or fixed points of the contour corresponding to the first object.
In a possible implementation, obtaining a first object descriptor of the first object based on the feature point position, the object position, and the feature point descriptor includes: determining a target feature point of the first object based on the object position information and the feature point position information; and obtaining a first object descriptor of the first object based on the feature point position information corresponding to the target feature point and the feature point descriptor corresponding to the target feature point.
In one possible embodiment, obtaining the first object descriptor of the first object based on the feature point position information corresponding to the target feature point and the feature point descriptor corresponding to the target feature point includes: performing attention processing on the feature point position information corresponding to the target feature point and the feature point descriptor corresponding to the target feature point by using a pre-trained graph neural network to obtain feature data representing the appearance feature and/or the structural feature of the first object; and performing feature aggregation processing on the feature data to obtain a first object descriptor of the first object.
In a possible implementation manner, before performing the feature aggregation processing on the feature data to obtain the first object descriptor of the first object, the method further includes: performing sparsification processing on the feature data; the performing feature aggregation processing on the feature data to obtain a first object descriptor of the first object includes: and performing feature aggregation processing on the feature data after the sparsification processing to obtain a first object descriptor of the first object.
In a possible embodiment, matching the first object and the second object based on the first object descriptor and a second object descriptor of at least one second object in a second image comprises: determining similarity information of the first object and the second object based on the first object descriptor and the second object descriptor; comparing the similarity information with a preset similarity threshold; determining that the first object and the second object are the same object when the similarity information is greater than the similarity threshold.
In a possible embodiment, the second image comprises: a historical image whose timestamp is earlier than that of the first image.
In a possible implementation, the method further comprises: based on the matching result, determining the position of the robot in the target scene when the first image is acquired.
The embodiment of the disclosure determines a first object descriptor of at least one first object in a first image by using a feature map obtained by performing feature extraction on the first image, and determines a second object descriptor of at least one second object in a second image, so as to match the first object with the second object and obtain a matching result. Since an object descriptor describes the object as a whole, matching with object descriptors achieves higher accuracy than matching with feature points, which can only describe local features of the object.
In addition, representing the first object by a first object descriptor involves a smaller data volume than representing it by feature points, so when object matching is performed over the whole map, less computation is required, the demand on computing power is lower, and the method is easier to deploy on embedded devices.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings incorporated in and forming a part of the specification illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It is to be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those of ordinary skill in the art may derive additional related drawings from them without inventive effort.
Fig. 1 shows a schematic structural diagram of a robot provided by an embodiment of the present disclosure;
fig. 2 shows a flow chart of a matching method provided by the embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Research has shown that when feature point matching is used to determine whether two objects are the same object, the determination is usually made from the number or proportion of matched feature points on the two objects. Because objects appear at different sizes in an image, the numbers of corresponding feature points differ greatly; moreover, when the viewing angle changes, different parts of the same object are projected onto the image, so the feature points determined for the same object also differ across viewing angles. Determining whether two objects are the same object from the number or proportion of matched feature points therefore has limitations: it is difficult to apply to objects of different sizes or under varying viewing angles, which reduces matching accuracy.
In addition, when object matching is performed over a whole map, the number of objects in the map is large and the number of feature points determined for these objects multiplies accordingly, so the amount of computation required for matching is huge. The computing power of embedded devices is generally insufficient for this, so current object matching algorithms are difficult to deploy on embedded devices such as robots.
Based on the above research, the present disclosure provides a matching method that determines a first object descriptor for each first object in a first image and performs matching between the first object and a second object using the first object descriptor and a second object descriptor. Since an object descriptor describes the object as a whole, matching with object descriptors achieves higher accuracy than matching with feature points, which can only describe local features of the object.
In addition, representing the first object by a first object descriptor involves a smaller data volume than representing it by feature points, so when object matching is performed over the whole map, less computation is required, the demand on computing power is lower, and the method is easier to deploy on embedded devices.
The above drawbacks were identified by the inventors through practice and careful study; the process of discovering these problems, as well as the solutions proposed below, should therefore be regarded as the inventors' contribution to the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
In order to facilitate understanding of the embodiment, first, a robot disclosed in the embodiment of the present disclosure is described in detail, and then a matching method disclosed in the embodiment of the present disclosure is described in detail, where an execution subject of the matching method provided in the embodiment of the present disclosure may include: the robot or a server controlling the robot. In some possible implementations, the matching method may be implemented by a processor invoking computer readable instructions stored in a memory.
By adopting the matching method, the robot can match objects across different images acquired as it moves in a target scene. In addition, the robot can be positioned according to the object matching result, or controlled to move accurately to a desired position in the target scene.
The following describes a robot and a matching method provided in the embodiments of the present disclosure.
Referring to fig. 1, a schematic structural diagram of a robot provided in an embodiment of the present disclosure is shown, where the robot includes: a vision sensor 10, and a control assembly 20;
wherein the vision sensor 10 is configured to acquire a first image while the robot is driving;
the control assembly 20 is configured to: performing feature extraction on a first image to obtain a feature map of the first image; obtaining a first object descriptor of at least one first object in the first image based on the feature map; and matching the first object and the second object based on the first object descriptor and a second object descriptor of at least one second object in a second image to obtain a matching result of the first object and the second object.
In one possible embodiment, the feature map includes a first feature map and a second feature map; the control component 20, when deriving the first object descriptor of the at least one first object in the first image based on the feature map, is configured to: performing feature point detection processing on the first feature map to obtain feature point position information corresponding to the feature points in the first feature map and a feature point descriptor; performing object detection processing on the second feature map to obtain object position information of the first object in the second feature map; obtaining a first object descriptor of the first object based on the feature point position information, the object position information, and the feature point descriptor.
In one possible embodiment, the feature points include: end points and/or fixed points of the contour corresponding to the first object.
In a possible embodiment, the control component 20, when deriving the first object descriptor of the first object based on the feature point position, the object position, and the feature point descriptor, is configured to: determining a target feature point of the first object based on the object position information and the feature point position information; and obtaining a first object descriptor of the first object based on the feature point position information corresponding to the target feature point and the feature point descriptor corresponding to the target feature point.
In one possible implementation, the control component 20, when obtaining the first object descriptor of the first object based on the feature point position information corresponding to the target feature point and the feature point descriptor corresponding to the target feature point, is configured to: performing attention processing on the feature point position information corresponding to the target feature point and the feature point descriptor corresponding to the target feature point by using a pre-trained graph neural network to obtain feature data representing the appearance feature and/or the structural feature of the first object; and performing feature aggregation processing on the feature data to obtain a first object descriptor of the first object.
In a possible implementation, before performing the feature aggregation process on the feature data to obtain the first object descriptor of the first object, the control component 20 is further configured to: carrying out sparsification processing on the feature data; the performing feature aggregation processing on the feature data to obtain a first object descriptor of the first object includes: and performing feature aggregation processing on the feature data after the sparsification processing to obtain a first object descriptor of the first object.
In one possible embodiment, the control component 20, based on the first object descriptor and a second object descriptor of at least one second object in a second image, when matching the first object and the second object, is configured to: determining similarity information of the first object and the second object based on the first object descriptor and the second object descriptor; comparing the similarity information with a preset similarity threshold; determining that the first object and the second object are the same object when the similarity information is greater than the similarity threshold.
In a possible embodiment, the second image comprises: a historical image whose timestamp is earlier than that of the first image.
In a possible embodiment, the control assembly 20 is further configured to: based on the matching result, determining the position of the robot in the target scene when the first image is acquired.
Based on the same inventive concept, the embodiment of the present disclosure also provides a matching method corresponding to the robot described above.
Referring to fig. 2, a flowchart of a matching method provided in the embodiment of the present disclosure includes:
S201: performing feature extraction on a first image to obtain a feature map of the first image;
S202: obtaining a first object descriptor of at least one first object in the first image based on the feature map;
S203: matching the first object and the second object based on the first object descriptor and a second object descriptor of at least one second object in a second image to obtain a matching result of the first object and the second object.
The following describes details of S201 to S203.
The embodiment of the disclosure determines a first object descriptor of at least one first object in a first image by using a feature map obtained by performing feature extraction on the first image, and determines a second object descriptor of at least one second object in a second image, so as to match the first object with the second object and obtain a matching result. Since an object descriptor describes the object as a whole, matching with object descriptors achieves higher accuracy than matching with feature points, which can only describe local features of the object.
For S201 above, the manner of acquiring the first image differs in different scenarios.
For example, in a smart warehousing scenario, a vision sensor may be mounted on a freight robot. As the robot travels through its drivable space, the vision sensor mounted on it captures images, thereby acquiring the first image.
In addition, the matching method can also be applied to an autonomous driving scenario: for example, a vision sensor can be mounted on an autonomous vehicle to capture images of the drivable region while the vehicle is driving, thereby acquiring the first image.
Next, an example in which the robot acquires the first image while traveling in the storage space will be described.
When feature extraction is performed on the first image, a feature map of the first image may be obtained using, for example, a Shared Encoder. Specifically, the first feature map and the second feature map of different sizes may be obtained by using convolution kernels of different sizes, or may be determined directly by using a convolutional neural network such as VGG.
The size of the first image may be described by its height H, width W, and number of channels n; illustratively, the size of the first image may be represented as (H, W, n). After feature extraction is performed on the first image, feature maps of different sizes can be obtained, for example with sizes (H/2, W/2, 64), (H/4, W/4, 128), (H/8, W/8, 256), (H/16, W/16, 512), and (H/32, W/32, 512).
Feature maps of different sizes express different information. A feature map with more channels, for example the one of size (H/32, W/32, 512), carries higher-level semantic information and is therefore better suited to the feature point detection processing; a feature map with fewer channels, for example the one of size (H/8, W/8, 256), reflects the shape of the first object (such as its appearance and structural features) and its position in the first image more accurately, and is therefore better suited to the object detection processing.
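As a rough illustration of the shared encoder described above, the sketch below builds a small VGG-style backbone and reads out two intermediate activations as the first and second feature maps. The layer widths and depths are assumptions chosen only to reproduce the (H/32, W/32, 512) and (H/8, W/8, 256) shapes mentioned here; the patent does not disclose the exact architecture.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Toy VGG-style backbone yielding two feature maps at different scales."""
    def __init__(self):
        super().__init__()
        def stage(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2))
        self.stage1 = stage(3, 64)      # -> (H/2,  W/2,  64)
        self.stage2 = stage(64, 128)    # -> (H/4,  W/4,  128)
        self.stage3 = stage(128, 256)   # -> (H/8,  W/8,  256)  second feature map
        self.stage4 = stage(256, 512)   # -> (H/16, W/16, 512)
        self.stage5 = stage(512, 512)   # -> (H/32, W/32, 512)  first feature map

    def forward(self, image):
        x = self.stage3(self.stage2(self.stage1(image)))
        second_fm = x                             # used for object detection
        first_fm = self.stage5(self.stage4(x))    # used for feature point detection
        return first_fm, second_fm

# channel-first tensors: (n, H, W) rather than the (H, W, n) notation above
first_fm, second_fm = SharedEncoder()(torch.randn(1, 3, 256, 320))
print(first_fm.shape, second_fm.shape)  # (1, 512, 8, 10) and (1, 256, 32, 40)
```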
In addition, when the size information of the first image indicates that its height and width are small, a single feature map obtained after feature extraction may serve well for both the feature point detection processing and the object detection processing; in that case, the feature map at that size can be used as both the first feature map and the second feature map.
With respect to S202, in the case that the feature map is determined, a first object descriptor of at least one object in the first image may be obtained based on the determined feature map.
In the first image, the included object may include, but is not limited to, at least one of the following: goods shelves, containers, other robots, ground indication marks, and positioning point marks.
Illustratively, the objects included in the first image include one pallet and two containers.
In particular, in determining the first object descriptor of the at least one object in the first image, the following may for example be used: performing feature point detection processing on the first feature map to obtain feature point position information corresponding to the feature points in the first feature map and a feature point descriptor; carrying out object detection processing on the second feature map to obtain object position information of the first object in the second feature map; obtaining a first object descriptor of the first object based on the feature point position information, the object position information, and the feature point descriptor.
Next, a procedure of performing the feature point detection processing on the first feature map and the object detection processing on the second feature map will be described.
When the feature point detection processing is performed on the first feature map, a sparse feature point and descriptor extraction module (Point Detector) may be used. Specifically, this module may perform feature point detection processing on the first feature map by using a deep learning network such as SuperPoint to determine the feature points in the first feature map.
The SuperPoint network can determine the position information of the feature points in the first feature map and the corresponding feature point descriptors in a single pass; in addition, it determines feature points with higher accuracy, yields more feature points, and the determined feature points are better dispersed, which makes it easier to describe the first object accurately.
The feature points in the first feature map may include, for example, end points and/or fixed points of the contour corresponding to the first object. For example, where the first object is a shelf, the determined feature points may include end points corresponding to the vertices of the shelf on its outline as mapped into the first feature map.
In addition, the feature point detection processing yields, for each feature point of the first feature map, the corresponding feature point position information and feature point descriptor. Each feature point descriptor corresponds one-to-one to a determined feature point; it does not change with variations in illumination, viewing angle, and the like, and is highly distinctive. Using the feature point descriptors therefore improves matching accuracy.
Here, since the first feature map is obtained by performing feature extraction on the first image, when the feature point position information and the feature point descriptor corresponding to the feature point in the first feature map are specified, the position information and the feature point descriptor of the feature point actually corresponding to the first object in the first image can be specified from the association between the first feature map and the first image.
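A minimal numpy sketch of how sparse keypoint positions and descriptors might be read out of a detector's outputs is given below; the score map, dense descriptor map, 0.5 threshold, and L2 normalisation are assumptions for illustration and are not the SuperPoint implementation itself.

```python
import numpy as np

def extract_keypoints(score_map, desc_map, score_thresh=0.5):
    """Read sparse keypoints out of a detector head.

    score_map: (H, W) per-pixel keypoint confidence
    desc_map:  (H, W, D) dense descriptor map
    Returns (N, 2) pixel positions (x, y) and (N, D) unit-norm descriptors.
    """
    ys, xs = np.where(score_map > score_thresh)
    positions = np.stack([xs, ys], axis=1)
    descriptors = desc_map[ys, xs]                         # sample descriptor per keypoint
    descriptors /= np.linalg.norm(descriptors, axis=1, keepdims=True) + 1e-8
    return positions, descriptors

# toy example with random maps
positions, descriptors = extract_keypoints(np.random.rand(60, 80),
                                           np.random.rand(60, 80, 256))
print(positions.shape, descriptors.shape)
```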
In the object detection processing of the second feature map, an instance segmentation module may be used. Specifically, the instance segmentation module may perform object detection processing on the second feature map by using a convolutional neural network (CNN) such as Mask Region-CNN (Mask RCNN) to obtain the object position information of the first object in the second feature map.
Specifically, when the instance segmentation module performs the object detection processing on the second feature map, the object position information of the first object in the second feature map may be determined, for example, through the instance segmentation mask produced by the module.
Here, since the second feature map is obtained by feature extraction of the first image as well, by determining the object position information of the first object in the second feature map, the actual object position information of the first object in the first image can be determined according to the association relationship between the second feature map and the first image.
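For orientation only, the sketch below runs the off-the-shelf Mask R-CNN from torchvision and keeps the predicted boxes and binary instance masks. This is a simplification: the patent applies the detector to the second feature map produced by the shared encoder rather than to the raw image, and the confidence thresholds and the `weights` argument (torchvision 0.13 or later) are assumptions.

```python
import torch
import torchvision

# Off-the-shelf Mask R-CNN used as a stand-in for the instance segmentation module.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)              # placeholder for the first image
with torch.no_grad():
    pred = model([image])[0]                 # dict with boxes, labels, scores, masks

keep = pred["scores"] > 0.7                  # assumed confidence threshold
boxes = pred["boxes"][keep]                  # (N, 4) object position information
masks = pred["masks"][keep, 0] > 0.5         # (N, H, W) binary instance masks
print(boxes.shape, masks.shape)
```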
Based on the feature point position information, the object position information, and the feature point descriptor, the first object descriptor of the first object may be obtained.
In a specific implementation, when determining the first object descriptor of the first object, for example, the following manner may be adopted: determining a target feature point of the first object based on the object position information and the feature point position information; and obtaining a first object descriptor of the first object based on the feature point position information corresponding to the target feature point and the feature point descriptor corresponding to the target feature point.
In one possible embodiment, an instance segmentation mask characterizing the object position information of the first object in the second feature map may be determined by the instance segmentation module, i.e. the specific position in the first image at which the first object is located may be determined by the instance segmentation mask. Here, the specific position at which the first object is located includes an area occupied in the first image. Then, by using the feature point position information, the feature point falling in the area occupied by the first object in the first image can be determined according to the specific position where the first object is located, and is taken as the target feature point corresponding to the first object.
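A minimal sketch of that selection step, assuming the instance mask and the keypoint positions are already expressed in the same image coordinates:

```python
import numpy as np

def select_target_keypoints(instance_mask, positions, descriptors):
    """Keep only the keypoints that fall inside one object's instance mask.

    instance_mask: (H, W) boolean mask of the first object
    positions:     (N, 2) keypoint (x, y) pixel coordinates
    descriptors:   (N, D) keypoint descriptors
    """
    xs = positions[:, 0].astype(int)
    ys = positions[:, 1].astype(int)
    inside = instance_mask[ys, xs]            # boolean mask lookup per keypoint
    return positions[inside], descriptors[inside]

# toy example: one rectangular instance mask, three candidate keypoints
mask = np.zeros((60, 80), dtype=bool)
mask[10:40, 20:60] = True
positions = np.array([[25, 15], [5, 5], [50, 30]])
descriptors = np.random.rand(3, 256)
target_pos, target_desc = select_target_keypoints(mask, positions, descriptors)
print(target_pos)   # only the keypoints inside the mask remain
```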
After the target feature points are determined, using a pre-trained graph neural network to perform attention processing on feature point position information corresponding to the target feature points and feature point descriptors corresponding to the target feature points to obtain feature data representing appearance features and/or structural features of the first object; then, feature aggregation processing is carried out on the feature data to obtain a first object descriptor of the first object.
Here, even after the feature point position information and the feature point descriptors of the target feature points are determined, the information they express remains at the level of individual feature points; that is, it is still insufficient to express the first object fully and accurately. Therefore, the feature point positions corresponding to the target feature points and the feature point descriptors corresponding to the target feature points are processed with attention by the graph neural network to learn the appearance features and/or structural features of the first object, yielding feature data for those features. The feature data determined in this way can better represent the appearance features and/or structural features of the first object.
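The sketch below illustrates the idea with a single round of dot-product self-attention over one object's target keypoints, folding the keypoint position in through a small positional encoder. The patent relies on a pre-trained graph neural network whose architecture it does not disclose, so this is an assumed, simplified stand-in rather than the actual network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeypointAttention(nn.Module):
    """One self-attention round over an object's target keypoints (illustrative only)."""
    def __init__(self, desc_dim=256, hidden=256):
        super().__init__()
        self.pos_enc = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                     nn.Linear(hidden, desc_dim))
        self.q = nn.Linear(desc_dim, desc_dim)
        self.k = nn.Linear(desc_dim, desc_dim)
        self.v = nn.Linear(desc_dim, desc_dim)

    def forward(self, positions, descriptors):
        # positions: (N, 2) keypoint coordinates, descriptors: (N, D)
        x = descriptors + self.pos_enc(positions)        # fuse location and appearance
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = F.softmax(q @ k.t() / x.shape[-1] ** 0.5, dim=-1)
        return x + attn @ v                              # (N, D) per-keypoint feature data

feature_data = KeypointAttention()(torch.rand(12, 2), torch.rand(12, 256))
print(feature_data.shape)  # torch.Size([12, 256])
```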
Before the feature aggregation processing is performed on the feature data, the feature data may be sparsified. A first object generally corresponds to a plurality of target feature points; for these target feature points, after the graph neural network applies attention processing to the corresponding feature point positions and feature point descriptors, a plurality of feature data items are obtained, one per target feature point.
In this case, because the number of feature data items may be large, matching with them directly also requires considerable computing power; in addition, the determined target feature points contribute with different importance when characterizing the first object, so sparsifying the feature data reduces the influence of any single target feature point on the feature data of the first object as a whole.
For example, when the first object is a shelf, the determined target feature points include feature points corresponding to the top corners of the shelf and feature points on its edges. Of these, the target feature points corresponding to the top corners characterize the shelf better than those on the edges.
When the target feature points on one edge of the shelf are used to express its structural features, on the one hand the edge expresses the appearance features and/or structural features of the shelf less adequately than the top corners do; on the other hand, one edge may correspond to several target feature points with similar semantics, so matching with all of the target feature points on that edge wastes computing resources and reduces matching efficiency.
Therefore, sparsifying the feature data further reduces redundant data and improves the accuracy with which the sparsified feature data expresses the first object.
After the feature data is subjected to sparsification processing, feature aggregation processing is performed on the feature data subjected to the sparsification processing, and a first object descriptor of the first object is obtained.
Here, after the feature aggregation processing is performed on the sparsified feature data, the feature point descriptors in the multiple sparsified feature data items are aggregated into a single descriptor, which is used as the first object descriptor of the first object. The first object descriptor may be, for example, a multidimensional vector, such as a 2048-dimensional descriptor vector.
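One plausible reading of these two steps, keeping only the most salient per-keypoint features and then pooling and projecting them to the 2048-dimensional descriptor mentioned above, is sketched below; the norm-based saliency score, the top-k cutoff, the max-pooling, and the linear projection are all assumptions rather than the disclosed aggregation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DescriptorAggregator(nn.Module):
    """Sparsify per-keypoint feature data, then pool it into one object descriptor."""
    def __init__(self, feat_dim=256, out_dim=2048, keep_k=8):
        super().__init__()
        self.keep_k = keep_k
        self.proj = nn.Linear(feat_dim, out_dim)

    def forward(self, feature_data):
        # feature_data: (N, D) per-keypoint features from the attention stage
        saliency = feature_data.norm(dim=1)              # assumed saliency score
        k = min(self.keep_k, feature_data.shape[0])
        idx = saliency.topk(k).indices                   # sparsification: keep top-k points
        pooled = feature_data[idx].max(dim=0).values     # aggregate across keypoints
        desc = self.proj(pooled)                         # project to 2048 dimensions
        return F.normalize(desc, dim=0)                  # unit-norm first object descriptor

object_descriptor = DescriptorAggregator()(torch.rand(12, 256))
print(object_descriptor.shape)  # torch.Size([2048])
```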
Thus, for a first object in the first image, it may be characterized by a first object descriptor; when the first object descriptor is determined, the semantic information of the corresponding feature point can be well reflected by the utilized feature point descriptor, so that the determined first object descriptor can more accurately represent the semantic information of the first object.
For S203 above, the second image may be, for example, an image different from the first image, also acquired while the robot travels in the storage space. Illustratively, the second image may include a historical image whose timestamp is earlier than that of the first image.
The manner of obtaining the second object descriptor of the at least one second object in the second image is similar to the manner of determining the first object descriptor of the at least one first object in the first image corresponding to S201 and S202 in fig. 2, and is not described herein again.
After the first object descriptor and the second object descriptor are determined, the first object descriptor and the second object descriptor can be used for matching the first object and the second object to obtain a matching result of the first object and the second object.
Wherein the first object and the second object respectively determined in the first image and in the second image may each comprise one or more. Therefore, when the first object descriptor and the second object descriptor are determined, the first object descriptors corresponding to the plurality of first objects and the second object descriptors corresponding to the plurality of second objects can be obtained.
In a specific implementation, for example, the following may be used: determining similarity information of the first object and the second object based on the first object descriptor and the second object descriptor; comparing the similarity information with a preset similarity threshold; determining that the first object and the second object are the same object when the similarity information is greater than the similarity threshold.
Specifically, when there are multiple first object descriptors and multiple second object descriptors, the similarity information may be determined pairwise between each first object descriptor and each second object descriptor, and it can then be determined whether any first object is the same object as a given second object. The specific process depends on the actual situation and is not described again here.
When determining the similarity information of the first object and the second object, for example, an inner product of the first object descriptor and the second object descriptor may be calculated, and a result obtained by calculating the inner product may be used as the similarity information of the first object and the second object.
Illustratively, in the case where the determined first object descriptor and the second object descriptor each correspond to a 2048-dimensional descriptor vector, products of corresponding dimensions in the two descriptor vectors are calculated, and then the resulting 2048 products are summed, and the resulting calculation result is taken as similarity information of the first object and the second object. Here, the obtained similarity information of the first object and the second object may be expressed as Sim, for example.
Here, since the obtained first object descriptor and the obtained second object descriptor can express the first object and the second object respectively more comprehensively and accurately, if the first object and the second object are the same object, the determined first object descriptor and the determined second object descriptor should be similar; that is, in this case, the inner product of the first object descriptor and the second object descriptor calculated should be large.
Therefore, a preset similarity threshold SIM may be set, and the similarity information Sim is compared with the threshold SIM to determine whether the first object and the second object are the same object.
In one possible implementation, in a case that the similarity information is greater than the similarity threshold, determining that the first object and the second object are the same object; in another possible implementation, in a case that the similarity information is less than or equal to the similarity threshold, it is determined that the first object and the second object are not the same object.
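A minimal sketch of this decision, assuming both descriptors are already unit-normalised 2048-dimensional vectors and that the threshold value is tuned empirically:

```python
import numpy as np

def match_objects(first_desc, second_descs, sim_threshold=0.8):
    """Compare one first-object descriptor against all second-object descriptors.

    first_desc:   (2048,) descriptor of a first object
    second_descs: (M, 2048) descriptors of the second objects
    Returns the index of the matched second object, or None if no similarity
    exceeds the threshold.
    """
    sims = second_descs @ first_desc              # inner products = similarity Sim
    best = int(np.argmax(sims))
    return best if sims[best] > sim_threshold else None

first = np.random.rand(2048)
first /= np.linalg.norm(first)
second = np.random.rand(3, 2048)
second /= np.linalg.norm(second, axis=1, keepdims=True)
print(match_objects(first, second))
```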
In this way, the first object and the second object can be matched quickly by simply comparing the determined similarity information with the similarity threshold. Compared with matching the feature points of the first object and the second object individually, the amount of data involved in the computation is greatly reduced, the computation is simple, and the required computing power is correspondingly small, so the method is suitable for deployment on a real system; moreover, matching efficiency is further improved while matching accuracy is maintained.
In addition, in feature-point-based matching, the feature points cannot eliminate the influence of factors such as viewing-angle changes on the feature points of the recognized object, so such methods are usually only applied to matching objects between two adjacent frames. In the matching method provided by the embodiment of the present disclosure, objects are matched with object descriptors, so the time interval between the timestamps of the first image and the second image can be longer without affecting the matching of the first object in the first image with the second object in the second image.
In another embodiment of the present disclosure, the position of the robot in the target scene when the first image is acquired may also be determined based on the matching result.
Here, with the matching method provided by the embodiment of the present disclosure, loop detection and relocation may also be performed.
Illustratively, the target scene may include a location A. After acquiring the second image at location A, the robot continues to travel in the target scene. After a period of time, the positioning result obtained from the second image is used to control the robot to return to location A, where the first image is acquired. At this time, if the matching result obtained by matching the first object in the first image with the second object in the second image indicates that they are the same object, the robot can be considered to have been controlled back to the same position by the positioning result, i.e. the positioning result for location A is relatively accurate, and the position of the robot may be labeled. If the matching result indicates that the first object and the second object do not match, then when the positioning result was used to control the robot to return to location A, the robot did not actually return to the same position, i.e. the current positioning is inaccurate, and detection and positioning are performed again.
It will be understood by those of skill in the art that in the above method of the present embodiment, the order of writing the steps does not imply a strict order of execution and does not impose any limitations on the implementation, as the order of execution of the steps should be determined by their function and possibly inherent logic.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described here again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and there may be other ways of dividing them in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in software functional units and sold or used as a stand-alone product, may be stored in a non-transitory computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes and substitutions do not depart from the spirit and scope of the embodiments disclosed herein, and they should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A robot, comprising: a vision sensor, and a control assembly;
wherein the vision sensor is configured to acquire a first image while the robot is driving;
the control component is configured to: performing feature extraction on the first image to obtain a feature map of the first image; obtaining a first object descriptor of at least one first object in the first image based on the feature map; and matching the first object and the second object based on the first object descriptor and a second object descriptor of at least one second object in a second image to obtain a matching result of the first object and the second object.
2. The robot of claim 1, wherein the feature map comprises a first feature map and a second feature map;
the control component, when deriving a first object descriptor for at least one first object in the first image based on the feature map, is configured to:
performing feature point detection processing on the first feature map to obtain feature point position information corresponding to the feature points in the first feature map and a feature point descriptor;
and
carrying out object detection processing on the second feature map to obtain object position information of the first object in the second feature map;
and obtaining a first object descriptor of the first object based on the characteristic point position information, the object position information and the characteristic point descriptor.
3. The robot of claim 1, wherein the control component, based on the first object descriptor and a second object descriptor of at least one second object in a second image, when matching the first object and the second object, is configured to:
determining similarity information of the first object and the second object based on the first object descriptor and the second object descriptor;
comparing the similarity information with a preset similarity threshold;
determining that the first object and the second object are the same object when the similarity information is greater than the similarity threshold.
4. The robot of claim 1, wherein the second image comprises: a historical image whose timestamp is earlier than that of the first image.
5. The robot of claim 1, wherein the control component is further configured to: based on the matching result, determining the position of the robot in the target scene when the first image is acquired.
6. A matching method, comprising:
performing feature extraction on a first image to obtain a feature map of the first image;
obtaining a first object descriptor of at least one first object in the first image based on the feature map;
and matching the first object and the second object based on the first object descriptor and a second object descriptor of at least one second object in a second image to obtain a matching result of the first object and the second object.
7. The matching method according to claim 6, wherein the feature map includes a first feature map and a second feature map;
the obtaining a first object descriptor of at least one first object in the first image based on the feature map includes:
performing feature point detection processing on the first feature map to obtain feature point position information corresponding to the feature points in the first feature map and a feature point descriptor;
and
carrying out object detection processing on the second feature map to obtain object position information of the first object in the second feature map;
and obtaining a first object descriptor of the first object based on the characteristic point position information, the object position information and the characteristic point descriptor.
8. The matching method according to claim 6, wherein matching the first object and the second object based on the first object descriptor and a second object descriptor of at least one second object in a second image comprises:
determining similarity information of the first object and the second object based on the first object descriptor and the second object descriptor;
comparing the similarity information with a preset similarity threshold;
determining that the first object and the second object are the same object when the similarity information is greater than the similarity threshold.
9. The matching method according to claim 6, wherein the second image includes: a historical image whose timestamp is earlier than that of the first image.
10. The matching method according to claim 6, further comprising: based on the matching result, determining the position of the robot in the target scene when the first image is acquired.

Publications (1)

Publication Number Publication Date
CN115393614A (en) 2022-11-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination